
Paper Rundown: Online List Labeling with Predictions

June 09, 2025

Algorithms with Predictions?

As it turns out, most algorithms and data structures can benefit from predictions. These predictions can come in many forms — a neural network trained on historical input data, or really any machine learning model capable of anticipating future inputs. The goal of this approach is to design an algorithm (or data structure) that leverages these predictions as much as possible, while still matching the expected performance and behavior of state-of-the-art alternatives that do not use predictions.

Online List Labeling

The goal of this paper is to build a data structure (the authors call it LearnedLLA) that optimally leverages input predictions to improve on the List Labeling Array.
(If you need a refresher on the classical LLA model or don't know what LLA means 😃, check my blog post on LLAs here.)

Prediction Model

The authors assume that for every item $x$ arriving in the LearnedLLA, we are given a predicted rank $\hat{r}_x$. Define the rank estimation error of an element $x$, denoted $\eta_x$, as:

$$\eta_x = |\hat{r}_x - r_x|$$

Then define $\eta = \max_x \eta_x$ as the maximal error of all the predictions.
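
To make the definitions concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that computes the per-element errors $\eta_x$ and the maximal error $\eta$ from known true and predicted ranks; the dictionaries and the `rank_errors` helper are hypothetical names chosen for this post:

```python
def rank_errors(true_rank, predicted_rank):
    """Per-item errors eta_x = |r_hat_x - r_x| and the maximal error eta.

    true_rank and predicted_rank map each item to its true / predicted rank.
    """
    errors = {x: abs(predicted_rank[x] - true_rank[x]) for x in true_rank}
    eta = max(errors.values(), default=0)
    return errors, eta

# Example: predictions off by 0, 2, and 5 ranks give eta = 5.
true_rank = {"a": 1, "b": 2, "c": 3}
predicted_rank = {"a": 1, "b": 4, "c": 8}
errors, eta = rank_errors(true_rank, predicted_rank)
print(errors, eta)  # {'a': 0, 'b': 2, 'c': 5} 5
```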

LearnedLLA

LearnedLLA, like the classical LLA, maintains an array of size $cn$ for some $c > 1$ ($c = 6$ in the case of LearnedLLA) and a calibrator tree of height $O(\log n)$. The twist? Each node in the calibrator tree can hold its own LLA — which can be any implementation you like, from the classical LLA to the state of the art with $O(\log n \,\mathrm{polylog} \log n)$ amortized cost per insertion.

A closer look at LearnedLLA

At all times, a LearnedLLA instance with capacity $n$ is split into $l$ partitions $P_1, P_2, \ldots, P_l$, where $l \leq n$, such that each partition is handled by its own black-box LLA, and for any $x \in P_i$ and $y \in P_j$, if $i > j$ then $x > y$. This structure is reflected in LearnedLLA's calibrator tree. Each node in the tree has a set of assigned ranks and assigned slots, with all $n$ ranks divided equally among the nodes at the same height. The root is responsible for all $6n$ slots and all ranks $1, \ldots, n$; its left child is responsible for ranks $1, \ldots, \lfloor n/2 \rfloor$, its right child for ranks $\lfloor n/2 \rfloor + 1, \ldots, n$, and so on. The calibrator tree has $n$ leaves, each with $6$ slots and responsible for a single rank (assuming $n$ is a power of two). More precisely, the $i$th node at height $h$ has $2^h$ assigned ranks $\{ 2^h (i-1) + 1, 2^h (i-1) + 2, \ldots, 2^h i \}$ and $6 \cdot 2^h$ assigned slots $\{ 6 \cdot 2^h (i-1) + 1, 6 \cdot 2^h (i-1) + 2, \ldots, 6 \cdot 2^h i \}$.

Each node in the calibrator tree $T$ either corresponds to an LLA — we call these nodes actual LLAs — or it does not, in which case we call it a potential LLA, since it could become an LLA during runtime. Every root-to-leaf path contains exactly one actual LLA. At initialization, all $n$ leaves of $T$ correspond to actual LLAs. We define $|P_i|$ to be the number of ranks assigned to an actual LLA $P_i$, and $\mathrm{size}(P_i)$ to be the number of elements currently stored in $P_i$.
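
To make the rank and slot bookkeeping concrete, here is a small Python sketch of the assignment formulas above (my own illustration, assuming $n$ is a power of two; the function names are made up for this post):

```python
C = 6  # slots-per-rank factor used by LearnedLLA

def assigned_ranks(i, h):
    """Ranks assigned to the i-th node (1-indexed) at height h of the calibrator tree."""
    return list(range(2**h * (i - 1) + 1, 2**h * i + 1))

def assigned_slots(i, h):
    """Slots assigned to the i-th node (1-indexed) at height h."""
    return list(range(C * 2**h * (i - 1) + 1, C * 2**h * i + 1))

# With n = 8: the root (height 3) owns ranks 1..8 and all 6n = 48 slots,
# while each leaf (height 0) owns a single rank and 6 consecutive slots.
print(assigned_ranks(1, 3))        # [1, 2, 3, 4, 5, 6, 7, 8]
print(len(assigned_slots(1, 3)))   # 48
print(assigned_ranks(3, 0), assigned_slots(3, 0))  # [3] [13, 14, 15, 16, 17, 18]
```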

Insertion

Insertion is fairly simple. To insert an element $x$ with predicted rank $\hat{r}_x$ (a code sketch follows the steps below):

  1. Find $i_p$ and $i_s$, where $P_{i_p}$ and $P_{i_s}$ are the actual LLAs that hold $x$'s predecessor and successor, respectively.
  2. Find $i_x$, where $P_{i_x}$ is the actual LLA responsible for rank $\hat{r}_x$.
  3. Compute $i$, where:
$$i = \begin{cases} i_p & \text{if } i_p > i_x \\ i_s & \text{if } i_s < i_x \\ i_x & \text{otherwise} \end{cases}$$
  4. If inserting into $P_i$ would exceed the threshold density of $1/2$, merge $P_i$ with its sibling LLA in the calibrator tree and call init on the potential LLA at their parent. Otherwise, call insert on $P_i$.
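
Putting the four steps together, here is a rough Python sketch of the insertion logic (my paraphrase of the steps above, not the authors' implementation; every helper on `learned_lla` is hypothetical):

```python
def insert(learned_lla, x, predicted_rank):
    # Step 1: actual LLAs holding x's predecessor and successor.
    i_p = learned_lla.lla_of_predecessor(x)
    i_s = learned_lla.lla_of_successor(x)

    # Step 2: actual LLA responsible for the predicted rank.
    i_x = learned_lla.lla_of_rank(predicted_rank)

    # Step 3: clamp the predicted location so it respects the order
    # imposed by x's predecessor and successor.
    if i_p > i_x:
        i = i_p
    elif i_s < i_x:
        i = i_s
    else:
        i = i_x

    # Step 4: if P_i would exceed density 1/2, merge it with its sibling
    # and init the (previously potential) LLA at their parent; otherwise
    # insert directly into P_i. (I assume x then goes into the merged LLA.)
    P_i = learned_lla.partition(i)
    if P_i.size + 1 > P_i.capacity / 2:
        parent = learned_lla.merge_with_sibling(i)  # calls init on the parent LLA
        parent.insert(x)
    else:
        P_i.insert(x)
```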

Main Results

Insertion Cost

For arbitrary predictions, using the classical LLA as the black box, they proved that their insertion algorithm achieves $O(\log^2 \eta)$ amortized cost per insertion.

  • When predictions are perfect (i.e., each predicted rank exactly matches the true rank), the insertion cost is constant — $O(1)$.

  • When predictions are poor — specifically, if at least one prediction is maximally wrong (as far from the target as possible) — the insertion cost falls back to $O(\log^2 n)$, matching the complexity of the traditional, non-predictive LLA.

More generally, for any LLA insertion algorithm with an amortized cost of $C(n)$ (where $C$ is an admissible, "reasonable" list labeling cost function), plugging it into the LearnedLLA model as a black box yields an amortized insertion cost of $C(\eta)$.
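
For example (my own instantiation of the statement above, assuming both cost functions are admissible in the paper's sense), the two black boxes mentioned earlier give:

$$C(n) = O(\log^2 n) \;\Rightarrow\; O(\log^2 \eta), \qquad C(n) = O(\log n \,\mathrm{polylog} \log n) \;\Rightarrow\; O(\log \eta \,\mathrm{polylog} \log \eta).$$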

They took it a step further, and proved a more powerful result:

  • For prediction errors sampled from an unknown distribution with mean $\mu$ and variance $s^2$, their algorithm achieves an expected cost of $O(\log^2(|\mu| + s^2))$. That means $O(1)$ for distributions with constant mean and variance.

Optimality

They claim that their algorithm is optimal: under their prediction model, they prove that any deterministic list labeling algorithm must incur $\Omega(\log^2 \eta)$ amortized cost per insertion.

Open Questions

Well, there is a big problem they don't highlight in their paper: if only one prediction is maximally bad and the rest are perfect, their algorithm will still incur $O(\log^2 \eta) = O(\log^2 n)$, since $\eta$ is determined by the single worst prediction. I will explain this in more detail in a later post!

References

  1. McCauley, S., Moseley, B., Niaparast, A., & Singh, S. (2023). Online List Labeling with Predictions. arXiv:2305.10536

  2. Bender, M. A., Conway, A., Farach-Colton, M., Komlós, H., Koucký, M., Kuszmaul, W., & Saks, M. (2024). Nearly Optimal List Labeling. arXiv:2405.00807

  3. Bender, M. A., Demaine, E. D., & Farach-Colton, M. (2000). Cache-oblivious B-trees. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (pp. 399–409). IEEE.