
Paper Rundown: Online List Labeling with Predictions

June 09, 2025

Algorithms with Predictions?

As it turns out, most algorithms and data structures can benefit from predictions. These predictions can come in many forms — a neural network trained on historical input data, or really any machine learning model capable of anticipating future inputs. The goal of this approach is to design an algorithm (or data structure) that leverages these predictions as much as possible, while still matching the expected performance and behavior of state-of-the-art alternatives that do not use predictions.

Online List Labeling

The goal of this paper is to build a data structure (the authors call it LearnedLLA) that optimally leverages input predictions to improve on the List Labeling Array.
(If you need a refresher on the classical LLA model or don't know what LLA means 😃, check my blog post on LLAs here.)

Prediction Model

The authors assume that for every item $x$ arriving in the LearnedLLA, we are given a predicted rank $\hat{r}_x$. Define the rank estimation error of an element $x$, denoted $\eta_x$, as:

$$\eta_x = |\hat{r}_x - r_x|$$

Then define $\eta = \max_x \eta_x$ as the maximal error of all the predictions.
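
To make the definitions concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that computes the per-element errors $\eta_x$ and the maximal error $\eta$ from known true and predicted ranks; the dictionaries and the `rank_errors` helper are hypothetical names chosen for this post:

```python
def rank_errors(true_rank, predicted_rank):
    """Per-item errors eta_x = |r_hat_x - r_x| and the maximal error eta.

    true_rank and predicted_rank map each item to its true / predicted rank.
    """
    errors = {x: abs(predicted_rank[x] - true_rank[x]) for x in true_rank}
    eta = max(errors.values(), default=0)
    return errors, eta

# Example: predictions off by 0, 2, and 5 ranks give eta = 5.
true_rank = {"a": 1, "b": 2, "c": 3}
predicted_rank = {"a": 1, "b": 4, "c": 8}
errors, eta = rank_errors(true_rank, predicted_rank)
print(errors, eta)  # {'a': 0, 'b': 2, 'c': 5} 5
```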

LearnedLLA

LearnedLLA, like the classical LLA, maintains an array of size $cn$ for some $c > 1$ ($c = 6$ in the case of LearnedLLA) and a calibrator tree of height $O(\log n)$. The twist? Each node in the calibrator tree can hold its own LLA — which can be any implementation you like, from the classical LLA to the state of the art with $O(\log n \,\mathrm{polylog} \log n)$ amortized cost per insertion.

A closer look at LearnedLLA

At all times, a LearnedLLA instance with capacity $n$ is split into $l$ partitions $P_1, P_2, \ldots, P_l$, where $l \leq n$, such that each partition is handled by its own black-box LLA, and for any $x \in P_i$ and $y \in P_j$, if $i > j$ then $x > y$. This structure is reflected in LearnedLLA's calibrator tree. Each node in the tree has a set of assigned ranks and assigned slots, with all $n$ ranks divided equally among the nodes at the same height. The root is responsible for all $6n$ slots and all ranks $1, \ldots, n$; its left child is responsible for ranks $1, \ldots, \lfloor n/2 \rfloor$, its right child for ranks $\lfloor n/2 \rfloor + 1, \ldots, n$, and so on. The calibrator tree has $n$ leaves, each with $6$ slots and responsible for a single rank (assuming $n$ is a power of two). More precisely, the $i$th node at height $h$ has $2^h$ assigned ranks $\{ 2^h (i-1) + 1, 2^h (i-1) + 2, \ldots, 2^h i \}$ and $6 \cdot 2^h$ assigned slots $\{ 6 \cdot 2^h (i-1) + 1, 6 \cdot 2^h (i-1) + 2, \ldots, 6 \cdot 2^h i \}$.

Each node in the calibrator tree $T$ either corresponds to an LLA — we call these nodes actual LLAs — or it does not, in which case we call it a potential LLA, since it could become an LLA during runtime. Every root-to-leaf path contains exactly one actual LLA. At initialization, all $n$ leaves of $T$ correspond to actual LLAs. We define $|P_i|$ to be the number of ranks assigned to an actual LLA $P_i$, and $\mathrm{size}(P_i)$ to be the number of elements currently stored in $P_i$.
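
To make the rank and slot bookkeeping concrete, here is a small Python sketch of the assignment formulas above (my own illustration, assuming $n$ is a power of two; the function names are made up for this post):

```python
C = 6  # slots-per-rank factor used by LearnedLLA

def assigned_ranks(i, h):
    """Ranks assigned to the i-th node (1-indexed) at height h of the calibrator tree."""
    return list(range(2**h * (i - 1) + 1, 2**h * i + 1))

def assigned_slots(i, h):
    """Slots assigned to the i-th node (1-indexed) at height h."""
    return list(range(C * 2**h * (i - 1) + 1, C * 2**h * i + 1))

# With n = 8: the root (height 3) owns ranks 1..8 and all 6n = 48 slots,
# while each leaf (height 0) owns a single rank and 6 consecutive slots.
print(assigned_ranks(1, 3))        # [1, 2, 3, 4, 5, 6, 7, 8]
print(len(assigned_slots(1, 3)))   # 48
print(assigned_ranks(3, 0), assigned_slots(3, 0))  # [3] [13, 14, 15, 16, 17, 18]
```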

Insertion

Insertion is fairly simple. To insert an element $x$ with predicted rank $\hat{r}_x$ (a code sketch follows the steps below):

  1. Find $i_p$ and $i_s$, where $P_{i_p}$ and $P_{i_s}$ are the actual LLAs that hold $x$'s predecessor and successor, respectively.
  2. Find $i_x$, where $P_{i_x}$ is the actual LLA responsible for rank $\hat{r}_x$.
  3. Compute $i$, where:
$$i = \begin{cases} i_p & \text{if } i_p > i_x \\ i_s & \text{if } i_s < i_x \\ i_x & \text{otherwise} \end{cases}$$
  4. If inserting into $P_i$ would exceed the threshold density of $1/2$, merge $P_i$ with its sibling LLA in the calibrator tree and call init on the potential LLA at their parent. Otherwise, call insert on $P_i$.
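
Putting the four steps together, here is a rough Python sketch of the insertion logic (my paraphrase of the steps above, not the authors' implementation; every helper on `learned_lla` is hypothetical):

```python
def insert(learned_lla, x, predicted_rank):
    # Step 1: actual LLAs holding x's predecessor and successor.
    i_p = learned_lla.lla_of_predecessor(x)
    i_s = learned_lla.lla_of_successor(x)

    # Step 2: actual LLA responsible for the predicted rank.
    i_x = learned_lla.lla_of_rank(predicted_rank)

    # Step 3: clamp the predicted location so it respects the order
    # imposed by x's predecessor and successor.
    if i_p > i_x:
        i = i_p
    elif i_s < i_x:
        i = i_s
    else:
        i = i_x

    # Step 4: if P_i would exceed density 1/2, merge it with its sibling
    # and init the (previously potential) LLA at their parent; otherwise
    # insert directly into P_i. (I assume x then goes into the merged LLA.)
    P_i = learned_lla.partition(i)
    if P_i.size + 1 > P_i.capacity / 2:
        parent = learned_lla.merge_with_sibling(i)  # calls init on the parent LLA
        parent.insert(x)
    else:
        P_i.insert(x)
```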

Main Results

Insertion Cost

For arbitrary predictions, using the classical LLA as the black box, they proved that their insertion algorithm achieves $O(\log^2 \eta)$ amortized cost per insertion.

  • When predictions are perfect (i.e., each predicted rank exactly matches the true rank), the insertion cost is constant — $O(1)$.

  • When predictions are poor — specifically, if at least one prediction is maximally wrong (as far from the target as possible) — the insertion cost falls back to $O(\log^2 n)$, matching the complexity of the traditional, non-predictive LLA.

More generally, for any LLA insertion algorithm with an amortized cost of $C(n)$ (where $C$ is an admissible, "reasonable" list labeling cost function), plugging it into the LearnedLLA model as a black box yields an amortized insertion cost of $C(\eta)$.
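
For example (my own instantiation of the statement above, assuming both cost functions are admissible in the paper's sense), the two black boxes mentioned earlier give:

$$C(n) = O(\log^2 n) \;\Rightarrow\; O(\log^2 \eta), \qquad C(n) = O(\log n \,\mathrm{polylog} \log n) \;\Rightarrow\; O(\log \eta \,\mathrm{polylog} \log \eta).$$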

They took it a step further, and proved a more powerful result:

  • For prediction errors sampled from an unknown distribution with mean $\mu$ and variance $s^2$, their algorithm achieves an expected cost of $O(\log^2(|\mu| + s^2))$. That means $O(1)$ for distributions with constant mean and variance.

Optimality

They claim that their algorithm is optimal: under their prediction model, they prove that any deterministic list labeling algorithm must incur $\Omega(\log^2 \eta)$ amortized cost per insertion.

Open Questions

Well, there is a big problem they don't highlight in their paper: if only one prediction is maximally bad and the rest are perfect, their algorithm will still incur $O(\log^2 \eta) = O(\log^2 n)$, since $\eta$ is determined by the single worst prediction. I will explain this in more detail in a later post!

References

  1. McCauley, S., Moseley, B., Niaparast, A., & Singh, S. (2023). Online List Labeling with Predictions. arXiv:2305.10536

  2. Bender, M. A., Conway, A., Farach-Colton, M., Komlós, H., Koucký, M., Kuszmaul, W., & Saks, M. (2024). Nearly Optimal List Labeling. arXiv:2405.00807

  3. Bender, M. A., Demaine, E. D., & Farach-Colton, M. (2000). Cache-oblivious B-trees. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (pp. 399–409). IEEE.