
Online List Labeling with Predictions: What's wrong with LearnedLLA
For a detailed explanation of the LearnedLLA, take a look at my previous blog post here.
LearnedLLA: Summarizing The Results
Here’s the gist:
- With arbitrary predictions, their insertion algorithm achieves an amortized cost of $O(\log n + \log^2 \eta)$ per insertion, where $\eta$ denotes the maximum prediction error.
- With perfect predictions (every predicted rank is spot on), the cost drops to $O(\log n)$.
- With maximally bad predictions (predicted ranks as far off as possible, i.e., $\eta = \Theta(n)$), the cost falls back to $O(\log^2 n)$, the same as the classic, non-predictive LLA.
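Reading the three bullets together, and assuming the $O(\log n + \log^2 \eta)$ form above (my reconstruction, with $\eta$ the maximum prediction error), the two extremes are just specializations of the general bound:

$$
\eta = O(1) \;\Rightarrow\; O(\log n + \log^2 \eta) = O(\log n),
\qquad
\eta = \Theta(n) \;\Rightarrow\; O(\log n + \log^2 \eta) = O(\log^2 n).
$$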
They went a step further and proved a stronger result:
- When prediction errors are drawn from an unknown distribution with mean $\mu$ and variance $\sigma^2$, the expected amortized cost is $O(\log n + \log^2(\mu + \sigma))$.
In plain terms: the expected cost is $O(\log n)$ if the mean and variance are constant.
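One way to see why a distributional guarantee is more informative than a max-error one: even when the errors have constant mean and variance, the largest error among $n$ insertions can still be large, so a bound stated purely in terms of the maximum error $\eta$ would degrade while the distributional bound does not. A small simulation sketch; the Pareto error model below is purely my illustration, not the paper's:

```python
import math
import random

random.seed(42)
n = 1_000_000

# Illustrative error model (my assumption, not the paper's): Pareto(alpha=3)
# errors have constant mean and variance, but their maximum over n samples
# grows roughly like n^(1/3).
errors = [random.paretovariate(3) for _ in range(n)]

mu = sum(errors) / n
eta_max = max(errors)

print(f"mean error                ≈ {mu:.2f}")          # stays O(1) as n grows
print(f"max error                 ≈ {eta_max:.0f}")      # grows with n
print(f"max-error term log^2(eta) ≈ {math.log2(eta_max) ** 2:.0f}")
print(f"log n                     ≈ {math.log2(n):.0f}")
```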
And more generally:
- For any LLA insertion algorithm with an amortized cost of $f(n)$ (where $f$ is a reasonable, admissible list labeling cost function), plugging it into the LearnedLLA model as a black box yields an amortized insertion cost of $O(f(\min(\eta, n)) + \log n)$.
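If the black-box theorem indeed has this shape, a better classical LLA immediately yields a better learned bound. Here is a tiny sketch of that substitution, comparing the classic deterministic $O(\log^2 n)$ LLA with the $O(\log^{3/2} n)$ expected bound of Bender et al. (2024) as the plugged-in black box; the composed form in the code is my reconstruction, not a quoted theorem:

```python
import math
from typing import Callable

def composed_bound(f: Callable[[float], float], n: int, eta: float) -> float:
    """Hypothetical shape of the black-box composition:
    amortized cost ~ f(min(eta, n)) + log n (constants omitted)."""
    return f(max(min(eta, n), 2)) + math.log2(n)

classic   = lambda m: math.log2(m) ** 2     # classic LLA: O(log^2 m)
bender_24 = lambda m: math.log2(m) ** 1.5   # randomized LLA of Bender et al. 2024: O(log^{3/2} m)

n = 1 << 20
for eta in (2, 1 << 10, n):
    print(f"eta = {eta:>8}:  classic blackbox ~ {composed_bound(classic, n, eta):6.1f},"
          f"  log^1.5 blackbox ~ {composed_bound(bender_24, n, eta):6.1f}")
```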
Where did LearnedLLA fall short?
Even if only a few predictions are maximally bad while most are optimal, their algorithm still incurs $O(\log^2 n)$ amortized, since the guarantee is governed entirely by the maximum error.
This appears somewhat counterintuitive, as a single bad prediction should not cause the performance of the LearnedLLA to degrade by a factor of $\log n$.
A simple example:
Consider a small LearnedLLA instance that spans ranks $1$ through $4$, and an insertion sequence of $(\text{value}, \text{predicted rank})$ tuples in which the predicted ranks of the values $1$, $2$, and $3$ all point into the same group.
In this example, the values $1$, $2$, and $3$ would all be placed in that one group, and we would then have to merge when inserting value $4$, since the group would exceed its density threshold.
Let $\eta_{\max}$ and $\bar{\eta}$ denote the maximum error and the arithmetic mean of the errors, respectively. Already in this four-element example, $\eta_{\max}$ is a constant fraction of the number of ranks, while $\bar{\eta}$ is a small constant.
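To make the two quantities concrete, here is a tiny sketch that computes $\eta_{\max}$ and $\bar{\eta}$ for a hypothetical four-element instance in the spirit of the example above (the specific predicted ranks are my own illustration):

```python
# Hypothetical (value, predicted rank) sequence: the first three values are all
# predicted into the top rank, so they land in the same group.
# The true rank of value v is simply v here.
inserts = [(1, 4), (2, 4), (3, 4), (4, 4)]

errors = [abs(pred - value) for value, pred in inserts]
eta_max = max(errors)
eta_avg = sum(errors) / len(errors)

print(f"errors = {errors}, eta_max = {eta_max}, mean error = {eta_avg}")
# -> errors = [3, 2, 1, 0], eta_max = 3, mean error = 1.5
```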
A More General Example
For any LearnedLLA instance over $n$ elements, consider an insertion sequence of $(\text{value}, \text{predicted rank})$ tuples in which the smallest element receives the largest predicted rank $n$, while every other prediction is exact. In other words, a single prediction is far off, making the maximum error $\eta_{\max} = n - 1 = \Theta(n)$, and the remaining predictions are perfect.
In that case, $\eta_{\max} = \Theta(n)$, while the average error is $\bar{\eta} = \frac{n-1}{n} < 1$, i.e., a constant.
Yet, our LearnedLLA insertion cost is $O(\log^2 n)$ amortized, no better than classic list labeling without predictions.
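To see the gap numerically under the reconstructed bound shape: the single bad prediction drives the max-error term to $\log^2 n$, while a bound parameterized by the average error (which no existing algorithm provides; it is purely a what-if here) would stay near $\log n$:

```python
import math

for n in (1 << 10, 1 << 20, 1 << 30):
    eta_max = n - 1                    # the single maximally wrong prediction
    eta_avg = (n - 1) / n              # every other prediction is exact

    # Reconstructed max-error bound vs. a hypothetical average-error bound.
    max_error_bound = math.log2(n) + math.log2(eta_max) ** 2        # ~ log^2 n
    wishful_avg_bound = math.log2(n) + math.log2(eta_avg + 2) ** 2  # ~ log n

    print(f"n = 2^{int(math.log2(n)):>2}:  max-error bound ~ {max_error_bound:8.1f},"
          f"  hypothetical average-error bound ~ {wishful_avg_bound:6.1f}")
```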
My Thoughts!
I’ve been struggling with this question for the past year. It only feels natural that we can do better. Maybe there’s a smarter algorithm out there — perhaps a randomized LLA algorithm or a new framework that relies on the average error instead.
References
- McCauley, S., Moseley, B., Niaparast, A., & Singh, S. (2023). Online List Labeling with Predictions. arXiv:2305.10536.
- Bender, M. A., Conway, A., Farach-Colton, M., Komlós, H., Koucký, M., Kuszmaul, W., & Saks, M. (2024). Nearly Optimal List Labeling. arXiv:2405.00807. https://doi.org/10.48550/arXiv.2405.00807
- Bender, M. A., Demaine, E. D., & Farach-Colton, M. (2000). Cache-oblivious B-trees. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (pp. 399–409). IEEE.