Thumbnail for A Quick Rundown of the Online List Labeling Problem

A Quick Rundown of the Online List Labeling Problem

April 29, 2025

The Book Shelf Problem

Imagine I invite you over to my house one day and show you my rather small home library. And we decide to play this game:

I empty out my library, and then I give you the books one by one, asking you to put them back in my library in alphabetical order, based on the first letter of each book's title, in the most efficient manner. To see what efficient means in this context, imagine the following scenario:

I give you The Bible, then Voltaire, and you decide to place them at the start of the very first bookshelf in this order: The Bible, Voltaire. Then I hand you Bertrand Russel, and if, say, we're sorting alphabetically by the first letter, you now have to move both The Bible and Voltaire to place Bertrand Russel at the very start of the first bookshelf.

In this scenario, you have to "pay" two movements in order to place Bertrand Russel. This notion of cost is exactly what we're trying to minimize in this context.

To put this more precisely: your goal is to minimize the aggregate number of movements over the total number of inserts I'll ask you to do — you are trying to minimize the amortized cost per insert.


Formal Problem Definition

You are given, online, a set of numbers nn, taken from some universe UU, and your goal is to place them in an array of size m>nm > n. The objective is to minimize the amortized cost per insertion/deletion, where cost is defined by the number of elements moved per insertion or deletion.


What is a List Labeling Array?

One of the oldest solutions to the Online List Labeling (OLL) problem dates back to 1981, when Alon Itai et al. designed a priority queue with O(nlog2n)O(n \log^2 n) total cost for nn insertions. This paper would go on to provide a sufficient solution to the OLL problem: just set the priority pp of each element xx inserted as p=xp = x, and you get a solution to the OLL problem with O(log2n)O(\log^2 n) amortized cost per operation.

Here's a simple way to think about a list labeling array:

A Simple List Labeling Array Implementation

Define an array AA of size A=c×nA = c \times n for some constant c>1c > 1. Split AA into windows of size logn\log n.

These windows form the leaves of an implicit binary tree built on top of the cnlogn\frac{c n}{\log n} leaves. The root contains the range [1,cn][1, c n] in the array, and each internal node contains all the elements in its subtree.

For each node uu in the tree, define size(u)\text{size}(u) as the number of elements in its subarray. We define τi\tau_i to be the threshold density at depth ii. In our case, the binary tree has depth d=logcnlognd = \log \frac{c n}{\log n}.

Let τ0,τd>0\tau_0, \tau_d > 0 be two constants, and define τi\tau_i by the following recursive formula:

τi=τ0+(τdτ0)id\tau_i = \tau_0 + (\tau_d - \tau_0) \frac{i}{d}

The core idea is to prevent each node uu from exceeding its predefined threshold density τk\tau_k. We enforce this policy by merging the subarrays of nodes that exceed their threshold density with their siblings and redistributing elements equally in the resulting subarray. Doing so ensures enough gaps between elements so we don't have to pay a hefty cost per insertion.

Insertion

Insertion is really straightforward: to insert some element xx, we perform a binary search on our binary tree to figure out which window wiw_i xx belongs to, and update size(u)\text{size}(u) for each node along the search path. If the window wiw_i is within threshold, there is always a slot for x. Insert x in its slot and rebalance the wiw_i by evenly distributing all elements. If not, go up the tree to find the the first ancestor of wiw_i's node that is in threshhold, insert x and rebalance all elements evenly in that range.

Cost and Running Time

This approach not only achieves an amoritzed cost of O(log2n)O(\log^2{n}) per insertion, but also a running time of O(log2n)O(\log^2{n}).

Implementation

Find my full implementation in C here: GitHub.

The implementation follows the theoretical design described above, with the array split into windows of size log n and a balancing tree managing the density of array segments. You can use it by including the header file and creating a new LLA instance with your desired parameters.

#include "lla.h"

int main() {
    // Create LLA with N=1024, C=8, TAU_0=0.5, TAU_D=0.75
    lla *my_lla = create_lla(1024, 8, 0.5, 0.75);
    
    // Insert elements
    insert(my_lla, 10);
    insert(my_lla, 5);
    insert(my_lla, 15);
    
    // Cleanup
    cleanup_lla(&my_lla);
    return 0;
}

The implementation achieves the theoretical bounds of O(log²n) insertion time and O(log²n) amortized elements moved per insertion.

References

  1. McCauley, S., Moseley, B., Niaparast, A., & Singh, S. (2023). Online List Labeling with Predictions. arXiv:2305.10536