A Quick Rundown of the Online List Labeling Problem

June 18, 2023

The Book Shelf Problem

Imagine I invite you over to my house one day and show you my rather small home library. And we decide to play this game:

I empty out my library, and then I give you the books one by one, asking you to put them back in my library in alphabetical order, based on the first letter of each book's title, in the most efficient manner. To see what efficient means in this context, imagine the following scenario:

I give you The Bible, then Voltaire, and you decide to place them at the start of the very first bookshelf in this order: The Bible, Voltaire. Then I hand you Bertrand Russel, and if, say, we're sorting alphabetically by the first letter, you now have to move both The Bible and Voltaire to place Bertrand Russel at the very start of the first bookshelf.

In this scenario, you have to "pay" two movements in order to place Bertrand Russel. This notion of cost is exactly what we're trying to minimize in this context.

To put this more precisely: your goal is to minimize the aggregate number of movements over the total number of inserts I'll ask you to do — you are trying to minimize the amortized cost per insert.

Formal Problem Definition

You are given, online, a set of numbers $n$ , taken from some universe $U$ , and your goal is to place them in an array of size $m > n$ . The objective is to minimize the amortized cost per insertion/deletion, where cost is defined by the number of elements moved per insertion or deletion.

What is a List Labeling Array?

One of the oldest solutions to the Online List Labeling (OLL) problem dates back to 1981, when Alon Itai et al. designed a priority queue with $O(n \log^2 n)$ total cost for $n$ insertions. This paper would go on to provide a sufficient solution to the OLL problem: just set the priority $p$ of each element $x$ inserted as $p = x$ , and you get a solution to the OLL problem with $O(\log^2 n)$ amortized cost per operation.

Here's a simple way to think about a list labeling array:

A Simple List Labeling Array Implementation

Define an array $A$ of size $A = c \times n$ for some constant $c > 1$ . Split $A$ into windows of size $\log n$ .

These windows form the leaves of an implicit binary tree built on top of the $\frac{c n}{\log n}$ leaves. The root contains the range $[1, c n]$ in the array, and each internal node contains all the elements in its subtree.

For each node $u$ in the tree, define $\text{size}(u)$ as the number of elements in its subarray. We define $\tau_i$ to be the threshold density at depth $i$ . In our case, the binary tree has depth $d = \log \frac{c n}{\log n}$ .

Let $\tau_0, \tau_d > 0$ be two constants, and define $\tau_i$ by the following recursive formula:

\tau_i = \tau_0 + (\tau_d - \tau_0) \frac{i}{d}

The core idea is to prevent each node $u$ from exceeding its predefined threshold density $\tau_k$ . We enforce this policy by merging the subarrays of nodes that exceed their threshold density with their siblings and redistributing elements equally in the resulting subarray. Doing so ensures enough gaps between elements so we don't have to pay a hefty cost per insertion.

Insertion

Insertion is really straightforward: to insert some element $x$ , we perform a binary search on our binary tree to figure out which window $w_i$ $x$ belongs to, and update $\text{size}(u)$ for each node along the search path. If the window $w_i$ is within threshold, there is always a slot for x. Insert x in its slot and rebalance the $w_i$ by evenly distributing all elements. If not, go up the tree to find the the first ancestor of $w_i$ 's node that is in threshhold, insert x and rebalance all elements evenly in that range.

Cost and Running Time

This approach not only achieves an amoritzed cost of $O(\log^2{n})$ per insertion, but also a running time of $O(\log^2{n})$ .

Implementation

Find my full implementation in C here: GitHub.

The implementation follows the theoretical design described above, with the array split into windows of size log n and a balancing tree managing the density of array segments. You can use it by including the header file and creating a new LLA instance with your desired parameters.

#include "lla.h"

int main() {
    // Create LLA with N=1024, C=8, TAU_0=0.5, TAU_D=0.75
    lla *my_lla = create_lla(1024, 8, 0.5, 0.75);
    
    // Insert elements
    insert(my_lla, 10);
    insert(my_lla, 5);
    insert(my_lla, 15);
    
    // Cleanup
    cleanup_lla(&my_lla);
    return 0;
}

The implementation achieves the theoretical bounds of O(log²n) insertion time and O(log²n) amortized elements moved per insertion.

References

McCauley, S., Moseley, B., Niaparast, A., & Singh, S. (2023). Online List Labeling with Predictions. arXiv:2305.10536