
Entanglement Routing as a Combinatorial Restless Bandit Problem
So I've been spending a lot of time lately thinking about entanglement routing in quantum networks, and honestly the more I dig into it the more I realize how fundamentally different it is from anything we do in classical networking. This post is a high-level walkthrough of a research direction I've been working on, modeling the whole thing as a combinatorial restless multi-armed bandit (coRMAB) and solving it with SEQUOIA.
Let me back up a bit.
The Problem
In a quantum network, you've got nodes that can generate entanglement with their neighbors, but if two nodes aren't directly connected, you need to route entanglement through intermediate nodes using entanglement swapping. Sounds a lot like classical routing right? Just find the shortest path and call it a day?
Nah. It doesn't work like that.
The issue is that the success probability and fidelity of your end-to-end entangled pair depends on the entire path — every single link, every swap operation, all of it. There's no subpath optimality here. A prefix of a good path isn't necessarily good on its own. So Dijkstra's algorithm? Bellman-Ford? Useless. The cost structure is nonlinear and non-separable, which basically breaks every assumption classical shortest-path algorithms rely on.
And it gets worse when you have multiple requests competing for the same links. Picking a path for one request changes the success probabilities for other requests because they share edges with finite capacity. Everything is coupled.
Why Existing RL Approaches Fall Short
There's been some interesting work using deep RL for this. Deep Q-learning, DQN-based methods, GNN-augmented Q-learning, etc. The common pattern is they break the routing decision into a sequence of local hop-by-hop decisions because evaluating Q-values over all possible paths is intractable.
But here's the thing: that local decomposition is basically an assumption that global path quality can be recovered from local choices. And for entanglement routing that assumption just... doesn't hold. The value of a path emerges at the global level. You can't see it by looking at individual edges. So these methods end up relying on heuristic approximations that miss the actual structure of the problem.
Modeling it as a coRMAB
The formulation I've been working with treats each node in the network as a restless arm whose state evolves over time regardless of whether you select it. At each timestep you pick a combinatorial action — which in our case is a full entanglement path, from a feasible set constrained by the network topology and path length limits.
The key thing that makes this a combinatorial restless bandit with strong coupling is that the transition dynamics and rewards depend on the entire action vector, not just individual arms. You can't just solve each arm independently and stitch things together (which is what Whittle index approaches try to do for weakly coupled problems). The physics of entanglement routing forces you to reason about paths as atomic objects.
The optimal value function looks like what you'd expect from a Bellman equation, but the max is over this combinatorial action space which is enormous. Direct enumeration is out of the question.
Enter SEQUOIA
This is the part that I think is genuinely cool. SEQUOIA sidesteps the whole "approximate the combinatorial max with local heuristics" thing by encoding the Q-network (which is just a ReLU net) as a mixed-integer linear program (MILP). The feasibility constraints on paths get folded directly into the MILP. So instead of doing some greedy or beam-search approximation over paths, you get exact inference over the full combinatorial space:
No enumeration, no local decomposition. The MILP solver handles it. Obviously there are scalability questions here (MILP is NP-hard in general), but for the network sizes we care about right now it's very much tractable, and it gives you decisions that are actually aligned with how entanglement routing physics works.
The Architecture
I'm proposing a four-layer architecture that sits at the network layer:
- State Estimation: keeps track of link success probabilities, fidelities, memory availability using local observations
- Routing and Scheduling: this is where the coRMAB + SEQUOIA stuff lives, computing path-level decisions
- Execution: handles the actual entanglement generation, storage, swapping, timing
- Feedback and Update: collects outcomes and feeds them back to update state estimates
The whole thing is designed to be hardware-agnostic. Whether you're running superconducting qubits, trapped ions, photonics, whatever, the routing layer only sees abstracted performance metrics. It doesn't care about the underlying physics of the nodes.
I'm mainly targeting the long-distance regime where photon loss dominates, memories decohere, and synchronization is a nightmare. That's where routing decisions actually matter the most. In short-distance settings the links are good enough that you're mostly just doing scheduling and contention management.
What's Next
The research plan has three thrusts: nailing down the coRMAB formulation, integrating SEQUOIA with quantum network routing constraints, and then running empirical evaluations against existing baselines using simulators like NetSquid and SeQUeNCe. I also want to study how routing interacts with quantum error correction and entanglement distillation, which adds a whole other layer of complexity.
Still early days but I'm pretty optimistic about this direction. The fundamental insight — that entanglement routing needs path-level combinatorial reasoning, not local hop-by-hop heuristics, feels right to me, and SEQUOIA gives us a principled way to actually do that reasoning.
More updates as things progress. If you have thoughts or want to chat about any of this, feel free to reach out.
References
- Z.-P. Xu, J. I. de Vicente, L.-L. Sun, and S. Yu. "Quantum network-entanglement measures." Quantum, 9:1736, 2025. Link
- A. Abane, M. Cubeddu, V. S. Mai, and A. Battou. "Entanglement routing in quantum networks: A comprehensive survey," 2024. arXiv:2408.01234
- L. Le and T. N. Nguyen. "DQRA: Deep quantum routing agent for entanglement routing in quantum networks." IEEE Transactions on Quantum Engineering, 3:1-12, 2022.
- L. Jallow and M. I. Khan. "Adaptive entanglement routing with deep Q-networks in quantum networks," 2025. arXiv:2503.02895
- T. Meuser, J. Weil, A. Lahiri, and M. Paraschiv. "RELIQ: Scalable entanglement routing via reinforcement learning in quantum networks," 2025. arXiv:2511.22321
- L. Xu, B. Wilder, E. B. Khalil, and M. Tambe. "Reinforcement learning with combinatorial actions for coupled restless bandits," 2025. arXiv:2503.01919
- M. Fischetti and J. Jo. "Deep neural networks and mixed integer linear optimization." Constraints, 23(3):296-309, 2018. Link
- L. Xiao. "Combinatorial restless multi-armed bandits," 2024. GitHub