
CLCB-DRA-FAIR
This project is me trying to build an offline combinatorial bandit setup for patrol allocation without pretending fairness is optional.
What problem I am modeling
I treat urban patrol allocation like a knapsack CMAB problem:
- K neighborhoods are the arms
- a_k, the beats assigned to neighborhood k, are the discrete allocations
- Q, the total number of available beats, is the budget
- the hard constraint is sum_k a_k <= Q
At each round, the policy chooses a discrete allocation vector and receives a reward based on detected or prevented incidents.
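To make the action space concrete, here is a minimal sketch of the budget-feasibility check. The arm count, budget value, and function name are illustrative, not from the repo:

```python
import numpy as np

def feasible(alloc, budget):
    """Knapsack constraint for an integer allocation: every a_k is
    non-negative and the beats spent sum to at most Q."""
    return bool(alloc.min() >= 0 and alloc.sum() <= budget)

# Hypothetical instance: 4 neighborhoods (arms), Q = 6 beats total.
Q = 6
alloc = np.array([2, 1, 3, 0])   # a_k beats assigned per neighborhood
print(feasible(alloc, Q))        # 2 + 1 + 3 + 0 = 6 <= 6
```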
Why offline and not online
Online exploration in this context is expensive and ethically messy. We already have historical data, so I want the agent to learn from offline logs first.
Main data used:
Baseline window is 2019 to 2023 for now.
Fairness side
The core issue is feedback-loop bias. Places with higher patrol volume can look riskier just because they are watched more.
I am testing fairness constraints like:
- minimum coverage guarantees
- population-aware proportional allocation
- concentration caps
The goal is to avoid a policy that earns reward by repeatedly over-targeting the same communities.
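One way the floor/cap constraints above could be enforced is by projecting a proposed allocation onto the fairness-feasible set. This is a sketch under assumed per-arm bounds (`min_cov`, `cap` are hypothetical names, not from the repo):

```python
import numpy as np

def apply_fairness(alloc, budget, min_cov, cap):
    """Project a proposed allocation onto per-arm coverage floors
    (minimum coverage guarantees) and ceilings (concentration caps),
    then trim back to the budget if the clip pushed us over it."""
    assert min_cov.sum() <= budget, "floors alone exceed the budget"
    a = np.clip(alloc, min_cov, cap)
    # Trim greedily while over budget: remove a beat from the arm
    # with the most slack above its floor. Terminates because the
    # floors are jointly feasible.
    while a.sum() > budget:
        k = int(np.argmax(a - min_cov))
        a[k] -= 1
    return a
```

The greedy trim is one simple choice; a proportional or disparity-aware trim would also satisfy the constraints.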
Technical framing
I use a knapsack-constrained CMAB over a discrete budget space, with an offline (alpha, beta)-approximation oracle and the corresponding (alpha, beta)-approximation regret over T rounds.
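An approximation oracle in this setting can be sketched as a greedy marginal-gain allocator over optimistic estimates. This is illustrative only: `ucb_gain[k][a]` is a hypothetical UCB on the marginal reward of the (a+1)-th beat at arm k, and for diminishing-returns rewards this greedy is a standard constant-factor approximation, matching the (alpha, beta) oracle framing:

```python
import numpy as np

def greedy_oracle(ucb_gain, Q):
    """Assign Q beats one at a time, each to the arm whose next beat
    has the largest optimistic marginal gain."""
    K = len(ucb_gain)
    alloc = np.zeros(K, dtype=int)
    for _ in range(Q):
        # Marginal gain of adding one more beat to each arm; -inf once
        # an arm has exhausted its listed gains.
        gains = [ucb_gain[k][alloc[k]] if alloc[k] < len(ucb_gain[k]) else -np.inf
                 for k in range(K)]
        alloc[int(np.argmax(gains))] += 1
    return alloc
```

A CUCB-style baseline would recompute `ucb_gain` from counts and empirical means each round and feed it to this oracle.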
The project direction is:
- Build a strong offline CMAB allocator for this discrete setting.
- Add fairness constraints directly in the allocation policy.
- Compare reward and disparity behavior against naive and CUCB-style baselines.
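For the disparity side of that comparison, a simple metric is the spread of per-capita coverage across neighborhoods. This is one illustrative choice of metric, not necessarily the one used in the repo:

```python
import numpy as np

def coverage_disparity(alloc, population):
    """Ratio of the highest to the lowest beats-per-resident across
    neighborhoods. 1.0 means perfectly proportional coverage; large
    values mean patrols are concentrated on a few communities."""
    per_capita = alloc / population
    return float(per_capita.max() / max(per_capita.min(), 1e-12))
```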
References
- CMAB for Resource Allocation
- CMAB: General Framework, Results and Applications
- Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
- Group Fairness in Bandit Arm Selection
- Offline Learning for CMAB
Current status
Reward curves and disparity plots are in progress, and the rough trend looks promising. It is still early, though: a lot of work remains on formal fairness definitions and stronger offline evaluation.