
CLCB-DRA-FAIR
This project is me trying to build an offline combinatorial bandit setup for patrol allocation without pretending fairness is optional.
What problem I am modeling
I treat urban patrol allocation like a knapsack CMAB problem:
- K neighborhoods are the arms
- a_k, the beats assigned to neighborhood k, are the discrete allocations
- Q, the total number of available beats, is the budget
- the hard constraint is sum_k a_k <= Q
At each round, the policy chooses a discrete allocation vector and receives a reward based on detected or prevented incidents.
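To make the action space concrete, here is a minimal sketch of the budget-feasibility check. The arm count, budget value, and function name are illustrative, not from the repo:

```python
import numpy as np

def feasible(alloc, budget):
    """Knapsack constraint for an integer allocation: every a_k is
    non-negative and the beats spent sum to at most Q."""
    return bool(alloc.min() >= 0 and alloc.sum() <= budget)

# Hypothetical instance: 4 neighborhoods (arms), Q = 6 beats total.
Q = 6
alloc = np.array([2, 1, 3, 0])   # a_k beats assigned per neighborhood
print(feasible(alloc, Q))        # 2 + 1 + 3 + 0 = 6 <= 6
```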
Why offline and not online
Online exploration in this context is expensive and ethically messy. We already have historical data, so I want the agent to learn from offline logs first.
Main data used:
Baseline window is 2019 to 2023 for now.
Fairness side
The core issue is feedback-loop bias. Places with higher patrol volume can look riskier just because they are watched more.
I am testing fairness constraints like:
- minimum coverage guarantees
- population-aware proportional allocation
- concentration caps
The goal is to avoid a policy that earns reward by repeatedly over-targeting the same communities.
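One way the floor/cap constraints above could be enforced is by projecting a proposed allocation onto the fairness-feasible set. This is a sketch under assumed per-arm bounds (`min_cov`, `cap` are hypothetical names, not from the repo):

```python
import numpy as np

def apply_fairness(alloc, budget, min_cov, cap):
    """Project a proposed allocation onto per-arm coverage floors
    (minimum coverage guarantees) and ceilings (concentration caps),
    then trim back to the budget if the clip pushed us over it."""
    assert min_cov.sum() <= budget, "floors alone exceed the budget"
    a = np.clip(alloc, min_cov, cap)
    # Trim greedily while over budget: remove a beat from the arm
    # with the most slack above its floor. Terminates because the
    # floors are jointly feasible.
    while a.sum() > budget:
        k = int(np.argmax(a - min_cov))
        a[k] -= 1
    return a
```

The greedy trim is one simple choice; a proportional or disparity-aware trim would also satisfy the constraints.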
Technical framing
I use a knapsack-constrained CMAB over a discrete budget space, with an offline (alpha, beta)-approximation oracle and the corresponding (alpha, beta)-approximation regret over T rounds.
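An approximation oracle in this setting can be sketched as a greedy marginal-gain allocator over optimistic estimates. This is illustrative only: `ucb_gain[k][a]` is a hypothetical UCB on the marginal reward of the (a+1)-th beat at arm k, and for diminishing-returns rewards this greedy is a standard constant-factor approximation, matching the (alpha, beta) oracle framing:

```python
import numpy as np

def greedy_oracle(ucb_gain, Q):
    """Assign Q beats one at a time, each to the arm whose next beat
    has the largest optimistic marginal gain."""
    K = len(ucb_gain)
    alloc = np.zeros(K, dtype=int)
    for _ in range(Q):
        # Marginal gain of adding one more beat to each arm; -inf once
        # an arm has exhausted its listed gains.
        gains = [ucb_gain[k][alloc[k]] if alloc[k] < len(ucb_gain[k]) else -np.inf
                 for k in range(K)]
        alloc[int(np.argmax(gains))] += 1
    return alloc
```

A CUCB-style baseline would recompute `ucb_gain` from counts and empirical means each round and feed it to this oracle.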
The project direction is:
- Build a strong offline CMAB allocator for this discrete setting.
- Add fairness constraints directly in the allocation policy.
- Compare reward and disparity behavior against naive and CUCB-style baselines.
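For the disparity side of that comparison, a simple metric is the spread of per-capita coverage across neighborhoods. This is one illustrative choice of metric, not necessarily the one used in the repo:

```python
import numpy as np

def coverage_disparity(alloc, population):
    """Ratio of the highest to the lowest beats-per-resident across
    neighborhoods. 1.0 means perfectly proportional coverage; large
    values mean patrols are concentrated on a few communities."""
    per_capita = alloc / population
    return float(per_capita.max() / max(per_capita.min(), 1e-12))
```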
References
- CMAB for Resource Allocation
- CMAB: General Framework, Results and Applications
- Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
- Group Fairness in Bandit Arm Selection
- Offline Learning for CMAB
Current status
Reward curves and disparity plots are in progress, and the rough trend looks promising. It is still early, though: a lot of work remains on formal fairness definitions and stronger offline evaluation.