Bandit Approaches for Border Patrol
STOR-i Conference 2017, Thursday 12th January
James Grant¹, David Leslie¹, Kevin Glazebrook¹, Roberto Szechtman²
¹Lancaster University; ²Naval Postgraduate School
Border Patrol
12/01/2017 Bandits for Border Patrol
Border Patrol with Drones
When is this interesting (mathematically)?
When you can't be in all places at once: you have to accept detection probabilities < 1 somewhere.
Heterogeneous drones: a combinatorial aspect.
Unknown intensity of events: uncertainty around what is a good/bad allocation.
Previous Work
Multiple searchers, single event: e.g. missing persons, life rafts.
Focus on collaboration between drones: design of flightpaths etc.
Search on a border: Szechtman et al. (2008).
What will we consider?
Discretisation
Multiple drones
Sequential problem
Looking for a best allocation
Modelling the Problem
Generation of events: a piecewise-constant, nonhomogeneous Poisson process.
Probability of detection: assumed to be calculable; depends on the drone in question, the number of cells it must search, and possibly other exogenous variables.
We can combine these two to find the expected rate of event detection.
Full Information Problem
λ_i: rate of the NHPP in cell i, for i = 1, ..., m.
τ_{i,j,k} = P(drone j detects an event in cell i, given that it has to search k cells and an event has occurred).
We can then calculate the expected number of detections for any drone j searching a subset of cells A_j:

    Σ_{i ∈ A_j} τ_{i,j,|A_j|} λ_i

Thus we can determine a best action by integer programming.
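The objective and the search for a best allocation can be sketched as follows. The rates `lam` and the detection model `tau` are illustrative assumptions, and brute-force enumeration stands in for the integer programme, which is only viable at this toy size.

```python
import itertools

# Illustrative inputs (assumed): event rates per cell, and a toy detection
# model tau(i, j, k) = P(drone j detects in cell i when it searches k cells).
lam = [0.5, 1.2, 0.3, 0.9]

def tau(i, j, k):
    return (0.9 - 0.1 * j) / k   # detection degrades with workload k

def expected_detections(assignment):
    """assignment: dict mapping drone j -> tuple of cells A_j (disjoint).
    Returns  sum_j sum_{i in A_j} tau(i, j, |A_j|) * lam[i]."""
    return sum(tau(i, j, len(cells)) * lam[i]
               for j, cells in assignment.items()
               for i in cells)

# Brute force over splits of the cells between two drones; for realistic
# problem sizes this maximisation is the integer programme mentioned above.
cells = range(len(lam))
best = max(
    ({0: A, 1: tuple(i for i in cells if i not in A)}
     for r in range(len(lam) + 1)
     for A in itertools.combinations(cells, r)),
    key=expected_detections,
)
```

Here the best split gives the busiest cell its own dedicated drone, reflecting the workload penalty built into `tau`.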
Reinforcement Learning
Exploration v Exploitation
With low information, no choice but to explore.
With high information, freedom to exploit.
What do we do in between?
Multi-Armed Bandits
Play one arm in each round and receive a reward.
An underlying reward distribution is associated with each arm.
Want to find the optimal arm (highest mean μ).
Challenging due to stochasticity.
Algorithms
Rules for decision making: consider the data observed so far, and balance exploration (long-term benefit) against exploitation (immediate gain).
Quality metric: expected regret,

    R_n = n μ* − Σ_{t=1}^{n} E(μ_{A_t}),

the difference between always playing the best arm and the arms actually selected.
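The regret metric above can be computed directly; a minimal sketch, assuming the true arm means `mu` are known to the evaluator (they never are to the algorithm):

```python
def expected_regret(mu, chosen):
    """R_n = n * mu_star - sum_t mu[A_t]: the shortfall of the arms
    actually played against always playing the best arm."""
    mu_star = max(mu)
    return len(chosen) * mu_star - sum(mu[a] for a in chosen)
```

A good algorithm keeps this quantity growing only logarithmically in n.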
Regret
UCB Algorithms
Indices based on: mean estimate + uncertainty measure.
Play the arm with the highest index.
A large index may reflect large previous rewards or a high level of uncertainty.
Example: the UCB1 algorithm (Auer et al., 2002).
INITIALISATION: Play each arm once.
LOOP: For each arm i calculate the index

    μ̄_i(t) + √(2 ln t / T_i(t)),

where T_i(t) is the number of times arm i has been played, then play the maximising arm.
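UCB1 as stated on the slide can be implemented directly. This is a minimal sketch: `pull` is any reward function for the environment, and the names are our own.

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1 (Auer et al., 2002): after playing each arm once, play the
    arm maximising  mean_i + sqrt(2 ln t / T_i(t))."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    history = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                      # INITIALISATION: each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # running mean
        history.append(arm)
    return history
```

For Bernoulli arms, for example, `pull = lambda a: float(random.random() < p[a])`; the suboptimal arms are still played occasionally, but only about O(ln n) times each.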
Back to our problem
A rather more complicated setup:
Combinatorial
Poisson rewards
Thinning of the true counts
Robust-F-CUCB
INITIALISE: Play combinations of arms such that each arm is played once.
LOOP: For each arm calculate an index; play the combination of arms that maximises the IP with respect to μ̄; observe rewards and update the mean estimates.
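The slide does not give the exact Robust-F-CUCB index, but the loop structure matches the generic combinatorial UCB pattern of Chen et al. (2013). The sketch below uses that paper's exploration bonus as a stand-in, with an `oracle` function in place of the integer programme; all names and the bonus choice are assumptions, not the talk's actual algorithm.

```python
import math

def cucb(pull_superarm, n_arms, oracle, horizon):
    """Generic combinatorial UCB loop (after Chen et al., 2013).
    oracle(indices) stands in for the integer programme: it returns the
    combination of arms maximising expected detections under the given
    optimistic estimates.  pull_superarm(S) plays every arm in S and
    returns a dict {arm: observed reward}."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    t = 0
    # INITIALISE: cover every arm at least once
    while min(counts) == 0:
        t += 1
        S = [i for i in range(n_arms) if counts[i] == 0]
        for i, r in pull_superarm(S).items():
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]
    # LOOP: optimistic index per arm, oracle picks the super-arm
    while t < horizon:
        t += 1
        indices = [means[i] + math.sqrt(3 * math.log(t) / (2 * counts[i]))
                   for i in range(n_arms)]
        S = oracle(indices)
        for i, r in pull_superarm(S).items():
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]
    return means
```

The border-patrol twist, per the slide before, is that rewards are thinned Poisson counts rather than bounded variables, which is what motivates a robust version of the index.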
Where to now?
Thank you. Questions?

References:
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 47(2-3): 235-256.
Bubeck, S., Cesa-Bianchi, N., and Lugosi, G. (2013). Bandits with Heavy Tail. IEEE Transactions on Information Theory, 59(11): 7711-7717.
Chen, W., Wang, Y., and Yuan, Y. (2013). Combinatorial Multi-Armed Bandit: General Framework and Applications. In Proceedings of the 30th International Conference on Machine Learning, 151-159.

@STORiJamesG