Bandit Approaches for Border Patrol


STORi Conference 2017, Thursday 12th January
James Grant¹, David Leslie¹, Kevin Glazebrook¹, Roberto Szechtman²
¹ Lancaster University; ² Naval Postgraduate School

Border Patrol
12/01/2017 Bandits for Border Patrol

Border Patrol with Drones

When is this interesting (mathematically)?
When you can't be in all places at once: we have to accept detection probabilities < 1 somewhere.
Heterogeneous drones: a combinatorial aspect.
Unknown intensity of events: uncertainty about what is a good or bad allocation.

Previous Work
Multiple searchers, single event (e.g. missing persons, life rafts): the focus is on collaboration between drones, design of flight paths, etc.
Search on a border: Szechtman et al. (2008).

What will we consider?
A discretised border, multiple drones, a sequential problem, and the search for a best allocation.

Modelling the Problem
Generation of events: a piecewise-constant, non-homogeneous Poisson process (NHPP).
Probability of detection: assumed to be calculable. It depends on the drone in question, the number of cells to search, and possibly other exogenous variables.
We can combine these two to find the expected rate of event detection.
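The two ingredients combine multiplicatively. Independently thinning a Poisson process with detection probability τ gives another Poisson process with rate τλ. Below is a minimal simulation of one cell over one time unit. It assumes hypothetical values λ = 4 and τ = 0.3, and treats the rate as constant within the cell, as the piecewise-constant model allows. The detected count should then have mean and variance close to τλ = 1.2.

```python
import random


def simulate_cell(lam, tau, horizon, rng):
    """Simulate one cell over `horizon` time units.

    Within a cell the piecewise-constant NHPP has constant rate `lam`, so
    inter-arrival times are exponential. Each event is detected
    independently with probability `tau` (thinning).
    Returns (true_count, detected_count).
    """
    t, true_count, detected = 0.0, 0, 0
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            return true_count, detected
        true_count += 1
        if rng.random() < tau:
            detected += 1


rng = random.Random(2017)
lam, tau, horizon, reps = 4.0, 0.3, 1.0, 20000  # hypothetical values
counts = [simulate_cell(lam, tau, horizon, rng)[1] for _ in range(reps)]
mean_detected = sum(counts) / reps
var_detected = sum((c - mean_detected) ** 2 for c in counts) / (reps - 1)
print(f"mean {mean_detected:.3f}, variance {var_detected:.3f}, "
      f"tau * lam * horizon = {tau * lam * horizon:.3f}")
```

A mean that matches its variance is the Poisson signature. This is why the detected counts in this problem are themselves Poisson.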

Full-information problem
$\lambda_i$: rate of the NHPP in cell $i$, for $i = 1, \dots, m$.
$\tau_{i,j,k}$ = P(drone $j$ detects an event in cell $i$ if it has to search $k$ cells | an event has occurred).
We can calculate the expected number of detections for any drone $j$ and subset of cells $A_j$:
$$\sum_{i \in A_j} \tau_{i,j,|A_j|} \lambda_i$$
Thus we can determine a best action by integer programming.
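For a tiny instance, exhaustive search can stand in for the integer program and makes the objective concrete. The sketch rests on two assumptions. First, each cell is covered by at most one drone; the slides do not say how overlapping coverage is handled. Second, it uses a hypothetical detection model τ_{i,j,k} = q_j^k, where q_j (`SKILL` below) is an invented per-drone quality. It enumerates all (n + 1)^m assignments of m cells to n drones or "uncovered", so it only scales to toy sizes; a real instance would need an IP solver.

```python
from itertools import product

SKILL = [0.9, 0.6]  # hypothetical per-drone quality q_j


def tau(i, j, k):
    # Hypothetical detection model: probability decays geometrically in k.
    return SKILL[j] ** k


def expected_detections(assignment, lam):
    """Sum over drones j of sum_{i in A_j} tau(i, j, |A_j|) * lam[i].

    assignment[i] is the drone covering cell i, or None if uncovered.
    """
    total = 0.0
    for j in range(len(SKILL)):
        cells = [i for i, d in enumerate(assignment) if d == j]
        total += sum(tau(i, j, len(cells)) * lam[i] for i in cells)
    return total


def best_allocation(lam):
    """Exhaustive stand-in for the IP over (n_drones + 1) ** m assignments."""
    options = list(range(len(SKILL))) + [None]
    best = max(product(options, repeat=len(lam)),
               key=lambda a: expected_detections(a, lam))
    return best, expected_detections(best, lam)


lam = [5.0, 1.0, 3.0, 0.5]  # hypothetical cell rates
allocation, value = best_allocation(lam)
print(allocation, round(value, 3))
```

With these numbers the search picks drone 0 for cells 0 and 2 and drone 1 for cell 1, leaving cell 3 uncovered, for 6.48 + 0.6 = 7.08 expected detections. Here, spreading drone 0 over two high-rate cells beats concentrating it on one.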

Reinforcement Learning

Exploration vs. Exploitation?
With low information, there is no choice but to explore.
What do we do in between?
With high information, we are free to exploit.

Multi-Armed Bandits
Play one arm in each round and receive a reward.
Each arm has an underlying reward distribution.
We want to find the optimal arm (highest mean $\mu^*$).
This is challenging due to stochasticity.

Algorithms
Rules for decision making that consider the data observed so far and balance exploration (long-term benefit) against exploitation (immediate gain).
Quality metric? The expected regret
$$R_n = n\mu^* - \sum_{t=1}^{n} \mathbb{E}(\mu_{A_t}),$$
where $A_t$ is the arm played in round $t$: the difference between the best and the selected arms, accumulated over $n$ rounds.
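The expected regret can be estimated by simulation. For one run, accumulate the per-round gaps μ* − μ_{A_t}; then average the final value over independent runs. The sketch assumes hypothetical arm means and a uniformly random policy. That policy's expected regret is known in closed form, n(μ* − average of the μ_i), so the Monte Carlo estimate can be checked against it.

```python
import random


def regret_curve(means, choices):
    """Cumulative regret t * mu_star - sum_{s <= t} mu_{A_s} for one run."""
    mu_star = max(means)
    curve, total_gap = [], 0.0
    for arm in choices:
        total_gap += mu_star - means[arm]  # gap of the arm played this round
        curve.append(total_gap)
    return curve


means = [0.2, 0.5, 0.7]  # hypothetical arm means
curve = regret_curve(means, [0, 1, 2, 2, 2])
print(curve[-1])  # gaps 0.5 + 0.2 + 0 + 0 + 0, i.e. about 0.7

# Expected regret of a uniformly random policy, averaged over runs
rng = random.Random(0)
n, runs = 100, 2000
avg = sum(regret_curve(means, [rng.randrange(len(means)) for _ in range(n)])[-1]
          for _ in range(runs)) / runs
analytic = n * (max(means) - sum(means) / len(means))
print(f"Monte Carlo {avg:.2f} vs analytic {analytic:.2f}")
```

A uniformly random policy has regret growing linearly in n. A good bandit algorithm aims for much slower, typically logarithmic, growth.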

Regret

UCB Algorithms
Indices are based on a mean estimate plus an uncertainty measure; play the arm with the highest index.
Large indices may reflect large previous rewards or a high level of uncertainty.
Example: the UCB1 algorithm (Auer et al., 2002)
INITIALISATION: play each arm once.
LOOP: for each arm $i$ calculate
$$\bar{\mu}_i(t) = \hat{\mu}_i(t) + \sqrt{\frac{2 \ln t}{T_i(t)}},$$
where $\hat{\mu}_i(t)$ is the empirical mean reward of arm $i$ and $T_i(t)$ is the number of times it has been played; play the maximising arm.
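Here is a minimal UCB1 sketch following the INITIALISATION and LOOP steps. It assumes Bernoulli arms with hypothetical means 0.2, 0.5 and 0.7; UCB1's guarantee is stated for rewards in [0, 1]. The round number t is used in the ln t term, and `counts[i]` plays the role of T_i(t).

```python
import math
import random


def ucb1(arm_means, n_rounds, rng):
    """UCB1 (Auer et al., 2002) on Bernoulli arms; returns play counts."""
    k = len(arm_means)
    counts = [0] * k   # T_i(t): number of times arm i has been played
    sums = [0.0] * k   # total reward collected from arm i

    def pull(i):
        reward = 1.0 if rng.random() < arm_means[i] else 0.0
        counts[i] += 1
        sums[i] += reward

    # INITIALISATION: play each arm once
    for i in range(k):
        pull(i)
    # LOOP: play the arm maximising mean estimate + sqrt(2 ln t / T_i(t))
    for t in range(k + 1, n_rounds + 1):
        index = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
                 for i in range(k)]
        pull(max(range(k), key=index.__getitem__))
    return counts


counts = ucb1([0.2, 0.5, 0.7], n_rounds=5000, rng=random.Random(1))
print(counts)
```

The best arm should receive the large majority of plays. A suboptimal arm is played roughly in proportion to ln t divided by its squared gap to μ*, so the clearly worse arm is visited least.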

Back to our problem
The setup is rather more complicated: actions are combinatorial, rewards are Poisson, and we observe only a thinning of the true counts.
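Because the detection probabilities are assumed calculable, the unknown in each cell is λ_i, and raw detected counts underestimate it. One natural estimate is the maximum-likelihood estimate under N_s ~ Poisson(τ_s λ): the total detected count divided by the total detection probability. This is our assumption; the transcript does not state which estimator the talk uses. The τ values below are hypothetical, τ = 0.9^k for a randomly drawn number k of searched cells.

```python
import random


def poisson(rate, rng):
    """Sample Poisson(rate) by counting arrivals of a rate-`rate` process in [0, 1]."""
    count, t = 0, rng.expovariate(rate)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(rate)
    return count


def estimate_rate(observations):
    """MLE of lambda from (tau_s, N_s) pairs with N_s ~ Poisson(tau_s * lambda)."""
    total_detected = sum(n for _, n in observations)
    total_exposure = sum(tau for tau, _ in observations)
    return total_detected / total_exposure


rng = random.Random(12)
lam_true = 3.0  # hypothetical, unknown to the learner
observations = []
for _ in range(2000):
    k = rng.randint(1, 3)  # number of cells the drone had to search
    tau = 0.9 ** k         # hypothetical detection probability
    observations.append((tau, poisson(tau * lam_true, rng)))

naive = sum(n for _, n in observations) / len(observations)
print(f"mean detected count: {naive:.2f} (estimates tau * lambda, biased low)")
print(f"thinning-corrected estimate: {estimate_rate(observations):.2f}")
```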

RobustFCUCB
INITIALISE: play combinations of arms such that each arm is played once.
LOOP: for each arm, calculate its index; play the combination of arms that maximises the IP with respect to μ; observe the rewards and update the mean estimates.
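The following is a sketch of the loop structure only. The transcript does not preserve the RobustFCUCB index, so the index here is a stand-in and not the method from the talk. It combines a thinning-corrected rate estimate with a hypothetical bonus c·sqrt(ln t / T_i). With unbounded Poisson rewards, a heavy-tail-robust index in the spirit of Bubeck et al. (2013) would be needed for guarantees. Everything else is also an assumption: a toy instance of 4 cells and 2 drones, the detection model τ = q_j^k, disjoint coverage, and exhaustive search in place of the IP.

```python
import math
import random
from itertools import product

SKILL = [0.9, 0.6]  # hypothetical per-drone quality q_j


def tau(i, j, k):
    # Hypothetical detection model: probability decays geometrically in k.
    return SKILL[j] ** k


def coverage(assignment):
    """Detection probability of each covered cell; assignment[i] is a drone or None."""
    probs = {}
    for j in range(len(SKILL)):
        cells = [i for i, d in enumerate(assignment) if d == j]
        for i in cells:
            probs[i] = tau(i, j, len(cells))
    return probs


def best_allocation(rates):
    """Exhaustive stand-in for the IP: maximise the sum of tau * rate."""
    options = list(range(len(SKILL))) + [None]
    return max(product(options, repeat=len(rates)),
               key=lambda a: sum(p * rates[i] for i, p in coverage(a).items()))


def poisson(rate, rng):
    # Count arrivals of a rate-`rate` Poisson process in [0, 1].
    count, t = 0, rng.expovariate(rate)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(rate)
    return count


def combinatorial_ucb(lam_true, n_rounds, c, rng):
    m = len(lam_true)
    detected = [0] * m     # detected (thinned) events per cell
    exposure = [0.0] * m   # summed detection probabilities per cell
    plays = [0] * m        # T_i(t): rounds in which cell i was covered

    def play(assignment):
        for i, p in coverage(assignment).items():
            detected[i] += poisson(p * lam_true[i], rng)
            exposure[i] += p
            plays[i] += 1

    # INITIALISE: cover each cell once (drone 0 searches it alone).
    for i in range(m):
        play(tuple(0 if cell == i else None for cell in range(m)))
    # LOOP: optimistic index per cell, then solve the allocation problem.
    for t in range(m + 1, n_rounds + 1):
        # Hypothetical index, NOT the RobustFCUCB index from the talk.
        index = [detected[i] / exposure[i] + c * math.sqrt(math.log(t) / plays[i])
                 for i in range(m)]
        play(best_allocation(index))
    return plays


lam_true = [5.0, 1.0, 3.0, 0.5]  # hypothetical, unknown to the learner
plays = combinatorial_ucb(lam_true, n_rounds=1000, c=1.0, rng=random.Random(3))
print("rounds each cell was covered:", plays)
print("full-information optimum:", best_allocation(lam_true))
```

The high-rate cells 0 and 2 should be covered in almost every round. Low-rate cells are revisited only when their uncertainty bonus grows large enough to change the optimal allocation.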

Where to now?

Thank you. Questions?

References:
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-Time Analysis of the Multiarmed Bandit Problem. Machine Learning, 47(2-3).
Bubeck, S., Cesa-Bianchi, N., and Lugosi, G. (2013). Bandits with Heavy Tail. IEEE Transactions on Information Theory, 59(11).
Chen, W., Wang, Y., and Yuan, Y. (2013). Combinatorial Multi-Armed Bandit: General Framework and Applications. In Proceedings of the 30th International Conference on Machine Learning.