Bandit Approaches for Border Patrol

STOR-i Conference 2017, Thursday 12th January
James Grant (1), David Leslie (1), Kevin Glazebrook (1), Roberto Szechtman (2)
(1) Lancaster University; (2) Naval Postgraduate School

Border Patrol

Border Patrol with Drones

When is this interesting (mathematically)?
- When you can't be in all places at once: you have to accept detection probabilities < 1 somewhere.
- Heterogeneous drones: a combinatorial aspect.
- Unknown intensity of events: uncertainty around what is a good/bad allocation.

Previous Work
- Multiple searchers, single event (e.g. missing persons, life rafts): the focus is on collaboration between drones, design of flightpaths, etc.
- Search on a border: Szechtman et al. (2008).

What will we consider?
- A discretisation of the border into cells
- Multiple drones
- A sequential problem
- Looking for a best allocation

Modelling the Problem
- Generation of events: a piecewise-constant, nonhomogeneous Poisson process (NHPP).
- Probability of detection: assumed to be calculable; depends on the drone in question, the number of cells it must search, and perhaps other exogenous variables.
We can combine these two to find the expected rate of event detection.
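
How these two components combine is easy to see in simulation. Below is a minimal sketch (Python; the rates, detection probabilities, and function names are illustrative, not from the talk): within one constant-rate period, events in each cell are Poisson counts, and independent detection thins them.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_detections(rates, detect_probs):
    """One period of the event/detection model.

    rates[i]        -- expected events per period in cell i (the NHPP is
                       piecewise constant, so the rate is fixed within a period)
    detect_probs[i] -- probability that the drone covering cell i detects
                       an event occurring there

    Returns (true_counts, detected_counts). Independent detection is a
    Poisson thinning, so detected_counts[i] ~ Poisson(rates[i] * detect_probs[i]).
    """
    true_counts = rng.poisson(rates)                    # hidden event counts
    detected = rng.binomial(true_counts, detect_probs)  # observed detections
    return true_counts, detected

# Example: three cells, the middle one barely covered.
events, seen = simulate_detections(np.array([2.0, 0.5, 4.0]),
                                   np.array([0.9, 0.3, 0.6]))
```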

Full information problem
- λ_i: rate of the NHPP in cell i, for i = 1, ..., m.
- τ_{i,j,k} = P(drone j detects an event in cell i, given that it has to search k cells and the event has occurred).
- We can calculate the expected number of detections for any drone j and subset of cells A_j:
  Σ_{i ∈ A_j} τ_{i,j,|A_j|} λ_i.
- Thus we can determine a best action by integer programming.
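
For small m this objective can be optimised by brute force, which makes the structure concrete. A sketch under assumed inputs (the τ used here, with detection quality decaying in the number of cells searched, is made up for illustration; the talk's actual solver is an integer program over all drones jointly):

```python
from itertools import combinations

def expected_detections(A, j, lam, tau):
    """Expected detections for drone j searching subset A:
    sum over i in A of tau(i, j, |A|) * lam[i]."""
    k = len(A)
    return sum(tau(i, j, k) * lam[i] for i in A)

def best_subset(j, m, lam, tau):
    """Exhaustive stand-in for the integer program: the best subset of
    the m cells for a single drone j under known rates lam."""
    best, best_val = (), 0.0
    for k in range(1, m + 1):
        for A in combinations(range(m), k):
            val = expected_detections(A, j, lam, tau)
            if val > best_val:
                best, best_val = A, val
    return best, best_val

# Illustrative inputs: detection quality decays as a drone spreads
# itself over more cells, so covering everything is not optimal.
lam = [2.0, 0.5, 4.0, 1.0]
tau = lambda i, j, k: 0.9 / k
print(best_subset(0, 4, lam, tau))   # the busiest cell alone wins here
```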

Reinforcement Learning

Exploration vs Exploitation
- With low information, no choice but to explore.
- With high information, freedom to exploit.
- What do we do in between?

Multi-Armed Bandits
- Play one arm in each round and receive a reward.
- An underlying reward distribution is associated with each arm.
- We want to find the optimal arm (the one with the highest mean μ*).
- Challenging due to stochasticity.
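
As a concrete stand-in for this setup (purely illustrative; the means and noise model are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)

class GaussianBandit:
    """Each arm has a fixed mean reward, unknown to the learner; each
    play returns that mean plus Gaussian noise."""

    def __init__(self, means, sigma=1.0):
        self.means = np.asarray(means)   # hidden from the learner
        self.sigma = sigma

    def play(self, arm):
        return rng.normal(self.means[arm], self.sigma)

bandit = GaussianBandit([0.3, 0.8, 0.5])   # arm 1 is optimal here
```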

Algorithms
- Rules for decision making that consider the data observed so far.
- Balance exploration (long-term benefit) against exploitation (immediate gain).
Quality metrics? Expected regret:
  R_n = nμ* − Σ_{t=1}^{n} E(μ_{A_t}),
the difference between the best arm's mean and that of the selected arms.
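
Regret simply accumulates the gap between the best arm's mean and the mean of each arm actually played. A small worked check (values hypothetical):

```python
def regret(means, chosen_arms):
    """Expected regret of a play sequence against the best fixed arm:
    R_n = n * mu_star - sum over t of means[A_t]."""
    mu_star = max(means)
    return sum(mu_star - means[a] for a in chosen_arms)

# Each pull of a suboptimal arm adds its gap: three pulls of arm 0
# cost 3 * (0.8 - 0.3) = 1.5; pulls of the best arm cost nothing.
print(regret([0.3, 0.8, 0.5], [1, 0, 0, 0, 1]))
```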

Regret

UCB Algorithms
- Indices based on: mean estimate + uncertainty measure.
- Play the arm with the highest index.
- Large indices may reflect large previous rewards or a high level of uncertainty.
Example: the UCB1 algorithm (Auer et al., 2002)
INITIALISATION: Play each arm once.
LOOP: For each arm i, calculate the index
  μ̄_i(t) + √(2 ln t / T_i(t)),
where μ̄_i(t) is arm i's mean reward estimate and T_i(t) the number of times it has been played so far; play the maximising arm.
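
A compact implementation of UCB1 as stated on the slide, run against the GaussianBandit sketch above (so not self-contained beyond that):

```python
import numpy as np

def ucb1(bandit, n_arms, horizon):
    """UCB1 (Auer et al., 2002): optimism via a mean estimate plus an
    uncertainty bonus that shrinks as an arm is played more often."""
    counts = np.zeros(n_arms)   # T_i(t): number of plays of arm i
    means = np.zeros(n_arms)    # running mean reward of arm i
    # INITIALISATION: play each arm once.
    for arm in range(n_arms):
        means[arm] = bandit.play(arm)
        counts[arm] = 1
    # LOOP: play the arm maximising  mean + sqrt(2 ln t / T_i(t)).
    for t in range(n_arms + 1, horizon + 1):
        indices = means + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(indices))
        reward = bandit.play(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means

counts, estimates = ucb1(GaussianBandit([0.3, 0.8, 0.5]), 3, 1000)
# counts ends up concentrated on arm 1, the true best.
```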

Back to our problem
A rather more complicated setup:
- Combinatorial actions
- Poisson rewards
- Thinning of the true counts
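
The Poisson rewards matter: they are unbounded, so the bounded-reward argument behind UCB1's confidence bonus no longer applies directly, and robust mean estimators in the spirit of Bubeck et al. (2013) become relevant. A deliberately simplified illustration of one such estimator (the real truncation schedule is chosen more carefully than this fixed cut-off):

```python
import numpy as np

def truncated_mean(samples, threshold):
    """Empirical mean with the influence of rare large observations
    capped: samples above the threshold are discarded before averaging."""
    x = np.asarray(samples, dtype=float)
    kept = x[x <= threshold]
    return kept.mean() if kept.size else 0.0

# A single large Poisson count drags the plain mean upwards;
# truncation tames it.
x = [1, 0, 2, 1, 14]
print(np.mean(x), truncated_mean(x, threshold=5))   # 3.6 vs 1.0
```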

Robust-F-CUCB
INITIALISE: Play combinations of arms such that each arm is played once.
LOOP:
- For each arm, calculate a (robust) upper-confidence index.
- Play the combination of arms that maximises the integer program with respect to these indices.
- Observe the rewards and update the mean estimates.
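
Putting the pieces together, one plausible reading of this loop in code, reusing best_subset() as the oracle for the integer program and truncated_mean() as the robust estimator from the earlier sketches. The index form, constants, and truncation threshold below are assumptions for illustration, not the algorithm's actual specification:

```python
import numpy as np

def robust_f_cucb(observe, m, horizon, tau, j=0):
    """Sketch of a Robust-F-CUCB-style loop. `observe(A)` searches the
    cell subset A and returns {cell: detected count}; best_subset() and
    truncated_mean() are the earlier sketches."""
    history = {i: [] for i in range(m)}
    # INITIALISE: search every cell at least once (here, all in one sweep).
    for i, c in observe(tuple(range(m))).items():
        history[i].append(c)
    for t in range(1, horizon + 1):
        # Robust upper-confidence index on each cell's rate (form assumed).
        ucb = [truncated_mean(history[i], threshold=np.sqrt(t + 1))
               + np.sqrt(2.0 * np.log(t + 1) / len(history[i]))
               for i in range(m)]
        # Oracle step: solve the full-information problem at the indices.
        A, _ = best_subset(j, m, ucb, tau)
        for i, c in observe(A).items():
            history[i].append(c)
    return history
```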

Where to now?

Thank you. Questions?

References:
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-Time Analysis of the Multiarmed Bandit Problem. Machine Learning, 47(2-3): 235-256.
Bubeck, S., Cesa-Bianchi, N., and Lugosi, G. (2013). Bandits with Heavy Tail. IEEE Transactions on Information Theory, 59(11): 7711-7720.
Chen, W., Wang, Y., and Yuan, Y. (2013). Combinatorial Multi-Armed Bandit: General Framework and Applications. In Proceedings of the 30th International Conference on Machine Learning, 151-159.