Bandit Approaches for Border Patrol

1 Bandit Approaches for Border Patrol
STOR-i Conference 2017, Thursday 12th January
James Grant (1), David Leslie (1), Kevin Glazebrook (1), Roberto Szechtman (2)
(1) Lancaster University; (2) Naval Postgraduate School

2 Border Patrol 12/01/2017 Bandits for Border Patrol 2

3 Border Patrol with Drones

6 When is this interesting (mathematically)?
- When you can't be in all places at once: have to accept detection probabilities < 1 somewhere.
- Heterogeneous drones: combinatorial aspect.
- Unknown intensity of events: uncertainty around what is a good/bad allocation.

7 Previous Work
- Multiple searchers, single event
  - E.g. missing persons, life rafts
  - Focus on collaboration between drones
  - Design of flightpaths etc.
- Search on a border
  - Szechtman et al. (2008)

8 What will we consider?
- Discretisation
- Multiple drones
- Sequential problem
- Looking for a best allocation

10 Modelling the Problem
- Generation of events
  - Piecewise constant, nonhomogeneous Poisson process (NHPP).
- Probability of detection
  - Assumed to be calculable
  - Depends on: the drone in question, the number of cells to search, other exogenous variables?
- We can combine these two to find the expected rate of event detection.
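As a rough illustration of combining the two ingredients, the sketch below simulates one cell in which events arrive as a Poisson process with a constant rate and each event is detected independently with a fixed probability. The rate 4.0, the detection probability 0.5, the unit-length round and the function name `simulate_detections` are all assumptions made for this example, not values from the talk. Under these assumptions the detected count is a thinned Poisson count, so its mean should be close to rate × detection probability.

```python
import random

def simulate_detections(rate, detect_prob, rng):
    """Detected events in one unit-length round for a single cell (illustrative)."""
    detected = 0
    t = rng.expovariate(rate)          # arrival time of the first event
    while t < 1.0:
        if rng.random() < detect_prob: # independent thinning of each event
            detected += 1
        t += rng.expovariate(rate)     # exponential gap to the next event
    return detected

rng = random.Random(0)
rate, detect_prob = 4.0, 0.5           # hypothetical cell rate and detection probability
rounds = 20000
mean_detected = sum(
    simulate_detections(rate, detect_prob, rng) for _ in range(rounds)
) / rounds

# The empirical mean should sit near the expected detection rate rate * detect_prob.
print(mean_detected, rate * detect_prob)
```

Within a single cell the rate is constant, which is why a plain exponential inter-arrival simulation is enough here; different cells would simply use different rates.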

11 Full information problem
- $\lambda_i$: rate of the NHPP in cell $i$, for $i = 1, \dots, m$
- $\tau_{i,j,k} = P(\text{drone } j \text{ detects event in cell } i \text{ if it has to search } k \text{ cells} \mid \text{event has occurred})$
- Can calculate the expected number of detections for any drone $j$ and subset of cells $A_j$:
  $\sum_{i \in A_j} \tau_{i,j,|A_j|} \lambda_i$
- Thus can determine a best action by Integer Programming
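The full-information calculation can be sketched on a toy instance. Everything numeric below is hypothetical: four cells with made-up rates, two drones with made-up base detection probabilities, and a detection model `tau` that decays geometrically with the number of cells a drone covers. The sketch also assumes each cell goes to at most one drone, and it uses exhaustive search as a stand-in for the integer program, which is only workable for very small instances.

```python
from itertools import product

rates = [3.0, 1.0, 0.5, 2.0]   # hypothetical lambda_i for m = 4 cells
base_prob = [0.9, 0.6]         # hypothetical per-drone detection quality
UNASSIGNED = -1                # marker for a cell that no drone searches

def tau(i, j, k):
    """Hypothetical tau_{i,j,k}: detection degrades as drone j covers k cells."""
    return base_prob[j] * 0.8 ** (k - 1)

def expected_detections(assignment):
    """Sum over drones j of sum_{i in A_j} tau(i, j, |A_j|) * lambda_i."""
    total = 0.0
    for j in range(len(base_prob)):
        cells = [i for i, drone in enumerate(assignment) if drone == j]
        total += sum(tau(i, j, len(cells)) * rates[i] for i in cells)
    return total

# Brute force over every cell-to-drone assignment (stand-in for the IP).
options = [UNASSIGNED] + list(range(len(base_prob)))
best_assignment = max(product(options, repeat=len(rates)), key=expected_detections)
best_value = expected_detections(best_assignment)
print(best_assignment, best_value)
```

In this toy instance the stronger drone takes the two busiest cells and the weaker drone takes the other two, for an expected 4.32 detections per round. With realistic numbers of cells and drones the search space grows too quickly for enumeration, which is where an integer programming solver would come in.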

12 Reinforcement Learning

14 Exploration v Exploitation?
- With low information, no choice but to explore
- What do we do in between?
- With high information, freedom to exploit

15 Multi-Armed Bandits
- Play one arm in each round, receive a reward
- Underlying reward distribution associated with each arm
- Want to find the optimal arm (highest mean $\mu$)
- Challenging due to stochasticity

17 Algorithms
- Rules for decision making
  - Consider data observed so far
  - Balance exploration (long-term benefit) and exploitation (immediate gain)
- Quality metrics?
  - Expected regret: $R_n = n\mu^* - \sum_{t=1}^{n} E(\mu_{A_t})$
  - Difference between best and selected
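A small worked example of the regret formula, assuming three arms with made-up means and a fixed (non-random) sequence of five plays, so the expectation is trivial. The names `arm_means`, `choices` and `expected_regret` are illustrative only.

```python
arm_means = [0.2, 0.5, 0.8]   # hypothetical true means; the best mean is 0.8
choices = [0, 1, 2, 2, 2]     # hypothetical arms played in rounds 1..5

def expected_regret(arm_means, choices):
    """n * mu_star minus the summed means of the arms actually played."""
    best = max(arm_means)
    return len(choices) * best - sum(arm_means[a] for a in choices)

regret = expected_regret(arm_means, choices)
print(round(regret, 10))
```

Here the best arm would have earned 5 × 0.8 = 4.0 in expectation, the played arms earn 0.2 + 0.5 + 3 × 0.8 = 3.1, and the regret is the difference, 0.9.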

18 Regret

21 UCB Algorithms
- Indices based on: mean estimate + uncertainty measure
- Play arm with highest index
- Large indices may reflect
  - Large previous rewards
  - High level of uncertainty
- Example: UCB1 algorithm (Auer et al., 2002)
  INITIALISATION: Play each arm once
  LOOP: For each arm $i$ calculate $\bar{\mu}_{i,t} = \hat{\mu}_{i,t} + \sqrt{\frac{2 \ln t}{T_i(t)}}$, where $T_i(t)$ is the number of times arm $i$ has been played so far
  Play maximising arm
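The UCB1 loop can be sketched as follows. This is a minimal illustration, assuming Bernoulli rewards with hypothetical means and using the round number t inside the logarithm; the function name `ucb1` and its parameters are choices made for this example.

```python
import math
import random

def ucb1(arm_means, horizon, rng):
    """Run UCB1 on Bernoulli arms; return how often each arm was played."""
    n_arms = len(arm_means)
    pulls = [0] * n_arms      # T_i(t): number of plays of arm i so far
    totals = [0.0] * n_arms   # summed rewards per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # initialisation: play each arm once
        else:
            # index = mean estimate + uncertainty bonus sqrt(2 ln t / T_i(t))
            indices = [
                totals[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])
                for i in range(n_arms)
            ]
            arm = max(range(n_arms), key=indices.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.8], horizon=5000, rng=random.Random(0))
print(pulls)
```

The bonus term shrinks as an arm is played more often, so rarely played arms keep getting revisited while the bulk of the plays drifts towards the arm with the highest estimated mean.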

22 Back to our problem
- Rather more complicated setup
  - Combinatorial
  - Poisson reward
  - Thinning of true counts

26 Robust-F-CUCB
INITIALISE: Play combinations of arms such that each arm is played once
LOOP:
- For each arm calculate [index formula missing in source]
- Play the combination of arms that maximises the IP w.r.t. μ
- Observe rewards and update mean estimates

27 Where to now?

28 Thank you
Questions?
References:
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-Time Analysis of the Multiarmed Bandit Problem. Machine Learning, 47(2-3): 235-256.
Bubeck, S., Cesa-Bianchi, N., and Lugosi, G. (2013). Bandits with Heavy Tail. IEEE Transactions on Information Theory, 59(11).
Chen, W., Wang, Y., and Yuan, Y. (2013). Combinatorial Multi-Armed Bandit: General Framework and Applications. In Proceedings of the 30th International Conference on Machine Learning.
