The evidential value of mobile phone colocation

Criminals are well aware that making a phone call leaves a trace behind, which might later be used by police, and later still, in court. Therefore they will often switch phones, and preferably use more or less anonymous phones for "business". However, at the same time as they are using one phone for their work activities, they are possibly using another phone for legitimate business or for ordinary private purposes. This leads to the phenomenon called "co-location": two mobile phones apparently moving together, each separately making calls, but as if the two phones are in the same hands.

How can one find phones, and then co-locating phones, associated with some crime? Can one decide from a short history of apparent co-location whether or not the two phones were in the same hands? How strong is the weight of the evidence in discriminating between two hypotheses: the phones colocate by chance (defence hypothesis) or they colocate because they are in the same hands (prosecution hypothesis)? We have to distinguish two phases of "research": exploratory (criminal investigation) and confirmatory (criminal prosecution). I discuss the roles of statistics in these two phases of forensic statistical analysis of mobile phone co-location.

Published in: Science, Technology, Business

  1. 1. The evidential value of mobile phone co-location Richard Gill Mathematical Institute, Leiden University http://www.math.leidenuniv.nl/~gill Joint work with Helena van Eijck (master thesis, Statistical Science programme) http://www.math.leidenuniv.nl/nl/theses/382/ Data Science Meetup Utrecht, 23 January 2014
  2. 2. The chance of coincidence? • DNA match • Fingerprint match • Handwriting match • ... and so on ... Match probability = P(Coincidence | Hdefence); or better, Likelihood Ratio (LR) = P(Coincidence | Hdefence) : P(Coincidence | Hprosecution)
  3. 3. Mobile phone co-location • Mobile phone co-location: two cell phones used over a long time period in a way consistent with them being carried by one person
  4. 4. Visualisation (simulated data) (all analysis done in , of course)
  5. 5. Visualisation (simulated data)
  6. 6. Hariri Case • 14 February 2005: assassination, Beirut • Lebanon Police investigation, continued by UNIIIC (2005), and STL (2009) • 2011: STL publishes indictment • 2014: trial opens “The case against the Accused is built in large part on circumstantial evidence. Circumstantial evidence, which works logically by inference and deduction, is often more reliable than direct evidence, which can suffer from first-hand memory loss or eye-witness distortion. It is a recognised legal principle that circumstantial evidence has similar weight and probative value as direct evidence and that circumstantial evidence can be stronger than direct evidence.”
  7. 7. http://www.stl-tsl.org/en/the-cases/stl-11-01
  8. 8. Analysis of CDR revealed co-locating phones ... • “Red network” phones associated with surveillance and assassination (covert: anonymous & closed) • “Blue network” phones associated with logistics, preparation (anonymous but open) • “Green network” phones associated with chain of command (covert) • PMPs (personal mobile phones) • ... CDR = “Call Data Records”; per call: cell towers, time, phone numbers
  9. 9. How they found co-locating phones • Given: a “target phone” (already associated with crime) • Select notable patterns of movement • Look for candidate co-locators (match same pattern) • Follow up the “hits” in time: do they de-co-locate? (look for an anomaly)
  10. 10. Issues • Texas sharp-shooter (testing a hypothesis suggested by the data) • Likelihood ratio: needs two models • Is a model of typical behaviour relevant to evaluation of specific case? • Is a sample from the population relevant to evaluation of a specific case?
  11. 11. Our approach • Part I: investigate reliability of search procedure • Part II: quantify evidential value of each specific pair of co-locating phones using permutation approach
  12. 12. Interlude: How good is the data? • Intermittency • Inaccuracy
  13. 13. Does CDR data uniquely characterise you? Unique in the Crowd: The privacy bounds of human mobility. Yves-Alexandre de Montjoye (1,2), César A. Hidalgo (1,3,4), Michel Verleysen (2) & Vincent D. Blondel (2,5). (1) Massachusetts Institute of Technology, Media Lab, 20 Ames Street, Cambridge, MA 02139, USA; (2) Université catholique de Louvain, Institute for Information and Communication Technologies, Electronics and Applied Mathematics, Avenue Georges Lemaître 4, B-1348 Louvain-la-Neuve, Belgium; (3) Harvard University, Center for International Development, 79 JFK Street, Cambridge, MA 02138, USA; (4) Instituto de Sistemas Complejos de Valparaíso, Paseo 21 de Mayo, Valparaíso, Chile; (5) Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a [...] NATURE/SCIENTIFIC REPORTS March 2013
  14. 14. Does CDR data uniquely characterise you? NATURE/SCIENTIFIC REPORTS March 2013. Unique in the Crowd: The privacy bounds of human mobility. Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen & Vincent D. Blondel. [Figure 2 of the paper; the visible caption fragment describes the case where the information available to the attacker consists of two spatio-temporal points, for example that the individual was in zone I between 9am and 10am and in zone II between 12pm and 1pm.] We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity.
These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals. Derived from the Latin Privatus, meaning “withdraw from public life,” the notion of privacy has been foundational to the development of our diverse societies, forming the basis for individuals’ rights such as free speech and religious freedom. Despite its importance, privacy has mainly relied on informal protection mechanisms. For instance, tracking individuals’ movements has been historically difficult, making them de-facto private. For centuries, information technologies have challenged these informal protection mechanisms.
  15. 15. How accurate is CDR location? • “Deventer murder case”: under “exceptional” atmospheric conditions, a cell phone uses a cell tower 25 km away, rather than close-by cell towers. Forensic Statistics and Graphical Models: Deventer moordzaak, phonecall A28. Maikel Bargpeter, February 3, 2012. This analysis is mainly based on ‘Leugens over Louwes’. The main reason Louwes got involved in the Deventer moordzaak is that he was the accountant of Mw. Wittenberg and called her on his mobile phone right before the killing. According to Louwes he was on the highway A28, 25 km away from Deventer where the murder took place. So he claims that he is not the killer. [...] claim it is very unlikely such a connection from the A28 could be made [...] Unfortunately most of the research cannot be integrated into the [...] model at first sight. The only way out is: the normal conditions which might be absent at [...] of the phonecall. Hans Meijer looked up reports at an institute in the U[...] find that around that time and place these special conditions did happen [...] atmosphere.
  16. 16. How accurate is CDR location? • Deventer murder case: under “exceptional” atmospheric conditions, a cell phone uses a cell tower 25 km away, rather than close-by cell towers. [Excerpt from a report, translated from Dutch:] [...] event, the probabilities that this matches the suspect are estimated at 0.60, as for the parents and the other person living near the parents; for A at 0.25, for M at 0.40, and for the other person not living near the parental home at 0.25. 6.1.11 Event 11. The five messages of 9 October that characterise this event took place within three quarters of an hour, between half past one and a quarter past two. The cell tower on Reinaert de Vosstraat was used three times, and the cell towers on Hugo de Grootkade and Donker Curtiusstraat were each used once. The cell towers turn out to lie around M's home (purple dot), with the most frequently used tower situated furthest away. Given the locations of the cell towers, it is most plausible that this matches M best, and the probability is therefore estimated at 0.70. For all the others this is less plausible but not improbable, so their probabilities are estimated at 0.40. 6.1.12 Event 12. This event comprises 20 messages spread over three days. In the morning and the evening of the first day, the cell towers near the parental home are used. On the following day the cell towers in Duivendrecht and Purmerend are used. The user of the phone cannot be identified from this, but the messages do indicate that on the next day a transaction [...] Donker Curtiusstraat, which lies near M's home, used. Given that in the morning the phone connected near the parental home, and two days later near the premises where the suspect had received a quotation for a loan a week earlier, the probability that he had the phone in his possession is estimated at 0.8.
For his parents it is less plausible that they went to look at the Diopter and then returned to Amsterdam via Almere; the probability that they had the phone in their possession is estimated at 0.65. For K1 we estimated the probability at 0.55. This event does not point directly to A or M, so we estimated their probabilities at 0.25. For K2 it is lower still, namely 0.20. 6.1.13 Event 13. The only message sent was sent near Rijnstraat 35 in Amsterdam. The cell tower used lies near a through road, possibly in the direction of the home of brother A. Because this is not very precise, we decided not to include this message in the further analysis. 6.1.14 Event 14. This event contains five messages. For one message the location is unknown. In the morning the phone connected near the parental home. Two hours later, two different cell towers are used within the same minute: the towers on Den Briel straat and Donker Curtiusstraat in Amsterdam. One possible explanation is that the user of the phone was travelling from the motorway (A10) towards the centre of Amsterdam. Another explanation could be that the user of the phone was shopping at that moment at the Centrale Markt, located in the grey area between the locations of the two cell towers. This could match the profile of M, [...] [Figure panels: event 11, event 14.] An Amsterdam drugs case – 2 of 19 events; blue = cell towers, purple = addresses associated with suspect
  17. 17. How accurate is CDR location? • Deventer murder case: under “exceptional” atmospheric conditions, a cell phone uses a cell tower 25 km away, rather than close-by cell towers. RDG, 12 August 2012. Data: Google Latitude; my trip: train
  18. 18. End of interlude. Now: Our approach • Part I: investigate reliability of search procedure • Part II: quantify evidential value of specific pairs of co-locating phones using permutation approach
  19. 19. Part I: the experiment • Chose one target phone from the case • Identified all notable three-point patterns of movement • Identified all matches (“hits”) to each pattern • Followed each hit forwards in time to first dislocation event (“anomaly”)
  20. 20. Part I • Measure mobility, and (phone) activity, of hit and of target, in first four days • Mobility: km travelled • Activity: number of calls • Investigate relation between these four variables and time to first anomaly for our sample of hits
  21. 21. Summary • Dichotomise each of four variables (“high” vs “low”) • Score each hit by number of highs (0 to 4)
  22. 22. Chance of anomaly per day is roughly constant Joint Exponential Fit
  23. 23. Chance of anomaly per day is roughly constant • Very high: sum score 3 and 4: half life (of time to anomaly) is one day • Medium: sum score 2: half life is two days • Very low: sum score 0 and 1: half life is four days
  24. 24. If we believe this, then ... • no anomaly for 10 half lives: 1 in a thousand • no anomaly for 20 half lives: 1 in a million
  25. 25. Conclusion of Part I • The “chance of coincidence” depends strongly on the individual characteristics of the particular two phones • The investigative procedure is reliable • first, identify suspects (pattern-hits which continue to co-locate for a few days) • second, confirm suspects (long-term follow-up) • … so we needn’t worry about the Texas sharpshooter (we’ll analyse long-term follow-up data) • We do have a major reference class problem
  26. 26. Part II • Take two co-locating phones: could this be coincidence? • We need to compare the observed history of a pair of phones with that of similar pairs of phones of different persons • Especially: similar activity, similar mobility, frequenting the same locations • Assumption: if two persons are completely unrelated then we may as well compare Mr X Day A with Mr Y Day B, as Mr X Day A with Mr Y Day A
  27. 27. Problems • “Completely unrelated but similar” persons do live in the same neighbourhood, work in the same neighbourhood, frequent the same shops, cafés, places of worship, beach clubs, sporting events, ... • We should condition on confounders (all days are not exchangeable) • Problem of observational (as opposed to experimental) studies: the unknown unknowns
  28. 28. Our solution • Compare history of phone X with artificial histories like phone Y’s, obtained by permuting (shuffling) Y’s days • Shuffle weekdays and weekend days separately • Distance between two histories: total kilometers between consecutive calls on same day of different phones • Note: “artificial histories” need not be realistic in all respects – they should just be realistic in relevant respects
  29. 29. Original vs. Shuffled (simulated data)
  30. 30. Findings • Discovered co-locations are statistically very significant • In retrospect, we would have done better to use a different similarity measure, etc… • We reported to the court exactly what we did do, and all that we did do
  31. 31. Future research • Invent better distance measure (model based LR?) for higher power (note: not for validity) • Should refine permutation procedure (shuffled histories may be unrealistic when overnight location can vary) • As we condition on more confounders, reference population shrinks, prior probabilities change – relevant evidence moves out of our analysis but is still relevant
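
The permutation approach of slide 28 can be sketched roughly as follows. This is an illustrative reading of the slide, not the code used in the case: the data layout (a dict from calendar day to a list of `(minute_of_day, x_km, y_km)` calls), the function names, the simulated data and the p-value convention `(1 + hits) / (1 + n_shuffles)` are all assumptions made for this example. In particular, "total kilometers between consecutive calls on same day of different phones" is interpreted here as: merge the two phones' calls per day in time order, and add up the distance for every adjacent pair of calls that come from different phones.

```python
import math
import random
from datetime import date, timedelta

# Assumed layout: a history maps a calendar day to that day's calls,
# each call being (minute_of_day, x_km, y_km).

def day_distance(calls_x, calls_y):
    """Km between time-adjacent calls made by different phones on one day."""
    merged = sorted([(t, x, y, "X") for t, x, y in calls_x] +
                    [(t, x, y, "Y") for t, x, y in calls_y])
    total = 0.0
    for a, b in zip(merged, merged[1:]):
        if a[3] != b[3]:  # only pairs of calls from different phones count
            total += math.hypot(a[1] - b[1], a[2] - b[2])
    return total

def history_distance(hist_x, hist_y):
    """Sum of the per-day distances over all days in phone X's history."""
    return sum(day_distance(hist_x[d], hist_y.get(d, [])) for d in hist_x)

def shuffle_days(hist_y, rng):
    """Permute Y's days, weekdays and weekend days separately."""
    shuffled = {}
    for is_weekend in (False, True):
        days = sorted(d for d in hist_y if (d.weekday() >= 5) == is_weekend)
        permuted = days[:]
        rng.shuffle(permuted)
        for target, source in zip(days, permuted):
            shuffled[target] = hist_y[source]
    return shuffled

def permutation_p_value(hist_x, hist_y, n_shuffles=999, seed=0):
    """Share of shuffled histories at least as close to X as the real Y."""
    rng = random.Random(seed)
    observed = history_distance(hist_x, hist_y)
    hits = sum(history_distance(hist_x, shuffle_days(hist_y, rng)) <= observed
               for _ in range(n_shuffles))
    return observed, (1 + hits) / (1 + n_shuffles)

def simulate(n_days=28, seed=1):
    """Simulated data: one carrier, so both phones call from nearly the same spot each day."""
    rng = random.Random(seed)
    start = date(2014, 1, 6)  # a Monday
    hist_x, hist_y = {}, {}
    for i in range(n_days):
        d = start + timedelta(days=i)
        cx, cy = rng.uniform(0, 50), rng.uniform(0, 50)
        for hist in (hist_x, hist_y):
            hist[d] = [(rng.randrange(1440),
                        cx + rng.gauss(0, 0.5),
                        cy + rng.gauss(0, 0.5)) for _ in range(3)]
    return hist_x, hist_y

hist_x, hist_y = simulate()
observed, p_value = permutation_p_value(hist_x, hist_y)
print(f"observed distance: {observed:.1f} km, permutation p-value: {p_value:.3f}")
```

With a finite number of shuffles the p-value cannot fall below `1 / (1 + n_shuffles)`, so a very small p-value needs correspondingly many shuffles. As slide 31 notes, shuffled histories may be unrealistic in some respects (for example when the overnight location varies), and this sketch does nothing to address that.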

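The half-life figures on slides 23 and 24 follow from simple arithmetic if the time to first anomaly is taken to be exponential, as the "roughly constant chance per day" on slide 22 suggests. The sketch below only reproduces that arithmetic; the function names are made up for this example, and the exponential model is the slide's working assumption ("if we believe this"), not an established fact.

```python
def prob_no_anomaly(days, half_life_days):
    """P(no anomaly within `days`) for an exponential time-to-anomaly."""
    return 0.5 ** (days / half_life_days)

def daily_anomaly_chance(half_life_days):
    """Constant per-day chance of an anomaly implied by a given half-life."""
    return 1.0 - 0.5 ** (1.0 / half_life_days)

# Half-lives from slide 23, by sum score of the four dichotomised variables.
for label, half_life in [("score 3-4", 1), ("score 2", 2), ("score 0-1", 4)]:
    print(f"{label}: half-life {half_life} d, "
          f"daily anomaly chance {daily_anomaly_chance(half_life):.3f}, "
          f"P(no anomaly in {10 * half_life} d) = "
          f"{prob_no_anomaly(10 * half_life, half_life):.2e}, "
          f"P(no anomaly in {20 * half_life} d) = "
          f"{prob_no_anomaly(20 * half_life, half_life):.2e}")
```

Ten half-lives give 2^-10 = 1/1024, roughly 1 in a thousand, and twenty give 2^-20 = 1/1,048,576, roughly 1 in a million. For a low-mobility, low-activity pair (half-life four days), ten half-lives already means 40 days of follow-up without an anomaly.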