Criminals are well aware that making a phone call leaves a trace behind, which might later be used by police, and later still, in court. Therefore they will often switch phones, and preferably use more or less anonymous phones for "business". However, at the same time as they are using one phone for their work activities, they are possibly using another phone for legitimate business or for ordinary private purposes. This leads to the phenomenon called "co-location": two mobile phones apparently moving together, each separately making calls, but as if the two phones are in the same hands.
How can one find phones, and then co-locating phones, associated with some crime? Can one decide from a short history of apparent co-location whether or not the two phones were in the same hands? How strong is the evidence in discriminating between two hypotheses: the phones co-locate by chance (defence hypothesis) or they co-locate because they are in the same hands (prosecution hypothesis)? We have to distinguish two phases of "research": exploratory (criminal investigation) and confirmatory (criminal prosecution). I discuss the roles of statistics in these two phases of forensic statistical analysis of mobile phone co-location.
The evidential value of mobile phone colocation
1. The evidential value of
mobile phone co-location
Richard Gill
Mathematical Institute, Leiden University
http://www.math.leidenuniv.nl/~gill
Joint work with Helena van Eijck
(master thesis, Statistical Science programme)
http://www.math.leidenuniv.nl/nl/theses/382/
Data Science Meetup
Utrecht, 23 January 2014
2. The chance of coincidence?
• DNA match
• Fingerprint match
• Handwriting match
• ... and so on ...
Match probability = P(Coincidence | H_defence); or better,
Likelihood Ratio (LR) = P(Coincidence | H_defence) : P(Coincidence | H_prosecution)
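As a rough illustration of the ratio on this slide, here is a minimal Python sketch. The function name and the two probabilities are made up for the example and do not come from any case. The orientation follows the slide (defence : prosecution), so values far below 1 favour the prosecution hypothesis; many texts report the reciprocal instead.

```python
def likelihood_ratio(p_given_defence: float, p_given_prosecution: float) -> float:
    """LR in the orientation used on the slide: defence over prosecution."""
    if not 0.0 <= p_given_defence <= 1.0:
        raise ValueError("p_given_defence must lie in [0, 1]")
    if not 0.0 < p_given_prosecution <= 1.0:
        raise ValueError("p_given_prosecution must lie in (0, 1]")
    return p_given_defence / p_given_prosecution


# Illustrative numbers only: the coincidence is rare if the defence is right,
# and fairly likely if the prosecution is right.
lr = likelihood_ratio(p_given_defence=1e-6, p_given_prosecution=0.5)
print(f"LR (defence : prosecution) = {lr:.1e}")
```

The match probability alone is only the numerator; the ratio also asks how probable the observation is if the prosecution hypothesis is true.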
3. Mobile phone co-location
• Mobile phone co-location: two cell phones used
over a long time period in a way consistent with
them being carried by one person
6. Hariri Case
• 14 February 2005: assassination, Beirut
• Lebanon Police investigation, continued by UNIIIC
(2005), and STL (2009)
• 2011: STL publishes indictment
• 2014: trial opens
“The case against the Accused is built in large part on circumstantial evidence.
Circumstantial evidence, which works logically by inference and deduction, is often
more reliable than direct evidence, which can suffer from first-hand memory loss or
eye-witness distortion. It is a recognised legal principle that circumstantial
evidence has similar weight and probative value as direct evidence and that
circumstantial evidence can be stronger than direct evidence.”
8. Analysis of CDR revealed
co-locating phones ...
• “Red network” phones associated with surveillance and
assassination (covert: anonymous & closed)
• “Blue network” phones associated with logistics,
preparation (anonymous but open)
• “Green network” phones associated with chain of
command (covert)
• PMPs (personal mobile phones)
• ...
“Call Data Records”: Per call: Cell towers, time, phone numbers
9. How they found co-locating
phones
• Given: a “target phone” (already associated with crime)
• Select notable patterns of movement
• Look for candidate co-locators (match same pattern)
• Follow up the “hits” in time: do they de-co-locate?
(look for an anomaly)
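The search steps above can be sketched in Python. This is only an illustration under stated assumptions, not the investigators' actual procedure: the CDR row layout `(phone_id, timestamp, cell_tower_id)`, the 30-minute matching window, the function name `find_candidates` and all the data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_candidates(cdr, pattern, window=timedelta(minutes=30)):
    """Return phones with a call close in time, on the same tower,
    to every (time, tower) point of the pattern."""
    calls_by_phone = defaultdict(list)
    for phone, ts, tower in cdr:
        calls_by_phone[phone].append((ts, tower))
    hits = set()
    for phone, calls in calls_by_phone.items():
        if all(
            any(tower == p_tower and abs(ts - p_time) <= window for ts, tower in calls)
            for p_time, p_tower in pattern
        ):
            hits.add(phone)
    return hits


def at(hour, minute):
    return datetime(2014, 1, 23, hour, minute)  # arbitrary illustrative day


# A notable three-point movement pattern of the target phone "A" (made up).
pattern = [(at(9, 0), "T1"), (at(12, 0), "T7"), (at(17, 30), "T3")]
cdr = [
    ("A", at(9, 0), "T1"), ("A", at(12, 0), "T7"), ("A", at(17, 30), "T3"),
    ("B", at(9, 10), "T1"), ("B", at(12, 20), "T7"), ("B", at(17, 5), "T3"),
    ("C", at(9, 5), "T1"), ("C", at(12, 10), "T2"), ("C", at(17, 40), "T3"),
]
candidates = find_candidates(cdr, pattern) - {"A"}  # drop the target itself
print(candidates)
```

Each candidate would then be followed forward in time to see whether, and when, it stops moving with the target.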
10. Issues
• Texas sharp-shooter (testing a hypothesis
suggested by the data)
• Likelihood ratio: needs two models
• Is a model of typical behaviour relevant to
evaluation of specific case?
• Is a sample from the population relevant to
evaluation of a specific case?
11. Our approach
• Part I: investigate reliability of search procedure
• Part II: quantify evidential value of each specific pair
of co-locating phones using permutation approach
13. Does CDR data uniquely
characterise you?
Unique in the Crowd: The privacy bounds of human mobility
Yves-Alexandre de Montjoye (1,2), César A. Hidalgo (1,3,4), Michel Verleysen (2) & Vincent D. Blondel (2,5)
(1) Massachusetts Institute of Technology, Media Lab, 20 Ames Street, Cambridge, MA 02139, USA
(2) Université catholique de Louvain, Institute for Information and Communication Technologies, Electronics and Applied Mathematics, Avenue Georges Lemaître 4, B-1348 Louvain-la-Neuve, Belgium
(3) Harvard University, Center for International Development, 79 JFK Street, Cambridge, MA 02138, USA
(4) Instituto de Sistemas Complejos de Valparaíso, Paseo 21 de Mayo, Valparaíso, Chile
(5) Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a …
NATURE/SCIENTIFIC REPORTS March 2013
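The idea behind "four spatio-temporal points are enough" can be illustrated with a toy Python sketch. It is not the paper's method or data: the trace layout (a set of `(hour, antenna)` points per user), the function name `matching_users` and the three traces are invented for the example.

```python
def matching_users(points, traces):
    """Users whose trace contains every known (hour, antenna) point."""
    known = set(points)
    return [user for user, trace in traces.items() if known <= trace]


# Three made-up users, each with four hourly antenna observations.
traces = {
    "u1": {(8, "a1"), (9, "a2"), (12, "a5"), (18, "a1")},
    "u2": {(8, "a1"), (9, "a2"), (12, "a4"), (18, "a1")},
    "u3": {(8, "a3"), (9, "a2"), (12, "a5"), (18, "a3")},
}

two_points = matching_users([(8, "a1"), (9, "a2")], traces)
three_points = matching_users([(8, "a1"), (9, "a2"), (12, "a5")], traces)
print(two_points, three_points)
```

In this toy data two known points still leave two possible users, while a third point singles out one of them. The paper's claim concerns how quickly this narrowing happens in real data.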
14. Does CDR data uniquely
characterise you?
NATURE/SCIENTIFIC REPORTS March 2013
[Figure 2 of the paper; caption truncated in extraction] (A) I_p = 2 means that the information available to the attacker consist of two 7am-8am spatio-temp… was in zone I between 9am to 10am and in zone II between 12pm to 1pm. In this example, the traces of tw…
Unique in the Crowd: The privacy bounds of human mobility
Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen & Vincent D. Blondel
We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals.
Derived from the Latin Privatus, meaning ‘‘withdraw from public life,’’ the notion of privacy has been foundational to the development of our diverse societies, forming the basis for individuals’ rights such as free speech and religious freedom [1]. Despite its importance, privacy has mainly relied on informal protection mechanisms. For instance, tracking individuals’ movements has been historically difficult, making them de-facto private. For centuries, information technologies have challenged these informal protection mechanisms.
15. How accurate is CDR
location?
• “Deventer murder case”: under “exceptional”
atmospheric conditions, a cell phone uses a cell
tower 25 km away, rather than close-by cell towers
Forensic Statistics and Graphical Models: Deventer moordzaak, phonecall A28
Maikel Bargpeter, February 3, 2012
This analysis is mainly based on ’Leugens over Louwes’.
The main reason Louwes got involved in the Deventer moordzaak is that he was the accountant of Mw. Wittenberg and called her on his mobile phone right before the killing. According to Louwes he was on the highway A28, 25 km away from Deventer where the murder took place. So he claims that he is not the killer.
[…] claim it is very unlikely such a connection from the A28 could be mad[…] Unfortunately most of the research can not be integrated into the g[…] model at first sight.
The only way out is: the normal conditions which might be absent at […] of the phonecall. Hans Meijer looked up reports at a institute in the U[…] find that around that time and place these special conditions did happe[…] atmosphere.
16. How accurate is CDR
location?
[…] event, the probability that this matches the suspect is estimated at 0.60, the same as for the parents and the other person living near the parents; for A at 0.25, for M at 0.40, and for the other person not living near the parental home at 0.25.
6.1.11 Event 11
The five messages of 9 October that characterise this event took place within three quarters of an hour, between half past one and a quarter past two. Three of them used the cell tower on Reinaert de Vosstraat; the cell towers on Hugo de Grootkade and Donker Curtiusstraat were each used once.
The cell towers turn out to lie around M's home (purple dot), with the most frequently used tower situated furthest away. Given the locations of the cell towers, it is most plausible that this matches M best, and the probability is therefore estimated at 0.70. For all the others this is less plausible but not improbable, so their probabilities are estimated at 0.40.
6.1.12 Event 12
This event comprises 20 messages spread over three days. In the morning and the evening of the first day, cell towers near the parental home are used. On the following day, cell towers in Duivendrecht and Purmerend are used. The user of the phone cannot be identified from this, but the messages do indicate that on the next day a transaction […]
[…] Donker Curtiusstraat, which is located near M's home, was used. Given that in the morning the phone connected near the parental home and two days later near the premises where the suspect had received a loan offer a week earlier, the probability that he had the phone in his possession is estimated at 0.8. For his parents it is less plausible that they went to view the Diopter and then returned to Amsterdam via Almere; the probability that they had the phone in their possession is estimated at 0.65. For K1 we estimated the probability at 0.55. This event does not point directly to A or M, so we estimated their probabilities at 0.25. For K2 it is lower still, namely 0.20.
6.1.13 Event 13
The only message sent was sent in the vicinity of Rijnstraat 35 in Amsterdam. The cell tower used lies near a through road and is possibly in the direction of brother A's home. Because this is not very precise, we decided not to include this message in the further analysis.
6.1.14 Event 14
This event contains five messages. For one message the location is unknown. In the morning the phone connected near the parental home. Two hours later, two different cell towers are used within the same minute: the towers at Den Briel straat and Donker Curtiusstraat in Amsterdam. One possible explanation is that the user of the phone was travelling from the motorway (A10) towards the centre of Amsterdam. Another explanation could be that the user of the phone was shopping at that moment at the Centrale Markt, located in the grey area between the locations of the two cell towers. This could be consistent with the profile of M, […]
[Map panels: event 11 and event 14]
An Amsterdam drugs case – 2 of 19 events
blue = cell towers, purple = addresses associated with suspect
17. How accurate is CDR
location?
RDG, 12 August 2012
Data: Google Latitude; my trip: train
18. End of interlude. Now:
Our approach
• Part I: investigate reliability of search procedure
• Part II: quantify evidential value of specific pairs of
co-locating phones using permutation approach
19. Part I: the experiment
• Chose one target phone from case
• Identified all notable three-point patterns of
movement
• Identified all matches (“hits”) to each pattern
• Followed each hit forwards in time to first dis-location event (“anomaly”)
20. Part I
• Measure mobility, and (phone) activity, of hit and of
target, in first four days
• Mobility: Km travelled
• Activity: number of calls
• Investigate relation between these four variables
and time to first anomaly for our sample of hits
21. Summary
• Dichotomise each of four variables (“high” vs “low”)
• Score each hit by number of highs (0 to 4)
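The scoring rule can be sketched as follows. The cut-offs in `THRESHOLDS` are hypothetical (the slides do not say where "high" starts), as are the variable names and the example measurements.

```python
# Hypothetical cut-offs for the first four days; not taken from the study.
THRESHOLDS = {"hit_km": 20.0, "hit_calls": 15, "target_km": 20.0, "target_calls": 15}


def sum_score(measurements, thresholds=THRESHOLDS):
    """Count how many of the four variables are 'high' (result is 0 to 4)."""
    return sum(measurements[name] > cut for name, cut in thresholds.items())


# One made-up hit: mobility (km) and activity (calls) of hit and target.
example_hit = {"hit_km": 35.2, "hit_calls": 8, "target_km": 41.0, "target_calls": 22}
score = sum_score(example_hit)
print(score)
```

A higher score means more movement and more calls in the first four days, so more opportunities for an unrelated phone to show an anomaly.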
23. Chance of anomaly per day
is roughly constant
• Very high: sum score 3 and 4: half life (of time to
anomaly) is one day
• Medium: sum score 2: half life is two days
• Very low: sum score 0 and 1: half life is four days
24. If we believe this, then ...
• no anomaly for 10 half lives: 1 in a thousand
• no anomaly for 20 half lives: 1 in a million
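The arithmetic behind these two figures: if the chance of an anomaly per day is constant, the probability of seeing no anomaly halves with every half-life, so 10 half-lives give 2^-10 = 1/1024 and 20 half-lives give 2^-20 = 1/1,048,576. A minimal Python sketch, assuming the half-lives quoted on the previous slide (the function and dictionary names are illustrative):

```python
def prob_no_anomaly(days, half_life_days):
    """P(no anomaly for `days` days), assuming a constant daily chance of anomaly."""
    return 0.5 ** (days / half_life_days)


# Half-lives (in days) per sum-score group, as quoted on the previous slide.
HALF_LIFE_DAYS = {"score 3-4": 1, "score 2": 2, "score 0-1": 4}

for label, half_life in HALF_LIFE_DAYS.items():
    ten = prob_no_anomaly(10 * half_life, half_life)
    twenty = prob_no_anomaly(20 * half_life, half_life)
    print(f"{label}: {10 * half_life} days -> 1 in {1 / ten:,.0f}; "
          f"{20 * half_life} days -> 1 in {1 / twenty:,.0f}")
```

For the least mobile, least active group this means 40 days of follow-up for the "1 in a thousand" level and 80 days for "1 in a million".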
25. Conclusion of part 1
• The “chance of coincidence” depends strongly on individual
characteristics of particular two phones
• The investigative procedure is reliable
• first, identify suspects (pattern-hits which continue to
co-locate a few days)
• second, confirm suspects (long term follow-up)
• … so we needn’t worry about Texas sharpshooter
(we’ll analyse long term follow-up data)
• We do have a major reference class problem
26. Part 2
• Take two co-locating phones: could this be coincidence?
• We need to compare the observed history of a pair of phones
with that of similar pairs of phones of different persons
• Especially: similar activity, similar mobility, frequenting the
same locations
• Assumption: if two persons are completely unrelated then we
may as well compare
Mr X Day A with Mr Y Day B, as
Mr X Day A with Mr Y Day A
27. Problems
• “Completely unrelated but similar” persons do live in
the same neighbourhood, work in the same
neighbourhood, frequent the same shops, cafés,
places of worship, beach clubs, sporting events, ...
• We should condition on confounders
(all days are not exchangeable)
• Problem of observational (as opposed to experimental)
studies: the unknown unknowns
28. Our solution
• Compare history of phone X with artificial histories like
phone Y’s, obtained by permuting (shuffling) Y’s days
• Shuffle weekdays and weekend-days separately
• Distance between two histories: total kilometers between
consecutive calls on same day of different phones
• Note: “artificial histories” need not be realistic in all
respects – they should just be realistic in relevant respects
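The permutation idea can be sketched in Python. This is not the code used in the thesis or reported to the court; it is a minimal illustration under several assumptions: a history is a dict mapping a day index to a list of `(hour, x_km, y_km)` calls, locations are planar coordinates in km, the distance is one reading of the bullet above (merge both phones' calls of a day in time order and add up the distance whenever consecutive calls come from different phones), and the p-value uses the common (1 + count) / (1 + n) convention.

```python
import math
import random


def day_distance(calls_x, calls_y):
    """Km between consecutive calls made by different phones on one day."""
    merged = sorted([(c, "X") for c in calls_x] + [(c, "Y") for c in calls_y])
    total = 0.0
    for (a, who_a), (b, who_b) in zip(merged, merged[1:]):
        if who_a != who_b:
            total += math.hypot(a[1] - b[1], a[2] - b[2])
    return total


def history_distance(hist_x, hist_y):
    return sum(day_distance(hist_x[d], hist_y.get(d, [])) for d in hist_x)


def shuffle_days(hist_y, is_weekend, rng):
    """Reassign Y's days at random, weekdays and weekend days separately."""
    shuffled = {}
    for flag in (False, True):
        days = [d for d in hist_y if is_weekend(d) == flag]
        sources = days[:]
        rng.shuffle(sources)
        for day, source in zip(days, sources):
            shuffled[day] = hist_y[source]
    return shuffled


def permutation_p_value(hist_x, hist_y, is_weekend, n_perm=999, seed=0):
    """Share of shuffled histories at least as close to X as the real Y."""
    rng = random.Random(seed)
    observed = history_distance(hist_x, hist_y)
    as_close = sum(
        history_distance(hist_x, shuffle_days(hist_y, is_weekend, rng)) <= observed
        for _ in range(n_perm)
    )
    return (1 + as_close) / (1 + n_perm)


# Toy data: 14 days; both phones are in the same (day-specific) place each day.
def is_weekend(day):
    return day % 7 in (5, 6)


hist_x = {d: [(9, 10.0 * d, 0.0), (15, 10.0 * d, 0.0)] for d in range(14)}
hist_y = {d: [(10, 10.0 * d, 0.0), (16, 10.0 * d, 0.0)] for d in range(14)}
p = permutation_p_value(hist_x, hist_y, is_weekend)
print(p)
```

A small p-value says that Y's real history is closer to X's than almost all of Y's shuffled histories. Shuffling within weekdays and within weekend days keeps that one confounder fixed; other confounders would need further stratification.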
30. Findings
• Discovered co-locations are statistically very significant
• In retrospect we could better have used a different
similarity measure, etc…
• We reported to the court exactly what we did do, and
all that we did do
31. Future research
• Invent better distance measure (model based LR?)
for higher power (note: not for validity)
• Should refine permutation procedure (shuffled histories
may be unrealistic when overnight location can vary)
• As we condition on more confounders, reference
population shrinks, prior probabilities change – relevant
evidence moves out of our analysis but is still relevant