The increasing abundance of data about the trajectories of
personal movement is opening up new opportunities for analyzing and mining human mobility, but it also brings new risks,
since it opens new ways of intruding into personal privacy.
Representing personal movements as sequences of the places
a person visits (semantic trajectories) poses even greater
privacy threats than raw geometric location data. In this
paper we propose a privacy model that defines the attack model
of semantic trajectory linking, together with a privacy notion
called c-safety. This notion provides an upper bound on the probability of inferring that a given person, observed in a sequence of non-sensitive places, has also stopped in any sensitive location.
Consistent with this privacy model, we propose an algorithm
for transforming any dataset of semantic trajectories into a
c-safe one. We report a study on a real-life GPS trajectory dataset showing that our algorithm preserves interesting
quality/utility measures of the original trajectories, such as
sequential pattern mining results.
Preserving Privacy in Semantic-Rich Trajectories of Human Mobility
1. Anna Monreale, Roberto Trasarti, Dino Pedreschi, Chiara Renso (KDDLab, Pisa); Vania Bogorny (Univ. Santa Catarina, Brazil)
Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa), www-kdd.isti.cnr.it
ANONIMO MEETING, Pisa, 20-21 September 2010
SPRINGL 2010, San Jose, November 2, 2010
2. How the story begins…
Semantic trajectories represent the important places visited by people. This information can be privacy-sensitive! We should find a good generalization of the visited places… while preserving semantics! But how? Can we use a taxonomy of places to generalize them and obtain anonymous datasets? Let's ask Anna, Dino and Roberto for help!
13. Example (1): The process
Consider the following set of sequences, with m = 3 and c = 0.45:
S = { <S1, R2, H1, R1, C1, S2>, <S3, D1, R1, C1, S2>, <S1, P3, C2, D2, S2>, … }
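The c-safety notion from the abstract can be sketched as a check over the three sequences shown (the rest of S is elided on the slide). This is a simplified reading, not the paper's formal definition. It assumes the attacker knows some ordered subsequence of non-sensitive (quasi-identifier) places, estimates the probability of a sensitive place as the fraction of matching trajectories that visit it, and treats H1, C1 and C2 as the sensitive places. The parameter m is not modelled. The helper names (`split`, `is_subsequence`, `max_inference_probability`, `is_c_safe`) are hypothetical.

```python
from itertools import combinations

# Assumption: the places treated as sensitive in this sketch.
SENSITIVE = {"H1", "C1", "C2"}

# The three sequences shown on the slide (the rest of S is elided there).
S = [
    ["S1", "R2", "H1", "R1", "C1", "S2"],
    ["S3", "D1", "R1", "C1", "S2"],
    ["S1", "P3", "C2", "D2", "S2"],
]


def split(traj, sensitive):
    """Split a trajectory into its quasi-identifier sequence and its set of sensitive places."""
    qi = tuple(p for p in traj if p not in sensitive)
    return qi, {p for p in traj if p in sensitive}


def is_subsequence(q, seq):
    """True if the places of q appear in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(x in it for x in q)


def max_inference_probability(dataset, sensitive):
    """Largest P(sensitive place | observed QI subsequence) over all QI subsequences in the data."""
    parts = [split(t, sensitive) for t in dataset]
    candidates = set()
    for qi, _ in parts:
        for k in range(1, len(qi) + 1):
            candidates.update(combinations(qi, k))  # order-preserving subsequences
    best = 0.0
    for q in candidates:
        # Sensitive sets of every trajectory whose QI sequence contains q.
        matching = [s for qi, s in parts if is_subsequence(q, qi)]
        for place in set().union(*matching):
            prob = sum(place in s for s in matching) / len(matching)
            best = max(best, prob)
    return best


def is_c_safe(dataset, sensitive, c):
    """Assumed simplified criterion: no inference probability exceeds c."""
    return max_inference_probability(dataset, sensitive) <= c


print(is_c_safe(S, SENSITIVE, 0.45))  # False: <S1, R2> matches only the first sequence
```

Under this reading the raw sequences fail, because a known subsequence such as <S1, R2> singles out one person, which is what motivates generalizing places. Enumerating every subsequence is exponential in trajectory length, so the sketch is only meant for toy inputs like this one.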
14. Example (2): CostQ
CostQ is the number of hops on the taxonomy tree needed to generalize the quasi-identifier sequences to a common one. Consider the group:
<S1, R2, H1, R1, C1, S2>
<S3, D1, R1, C1, S2>
<S1, P3, C2, D2, S2>
CostQ = 6 + 6 + 6 = 18, yielding (sensitive places in parentheses):
<Station, Place, Entertainment, S2 (H1, C1)>
<Station, Place, Entertainment, S2 (C1)>
<Station, Place, Entertainment, S2 (C2)>
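The hop counting behind CostQ can be sketched as walking up a taxonomy from each quasi-identifier place to its label in the common sequence, position by position. The excerpt does not include the slides' taxonomy, so the tree below (`PARENT`) is invented for illustration; with it each sequence costs 4 hops rather than the slide's 6. The function names `hops` and `cost_q` are also hypothetical.

```python
# Hypothetical taxonomy (child -> parent); NOT the tree used on the slides.
PARENT = {
    "Station": "Place", "Entertainment": "Place",
    "S1": "Station", "S2": "Station", "S3": "Station",
    "R1": "Entertainment", "R2": "Entertainment", "P3": "Entertainment",
    "D1": "Entertainment", "D2": "Entertainment",
}


def hops(place, target, parent=PARENT):
    """Edges walked up the tree from place until reaching its ancestor target."""
    n = 0
    while place != target:
        if place not in parent:
            raise ValueError(f"{target!r} is not an ancestor")
        place = parent[place]
        n += 1
    return n


def cost_q(qi_sequences, common, parent=PARENT):
    """Total hops to generalize each QI sequence, position by position, to `common`."""
    total = 0
    for seq in qi_sequences:
        if len(seq) != len(common):
            raise ValueError("each QI sequence must align with the common sequence")
        total += sum(hops(p, g, parent) for p, g in zip(seq, common))
    return total


# Quasi-identifier parts of the group (sensitive places H1, C1, C2 removed).
group = [("S1", "R2", "R1", "S2"), ("S3", "D1", "R1", "S2"), ("S1", "P3", "D2", "S2")]
common = ("Station", "Place", "Entertainment", "S2")
print(cost_q(group, common))  # 12 with this made-up tree
```

A place that is kept as-is, like S2, costs 0 hops; the real per-sequence cost depends entirely on the depth of the actual taxonomy.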
15. Example (2): CostS
CostS is the number of hops on the tree needed to generalize the sensitive places in order to obtain c-safety. From the generalized group:
<Station, Place, Entertainment, S2 (H1, C1)>
<Station, Place, Entertainment, S2 (C1)>
<Station, Place, Entertainment, S2 (C2)>
CostS = 3. The total cost of this group is 18 + 3 = 21 hops, the lowest among the candidate combinations:
<Station, Place, H1, Entertainment, Clinic, S2>
<Station, Place, Entertainment, Clinic, S2>
<Station, Place, Clinic, Entertainment, S2>
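CostS and the total can be checked the same way. Assuming, as CostS = 3 suggests, that C1 and C2 each sit one hop below Clinic in the taxonomy and that H1 is left unchanged, the sketch below recomputes CostS from the generalized sensitive places and adds the CostQ of 18 reported on slide 14. The taxonomy fragment and the names `hops` and `cost_s` are hypothetical.

```python
# Hypothetical taxonomy fragment (child -> parent) covering only the sensitive branch.
PARENT = {
    "C1": "Clinic", "C2": "Clinic", "H1": "Hospital",
    "Clinic": "Health", "Hospital": "Health",
}


def hops(place, target):
    """Edges walked up the tree from place to its ancestor target."""
    n = 0
    while place != target:
        place = PARENT[place]  # raises KeyError if target is not an ancestor
        n += 1
    return n


def cost_s(before, after):
    """Sum of hops from each original sensitive place to its generalized label."""
    return sum(
        hops(p, g)
        for seq_before, seq_after in zip(before, after)
        for p, g in zip(seq_before, seq_after)
    )


# Sensitive places of the three sequences, before and after generalization.
before = [("H1", "C1"), ("C1",), ("C2",)]
after = [("H1", "Clinic"), ("Clinic",), ("Clinic",)]

COST_Q = 18  # CostQ reported on slide 14 for this group
print(cost_s(before, after), COST_Q + cost_s(before, after))  # 3 21
```

The total cost is simply the sum of the two components, so a search over candidate generalizations can rank them by CostQ + CostS and keep the cheapest one.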