Unobtrusive Data Leakage Detecting
Presented by Shruti Meshram (TP4F1314015)
Under the guidance of Prof. H. K. Chavan
Outline 
• Introduction 
• Problem Description 
• Guilt Models
• Distribution Strategies 
Introduction
• Data leakage
• Data leakage detection
• Traditional ways of data leakage detection
• Proposed system
Problem Entities
• Distributor: dataset T, the set of all valuable data
• Agents U_1, …, U_n: datasets R_1, …, R_n, where R_i is the subset of records from T received by agent U_i
• Leaker: dataset S, the set of leaked data
Agent’s Data Requests
• Sample
– R_i = SAMPLE(T, m_i), i.e., any subset of m_i records from T can be given to U_i.
• Explicit
– R_i = EXPLICIT(T, condition_i), i.e., U_i receives all records of T that satisfy some condition.
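
The two request types can be sketched in a few lines of Python. This is only an illustration: the function names `sample_request` and `explicit_request`, the use of profile names as records, and the example condition are assumptions, not part of the slides.

```python
import random

def sample_request(T, m, rng=random):
    """SAMPLE(T, m): hand out any subset of m records from T."""
    # sorted() gives random.sample a sequence (sets are not accepted directly).
    return set(rng.sample(sorted(T), m))

def explicit_request(T, condition):
    """EXPLICIT(T, condition): hand out every record of T satisfying condition."""
    return {t for t in T if condition(t)}

# Hypothetical data: the four profiles used later in the deck.
T = {"Kiran", "John", "Sarah", "Mark"}

R1 = sample_request(T, 2, random.Random(0))                  # any 2 profiles
R2 = explicit_request(T, lambda name: name.startswith("K"))  # all matching profiles
```

With a sample request the distributor is free to choose which records to return; with an explicit request the result is fully determined by the condition.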
Guilt Models (1/3)
• p: posterior probability that a leaked profile comes from other sources (e.g., Sarah’s network)
• Guilty agent: an agent who leaks at least one profile
• Pr{G_i|S}: probability that agent U_i is guilty, given the leaked set of profiles S
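
The slides do not give a formula for Pr{G_i|S}. As a hedged illustration, the sketch below uses one formulation found in the data leakage detection literature, under these assumptions: each leaked item comes from other sources with probability p, and otherwise it was leaked by exactly one of the agents holding it, each equally likely, independently per item. The function name and the example data are hypothetical.

```python
def guilt_probability(R_i, S, holders, p):
    """Estimate Pr{G_i|S} as 1 - prod over t in S∩R_i of (1 - (1 - p) / |V_t|).

    holders[t] is |V_t|, the number of agents that received item t.
    This is an assumed model, not a formula stated on the slides.
    """
    prob_not_guilty = 1.0
    for item in S & R_i:
        # Probability that this agent did NOT leak this particular item.
        prob_not_guilty *= 1.0 - (1.0 - p) / holders[item]
    return 1.0 - prob_not_guilty

# Hypothetical example: U1 holds both leaked profiles, U2 only one of them.
R = {"U1": {"Kiran", "John"}, "U2": {"John"}}
S = {"Kiran", "John"}
holders = {t: sum(t in r for r in R.values()) for t in S}
p = 0.5

g1 = guilt_probability(R["U1"], S, holders, p)
g2 = guilt_probability(R["U2"], S, holders, p)
```

Under these assumptions, an agent that holds more of the leaked profiles, and shares them with fewer other agents, receives a higher guilt estimate.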
Guilt Models (2/3)
• Agents leak each of their data items independently
or
• Agents leak all their data items OR nothing
[Figure: four outcomes labeled with probabilities p^2, p(1-p), (1-p)p and (1-p)^2]
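
One reading of the four terms p^2, p(1-p), (1-p)p and (1-p)^2 is that they cover two leaked items, each of which independently comes from other sources with probability p or from an agent with probability 1-p. The sketch below enumerates those cases under that assumption; the labels "other" and "agent" and the value p = 0.2 are illustrative choices.

```python
from itertools import product

def outcome_probabilities(p):
    """Probability of each source combination for two independently leaked items."""
    probs = {}
    for sources in product(("other", "agent"), repeat=2):
        prob = 1.0
        for source in sources:
            # p if the item came from other sources, 1 - p if an agent leaked it.
            prob *= p if source == "other" else 1.0 - p
        probs[sources] = prob
    return probs

probs = outcome_probabilities(0.2)
for sources, prob in probs.items():
    print(sources, round(prob, 4))
```

The four probabilities sum to 1, which is a quick check that the cases are exhaustive and mutually exclusive.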
Guilt Models (3/3)
[Figure: two panels, "Independently" and "NOT Independently", each with axes Pr{G1} and Pr{G2}]
The Distributor’s Objective (1/2)
[Figure: leaked set S and agents U1 to U4 holding sets R1 to R4]
• Pr{G_1|S} >> Pr{G_2|S}
• Pr{G_1|S} >> Pr{G_4|S}
The Distributor’s Objective (2/2)
• To achieve his objective, the distributor has to distribute sets R_1, …, R_n that minimize
$$\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|, \quad i, j = 1, \ldots, n$$
• Intuition: minimizing the data shared among agents makes leaked data reveal the guilty agents
Distribution Strategies – Sample (1/4) 
• Set T has four profiles: 
– Kiran, John, Sarah and Mark 
• There are 4 agents: 
– U1, U2, U3 and U4 
• Each agent requests a sample of any 2 profiles 
of T for a market survey 
Distribution Strategies – Sample (2/4)
• Poor distribution
• Minimize $\sum_{i} \sum_{j \neq i} |R_i \cap R_j|$
[Figure: two allocations of profiles to agents U1 to U4]
Distribution Strategies – Sample (3/4)
• Optimal distribution
• Avoid full overlaps and minimize
$$\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|$$
[Figure: allocation of profiles to agents U1 to U4]
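
To make the objective concrete, the sketch below evaluates the relative-overlap sum over i of (1/|R_i|) times the sum over j ≠ i of |R_i ∩ R_j| for the four-profile, four-agent example. The three allocations are hypothetical (the allocations drawn on the slides did not survive extraction), and treating "full overlap" as one agent's set being contained in another's is an assumed interpretation.

```python
def sum_objective(allocation):
    """Sum over agents i of (1/|R_i|) * sum over j != i of |R_i ∩ R_j|."""
    sets = list(allocation.values())
    total = 0.0
    for i, R_i in enumerate(sets):
        overlap = sum(len(R_i & R_j) for j, R_j in enumerate(sets) if j != i)
        total += overlap / len(R_i)
    return total

def has_full_overlap(allocation):
    """True if some agent's set is entirely contained in another agent's set."""
    sets = list(allocation.values())
    return any(
        sets[i] <= sets[j] or sets[j] <= sets[i]
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
    )

# Hypothetical allocations of 2 profiles to each of 4 agents.
identical = {u: {"Kiran", "John"} for u in ("U1", "U2", "U3", "U4")}
paired = {"U1": {"Kiran", "John"}, "U2": {"Kiran", "John"},
          "U3": {"Sarah", "Mark"}, "U4": {"Sarah", "Mark"}}
cyclic = {"U1": {"Kiran", "John"}, "U2": {"John", "Sarah"},
          "U3": {"Sarah", "Mark"}, "U4": {"Mark", "Kiran"}}

for name, alloc in (("identical", identical), ("paired", paired), ("cyclic", cyclic)):
    print(name, sum_objective(alloc), has_full_overlap(alloc))
```

In this example the paired and cyclic allocations reach the same objective value, yet only the cyclic one avoids giving two agents identical sets. That is consistent with the slide listing "avoid full overlaps" as a separate requirement next to minimizing the sum.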
Distribution Strategies
Sample Data Requests
• The distributor has the freedom to select the data items to provide the agents with
• General idea:
– Provide agents with sets of data that are as disjoint as possible
• Problem: there are cases where the distributed data must overlap, e.g., when |R_1| + … + |R_n| > |T|
Explicit Data Requests
• The distributor must provide agents with the data they request
• General idea:
– Add fake data to the distributed data to minimize the overlap of the distributed data
• Problem: agents can collude and identify fake data
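
Both ideas can be illustrated with a short sketch. It assumes the objective is the relative-overlap sum over i of (1/|R_i|) times the sum over j ≠ i of |R_i ∩ R_j|; the helper names, the `fake-<agent>-<k>` naming scheme and the example data are hypothetical and only serve the illustration.

```python
def overlap_is_unavoidable(request_sizes, total_records):
    """Sample requests: disjoint sets are impossible once the sizes exceed |T|."""
    return sum(request_sizes) > total_records

def sum_objective(allocation):
    """Sum over agents i of (1/|R_i|) * sum over j != i of |R_i ∩ R_j|."""
    sets = list(allocation.values())
    return sum(
        sum(len(R_i & R_j) for j, R_j in enumerate(sets) if j != i) / len(R_i)
        for i, R_i in enumerate(sets)
    )

def add_fake_records(allocation, fakes_per_agent=1):
    """Explicit requests: give each agent its own fake records (hypothetical naming)."""
    return {
        agent: set(records) | {f"fake-{agent}-{k}" for k in range(fakes_per_agent)}
        for agent, records in allocation.items()
    }

def agents_with_leaked_fakes(leaked, original, with_fakes):
    """Agents whose unique fake records appear in the leaked set."""
    return sorted(a for a in with_fakes if (with_fakes[a] - original[a]) & leaked)

# Four agents asking for 2 of 4 profiles each cannot all get disjoint sets.
print(overlap_is_unavoidable([2, 2, 2, 2], 4))

# Two agents whose explicit requests return exactly the same records.
explicit = {"U1": {"Kiran", "John"}, "U2": {"Kiran", "John"}}
with_fakes = add_fake_records(explicit)
before, after = sum_objective(explicit), sum_objective(with_fakes)
suspects = agents_with_leaked_fakes({"Kiran", "fake-U2-0"}, explicit, with_fakes)
```

Adding one fake record per agent leaves the real overlap unchanged but enlarges each set, so the relative overlap drops, and a leaked fake record points at the single agent that received it. The sketch does not address the collusion problem noted on the slide: agents who compare their sets can spot records that only one of them holds.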
Conclusions 
• Modeled as maximum likelihood problem 
• Data distribution strategies that help identify 
the guilty agents 
Thank You!
