Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Generalization Lattices (Malin, 2005)

Published in: Data & Analytics, Technology

  1. Bioinformatics Literature Review: Protecting DNA Sequence Anonymity with Generalization Lattices (Malin, 2005)
     Literature review by Kato Mivule, COSC891 - Bioinformatics, Spring 2014, Bowie State University, Department of Computer Science.
     Reference: Bradley A. Malin, "Protecting genomic sequence anonymity with generalization lattices," Methods of Information in Medicine, Vol. 44, No. 5 (2005), pp. 687-692.
     Image source: U.S. National Library of Medicine.
  2. Outline
     • The Problem
     • Methodology
     • Conclusion and Future Work
  3. The Problem
     • Transactions involving DNA data pose serious privacy concerns.
     • DNA uniquely identifies an individual.
     • DNA data is prone to re-identification and inference attacks.
  4. The Problem (figure; source: Forbes.com, April 25, 2013)
  5. Methodology
     • Apply k-anonymity.
     • Apply generalization.
     • Use a generalization lattice to measure the distance between two residues at a single nucleotide position. The lattice yields the most specific generalized concept that covers both residues; for example, adenine and guanine are both purines.
     • DNALA applies k-anonymity by guaranteeing that the published DNA sequence of one individual cannot be told apart from the published sequence of another individual.
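The residue-distance idea on this slide can be sketched as follows. This is a minimal illustration, assuming a simplified three-level hierarchy (base, then purine `R` or pyrimidine `Y`, then any base `N`); the actual lattice in Malin (2005) may contain more concepts and may treat gaps differently. The names `PARENT`, `ancestors`, `generalize`, and `distance` are hypothetical and do not come from the paper.

```python
# Hypothetical three-level hierarchy: base -> chemical class -> any base.
# R = purine (A, G), Y = pyrimidine (C, T), N = any nucleotide.
PARENT = {"A": "R", "G": "R", "C": "Y", "T": "Y", "R": "N", "Y": "N"}


def ancestors(code):
    """Return [code, parent, grandparent, ...] up to the root 'N'."""
    chain = [code]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(x, y):
    """Return the most specific code that covers both x and y."""
    y_chain = ancestors(y)
    for code in ancestors(x):
        if code in y_chain:
            return code
    raise ValueError(f"no common generalization for {x!r} and {y!r}")


def distance(x, y):
    """Total number of levels x and y climb to reach their common code."""
    common = generalize(x, y)
    return ancestors(x).index(common) + ancestors(y).index(common)


print(generalize("A", "G"), distance("A", "G"))  # R 2
print(generalize("A", "C"), distance("A", "C"))  # N 4
```

Under this assumed hierarchy, two purines meet one level up and so are closer to each other than a purine and a pyrimidine, which only meet at `N`.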
  6. Methodology: k-anonymity
     • k-anonymity uses both generalization and suppression to enforce confidentiality.
     • Before a data set is published, every combination of values in its quasi-identifier attributes must occur in at least k records, with k > 1.
     • Because it relies on generalization and suppression, k-anonymity can be applied to DNA data privacy.
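The k-anonymity requirement on this slide can be checked mechanically. The sketch below is illustrative only: the function name `is_k_anonymous`, the record layout, and the sample values are assumptions, not part of the talk or of Malin (2005).

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Made-up sample data: birth year and ZIP code are the quasi-identifiers.
records = [
    {"birth": "197*", "zip": "207**", "diagnosis": "flu"},
    {"birth": "197*", "zip": "207**", "diagnosis": "asthma"},
    {"birth": "198*", "zip": "208**", "diagnosis": "flu"},
    {"birth": "198*", "zip": "208**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["birth", "zip"], k=2))  # True
print(is_k_anonymous(records, ["birth", "zip"], k=3))  # False
```

Each quasi-identifier combination appears exactly twice in the sample, so the table satisfies k = 2 but not k = 3.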
  7. Methodology: Generalization
     • Generalization is a data privacy method in which attribute values that could cause identity disclosure are made less informative by replacing them with more general values.
     • An example is replacing the birth years of everyone born between 1970 and 1979 with a single decade value such as 197*.
     • Generalization follows a Domain Generalization Hierarchy (DGH), which defines successive levels of generalization. For example, L1 = 1970-09 generalizes to the month, L2 = 1970 generalizes to the year, and L3 = 197* generalizes to the decade.
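The date hierarchy on this slide maps directly onto string truncation. The sketch below assumes dates are ISO `YYYY-MM-DD` strings and treats the full date as level 0; the function name `generalize_date` is hypothetical.

```python
def generalize_date(date, level):
    """Generalize an ISO 'YYYY-MM-DD' date string to the given DGH level."""
    if level == 0:
        return date            # full date, no generalization
    if level == 1:
        return date[:7]        # month, e.g. '1970-09'
    if level == 2:
        return date[:4]        # year, e.g. '1970'
    if level == 3:
        return date[:3] + "*"  # decade, e.g. '197*'
    raise ValueError(f"unsupported generalization level: {level}")


for level in range(4):
    print(level, generalize_date("1970-09-15", level))
```

Each step up the hierarchy discards information, which is the privacy and utility trade-off that generalization makes.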
  8. Methodology: DNALA (DNA Lattice Anonymization)
     • Employs k-anonymity for data privacy.
     • The technique safeguards privacy by ensuring that the DNA sequence of one individual is precisely the same as the sequence of at least one other individual in the published data set.
     • When an institution publishes DNA sequence data using the DNALA technique, every DNA sequence is guaranteed to be indistinguishable from the sequences of at least k - 1 other individuals.
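The guarantee described on this slide can be illustrated with a rough pairing sketch for k = 2. This is not the DNALA algorithm as published: it assumes the sequences are already aligned and of equal length, that there is an even number of them, a simplified purine/pyrimidine hierarchy, and a plain greedy closest-pair strategy. All names (`PARENT`, `join`, `seq_distance`, `anonymize_pairs`) are hypothetical.

```python
from itertools import combinations

# Hypothetical hierarchy: R = purine (A, G), Y = pyrimidine (C, T), N = any.
PARENT = {"A": "R", "G": "R", "C": "Y", "T": "Y", "R": "N", "Y": "N"}


def ancestors(code):
    """Return [code, parent, grandparent, ...] up to the root 'N'."""
    chain = [code]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def join(x, y):
    """Return (common code, levels climbed by x and y together)."""
    x_chain, y_chain = ancestors(x), ancestors(y)
    for steps, code in enumerate(x_chain):
        if code in y_chain:
            return code, steps + y_chain.index(code)
    raise ValueError(f"no common generalization for {x!r} and {y!r}")


def seq_distance(s, t):
    """Sum of per-position distances between two aligned sequences."""
    return sum(join(a, b)[1] for a, b in zip(s, t))


def anonymize_pairs(sequences):
    """Greedily pair the closest sequences and generalize each pair (k = 2)."""
    if len(sequences) % 2:
        raise ValueError("this sketch expects an even number of sequences")
    # Closest pairs first; ties keep index order because sorted() is stable.
    ranked = sorted(
        combinations(range(len(sequences)), 2),
        key=lambda p: seq_distance(sequences[p[0]], sequences[p[1]]),
    )
    remaining = set(range(len(sequences)))
    published = list(sequences)
    for i, j in ranked:
        if i in remaining and j in remaining:
            merged = "".join(
                join(a, b)[0] for a, b in zip(sequences[i], sequences[j])
            )
            published[i] = published[j] = merged
            remaining -= {i, j}
    return published


print(anonymize_pairs(["ACGT", "ACGA", "GCGT", "TTTT"]))
# ['RCGT', 'NYNN', 'RCGT', 'NYNN']
```

In the output every published sequence appears exactly twice, so no record is unique. The second pair also shows the utility cost: dissimilar sequences collapse almost entirely to `N`.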
  9. Methodology: DNA Domain Generalization Hierarchy (figure; image source: Malin, 2005)
  10. Methodology: DNA Domain Generalization Hierarchy (figure; image source: Malin, 2005)
  11. Methodology: DNA Domain Generalization Hierarchy (figure; image source: Malin, 2005)
  12. Methodology: DNALA Algorithm (figure; image source: Malin, 2005)
  13. Conclusion and Future Work
     • DNA data privacy using k-anonymity is still promising.
     • Data utility remains a challenge as more DNA sequence information gets generalized.
     • How do other techniques, such as noise addition and differential privacy, apply?
     • Could we generate synthetic and/or obfuscated DNA data with traits similar to the original?
  14. References
     1. Bradley A. Malin, "Protecting genomic sequence anonymity with generalization lattices," Methods of Information in Medicine, Vol. 44, No. 5 (2005), pp. 687-692.
     2. K. Mivule and C. Turner, "Applying Data Privacy Techniques on Published Data in Uganda," in International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government (EEE), 2012, pp. 110-115.
     3. Adam Tanner, "Harvard Professor Re-Identifies Anonymous Volunteers In DNA Study," Forbes.com, 4/25/2013. Accessed: 02/10/2014. Available online: http://www.forbes.com/sites/adamtanner/2013/04/25/harvard-professor-re-identifies-anonymous-volunteers-in-dna-study/
