Towards a privacy-preserving environment for genomic data analysis

A privacy-preserving environment for genomic data analysis is feasible. Such an environment will help promote data sharing rather than hinder it, whereas a severe leak may reverse the current trend in public opinion. Understanding the determinants of human genetic individuality is essential to building a privacy-preserving environment.

IMM Computational Biology and Bioinformatics Seminars (CBBS), October 13, 2016

Published in: Science

  1. Towards a privacy-preserving environment for genomic data analysis
      Francisco M. Couto, LaSIGE, Faculdade de Ciências da Universidade de Lisboa
      IMM Computational Biology and Bioinformatics Seminars (CBBS), October 13, 2016
  2. WHY PRIVACY?
  3. [Slide image: a long raw DNA sequence]
      All or nothing: privacy vs. data sharing
  4. [Slide image: a long raw DNA sequence]
      Hybrid
  5. [Slide image: a long raw DNA sequence]
      Are all portions equally important for privacy?
  6. Does it affect others?
      • DNA is transmitted from parents to children
      • One leak may compromise a large group of people (relatives)
  7. How to detect the privacy-sensitive parts?
      • The detection should identify small sensitive elements
        – using databases of known patterns
      • Challenge:
        – constructing a comprehensive knowledge database
  8. What is the impact on data sharing?
      • Sharing certain portions of the data is more attractive than sharing nothing
      • Privacy-sensitive portions may still be shared in a controlled way
        – e.g., using cryptographic methods
  9. Why share?
      • Genomes reach their highest value only when they are shared with others
      • Sharing a single genome may have little impact
        – but sharing many of them may have a huge impact
  10. But why share non-privacy-sensitive sequences?
      • Do they have any scientific value?
        – Researchers want the privacy-sensitive ones
      • But are these sequences really non-privacy-sensitive?
        – Maybe; currently we do not know
        – Some of them will become privacy-sensitive in the future, as new discoveries are made
        – Sharing them may speed up this process
  11. BUT ISN'T THIS DATA DE-IDENTIFIED?
  12. Privacy attacks
      Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic privacy." Nature Reviews Genetics 15.6 (2014): 409-421.
  13. Identity-tracing attacks
      • Goal:
        – uniquely identify the data donor
        – despite de-identification techniques, i.e., the absence of explicit identifiers such as the name and exact address
      • Method:
        – accumulate quasi-identifiers: additional metadata such as basic demographic details, inclusion/exclusion criteria, pedigree structure, and health conditions
        – gradually narrow down the set of possible individuals
  14. A possible route for identity tracing: from 2^28 (≈268M) individuals down to 2^3 = 8 individuals
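The route on slide 14 can be read in bits of identifying information: singling out one person among 2^28 people takes about 28 bits, and every quasi-identifier an attacker learns removes some of them. The sketch below is a minimal illustration that assumes each quasi-identifier is uniformly distributed and independent of the others, which real demographic attributes are not; the function names are hypothetical.

```python
import math


def bits_to_identify(population_size: int) -> float:
    """Bits needed to single out one individual in a population of this size."""
    return math.log2(population_size)


def bits_from_attribute(num_values: int) -> float:
    """Bits revealed by a quasi-identifier with `num_values` equally likely values.

    Assumes a uniform distribution; skewed attributes reveal fewer bits on average.
    """
    return math.log2(num_values)


def remaining_candidates(population_size: int, revealed_bits: float) -> float:
    """Expected number of individuals still consistent with what the attacker knows,
    assuming the revealed bits come from independent attributes."""
    return population_size / 2 ** revealed_bits


population = 2 ** 28  # roughly 268M individuals, as on the slide
print(f"bits needed: {bits_to_identify(population):.1f}")  # bits needed: 28.0
# Narrowing 2^28 candidates down to 2^3 = 8 means learning 25 bits.
print(f"candidates left: {remaining_candidates(population, 25):.0f}")  # candidates left: 8
```

For instance, knowing only the U.S. state of residence is worth at most log2(50) ≈ 5.6 bits, so several quasi-identifiers have to be combined, which is why accumulating them works.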
  15. 1000 Genomes Project attack
      • Queried the Y-STR profiles in YSearch and SMGF
        – recovered the correct surname in 12% of cases
        – with 82% confidence
      • Triangulated identities
        – combined the recovered surnames with age and state
        – using U.S. census data
      • 131 out of the 1,092 participants
        – will never recover their privacy
      Gymrek, Melissa, et al. "Identifying personal genomes by surname inference." Science 339.6117 (2013): 321-324.
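The triangulation step on slide 15, like the gradual narrowing described on slide 13, amounts to applying one constraint after another to a list of candidate records. A minimal sketch follows; the `Person` records are invented toy data standing in for census-like records, and `narrow_down` is a hypothetical helper, not code from the Gymrek et al. study.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Person:
    name: str
    surname: str
    age: int
    state: str


def narrow_down(
    candidates: List[Person],
    constraints: List[Callable[[Person], bool]],
) -> Tuple[List[Person], List[int]]:
    """Apply quasi-identifier constraints one at a time.

    Returns the surviving candidates and the candidate count after each step,
    showing how each extra piece of metadata shrinks the set of possible donors.
    """
    sizes = [len(candidates)]
    for keep in constraints:
        candidates = [p for p in candidates if keep(p)]
        sizes.append(len(candidates))
    return candidates, sizes


# Toy records, invented for illustration only.
records = [
    Person("A", "Smith", 45, "UT"),
    Person("B", "Smith", 45, "CA"),
    Person("C", "Smith", 62, "UT"),
    Person("D", "Jones", 45, "UT"),
]

# Surname inferred from Y-STRs, then age and state from the study metadata.
matches, sizes = narrow_down(
    records,
    [
        lambda p: p.surname == "Smith",
        lambda p: p.age == 45,
        lambda p: p.state == "UT",
    ],
)
print([p.name for p in matches], sizes)  # ['A'] [4, 3, 2, 1]
```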
  16. PRIVACY-PRESERVING ENVIRONMENT
  17. Main players
      • Sample donors: donate biological material
      • Sample managers: receive, manipulate, sequence, and store biological material, and provide it together with the results
      • Researchers: consumers of data
      • Auditors: verify who accessed specific datasets
  18. Donors
      • State their preferences on data sharing
        – free to customize them
      • Blanket consent
        – participate in projects related to specific topics
      • Opt-in or opt-out
        – of specific projects they sympathize with (or not)
      • May delegate these decisions to Sample Managers
      • If the donor dies
        – relatives gain the ability to customize the preferences explicitly
  19. Researchers
      • Register themselves in the system
      • Propose a project
      • If the Sample Manager approves the project
        – they can use the authorized privacy-sensitive portions
      • The value of the sequenced data is kept intact for authorized researchers
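The interplay between donor preferences (slide 18), Sample Manager approval (slide 19) and auditing (slide 17) can be sketched as a single access check that is always logged. This is a hypothetical model written for illustration: the class names, the rule that an explicit opt-out overrides blanket consent, and the log format are assumptions, not part of the presented system. Delegation to Sample Managers and relatives' rights after a donor's death are left out to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DonorPreferences:
    blanket_topics: set[str] = field(default_factory=set)  # blanket consent per topic
    opt_in: set[str] = field(default_factory=set)          # projects explicitly accepted
    opt_out: set[str] = field(default_factory=set)         # projects explicitly refused

    def allows(self, project_id: str, topic: str) -> bool:
        # Assumption: an explicit opt-out overrides any blanket consent.
        if project_id in self.opt_out:
            return False
        return project_id in self.opt_in or topic in self.blanket_topics


@dataclass
class Project:
    project_id: str
    researcher: str
    topic: str
    approved: bool = False  # set by the Sample Manager


@dataclass
class AuditLog:
    entries: list[tuple[datetime, str, str, str, bool]] = field(default_factory=list)

    def record(self, researcher: str, project_id: str, donor_id: str, granted: bool) -> None:
        self.entries.append((datetime.now(timezone.utc), researcher, project_id, donor_id, granted))


def request_sensitive_data(project: Project, donor_id: str,
                           prefs: DonorPreferences, log: AuditLog) -> bool:
    """Grant access to a donor's privacy-sensitive portions only if the project is
    approved and the donor's preferences allow it; every request is logged for auditors."""
    granted = project.approved and prefs.allows(project.project_id, project.topic)
    log.record(project.researcher, project.project_id, donor_id, granted)
    return granted


prefs = DonorPreferences(blanket_topics={"cancer"}, opt_out={"p2"})
log = AuditLog()
print(request_sensitive_data(Project("p1", "alice", "cancer", approved=True), "donor-7", prefs, log))  # True
print(request_sensitive_data(Project("p2", "bob", "cancer", approved=True), "donor-7", prefs, log))    # False (opt-out)
print(len(log.entries))  # 2
```

Logging denied requests as well as granted ones gives auditors a complete picture of who tried to access which donor's data.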
  20. Privacy-preserving environment for genomic data analysis
  21. KNOWLEDGE DATABASE
  22. Successful attacks using public data
      • Genomic variations
        – Re-identification (a few hundred SNPs are enough): Lin [23]
        – Acquiring knowledge about targets from GWAS results: Wang [34]
        – Acquiring knowledge about targets from microarray results: Homer [18]
        – Inferring masked genes (e.g., the APOE gene in Dr. Watson's genome [35]): Nyholt [28]
      • Short tandem repeats
        – Using STR profiles to identify donors of the 1000 Genomes Project: Gymrek [16]
        – Forensic identification: Butler [6]
      • Disease-related genes
        – Direct-to-consumer genomic testing: Goldsmith [13]
        – Masking the APOE gene (associated with Alzheimer's disease) in Dr. Watson's genome: Wheeler [35]
      Cogo, Vinicius V., et al. "A high-throughput method to detect privacy-sensitive human genomic data." Proceedings of the 14th ACM Workshop on Privacy in the Electronic Society. ACM, 2015.
  23. Short tandem repeats (STR)
      • Small strings repeated several times
      • Individual profile:
          STR       n
          DYS392    4
          DYS396    23
          …         …
          DYS618    17
      • Example: DYS392 = [TAT]n; in cgac TAT TAT TAT TAT cgca, n = 4
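Building an STR profile like the one on slide 23 means counting how many times a short motif repeats back to back. The sketch below counts the longest uninterrupted run of a motif in every reading phase; it is a simplified illustration that ignores sequencing errors, partial repeats and the flanking-sequence anchoring that real STR genotyping tools rely on.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return n, the largest number of consecutive copies of `motif` in `sequence`.

    Every reading phase (offset 0 .. len(motif) - 1) is scanned, so a run is found
    wherever it starts. Matching is case-insensitive.
    """
    seq, unit = sequence.upper(), motif.upper()
    k = len(unit)
    best = 0
    for phase in range(k):
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Example from the slide: DYS392 = [TAT]n with four copies between the flanks.
print(longest_tandem_run("cgacTATTATTATTATcgca", "TAT"))  # 4
```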
  24. Disease-related genes
  25. Genomic variations
  26. Retrieving the sequences
                           STRs    Genes       Variations             Total
      Database             TRDB    GeneCards   1000 Genomes Project   -
      Number of entries    240k    20k         38M                    38.3M
      DB sequences         22M     8.7M        1147M                  1178M
      DB size              660MB   87MB        34.4GB                 35.1GB
      Note: Any other database can be used with our solution.
  27. Efficient query system
      • Bloom filter
        – Efficient data structure (in space and performance)
        – Tests whether an element is a member of a set: does a specific value belong to the set?
        – "No" means no (no false negatives)
        – "Yes" means maybe (configurable false-positive rate)
        – False positives affect efficiency only, not efficacy: a false positive merely causes a non-sensitive read to be treated as sensitive
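A minimal Bloom filter showing the properties listed above follows. It is a generic textbook construction, not the detector's own implementation: it derives its k bit positions by double hashing a SHA-256 digest. The `flag_read` helper is hypothetical and assumes that the filter stores fixed-length substrings (k-mers) of known sensitive sequences and that a read is flagged when any of its k-mers hits the filter.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set-membership test with no false negatives and tunable false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: two 64-bit values from one SHA-256 digest yield k indices.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keeps the step non-zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # "No" is definite; "yes" may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def flag_read(read: str, bf: BloomFilter, kmer_len: int) -> bool:
    """Hypothetical detector step: flag a read if any of its k-mers may be sensitive.

    A false positive only causes a non-sensitive read to be treated as sensitive.
    """
    read = read.upper()
    return any(read[i:i + kmer_len] in bf for i in range(len(read) - kmer_len + 1))


sensitive = BloomFilter(num_bits=10_000, num_hashes=7)
for kmer in ("TATTATTAT", "GATAGATAG"):
    sensitive.add(kmer)

print("TATTATTAT" in sensitive)                         # True: no false negatives
print(flag_read("cgacTATTATTATTATcgca", sensitive, 9))  # True
```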
  28. Open source code: https://github.com/vvcogo/dna-privacy-detector
  29. EVALUATION
  30. How much of a human genome is considered privacy-sensitive?
  31. Percentage of privacy-sensitive sequences
      [Chart: percentage of sensitive reads for each set of sensitive sequences]
      • Y-STR: 0.16%
      • All-Gene: 0.33%
      • All-STR: 1.2%
      • All-SNP: 10.6%
      • All-Together: 11.3% sensitive, 88.7% non-sensitive
  32. How big is the Bloom filter?
  33. [Chart: Bloom filter size (maximum 6GB)]
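How big the filter must be follows from the standard Bloom filter sizing formulas: for n stored items and a target false-positive rate p, the optimal size is m = -n ln p / (ln 2)^2 bits with k = (m/n) ln 2 hash functions, and the resulting rate is approximately (1 - e^(-kn/m))^k. The helpers below compute these values; any target rate you plug in is your own assumption, since the slides do not state the rate the detector was configured with.

```python
import math
from typing import Tuple


def bloom_parameters(n_items: int, fp_rate: float) -> Tuple[int, int]:
    """Return (bits, hash functions) for n_items at the target false-positive rate."""
    m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


def false_positive_rate(m_bits: int, n_items: int, k: int) -> float:
    """Standard approximation of the false-positive probability."""
    return (1 - math.exp(-k * n_items / m_bits)) ** k


def size_in_gib(m_bits: int) -> float:
    """Convert a size in bits to GiB (2^30 bytes)."""
    return m_bits / 8 / 2 ** 30


m, k = bloom_parameters(1000, 0.01)
print(m, k)                                       # 9586 7
print(round(false_positive_rate(m, 1000, k), 4))  # 0.01
```

At p = 0.01 this is about 9.6 bits per stored item; multiplying by the number of sequences in the knowledge database gives an order of magnitude to compare with the 6GB maximum on the slide.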
  34. Is the detector a bottleneck?
  35. Throughput: single core
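  36. [Chart: single-core detector throughput compared with NGS machines]
      • NGS machines: 300,000 bp/s (0.3M bp/s)
      • Detector: 60M bp/s (200x the NGS rate) and 13M bp/s (44x the NGS rate)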
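  37. Throughput: multi-core
  38. [Chart: multi-core detector throughput compared with NGS machines]
      • NGS machines: 300,000 bp/s (0.3M bp/s)
      • Detector: 480M bp/s (1600x the NGS rate) and 66M bp/s (200x the NGS rate)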
  39. What if a novel sequence is discovered?
  40. Completeness of the method
      • Novel STRs:
        – in 11 years (2003-2014), TRDB registered 1k novel STRs (0.42% growth)
        – useless for attackers until they are present in STR databases
      • Novel genes:
        – do not alone determine whether a person contracts a disease
        – may have no relation to any disease
        – novel discoveries correlate diseases with known genes (a limited number)
      • Novel genomic variations:
        – no single variation alone determines a person's identity or whether they contract a disease
        – covered by increasing population samples in allele-frequency studies
  41. OPEN PROBLEMS
  42. Different levels of privacy
      • Rare STRs and genomic modifications
        – the rarer they are, the higher the likelihood of re-identification
      • How to build a discrete filtering of sensitive reads
        – with multiple severity levels
  43. Human genetic individuality
      • How to define an individuality measure as a function that,
        – given a genome and a population,
        – returns a numerical value reflecting
        – their diversity in terms of human genetics?
      • If an identical genome is found in the population
        – individuality = 0
      • If none of the population's privacy-sensitive sequences is found in the given genome
        – individuality = 1
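One possible way to satisfy the two boundary conditions on slide 43 is sketched below. It is only a candidate, not a measure proposed in the talk: it assumes each genome has already been reduced to the set of privacy-sensitive sequences it carries (for example, by the detector described earlier), so "identical genome" is approximated by "identical set", and it scores individuality as one minus the highest Jaccard similarity to any member of the population.

```python
from typing import Iterable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def individuality(genome: Set[str], population: Iterable[Set[str]]) -> float:
    """Candidate measure in [0, 1].

    0 when some member carries exactly the same privacy-sensitive sequences;
    1 when the genome shares none of them with any member (except the degenerate
    case where the genome and some member both carry no sensitive sequence).
    """
    best = max((jaccard(genome, other) for other in population), default=0.0)
    return 1.0 - best


print(individuality({"s1", "s2"}, [{"s1", "s2"}, {"s3"}]))  # 0.0
print(individuality({"s1", "s2"}, [{"s3"}, {"s4", "s5"}]))  # 1.0
```

Weighting rare sequences more heavily would be one way to reflect slide 42's point about rare STRs; the unweighted version is kept here for clarity.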
  44. What are the determinants of human genetic individuality?
      • The complexity of human behaviour is enormous
      • So the determinants of human genetic individuality
        – may be hard to predict from single genomic properties
      • Follow a systems biology approach to reach a deeper understanding of the complexity of the human genome by analysing what defines us as individuals
  45. PhD scholarship available
      • PhD scholarship for the project "What are the determinants of human genetic individuality?"
      • Under the supervision of Prof. Francisco Couto (LaSIGE) and Prof. Margarida Gama-Carvalho (BioISI)
      • Applications until October 21st (12PM, CET)
        – How to apply: http://biosys.campus.ciencias.ulisboa.pt/node/6
  46. Sharing data?
      • “Adherence to data-sharing policies is as inconsistent as the policies themselves”: “[Of] 351 papers covered by some data-sharing policy, only 143 fully adhered to that policy” (~40%)
        – Corbyn, Zoë. "Researchers Failing to Make Raw Data Public." (2011). http://www.nature.com/news/2011/110914/full/news2011.536.html [accessed 2012-03-30]
      • “More often than scientists would like to admit, they cannot even recover the data associated with their own published works”
        – Goodman, Alyssa, et al. "Ten simple rules for the care and feeding of scientific data." PLoS Comput Biol 10.4 (2014): e1003542.
  47. Reproducibility
      • One of the main principles of the scientific method
      • But without access to data
        – it is impossible (or very hard) to replicate results
  48. Incentivize rather than enforce
      • “To encourage data sharing, systematic reward and recognition mechanisms are necessary.”
        – Principles of data management and sharing at European Research Infrastructures
      Couto, Francisco M. "Rating, recognizing and rewarding metadata integration and sharing on the semantic web." Proceedings of the 10th International Conference on Uncertainty Reasoning for the Semantic Web, Volume 1259. CEUR-WS.org, 2014.
  49. Final remarks
      • A privacy-preserving environment for genomic data analysis is feasible
      • A privacy-preserving environment will help promote data sharing
        – not the opposite
        – a severe leak may reverse the public opinion trend
      • Determinants of human genetic individuality
        – an essential study for a privacy-preserving environment
  50. Acknowledgments
      • SnT: Paulo Veríssimo, Maria Fernandes, Jérémie Decouchant
      • BioISI: Margarida Gama-Carvalho
      • LaSIGE: Vinícius Cogo, Alysson Bessani
