Towards a privacy-preserving
environment for genomic data analysis
Francisco M. Couto
LaSIGE, Faculdade de Ciências da Universidade de Lisboa
October 13, 2016
IMM Computational Biology and Bioinformatics Seminars (CBBS)
WHY PRIVACY?
CAGTAATATACTTTCAACCTTTCGAAAGACTAAGCCAATATCGATTGTCACCAGCAGAAGCCGGCGCA
ACACCATAGTGCTGAGCGTATCATCGGATGCATTTTTAGCACTGACGCTTGGGAATATTCTCCCCAAGA
TTGCGTCTCGGGCTACAGACCCACAGTGTTAAAGCATACTCAATAGGCGAGCCTCTGTGAAGTGCTGG
TGCAGCGAGAAAGAGCAGAGTTAGGAGCACTCAAGGGGACATGTTTCAACGCTTGGACTCTACCGTT
ATGTGGGACGGACGGCGGGTTAAAAGGTGTAGTGATCCCGCACGGACCATGGTTCCCTTTGAAACTT
ACCTCTTTCGTGGCGAGTTGCGCTCTTCTCATCCAGCAAGACTGCTTTAACTGCCACCCCCATCGCCTCT
TGCAGGAAGATGTGACCTTTGATTTCGCGCGCCCGAATGAAGTGTATCTCGATCACCCAAAGGCACAT
ACACCTTGGAGCCCTTGTGTAGGACCTACTCCTCGCTGCATCGCCTCTTCGGAAAAAACACCATATGGC
GTTAGAGTTTGATTTGAGGCATATTGGCTCCTTGTGGGCGTCATCTGGGGTTAAGTTTCAATTCTGATG
GTTGAGAGATGATGACCTAGGACTAGGGCACTACTAGTATCGGCAGGTAGGGGGGGAAGCACACGG
AAACGCGCGATCCTTAAATGAGCAGTTGATCCAGAGAGTTCACCTCTTCAGTAGTTATCCCGTTGGTG
GCCGTGCCGTTAAGACCTGATCATCCTGTTAGGGTTCCTGAACGCGTGGATCTGCAGCCAGTTCCGATC
CCATCGCACGTGGGTTGGTCAACTACAGAACCCATAAGGGTACAGGCCAGGCTGTAACCCGAACATC
AAAAATTCGTGGTGCTTGTGCTGAGATGCTTATTCGAAGCTTAGGGAACTGTTCTTGTAAAACCGAGA
TCACCCCTTTATGACACGACCACTAGCTGCTGATGCGCGACACAATATGAGCTCGAAGCCGATACGCT
CGTCCTATCATTGGTCTAAGTTCATTTGCTTGTTGGGCGATGTATTCGGTAGCGAGAGGCAATAACAA
CTTAGCGGGTAGAGTCTGGTGCTACACGAAACATAAGGTGGATTACAGCTAGATCCATAATGGCGGT
GCACACGTGATGAACTGGCGATTAAAGCTTACCATCAATTGTATACTACCTGCATAGCCGTCTCAGTCC
CTCTCGACTAACTAGTATGGTCGCTAGTGGGTTCTCCTCCCGAATACACCTAAGCGATGAAAACCATCA
ACATGGTTGCTGAGCGACAGACAATGTACGGGACCCCACTCGCTTGCGTTGATATGTGCGTCGTCGTA
GTAAAGTTAGGTTTTCATCCTTCTGTTGCATCCAGGATAACAGTCAGACTTCTTGATGGCCTCGCATGA
CGCGAACGTTGTAAAGGGACTACTACTCCAGAATAGTCCCCCCTGTCACTAGGAAGACATACAAGGAT
TGGACTTCGGGGGTGCGGGTCAAGGATACTAGGTGGATTGGGTGGCGTCACCAATAGCACGAGCGA
ACCAACAACTGCTGCTGGCCGCTGATGAAGAGGCTCCTTAATCTGCACACTGGTAAAATGCAACTTAT
AAGGTGGATTAGAAACGTGCTCGCGTATAGTTAGCATAACAGTATGACTTCGGACACGCGGTCTGTA
CTAGTCAATCAACGACCCAGCACGCGCTAGTATAGCCGATCCCACAAGCGGGTTCAAGCTTAGCTCCG
AGTGGTCCACGAAATGTAGTATGTACTAGCGACTGTCCAATGACTGGGCGTCGAACGATCCTAATTAG
TACGTATACTCTGTTGGATTACCAACTATCGGTGTGGTTAATTCTAAGAATGTACAAGCTAACCCAAAG
TGAACTGCAAGATGCGGGGATTAAATTACTTATGAGCCCAGGGTGTGGAAGCAGAGCTCCCAACGTG
TCAGACATACTGAATTTTCCGCCGCGAGACCTGGAGGAGTATGGGAGGTAACGCTAGGCTGCTTGAT
AAACATGCGCGGCCAGTCACCTAGAAGGGTATCAAGTGGGATGTCAGCCAAGCAAACCACACCAGGA
TATAAATCGCCAAAGGCAACAAACAAAGGTATCCACTACCAGGAGAGGCACAACTAGTGACTATGAA
AGGTCCCTGGATTGAGCAGTTGAGATAGACGACCCCTATGCGTCAGCTGAGTAGGCTTGGCATGCGC
CGGCGCGGGTTTGTTTCAGCACTTCTACCCTTTTCGTAATGGACAGAGGTTCAAAAAGTAACTGGCTG
AAGGACTCCGCGGATCCCTTTTTATAGGGGCGCAAAAGGTGCAACGACTGTAAACAACTCTGAGAAT
GAACCTTTAGGCTAGGTTCTTGACGACACCCGTGGAGATATGTCATACTAAGATATCATGGCCTTACA
TAGCTAAGGGGACACGCATTAGTGACACATAGATGAGTGCTTACGCGTTTTCAATCTCAGGCTAGGCA
GATCCTGTAGTCCTTCTGTCACGAGTCATCTCGCAGACCTGTACTACCCGAGAGGACTTTTCCGATCGG
CTAGTCGGAGGCCTTCTTTTCACACAAAAGGTGCGTTAATCTCTCTGAGAAGGGTAACAGCGATTTAC
CCACGAGCTAGTCGTTCAAGGAAGATGTATTCTCATGAAGTACGAATCAGCAAACTTAGCGCACATCC
AGTTCAGGTGTGAGGATACAATTTCCTGCGTGGGCGACGTATTATATCCCTATAGGAAGTGCAAGTGC
TGACAAATAGCAATGACGGTCAGTTGACTATCCGCCTAGGACCGACGATGTCAAGATGGTCGACACG
ATATGGCTTCTACGTAGGTAACAGGGAGGAGCAAACCGTCAATCGGCTTGTAAACTAAGGCCTTTGC
ATGGCTAACCTGCATACCAGTTCTACCTTTTACGTCCGACGAGACCCTACGTGGGCTATGTTTGTGCTC
TAGAGTGCACCTACACGGTCCTTAGCCCTTCTACCCCCGTGTAAATTCTTGCGGCGGAGATGGCCCGGA
AGCGTCACTAGATCGGCCGAATCTGGCGGGCGGACCCCGTATTGAGAGGCGTCTTTGCGCTAGAATCC
GGCACGCCAGTGCGTTAAACTCGCTGATTACCGAGTTATCAGCAGGGGCATCTGTATAAACTCCTTCG
CAGCCTCGTGAGCATACCACCGATTCGGTTGAGTATGTAAGAAGCATTTTCTTACTGATATGCGCTAGC
CTTTATCGGTTCTGTTCATACGAGGTTATGCGTTCTTGATAGGTGACGTGTTCTAACGTGTACGCTTAA
CGCTTGACAGTCTCTGTCAGCTACGAAAACGACGTTTCTTTCAATTAGTTGTGGGGACTAGTCGTGGAC
TTTGGTGGCACGTTTCATCGGGGAACACTTGTCTTCTACTTGGGCTGTTCGGAAAAGCGCCTTGTGCTA
GCCACTAAATCCTGACGACCGGTATTTCGCATAACACCGAAGGATTGGCAACGAGTTATTAGAAATAG
TATAAAAAGCCTCGCGTACTTGGTTAGTGTAGAGTGTCCTCATTATGGGTTGCGGGACTCTGCCCAAG
AAGGTGTATTGTTGCACTACCATCATTGGAGCCGCTCGCCCACGACGGGATTTCGAGAGCTAGGTGAA
TTCGCTAGCCTCGCTGCGTATAAACGGAACTTAGAGGCGTATTAGCGATGACAGTCTTAAGACAGGCT
TCTCAAATAATCTAAGCACTATACCTATGTATACCAGAATTGGCGAATAAGGAATATTAGACGTGGGA
TCCCCCCGTCCGTGGGACCAAGTAATAGTCAACGCGGGTTTGTCTCCACAAAAACGGCACCAAACTCT
TGCTAAGGTCGGTCGTCTGCGGATCTCGCTGTTTGGTCGCGGGTCCTAGGGCGAAGGGATAGCCATA
GGCAAATGAGCGGCATCATCCACTAGCTCGACACACGCGCGTTAGACCCAACGCCACTTTTCCGATCA
GAGACAACCAAGGTGGTGTATATGCACCTCTCCGCATAACTCAATCCAGAGCCGGCCTGGATGTTCTT
GCTTGTGAGGCATTGAGCGTAGTTGCCGTGACCAAATGCTTTACCAAATCAGAACAAGATTCCCGGCA
GCGTTCGGGCACGTTTTGCACATATCATCTTTCGGCCGACTGAGATGGTAGCCGGCGTGAACCCGAAA
GAGTGTTAGATCGGTGATTCTAAGCGCCCCAATGTGTAAGGTAAACAGTAGCAACAAACCGATACGT
CCATAGCGGGCCCATTCATCAAAAAGCGGTGCCATTTCAACTAAAGACTTCGCAATAAGAACCAGACC
AAGAGGAACTATAGGCCAAGACCCGCCCCCTTGGCGGAAGGCGATGGGGAGCCCAGGGGAACATTG
CGCGCCCGCTCTTATGCGGAGCATAACACACTTGTCCCGCTTGGCGCCTGCGGAAGGTTTCCCCGAAAC
TCTGTGGCGGTGCATTTCGAGAAGAACAGAGATCACTTACAGGATATAGTGCGAGTACGCGGGTAGT
CAGGTTTAGTTAGACTTTTAGTGACGCTTACAAATCCCAGTATCGTTTAATAGGCCAGAGTCCAGTAGT
AGTGCCATACGTGGTAACAGAGCTTACCCATAAACGGCGGTATGTGGTTACGCTAGAGAAGTATAAG
GTGGGAAGATGCTACGTACTGATACACGGATCTGATACCTTAAGCAGTTTAGGTCTAAGCACGCCAAC
TGGGTTTGGTTTTTTAGTTGGGTTAAGCCTCACCTCTGAGCACATGTTGGATGATCCGTCATGGCAGGC
GACAATCGCGTTCGTACCGCCCGGCCACGCCGAGGACGACAGCGCGCCGGCCGCGTGTTGTTACGAA
CTTTATTTGCAACCATCACGCACTTAGAAAAGCTTTTTGACTTGGGCCTGTGCGGCACATTGAATATTA
GTTACAACTCAAAATTCCCTGCGTTGACACTCCAGCAAGTACAAGCCGAGGGCCACGTCCGGCCCGCC
CTGCAGGCATAGACACATATTTCTGCTGGCCGTCGACTGCCTCCGCGATCCGATAGGTTCAGATCCTCT
CGCCCACGCGTGAAGTTGCCCGCGTAATCAAACGACAGCGAATTACTTACACGACCAAGGTTGCGAG
GATTATAATAGGCAGAGCATCCAATCTCTCCTGTCCTCATGCGACCAGGCATTAACTCGGTCCGTGCTT
CACTTCAAGTCGTGATCCGTTTAAGTCGGCATGCACTCTAACCAATCGTGGAGCAACAAGACTTGTACC
GTTGACCCAGAAGCCTTGTCGAATCTGGTGTATCTGAGGATTCTTGTGGATTATCTTTGAAACACCGCG
AAATCTTAAACCAAGGTCAATGTTGAGGTATGCTGAATGGACTGCATAAGCATAGGGCATGCCCGGT
CGGTACCCTAACCACTAAAACACAAGTCAGACTCAGCCTAGATCGCTGCCCGGCCCAGGACTTGTCAG
CGCTCACTATACGTAGCATTTAACGAAGCGTCAAGCCCTTACCTAACAGGTGGCGTCCCGGTTAAGTG
GCAAAAGTAAAAACGCGGAAAATAGTACAAACTGGGTAACTATGTCAAGAGTATTCGGGTGACATTT
GTCATTACATCCCTTTGGCTCCAGGCATCAGTGCGCCCCCGCGCCCCCTACAAAGAGCAGACACGTTTT
AGTGAATGACAAATGACCAGCGCTCAGCGCATCTGGGTAGGATCTCTATGCTGCTGTCCCAGCGTTCA
CCTTCCTACCACTCACGTATATGGCCCGGTATGTAGAGGAGTGTTGTTCGGTGAGTTGTGCGCGCATA
ACTCGGTAACTTTTATATTCATTTGGTAGCTGGCTTCCCCAATATCAACATTTGCGTTAGGCTTGGTCTC
CTCAGGGTGCGCAACGGGTACACTACGTGGAACGTTACTTCCAACACACAGTTAATAGTCTTCTCAGT
ACGTCTTGCTCATTCATTCGCTGGAAAACTAATGGCTACATTATTCCAAGTCGCTATGAGACCCCGCCG
CTCTGTGCTACGTTGGTCATGCTGAGTCAAGAAATTTTCGCGATAGCTTTATAAATTCTCCTGGCGACT
TAGCCAGAGTAAAAGCCGGCATTCTACACTTGTAGGTAAACCCAACGACAGATTCGTTCGCCAGGTGC
CTCCTACGTTCGCTTTATTGATTTCGTGCGAACCCCGTGAATCGATTTACATGCGGTTGTCTGTAGACCG
CAGCCAAAACCGCAAGTAAGCGTCGTGCCCACTGGGAATGCTTTTCCAGTCCGTCGTGTGGATCGAAT
AGGGCATGTAGTCCTATATAGCAACCTCCTTTACGTTAGCGACGGTAGCCCGAAAACTCCGTAGGTAC
GCCGGCCGTCGGGCACAAATAAGGAAAGTACATGGTGGTTGCTTGCTTCCAGATGGTCTGATTGGCG
CGTAACAGGTGGAGCCCATCCGAAGATTGATAGTCCTCAAGCGTGTAAGGGCTCCGTGTGTGATGCTT
TAGAGTAATCTTCTTACCTATTCGTAAACCGGCCAAAGGCTCACTGCAGCAGTGACAATCATTGGTAA
AGCCGGGTCTACTGCCCTCACAATGGTTTCAAATCATATATCCAGATCCCACATTGGGACGTCCGGGC
GCGGTATCGCGAGCTTGGTGGCTCTCGCGTGTAGACTGCTATCGGGGGAAGTTCGTGAAAGCTCAAC
TTCTGGAGGGAACGCCCGTTTATCTACACCGAGATACACTTAGATTAGGCAAGACAGAAAACCACAG
AAGCACGGTCTCCCCTTATGGGCAGCAACCGTTATGCGCTCTGTACCCTCTATGCCGTTGCCAGCTACG
GTGACTCTGTCTAACCAGCTGAAGTGGATAACCTTGTGGAGCATTAGTGTCACATGGTTGCACCTCTCT
GAGTCAGGATTGACAGATCCAATTTACCCGTTCTTTATGGGCAGCAACCGTTATGCGCTCTGTACCCTC
TATGCCGTTGCCAGCTACGGTGACTCTGTCTAACCAGCTGAAGTGGATAACCTTGTGGAGCATTATGC
CGTTGCCAGCTACGGTGACTCTGTCTAACCAGCTGAAGTGGATAACCTTGTGGAGCATTATGCCGTTG
CCAGCTACGGTGACTCTGTCTAACCAGCTGAAGTGGATAACCTTGTGGAGCATGCATGCATGCATTGG
all or nothing
privacy data sharing
[Slide graphic: the genome sequence shown above, repeated]
hybrid
[Slide graphic: the genome sequence shown above, repeated]
Are all portions equally important for
privacy?
Does it affect others?
• DNA is transmitted from parents to children
• One leak may compromise a large group of
people (relatives)
How to detect the privacy-sensitive parts?
• detection should identify small sensitive elements
– using databases of known patterns
• Challenge:
– constructing a comprehensive knowledge database
What is the impact on sharing data?
• sharing certain portions of data is more
attractive than sharing nothing
• privacy-sensitive portions may still be shared in a controlled way
– e.g., using cryptographic methods
Why share?
• Genomes reach their highest value only when they are shared with others
• sharing a single individual genome may have little impact
– but sharing many of them may have a huge impact
But why share non-privacy-sensitive sequences?
• Is there any scientific value there?
– Researchers want the privacy-sensitive ones
• But are they really non-privacy-sensitive sequences?
– Maybe; currently we do not know
– Some of them will become privacy-sensitive in the future
• According to new discoveries
– Sharing them may speed up this process
BUT ISN'T THIS DATA
DE-IDENTIFIED?
Privacy attacks
Erlich, Yaniv, and Arvind Narayanan. "Routes for breaching and protecting genetic
privacy." Nature Reviews Genetics 15.6 (2014): 409-421.
Identity Tracing attacks
• Goal:
– uniquely identify the data donor
– despite data de-identification techniques
• i.e., the absence of explicit identifiers such as the name and exact address
• Method
– accumulate quasi-identifiers
• additional metadata, such as basic demographic details,
inclusion/exclusion criteria, pedigree structure, and health
conditions
– gradually narrow down the possible individuals
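The narrowing-down step can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Person` record, the field names and the toy population are assumptions made for illustration, not data or code from the cited review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record combining quasi-identifiers from public sources.
    name: str
    surname: str
    birth_year: int
    state: str


def narrow_down(candidates, **quasi_identifiers):
    """Keep only the candidates that match every known quasi-identifier."""
    return [
        person
        for person in candidates
        if all(getattr(person, key) == value for key, value in quasi_identifiers.items())
    ]


# Toy population (made up); a real attack would draw on public records.
population = [
    Person("A", "Silva", 1970, "UT"),
    Person("B", "Silva", 1985, "UT"),
    Person("C", "Silva", 1970, "CA"),
    Person("D", "Costa", 1970, "UT"),
]

step1 = narrow_down(population, surname="Silva")   # surname inferred from the genome
step2 = narrow_down(step1, birth_year=1970)        # age from the study metadata
step3 = narrow_down(step2, state="UT")             # location from the study metadata
print(len(population), len(step1), len(step2), len(step3))
```

Each quasi-identifier is harmless on its own; it is the accumulation that shrinks the candidate set.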
A possible route for identity tracing
2^28 ≈ 268M individuals
2^3 = 8 individuals
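Reading the two figures on this slide as powers of two (2^28 ≈ 268M individuals at the start, 2^3 = 8 individuals at the end), the route can be expressed in bits of information. The sketch below is only that arithmetic; the function names are illustrative assumptions.

```python
import math


def bits_to_single_out(population_size: int) -> float:
    """Bits of information needed to single out one person in a population."""
    return math.log2(population_size)


def remaining_candidates(population_size: float, bits_learned: float) -> float:
    """Expected number of candidates left after learning some bits."""
    return population_size / 2 ** bits_learned


start_bits = bits_to_single_out(268_000_000)  # roughly 28 bits
end_bits = bits_to_single_out(8)              # 3 bits
bits_learned = start_bits - end_bits          # information gained from quasi-identifiers
print(f"{start_bits:.1f} bits -> {end_bits:.1f} bits ({bits_learned:.1f} bits learned)")
```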
1000 Genomes project attack
• Queried the Y-STR profiles in
– YSearch and SMGF
– correct surname in 12% of cases
• with 82% confidence
• Triangulating identities
– combined the obtained surnames with age and state
– U.S. census
• 131 out of the 1,092 participants
– will never recover their privacy
Gymrek, Melissa, et al. "Identifying personal genomes by surname inference." Science 339.6117
(2013): 321-324.
PRIVACY-PRESERVING
ENVIRONMENT
Main players
• Sample donors
– donate biological material
• Sample managers
– receive, manipulate, sequence, store, and provide
biological material and the results
• Researchers
– Consumers of data
• Auditors
– verify who accessed specific datasets
Donors
• State their preferences on data sharing
– free to customize
• Blanket consent
– participate in projects related to specific topics
• Opt-in or opt-out
– specific projects they sympathize with (or not)
• May delegate to Sample Managers
• If the donor dies
– relatives gain the ability to explicitly customize these preferences
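One way to picture these donor preferences is as a small data model with an access check. This is a hedged sketch, not the design of the actual environment: the class names, fields and the precedence order (opt-out, then opt-in, then blanket consent, then delegation) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DonorPreferences:
    # All field names are hypothetical.
    blanket_topics: set = field(default_factory=set)  # topics covered by blanket consent
    opted_in: set = field(default_factory=set)        # project ids explicitly allowed
    opted_out: set = field(default_factory=set)       # project ids explicitly refused
    delegate_to_manager: bool = False                 # let the Sample Manager decide


@dataclass
class Project:
    project_id: str
    topic: str
    approved_by_manager: bool = False


def may_use(prefs: DonorPreferences, project: Project) -> bool:
    """Assumed precedence: opt-out > opt-in > blanket consent > delegation."""
    if project.project_id in prefs.opted_out:
        return False
    if project.project_id in prefs.opted_in:
        return True
    if project.topic in prefs.blanket_topics:
        return True
    return prefs.delegate_to_manager and project.approved_by_manager


prefs = DonorPreferences(blanket_topics={"cancer"}, opted_out={"P2"})
print(may_use(prefs, Project("P1", "cancer")))  # covered by blanket consent
print(may_use(prefs, Project("P2", "cancer")))  # explicit opt-out takes priority
```

Giving an explicit opt-out the highest priority is a conservative choice; a real system would also need to record who changed the preferences and when.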
Researchers
• register themselves in the system
• propose a project
• If approved by the Sample Manager
– Can use authorized privacy-sensitive portions
• The value of sequenced data is kept intact for authorized researchers
Privacy-preserving environment for
genomic data analysis
KNOWLEDGE DATABASE
Successful attacks using public data

Genomic variations
Attacks | References
Re-identification (a few hundred SNPs are enough) | Lin [23]
Acquire knowledge about targets from GWAS results | Wang [34]
Acquire knowledge about targets from microarray results | Homer [18]
Infer masked genes (e.g., the APOE gene from Dr. Watson [35]) | Nyholt [28]

Short tandem repeats
Attacks | References
Use STR profiles to identify donors of the 1000 Genomes Project | Gymrek [16]
Forensic identification | Butler [6]

Disease-related genes
Attacks | References
Direct-to-consumer genomic testing | Goldsmith [13]
Masking the APOE gene (related to Alzheimer's disease) from Dr. Watson’s genome | Wheeler [35]

Cogo, Vinicius V., et al. "A high-throughput method to detect privacy-sensitive human genomic data." Proceedings of the 14th ACM Workshop on Privacy in the Electronic Society. ACM, 2015.
Short tandem repeats (STR)
• Small strings repeated several times
• Individual profile:
STR | n
DYS392 | 4
DYS396 | 23
… | …
DYS 618 | 17
• Example: DYS392 = [TAT]n
cgac TAT TAT TAT TAT cgca
n = 4
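The repeat count n in the DYS392 example can be computed with a few lines of Python. This is a minimal sketch that assumes the motif is known in advance and that n is the longest uninterrupted run of the motif; real STR callers work on aligned reads and are considerably more involved.

```python
import re


def count_repeats(sequence: str, motif: str) -> int:
    """Return the longest uninterrupted run of `motif`, in repeat units."""
    cleaned = sequence.replace(" ", "")  # the slide separates repeats with spaces
    pattern = re.compile(f"(?:{re.escape(motif)})+", re.IGNORECASE)
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(cleaned)]
    return max(runs, default=0)


# The example from the slide: DYS392 = [TAT]n
profile = {"DYS392": count_repeats("cgac TAT TAT TAT TAT cgca", "TAT")}
print(profile)
```

A profile is then just a mapping from STR marker names to their repeat counts, which is what the attacker compares against genealogy databases.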
Disease-related genes
Genomic variations
Retrieving the sequences
                  | STRs  | Genes     | Variations           | Total
Databases         | TRDB  | GeneCards | 1000 Genomes Project | -
Number of entries | 240k  | 20k       | 38M                  | 38.3M
DB sequences      | 22M   | 8.7M      | 1147M                | 1178M
DB size           | 660MB | 87MB      | 34.4GB               | 35.1GB
Note: Any other database can be used with our solution
Efficient query system
• Bloom filter
• Efficient data structure (space and performance)
• Tests whether an element is a member of a set
• Does a specific value belong to the set?
• No means No (no false negatives)
• Yes means Maybe (configurable false-positive rate)
• False positives affect efficiency only (not efficacy)
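A Bloom filter with these properties fits in a few lines of Python. The sketch below is a generic textbook version, not the implementation in the project's repository; the class name, the use of SHA-256 and the double-hashing scheme are assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Set-membership test with no false negatives and tunable false positives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True means "maybe present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


known = BloomFilter(num_bits=8192, num_hashes=5)
known.add("TATTATTATTAT")
print("TATTATTATTAT" in known)  # an added item is always reported as present
```

Because a "maybe" answer can be re-checked against the full database, a false positive only costs an extra lookup, which is why it affects efficiency and not efficacy.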
Open source code
https://github.com/vvcogo/dna-privacy-detector
EVALUATION
How much of a human genome is considered privacy-sensitive?
Percentage of privacy-sensitive sequences
[Chart: percentage of sensitive reads per knowledge database]
• Y-STR: 0.16%
• All-Gene: 0.33%
• All-STR: 1.2%
• All-SNP: 10.6%
• All-Together: 11.3% sensitive, 88.7% non-sensitive
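A percentage of this kind can be measured by flagging each read that contains a known pattern. The following is a simplified sketch under stated assumptions: a plain Python set stands in for the knowledge database, all patterns have the same length k, and matching is exact. The detector in the repository may match differently.

```python
def is_sensitive(read: str, known_patterns: set, k: int) -> bool:
    """Flag a read if any window of length k is a known sensitive pattern."""
    read = read.upper()
    return any(read[i:i + k] in known_patterns for i in range(len(read) - k + 1))


def sensitive_percentage(reads: list, known_patterns: set, k: int) -> float:
    """Share of reads, in percent, that contain at least one known pattern."""
    if not reads:
        return 0.0
    flagged = sum(is_sensitive(read, known_patterns, k) for read in reads)
    return 100.0 * flagged / len(reads)


# Toy data (made up): one of the four reads contains the pattern.
patterns = {"TATTAT"}
reads = ["CGACTATTATCG", "ACGTACGTACGT", "GGGGCCCCAAAA", "TTTTTTTTTTTT"]
print(sensitive_percentage(reads, patterns, k=6))
```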
How big is the Bloom filter?
Bloom filter size (Max 6GB)
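The size of a Bloom filter follows from the number of stored items n and the target false-positive rate p through the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below applies them; the false-positive rate in the example is an arbitrary assumption, since the rate behind the 6GB figure is not given on the slide.

```python
import math


def bloom_parameters(n_items: int, false_positive_rate: float):
    """Return (bits, hash_functions) for a standard Bloom filter."""
    bits = math.ceil(-n_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


# Illustrative only: 1178M sequences (the total in the database table)
# with an assumed false-positive rate of one in a million.
bits, hashes = bloom_parameters(1_178_000_000, 1e-6)
print(f"{bits / 8 / 2**30:.2f} GiB, {hashes} hash functions")
```

Lowering the false-positive rate increases the size only logarithmically, which is what makes the rate practical to configure.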
Is the detector a bottleneck?
Throughput – Single core
• NGS machines: 300 000 bp/s (0.3M bp/s)
• Detector: 60M bp/s (200x) and 13M bp/s (44x)
Throughput – Multi-core
• NGS machines: 300 000 bp/s (0.3M bp/s)
• Detector: 480M bp/s (1600x) and 66M bp/s (200x)
What if a novel sequence is discovered?
Completeness of the method
• Novel STRs:
o In 11 years (2003-2014), TRDB registered 1k novel STRs (0.42% growth)
o Useless to attackers until present in STR databases
• Novel genes:
o Do not, on their own, determine whether a disease is contracted
o May have no relation to any disease
o Novel discoveries correlate diseases with known genes (limited number)
• Novel genomic variations:
o No single variation determines identity or the contraction of a disease
o Covered by increasing population samples in allele frequency studies
OPEN PROBLEMS
Different levels of privacy
• Rare STRs and genomic modifications
– increase the likelihood of re-identification
• How to build a discrete filter for sensitive reads
– with multiple severity levels
Human genetic individuality
• How to define an individuality measure as a
function that,
– given a genome and a population,
– returns a numerical value reflecting
– their diversity in terms of human genetics
• Found an identical genome in the population
– individuality = 0
• No privacy-sensitive sequence of the population
found in the given genome
– individuality = 1
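The two boundary conditions above can be met by many functions, and defining a good one is the open problem. As one purely illustrative candidate (an assumption, not a proposal from the talk), the sketch below represents each genome by its set of privacy-sensitive sequences and returns one minus the highest Jaccard similarity to any member of the population.

```python
def individuality(genome: set, population: list) -> float:
    """Illustrative measure: 1 - max Jaccard similarity to any population member.

    `genome` and every element of `population` are sets of privacy-sensitive
    sequences (a hypothetical representation).
    """
    best = 0.0
    for other in population:
        union = genome | other
        # Two empty sets are treated as identical genomes.
        similarity = 1.0 if not union else len(genome & other) / len(union)
        best = max(best, similarity)
    return 1.0 - best


print(individuality({"s1", "s2"}, [{"s1", "s2"}, {"s3"}]))  # identical genome present
print(individuality({"s1"}, [{"s2"}, {"s3"}]))              # nothing shared
```

This candidate ignores how rare each sequence is, so it would need weighting before it could reflect the re-identification risk discussed earlier.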
What are the determinants of human
genetic individuality?
• The complexity of human behaviour is enormous
• So the determinants of human genetic
individuality
– may be hard to predict from single genomic properties
• Follow a systems biology approach to reach for a
deeper understanding of the complexity of the
human genome by analysing what defines us as
individuals
PhD scholarship available
• PhD scholarship for the project:
– "What are the determinants of human genetic
individuality?"
• Under the supervision of
– Prof. Francisco Couto (LASIGE)
– and Prof. Margarida Gama-Carvalho (BioISI)
• Applications until October 21st (12PM, CET)
– How to apply:
http://biosys.campus.ciencias.ulisboa.pt/node/6
Sharing data?
• “Adherence to data-sharing policies is as
inconsistent as the policies themselves”
“351 papers covered by some data-sharing policy,
only 143 fully adhered to that policy” (~40%)
Corbyn, Zoë. "Researchers Failing to Make Raw Data Public." Nature News (2011). http://www.nature.com/news/2011/110914/full/news.2011.536.html
• “More often than scientists would like to admit,
they cannot even recover the data associated
with their own published works”
Goodman, Alyssa, et al. "Ten simple rules for the care and feeding of
scientific data." PLoS Comput Biol 10.4 (2014): e1003542.
Reproducibility
• One of the main principles of the scientific
method
• But without access to data
– it is impossible (or very hard) to replicate results
Incentivize rather than Enforce
• “to encourage data sharing, systematic
reward and recognition mechanisms are
necessary”.
– Principles of data management and sharing at
European Research Infrastructures
Couto, Francisco M. "Rating, recognizing and rewarding metadata integration and sharing on the semantic web." Proceedings of the 10th International Conference on Uncertainty Reasoning for the Semantic Web, Volume 1259. CEUR-WS.org, 2014.
Final Remarks
• A privacy-preserving environment for genomic
data analysis is feasible
• A privacy-preserving environment will help
promote data sharing
– Not the opposite
– a severe leak may reverse the public opinion trend
• Determinants of human genetic individuality
– essential study for a privacy-preserving
environment
Acknowledgments
• SnT
– Paulo Veríssimo
– Maria Fernandes
– Jérémie Decouchant
• BioISI
– Margarida Gama-Carvalho
• LaSIGE
– Vinícius Cogo
– Alysson Bessani

More Related Content

More from Francisco Couto

Master's Theses in Bioinformatics and Computational Biology
Master's Theses in Bioinformatics and Computational BiologyMaster's Theses in Bioinformatics and Computational Biology
Master's Theses in Bioinformatics and Computational BiologyFrancisco Couto
 
Linked Data – challenges for Imagiology and Radiology
Linked Data – challenges for Imagiology and RadiologyLinked Data – challenges for Imagiology and Radiology
Linked Data – challenges for Imagiology and RadiologyFrancisco Couto
 
Metadata Analyser: measuring metadata quality
Metadata Analyser: measuring metadata qualityMetadata Analyser: measuring metadata quality
Metadata Analyser: measuring metadata qualityFrancisco Couto
 
MER: a Minimal Named-Entity Recognition Tagger and Annotation Server
MER: a Minimal Named-Entity Recognition Tagger and Annotation ServerMER: a Minimal Named-Entity Recognition Tagger and Annotation Server
MER: a Minimal Named-Entity Recognition Tagger and Annotation ServerFrancisco Couto
 
A Large-Scale Characterization of User Behaviour in Cable TV
A Large-Scale Characterization of User Behaviour in Cable TVA Large-Scale Characterization of User Behaviour in Cable TV
A Large-Scale Characterization of User Behaviour in Cable TVFrancisco Couto
 
Master in Bioinformatics and Computational Biology
Master in Bioinformatics and Computational BiologyMaster in Bioinformatics and Computational Biology
Master in Bioinformatics and Computational BiologyFrancisco Couto
 
KnowledgeCoin : recognizing and rewarding metadata integration and sharing ...
KnowledgeCoin: recognizing and rewarding metadata integration and sharing ...KnowledgeCoin: recognizing and rewarding metadata integration and sharing ...
KnowledgeCoin : recognizing and rewarding metadata integration and sharing ...Francisco Couto
 
Bioinf2Bio Oportunidades
Bioinf2Bio OportunidadesBioinf2Bio Oportunidades
Bioinf2Bio OportunidadesFrancisco Couto
 
Stabvida oportunidades profissionais
Stabvida oportunidades profissionaisStabvida oportunidades profissionais
Stabvida oportunidades profissionaisFrancisco Couto
 
Mestrado em Bioinformática e Biologia Computacional da FCUL
Mestrado em Bioinformática e Biologia Computacional da FCULMestrado em Bioinformática e Biologia Computacional da FCUL
Mestrado em Bioinformática e Biologia Computacional da FCULFrancisco Couto
 

Towards a privacy-preserving environment for genomic data analysis