A Genome Sequence Analysis System Built with Hypertable
Doug Judd, CEO, Hypertable, Inc.
Application Development Team
What is Hypertable?
Hypertable Deployments
Why NoSQL?
Source: Nature 458, 719-724 (2009)
Source: wired.com, February 2011
Genomics 101
Base Pair (aka “base”)
Gene
Biological Samples
Example Reads File
GTGGATAGGGGGAGACTAATGTAGTATGATTATCATCATCAACAGAAGCTATGACACCAGGATAAA
CATTTCTTATTGCTGAAAGTATTCTATTGTAGAGATGTACCACAATTTGGTTTCTGGTTTTGTATT
GGGAGGATACTAGGGATTACTGAAGCCAACTTTGCAGACTCATACATTTGACTAGACACAGCC
ACATTACAGTTTTCTGAGGAAAATTCTTAAGATGTTACCCCAAAACATAGCATTTTAAATTAAAAC
GGACCGGCTGAAGCCATGGCAGAAGAACATAAATTGTGAAGATTTCATGGGCATTTATTAGTT
GGAAGTGATAAGTGTCCATGAAATCTTCACAATTTATGTTCAGAGATTGCAGTAAAGACAGGTGTA
AAGACACAGCAAAGCTAAGAGGACCCAACACACGGTAGGGTCGGGGACCTTGGAGAAACATGG
TGGCTTCTTCCTACATGCTTGTGATAGATGACCAAAAAACATTTGTTGAGTTGATGAATAGTACAA
AAAAGGGGCGGATAATAAATGAAAAGGGAATGTGCTGTTATTTCCTACTAAGATCAGAAAGAG
ATATAAACAAAAGCTGTCATCACTTAGGGACTTCAGCCACATAAAACAATGTCAGGCTAGTCACTT
AGAGCTTTGGGACTAGTTGAGTGGCAGCTTAACAAAGCAACGCAATATCCATAGGGATTGGGG
ATATTTACATCTAGTGGATTCTACCAGTATGGTGGTCTTATGTGGACTGCACGTGGTTTTCTAGTA
AGATAGCAGCTCTTCCCAAATTTATTTATAATTGTGGCATTATTTATAATATCAAAATATTAT
GTTGCCAAAGGAGATTAACATTTGAGTCAGTGGGCGGGGTAAGGCCGACCTACCCTTAATCTGGTG
GAGAAAGAAGCTGCTAATGGAGTTTAAAAGGTTACTGTCATTAATGAAAAATAAATTTACAGC
CAGACATTTATGAACAGAAATGGGAAAAACACACTAGGAAAGCACTGCAAAGACTAATCTGTCTTT
AAAGGAGATAGAGTGACTCCAGGCCCCTTAGAAATGACTATACCTGGCAGAGCATGCCAACTG
ATGGGCTCGAGTCCTCACAAATATGAATTCCCCCTAAGTCTTGAGAGGTCATTTGTGCATTTGGAA
GGAAGAACATTCCATGCTCATGGGTAGGAAGAATCAATATCGTGAAAATGGTCATACTGCCCA
GCGGGGTTTTTTTTTGTTTCATATTAACTTTAAAGTAGTTTTTTTCCATTTTGTGAAGAAAGACAT
AAAGAACCAAGGCTAATAGTTGTTTGAGTTGTACTTACCATGTTGTTAAATGTCACCTCACAC
CGCTGCCAGCCTATCAGAGCCGGGAATTACACCGTGCTTGGAGTTCTGGCACAGATCCACAGCTAC
AGTTCTTCATTGTAAGAAATGGATGCTAACATGTAACAAGAAAACATCTGAAGGTTAAACTCA
AATAAATGGGTTAATAGTTTGTCTTTCGGTCTTCATACTTTCAATATAAGTGGTTTACTTAGCCGA
Sequence Alignment
Taxonomy
GenBank
Schema Design
Taxa Table

CREATE TABLE Taxa (ID, Type, Children, Name);

Row key          Column       Value
/1               ID           1
/1               ID:fullName  /root
/1               Type         no rank
/1               Children     1,10239,12884,12908,28384,131567
/1               Name         root
/1/10239         ID           10239
/1/10239         ID:fullName  /root/Viruses
/1/10239         Type         superkingdom
/1/10239         Children     12333,12429,12877,29258,35237, …
/1/10239         Name         Viruses
/1/10239/12333   ID           12333
/1/10239/12333   ID:fullName  /root/Viruses/unclassified phages
/1/10239/12333   Type         no rank
/1/10239/12333   Children     12340,12347,12366,12371,12374, …
/1/10239/12333   Name         unclassified phages
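The Taxa row keys above spell out each taxon's path from the root (for example /1/10239/12333), so in a key space kept in sorted order a taxon sits directly next to its descendants. The Python sketch below illustrates that property with an in-memory sorted list standing in for the table. The function name subtree_keys is hypothetical, the key /1/12884 is inferred from the Children list of /1 rather than shown on the slide, and nothing here is Hypertable's own API.

```python
from bisect import bisect_left

# Row keys taken from the Taxa rows above. "/1/12884" is an assumption:
# it is inferred from the Children list of "/1", not shown on the slide.
ROW_KEYS = sorted([
    "/1",
    "/1/10239",
    "/1/10239/12333",
    "/1/12884",
])


def subtree_keys(sorted_keys, root_key):
    """Return root_key and every key beneath it with one contiguous scan.

    Because "/" sorts before the digits, all descendants of root_key
    immediately follow it in sorted order.
    """
    prefix = root_key + "/"
    start = bisect_left(sorted_keys, root_key)
    result = []
    for key in sorted_keys[start:]:
        if key == root_key or key.startswith(prefix):
            result.append(key)
        else:
            break  # first key outside the subtree ends the range
    return result


if __name__ == "__main__":
    for key in subtree_keys(ROW_KEYS, "/1/10239"):
        print(key)
```

Under these assumptions, fetching the Viruses subtree is a single range read starting at /1/10239 rather than one lookup per child ID.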
Reads Table

CREATE TABLE Reads (Sequence, Quality, GeneKey, Comments);

Row key                              Column                        Value
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1  Sequence                      ATCGCACCATTGAACTCCAGTC...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1  Quality                       eeaeeeede_Ycc]dcacab...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1  Comments:qualityFilter        11071815...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1  Sequence                      GGCTTACGCCTGTAATCCCAGC...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1  Quality                       gfee_cgggegggecggggegc...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1  GeneKey:gnl|GNOMON|1320663.m  11...
AbCam1_100_ACAGTG,HWI...17#ACAGTG/1  Sequence                      AGGATACGGAAGGCCCAAGGAG...
AbCam1_100_ACAGTG,HWI...17#ACAGTG/1  Quality                       cdd`dffffffgffgggegf^e...
AbCam1_100_ACAGTG,HWI...17#ACAGTG/1  GeneKey:chr10                 110718151643.1308...
AbCam1_100_ACAGTG,HWI...80#ACAGTG/1  Sequence                      ACGGAAGAGCACACGTCTGAAC...
AbCam1_100_ACAGTG,HWI...80#ACAGTG/1  Quality                       cbccb[^WUb]_b`_[bR_]...
AbCam1_100_ACAGTG,HWI...80#ACAGTG/1  Comments:qualityFilter        11071815...
AbCam1_100_ACAGTG,HWI...88#ACAGTG/1  Sequence                      GAACTCCAGTCACACAGTGATC...
AbCam1_100_ACAGTG,HWI...88#ACAGTG/1  Quality                       eeeeeeeeeeeceeeeeaeeTQ...
AbCam1_100_ACAGTG,HWI...88#ACAGTG/1  Comments:qualityFilter        11071815...
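The Reads row keys appear to join a sample name and a sequencer read ID with a comma; the un-elided keys in the Genes table's ReadID entries (for example 0916.Mexus,SCS:1:22:410:248#0/1) suggest the same layout. The Python sketch below works under that assumption. The helper names split_read_key and reads_by_sample are hypothetical, and splitting at the first comma is a guess about the key format rather than something the slides state.

```python
from collections import defaultdict


def split_read_key(row_key):
    """Split a 'sample,read_id' row key at the first comma (assumed layout)."""
    sample, _, read_id = row_key.partition(",")
    return sample, read_id


def reads_by_sample(row_keys):
    """Group read IDs under their sample name, preserving input order."""
    grouped = defaultdict(list)
    for key in row_keys:
        sample, read_id = split_read_key(key)
        grouped[sample].append(read_id)
    return dict(grouped)


# Un-elided keys copied from the ReadID entries of the Genes table.
EXAMPLE_KEYS = [
    "0916.Mexus,SCS:1:22:410:248#0/1",
    "0916.MonkeyAdeno,SCS:2:17:811:769#0/1",
    "0916.MonkeyAdeno,SCS:2:21:1132:1067#0/1",
]

if __name__ == "__main__":
    for sample, read_ids in reads_by_sample(EXAMPLE_KEYS).items():
        print(sample, len(read_ids))
```

If the layout assumption holds, leading the key with the sample name keeps all reads of one sample adjacent in sorted row order.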
Genes Table

CREATE TABLE Genes (Sequence, TaxID, ID, ReadID);

Row key  Column                                                      Value
1000075  Sequence                                                    GAATTCCATGGCAGTAAAACATCTTCCCTTC…
1000075  TaxID                                                       9606
1000075  ID:name                                                     HSLFBPS6 Human fructose-1,6-biphosphatase
1000075  ReadID:0310.Lane8big,HWI-EAS355:8:91:1231:1315#0/1 …
1000075  ReadID:0908.Mexus2.TATTAT,SCS:1:22:395:324#0/1_TA …
1000075  ReadID:0916.Enceph2,SCS:6:24:1519:513#0/1
1000075  ReadID:0916.Mexus,SCS:1:22:410:248#0/1
1000075  ReadID:0916.MonkeyAdeno,SCS:2:17:811:769#0/1
1000075  ReadID:0916.MonkeyAdeno,SCS:2:21:1132:1067#0/1
1000075  ReadID:0916.MonkeyAdeno,SCS:2:24:1207:492#0/1
1000075  ReadID:0916.MonkeyAdeno,SCS:2:33:1138:547#0/1
1000075  ReadID:0916.Parecho,SCS:3:4:679:1416#0/1|1
1000075  ReadID:HIV.HIV18_Lane7.s_7_sequence.AAA,SCS:7:30:688 …
1000075  ReadID:HIV.HIV18_Lane7.s_7_sequence.AAA,SCS:7:30:688 …
1000075  ReadID:HIV.HIV18_Lane7.s_7_sequence.unbiased,SCS:7:30 …
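The ReadID entries above carry a read's row key in the column qualifier, which makes each Genes row act as a reverse index from a gene to the reads associated with it. The Python sketch below shows one way such cells could be derived from (read key, gene key) pairs. It assumes the pairs come from an alignment step and that the gene key matches the Genes row key; the slides confirm neither, and the function name build_gene_index is hypothetical.

```python
from collections import defaultdict


def build_gene_index(read_gene_pairs):
    """Turn (read_key, gene_key) pairs into Genes-table style cells.

    Each returned cell is (row_key, column, value), with the read key
    stored in the column qualifier and an empty value, mirroring the
    ReadID rows shown above. Duplicate pairs collapse to one cell.
    """
    index = defaultdict(set)
    for read_key, gene_key in read_gene_pairs:
        index[gene_key].add(read_key)

    cells = []
    for gene_key in sorted(index):
        for read_key in sorted(index[gene_key]):
            cells.append((gene_key, "ReadID:" + read_key, ""))
    return cells


# Illustrative input: read keys and the gene row key are copied from the
# Genes table above; pairing them this way is an assumption.
PAIRS = [
    ("0916.MonkeyAdeno,SCS:2:17:811:769#0/1", "1000075"),
    ("0916.Mexus,SCS:1:22:410:248#0/1", "1000075"),
    ("0916.Mexus,SCS:1:22:410:248#0/1", "1000075"),  # duplicate on purpose
]

if __name__ == "__main__":
    for row_key, column, value in build_gene_index(PAIRS):
        print(row_key, column, value)
```

Putting the read key in the qualifier rather than the value lets one gene row hold any number of reads without a fixed column list.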
Monitoring Table Overview
Applications
Novel Virus Discovery
Novel Virus Discovery Algorithm Detail
Pathogen Discovery in Cancer Samples
Taxonomic Tree Viewer
Depletion Array (future)
The End
Questions?

Editor's Notes

  1. Improvements in the rate of DNA sequencing over the past 30 years and into the future