Next-Generation Informatics

David Dooling

Talk from the Bioinformatics session of the Advances in Genome Biology and Technology 2009 meeting.

Next-Generation Informatics
David Dooling <ddooling@wustl.edu>
AGBT Bioinformatics
2009-02-05

Framing the problem
[Chart legend: Moore’s Law, CPU, Storage, Sequence, Personnel; x-axis: 2000 to 2010]

Different perspectives

LIMS - Illumina/Solexa

Analysis - cDNA
Solexa cDNA reads
Maq/Tophat: [Transcriptome] OR [Genome + SpliceJunctions (SJs)] OR [Genome]
• Read depth -> Gene expression (to exquisite sensitivity)
• Maq: SNPs, Indels -> Variant discovery/ASE
• Reads map to novel SJs or introns -> Splice isotypes
• Reads map to “non-genic” regions -> Velvet, GenScan -> Novel Genes

Changing pipelines - LIMS
• Prep: PCR, Hybrid Selection, cDNAs, Bisulfite, Jumping Libraries, Sample Pooling, …, WGS
• Tech-Specific Prep/Detection: Solexa, 454, SOLiD, Church Polony(?), Helicos(?), 3730
• Primary Analysis (Technology-specific): Flow-space, Color-space, …, Phred
• Submission: NCBI SRA, NCBI Medical Archive, Project Archives (e.g., DCC), NCBI Trace
Courtesy of Toby Bloom

Changing pipelines - Analysis
• Aligners: BLAST, BLAT, PASH, ssaha, runMapping, ELAND, mapreads, MAQ, exonerate, SHRiMP, SPLIGN, Mosaik, SLIM Search, SXOligoSearch, SOAP2, NovoCraft, Bowtie, Tophat
• Assemblers: Phrap, Arachne, PCAP, Phusion, Euler, ATLAS, Newbler, Velvet, Forge, SSAKE, VCAKE, Euler-USR, SHARCGS, CABOG

Framing the solution

UR
• Object-relational mapping (ORM) layer
– Interact with persistence layer (e.g., relational database) through objects and methods
– Automatic, dynamic class definitions
– Moose-like [1] object definition syntax
• Object context
– In-memory transactions (even across databases)
– Caching/deferred loading
• Dynamic command-line interface
• Integrated documentation system

[1] http://www.iinteractive.com/moose/
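
The object-context ideas on this slide (deferred loading, caching, in-memory changes that reach the database only on commit) can be illustrated with a small sketch. The Python below is not UR and does not reproduce its API; it is a minimal illustration under stated assumptions, using the standard `sqlite3` module. The names `ObjectContext`, `Sample`, the `sample` table, and the sample name in the demo are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical entity; one instance per row of the `sample` table."""
    id: int
    name: str
    dna_type: str


class ObjectContext:
    """Caches loaded objects and defers writes until commit()."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}     # id -> Sample, filled lazily
        self._dirty = set()  # ids changed in memory but not yet saved

    def get(self, sample_id):
        # Deferred loading: query the database only on a cache miss.
        if sample_id not in self._cache:
            row = self.conn.execute(
                "SELECT id, name, dna_type FROM sample WHERE id = ?",
                (sample_id,),
            ).fetchone()
            if row is None:
                raise KeyError(sample_id)
            self._cache[sample_id] = Sample(*row)
        return self._cache[sample_id]

    def update(self, sample_id, **changes):
        # In-memory change: nothing is written until commit().
        obj = self.get(sample_id)
        for field, value in changes.items():
            setattr(obj, field, value)
        self._dirty.add(sample_id)
        return obj

    def rollback(self):
        # Discard in-memory changes by forgetting the changed objects.
        for sample_id in self._dirty:
            self._cache.pop(sample_id, None)
        self._dirty.clear()

    def commit(self):
        # Write all pending changes in one database transaction.
        with self.conn:
            for sample_id in self._dirty:
                obj = self._cache[sample_id]
                self.conn.execute(
                    "UPDATE sample SET name = ?, dna_type = ? WHERE id = ?",
                    (obj.name, obj.dna_type, obj.id),
                )
        self._dirty.clear()


def demo():
    """Return the stored dna_type before and after commit()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT, dna_type TEXT)"
    )
    conn.execute("INSERT INTO sample VALUES (1, 'H_AB-001', 'genomic')")
    conn.commit()

    ctx = ObjectContext(conn)
    ctx.update(1, dna_type="cDNA")
    query = "SELECT dna_type FROM sample WHERE id = 1"
    before = conn.execute(query).fetchone()[0]
    ctx.commit()
    after = conn.execute(query).fetchone()[0]
    return before, after
```

Calling `demo()` shows the point of the design: the change made through the context is invisible to the database until `commit()`, and `rollback()` simply drops the changed objects so the next `get()` reloads them.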

… but with a wrinkle
• Lab personnel accept
the software you give
them
• Analysts are more
than happy to develop
their own
• We need to make it
easy for analysts to
build tools within the
system
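
One way to make it easy for analysts to build tools within a system is to let a tool be a small class that the system discovers and exposes on the command line automatically. The Python sketch below illustrates that idea with the standard `argparse` module. It is an assumption about how such a mechanism could look, not the implementation presented in the talk; the names `Tool`, `CountReads`, `build_parser`, and the program name `genome-tools` are hypothetical.

```python
import argparse

TOOLS = {}  # command name -> tool class, filled as Tool subclasses are defined


class Tool:
    """Base class: defining a named subclass registers a command."""
    name = None
    params = {}  # parameter name -> (type, help text)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            TOOLS[cls.name] = cls

    def run(self, **params):
        raise NotImplementedError


class CountReads(Tool):
    """Hypothetical analyst-written tool: count reads at or above a length."""
    name = "count-reads"
    params = {
        "reads": (str, "comma-separated read sequences"),
        "min_length": (int, "minimum read length to count"),
    }

    def run(self, reads, min_length):
        return sum(1 for r in reads.split(",") if len(r) >= min_length)


def build_parser():
    """Generate one subcommand per registered tool from its declared params."""
    parser = argparse.ArgumentParser(prog="genome-tools")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, cls in sorted(TOOLS.items()):
        sub = subparsers.add_parser(name, help=cls.__doc__)
        for param, (ptype, help_text) in cls.params.items():
            # "--min-length" is stored by argparse as "min_length".
            sub.add_argument("--" + param.replace("_", "-"), type=ptype,
                             required=True, help=help_text)
    return parser


def main(argv):
    """Parse argv, instantiate the selected tool, and run it."""
    args = vars(build_parser().parse_args(argv))
    tool = TOOLS[args.pop("command")]()
    return tool.run(**args)
```

For example, `main(["count-reads", "--reads", "ACGT,AC,ACGTA", "--min-length", "4"])` runs the analyst's class without anyone writing command-line parsing code for it. The analyst declares parameters once, and the interface is derived from that declaration.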

Pairing

Analyst

Programmer

Variant Detection Pipeline

Assembly and Annotation Pipeline

Challenges
• There is still much more work to do
• Sequencing is demolishing Moore’s law
• The cult of traces
• The richness of data
• Visualization

Thanks
• Web Site: http://genome.wustl.edu/
• Blog: http://www.politigenomics.com/
• LIMS Paper: http://www.biomedcentral.com/1471-2105/8/362
• UR Presentation: http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/

Next-Generation Informatics

1. Next-Generation Informatics
2. Framing the problem
3. Framing the problem
4. Different perspectives
5. LIMS
6. LIMS - Illumina/Solexa
7. LIMS - Roche/454
8. Analysis
9. Analysis - cDNA
10. Project Lead
11. Changing pipelines
12. Changing pipelines - LIMS
13. Changing pipelines - Analysis
14. Framing the solution
15. Past is prologue
16. Convert this…
17. … into this
18. Convert this…
19. … into this
20. UR
21. Genome Workflow
22. Genome Model
23. Past is prologue…
24. … but with a wrinkle
25. Easy Perl API
26. Pairing
27. Variant Detection Pipeline
28. cDNA Analysis
29. 16S Pipeline
30. Assembly and Annotation Pipeline
31. Challenges
32. CIRCOS
33. Thanks

Editor's Notes

There is too much data. 4 genomes to more than an order of magnitude increase. Move from processing regions to single genomes to multi-genome comparisons. This is a story about how we are trying to deal with this problem.
This creates tension
Sample in -> answer out. Don’t care how the sausage was made.
Never the same pipe twice (TJ Maxx).
And expanding beyond the laboratory
Different aligners, genotypers
How do we even begin to tackle this problem? How do we resolve the tension between changing pipelines and production systems?
Metadata. Store DNA types, equipment, reagents, even process steps as rows rather than tables. So maq is not maq, it is an aligner. Standards like SAM help.
Solexa/Maq specific commands
Generic medical resequencing pipeline
Never write SQL
XML and flow chart
Click on any box to see processing details including file system location
Screenshot of script vs. module
photograph
What I have talked about here is automation. There is still much work to do in data reduction.
How do you compare more than three genomes? How do you track all the analysis? So that’s one problem.
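
The metadata note (store DNA types, equipment, reagents, and process steps as rows rather than tables, so that maq is treated as an aligner) can be sketched with a small schema. The Python example below uses the standard `sqlite3` module and is only an illustration under stated assumptions: the table names `process_step_type` and `tool`, the descriptions, and the helper functions are hypothetical, while the tool names are taken from the aligner and assembler lists in the slides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE process_step_type (
    name        TEXT PRIMARY KEY,   -- e.g. 'alignment'
    description TEXT
);
CREATE TABLE tool (
    name      TEXT PRIMARY KEY,     -- e.g. 'maq'
    step_type TEXT NOT NULL REFERENCES process_step_type(name)
);
"""


def build_catalog():
    """Create an in-memory catalog where tools and step types are rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO process_step_type VALUES (?, ?)",
        [("alignment", "map reads to a reference"),
         ("assembly", "build contigs from reads")],
    )
    conn.executemany(
        "INSERT INTO tool VALUES (?, ?)",
        [("maq", "alignment"), ("bowtie", "alignment"), ("velvet", "assembly")],
    )
    conn.commit()
    return conn


def tools_for(conn, step_type):
    """List the tools that can perform a given kind of process step."""
    rows = conn.execute(
        "SELECT name FROM tool WHERE step_type = ? ORDER BY name",
        (step_type,),
    )
    return [name for (name,) in rows]


def add_tool(conn, name, step_type):
    """Adding a tool is an INSERT, not a schema change."""
    conn.execute("INSERT INTO tool VALUES (?, ?)", (name, step_type))
    conn.commit()
```

With this layout a pipeline asks for "an aligner" via `tools_for(conn, "alignment")` rather than naming maq directly, and a new aligner becomes available through `add_tool` without touching the schema or the pipeline code.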
