HIG Project Overview

           August 31, 2012




    Matthieu-P. Schapranow
    Hasso Plattner Institute
Chair of Prof. Hasso Plattner
Vision: Real-time Analysis of Genomic
    Data to Improve Medical Treatment




    HIG Project Overview, M. Schapranow, Aug 31, 2012
Build up the Whole Picture out of Layers

      ■  Data:
           □  Combine research findings from international scientific databases in a
              single system at HPI
      ■  Platform:
           □  Expose information as a service to be consumed by special
              purpose applications
      ■  Applications:
           □  Support genome alignment pipeline processing by
              massively parallel execution of:
                □ Alignment algorithms, e.g. BWA, Bowtie2 (BT2)
                □ Variant calling
           □  Analyze individual patient results (real-time annotations with
              combined data)
           □  Analyze patient cohorts using individual filters
How the Vision Becomes Real


      ■  Platform:
           □  Worker Framework: Enables parallel execution of tasks
              (alignment, variant calling) across node boundaries
           □  Updating Framework: Retrieves periodic updates of
              international databases and automatically integrates them into
              the local store
      ■  Applications:
           □  Alignment Coordinator: Submit alignment tasks and retrieve
              mutation lists, e.g. as CSV
           □  Genome Browser: Interactive browsing of reference and
              patient-specific genomes
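
The idea behind the Worker Framework, splitting a set of reads into chunks and processing the chunks in parallel, can be sketched as follows. This is only a single-machine stand-in under stated assumptions: it uses a thread pool rather than a multi-node cluster, and `split_into_chunks`, `toy_align`, and `run_parallel` are hypothetical names that do not come from the project. `toy_align` is a placeholder (exact substring search), not one of the real alignment algorithms.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(items: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split items into at most n_chunks contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size, extra = divmod(len(items), n_chunks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer items than chunks
            chunks.append(list(items[start:end]))
        start = end
    return chunks


def toy_align(reads: List[str], reference: str) -> List[int]:
    """Hypothetical placeholder for an aligner: exact-match position, or -1."""
    return [reference.find(read) for read in reads]


def run_parallel(reads: Sequence[str], reference: str, n_workers: int = 4) -> List[int]:
    """Align chunks of reads concurrently and return positions in input order."""
    chunks = split_into_chunks(reads, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in chunk order, so the read order is preserved.
        per_chunk = list(pool.map(lambda chunk: toy_align(chunk, reference), chunks))
    return [position for chunk_result in per_chunk for position in chunk_result]


if __name__ == "__main__":
    print(run_parallel(["ACGT", "CGTT", "TTGA"], "ACGTACGTTTGA", n_workers=2))
```

Because the chunks are independent, the same split-and-merge pattern carries over to workers on separate nodes; only the transport of chunks and results changes.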



Alignment Coordinator


      ■  Available Alignment Algorithms (and growing)
           □  Bowtie2
           □  Bowtie
           □  BWA
           □  TMAP
           □  SNAP
           □  MAQ
           □  SOAP




Numbers you should know
    Alignment Execution Time


      ■  One cell line ~600k reads / 110MB
      ■  Pipeline: Alignment and variant calling

           Property        Traditional      HPI
           Full Genome     No               Yes
           Cores           2 * 6 cores      25 * 40 cores
           Main Memory     48 GB            25 TB
           Runtime         ~720             ~40 s
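
The resource ratios implied by the table can be worked out with a few lines of arithmetic. The sketch below assumes decimal terabytes (1 TB = 1000 GB). The table gives no unit for the traditional runtime of ~720, so the speedup is left as a function of that unit rather than a single number. Note also that the traditional setup does not process the full genome, so the two runs are not a like-for-like comparison.

```python
# Figures copied from the table above.
TRADITIONAL_CORES = 2 * 6
HPI_CORES = 25 * 40
TRADITIONAL_MEMORY_GB = 48
HPI_MEMORY_TB = 25
HPI_RUNTIME_SECONDS = 40

GB_PER_TB = 1000  # assumption: decimal terabytes

core_ratio = HPI_CORES / TRADITIONAL_CORES
memory_ratio = HPI_MEMORY_TB * GB_PER_TB / TRADITIONAL_MEMORY_GB


def speedup(traditional_runtime: float, seconds_per_unit: float) -> float:
    """Runtime ratio, given how many seconds one unit of the traditional figure is."""
    return traditional_runtime * seconds_per_unit / HPI_RUNTIME_SECONDS


if __name__ == "__main__":
    print(f"cores:  {TRADITIONAL_CORES} vs {HPI_CORES} ({core_ratio:.1f}x)")
    print(f"memory: {memory_ratio:.1f}x")
    # Two hypothetical readings of the unitless "~720":
    print(f"speedup if ~720 means seconds: {speedup(720, 1):.0f}x")
    print(f"speedup if ~720 means minutes: {speedup(720, 60):.0f}x")
```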




Numbers you should know
    History of the Human Genome Project


      ■  1984: Idea of a global Human Genome
         (HG) project discussed at Alta Summit:
         “DNA available on the Internet”
      ■  1990: HG project started in the US, planned to run
         for 15 years (3 billion USD funding)
      ■  2000: Rough draft of the HG announced
      ■  2003: Complete genome sequenced
      ■  2006: chr1, the last and longest chromosome, sequenced


      ■  … what’s next?




Numbers you should know
    Human Genome


      Entity                                  Cardinality
      Different Bases                         4 (A, C, G, T)
      Base Pairs                              3.137 Bbp
      Chromosomes                             23
      Distinct Genes                          20k-25k
      Amino Acids (coded as triplets)         21
      Proteins                                50k-300k
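
Two of these numbers can be related by a short calculation. With 4 bases, a triplet can take 4^3 = 64 forms, more than the 21 amino acids listed, so several triplets must code for the same amino acid. Four bases also fit into 2 bits each, which gives a rough lower bound on the memory needed for the sequence. The sketch below assumes a plain 2-bit encoding with no ambiguity codes or metadata; `packed_size_bytes` is a hypothetical helper name.

```python
from itertools import product

BASES = "ACGT"

# Every possible triplet (codon) over the four bases.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

GENOME_BASES = 3_137_000_000  # 3.137 Bbp, from the table above


def packed_size_bytes(n_bases: int, bits_per_base: int = 2) -> int:
    """Bytes needed to store n_bases at bits_per_base bits each, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


if __name__ == "__main__":
    print(f"{len(CODONS)} codons for 21 amino acids")
    megabytes = packed_size_bytes(GENOME_BASES) / 1_000_000
    print(f"2-bit packed genome: about {megabytes:.0f} MB")
```

Under this encoding a single genome sequence occupies well under 1 GB, which is small compared with the main memory of the cluster.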




      Taken from http://de.wikipedia.org/wiki/Code-Sonne

Numbers you should know
    Comparison of Costs

      [Figure: Comparison of Costs for Main Memory and Genome Analysis.
       Logarithmic y-axis: costs in USD (0.01 to 10,000); x-axis: dates from
       01.01.01 to 01.01.12 in four-month steps. Series: costs per megabyte
       RAM, costs per megabase sequencing.]

Hardware Characteristics


       ■  1,000 core cluster,
          25 TB main memory
       ■  Consists of 25 identical nodes:
            □  40 physical cores (80 logical cores with Hyper-Threading)
            □  1 TB main memory
            □  Intel® Xeon® E7-4870
            □  2.40 GHz
            □  30 MB Cache




Customer Process as of Today


       ■  Tissue sequencing in context of cancer treatment
       ■  Complex, time-consuming, media breaks, manual steps




Project Objectives


       ■  Alignment of DNA reads (FASTQ) against reference genome
          (FASTA) → mapped reads
       ■  Real-time analysis of mapped reads
            □  Detection of mutations (SNP, INDELs)
            □  Comparison of multiple tissues
            □  Detection of similar clusters to identify correlations
       ■  Analysis of mutations
            □  Identify mutations with scientific references (existing
               knowledge)
            □  Detection of similar clusters to identify correlations
            □  Identify genes and regulators for certain phenotypic
               characteristics, e.g. “fast running horses”
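
The first two objectives, reading FASTQ input and detecting mutations in mapped reads, can be illustrated with a minimal sketch. It assumes the common four-line FASTQ layout (`@name`, sequence, `+`, quality) and a read that is already mapped at a known 0-based offset without gaps. `parse_fastq` and `find_mismatches` are hypothetical names, and the mismatch comparison only yields SNP candidates: a real variant caller aggregates evidence across many reads, weighs base qualities, and also handles INDELs.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text; blank lines are ignored."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    for i in range(0, len(cleaned), 4):
        header, sequence, separator, quality = cleaned[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], sequence, quality)


def find_mismatches(reference: str, read: str, offset: int) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, read_base) for each differing base."""
    window = reference[offset:offset + len(read)]
    return [
        (offset + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]


if __name__ == "__main__":
    fastq_text = ["@read1", "ACGTTCGT", "+", "IIIIIIII"]
    reference = "GGACGTACGTAA"
    for record in parse_fastq(fastq_text):
        print(record.name, find_mismatches(reference, record.sequence, offset=2))
```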
Thank you for your interest!
     Keep in contact with us.




                                                                 Matthieu-P. Schapranow, M.Sc.
                                                               schapranow@hpi.uni-potsdam.de
                                                                        http://j.mp/schapranow




                                                                     Hasso Plattner Institute
                                                 Enterprise Platform & Integration Concepts
                                                                     Matthieu-P. Schapranow
                                                                       August-Bebel-Str. 88
                                                                   14482 Potsdam, Germany


High-Performance In-Memory Genome (HIG) Project
