Bachelor Project:
Real-time Analysis of Genome Data
                       July 12, 2012




               Matthieu-P. Schapranow
               Hasso Plattner Institute
           Chair of Prof. Hasso Plattner
Numbers you should know
    The Human Genome Project


      ■  1984: Human Genome (HG) project idea
         discussed at Alta Summit as “DNA
         available on the Internet”
      ■  1990: HG project launched in the US, planned
         to run for 15 years (3 billion USD funding)
      ■  2000: Rough draft of the HG announced
      ■  2003: Complete genome sequenced
      ■  2006: Chromosome 1 (chr1), the last and longest, sequenced
      ■  As of today, we know:
        □  HG consists of 3.2 billion base pairs (~3.2 GB at one byte per base),
        □  23 chromosome pairs,
        □  20k-25k distinct genes
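
The ~3.2 GB figure follows from storing each base pair as one byte. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes) and no compression or metadata. The 2-bit packed encoding for the four bases A, C, G, T is an additional illustration that is not on the slide, and the function name `genome_size_gb` is hypothetical:

```python
# Back-of-the-envelope storage estimate for one human genome.
# Assumptions (not from the slides): decimal units, no compression,
# no quality scores or metadata, and only the four bases A, C, G, T.

BASE_PAIRS = 3_200_000_000  # 3.2 billion base pairs, as stated on the slide


def genome_size_gb(bits_per_base: int, base_pairs: int = BASE_PAIRS) -> float:
    """Return the raw storage size in gigabytes (1 GB = 10**9 bytes)."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 10**9


if __name__ == "__main__":
    print(f"1 byte per base: {genome_size_gb(8):.1f} GB")
    print(f"2 bits per base: {genome_size_gb(2):.1f} GB")
```

Packing four bases into one byte would cut the raw size to a quarter, but the one-byte-per-base figure is the easier number to remember.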

    Real-time Analysis of Genome Data, M. Schapranow, July 12, 2012
Numbers you should know
    Comparison of Costs for Main Memory and Genome Analysis

    [Figure: Comparison of Costs. Y-axis: costs in USD, logarithmic scale from 0.01 to 10,000. X-axis: January 2001 to January 2012. Series: costs per megabyte RAM; costs per megabase sequencing.]

Numbers you should know
    Hardware Characteristics


      ■  1,000-core cluster,
         25 TB main memory
      ■  Consists of 25 identical nodes:
            □  80 logical cores (presumably 40 physical cores with
               Hyper-Threading, matching the 1,000-core total)
            □  1 TB main memory
            □  Intel® Xeon® E7-4870
            □  2.40 GHz
            □  30 MB Cache
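
To put the cluster size in relation to the genome size, the following sketch estimates how many uncompressed genomes would fit into the aggregate main memory. It assumes decimal units, one byte per base (about 3.2 GB per genome), and that all memory is available for genome data, which ignores operating system and database overhead. The function names are hypothetical:

```python
# Aggregate main memory of the cluster and a rough genome capacity.
# Assumptions (not from the slides): decimal units (1 TB = 1,000 GB),
# one byte per base, and no memory reserved for the OS or the database.

NODES = 25                 # identical nodes, from the slide
MEMORY_PER_NODE_TB = 1     # main memory per node, from the slide
GENOME_SIZE_GB = 3.2       # ~3.2 GB per human genome at one byte per base


def total_memory_tb(nodes: int = NODES, per_node_tb: int = MEMORY_PER_NODE_TB) -> int:
    """Return the aggregate main memory of the cluster in terabytes."""
    return nodes * per_node_tb


def genomes_in_memory(memory_tb: float, genome_gb: float = GENOME_SIZE_GB) -> int:
    """Return the number of whole genomes that fit into the given memory."""
    return int(memory_tb * 1000 // genome_gb)


if __name__ == "__main__":
    memory = total_memory_tb()
    print(f"Aggregate main memory: {memory} TB")
    print(f"Whole genomes that fit: {genomes_in_memory(memory)}")
```

Under these assumptions the cluster could hold several thousand raw genomes at once. Real workloads need far more space per sample for reads, quality scores, and indexes.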




Aims of the Bachelor’s Project


      ■  Gather interdisciplinary knowledge to work in
         teams with biological and medical experts
      ■  Explore data from gene, protein, drug, and
         pathway databases to gain new insights
      ■  Implement algorithms optimized for in-memory
         technology, e.g., clustering algorithms for quantifying
         the similarity of samples or detecting single
         nucleotide polymorphisms
      ■  Prove the applicability of in-memory technology for
         real-time analysis of genome data
      ■  Areas of interest: life sciences, crop sciences,
         biology, crime investigation, etc.
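
Two of the tasks named above can be illustrated on toy data. The sketch below is not the project's method: it assumes the sequences are already aligned and of equal length, whereas real pipelines work on sequencing reads with alignment and quality filtering. The function names `find_snps` and `similarity` are hypothetical. A pairwise similarity of this kind could serve as input to a clustering algorithm.

```python
from __future__ import annotations

# Naive illustration on toy data of
# (1) detecting single nucleotide polymorphisms (SNPs) against a reference,
# (2) quantifying the similarity of two samples.
# Assumption: all sequences are pre-aligned and have the same length.


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


def similarity(a: str, b: str) -> float:
    """Return the fraction of positions at which two aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 1.0  # two empty sequences are treated as identical
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    sample_1 = "ACGTTCGTAC"
    sample_2 = "ACCTTCGTAA"
    print("SNPs in sample 1:", find_snps(reference, sample_1))
    print("Similarity of samples 1 and 2:", similarity(sample_1, sample_2))
```

Both functions scan the sequences once and keep everything in main memory, which is the access pattern an in-memory database is meant to speed up at genome scale.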


Your profile


      ■  What we expect
            □  Flexibility in interdisciplinary work
            □  At least one database lecture passed
            □  Experience with one or more of: Python, C++, Bash, SQL




      ■  We provide you with
            □  Introduction to in-memory technology and genomics basics
            □  Technology introductions to one or more of: SQL, SQLScript, L, R,
               BFL



Do not hesitate to contact us!




                                                                    Matthieu-P. Schapranow, M.Sc.
                                                                  schapranow@hpi.uni-potsdam.de
                                                                           http://j.mp/schapranow




                                                                    Hasso Plattner Institute
                                                Enterprise Platform & Integration Concepts
                                                                    Matthieu-P. Schapranow
                                                                      August-Bebel-Str. 88
                                                                  14482 Potsdam, Germany

GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 

Real-time Analysis of Genome Data

  • 1. Bachelor Project: Real-time Analysis of Genome Data
       July 12, 2012
       Matthieu-P. Schapranow, Hasso Plattner Institute, Chair of Prof. Hasso Plattner
  • 2. Numbers you should know: The Human Genome Project
       ■ 1984: Human Genome (HG) project idea discussed at the Alta Summit as “DNA available on the Internet”
       ■ 1990: HG project started in the US, planned to run for 15 years (3 billion USD funding)
       ■ 2000: Rough draft of the HG announced
       ■ 2003: Complete genome sequenced
       ■ 2006: Last and longest chromosome (chr1) sequenced
       ■ As of today, we know:
         □ HG consists of 3.2 Bbp (~3.2 GB),
         □ 23 chromosomes,
         □ 20k-25k distinct genes
       Real-time Analysis of Genome Data, M. Schapranow, July 12, 2012
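As a rough check on the ~3.2 GB figure above: storing one base per byte gives about 3.2 GB for 3.2 billion base pairs, while a 2-bit encoding of the four bases A, C, G, and T would need about a quarter of that. The following sketch is illustrative only; the function name and the use of decimal gigabytes are assumptions, not part of the slides.

```python
# Back-of-the-envelope storage estimate for a genome of a given length.
# Assumption: decimal gigabytes (1 GB = 10**9 bytes).

BASE_PAIRS = 3_200_000_000  # 3.2 Bbp, as stated on the slide


def genome_size_gb(base_pairs: int, bits_per_base: int) -> float:
    """Return the storage needed in decimal gigabytes."""
    total_bits = base_pairs * bits_per_base
    return total_bits / 8 / 10**9


one_byte_per_base = genome_size_gb(BASE_PAIRS, 8)
two_bits_per_base = genome_size_gb(BASE_PAIRS, 2)
print(f"{one_byte_per_base:.1f} GB at one byte per base")
print(f"{two_bits_per_base:.1f} GB at two bits per base")
```

Either way, a single human genome fits comfortably into the main memory of one cluster node.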
  • 3. Numbers you should know: Comparison of Costs for Main Memory and Genome Analysis
       [Chart: costs in USD on a logarithmic axis from 0.01 to 10,000, plotted from January 2001 to January 2012; series: Costs per Megabyte RAM and Costs per Megabase Sequencing]
  • 4. Numbers you should know: Hardware Characteristics
       ■ 1,000-core cluster, 25 TB main memory
       ■ Consists of 25 identical nodes:
         □ 80 cores
         □ 1 TB main memory
         □ Intel® Xeon® E7-4870
         □ 2.40 GHz
         □ 30 MB cache
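The cluster totals can be cross-checked with simple arithmetic: 25 nodes with 1 TB each give the stated 25 TB, while 25 nodes with 80 cores each give 2,000 rather than 1,000. One possible reading, which the slides do not confirm, is that 80 counts hardware threads: the Intel Xeon E7-4870 is a 10-core processor with Hyper-Threading, so a 4-socket node would have 40 physical cores and 80 threads. The sketch below works through that assumption; the socket count is hypothetical.

```python
# Cross-check of the cluster figures on the slide.
NODES = 25
MEMORY_TB_PER_NODE = 1
CORES_PER_NODE_ON_SLIDE = 80

# Assumptions (not stated on the slide): 4 sockets per node,
# 10 physical cores per Xeon E7-4870, 2 hardware threads per core.
SOCKETS_PER_NODE = 4
CORES_PER_SOCKET = 10
THREADS_PER_CORE = 2

total_memory_tb = NODES * MEMORY_TB_PER_NODE
physical_cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
threads_per_node = physical_cores_per_node * THREADS_PER_CORE
total_physical_cores = NODES * physical_cores_per_node
total_threads = NODES * threads_per_node

print("Total main memory (TB):", total_memory_tb)
print("Physical cores in cluster:", total_physical_cores)
print("Hardware threads per node:", threads_per_node)
print("Hardware threads in cluster:", total_threads)
```

Under these assumptions both slide figures are consistent: 1,000 physical cores across the cluster and 80 hardware threads per node.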
  • 5. Aims of the Bachelor’s Project
       ■ Gather interdisciplinary knowledge to work in teams with biological and medical experts
       ■ Explore data from gene, protein, drug, and pathway databases to gain new insights
       ■ Implement algorithms optimized for in-memory technology, e.g. clustering algorithms for quantifying the similarity of samples, or detection of single nucleotide polymorphisms (SNPs)
       ■ Prove the applicability of in-memory technology to real-time analysis of genome data
       ■ Areas of interest: life sciences, crop sciences, biology, crime investigation, etc.
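To make the two example tasks named above concrete, here is a minimal sketch, assuming the sample is already aligned to the reference and has the same length. It reports positions where the sample differs from the reference (a naive stand-in for SNP detection) and a simple similarity score based on the share of matching positions. The function names and toy sequences are hypothetical; real pipelines work on read alignments with quality scores, and nothing here reflects the project's actual in-memory implementation.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are pre-aligned and equally long.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref_base, smp_base)
        for pos, (ref_base, smp_base) in enumerate(zip(reference, sample))
        if ref_base != smp_base
    ]


def similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two equally long sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 1.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


# Toy data, invented for illustration only.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
snps = find_snps(reference, sample)
score = similarity(reference, sample)
print(snps)
print(score)
```

A pairwise similarity score of this kind could serve as the distance input to a clustering algorithm over many samples.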
  • 6. Your profile
       ■ What we expect
         □ Flexibility in interdisciplinary work
         □ At least one passed database lecture
         □ Knowledge of any or all of: Python, C++, Bash, SQL
       ■ We provide you with
         □ Introduction to in-memory technology and genomics basics
         □ Technology introduction to any or all of: SQL, SQLScript, L, R, BFL
  • 7. Do not hesitate to contact us!
       Matthieu-P. Schapranow, M.Sc.
       schapranow@hpi.uni-potsdam.de
       http://j.mp/schapranow
       Hasso Plattner Institute, Enterprise Platform & Integration Concepts
       August-Bebel-Str. 88, 14482 Potsdam, Germany