Real-time Analysis of Genome Data

Published in: Technology

  1. Bachelor Project: Real-time Analysis of Genome Data. July 12, 2012. Matthieu-P. Schapranow, Hasso Plattner Institute, Chair of Prof. Hasso Plattner
  2. Numbers you should know: The Human Genome Project
     ■ 1984: Human Genome (HG) project idea discussed at the Alta Summit as “DNA available on the Internet”
     ■ 1990: HG project started in the US, planned for 15 years (3 billion USD funding)
     ■ 2000: Rough draft of the HG announced
     ■ 2003: Complete genome sequenced
     ■ 2006: Chromosome 1 (chr1), the last and longest, sequenced
     ■ As of today, we know:
       □ The HG consists of 3.2 billion base pairs (Bbp), about 3.2 GB at one byte per base
       □ 23 chromosomes
       □ 20k-25k distinct genes
     Real-time Analysis of Genome Data, M. Schapranow, July 12, 2012
  3. Numbers you should know: Comparison of Costs for Main Memory and Genome Analysis
     [Figure: costs in USD on a logarithmic axis (0.01 to 10,000), plotted from January 2001 to January 2012, for two series: costs per megabyte RAM and costs per megabase sequencing]
  4. Numbers you should know: Hardware Characteristics
     ■ 1,000 core cluster, 25 TB main memory
     ■ Consists of 25 identical nodes:
       □ 80 cores
       □ 1 TB main memory
       □ Intel® Xeon® E7-4870
       □ 2.40 GHz
       □ 30 MB cache
  5. Aims of the Bachelor’s Project
     ■ Gather interdisciplinary knowledge to work in teams with biological and medical experts
     ■ Explore data from gene, protein, drug, and pathway databases to gain new insights
     ■ Implement algorithms optimized for in-memory technology, e.g. clustering algorithms for quantifying the similarity of samples, or detection of single nucleotide polymorphisms
     ■ Prove the applicability of in-memory technology for real-time analysis of genome data
     ■ Areas of interest: life sciences, crop sciences, biology, crime investigation, etc.
  6. Your profile
     ■ What we expect
       □ Flexibility in interdisciplinary work
       □ At least one passed database lecture
       □ Experience with one or more of: Python, C++, Bash, SQL
     ■ We provide you with
       □ Introduction to in-memory technology and genomics basics
       □ Technology introduction to one or more of: SQL, SQLScript, L, R, BFL
  7. Do not hesitate to contact us!
     Matthieu-P. Schapranow, M.Sc.
     schapranow@hpi.uni-potsdam.de
     http://j.mp/schapranow
     Hasso Plattner Institute
     Enterprise Platform & Integration Concepts
     Matthieu-P. Schapranow
     August-Bebel-Str. 88
     14482 Potsdam, Germany
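
The figures on the "Numbers you should know" slides can be cross-checked with a little arithmetic. The sketch below is not part of the original deck. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and two illustrative encodings, one byte per base and a packed two bits per base. All variable and function names are hypothetical. Note that 25 nodes with 80 cores each multiply out to 2,000, while the slide headline says 1,000 cores; the slides do not say whether the 80 counts physical cores or hardware threads.

```python
# Back-of-the-envelope check of the numbers on slides 2 and 4.
# Assumptions (not from the slides): decimal units, and the two
# encodings below (8 bits per base, or a packed 2 bits per base).

BASE_PAIRS = 3_200_000_000   # 3.2 Bbp, slide 2
NODES = 25                   # slide 4
CORES_PER_NODE = 80          # slide 4
RAM_PER_NODE_TB = 1          # slide 4


def genome_size_gb(base_pairs: int, bits_per_base: int) -> float:
    """Storage for one genome in decimal gigabytes."""
    return base_pairs * bits_per_base / 8 / 1e9


one_byte_gb = genome_size_gb(BASE_PAIRS, 8)   # one character per base
two_bit_gb = genome_size_gb(BASE_PAIRS, 2)    # A, C, G, T packed into 2 bits

total_cores = NODES * CORES_PER_NODE
total_ram_tb = NODES * RAM_PER_NODE_TB

# Whole genomes (one byte per base) that fit into the cluster's main memory.
genomes_in_ram = (total_ram_tb * 10**12) // BASE_PAIRS

print(f"one genome, 1 byte per base: {one_byte_gb:.1f} GB")
print(f"one genome, 2 bits per base: {two_bit_gb:.1f} GB")
print(f"cores from node count: {total_cores}")
print(f"main memory: {total_ram_tb} TB")
print(f"genomes fitting in main memory: {genomes_in_ram}")
```

Under these assumptions, a single uncompressed genome occupies well under one percent of one node's main memory, which is the premise behind keeping genome data entirely in memory.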

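
The aims slide names detection of single nucleotide polymorphisms (SNPs) and quantifying the similarity of samples as example algorithms. The deck does not describe how the project implements either, so the following is only a minimal illustration of the idea, not the project's method. It assumes the sample is already aligned to the reference with equal length, and it ignores insertions, deletions, and read quality. The function names `find_snps` and `identity` and the two toy sequences are hypothetical.

```python
# Naive SNP detection on two pre-aligned sequences of equal length.
# Illustration only: real pipelines work on aligned reads with quality
# scores and must handle insertions and deletions as well.

def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, smp_base)
        for pos, (ref_base, smp_base) in enumerate(zip(reference, sample))
        if ref_base != smp_base
    ]


def identity(reference: str, sample: str) -> float:
    """Fraction of positions at which both sequences agree."""
    if not reference:
        raise ValueError("sequences must not be empty")
    return 1 - len(find_snps(reference, sample)) / len(reference)


# Toy data, made up for this example.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

snps = find_snps(reference, sample)
similarity = identity(reference, sample)

for pos, ref_base, smp_base in snps:
    print(f"position {pos}: {ref_base} -> {smp_base}")
print(f"identity: {similarity:.2f}")
```

The position-wise comparison is independent for every base, so it parallelizes naturally across cores; that property is what makes this kind of scan a plausible fit for a large in-memory cluster.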