41. Cost per Genome
https://www.genome.gov/images/content/costpergenome2015_4.jpg
Next-generation sequencing (NGS) debuted
Illumina HiSeq X10 debuted
Human Genome Project (HGP) completed
Precision Medicine Initiative announced
43. The First $1,000 Genome
http://systems.illumina.com/systems/hiseq-x-sequencing-system.html
44. Expectation of Data Processing
Computing power for the Illumina HiSeq X Ten
• A cluster of 10 HiSeq X instruments
• Capable of sequencing up to 18,000 whole human genomes each year
• Has a run cycle of ~3 days and produces ~150 genomes each run cycle
• Running the industry-standard BWA+GATK analysis pipeline on a reasonably
high-end compute server (dual Intel Xeon E5-2697 v2 CPUs, 12-core, 2.7 GHz,
with 96 GB DRAM) takes ~24 hours per genome.
• To sustain the required throughput of 150 genomes every three days, at least
50 of these servers are required.
• Equivalently, analysis must finish one genome every ~28 minutes on average
(3 days ÷ 150 genomes), covering mapping, aligning, sorting, de-duplication,
and variant calling.
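The figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the inputs are the slide's own numbers (150 genomes per run, a 3-day run cycle, ~24 hours per genome per server), and it assumes each server works on one genome at a time with no idle time.

```python
# Throughput arithmetic for a HiSeq X Ten cluster, using the slide's figures.
# All function names are illustrative, not part of any tool.
import math

GENOMES_PER_RUN = 150     # genomes produced per run cycle (from the slide)
RUN_CYCLE_DAYS = 3        # length of one run cycle in days (from the slide)
HOURS_PER_GENOME = 24     # BWA+GATK on one server (from the slide)


def servers_needed(genomes_per_run, run_cycle_days, hours_per_genome):
    """Servers required so analysis keeps pace with sequencing.

    Assumes one genome per server at a time and perfect utilization.
    """
    genomes_per_server = (run_cycle_days * 24) / hours_per_genome
    return math.ceil(genomes_per_run / genomes_per_server)


def minutes_per_genome_budget(genomes_per_run, run_cycle_days):
    """Average time available per genome if the whole run is analyzed in one cycle."""
    return run_cycle_days * 24 * 60 / genomes_per_run


def genomes_per_year(genomes_per_run, run_cycle_days, days_per_year=365):
    """Yearly output assuming back-to-back run cycles."""
    return genomes_per_run * days_per_year / run_cycle_days


if __name__ == "__main__":
    print(servers_needed(GENOMES_PER_RUN, RUN_CYCLE_DAYS, HOURS_PER_GENOME))
    print(minutes_per_genome_budget(GENOMES_PER_RUN, RUN_CYCLE_DAYS))
    print(genomes_per_year(GENOMES_PER_RUN, RUN_CYCLE_DAYS))
```

Under these assumptions the results agree with the slide: 50 servers, a budget of just under 29 minutes per genome, and roughly 18,000 genomes per year.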
45. Next-Generation Sequencing (NGS) 101
https://www.broadinstitute.org/gatk/img/cartoon-blackbox-workflow-web-blackblue.png
Wet Lab Dry Lab
46. Sequencing Error
Dr. Watson
Co-discoverer of the structure of DNA in 1953
< 0.1%
~ 1 %
Chimp
Closest species to humans
Sequencing Error = ~1%
Dr. Su
Cofounder of Atgenomix in 2015
~ 0.1%
54. Scale-Up vs. Scale-Out
Vertical Scaling (Bigger Nodes)
More expensive server (big memory, many CPU cores)
Horizontal Scaling (More Nodes)
Many commodity nodes
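The scale-out idea can be sketched in miniature: split the input into chunks, process each chunk independently, then merge the partial results. The example below is a toy under stated assumptions, not a genomics pipeline. Threads stand in for commodity nodes, counting bases stands in for real per-chunk work, and every function name is hypothetical.

```python
# Toy scale-out sketch: partition the work, process partitions independently,
# merge the partial results. Threads stand in for separate commodity nodes.
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_bases(reads):
    """Per-chunk work: count how often each base occurs in the given reads."""
    counts = {}
    for read in reads:
        for base in read:
            counts[base] = counts.get(base, 0) + 1
    return counts


def merge_counts(partials):
    """Combine the per-chunk dictionaries into one total."""
    total = {}
    for partial in partials:
        for base, n in partial.items():
            total[base] = total.get(base, 0) + n
    return total


def scale_out_count(reads, n_workers):
    """Distribute chunks over n_workers workers and merge their results."""
    chunks = split_into_chunks(reads, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_bases, chunks))
    return merge_counts(partials)


if __name__ == "__main__":
    reads = ["ACGT", "AACC", "GGTT", "ACGT", "TTTT"]
    print(scale_out_count(reads, n_workers=3))
```

Adding workers changes only how the work is partitioned, not the result, which is the property that lets horizontal scaling add nodes instead of buying a bigger server.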
55. Hadoop – HDFS, Spark, YARN
https://www.tutorialspoint.com/hadoop/hadoop_introduction.htm