2. Health & Life Sciences at Intel
Where information and care meet
Personalized Medicine = Big Data Analytics in Health and Life Sciences
3. Life Sciences :: Key Industry Challenges and Solutions
• Challenge: Many (most) applications are single-threaded, single address space
− Solution: Intel is delivering optimizations working with the open source community and developing an NGS+HPC curriculum
• Challenge: Some algorithms scale quadratically with the size of the problem; large data sets exceed available memory and storage
− Solution: Innovations in acceleration, compute, storage, networking, security, and *-as-a-service
• Challenge: International collaboration is an imperative, but bioinformatics expertise is scarce
− Solution: Intel is working closely with the ecosystem to address enterprise-to-cloud transmission of terabyte payloads
• Challenge: Databases are distributed, and data is siloed and will likely stay that way
− Solution: Tools like Hadoop, Lustre, GraphLab, in-memory analytics, security, etc.
Need for Balanced Compute Infrastructure
*Other names and brands may be claimed as the property of others.
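To make "scale quadratically" concrete, consider an all-against-all comparison of n sequences (a hypothetical workload, not one named on the slide). It performs n(n-1)/2 pairwise comparisons, so 10x more data means roughly 100x more work. A minimal sketch of that arithmetic:

```python
def pairwise_comparisons(n: int) -> int:
    """Unordered pairs among n items: n * (n - 1) / 2, which grows as O(n^2)."""
    return n * (n - 1) // 2


# Each 10x increase in the number of sequences costs roughly 100x more comparisons.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} sequences -> {pairwise_comparisons(n):>13,} comparisons")
```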
5. Profiling: Single Instance Run of GATK
GATK: Genome Analysis Toolkit
• # of Machines = 1
• # of cores/Machine = 24
• Temporary storage: RAID 0, 2 x 4 TB HDD
• Input Dataset: G15512.HCC1954.1, coverage: 65x
Average CPU utilization is very low: most cores are not being used.
Average I/O bandwidth is very low: the application is not I/O-bound.
Average memory footprint is small: the application does not use the memory available in newer systems.
There is a lot of room to improve.
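The slide does not say which profiler produced these averages. As a minimal sketch under that caveat, the function below reduces a hypothetical trace of per-core utilization samples to an average utilization and a count of busy cores; the trace format and the 10% "busy" threshold are assumptions for illustration. A single-threaded phase on the 24-core machine above would keep only 1 of 24 cores busy, for an average near 4%.

```python
from statistics import mean


def summarize_cpu(samples: list[list[float]], busy_threshold: float = 0.10):
    """Summarize per-core utilization samples (hypothetical trace format).

    samples[t][c] is the utilization (0.0-1.0) of core c at sampling interval t.
    Returns (average utilization across all cores, number of busy cores, core count).
    """
    n_cores = len(samples[0])
    # Average each core over time, then average across cores.
    per_core = [mean(interval[c] for interval in samples) for c in range(n_cores)]
    busy_cores = sum(1 for u in per_core if u >= busy_threshold)
    return mean(per_core), busy_cores, n_cores


# Hypothetical trace: a single-threaded phase keeps 1 of 24 cores busy.
trace = [[1.0] + [0.0] * 23 for _ in range(10)]
avg, busy, cores = summarize_cpu(trace)
print(f"average utilization {avg:.1%}, {busy}/{cores} cores busy")
```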
6. Improvements in GATK 3.0
• Pair HMM acceleration using Intel® AVX resulted in a 970x speedup
− Computation kernel and bottleneck in the GATK Haplotype Caller
− AVX enables 8 single-precision floating-point SIMD operations in parallel
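As a sketch of what the Pair HMM kernel computes, the following is a simplified, textbook-style pair-HMM forward algorithm that returns P(read | haplotype) summed over all alignments. It is not GATK's implementation: the constant gap-open and gap-extension probabilities, the uniform start over haplotype positions, and the function names are assumptions for illustration. Each cell depends only on its upper, left, and upper-left neighbors, so cells on the same anti-diagonal are independent of each other; that independence is one common way such recurrences are mapped onto SIMD lanes, such as the 8 single-precision lanes of a 256-bit AVX register.

```python
def phred_to_error(q: int) -> float:
    """Convert a Phred-scaled base quality into an error probability."""
    return 10.0 ** (-q / 10.0)


def pair_hmm_likelihood(read: str, hap: str, quals: list[int],
                        gap_open: float = 1e-4, gap_ext: float = 0.1) -> float:
    """Simplified pair-HMM forward algorithm: P(read | haplotype) over all alignments."""
    n, m = len(read), len(hap)
    mm = 1.0 - 2.0 * gap_open  # match -> match (the rest opens an insertion or deletion)
    gm = 1.0 - gap_ext         # gap -> match (the rest extends the gap)
    # Forward matrices for the match, insertion and deletion states.
    match_f = [[0.0] * (m + 1) for _ in range(n + 1)]
    ins_f = [[0.0] * (m + 1) for _ in range(n + 1)]
    del_f = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Uniform start: the read may begin at any haplotype position.
    for j in range(m + 1):
        del_f[0][j] = 1.0 / m
    for i in range(1, n + 1):
        err = phred_to_error(quals[i - 1])
        for j in range(1, m + 1):
            same = read[i - 1] == hap[j - 1] or read[i - 1] == "N"
            emit = (1.0 - err) if same else err / 3.0
            match_f[i][j] = emit * (mm * match_f[i - 1][j - 1]
                                    + gm * (ins_f[i - 1][j - 1] + del_f[i - 1][j - 1]))
            ins_f[i][j] = gap_open * match_f[i - 1][j] + gap_ext * ins_f[i - 1][j]
            del_f[i][j] = gap_open * match_f[i][j - 1] + gap_ext * del_f[i][j - 1]
    # The whole read must be consumed; it may end at any haplotype position.
    return sum(match_f[n][j] + ins_f[n][j] for j in range(1, m + 1))


quals = [30] * 4
print(pair_hmm_likelihood("ACGT", "ACGT", quals))  # exact match
print(pair_hmm_likelihood("ACCT", "ACGT", quals))  # one mismatch
```

The double loop visits cells row by row for readability; a vectorized version would instead sweep anti-diagonals and fill several independent cells per instruction.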
7. Applications and Workloads Optimized on Intel Architecture
• Focus on improving genomics, molecular dynamics pipelines
• Optimize individual applications (node and cluster); work with code authors to release optimizations
Domain: Genomics
Application | Intel® Architecture Target
Bowtie 1*, Bowtie 2* | Xeon® processor
BWA* | Xeon® processor
BLAST* | Xeon® processor
GATK* | Xeon® processor
HMMER* | Xeon® processor, Xeon® Phi™ coprocessor
Abyss* | Xeon® processor
Velvet* | Xeon® processor
Domain: Molecular Dynamics/Chemistry
Applications | Intel® Architecture Targets
AMBER*, NAMD*, GROMACS*, GAMESS*, Quantum Espresso*, Gaussian*, VASP*, CP2K*, QBOX*, CPMD*, LAMMPS* | Xeon® processor, Xeon® Phi™ coprocessor
8.
• Challenge: Ayasdi Cure™ analyzes highly
complex, large data sets and relies on fast
computation times to provide real-time
output.
• Solution:
− Intel® AVX instructions: four double-precision floating-point operations in parallel vs. one
− Intel® MKL library: accelerates filter computations
• Benefits: 400% performance increase in
distance computation.
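The "four double-precision operations in parallel" figure follows from register width: a 256-bit AVX register holds 256 / 64 = 4 doubles. The sketch below is a plain-Python illustration, not Ayasdi's code: it builds a pairwise Euclidean distance matrix and splits each squared-distance sum across four accumulators, mirroring how a vectorized loop keeps one partial sum per SIMD lane before a final horizontal add. Python does not issue AVX instructions here, and the function names are hypothetical.

```python
import math

AVX_REGISTER_BITS = 256
DOUBLE_BITS = 64
LANES = AVX_REGISTER_BITS // DOUBLE_BITS  # 4 double-precision values per AVX register


def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance using one partial sum per (emulated) SIMD lane."""
    acc = [0.0] * LANES
    for k, (x, y) in enumerate(zip(a, b)):
        d = x - y
        acc[k % LANES] += d * d
    return sum(acc)  # horizontal reduction of the lane accumulators


def distance_matrix(points: list[list[float]]) -> list[list[float]]:
    """Symmetric matrix of pairwise Euclidean distances (O(n^2) pairs)."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.sqrt(squared_distance(points[i], points[j]))
    return dist


print(distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]))
```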
Scripps DNA Sequencing Pipeline
• Challenge: Processing times, Logistical Delays,
Cluster complexity
• Solution: Intel® Xeon® E7-4800 series using SSDs
• Benefits: ~4x improvement in processing times
9. Ultra High-Speed Networking Optimizations
• Challenge: Improving big data transfer to and
from the backend data center
• Solution:
− Optimize ultra high-speed (10+ Gbps) data
transfer solutions built on Aspera’s FASP ™
technology
− Intel® Xeon® E5-2600 (DDIO, SR-IOV)
• Benefits:
− 300% improvement in transfer throughput
− Physical or virtual, LAN or WAN – same transfer
speeds
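For a sense of scale for the terabyte payloads mentioned earlier, the ideal transfer time is the payload in bits divided by the effective link rate. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and a hypothetical `efficiency` factor for protocol and disk overhead; it is not a model of FASP itself. At a full 10 Gbps, 1 TB takes 800 seconds, a little over 13 minutes.

```python
def transfer_time_seconds(payload_bytes: float, link_gbps: float,
                          efficiency: float = 1.0) -> float:
    """Ideal transfer time: bits to move divided by effective bits per second.

    efficiency is a hypothetical fraction (0-1] of line rate actually achieved.
    """
    bits = payload_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


one_tb = 1e12  # decimal terabyte
print(f"{transfer_time_seconds(one_tb, 10) / 60:.1f} min at full 10 Gbps line rate")
print(f"{transfer_time_seconds(one_tb, 10, 0.5) / 60:.1f} min at 50% efficiency")
```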
High Performance Scale-out Storage
• Challenge: 10-15 TB of data added weekly, a small fraction of overall storage capacity; need a system that scales and is flexible and efficient
• Solution: HPC-class storage, powered by Intel®
Enterprise Edition for Lustre* software
• Benefits:
− Openness, global namespace
− Performance upwards of 1 TB/s
− Virtually unlimited file system and per file
sizes, and management simplicity
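The 10-15 TB weekly figure compounds quickly: over a 52-week year it adds 520-780 TB. A minimal sketch of that arithmetic, plus the number of weeks until a given capacity is reached at a constant growth rate; the 1,000 TB capacity in the usage lines is a hypothetical value, not one from the slide.

```python
import math


def yearly_growth_tb(weekly_growth_tb: float, weeks_per_year: int = 52) -> float:
    """Data added per year at a constant weekly growth rate."""
    return weekly_growth_tb * weeks_per_year


def weeks_until_full(capacity_tb: float, used_tb: float, weekly_growth_tb: float) -> int:
    """Whole weeks until used storage reaches capacity at a constant growth rate."""
    remaining_tb = max(capacity_tb - used_tb, 0.0)
    return math.ceil(remaining_tb / weekly_growth_tb)


for weekly in (10, 15):
    print(f"{weekly} TB/week -> {yearly_growth_tb(weekly):.0f} TB/year, "
          f"{weeks_until_full(1000, 0, weekly)} weeks to fill a hypothetical 1,000 TB")
```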
10. HPC Appliances for Life Sciences
• Challenge: Experiment processing takes 7 days with current infrastructure, delaying treatment for sick patients
• Solution: Dell Next Generation Sequencing Appliance
− Single-rack solution: 9 teraflops, Lustre file storage, Intel software tools
• Benefits: RNA-Seq processing reduced to 4 hours
• Includes everything you need for NGS - compute, storage, software, networking,
infrastructure, installation, deployment, training, service & support
[Figure: Dell NGS appliance rack layout. Dell HSS (Lustre, up to 360TB): HSS metadata pair, HSS OSS pair, HSS user data. Dell NSS (NFS, up to 180TB): NSS-HA pair, NSS user data. Compute: M420 (up to 32 nodes), 2U plenum. Infrastructure: Dell PE, PC & F10. Actual placement in racks may vary.]
** 2-socket Intel® Xeon® CPU E5-2687W, 3.1 GHz
11. Genomics & Clinical Analytics Appliances
12. Genomics Data Discovery using Hadoop
Value
• Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
• 90% gain in throughput; 6x data compression
Analytics
• Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
• Provide APIs for applications to combine and analyze public and private data sets
• Hive + Hadoop for query/search; Intel® Xeon® + 10 GbE
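On classic Hadoop, Hive compiles a HiveQL query into MapReduce jobs. For a hypothetical table `variants(sample_id, gene)`, a query such as `SELECT gene, COUNT(*) FROM variants GROUP BY gene` becomes a map step that emits one key per record, a shuffle that groups records by key, and a reduce step that aggregates each group. The plain-Python sketch below emulates those three phases in a single process; it does not use Hadoop or Hive APIs, and the records are made up.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(records: Iterable[tuple[str, str]]) -> Iterator[tuple[str, int]]:
    """Emit (gene, 1) for each hypothetical (sample_id, gene) variant record."""
    for _sample_id, gene in records:
        yield gene, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Aggregate each key's values: the COUNT(*) ... GROUP BY gene step."""
    return {key: sum(values) for key, values in grouped.items()}


records = [("s1", "TP53"), ("s2", "TP53"), ("s1", "BRCA1")]
print(reduce_phase(shuffle(map_phase(records))))
```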
13. Charite “Real-time” Cancer Analysis – Matching proper therapies to patients using in-memory techniques
• Challenge: Real-time analysis of cancer patients using the in-memory SAP HANA Oncolyzer database running on Intel® Xeon® family infrastructure (3.5M data points per patient, up to 20 TB of data per patient)
• Solution: Collecting and analyzing tables from structured and unstructured data used to take up to two days; now it takes seconds
• Benefits: Improves medical quality in a disruptive way for patients, doctors, hospitals, and research
http://moss.ger.ith.intel.com/sites/SAP/SAP%20account%20team%20documents/Marketing/SAP%20HANA/SAPHANA_Charite_case_study_HI.PDF
14. High Throughput Science: Embracing Cloud-based Analytics for Computational Chemistry Simulation
• Challenge: Sustaining 50,000+ compute cores for large-scale simulations lasting less than a week; CapEx vs. OpEx
• Solution: Novartis leveraged software from AWS partner Cycle Computing and from MolSoft to provision a fully secured cluster of 30,000 CPUs, powered by the Intel® Xeon® processor E5 family.
− Completed screening of 3.2 million compounds in approximately 9 hours, compared to 4-14 days on existing resources.
Powerof60.com
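Taking the slide's figures at face value, the screening rate and the speedup over existing resources follow from simple arithmetic: 3.2 million compounds in about 9 hours is roughly 356,000 compounds per hour, and 4-14 days (96-336 hours) versus 9 hours is roughly a 10.7x-37.3x reduction in wall-clock time. Because the 9-hour figure is approximate, so are the results. A minimal sketch:

```python
def screening_stats(compounds: int, new_hours: float,
                    old_days: tuple[float, float]) -> tuple[float, tuple[float, ...]]:
    """Compounds per hour on the new run, and wall-clock speedup vs. each old duration."""
    per_hour = compounds / new_hours
    speedups = tuple(days * 24 / new_hours for days in old_days)
    return per_hour, speedups


per_hour, (low, high) = screening_stats(3_200_000, 9, (4, 14))
print(f"~{per_hour:,.0f} compounds/hour; speedup {low:.1f}x to {high:.1f}x")
```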
15. Regional Health Information Network, RHIN
Regional Health Information Network, RHIN
China (Jinzhou, Pop 3M)
• Challenge: RHIN faces challenges with scalability, performance, and maintenance; data storage is expensive
• Solution: EMR data and healthcare
services running on Intel Hadoop
Distribution and Xeon E5 servers.
• Benefits: High performance and
scalability demonstrated via POC and stress
testing. Significantly reduced storage cost
• 1/5 Reduction in Response Time; 5x
Concurrent Users
[Figure: Data processing flow of RHIN platform]
http://hadoop.intel.com/pdfs/IntelChinaHealthyCityAnalyticsCaseStudy.pdf
16. Policy – United States, European Union
Snapshot of US, EU Recommendations
Develop an ICT-enabled European Strategy for Personalised Medicine, 2014-2020
Driving research to unleash the potential of ICT at the point of care
EU R&D initiatives must address:
• Interoperability of technical standards for managing and sharing sequence data in research and clinical samples;
• Development of hardware, software and workflow algorithms to accelerate cost-efficient analysis of genetic abnormalities that cause cancer and other complex diseases;
• Research to ensure convergence of Big Data and Cloud Computing infrastructure to meet the requirements of High Performance Computing and data throughout the life sciences and healthcare value chains.
The eHealth Action Plan 2020 should include Personalised Medicine as a priority:
• Gain knowledge of the challenges and barriers (technical, organizational, legal and political) to the adoption of ICT in support of Personalised Medicine leveraged by genomic information;
• Evaluate how to change workflows and education requirements to facilitate adoption of ICT-mediated personalised medicine in clinical practice;
• Expand collaboration with other regions of the world in matters of common interest, e.g. by leveraging the eHealth MoU with the United States of America;
• Study, evaluate and disseminate technology-neutral risk assessment frameworks for data privacy and security, covering the entire ICT-enabled Personalised Medicine delivery chain;
• Develop effective methods for enabling the use of medical information for public health and research.
17. Let us all make Personalized Medicine mainstream by 2020.
You focus on the science; let us focus on “IT”.
• www.intel.com/healthcare/bigdata
• Ketan.Paranjape@intel.com