Intel life sciences_personalizedmedicine_stanford biomed 052214 dist
Upcoming SlideShare
Loading in...5
×
 

Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

on

  • 234 views

Talk at Stanford Big Data in Biomedicine Conference by Ketan Paranjape, WW Director Health and Life Sciences

Talk at Stanford Big Data in Biomedicine Conference by Ketan Paranjape, WW Director Health and Life Sciences

Statistics

Views

Total Views
234
Views on SlideShare
232
Embed Views
2

Actions

Likes
0
Downloads
9
Comments
0

1 Embed 2

http://www.slideee.com 2

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Intel life sciences_personalizedmedicine_stanford biomed 052214 dist Intel life sciences_personalizedmedicine_stanford biomed 052214 dist Presentation Transcript

  • Personalized Medicine BigData“IT”inHealthandLifeSciences KetanParanjape WorldwideDirector,HealthandLifeSciences IntelCorp. 1 www.intel.com/healthcare/bigdata
  • Health & Life Sciences at Intel Where information and care meet Personalized Medicine = Big Data Analytics in Health and Life Sciences
  • Health & Life Sciences at Intel Where information and care meet Life Sciences :: Key Industry Challenges and Solutions • Many (most) applications are single-threaded, single address space Intel is delivering optimizations working with open source community, developing NGS+HPC curriculum • Some algorithms scale quadratically with the size of the problem. Large data sets exceed available memory and storage Innovations in acceleration, compute, storage, networking, security, and *-as-a-service. • International collaboration is an imperative, bioinformatics expertise is scarce • Intel is working closely with the ecosystem to address enterprise to cloud transmission of terabyte payloads • Databases are distributed, data is siloed and will likely stay that way Tools like Hadoop, Lustre, Graphlab, In-Memory Analytics, Security etc. Need for Balanced Compute Infrastructure *Other names and brands may be claimed as the property of others.
  • Recent Collaborations 4
  • Health & Life Sciences at Intel Where information and care meet Profiling: Single Instance Run of GATK GATK: Genome Analysis Toolkit • # of Machines = 1 • # of cores/Machine = 24 • Temporary Storage – RAID0 2x4TB HDD • Input Dataset: G15512.HCC1954.1, coverage: 65x Average CPU utilization is very low. Most cores not being used Average I/O bandwidth is very low. Application not I/O bound Average memory footprint is small. Application not using memory available in newer systems There is a lot of room to improve*Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Improvements in GATK 3.0 • Pair HMM Acceleration using Intel® AVX resulted in 970x speedup − Computation kernel and bottleneck in GATK Haplotype Caller − AVX enables 8 floating point SIMD operations in parallel 6 *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Applications and Workloads Optimized on Intel Architecture • Focus on improving genomics, molecular dynamics pipelines • Optimize individual applications (node and cluster); Work with code authors to release optimizations DOMAIN Applications Intel® Architecture Target Genomics Bowtie 1*, Bowtie 2* Xeon® processor BWA* Xeon® processor BLAST* Xeon® processor GATK* Xeon® processor HMMER* Xeon® processor Xeon® Phi™ coprocessor Abyss* Xeon® processor Velvet* Xeon® processor *Other names and brands may be claimed as the property of others. DOMAIN Applications Intel® Architecture Targets Molecular Dynamics/ Chemistry AMBER* Xeon® processor Xeon® Phi™ coprocessor NAMD* GROMACS* GAMESS* Quantum Espresso* Gaussian* VASP* CP2K* QBOX* CPMD* LAMMPS*
  • Health & Life Sciences at Intel Where information and care meet • Challenge: Ayasdi Cure™ analyzes highly complex, large data sets and relies on fast computation times to provide real-time output. • Solution: − Intel® AVX instructions - four double-precision floating-point operations in parallel vs. one. − Intel® MKL library - accelerate filter computations • Benefits: 400% performance increase in distance computation. Scripps DNA Sequencing Pipeline • Challenge: Processing times, Logistical Delays, Cluster complexity • Solution: Intel® Xeon® E7-4800 series using SSDs • Benefits: ~4x Improvement on processing times 8 4x *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Ultra High-Speed Networking Optimizations • Challenge: Improving big data transfer to and from the backend data center • Solution: − Optimize ultra high-speed (10+ Gbps) data transfer solutions built on Aspera’s FASP ™ technology − Intel® Xeon® E5-2600 (DDIO, SR-IOV) • Benefits: − 300% improvement in transfer throughput − Physical or virtual, LAN or WAN – same transfer speeds High Performance Scale-out Storage Challenge: • Challenge: 10-15TB data added weekly, small fraction of overall storage capacity and need a system to scale, be flexible and efficient • Solution: HPC-class storage, powered by Intel® Enterprise Edition for Lustre* software • Benefits: − Openess, global namespace − Performance of upwards of 1 TB/s − Virtually unlimited file system and per file sizes, and management simplicity 9 *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet HPC Appliances for Life Sciences • Challenge: Experiment processing takes 7 days with current infrastructure. Delays treatment for sick patients • Solution: Dell Next Generation Sequencing Appliance − Single Rack Solution; 9 Teraflops, Lustre File Storage; Intel SW tools • Benefits: RNA-Seq processing reduced to 4 hour • Includes everything you need for NGS - compute, storage, software, networking, infrastructure, installation, deployment, training, service & support Dell HSS (Lustre) (up to 360TB) Dell NSS (NFS) (up to 180TB) Infrastructure: Dell PE, PC & F10 M420 (Compute) (up to 32 nodes) 2U Plenum Actual placement in racks may vary. NSS-HA Pair NSS User Data HSS Metadata Pair HSS OSS Pair HSS User Data ** 2-socket Intel(R) Xeon(R) CPU E5-2687W / 3.1 GHz *Other names and brands may be claimed as the property of others. *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Genomics & Clinical Analytics Appliances 11 2U Plenum Actual placement in racks may vary. NSS-HA Pair NSS User Data HSS Metadata Pair HSS OSS Pair HSS User Data *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Value • Enable researchers to discover biomarkers and drug targets by correlating genomic data sets • 90% gain in throughput; 6X data compression Analytics • Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers) • Provide APIs for applications to combine and analyze public and private data sets • Hive + Hadoop for query/search; Intel® Xeon® + 10 GbE Genomics Data Discovery using Hadoop *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Charite “Real-time” Cancer Analysis – Matching proper therapies to patients using in-memory techniques • Challenge: Real-time analysis of cancer patients using in-memory SAP HANA Oncolyzer database running on Intel® Xeon® family infrastructure. (3.5M Data points per Patient, Up to 20 TB of data/patient) • Solution: Using structured and unstructured data to collect and analyze tables used to take up to two days -- now takes seconds • Benefits: Improves medical quality in disruptive way for Patient, Doctor, Hospital, Research http://moss.ger.ith.intel.com/sites/SAP/SAP%20account%20team%20documents/Marketing/SAP%20HANA/SAPHANA_Charite_case_study_HI.PDF *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet High Throughput Science: Embracing Cloud-based Analytics for Computational Chemistry Simulation • Challenge: Sustaining 50000+ compute cores for large scale simulations, for less than a week; CapEX v. OpX • Solution: Novartis leveraged software from AWS partner, Cycle Computing, and MolSoft to provision a fully secured cluster of 30,000 CPUs, powered by the Intel® Xeon® processor E5 family. − Completed screening of 3.2 million compounds in approximately 9 hrs, compared to 4 -14 days on existing resources. Powerof60.com *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Regional Health Information Network, RHIN China (Jinzhou, Pop 3M) • Challenge: RHIN has challenges with scalability, performance and maintenance. Data storage is expensive • Solution: EMR data and healthcare services running on Intel Hadoop Distribution and Xeon E5 servers. • Benefits: High performance and scalability demonstrated via POC and stress testing. Significantly reduced storage cost • 1/5 Reduction in Response Time; 5x Concurrent Users Data processing flow of RHIN platform http://hadoop.intel.com/pdfs/IntelChinaHealthyCityAnalyticsCaseStudy.pdf *Other names and brands may be claimed as the property of others.
  • Health & Life Sciences at Intel Where information and care meet Policy – United States, European Union Snapshot of US, EU Recommendations Develop an ICT-enabled European Strategy for Personalised Medicine 2014-2020 Driving research to unleash the potential of ICT at the point-of-care EU R&D initiatives must address:  Interoperability of technical standards for managing and sharing sequence data in research and clinical samples;  Development of hardware, software and workflow algorithms to accelerate cost efficient analysis of genetic abnormalities that cause cancer and other complex diseases;  Research to ensure convergence of Big Data and Cloud Computing infrastructure to meet the requirements of High Performance Computing and data throughout the life sciences and healthcare value chains The eHealth Action Plan 2020 should include Personalised Medicine as a priority  Gain knowledge of the challenges and barriers (technical, organizational, legal and political) to the adoption of ICT in support of Personalised Medicine leveraged by genomic information;  Evaluate how to change workflows and education requirements to facilitate adoption of ICT mediated personalized medicine in clinical practice;  Expand collaboration with other regions of the world in matters of common interest, e.g. by leveraging the eHealth MoU with the United States of America;  Study, evaluate and disseminate technology neutral risk assessment frameworks for data privacy and security, covering the entire ICT enabled Personalised Medicine delivery chain;  Develop effective methods for enabling the use of medical information for public health and research
  • Health & Life Sciences at Intel Where information and care meet Let us all make Personalized Medicine mainstream by 2020 .. ..You focus on the Science, let us focus on “IT” • www.intel.com/healthcare/bigdata • Ketan.Paranjape@intel.com