Biology is eating the world & AI is eating Biology
Pradeep K Dubey
Intel Senior Fellow, IEEE Fellow
Director, Parallel Computing Labs
Intel All.AI 2021 @ Population Scale Virtual Summit 2
Machines: Crunch Numbers
Humans: Make Decisions
Division of Labor Between Man and Machine Is Getting Disrupted: Faster than Anyone Predicted!
Machines: Number Crunching AND Decision Making
FROM a world of analytical models TO a world of data-driven models

Analytical models (Inside-Out)
• Computational Fluid Dynamics
• Start with a mathematical model
• Model → Simulate → Predict

Data-driven models (Outside-In)
• Event Detection from Social Media
• Start with data
• Initial State → Increment → Steer
What makes AI effective in practice
• Effectiveness of AI relies on how well the model structure matches the underlying invariant (structure) of the high-dimensional task objective
• A good set of implicit or explicit inductive biases that incorporate domain knowledge
• For example, CNNs for vision, attention networks for NLP, or emerging GNNs
• Training time: how well we manage exploitation versus exploration to reach the most generalizable (flatter) minima
• Avoiding the typical solver attraction to sharp minima
• Higher-order methods
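The distinction between flat and sharp minima can be made concrete with a toy example. The sketch below is illustrative only and is not taken from the talk: the two-basin loss, the function names, and the perturbation radius `eps` are all assumptions. It measures sharpness as the worst-case loss increase within a small neighbourhood of a minimum, which is one common informal proxy.

```python
def sharp_basin(w):
    # Narrow, high-curvature basin centred at w = -1 (assumed toy shape).
    return 50.0 * (w + 1.0) ** 2


def flat_basin(w):
    # Wide, low-curvature basin centred at w = +1 (assumed toy shape).
    return 0.5 * (w - 1.0) ** 2


def loss(w):
    # Toy 1-D loss with two minima of equal depth (both reach 0).
    return min(sharp_basin(w), flat_basin(w))


def sharpness(w_star, eps=0.1, steps=101):
    # Worst-case loss increase over the interval [w_star - eps, w_star + eps].
    base = loss(w_star)
    worst = 0.0
    for i in range(steps):
        delta = -eps + 2.0 * eps * i / (steps - 1)
        worst = max(worst, loss(w_star + delta) - base)
    return worst


sharp_score = sharpness(-1.0)
flat_score = sharpness(1.0)
print(f"sharp minimum: {sharp_score:.4f}, flat minimum: {flat_score:.4f}")
```

Both minima have the same training loss, but a small shift in the parameter costs far more at the sharp one, which is the intuition behind preferring flatter minima for generalization.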
Better understanding of interiors and evolution of red giant stars
Accurately extract seismic parameters from 1000 spectra in under 10 secs
"Measuring the frequency separation ∆ν and period separation ∆Π in red-giant stars using Machine learning", under submission at Science Advances
Department of Astronomy and Astrophysics, Tata Institute of Fundamental; Center for Space Science, NYUAD Institute, New York University Abu Dhabi; Division of Solar and Plasma Astrophysics, NAOJ, Mitaka, Tokyo, Japan; Parallel Computing Lab, Intel Labs, Bangalore, India
Convergence of Revolutions
• Advances in cell biology and the creation of an immense amount of data
• Advances in ML to analyze large-scale data and leverage it to make predictions
Daphne Koller*: https://www.youtube.com/watch?v=V6bSlPNwrKo&feature=youtu.be
AI is Eating Biology
Biology is experiencing its “AI moment”
Publications involving AI methods (e.g. deep learning, NLP, computer vision, RL) in biology are growing
• 21000 papers in 2020 alone
• > 50% YoY growth since 2019
• Papers since 2019 = 25% of all output since 2000
https://pubs.acs.org/doi/10.1021/acs.jcim.1c01114
Understand Mechanisms, Design Interventions: Massive Compute Appetite
Big Data: Astronomical or Genomical
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Algorithmic, Computational & Data Management Requirements
• >1000x growth in compute needed to match demand
• 100’s of TB/s memory BW at 100’s of GB capacity
• Process 100’s of exabytes of multi-modal data
• e.g., learning on large graphs, structure learning, regulatory networks, combinatorial optimizations…
• Secure, privacy preserving, federated
Accelerating Graph Neural Networks on Xeon
Supercomputing’21 - distGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks
• Full batch training ~2-3.7x faster on 1s-CLX (1s) for GraphSAGE on OGB-Products & Reddit
• ~83x for distributed training on 128 sockets on OGB-Papers
Cascade Lake Xeon: Intel® Xeon® Platinum 8280 Processor 38.5M Cache, 2.70 GHz, 28 cores
[arXiv’20, arXiv’21, SC’21]
[Figure panels: GraphSAGE on Reddit; GraphSAGE on OGB-Products; OGB-Papers: 100 Million Node Graph; Roofline: Upper & lower bound; DGL v 0.5.3]
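For readers unfamiliar with GraphSAGE, the following is a minimal sketch of one mean-aggregation layer applied to every node of a tiny graph, which is what "full batch" means here. It is not the distGNN or DGL implementation: the function names, the separate self and neighbour weight matrices, and the three-node graph are assumptions chosen for illustration.

```python
def matvec(W, x):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def mean_vectors(vectors, dim):
    # Element-wise mean; an isolated node gets a zero vector.
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def sage_mean_layer(features, neighbors, W_self, W_neigh):
    # One layer: h_v' = ReLU(W_self * h_v + W_neigh * mean(h_u for u in N(v))).
    dim = len(next(iter(features.values())))
    out = {}
    for v, h_v in features.items():
        h_n = mean_vectors([features[u] for u in neighbors.get(v, [])], dim)
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, h_n))]
        out[v] = [max(0.0, z_k) for z_k in z]
    return out


# Hypothetical 3-node graph with 2-dimensional features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]

updated = sage_mean_layer(features, neighbors, identity, identity)
print(updated)
```

The neighbour gather in `sage_mean_layer` is an irregular, memory-bound access pattern, which is broadly the kind of work a roofline analysis examines.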
LamBdaZero
• Search space 10^18 vs internet 10^9
• Combinatorial Optimization at scale
• Uses ML and HPC to accelerate screening of drug-like molecules
• @MILA with Prof. Yoshua Bengio
[Intel-MILA announcement]
Bao*: Making Learned Query Optimization Practical
* Paper: https://arxiv.org/abs/2004.03814 , Code: https://learned.systems/bao
Bao outperforms them all!
SIGMOD’21: Best Paper (Data Management)*
In collaboration with Prof. Tim Kraska, MIT
* SIGMOD’21 Best Paper Announcement: https://2021.sigmod.org/sigmod_best_papers.shtml
BWA-MEM2*: An Accelerated Version of BWA-MEM
(BWA-MEM has 950K+ downloads, 70K users worldwide)
Sequence alignment
• In collaboration with Dr Heng Li, author of BWA-MEM
• Reference genome: GRCh38; Read dataset: 50x WGS ERR194147 (NA12878/HG001) from Illumina HiSeq 2000
Cascade Lake Xeon: Intel® Xeon® Platinum 8280 Processor 38.5M Cache, 2.70 GHz, 28 cores
Ice Lake Xeon, ICX: Intel® Xeon® Platinum 8380 Processor 60MB Cache, 2.40 GHz, 40 cores
Throughput in genomes/day for 50x WGS (higher is better)
• BWA-MEM, 2s CLX: 9.8
• BWA-MEM2, 2s CLX: 15.8
• BWA-MEM2, 2s ICX: 22.1
• Clara Parabricks BWA-MEM, 1 A100: 8.9
• Speedups annotated on the chart: 2.25x, 2.5x
Source of Clara Parabricks results: https://at-cg.github.io/posts/ParaBricks-WGS/
Enabling Community Worldwide
https://github.com/bwa-mem2/bwa-mem2
• horticulture
• nutrition
In production use by Cancer, Ageing and Somatic Mutations, Wellcome Sanger Institute; tested on ~88 billion reads
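The two speedup annotations on the throughput chart can be reproduced from the four bar values. The short check below assumes a particular pairing of bars to labels (BWA-MEM on 2s CLX, BWA-MEM2 on 2s CLX and on 2s ICX, Clara Parabricks BWA-MEM on one A100); that pairing is read off the label order in the extracted slide text and is not stated explicitly, so treat it as an assumption.

```python
# Throughput in genomes/day for 50x WGS, keyed by (tool, platform).
# The bar-to-label pairing is an assumption based on the slide's label order.
throughput = {
    ("BWA-MEM", "2s CLX"): 9.8,
    ("BWA-MEM2", "2s CLX"): 15.8,
    ("BWA-MEM2", "2s ICX"): 22.1,
    ("Clara Parabricks BWA-MEM", "1 A100"): 8.9,
}

best = throughput[("BWA-MEM2", "2s ICX")]

# Ratio of the fastest configuration to the original BWA-MEM baseline.
speedup_vs_bwa_mem = best / throughput[("BWA-MEM", "2s CLX")]

# Ratio of the fastest configuration to the GPU result.
speedup_vs_parabricks = best / throughput[("Clara Parabricks BWA-MEM", "1 A100")]

print(f"vs BWA-MEM on 2s CLX: {speedup_vs_bwa_mem:.3f}x")
print(f"vs Clara Parabricks on 1 A100: {speedup_vs_parabricks:.3f}x")
```

Under this pairing the two ratios land close to the 2.25x and 2.5x annotations, which supports reading the chart that way.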
MM2-Fast Accelerates MINIMAP2 on Xeon by 3.1
Cascade Lake Xeon, CLX: Intel® Xeon® Platinum 8280 Processor 38.5M Cache, 2.70 GHz, 28 cores
Ice Lake Xeon, ICX: Intel® Xeon® Platinum 8380 Processor 60MB Cache, 2.40 GHz, 40 cores
[bioRxiv’21]
MM2-Fast branch in Minimap2 repo
In collaboration with Dr Heng Li, author of Minimap2
Reference genome: GRCh38; Read dataset: ONT, PacBio HiFi and PacBio CLR datasets derived from human trio benchmark genomes HG002, HG003 and HG004 as given at https://precision.fda.gov/challenges/10/view and https://github.com/genome-in-a-bottle/giab_data_indexes
Minimap2 has > 100k downloads
9x speedup for Analysis of Single Cell ATAC-Seq Data
Denoising and peak calling on noisy ATAC-Seq data
Cascade Lake Xeon, CLX: Intel® Xeon® Platinum 8280 Processor 38.5MB Cache, 2.70 GHz, 28 cores
Cooper Lake Xeon, CPX: Intel® Xeon® Platinum 8380H Processor 38.5MB Cache, 2.90 GHz, 28 cores
Ice Lake Xeon, ICX: Intel® Xeon® Platinum 8380 Processor 60MB Cache, 2.40 GHz, 40 cores
• 2.3x speedup over NVIDIA Clara Parabricks on DGX-1 box (8 card V100) with 16 sockets of Cooper Lake
• 1.8x speedup over NVIDIA Clara Parabricks on DGX-1 box (8 card V100) with 16 sockets of Ice Lake
Source of Clara Parabricks performance: [Nvidia, 2020] AtacWorks: A deep convolutional neural network toolkit for epigenomics
[arXiv’21, bioRxiv’21]
Brain tumor segmentation finds tumors from MRIs
Intel-UPenn Collaboration
How much better does each institution do when training on the full data vs. just their own data?
• 17% BETTER on their own validation data
• 2.6% BETTER on the hold-out BraTS data
Sheller, M.J., Edwards, B., Reina, G.A. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep 10, 12598 (2020).
Other names and brands may be claimed as the property of others
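The slide does not say which aggregation rule the study used. As a hedged illustration of how institutions can contribute to a shared model without sharing patient data, the sketch below assumes federated averaging (FedAvg), a widely used rule in which a server combines locally trained weights in proportion to each site's sample count. The function name and the two-site numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    # Weighted mean of per-client weight vectors; only weights leave each site,
    # never the underlying records. Weights are plain lists of floats here.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


# Two hypothetical institutions with locally trained 2-parameter models.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [100, 300]

global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

The larger site pulls the global model toward its local solution, which is the usual motivation for weighting by sample count rather than taking a plain mean.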
1. Privacy Preserved Machine Learning for data and model privacy / protection
2. Privacy/Confidentiality Preservation
3. Attestation and integrity
4. Federation deployment
5. Federated nodes software stacks for TTM
6. Curation tools and deployment automation
github.com/intel/openfl
openfl.readthedocs.io/
• Enables greatest access to data
• Any company can host a privacy preserved federation
• Complete software and platform offering time to market deployment
GenomicsBench: a Benchmark Suite For
• Many GenomicsBench benchmarks have abundant data parallelism, but significant irregularity makes it challenging to achieve good performance.
• 12 representative kernels spanning the major steps in short-read and long-read sequence analysis pipelines
• FM-index, Banded Smith-Waterman, de Bruijn graphs, Pair HMM, DP Chaining, SIMD Partial Order Alignment, Adaptive Banded Signal to Event Alignment, Genomic Relationship Matrix, Neural network based Basecalling, Neural network based variant calling, K-mer counting, Pileup counting
Open-sourced and under active development:
https://github.com/arun-sub/genomicsbench
Xeon Optimized implementations of kernels under active development at:
https://github.com/IntelLabs/Trans-Omics-Acceleration-Library
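To give a feel for one of the listed kernels, here is a minimal sketch of k-mer counting. It is a textbook sliding-window version, not the GenomicsBench or Trans-Omics-Acceleration-Library implementation; the function name and the example sequence are assumptions.

```python
from collections import Counter


def count_kmers(sequence, k):
    # Count every length-k substring of the sequence with a sliding window.
    # A sequence shorter than k yields an empty Counter.
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Even this simple kernel hints at the irregularity noted above: the windows are independent and easy to parallelize, but the counter updates hit effectively random memory locations.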
DISCUSSIONS

AI for All: Biology is eating the world & AI is eating Biology

  • 1. Biology is eating the world & AI is eating Biology Pradeep K Dubey Intel Senior Fellow, IEEE Fellow Director, Parallel Computing Labs
  • 2. Intel All.AI 2021 @ Population Scale Virtual Summit 2 Machines: Crunch Numbers Humans: Make Decisions
  • 3. Intel All.AI 2021 @ Population Scale Virtual Summit 3 Machines: Crunch Numbers Humans: Make Decisions Division of Labor Between Man and Machine Is Getting Disrupted: Faster than Anyone Predicted! Machines: Number Crunching AND Decision Making
  • 4. FROM A World of analytical models Computational Fluid Dynamics Start with Mathematical Model Model  Simulate  Predict Start with Data Initial State  Increment  Steer TO A World of Data driven Models Event Detection from Social Media Inside - Out Outside - In
  • 5. Intel All.AI 2021 @ Population Scale Virtual Summit 5 • Effectiveness of AI relies on how well model structure matches the underlying invariant (structure) of the high-dimensional task objective • A good set of implicit or explicit inductive bias incorporating domain knowledge • Such as, CNNs for vision and attention networks for NLP or emerging GNNs • Training time: How well we manage exploitation versus exploration to get to the most generalizable (flatter) minima • Avoiding typical solver attraction to sharp minima • Higher-order methods What makes AI effective in practice 5
  • 6. Intel All.AI 2021 @ Population Scale Virtual Summit 6 better understanding of interiors and evolution of RED GIANT stars Accurately extract seismic parameters from 1000 spectra in under 10 secs Measuring the frequency separation ∆ν and period separation ∆Π in red-giant stars using Machine learning, under submission at Science Advances Department of Astronomy and Astrophysics, Tata Institute of Fundamental, Center for Space Science, NYUAD Institute, New York University Abu Dhabi, Division of Solar and Plasma Astrophysics, NAOJ, Mitaka, Tokyo, Japan, Parallel Computing Lab, Intel Labs, Bangalore, India
  • 7. Intel All.AI 2021 @ Population Scale Virtual Summit 7 Convergence of Revolutions Daphne Koller*: https://www.youtube.com/watch?v=V6bSlPNwrKo&feature=youtu.be Advances in CELL biology & creation of immense amount of data Advances in ML to analyZE large scale data and leverage To make Prediction
  • 8. AI is Eating Biology. Biology is experiencing its “AI moment”: publications involving AI methods (e.g., deep learning, NLP, computer vision, RL) in biology are growing. 21,000 papers in 2020 alone; > 50% YoY since 2019; papers since 2019 = 25% of all output since 2000. Source: https://pubs.acs.org/doi/10.1021/acs.jcim.1c01114
  • 9. Intel All.AI 2021 @ Population Scale Virtual Summit 9
  • 10. Understand Mechanisms, Design Interventions: Massive Compute Appetite. Big Data: Astronomical or Genomical (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195). Algorithmic, Computational & Data Management Requirements: • >1000x growth in compute needed to match demand • 100’s of TB/s memory bandwidth at 100’s of GB capacity • Process 100’s of exabytes of multi-modal data • e.g., learning on large graphs, structure learning, regulatory networks, combinatorial optimizations… • Secure, privacy-preserving, federated
  • 11. Accelerating Graph Neural Networks on Xeon. Supercomputing’21: "distGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks". Full-batch training: ~2-3.7x faster on 1s-CLX (1s) for GraphSAGE on OGB-Products & Reddit; ~83x for distributed training on 128 sockets on OGB-Papers (100-million-node graph). Cascade Lake Xeon: Intel® Xeon® Platinum 8280 Processor, 38.5M Cache, 2.70 GHz, 28 cores. [arXiv’20, arXiv’21, SC’21] Figure labels: DGL v 0.5.3; GraphSAGE on Reddit; GraphSAGE on OGB-Products; Roofline: upper & lower bound.
  • 12. LamBdaZero • Search space 10^18 vs internet 10^9 • Combinatorial optimization at scale • Uses ML and HPC to accelerate screening of drug-like molecules • At MILA with Prof. Yoshua Bengio [Intel-MILA announcement]
  • 13. Bao*: Making Learned Query Optimization Practical. * Paper: https://arxiv.org/abs/2004.03814, Code: https://learned.systems/bao
  • 14. Bao outperforms them all! SIGMOD’21: Best Paper (Data Management)*. In collaboration with Prof. Tim Kraska (MIT). * SIGMOD’21 Best Paper Announcement: https://2021.sigmod.org/sigmod_best_papers.shtml
  • 15. BWA-MEM2*: An Accelerated Version of BWA-MEM (BWA-MEM has 950K+ downloads, 70K users worldwide). Sequence alignment, in collaboration with Dr Heng Li, author of BWA-MEM. Reference genome: GRCh38; read dataset: 50x WGS ERR194147 (NA12878/HG001) from Illumina HiSeq 2000. Throughput in genomes/day for 50x WGS (higher is better): BWA-MEM on 2s CLX: 9.8; BWA-MEM2 on 2s CLX: 15.8; BWA-MEM2 on 2s ICX: 22.1; Clara Parabricks BWA-MEM on 1 A100: 8.9. Speedup annotations on the chart: 2.25x, 2.5x. Source of Clara Parabricks results: https://at-cg.github.io/posts/ParaBricks-WGS/ Cascade Lake Xeon (CLX): Intel® Xeon® Platinum 8280 Processor, 38.5M Cache, 2.70 GHz, 28 cores. Ice Lake Xeon (ICX): Intel® Xeon® Platinum 8380 Processor, 60MB Cache, 2.40 GHz, 40 cores. Enabling community worldwide (horticulture, nutrition): https://github.com/bwa-mem2/bwa-mem2 In production use by Cancer, Ageing and Somatic Mutations, Wellcome Sanger Institute; tested on ~88 billion reads.
  • 16. MM2-Fast Accelerates Minimap2 on Xeon by 3.1. [bioRxiv’21] MM2-Fast branch in the Minimap2 repo. In collaboration with Dr Heng Li, author of Minimap2. Minimap2 has > 100k downloads. Reference genome: GRCh38; read datasets: ONT, PacBio HiFi and PacBio CLR datasets derived from human trio benchmark genomes HG002, HG003 and HG004, as given at https://precision.fda.gov/challenges/10/view and https://github.com/genome-in-a-bottle/giab_data_indexes Cascade Lake Xeon (CLX): Intel® Xeon® Platinum 8280 Processor, 38.5M Cache, 2.70 GHz, 28 cores. Ice Lake Xeon (ICX): Intel® Xeon® Platinum 8380 Processor, 60MB Cache, 2.40 GHz, 40 cores.
  • 17. 9x Speedup for Analysis of Single-Cell ATAC-Seq Data. Denoising and peak calling on noisy ATAC-Seq data (higher is better): 2.3x speedup over NVIDIA Clara Parabricks on a DGX-1 box (8-card V100) with 16 sockets of Cooper Lake; 1.8x speedup over NVIDIA Clara Parabricks on a DGX-1 box (8-card V100) with 16 sockets of Ice Lake. Source of Clara Parabricks performance: [Nvidia, 2020] AtacWorks: A deep convolutional neural network toolkit for epigenomics. [arXiv’21, bioRxiv’21] Cascade Lake Xeon (CLX): Intel® Xeon® Platinum 8280 Processor, 38.5MB Cache, 2.70 GHz, 28 cores. Cooper Lake Xeon (CPX): Intel® Xeon® Platinum 8380H Processor, 38.5MB Cache, 2.90 GHz, 28 cores. Ice Lake Xeon (ICX): Intel® Xeon® Platinum 8380 Processor, 60MB Cache, 2.40 GHz, 40 cores.
  • 18. Intel All.AI 2021 @ Population Scale Virtual Summit 18
  • 19. Brain tumor segmentation finds tumors from MRIs. Intel-UPenn collaboration: Sheller, M.J., Edwards, B., Reina, G.A. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep 10, 12598 (2020). How much better does each institution do when training on the full data vs. just their own data? 17% BETTER, 2.6% BETTER; on their own validation data, on the hold-out BraTS data. Other names and brands may be claimed as the property of others.
  • 20. (1) Privacy Preserved Machine Learning for data and model privacy / protection; (2) Privacy/Confidentiality Preservation; (3) Attestation and integrity; (4) Federation deployment; (5) Federated nodes software stacks for TTM; (6) Curation tools and deployment automation. github.com/intel/openfl, openfl.readthedocs.io/ • Enables greatest access to data • Any company can host a privacy preserved federation • Complete software and platform offering time to market deployment
  • 21. GenomicsBench: a Benchmark Suite for Genomics • Many GenomicsBench benchmarks have abundant data parallelism, but significant irregularity makes it challenging to achieve good performance • 12 representative kernels spanning the major steps in short-read and long-read sequence analysis pipelines • FM-index, Banded Smith-Waterman, de Bruijn graphs, Pair HMM, DP Chaining, SIMD Partial Order Alignment, Adaptive Banded Signal to Event Alignment, Genomic Relationship Matrix, neural-network-based basecalling, neural-network-based variant calling, k-mer counting, pileup counting. Open-sourced and under active development: https://github.com/arun-sub/genomicsbench Xeon-optimized implementations of kernels under active development at: https://github.com/IntelLabs/Trans-Omics-Acceleration-Library
  • 22. DISCUSSIONS
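
The throughput numbers on slide 15 can be sanity-checked with a few lines of Python. This is only a sketch under stated assumptions: the pairing of each value with a tool and platform is a reading of the flattened chart text, and the dictionary keys and helper names are illustrative, not part of the talk.

```python
# Throughput in genomes/day for 50x WGS, as transcribed from slide 15.
# ASSUMPTION: the value-to-label pairing is inferred from the order in which
# the chart text was flattened; the transcript does not state it explicitly.
throughput = {
    "BWA-MEM, 2s CLX": 9.8,
    "BWA-MEM2, 2s CLX": 15.8,
    "BWA-MEM2, 2s ICX": 22.1,
    "Clara Parabricks BWA-MEM, 1 A100": 8.9,
}


def speedup(new: str, old: str) -> float:
    """Return the throughput ratio new/old (values above 1 mean `new` is faster)."""
    return throughput[new] / throughput[old]


def days_for(genomes: int, config: str) -> float:
    """Days needed to align `genomes` 50x genomes at the given throughput."""
    return genomes / throughput[config]


# Ratios that can be compared with the 2.25x and 2.5x annotations on the slide.
print(f"{speedup('BWA-MEM2, 2s ICX', 'BWA-MEM, 2s CLX'):.3f}")
print(f"{speedup('BWA-MEM2, 2s ICX', 'Clara Parabricks BWA-MEM, 1 A100'):.3f}")

# Illustrative only: time to align 1000 genomes on a single 2s ICX node.
print(f"{days_for(1000, 'BWA-MEM2, 2s ICX'):.1f} days")
```

Under this reading, the two printed ratios agree with the slide's 2.25x and 2.5x annotations up to rounding, which supports (but does not prove) the assumed pairing.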

Editor's Notes

  1. AI-Driven HPC Research: a first-of-its-kind deep learning approach to learn the parameters that govern stellar evolution for red giant stars, achieving an average inference time of 5 ms/star on an Intel® Xeon® Platinum 8280, much faster (>10000x) than current SOTA methods based on auto-correlation and MCMC. The power spectra of red giant stars are studied to better understand the interiors and evolution of stars. The Kepler and TESS space missions have provided a vast set of red giant light-curve data, and such data sets are expected to grow exponentially with future missions such as PLATO. This data needs to be analyzed accurately and efficiently at scale to advance the understanding of the physics of stars. To that end, working with a cross-geo group of scientists led by the Tata Institute of Fundamental Research in India, we have developed a deep learning approach that can learn various parameters that govern the complex behavior of such stellar evolution. We train the networks using simulated data on a single-node Intel® Xeon® Platinum 8280. Inference on a star takes 5 milliseconds on average, which is 10000x faster than auto-correlation-based methods and 1000000x faster than MCMC methods. To the best of our knowledge, we are the first to develop such an efficient machine learning approach to analyze red giant stars. We have been invited to submit the paper to the journal Science Advances (impact factor 14.4). Our network consists of six 1D convolution layers, followed by two LSTM layers and one dense layer. We use a categorical cross-entropy loss and the Adam optimizer for training. The network takes a normalized power spectrum as input and outputs the probability (confidence score) that a parameter lies in a given bin (range of values). Currently, we focus on learning the marginal distributions of three seismic parameters, namely frequency separation ∆ν, period separation ∆Π, and peak frequency ν_max, using a separate network for each parameter.
Training takes ~50 node hours for each seismic parameter on a single Cascade Lake Xeon node with 56 cores using TensorFlow. Our learned model accurately distinguishes red giants from noise when analyzing the spectra of real stars, with a precision of 87% and a recall of 86%. The false positives are dominated by non-solar-like pulsator stars. Additionally, our model can discover new potential red giants: after eliminating false positives by visual inspection, we detect ~25 new red giants (validated against various catalogues). Finally, our model can infer the relationships among the seismic parameters, e.g., the strong linear correlation between ∆ν and ∆Π (well established in physics) and the relationship between ∆ν and ν_max that is observed in other studies. First figure: the red points are the predicted (∆ν, ν_max) and the green band maps the relation observed in other studies. Second figure: prediction results (along with confidence) of our model on real stars.
  2. AI is inferring laws of physics, unravelling complex phenomena, and giving humans super-human capabilities to see. Every time humans have seen more, the world has transformed (think astronomy, microscopy). Now that is happening to biology. With increased resolution and sense-making, we can begin to understand the mechanisms behind how biological systems work: how diseases happen and how different characteristics evolve.
  3. Even after decades of work, we knew the structure of only ~4K proteins, and then overnight, with AI (AlphaFold), 20000 human protein structures were decoded. Using data, AI is beginning to unravel complex phenomena. Imagine: we can engineer biological systems and give ourselves capabilities and materials that biology otherwise discovers over thousands or even millions of years of evolution.
  4. Biological data is going to be the largest dataset on the planet, far larger than YouTube, with, for example, billions of genomes getting sequenced routinely. We will need massive leaps in computational power: state-of-the-art platforms today can do < 10 whole genome sequences in a day, and we need a > 1000x leap in computational power to do all kinds of omics rapidly and realize the vision of precision medicine. Similarly, to design new materials or drugs, the search space is orders of magnitude greater than the number of web pages, which means a massive compute appetite. Next frontier in AI: search and combinatorial optimization, e.g., search for novel molecules: > O(10^60); search space for protein design: O(10^130); number of web pages on the Internet: O(10^9).
  5. CLX: Cascade Lake Xeon. CPX: Cooper Lake Xeon. 1-D convolutions are especially important to digital biology because of sequence data. Nvidia performance source for 1D convolutions: [Nvidia, 2020] AtacWorks: A deep convolutional neural network toolkit for epigenomics.
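
To make the remark about 1-D convolutions and sequence data concrete, here is a minimal pure-Python sketch of a single 1-D convolution (cross-correlation, as deep learning frameworks define it) over a one-hot encoded DNA string. It is not the AtacWorks or red-giant network: the A/C/G/T channel order, the example sequence, and the hand-written "TATA" kernel are all assumptions chosen for illustration, and real networks learn their kernels from data.

```python
# Minimal 1-D convolution over a one-hot encoded DNA sequence.
# ASSUMPTIONS (illustrative, not from the talk): channel order A/C/G/T,
# the example sequence, and a fixed kernel instead of a learned one.

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-channel rows, one row per base."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv1d(x, kernel):
    """Valid (no padding, stride 1) 1-D cross-correlation.

    x:      list of L rows, each with C channels
    kernel: list of K rows, each with C channels
    returns a list of L - K + 1 scalar responses
    """
    length, width = len(x), len(kernel)
    out = []
    for start in range(length - width + 1):
        total = 0.0
        for k in range(width):
            for c in range(len(kernel[k])):
                total += x[start + k][c] * kernel[k][c]
        out.append(total)
    return out


# Using the one-hot encoding of a motif as the kernel makes the response at
# each position equal to the number of bases that match the motif there.
motif_kernel = one_hot("TATA")
sequence = "GCTATAAG"
scores = conv1d(one_hot(sequence), motif_kernel)

# Index of the window with the strongest response.
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

The same sliding-window pattern applies to any 1-D signal with channels, which is why it suits both base-level genomic tracks and a power spectrum treated as a single-channel sequence.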