© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
WuXi NextCODE Scales Up
Genomic Sequencing in AWS
Hákon Guðbjartsson, Ph.D.
Chief Informatics Officer,
WuXi NextCODE
hakon@wuxinextcode.com
ANT210-S
Jonsi Stefansson
Cloud Data Services CTO, NetApp
jonsi.Stefansson@netapp.com
Life Sciences Has a New Challenge…DATA!
The exponential growth of genomic data is challenging the industry to develop new and better
ways to manage and mine truly ‘big’ data.
Source: PLOS Biology
Genomes sequenced by 2025: 100M–2B
Growth of genomic data by 2025: 2 EB to 40 exabytes (EB)/yr
Cost-effective sequencing has been solved…
…the new challenge is data growth
Average price for WGS now < $1000
1000 GB = 1 TB, 1000 TB = 1 PB, 1000 PB = 1 EB (1 EB = 1 × 10^6 TB)
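The unit ladder on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) prefixes, i.e. powers of 1000, as the slide does; the names `UNIT_BYTES` and `to_terabytes` are illustrative and do not come from the talk.

```python
# Decimal (SI) storage units expressed in bytes.
# Assumption: powers of 1000, matching the conversions on the slide.
UNIT_BYTES = {
    "GB": 1000**3,
    "TB": 1000**4,
    "PB": 1000**5,
    "EB": 1000**6,
}


def to_terabytes(value, unit):
    """Convert a quantity given in GB, TB, PB, or EB to terabytes."""
    return value * UNIT_BYTES[unit] / UNIT_BYTES["TB"]
```

For example, `to_terabytes(1, "EB")` gives one million terabytes, which agrees with the slide's 1 EB = 1 × 10^6 TB.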
How does HTS data look?
Showing data for four samples in the BRCA2 gene (1/300k of the genome)
Sequence reads and consistent variations
Showing the first exon in BRCA2 (1/6M of the genome)
The sequence of the double stranded helix
Many of the differences are noise. Reading frames show possible amino acids.
[Figure: subject (subj.) by genomic position (g. pos) data sets for diseases dis.1 … dis.N, compared across GW-STRs, FM-SNPs, GW-chip SNPs, and SEQ]
Completeness of sequence data promotes reuse
The WuXi NextCODE Difference
Our purpose-built platform and its breadth and depth of differentiated capabilities
set us apart
The world’s scalable digital platform purpose-built for the genome and population health
ROBUST COHORT SOURCING
HIGHEST QUALITY SEQUENCING
SCALABLE DATA INTEGRATION
BEST-IN-CLASS DEEP LEARNING + A.I.
Fueled by the GORdb™ (Genomically Ordered Relational database) platform
WXNC Platform Overview
A single digital platform built from the ground up for population-scale
genomics.
APP LAYER
Domain specific applications that are
powered by the WXNC platform
API + SDK
An intuitive API + SDK to allow
customization of app layer capabilities
DATA
A single layer to combine user data with
WXNC’s KnowledgeBASE of proprietary
and reference databases
GORdb™
A single digital architecture to power
WXNC’s end-to-end genomics capabilities
[Diagram: applications (DiscoveryCODE, PhenoCODE, CancerCODE, RareCODE, custom apps), API + SDK, data (KnowledgeBASE + client data), and GORdb™, on NFS (NetApp Cloud Volumes)]
Our secure AWS system architecture
VPC or SAS
AWS Direct Connect
Co-location
NetApp Cloud Volumes Service
Amazon Web Services
Why WXNC chose to work with AWS
Accessibility
Services built to store and retrieve data
from anywhere in the world. Availability
in multiple countries.
Reliability
Redundancy to achieve 99.999999999%
durability. Available across multiple zones.
High performance
Elastic environments that automatically
scale.
Battle tested solutions.
Compliant
Comprehensive security suite. US and
global security compliance
Rich solution ecosystem
Amazon RDS for PostgreSQL and Oracle; cloud
storage services such as Amazon S3, Amazon Glacier,
and Amazon EFS; AWS Partner Network: NetApp Cloud
Volumes; Edico secondary analysis.
Popularity
The preferred cloud platform for most of
our pharma customers.
Platform connecting research to the clinic
Samples ingested in CSA are automatically available in the DiscoveryCODE platform for research
CSA (Clinical Care): Perform Genomic Assay; Identify Variants; Annotate with Clinical Knowledgebase; Determine Variant Impact; Generate Report
The DiscoveryCODE Platform (Research Discovery): Aggregate Patient Samples; Perform Statistical Analysis on Cohorts; Identify Biomarkers; Populate Clinical Knowledgebase
Optimize. Mine. Share.
Fuel your organization’s discovery and development pipeline with a comprehensive analytical
suite built into the GORdb™ platform.
Discover genomic biomarkers from cohorts
of individuals
Discover genomic classifiers for patient
enrollment in clinical trials
De novo biomarker discovery
Phase 1
Discover genomic biomarkers associated
with adverse effects
Phase 2+
Discover genomic biomarkers associated
with adverse effects and for
responders/non-responders
Phase 3
Develop genomic classifiers for companion
IVD submission with an IND
DISCOVERY
DEVELOPMENT
Application Suite: Clinical Sequence Analyzer™, Sequence Miner™, Artificial Intelligence, PhenoCODE™ …and more to come
+ GORdb™
Validated A.I. in Genomics
WXNC is leading genomics A.I. with its suite of validated and published methods and algorithms.
CARDIOVASCULAR
Validated our machine learning
capability in published in vitro
models
METABOLIC
Classified all variants of
targeted genes for childhood
obesity drug and companion
diagnostic development
ONCOLOGY
Identified a signal predictive
of survival across 21 cancers
using RNA, CNV, and
methylation data
Clinical Sequence Analysis
The nature of the feedback loop for variant interpretation will change.
Data driven diagnosis of rare diseases
High impact variants in genes that “match” the signs & symptoms
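[Figure: variant filtering funnel for the phenotypes Blindness, Deafness, Diaphragmatic Weakness. Labels shown: Allele Freq <2%; VEP: MOD/LOF; Candidate Genes + Paralogs; Recessive; All Variants; All Shared Variants. Variant counts shown: 1.8M, 1400, 173, 1.]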
https://www.youtube.com/watch?v=kaTlGr0bHSk
Clinical Diagnostics Example
Ending a 5-year odyssey in minutes for 2 sisters
The Law of Diminishing Returns
The standard deviation in estimates has inverse square root behavior
[Figure: precision versus sample size, with rare disease and common disease marked]
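The inverse square root behavior mentioned on this slide can be illustrated with the textbook standard error of a mean, sigma / sqrt(n). The sketch below is a generic statistical illustration, not a method from the talk; it assumes independent samples, and the function name `standard_error` is made up for this example.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of a sample mean over n independent observations.

    sigma is the per-observation standard deviation.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)


def precision_gain(n_before, n_after):
    """Factor by which the standard error shrinks when n grows."""
    return math.sqrt(n_after / n_before)
```

Going from 100 to 400 samples halves the standard error, while going from 100,000 to 100,300 samples barely changes it. That is the diminishing return: each additional sample buys less precision than the one before.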
Rare Exchange
Use-cases related to exchange of information
ACMG variant curations
Enable crowd-sourcing of variant curations
across hospitals and organizations.
Sample match-maker
Enable exploration of the availability of
samples that overlap with the index case in
phenotypes or rare high-impact variants in
a gene.
Delegate analysis
Enable users to move their case study
seamlessly to another organization that has
expertise and data in the disease of interest.
Aggregate data sharing
Share information across organizations,
such as AF and GTF, segregated by predefined
traits and by dynamic match-making
definition.
This can be at the variant level or the gene level.
Study/sample sharing
Share an entire study with its associated
phenotypes and genomic data with another
organization.
Blind analysis
Temporarily move one or more samples to
other RareCODE systems and perform
genome-wide burden analysis, inheritance
labeling, etc.
An epilepsy study example
Even small patient cohorts can lead to identification of new causal genes
Whole exome sequencing of 117 undiagnosed patients with epilepsy (41 trios and 76 singletons,
including epileptic encephalopathies (83 patients), febrile-infection related epilepsy, Rasmussen
encephalitis, and other focal and generalized epilepsies).
Results: Likely pathogenic variants for epilepsy were identified in 33% (n=39) of patients;
in a further 35% (n=41), potentially clinically relevant variants were found in candidate genes.
accounted for 33% of the variants identified: KCNQ2 (n=8) most frequently, followed by
SCN2A (n=4).
Epileptic encephalopathies associated with genetic variants in recently characterized genes
FGF12, GNAO1, ITPA, KCNB1, KCNH1, MBD5, PTPN23, RHOBTB2, SYNGAP1, and WWOX
Expanded the phenotype of genes known to be linked to intellectual disability DYNC1H1,
HUWE1, KAT6A and PTCHD
Anne Rochtus, Meredith Park, Lacey Smith, Alan Taylor, Christelle El Achkar, Shira Rockowitz, Beth Rosen Sheidley,
Annapurna Poduri
Boston Children’s Hospital and Harvard Medical School using WXNC Clinical Sequence Analyzer and Sequence
Miner
What is GORdb?
GORdb provides for genomic data
the data abstraction and query
functionality which conventional
RDBMS provide for regular
business data.
A relational database for genomic data
The GORql syntax
Influenced by SQL and shell pipe commands
SQL + Unix commands = GORql pipe syntax
Genomic Ordering Enables Rapid Queries
- GORql = SQL + Unix bash
- Allows for targeted querying and streaming
- Fast analysis and data updates
- Normalized schema designs for all genomic data
- Elastic scaling
- Parallel execution
- Materialized views
- External commands
[Diagram: GORdb™ data layout along a genomic axis and a partition axis, labeled SPEED and SCALABILITY]
Targeted genomic queries based on chromosomal coordinates
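Why ordering by genomic coordinate helps targeted queries can be sketched in a few lines: once rows are sorted by (chromosome, position), a range lookup becomes two binary searches rather than a full scan. This is only an illustration of the general idea using Python's standard `bisect` module. It is not how GORdb is implemented, and the class name `SortedVariants` and the tuple layout are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


class SortedVariants:
    """Rows of (chrom, pos, payload) kept in genomic order.

    Assumption: chromosomes compare as plain strings, which is enough
    for a sketch as long as the same ordering is used everywhere.
    """

    def __init__(self, rows):
        # Sort once, then keep a parallel list of (chrom, pos) search keys.
        self.rows = sorted(rows, key=lambda r: (r[0], r[1]))
        self.keys = [(r[0], r[1]) for r in self.rows]

    def query(self, chrom, start, end):
        """Return rows on chrom with start <= pos <= end."""
        lo = bisect_left(self.keys, (chrom, start))
        hi = bisect_right(self.keys, (chrom, end))
        return self.rows[lo:hi]
```

Each lookup costs O(log n) comparisons plus the size of the result, so a small region can be fetched without touching the rest of the data.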
GOR Genotype Files
VCF2GOR for sparse allele row format and non-sparse horizontal layout
Transposed GT file with variant listed for all PNs
0 = hom ref, good cov
1 = het, good cov
2 = hom, good cov
NA = poor cov, No Call
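The 0/1/2/NA coding above can be sketched as a small Python function. This is an illustration under stated assumptions, not the VCF2GOR implementation: the input is assumed to be a VCF-style biallelic GT string such as `0/1`, the `good_coverage` flag is a hypothetical stand-in for whatever coverage check the real pipeline applies, and anything that is not a clean biallelic call is treated as NA.

```python
def encode_genotype(gt, good_coverage=True):
    """Map a VCF-style GT string to the 0/1/2/NA coding.

    0 = hom ref, 1 = het, 2 = hom alt, NA = poor coverage or no call.
    """
    if not good_coverage:
        return "NA"
    # Treat phased ("|") and unphased ("/") calls the same way.
    alleles = gt.replace("|", "/").split("/")
    # Assumption: missing ("./.") or multi-allelic calls become NA.
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return "NA"
    return str(alleles.count("1"))


def encode_row(calls):
    """Encode one variant across all samples (PNs), in sample order."""
    return [encode_genotype(gt, ok) for gt, ok in calls]
```

For instance, `encode_row([("0/0", True), ("0/1", True), ("1/1", True), ("0/1", False)])` yields one code per sample for a single variant, which mirrors the transposed layout where each row lists a variant for all PNs.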
SQL vs GORql pipe syntax
Calculating transition vs transversion ratio – example taken from Google BigQuery
140 million rows, takes ~ 5 seconds:
create #t# = pgor -split 100 #dbsnp# | where len(ref)=1 and len(alt)=1
| calc transition = if(ref+'>'+alt in ('A>G','G>A','C>T','T>C'),1,0)
| calc transversion = 1 - transition
| group 100000 -sum -ic transition,transversion;
gor [#t#] | group 100000 -sum -ic sum_*
| calc TiTv_ratio = float(sum_sum_transition)/sum_sum_transversion
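For readers unfamiliar with the pipe syntax, the same calculation can be sketched in plain Python. This is not GORql and makes no claim about GORdb performance. It mirrors the logic of the query above on an in-memory list of (ref, alt) pairs; the function name `titv_ratio` and the input format are assumptions for illustration.

```python
# The four single-base substitutions counted as transitions,
# matching the list in the query above.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(variants):
    """Transition/transversion ratio over (ref, alt) pairs.

    Only single-base ref and alt alleles are counted, as in the
    `where len(ref)=1 and len(alt)=1` filter. Every remaining
    substitution that is not a transition counts as a transversion.
    """
    transitions = 0
    transversions = 0
    for ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multi-base alleles
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions found; ratio is undefined")
    return transitions / transversions
```

The GORql version computes partial sums per 100,000-base bin in parallel and then sums those, whereas this sketch does a single pass; the arithmetic is the same.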
GORdb™ outperforms SparkSQL in genomic joins
The rapid, real-time querying relational database purpose-built for big genomic data.
[Bar chart: GORdb™ Outperforms Spark in Genomic Queries. Time in seconds (axis 0 to 300) for queries Q1 to Q6, SparkSQL vs. GORdb.]
QUERY 1 Retrieve dbSNP data and filter based on genomic range (e.g. 281 rows from a region in chr19)
QUERY 2 Retrieve dbSNP data based on overlap with genes (e.g. 90k rows of overlapping genes named BR*)
QUERY 3 Retrieve dbSNP data based on overlap with exons (i.e., more segments)
QUERY 4 Variant lookup by joining variants from dbSNP with a set of variants based on rsIDs (e.g. all rs22* giving ~100k rows)
QUERY 5 Aggregating counts of variants in dbSNP based on sequence structure
QUERY 6 Calculate transition-transversion ratio for all of dbSNP (>100 million rows)
The Genomically-Ordered Relational (GOR) Database
• Developed by WXNC specifically for large volumes of genomic data
• Database and query language optimized for genomic data streaming
• Genomic coordinate indexing
• On-the-fly data joins
• Instant updates
GORdb™ provides for genomics the data abstraction and query functionality which conventional RDBMS provides for regular business data
RDBMS = relational database management system
The benefits of NetApp Cloud Volumes
Moving from self-managed NFS storage to NetApp Cloud Volumes was seamless
Less complexity
Moved from 15 EBS volumes
fronted with 3 large NFS servers
to a single NetApp service
endpoint.
Easy onboarding
Copied 50 TB of data in more than
2 million files in less than two
days.
Performance
Reading mutation data from 100k
samples with 1024 cores was 3x
faster than with the self-managed
solution.
Easier management
Backups and data cloning for test
environments.
Reliability
Advanced RAID-DP technology
provides more reliability for large
data sets.
Tailored to GORdb needs
Transparent NFS caching
Thank you!
Hákon Guðbjartsson, Ph.D.
Chief Informatics Officer
hakon@wuxinextcode.com
Jonsi Stefansson
Cloud Data Services CTO, NetApp
jonsi.Stefansson@netapp.com

 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc
 



WuXi NextCODE Scales up Genomic Sequencing on AWS (ANT210-S) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. WuXi NextCODE Scales Up Genomic Sequencing in AWS (ANT210-S). Hákon Guðbjartsson, Ph.D., Chief Informatics Officer, WuXi NextCODE, hakon@wuxinextcode.com. Jonsi Stefansson, Cloud Data Services CTO, NetApp, jonsi.Stefansson@netapp.com
  • 3. Life Sciences Has a New Challenge… DATA! The exponential growth of genomic data is challenging the industry to develop new and better ways to manage and mine truly ‘big’ data. Cost-effective sequencing has been solved (average price for WGS now < $1000); the new challenge is data growth. Projected by 2025: 100M–2B genomes sequenced and 2–40 exabytes (EB) of genomic data per year. Units: 1000 GB = 1 TB, 1000 TB = 1 PB, 1000 PB = 1 EB (1 EB = 1×10^6 TB). Source: PLOS Biology.
  • 4. How does HTS data look? Showing data for four samples in the BRCA2 gene (1/300k of the genome).
  • 5. Sequence reads and consistent variations. Showing the first exon in BRCA2 (1/6M of the genome).
  • 6. The sequence of the double-stranded helix. Many of the differences are noise. Reading frames show possible amino acids.
  • 7. Completeness of sequence data promotes reuse. [Figure: per-disease data sets (dis.1 … dis.N) laid out by subject and genomic position, for GW-STRs, FM-SNPs, GW-chip SNPs, and SEQ.]
  • 8. The WuXi NextCODE Difference. Our purpose-built platform and its breadth and depth of differentiated capabilities set us apart. The world’s scalable digital platform purpose-built for the genome and population health. Robust cohort sourcing; highest-quality sequencing; scalable data integration; best-in-class deep learning + A.I. Fueled by the GORdb™ platform (Genomically Ordered Relational database).
  • 9. WXNC Platform Overview. A single digital platform built from the ground up for population-scale genomics. APP LAYER: domain-specific applications that are powered by the WXNC platform (DiscoveryCODE, PhenoCODE, CancerCODE, RareCODE, custom apps). API + SDK: an intuitive API + SDK to allow customization of app-layer capabilities. DATA: a single layer to combine user data (Client Data) with WXNC’s KnowledgeBASE of proprietary and reference databases. GORdb™: a single digital architecture to power WXNC’s end-to-end genomics capabilities. Storage: NFS (NetApp Cloud Volumes).
  • 10. Our secure AWS system architecture. [Diagram labels: VPC or SAS, AWS Direct Connect, co-location, NetApp Cloud Volumes Service.]
  • 11. Amazon Web Services: why WXNC chose to work with AWS. Accessibility: services built to store and retrieve data from anywhere in the world; availability in multiple countries. Reliability: redundancy to achieve 99.999999999% durability; available across multiple zones. High performance: elastic environments that automatically scale; battle-tested solutions. Compliant: comprehensive security suite; US and global security compliance. Rich solution ecosystem: Amazon RDS for PostgreSQL and Oracle; cloud storage services such as Amazon S3, Amazon Glacier, and Amazon EFS; AWS Partner Network: NetApp Cloud Volumes; Edico secondary analysis. Popularity: the preferred cloud platform for most of our pharma customers.
  • 12. Platform connecting research to the clinic. Samples ingested in CSA are automatically available in the DiscoveryCODE platform for research. Workflow steps: Perform Genomic Assay; Identify Variants; Annotate with Clinical Knowledgebase; Determine Variant Impact; Aggregate Patient Samples; Perform Statistical Analysis on Cohorts; Identify Biomarkers; Populate Clinical Knowledgebase; Generate Report. [Diagram labels: CSA, The DiscoveryCODE Platform, Clinical Care, Research Discovery.]
  • 13. Optimize. Mine. Share. Fuel your organization’s discovery and development pipeline with a comprehensive analytical suite built into the GORdb™ platform. Stages (DISCOVERY, DEVELOPMENT): discover genomic biomarkers from cohorts of individuals; discover genomic classifiers for patient enrollment in clinical trials; de novo biomarker discovery. Phase 1: discover genomic biomarkers associated with adverse effects. Phase 2+: discover genomic biomarkers associated with adverse effects and for responders/non-responders. Phase 3: develop genomic classifiers for companion IVD submission with an IND. Application suite: Clinical Sequence Analyzer™, Sequence Miner™, Artificial Intelligence, PhenoCODE™, and more to come, plus GORdb™.
  • 14. Validated A.I. in Genomics. WXNC is leading genomics A.I. with its suite of validated and published methods and algorithms. CARDIOVASCULAR: validated our machine learning capability in published in vitro models. METABOLIC: classified all variants of targeted genes for childhood obesity drug and companion diagnostic development. ONCOLOGY: identified a signal predictive of survival across 21 cancers using RNA, CNV, and methylation data.
  • 15. Clinical Sequence Analysis. The nature of the feedback loop for variant interpretation will change.
  • 16. Data-driven diagnosis of rare diseases. High-impact variants in genes that “match” the signs & symptoms.
  • 17. Clinical Diagnostics Example: ending a 5-year odyssey in minutes for 2 sisters. Phenotype: blindness, deafness, diaphragmatic weakness. Filter funnel labels: All Variants; All Shared Variants; Allele Freq <2%; VEP: MOD/LOF; Candidate Genes + Paralogs; Recessive. Variant counts shown along the funnel: 1.8M, 1400, 173, 1. Video: https://www.youtube.com/watch?v=kaTlGr0bHSk
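The filter funnel on slide 17 can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the record fields (`af`, `impact`, `gene`, `genotypes`), the candidate gene names, and the simplification that "recessive" means both sisters are homozygous for the alternate allele (compound heterozygotes are ignored). It is not the Clinical Sequence Analyzer implementation.

```python
from typing import Dict, List, Set

Variant = Dict[str, object]  # hypothetical record layout, for illustration only


def filter_funnel(variants: List[Variant], candidate_genes: Set[str],
                  max_af: float = 0.02) -> List[Variant]:
    """Apply the slide's filters in sequence and return the surviving variants."""
    # Shared variants: both sisters carry at least one alternate allele.
    kept = [v for v in variants if all(g > 0 for g in v["genotypes"])]
    # Allele frequency below 2%.
    kept = [v for v in kept if v["af"] < max_af]
    # Predicted impact: moderate or loss of function (labels as on the slide).
    kept = [v for v in kept if v["impact"] in {"MOD", "LOF"}]
    # Candidate genes (plus paralogs) matching the phenotype.
    kept = [v for v in kept if v["gene"] in candidate_genes]
    # Simplified recessive model: homozygous alternate (2) in every affected sister.
    return [v for v in kept if all(g == 2 for g in v["genotypes"])]


# Toy data; gene names and values are made up.
toy = [
    {"gene": "GENE_A", "af": 0.001, "impact": "LOF", "genotypes": [2, 2]},
    {"gene": "GENE_A", "af": 0.300, "impact": "LOF", "genotypes": [2, 2]},
    {"gene": "GENE_B", "af": 0.001, "impact": "MOD", "genotypes": [1, 2]},
    {"gene": "GENE_C", "af": 0.001, "impact": "LOW", "genotypes": [2, 2]},
]
survivors = filter_funnel(toy, candidate_genes={"GENE_A", "GENE_B"})
```

Ordering the cheap, highly selective filters first keeps the later steps small, which is the point of the funnel on the slide.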
  • 18. The Law of Diminishing Returns. The standard deviation of an estimate shows inverse-square-root behavior in the sample size. [Plot: precision vs. sample size, for rare disease and common disease.]
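The inverse-square-root behavior on slide 18 can be made concrete with a short Python sketch. It assumes the estimate is a sample mean of independent observations with standard deviation `sigma`, which is the textbook case and not necessarily the exact estimator behind the slide.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of a sample mean of n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


def samples_needed(sigma: float, target_se: float) -> int:
    """Smallest n whose standard error is at most target_se."""
    return math.ceil((sigma / target_se) ** 2)


# Quadrupling the cohort only halves the uncertainty.
ratio = standard_error(1.0, 100) / standard_error(1.0, 400)
```

This is why each additional gain in precision costs disproportionately more samples: halving the error requires four times the cohort.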
  • 19. Rare Exchange: use cases related to exchange of information. ACMG variant curations: enable crowd-sourcing of variant curations across hospitals and organizations. Sample match-maker: enable exploration of the availability of samples that overlap with the index case in phenotypes or in rare high-impact variants in a gene. Delegate analysis: enable users to move their case study seamlessly to another organization that has expertise and data in the disease of interest. Aggregate data sharing: share information across organizations, such as AF and GTF, segregated by predefined traits and by dynamic match-making definition; this can be at the variant level or the gene level. Study/sample sharing: share an entire study, with its associated phenotypes and genomic data, with another organization. Blind analysis: temporarily move one or more samples to other RareCODE systems and perform genome-wide burden analysis, inheritance labeling, etc.
  • 20. An epilepsy study example. Even small patient cohorts can lead to identification of new causal genes. Whole exome sequencing of 117 undiagnosed patients with epilepsy (41 trios and 76 singletons, including epileptic encephalopathies (83 patients), febrile-infection-related epilepsy, Rasmussen encephalitis, and other focal and generalized epilepsies). Results: likely pathogenic variants for epilepsy were identified in 33% (n=39) of patients; in a further 35% (n=41), potentially clinically relevant variants were found in candidate genes. […] accounted for 33% of the variants identified: KCNQ2 (n=8) most frequently, followed by SCN2A (n=4). Epileptic encephalopathies were associated with genetic variants in the recently characterized genes FGF12, GNAO1, ITPA, KCNB1, KCNH1, MBD5, PTPN23, RHOBTB2, SYNGAP1, and WWOX. The study expanded the phenotype of genes known to be linked to intellectual disability: DYNC1H1, HUWE1, KAT6A, and PTCHD. Anne Rochtus, Meredith Park, Lacey Smith, Alan Taylor, Christelle El Achkar, Shira Rockowitz, Beth Rosen Sheidley, Annapurna Poduri; Boston Children’s Hospital and Harvard Medical School, using the WXNC Clinical Sequence Analyzer and Sequence Miner.
  • 21. What is GORdb? A relational database for genomic data. GORdb provides for genomic data the data abstraction and query functionality that conventional RDBMSs provide for regular business data.
  • 22. The GORql syntax. Influenced by SQL and shell pipe commands: SQL + Unix commands → GORql pipe syntax.
  • 23. Genomic Ordering Enables Rapid Queries. Features: GORql = SQL + Unix bash; allows for targeted querying and streaming; fast analysis and data updates; normalized schema designs for all genomic data; elastic scaling; parallel execution; materialized views; external commands. Targeted genomic queries based on chromosomal coordinates. [Diagram labels: GORdb™, genomic axis, partition axis, SPEED, SCALABILITY.]
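Why genomic ordering makes targeted queries fast can be sketched in a few lines of Python. This is an illustration of the general idea (binary search over rows sorted by chromosome and position), not GORdb's storage format; the class name, row layout, and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Row = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), hypothetical layout


class OrderedVariantTable:
    """Rows kept sorted by (chrom, pos) so a range query is two binary searches."""

    def __init__(self, rows: List[Row]) -> None:
        # Chromosomes are compared as plain strings here; any order works
        # as long as the rows and the queries use the same one.
        self.rows = sorted(rows, key=lambda r: (r[0], r[1]))
        self.keys = [(r[0], r[1]) for r in self.rows]

    def range_query(self, chrom: str, start: int, end: int) -> List[Row]:
        """Return rows with start <= pos <= end on chrom, without a full scan."""
        lo = bisect_left(self.keys, (chrom, start))
        hi = bisect_right(self.keys, (chrom, end))
        return self.rows[lo:hi]


table = OrderedVariantTable([
    ("chr13", 500, "A", "G"),
    ("chr1", 100, "C", "T"),
    ("chr13", 250, "G", "C"),
    ("chr13", 900, "T", "A"),
    ("chr2", 300, "A", "T"),
])
hits = table.range_query("chr13", 200, 600)
```

Because the seek cost grows with the logarithm of the table size, a query for one gene touches only the rows in that region, and the matching rows come back already sorted and ready to stream.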
  • 24. GOR Genotype Files. VCF2GOR for sparse allele row format and non-sparse horizontal layout. Transposed GT file with each variant listed for all PNs. Encoding: 0 = hom ref, good cov; 1 = het, good cov; 2 = hom, good cov; NA = poor cov, no call.
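The genotype encoding on slide 24 can be sketched in Python as follows. The input format (VCF-style `GT` strings plus a read depth), the `min_depth` threshold of 8, and the restriction to biallelic sites are assumptions for illustration; this is not the VCF2GOR implementation.

```python
from typing import List, Tuple


def encode_genotype(gt: str, depth: int, min_depth: int = 8) -> str:
    """Map a VCF-style genotype to the slide's codes: 0, 1, 2, or NA."""
    alleles = gt.replace("|", "/").split("/")
    if depth < min_depth or len(alleles) != 2:
        return "NA"  # poor coverage or unexpected ploidy
    if any(a not in ("0", "1") for a in alleles):
        return "NA"  # no-call ('.') or non-biallelic allele index
    return str(alleles.count("1"))  # 0 = hom ref, 1 = het, 2 = hom alt


def encode_row(calls: List[Tuple[str, int]]) -> str:
    """One transposed row: a single variant with a code for every PN, in order."""
    return ",".join(encode_genotype(gt, dp) for gt, dp in calls)


row = encode_row([("0/0", 30), ("0/1", 25), ("1|1", 40), ("./.", 0), ("0/1", 3)])
```

Storing one code per PN in a fixed column order makes the row for a variant dense and positional, so cohort-wide operations such as allele counting become a single pass over one line.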
  • 25. SQL vs GORql pipe syntax. Calculating the transition vs. transversion ratio (example taken from Google BigQuery); 140 million rows, takes ~5 seconds: create #t# = pgor -split 100 #dbsnp# | where len(ref)=1 and len(alt)=1 | calc transition = if(ref+'>'+alt in ('A>G','G>A','C>T','T>C'),1,0) | calc transversion = 1 - transition | group 100000 -sum -ic transition,transversion; gor [#t#] | group 100000 -sum -ic sum_* | calc TiTv_ratio = float(sum_sum_transition)/sum_sum_transversion
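For readers unfamiliar with GORql, the logic of the query on slide 25 can be restated as a plain Python sketch. It follows the same steps (keep single-base substitutions, flag transitions, sum, divide) on an in-memory list; the function name and input layout are illustrative, and none of the query's partitioning or parallelism is modeled.

```python
from typing import Iterable, Tuple

# The four transitions named in the query: purine<->purine, pyrimidine<->pyrimidine.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(variants: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio over (ref, alt) pairs, SNPs only."""
    transitions = 0
    transversions = 0
    for ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels, as the 'where' clause does
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1  # mirrors 'transversion = 1 - transition'
    if transversions == 0:
        return float("nan")
    return transitions / transversions


example = titv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A"), ("G", "A")])
```

In the GORql version the per-window `group 100000 -sum` stage plays the role of the counters, which is what lets the work be split across partitions and merged by the final `group`.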
  • 26. GORdb™ outperforms SparkSQL in genomic joins. The rapid, real-time querying relational database purpose-built for big genomic data. [Bar chart "GORdb™ Outperforms Spark in Genomic Queries": time in seconds (axis 0–300) for queries Q1–Q6, SparkSQL vs. GORdb.] QUERY 1: retrieve dbSNP data and filter based on genomic range (e.g. 281 rows from a region in chr19). QUERY 2: retrieve dbSNP data based on overlap with genes (e.g. 90k rows of overlapping genes named BR*). QUERY 3: retrieve dbSNP data based on overlap with exons (i.e., more segments). QUERY 4: variant lookup by joining variants from dbSNP with a set of variants based on rsIDs (e.g. all rs22*, giving ~100k rows). QUERY 5: aggregate counts of variants in dbSNP based on sequence structure. QUERY 6: calculate the transition-transversion ratio for all of dbSNP (>100 million rows). The Genomically-Ordered Relational (GOR) Database: developed by WXNC specifically for large volumes of genomic data; database and query language optimized for genomic data streaming; genomic coordinate indexing; on-the-fly data joins; instant updates. GORdb™ provides for genomics the data abstraction and query functionality that a conventional RDBMS (relational database management system) provides for regular business data.
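Queries 2 and 3 on slide 26 are overlap joins between variants and segments (genes or exons). When both inputs are sorted by chromosome and position, such a join can run as a single streaming pass. The Python sketch below shows that general technique under stated assumptions: tuple layouts and names are hypothetical, segment ends are inclusive, and this is not a description of GORdb internals.

```python
from typing import List, Tuple

Variant = Tuple[str, int, str]       # (chrom, pos, variant_id)
Segment = Tuple[str, int, int, str]  # (chrom, start, end, name), end inclusive


def overlap_join(variants: List[Variant],
                 segments: List[Segment]) -> List[Tuple[str, int, str, str]]:
    """Join sorted variants to sorted segments that contain them, in one pass."""
    out = []
    active: List[Segment] = []  # segments that may still overlap upcoming variants
    i = 0
    for chrom, pos, vid in variants:
        # Pull in every segment that starts at or before this variant.
        while i < len(segments) and (segments[i][0], segments[i][1]) <= (chrom, pos):
            active.append(segments[i])
            i += 1
        # Drop segments on other chromosomes or already ended before this position.
        active = [s for s in active if s[0] == chrom and s[2] >= pos]
        out.extend((chrom, pos, vid, s[3]) for s in active)
    return out


variants = [("chr13", 100, "v1"), ("chr13", 250, "v2"), ("chr17", 50, "v3")]
segments = [("chr13", 90, 120, "G1"), ("chr13", 200, 300, "G2"),
            ("chr13", 240, 260, "G3"), ("chr17", 60, 80, "G4")]
joined = overlap_join(variants, segments)
```

Neither input is ever re-read or hashed, so memory stays proportional to the number of segments overlapping the current position rather than to the size of either table.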
  • 27. The benefits of NetApp Cloud Volumes. Moving from self-managed NFS storage to NetApp Cloud Volumes was seamless. Less complexity: moved from 15 EBS volumes fronted by 3 large NFS servers to a single NetApp service endpoint. Easy onboarding: copied 50 TB of data in more than 2 million files in less than two days. Performance: reading mutation data from 100k samples with 1024 cores was 3x faster than with the self-managed solution. Easier management: backups, and data cloning for test environments. Reliability: advanced RAID-DP technology provides more reliability for large data sets. Tailored to GORdb needs: transparent NFS caching.
  • 28.
  • 29.
  • 30.
  • 31. Thank you! Hákon Guðbjartsson, Ph.D., Chief Informatics Officer, hakon@wuxinextcode.com. Jonsi Stefansson, Cloud Data Services CTO, NetApp, jonsi.Stefansson@netapp.com