The completion of the project gave us an unprecedented amount of data and insights into the human genetic code.
“The basic next-generation sequencing process involves fragmenting DNA/RNA into multiple pieces, adding adapters, sequencing the libraries, and reassembling them to form a genomic sequence. In principle, the concept is similar to capillary electrophoresis. The critical difference is that NGS sequences millions of fragments in a massively parallel fashion, improving speed and accuracy while reducing the cost of sequencing”
By sequencing the genome and looking for variants, we can accelerate drug discovery, identify mutations that cause disease, and practice personalized medicine.
Any IT team responsible for supporting next-generation sequencing within their organization must address two primary challenges. The first challenge is enabling rapid processing and analysis of data. Ideally, analysis should not become a bottleneck that leads to a significant backlog of unanalyzed or unprocessed raw data. In other words, the time to process, analyze, and manage data should keep up with the rate at which data is produced by the sequencer. The second challenge is efficiently managing petabytes of data. The desire to improve efficiency is driven by government mandates or requirements to share and distribute research data, business policy, limited or shrinking data center space, and the overall cost of maintaining data long-term. Both challenges are likely to emerge as common themes as you qualify an opportunity.
One major challenge in NGS is the rate of analysis. This slide depicts ways analysis throughput can be increased. We know that many life science companies leverage these technologies, but it is equally important that storage doesn't become a bottleneck.
CALL OUT ISILON + PARABRICKS + GPU = 1000 WGS/Week
Confirm the process for genomic data (see the sketch after this list):
Step 1: Genomic sequencer
Step 2: HPC processing and preparation of the data the sequencer produces
Step 3: Store data as VCF (Variant Call Format) files and compare those files against a known control DNA sequence
Step 4: Identify millions of points of interest to investigate further
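To make these steps concrete, here is a minimal sketch of a typical secondary-analysis pipeline, assuming common open-source tools (bwa, samtools, bcftools) and illustrative file paths; a customer's actual pipeline may use GATK, Parabricks, or other tooling instead.

# Minimal secondary-analysis sketch: align raw reads, sort, and call variants.
# Tool choices and all paths are illustrative assumptions, not a mandated stack.
import subprocess

REFERENCE = "ref/GRCh38.fa"                            # reference genome (assumed path)
READS = ["sample_R1.fastq.gz", "sample_R2.fastq.gz"]   # Step 1: raw sequencer output

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Step 2: align and sort on the HPC side.
run(["bwa", "mem", "-t", "16", REFERENCE, *READS, "-o", "sample.sam"])
run(["samtools", "sort", "-@", "8", "-o", "sample.sorted.bam", "sample.sam"])
run(["samtools", "index", "sample.sorted.bam"])

# Step 3: call variants against the reference and store the result as a VCF.
run(["bcftools", "mpileup", "-f", REFERENCE, "sample.sorted.bam", "-O", "u", "-o", "sample.bcf"])
run(["bcftools", "call", "-mv", "-O", "v", "-o", "sample.vcf", "sample.bcf"])

# Step 4: sample.vcf now holds the candidate variants (points of interest)
# for downstream investigation.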
When evaluating a life science opportunity for PowerScale or ECS, be sure to qualify it against the backdrop of the data lifecycle. It is a simple three-phase lifecycle: Generation, Analysis, and Archive. When qualifying a PowerScale or ECS opportunity, focus on understanding how data is used and how the IT environment may change for each phase of the lifecycle.

Data generation catches all the primary or raw data generated from the NGS (or other genomic) instrumentation and prepares it for downstream analysis. Keep in mind that third parties like collaborators or DNA sequencing service providers can be data generation sources too.

During the analysis phase, performance is especially important. Analysis can be further separated into two stages: the first stage often involves serving data to an HPC environment, while later stages combine next-generation sequencing data with other data types using modern data analytics techniques.

The final phase is archive. Archive capacity requirements will vary with the organization: capacity will depend on the organization's size and type, the types of data access, the frequency of access, retention periods, and intended use (for example, research or clinical use).

For a complete picture, it's highly recommended that you include IT and end-user representatives as you qualify the opportunity. It's not uncommon to discover that the IT team does not have a clear understanding of their end users' workflows, analysis, or data management needs. Using the data lifecycle will help you quickly uncover who, what, how, where, and when storage will be used. Including IT and end users will reveal how storage can impact analysis, data management strategies, and challenges. The input collected from your customer will put you in a better position to recommend a PowerScale, ECS, or hybrid configuration.
There is a lot of potential in unstructured data – from home directories to IoT sensors to video files to analytics-filled data lakes – to be able to:
understand business results | anticipate what’s coming | and act quickly on risk and opportunity
Every business is becoming data-driven, or it risks being outsmarted.
Businesses are taking steps to harness this data so they can:
drive innovation
get to market faster
create differentiation
We are now unlocking the power of OneFS to bring software innovations to market faster and to provide more flexibility in use cases, expanding beyond the traditional datacenter. Our customers will benefit from our engineers focusing on OneFS software features while the PowerEdge team focuses on delivering bleeding-edge hardware.
And this is just the beginning of this new journey we are taking with PowerScale.
PowerScale is a new unstructured data storage family based on the new PowerScale OneFS 9.0, which includes new PowerScale-branded 1U nodes, co-existence with existing Isilon clusters, and upgraded capabilities for our cloud offerings.
It offers simplicity at any scale, handles any data anywhere, and finds insights within your infrastructure and your data.
Simplicity at Any Scale: The core strength of OneFS is a future-proof design that allows any new node to merge into an existing cluster in 60 seconds. Once a new node is connected, it is auto-discovered, and data is auto-balanced across every node in the cluster to ensure performance is evenly distributed. It is truly a future-proof design, and we are bringing it forward with powerful new capabilities.
Any Data, Anywhere: To handle any data, we offer flexible file and object access with support for 8 protocols, including S3 access for cloud-native development. Our software-defined approach allows us to run OneFS in more places, from the datacenter to the cloud, and we now support smaller customers and edge locations in a way we've never been able to before. Our new PowerEdge-based all-flash and NVMe nodes provide incredible power in a compact, competitively priced product. No matter the location, the system provides the same great experience and remains efficient, secure, and protected.
Intelligent Insights: We've expanded our software choices for our customers with free tools that help them understand their data. CloudIQ delivers detailed infrastructure insights and storage-level health monitoring across your on-premises environment, while DataIQ is a tool for discovering, understanding, and acting on the data you have – to provide data insights. Many customers don't know much about all the data in their infrastructure; they need a tool that gives them a better "DataIQ."
Together, it's a complete solution for unlocking the potential within your data.
Simplicity at Scale:
The core strength behind Isilon's success was the OneFS file system. With this release we are bringing the best of OneFS forward and delivering new capabilities, including inline dedupe and compression, support for new Ansible workflows, and integration with popular infrastructure frameworks such as Kubernetes and OpenShift.
It is a highly scalable file system that can now start smaller than ever – at 11TB of usable capacity – and scale to very large capacities.
Our No Node Left Behind philosophy is still with us, so you can swap new PowerScale nodes into existing Isilon clusters in 60 seconds and decommission old nodes – with no downtime. Everything is auto-balanced, and the cluster can sustain the loss of multiple nodes at the same time – without downtime.
DevOps Ready: Programmable infrastructure and automation are hot topics these days, and we've got new Ansible workflow support plus support for leading management and container orchestration frameworks, such as Kubernetes and OpenShift, to help customers streamline application development and reduce deployment timeframes.
Kubernetes integrations
At the heart of Simplicity at Any Scale is the migration-free design that allows new nodes to plug into clusters in 60 seconds. We can start as small as 11TB and grow to massive scale in the petabytes with the same ease of use. We've enhanced our efficiency and automation capabilities here.
Any scale: Terabytes to petabytes and millions of file operations
No Node Left Behind: Add nodes in 60 seconds - with no downtime
Auto-balance: Scale-out architecture ensures no hot spots
Resilient: Sustain multi-node failures with no data loss
This is the new PowerScale Family. It spans from edge to core to cloud and includes existing Isilon nodes as well as new PowerScale branded nodes.
We offer all-flash, hybrid, and archive nodes – to offer the right balance between price, performance, and capacity.
They can work together in the same cluster, as we maintain our No Node Left Behind compatibility.
To be clear, PowerScale nodes can join existing Isilon nodes in the same OneFS 9.0 cluster.
We have also extended PowerScale OneFS into the cloud with our partnerships with AWS, Azure, and Google. Last month, we announced a native cloud offer for the Google Cloud Platform. This allows our customers to leverage the cloud in situations where they don't necessarily want to spin up a new site with new hardware.
Next we will talk about how FLEXIBLE the system is.
We can handle virtually any unstructured data type and access method, with support for 8 protocols: NFS, SMB, HDFS, REST, HTTP, NDMP, FTP, and new S3 support.
This flexibility allows any user to get to the data they need in order to create, share, collaborate, and develop using an incredibly powerful, multi-lingual data platform.
The introduction of S3 support enables customers to run modern applications that rely on object storage – perhaps a mobile app built around video clips shared in a certain repository, or a fitness tracker for a school or a system of schools. The possibilities are endless.
Example: You have an existing dataset on Isilon. Upgrade to OneFS 9, and you can use S3 support to give developers an easy way to access your NFS files.
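As a minimal sketch of that example, here is how a developer might read those same files over S3 with boto3. The endpoint URL, port, credentials, and bucket name are placeholder assumptions; OneFS maps buckets you define onto existing directories.

# Read an existing file share through the new S3 protocol using boto3.
# Endpoint, credentials, and bucket name below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://powerscale.example.com:9021",  # assumed cluster S3 endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# List objects in a bucket that maps to an existing NFS/SMB directory.
for obj in s3.list_objects_v2(Bucket="genomics-data").get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download one object; the same file remains accessible over NFS.
s3.download_file("genomics-data", "results/sample.vcf", "/tmp/sample.vcf")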
Now Intelligent Insights
Get insights about your infrastructure and your data with CloudIQ and DataIQ. CloudIQ makes it easy to determine the health of your systems across your datacenter.
DataIQ makes it easy for anyone to find and understand data across your PowerScale - and your entire file cloud.
Once locations are indexed, it becomes simple for anyone to find and share files at very high speed. This can accelerate time to insight and truly help your business make decisions faster.
DataIQ allows life science customers to discover where their data is, gain insight into their data, and act on their findings. Many customers are unaware of exactly where their data lives, and data that should be archived often sits on higher-performance storage. With DataIQ, IT or researchers can discover "cold data" and move it to the appropriate tier. This allows for "point and use" file storage with the ability to "right click and archive." Since users can use DataIQ to gain insight and forecast, workflows can be set up to automatically send data where it needs to go (example: sequence done, compress, send to archive). Researchers can quickly recall study information with the ability to search across their storage for files.
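As a purely illustrative sketch of that "sequence done, compress, send to archive" idea – the paths, file pattern, and 90-day threshold are assumptions, and this is not DataIQ's API – the workflow looks like this in plain Python:

# Cold-data archive pass: compress and move files untouched for 90+ days.
# Paths and threshold are assumed; this mimics the workflow DataIQ enables.
import gzip
import shutil
import time
from pathlib import Path

HOT_TIER = Path("/ifs/hot/sequencing")      # assumed performance-tier path
ARCHIVE = Path("/ifs/archive/sequencing")   # assumed archive-tier path
COLD_AFTER = 90 * 24 * 3600                 # 90 days, in seconds

ARCHIVE.mkdir(parents=True, exist_ok=True)
now = time.time()
for src in HOT_TIER.rglob("*.fastq"):
    if now - src.stat().st_atime < COLD_AFTER:
        continue                            # still warm; leave it in place
    dest = ARCHIVE / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)       # compress on the way to archive
    src.unlink()                            # free the performance tier
    print(f"archived {src} -> {dest}")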
With DataIQ's ability to showcase cost savings by leveraging the appropriate storage tier, IT can easily create quotas (example: "you're using X amount of storage").
Data management: built as a plug-in
Many customers found Isilon too expensive for everything and wanted to move cold data to a cheaper archive, but they weren't sure what they had and lacked the data management tools to find out.
Point and use any file storage: fast indexing, a searchable catalog, and directory views reveal movable or unused data so it can be moved to archive.
DNA service provider: uses DataIQ in production to save money across storage; IT gets a better view of the data and more tools to manage where it should be, automating data management.
The UI is exposed to end users, allowing for self-service; "cold data" is moved to a lower archive tier, and teams can start to plan and forecast where data needs to go and be.
Talk point: "right click" to send to archive.
Data imaging: catalog building.
Create workflows: sequence done, compress, send to archive, with identifiers for data sets.
Value add to all stakeholders
Tactical
Ability to show cost savings, with billing according to group usage; Isilon features hard and soft quotas ("you're using X amount of storage").
High speed search across file systems / storage repositories
- Environments often consist of NetApp, Isilon, Quantum, GPFS, and archive storage systems
Single Pane of Glass Data Management
- High-end knowledge workers must always be able to find and act on data without a service request from IT
100% Self-Service for Researchers / Producers / Design Managers / Engineers / etc
- Allow business users to manage their own cost and workflow
- Handle access, visibility, and control in a single system
Highly Available and Highly Scalable (Petabytes of data / billions of files)
- The customer wanted to handle both clinical and research data within a single system.
The initial archive was based on a tape library and SGI's DMF… they later swapped that for ECS.
Challenges
A scientific archive with a single pane of glass
Self-service for researchers, producers, and engineers to lower reliance on IT
Access, visibility, and control in a single system
DataIQ
A single UI to view all clinical and research data
A self-service archive
Fast search across billions of files
Archive data to reduce tier 1 storage costs
Our customers have been looking for an end-to-end solution, and this is how it comes together.
PowerScale technology gives you the ability to innovate faster and unlock the potential of your data.
Here is an example life science architecture that supports next-generation sequencing and other genomics workflows. Viewing the architecture from left to right, you can layer the data lifecycle over it. On the far left, next-generation sequencing instruments generate data and transmit that raw data over the CIFS/SMB protocol to an Isilon cluster. PowerScale is at the center of the architecture, as it bridges data generation and analysis.

Once the data is on the Isilon cluster, users may access it using a Windows, macOS, or Linux client, then submit a job to the HPC cluster over NFS to process the raw data. Alternatively, a data scientist might combine the next-generation sequencing data with clinical data over the HDFS protocol to perform an interactive analysis in a Spark environment.

Moving to the right side of the architecture, data enters the archiving phase. Depending on the habits and practices of your life science customer, raw data and results may be replicated via SyncIQ to a PowerScale DR or archive cluster, or a data mover like EcsSync might move the data to the ECS object store, where it can be accessed by collaborators.
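To picture the Spark path in that architecture, here is a minimal PySpark sketch of joining sequencing-derived variant data, read over HDFS from the cluster, with clinical data. The hostname, port, paths, and column names are placeholder assumptions.

# Interactive analysis over HDFS: join variant data with clinical records.
# Endpoint, paths, and column names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ngs-clinical-join").getOrCreate()

# Read from the cluster's HDFS access zone (assumed endpoint and layout).
variants = spark.read.parquet("hdfs://isilon.example.com:8020/ifs/ngs/variants")
clinical = spark.read.csv(
    "hdfs://isilon.example.com:8020/ifs/clinical/patients.csv",
    header=True, inferSchema=True,
)

# Join on a shared sample identifier and summarize variants per diagnosis.
joined = variants.join(clinical, on="sample_id")
joined.groupBy("diagnosis").count().show()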
Cloud Storage Services with Microsoft Azure provides a higher-bandwidth (up to 100Gbps) and lower-latency (as low as 1.2ms) connection to the cloud using ExpressRoute Local. This solution allows for the right combination of storage and compute in the cloud for data-intensive, high-I/O-throughput workloads that require high compute performance on a periodic and/or unpredictable basis. With no outbound data traffic costs, this solution enables workloads that require a lot of temporary writes to storage to cost-effectively take advantage of Azure's application services. This is ideal for verticals such as Life Sciences and Media and Entertainment, giving users the best of both worlds – reliable, cost-effective Dell EMC storage performance at scale and the scalable compute performance of Microsoft Azure.
USE CASES:
Life Sciences:
Genome analysis is one of the key use cases for life sciences. The raw data generated by a genomic sequencer for the complete genome of a single human is approximately 100GB. This dictates a requirement for a massively scalable file system to which capacity and performance can be added. Genome alignment and sorting, which are both part of the secondary analysis stage, are the most compute- and storage-demanding steps and can require network throughput of 10Gb/s or even 100Gb/s. Dell EMC and Azure testing has demonstrated that the performance of Isilon scales out linearly to match the I/O demands of an increasing number of Azure VMs supporting the genome alignment stage. The 100Gb/s ExpressRoute Local connection between Isilon and Azure enables both the compute performance in Azure and the storage performance in Isilon to scale up to process real-world genome analysis.
Large research facilities processing hundreds of thousands of genomes per year generate petabytes of very large file data (typically 500GB per file set) to be stored, and they have a demand for computing power that is bursty by nature – a perfect application for on-demand, easily scalable cloud computing. In addition, since genomic processing is, at its core, a pattern-matching application, a large part of the analysis workflow involves writes to temporary files on the Isilon storage.
Focused On Life Science Organizations Since 2008
Used by 400+ Organizations For NGS, HPC And Research Archive Workloads
Installed At:
8 Of The Top 10 Global Pharmaceutical Companies
40% Of The Top 100 North American Academic Medical Centers
11 NIH Research Centers
37% Of Sequencing Sites Worldwide