Dell EMC UDS for Next-Generation Sequencing
May 2020
© Copyright 2020 Dell Inc.
Overview
What is Next-Generation Sequencing?
Identify key problems:
• Rapid Analysis
• Data Management
Why Dell EMC Storage
Architectures
Compression (Petagene)
Wrap-up
What is NGS?
Why sequence DNA?
• Accelerate drug discoveries
• Identify mutations causing disease
• Practice personalized medicine
Sequencing the human genome was first completed in 2003
Every human has ~6.4B bases (ATCGs)
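As a rough illustration of scale, the ~6.4B bases quoted above can be turned into a raw size estimate. The sketch below assumes two encodings that are not taken from the slides: 2 bits per base (enough for the four symbols A, T, C, G) and 1 byte per base (plain text). Real sequencer output is considerably larger, because each base is read many times and stored alongside quality scores.

```python
BASES_PER_HUMAN = 6.4e9  # from the slide: ~6.4B bases


def raw_size_gb(bases: float, bits_per_base: int) -> float:
    """Return the size in gigabytes (10**9 bytes) of `bases` symbols."""
    return bases * bits_per_base / 8 / 1e9


packed_gb = raw_size_gb(BASES_PER_HUMAN, 2)  # assumption: 2 bits per base
text_gb = raw_size_gb(BASES_PER_HUMAN, 8)    # assumption: 1 byte per base
print(f"2-bit packed: {packed_gb:.1f} GB, plain text: {text_gb:.1f} GB")
```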
Driving advancements in Next-Generation Sequencing
• Low-cost whole genome sequencing
• Applications and biological studies never before possible
NGS workflow (diagram)
PRIMARY ANALYSIS (NGS INSTRUMENT): FASTQ files
SECONDARY ANALYSIS (COMPUTE AND STORAGE): mapping, alignment and variant calling; BAM and VCF files
TERTIARY ANALYSIS (ANALYTICS): VCF files with Biological Data, Clinical Data and Lab Data
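The FASTQ files at the start of this workflow are plain text with four lines per read: an identifier line starting with "@", the base sequence, a separator line starting with "+", and a quality string of the same length as the sequence. The following is a minimal sketch of reading such records with the Python standard library. The names `FastqRecord` and `read_fastq` and the sample reads are hypothetical, and a production pipeline would rely on established tools rather than this code.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].strip(), sequence, quality)


# Two made-up reads, used only to exercise the parser.
sample = io.StringIO("@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\nFFFF\n")
records = list(read_fastq(sample))
print(len(records), records[0].sequence)
```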
Key challenges
• Enabling rapid analysis
• Efficiently managing petabytes of data
Rapid analysis
Increasing Analysis Speed
Mix & match
• ADD MORE CORES
• REWRITE SOFTWARE
• USE ACCELERATORS
PARABRICKS
Trade-offs
• Rack/DC space
• Expertise
• “Other” required analyses
• Business SLA
• Budget
• Operations
• Methods & validation
• …
• Existing infrastructure
NVIDIA Parabricks accelerated secondary analysis
NGS INSTRUMENT: FASTQ files
MAPPING, ALIGNMENT & VARIANT CALLING: BAM files
VARIANT CALL FILES: VCF files
4M POINTS OF INTEREST
• Uses GPUs (cloud/on-premise) for computing
• HPC cluster for running entire analysis
• Reduces cost of computing significantly
Efficient data management
Data life cycle approach
GENERATE | ANALYZE | ARCHIVE
Flash
HDD
PowerScale / File
ECS / Object
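A data life cycle approach of this kind is often expressed as a placement rule that moves data from one stage to the next as it ages. The sketch below is an illustration only: the age thresholds, the `DataSet` type and the `choose_stage` function are hypothetical, and the slide does not specify which media or platform serves each stage, so the code returns only the stage names.

```python
from dataclasses import dataclass

# Hypothetical age thresholds in days; the slides give no numbers.
ANALYZE_AFTER_DAYS = 7
ARCHIVE_AFTER_DAYS = 90


@dataclass
class DataSet:
    name: str
    age_days: int


def choose_stage(item: DataSet) -> str:
    """Map a data set to a life cycle stage using its age alone."""
    if item.age_days < ANALYZE_AFTER_DAYS:
        return "generate"
    if item.age_days < ARCHIVE_AFTER_DAYS:
        return "analyze"
    return "archive"


runs = [DataSet("run_A", 1), DataSet("run_B", 30), DataSet("run_C", 400)]
placement = {run.name: choose_stage(run) for run in runs}
print(placement)
```

A real policy would likely also consider access frequency and project status, not age alone.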
Why Dell EMC Storage?
The potential is within your unstructured data to
Workloads: IoT, Archiving, AI / ML, Home directories, Data analytics, Blockchain, File shares, Video files, EDA
Industries: Energy, Financial services, Media & entertainment, Manufacturing, Life sciences, Any data-driven business
Internal Use - Confidential
Unlocking the power of OneFS
2020
+ DELL EMC POWEREDGE
Deployment agility | Use Case Flexibility | Accelerated Innovation
Introducing PowerScale
Unlock the potential of your data
Simplicity at any scale
• Add nodes in 60 seconds.
• Auto-discover, auto-balance.
• Migration-free design.
Any data anywhere
• Flexible file and object access.
• Software-defined architecture.
• Extends to edge and cloud.
Intelligent insights
• Intelligent software for all.
• CloudIQ for infrastructure insights.
• DataIQ for data insights.
Simplicity at any scale
The core strengths of OneFS are brought forward into PowerScale OneFS
Start small
Grow to petabyte scale
Swap in new nodes in 60 seconds, decommission old nodes, with no downtime
Utilize new Ansible and Kubernetes integrations
DevOps ready
Any scale
Scale-out architecture ensures no hot spots
AutoBalance®
No node left behind
Increased inline data reduction capabilities
Efficient
Sustain multi-node failures with no downtime
Resilient
The new PowerScale family
A200 | A2000
H400 | H500 | H5600 | H600
F800 | F810
MULTI-CLOUD / NATIVE CLOUD
Azure, AWS, Google
PowerScale F600 All-NVMe
PowerScale F200 All-Flash
Any data
Enterprise-class unstructured data services with simultaneous multi-protocol support
OneFS
Archive
Content
Social media
Safety and security
Application test
Design / test
Data monetization
Marketing
Next-gen apps
Blobs
File shares
Hadoop analytics
Containers
Any user
Any client
System admin
Developer
Editor
Data architect
Access to all data at the same time
Empower users to get what they need
Introducing S3 access
App
App
App
DEVELOPER
OneFS
PowerScale
S3
Data read & write: all data simultaneously read and written through any protocol
Data migration & copies: no need to migrate and copy data to / from a secondary source
Data consolidation: file & object access on the same platform
Researcher
System admin
Video editor
Intelligent insights into your infrastructure
CloudIQ makes it easy to determine the health of your systems
Proactive Monitoring & Predictive Analytics
REDUCE RISK
• Identify and avoid issues to expedite troubleshooting
PLAN AHEAD
• Anticipate business needs and avoid outages
IMPROVE PRODUCTIVITY
• Single pane of glass view of data center
DataIQ
CloudIQ
faster to predict capacity approaching/almost full
faster to identify HA problems
Intelligent insights into your data
DataIQ makes it easy to find data faster
EDGE | CORE | CLOUD
Included with PowerScale
No additional purchase required for Dell EMC products
Discover | Understand | Act
Tag, track, analyze and report on data.
Act on unique insights by moving data where it’s needed.
DataIQ
CloudIQ
Dell EMC PowerScale
Unlocking the potential within your data
• CloudIQ: Infrastructure insights
• DataIQ: Data insights
OneFS
DataIQ CloudIQ
EDGE | CORE | CLOUD
• True multiprotocol access
• Edge to cloud deployment
• Simple non-disruptive scaling of efficiency, bandwidth and capacity
Architectures
Life science architecture
A starting point
Diagram components:
• Users, on-site instrumentation, collaborators
• Sample DB, Clinical DB, Annotation DB
• HPC, Hadoop / Spark, compute cluster, scheduler
• SMB, NFS, HDFS
• TIER 1 STORAGE: PowerScale
• TIER 2 STORAGE: PowerScale
• TIER 3 STORAGE: ECS OBJECT STORAGE
• + PUBLIC CLOUD
• DataIQ, SyncIQ
• Data lifecycle manager: iRODS / DataIQ / Arcitecta …
PowerScale Portfolio & Parabricks for life sciences
A turnkey solution designed for life science and healthcare organizations
Multiple GPU-enabled server options
• Dell PowerEdge C4140/R740/DSS8440
• NVIDIA DGX Series
Dell EMC PowerScale Storage Portfolio
• NVMe/All-Flash/Hybrid/Archive Nodes
• Choose Appropriate Nodes for Your NGS Environment
• Scales Out From TBs to PBs Based On Needs
Parabricks app
• On-site & hybrid cloud configurations
Simple • fast • flexible • reliable • compact
Users
Sequencing instrumentation
Clinical, Lab and Biological Data
Hybrid cloud architecture
A turnkey solution designed for life science and healthcare organizations
Users
Sequencing instrumentation
Clinical, Lab and Biological Data
BCL & FASTQ
PARABRICKS
Managed service provider
Directly connected
• Flexible multi-cloud support
• No vendor lock-in with data independent of the cloud
• Leverage cloud(s) of choice based on application needs
• Reduce risk with centralized, durable storage
• Fast and low cost with no additional infrastructure to set up or manage
Compute-intensive workloads with Microsoft Azure
Life sciences, media and entertainment and more
• Efficiently run compute-intensive workloads in Azure
• Up to 100 Gbps bandwidth and as low as 1.2 ms latency connection to the cloud with ExpressRoute Local
• No outbound data traffic costs
• Ideal for industries such as Life Sciences and Media and Entertainment, where Azure provides rich application services
Azure ExpressRoute Local
Life science genome analysis use case example tested on Isilon with Azure
Managed service provider
On-Premises Genomic Analysis
Native replication
Azure compute Genomic Analysis
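To put the quoted bandwidth in context, the ideal time to move a data set over such a link is its size in bits divided by the link rate. The sketch below uses the "up to 100 Gbps" figure from the slide and a hypothetical 1 TB data set. It ignores protocol overhead, latency and storage throughput, so actual transfers would take longer.

```python
def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal transfer time: payload bits divided by link rate in bits/s."""
    return size_bytes * 8 / (link_gbps * 1e9)


LINK_GBPS = 100.0  # from the slide: up to 100 Gbps
ONE_TB = 1e12      # hypothetical data set size in bytes

seconds = transfer_seconds(ONE_TB, LINK_GBPS)
print(f"1 TB at {LINK_GBPS:.0f} Gbps: about {seconds:.0f} s at line rate")
```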
Compression
PETAGENE: BASICS
100% lossless compression; MD5 verified
60-90% reduction in FASTQ.gz and BAM data files
• Novel IP in Genomic Data Compression
• Compressed files look and act as original data
• Speeds up data movement and processing
• Cloud Edition enables streaming from object stores
• PetaSuite Protect enables fine-grain control and monitoring of shared data for compliance/auditing
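"Lossless, MD5 verified" generally means that the checksum of the decompressed data matches the checksum of the original. The sketch below shows that round-trip check in general terms. It uses gzip from the Python standard library purely as a stand-in compressor; it is not the Petagene algorithm or its API, and the payload is made-up FASTQ-like text, so the printed reduction says nothing about the 60-90% figure on the slide.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hex string."""
    return hashlib.md5(data).hexdigest()


def verify_lossless(original: bytes) -> bool:
    """Compress, decompress, and compare checksums of input and output."""
    compressed = gzip.compress(original)   # stand-in compressor (assumption)
    restored = gzip.decompress(compressed)
    return md5_hex(restored) == md5_hex(original)


# Made-up, highly repetitive payload used only for illustration.
payload = b"@read1\nGATTACA\n+\nIIIIIII\n" * 1000
ok = verify_lossless(payload)
reduction = 1 - len(gzip.compress(payload)) / len(payload)
print(ok, f"{reduction:.0%} smaller")
```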
PETAGENE: KEY DIFFERENTIATORS
50% savings on storage costs
2-4x faster data
• Elegant solution for compression of genomic data
• Verifiable lossless compression
• No need to decompress files to use them
• Compression + streaming to/from object stores
Wrap-Up
Key points
Rapid analysis and efficient data management
• The primary jobs of technical computing organizations
Technology choices
• Often a reflection of the habits & practices of the organization
We continuously test and validate solutions to assist in choosing the best mix of technologies
Dell EMC life science by the numbers
Focused on life science organizations since 2008
Used by 400+ organizations for NGS, HPC and research archive workloads
INSTALLED AT
• Global pharmaceutical companies
• Global biotech companies
• North America academic medical centers
• Research centers
• Illumina NovaSeq customer sites
Claims based on internal SFDC sales data, June 2020
dell-emc-powerscale-for-ngs.pptx

dell-emc-powerscale-for-ngs.pptx

  • 1. Dell EMC UDS for Next- Generation Sequencing May 2020
  • 2. © Copyright 2020 Dell Inc. 2 What is Next-Generation Sequencing? — Identify key problems: • Rapid Analysis • Data Management — Why Dell EMC Storage — Architectures — Compression (Petagene) — Wrap-up Overview
  • 3. © Copyright 2020 Dell Inc. 3 What is NGS?
  • 4. © Copyright 2020 Dell Inc. 4 Why sequence DNA? Accelerate drug discoveries Identify mutations causing disease Practice personalized medicine Sequencing the human genome was first completed in 2003 Every human has ~6.4B bases (ATCGs)
  • 5. © Copyright 2020 Dell Inc. 5 Low cost whole genome sequencing Applications and biological studies never before possible Driving advancements in Next-Generation Sequencing © Copyright 2020 Dell Inc. 5
  • 6. © Copyright 2020 Dell Inc. 6 FASTQ FASTQ FASTQ FASTQ PRIMARY ANALYSIS SECONDARY ANALYSIS mapping, alignment and variant calling TERTIARY ANALYSIS Biological Data Clinical Data Lab Data VCF VCF BAM BAM BAM BAM BAM VCF VCF NGS INSTRUMENT COMPUTE AND STORAGE ANALYTICS
  • 7. © Copyright 2020 Dell Inc. 7 Key challenges Enabling rapid analysis Efficiently managing petabytes of data © Copyright 2020 Dell Inc. 7
  • 8. © Copyright 2020 Dell Inc. 8 Rapid analysis © Copyright 2020 Dell Inc. 8
  • 9. © Copyright 2020 Dell Inc. 9 Increasing Analysis Speed Mix & match Trade offs Rack/DC space Expertise “Other” required analyses Business SLA Budget Operations Methods & validation … Existing infrastructure PARABRICKS ADD MORE CORES REWRITE SOFTWARE USE ACCELERATORS © Copyright 2020 Dell Inc. 9
  • 10. © Copyright 2020 Dell Inc. 10 NGS INSTRUMENT MAPPING, ALIGNMENT & VARIANT CALLING VARIANT CALL FILES 4M POINTS OF INTEREST FASTQ FASTQ FASTQ FASTQ BAM BAM BAM BAM VCF VCF VCF VCF NVIDIA Parabricks accelerated secondary analysis Uses GPUs (cloud/ on-premise) for computing HPC cluster for running entire analysis Reduces cost of computing significantly + +
  • 11. © Copyright 2020 Dell Inc. 11 Efficient data management © Copyright 2020 Dell Inc. 11
  • 12. © Copyright 2020 Dell Inc. 12 G E N E R A T E HDD Data life cycle approach A N A L Y Z E A R C H I V E PowerScale / File ECS / Object Flash
  • 13. © Copyright 2020 Dell Inc. 13 Why Dell EMC Storage?
  • 14. The potential is within your unstructured data: IoT, archiving, AI / ML, home directories, data analytics, block chain, file shares, video files; EDA, energy, financial services, media & entertainment, manufacturing, life sciences, any data-driven business
  • 15. Unlocking the power of OneFS. 2020 + DELL EMC POWEREDGE. Deployment agility; use case flexibility; accelerated innovation
  • 16. Introducing PowerScale: unlock the potential of your data. Simplicity at any scale: add nodes in 60 seconds; auto-discover, auto-balance; migration-free design. Any data anywhere: flexible file and object access; software-defined architecture; extends to edge and cloud. Intelligent insights: intelligent software for all; CloudIQ for infrastructure insights; DataIQ for data insights.
  • 17. Simplicity at any scale. The core strengths of OneFS are brought forward into PowerScale OneFS. Any scale: start small, grow to petabyte scale. No node left behind: swap in new nodes in 60 seconds, decommission old nodes, with no downtime. DevOps ready: utilize new Ansible and Kubernetes integrations. AutoBalance®: scale-out architecture ensures no hot spots. Efficient: increased inline data reduction capabilities. Resilient: sustain multi-node failures with no downtime.
  • 18. The new PowerScale family: A200 | A2000; H400 | H500 | H5600 | H600; F800 | F810; PowerScale F600 (All-NVMe); PowerScale F200 (All-Flash); multi-cloud / native cloud (Azure, AWS, Google)
  • 19. Any data. Enterprise class unstructured data services with simultaneous multi-protocol support. OneFS: archive, content, social media, safety and security, application test, design / test, data monetization, marketing, next-gen apps, blobs, file shares, Hadoop analytics, containers. Any user, any client: system admin, developer, editor, data architect. Access to all data at the same time. Empower users to get what they need.
  • 20. Introducing S3 access (OneFS, PowerScale). Data read & write: all data simultaneously read and write through any protocol. Data migration & copies: no need to migrate and copy data to / from secondary source. Data consolidation: file & object access on the same platform. Users: developer (apps), researcher, system admin, video editor.
  • 21. Intelligent insights into your infrastructure. CloudIQ makes it easy to determine the health of your systems. Proactive monitoring & predictive analytics. REDUCE RISK: identify and avoid issues to expedite troubleshooting. PLAN AHEAD: anticipate business needs and avoid outages. IMPROVE PRODUCTIVITY: single pane of glass view of data center. Faster to predict capacity approaching/almost full; faster to identify HA problems. DataIQ, CloudIQ.
  • 22. Intelligent insights into your data. DataIQ makes it easy to find data faster. Edge, core, cloud. Included with PowerScale; no additional purchase required for Dell EMC products. Discover | Understand | Act: tag, track, analyze and report on data. Act on unique insights by moving data where it’s needed. DataIQ, CloudIQ.
  • 23. Dell EMC PowerScale: unlocking the potential within your data. OneFS, DataIQ, CloudIQ. Edge, core, cloud. • CloudIQ: Infrastructure insights • DataIQ: Data insights • True multiprotocol access • Edge to cloud deployment • Simple non-disruptive scaling of efficiency, bandwidth and capacity
  • 24. Architectures
  • 25. Life science architecture: a starting point. Users; on-site instrumentation; collaborators; sample DB, clinical DB, annotation DB. Tier 1 storage: PowerScale (SMB, NFS, HDFS). HPC: compute cluster, scheduler. Hadoop / Spark. Tier 2 storage: PowerScale. Tier 3 storage: ECS object storage. + Public cloud. DataIQ; SyncIQ. Data lifecycle manager: iRODS / DataIQ / Arcitecta …
  • 26. PowerScale Portfolio & Parabricks for life sciences. A turn-key solution designed for life science and healthcare organizations. Multiple GPU-enabled server options: • Dell PowerEdge C4140/R740/DSS8440 • NVIDIA DGX Series. Dell EMC PowerScale Storage Portfolio: • NVMe/All-Flash/Hybrid/Archive Nodes • Choose Appropriate Nodes for Your NGS Environment • Scales Out From TBs to PBs Based On Needs. Parabricks app: • On-site & hybrid cloud configurations. Simple • fast • flexible • reliable • compact. Users; sequencing instrumentation; clinical, lab and biological data.
  • 27. Hybrid cloud architecture. A turn-key solution designed for life science and healthcare organizations. Users; sequencing instrumentation; clinical, lab and biological data; BCL & FASTQ; Parabricks; managed service provider, directly connected. • Flexible multi-cloud support • No vendor lock-in with data independent of the cloud • Leverage cloud(s) of choice based on application needs • Reduce risk with centralized, durable storage • Fast and low cost with no additional infrastructure to set up or manage
  • 28. Compute intensive workloads with Microsoft Azure. Life sciences, media and entertainment and more. • Efficiently run compute-intensive workloads in Azure • Up to 100Gbps bandwidth and as low as 1.2ms latency connection to the cloud with ExpressRoute Local • No outbound data traffic costs • Ideal for industries such as Life Sciences and Media and Entertainment, where Azure provides rich application services. Diagram: on-premises genomic analysis; managed service provider with native replication; Azure ExpressRoute Local; Azure compute genomic analysis. Life science genome analysis use case example tested on Isilon with Azure.
  • 29. Compression
  • 30. PETAGENE: BASICS. 100% lossless compression; MD5 verified. 60-90% reduction in FASTQ.gz and BAM data files. • Novel IP in Genomic Data Compression • Compressed files look and act as original data • Speeds up data movement and processing • Cloud Edition enables streaming from object stores • PetaSuite Protect enables fine-grain control and monitoring of shared data for compliance/auditing
  • 31. PETAGENE: KEY DIFFERENTIATORS. 50% savings on storage costs. 2-4x faster data. • Elegant solution for compression of genomic data • Verifiable lossless compression • No need to decompress files to use them • Compression + streaming to/from object stores
  • 32. Wrap-Up
  • 33. Key points. Rapid analysis and efficient data management: the primary jobs of technical computing organizations. Technology choices: often a reflection of the habits & practices of the organization. We continuously test and validate solutions to assist choosing the best mix of technologies.
  • 34. Dell EMC life science by the numbers. Focused on life science organizations since 2008. Used by 400+ organizations for NGS, HPC and research archive workloads. Installed at: global pharmaceutical companies, global biotech companies, North America academic medical centers, research centers, Illumina NovaSeq customer sites. Claims based on internal SFDC sales data, June 2020.
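
The “100% lossless compression; MD5 verified” claim on slide 30 rests on a simple check: the digest of the decompressed data must equal the digest of the original. A minimal sketch of that check in Python follows. It assumes gzip as a stand-in codec (PetaGene's own tools and interfaces are not shown in this deck, so nothing here represents them), and the FASTQ record is made-up sample data.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as a hex string."""
    return hashlib.md5(data).hexdigest()


def roundtrip_is_lossless(original: bytes) -> bool:
    """Compress, decompress, and compare MD5 digests of input and output."""
    compressed = gzip.compress(original)    # stand-in codec, NOT PetaGene
    restored = gzip.decompress(compressed)
    return md5_hex(original) == md5_hex(restored)


# A tiny, invented FASTQ record repeated to give the codec something to do.
sample_fastq = b"@read1\nGATTACAGATTACA\n+\nIIIIIIIIIIIIII\n" * 1000

print("lossless round trip:", roundtrip_is_lossless(sample_fastq))
print("compressed size / original size:",
      len(gzip.compress(sample_fastq)) / len(sample_fastq))
```

In practice the digest of the original would be recorded before compression and compared again after any later decompression, so the check does not depend on keeping the original file around.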

Editor's Notes

  1. The completion of the project gave us an unprecedented amount of data and insights into the human genetic code.
  2. “The basic next-generation sequencing process involves fragmenting DNA/RNA into multiple pieces, adding adapters, sequencing the libraries, and reassembling them to form a genomic sequence. In principle, the concept is similar to capillary electrophoresis. The critical difference is that NGS sequences millions of fragments in a massively parallel fashion, improving speed and accuracy while reducing the cost of sequencing.” By sequencing the genome and looking for variants, we can accelerate drug discoveries, identify mutations causing disease, and practice personalized medicine.
  3. Any IT team responsible for supporting next-generation sequencing within their organization must address two primary challenges. The first challenge is enabling rapid processing and analysis of data. Ideally, an analysis should not become a bottleneck that leads to a significant backlog of unanalyzed or unprocessed raw data. In other words, the time to process, analyze, and manage data should keep up with the rate at which the data is produced by the sequencer. The second challenge is efficiently managing petabytes of data. The desire to improve efficiency is driven by government mandates or requirements to share and distribute research data, business policy, limited or shrinking data center space, and the overall cost to maintain data long-term. Both challenges are likely to emerge as common themes as you qualify an opportunity.
  4. One major challenge in NGS is the rate of analysis. This slide depicts ways analysis speed can be increased. We know that many life science companies leverage these technologies, but it is equally important that storage doesn’t become a bottleneck.
  5. Call out: Isilon + Parabricks + GPU = 1000 WGS/week. Confirm the process for genomic data. Step 1: Genomic sequencer. Step 2: HPC and preparation of the data the sequencer produces. Step 3: Store data as VCF (Variant Call Files) and juxtapose those files against a known control DNA. Step 4: Identify the millions of points of interest (4M on the slide) to investigate further.
  6. When evaluating a Life Science opportunity for PowerScale or ECS, be sure to qualify it against the backdrop of the data lifecycle. It is a simple three-phase life cycle: Generation, Analysis, and Archive. When qualifying a PowerScale or ECS opportunity, focus on understanding how data is used and how the IT environment may change for each phase of the life cycle. Data generation catches all the primary or raw data generated from the NGS (or other genomic) instrumentation and prepares it for downstream analysis. Keep in mind that third parties like collaborators or DNA sequencing service providers can be considered data generation sources too. During the analysis phase, performance is especially important. It can be further separated into two stages. The first stage of analysis often involves serving data to an HPC environment, while later stages of analysis combine next-generation sequencing data with other data types using modern data analytics techniques. The final stage is the archive. The archive capacity requirements will vary with the organization. Capacity will depend on the organization size and type, types of data access, the frequency of access, retention periods, and intended use (for example, research or clinical use). For a complete picture, it's highly recommended that you include IT and end-user representatives as you qualify the opportunity. It's not uncommon to discover that the IT team does not have a clear understanding of their end-user workflows, analysis or data management needs. Using the data lifecycle will help you to quickly uncover who, what, how, where and when storage will be used. Including IT and end users will reveal how storage can impact analysis, data management strategies and challenges. The input collected from your customer will put you in a better position to recommend a PowerScale, ECS or hybrid configuration.
  7. There is a lot of potential in unstructured data, from home directories to IoT sensors to video files to analytics-filled data lakes, to be able to understand business results, anticipate what’s coming, and act quickly on risk and opportunity. Every business is becoming data-driven or risks being outsmarted. Businesses are taking steps to harness this data so they can drive innovations, get to market faster, and create differentiation.
  8. We are now unlocking the power of OneFS to bring software innovations to market faster and to provide more flexibility in use cases, expanding beyond the traditional datacenter. Our customers will benefit from our engineers focusing on OneFS software features while the PowerEdge team focuses on delivering bleeding-edge hardware. And this is just the beginning of the new journey we are taking with PowerScale.
  9. PowerScale is a new unstructured data storage family based on the new PowerScale OneFS 9.0, which includes new PowerScale-branded 1U nodes, co-existence with existing Isilon clusters, and upgraded capabilities for our cloud offerings. It can offer simplicity at any scale, handle any data anywhere, and find insights within your infrastructure and your data. Simplicity at Any Scale: The core strength of OneFS is a future-proof design that allows any new nodes to merge into existing clusters in 60 seconds. Once a new node is connected, it is auto-discovered and then the data is auto-balanced across every node in the array to ensure performance is evenly distributed. It is truly a future-proof design and we are bringing this forward with powerful new capabilities. Any Data, Anywhere: To handle any data, we offer flexible file and object access and support for 8 protocols, including S3 access for cloud-native development. Our software-defined approach allows us to run OneFS in more places, from the datacenter to the cloud, and we now support smaller customers and edge locations in a way we’ve never been able to do before. Our new PowerEdge-based all-flash and NVMe nodes provide incredible power in a compact, competitively priced product. No matter the location, the system provides the same great experience and remains efficient, secure, and protected. Intelligent Insights: We’ve expanded our software choices for our customers with free tools that help customers understand their data. CloudIQ delivers detailed infrastructure insights and storage-level health monitoring across your on-premises cloud, while DataIQ is a tool for discovering, understanding, and acting on the data you have, providing data insights. Many customers don't know much about all the data in their infrastructure; they need a tool that gives them a better "DataIQ." Together, it's a complete solution for unlocking the potential within your data.
  10. Simplicity at Scale: The core strength behind Isilon's success was the OneFS file system. With this release we are bringing the best of OneFS forward and delivering new capabilities including inline dedupe and compression, support for new Ansible workflows, and integration with popular infrastructure frameworks such as Kubernetes and OpenShift. It is a very scalable file system and can now start smaller than ever, at 11TB of usable space, and scale to very large capacities. Our No Node Left Behind philosophy is still with us, so you can swap new PowerScale nodes into existing Isilon clusters in 60 seconds and decommission old nodes with no downtime. Everything is auto-balanced and resilient to losing multiple nodes at the same time, without downtime. DevOps Ready: Programmable infrastructure and automation are hot topics these days, and we've got new Ansible workflow support and support for leading management and container orchestration frameworks, such as Kubernetes and OpenShift, to help customers streamline application development and reduce deployment timeframes. The migration-free design allows new nodes to plug into clusters in 60 seconds. We can start as small as 11TB and grow to massive scale in the petabytes with the same ease of use. We’ve enhanced our efficiency and automation capabilities here. Any scale: Terabytes to petabytes and millions of file operations. No Nodes Left Behind: Add nodes in 60 seconds with no downtime. Auto-balance: Scale-out architecture ensures no hot spots. Resilient: Sustain multi-node failures with no data loss.
  11. This is the new PowerScale family. It spans from edge to core to cloud and includes existing Isilon nodes as well as new PowerScale-branded nodes. We offer all-flash, hybrid, and archive nodes to provide the right balance between price, performance, and capacity. They can work together in the same cluster, as we maintain our No Nodes Left Behind compatibility. To be clear, PowerScale nodes can join existing Isilon nodes in the same OneFS 9.0 cluster. We have also extended PowerScale OneFS into the cloud through our partnerships with AWS, Azure, and Google. Last month, we announced a native cloud offer for the Google Cloud Platform. This allows our customers to leverage the cloud in situations where they don't necessarily want to spin up a new site with new hardware.
  12. Next we will talk about how flexible the system is. We can handle virtually any unstructured data type and access method, with support for 8 protocols: NFS, SMB, HDFS, REST, HTTP, NDMP, FTP and the new S3 support. This flexibility allows any user to get to the data they need in order to create, share, collaborate, and develop using an incredibly powerful, multi-lingual data platform.
  13. The introduction of S3 support enables customers to run modern applications that rely on object storage – perhaps it’s a mobile app based on using video clips that are shared in a certain repository. Or it’s a fitness tracker for a school, or a system of schools. The possibilities are endless. Example: Existing dataset on Isilon. Upgrade to OneFS 9. Now you can use S3 support to provide developers an easy way to access your NFS files.
  14. Now, Intelligent Insights: get insights about your infrastructure and your data with CloudIQ and DataIQ. CloudIQ makes it easy to determine the health of your systems across your datacenter.
  15. DataIQ makes it easy for anyone to find and understand data across your PowerScale and your entire file cloud. Once locations are indexed, it becomes simple for anyone to find and share files at very high speeds. This can increase the speed of insights and truly help your business make decisions faster. DataIQ allows life science customers to discover where their data is, gain insight into their data, and act on their findings. Many customers are unaware of exactly where their data is living, and data that should be archived is sitting on higher performance storage. With DataIQ, IT or researchers can discover “cold data” and move it to the appropriate tier. This allows for “point and use” file storage with the ability to “right click & archive.” Since users can use DataIQ to gain insight and forecast, workflows can be set up to automatically send data where it needs to go. (Example: Sequence done, compress, send to archive.) Researchers can quickly call study information back with the ability to search across their storage for files. With DataIQ’s ability to showcase cost savings by leveraging the appropriate storage tier, IT can easily create quotas. (Example: You’re using X amount of storage.) Data management (plug-in built): a lot of customers found Isilon too expensive, moved to cheap archive, and were not sure what they had; data management tools let them point and use any file storage, with fast indexing, a searchable catalog and directories, and a view of moveable/useless data to move to archive. DNA service provider: uses it in production to save money across storage; creates scenarios where IT has a better view of data and more tools to manage where it should be; automates data management.
UI exposed to end users allowing for self-service; “cold data” archive moved to lower tier; start to plan & forecast where data needs to go/be. Talk points: “Right Click” send to archive. Data imaging: catalog building. Create workflows: sequence done, compress, send to archive; identifiers for data sets. Value add to all stakeholders. Tactical: ability to show cost savings; billing according to group usage; Isilon features hard and soft quotas (“you're using X amount of storage”). High speed search across file systems / storage repositories: environments often consist of NetApp, Isilon, Quantum, GPFS, and archive storage systems. Single pane of glass data management: high-end knowledge workers must always be able to find and act on data without a service request from IT. 100% self-service for researchers / producers / design managers / engineers / etc.: allow business users to manage their own cost and workflow; handle access, visibility, and control in a single system. Highly available and highly scalable (petabytes of data / billions of files): wanted to handle both clinical and research data within a single system. Initial archive was based on a tape library and SGI’s DMF… they later swapped that for ECS. Challenges: a scientific archive with a single pane of glass; self-service for researchers, producers, and engineers to lower reliance on IT; access, visibility, and control in a single system. DataIQ: a single UI to view all clinical and research data; a self-service archive; fast search across billions of files; archive data to reduce tier 1 storage costs.
  16. Our customers have been looking for an end-to-end solution, and this is how it comes together. PowerScale technology gives you the ability to innovate faster and unlock the potential of your data.
  17. Here is an example life science architecture that supports next-generation sequencing and other genomics workflows. Viewing the architecture from left to right, you can layer the data lifecycle over the architecture. On the far left, next-generation sequencing instruments generate data and transmit that raw data over the CIFS/SMB protocol to an Isilon cluster. PowerScale is at the center of the architecture as it bridges data generation and analysis. Once the data is on the Isilon cluster, users may access it using a Windows, OSX, or Linux client, then submit a job to the HPC cluster over NFS to process the raw data. Alternatively, a data scientist might access the next-generation sequencing data with clinical data over the HDFS protocol to perform an interactive analysis in a Spark environment. Moving to the right side of the architecture, data moves to the archiving phase. Depending on the habits and practices of your life science customer, raw data and results may be replicated via SyncIQ to a PowerScale DR or archive cluster, or a data mover like EcsSync might move the data over to the ECS object store, where it can be accessed by collaborators.
  18. Cloud Storage Services with Microsoft Azure provides a higher bandwidth (up to 100Gbps) and lower latency (as low as 1.2ms) connection to the cloud using ExpressRoute Local. This solution allows for the right combination of storage and compute in the cloud for data-intensive, high I/O throughput workloads that require high compute performance on a periodic and/or unpredictable basis. With no outbound data traffic costs, this solution enables workloads that require a lot of temporary writes to storage to cost-effectively take advantage of Azure’s application services. This is ideal for verticals such as Life Sciences and Media and Entertainment, giving users the best of both worlds: reliable, cost-effective Dell EMC Storage performance at scale and the scalable compute performance of Microsoft Azure. USE CASES: Life Sciences: Genome analysis is one of the key use cases for life sciences. The raw data generated by a genomic sequencer for the complete genome of a single human is approximately 100GB. This dictates a requirement for a massively scalable file system to which capacity and performance can be added. Genome alignment and sorting, which are both part of the secondary analysis stage, are the most compute and storage demanding and can require network throughput of 10GB/s or even 100Gb/s. Dell EMC and Azure testing has demonstrated that the performance of Isilon scales out linearly to match the IO demands of an increasing number of Azure VMs that support the genome alignment stage. The 100Gb/s ExpressRoute Local connection between Isilon and Azure enables both the compute performance in Azure and the storage performance in Isilon to scale up to process real-world genome analysis.
Large research facilities processing hundreds of thousands of genomes per year generate petabytes of very large file data (typically 500GB per file set) to be stored, and have a demand for computing power that is bursty by nature, a perfect application for on-demand, easily scalable cloud computing. In addition, since genomic processing is, at its core, a pattern-matching application, there are writes to temporary files on the Isilon storage during a large part of the analysis workflows.
  19. Focused on life science organizations since 2008. Used by 400+ organizations for NGS, HPC and research archive workloads. Installed at: 8 of the top 10 global pharmaceutical companies; 40% of top 100 North America academic medical centers; 11 NIH research centers; 37% of sequencing sites worldwide.
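
The throughput and capacity figures quoted in note 18 can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 GB = 10^9 bytes), takes 100,000 genomes per year as a stand-in for “hundreds of thousands”, and uses hypothetical function names that do not come from any Dell EMC or Azure tool.

```python
GB = 10**9   # decimal gigabyte, an assumption; the notes do not state units
TB = 10**12
PB = 10**15


def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a link speed in gigabits/s to gigabytes/s."""
    return gbit_per_s / 8


def transfer_seconds(size_bytes: int, link_gbit_per_s: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / (link_gbit_per_s * 1e9)


def yearly_capacity_pb(genomes_per_year: int, bytes_per_file_set: int) -> float:
    """Storage generated per year, in petabytes."""
    return genomes_per_year * bytes_per_file_set / PB


# 100 Gb/s ExpressRoute Local link expressed in GB/s.
print("link speed GB/s:", gbit_to_gbyte_per_s(100))

# One ~100 GB raw genome over that link, best case.
print("seconds per raw genome:", transfer_seconds(100 * GB, 100))

# Assumed 100,000 genomes/year at the quoted 500 GB per file set.
print("PB per year:", yearly_capacity_pb(100_000, 500 * GB))

# 1000 WGS/week (note 5) at ~100 GB raw each, in TB per week.
print("raw TB per week:", 1000 * 100 * GB / TB)
```

The conversion also shows that 100Gb/s is 12.5GB/s, only slightly above the 10GB/s figure quoted alongside it, so the two numbers in the note describe nearly the same throughput.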