Internal Use - Confidential
DataWorks Summit
Shawn Smith – Big Data Specialist
shawn.smith@dell.com
Accelerating Big Data Insights
Transforming The Business
We help organizations reinvent themselves and realize their digital future
Digital Transformation
Security Transformation
Workforce Transformation
IT Transformation
BUSINESS TRANSFORMATION
Ready for Whatever Comes Next:
AI, Augmented Reality, Machine Learning . . .
Emerging Challenges
What is Unstructured Data?
• More than 80% of data created globally is unstructured
• File data is growing very fast: most customers see 30% to 50% unstructured data growth year over year
• Dell EMC is #1 in scale-out file and object storage according to IDC and Gartner, because of SIMPLICITY!
• Simple – Single Volume
• Efficient – Best Storage Utilization
• Scale-Out – Scale and grow without pain
• NO MIGRATIONS!
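To make the 30% to 50% year-over-year figure concrete, the short sketch below compounds it over a few years. The 100 TB starting footprint and the function name are illustrative assumptions, not numbers from the slides.

```python
def projected_capacity_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding annual_growth (0.30 means 30%) for the given years."""
    return start_tb * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting footprint, not from the slides
    for rate in (0.30, 0.50):
        sizes = [round(projected_capacity_tb(start_tb, rate, y), 1) for y in range(6)]
        print(f"{int(rate * 100)}% per year: {sizes}")
```

Under these assumptions the footprint more than doubles within three years at 30% growth and within two years at 50% growth.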
Unstructured Data Requires
Unconstrained Scale
Optimized TCO/ROI
Longevity
Flash to Cloud Flexibility
Enterprise Features
Massive Performance
SIMPLICITY At Any Scale
Unstructured Analytics Use Cases
Fraud Detection & Risk Analytics
Trading / Tick Data Analytics
IoT
Data Driven Business Transformation
Customer 360 Analytics
Enabling enterprises to improve operational efficiencies and monetize new revenue streams
Organizations need to deliver analytics on more than
just their traditional structured data
Evolving spectrum of data analytics
Requires infrastructure that enables multiple applications and varied use cases
Predictive Analytics
Business Intelligence
Analytics of Things
Cyber security Analytics
Real-time Analytics
Machine Learning
Enables analytics for ALL of your data
Dell EMC Unstructured Analytics Portfolio
Performance Centric
Storage Centric
Archive Centric
Predictive Analytics
Business Intelligence
Analytics of Things
Cyber security Analytics
Real-time Analytics
Machine Learning
Proven solutions for unstructured analytics
Dell EMC Unstructured Analytics Portfolio
Solution accelerators
• Hadoop Ready Bundle
• QuickStart for Hadoop
• EDW Optimization Solutions
• Hadoop Backup Solutions
• SAS-Grid Solution with Isilon
• Streaming Analytics Solutions
• Splunk Ready System
Right Solution Configuration for the use case
Enterprise requirements drive configuration
Performance-centric (one or more of):
• High Performance
• 100% Compliance to Hadoop features
• Ability to scale down at cost
Storage-centric (one or more of):
• Storage scaling faster than compute
• Enterprise Grade File Mgmt.
• Consolidation of IT Workloads
• Aggregate capacity > 100 TB
Archive-centric:
• Geo-distributed single namespace
• 40% to 60% less than public cloud
Deployment models shown: Compute + Data on Direct Attached Storage; Compute and Data separated on Shared Storage
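The requirement-to-configuration mapping on this slide can be read as a simple lookup. The sketch below is one hedged interpretation: the requirement keywords are hypothetical labels for the slide's bullets, and ranking by number of matches is an assumption, not a Dell EMC sizing rule.

```python
# Requirement keywords are hypothetical labels for the bullets on the slide.
CONFIGURATIONS = {
    "performance-centric": {
        "high_performance",
        "full_hadoop_feature_compliance",
        "scale_down_at_cost",
    },
    "storage-centric": {
        "storage_grows_faster_than_compute",
        "enterprise_file_management",
        "workload_consolidation",
        "capacity_over_100_tb",
    },
    "archive-centric": {
        "geo_distributed_namespace",
        "lower_cost_than_public_cloud",
    },
}


def suggest_configurations(requirements):
    """Return configurations matching at least one requirement, best match first."""
    wanted = set(requirements)
    scored = [(len(reqs & wanted), name) for name, reqs in CONFIGURATIONS.items()]
    ranked = sorted(scored, key=lambda pair: -pair[0])  # stable sort keeps slide order on ties
    return [name for score, name in ranked if score > 0]


if __name__ == "__main__":
    print(suggest_configurations({"capacity_over_100_tb", "workload_consolidation"}))
```

A use case that matches bullets in more than one column simply returns several candidates, which mirrors the "one or more" wording on the slide.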
THE BEDROCK OF THE MODERN DATA CENTER
PowerEdge R740xd
High performance server
Performance and Scale
Expanded GPU & storage capacity
boost workload performance
Innovative Design
Up to 24 NVMe drives with up to 18 x 3.5” drives
Integrated Security
Cyber resilient architecture, security
is integrated into full server lifecycle
– from design to retirement
Intelligent automation
New OpenManage™ Enterprise
console delivers crystal clear
reporting & full lifecycle automation
Market Leader Hadoop
Shared Storage
Customers running
Analytics / Hadoop
PBs of Analytics / Hadoop
• World’s #1 Courier Company
• 3 of the largest telecommunications companies in the
Americas
• One of the largest online retailers
• Multiple leading financial institutions
WHO IS USING ISILON FOR ANALYTICS?
385
Isilon Analytics Momentum
21 Industry Verticals
Hadoop Architecture - Traditional
[Diagram labels: Ethernet; R (RHIPE), Mahout, Hive, HBase, PIG; NameNode, 2nd NameNode, Job Tracker, Task Tracker, DataNode; six combined Data Node + Compute Node servers]
Hadoop Architecture with Isilon
[Diagram labels: Ethernet; R (RHIPE), PIG, Mahout, Hive, HBase; Job Tracker, Task Tracker, DataNode, NameNode; six Compute Node servers; four name node labels and one datanode label]
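In practice, separating compute from storage this way usually comes down to pointing the compute nodes' default filesystem at the shared storage endpoint. The sketch below renders a minimal core-site.xml with Hadoop's standard fs.defaultFS property; the hostname is a hypothetical placeholder, and the port 8020 is assumed as the conventional NameNode RPC port rather than taken from the slides.

```python
import xml.etree.ElementTree as ET


def core_site_xml(default_fs: str) -> str:
    """Render a minimal core-site.xml that sets fs.defaultFS."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical Isilon HDFS endpoint; replace with the real cluster name.
    print(core_site_xml("hdfs://isilon.example.com:8020"))
```

A real deployment involves more settings than this single property; the sketch only shows where the storage endpoint is named.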
ISILON DATA LAKE
DATA PROTECTION
DATA SECURITY
PERFORMANCE MANAGEMENT
DATA MANAGEMENT
HOW IT LOOKS
[Diagram labels: Isilon OneFS; protocols SMB, NFS, HTTP, FTP, HDFS; name node; datanode; node info; node reply; file; MAP / Reduce tasks; Compute; Data; 1X]
Workload Consolidation
and streaming analytics
/ Sharepoint
Phased Approach to Hadoop Tiered Storage with Isilon
Phase 0: Archival Cluster
• Hadoop cluster with DAS for interactive and batch queries
• Queryable “active archive” in Isilon / ECS configured as a separate Hadoop cluster
• Archival policy implemented using scripts executed manually
Phase 1: Tiering with Location Aware queries
• Hot data in Hadoop cluster with DAS
• Cold data in Isilon configured as an HDFS target
• Hive, MapReduce and Spark jobs can run across the two clusters
• URIs indicate whether data is in the DAS cluster or the Isilon cluster
• Tiering policy implemented using scripts executed manually
Phase 2: Tiering with Location transparent queries
Same as Phase 1, with additional capability:
• Data location handled transparently for Hive, MapReduce and Spark jobs: URIs don’t need to indicate whether data is in the DAS cluster or the Isilon cluster
Phase 3: Automated tiering
Same as Phase 2, with additional capability:
• Tiering policy implemented using automated data movement mechanisms
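As a rough illustration of Phase 1, the sketch below shows how a job might build location-aware URIs and how a manually run script might pick partitions to tier. Both hostnames, the 90-day hot window, the dt= partition layout, and the function names are assumptions made for the example; none of them come from the slides.

```python
from datetime import date, timedelta

# Hypothetical endpoints: neither hostname comes from the slides.
DAS_CLUSTER = "hdfs://das-namenode.example.com:8020"
ISILON_CLUSTER = "hdfs://isilon.example.com:8020"


def partition_uri(table_path: str, day: date, today: date, hot_days: int = 90) -> str:
    """Return the full URI of a daily partition, choosing the cluster by data age."""
    is_hot = (today - day) <= timedelta(days=hot_days)
    cluster = DAS_CLUSTER if is_hot else ISILON_CLUSTER
    return f"{cluster}{table_path}/dt={day.isoformat()}"


def partitions_to_tier(days, today: date, hot_days: int = 90):
    """List the partition dates a manual tiering script would move to the cold tier."""
    return [d for d in days if (today - d) > timedelta(days=hot_days)]


if __name__ == "__main__":
    today = date(2018, 6, 1)
    print(partition_uri("/warehouse/events", date(2018, 5, 1), today))
    print(partition_uri("/warehouse/events", date(2017, 1, 1), today))
```

In Phase 2 the cluster choice inside partition_uri would move out of the job and into the platform, so callers would pass only the table path and date.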
What is the Internet of Things?
It is an ecosystem where sensors, devices and equipment are connected to a network and can transmit and receive data for tracking, analysis and action.
Operational Technology
Industrial automation
Fleet telematics
Material handling
Information Technology
Assets
Inventory
People
IoT
It’s not new and not new to Dell.
It is the integration and extension of OT and IT technologies that have been around for decades.
It’s a great big IoT world out there
Smart Connected Business – from gateways to informed decisions
Transport Connector
Private and public networks
Tens of billions of connected things
Things Sensors
High-performance computer infrastructure
Application layer
SAP HANA
In-Memory database layer
Libraries
Manufacturing
Energy and Natural Resources
Transportation
Building & Industrial Automation
Multiple Partners and Blueprints for OT / IT
SAP HANA®
Software AG Apama®
Dell Edge Gateway 5000
Structured Data
Dell EMC Data Center
Real-Time Data
Unstructured Data
Kepware KEPServerEX®
Visualizations, Stream Analytics, Machine Learning, Reporting, Analytics, Protocol Translation
Our Vision for Unstructured Storage
FILE: ISILON
OBJECT: ECS
STREAM: PROJECT NAUTILUS
Software-Defined, In The Cloud, Common Experience, Common Hardware
Project “Nautilus”
Streaming Storage + Analytics Engine
Turbocharge Isilon and ECS for Streaming
Batch Storage tier
Streaming IoT data
Today’s IoT Analytics “Accidental Architecture”
Batch
Real-Time
Producers: Sensors, Mobile Devices, App Logs
MirrorMaker
DR Site
Interactive exploration by Data Scientists
Real-time intelligence at the NOC
Surface / Act
Project Nautilus: A Unified Data Pipeline
Strongly Consistent Storage • Exactly Once Processing • Unified Analytics
Unified Analytics: Real-Time, Batch, Interactive
Producers: Sensors, Mobile Devices, App Logs
Pravega Streams: Ingest Buffer, Pub/Sub, Search, Persistent Data Structures
Unified Storage: Isilon / ECS
Real-time intelligence at the NOC
Interactive exploration by Data Scientists
Surface / Act
Project Nautilus: A Unified Data Pipeline
Strongly Consistent Storage • Exactly Once Processing • Unified Analytics
Unified Analytics: Real-Time, Batch, Interactive
Producers: Sensors, Mobile Devices, App Logs
Pravega Streams: Ingest, Pub/Sub, Search, S3
Unified Storage: Isilon / ECS
HDFS, NFS, SMB
Real-time intelligence at the NOC
Interactive exploration by Data Scientists
Surface / Act
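The idea of one pipeline serving both real-time and batch readers can be illustrated with a toy append-only log. This is an in-memory sketch only: the Stream class and its methods are hypothetical and are not the Pravega API.

```python
class Stream:
    """Toy append-only stream: one log serves both batch and real-time readers."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Append an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset):
        """Return (events, next_offset) for everything at or after offset."""
        events = self._events[offset:]
        return events, offset + len(events)


if __name__ == "__main__":
    stream = Stream()
    for reading in (21.5, 22.0):
        stream.append({"sensor": "s1", "temp_c": reading})

    # Real-time reader: remembers its offset and picks up only new events.
    _, tail = stream.read_from(0)
    stream.append({"sensor": "s1", "temp_c": 23.4})
    new_events, tail = stream.read_from(tail)

    # Batch reader: replays the same log from the beginning.
    history, _ = stream.read_from(0)
    print(len(new_events), len(history))
```

Because both readers work from the same stored log, there is no separate batch copy to keep in sync, which is the contrast the slides draw with the "accidental architecture".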
pravega.io