© Copyright 2016 EMC Corporation. All rights reserved.
TAME THAT BEAST
Stefan Radtke
CTO, EMEA
EMC Emerging Technology Division
2EMC CONFIDENTIAL—INTERNAL USE ONLYEMC CONFIDENTIAL—INTERNAL USE ONLY
Welcome!
Dr. Stefan Radtke
CTO Isilon, EMEA
EMC Emerging Technology Division
- 1995-2011: 17 years at IBM in various technical roles
- 2011: Joined EMC
- 2012-today: CTO, EMEA for EMC Isilon
Phone: +49-176-34434460
E-Mail: Stefan.Radtke@emc.com
LinkedIn: http://de.linkedin.com/in/drstefanradtke
Blog: http://stefanradtke.blogspot.com
System Availability
Uptime                     Downtime (per year)
99.999% (AKA 5 nines)      5.26 minutes
99.99% (AKA 4 nines)       52.6 minutes
99.5%                      1.83 days
99% (AKA 2 nines)          3.65 days
95%                        18.25 days
What is your data warehouse’s uptime SLA?
What is your Hadoop uptime SLA?
Why are they different?
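The downtime column is just (1 - availability) × minutes per year. A minimal Python sketch, assuming a 365-day year as the table does, reproduces it:

```python
# Allowed downtime per year for a given availability (uptime) percentage.
# Assumes a 365-day year, which matches the figures in the table.
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the yearly downtime budget in minutes."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99.999, 99.99, 99.5, 99.0, 95.0):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct:>7}%  {minutes:9.2f} min  ({minutes / MINUTES_PER_DAY:.2f} days)")
```

For example, 99% uptime leaves 0.01 × 365 = 3.65 days of downtime per year.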
We Have Good Hadoop Outcomes
• Smart Grid
– Fraud / Broken Devices & Grid Traffic Projections
• Fraud
• Healthcare Research
– Genomes and Healthcare: BRCA
• Connected Car: Tesla
Hadoop Takes On Database-like Features
• Newly Added Features in Hadoop 3.0
– Erasure Coding (HDFS-EC / HDFS-7285) is being introduced to Hadoop
– Additional Standby NameNodes for increased resiliency (HDFS-6440)
• Future Features
– Random read support from an Indexed NameNode (HDFS-8555)
– Disaster Recovery (HDFS-5442)
So...
• IF Hadoop is the Modern Database
AND
• IF Hadoop is taking on more Modern Database Features
AND
• Successful Outcomes are becoming more prolific...
Why do Hadoop operations and uptime SLAs seem like such an
afterthought on most clusters?
KPIs
• Why do companies that have VERY successful data
warehouses, ETL processes, and KPI dashboards
have so few of THOSE for their Hadoop instance,
which now generates all their machine learning
and data analytics?
What can go wrong?
• Forbes: “…haven’t taken into account
some long-term or ongoing cost associated
with the project…”
• Information Week: “…Unanticipated
problems beyond the big data
technology…”
• Computerworld: “…there are enterprises
that underestimated the paradigm shift…”
An Intervention
• Why does the concept of 99.99% uptime seem bad for a
production Hadoop system?
• Why do solid KPIs around data collection and capture
sound absurd?
• Since when did a backup copy of your primary analytics
data become unnecessary?
• Is this just because Hadoop is about standing up cheap
hardware?
• Why do companies need a catalyst before these things
become common practice again?
Why wouldn’t you want:
• Two fully addressable clusters with data replication,
located in separate geographies
• Data re-silvering (rebalancing) when additional capacity
is added
• Complete fault tolerance across the environment, not just
data/node redundancy, to allow four-nines availability
• Operational scale that allows 24 x 7 support
[Figure: node capacity shown as EMPTY, FULL, and BALANCED]
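Re-silvering in the EMPTY/FULL/BALANCED sense means spreading existing data onto newly added capacity. The sketch below is a hypothetical, generic greedy rebalancer (node and block names are made up), not the mechanism of any particular product:

```python
# Hypothetical sketch of re-silvering as rebalancing: after an empty node is
# added, move blocks from the fullest node to the emptiest until block counts
# differ by at most one. A generic greedy rebalancer, not any product's algorithm.
from typing import Dict, List, Tuple


def rebalance(nodes: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Rebalance `nodes` in place; return the (block, source, target) moves."""
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            return moves
        block = nodes[fullest].pop()
        nodes[emptiest].append(block)
        moves.append((block, fullest, emptiest))


cluster = {
    "node1": [f"b{i}" for i in range(6)],      # FULL
    "node2": [f"b{i}" for i in range(6, 12)],  # FULL
    "node3": [],                               # EMPTY: capacity just added
}
moves = rebalance(cluster)
print({name: len(blocks) for name, blocks in cluster.items()}, len(moves), "moves")
```

A production rebalancer would also weigh block sizes, bandwidth limits and protection constraints; counting blocks keeps the idea visible.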
What is my Idea - 1
• Separation of compute and storage
– Why do you think cloud Hadoop is able to offer better SLAs
than on-premises Hadoop? It isn’t because of a ton of single-point-
of-failure compute boxes; it’s because they separate compute and storage.
• Look at centralizing infrastructure / Big Data as a Service
– Instead of trying to staff 25 Hadoop clusters for 24 x 7, centralize
the team and provide QoS back to the applications
Data Gravity
• Data sets get bigger over time, and moving them becomes
increasingly difficult
– This leads to switching costs & lock-in
• Data is a strategic asset to enterprises with digital strategies
• Data becomes central – build around it
– Applications tend to migrate toward the data
– Apply advanced analytics to the data “in-place”
Traditional IT
[Figure: each application (Finance, Marketing, Operations, Sales; CRM, ERP, SCM) runs on its own server and storage silo, next to multiple Hadoop silos; data is copied between silos for analytics]
THE PROBLEM OF DATA MOVEMENT
• To get statistically relevant results, a typical minimum
data set is about 100 TB.
• That’s also the recommended minimum Hadoop cluster size.
• Copying 100 TB over a dedicated 10 GbE link takes about 24
hours.
You need a data lake that understands POSIX/Windows
and HDFS to avoid data movement (= in-place analytics).
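The 24-hour estimate checks out with simple arithmetic: 100 TB (decimal) is 8 × 10^14 bits, and a 10 Gb/s link carries at most 10^10 bits per second. A short Python sketch; the `efficiency` factor is an assumption standing in for protocol and disk overhead:

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to copy `data_tb` terabytes (10**12 bytes) over a `link_gbps` link.

    `efficiency` is an assumed fraction of line rate actually achieved
    (1.0 = ideal, no protocol or storage overhead).
    """
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * efficiency)
    return seconds / 3600


print(f"{transfer_hours(100, 10):.1f} h at full line rate")
print(f"{transfer_hours(100, 10, efficiency=0.9):.1f} h at an assumed 90% efficiency")
```

At full line rate the copy takes about 22 hours; real-world overhead pushes it toward a day or more.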
EMC DATA LAKE
[Figure: application servers (Finance, Marketing, Operations, Sales; CRM, ERP, SCM) and Analytics + Mobile Applications servers all access a single Isilon Data Lake]
WHAT IS A DATA LAKE?
Isilon Data Lake Architecture
Scale-out Data Lake
[Figure: clients connect over a Gigabit/10 Gigabit Ethernet LAN to scale-out Isilon nodes with SAS disks, interconnected via InfiniBand]
• OneFS integrates RAID, volume manager and file system.
• Uses internal disks and spans a single file system across all disks.
• Development started in the 2000s.
• Extremely mature, based on FreeBSD.
• Supports many access protocols.
…
HDFS Implementation as a Protocol
• Multi-threaded daemon runs on all nodes
– Services both NameNode (NN) and DataNode (DN) protocols
– Translates HDFS RPCs to POSIX system calls
– Stateless; the underlying file system handles coherency
[Figure: on each OneFS node, an isi_hdfs_d thread receives a request, issues a syscall through the VFS layer to OneFS, and returns the response]
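To make "translates HDFS RPCs to POSIX system calls" concrete, here is a hypothetical Python sketch of a stateless dispatcher. The RPC names mirror a few methods of Hadoop's HDFS `ClientProtocol`, but the dispatch logic, return shapes and the `handle_rpc` function are illustrative assumptions, not how `isi_hdfs_d` works:

```python
# Hypothetical sketch of a stateless HDFS-to-POSIX translation layer.
# Method names mirror a few HDFS ClientProtocol RPCs (getFileInfo, mkdirs,
# getListing, delete); the logic is illustrative, NOT the isi_hdfs_d code.
# No path sanitization is done (e.g. ".." could escape `root`).
import os
import shutil
import tempfile
from typing import Any


def handle_rpc(root: str, method: str, path: str) -> Any:
    """Translate one HDFS-style RPC into POSIX calls under `root`.

    Nothing is cached between calls: each request re-resolves the path, and
    the underlying file system is responsible for coherency across nodes.
    """
    local = os.path.join(root, path.lstrip("/"))
    if method == "getFileInfo":
        if not os.path.exists(local):
            return None
        st = os.stat(local)
        return {"length": st.st_size, "isdir": os.path.isdir(local), "mtime": st.st_mtime}
    if method == "mkdirs":
        os.makedirs(local, exist_ok=True)
        return True
    if method == "getListing":
        return sorted(os.listdir(local))
    if method == "delete":
        if os.path.isdir(local):
            shutil.rmtree(local)
        elif os.path.exists(local):
            os.remove(local)
        else:
            return False
        return True
    raise NotImplementedError(method)


with tempfile.TemporaryDirectory() as root:
    handle_rpc(root, "mkdirs", "/data/logs")
    print(handle_rpc(root, "getListing", "/data"))
    print(handle_rpc(root, "getFileInfo", "/data/logs")["isdir"])
```

Because the handler keeps no state of its own, any node running it can answer any request, which is the property the slide relies on.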
HDFS IMPLEMENTED LIKE A NAS PROTOCOL
OneFS runs a daemon that speaks NameNode and DataNode natively.
[Figure: a Hadoop node's DFSClient talks to the OneFS clustered file system, in which every OneFS node acts as both NameNode and DataNode: 1) Request(“/file”), 2) Response (block locations), 3) GetBlock(block)]
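The three-step read in the figure can be modeled with a toy cluster in which every node answers both the NameNode question ("where are the blocks?") and the DataNode request ("give me this block"). All class and method names here are hypothetical:

```python
# Toy model of the three-step read on a cluster where every node serves both
# NameNode and DataNode roles against one shared file system. Names are
# hypothetical; this illustrates the idea, not OneFS internals.
import itertools
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # bytes per block; tiny on purpose for the example


class SharedFS:
    """Stands in for the clustered file system that all nodes can see."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}


class Node:
    def __init__(self, name: str, fs: SharedFS) -> None:
        self.name, self.fs = name, fs

    # NameNode role: answer "where are the blocks of this file?"
    def get_block_locations(self, path: str, nodes: List["Node"]) -> List[Tuple[int, str]]:
        size = len(self.fs.files[path])
        ring = itertools.cycle(nodes)  # any node can serve any block
        return [(offset, next(ring).name) for offset in range(0, size, BLOCK_SIZE)]

    # DataNode role: return the bytes of one block
    def get_block(self, path: str, offset: int) -> bytes:
        return self.fs.files[path][offset:offset + BLOCK_SIZE]


fs = SharedFS()
fs.files["/file"] = b"hello data lake!"
nodes = [Node(f"onefs{i}", fs) for i in range(4)]
by_name = {n.name: n for n in nodes}

locations = nodes[0].get_block_locations("/file", nodes)  # 1) request, 2) response
data = b"".join(by_name[host].get_block("/file", off)      # 3) GetBlock per block
                for off, host in locations)
print(locations)
print(data)
```

Since all nodes sit on the same clustered file system, the block-location answer can point at any node, spreading reads across the cluster.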
ISILON - FOR ALL TYPES OF UNSTRUCTURED DATA
Use cases served by one Isilon Data Lake:
• Archive & Backup Target
• File Shares
• Home Directories
• BLOBs
• Design, Test & Manufacture
• Retail & Monetization
• Transaction
• Hadoop & Analytics
• Sync ‘n Share
• Application Test
• Content
• Social & Next-Gen
• Surveillance
SUPPORT FOR MULTIPLE ANALYTICS APPLICATIONS
[Figure: one Isilon cluster serves SMB, NFS, HTTP, FTP and HDFS; several MapReduce clusters running different HDFS versions (1.x, 2.x, ...) each see it as their NameNode/DataNode, while NFS and SMB clients access the same data]
EXPAND DATA LAKE TO THE CLOUD
CloudPools: SmartPools policy example (data center tiers first, then the cloud provider)
• < 30 days: S210
• 30 days - 1 year: NL410
• 1 year - 2 years: HD400
• > 2 years: Cloud
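The policy example maps data age to a storage tier. A minimal Python sketch, assuming the mapping < 30 days → S210, 30 days - 1 year → NL410, 1 - 2 years → HD400, > 2 years → Cloud, and using last-access date as the age; `target_tier` is a hypothetical helper, not a SmartPools API:

```python
# Hypothetical age-based tiering helper modeled on the SmartPools/CloudPools
# policy example. Tier names come from the slide; using last-access date as
# "age" and 365/730 days for 1/2 years are illustrative assumptions.
from datetime import date, timedelta

POLICY = [  # (age limit, tier), checked in order
    (timedelta(days=30), "S210"),
    (timedelta(days=365), "NL410"),
    (timedelta(days=730), "HD400"),
]


def target_tier(last_access: date, today: date) -> str:
    """Return the tier a file should live on, given when it was last accessed."""
    age = today - last_access
    for limit, tier in POLICY:
        if age < limit:
            return tier
    return "Cloud"  # older than 2 years: moved to the cloud provider via CloudPools


today = date(2016, 4, 14)
for last in (date(2016, 4, 1), date(2015, 12, 1), date(2015, 1, 1), date(2013, 1, 1)):
    print(last, "->", target_tier(last, today))
```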
CLOUD ENABLED DATA LAKE
[Figure: apps and users access the data lake in the data center; CloudPools moves data to a cloud provider based on access time]
Parallel Replication
• Designed from the ground up for scale-out storage
• Aggregate throughput scales with capacity
• Maintains a consistent RPO over growing data sets
• Underlying file system knowledge
– Snapshot integration
– Block-level deltas
– Rich metadata transfer
• Automated data failover/failback
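Block-level deltas mean only changed blocks cross the wire, and parallelism lets throughput grow with the cluster. This Python sketch assumes fixed-size blocks compared by SHA-256 digest and uses a thread pool as a stand-in for parallel transfer workers; real replication engines track changed blocks via snapshots rather than re-hashing:

```python
# Minimal sketch of block-level delta replication between two snapshots.
# Assumptions: snapshots are byte strings, blocks are fixed-size and compared
# by SHA-256 digest, and the target is a dict of block index -> bytes.
# Shrinking files (new snapshot shorter than old) are not handled.
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

BLOCK = 8  # tiny block size for the example


def blocks(data: bytes) -> List[bytes]:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def changed_blocks(old: bytes, new: bytes) -> List[int]:
    """Indices of blocks in `new` that differ from, or are missing in, `old`."""
    old_digests = [hashlib.sha256(b).digest() for b in blocks(old)]
    return [i for i, b in enumerate(blocks(new))
            if i >= len(old_digests) or hashlib.sha256(b).digest() != old_digests[i]]


def replicate(old: bytes, new: bytes, target: Dict[int, bytes], workers: int = 4) -> List[int]:
    """Send only the changed blocks to `target`, several at a time."""
    todo = changed_blocks(old, new)
    new_blocks = blocks(new)

    def send(i: int) -> None:
        target[i] = new_blocks[i]  # stands in for a network transfer

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(send, todo))  # list() surfaces any worker exception
    return todo


snap1 = b"A" * 32
snap2 = b"A" * 8 + b"B" * 8 + b"A" * 16 + b"C" * 4
target = dict(enumerate(blocks(snap1)))  # replica already holds snapshot 1
sent = replicate(snap1, snap2, target)
print(sent)
print(b"".join(target[i] for i in sorted(target)) == snap2)
```

Only blocks 1 and 4 are sent in the example; the others already match on the target.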
Storage Considerations
STANDARD HADOOP CLUSTER
• 100 nodes, compute + DAS
• 24 TB per node, divided by 3 for Hadoop copies
• 800 TB usable, but rarely achieved
• 5+ cabinets
• Requires spill space for ingestion and extraction
HADOOP USING EMC ISILON DATA LAKE
• 20 compute nodes + 800 TB Isilon
• Single copy with erasure coding
• 800 TB usable
• 1 cabinet
• No separate spill space: it is NAS
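The usable-capacity figures follow from the protection scheme. With three full copies only a third of raw DAS capacity is usable, while erasure coding with k data and m parity units needs (k + m) / k of the usable capacity as raw space. The 8+2 layout below is an assumed example; the slide does not say which protection level the Isilon side uses:

```python
# Capacity arithmetic behind the comparison above. Node counts and the 3x
# replication factor come from the slide; the 8+2 erasure-coding layout is
# an ASSUMED example, not a stated Isilon protection level.


def usable_replicated(nodes: int, tb_per_node: float, copies: int = 3) -> float:
    """Usable TB when every block is stored `copies` times."""
    return nodes * tb_per_node / copies


def raw_for_erasure_coding(usable_tb: float, data_units: int, parity_units: int) -> float:
    """Raw TB needed when each stripe has data_units + parity_units pieces."""
    return usable_tb * (data_units + parity_units) / data_units


das_usable = usable_replicated(100, 24)      # 100 nodes x 24 TB / 3
ec_raw = raw_for_erasure_coding(800, 8, 2)   # hypothetical 8+2 stripe
print(das_usable, "TB usable from", 100 * 24, "TB raw with 3 copies")
print(ec_raw, "TB raw for 800 TB usable with an assumed 8+2 layout")
```

Under that assumption, 800 TB usable costs about 1,000 TB raw with erasure coding versus 2,400 TB raw with triple replication.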
What is my Idea - 2
• Build a fully functioning cost model that includes all the
items you think are “free”, but whose costs stop when you
change the architecture.
– Project-based funding is great until you want to centralize.
Centralization models (BDaaS) work when you consider all
the sundry costs typically excluded by project-based
funding (e.g., 24 x 7 support for each cluster, and all-in costs
that appear free but are sunk)
What is my Idea - 3
• Think about “build it all yourself” vs. “buy”
• Focus on analytics rather than infrastructure implementation,
software dependencies, testing, etc.
• All of that has already been done with EMC Big Data Systems and
Big Data Solutions
• Using pre-validated, installed and tested solutions reduces
complexity and increases reliability.
EMC BIG DATA PORTFOLIO
• Data Lake
• Data Lake Extensions
• Cloud Enabled
• Vblock
• VxRack
• VxRail
• Federation Business Data Lake
HIGH PERFORMANCE
PREDICTABLE, LOW LATENCY
[Figure: I/O path comparison. HDD: Hadoop (HDFS) → kernel (filesystem, buffer cache, device driver) → SATA controller → disk, about 10 ms. Traditional PCIe SSD: the same kernel stack over PCIe, about 1000-2000 µs. DSSD: the DSSD Hadoop plugin accesses flash directly over PCIe, under 100 µs.]
• 10X throughput
• 1/13th latency
• No application changes required
EMC Business Data Lake
• Data and Analytics Catalog
• Advanced Analytics Applications at Scale
• Open Analytics Toolbox
• Data Processing and Hadoop: Greenplum Database, HAWQ, Spring XD, Pivotal HD, Spark, Redis, RabbitMQ, GemFire, BDS on Pivotal Cloud Foundry
• Platform Manager, Data Governor, Data Manager, Ingest Manager, Analytics Manager
• Pivotal Big Data Suite
• VMware vCloud Suite
• EMC Data Lake Foundation: Isilon + ECS
• VCE Vblock | XtremIO | Data Domain
See demos at http://www.fbdldemo.com/
Watch out for:
• Hadoop Everywhere: Geo-Distributed Storage
for Big Data
Thursday, April 14th, 15:00 UTC
Presenters:
• Nikhil Joshi, EMC
• Vishrut Shah, EMC
A Remark on Data Locality
• UC Berkeley’s AMPLab declared data locality dead in
2011.
• Cloudera has declared data locality dead in Hadoop 3.0
with HDFS-EC.
• Gartner has declared Hadoop dead due to its limits.
• Hadoop will only grow, and organizations will depend on it
more going forward.
• The catalyst may be the next time I see you, when Hadoop
uptime has become your main concern.
Isilon Scale-Out NAS
Simple to manage
Single file system, single volume, global namespace
Massively scalable
Scales from 16 TB to over 50 PB in a single cluster
200 GB/s throughput, 3.75M IOPS
Unmatched efficiency
Over 80% storage utilization, automated tiering and SmartDedupe
Enterprise data protection
Efficient backup and disaster recovery, and N+1 through N+4 redundancy
Robust security and compliance options
RBAC, Access Zones, WORM data security, file system auditing
Data-at-rest encryption with SEDs, STIG hardening
CAC/PIV smartcard authentication, FIPS OpenSSL support
Operational flexibility
Multi-protocol support including NFS, SMB, HTTP, FTP and HDFS
Object and cloud computing including OpenStack Swift
Elastic Cloud Storage (ECS)
Geo-Scale
Geo-replicated and distributed to multiple locations
Massively scalable
Scales to billions of objects in a single namespace
Support for all file sizes
Support for individual files of any size
Multi-Tenant
Efficient backup and disaster recovery, and N+1 through N+4 redundancy
HDFS Compatible
Hortonworks-certified HDFS-compatible file system
Swift Compatible
Natively supports OpenStack storage
Native Cloud Interface
Natively works with existing cloud protocols like S3 and Azure

 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Recently uploaded

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

Tame that Beast

  • 1. TAME THAT BEAST
    Stefan Radtke, CTO, EMEA, EMC Emerging Technology Division
    © Copyright 2016 EMC Corporation. All rights reserved.
  • 2. Welcome!
    EMC CONFIDENTIAL—INTERNAL USE ONLY
    Dr. Stefan Radtke, CTO Isilon, EMEA, EMC Emerging Technology Division
    – 1995-2011: 17 years at IBM in various technical roles
    – 2011: Joined EMC
    – 2012-today: CTO, EMEA for EMC Isilon
    Phone: +49-176-34434460
    E-Mail: Stefan.Radtke@emc.com
    LinkedIn: http://de.linkedin.com/in/drstefanradtke
    Blog: http://stefanradtke.blogspot.com
  • 3. System Availability
    Uptime                  Downtime (per year)
    99.999% (AKA 5 nines)   5.26 minutes
    99.99% (AKA 4 nines)    52.6 minutes
    99.5%                   1.83 days
    99% (AKA 2 nines)       3.65 days
    95%                     18.25 days
    What is your Data Warehouses’ uptime SLA?
    What is your Hadoop uptime SLA?
    Why are they different?
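The downtime column follows directly from the uptime percentage: allowed downtime = (1 - uptime) × length of the period. Below is a minimal Python sketch of that arithmetic. It assumes a 365-day year and ignores leap years. The function names are illustrative only.

```python
# Convert an availability SLA (in percent) into the downtime it allows per year.
# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return allowed downtime in minutes per year for a given uptime percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1.0 - availability_pct / 100.0) * MINUTES_PER_YEAR


def describe(availability_pct: float) -> str:
    """Format the downtime in minutes if it is under a day, otherwise in days."""
    minutes = downtime_minutes_per_year(availability_pct)
    if minutes < MINUTES_PER_DAY:
        return f"{availability_pct}%: {minutes:.2f} minutes/year"
    return f"{availability_pct}%: {minutes / MINUTES_PER_DAY:.2f} days/year"


if __name__ == "__main__":
    for sla in (99.999, 99.99, 99.5, 99.0, 95.0):
        print(describe(sla))
```

The script prints every SLA level from the slide together with its allowed yearly downtime. Values under a day are shown in minutes and larger ones in days.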
  • 4. We Have Good Hadoop Outcomes
    – Smart Grid: fraud / broken devices and grid traffic projections
    – Fraud
    – Healthcare research: genomes and healthcare (BRCA)
    – Connected car: Tesla
  • 5. Hadoop Takes On Database-like Features
    • Newly added features in Hadoop 3.0
      – Erasure Coding (HDFS-EC / HDFS-7285) is being introduced to Hadoop
      – Additional standby NameNodes for increased resiliency (HDFS-6440)
    • Future features
      – Random read support from an indexed NameNode (HDFS-8555)
      – Disaster Recovery (HDFS-5442)
  • 6. So...
    • IF Hadoop is the modern database
      AND
    • IF Hadoop is taking on more modern database features
      AND
    • Successful outcomes are becoming more prolific...
    Why do Hadoop operations, uptime, and SLAs seem like such an afterthought on most clusters?
  • 7. KPIs
    • Why do companies that have VERY successful data warehouses, ETL processes, and KPI dashboards have so little of THOSE for their Hadoop instance, which now generates all of their machine learning and data & analytics?
  • 8. What can go wrong?
    • Forbes: “…haven’t taken into account some long-term or ongoing cost associated with the project…”
    • InformationWeek: “…Unanticipated problems beyond the big data technology…”
    • Computerworld: “…there are enterprises that underestimated the paradigm shift…”
  • 9. An Intervention
    • Why does the concept of 99.99% availability seem bad for a production Hadoop system?
    • Why do solid KPIs around data collection and capture sound absurd?
    • Since when did a backup copy of your primary analytics data become unnecessary?
    • Is this just because Hadoop is about standing up cheap hardware?
    • Why do companies need a catalyst before these things seem common again?
  • 10. Why wouldn’t you want:
    • Two clusters, fully addressable, with data replication, located in separate geographies
    • Data re-silvering when additional capacity is added
    • Complete fault tolerance in the environment, not just data/node redundancy, to allow four-nines availability
    • Operational scale that allows 24 x 7 support
    [Figure: nodes shown as EMPTY, FULL, and BALANCED, illustrating data being rebalanced across the cluster]
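“Data re-silvering when additional capacity is added” means spreading existing data onto new, empty nodes until the cluster is balanced again. Below is a generic greedy rebalancer. It assumes every block is the same size and each block is stored once. It only illustrates the idea and is not how OneFS or the HDFS Balancer actually works.

```python
from typing import Dict, List, Tuple


def rebalance(nodes: Dict[str, List[str]], tolerance: int = 1) -> List[Tuple[str, str, str]]:
    """Greedily move blocks from the fullest node to the emptiest one, in place.

    Stops once the fullest and emptiest nodes differ by at most `tolerance` blocks.
    Returns the list of (block_id, source_node, destination_node) moves performed.
    """
    if tolerance < 1:
        # With tolerance 0 the loop could never finish when blocks don't divide evenly.
        raise ValueError("tolerance must be at least 1")
    if not nodes:
        return []
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[fullest]) - len(nodes[emptiest]) <= tolerance:
            return moves
        block = nodes[fullest].pop()
        nodes[emptiest].append(block)
        moves.append((block, fullest, emptiest))


if __name__ == "__main__":
    cluster = {
        "node1": [f"blk{i}" for i in range(6)],      # FULL
        "node2": [f"blk{i}" for i in range(6, 12)],  # FULL
        "node3": [],                                  # newly added, EMPTY
    }
    moves = rebalance(cluster)
    print(len(moves), {name: len(blocks) for name, blocks in cluster.items()})
    # prints: 4 {'node1': 4, 'node2': 4, 'node3': 4}
```

Each move goes from the fullest node to the emptiest node. That always shrinks the imbalance, so the loop terminates. Real systems also weigh network cost and protection level when choosing what to move.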
  • 11. What is my Idea - 1
    • Separation of compute and storage
      – Why do you think cloud Hadoop is able to offer better SLAs than on-premises Hadoop? It isn’t because of a ton of compute boxes that are each a single point of failure. Cloud providers separate compute and storage.
    • Look at infrastructure / big data as a service (centralization)
      – Instead of trying to staff 25 Hadoop clusters for 24 x 7 support, centralize the team and provide QoS back to the applications.
  • 12. Data Gravity
    • Data sets get bigger over time, and moving them becomes increasingly difficult
      – This leads to switching costs and lock-in
    • Data is a strategic asset to enterprises with digital strategies
    • Data becomes central: build around it
      – Applications tend to migrate toward the data
      – Apply advanced analytics to the data “in place”
  • 13. Traditional IT
    [Diagram: departmental applications (Finance, Marketing, Operations, Sales; CRM, ERP, SCM) each run on their own servers and storage silos, alongside multiple Hadoop silos and virtual-server applications; analytics works on copies of the data.]
  • 14. THE PROBLEM OF DATA MOVEMENT
    • To get statistically relevant results, a typical minimum data set is about 100 TB.
    • That is also the recommended minimum Hadoop cluster size.
    • Copying 100 TB over a dedicated 10 GbE link takes about 24 hours.
    You need a data lake that understands POSIX/Windows and HDFS to avoid data movement (= in-place analytics).
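The “about 24 hours” figure is easy to check. 100 TB is 8 × 10^14 bits, and a 10 Gbit/s link moves 10^10 bits per second. The sketch below assumes decimal units and a single dedicated link. The `efficiency` parameter is a hypothetical knob for protocol overhead and contention.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to copy `dataset_tb` terabytes over a `link_gbps` Gbit/s link.

    Assumptions: decimal units (1 TB = 10**12 bytes, 1 Gbit/s = 10**9 bit/s) and
    `efficiency` in (0, 1] as the fraction of line rate actually achieved.
    """
    if dataset_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid arguments")
    bits = dataset_tb * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print(f"line rate:      {transfer_hours(100, 10):.1f} h")       # 22.2 h
    print(f"90% efficiency: {transfer_hours(100, 10, 0.9):.1f} h")  # 24.7 h
```

At full line rate the copy takes about 22 hours. At 90% effective throughput it takes about 24.7 hours, which matches the roughly one day quoted on the slide.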
  • 15. EMC DATA LAKE
    [Diagram: the same departmental applications (Finance, Marketing, Operations, Sales; CRM, ERP, SCM) plus analytics and mobile applications run on servers that all share a single Isilon data lake.]
  • 16. WHAT IS A DATA LAKE?
  • 17. Isilon Data Lake Architecture
    [Diagram: clients connect over a 1 GbE/10 GbE Ethernet LAN to scale-out Isilon nodes with internal SAS disks, interconnected by an InfiniBand back end.]
    Scale-out data lake
    • OneFS integrates RAID, volume manager, and file system.
    • Uses internal disks and spans a single file system across them.
    • Development started in the 2000s.
    • Extremely mature, based on FreeBSD.
    • Supports many access protocols …
  • 18. HDFS Implementation as a Protocol
    • Multi-threaded daemon runs on all nodes
      – Services both NameNode (NN) and DataNode (DN) protocols
      – Translates HDFS RPCs to POSIX system calls
      – Stateless; the underlying file system handles coherency
    [Diagram: on each OneFS node, an isi_hdfs_d thread takes an HDFS request, issues a system call through VFS to OneFS, and returns the response.]
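This hypothetical Python sketch makes “translates HDFS RPCs to POSIX system calls” and “stateless” concrete. Each request is answered with stat(), open(), pread(), and close() against a shared directory. Nothing is cached between requests, so coherency is left entirely to the file system. The request shape (`ReadRequest`) and the handler names are invented for illustration. They are not the HDFS wire protocol or the isi_hdfs_d implementation. `os.pread` is available on Unix only.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ReadRequest:
    """Hypothetical stand-in for an HDFS block-read RPC."""
    path: str    # HDFS-style absolute path, e.g. "/file"
    offset: int  # byte offset within the file
    length: int  # number of bytes requested


def _resolve(root: str, hdfs_path: str) -> str:
    """Map an HDFS path onto the shared POSIX namespace, refusing to escape `root`."""
    base = os.path.realpath(root)
    full = os.path.realpath(os.path.join(base, hdfs_path.lstrip("/")))
    if os.path.commonpath([base, full]) != base:
        raise PermissionError(f"path escapes export root: {hdfs_path}")
    return full


def handle_get_file_info(root: str, hdfs_path: str) -> dict:
    """NameNode-style metadata request answered by a single stat() call."""
    st = os.stat(_resolve(root, hdfs_path))
    return {"length": st.st_size, "mtime": st.st_mtime}


def handle_read(root: str, req: ReadRequest) -> bytes:
    """DataNode-style read answered by open()/pread()/close(); nothing is cached."""
    fd = os.open(_resolve(root, req.path), os.O_RDONLY)
    try:
        return os.pread(fd, req.length, req.offset)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "file"), "wb") as f:
            f.write(b"hello, data lake")
        print(handle_get_file_info(root, "/file")["length"])  # 16
        print(handle_read(root, ReadRequest("/file", 7, 4)))   # b'data'
```

Keeping the handlers stateless means any thread on any node can serve any request. That is the property the slide attributes to the daemon.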
  • 19. HDFS IMPLEMENTED LIKE A NAS PROTOCOL
    OneFS runs a daemon that speaks the NameNode and DataNode protocols natively. Every OneFS node acts as both NameNode and DataNode on top of the OneFS clustered file system. A DFSClient on a Hadoop node reads a file in three steps:
      1) Request("/file")
      2) Response (block locations)
      3) GetBlock(block)
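The three-step read can be modeled in a few lines. In this toy sketch every node plays both roles. It returns block locations (the NameNode role) and serves block contents (the DataNode role), because all nodes see the same clustered file system. `ClusterFS`, `Node`, the 4-byte block size, and round-robin placement are all hypothetical simplifications, not OneFS or HDFS behavior.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # bytes; hypothetical and tiny so the example splits into several blocks


class ClusterFS:
    """Stand-in for the shared clustered file system (one copy of each file)."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files


class Node:
    def __init__(self, name: str, fs: ClusterFS, peers: List[str]):
        self.name, self.fs, self.peers = name, fs, peers

    def get_block_locations(self, path: str) -> List[Tuple[int, str]]:
        """NameNode role (step 2): return (block offset, node name) pairs."""
        size = len(self.fs.files[path])
        offsets = range(0, size, BLOCK_SIZE)
        # Spread block reads across all nodes round-robin so the client can parallelize.
        return [(off, self.peers[i % len(self.peers)]) for i, off in enumerate(offsets)]

    def get_block(self, path: str, offset: int) -> bytes:
        """DataNode role (step 3): return one block's bytes from the shared file system."""
        return self.fs.files[path][offset:offset + BLOCK_SIZE]


def dfs_client_read(nodes: Dict[str, Node], entry: str, path: str) -> bytes:
    """Step 1: ask any node for block locations, then fetch each block from its node."""
    locations = nodes[entry].get_block_locations(path)
    return b"".join(nodes[host].get_block(path, off) for off, host in locations)


if __name__ == "__main__":
    fs = ClusterFS({"/file": b"in-place analytics"})
    names = ["node1", "node2", "node3", "node4"]
    nodes = {n: Node(n, fs, names) for n in names}
    print(dfs_client_read(nodes, "node3", "/file"))  # b'in-place analytics'
```

In this model, any node can answer step 1, so a client may contact whichever node is convenient. The round-robin locations then spread step 3 across the cluster.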
  • 20. ISILON: FOR ALL TYPES OF UNSTRUCTURED DATA
    Isilon Data Lake workloads: archive & backup target, file shares, home directories, BLOBs, design/test & manufacture, retail & monetization, transaction, Hadoop & analytics, sync 'n share, application test, content, social & next-gen, surveillance.
  • 21. SUPPORT FOR MULTIPLE ANALYTICS APPLICATIONS
    [Diagram: several MapReduce clusters running HDFS 1.x and HDFS 2.x, plus NFS and SMB clients, all access the same data, which is served over SMB, NFS, HTTP, FTP, and HDFS.]
  • 22. EXPAND DATA LAKE TO THE CLOUD
    SmartPools/CloudPools policy example (data center tiers, then cloud provider):
      < 30 days           S210
      30 days – 1 year    NL410
      1 year – 2 years    HD400
      > 2 years           Cloud provider
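Read as a policy, the slide maps file age to a tier. The sketch assumes this mapping: under 30 days goes to S210, 30 days to 1 year to NL410, 1 to 2 years to HD400, and over 2 years to the cloud. A year is taken as 365 days. With those assumptions the rule is a first-match lookup. This Python sketch is illustrative only and is not SmartPools or CloudPools configuration syntax.

```python
from datetime import timedelta
from typing import List, Optional, Tuple

# (exclusive upper age bound, tier) pairs, checked in order; None means "no upper bound".
# Assumed reading of the slide; a year is taken as 365 days.
POLICY: List[Tuple[Optional[timedelta], str]] = [
    (timedelta(days=30), "S210"),
    (timedelta(days=365), "NL410"),
    (timedelta(days=2 * 365), "HD400"),
    (None, "cloud"),
]


def target_tier(age: timedelta) -> str:
    """Return the first tier whose age bound the file has not yet reached."""
    for limit, tier in POLICY:
        if limit is None or age < limit:
            return tier
    raise RuntimeError("policy must end with an unbounded tier")


if __name__ == "__main__":
    for days in (10, 100, 400, 1000):
        print(days, target_tier(timedelta(days=days)))
```

Ordering the rules from youngest to oldest and ending with an unbounded catch-all ensures every file lands in exactly one tier.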
  • 23. CLOUD ENABLED DATA LAKE
    [Diagram: apps and users access the data lake in the data center; CloudPools moves data to a cloud provider based on access time.]
  • 24. Parallel Replication
    • Designed from the ground up for scale-out storage
    • Aggregate throughput scales with capacity
    • Maintains a consistent RPO over growing data sets
    • Underlying file system knowledge
      – Snapshot integration
      – Block-level deltas
      – Rich metadata transfer
    • Automated data failover/failback
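“Block-level deltas” means that, after a new snapshot, only the blocks that changed since the last replicated snapshot need to cross the wire. Below is a minimal sketch that assumes fixed-size blocks compared by SHA-256 hash. The tiny 8-byte block size is only for the example. It illustrates the idea and is not the actual replication implementation.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 8  # bytes; hypothetical and tiny for the example


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 digest of each fixed-size block of a snapshot."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Return {block_index: new_bytes} for blocks that differ or did not exist before."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for idx, digest in enumerate(block_hashes(new, block_size)):
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            start = idx * block_size
            delta[idx] = new[start:start + block_size]
    return delta


def apply_delta(target: bytes, delta: Dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new snapshot on the target from its old copy plus the delta."""
    buf = bytearray(target[:new_length].ljust(new_length, b"\0"))  # truncate or extend
    for idx, chunk in delta.items():
        start = idx * block_size
        buf[start:start + len(chunk)] = chunk
    return bytes(buf)


if __name__ == "__main__":
    old = b"AAAAAAAABBBBBBBBCCCCCCCC"
    new = b"AAAAAAAAXXXXXXXXCCCCCCCC"
    delta = changed_blocks(old, new)
    print(sorted(delta))  # [1]
    print(apply_delta(old, delta, len(new)) == new)  # True
```

Each changed block is independent, so different nodes could send different blocks in parallel. That is one way aggregate replication throughput can scale with capacity.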
  • 25. Storage Considerations
    Standard Hadoop cluster:
      – 100 nodes, compute + DAS
      – 24 TB per node, divided by 3 for Hadoop copies
      – 800 TB usable, but rarely achieved
      – 5+ cabinets
      – Spill space for ingestion and extraction
    Hadoop using the EMC Isilon data lake:
      – 20 compute nodes + 800 TB Isilon
      – Single copy with erasure coding
      – 800 TB usable
      – 1 cabinet
      – It is NAS
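The standard cluster's 800 TB usable figure is 100 nodes × 24 TB divided by three replicas. Erasure coding replaces full copies with parity. For a layout of k data units and m parity units, raw capacity = usable × (k + m) / k. The 8+2 layout below is an assumed example, not a number from the slide, and actual Isilon protection levels vary.

```python
def usable_with_replication(raw_tb: float, copies: int = 3) -> float:
    """Usable capacity when every block is stored `copies` times (HDFS default: 3)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_tb / copies


def raw_for_erasure_coding(usable_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity needed to hold `usable_tb` with a k+m erasure-coding layout."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("invalid erasure-coding layout")
    return usable_tb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    das_raw_tb = 100 * 24                          # 100 nodes x 24 TB each
    print(usable_with_replication(das_raw_tb))     # 800.0 TB usable from 2400 TB raw
    # 8+2 is an assumed example layout, not a figure from the slide.
    print(raw_for_erasure_coding(800, 8, 2))       # 1000.0 TB raw for 800 TB usable
```

Under these assumptions, triple replication needs 3× raw capacity for the same usable space, while an 8+2 layout needs 1.25×.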
  • 26. What is my Idea - 2
    • Build a complete cost model that includes every item you think of as “free” but whose costs stop when you change the architecture.
      – Project-based funding is great until you want to centralize. Centralization models (BDaaS) work when you consider all the sundry costs that project-based funding typically excludes (e.g., 24 x 7 support for each cluster, and all-in costs that appear free but are sunk).
  • 27. What is my Idea - 3
    • Think about “build it all yourself” vs. “buy”
    • Focus on analytics rather than infrastructure implementation, software dependencies, testing, etc.
    • That has all been done already with EMC Big Data Systems and Big Data Solutions
    • Using pre-validated, installed, and tested solutions reduces complexity and increases reliability.
  • 28. EMC BIG DATA PORTFOLIO
    • Data Lake
    • Data Lake Extensions
    • Cloud Enabled
    • Vblock
    • VxRack
    • VxRail
    • Federation Business Data Lake
  • 29. HIGH PERFORMANCE: PREDICTABLE, LOW LATENCY
    I/O path comparison for Hadoop:
      – Traditional HDD: HDFS → file system → buffer cache → device driver → SATA controller → disk: ~10 ms
      – PCIe SSD: HDFS → file system → buffer cache → device driver → PCIe SSD: 1000–2000 µs
      – DSSD: the DSSD Hadoop plugin accesses flash directly over PCIe, skipping the file system, buffer cache, and device driver layers: <100 µs
    • 10X throughput
    • 1/13th latency
    • No application changes required
  • 30. EMC Business Data Lake
    [Diagram components: Pivotal Big Data Suite; VMware vCloud Suite; EMC Data Lake foundation: Isilon + ECS; VCE Vblock | XtremIO | Data Domain; open analytics toolbox; data and analytics catalog; advanced analytics applications at scale; data processing (Greenplum Database, HAWQ, Spring XD, Pivotal HD, Spark, Redis, RabbitMQ, GemFire, BDS on Pivotal Cloud Foundry, Hadoop); Platform Manager, Data Governor, Data Manager, Ingest Manager, Analytics Manager]
    See demos at http://www.fbdldemo.com/
  • 31. Thursday, April 14th, 15:00 UTC
    Watch out for:
    • Hadoop Everywhere: Geo-Distributed Storage for Big Data
    Presenters:
    • Nikhil Joshi, EMC
    • Vishrut Shah, EMC
  • 32.
  • 33. A Remark on Data Locality
    • U.C. Berkeley’s AMPLab declared data locality dead in 2011.
    • Cloudera has declared data locality dead in Hadoop 3.0 with HDFS-EC.
    • Gartner has declared Hadoop dead due to its limits.
    • Hadoop will only grow, and more will depend on it going forward.
    • The catalyst may be the next time I see you, when Hadoop uptime is your main concern.
  • 34. Isilon Scale-Out NAS
    • Simple to manage: single file system, single volume, global namespace
    • Massively scalable: scales from 16 TB to over 50 PB in a single cluster; 200 GB/s throughput, 3.75M IOPS
    • Unmatched efficiency: over 80% storage utilization, automated tiering, and SmartDedupe
    • Enterprise data protection: efficient backup and disaster recovery, and N+1 through N+4 redundancy
    • Robust security and compliance options: RBAC, Access Zones, WORM data security, file system auditing, data-at-rest encryption with SEDs, STIG hardening, CAC/PIV smartcard authentication, FIPS OpenSSL support
    • Operational flexibility: multi-protocol support including NFS, SMB, HTTP, FTP, and HDFS; object and cloud computing including OpenStack Swift
  • 35. Elastic Cloud Storage (ECS)
    • Geo-scale: geo-replicated and distributed to multiple locations
    • Massively scalable: scales to billions of objects in a single namespace
    • Support for all file sizes: supports individual files of any size
    • Multi-tenant: efficient backup and disaster recovery, and N+1 through N+4 redundancy
    • HDFS compatible: Hortonworks-certified HDFS-compatible file system
    • Swift compatible: natively supports OpenStack storage
    • Native cloud interface: natively works with existing cloud protocols such as S3 and Azure