SlideShare a Scribd company logo
1 of 32
1© Copyright 2014 EMC Corporation. All rights reserved.
EMC ViPR HDFS Data
Service Technical
Overview
Download this slide
http://ouo.io/FuYX5
VIRTUALIZE
EVERYTHING
COMPROMISE
NOTHING
2© Copyright 2014 EMC Corporation. All rights reserved.
Disruptive / Opportunistic IT Trends
Mobile Cloud Big Data Social
T R U S T
3© Copyright 2014 EMC Corporation. All rights reserved.
Mainframe, Mini Computer
Terminals
MILLIONS
OF USERS
THOUSANDS
OF APPS
LAN/Internet Client/Server
PC
HUNDREDS OF MILLIONS
OF USERS
TENS OF THOUSANDS
OF APPS
Mobile Cloud Big Data Social
Mobile Devices
BILLIONS
OF USERS
MILLIONS
OF APPS
Source: IDC, 2013
4© Copyright 2014 EMC Corporation. All rights reserved.
The Big Data Economy
More data sources, richer content, longer utility
40
ZB
Source: IDC 2012 Digital Universe Study
5© Copyright 2014 EMC Corporation. All rights reserved.
Significant financial value across many verticals
The Big Data Potential
Source: “Big Data: The Next Frontier for Innovation, Competition,
and Productivity”, McKinsey Global Institute
US Retail
• 60+% increase in net
margin possible
• 0.5-1% annual
productivity growth
US Healthcare
• $300 billion value per year
• 0.7% annual productivity
growth
Manufacturing
• Up to 50% decrease in
product development,
assembly costs
• Up to 7% reduction in
working capital
Global personal location
data
• $100 billion+ revenue for
service providers
• Up to $700 billion value to
end users
6© Copyright 2014 EMC Corporation. All rights reserved.
Supporting 3rd platform app with 2nd platform infrastructure
The Challenges to Widespread Adoption
 How to move from the lab to
production?
– Trusting an open source Hadoop
distribution
– HDFS not enterprise grade
– Analytics on existing data?
 What’s the risk?
– Dedicated cluster requires significant
investment
– ROI? – does the data have value?
 What are the costs?
– Costs increase as my dedicated
analytics cluster scales
– Bandwidth and network costs of
moving data to the cluster
7© Copyright 2014 EMC Corporation. All rights reserved.
Big Data Storage Requirements
In-place analytics and protection of all data types
 Data Unification:
– Big Data storage must support structured, semi-
structured, and unstructured data types.
 In-Place Analytics:
– Analytics, compute workloads need to execute
where the data live.
 Data Compliance:
– More sources of data, more volume, velocity,
etc. exacerbate compliance and long-term
retention requirements
40 ZB
8© Copyright 2014 EMC Corporation. All rights reserved.
ViPR Data Services
Overview
9© Copyright 2014 EMC Corporation. All rights reserved.
Data Services that Span Arrays and Support Hybrid Data Types
ViPR Data Services
 Storage services at cloud scale
– Built in software
– Layered over both traditional and new storage
devices
 Object and HDFS data services
– Many more to follow, at regular intervals
– Open API for 3rd party development
 Unified platform
– Data services can be used as different
semantic views on the same data e.g. Object
on File, HDFS on Object
10© Copyright 2014 EMC Corporation. All rights reserved.
EMC ViPR - Software-Defined Storage
ViPR
Data Services
ViPR
Controller
EMC ViPR Platform
Provisioning Self-Service Reporting Automation
Third-Party
Isilon
Atmos
VMAX VNX VPLEX
Commodity
XtremIOCentera
11© Copyright 2014 EMC Corporation. All rights reserved.
ViPR Data Services: Architecture
ViPR
Data Path
ViPR
Control Path
• Distributed Infrastructure
• Device Drivers
• Elastic Volumes
• Migration
GEO-SCALE INDEX, METADATA, TRANSACTIONS
… 3rd PARTYOBJECT HDFS KEY-VALUE
GEO SCALE INDEX, METADATA, TRANSACTIONS
Commodity
VNX Isilon
3rd Party
12© Copyright 2014 EMC Corporation. All rights reserved.
ViPR Data Services Address Big Data
Storage Requirements
 Data Unification
– Transform existing storage infrastructure into a
data lake
– Structured, semi/un-structured content
 In-place Analytics
– Run queries against data on existing arrays
– Flexible software model supports future colocation
of compute and storage
 Data Compliance
– Choice and flexibility or persistence layer
– Support cloud-scale and consumer-grade
applications on enterprise-grade infrastructure
40 ZB
13© Copyright 2014 EMC Corporation. All rights reserved.
ViPR HDFS Data
Service Overview
14© Copyright 2014 EMC Corporation. All rights reserved.
ViPR HDFS Service Overview
 HDFS is becoming the de facto file
system for distributed applications
 ViPR is a great platform for HDFS
– Addresses limitations of off-the-shelf
HDFS
– Brings HDFS to existing storage
hardware
– Enables HDFS/Object/File scenarios
– Flexible software model
15© Copyright 2014 EMC Corporation. All rights reserved.
ViPR HDFS Service Overview
 API head
– Custom client/server protocol optimized
for high scale
– Uses the same unstructured storage
engine as ViPR Object data service
 Client library over the HDFS API
– Provides a viprhdfs:// drop-in
replacement for HDFS 2.0
– Can be seamlessly added to existing
Hadoop distributions
16© Copyright 2014 EMC Corporation. All rights reserved.
EMC ViPR Data Services
ViPR
Data Services
ViPR
Controller
EMC ViPR Platform
Provisioning Self-Service Reporting Automation
Third-Party
IsilonVNX
17© Copyright 2014 EMC Corporation. All rights reserved.
How ViPR HDFS Data Service Helps
Accelerate Big Data initiatives
 Quickly move from lab to production
– Utilize existing infrastructure as a big data
repository or “data lake”
– Eliminate single namenode single point of failure
 Reduce risk
– Run queries against data on existing arrays
– Leverage existing investments
 Reduce costs
– Reduce the growth in dedicated analytics
infrastructure
– Reduce bandwidth, storage and network costs
40 ZB
18© Copyright 2014 EMC Corporation. All rights reserved.
ViPR HDFS Data
Service
Technical Deep Dive
Name Node
JOB TRACKER
Commodity Compute & Storage
TASK TRACKER
Data Store
MapReduce Task
Client
TASK TRACKER
Data Store
MapReduce Task
TASK TRACKER
Data Store
MapReduce Task
HDFS ARCHITECTURE
VNX Isilon
3rd Party
VMAX
Commodity
JOB TRACKER
TASK TRACKER
MapReduce Task
Client
TASK TRACKER
MapReduce Task
TASK TRACKER
MapReduce Task
ViPR HDFS ARCHITECTURE
VNX Isilon
3rd Party
VMAX
Commodity
JOB TRACKER
TASK TRACKER
MapReduce Task
Client
TASK TRACKER
MapReduce Task
TASK TRACKER
MapReduce Task
• No single point of failure
• Leverage existing storage
• Compatible with existing
Hadoop distribution
• Mixed workload across
HDFS and Object
ViPR HDFS ARCHITECTURE
22© Copyright 2014 EMC Corporation. All rights reserved.
MapReduce Job Flow
Master Node
Job
Tracker
Task Tracker
Data Store
Commodity Compute & Storage
MapReduce Task
Client
Task Tracker
Data Store
MapReduce Task
Task Tracker
Data Store
MapReduce Task
Name
Node
Secondary
NameNode
Submit Job
Split into tasks
Rack 1 Rack 2
Data Node 1 Data Node 2 Data Node 3
23© Copyright 2014 EMC Corporation. All rights reserved.
Presales Training
Customer’s Hadoop Compute
Cluster
ViPR Controller
ViPR Data Node(s) running outside
the ViPR managed arrays
Blob
Engine
S3
Head
HDFS
Head
Customer
AD
Trust Relationship
ViPR HDFS - Under The Hood
Trust RelationshipTrust Relationship
Data
Read/
Write
Kerberos KDC
VNX
Isilon
3rd Party
24© Copyright 2014 EMC Corporation. All rights reserved.
HDFS uses ViPR Object Storage Engine
ViPR data services creates a unified pool (bucket) of data
VIRTUAL ARRAY
 Buckets of data span file shares
– Grow and shrink on demand
 Data is distributed and intermingled across
the storage
 Provides an HDFS interface
 ViPR makes HDFS enterprise grade
– ViPR HDFS replaces namenodes, no single point of
failure
Isilon
3rd Party
VNX
5500
25© Copyright 2014 EMC Corporation. All rights reserved.
Support Mixed Workloads
Object, File and HDFS operations on the same data
VIRTUAL ARRAY
Isilon
3rd Party
VNX
5500
 ViPR Data Services offer three
bucket options:
– Object
– HDFS
– ObjectandHDFS
 ObjectandHDFS provides user with
access to either S3 or HDFS
Interface
– Full compatibility with existing
object based APIs
▪ Amazon S3, Openstack Swift, Atmos
Object HDFS
Object
& HDFS
26© Copyright 2014 EMC Corporation. All rights reserved.
ViPR HDFS Data
Value Proposition
27© Copyright 2014 EMC Corporation. All rights reserved.
Instantly Deploy a Big Data Repository
Use existing arrays as a big data store
Isilon
3rd Party
VNX
5500
VIRTUAL ARRAY
 Reduce risk
– Reduce CAPEX investment required to
perform analytics
– Maintain data protection, compliance at
array level
 Reduces cost and complexity of
dedicated clusters
– Reduce need for new vendor nodes and
storage capacity
 Reduce data transfer time and
bandwidth costs
– 10 TBs takes 25 hours via 10gE
– 10 TBs takes 3 days via dedicated WAN
28© Copyright 2014 EMC Corporation. All rights reserved.
Expand the Reach of Big Data Queries
Expand analytics to ViPR-managed data stores
 Extend big data queries to run
on existing file arrays as
existing Hadoop deployments
 Opens new opportunities and
analytics scenarios
– Faster, easier business insights
Isilon
3rd Party
VNX
5500
VIRTUAL ARRAY
29© Copyright 2014 EMC Corporation. All rights reserved.
Leverage and Extend Existing Investments
Utilize existing Hadoop infrastructure
 ViPR HDFS data service can
be the data source for
Pig/Hive queries
– Fully compatible with existing
Hive/Pig query engines
 Can use an existing
infrastructure to query ViPR-
managed data stores
– Add data stores via ViPR
without having to re-write
queriesIsilon
3rd Party
VNX
5500
VIRTUAL ARRAY
30© Copyright 2014 EMC Corporation. All rights reserved.
Support Mixed Workloads
Provide multiple semantic views of the same data
 Eliminates expensive data movement
– Object based workloads and analytics applications can
manipulate the same data
 Increase developer productivity
– Different applications can target the same data without re-
writes
– IT can serve different developer and business groups with
the same infrastructure
 Increases data value
– Extract more insight from file and object data
(unstructured, semi-structured)
 Reduce infrastructure costs
– Eliminate dedicated data silos
31© Copyright 2014 EMC Corporation. All rights reserved.
Summary
 ViPR provides storage services at cloud scale
– Built in software
– Layered over both traditional and new storage devices
 ViPR creates a unified platform
– Data services can be used as different semantic views on the
same data e.g. Object, File, HDFS interfaces for same data
 ViPR HDFS accelerates journey to 3rd Platform
– Extend Big Data queries to existing storage
– Reduces complexity and cost of dedicated analytics
infrastructure
– Leverages existing investments
EMC ViPR HDFS Data Service Technical Overview

More Related Content

What's hot

S100293 hybrid-cloud-orlando-v1804a
S100293 hybrid-cloud-orlando-v1804aS100293 hybrid-cloud-orlando-v1804a
S100293 hybrid-cloud-orlando-v1804aTony Pearson
 
S100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804aS100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804aTony Pearson
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...EMC
 
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC IsilonEMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC IsilonBoni Bruno
 
S100295 reporting-monitoring-orlando-v1804a
S100295 reporting-monitoring-orlando-v1804aS100295 reporting-monitoring-orlando-v1804a
S100295 reporting-monitoring-orlando-v1804aTony Pearson
 
2019 Top IT Trends - Understanding the fundamentals of the next generation ...
2019 Top IT Trends - Understanding the  fundamentals of the next  generation ...2019 Top IT Trends - Understanding the  fundamentals of the next  generation ...
2019 Top IT Trends - Understanding the fundamentals of the next generation ...Tony Pearson
 
S100297 ilm-archive-orlando-v1804c
S100297 ilm-archive-orlando-v1804cS100297 ilm-archive-orlando-v1804c
S100297 ilm-archive-orlando-v1804cTony Pearson
 
S100296 data-footprint-orlando-v1804a
S100296 data-footprint-orlando-v1804aS100296 data-footprint-orlando-v1804a
S100296 data-footprint-orlando-v1804aTony Pearson
 
Software-Defined Storage (SDS)
Software-Defined Storage (SDS)Software-Defined Storage (SDS)
Software-Defined Storage (SDS)Ali Mirfallah
 
S100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804aS100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804aTony Pearson
 
Performance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storagePerformance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storageNetmagic Solutions Pvt. Ltd.
 
IMEXresearch software defined storage
IMEXresearch software defined storageIMEXresearch software defined storage
IMEXresearch software defined storageIMEX Research
 
Overview of Cloud Storage Enablement and Intelligent Storage Clouds
Overview of Cloud Storage Enablement and Intelligent Storage CloudsOverview of Cloud Storage Enablement and Intelligent Storage Clouds
Overview of Cloud Storage Enablement and Intelligent Storage CloudsTwinStrata
 
IBM Cloud Storage - Cleversafe
IBM Cloud Storage - CleversafeIBM Cloud Storage - Cleversafe
IBM Cloud Storage - CleversafeMichael Beatty
 
Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Jürgen Ambrosi
 
Преимущества облачной инфраструктуры Huawei.
Преимущества облачной инфраструктуры Huawei.Преимущества облачной инфраструктуры Huawei.
Преимущества облачной инфраструктуры Huawei.Zaur Abutalimov
 
Carrier Grade OCP: Open Solutions for Telecom Data Centers
Carrier Grade OCP: Open Solutions for Telecom Data CentersCarrier Grade OCP: Open Solutions for Telecom Data Centers
Carrier Grade OCP: Open Solutions for Telecom Data CentersRadisys Corporation
 
Software Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and StorageSoftware Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and StorageEMC
 
Hadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management SimplicityHadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management SimplicityDataWorks Summit
 

What's hot (20)

S100293 hybrid-cloud-orlando-v1804a
S100293 hybrid-cloud-orlando-v1804aS100293 hybrid-cloud-orlando-v1804a
S100293 hybrid-cloud-orlando-v1804a
 
S100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804aS100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804a
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
 
EMC EC Overview
EMC EC OverviewEMC EC Overview
EMC EC Overview
 
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC IsilonEMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
 
S100295 reporting-monitoring-orlando-v1804a
S100295 reporting-monitoring-orlando-v1804aS100295 reporting-monitoring-orlando-v1804a
S100295 reporting-monitoring-orlando-v1804a
 
2019 Top IT Trends - Understanding the fundamentals of the next generation ...
2019 Top IT Trends - Understanding the  fundamentals of the next  generation ...2019 Top IT Trends - Understanding the  fundamentals of the next  generation ...
2019 Top IT Trends - Understanding the fundamentals of the next generation ...
 
S100297 ilm-archive-orlando-v1804c
S100297 ilm-archive-orlando-v1804cS100297 ilm-archive-orlando-v1804c
S100297 ilm-archive-orlando-v1804c
 
S100296 data-footprint-orlando-v1804a
S100296 data-footprint-orlando-v1804aS100296 data-footprint-orlando-v1804a
S100296 data-footprint-orlando-v1804a
 
Software-Defined Storage (SDS)
Software-Defined Storage (SDS)Software-Defined Storage (SDS)
Software-Defined Storage (SDS)
 
S100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804aS100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804a
 
Performance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storagePerformance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storage
 
IMEXresearch software defined storage
IMEXresearch software defined storageIMEXresearch software defined storage
IMEXresearch software defined storage
 
Overview of Cloud Storage Enablement and Intelligent Storage Clouds
Overview of Cloud Storage Enablement and Intelligent Storage CloudsOverview of Cloud Storage Enablement and Intelligent Storage Clouds
Overview of Cloud Storage Enablement and Intelligent Storage Clouds
 
IBM Cloud Storage - Cleversafe
IBM Cloud Storage - CleversafeIBM Cloud Storage - Cleversafe
IBM Cloud Storage - Cleversafe
 
Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage
 
Преимущества облачной инфраструктуры Huawei.
Преимущества облачной инфраструктуры Huawei.Преимущества облачной инфраструктуры Huawei.
Преимущества облачной инфраструктуры Huawei.
 
Carrier Grade OCP: Open Solutions for Telecom Data Centers
Carrier Grade OCP: Open Solutions for Telecom Data CentersCarrier Grade OCP: Open Solutions for Telecom Data Centers
Carrier Grade OCP: Open Solutions for Telecom Data Centers
 
Software Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and StorageSoftware Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and Storage
 
Hadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management SimplicityHadoop-as-a-Service for Lifecycle Management Simplicity
Hadoop-as-a-Service for Lifecycle Management Simplicity
 

Similar to EMC ViPR HDFS Data Service Technical Overview

Emc vi pr software defined storage
Emc vi pr software defined storageEmc vi pr software defined storage
Emc vi pr software defined storagesolarisyougood
 
EMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR EditionEMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR Editionwalshe1
 
Software Defined Datacenter als 'route' naar het 3e IT platform
Software Defined Datacenter als 'route' naar het 3e IT platform Software Defined Datacenter als 'route' naar het 3e IT platform
Software Defined Datacenter als 'route' naar het 3e IT platform Proact Netherlands B.V.
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformEMC
 
Emc vi pr controller customer presentation
Emc vi pr controller customer presentationEmc vi pr controller customer presentation
Emc vi pr controller customer presentationsolarisyougood
 
VIPR SOFTWARE-DEFINED STORAGE
VIPR SOFTWARE-DEFINED STORAGEVIPR SOFTWARE-DEFINED STORAGE
VIPR SOFTWARE-DEFINED STORAGEEMC Nederland
 
Disaggregated Hadoop Stacks
Disaggregated Hadoop StacksDisaggregated Hadoop Stacks
Disaggregated Hadoop StacksDataWorks Summit
 
EMC HADOOP Storage Strategy
EMC HADOOP Storage StrategyEMC HADOOP Storage Strategy
EMC HADOOP Storage Strategywalshe1
 
Emc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshopEmc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshopsolarisyougood
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageSandeep Patil
 
Introduction to Cloud Application Platform
Introduction to Cloud Application PlatformIntroduction to Cloud Application Platform
Introduction to Cloud Application PlatformVMware vFabric
 
ECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps DayECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps DayBob Sokol
 
Equinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyEquinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyPraveen Kumar
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
NetApp Se training storage grid webscale technical overview
NetApp Se training   storage grid webscale technical overviewNetApp Se training   storage grid webscale technical overview
NetApp Se training storage grid webscale technical overviewsolarisyougood
 
VMWare and SoftLayer Hybrid IT
VMWare and SoftLayer Hybrid ITVMWare and SoftLayer Hybrid IT
VMWare and SoftLayer Hybrid ITBenjamin Shrive
 
Internet of Things and Hadoop
Internet of Things and HadoopInternet of Things and Hadoop
Internet of Things and Hadoopaziksa
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014Wilfried Hoge
 
NetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the Cloud
NetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the CloudNetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the Cloud
NetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the CloudVeritas Technologies LLC
 

Similar to EMC ViPR HDFS Data Service Technical Overview (20)

Emc vi pr software defined storage
Emc vi pr software defined storageEmc vi pr software defined storage
Emc vi pr software defined storage
 
EMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR EditionEMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR Edition
 
Software Defined Datacenter als 'route' naar het 3e IT platform
Software Defined Datacenter als 'route' naar het 3e IT platform Software Defined Datacenter als 'route' naar het 3e IT platform
Software Defined Datacenter als 'route' naar het 3e IT platform
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Emc vi pr controller customer presentation
Emc vi pr controller customer presentationEmc vi pr controller customer presentation
Emc vi pr controller customer presentation
 
Emc vi pr controller
Emc vi pr controllerEmc vi pr controller
Emc vi pr controller
 
VIPR SOFTWARE-DEFINED STORAGE
VIPR SOFTWARE-DEFINED STORAGEVIPR SOFTWARE-DEFINED STORAGE
VIPR SOFTWARE-DEFINED STORAGE
 
Disaggregated Hadoop Stacks
Disaggregated Hadoop StacksDisaggregated Hadoop Stacks
Disaggregated Hadoop Stacks
 
EMC HADOOP Storage Strategy
EMC HADOOP Storage StrategyEMC HADOOP Storage Strategy
EMC HADOOP Storage Strategy
 
Emc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshopEmc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshop
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better Storage
 
Introduction to Cloud Application Platform
Introduction to Cloud Application PlatformIntroduction to Cloud Application Platform
Introduction to Cloud Application Platform
 
ECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps DayECS/Cloud Object Storage - DevOps Day
ECS/Cloud Object Storage - DevOps Day
 
Equinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journeyEquinix Big Data Platform and Cassandra - A view into the journey
Equinix Big Data Platform and Cassandra - A view into the journey
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
NetApp Se training storage grid webscale technical overview
NetApp Se training   storage grid webscale technical overviewNetApp Se training   storage grid webscale technical overview
NetApp Se training storage grid webscale technical overview
 
VMWare and SoftLayer Hybrid IT
VMWare and SoftLayer Hybrid ITVMWare and SoftLayer Hybrid IT
VMWare and SoftLayer Hybrid IT
 
Internet of Things and Hadoop
Internet of Things and HadoopInternet of Things and Hadoop
Internet of Things and Hadoop
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
 
NetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the Cloud
NetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the CloudNetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the Cloud
NetBackup CloudCatalyst: Efficient, Cost-Effective Deduplication to the Cloud
 

More from solarisyougood

Emc recoverpoint technical
Emc recoverpoint technicalEmc recoverpoint technical
Emc recoverpoint technicalsolarisyougood
 
Emc vmax3 technical deep workshop
Emc vmax3 technical deep workshopEmc vmax3 technical deep workshop
Emc vmax3 technical deep workshopsolarisyougood
 
EMC Atmos for service providers
EMC Atmos for service providersEMC Atmos for service providers
EMC Atmos for service providerssolarisyougood
 
Cisco prime network 4.1 technical overview
Cisco prime network 4.1 technical overviewCisco prime network 4.1 technical overview
Cisco prime network 4.1 technical overviewsolarisyougood
 
Designing your xen desktop 7.5 environment with training guide
Designing your xen desktop 7.5 environment with training guideDesigning your xen desktop 7.5 environment with training guide
Designing your xen desktop 7.5 environment with training guidesolarisyougood
 
Ibm aix technical deep dive workshop advanced administration and problem dete...
Ibm aix technical deep dive workshop advanced administration and problem dete...Ibm aix technical deep dive workshop advanced administration and problem dete...
Ibm aix technical deep dive workshop advanced administration and problem dete...solarisyougood
 
Ibm power ha v7 technical deep dive workshop
Ibm power ha v7 technical deep dive workshopIbm power ha v7 technical deep dive workshop
Ibm power ha v7 technical deep dive workshopsolarisyougood
 
Power8 hardware technical deep dive workshop
Power8 hardware technical deep dive workshopPower8 hardware technical deep dive workshop
Power8 hardware technical deep dive workshopsolarisyougood
 
Power systems virtualization with power kvm
Power systems virtualization with power kvmPower systems virtualization with power kvm
Power systems virtualization with power kvmsolarisyougood
 
Power vc for powervm deep dive tips & tricks
Power vc for powervm deep dive tips & tricksPower vc for powervm deep dive tips & tricks
Power vc for powervm deep dive tips & trickssolarisyougood
 
Emc data domain technical deep dive workshop
Emc data domain  technical deep dive workshopEmc data domain  technical deep dive workshop
Emc data domain technical deep dive workshopsolarisyougood
 
Ibm flash system v9000 technical deep dive workshop
Ibm flash system v9000 technical deep dive workshopIbm flash system v9000 technical deep dive workshop
Ibm flash system v9000 technical deep dive workshopsolarisyougood
 
Emc vnx2 technical deep dive workshop
Emc vnx2 technical deep dive workshopEmc vnx2 technical deep dive workshop
Emc vnx2 technical deep dive workshopsolarisyougood
 
Emc isilon technical deep dive workshop
Emc isilon technical deep dive workshopEmc isilon technical deep dive workshop
Emc isilon technical deep dive workshopsolarisyougood
 
Cisco mds 9148 s training workshop
Cisco mds 9148 s training workshopCisco mds 9148 s training workshop
Cisco mds 9148 s training workshopsolarisyougood
 
Cisco cloud computing deploying openstack
Cisco cloud computing deploying openstackCisco cloud computing deploying openstack
Cisco cloud computing deploying openstacksolarisyougood
 
Se training storage grid webscale technical overview
Se training   storage grid webscale technical overviewSe training   storage grid webscale technical overview
Se training storage grid webscale technical overviewsolarisyougood
 
Vmware 2015 with vsphereHigh performance application platforms
Vmware 2015 with vsphereHigh performance application platformsVmware 2015 with vsphereHigh performance application platforms
Vmware 2015 with vsphereHigh performance application platformssolarisyougood
 

More from solarisyougood (20)

Emc vipr srm workshop
Emc vipr srm workshopEmc vipr srm workshop
Emc vipr srm workshop
 
Emc recoverpoint technical
Emc recoverpoint technicalEmc recoverpoint technical
Emc recoverpoint technical
 
Emc vmax3 technical deep workshop
Emc vmax3 technical deep workshopEmc vmax3 technical deep workshop
Emc vmax3 technical deep workshop
 
EMC Atmos for service providers
EMC Atmos for service providersEMC Atmos for service providers
EMC Atmos for service providers
 
Cisco prime network 4.1 technical overview
Cisco prime network 4.1 technical overviewCisco prime network 4.1 technical overview
Cisco prime network 4.1 technical overview
 
Designing your xen desktop 7.5 environment with training guide
Designing your xen desktop 7.5 environment with training guideDesigning your xen desktop 7.5 environment with training guide
Designing your xen desktop 7.5 environment with training guide
 
Ibm aix technical deep dive workshop advanced administration and problem dete...
Ibm aix technical deep dive workshop advanced administration and problem dete...Ibm aix technical deep dive workshop advanced administration and problem dete...
Ibm aix technical deep dive workshop advanced administration and problem dete...
 
Ibm power ha v7 technical deep dive workshop
Ibm power ha v7 technical deep dive workshopIbm power ha v7 technical deep dive workshop
Ibm power ha v7 technical deep dive workshop
 
Power8 hardware technical deep dive workshop
Power8 hardware technical deep dive workshopPower8 hardware technical deep dive workshop
Power8 hardware technical deep dive workshop
 
Power systems virtualization with power kvm
Power systems virtualization with power kvmPower systems virtualization with power kvm
Power systems virtualization with power kvm
 
Power vc for powervm deep dive tips & tricks
Power vc for powervm deep dive tips & tricksPower vc for powervm deep dive tips & tricks
Power vc for powervm deep dive tips & tricks
 
Emc data domain technical deep dive workshop
Emc data domain  technical deep dive workshopEmc data domain  technical deep dive workshop
Emc data domain technical deep dive workshop
 
Ibm flash system v9000 technical deep dive workshop
Ibm flash system v9000 technical deep dive workshopIbm flash system v9000 technical deep dive workshop
Ibm flash system v9000 technical deep dive workshop
 
Emc vnx2 technical deep dive workshop
Emc vnx2 technical deep dive workshopEmc vnx2 technical deep dive workshop
Emc vnx2 technical deep dive workshop
 
Emc isilon technical deep dive workshop
Emc isilon technical deep dive workshopEmc isilon technical deep dive workshop
Emc isilon technical deep dive workshop
 
Emc vplex deep dive
Emc vplex deep diveEmc vplex deep dive
Emc vplex deep dive
 
Cisco mds 9148 s training workshop
Cisco mds 9148 s training workshopCisco mds 9148 s training workshop
Cisco mds 9148 s training workshop
 
Cisco cloud computing deploying openstack
Cisco cloud computing deploying openstackCisco cloud computing deploying openstack
Cisco cloud computing deploying openstack
 
Se training storage grid webscale technical overview
Se training   storage grid webscale technical overviewSe training   storage grid webscale technical overview
Se training storage grid webscale technical overview
 
Vmware 2015 with vsphereHigh performance application platforms
Vmware 2015 with vsphereHigh performance application platformsVmware 2015 with vsphereHigh performance application platforms
Vmware 2015 with vsphereHigh performance application platforms
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

EMC ViPR HDFS Data Service Technical Overview

  • 1. 1© Copyright 2014 EMC Corporation. All rights reserved. EMC ViPR HDFS Data Service Technical Overview Download this slide http://ouo.io/FuYX5 VIRTUALIZE EVERYTHING COMPROMISE NOTHING
  • 2. 2© Copyright 2014 EMC Corporation. All rights reserved. Disruptive / Opportunistic IT Trends Mobile Cloud Big Data Social T R U S T
  • 3. 3© Copyright 2014 EMC Corporation. All rights reserved. Mainframe, Mini Computer Terminals MILLIONS OF USERS THOUSANDS OF APPS LAN/Internet Client/Server PC HUNDREDS OF MILLIONS OF USERS TENS OF THOUSANDS OF APPS Mobile Cloud Big Data Social Mobile Devices BILLIONS OF USERS MILLIONS OF APPS Source: IDC, 2013
  • 4. 4© Copyright 2014 EMC Corporation. All rights reserved. The Big Data Economy More data sources, richer content, longer utility 40 ZB Source: IDC 2012 Digital Universe Study
  • 5. 5© Copyright 2014 EMC Corporation. All rights reserved. Significant financial value across many verticals The Big Data Potential Source: “Big Data: The Next Frontier for Innovation, Competition, and Productivity”, McKinsey Global Institute US Retail • 60+% increase in net margin possible • 0.5-1% annual productivity growth US Healthcare • $300 billion value per year • 0.7% annual productivity growth Manufacturing • Up to 50% decrease in product development, assembly costs • Up to 7% reduction in working capital Global personal location data • $100 billion+ revenue for service providers • Up to $700 billion value to end users
  • 6. 6© Copyright 2014 EMC Corporation. All rights reserved. Supporting 3rd platform app with 2nd platform infrastructure The Challenges to Widespread Adoption  How to move from the lab to production? – Trusting an open source Hadoop distribution – HDFS not enterprise grade – Analytics on existing data?  What’s the risk? – Dedicated cluster requires significant investment – ROI? – does the data have value?  What are the costs? – Costs increase as my dedicated analytics cluster scales – Bandwidth and network costs of moving data to the cluster
  • 7. 7© Copyright 2014 EMC Corporation. All rights reserved. Big Data Storage Requirements In-place analytics and protection of all data types  Data Unification: – Big Data storage must support structured, semi- structured, and unstructured data types.  In-Place Analytics: – Analytics, compute workloads need to execute where the data live.  Data Compliance: – More sources of data, more volume, velocity, etc. exacerbate compliance and long-term retention requirements 40 ZB
  • 8. 8© Copyright 2014 EMC Corporation. All rights reserved. ViPR Data Services Overview
  • 9. 9© Copyright 2014 EMC Corporation. All rights reserved. Data Services that Span Arrays and Support Hybrid Data Types ViPR Data Services  Storage services at cloud scale – Built in software – Layered over both traditional and new storage devices  Object and HDFS data services – Many more to follow, at regular intervals – Open API for 3rd party development  Unified platform – Data services can be used as different semantic views on the same data e.g. Object on File, HDFS on Object
  • 10. 10© Copyright 2014 EMC Corporation. All rights reserved. EMC ViPR - Software-Defined Storage ViPR Data Services ViPR Controller EMC ViPR Platform Provisioning Self-Service Reporting Automation Third-Party Isilon Atmos VMAX VNX VPLEX Commodity XtremIOCentera
  • 11. 11© Copyright 2014 EMC Corporation. All rights reserved. ViPR Data Services: Architecture ViPR Data Path ViPR Control Path • Distributed Infrastructure • Device Drivers • Elastic Volumes • Migration GEO-SCALE INDEX, METADATA, TRANSACTIONS … 3rd PARTYOBJECT HDFS KEY-VALUE GEO SCALE INDEX, METADATA, TRANSACTIONS Commodity VNX Isilon 3rd Party
  • 12. 12© Copyright 2014 EMC Corporation. All rights reserved. ViPR Data Services Address Big Data Storage Requirements  Data Unification – Transform existing storage infrastructure into a data lake – Structured, semi/un-structured content  In-place Analytics – Run queries against data on existing arrays – Flexible software model supports future colocation of compute and storage  Data Compliance – Choice and flexibility or persistence layer – Support cloud-scale and consumer-grade applications on enterprise-grade infrastructure 40 ZB
  • 13. 13© Copyright 2014 EMC Corporation. All rights reserved. ViPR HDFS Data Service Overview
  • 14. 14© Copyright 2014 EMC Corporation. All rights reserved. ViPR HDFS Service Overview  HDFS is becoming the de facto file system for distributed applications  ViPR is a great platform for HDFS – Addresses limitations of off-the-shelf HDFS – Brings HDFS to existing storage hardware – Enables HDFS/Object/File scenarios – Flexible software model
  • 15. 15© Copyright 2014 EMC Corporation. All rights reserved. ViPR HDFS Service Overview  API head – Custom client/server protocol optimized for high scale – Uses the same unstructured storage engine as ViPR Object data service  Client library over the HDFS API – Provides a viprhdfs:// drop-in replacement for HDFS 2.0 – Can be seamlessly added to existing Hadoop distributions
  • 16. 16© Copyright 2014 EMC Corporation. All rights reserved. EMC ViPR Data Services ViPR Data Services ViPR Controller EMC ViPR Platform Provisioning Self-Service Reporting Automation Third-Party IsilonVNX
  • 17. 17© Copyright 2014 EMC Corporation. All rights reserved. How ViPR HDFS Data Service Helps Accelerate Big Data initiatives  Quickly move from lab to production – Utilize existing infrastructure as a big data repository or “data lake” – Eliminate single namenode single point of failure  Reduce risk – Run queries against data on existing arrays – Leverage existing investments  Reduce costs – Reduce the growth in dedicated analytics infrastructure – Reduce bandwidth, storage and network costs 40 ZB
  • 18. 18© Copyright 2014 EMC Corporation. All rights reserved. ViPR HDFS Data Service Technical Deep Dive
  • 19. Name Node JOB TRACKER Commodity Compute & Storage TASK TRACKER Data Store MapReduce Task Client TASK TRACKER Data Store MapReduce Task TASK TRACKER Data Store MapReduce Task HDFS ARCHITECTURE
  • 20. VNX Isilon 3rd Party VMAX Commodity JOB TRACKER TASK TRACKER MapReduce Task Client TASK TRACKER MapReduce Task TASK TRACKER MapReduce Task ViPR HDFS ARCHITECTURE
  • 21. VNX Isilon 3rd Party VMAX Commodity JOB TRACKER TASK TRACKER MapReduce Task Client TASK TRACKER MapReduce Task TASK TRACKER MapReduce Task • No single point of failure • Leverage existing storage • Compatible with existing Hadoop distribution • Mixed workload across HDFS and Object ViPR HDFS ARCHITECTURE
  • 22. 22© Copyright 2014 EMC Corporation. All rights reserved. MapReduce Job Flow Master Node Job Tracker Task Tracker Data Store Commodity Compute & Storage MapReduce Task Client Task Tracker Data Store MapReduce Task Task Tracker Data Store MapReduce Task Name Node Secondary NameNode Submit Job Split into tasks Rack 1 Rack 2 Data Node 1 Data Node 2 Data Node 3
  • 23. 23© Copyright 2014 EMC Corporation. All rights reserved. Presales Training Customer’s Hadoop Compute Cluster ViPR Controller ViPR Data Node(s) running outside the ViPR managed arrays Blob Engine S3 Head HDFS Head Customer AD Trust Relationship ViPR HDFS - Under The Hood Trust RelationshipTrust Relationship Data Read/ Write Kerberos KDC VNX Isilon 3rd Party
  • 24. 24© Copyright 2014 EMC Corporation. All rights reserved. HDFS uses ViPR Object Storage Engine ViPR data services creates a unified pool (bucket) of data VIRTUAL ARRAY  Buckets of data span file shares – Grow and shrink on demand  Data is distributed and intermingled across the storage  Provides an HDFS interface  ViPR makes HDFS enterprise grade – ViPR HDFS replaces namenodes, no single point of failure Isilon 3rd Party VNX 5500
  • 25. 25© Copyright 2014 EMC Corporation. All rights reserved. Support Mixed Workloads Object, File and HDFS operations on the same data VIRTUAL ARRAY Isilon 3rd Party VNX 5500  ViPR Data Services offer three bucket options: – Object – HDFS – ObjectandHDFS  ObjectandHDFS provides user with access to either S3 or HDFS Interface – Full compatibility with existing object based APIs ▪ Amazon S3, Openstack Swift, Atmos Object HDFS Object & HDFS
  • 26. 26© Copyright 2014 EMC Corporation. All rights reserved. ViPR HDFS Data Value Proposition
  • 27. 27© Copyright 2014 EMC Corporation. All rights reserved. Instantly Deploy a Big Data Repository Use existing arrays as a big data store Isilon 3rd Party VNX 5500 VIRTUAL ARRAY  Reduce risk – Reduce CAPEX investment required to perform analytics – Maintain data protection, compliance at array level  Reduces cost and complexity of dedicated clusters – Reduce need for new vendor nodes and storage capacity  Reduce data transfer time and bandwidth costs – 10 TBs takes 25 hours via 10gE – 10 TBs takes 3 days via dedicated WAN
  • 28. 28© Copyright 2014 EMC Corporation. All rights reserved. Expand the Reach of Big Data Queries Expand analytics to ViPR-managed data stores  Extend big data queries to run on existing file arrays as existing Hadoop deployments  Opens new opportunities and analytics scenarios – Faster, easier business insights Isilon 3rd Party VNX 5500 VIRTUAL ARRAY
  • 29. 29© Copyright 2014 EMC Corporation. All rights reserved. Leverage and Extend Existing Investments Utilize existing Hadoop infrastructure  ViPR HDFS data service can be the data source for Pig/Hive queries – Fully compatible with existing Hive/Pig query engines  Can use an existing infrastructure to query ViPR- managed data stores – Add data stores via ViPR without having to re-write queriesIsilon 3rd Party VNX 5500 VIRTUAL ARRAY
  • 30. 30© Copyright 2014 EMC Corporation. All rights reserved. Support Mixed Workloads Provide multiple semantic views of the same data  Eliminates expensive data movement – Object based workloads and analytics applications can manipulate the same data  Increase developer productivity – Different applications can target the same data without re- writes – IT can serve different developer and business groups with the same infrastructure  Increases data value – Extract more insight from file and object data (unstructured, semi-structured)  Reduce infrastructure costs – Eliminate dedicated data silos
  • 31. 31© Copyright 2014 EMC Corporation. All rights reserved. Summary  ViPR provides storage services at cloud scale – Built in software – Layered over both traditional and new storage devices  ViPR creates a unified platform – Data services can be used as different semantic views on the same data e.g. Object, File, HDFS interfaces for same data  ViPR HDFS accelerates journey to 3rd Platform – Extend Big Data queries to existing storage – Reduces complexity and cost of dedicated analytics infrastructure – Leverages existing investments

Editor's Notes

  1. There are megatrends transforming our industry that are predicated on a platform of trust. According to leading industry analysts, the four major trends that are shaping IT and the business: Mobility Cloud Big Data and Social
  2. These trends are forming what is being called the third platform - a platform architected for these trends and built to support billions of users and millions of applications As we look back, the first platform was mainframes with thousand of applications and millions of users with end user devices of choice being proprietary terminals. The second platform is and was the internet and client servers with end user devices being the PC. This platform continues to support tens of thousands of applications and hundreds of millions of users. However, current architectures are being pushed and scaling this type of environment can be costly and ineffective. The third platform is architected with web-scale in mind, supporting millions of applications and billions of users and is built on the technology pillars of mobility, cloud services, big data and analytics, and social networking. When we talk about the third platform in an enterprise setting, we’re really talking about the convergence of these forces and their powerful combination to serve as a foundational architecture for IT organizations. Beyond the individual trends, the seamless “combination” of these trends is becoming critical since it collectively represents an agile new IT fabric for applications, data centers and, most importantly, the user experience. According to  IDC, the third platform, will serve as the primary growth driver of the IT industry over the next decade, responsible for 75% of new growth as worldwide IT spending moves from $3.7 trillion in 2013 to more than $5 trillion in 2020.
  3. Unstructured data is no longer files from office productivity applications. The real growth and storage management problem is coming from: New media such as videos and podcasts Machine-generated data from devices such as sensors – telemetry data – in fact a transatlantic flight from NYC to London can generate 20-30 TB of telemetry data! Communities – social interactions Mobile Devices – pictures, music, etc. Imaging Equipment – imaging, imaging studies, health records The intelligent economy produces a constant stream of data that is being monitored and analyzed. IDC estimates that the digital universe will be 40ZB by 2020. That’s a 40 followed by 21 zeroes. Social interactions, mobile devices, facilities, equipment, R&D, simulations, and physical infrastructure all contribute to the flow of information. In aggregate, this is what is called Big Data. The Big Data economy, is characterized by: More Sources of data Communities Mobile Devices Sensors Imaging Equipment Richer Content Pictures Videos Data Streams Longer utility Durable value – information and information about information (metadata) has value for a long time after its creation. All this data can have business value. Regulatory burdens – always a contributor to the need to retain data for longer and longer periods of time, often indefinitely.
  4. Data has value well-beyond the context of the application that created it. Information-based applications and services will have tremendous financial impact across many market segments. Evolving to the 3rd platform and exploiting information will have quantifiable impact on profit margins, revenues, productivity metrics and operating costs. The potential is obvious and has been validated by early adopters. Big Web companies, Oil & Gas, Pharmaceutical firms, large retailers and many more have used Big Data analytics for deep business insights that target and retain customers and build competitive advantage. The early/late majority, however, are moving more cautiously. Enterprise customers are not starting with a blank canvas, and while they want all the benefits that the 3rd platform offers, they have invested millions if not billions of dollars into an infrastructure that they must continue to maintain and grow. The cost, risk and value of moving to a 3rd platform is still uncertain. They have questions about how they gain the value of the 3rd platform while leveraging their current IT infrastructure.
  5. Big Data and HDFS are Disruptive. According to 451 Research, the market for Hadoop/NoSQL software and services will be $3.5 billion by 2017 (45% CAGR). It’s more than analytics, though that’ a huge part of it. The disruptive change is that data has value beyond its initial application. Information about the information provides insights that are critical to understanding and predicting the business. Everyone sees the potential but adoption has still been somewhat cautious. Hadoop represents a 3rd platform infrastructure that co-locates compute and storage. But, for most enterprises, a Hadoop cluster only contains a fraction of their enterprise data. Customers need the confidence to move from the lab to production. Can they leverage their existing infrastructure and data? Which Hadoop distribution should they use? There are also concerns about HDFS not being enterprise grade. The namenode still represents and single point of failure which can be a non-starter for some data and uses. Customers are still calculating their risk . A dedicated cluster can be very cheap (free) to get started but requires significant investment as it scales. It’s also hard to calculate ROI when it’s unclear which data has value. Other costs that need to be factored in are the bandwidth and network costs of moving data to the cluster and back to primary storage. Customers see the potential and the necessity of Big Data and 3rd platform applications and services. But their 2nd platform infrastructure is not built for this new model. Yet, existing infrastructure, data and applications are not going away. Organizations need a way to “mind the gap” – leveraging their existing infrastructure and data today while building a platform for the future.
  6. The era of Big Data places new demands on data storage. Storage must contend with varying data types, all of which need to be stored securely for a long periods of time and be available for analysis. Data Unification: There is an increasing focus on data unification meaning that the storage infrastructure for Big Data has to cater to structured, semi-structured, and unstructured data types. In-Place Analytics: There is a growing emphasis on in-place analytics in which the compute workloads such as Hadoop Map/Reduce operations are run right where the data lives. Data Compliance: This market is fraught with challenges stemming from regulatory and compliance requirements. As the platform that hosts data the instant it is created, storage is not immune to these challenges — and how data gets stored in the long term.
  7. ViPR aggregates multi-vendor heterogeneous storage into a unified storage platform, that, in turn, can be leveraged as a logical scale-out layer which can serve as the underlying infrastructure for hosting a range of data services to support collecting, managing and utilizing unstructured content at massive scale. ViPR Data Services are implemented in software and feature a simple, lightweight, low-touch, scale-out design. Data services are storage abstractions that reflect the combination of a data type (file, object or block of data), access protocols (iSCSI, NFS, REST, etc.), and durability, availability, and security characteristics (snapshots, replication, etc.) In ViPR, block, file, object, and HDFS are all data services, though ViPR is not in the data path for file and block (these can be thought of us “control services”). Object and HDFS are available with more to follow. Data services can be used to provide different semantic views of the same data. You can manipulate a file as a file or as an object without having to move the data to a different platform that features that semantic.
  8. The immediate benefit of ViPR is its ability to automate storage management and provisioning and make storage available as a self-service, consumable resource within a software-defined data center (SDDC). But ViPR also transforms how enterprises deliver data services. With storage arrays and storage services defined in software and managed by policy, ViPR enables organizations to deploy unique Data Services that cloud-enable existing infrastructure and extend the use cases for their data and the value of their storage investments. ViPR aggregates multi-vendor heterogeneous storage into a unified storage platform that can be leveraged as a logical scale-out layer which can serve as the underlying infrastructure for hosting a range of data services to support collecting, managing and utilizing unstructured content at massive scale
  9. This depicts the architecture for ViPR and highlights the data services functionality. At the bottom are the physical arrays that ViPR can manage. Above the arrays is the ViPR controller which has features that enable a distributed infrastructure (Cassandra, a distributed DB and Zookeeper to manage status of different nodes in the system) and device drivers to hook into APIs of arrays so the Controller can automate provisioning, management, etc. On top of that are ViPR data services. The Object Data Service was released at the same time as ViPR Controller in October 2013. HDFS was released in December 2013. HDFS uses the same unstructured storage engine as the Object data service.
  10. The era of Big Data places new demands on data storage. Storage must contend with varying data types, all of which need to be stored securely for a long periods of time and be available for analysis. Data Unification: There is an increasing focus on data unification meaning that the storage infrastructure for Big Data has to cater to structured, semi-structured, and unstructured data types. In-Place Analytics: There is a growing emphasis on in-place analytics in which the compute workloads such as Hadoop Map/Reduce operations are run right where the data lives. Data Compliance: This market is fraught with challenges stemming from regulatory and compliance requirements. As the platform that hosts data the instant it is created, storage is not immune to these challenges — and how data gets stored in the long term.
  11. The ViPR HDFS data service is the second data service to be released by EMC. It will be available by the end of 2013. The HDFS service gives organizations the ability to run analytics using well known industry Hadoop distributions on existing data stored across heterogeneous systems such as VNX, Isilon and Netapp arrays. Hadoop has become a de-facto standard for companies that are investigating novel strategies for addressing their Big Data challenges. HDFS is the core distributed file system used by Hadoop. Many organizations have an HDFS project in their labs. However, many of these companies have found Hadoop to be difficult to deploy and manage at scale. The ViPR approach to HDFS takes advantage of proven storage hardware to overcome this challenge. Instead of building a discrete analytics silo with dedicated infrastructure, the ViPR HDFS data service leverages the existing ViPR virtualized storage environment and the backend storage platforms it utilizes.
  12. The ViPR HDFS data service is the second data service to be released by EMC. It will be available by the end of 2013. The HDFS service gives organizations the ability to run analytics using well known industry Hadoop distributions on existing data stored across heterogeneous systems such as VNX, Isilon and Netapp arrays. Hadoop has become a de-facto standard for companies that are investigating novel strategies for addressing their Big Data challenges. HDFS is the core distributed file system used by Hadoop. Many organizations have an HDFS project in their labs. However, many of these companies have found Hadoop to be difficult to deploy and manage at scale. The ViPR approach to HDFS takes advantage of proven storage hardware to overcome this challenge. Instead of building a discrete analytics silo with dedicated infrastructure, the ViPR HDFS data service leverages the existing ViPR virtualized storage environment and the backend storage platforms it utilizes.
  13. HDFS is becoming increasingly popular as a file system layer for distributed applications, and this goes beyond Hadoop. The ViPR HDFS data service is a Hadoop-compatible file system and supports any Hadoop 2.0 implementation including existing distros such as Cloudera and PivotalHD. HDFS supports high aggregate throughput access to data, e.g. MapReduce. In some cases, is provides low latency access. However, concerns to enterrpises include scale, durability, cost, and management.
  14. The era of Big Data places new demands on data storage. Storage must contend with varying data types, all of which need to be stored securely for a long periods of time and be available for analysis. Data Unification: There is an increasing focus on data unification meaning that the storage infrastructure for Big Data has to cater to structured, semi-structured, and unstructured data types. In-Place Analytics: There is a growing emphasis on in-place analytics in which the compute workloads such as Hadoop Map/Reduce operations are run right where the data lives. Data Compliance: This market is fraught with challenges stemming from regulatory and compliance requirements. As the platform that hosts data the instant it is created, storage is not immune to these challenges — and how data gets stored in the long term.
  15. Task trackers are processes on data / slave nodes that accept tasks from a Job Tracker. The tasks are Map, reduce and shuffle operations. Task trackers monitors the tasks running on a node and communicate with the job tracker. Every task tracker has a specified number of slots that correspond to how many tasks it can accept. During scheduling of a task, the Job tracker looks for an empty task slot on the same node as where the data block resides – thus achieving data locality. Next, it looks for a node with an empty slot on the same rack.
  16. ViPR HDFS provides and HDFS-compatible file system. In this way, the compute portion of an existing Hadoop cluster communicates with ViPR HDFS. Existing storage arrays managed by ViPR can now be made accessible via HDFS.
  17. Task trackers are processes on data / slave nodes that accept tasks from a Job Tracker. The tasks are Map, reduce and shuffle operations. Task trackers monitors the tasks running on a node and communicate with the job tracker. Every task tracker has a specified number of slots that correspond to how many tasks it can accept. During scheduling of a task, the Job tracker looks for an empty task slot on the same node as where the data block resides – thus achieving data locality. Next, it looks for a node with an empty slot on the same rack.
  18. The HDFS data service uses the same unstructured storage engine as the ViPR Object data service. ViPR data services create a unified pool (bucket) of data. Similar to the Object data service, users create buckets which can span file shares that can grow and shrink on demand. The data is distributed across the arrays according to how the virtual storage pool is configured. The bucket provides an HDFS interface or, optionally, an Object (S3) and HDFS interface. In this way, the compute portion of an existing Hadoop cluster communicates with ViPR HDFS, which uses existing data (added to the HDFS bucket) as the target for Big Data applications and queries. The above diagram illustrates the system architecture of how a ViPR customer can expose their existing data in a ViPR managed array to their Hadoop cluster and run MapReduce jobs on this data. The object data service and the HDFS data service run on the same set of ViPR Data Service VMs. These VMs can be scaled as the capacity of storage is increased. ViPR 1.1 will make available a client library (ViPR-HDFS Client) that needs to be installed on all the nodes that run MR jobs on the customer’s Hadoop cluster. When a task running on the node needs to read a file, the request will go to the ViPR-HDFS client (as the customer will point to viprfs:// as their data source) and the ViPR client will communicate with the HDFS head on the ViPR data node. The ViPR client passes in a authN token that identifies the user to the HDFS Head The HDFS head in the ViPR Data node receives requests from the ViPR-HDFS client . The HDFS Head then verifies the user’s identity by authenticating against the KDC. Then it talks to the Blob engine and the controller process running on the node to fetch the requested data once authN and authZ succeed.
  19. In addition to physical segregation, buckets provide logical segregation within the object store. Just like in S3, a user can create buckets which logically segregate applications or sets of data. These buckets can grown and shrink on-demand. The actual data objects are distributed and intermingled across the physical devices that comprise the virtual storage array.
  20. In addition to physical segregation, buckets provide logical segregation within the object store. Just like in S3, a user can create buckets which logically segregate applications or sets of data. These buckets can grown and shrink on-demand. The actual data objects are distributed and intermingled across the physical devices that comprise the virtual storage array.
  21. Use Case: Customer sets up ViPR across multiple Isilon and VNX arrays and ingests data into ViPR ViPR data services creates a unified pool (bucket) of data across file shares and provides user with an HDFS interface Customer installs ViPR HDFS client on an existing PivotalHD cluster Customer starts writing Hive queries referencing ViPR HDFS as the data source
  22. Use Case: Customer has an existing PivotalHD cluster with data stored in HDFS within the cluster and has also installed ViPR HDFS client on this PivotalHD cluster Customer also sets up ViPR across multiple Isilon and VNX arrays and ingests data into ViPR Customer starts writing MapReduce jobs that reference data in HDFS within the PivotalHD cluster as well as data in ViPR HDFS thereby opening up new analytics scenarios. The spanning use case is meant to explain that ViPR HDFS and HDFS can coexist. ViPR HDFS will not entirely replace HDFS.
  23. Use Case: An environment with cloudera infrastructure installs ViPR HDFS client Customer sets up ViPR across multiple Isilon and VNX arrays Customer starts writing Hive queries referencing ViPR HDFS as the data source and is able to utilize existing environment to point against ViPR HDFS
  24. Use Case: An environment with multiple VNX and Isilon, installs ViPR data services ViPR data services creates a unified pool (bucket) of data across file shares and provides user with access to either S3 or HDFS Interface Object based applications as well as analytics workload are able to use the same set of data without having to move it around.