© Copyright 2014 EMC Corporation. All rights reserved.
Delivering Hadoop-as-a-Service to Your Organization
Why Hadoop?
Oil Exploration
Medical Imaging
Video Surveillance
Mobile Sensors
Smart Grids
Social Media
Internet of Things
Dark Data
A Fast and Cheap Way to Exploit Massive Amounts of Data From New Sources
Why Hadoop?
Improve Company Performance
Increase Revenue
Increase Demand
Increase Spend Efficiency
Ad Optimization
Hyper Targeting
Campaign Optimization
Ad Effectiveness Analytics
Market Mix Modeling
Coupon Redemption
Increase Customer Acquisition
Purchase Funnel Analysis
Increase Customer Engagement
Customer Segmentation
Churn Prevention
Customer Lifetime Value
Increase Basket Size
Affinity Analytics
Next Best Offer
Cross-Sell / Upsell
Manage Demand
Demand Analysis
Price Optimization
Build Brand Equity
Increase Reach
Digital Marketing
Social Media
Improve Customer Loyalty
Social Graph / Influencers
Loyalty Program Analytics
Customer Satisfaction
Customer Care Analytics
Reduce Costs
Click Fraud
Transaction Anomaly Detection
Production Cost / Efficiency
Supply / Demand Forecasting
General and Administrative
Workforce Analytics
Employee Churn
IT / Security Analytics
Save Money Or Make Money
Hadoop Overview
Hadoop is an open-source framework from Apache that allows for parallel batch processing of very large data sets.
MapReduce is the Hadoop processing model that divides the workload so that multiple nodes can process it in parallel.
HDFS is the Hadoop file system for the data. It provides data protection and data locality by keeping multiple copies of the data (usually three).
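The way MapReduce divides a workload can be illustrated with a minimal, single-process sketch in Python. This is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the list of input splits stands in for the data blocks that a real cluster would hand to separate nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a cluster each split would be mapped on a different node in
    # parallel; this sketch simply processes them one after another.
    pairs = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data big", "data lake"]))
```

The map step only ever sees one split, which is what lets the work spread across many machines; only the shuffle needs to bring values for the same key together.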
IT Challenges With Hadoop
• Deployment is time-consuming and complex, which drives shadow IT
• Bare-metal capacity utilization is low
• Multiple Hadoop distribution deployments create data silos
Typical Enterprise Deployment
• Multiple, siloed clusters to manage
• Redundant common data in separate clusters
• Peak compute and I/O capacity is limited to the number of nodes in each independent cluster
[Diagram: Dept A (recommendation engine) and Dept B (ad targeting) each run separate Production, Test, and Experimentation clusters over log files, social data, and historical customer behavior.]
What If You Consolidate & Virtualize?
One physical platform to support multiple virtual big data clusters
[Diagram: the separate Production, Test, and Experimentation clusters are consolidated onto one platform as virtual clusters: Production (recommendation engine), Production (ad targeting), Test/Dev, and Experimentation.]
EMC Hadoop Starter Kit
• Support for major Hadoop distributions
• Quickly deploy, manage, and scale Hadoop clusters
• GUI simplifies management tasks
• Elastic scaling optimizes cluster performance and resource utilization
Consolidated and Virtualized Hadoop with EMC Isilon and VMware
[Diagram: Apache Hadoop with HDFS NameNode and DataNode services.]
Why Shared Storage For Hadoop?
Hadoop Bare-Metal Deployment
Hadoop DAS Environment
1. Dedicated Storage Infrastructure – One-off for Hadoop only
2. Lacking Enterprise Data Protection – No snapshots, replication, or backup
3. Poor Storage Efficiency – 3X mirroring
4. Fixed Scalability – Rigid compute-to-storage ratio
5. Manual Import/Export – No protocol support
[Diagram: a NameNode and DAS nodes, with each piece of data stored as three copies (1x, 2x, 3x) spread across the nodes.]
Hadoop on EMC Isilon Scale-Out NAS
1. Scale-Out Storage Platform – Multiple applications & workflows
2. End-to-End Data Protection – SnapshotIQ, SyncIQ, NDMP Backup
3. Industry-Leading Storage Efficiency – >80% storage utilization
4. Independent Scalability – Add compute & storage separately
5. Multi-Protocol – Industry-standard protocols: NFS, CIFS, FTP, HTTP, HDFS
EMC Isilon Addresses Hadoop Challenges
DAS challenge → Isilon answer
1. Dedicated Storage Infrastructure (one-off for Hadoop only) → Scale-Out Storage Platform (multiple applications & workflows)
2. Lacking Enterprise Data Protection (no snapshots, replication, or backup) → End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP Backup)
3. Poor Storage Efficiency (3X mirroring) → Industry-Leading Storage Efficiency (>80% storage utilization)
4. Fixed Scalability (rigid compute-to-storage ratio) → Independent Scalability (add compute & storage separately)
5. Manual Import/Export (no protocol support) → Multi-Protocol (industry-standard protocols: NFS, CIFS, FTP, HTTP, HDFS)
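The storage-efficiency contrast can be made concrete with simple arithmetic. The sketch below is illustrative only: the 100 TB data set is a hypothetical figure, the factor of 3 is the 3X mirroring named above, and 0.80 is taken as the lower bound of the ">80% storage utilization" claim, so the second result is an upper bound on raw capacity rather than a measured number.

```python
def raw_capacity_mirrored(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return usable_tb * copies


def raw_capacity_at_utilization(usable_tb: float, utilization: float = 0.80) -> float:
    """Raw capacity needed when `utilization` of raw space holds user data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return usable_tb / utilization


if __name__ == "__main__":
    data_tb = 100.0  # hypothetical data set size
    mirrored = raw_capacity_mirrored(data_tb)
    shared = raw_capacity_at_utilization(data_tb)
    print(f"3X mirroring:     {mirrored:.0f} TB raw ({data_tb / mirrored:.0%} utilization)")
    print(f"80% utilization:  {shared:.0f} TB raw")
```

Under these assumptions the mirrored layout needs 300 TB of raw disk for 100 TB of data (about 33% utilization), against roughly 125 TB at 80% utilization.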
Why Virtualize Hadoop?
Hadoop with Virtualization
Hadoop in VM (combined storage/compute VM)
• VM lifecycle determined by DataNode
• Limited elasticity
• Limited to Hadoop multi-tenancy
Separate Storage (compute VMs separate from storage)
• Separate compute from data
• Elastic compute
• Enable shared workloads
• Raise utilization
Separate Compute Tenants (tenants T1 and T2 with their own compute VMs on shared storage)
• Compute cluster per tenant
• Stronger VM-grade security and resource isolation
• Enable deployment of multiple Hadoop runtime versions
Elastic, Multi-Tenant
Virtualized Hadoop Performance
Native vs. Virtual, 32 hosts, 16 disks/host
Source: http://www.vmware.com/resources/techresources/10360
Example Deployment With Pivotal HD
• Prerequisites
– Isilon OneFS version 6.5.5 or higher
– VMware vSphere 5.0 (or later), Enterprise or Enterprise Plus
• Download VMware Big Data Extensions (free)
• Configure the Isilon cluster for HDFS (free license)
• Configure Big Data Extensions to use Pivotal HD
• Deploy the Hadoop cluster
• Run a simple program to test the deployment
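The deck does not say which program to run for the final step. One simple possibility, sketched here in Python, is a write/read/cleanup round trip through the standard `hadoop fs` shell. The function names, the sample file, and the `/tmp/hsk-smoke-test` directory are hypothetical, and the sketch assumes a Hadoop 2.x client is on the PATH and already configured to reach the deployed cluster.

```python
import os
import shutil
import subprocess
import tempfile
from typing import List


def smoke_test_commands(local_file: str,
                        hdfs_dir: str = "/tmp/hsk-smoke-test") -> List[List[str]]:
    """Build the `hadoop fs` commands for a write/read/cleanup round trip."""
    remote_file = f"{hdfs_dir}/sample.txt"
    return [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],         # create a scratch directory
        ["hadoop", "fs", "-put", local_file, remote_file],  # write a file into HDFS
        ["hadoop", "fs", "-cat", remote_file],              # read it back
        ["hadoop", "fs", "-rm", "-r", hdfs_dir],            # remove the scratch directory
    ]


def run_smoke_test() -> bool:
    """Run the round trip; return False if no Hadoop client is installed."""
    if shutil.which("hadoop") is None:
        print("hadoop client not found on PATH; skipping smoke test")
        return False
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("hello hadoop\n")
    try:
        for command in smoke_test_commands(handle.name):
            # check=True raises CalledProcessError on the first failing step
            subprocess.run(command, check=True)
    finally:
        os.remove(handle.name)
    return True


if __name__ == "__main__":
    run_smoke_test()
```

If the file round trip succeeds, the compute VMs can reach the HDFS service; a MapReduce job from the distribution's bundled examples would be a natural next check, though the location of that examples jar varies by distribution.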
Hadoop Data Services
Real-time, Interactive, And Batch Processing
WGSN (Retail)
Challenges
• Rapidly launch new market intelligence service for fashion retailers
• Support large and growing volumes of Big Data
Solution
• Pivotal Greenplum Database
• Pivotal HD
• EMC Isilon
• Pivotal Data Science Labs
Results
• Fast deployment with native Hadoop integration, enabling rapid launch of new service
• Delivered high-performance scalability
• Simplified platform administration
“Performance, scalability, and tight integration with Hadoop were the key reasons we chose Isilon. We also felt very comfortable with the partnership between EMC and Pivotal. In the end, the EMC and Pivotal solution offered the ideal balance of storage and compute with the right level of support.”
Download the Hadoop Starter Kit Now
• Rapid provisioning
• High availability
• Elasticity
• Multi-tenancy
• Portability
https://community.emc.com/docs/DOC-26892
EMC Big Data | Hadoop Starter Kit | EMC Forum 2014
