Where to Deploy Hadoop:
Bare-metal or Cloud?
Michael Wendt, Sewook Wee
Data Insights R&D Group
Copyright © 2013 Accenture All rights reserved. 2
Big Data: Bare-metal vs. Cloud
Deployment options on the bare-metal/cloud spectrum:
On-premise full custom
Hadoop-as-a-Service
Hadoop Appliance
Hadoop Hosting
Data Privacy
Data Gravity
Price-Performance Ratio
Productivity of Developers & Data Scientists
Data Enrichment
Price-Performance Ratio Views
Bare-metal (on-premise full custom): "Cloud? Virtualized? Slow!"
Cloud (Hadoop-as-a-Service): "Who cares! I'm cheap, just throw more in!"
Servers designed by Daniel Campos from The Noun Project
Hadoop Deployment Comparison Study
Bare-metal (on-premise full custom) vs. Cloud (Hadoop-as-a-Service)
Price-Performance Ratio: Accenture Data Platform Benchmark + TCO analysis
Hadoop Deployment Comparison Study
TCO Analysis
TCO of Bare-metal Hadoop Cluster (on-premise full custom)
Small-scale initial production deployment: 24 server nodes and 50 TB of HDFS capacity*
Cost components: server hardware; staff for operation; data center facility and electricity; technical support
Component costs as shown on the slide: $3,000.00, $2,914.58, $6,656.00, $9,274.46
Total: $21,845.04
TCO of Hadoop-as-a-Service
Used the bare-metal TCO as the budget
Calculated the number of affordable instances
Cost components: Hadoop service; staff for operation; storage services; technical support
Component costs as shown on the slide: $15,318.28, $2,063.00, $1,372.27, $3,091.49
Total: $21,845.04
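The budget logic on the two TCO slides can be checked with a few lines of arithmetic. The following Python sketch sums the four figures shown on each slide and confirms that both add up to the same $21,845.04 budget. The variable names are illustrative, and the figures are kept in slide order rather than assigned to specific cost components, except for $15,318.28, which the affordable-instances slide shows next to the Hadoop service.

```python
from decimal import Decimal

# Figures as they appear on the slides, in slide order
# (not mapped to individual cost components).
bare_metal_costs = [
    Decimal("3000.00"), Decimal("2914.58"),
    Decimal("6656.00"), Decimal("9274.46"),
]
haas_costs = [
    Decimal("15318.28"), Decimal("2063.00"),
    Decimal("1372.27"), Decimal("3091.49"),
]

bare_metal_total = sum(bare_metal_costs)
haas_total = sum(haas_costs)

# Share of the Hadoop-as-a-Service budget left for the Hadoop service itself.
hadoop_service_share = Decimal("15318.28") / haas_total

print(bare_metal_total, haas_total, round(hadoop_service_share, 3))
```

Roughly 70% of the fixed budget is available for compute instances; the rest covers the other three components.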
TCO of Hadoop-as-a-Service – Instances
Hadoop service: 14 instance types × 3 pricing models = 42 combinations
Selected 3 representative instance types: m1.xlarge, m2.4xlarge, cc2.8xlarge
TCO of Hadoop-as-a-Service – Affordable Instances
Hadoop service: $15,318.28
Assumptions: 50% cluster utilization; 1/3 of budget allocated for Spot instances
ODI = On-demand instances, RI = Reserved instances, RI + SI = Reserved + Spot instances
Instance type   ODI   RI    RI + SI
m1.xlarge       68    112   192
m2.4xlarge      20    41    77
cc2.8xlarge     13    28    53
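One way to picture how a fixed budget turns into an instance count is sketched below in Python. This is not the formula used in the study: the function name, the linear cost model, the 720-hour period, and the hourly rate are all assumptions made for illustration, so the result is not expected to match the table above.

```python
import math

def affordable_instances(budget, hourly_rate, hours_in_period, utilization):
    """Number of instances whose usage cost fits within the budget.

    Assumes cost scales linearly with utilization (an assumption of this
    sketch, not a formula taken from the slides).
    """
    cost_per_instance = hourly_rate * hours_in_period * utilization
    return math.floor(budget / cost_per_instance)

budget = 15318.28         # Hadoop service budget shown on the slide
hypothetical_rate = 0.50  # USD per instance-hour, made up for illustration

n = affordable_instances(budget, hypothetical_rate,
                         hours_in_period=720, utilization=0.5)
print(n)
```

Under this model, halving the effective hourly price (for example through a cheaper pricing model) roughly doubles the number of instances the same budget can buy.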
Hadoop Deployment Comparison Study
Accenture Data Platform Benchmark
Accenture Data Platform Benchmark
Suite of real-world Hadoop MapReduce applications
Categorized & selected common use cases from client experience, internal roadmap, public literature
Open-source libraries & public datasets
Use case: Workload
Log management: Sessionization
Customer preference prediction: Recommendation engine
Text Analytics: Document clustering
Accenture Data Platform Benchmark: Sessionization
A session is a sequence of related interactions, useful to analyze as a group
Steps: bucketing, sorting, slicing (log data in, sessions out)
Dataset: ~150 billion log entries, ~24 TB; 1 million users, 1.1 billion sessions
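The bucketing, sorting, and slicing steps named on the slide can be sketched on a single machine as follows. This is a minimal Python illustration, not the benchmark's MapReduce code: the `(user_id, timestamp)` record shape and the 30-minute inactivity threshold are assumptions, since the slide does not say how a session boundary is defined.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # hypothetical inactivity threshold

def sessionize(log_entries, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp) log entries into per-user sessions."""
    # Bucketing: collect each user's entries together.
    buckets = defaultdict(list)
    for user_id, timestamp in log_entries:
        buckets[user_id].append(timestamp)

    sessions = {}
    for user_id, timestamps in buckets.items():
        # Sorting: order each user's entries by time.
        timestamps.sort()
        # Slicing: start a new session whenever the gap is exceeded.
        user_sessions = [[timestamps[0]]]
        for previous, current in zip(timestamps, timestamps[1:]):
            if current - previous > gap:
                user_sessions.append([current])
            else:
                user_sessions[-1].append(current)
        sessions[user_id] = user_sessions
    return sessions

log = [("u1", 0), ("u2", 50), ("u1", 100), ("u1", 5000), ("u2", 60)]
result = sessionize(log)
print(result)
```

In a MapReduce setting the bucketing and sorting would typically be handled by the shuffle phase, leaving only the slicing to the reducer.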
Accenture Data Platform Benchmark: Recommendation Engine
Used item-based collaborative filtering algorithm
Mahout example library used as foundation
Ratings data: Who rated what item?
Co-occurrence matrix: How many people rated the pair of items?
Recommendation: Given the way the person rated these items, he/she is likely to be interested in these other items.
Dataset: generated 300 million ratings; 3 million population, 50,000 items
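The three stages on this slide (ratings data, co-occurrence matrix, recommendation) can be illustrated with a small Python sketch. It is a simplified stand-in, not the Mahout implementation: it only uses who rated which item and ignores the rating values, and the function names and sample data are made up for the example.

```python
from collections import defaultdict
from itertools import permutations

def cooccurrence_matrix(ratings):
    """ratings maps user -> set of rated items.

    Returns counts[a][b] = number of users who rated both a and b.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in ratings.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts

def recommend(user, ratings, counts, top_n=2):
    """Score unseen items by how often they co-occur with the user's items."""
    seen = ratings[user]
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

ratings = {
    "alice": {"A", "B"},
    "bob": {"A", "B", "C"},
    "carol": {"B", "C", "D"},
    "dave": {"A", "C"},
}
counts = cooccurrence_matrix(ratings)
print(counts["A"]["B"])
print(recommend("alice", ratings, counts))
```

Item C ranks first for alice because it co-occurs with both of her rated items, while D co-occurs with only one of them.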
Accenture Data Platform Benchmark: Document Clustering
Groups similar documents
Application components used in many areas (e.g., search engines, e-commerce site optimization)
Pipeline: corpus of crawled web pages, filtered and tokenized documents, term dictionary, TF vectors, TF-IDF vectors, K-means, clustered documents
Dataset: Common Crawl dataset, 10 TB corpus*; ~31,000 ARC files or ~300 million HTML pages
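The document clustering stages (term dictionary, TF vectors, TF-IDF vectors, K-means) can be shown end to end on a toy corpus. The Python sketch below is an in-memory illustration under stated assumptions, not the benchmark code: it uses the plain `tf * log(N / df)` weighting, squared Euclidean distance, and hand-picked initial centroids so the result is deterministic.

```python
import math
from collections import Counter

def tf_idf_vectors(tokenized_docs):
    """Return (dictionary, vectors) with one TF-IDF weight per dictionary term."""
    dictionary = sorted({term for doc in tokenized_docs for term in doc})
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / doc_freq[t]) for t in dictionary])
    return dictionary, vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, initial_centroids, iterations=10):
    """Plain k-means; returns the cluster index assigned to each vector."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        assignments = []
        for v in vectors:
            distances = [squared_distance(v, c) for c in centroids]
            assignments.append(distances.index(min(distances)))
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

docs = [
    "hadoop cluster node".split(),
    "hadoop cluster storage".split(),
    "recipe sugar flour".split(),
    "recipe sugar butter".split(),
]
dictionary, vectors = tf_idf_vectors(docs)
labels = k_means(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)
```

The two Hadoop documents end up in one cluster and the two recipe documents in the other, because they share their weighted terms.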
Hadoop Deployment Comparison Study
Experiment Setup/Results
Experiment Setup: Price-Performance Ratio Comparison
1 bare-metal Hadoop cluster vs. 9 Amazon EMR clusters
Fixed budget for cluster size
Manual and automated tuning
Measure execution time of benchmark
Experiment Setup: Starfish Automated Performance Tuning Tool
Starfish (now Unravel) is an automated performance tuning tool for MapReduce jobs
For the experiment we ran each benchmark twice using Starfish (profile phase, optimize phase)
Measure execution time of optimize phase
Speedometer designed by Filippo Camedda from The Noun Project
Experiment Results: Starfish Automated Performance Tuning Tool
Manually tuned Sessionization workload: 2+ weeks of manual tuning, ½ to 1 day per iteration
Starfish tuned Recommendation Engine workload w/ 11 cascaded MapReduce jobs: 8x improvement in one tuning cycle
Achieve performance increases with less cost using Starfish
Experiment Results: Sessionization
[Chart: execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI); series: cc2.8xlarge, m2.4xlarge, m1.xlarge]
Instances per configuration (cc2.8xlarge, m2.4xlarge, m1.xlarge): ODI 13, 20, 68; RI 28, 41, 112; RI+SI 53, 77, 192
Execution times (minutes), as labeled on the chart: 408.07, 229.25, 125.82; 381.55, 204.10, 166.82; 250.13, 172.23, 114.35
Bare-metal: 533 minutes
Experiment Results: Recommendation Engine
[Chart: execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI); series: cc2.8xlarge, m2.4xlarge, m1.xlarge]
Instances per configuration (cc2.8xlarge, m2.4xlarge, m1.xlarge): ODI 13, 20, 68; RI 28, 41, 112; RI+SI 53, 77, 192
Execution times (minutes), as labeled on the chart: 23.33, 21.97, 18.48; 20.13, 19.97, 16.92; 14.28, 16.30, 15.08
Bare-metal: 21.59 minutes
Experiment Results: Document Clustering
[Chart: execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI); series: cc2.8xlarge, m2.4xlarge, m1.xlarge]
Instances per configuration (cc2.8xlarge, m2.4xlarge, m1.xlarge): ODI 13, 20, 68; RI 28, 41, 112; RI+SI 53, 77, 192
Execution times (minutes), as labeled on the chart: 1661.03, 1157.37, 784.82; 1649.98, 1112.68, 629.98; 914.35, 779.98, 742.38
Bare-metal: 1186.37 minutes
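Because every cluster was sized to the same budget, a shorter execution time translates directly into a better price-performance ratio. The Python sketch below takes the nine Amazon EMR times from each of the three results charts and counts how many beat the bare-metal time, along with the best speedup. The chart labels are used as an unordered set, since the transcript does not make clear which value belongs to which instance type; the names `results` and `summarize` are illustrative.

```python
# workload -> (bare-metal minutes, nine Amazon EMR execution times in minutes)
results = {
    "Sessionization": (533.0, [408.07, 229.25, 125.82, 381.55, 204.10,
                               166.82, 250.13, 172.23, 114.35]),
    "Recommendation Engine": (21.59, [23.33, 21.97, 18.48, 20.13, 19.97,
                                      16.92, 14.28, 16.30, 15.08]),
    "Document Clustering": (1186.37, [1661.03, 1157.37, 784.82, 1649.98,
                                      1112.68, 629.98, 914.35, 779.98, 742.38]),
}

def summarize(bare_metal, emr_times):
    """Return (number of EMR clusters faster than bare-metal, best speedup)."""
    faster = [t for t in emr_times if t < bare_metal]
    best_speedup = bare_metal / min(emr_times)
    return len(faster), round(best_speedup, 2)

summary = {name: summarize(bm, times) for name, (bm, times) in results.items()}
for name, (count, speedup) in summary.items():
    print(f"{name}: {count}/9 EMR clusters faster, best speedup {speedup}x")
```

All nine EMR clusters finish Sessionization faster than bare-metal, and seven of nine do so for each of the other two workloads.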
Key Takeaways
Hadoop-as-a-Service offers a better price-performance ratio
Cloud expands the performance tuning opportunities
Automated performance tuning tools are a necessity
More details
Contact us for the full white paper: Hadoop Deployment Comparison Study
Michael Wendt
R&D Developer
Data Insights R&D
Accenture Technology Labs
(408) 817-2190
michael.e.wendt@accenture.com
Scott Kurth
Group Lead
Data Insights R&D
Accenture Technology Labs
(408) 817-2775
scott.kurth@accenture.com

Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
 
Greenplum feature
Greenplum featureGreenplum feature
Greenplum feature
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 

Where to Deploy Hadoop: Bare Metal or Cloud?

  • 1. Where to Deploy Hadoop: Bare-metal or Cloud? Michael Wendt and Sewook Wee, Data Insights R&D Group.
  • 2. Big Data: Bare-metal vs. Cloud. Deployment models, from bare-metal to cloud: On-premise full custom, Hadoop Appliance, Hadoop Hosting, Hadoop-as-a-Service. Copyright © 2013 Accenture. All rights reserved.
  • 3. Big Data: Bare-metal vs. Cloud. The same four deployment models, plus five decision factors: Data Privacy, Data Gravity, Price-Performance Ratio, Productivity of Developers & Data Scientists, Data Enrichment.
  • 4. Big Data: Bare-metal vs. Cloud. Same deployment models and decision factors as slide 3.
  • 5. Price-Performance Ratio Views. Bare-metal (on-premise full custom) view: "Cloud? Virtualized? Slow!" Cloud (Hadoop-as-a-Service) view: "Who cares! I’m cheap, just throw more in!" Servers designed by Daniel Campos from The Noun Project.
  • 6. Hadoop Deployment Comparison Study. Compares on-premise full custom (bare-metal) with Hadoop-as-a-Service (cloud) on Price-Performance Ratio, using the Accenture Data Platform Benchmark + TCO analysis.
  • 7. Hadoop Deployment Comparison Study: TCO Analysis.
  • 8. TCO of Bare-metal Hadoop Cluster (on-premise full custom). 24 server nodes and 50 TB of HDFS capacity*: a small-scale initial production deployment. Cost components: Server hardware; Staff for operation; Data center facility and electricity; Technical support. Amounts shown: $3,000.00, $2,914.58, $6,656.00, $9,274.46. Total: $21,845.04.
  • 9. TCO of Hadoop-as-a-Service. Used the bare-metal TCO as the budget and calculated the number of affordable instances. Cost components: Hadoop service; Staff for operation; Storage services; Technical support. Amounts shown: $15,318.28, $2,063.00, $1,372.27, $3,091.49. Total: $21,845.04.
  • 10. TCO of Hadoop-as-a-Service – Instances. Hadoop service: 14 instance types × 3 pricing models = 42 combinations.
  • 11. TCO of Hadoop-as-a-Service – Instances. Selected 3 representative instance types: m1.xlarge, m2.4xlarge, cc2.8xlarge.
  • 12. TCO of Hadoop-as-a-Service – Affordable Instances. Hadoop service budget: $15,318.28. Assumptions: 50% cluster utilization; 1/3 of budget allocated for Spot instances. Affordable instances per instance type, listed as On-demand instances (ODI) / Reserved instances (RI) / Reserved + Spot instances (RI + SI): m1.xlarge 68 / 112 / 192; m2.4xlarge 20 / 41 / 77; cc2.8xlarge 13 / 28 / 53.
  • 13. Hadoop Deployment Comparison Study: Accenture Data Platform Benchmark.
  • 14. Accenture Data Platform Benchmark. A suite of real-world Hadoop MapReduce applications. Use cases were drawn from client experience, internal roadmap, and public literature, then categorized, and common ones were selected. Workloads use open-source libraries & public datasets. Use case and workload pairs: Log management, Sessionization; Customer preference prediction, Recommendation engine; Text Analytics, Document clustering.
  • 15. Accenture Data Platform Benchmark: Sessionization. A session is a sequence of related interactions, useful to analyze as a group. Pipeline: Log data -> Bucketing -> Sorting -> Slicing -> Sessions. Data: ~150 billion log entries, ~24 TB; 1 million users; 1.1 billion sessions.
  • 16. Accenture Data Platform Benchmark: Recommendation Engine. Pipeline: Ratings data (Who rated what item?) -> Co-occurrence matrix (How many people rated the pair of items?) -> Recommendation (Given the way the person rated these items, he/she is likely to be interested in these other items). Used item-based collaborative filtering algorithm; Mahout example library used as foundation. Data: generated 300 million ratings; 3 million population; 50,000 items.
  • 17. Accenture Data Platform Benchmark: Document Clustering. Groups similar documents. Pipeline: Corpus of crawled web pages -> Filtered and tokenized documents -> Term dictionary -> TF vectors -> TF-IDF vectors -> K-means -> Clustered documents. Application components used in many areas (e.g., search engines, e-commerce site optimization). Data: Common Crawl dataset, 10 TB corpus*; ~31,000 ARC files or ~300 million HTML pages.
  • 18. Hadoop Deployment Comparison Study: Experiment Setup/Results.
  • 19. Experiment Setup: Price-Performance Ratio Comparison. 1 bare-metal Hadoop cluster vs. 9 Amazon EMR clusters. Manual and automated tuning. Fixed budget for cluster size. Measure execution time of benchmark.
  • 20. Experiment Setup: Starfish Automated Performance Tuning Tool. Starfish (now Unravel) is an automated performance tuning tool for MapReduce jobs. For the experiment we ran each benchmark twice using Starfish (profile phase and optimize phase) and measured the execution time of the optimize phase. Speedometer designed by Filippo Camedda from The Noun Project.
  • 21. Experiment Results: Starfish Automated Performance Tuning Tool. Manually tuned Sessionization workload: 2+ weeks of manual tuning, ½ - 1 day iterations. Starfish-tuned Recommendation Engine workload w/ 11 cascaded MapReduce jobs: 8x improvement in one tuning cycle. Achieve performance increases with less cost using Starfish.
  • 22. Experiment Results: Sessionization. Chart of execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI) and instance type (cc2.8xlarge, m2.4xlarge, m1.xlarge). Bare-metal: 533. EMR data labels as extracted: 408.07, 229.25, 125.82, 381.55, 204.10, 166.82, 250.13, 172.23, 114.35. Instance counts: 13, 20, 68 (ODI); 28, 41, 112 (RI); 53, 77, 192 (RI+SI).
  • 23. Experiment Results: Recommendation Engine. Chart of execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI) and instance type (cc2.8xlarge, m2.4xlarge, m1.xlarge). Bare-metal: 21.59. EMR data labels as extracted: 23.33, 21.97, 18.48, 20.13, 19.97, 16.92, 14.28, 16.30, 15.08. Instance counts: 13, 20, 68 (ODI); 28, 41, 112 (RI); 53, 77, 192 (RI+SI).
  • 24. Experiment Results: Document Clustering. Chart of execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI) and instance type (cc2.8xlarge, m2.4xlarge, m1.xlarge). Bare-metal: 1186.37. EMR data labels as extracted: 1661.03, 1157.37, 784.82, 1649.98, 1112.68, 629.98, 914.35, 779.98, 742.38. Instance counts: 13, 20, 68 (ODI); 28, 41, 112 (RI); 53, 77, 192 (RI+SI).
  • 25. Key Takeaways. Hadoop-as-a-Service offers a better price-performance ratio. Cloud expands the performance tuning opportunities. Automated performance tuning tools are a necessity.
  • 26. Acknowledgement.
  • 27. More details. Contact us for the full white paper: Hadoop Deployment Comparison Study. Michael Wendt, R&D Developer, Data Insights R&D, Accenture Technology Labs, (408) 817-2190, michael.e.wendt@accenture.com. Scott Kurth, Group Lead, Data Insights R&D, Accenture Technology Labs, (408) 817-2775, scott.kurth@accenture.com.
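
Because each Amazon EMR cluster was sized to the same budget as the bare-metal cluster (slides 9 and 19), execution time alone stands in for the price-performance ratio. The following is a minimal Python sketch of that comparison, using the times from slides 22-24. The function name `summarize` is hypothetical, and the nine EMR times per workload are treated as an unordered set, since the transcript does not reliably tie each value to an instance type or pricing model.

```python
# Execution times in minutes, copied from the results slides (22-24).
# Each entry: (bare-metal time, the nine Amazon EMR times).
# Assumption: the EMR times are kept as an unordered set because the
# flattened chart labels do not say which configuration each belongs to.
RESULTS = {
    "Sessionization": (
        533.0,
        [408.07, 229.25, 125.82, 381.55, 204.10, 166.82, 250.13, 172.23, 114.35],
    ),
    "Recommendation Engine": (
        21.59,
        [23.33, 21.97, 18.48, 20.13, 19.97, 16.92, 14.28, 16.30, 15.08],
    ),
    "Document Clustering": (
        1186.37,
        [1661.03, 1157.37, 784.82, 1649.98, 1112.68, 629.98, 914.35, 779.98, 742.38],
    ),
}


def summarize(bare_metal, emr_times):
    """Compare EMR runs against the bare-metal run at equal budget."""
    return {
        # Number of EMR configurations that finished sooner than bare-metal.
        "faster_configs": sum(1 for t in emr_times if t < bare_metal),
        # Speedup > 1 means the EMR cluster was faster for the same money.
        "best_speedup": bare_metal / min(emr_times),
        "worst_speedup": bare_metal / max(emr_times),
    }


if __name__ == "__main__":
    for workload, (bare_metal, emr_times) in RESULTS.items():
        s = summarize(bare_metal, emr_times)
        print(
            f"{workload}: {s['faster_configs']}/9 EMR configs faster, "
            f"speedup {s['worst_speedup']:.2f}x to {s['best_speedup']:.2f}x"
        )
```

On these numbers, all nine EMR configurations finish sooner than bare-metal for Sessionization, and seven of nine do for each of the other two workloads.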

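
The bucketing, sorting and slicing steps on slide 15 can be sketched in a few lines of single-machine Python. This is an illustration only, not the benchmark's MapReduce implementation: the function name `sessionize`, the `(user_id, timestamp)` record shape, and the 30-minute inactivity gap are all assumptions that the slides do not specify.

```python
from collections import defaultdict


def sessionize(log_entries, gap_seconds=1800):
    """Group (user_id, timestamp) log entries into per-user sessions.

    gap_seconds is an assumed inactivity threshold: a new session starts
    when two consecutive entries of a user are further apart than this.
    """
    # Bucketing: collect every timestamp under its user.
    buckets = defaultdict(list)
    for user_id, timestamp in log_entries:
        buckets[user_id].append(timestamp)

    sessions = {}
    for user_id, stamps in buckets.items():
        # Sorting: order each user's entries by time.
        stamps.sort()
        # Slicing: cut the ordered entries wherever the gap is too large.
        user_sessions = [[stamps[0]]]
        for timestamp in stamps[1:]:
            if timestamp - user_sessions[-1][-1] > gap_seconds:
                user_sessions.append([timestamp])
            else:
                user_sessions[-1].append(timestamp)
        sessions[user_id] = user_sessions
    return sessions


if __name__ == "__main__":
    entries = [("a", 0), ("b", 10), ("a", 100), ("a", 5000), ("b", 20)]
    print(sessionize(entries))
```

In a MapReduce setting the bucketing and sorting would typically fall to the shuffle phase, with the slicing done in the reducer; the sketch keeps everything in memory for readability.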
Editor's Notes

  1. Introduction: Michael Wendt, R&D Developer in the Data Insights R&D Group at Accenture Technology Labs (ATL), the forward-looking R&D group of Accenture, in San Jose and 4 other locations globally. When enterprises decide to adopt Hadoop, they have to answer the question: where to deploy Hadoop, bare-metal or cloud?
  2. Four main deployment models for businesses:
     - On-premise full custom: purchase commodity hardware, install software and operate it themselves -> gives businesses full control of the Hadoop cluster.
     - Hadoop appliance: preconfigured Hadoop cluster -> bypass detailed technical configuration and jumpstart data analysis.
     Transitioning outside of the corporation…
     - Hadoop hosting: similar to the ISP model -> rely on a service provider to deploy and operate Hadoop clusters.
     - Hadoop-as-a-Service: instant access to Hadoop clusters, pay-per-use consumption model -> provides greater business agility.
     Deciding which deployment model is appropriate depends on the five key areas below:
     - Price-Performance Ratio: with a limited budget, how can we get the biggest ROI?
       -- Bare-metal (BM): requires a larger upfront investment, limiting scale
       -- Cloud (CL): can scale with demand
     - Data Privacy: concerns with corporate data
       -- BM: security, contains all data in-house
       -- CL: need for a comprehensive cloud-data privacy strategy
     - Data Gravity: once data volume grows, physical migration becomes slow -> locked into the current platform
       -- need to consider portability, future growth and location of data
     - Data Enrichment: leveraging multiple datasets to uncover new insights, determining where to host and co-locate data
     - Productivity: ability to test ideas, “sandbox”, deploy to production
       -- CL: advantage for deploying test clusters
     For this study we focus on the extreme ends of the spectrum: On-premise & HaaS.
     Dive deeper into Price-Performance Ratio.
  5. Price-Performance Ratio has two divergent views for Hadoop:
     --click-- 1. A virtualized Hadoop cluster is slower because Hadoop’s workload has intensive I/O operations.
     --click-- 2. The cloud-based model provides compelling cost savings: nodes are less expensive, and Hadoop is horizontally scalable.
  6. In the Hadoop Deployment Comparison Study, we compare the price-performance ratio of a bare-metal Hadoop cluster with Hadoop-as-a-Service --click-- at the matched total cost of ownership (TCO) level --click-- using real-world applications modeled by the Accenture Data Platform Benchmark.
  7. Let’s first take a look at the TCO analysis
  8. *3 times replication factor
     Server hardware – depreciation accounted for over 3 years; full details in white paper
     Data center – tier-3 data center, 10,000 sq. ft.; full details in white paper
     Tech support – third-party vendors
     Staff – 3 full-time employees
  9. Staff – one full-time employee; reduced need
     Tech Support – AWS Premium Support
     Different needs based on cloud environment; no need for data center
     Storage Services – Amazon S3
     No need for servers, only virtual instances of Hadoop service – Amazon EMR
     --click-- Subtracted from budget to determine number of affordable instances
     --click-- Calculated the
  10. Testing all 42 combinations would be time- and cost-prohibitive. Selected these three instance types since they were the largest of their respective instance families.
  12. Assumed 50% utilization
  13. Now let’s look at the Accenture Data Platform Benchmark
  14. Sessionization: constructing sessions from raw log data. One of several prerequisite steps for log analysis use cases (individual website optimization, infrastructure optimization, security analytics, etc.).
  15. Filtering algorithms: basic and simple, yet widely used.
  16. *3 TB compressed
  17. Experiment setup, how did everything come together?
  18. Let’s switch gears… --click-- 8x improvement relative to default parameter settings. Each iteration took about ½ to 1 full day, including performance analysis, tuning, and execution. The merit of Starfish is that it achieves performance increases at much lower cost than manual tuning.
  19. Executive summary available in limited quantities
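
The budget-matching step in the notes for slides 9 and 12 (subtract the fixed cloud costs from the budget, then see how many instances the remainder pays for at an assumed 50% utilization) can be sketched roughly as below. This is only an illustration, not the study's actual calculation: the function name, the 730-hours-per-month figure, the reading of "utilization" as the fraction of the month the cluster is billed, and every cost and price in the usage example are assumptions. Only the $21,845.04 monthly budget comes from the bare-metal total on slide 8.

```python
import math

# Assumption: average number of hours in a month (8,760 hours / 12).
HOURS_PER_MONTH = 730


def affordable_instances(monthly_budget, fixed_costs, hourly_price, utilization=0.5):
    """Return how many instances fit in the budget left after fixed costs.

    fixed_costs: mapping of monthly costs that do not scale with the number
        of instances (for example staff, support, storage).
    hourly_price: price per instance-hour (hypothetical in the example below).
    utilization: fraction of the month the cluster is running and billed;
        this reading of "50% utilization" is an assumption.
    """
    if hourly_price <= 0 or not 0 < utilization <= 1:
        raise ValueError("hourly_price must be > 0 and utilization in (0, 1]")
    remaining = monthly_budget - sum(fixed_costs.values())
    if remaining <= 0:
        return 0
    monthly_cost_per_instance = hourly_price * HOURS_PER_MONTH * utilization
    return math.floor(remaining / monthly_cost_per_instance)


# Usage: match the bare-metal monthly TCO from slide 8.
# All fixed costs and the hourly price are placeholders, not study figures.
budget = 21845.04
fixed = {"staff": 3500.0, "support": 2000.0, "storage": 1500.0}
print(affordable_instances(budget, fixed, hourly_price=0.5))
```

Lowering the assumed utilization raises the affordable instance count, since a pay-per-use cluster is billed only while it runs; that is the lever the 50% assumption on slide 12 pulls.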