SlideShare a Scribd company logo
1 of 53
Download to read offline
1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only
MPP vs Hadoop
Alexey Grishchenko
HUG Meetup
28.11.2015
2Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
3Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
4Pivotal Confidential–Internal Use Only
Distributed Systems
Avoid distributed systems in all the problems that potentially
could be solved using non-distributed systems
5Pivotal Confidential–Internal Use Only
Distributed Systems
Ÿ  Consensus problem
–  Paxos
–  RAFT
–  ZAB
–  etc.
Ÿ  Transaction consistency
–  2PC
–  3PC
6Pivotal Confidential–Internal Use Only
Distributed Systems
Ÿ  CAP Theorem
7Pivotal Confidential–Internal Use Only
Distributed Systems
http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
8Pivotal Confidential–Internal Use Only
Distributed Systems
Reasons to use
Ÿ  Performance issues
–  More than 100’000 TPS
–  More than 4 GB/sec scan rate
–  More than 100’000 IOPS
Ÿ  Capacity issues
–  More than 50TB of data
Ÿ  DR and Geo-Distribution
9Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
10Pivotal Confidential–Internal Use Only
MPP
Main principles
Ÿ  Shared Nothing
Ÿ  Data Sharding
Ÿ  Data Replication
Ÿ  Distributed Transactions
Ÿ  Parallel Processing
11Pivotal Confidential–Internal Use Only
MPP
12Pivotal Confidential–Internal Use Only
MPP
Works well for
Ÿ  Relational data
Ÿ  Batch processing
Ÿ  Ad hoc analytical SQL
Ÿ  Low concurrency
Ÿ  Applications requiring ANSI SQL
13Pivotal Confidential–Internal Use Only
MPP
Not the best choice for
Ÿ  Non-relational data
Ÿ  OLTP and event stream processing
Ÿ  High concurrency
Ÿ  100+ server clusters
Ÿ  Non-analytical use cases
Ÿ  Geo-Distributed use cases
14Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
15Pivotal Confidential–Internal Use Only
Hadoop
Main Components
Ÿ  HDFS
Ÿ  YARN
Ÿ  MapReduce
Ÿ  HBase
Ÿ  Hive / Hive+Tez
16Pivotal Confidential–Internal Use Only
Hadoop
HDFS
Ÿ  Distributed filesystem
Ÿ  Block-level storage with big blocks
Ÿ  Non-updatable
Ÿ  Synchronous block replication
Ÿ  No built-in Geo-Distribution support
Ÿ  No built-in DR solution
17Pivotal Confidential–Internal Use Only
Hadoop
HDFS
18Pivotal Confidential–Internal Use Only
Hadoop
YARN
Ÿ  Cluster resource manager
Ÿ  Manages CPU and RAM allocation
Ÿ  Schedulers are pluggable
Ÿ  Can handle different resource pools
Ÿ  Supports both MR and non-MR workload
19Pivotal Confidential–Internal Use Only
Hadoop
YARN
20Pivotal Confidential–Internal Use Only
Hadoop
MapReduce
Ÿ  Framework for distributed data processing
Ÿ  Two main operations: map and reduce
Ÿ  Data hits disk after “map” and before “reduce”
Ÿ  Scales to thousands of servers
Ÿ  Can process petabytes of data
Ÿ  Extremely reliable
21Pivotal Confidential–Internal Use Only
Hadoop
MapReduce
22Pivotal Confidential–Internal Use Only
Hadoop
HBase
Ÿ  Distributed key-value store
Ÿ  Data is sharded by key
Ÿ  Data is stored in sorted order
Ÿ  Stores multiple versions of the row
Ÿ  Easily scales
23Pivotal Confidential–Internal Use Only
Hadoop
HBase
24Pivotal Confidential–Internal Use Only
Hadoop
Hive
Ÿ  Query engine with SQL-like syntax
Ÿ  Translates HiveQL query to MR / Tez / Spark job
Ÿ  Processes HDFS data
Ÿ  Supports UDFs and UDAFs
25Pivotal Confidential–Internal Use Only
Hadoop
Hive
26Pivotal Confidential–Internal Use Only
Hadoop
Works well for
Ÿ  Write Once Read Many
Ÿ  100+ server clusters
Ÿ  Both relational and non-relational data
Ÿ  High concurrency
Ÿ  Batch processing and analytical workload
Ÿ  Elastic scalability
27Pivotal Confidential–Internal Use Only
Hadoop
Not the best choice for
Ÿ  Write-heavy workloads
Ÿ  Small clusters
Ÿ  Analytical DWH cases
Ÿ  OLTP and event stream processing
Ÿ  Cost savings
28Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
29Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
30Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
31Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
32Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
33Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
34Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
35Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
36Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
37Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
38Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
Target Systems DWH Purpose-Built Batch
39Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
Target Systems DWH Purpose-Built Batch
Target End Users Business Analysts Developers
40Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
41Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
42Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
43Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
44Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
45Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
46Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
47Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
48Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
Min Collection Size Megabytes Gigabytes
49Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
Min Collection Size Megabytes Gigabytes
Max Concurrency 10-15 queries 70-100 jobs
50Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Examples
Ÿ  Summary
51Pivotal Confidential–Internal Use Only
Summary
Use MPP for
Ÿ  Analytical DWH
Ÿ  Ad hoc analyst SQL queries and BI
Ÿ  Keep under 100TB of data
Use Hadoop for
Ÿ  Specialized data processing systems
Ÿ  Over 100TB of data
52Pivotal Confidential–Internal Use Only 52Pivotal Confidential–Internal Use Only
Questions?
BUILT FOR THE SPEED OF BUSINESS

More Related Content

What's hot

Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsAlluxio, Inc.
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...
PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...
PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...Edureka!
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark DownscalingDatabricks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 

What's hot (20)

Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Apache spark
Apache sparkApache spark
Apache spark
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...
PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...
PySpark Programming | PySpark Concepts with Hands-On | PySpark Training | Edu...
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 

Similar to MPP vs Hadoop

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Big Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargBig Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargAgile Testing Alliance
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on HadoopDataWorks Summit
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!Cloudera, Inc.
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009yhadoop
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Hadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingHadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingYahoo Developer Network
 
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsThe Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsInside Analysis
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewYafang Chang
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeInside Analysis
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingInside Analysis
 
The practice of big data - making big data approachable
The practice of big data - making big data approachableThe practice of big data - making big data approachable
The practice of big data - making big data approachablekcmallu
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenDataWorks Summit
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15IBMInfoSphereUGFR
 
SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases DataWorks Summit
 

Similar to MPP vs Hadoop (20)

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Big Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargBig Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya Garg
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingHadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data Processing
 
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsThe Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise Thinking
 
The practice of big data - making big data approachable
The practice of big data - making big data approachableThe practice of big data - making big data approachable
The practice of big data - making big data approachable
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15
 
SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases
 

More from Alexey Grishchenko

More from Alexey Grishchenko (6)

Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Архитектура Apache HAWQ Highload++ 2015
Архитектура Apache HAWQ Highload++ 2015Архитектура Apache HAWQ Highload++ 2015
Архитектура Apache HAWQ Highload++ 2015
 
Pivotal hawq internals
Pivotal hawq internalsPivotal hawq internals
Pivotal hawq internals
 

Recently uploaded

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Recently uploaded (20)

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

MPP vs Hadoop

  • 1. 1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only MPP vs Hadoop Alexey Grishchenko HUG Meetup 28.11.2015
  • 2. 2Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 3. 3Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 4. 4Pivotal Confidential–Internal Use Only Distributed Systems Avoid distributed systems in all the problems that potentially could be solved using non-distributed systems
  • 5. 5Pivotal Confidential–Internal Use Only Distributed Systems Ÿ  Consensus problem –  Paxos –  RAFT –  ZAB –  etc. Ÿ  Transaction consistency –  2PC –  3PC
  • 6. 6Pivotal Confidential–Internal Use Only Distributed Systems Ÿ  CAP Theorem
  • 7. 7Pivotal Confidential–Internal Use Only Distributed Systems http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
  • 8. 8Pivotal Confidential–Internal Use Only Distributed Systems Reasons to use Ÿ  Performance issues –  More than 100’000 TPS –  More than 4 GB/sec scan rate –  More than 100’000 IOPS Ÿ  Capacity issues –  More than 50TB of data Ÿ  DR and Geo-Distribution
  • 9. 9Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 10. 10Pivotal Confidential–Internal Use Only MPP Main principles Ÿ  Shared Nothing Ÿ  Data Sharding Ÿ  Data Replication Ÿ  Distributed Transactions Ÿ  Parallel Processing
  • 12. 12Pivotal Confidential–Internal Use Only MPP Works well for Ÿ  Relational data Ÿ  Batch processing Ÿ  Ad hoc analytical SQL Ÿ  Low concurrency Ÿ  Applications requiring ANSI SQL
  • 13. 13Pivotal Confidential–Internal Use Only MPP Not the best choice for Ÿ  Non-relational data Ÿ  OLTP and event stream processing Ÿ  High concurrency Ÿ  100+ server clusters Ÿ  Non-analytical use cases Ÿ  Geo-Distributed use cases
  • 14. 14Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 15. 15Pivotal Confidential–Internal Use Only Hadoop Main Components Ÿ  HDFS Ÿ  YARN Ÿ  MapReduce Ÿ  HBase Ÿ  Hive / Hive+Tez
  • 16. 16Pivotal Confidential–Internal Use Only Hadoop HDFS Ÿ  Distributed filesystem Ÿ  Block-level storage with big blocks Ÿ  Non-updatable Ÿ  Synchronous block replication Ÿ  No built-in Geo-Distribution support Ÿ  No built-in DR solution
  • 18. 18Pivotal Confidential–Internal Use Only Hadoop YARN Ÿ  Cluster resource manager Ÿ  Manages CPU and RAM allocation Ÿ  Schedulers are pluggable Ÿ  Can handle different resource pools Ÿ  Supports both MR and non-MR workload
  • 20. 20Pivotal Confidential–Internal Use Only Hadoop MapReduce Ÿ  Framework for distributed data processing Ÿ  Two main operations: map and reduce Ÿ  Data hits disk after “map” and before “reduce” Ÿ  Scales to thousands of servers Ÿ  Can process petabytes of data Ÿ  Extremely reliable
  • 21. 21Pivotal Confidential–Internal Use Only Hadoop MapReduce
  • 22. 22Pivotal Confidential–Internal Use Only Hadoop HBase Ÿ  Distributed key-value store Ÿ  Data is sharded by key Ÿ  Data is stored in sorted order Ÿ  Stores multiple versions of the row Ÿ  Easily scales
  • 24. 24Pivotal Confidential–Internal Use Only Hadoop Hive Ÿ  Query engine with SQL-like syntax Ÿ  Translates HiveQL query to MR / Tez / Spark job Ÿ  Processes HDFS data Ÿ  Supports UDFs and UDAFs
  • 26. 26Pivotal Confidential–Internal Use Only Hadoop Works well for Ÿ  Write Once Read Many Ÿ  100+ server clusters Ÿ  Both relational and non-relational data Ÿ  High concurrency Ÿ  Batch processing and analytical workload Ÿ  Elastic scalability
  • 27. 27Pivotal Confidential–Internal Use Only Hadoop Not the best choice for Ÿ  Write-heavy workloads Ÿ  Small clusters Ÿ  Analytical DWH cases Ÿ  OLTP and event stream processing Ÿ  Cost savings
  • 28. 28Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 29. 29Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open
  • 30. 30Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity
  • 31. 31Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common
  • 32. 32Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K
  • 33. 33Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High
  • 34. 34Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source
  • 35. 35Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex
  • 36. 36Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers
  • 37. 37Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB
  • 38. 38Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB Target Systems DWH Purpose-Built Batch
  • 39. 39Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB Target Systems DWH Purpose-Built Batch Target End Users Business Analysts Developers
  • 40. 40Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None
  • 41. 41Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard
  • 42. 42Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java
  • 43. 43Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High
  • 44. 44Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High
  • 45. 45Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec
  • 46. 46Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins
  • 47. 47Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks
  • 48. 48Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks Min Collection Size Megabytes Gigabytes
  • 49. 49Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks Min Collection Size Megabytes Gigabytes Max Concurrency 10-15 queries 70-100 jobs
  • 50. 50Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Examples Ÿ  Summary
  • 51. 51Pivotal Confidential–Internal Use Only Summary Use MPP for Ÿ  Analytical DWH Ÿ  Ad hoc analyst SQL queries and BI Ÿ  Keep under 100TB of data Use Hadoop for Ÿ  Specialized data processing systems Ÿ  Over 100TB of data
  • 52. 52Pivotal Confidential–Internal Use Only 52Pivotal Confidential–Internal Use Only Questions?
  • 53. BUILT FOR THE SPEED OF BUSINESS