SlideShare a Scribd company logo
1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only
MPP vs Hadoop
Alexey Grishchenko
HUG Meetup
28.11.2015
2Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
3Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
4Pivotal Confidential–Internal Use Only
Distributed Systems
Avoid distributed systems in all the problems that potentially
could be solved using non-distributed systems
5Pivotal Confidential–Internal Use Only
Distributed Systems
Ÿ  Consensus problem
–  Paxos
–  RAFT
–  ZAB
–  etc.
Ÿ  Transaction consistency
–  2PC
–  3PC
6Pivotal Confidential–Internal Use Only
Distributed Systems
Ÿ  CAP Theorem
7Pivotal Confidential–Internal Use Only
Distributed Systems
http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
8Pivotal Confidential–Internal Use Only
Distributed Systems
Reasons to use
Ÿ  Performance issues
–  More than 100’000 TPS
–  More than 4 GB/sec scan rate
–  More than 100’000 IOPS
Ÿ  Capacity issues
–  More than 50TB of data
Ÿ  DR and Geo-Distribution
9Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
10Pivotal Confidential–Internal Use Only
MPP
Main principles
Ÿ  Shared Nothing
Ÿ  Data Sharding
Ÿ  Data Replication
Ÿ  Distributed Transactions
Ÿ  Parallel Processing
11Pivotal Confidential–Internal Use Only
MPP
12Pivotal Confidential–Internal Use Only
MPP
Works well for
Ÿ  Relational data
Ÿ  Batch processing
Ÿ  Ad hoc analytical SQL
Ÿ  Low concurrency
Ÿ  Applications requiring ANSI SQL
13Pivotal Confidential–Internal Use Only
MPP
Not the best choice for
Ÿ  Non-relational data
Ÿ  OLTP and event stream processing
Ÿ  High concurrency
Ÿ  100+ server clusters
Ÿ  Non-analytical use cases
Ÿ  Geo-Distributed use cases
14Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
15Pivotal Confidential–Internal Use Only
Hadoop
Main Components
Ÿ  HDFS
Ÿ  YARN
Ÿ  MapReduce
Ÿ  HBase
Ÿ  Hive / Hive+Tez
16Pivotal Confidential–Internal Use Only
Hadoop
HDFS
Ÿ  Distributed filesystem
Ÿ  Block-level storage with big blocks
Ÿ  Non-updatable
Ÿ  Synchronous block replication
Ÿ  No built-in Geo-Distribution support
Ÿ  No built-in DR solution
17Pivotal Confidential–Internal Use Only
Hadoop
HDFS
18Pivotal Confidential–Internal Use Only
Hadoop
YARN
Ÿ  Cluster resource manager
Ÿ  Manages CPU and RAM allocation
Ÿ  Schedulers are pluggable
Ÿ  Can handle different resource pools
Ÿ  Supports both MR and non-MR workload
19Pivotal Confidential–Internal Use Only
Hadoop
YARN
20Pivotal Confidential–Internal Use Only
Hadoop
MapReduce
Ÿ  Framework for distributed data processing
Ÿ  Two main operations: map and reduce
Ÿ  Data hits disk after “map” and before “reduce”
Ÿ  Scales to thousands of servers
Ÿ  Can process petabytes of data
Ÿ  Extremely reliable
21Pivotal Confidential–Internal Use Only
Hadoop
MapReduce
22Pivotal Confidential–Internal Use Only
Hadoop
HBase
Ÿ  Distributed key-value store
Ÿ  Data is sharded by key
Ÿ  Data is stored in sorted order
Ÿ  Stores multiple versions of the row
Ÿ  Easily scales
23Pivotal Confidential–Internal Use Only
Hadoop
HBase
24Pivotal Confidential–Internal Use Only
Hadoop
Hive
Ÿ  Query engine with SQL-like syntax
Ÿ  Translates HiveQL query to MR / Tez / Spark job
Ÿ  Processes HDFS data
Ÿ  Supports UDFs and UDAFs
25Pivotal Confidential–Internal Use Only
Hadoop
Hive
26Pivotal Confidential–Internal Use Only
Hadoop
Works well for
Ÿ  Write Once Read Many
Ÿ  100+ server clusters
Ÿ  Both relational and non-relational data
Ÿ  High concurrency
Ÿ  Batch processing and analytical workload
Ÿ  Elastic scalability
27Pivotal Confidential–Internal Use Only
Hadoop
Not the best choice for
Ÿ  Write-heavy workloads
Ÿ  Small clusters
Ÿ  Analytical DWH cases
Ÿ  OLTP and event stream processing
Ÿ  Cost savings
28Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
29Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
30Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
31Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
32Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
33Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
34Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
35Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
36Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
37Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
38Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
Target Systems DWH Purpose-Built Batch
39Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
Target Systems DWH Purpose-Built Batch
Target End Users Business Analysts Developers
40Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
41Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
42Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
43Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
44Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
45Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
46Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
47Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
48Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
Min Collection Size Megabytes Gigabytes
49Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
Min Collection Size Megabytes Gigabytes
Max Concurrency 10-15 queries 70-100 jobs
50Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Examples
Ÿ  Summary
51Pivotal Confidential–Internal Use Only
Summary
Use MPP for
Ÿ  Analytical DWH
Ÿ  Ad hoc analyst SQL queries and BI
Ÿ  Keep under 100TB of data
Use Hadoop for
Ÿ  Specialized data processing systems
Ÿ  Over 100TB of data
52Pivotal Confidential–Internal Use Only 52Pivotal Confidential–Internal Use Only
Questions?
BUILT FOR THE SPEED OF BUSINESS

More Related Content

What's hot

Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Gartner 2021 Magic Quadrant for Cloud Database Management Systems.pdf
Gartner 2021 Magic Quadrant for Cloud Database Management Systems.pdfGartner 2021 Magic Quadrant for Cloud Database Management Systems.pdf
Gartner 2021 Magic Quadrant for Cloud Database Management Systems.pdf
momirlan
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
Deon Huang
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best Pracices
Andy Petrella
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
ConfluentInc1
 
Change data capture
Change data captureChange data capture
Change data capture
Ron Barabash
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
Databricks
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
Xiang Fu
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
Kasun Don
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Visual_BI
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
YugabyteDB
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
Thejas Nair
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 

What's hot (20)

Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Gartner 2021 Magic Quadrant for Cloud Database Management Systems.pdf
Gartner 2021 Magic Quadrant for Cloud Database Management Systems.pdfGartner 2021 Magic Quadrant for Cloud Database Management Systems.pdf
Gartner 2021 Magic Quadrant for Cloud Database Management Systems.pdf
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best Pracices
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
Change data capture
Change data captureChange data capture
Change data capture
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 

Similar to MPP vs Hadoop

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
Inside Analysis
 
Big Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargBig Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya Garg
Agile Testing Alliance
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
Nicolas Morales
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
DataWorks Summit
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
Tony Ng
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009yhadoop
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!Cloudera, Inc.
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
Hortonworks
 
Hadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingHadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data Processing
Yahoo Developer Network
 
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsThe Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
Inside Analysis
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
Yafang Chang
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
ModusOptimum
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
Hortonworks
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
Inside Analysis
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Inside Analysis
 
The practice of big data - making big data approachable
The practice of big data - making big data approachableThe practice of big data - making big data approachable
The practice of big data - making big data approachable
kcmallu
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
DataWorks Summit
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
DataWorks Summit
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15
IBMInfoSphereUGFR
 
SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases
DataWorks Summit
 

Similar to MPP vs Hadoop (20)

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Big Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargBig Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya Garg
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingHadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data Processing
 
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsThe Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise Thinking
 
The practice of big data - making big data approachable
The practice of big data - making big data approachableThe practice of big data - making big data approachable
The practice of big data - making big data approachable
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15
 
SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases
 

Recently uploaded

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 

Recently uploaded (20)

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 

MPP vs Hadoop

  • 1. 1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only MPP vs Hadoop Alexey Grishchenko HUG Meetup 28.11.2015
  • 2. 2Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 3. 3Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 4. 4Pivotal Confidential–Internal Use Only Distributed Systems Avoid distributed systems in all the problems that potentially could be solved using non-distributed systems
  • 5. 5Pivotal Confidential–Internal Use Only Distributed Systems Ÿ  Consensus problem –  Paxos –  RAFT –  ZAB –  etc. Ÿ  Transaction consistency –  2PC –  3PC
  • 6. 6Pivotal Confidential–Internal Use Only Distributed Systems Ÿ  CAP Theorem
  • 7. 7Pivotal Confidential–Internal Use Only Distributed Systems http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
  • 8. 8Pivotal Confidential–Internal Use Only Distributed Systems Reasons to use Ÿ  Performance issues –  More than 100’000 TPS –  More than 4 GB/sec scan rate –  More than 100’000 IOPS Ÿ  Capacity issues –  More than 50TB of data Ÿ  DR and Geo-Distribution
  • 9. 9Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 10. 10Pivotal Confidential–Internal Use Only MPP Main principles Ÿ  Shared Nothing Ÿ  Data Sharding Ÿ  Data Replication Ÿ  Distributed Transactions Ÿ  Parallel Processing
  • 12. 12Pivotal Confidential–Internal Use Only MPP Works well for Ÿ  Relational data Ÿ  Batch processing Ÿ  Ad hoc analytical SQL Ÿ  Low concurrency Ÿ  Applications requiring ANSI SQL
  • 13. 13Pivotal Confidential–Internal Use Only MPP Not the best choice for Ÿ  Non-relational data Ÿ  OLTP and event stream processing Ÿ  High concurrency Ÿ  100+ server clusters Ÿ  Non-analytical use cases Ÿ  Geo-Distributed use cases
  • 14. 14Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 15. 15Pivotal Confidential–Internal Use Only Hadoop Main Components Ÿ  HDFS Ÿ  YARN Ÿ  MapReduce Ÿ  HBase Ÿ  Hive / Hive+Tez
  • 16. 16Pivotal Confidential–Internal Use Only Hadoop HDFS Ÿ  Distributed filesystem Ÿ  Block-level storage with big blocks Ÿ  Non-updatable Ÿ  Synchronous block replication Ÿ  No built-in Geo-Distribution support Ÿ  No built-in DR solution
  • 18. 18Pivotal Confidential–Internal Use Only Hadoop YARN Ÿ  Cluster resource manager Ÿ  Manages CPU and RAM allocation Ÿ  Schedulers are pluggable Ÿ  Can handle different resource pools Ÿ  Supports both MR and non-MR workload
  • 20. 20Pivotal Confidential–Internal Use Only Hadoop MapReduce Ÿ  Framework for distributed data processing Ÿ  Two main operations: map and reduce Ÿ  Data hits disk after “map” and before “reduce” Ÿ  Scales to thousands of servers Ÿ  Can process petabytes of data Ÿ  Extremely reliable
  • 21. 21Pivotal Confidential–Internal Use Only Hadoop MapReduce
  • 22. 22Pivotal Confidential–Internal Use Only Hadoop HBase Ÿ  Distributed key-value store Ÿ  Data is sharded by key Ÿ  Data is stored in sorted order Ÿ  Stores multiple versions of the row Ÿ  Easily scales
  • 24. 24Pivotal Confidential–Internal Use Only Hadoop Hive Ÿ  Query engine with SQL-like syntax Ÿ  Translates HiveQL query to MR / Tez / Spark job Ÿ  Processes HDFS data Ÿ  Supports UDFs and UDAFs
  • 26. 26Pivotal Confidential–Internal Use Only Hadoop Works well for Ÿ  Write Once Read Many Ÿ  100+ server clusters Ÿ  Both relational and non-relational data Ÿ  High concurrency Ÿ  Batch processing and analytical workload Ÿ  Elastic scalability
  • 27. 27Pivotal Confidential–Internal Use Only Hadoop Not the best choice for Ÿ  Write-heavy workloads Ÿ  Small clusters Ÿ  Analytical DWH cases Ÿ  OLTP and event stream processing Ÿ  Cost savings
  • 28. 28Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 29. 29Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open
  • 30. 30Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity
  • 31. 31Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common
  • 32. 32Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K
  • 33. 33Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High
  • 34. 34Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source
  • 35. 35Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex
  • 36. 36Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers
  • 37. 37Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB
  • 38. 38Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB Target Systems DWH Purpose-Built Batch
  • 39. 39Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB Target Systems DWH Purpose-Built Batch Target End Users Business Analysts Developers
  • 40. 40Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None
  • 41. 41Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard
  • 42. 42Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java
  • 43. 43Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High
  • 44. 44Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High
  • 45. 45Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec
  • 46. 46Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins
  • 47. 47Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks
  • 48. 48Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks Min Collection Size Megabytes Gigabytes
  • 49. 49Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks Min Collection Size Megabytes Gigabytes Max Concurrency 10-15 queries 70-100 jobs
  • 50. 50Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Examples Ÿ  Summary
  • 51. 51Pivotal Confidential–Internal Use Only Summary Use MPP for Ÿ  Analytical DWH Ÿ  Ad hoc analyst SQL queries and BI Ÿ  Keep under 100TB of data Use Hadoop for Ÿ  Specialized data processing systems Ÿ  Over 100TB of data
  • 52. 52Pivotal Confidential–Internal Use Only 52Pivotal Confidential–Internal Use Only Questions?
  • 53. BUILT FOR THE SPEED OF BUSINESS