Improving MySQL performance with Hadoop
Sagar Jauhari
Presented at Java One & Oracle Develop 2012.
1.
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
2.
Improving MySQL Performance with Hadoop
Sagar Jauhari, Manish Kumar
3.
India: May 03 - May 04, 2012
San Francisco: September 30 - October 4, 2012
4.
Program Agenda
- Introduction
- Inside Hadoop!
- Integration with MySQL
- Facebook's usage of MySQL & Hadoop
- Twitter's usage of MySQL & Hadoop
5.
Introduction: MySQL
- 12 million product installations
- 65,000 downloads each day
- Part of the rapidly growing open source LAMP stack
- MySQL Commercial Editions available
6.
Introduction: Hadoop
- Highly scalable distributed framework
  - Yahoo! has a 4000 node cluster!
- Extremely powerful in terms of computation
  - Sorts a TB of random integers in 62 seconds!
7.
Introduction: Hadoop is ...
- A scalable system for data storage and processing
- Fault tolerant
- Parallelizes data processing across many nodes
- Leverages its distributed file system (HDFS) to cheaply and reliably replicate chunks of data
8.
Introduction: Who uses Hadoop?
- Yahoo:
  - Ad systems and web search
- Facebook:
  - Reporting/analytics and machine learning
- Twitter:
  - Data warehousing, data analysis
- Netflix:
  - Movie recommendation algorithm uses Hive (which uses Hadoop, HDFS & MapReduce underneath)
9.
Introduction: MySQL vs Hadoop

                 MySQL                        Hadoop
Data capacity    TB+ (may require sharding)   PB+
Data per query   GB?                          PB+
Read/write       Random read/write            Sequential scans, append-only
Query language   SQL                          Java MapReduce, scripting languages, HiveQL
Transactions     Yes                          No
Indexes          Yes                          No
Latency          Sub-second (hopefully)       Minutes to hours
Data structure   Structured                   Structured or unstructured

Courtesy: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
10.
Inside Hadoop
A shallow deep dive
11.
Inside Hadoop: HDFS
- A distributed, scalable, and portable file system written in Java
- An HDFS instance typically has a single name node; a cluster of data nodes forms the HDFS cluster
[Diagram: Name Node, Map / Reduce Workers]
12.
Inside Hadoop: HDFS
- Uses the TCP/IP layer for communication
- Stores large files across multiple machines
- Single name node stores metadata in-memory
[Diagram: Name Node, HDFS, Map / Reduce Workers]
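As a rough illustration of "stores large files across multiple machines" with metadata held by a single name node, the sketch below splits a file into fixed-size blocks and assigns each block to several data nodes. The 64 MB block size, the replication factor of 3, the round-robin placement, and the names `plan_blocks`, `dn1` ... `dn4` are all assumptions made for this example; real HDFS placement is rack-aware and considerably more involved.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size (64 MB); configurable in practice
REPLICATION = 3                # assumed replication factor


def plan_blocks(file_size, data_nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return name-node style metadata: one entry per block with its replica nodes."""
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    num_blocks = -(-file_size // block_size)  # ceiling division
    plan = []
    for block_id in range(num_blocks):
        # Round-robin placement: a simplification, not HDFS's real placement policy.
        replicas = [data_nodes[(block_id + r) % len(data_nodes)]
                    for r in range(replication)]
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - block_id * block_size)
        plan.append({"block": block_id, "length": length, "replicas": replicas})
    return plan


# A hypothetical 200 MB file spread over four hypothetical data nodes.
metadata = plan_blocks(200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for entry in metadata:
    print(entry)
```

The returned list plays the role of the in-memory metadata: it records which nodes hold each block, while the block contents themselves would live on the data nodes.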
13.
Inside Hadoop: HDFS
14.
Inside Hadoop: MapReduce
- Design goals
  - Scalability
  - Cost efficiency
- Implementation
  - User jobs are executed as 'map' and 'reduce' functions
  - Work distribution and fault tolerance are managed by the framework
[Diagram: Input -> Map -> Shuffle and sort -> Reduce -> Output]
15.
Inside Hadoop: MapReduce
- Map
  - A MapReduce job splits the input data into independent chunks
  - Each chunk is processed by a map task, in parallel with the others
  - Generic key-value computation
[Diagram: Input -> Map -> Shuffle and sort -> Reduce -> Output]
16.
Inside Hadoop: MapReduce
- Reduce
  - Data from the data nodes is merge-sorted so that the key-value pairs for a given key are contiguous
  - The merged data is read sequentially; for each key, the values are passed to the reduce method through an iterator that reads the input until the next key is encountered
[Diagram: Input -> Map -> Shuffle and sort -> Reduce -> Output]
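The reduce step described above can be sketched in a few lines: sorted map outputs are merged, then read sequentially, and each run of equal keys is handed to the reduce function as an iterator. This is a single-process illustration, not Hadoop's implementation; `run_reduce` and the two sample node outputs are names and data invented for the example.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def run_reduce(sorted_pairs, reduce_fn):
    """Call reduce_fn(key, values) once per run of equal keys in sorted_pairs."""
    results = []
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        # `group` is a lazy iterator that ends when the next key begins.
        values = (value for _, value in group)
        results.append(reduce_fn(key, values))
    return results


# Hypothetical map outputs from two nodes, each already sorted by key.
from_node_a = [("hadoop", 1), ("mysql", 1)]
from_node_b = [("hadoop", 1), ("hive", 1)]

# Merge the sorted streams so that pairs with the same key become contiguous.
merged = heapq.merge(from_node_a, from_node_b, key=itemgetter(0))

totals = run_reduce(merged, lambda key, values: (key, sum(values)))
print(totals)
```

Because the input is sorted, the reducer never has to hold more than one key's values at a time, which is what lets the sequential read scale to large inputs.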
17.
Inside Hadoop: MapReduce
[Diagram: Input -> Map -> Shuffle and sort -> Reduce -> Output]
Input words: Hadoop, MySQL, Hive, Sqoop, Pig, Hadoop
Output (word, count): Hadoop 2, MySQL 1, Hive 1, Sqoop 1, Pig 1
18.
Inside Hadoop: How does Hadoop use MapReduce?
- The framework consists of a single master JobTracker and one slave TaskTracker per cluster node
- Master
  - Schedules the jobs' component tasks on the slaves
  - Monitors the jobs
  - Re-executes the failed tasks
- Slave
  - Executes the tasks as directed by the master
19.
Inside Hadoop: Why MapReduce?
- Language support
  - Java, PHP, Hive, Pig, Python, Wukong (Ruby), Rhipe (R)
- Scales horizontally
- Programmer is isolated from individual failed tasks
  - Tasks are restarted on another node
20.
Inside Hadoop: MapReduce Limitations
- Not a good fit for problems that exhibit task-driven parallelism
- Requires a particular form of input: a set of (key, value) pairs
- A lot of MapReduce applications end up sharing data one way or another
21.
Integration with MySQL
Leveraging Hadoop to improve MySQL performance
22.
Integration with MySQL
- The benefits of MySQL to developers are the speed, reliability, data integrity and scalability it provides
- It can successfully process large amounts of data (in petabytes)
- But for applications that require massively parallel processing, we may need the benefits of a parallel processing system such as Hadoop
23.
Integration with MySQL
Image source: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
24.
Integration with MySQL: Problem Statement
Word count problem
- In a large set of documents, find the number of occurrences of each word
25.
Integration with MySQL: Word count problem
[Diagram: Input -> Map -> Shuffle and sort -> Reduce -> Output]
Input words: Hadoop, MySQL, Hive, Sqoop, Pig, Hadoop
Output (word, count): Hadoop 2, MySQL 1, Hive 1, Sqoop 1, Pig 1
26.
Integration with MySQL: Mapping
Key and value represent a row of data: the key is the byte offset, the value is a line.
Map (key, value): foreach (word in the value) output (word, 1)
Intermediate output:
  <word1>, 1
  <word2>, 1
  <word3>, 1
27.
Integration with MySQL: Reducing
Hadoop aggregates the keys and calls reduce for each unique key.
Reduce (key, list): sum the list, output (key, sum)
Input to reduce:
  <word1>, (1,1,1,1,1,1…1)
  <word2>, (1,1,1)
  <word3>, (1,1,1,1,1,1)
Final result:
  <word1>, 45823
  <word2>, 1204
  <word3>, 2693
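The mapping and reducing slides can be put together as a small in-memory sketch of the word count job. It is not Hadoop code: the shuffle is replaced by a dictionary, and `read_records`, `map_fn`, `reduce_fn` and `word_count` are names chosen for this example. The sample text reuses the six words from the diagram on the earlier slides.

```python
from collections import defaultdict


def read_records(text):
    """Yield (byte_offset, line) records, the input shape the map step expects."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def map_fn(key, value):
    """Emit (word, 1) for every word in the line; the byte-offset key is unused."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """Sum the list of ones collected for a single word."""
    return key, sum(values)


def word_count(text):
    grouped = defaultdict(list)  # stands in for Hadoop's shuffle and sort
    for offset, line in read_records(text):
        for word, one in map_fn(offset, line):
            grouped[word].append(one)
    # Call reduce once per unique key, in sorted key order.
    return dict(reduce_fn(word, ones) for word, ones in sorted(grouped.items()))


counts = word_count("Hadoop MySQL Hive\nSqoop Pig Hadoop\n")
print(counts)
```

In a real job the map and reduce functions would run on different nodes and the grouping would be done by the framework; only the two small functions are what the programmer writes.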
28.
Integration with MySQL
Demo
29.
Integration with MySQL
Video
30.
Facebook's usage of MySQL & Hadoop
- Facebook collects terabytes of data every day from around 800 million users
- MySQL handles pretty much every user interaction: likes, shares, status updates, alerts, requests, etc.
- Hadoop/Hive warehouse
  - 4800 cores, 2 petabytes (July 2009)
  - 4800 cores, 12 petabytes (Sept 2009)
- Hadoop archival store
  - 200 TB
31.
Facebook's usage of MySQL & Hadoop: Hive
- Data warehouse system for Hadoop
- Facilitates easy data summarization
- Hive translates HiveQL to MapReduce code
- Querying
  - Provides a mechanism to project structure onto this data
  - Allows querying the data using a SQL-like language called HiveQL
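To make "Hive translates HiveQL to MapReduce code" more concrete, the sketch below hand-writes the map and reduce steps that a simple grouped count corresponds to conceptually. It does not show Hive's actual compiler output. The query text, the `page_views` table and its rows, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical query, kept as a string for reference only; nothing here parses it.
QUERY = "SELECT page, COUNT(*) FROM page_views GROUP BY page"

# Hypothetical rows of a page_views table: (user_id, page)
page_views = [(1, "/home"), (2, "/home"), (1, "/about"), (3, "/home")]


def map_row(row):
    """Map: emit the GROUP BY column as the key and 1 as the value."""
    user_id, page = row
    return page, 1


def reduce_group(page, ones):
    """Reduce: COUNT(*) for one group is the sum of its ones."""
    return page, sum(ones)


def run_query(rows):
    groups = defaultdict(list)  # stands in for the shuffle between map and reduce
    for row in rows:
        key, value = map_row(row)
        groups[key].append(value)
    return sorted(reduce_group(key, values) for key, values in groups.items())


result = run_query(page_views)
print(result)
```

The point of the comparison is the division of labour: the one-line query states what is wanted, and the map and reduce functions are the kind of code the user no longer has to write by hand.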
32.
Facebook's usage of MySQL & Hadoop
Image source: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
33.
Hive vs SQL

                      RDBMS                                Hive
Language              SQL-92 standard (maybe)              Subset of SQL-92 plus Hive-specific extensions
Update capabilities   INSERT, UPDATE and DELETE            INSERT but not UPDATE or DELETE
Transactions          Yes                                  No
Latency               Sub-second                           Minutes or more
Indexes               Any number of indexes, very          No indexes, data is always scanned
                      important for performance            (in parallel)
Data size             TBs                                  PBs
Data per query        GBs                                  PBs

Image source: Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera, 2010
34.
Hadoop Implementation at Twitter
- > 12 terabytes of new data per day!
- Most stored data is LZO compressed
- Uses Scribe to write logs to Hadoop
  - Scribe: a log collection framework created and open-sourced by Facebook
- Hadoop is used for data warehousing and data analysis
35.
References
- Leveraging Hadoop to Augment MySQL Deployments, Sarah Sproehnle, Cloudera
- http://engineering.twitter.com/2010/04/hadoop-at-twitter.html
- http://semanticvoid.com
- http://michael-noll.com
- http://hadoop.apache.org/
36.
Legal Disclaimer
- All other products, company names, brand names, trademarks and logos are the property of their respective owners.
38.
Thank You