Emergence of SQL over Hadoop
Sudheesh Narayanan
Chief Architect – Big Data
Final version sql over hadoop ver1


SQL over Hadoop comparison, presented at BigData Tech Conclave 2013

Published in: Technology
Transcript of "Final version sql over hadoop ver1"

  1. Emergence of SQL over Hadoop
     Sudheesh Narayanan, Chief Architect – Big Data
  2. About Me
     Author of
     My Expertise
     • Hadoop and Ecosystem Components
     • Machine Learning
     • Text Analytics
     • Image Analytics
     • Data Science
     • Real Time Event Stream Processing
     • NoSQL Databases
     • Complex Event Processing
  3. Agenda
     • Why SQL Over Hadoop?
     • Technology Landscape
     • Fundamentals behind SQL over Hadoop
     • Understand different types of SQL over Hadoop
     • Architecture Comparisons
     • Conclusions
  4. SQL has come full circle!!
     • SQL has been ruling since the 1970s!!
     • Hadoop came, but with little traction
     • Facebook open-sourced Hive in 2008; Hadoop takes the next leap in adoption
     • RDBMS and MPP vendors brought Hadoop connectors
     • Niche players used a SQL engine to run distributed queries on Hadoop
     • In 2012 Cloudera Impala sets the trend for real-time query over Hadoop
     • Facebook open-sourced Presto in 2013!!
  5. SQL OVER HADOOP IS REALLY CROWDED!! Which one is better?
  6. Hive: First SQL over Hadoop!!
     Query path: HQL (Hive Query Language), Hive Query Engine, Map-Reduce Pipelines
     Hadoop components: Name Node; Job Tracker / Resource Manager; Node 1, Node 2, Node 3, Node…, each holding Processing Logic (MR) and Data Blocks
     Characteristics:
     • Storage Formats
     • Compressions
     • Metastore
     • Schema on Read
     • Mid-Query Fault Tolerance
     • Hadoop MapReduce Latency
  7. The Fundamentals!!
     Diagram labels: Processing Logic; App Server (x2); Data Transfer; Data; Network Switch; DB Server; Storage Switch; Storage Array (Disk1, Disk2, Disk3)
     1. Query Engine
     2. Network Latency
     3. Storage Layer Scalability
     4. File Formats and Compressions
     5. ANSI SQL Compliance
     Source: http://hortonworks.com/labs/stinger/
  8. So let's understand the different types of SQL over Hadoop!!
  9. 9. Type 1MapReduce Batch Map Reduce Latency still exist 1 2 3 HQL (Hive Query Language) 4 HIVE Query Engine File Format Support Improved Query Optimizer Vectorized Query Engine Metastore Map-Reduce Pipelines IBM BigSQL Hadoop Node 1 Node 2 Node 3 Stinger Improved Original HIVE Performance by 35%
  10. Type 2: Pull Data Out of HDFS to Query Engine
     RDBMS vendors supporting Hadoop as external tables:
     1. Oracle Hadoop Connector
     2. DB2 Hadoop Connector
     3. Microsoft PDW Connector
     Diagram labels: SQL; Database Server; Query Engine; Pull Data from HDFS; Hadoop Data Nodes
     • Leverage Database Query Engine
     • No Data Local Processing
     • Full ANSI SQL Compliance
     • Poor Response Time (Limited to Low Volumes)
  11. Type 3: Pull Data Out of HDFS to Parallel Query Engine
     • Leverage Specialized Query Engine
     • No Data Local Processing
     • Full ANSI SQL Compliance
     • Better Response Time due to Parallel Processing
     • Query Node is separate from Data Node!!
     Example: Polybase
  12. Type 4: MPP Database using HDFS as Data Store
     • Leverage MPP Query Framework
     • Data Local Processing but streaming pipeline
     • ANSI SQL Compliance
     • Response Time is good
     • Data is moved out of HDFS to MPP Engine
     Example: Greenplum over HDFS
  13. Type 5: RDBMS Locally on an HDFS Node
     • Wrapper to access Hadoop data locally on each node
     • Data Local Processing
     • Limited ANSI SQL Compliance
     • Response Time is better than Hive
     • Metadata is replicated
     • File Formats and Compression support still expected
     • Query is pushed down to the local DB engine on each node
  14. Type 6: Distributed Native SQL Query on HDFS
     • Distributed SQL Engine
     • Data Local Processing with streaming pipeline
     • Different File Formats and Compressions
     • Limited ANSI SQL support
     • Fast Response Time and Highly Scalable
  15. Summary: The 6 Types of SQL over Hadoop!!
     1. Batch MapReduce
     2. RDBMS Connector to HDFS as External Tables
     3. Parallel Query Engine pulling data out of HDFS
     4. MPP Database using HDFS as storage
     5. RDBMS Store Locally on HDFS Node
     6. Distributed Query Engine
  16. What should you look for when you choose SQL over Hadoop?
     • Standard ANSI SQL Compliance
     • Push Down Distributed Data Local Processing
     • Support for a Variety of File Formats including Compressions
     • Optimized Query Engine
     • JDBC/ODBC Connectivity
     • Linear Scalability
     • Low Latency Query and Cost
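
The six types and the selection criteria in the transcript can be put side by side in a small lookup structure. The sketch below is one possible encoding, not part of the original deck. The class `Approach`, the function `shortlist`, and the ranking in `ANSI_RANK` are hypothetical names and choices made for illustration. The attribute values are copied from the slide wording, except that Type 1 is marked data-local based on a reading of the Hive diagram (MR processing logic drawn next to the data blocks), which is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Illustrative ordering of the ANSI SQL wording used on the slides.
# The ranking itself is an assumption made for this sketch.
ANSI_RANK = {"unstated": 0, "limited": 1, "compliant": 2, "full": 3}


@dataclass(frozen=True)
class Approach:
    type_no: int      # type number as used in the deck
    name: str         # name from the summary slide
    data_local: bool  # does processing run where the data blocks live?
    ansi: str         # key into ANSI_RANK, taken from the slide wording
    response: str     # response-time remark from the slide


APPROACHES = [
    # Type 1: data_local=True is a reading of the Hive diagram, not slide text.
    Approach(1, "Batch MapReduce", True, "unstated",
             "MapReduce latency still exists"),
    Approach(2, "RDBMS connector to HDFS as external tables", False, "full",
             "poor, limited to low volumes"),
    Approach(3, "Parallel query engine pulling data out of HDFS", False, "full",
             "better, due to parallel processing"),
    # Type 4: the slide says "data local processing but streaming pipeline"
    # and also that data is moved out of HDFS to the MPP engine.
    Approach(4, "MPP database using HDFS as data store", True, "compliant",
             "good"),
    Approach(5, "RDBMS locally on an HDFS node", True, "limited",
             "better than Hive"),
    Approach(6, "Distributed native SQL query on HDFS", True, "limited",
             "fast and highly scalable"),
]


def shortlist(require_data_local: bool, min_ansi: str) -> List[Approach]:
    """Return the approaches that meet the two criteria, in deck order."""
    floor = ANSI_RANK[min_ansi]
    return [
        a for a in APPROACHES
        if (a.data_local or not require_data_local)
        and ANSI_RANK[a.ansi] >= floor
    ]


if __name__ == "__main__":
    for a in shortlist(require_data_local=True, min_ansi="limited"):
        print(a.type_no, a.name, "|", a.response)
```

Asking for data-local processing with at least limited ANSI SQL support leaves Types 4, 5 and 6, while insisting on full ANSI SQL compliance leaves only the two pull-data-out approaches (Types 2 and 3). That is the trade-off the deck describes between SQL compliance and data locality.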