A STUDY OF SQL OPTIMIZER AND HIVE
COEN 380 Project
Project Group : 1
Bhide, Aishwarya
Patnaik, Anita
Sekar, Vishaka Balasubramanian
Yoloye, Mose
Project Goal
● Understand how the SQL optimizer works
● Generate query plans using Oracle EXPLAIN PLAN
● Understand the basic principles of Hive
● Execute queries on Hive
● Compare query execution in Oracle and Hive
The SQL Optimizer
● Why do we need the optimizer?
SELECT * FROM Books WHERE author = 'Ernest Hemingway';
Two ways to execute it –
• Full table scan
• Index on author
Is there a difference?
• 10 rows
• 10 million rows
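The difference between the two access paths can be observed directly. The sketch below is only an illustration: it uses SQLite (through Python's standard `sqlite3` module) as a stand-in for Oracle, and the table layout, the index name `idx_books_author`, and the generated rows are assumptions made for the example.

```python
import sqlite3

def plan_for(conn, sql):
    """Return the plan detail strings SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); keep only the detail text.
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
# Made-up rows, only there to give the table some content.
conn.executemany(
    "INSERT INTO Books (title, author) VALUES (?, ?)",
    [("Book %d" % i, "Author %d" % (i % 100)) for i in range(1000)]
    + [("The Old Man and the Sea", "Ernest Hemingway")],
)

query = "SELECT * FROM Books WHERE author = 'Ernest Hemingway'"

# Without an index the only option is to read every row.
plan_without_index = plan_for(conn, query)

# With an index on author the engine can look the value up directly.
conn.execute("CREATE INDEX idx_books_author ON Books (author)")
plan_with_index = plan_for(conn, query)

print(plan_without_index)
print(plan_with_index)
```

With 10 rows the two plans cost about the same; with 10 million rows the scan reads every row while the index lookup touches only the matching entries, which is why the choice is left to the optimizer.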
The SQL Optimizer
● SQL is a declarative language
○ The query specifies what; the SQL engine decides how
○ How does understanding the SQL optimizer help?
Data Set Up
Reference: Database System Concepts - Silberschatz, Korth, Sudarshan
Data Set Up
● Queries
○ Single relation
○ Join (2-way and 3-way)
○ Aggregate function
○ Aggregates with grouping
○ Set operations – Union, Except
○ Subqueries
○ Subqueries using the WITH clause
○ Update and Delete
Project Execution
● Set up Oracle database
● Generate query optimizer plan using Oracle Explain
● Set up tables and insert data in Hive
● Execute queries on Hive
Oracle Query Plan Results - 1
Query using single relation
SELECT title FROM course WHERE dept_name = 'Comp. Sci.' AND credits = 3;
Oracle Query Plan Results - 2
Query using 2-way join
SELECT DISTINCT ID FROM takes WHERE (takes.course_id , takes.sec_id, takes.semester,
takes.year) IN (SELECT course_id, sec_id,semester, year FROM teaches NATURAL JOIN
instructor WHERE name = 'Einstein');
[Figures: relational algebra expression; query plan based on the Oracle-generated plan; self-created query plan]
Oracle Query Plan Results - 3
Query using 3-way join
SELECT name, title FROM (instructor NATURAL JOIN teaches) JOIN course USING
(course_id);
Schemas:
instructor(ID, name, dept_name, salary)
teaches(ID, course_id, sec_id, semester, year)
course(course_id, title, dept_name, credits)
[Figures: relational algebra expression and equivalent expression; query plan based on the Oracle-generated plan; self-created query plan]
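The expressions shown on the original slide are not reproduced here. As a sketch only, one way to write this query in relational algebra from the schemas above is the following. It assumes the textbook notation, and the second line is just one possible equivalent form with the projections pushed below the last join, not necessarily the one the authors drew. Note that the natural join of instructor and teaches matches on ID, while `JOIN course USING (course_id)` matches on course_id only, so dept_name is not a join attribute.

```latex
\begin{align*}
&\Pi_{name,\,title}\bigl((instructor \bowtie teaches)
   \bowtie_{teaches.course\_id \,=\, course.course\_id} course\bigr) \\
\equiv\;&\Pi_{name,\,title}\bigl(\Pi_{name,\,course\_id}(instructor \bowtie teaches)
   \bowtie \Pi_{course\_id,\,title}(course)\bigr)
\end{align*}
```

In the second form the last join can be a plain natural join, because after the projections course_id is the only attribute the two inputs share.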
Oracle Query Plan Results - 4
Query using aggregate function
SELECT MAX(salary) FROM instructor;
[Figure: relational algebra expression]
Oracle Query Plan Results - 5
Query for aggregate with grouping
SELECT COUNT(ID), course_id, sec_id FROM section NATURAL JOIN takes
WHERE semester='Fall' AND year=2009 GROUP BY course_id, sec_id;
Oracle Query Plan Results - 6
Query using union operation
(SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009) UNION
(SELECT course_id FROM section WHERE semester='Spring' AND year=2010);
[Figures: expected query plan; Oracle query plan]
Oracle Query Plan Results - 7
Query using except (intersect) operation
(SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009)
INTERSECT (SELECT course_id FROM section WHERE semester='Spring' AND
year=2010);
[Figures: expected query plan; Oracle query plan]
Oracle Query Plan Results - 8
Query using a subquery
SELECT name FROM instructor WHERE salary = (SELECT MAX(salary) FROM
instructor);
[Figures: expected query plan; Oracle query plan]
Oracle Query Plan Results - 9
Query using subquery and rename operation
SELECT MAX(enrollment), course_id FROM (SELECT Count(ID) as enrollment, sec_id, course_id
FROM takes WHERE year=2009 and semester='Fall' GROUP BY sec_id, course_id) GROUP BY
course_id;
Query Plan - 9 (using subquery)
[Figure: self-created query plan for the query above]
Matches Oracle's plan
Oracle Query Plan Results - 10
Find the maximum enrollment across all sections in Fall 2009
WITH enrollment(course_id, sec_id, total) AS (SELECT course_id, sec_id, COUNT(ID) FROM
section NATURAL JOIN takes WHERE semester='Fall' AND year=2009 GROUP BY course_id,
sec_id) SELECT MAX(total) FROM enrollment;
Query Plan - 10 (subquery and aggregation)
Step 1: SELECT COUNT(ID) AS id FROM section NATURAL JOIN takes WHERE semester='Fall' AND year=2009 GROUP BY course_id, sec_id
Step 2: SELECT MAX(id) over the result of step 1
Matches Oracle's plan
Oracle Query Plan Results - 11
Increase the salary of each instructor in the Comp. Sci. department by 10%
UPDATE instructor SET salary = salary * 1.10 WHERE dept_name = 'Comp. Sci.';
Query Plan - 11 (update query)
instructor ← ∏ ID, name, dept_name, salary*1.10 (σ dept_name = 'Comp. Sci.' (instructor)) ∪ σ dept_name <> 'Comp. Sci.' (instructor)
Oracle Query Plan Results - 12
Delete all courses that have never been offered
DELETE FROM course
WHERE course_id IN (SELECT course_id FROM course MINUS SELECT course_id FROM course
NATURAL JOIN section);
Query Plan - 12 (join)
Oracle Optimizer - Summary
● The purpose of the Oracle optimizer is to determine the most efficient execution plan for each query
● EXPLAIN PLAN is the main tool for seeing which plan was chosen and, through its cost and cardinality estimates, why
● The optimizer chooses the best plan by evaluating four key elements of a query: cardinality, access methods, join methods, and join order
Hive
● Why Hive?
○ Rapidly increasing size of datasets (e.g., a 700TB data set)
○ Warehouse built using an RDBMS failed to scale
○ Need for scalable analysis on large data sets
○ Hadoop was not easy for end users
○ Need for improved querying capability
○ Need for diverse applications and users
Hive is NOT
● A relational database
● Designed for online transaction processing (OLTP)
● A language for real-time queries and row-level updates
Hive - Features
● Features of Hive
○ It stores the schema in a database (the metastore) and the processed data in HDFS.
○ It is designed for OLAP.
○ It provides an SQL-like query language called HiveQL (HQL).
○ It is familiar, fast, scalable, and extensible.
HiveQL - Query Language
● Query Language (HiveQL)
○ SQL-like language that supports a subset of SQL queries
○ Metadata browsing capabilities
○ Explain plan capabilities (naive rule-based optimizer)
○ Seamless plugging in of map-reduce programs, e.g.
FROM (
  MAP doctext USING 'python wc_mapper.py' AS (word, cnt)
  FROM docs
  CLUSTER BY word
) a
REDUCE word, cnt USING 'python wc_reduce.py';
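The scripts `wc_mapper.py` and `wc_reduce.py` are named in the query but not shown. The sketch below is a guess at what such a pair could do, not the authors' code: Hive streams rows to the script as tab-separated lines on standard input and reads tab-separated lines back, and `CLUSTER BY word` delivers all rows for a word to the same reducer in sorted order. The function names `map_lines` and `reduce_lines` are hypothetical, and the sort in the last lines stands in for the `CLUSTER BY` step.

```python
from itertools import groupby

def map_lines(lines):
    """Mapper sketch: emit one 'word<TAB>1' line per word in each input row."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reduce_lines(lines):
    """Reducer sketch: sum the counts per word, assuming input grouped by word."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(int(cnt) for _, cnt in group))

# Local simulation of the pipeline on one made-up document.
docs = ["to be or not to be"]
mapped = sorted(map_lines(docs))   # sorting stands in for CLUSTER BY word
counts = list(reduce_lines(mapped))
print(counts)
```

In a real deployment each function would read `sys.stdin` and print to standard output so that Hive can run it as an external process.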
Data Model and Query Language
HiveQL - Limitations
● No support for WHERE-clause subqueries (not in the initial version)
● Only equality predicates are supported for joins
● Does not support inserting into an existing table (UPDATE, DELETE, and INSERT INTO are not supported)
● Why is this not a problem at Facebook?
○ Almost all queries can be expressed using equi-joins
○ Data is loaded in separate partitions
○ No complex locking protocol is required
Hive Query Execution
● Parse the query
● Type checking and semantic analysis
● Optimization
○ Performs a chain of transformations
○ Walks the DAG, checks whether each rule's condition is fulfilled, and executes the rule
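The walk, check, and execute cycle can be sketched in a few lines. This is a toy model under stated assumptions, not Hive's implementation: the `Op` class, the rule functions, and the restriction to single-input operators (a chain rather than a general DAG) are all inventions for the example. The one rule shown moves a filter below a projection, a simple form of predicate pushdown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Op:
    """One operator in a plan; `child` is its single input (hypothetical model)."""
    kind: str                    # "scan", "project", or "filter"
    arg: Tuple[str, ...]         # table name, projected columns, or filter columns
    child: Optional["Op"] = None

def pushdown_condition(op):
    """Rule condition: a filter sitting directly above a projection."""
    return op.kind == "filter" and op.child is not None and op.child.kind == "project"

def pushdown_action(op):
    """Rule action: swap the two so the filter runs before the projection."""
    project = op.child
    return Op("project", project.arg, Op("filter", op.arg, project.child))

RULES = [(pushdown_condition, pushdown_action)]

def optimize(op):
    """Walk the plan top-down, applying each rule wherever its condition holds."""
    if op is None:
        return None
    for condition, action in RULES:
        if condition(op):
            op = action(op)
    return Op(op.kind, op.arg, optimize(op.child))

# filter(dept_name, credits) over project(title, dept_name, credits) over scan(course)
plan = Op("filter", ("dept_name", "credits"),
          Op("project", ("title", "dept_name", "credits"),
             Op("scan", ("course",))))
optimized = optimize(plan)
print(optimized)
```

Adding another transformation means appending one more (condition, action) pair to `RULES`; the walk itself does not change.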
Hive - Query Optimizer
● Query Optimizer - Transformations
○ Column pruning
○ Predicate pushdown
○ Partition pruning
○ Map-side joins: small tables are kept in the memory of all mappers, which minimizes the cost of sorting and merging
○ Join reordering
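The map-side join idea can be illustrated with a small hash join. This is a sketch of the technique in plain Python, not Hive code: the function name `map_side_join` and the sample rows are made up for the example. The small table is loaded into an in-memory dictionary (what every mapper would hold), and the large table is streamed past it, so no sort or merge step is needed.

```python
def map_side_join(big_rows, small_rows, key):
    """Join two lists of dicts on `key` using an in-memory hash table of the small side."""
    # Build phase: the small table is loaded into memory once per mapper.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    # Probe phase: stream the big table; no sort or shuffle is needed.
    for row in big_rows:
        for match in lookup.get(row[key], []):
            yield {**match, **row}

# Illustrative rows only, not the project's data set.
course = [
    {"course_id": "CS-101", "title": "Intro. to Computer Science"},
    {"course_id": "BIO-101", "title": "Intro. to Biology"},
]
takes = [
    {"ID": "00128", "course_id": "CS-101"},
    {"ID": "12345", "course_id": "CS-101"},
    {"ID": "98988", "course_id": "BIO-101"},
]
joined = list(map_side_join(takes, course, "course_id"))
print(joined)
```

The approach only pays off when the small side really fits in memory; otherwise the join has to fall back to a sort-and-merge in the reduce phase.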
Hive: Comparison with RDBMS
● Hive
○ Designed for analytics performed on static data
○ Lacks record-level update/delete functionality
○ Write once, read many times
○ Processes massive amounts of data
○ Supports a subset of SQL queries
● RDBMS
○ Designed for transaction processing and analytics on dynamic data
○ Supports record-level update/delete
○ Read and write many times
Hive Query Execution Results
(Simple Select Query)
SELECT title FROM course WHERE dept_name = 'Comp. Sci.' AND credits = 3
Hive Query Execution Results
(subquery in FROM clause)
Hive Query Execution Results
(Aggregation & Join)
Hive Query Execution Results
(subquery)
SELECT name,salary FROM instructor i WHERE salary = (SELECT MAX(salary) FROM
instructor)
Hive Query Execution Inference
● Queries which include subqueries in the WHERE or HAVING clause, e.g.
SELECT t.sec_id, t.course_id FROM takes t WHERE t.year=2009 AND
t.semester='Fall' HAVING count(t.ID) IN (SELECT MAX(enrollment) FROM
(SELECT COUNT(tin.ID) AS enrollment, tin.sec_id, tin.course_id FROM takes
tin WHERE tin.year=2009 AND tin.semester='Fall' GROUP BY
tin.sec_id,tin.course_id))
● Queries which include subqueries in the FROM clause, e.g.
SELECT MAX(enrollment), s.course_id FROM (SELECT COUNT(t.ID) AS
enrollment, t.sec_id, t.course_id FROM takes t WHERE t.year=2009 AND
t.semester='Fall' GROUP BY t.sec_id,t.course_id) s GROUP BY s.course_id
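A WHERE-clause subquery can often be rewritten as a FROM-clause subquery joined with an equality predicate, which stays within the HiveQL limitations listed earlier. The sketch below checks that the two forms of the maximum-salary query return the same rows. It runs on SQLite through Python's `sqlite3` module purely as a convenient stand-in, not on Hive, and the three instructor rows and the alias `max_salary` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [  # illustrative rows, not the project's data set
        ("1", "A", "Comp. Sci.", 65000),
        ("2", "B", "Physics", 95000),
        ("3", "C", "Finance", 90000),
    ],
)

# Original form: scalar subquery in the WHERE clause.
where_subquery = """
    SELECT name, salary FROM instructor
    WHERE salary = (SELECT MAX(salary) FROM instructor)
"""

# Rewritten form: the subquery moves to FROM and is joined with an equality predicate.
from_subquery = """
    SELECT i.name, i.salary
    FROM instructor i
    JOIN (SELECT MAX(salary) AS max_salary FROM instructor) m
      ON i.salary = m.max_salary
"""

result_where = conn.execute(where_subquery).fetchall()
result_from = conn.execute(from_subquery).fetchall()
print(result_where, result_from)
```

Whether the rewritten form runs on a particular Hive release still has to be checked against that release; the sketch only shows that the two formulations are equivalent.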
Hive - Use cases
● Hive should be used for analytical querying of data collected over a period of
time, for instance to calculate trends or analyze website logs.
● Hive should not be used for real-time querying.
● It provides data warehousing facilities on top of an existing Hadoop
cluster, along with an SQL-like interface that makes the work easier.
● Tables can be created in Hive to store data; existing HBase tables can also
be mapped to Hive and queried from it.
Hive Query execution inference
Data Size: 20MB
Hardware configuration
● Hadoop: Cloudera CDH-5.6 with YARN (MapReduce v2) and Spark (1.5); worker nodes: 24; cores: 96 (4 cores per node); threads: 192; RAM: 768GB
● Oracle: AMD A8-4555M APU with Radeon HD Graphics, 1.60 GHz; 4 cores; 8GB RAM; 64-bit operating system
Average execution time of queries
● Hadoop: 31.85 seconds
● Oracle: 1 second
Hive Query execution inference
Executed Queries
● Simple SELECT queries
● Join
● Subqueries within FROM clause
● Union
● Intersection (subqueries within FROM clause)
● Aggregation with grouping
Failed Queries
● Update
● Delete
● Queries with WITH clause
● Subqueries within WHERE clause
Demo
● Oracle Explain
● Hive
Thank you!

SQL Optimizer vs Hive

  • 1. A STUDY OF SQL OPTIMIZER AND HIVE COEN 380 Project
  • 2. Project Group : 1 Bhide, Aishwarya Patnaik, Anita Sekar, Vishaka Balasubramanian Yoloye, Mose 2
  • 3. Project Goal ● Understand how SQL Optimizer works ● Generate query plans using Oracle Explain ● Understand the basic principles of Hive ● Execute queries on Hive ● Compare query execution using Oracle and Hive 3
  • 4. The SQL Optimizer ● Why do we need the optimizer? Select * from Books where author = ‘Ernest Hemingway’; Two ways to execute it – • Full table scan • Index on author Is there a difference? • 10 rows • 10 million rows 4
  • 5. The SQL Optimizer ● SQL is a declarative language ○ Query specifies what, the SQL engine decides how ○ How does understanding SQL optimizer help? 5
  • 6. Data Set Up 6Reference : Database System Concepts - Silberschatz,Korth,Sudarshan
  • 7. Data Set Up ● Queries ○ single relation ○ join ( 2-way and 3-way join) ○ aggregate function ○ Aggregates with grouping ○ Set function – Union, Except ○ Sub queries ○ Sub queries using with clause ○ Update and Delete 7
  • 8. Project Execution ● Set up Oracle database ● Generate query optimizer plan using Oracle Explain ● Set up tables and insert data in Hive ● Execute queries on Hive 8
  • 9. Oracle Query Plan Results - 1 Query using single relation SELECT title FROM course WHERE dept_name = 'Comp. Sci.' AND credits = 3; 9
  • 10. Oracle Query Plan Results - 1 Query using single relation SELECT title FROM course WHERE dept_name = 'Comp. Sci.' AND credits = 3; 10
  • 11. Oracle Query Plan Results - 2 Query using 2-way join SELECT DISTINCT ID FROM takes WHERE (takes.course_id , takes.sec_id, takes.semester, takes.year) IN (SELECT course_id, sec_id,semester, year FROM teaches NATURAL JOIN instructor WHERE name = 'Einstein'); 11
  • 12. Oracle Query Plan Results - 2 Query using 2-way join SELECT DISTINCT ID FROM takes WHERE (takes.course_id, takes.sec_id, takes.semester, takes.year) IN (SELECT course_id, sec_id, semester, year FROM teaches NATURAL JOIN instructor WHERE name = 'Einstein'); Relational Algebra Expression: [figures: plan based on the Oracle-generated query plan; self-created query plan]
  • 13. Oracle Query Plan Results - 3 Query using 3-way join SELECT name, title FROM (instructor NATURAL JOIN teaches) JOIN course USING (course_id);
  • 14. Oracle Query Plan Results - 3 Query using 3-way join SELECT name, title FROM (instructor NATURAL JOIN teaches) JOIN course USING (course_id); Relations: instructor(ID, name, dept_name, salary), teaches(ID, course_id, sec_id, semester, year), course(course_id, title, dept_name, credits). Relational Algebra Expression and Equivalent Expression: [figures: plan based on the Oracle-generated query plan; self-created query plan]
  • 15. Oracle Query Plan Results - 4 Query using aggregate function SELECT MAX(salary) FROM instructor; Relational Algebra Expression: [figure]
  • 16. Oracle Query Plan Results - 5 Query for aggregate with grouping SELECT COUNT(ID), course_id, sec_id FROM section NATURAL JOIN takes WHERE semester='Fall' AND year=2009 GROUP BY course_id, sec_id;
  • 17. Oracle Query Plan Results - 5 Query for aggregate with grouping SELECT COUNT(ID), course_id, sec_id FROM section NATURAL JOIN takes WHERE semester='Fall' AND year=2009 GROUP BY course_id, sec_id;
  • 18. Oracle Query Plan Results - 6 Query using union operation (SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009) UNION (SELECT course_id FROM section WHERE semester='Spring' AND year=2010);
  • 19. Oracle Query Plan Results - 6 Query using union operation (Expected plan) (SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009) UNION (SELECT course_id FROM section WHERE semester='Spring' AND year=2010);
  • 20. Oracle Query Plan Results - 6 Query using union operation (Oracle plan) (SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009) UNION (SELECT course_id FROM section WHERE semester='Spring' AND year=2010);
  • 21. Oracle Query Plan Results - 7 Query using except (intersect) operation (SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009) INTERSECT (SELECT course_id FROM section WHERE semester='Spring' AND year=2010);
  • 22. Oracle Query Plan Results - 7 Query using except (intersect) operation (Expected plan) (SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009) INTERSECT (SELECT course_id FROM section WHERE semester='Spring' AND year=2010);
  • 23. Oracle Query Plan Results - 7 Query using except (intersect) operation (Oracle plan) (SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009) INTERSECT (SELECT course_id FROM section WHERE semester='Spring' AND year=2010);
  • 24. Oracle Query Plan Results - 8 Query using a subquery SELECT name FROM instructor WHERE salary = (SELECT MAX(salary) FROM instructor);
  • 25. Oracle Query Plan Results - 8 Query using a subquery SELECT name FROM instructor WHERE salary = (SELECT MAX(salary) FROM instructor); [figures: expected plan; Oracle plan]
  • 26. Oracle Query Plan Results - 9 Query using subquery and rename operation SELECT MAX(enrollment), course_id FROM (SELECT COUNT(ID) AS enrollment, sec_id, course_id FROM takes WHERE year=2009 AND semester='Fall' GROUP BY sec_id, course_id) GROUP BY course_id;
  • 27. Query Plan # 9 - using subquery SELECT MAX(enrollment), course_id FROM (SELECT COUNT(ID) AS enrollment, sec_id, course_id FROM takes WHERE year=2009 AND semester='Fall' GROUP BY sec_id, course_id) GROUP BY course_id; Matches Oracle’s plan
  • 28. Oracle Query Plan Results - 10 Find the maximum enrollment across all sections in Fall 2009 WITH enrollment(course_id, sec_id, total) AS (SELECT course_id, sec_id, COUNT(ID) FROM section NATURAL JOIN takes WHERE semester='Fall' AND year=2009 GROUP BY course_id, sec_id) SELECT MAX(total) FROM enrollment;
  • 29. Query # 10 subquery and aggregation: SELECT COUNT(ID) AS id FROM section NATURAL JOIN takes WHERE semester='Fall' AND year=2009 GROUP BY course_id, sec_id; then SELECT MAX(id) over that result. Matches Oracle’s plan
  • 30. Oracle Query Plan Results - 11 Increase the salary of each instructor in the Comp. Sci. department by 10% UPDATE instructor SET salary = salary * 1.10 WHERE dept_name = 'Comp. Sci.';
  • 31. Query #11 update query: instructor ← ∏ ID, name, dept_name, salary*1.10 (σ dept_name = 'Comp. Sci.' (instructor)) ∪ σ dept_name <> 'Comp. Sci.' (instructor)
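The relational-algebra reading of the 10% raise (raised Comp. Sci. tuples unioned with the untouched tuples of every other department) can be checked against the UPDATE statement itself. This is a minimal sketch, assuming SQLite as a stand-in for Oracle and three sample rows used purely for illustration; it is not the authors' test setup.

```python
import sqlite3

# SQLite is a stand-in for Oracle; the three rows are illustrative sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000.0),
        ("22222", "Einstein", "Physics", 95000.0),
        ("83821", "Brandt", "Comp. Sci.", 92000.0),
    ],
)

# Algebra form: project the raised salary over the selected Comp. Sci. tuples,
# then union the result with the tuples of all other departments.
algebra_rows = sorted(
    conn.execute(
        """
        SELECT ID, name, dept_name, salary * 1.10
        FROM instructor WHERE dept_name = 'Comp. Sci.'
        UNION ALL
        SELECT ID, name, dept_name, salary
        FROM instructor WHERE dept_name <> 'Comp. Sci.'
        """
    ).fetchall()
)

# SQL form from the slide: update the table in place.
conn.execute(
    "UPDATE instructor SET salary = salary * 1.10 WHERE dept_name = 'Comp. Sci.'"
)
updated_rows = sorted(
    conn.execute("SELECT ID, name, dept_name, salary FROM instructor").fetchall()
)

print(algebra_rows == updated_rows)
```

Both forms multiply by 1.10 (the old salary plus 10%), and only the Comp. Sci. tuples change.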
  • 32. Oracle Query Plan Results - 12 Delete all courses that have never been offered DELETE FROM course WHERE course_id IN (SELECT course_id FROM course MINUS SELECT course_id FROM course NATURAL JOIN section);
  • 34. Oracle Optimizer - Summary ● The purpose of the Oracle Optimizer is to determine the most efficient execution plan for each query ● EXPLAIN PLAN is a convenient tool for inspecting the plan the optimizer chose ● The optimizer chooses a plan by evaluating four key elements of a query: cardinality, access methods, join methods, and join orders
  • 35. Hive ● Why Hive? ○ Rapidly increasing size of datasets (a 700 TB data set) ○ Warehouse built using an RDBMS failed to scale ○ Need for scalable analysis on large data sets ○ Hadoop was not easy for end users ○ Need for improved querying capability ○ Need for diverse applications and users
  • 36. Hive is NOT ○ A relational database ○ A design for OnLine Transaction Processing (OLTP) ○ A language for real-time queries and row-level updates
  • 37. Hive - Features ● Features of Hive ○ It stores the schema in a database and the processed data in HDFS. ○ It is designed for OLAP. ○ It provides an SQL-like query language called HiveQL or HQL. ○ It is familiar, fast, scalable, and extensible.
  • 38. HiveQL - Query Language ● Query Language (HiveQL) ○ Subset of SQL queries; an SQL-like language ○ Metadata browsing capabilities ○ Explain plan capabilities (naive rule-based optimizer) ○ Seamless plugging in of map-reduce programs, e.g., FROM (MAP doctext USING 'python wc_mapper.py' AS (word, cnt) FROM docs CLUSTER BY word) a REDUCE word, cnt USING 'python wc_reduce.py';
  • 39. Data Model and Query Language ● HiveQL - Limitations ○ No support for WHERE-clause subqueries (not in the initial version) ○ Only equality predicates supported for joins ○ Does not support inserting into an existing table (UPDATE, DELETE, or INSERT INTO are not supported) ● Why is this not a problem at FB? ○ Almost all queries can be expressed using equi-joins ○ Data is loaded in separate partitions ○ No complex locking protocol required
  • 40. Hive Query Execution ○ Parse the query ○ Type checking and semantic analysis ○ Optimization: performs a chain of transformations; walks the DAG, checks whether each rule's condition is fulfilled, and executes the rule
  • 41. Hive - Query Optimizer ● Query Optimizer - Transformations ○ Column pruning ○ Predicate pushdown ○ Partition pruning ○ Map-side joins: small tables are kept in every mapper's memory, which minimizes the cost of sorting and merging ○ Join reordering
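The "walk the plan, check the rule condition, execute the rule" loop can be illustrated with predicate pushdown. The following is a toy sketch and not Hive's implementation: the Scan, Join, and Filter node types and the push_down function are hypothetical names invented for this example, and the predicate is restricted to a single column = value test.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str
    columns: frozenset  # columns the table provides


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class Filter:
    column: str    # column read by the predicate
    value: object  # predicate is: column = value
    child: "Plan"


Plan = Union[Scan, Join, Filter]


def columns_of(plan):
    """Columns available in the output of a plan node."""
    if isinstance(plan, Scan):
        return plan.columns
    if isinstance(plan, Join):
        return columns_of(plan.left) | columns_of(plan.right)
    return columns_of(plan.child)


def push_down(plan):
    """Rewrite Filter(Join(L, R)) so the filter runs below the join."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, Join):
        return Join(push_down(plan.left), push_down(plan.right))
    child = plan.child
    if isinstance(child, Join):
        # Rule condition: one join input alone supplies the predicate's column.
        if plan.column in columns_of(child.left):
            pushed = Filter(plan.column, plan.value, child.left)
            return Join(push_down(pushed), push_down(child.right))
        if plan.column in columns_of(child.right):
            pushed = Filter(plan.column, plan.value, child.right)
            return Join(push_down(child.left), push_down(pushed))
    return Filter(plan.column, plan.value, push_down(child))


teaches = Scan("teaches", frozenset({"ID", "course_id", "sec_id", "semester", "year"}))
instructor = Scan("instructor", frozenset({"ID", "name", "dept_name", "salary"}))

# name = 'Einstein' applied after joining teaches with instructor ...
original = Filter("name", "Einstein", Join(teaches, instructor))
# ... becomes a filter on instructor alone, so fewer rows reach the join.
optimized = push_down(original)
print(optimized)
```

Filtering before the join is the point of the transformation: the join then processes only the instructor rows that can contribute to the result.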
  • 42. Hive: Comparison with RDBMS ● Hive ○ Designed for analytics performed on static data ○ Lacks record-level update/delete functionality ○ Write once, read many times ○ Processes massive amounts of data ○ Supports a subset of SQL queries ● RDBMS ○ Designed for transaction processing and analytics on dynamic data ○ Supports record-level update/delete ○ Read and write many times
  • 43. Hive Query Execution Results (Simple Select Query) SELECT title FROM course WHERE dept_name = 'Comp. Sci.' AND credits = 3
  • 44. Hive Query Execution Results (subquery in FROM clause)
  • 45. Hive Query Execution Results (Aggregation & Join)
  • 46. Hive Query Execution Results (subquery) SELECT name,salary FROM instructor i WHERE salary = (SELECT MAX(salary) FROM instructor)
  • 47. Hive Query Execution Inference ● Queries which include subqueries in the WHERE or HAVING clause, e.g., SELECT t.sec_id, t.course_id FROM takes t WHERE t.year=2009 AND t.semester='Fall' HAVING COUNT(t.ID) IN (SELECT MAX(enrollment) FROM (SELECT COUNT(tin.ID) AS enrollment, tin.sec_id, tin.course_id FROM takes tin WHERE tin.year=2009 AND tin.semester='Fall' GROUP BY tin.sec_id, tin.course_id)) ● Queries which include subqueries in the FROM clause, e.g., SELECT MAX(enrollment), s.course_id FROM (SELECT COUNT(t.ID) AS enrollment, t.sec_id, t.course_id FROM takes t WHERE t.year=2009 AND t.semester='Fall' GROUP BY t.sec_id, t.course_id) s GROUP BY s.course_id
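Because a subquery in the FROM clause and an equi-join are both available, a scalar WHERE-clause subquery such as the maximum-salary query can often be restated in that form. The sketch below assumes SQLite as a stand-in (it was not run on Hive) and uses three illustrative instructor rows; the alias names i, m, and max_salary are hypothetical.

```python
import sqlite3

# SQLite is a stand-in; this only checks that the two formulations agree.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000.0),
        ("22222", "Einstein", "Physics", 95000.0),
        ("83821", "Brandt", "Comp. Sci.", 92000.0),
    ],
)

# Original form: scalar subquery in the WHERE clause.
where_form = """
    SELECT name, salary FROM instructor
    WHERE salary = (SELECT MAX(salary) FROM instructor)
"""

# Rewritten form: the aggregate moves into a FROM-clause subquery
# and is matched with an equality (equi-join) predicate.
from_form = """
    SELECT i.name, i.salary
    FROM instructor i
    JOIN (SELECT MAX(salary) AS max_salary FROM instructor) m
      ON i.salary = m.max_salary
"""

where_rows = sorted(conn.execute(where_form).fetchall())
from_rows = sorted(conn.execute(from_form).fetchall())
print(where_rows, from_rows)
```

The rewrite uses only constructs that the slides list as supported: a subquery in FROM and a join on an equality predicate.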
  • 48. Hive - Use cases ● Hive should be used for analytical querying of data collected over a period of time, for instance, to calculate trends or analyze website logs. ● Hive should not be used for real-time querying. ● It provides data warehousing facilities on top of an existing Hadoop cluster, along with an SQL-like interface that makes the work easier. ● Tables can be created in Hive to store data there; existing HBase tables can also be mapped to Hive and operated on.
  • 49. Hive Query execution inference ● Data size: 20 MB ● HADOOP ○ Hardware configuration: Cloudera CDH-5.6 with YARN (MapReduce v2) and Spark (1.5); 24 worker nodes; 96 cores (4 cores per node); 192 threads; 768 GB RAM ○ Average execution time of queries: 31.85 seconds ● ORACLE ○ Hardware configuration: AMD A8-4555M APU with Radeon HD Graphics, 1.60 GHz; 4 cores; 8 GB RAM; 64-bit operating system ○ Average execution time of queries: 1 second
  • 50. Hive Query execution inference ● Executed Queries ○ Simple SELECT queries ○ Join ○ Subqueries within the FROM clause ○ Union ○ Intersection (subqueries within the FROM clause) ○ Aggregation with grouping ● Failed Queries ○ Update ○ Delete ○ Queries with the WITH clause ○ Subqueries within the WHERE clause
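The slides do not show how the intersection was expressed with subqueries in the FROM clause. One plausible formulation, sketched here under that assumption, joins the two branches on course_id and removes duplicates. SQLite serves as a stand-in, and the section rows are illustrative sample data.

```python
import sqlite3

# SQLite is a stand-in; the section rows are illustrative sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE section (course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?, ?)",
    [
        ("CS-101", "1", "Fall", 2009),
        ("CS-101", "1", "Spring", 2010),
        ("CS-101", "2", "Spring", 2010),
        ("CS-347", "1", "Fall", 2009),
        ("FIN-201", "1", "Spring", 2010),
    ],
)

# Set-operator form, as used in the Oracle runs.
intersect_sql = """
    SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009
    INTERSECT
    SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2010
"""

# Join form: each branch becomes a FROM-clause subquery, matched by an
# equi-join; DISTINCT restores the duplicate elimination of INTERSECT.
join_sql = """
    SELECT DISTINCT f.course_id
    FROM (SELECT course_id FROM section
          WHERE semester = 'Fall' AND year = 2009) f
    JOIN (SELECT course_id FROM section
          WHERE semester = 'Spring' AND year = 2010) s
      ON f.course_id = s.course_id
"""

intersect_rows = sorted(conn.execute(intersect_sql).fetchall())
join_rows = sorted(conn.execute(join_sql).fetchall())
print(intersect_rows, join_rows)
```

The two forms agree here because course_id is never NULL; INTERSECT treats NULLs as equal while an equality join does not, so the rewrite is not a drop-in replacement for nullable columns.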