Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
1© Copyright 2015 Pivotal. All rights reserved. 1© Copyright 2015 Pivotal. All rights reserved.
GPORCA: Query Optimization...
2© Copyright 2015 Pivotal. All rights reserved.
• Motivation
• Introduction to GPORCA (a.k.a Orca)
• How to add an Orca fe...
3© Copyright 2015 Pivotal. All rights reserved.
Why a New Optimizer?
• Query optimization is key to performance
• Legacy P...
4© Copyright 2015 Pivotal. All rights reserved.
Legacy Query Planner
• Technology of the 90’s
• Addresses Join re-ordering...
5© Copyright 2015 Pivotal. All rights reserved.
Join Ordering vs. “Everything Else”
• TPC-H Query 5
– 6 Tables
– “Harmless...
6© Copyright 2015 Pivotal. All rights reserved.
What Is GP-Orca?
• State-of-the-art query optimization framework designed ...
7© Copyright 2015 Pivotal. All rights reserved.
Modularity
• Orca is not baked into one host system
Orca
Parser
Host Syste...
8© Copyright 2015 Pivotal. All rights reserved.
Key Features
• Smarter partition elimination
• Subquery unnesting
• Common...
9© Copyright 2015 Pivotal. All rights reserved.
Not Yet Feature Complete
• Improve Performance for Short Running Queries
•...
10© Copyright 2015 Pivotal. All rights reserved.
Currently in Apache HAWQ
Parser Executor
Orca
PlannerSQL Results
Query Pl...
11© Copyright 2015 Pivotal. All rights reserved.
When Orca is exercised
Parser Executor
Orca
Planner
Query Plan
SQL Results
12© Copyright 2015 Pivotal. All rights reserved.
When Orca fallbacks
Parser Executor
Orca
Planner
Query
SQL Results
Fallba...
13© Copyright 2015 Pivotal. All rights reserved. 13© Copyright 2015 Pivotal. All rights reserved. 13© Copyright 2015 Pivot...
14© Copyright 2015 Pivotal. All rights reserved.
• A query that is nested inside an outer query block
• Correlated Subquer...
15© Copyright 2015 Pivotal. All rights reserved.
Subqueries: Impact
• Heavily used in many workloads
– BI/Reporting tools ...
16© Copyright 2015 Pivotal. All rights reserved.
Subqueries in Disjunctive Filters
• Find parts with: size > 40 OR price >...
17© Copyright 2015 Pivotal. All rights reserved.
Subqueries in Disjunctive Filters
18© Copyright 2015 Pivotal. All rights reserved.
Subquery Handling: Orca vs. Planner
CSQ Class Planner Orca
CSQ in select ...
19© Copyright 2015 Pivotal. All rights reserved.
TPC-DS Orca vs. Planner
TPC-DS 10TB, 16 nodes, 48 GB/node
20© Copyright 2015 Pivotal. All rights reserved. 20© Copyright 2015 Pivotal. All rights reserved. 20© Copyright 2015 Pivot...
21© Copyright 2015 Pivotal. All rights reserved.
• Step 1: Pre-Process Input Logical Expression
– Apply heuristics like pu...
22© Copyright 2015 Pivotal. All rights reserved.
• Split an aggregate into a pair of local and global aggregate.
• Schema:...
23© Copyright 2015 Pivotal. All rights reserved.
// HEADER FILES
~/orca/libgpopt/include/gpopt/xforms
// SOURCE FILES
~/or...
24© Copyright 2015 Pivotal. All rights reserved.
• Pattern
• Pre-Condition Check
Define What Will Trigger This Transformat...
25© Copyright 2015 Pivotal. All rights reserved.
Pattern
GPOS_NEW(pmp)
CExpression
(
pmp,
// logical aggregate operator
GP...
26© Copyright 2015 Pivotal. All rights reserved.
// Compatibility function for splitting aggregates
virtual
BOOL FCompatib...
27© Copyright 2015 Pivotal. All rights reserved.
void Transform
(
CXformContext *pxfctxt,
CXformResult *pxfres,
CExpressio...
28© Copyright 2015 Pivotal. All rights reserved.
void CXformFactory::Instantiate()
{
….
Add(GPOS_NEW(m_pmp) CXformSplitGbA...
29© Copyright 2015 Pivotal. All rights reserved. 29© Copyright 2015 Pivotal. All rights reserved. 29© Copyright 2015 Pivot...
30© Copyright 2015 Pivotal. All rights reserved.
• GPORCA https://github.com/greenplum-db/gporca
• White Paper: bit.ly/1nt...
31© Copyright 2015 Pivotal. All rights reserved.
• Step 1: Get Centos07 Docker Image
https://github.com/xinzweb/hawq-devel...
32© Copyright 2015 Pivotal. All rights reserved.
• Step 4: Apply my changes to Apache HAWQ Makefile
https://github.com/vra...
33© Copyright 2015 Pivotal. All rights reserved.
Publications
• Optimization of Common Table Expressions in MPP Database S...
Upcoming SlideShare
Loading in …5
×

GPORCA: Query Optimization as a Service

745 views

Published on

GPORCA is newly open source advanced query optimizer that is a subproject of Greenplum Database open source project. GPORCA is the query optimizer used in commercial distributions of both Greenplum and HAWQ. In these distributions GPORCA has achieved 1000x performance improvement across TPC-DS queries by focusing on three distinct areas: Dynamic Partition Elimination, SubQuery Unnesting, and Common Table Expression.

Now that GPORCA is open source, we are looking for collaborators to help us realize the ultimate dream for GPORCA - to work with any database.

The new breed of data management systems in Big Data have to process so much data that optimization mistakes are magnified in traditional optimizers. Furthermore, coding and manual optimization of complex queries has proven to be hard.

In this session, Venkatesh will discuss:

- Overview of GPORCA
- How to add GPORCA to HAWQ with a build option
- How GPORCA could be made to work with any database
- Future vision for GPORCA and more immediate plans
- How to work with GPORCA, and how to contribute to GPORCA

Published in: Data & Analytics
  • Be the first to comment

GPORCA: Query Optimization as a Service

  1. 1. 1© Copyright 2015 Pivotal. All rights reserved. 1© Copyright 2015 Pivotal. All rights reserved. GPORCA: Query Optimization as a Service Venkatesh Raghavan Apache HAWQ Nest
  2. 2. 2© Copyright 2015 Pivotal. All rights reserved. • Motivation • Introduction to GPORCA (a.k.a Orca) • How to add an Orca feature? • How to enable Orca to Apache HAWQ? Outline
  3. 3. 3© Copyright 2015 Pivotal. All rights reserved. Why a New Optimizer? • Query optimization is key to performance • Legacy Planner was not initially designed with distributed data processing in mind • Average time to fix customer issues: – Legacy Optimizer (Planner) ~ 70 days – Pivotal Query Optimizer (Orca) ~ 13 days
  4. 4. 4© Copyright 2015 Pivotal. All rights reserved. Legacy Query Planner • Technology of the 90’s • Addresses Join re-ordering • Treats everything else as add-on (grouping, with clause, etc.) • Imposes order on specific optimization steps • Recursively descends into sub-queries – Cannot Unnest Complex Correlated Sub Queries • High Code Complexity – Maximum: 102 (Orca 8.5) – Minimum: 6.4 (Orca 1.5)
  5. 5. 5© Copyright 2015 Pivotal. All rights reserved. Join Ordering vs. “Everything Else” • TPC-H Query 5 – 6 Tables – “Harmless” query “Everything Else” Size of search space ~230,000,000 Join Order Problem Size of search space < 100,000
  6. 6. 6© Copyright 2015 Pivotal. All rights reserved. What Is GP-Orca? • State-of-the-art query optimization framework designed from scratch to – Improve – performance, ease-of-use – Enable – foundation for future research and development – Connect – applies to multiple host systems (GPDB and HAWQ)
  7. 7. 7© Copyright 2015 Pivotal. All rights reserved. Modularity • Orca is not baked into one host system Orca Parser Host System < /> SQL Q2DXL DXL Query MD Provider MDreq. Catalog Executor DXL MD DXL Plan DXL2Plan Results
  8. 8. 8© Copyright 2015 Pivotal. All rights reserved. Key Features • Smarter partition elimination • Subquery unnesting • Common table expressions (CTE) • Additional Functionality – Improved join ordering – Join-Aggregate reordering – Sort order optimization – Skew aware
  9. 9. 9© Copyright 2015 Pivotal. All rights reserved. Not Yet Feature Complete • Improve Performance for Short Running Queries • External parameters • Cubes • Multiple grouping sets • Inverse distribution functions • Ordered aggregates • Catalog Queries
  10. 10. 10© Copyright 2015 Pivotal. All rights reserved. Currently in Apache HAWQ Parser Executor Orca PlannerSQL Results Query Plan
  11. 11. 11© Copyright 2015 Pivotal. All rights reserved. When Orca is exercised Parser Executor Orca Planner Query Plan SQL Results
  12. 12. 12© Copyright 2015 Pivotal. All rights reserved. When Orca fallbacks Parser Executor Orca Planner Query SQL Results Fallback Plan Orca Orca will automatically fallback to the legacy optimizer for unsupported features
  13. 13. 13© Copyright 2015 Pivotal. All rights reserved. 13© Copyright 2015 Pivotal. All rights reserved. 13© Copyright 2015 Pivotal. All rights reserved. Subquery Unnesting
  14. 14. 14© Copyright 2015 Pivotal. All rights reserved. • A query that is nested inside an outer query block • Correlated Subquery (CSQ) is a subquery that uses values from the outer query Subqueries: Definition SELECT * FROM part p1 WHERE price > (SELECT avg(price) FROM part p2 WHERE p2.brand = p1.brand)
  15. 15. 15© Copyright 2015 Pivotal. All rights reserved. Subqueries: Impact • Heavily used in many workloads – BI/Reporting tools generate substantial number of subqueries – TPC-H workload: 40% of the 22 queries – TPC-DS workload: 20% of the 111 queries • Inefficient plans means query takes a long time or does not terminate • Optimizations – De-correlation – Conversion of subqueries to joins
  16. 16. 16© Copyright 2015 Pivotal. All rights reserved. Subqueries in Disjunctive Filters • Find parts with: size > 40 OR price > the average brand price SELECT * FROM part p1 WHERE p_size > 40 OR p_retailprice >(SELECT avg(p_retailprice) FROM part p2 WHERE p2.p_brand = p1.p_brand)
  17. 17. 17© Copyright 2015 Pivotal. All rights reserved. Subqueries in Disjunctive Filters
  18. 18. 18© Copyright 2015 Pivotal. All rights reserved. Subquery Handling: Orca vs. Planner CSQ Class Planner Orca CSQ in select list Correlated Execution Join CSQ in disjunctive filter Correlated Execution Join Multi-Level CSQ No Plan Join CSQ with group by and inequality Correlated Execution Join CSQ must return one row Correlated Execution Join CSQ with correlation in select list Correlated Execution Correlated Execution
  19. 19. 19© Copyright 2015 Pivotal. All rights reserved. TPC-DS Orca vs. Planner TPC-DS 10TB, 16 nodes, 48 GB/node
  20. 20. 20© Copyright 2015 Pivotal. All rights reserved. 20© Copyright 2015 Pivotal. All rights reserved. 20© Copyright 2015 Pivotal. All rights reserved. How to add an Orca feature?
  21. 21. 21© Copyright 2015 Pivotal. All rights reserved. • Step 1: Pre-Process Input Logical Expression – Apply heuristics like pushing selects down etc. • Step 2: Exploration (via Transforms) – Generate all equivalent logical plans • Step 3: Implementation (via Transforms) – Generate all physical implementation for all logical operators • Step 4: Optimization – Enforce distribution and ordering requirements and pick the cheapest plan Optimization Steps in Orca
  22. 22. 22© Copyright 2015 Pivotal. All rights reserved. • Split an aggregate into a pair of local and global aggregate. • Schema: CREATE TABLE foo (a int, b int, c int) distributed by (a); • Query: SELECT sum(c) FROM foo GROUP BY b • Do local aggregation and then send the partial aggregation to the master. • The final aggregation can then be done on the master. Let’s Pair
  23. 23. 23© Copyright 2015 Pivotal. All rights reserved. // HEADER FILES ~/orca/libgpopt/include/gpopt/xforms // SOURCE FILES ~/orca/libgpopt/src/xforms CXformSplitGbAgg
  24. 24. 24© Copyright 2015 Pivotal. All rights reserved. • Pattern • Pre-Condition Check Define What Will Trigger This Transformation
  25. 25. 25© Copyright 2015 Pivotal. All rights reserved. Pattern GPOS_NEW(pmp) CExpression ( pmp, // logical aggregate operator GPOS_NEW(pmp) CLogicalGbAgg(pmp), // relational child GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternLeaf(pmp)), // scalar project list GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternTree(pmp)) ));
  26. 26. 26© Copyright 2015 Pivotal. All rights reserved. // Compatibility function for splitting aggregates virtual BOOL FCompatible(CXform::EXformId exfid) { return (CXform::ExfSplitGbAgg != exfid); } Pre-Condition Check Common sense rules such as do not fire this rule on a logical operator that was a result of the same rule. (Avoid Infinite Recurssion)
  27. 27. 27© Copyright 2015 Pivotal. All rights reserved. void Transform ( CXformContext *pxfctxt, CXformResult *pxfres, CExpression *pexpr ) const; The Actual Transformation
  28. 28. 28© Copyright 2015 Pivotal. All rights reserved. void CXformFactory::Instantiate() { …. Add(GPOS_NEW(m_pmp) CXformSplitGbAgg(m_pmp)); …. } Register Transformation
  29. 29. 29© Copyright 2015 Pivotal. All rights reserved. 29© Copyright 2015 Pivotal. All rights reserved. 29© Copyright 2015 Pivotal. All rights reserved. How to enable Orca on HAWQ?
  30. 30. 30© Copyright 2015 Pivotal. All rights reserved. • GPORCA https://github.com/greenplum-db/gporca • White Paper: bit.ly/1ntrE8v • Pivotal Tracker: bit.ly/1m1WGDn Get Your Hands On It!
  31. 31. 31© Copyright 2015 Pivotal. All rights reserved. • Step 1: Get Centos07 Docker Image https://github.com/xinzweb/hawq-devel-env/blob/master/README.md • Step 2: Update CMake to 3.4.3 • Step 3: Install GPORCA – Install Xerces 3.1.2 – Install GPOS Getting Orca on Apache HAWQ
  32. 32. 32© Copyright 2015 Pivotal. All rights reserved. • Step 4: Apply my changes to Apache HAWQ Makefile https://github.com/vraghavan78/incubator-hawq/tree/fix-enable-orca • Step 5: Compile HAWQ with orca enabled ./configure --enable-orca --with-perl --with-python --with-libxml • Step 6: May have to copy libraries to the different segments For Example: scp -r . gpadmin@centos7-datanode1:/usr/local/lib Getting Orca on Apache HAWQ
  33. 33. 33© Copyright 2015 Pivotal. All rights reserved. Publications • Optimization of Common Table Expressions in MPP Database Systems, VLDB 2015 – Amr El-Helw, Venkatesh Raghavan, Mohamed A. Soliman, George C. Caragea, Zhongxian Gu, Michalis Petropoulos. • Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014 – Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia- Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, Rhonda Baldwin • Optimizing Queries over Partitioned Tables in MPP Systems, SIGMOD 2014 – Lyublena Antova, Amr El-Helw, Mohamed Soliman, Zhongxian Gu, Michalis Petropoulos, Florian Waas • Reversing Statistics for Scalable Test Databases Generation, DBTest 2013 – Entong Shen, Lyublena Antova • Total Operator State Recall - Cost-Effective Reuse of Results in Greenplum Database, ICDE Workshops 2013 – George C. Caragea, Carlos Garcia-Alvarado, Michalis Petropoulos, Florian M. Waas • Testing the Accuracy of Query Optimizers, DBTest 2012 – Zhongxian Gu, Mohamed A. Soliman, Florian M. Waas – Automatic Capture of Minimal, Portable, and Executable Bug Repros using AMPERe, DBTest 2012 – Lyublena Antova, Konstantinos Krikellas, Florian M. Waas – Automatic Data Placement in MPP Databases, ICDE Workshops 2012 – Carlos Garcia-Alvarado, Venkatesh Raghavan, Sivaramakrishnan Narayanan, Florian M. Waas

×