HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI SQL Capabilities for Apache HBase

Presented by: Jacques Nadeau, MapR

Transcript of "HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI SQL Capabilities for Apache HBase"

  1. Apache Drill: YASOH
     yet another sql on h(base|adoop)
     Jacques Nadeau, HBaseCon, June 13, 2013
     jacques@apache.org | @intjesus
  2. Me
     • Software Architect @ MapR, leading our Apache Drill contributions
     • Previously:
       – Led development of a distributed search engine at YapMap
       – Led the R&D team at contextual advertising company Quigo, sold to AOL
       – Built big data warehousing and analytical reporting products at aQuantive, sold to Microsoft
  3. Apache Drill
     • Apache Incubating Project
     • Interactive analysis of large-scale datasets
       – Inspired by Google Dremel
     • MapReduce's greatest strength is also an Achilles heel for high-performance queries
       – Pessimistic execution is great for long-running jobs
       – Optimistic execution is better for shorter jobs
       – Hive solves many needs, but its organic growth and dependence on MapReduce make it hard to bring forward
       – Tez is a new project that tries to bring Hive a new execution model
     • Not done: alpha next month
  4. Basic Process
     [Diagram: ZooKeeper; three nodes, each with DFS/HBase, a Drillbit, and a Distributed Cache; an incoming Query]
     1. Query comes to any Drillbit (JDBC, ODBC, CLI, protobuf)
     2. Drillbit generates execution plan based on query optimization & locality
     3. Fragments are farmed to individual nodes
     4. Data is returned to driving node
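The four steps above can be sketched as a small single-process simulation. This is only an illustration of the flow, not Drill's planner or RPC layer: the node names, block layout, and the functions `plan_fragments`, `run_fragment`, and `execute_query` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cluster layout: which node stores which data block.
BLOCK_LOCATIONS = {
    "block-0": "node-a",
    "block-1": "node-b",
    "block-2": "node-c",
    "block-3": "node-a",
}

# Hypothetical block contents: rows of (month, year).
BLOCK_DATA = {
    "block-0": [(1, 2012), (2, 2013)],
    "block-1": [(3, 2013)],
    "block-2": [(4, 2011)],
    "block-3": [(5, 2013)],
}


def plan_fragments(blocks, locations):
    """Step 2: group blocks by the node that stores them (locality)."""
    fragments = defaultdict(list)
    for block in blocks:
        fragments[locations[block]].append(block)
    return dict(fragments)


def run_fragment(blocks, predicate):
    """Step 3: the work one node does on its local blocks."""
    return [row for block in blocks for row in BLOCK_DATA[block] if predicate(row)]


def execute_query(predicate):
    """Steps 1-4 as seen from the driving node that received the query."""
    fragments = plan_fragments(sorted(BLOCK_DATA), BLOCK_LOCATIONS)
    results = []
    # A real system would run fragments in parallel on remote nodes;
    # this sketch runs them one after another in the same process.
    for node in sorted(fragments):
        results.extend(run_fragment(fragments[node], predicate))
    return results  # Step 4: data returned to the driving node
```

Calling `execute_query(lambda row: row[1] == 2013)` collects the matching rows from every node's fragment and returns them through the driving node.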
  5. Core Modules within a Drillbit
     [Diagram labels: SQL Parser, Optimizer, PhysicalPlan, DFS Engine, HBase Engine, RPC Endpoint, Distributed Cache, StorageEngineInterface, LogicalPlan, Execution]
  6. SQL Options for HBase
     [Table; columns in order: Drill | Phoenix | Impala | Hive+Tez. Rows with four values follow that order. Rows with fewer ✓ marks do not record which columns they belong to.]
     Overall
       Status: Alpha | 1.2 | 1.0 | Alpha
       Typical Shortest Query: 100ms | 10ms | 100ms | ??
       Query HBase: ✓ | ✓ | ✓ | ✓
       Query Any SerDe: ✓ ✓
       Hive UDF support: ✓ ✓
       Contribution/Dev Model: Apache | GitHub | MySQL | Apache
       Execution programming language: Java | Java | C++ | Java
     Query language
       Supports Write: ✓ ✓ ✓
       Query Language: SQL2003 | SQL92 | ~HiveQL | HiveQL
     Data
       Supports data without schema: ✓
       Nested Relational Operators: ✓
       Internal sort & join: ✓ ✓ ✓
       External Sort/Join/Aggregation: ✓ ✓
     Execution
       Code Generation: ✓ ✓
       Columnar Execution: ✓
       Vectorized Operators: ✓ ✓
  7. What’s different about Drill
     • Late-bind schema doesn’t require metastore definitions
         SELECT cf1.month, cf1.year FROM hbase.table1
     • Nested data as a first-class entity: extensions to SQL for nested data types, similar to BigQuery (four-value semantics)
         SELECT c.name, c.address, COUNT(c.children)
         FROM (
           SELECT CONVERT_FROM(cf1.user-json-blob, JSON) AS c
           FROM hbase.table1
         )
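A rough sketch of what late binding means for the second query: the bytes stored in HBase are decoded into a nested record only when the query runs, and no schema is declared beforehand. This is plain Python, not Drill; the row layout, `convert_from_json`, and `query` are hypothetical, and counting children per record is just one reading of the slide's `COUNT(c.children)`.

```python
import json

# Hypothetical HBase-like rows: column family "cf1" maps qualifiers to raw bytes.
ROWS = [
    {"cf1": {"user-json-blob": b'{"name": "Ann", "address": "1 Main St", "children": ["Bo", "Cy"]}'}},
    {"cf1": {"user-json-blob": b'{"name": "Dee", "address": "9 Elm Rd"}'}},
]


def convert_from_json(raw):
    """Decode stored bytes into a nested record at read time (no schema declared up front)."""
    return json.loads(raw.decode("utf-8"))


def query(rows):
    """Project name, address, and a per-record child count from each decoded blob."""
    out = []
    for row in rows:
        c = convert_from_json(row["cf1"]["user-json-blob"])
        # A field missing from a record yields None (or a count of 0) instead of an error.
        out.append((c.get("name"), c.get("address"), len(c.get("children", []))))
    return out
```

The second row has no `children` field at all, which a fixed table definition would have to anticipate; here it simply produces a count of 0.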
  8. What’s different about Drill, cont’d
     • Community-driven Apache development process and peace of mind
     • Leverages recent research approaches
       – Late record materialization
       – Vectorized operators
     • Extensibility
       – Supports Hive UDFs/SerDes
       – Well-defined storage engine and operator interfaces
       – Logical and physical plan API layers for optimization and extension
       – Targeting Phoenix support
     • Works like other things in the Hadoop ecosystem
       – Apache development process & Java codebase
  9. Drill + HBase Roadmap
     • Native support for Orderly complex keys
       – Orderly encodes a compound field (including null support) as a single, sortable byte value
     • Drill on top of Phoenix to leverage great Coprocessor work
     • Optimized HBase join leveraging Bloom filters
     • Memory-mapped RegionServer <> Drillbit communication
     • Expression evaluation bytecode pushdown
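To make "a single, sortable byte value" concrete, here is a minimal sketch of an order-preserving encoding for a compound key with nullable fields. It is not Orderly's actual wire format: the tag bytes, the string escaping scheme, and the function names are assumptions chosen for illustration. The property it demonstrates is that comparing the encoded bytes gives the same order as comparing the fields one by one, with nulls first.

```python
import struct

# Assumed one-byte markers; a null field sorts before any present value.
NULL, PRESENT = b"\x00", b"\x01"


def encode_int(value):
    """Signed 64-bit int as 8 big-endian bytes; adding 2**63 makes negatives sort first."""
    return struct.pack(">Q", value + (1 << 63))


def encode_str(value):
    """UTF-8 bytes with 0x00 escaped as 0x00 0xFF, ended by 0x00 0x00 so a prefix sorts first."""
    return value.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00\x00"


def encode_key(fields):
    """Concatenate the per-field encodings into one byte string usable as a row key."""
    out = bytearray()
    for field in fields:
        if field is None:
            out += NULL
        elif isinstance(field, int):
            out += PRESENT + encode_int(field)
        else:
            out += PRESENT + encode_str(field)
    return bytes(out)
```

Because HBase orders rows by raw key bytes, keys encoded this way would scan in logical order, for example `("a", None)` before `("a", -5)` before `("a", 3)` before `("ab", 0)`.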
  10. Other Interesting Things
      • Drill keeps data off-heap to avoid garbage collection problems
        – Metadata stays on heap
        – Utilizes Netty’s arena-based NativeByteBuffer pooling and ByteBuf abstraction
        – RPC engine specifically designed to avoid extra memory copies
        – In-memory representation is documented, allowing native operators as required
      • Code is compiled at a record batch level, avoiding record-level function call overhead
        – Janino + ASM for code compilation
        – Recompiled for each schema change
      • Record batches are maintained in columnar format and leverage a selection vector execution method to speed query performance
        – Minimizes branches and instruction complexity
        – Maximizes cache locality
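The selection vector idea in the last bullet can be sketched as follows. This is a toy model in Python rather than Drill's off-heap implementation; `RecordBatch`, `filter_batch`, and `sum_column` are hypothetical names. A filter records which row positions survive instead of copying rows, and later operators read the untouched column arrays through that list of positions.

```python
from array import array


class RecordBatch:
    """Hypothetical columnar batch: one typed array per column plus an optional selection vector."""

    def __init__(self, columns):
        self.columns = columns
        self.selection = None  # None means every row is selected

    def selected_indices(self):
        row_count = len(next(iter(self.columns.values())))
        return range(row_count) if self.selection is None else self.selection


def filter_batch(batch, column, predicate):
    """Store the positions of rows that pass; the column data itself is not copied."""
    values = batch.columns[column]
    batch.selection = array(
        "i", (i for i in batch.selected_indices() if predicate(values[i]))
    )
    return batch


def sum_column(batch, column):
    """Downstream operator: reads a column only at the selected positions."""
    values = batch.columns[column]
    return sum(values[i] for i in batch.selected_indices())
```

For a batch with `year = [2011, 2013, 2012, 2013]`, filtering on `year == 2013` leaves the columns at full length and sets the selection vector to `[1, 3]`; a following `sum_column` touches only those two positions.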
  11. Thanks!
      • Join the Community
        – Join the mailing lists: drill-dev-subscribe@incubator.apache.org, drill-user-subscribe@incubator.apache.org
        – Fork us on GitHub: http://github.com/apache/incubator-drill/
        – Create a JIRA: https://issues.apache.org/jira/browse/DRILL
      • Join the Drill team at MapR Technologies
      • Let us know what you think on the Drill mailing lists
      • Shout out to supporting projects
        – Jackson, Typesafe HOCON, Netty4, Protobuf, Vanilla Java, Larray, Hazelcast, Curator, Optiq, Hive ORC, Parquet, Janino, ASM, Yammer Metrics, Guava, Carrot HPPC
