Introduction to Apache Drill - NYC Apache Drill Meetup

Ted Dunning's presentation to the NYC Apache Drill Meetup.

Published in: Software
  1. ® © 2014 MapR Technologies
     Self Service Data Exploration with Apache Drill
     {
       Author: { "name": "Aditya Kishore", "github": "adityakishore", "twitter": "@adiore" }
       Presenter: { "name": "Ted Dunning", "github": "tdunning", "twitter": "@ted_dunning" }
     }
  2. Data is doubling in size every two years
  3. By 2020, the world is estimated to hold 44 zettabytes of data.
     [Figure: data volume by year. 2011: 1.8 zettabytes; 2013: 4.4 zettabytes; 2020: 44 zettabytes*]
     * Equivalent of roughly 700 billion 64GB iPhones (44 ZB / 64 GB ≈ 6.9 × 10^11)
     Source: IDC Digital Universe
  4. Unstructured data will account for more than 80% of the data collected by organizations.
     [Figure: total data stored, 1980 to 2020, split into structured and unstructured data]
     Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
  5. Evolving distance to data
     Existing approaches require a middleman (IT).
     •  Map/Reduce: Business (analysts, developers) -> "Plumbing" development -> Data
     •  Traditional SQL-on-Hadoop: Business (analysts, developers) -> Modeling and transformations -> Data
     •  New SQL-on-Hadoop: Business (analysts, developers) -> Data
  6. SQL in a NoSchema World
     WANT
     •  SQL
     •  BI (Tableau, MicroStrategy, etc.)
     •  Low latency
     •  Scalability
     DON’T WANT
     •  Create and maintain schemas on:
        –  HDFS (Parquet, JSON, etc.)
        –  HBase
        –  MongoDB
     •  Transform or copy data
  7. APACHE DRILL
     •  Schema-free scale-out query engine for Hadoop and NoSQL
     •  Low latency
     •  Extreme ease of use
     •  Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
  8. Drill’s Data Model is Flexible
     [Figure: data formats placed on two axes, fixed schema to schema-less and flat to complex: HBase, JSON, BSON, CSV, TSV, Parquet, Avro]
     RDBMS/SQL-on-Hadoop table:
       Name      Gender  Age
       Michael   M       6
       Jennifer  F       3
     Apache Drill table:
       {
         name: { first: Michael, last: Smith },
         hobbies: [ski, soccer],
         district: Los Altos
       }
       {
         name: { first: Jennifer, last: Gates },
         hobbies: [sing],
         preschool: CCLC
       }
  9. Running Drill takes 10 minutes
     DOWNLOAD
       https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
     EXTRACT
       $ tar xf apache-drill-0.7.0.tar.gz
       $ cd apache-drill-0.7.0
     RUN
       $ bin/sqlline -u jdbc:drill:zk=local
     QUERY (step by step, in SQL format)
       > SELECT full_name, position_title, salary
         FROM cp.`employee.json`
         LIMIT 1;
       +--------------+----------------+---------+
       | full_name    | position_title | salary  |
       +--------------+----------------+---------+
       | Sheri Nowmer | President      | 80000.0 |
       +--------------+----------------+---------+
       1 row selected (0.417 seconds)
  10. Introduce external data sources to Drill
      Coordinates: Storage Plugin . Workspace . Table
      Currently Supported Providers
        Provider  Workspace  Table
        files     Path       Path relative to workspace
        mongo     Database   Collection
        hive      Database   Table
        hbase     Namespace  Table
      Ø  SELECT * FROM dfs.root.`/E:/drill/data/yelp/review.json`;
      Ø  SELECT * FROM dfs.yelp.`review.json` LIMIT 1;
      Ø  USE dfs.yelp;
      Ø  SELECT * FROM `review.json` LIMIT 1;
      Ø  SELECT * FROM hbase.users LIMIT 1;
  11. Introduce external data sources to Drill
      Coordinates: Storage Plugin . Workspace . Table
      Currently Supported Providers
        Provider  Workspace  Table
        files     Path       Path relative to workspace
        mongo     Database   Collection
        hive      Database   Table
        hbase     Namespace  Table
      Example:
      Ø  SELECT * FROM dfs.root.`/E:/drill/data/yelp/review.json`;
      Ø  SELECT * FROM dfs.yelp.`review.json` LIMIT 1;
      Ø  USE dfs.yelp;
      Ø  SELECT * FROM `review.json` LIMIT 1;
      Ø  SELECT * FROM hbase.users LIMIT 1;
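The coordinate scheme on this slide (storage plugin, workspace, table) can be pictured as a split on dots that leaves backtick-quoted names intact. The sketch below is plain Python for illustration, not Drill code; `split_coordinates` is a hypothetical helper that does not exist in Drill.

```python
import re

def split_coordinates(name):
    """Split a dotted name into its parts, keeping backtick-quoted parts whole.

    Hypothetical helper for illustration only; Drill's own parser is not shown here.
    """
    # Each match is either a backtick-quoted segment or a run of characters
    # that contains neither a dot nor a backtick.
    parts = re.findall(r"`[^`]*`|[^.`]+", name)
    return [part.strip("`") for part in parts]

print(split_coordinates("dfs.yelp.`review.json`"))
print(split_coordinates("dfs.root.`/E:/drill/data/yelp/review.json`"))
print(split_coordinates("hbase.users"))
```

The backticks matter because a file name such as `review.json` contains a dot of its own, which would otherwise be read as another coordinate separator.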
  12. Inventory: DFS Files
      {
        "votes": {"funny": 0, "useful": 2, "cool": 1},
        "user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
        "review_id": "15SdjuK7DmYqUAj6rjGowg",
        "stars": 5,
        "date": "2007-05-17",
        "text": "dr. goldberg offers everything ...",
        "type": "review",
        "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
      }
  13. business.json (1)
      {
        "business_id": "4bEjOyTaDG24SY5TxsaUNQ",
        "full_address": "3655 Las Vegas Blvd S\nThe Strip\nLas Vegas, NV 89109",
        "hours": {
          "Monday": {"close": "23:00", "open": "07:00"},
          "Tuesday": {"close": "23:00", "open": "07:00"},
          "Friday": {"close": "00:00", "open": "07:00"},
          "Wednesday": {"close": "23:00", "open": "07:00"},
          "Thursday": {"close": "23:00", "open": "07:00"},
          "Sunday": {"close": "23:00", "open": "07:00"},
          "Saturday": {"close": "00:00", "open": "07:00"}
        },
        "open": true,
        "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"],
        "city": "Las Vegas",
        "review_count": 4084,
        "name": "Mon Ami Gabi",
        "neighborhoods": ["The Strip"],
        "longitude": -115.172588519464,
  14. business.json (2)
        "state": "NV",
        "stars": 4.0,
        "attributes": {
          "Alcohol": "full_bar",
          "Noise Level": "average",
          "Has TV": false,
          "Attire": "casual",
          "Ambience": {
            "romantic": true,
            "intimate": false,
            "touristy": false,
            "hipster": false,
            "classy": true,
            "trendy": false,
            "casual": false
          },
          "Good For": {"dessert": false, "latenight": false, "lunch": false,
                       "dinner": true, "breakfast": false, "brunch": false}
        }
      }
  15. Use cases: LAS VEGAS, NEW RESTAURANT
  16. NEW RESTAURANT: Customers for opening party
      > SELECT name, review_count
        FROM dfs.yelp.`user.json`
        ORDER BY review_count DESC
        LIMIT 50;
      +----------+--------------+
      | name     | review_count |
      +----------+--------------+
      | Victor   | 8062         |
      | Jennifer | 4244         |
      | Anita    | 3829         |
      | ......   | ....         |
      | Eileen   | 1947         |
      | J        | 1946         |
      | Matt     | 1942         |
      +----------+--------------+
      50 rows selected (1.16 seconds)
  17. NEW RESTAURANT: Cities with most businesses
      > SELECT state, city, COUNT(*) AS businesses
        FROM dfs.yelp.`business.json`
        GROUP BY state, city
        ORDER BY businesses DESC LIMIT 10;
      +-------+------------+------------+
      | state | city       | businesses |
      +-------+------------+------------+
      | NV    | Las Vegas  | 12021      |
      | AZ    | Phoenix    | 7499       |
      | AZ    | Scottsdale | 3605       |
      | EDH   | Edinburgh  | 2804       |
      | AZ    | Mesa       | 2041       |
      | AZ    | Tempe      | 2025       |
      | NV    | Henderson  | 1914       |
      | AZ    | Chandler   | 1637       |
      | WI    | Madison    | 1630       |
      | AZ    | Glendale   | 1196       |
      +-------+------------+------------+
  18. Use cases: LAS VEGAS, LAS VEGAS RESTAURANT
  19. LAS VEGAS RESTAURANT: Open restaurants at 22:00
      > SELECT name, b.hours
        FROM dfs.yelp.`business.json` b
        WHERE b.hours.Saturday.`open` < '22:00' AND
              b.hours.Saturday.`close` > '22:00'
        LIMIT 1;
      +------------+------------+
      |    name    |   hours    |
      +------------+------------+
      | Chang Jiang Chinese Kitchen | {"Tuesday":{"close":"22:00","open":"11:00"},"Friday":{"close":"22:30","open":"11:00"},"Monday":{"close":"22:00","open":"11:00"},"Wednesday":{"close":"22:00","open":"11:00"},"Thursday":{"close":"22:00","open":"11:00"},"Sunday":{"close":"21:00","open":"16:00"},"Saturday":{"close":"22:30","open":"11:00"}} |
      +------------+------------+
      1 row selected (0.013 seconds)
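The WHERE clause on this slide compares opening hours as strings. A minimal Python sketch of the same comparison, using hours taken from the slides, shows why that works for zero-padded 24-hour times and where it falls short. `open_at` is a hypothetical helper written for this illustration; it is not part of Drill.

```python
def open_at(hours, day, time):
    """Mirror the slide's filter: open < time AND close > time, compared as strings."""
    slot = hours.get(day)
    if slot is None:
        # No hours listed for that day, so the business cannot match.
        return False
    return slot["open"] < time and slot["close"] > time

# Saturday hours as shown in the slides for two businesses.
chang_jiang = {"Saturday": {"close": "22:30", "open": "11:00"}}
mon_ami_gabi = {"Saturday": {"close": "00:00", "open": "07:00"}}

print(open_at(chang_jiang, "Saturday", "22:00"))   # True
print(open_at(mon_ami_gabi, "Saturday", "22:00"))  # False: "00:00" sorts before "22:00"
```

Zero-padded "HH:MM" strings sort in the same order as the times they represent, so no parsing is needed. The second call shows the limit of the approach: a place that closes at midnight is recorded as "00:00" and is filtered out even though it is open at 22:00.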
  20. LAS VEGAS RESTAURANT: Finding hummus at 22:00
      > SELECT name, stars, b.hours.Wednesday, categories
        FROM dfs.yelp.`business.json` b
        WHERE b.hours.Wednesday.`open` < '22:00' AND
              b.hours.Wednesday.`close` > '22:00' AND
              REPEATED_CONTAINS(categories, 'Mediterranean') AND
              city = 'Las Vegas'
        ORDER BY stars DESC
        LIMIT 1;
      +------------+------------+------------+------------+
      |    name    |   stars    |   EXPR$2   | categories |
      +------------+------------+------------+------------+
      | Marrakech Moroccan Restaurant | 4.0 | {"close":"23:00","open":"17:30"} | ["Mediterranean","Middle Eastern","Moroccan","Restaurants"] |
      +------------+------------+------------+------------+
      1 row selected (2.185 seconds)
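The query above combines a nested-field filter, an array membership test, and a top-1 sort. The Python sketch below walks through the same steps on two records from the slides. `repeated_contains` here is a simplified stand-in that assumes exact element matching, and `wednesday_pick` is a hypothetical name; neither is Drill's implementation.

```python
def repeated_contains(values, keyword):
    """Simplified stand-in for REPEATED_CONTAINS: exact match on any array element."""
    return keyword in values

def wednesday_pick(businesses, time, category, city):
    """Apply the slide's filters, then return the best-rated match (at most one)."""
    matches = [
        b for b in businesses
        if b["hours"]["Wednesday"]["open"] < time < b["hours"]["Wednesday"]["close"]
        and repeated_contains(b["categories"], category)
        and b["city"] == city
    ]
    # ORDER BY stars DESC LIMIT 1
    matches.sort(key=lambda b: b["stars"], reverse=True)
    return matches[:1]

# Two records assembled from fields shown in the slides.
businesses = [
    {"name": "Mon Ami Gabi", "stars": 4.0, "city": "Las Vegas",
     "hours": {"Wednesday": {"close": "23:00", "open": "07:00"}},
     "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"]},
    {"name": "Marrakech Moroccan Restaurant", "stars": 4.0, "city": "Las Vegas",
     "hours": {"Wednesday": {"close": "23:00", "open": "17:30"}},
     "categories": ["Mediterranean", "Middle Eastern", "Moroccan", "Restaurants"]},
]

picked = wednesday_pick(businesses, "22:00", "Mediterranean", "Las Vegas")
print([b["name"] for b in picked])
```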
  21. APACHE DRILL: Unique benefits
      •  Working with repeated values
  22. Flatten Repeated Values
      > SELECT name, categories
        FROM dfs.yelp.`business.json` LIMIT 3;
      +------------+------------+
      |    name    | categories |
      +------------+------------+
      | Eric Goldberg, MD | ["Doctors","Health & Medical"] |
      | Pine Cone Restaurant | ["Restaurants"] |
      | Deforest Family Restaurant | ["American (Traditional)","Restaurants"] |
      +------------+------------+

      > SELECT name, FLATTEN(categories) AS categories
        FROM dfs.yelp.`business.json` LIMIT 3;
      +------------+------------+
      |    name    | categories |
      +------------+------------+
      | Eric Goldberg, MD | Doctors |
      | Eric Goldberg, MD | Health & Medical |
      | Pine Cone Restaurant | Restaurants |
      +------------+------------+
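The effect of FLATTEN shown above can be sketched in a few lines of Python: each input row is repeated once per element of its array column. This illustrates the row-expansion semantics only and assumes nothing about how Drill implements it; `flatten` is a hypothetical helper.

```python
def flatten(rows, column):
    """Yield one copy of each row per element of the array stored in `column`."""
    for row in rows:
        for value in row[column]:
            # Copy the row, replacing the array with a single element.
            yield {**row, column: value}

businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Pine Cone Restaurant", "categories": ["Restaurants"]},
]

flat = list(flatten(businesses, "categories"))
for row in flat:
    print(row["name"], "|", row["categories"])
```

Two input rows become three output rows, matching the second result table on the slide.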
  23. Most and Least Common Business Categories
      > SELECT category, COUNT(*) AS businesses
        FROM (SELECT name, FLATTEN(categories) AS category
              FROM dfs.yelp.`business.json`)
        GROUP BY category ORDER BY businesses DESC;
      +------------+------------+
      | category | businesses |
      +------------+------------+
      | Restaurants | 14303 |
      | ............... |
      | Firewood | 1 |
      +------------+------------+
      715 rows selected (3.439 seconds)

      > SELECT name, categories FROM dfs.yelp.`business.json`
        WHERE true AND REPEATED_CONTAINS(categories, 'Australian');
      +------------+------------+
      |    name    | categories |
      +------------+------------+
      | The Australian AZ | ["Bars","Burgers","Nightlife","Australian","Sports Bars","Restaurants"] |
      +------------+------------+
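The first query on this slide flattens the categories array and then groups by the flattened value. A rough Python equivalent, run on three sample rows taken from the earlier FLATTEN slide, might look like this; it is a sketch of the logic, not of Drill's execution.

```python
from collections import Counter

businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Pine Cone Restaurant", "categories": ["Restaurants"]},
    {"name": "Deforest Family Restaurant",
     "categories": ["American (Traditional)", "Restaurants"]},
]

# Inner query: one value per (business, category) pair, as FLATTEN would produce.
flattened = (category for b in businesses for category in b["categories"])

# Outer query: GROUP BY category, COUNT(*), ORDER BY count DESC.
ranked = Counter(flattened).most_common()

for category, count in ranked:
    print(category, count)
```

On this small sample, "Restaurants" comes first with a count of 2 and every other category appears once.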
  24. APACHE DRILL
      •  Views - Dynamic and Materialized
  25. Create a view combining business and reviews datasets.
      > CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS
        SELECT b.name, b.stars, r.votes.funny,
               r.votes.useful, r.votes.cool, r.`date`
        FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
        WHERE r.business_id = b.business_id;
      +------------+------------+
      | ok | summary |
      +------------+------------+
      | true | View 'BusinessReviews' created successfully in 'dfs.tmp' schema |
      +------------+------------+

      > SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews;
      +------------+
      | Total |
      +------------+
      | 1125458 |
      +------------+
  26. Materialized Views AKA Tables
      > ALTER SESSION SET `store.format` = 'parquet';

      > CREATE TABLE dfs.tmp.BusinessReviewsTbl AS
        SELECT b.name, b.stars, r.votes.funny funny,
               r.votes.useful useful, r.votes.cool cool, r.`date`
        FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
        WHERE r.business_id = b.business_id;
      +----------+---------------------------+
      | Fragment | Number of records written |
      +----------+---------------------------+
      | 1_0      | 176448                    |
      | 1_1      | 192439                    |
      | 1_2      | 198625                    |
      | 1_3      | 200863                    |
      | 1_4      | 181420                    |
      | 1_5      | 175663                    |
      +----------+---------------------------+
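The difference between the view from the previous slide and the table created here can be sketched in Python: a view behaves like a function that re-runs the join on every read, while CREATE TABLE AS stores a snapshot of the result. The business and review records below are made-up sample data (the IDs and names are not from the Yelp dataset), and `business_reviews` is a hypothetical name.

```python
# Made-up sample data for illustration only.
businesses = [{"business_id": "b1", "name": "Cafe A", "stars": 4.0}]
reviews = [
    {"business_id": "b1", "date": "2007-05-17",
     "votes": {"funny": 0, "useful": 2, "cool": 1}},
]

def business_reviews():
    """Like the view: the join is recomputed from the source data on every call."""
    by_id = {b["business_id"]: b for b in businesses}
    return [
        {"name": by_id[r["business_id"]]["name"],
         "stars": by_id[r["business_id"]]["stars"],
         "funny": r["votes"]["funny"],
         "useful": r["votes"]["useful"],
         "cool": r["votes"]["cool"],
         "date": r["date"]}
        for r in reviews
        if r["business_id"] in by_id  # WHERE r.business_id = b.business_id
    ]

# Like CREATE TABLE AS: run the query once and keep the result.
table = business_reviews()

# New source data arrives after the table was written.
reviews.append({"business_id": "b1", "date": "2008-01-02",
                "votes": {"funny": 1, "useful": 0, "cool": 0}})

print(len(business_reviews()), len(table))
```

The view sees the new review and the stored table does not, which is the trade-off between always-current results and a precomputed copy. As a consistency check on the slides themselves, the six fragment counts above add up to 1125458, the same total the view reported.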
  27. DRILL ARCHITECTURE: Under the hood
  28. High Level Architecture
      Cluster of commodity servers
      –  Daemon (drillbit) on each node
      ZooKeeper maintains ephemeral cluster membership information
      –  Drillbit uses ZooKeeper to find other drillbits in the cluster
      –  Client uses ZooKeeper to find drillbits
      Built-in, optimistic query execution engine. Doesn’t require a particular storage or execution system (MapReduce, Spark, Tez)
      –  Better performance and manageability
      Data processing unit is columnar record batches
      –  Enables schema flexibility with negligible performance impact
  29. Drill Maximizes Data Locality
        Data Source       Best Practice
        HDFS or MapR-FS   drillbit on each DataNode
        HBase or MapR-DB  drillbit on each RegionServer
        MongoDB           drillbit on each mongod node (when using replicas, run it on the replica node)
      [Figure: nodes that each pair a drillbit with a DataNode/RegionServer/mongod, alongside a ZooKeeper ensemble]
  30. Core Modules within drillbit
      [Figure: RPC Endpoint, SQL Parser, Optimizer, Logical Plan, Physical Plan, Execution, Distributed Cache, and Storage Plugins (Hive, HBase, MongoDB, DFS)]
  31. SELECT * Query Execution
      [Figure: client (JDBC, ODBC, REST), ZooKeeper ensemble, and drillbits]
      1.  Find drillbits (once per session)
      2.  Submit query to drillbit
      3.  Create logical and physical execution plans
      4.  Farm out execution of fragments to cluster (completely distributed execution)
      5.  Return results to client
      * CTAS (CREATE TABLE AS SELECT) queries include steps 1-4
  32. Participate
      •  Learn: http://drill.apache.org/
      •  Download: http://drill.apache.org/download/
      •  Ask Questions: user@drill.apache.org
      •  Engage on Twitter: @ApacheDrill
  33. Thank You
      Aditya Kishore
      aditya@mapr.com, adi@apache.org
      MapR: @mapr, maprtech, MapRTechnologies, mapr-technologies
  34. Or Run Drill in Distributed Mode…
      •  Make sure ZooKeeper (zkServer) is running:
           $ zkServer start
      •  Not sure if ZooKeeper is running? Run telnet localhost 2181 and make sure it connects
      •  Define the Drill cluster name and ZooKeeper nodes in conf/drill-override.conf
      •  Start drillbit:
           $ bin/drillbit.sh start
      •  Access the Web UI: http://localhost:8047
      •  Connect a client to the cluster (e.g., sqlline):
           $ bin/sqlline -u jdbc:drill:zk=localhost:2181
      •  Clients (like sqlline) connect to ZooKeeper to discover the cluster nodes
      •  If you have multiple Drill clusters registered in one ZooKeeper ensemble, specify the desired cluster in the JDBC connection string: jdbc:drill:zk=localhost:2181/drill/<clustername>
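The two connection strings on this slide share one shape: the ZooKeeper address, optionally followed by a path that names the cluster. A small Python sketch of assembling such a string is shown below. `drill_jdbc_url` is a hypothetical helper, and joining several ZooKeeper hosts with commas is an assumption on my part rather than something the slide states.

```python
def drill_jdbc_url(zk_hosts, cluster=None, zk_root="drill"):
    """Build a URL shaped like the slide's: jdbc:drill:zk=<hosts>[/<root>/<cluster>].

    Hypothetical helper. The comma-separated host list is an assumption;
    the slide only shows a single host.
    """
    url = "jdbc:drill:zk=" + ",".join(zk_hosts)
    if cluster is not None:
        # Needed when several Drill clusters register in one ZooKeeper ensemble.
        url += f"/{zk_root}/{cluster}"
    return url

print(drill_jdbc_url(["localhost:2181"]))
print(drill_jdbc_url(["localhost:2181"], cluster="mycluster"))
```

The cluster name `mycluster` is a placeholder for whatever name is defined in conf/drill-override.conf.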
  35. user.json
      {
        "yelping_since": "2007-08",
        "votes": {
          "funny": 198,
          "useful": 415,
          "cool": 206
        },
        "review_count": 283,
        "name": "Adele",
        "user_id": "9NJdKpRNwwaL4cvKq0cN6g",
        "friends": ["DrKQzBFAvxhyjLgbPSW2Qw", "ebXx-G5eFqWkfDuk22f81w", "qWLezzHxOXN-GQdInixZzw"],
        "fans": 10,
        "average_stars": 3.6499999999999999,
        "compliments": {
          "funny": 4,
          "hot": 17,
          "cool": 20
        },
        "elite": [2008, 2009, 2010, 2011, 2012, 2013, 2014]
      }
