SlideShare a Scribd company logo
1 of 36
®
© 2015 MapR Technologies 1
®
© 2015 MapR Technologies
Self Service Data Exploration with
Apache Drill
®
© 2015 MapR Technologies 2
Data is doubling in
size every two years
®
© 2015 MapR Technologies 3
2011 2013
In 2020 it is estimated to be 44 zettabytes of data in the world
2020
Source: IDC Digital Universe
44ZETTABYTES*
4.4ZETTABYTES
1.8ZETTABYTES
…
*A zettabyte is 1 BILLION terabytes.
®
© 2015 MapR Technologies 4
2011 2013
In 2020 it is estimated to be 44 zettabytes of data in the world
2020
Source: IDC Digital Universe
44ZETTABYTES*
4.4ZETTABYTES
1.8ZETTABYTES
…
700 trillion
*A zettabyte is 1 BILLION terabytes.
64GB
Each person in the
world had 100k of
these iPhones
®
© 2015 MapR Technologies 5
UNSTRUCTURED
DATA
1980 2000 20101990 2020
Unstructured data will account for more than 80%
of the data collected by organizations
Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
TotalDataStored
STRUCTURED
DATA
®
© 2015 MapR Technologies 6
Evolving distance to data
Business
(analysts,
developers)
“Plumbing”
development
Business
(analysts, developers)
Existing approaches require
a middleman (IT)
Data
Data
Data
Business
(analysts,
developers)
Modeling and
transformations
Map/Reduce
Traditional
SQL-on-Hadoop
New
SQL-on-Hadoop
®
© 2015 MapR Technologies 7
SQL in a NoSchema World
•  SQL
•  BI (Tableau, MicroStrategy, etc.)
•  Low latency
•  Scalability
•  Create and maintain schemas on:
–  HDFS (Parquet, JSON, etc.)
–  HBase
–  MongoDB
•  Transform or copy data
2 DON’T WANT WANT
®
© 2015 MapR Technologies 8
• Schema-free scale-out query engine for Hadoop and NoSQL
• Low latency
• Extreme ease of use
• Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
APACHE DRILL
®
© 2015 MapR Technologies 9
Drill’s Data Model is Flexible
HBase
JSON
BSON
CSV
TSV
Parquet
Avro
Schema-lessFixed schema
Flat
Complex
Flexibility
Name! Gender! Age!
Michael! M! 6!
Jennifer! F! 3!
{!
name: {!
first: Michael,!
last: Smith!
},!
hobbies: [ski, soccer],!
district: Los Altos!
}!
{!
name: {!
first: Jennifer,!
last: Gates!
},!
hobbies: [sing],!
preschool: CCLC!
}!
RDBMS/SQL-on-Hadoop table
Apache Drill table
Flexibility
®
© 2015 MapR Technologies 10
Running Drill takes 10 minutes
	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  full_name	
  	
  	
  	
  |	
  position_title	
  |	
  	
  	
  salary	
  	
  	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  Sheri	
  Nowmer	
  |	
  President	
  	
  	
  	
  	
  	
  |	
  80000.0	
  	
  	
  	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
1	
  row	
  selected	
  (0.417	
  seconds)	
  
DOWNLOAD HTTP://DRILL.APACHE.ORG/DOWNLOAD
EXTRACT
$	
  tar	
  xf	
  apache-­‐drill-­‐0.7.0.tar.gz	
  
$	
  cd	
  apache-­‐drill-­‐0.7.0	
  
RUN $	
  bin/sqlline	
  -­‐u	
  jdbc:drill:zk=local	
  
>	
  SELECT	
  full_name,	
  position_title,	
  salary	
  
	
  	
  FROM	
  cp.`employee.json	
  `	
  
	
  	
  LIMIT	
  1;	
  QUERY
& step by step
In SQL format
®
© 2015 MapR Technologies 11
Introduce external data sources to Drill
Ø 
SELECT	
  *	
  FROM	
  dfs.root.`/
E:/drill/data/yelp/
review.json`;	
  
Ø 
SELECT	
  *	
  FROM	
  
dfs.yelp.`review.json`	
  
LIMIT	
  1;	
  
Ø 
USE	
  dfs.yelp;	
  
Ø 
SELECT	
  *	
  FROM	
  
`review.json`	
  LIMIT	
  1;	
  
Ø 
SELECT	
  *	
  FROM	
  hbase.users	
  
LIMIT	
  1;	
  
Storage Plugin
Provider
Workspace Table
files Path Path relative to workspace
mongo Database Collection
hive Database Table
hbase Namespace Table
Coordinates:
Currently
Supported
Providers
. .
…
®
© 2015 MapR Technologies 12
Introduce external data sources to Drill
Example:
Ø  SELECT	
  *	
  FROM	
  dfs.root.`/E:/drill/data/yelp/
review.json`;	
  
Ø  SELECT	
  *	
  FROM	
  dfs.yelp.`review.json`	
  LIMIT	
  1;	
  
Ø  USE	
  dfs.yelp;	
  
Ø  SELECT	
  *	
  FROM	
  `review.json`	
  LIMIT	
  1;	
  
Ø  SELECT	
  *	
  FROM	
  hbase.users	
  LIMIT	
  1;	
  
Ø 
SELECT	
  *	
  FROM	
  dfs.root.`/
E:/drill/data/yelp/
review.json`;	
  
Ø 
SELECT	
  *	
  FROM	
  
dfs.yelp.`review.json`	
  
LIMIT	
  1;	
  
Ø 
USE	
  dfs.yelp;	
  
Ø 
SELECT	
  *	
  FROM	
  
`review.json`	
  LIMIT	
  1;	
  
Ø 
SELECT	
  *	
  FROM	
  hbase.users	
  
LIMIT	
  1;	
  
Storage Plugin
Provider
Workspace Table
files Path Path relative to workspace
mongo Database Collection
hive Database Table
hbase Namespace Table
Coordinates:
Currently
Supported
Providers
. .
…
®
© 2015 MapR Technologies 13
{	
  
	
  	
  "votes":	
  {"funny":	
  0,	
  "useful":	
  2,	
  "cool":	
  1},	
  
	
  	
  "user_id":	
  "Xqd0DzHaiyRqVH3WRG7hzg",	
  
	
  	
  "review_id":	
  "15SdjuK7DmYqUAj6rjGowg",	
  
	
  	
  "stars":	
  5,	
  
	
  	
  "date":	
  "2007-­‐05-­‐17",	
  
	
  	
  "text":	
  "dr.	
  goldberg	
  offers	
  everything	
  ...",	
  
	
  	
  "type":	
  "review",	
  
	
  	
  "business_id":	
  "vcNAWiLM4dR7D2nwwJ7nCA"	
  
}	
  
Inventory: DFS Files
®
© 2015 MapR Technologies 14
business.json (1)
{	
  
	
  "business_id":	
  "4bEjOyTaDG24SY5TxsaUNQ",	
  
	
  "full_address":	
  "3655	
  Las	
  Vegas	
  Blvd	
  SnThe	
  StripnLas	
  Vegas,	
  NV	
  89109",	
  
	
  "hours":	
  {	
  
	
   	
  "Monday":	
  {"close":	
  "23:00",	
  "open":	
  "07:00"},	
  
	
   	
  "Tuesday":	
  {"close":	
  "23:00",	
  "open":	
  "07:00"},	
  
	
   	
  "Friday":	
  {"close":	
  "00:00",	
  "open":	
  "07:00"},	
  
	
   	
  "Wednesday":	
  {"close":	
  "23:00",	
  "open":	
  "07:00"},	
  
	
   	
  "Thursday":	
  {"close":	
  "23:00",	
  "open":	
  "07:00"},	
  
	
   	
  "Sunday":	
  {"close":	
  "23:00",	
  "open":	
  "07:00"},	
  
	
   	
  "Saturday":	
  {"close":	
  "00:00",	
  "open":	
  "07:00"}	
  
	
  },	
  
	
  "open":	
  true,	
  
	
  "categories":	
  ["Breakfast	
  &	
  Brunch",	
  "Steakhouses",	
  "French",	
  "Restaurants"],	
  
	
  "city":	
  "Las	
  Vegas",	
  
	
  "review_count":	
  4084,	
  
	
  "name":	
  "Mon	
  Ami	
  Gabi",	
  
	
  "neighborhoods":	
  ["The	
  Strip"],	
  
	
  "longitude":	
  -­‐115.172588519464,	
  
®
© 2015 MapR Technologies 15
business.json (2)
	
  "state":	
  "NV",	
  
	
  "stars":	
  4.0,	
  
	
   	
  "attributes":	
  {	
  
	
   	
  "Alcohol":	
  "full_bar”,	
  
	
   	
   	
  "Noise	
  Level":	
  "average",	
  
	
   	
  "Has	
  TV":	
  false,	
  
	
   	
  "Attire":	
  "casual",	
  
	
   	
  "Ambience":	
  {	
  
	
   	
   	
  "romantic":	
  true,	
  
	
   	
   	
  "intimate":	
  false,	
  
	
   	
   	
  "touristy":	
  false,	
  
	
   	
   	
  "hipster":	
  false,	
  
	
   	
   	
   	
  "classy":	
  true,	
  
	
   	
   	
  "trendy":	
  false,	
  
	
   	
   	
   	
  "casual":	
  false	
  
	
   	
  },	
  
	
   	
  "Good	
  For":	
  {"dessert":	
  false,	
  "latenight":	
  false,	
  "lunch":	
  false,	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  "dinner":	
  true,	
  "breakfast":	
  false,	
  "brunch":	
  false},	
  
	
  }	
  
}	
  
®
© 2015 MapR Technologies 16
Use cases
LAS VEGAS
NEW
RESTAURANT
®
© 2015 MapR Technologies 17
NEW RESTAURANT
Customers
for opening
party
>	
  SELECT	
  name,	
  review_count	
  
	
  	
  FROM	
  dfs.yelp.`user.json`	
  
	
  	
  ORDER	
  BY	
  review_count	
  DESC	
  
	
  	
  LIMIT	
  50;	
  
	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  	
  	
  name	
  	
  	
  	
  |	
  review_count	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  Victor	
  	
  	
  	
  	
  |	
  8062	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  Jennifer	
  	
  	
  |	
  4244	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  Anita	
  	
  	
  	
  	
  	
  |	
  3829	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  ......	
  	
  	
  	
  	
  |	
  ....	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  Eileen	
  	
  	
  	
  	
  |	
  1947	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  J	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  1946	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  Matt	
  	
  	
  	
  	
  	
  	
  |	
  1942	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
50	
  rows	
  selected	
  (1.16	
  seconds)	
  
®
© 2015 MapR Technologies 18
Cities
with most
businesses
NEW RESTAURANT
>	
  SELECT	
  state,	
  city,	
  COUNT(*)	
  AS	
  businesses	
  
	
  	
  	
  	
  FROM	
  dfs.yelp.`business.json`	
  
	
  	
  	
  	
  GROUP	
  BY	
  state,	
  city	
  
	
  	
  	
  	
  ORDER	
  BY	
  reviews	
  DESC	
  LIMIT	
  10;	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  	
  state	
  	
  	
  	
  |	
  	
  	
  	
  city	
  	
  	
  	
  |	
  businesses	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  NV	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Las	
  Vegas	
  	
  |	
  12021	
  	
  	
  	
  	
  	
  |	
  
|	
  AZ	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Phoenix	
  	
  	
  	
  |	
  7499	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  AZ	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Scottsdale	
  |	
  3605	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  EDH	
  	
  	
  	
  	
  	
  	
  	
  |	
  Edinburgh	
  	
  |	
  2804	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  AZ	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Mesa	
  	
  	
  	
  	
  	
  	
  |	
  2041	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  AZ	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Tempe	
  	
  	
  	
  	
  	
  |	
  2025	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  NV	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Henderson	
  	
  |	
  1914	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  AZ	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Chandler	
  	
  	
  |	
  1637	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  WI	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Madison	
  	
  	
  	
  |	
  1630	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  AZ	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  Glendale	
  	
  	
  |	
  1196	
  	
  	
  	
  	
  	
  	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
®
© 2015 MapR Technologies 19
Use cases
LAS VEGAS
LAS VEGAS
RESTAURANT
®
© 2015 MapR Technologies 20
Open
restaurants
at 22:00
LAS VEGAS RESTAURANT
>	
  SELECT	
  name,	
  b.hours	
  
	
  	
  FROM	
  dfs.yelp.`business.json`	
  b	
  
	
  	
  WHERE	
  b.hours.Saturday.`open`	
  <	
  '22:00'	
  AND	
  
	
  	
  	
  	
  	
  	
  	
  	
  b.hours.Saturday.`close`	
  >	
  '22:00'	
  
	
  	
  LIMIT	
  1;	
  
	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  	
  	
  name	
  	
  	
  	
  |	
  	
  	
  hours	
  	
  	
  	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  Chang	
  Jiang	
  Chinese	
  Kitchen	
  |	
  {"Tuesday":
{"close":"22:00","open":"11:00"},"Friday":
{"close":"22:30","open":"11:00"},"Monday":
{"close":"22:00","open":"11:00"},"Wednesday":
{"close":"22:00","open":"11:00"},"Thursday":
{"close":"22:00","open":"11:00"},"Sunday":
{"close":"21:00","open":"16:00"},"Saturday":{"close":"22:30","open":"11:00"}}	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
1	
  row	
  selected	
  (0.013	
  seconds)	
  
	
  
®
© 2015 MapR Technologies 21
Finding
hummus
at 22:00
LAS VEGAS RESTAURANT
>	
  SELECT	
  name,	
  stars,	
  b.hours.Wednesday,	
  categories	
  
	
  	
  FROM	
  dfs.yelp.`business.json`	
  b	
  
	
  	
  WHERE	
  b.hours.Wednesday.`open`	
  <	
  '22:00'	
  AND	
  
	
  	
  	
  	
  	
  	
  	
  	
  b.hours.Wednesday.`close`	
  >	
  '22:00'	
  AND	
  
	
  	
  	
  	
  	
  	
  	
  	
  REPEATED_CONTAINS(categories,	
  'Mediterranean')	
  
AND	
  
	
  	
  	
  	
  	
  	
  	
  	
  city	
  =	
  'Las	
  Vegas'	
  
	
  	
  	
  	
  ORDER	
  BY	
  stars	
  DESC	
  
	
  	
  	
  	
  LIMIT	
  1;	
  
	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  	
  	
  name	
  	
  	
  	
  |	
  	
  	
  stars	
  	
  	
  	
  |	
  	
  	
  EXPR$2	
  	
  	
  |	
  categories	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  Marrakech	
  Moroccan	
  Restaurant	
  |	
  4.0	
  	
  	
  	
  	
  	
  	
  	
  |	
  {"close":"23:00","open":"17:30"}	
  |	
  
["Mediterranean","Middle	
  Eastern","Moroccan","Restaurants"]	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
1	
  row	
  selected	
  (2.185	
  seconds)	
  
®
© 2015 MapR Technologies 22
• Working with repeated values
APACHE DRILL
Unique benefits
®
© 2015 MapR Technologies 23
Flatten Repeated Values
>	
  SELECT	
  name,	
  categories	
  
	
  	
  FROM	
  dfs.yelp.`business.json`	
  LIMIT	
  2;	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  	
  	
  name	
  	
  	
  	
  |	
  categories	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  Eric	
  Goldberg,	
  MD	
  |	
  ["Doctors","Health	
  &	
  Medical"]	
  |	
  
|	
  Pine	
  Cone	
  Restaurant	
  |	
  ["Restaurants"]	
  |	
  
|	
  Deforest	
  Family	
  Restaurant	
  |	
  ["American	
  (Traditional)","Restaurants"]	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
	
  
>	
  SELECT	
  name,	
  FLATTEN(categories)	
  AS	
  categories	
  
	
  	
  FROM	
  dfs.yelp.`business.json`	
  LIMIT	
  3;	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  	
  	
  name	
  	
  	
  	
  |	
  categories	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  Eric	
  Goldberg,	
  MD	
  |	
  Doctors	
  	
  	
  	
  |	
  
|	
  Eric	
  Goldberg,	
  MD	
  |	
  Health	
  &	
  Medical	
  |	
  
|	
  Pine	
  Cone	
  Restaurant	
  |	
  Restaurants	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
®
© 2015 MapR Technologies 24
Most and Least Common Business Categories
>	
  SELECT	
  category,	
  COUNT(*)	
  AS	
  businesses	
  
	
  	
  FROM	
  (SELECT	
  name,	
  FLATTEN(categories)	
  AS	
  category	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  FROM	
  dfs.yelp.`business.json`)	
  
	
  	
  GROUP	
  BY	
  category	
  ORDER	
  BY	
  businesses	
  DESC;	
  
+------------+------------+
| category | businesses |
+------------+------------+
| Restaurants | 14303 |
| ............... |
| Firewood | 1 |
+------------+------------+
715 rows selected (3.439 seconds)	
  
	
  
>	
  SELECT	
  name,	
  categories	
  FROM	
  dfs.yelp.`business.json`	
  
	
  	
  WHERE	
  true	
  AND	
  REPEATED_CONTAINS(categories,	
  'Australian');	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  	
  	
  name	
  	
  	
  	
  |	
  categories	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  The	
  Australian	
  AZ	
  |	
  ["Bars","Burgers","Nightlife","Australian","Sports	
  Bars","Restaurants"]	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
®
© 2015 MapR Technologies 25
• Views - Dynamic and Materialized
APACHE DRILL
®
© 2015 MapR Technologies 26
Create a view combining business and reviews datasets.
>	
  CREATE	
  OR	
  REPLACE	
  VIEW	
  dfs.tmp.BusinessReviews	
  AS	
  
	
  	
  	
  	
  SELECT	
  b.name,	
  b.stars,	
  r.votes.funny,	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  r.votes.useful,	
  r.votes.cool,	
  r.`date`	
  
	
  	
  	
  	
  	
  	
  FROM	
  dfs.yelp.`business.json`	
  b,	
  dfs.yelp.`review.json`	
  r	
  
	
  	
  	
  	
  	
  	
  WHERE	
  r.business_id	
  =	
  b.business_id;	
  
	
  
+------------+------------+
| ok | summary |
+------------+------------+
| true | View 'BusinessReviews' created successfully in 'dfs.tmp' schema |
+------------+------------+
	
  
>	
  SELECT	
  COUNT(*)	
  AS	
  Total	
  FROM	
  dfs.tmp.BusinessReviews;	
  
	
  
+------------+
| Total |
+------------+
| 1125458 |
+------------+
®
© 2015 MapR Technologies 27
Materialized Views AKA Tables
>	
  ALTER	
  SESSION	
  SET	
  `store.format`	
  =	
  'parquet';	
  
	
  
>	
  CREATE	
  TABLE	
  dfs.tmp.BusinessReviewsTbl	
  AS	
  
	
  	
  	
  	
  SELECT	
  b.name,	
  b.stars,	
  r.votes.funny	
  funny,	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  r.votes.useful	
  useful,	
  r.votes.cool	
  cool,	
  r.`date`	
  
	
  	
  	
  	
  	
  	
  FROM	
  dfs.yelp.`business.json`	
  b,	
  dfs.yelp.`review.json`	
  r	
  
	
  	
  	
  	
  	
  	
  WHERE	
  r.business_id	
  =	
  b.business_id;	
  
	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  	
  Fragment	
  	
  |	
  Number	
  of	
  records	
  written	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
|	
  1_0	
  	
  	
  	
  	
  	
  	
  	
  |	
  176448	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  1_1	
  	
  	
  	
  	
  	
  	
  	
  |	
  192439	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  1_2	
  	
  	
  	
  	
  	
  	
  	
  |	
  198625	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  1_3	
  	
  	
  	
  	
  	
  	
  	
  |	
  200863	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  1_4	
  	
  	
  	
  	
  	
  	
  	
  |	
  181420	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
|	
  1_5	
  	
  	
  	
  	
  	
  	
  	
  |	
  175663	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  |	
  
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+	
  
®
© 2015 MapR Technologies 28
DRILL ARCHITECTURE
Under the hood
®
© 2015 MapR Technologies 29
High Level Architecture
Cluster of commodity servers
–  Daemon (drillbit) on each node
ZooKeeper maintains ephemeral cluster membership information
–  Drillbit uses ZooKeeper to find other drillbits in the cluster
–  Client uses ZooKeeper to find drillbits
Built-in, optimistic query execution engine. Doesn’t require a
particular storage or execution system (MapReduce, Spark, Tez)
–  Better performance and manageability
Data processing unit is columnar record batches	
  
–  Enables schema flexibility with negligible performance impact
®
© 2015 MapR Technologies 30
Drill Maximizes Data Locality
Data Source Best Practice
HDFS or MapR-FS drillbit on each DataNode
HBase or MapR-DB drillbit on each RegionServer
MongoDB drillbit on each mongod node (when using replicas, run it on the replica node)
drillbit
DataNode/
RegionServer/
mongod
drillbit
DataNode/
RegionServer/
mongod
drillbit
DataNode/
RegionServer/
mongod
ZooKeeper
ZooKeeper
ZooKeeper
…
®
© 2015 MapR Technologies 31
Core Modules within drillbit	
  
SQL Parser
Hive
HBase
Distributed Cache
StoragePlugins
MongoDB
DFS
PhysicalPlan
ExecutionLogicalPlan Optimizer
RPC Endpoint
®
© 2015 MapR Technologies 32
SELECT * Query Execution
drillbit	
  
ZooKeeper
Client
(JDBC, ODBC,
REST)
1.  Find drillbits
(once per session)
3.  Create logical and physical execution plans
4.  Farm out execution of fragments to cluster
(completely distributed execution)
ZooKeeper
ZooKeeper
drillbit	
  drillbit	
  
2.  Submit query to
drillbit
5.  Return results
to client
* CTAS (CREATE TABLE AS SELECT) queries include steps 1-4
®
© 2015 MapR Technologies 33
Participate
•  Learn: http://drill.apache.org/
•  Download: http://drill.apache.org/download/
•  Ask Questions: user@drill.apache.org
•  Engage on Twitter: @ApacheDrill
®
© 2015 MapR Technologies 34
Thank You
@mapr maprtech
MapRTechnologies
maprtech
mapr-technologies
®
© 2015 MapR Technologies 35
Or Run Drill in Distributed Mode…
$	
  zkServer	
  start	
  
•  Make sure ZooKeeper (zkServer) is running:
•  Access the Web UI: http://localhost:8047
•  Connect a client to the cluster (e.g., sqlline):
•  Clients (like sqlline) connect to ZooKeeper to discover the cluster nodes
•  If you have multiple Drill clusters registered in one ZooKeeper ensemble, specify the desired
cluster in the JDBC connection string: jdbc:drill:zk=localhost:2181/drill/
<clustername>
•  Not sure if ZooKeeper is running? Run telnet	
  localhost	
  2181 and make sure it connects
•  Define the Drill cluster name and ZooKeeper nodes in conf/drill-­‐override.conf
•  Start drillbit:	
  
$	
  bin/drillbit.sh	
  start	
  
$	
  bin/sqlline	
  -­‐u	
  jdbc:drill:zk=localhost:2181	
  
®
© 2015 MapR Technologies 36
user.json
{	
  
	
  "yelping_since":	
  "2007-­‐08",	
  
	
  "votes":	
  {	
  
	
   	
  "funny":	
  198,	
  
	
   	
  "useful":	
  415,	
  
	
   	
  "cool":	
  206	
  
	
  },	
  
	
  "review_count":	
  283,	
  
	
  "name":	
  "Adele",	
  
	
  "user_id":	
  "9NJdKpRNwwaL4cvKq0cN6g",	
  
	
  "friends":	
  ["DrKQzBFAvxhyjLgbPSW2Qw",	
  "ebXx-­‐G5eFqWkfDuk22f81w",	
  "qWLezzHxOXN-­‐
GQdInixZzw"],	
  
	
  "fans":	
  10,	
  
	
  "average_stars":	
  3.6499999999999999,	
  
	
  "compliments":	
  {	
  
	
   	
  "funny":	
  4,	
  
	
   	
  "hot":	
  17,	
  
	
   	
  "cool":	
  20	
  
	
  },	
  
	
  "elite":	
  [2008,	
  2009,	
  2010,	
  2011,	
  2012,	
  2013,	
  2014]	
  
}	
  

More Related Content

Similar to Self service data exploration with apache drill

What and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual GrallWhat and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual Gralldistributed matters
 
Free Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache DrillFree Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache DrillMapR Technologies
 
Rethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache DrillRethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache DrillMapR Technologies
 
Apache Drill – Hands-On SQL References
Apache Drill – Hands-On SQL ReferencesApache Drill – Hands-On SQL References
Apache Drill – Hands-On SQL ReferencesMapR Technologies
 
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistHUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistSpagoWorld
 
Fosscon 2012 firewall workshop
Fosscon 2012 firewall workshopFosscon 2012 firewall workshop
Fosscon 2012 firewall workshopjvehent
 
Practical JSON in MySQL 5.7 and Beyond
Practical JSON in MySQL 5.7 and BeyondPractical JSON in MySQL 5.7 and Beyond
Practical JSON in MySQL 5.7 and BeyondIke Walker
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint MongoDB
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEOAltinity Ltd
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drilltshiran
 
Big Ruby 2014: A 4 Pack of Lightning Talks
Big Ruby 2014: A 4 Pack of Lightning TalksBig Ruby 2014: A 4 Pack of Lightning Talks
Big Ruby 2014: A 4 Pack of Lightning Talksthe_chrismo
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Neo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole WhiteNeo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole WhiteNeo4j
 
NoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin EsmannNoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin Esmanndistributed matters
 
Geolocation and Cassandra at Physi
Geolocation and Cassandra at PhysiGeolocation and Cassandra at Physi
Geolocation and Cassandra at PhysiCassandra Austin
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQLDag H. Wanvik
 
Blinded Stack Overflow: Just Another Common Technique
Blinded Stack Overflow: Just Another Common TechniqueBlinded Stack Overflow: Just Another Common Technique
Blinded Stack Overflow: Just Another Common TechniqueThomas Gregory
 

Similar to Self service data exploration with apache drill (20)

What and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual GrallWhat and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual Grall
 
Free Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache DrillFree Code Friday: Drill 101 - Basics of Apache Drill
Free Code Friday: Drill 101 - Basics of Apache Drill
 
Rethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache DrillRethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache Drill
 
Apache Drill – Hands-On SQL References
Apache Drill – Hands-On SQL ReferencesApache Drill – Hands-On SQL References
Apache Drill – Hands-On SQL References
 
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistHUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
 
Fosscon 2012 firewall workshop
Fosscon 2012 firewall workshopFosscon 2012 firewall workshop
Fosscon 2012 firewall workshop
 
Practical JSON in MySQL 5.7 and Beyond
Practical JSON in MySQL 5.7 and BeyondPractical JSON in MySQL 5.7 and Beyond
Practical JSON in MySQL 5.7 and Beyond
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint Building and Scaling the Internet of Things with MongoDB at Vivint
Building and Scaling the Internet of Things with MongoDB at Vivint
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Big Ruby 2014: A 4 Pack of Lightning Talks
Big Ruby 2014: A 4 Pack of Lightning TalksBig Ruby 2014: A 4 Pack of Lightning Talks
Big Ruby 2014: A 4 Pack of Lightning Talks
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
MongoDB 3.2 - Analytics
MongoDB 3.2  - AnalyticsMongoDB 3.2  - Analytics
MongoDB 3.2 - Analytics
 
Neo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole WhiteNeo4j Makes Graphs Easy: Nicole White
Neo4j Makes Graphs Easy: Nicole White
 
NoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin EsmannNoSQL's biggest lie: SQL never went away - Martin Esmann
NoSQL's biggest lie: SQL never went away - Martin Esmann
 
Geolocation and Cassandra at Physi
Geolocation and Cassandra at PhysiGeolocation and Cassandra at Physi
Geolocation and Cassandra at Physi
 
SQL window functions for MySQL
SQL window functions for MySQLSQL window functions for MySQL
SQL window functions for MySQL
 
Blinded Stack Overflow: Just Another Common Technique
Blinded Stack Overflow: Just Another Common TechniqueBlinded Stack Overflow: Just Another Common Technique
Blinded Stack Overflow: Just Another Common Technique
 

More from MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Self service data exploration with apache drill

  • 1. ® © 2015 MapR Technologies 1 ® © 2015 MapR Technologies Self Service Data Exploration with Apache Drill
  • 2. ® © 2015 MapR Technologies 2 Data is doubling in size every two years
  • 3. ® © 2015 MapR Technologies 3 2011 2013 In 2020 it is estimated to be 44 zettabytes of data in the world 2020 Source: IDC Digital Universe 44ZETTABYTES* 4.4ZETTABYTES 1.8ZETTABYTES … *A zettabyte is 1 BILLION terabytes.
  • 4. ® © 2015 MapR Technologies 4 2011 2013 In 2020 it is estimated to be 44 zettabytes of data in the world 2020 Source: IDC Digital Universe 44ZETTABYTES* 4.4ZETTABYTES 1.8ZETTABYTES … 700 trillion *A zettabyte is 1 BILLION terabytes. 64GB Each person in the world had 100k of these iPhones
  • 5. ® © 2015 MapR Technologies 5 UNSTRUCTURED DATA 1980 2000 20101990 2020 Unstructured data will account for more than 80% of the data collected by organizations Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data TotalDataStored STRUCTURED DATA
  • 6. ® © 2015 MapR Technologies 6 Evolving distance to data Business (analysts, developers) “Plumbing” development Business (analysts, developers) Existing approaches require a middleman (IT) Data Data Data Business (analysts, developers) Modeling and transformations Map/Reduce Traditional SQL-on-Hadoop New SQL-on-Hadoop
  • 7. ® © 2015 MapR Technologies 7 SQL in a NoSchema World •  SQL •  BI (Tableau, MicroStrategy, etc.) •  Low latency •  Scalability •  Create and maintain schemas on: –  HDFS (Parquet, JSON, etc.) –  HBase –  MongoDB •  Transform or copy data 2 DON’T WANT WANT
  • 8. ® © 2015 MapR Technologies 8 • Schema-free scale-out query engine for Hadoop and NoSQL • Low latency • Extreme ease of use • Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs APACHE DRILL
  • 9. ® © 2015 MapR Technologies 9 Drill’s Data Model is Flexible HBase JSON BSON CSV TSV Parquet Avro Schema-lessFixed schema Flat Complex Flexibility Name! Gender! Age! Michael! M! 6! Jennifer! F! 3! {! name: {! first: Michael,! last: Smith! },! hobbies: [ski, soccer],! district: Los Altos! }! {! name: {! first: Jennifer,! last: Gates! },! hobbies: [sing],! preschool: CCLC! }! RDBMS/SQL-on-Hadoop table Apache Drill table Flexibility
  • 10. ® © 2015 MapR Technologies 10 Running Drill takes 10 minutes   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  full_name        |  position_title  |      salary      |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  Sheri  Nowmer  |  President            |  80000.0        |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   1  row  selected  (0.417  seconds)   DOWNLOAD HTTP://DRILL.APACHE.ORG/DOWNLOAD EXTRACT $  tar  xf  apache-­‐drill-­‐0.7.0.tar.gz   $  cd  apache-­‐drill-­‐0.7.0   RUN $  bin/sqlline  -­‐u  jdbc:drill:zk=local   >  SELECT  full_name,  position_title,  salary      FROM  cp.`employee.json  `      LIMIT  1;  QUERY & step by step In SQL format
  • 11. ® © 2015 MapR Technologies 11 Introduce external data sources to Drill Ø  SELECT  *  FROM  dfs.root.`/ E:/drill/data/yelp/ review.json`;   Ø  SELECT  *  FROM   dfs.yelp.`review.json`   LIMIT  1;   Ø  USE  dfs.yelp;   Ø  SELECT  *  FROM   `review.json`  LIMIT  1;   Ø  SELECT  *  FROM  hbase.users   LIMIT  1;   Storage Plugin Provider Workspace Table files Path Path relative to workspace mongo Database Collection hive Database Table hbase Namespace Table Coordinates: Currently Supported Providers . . …
  • 12. ® © 2015 MapR Technologies 12 Introduce external data sources to Drill Example: Ø  SELECT  *  FROM  dfs.root.`/E:/drill/data/yelp/ review.json`;   Ø  SELECT  *  FROM  dfs.yelp.`review.json`  LIMIT  1;   Ø  USE  dfs.yelp;   Ø  SELECT  *  FROM  `review.json`  LIMIT  1;   Ø  SELECT  *  FROM  hbase.users  LIMIT  1;   Ø  SELECT  *  FROM  dfs.root.`/ E:/drill/data/yelp/ review.json`;   Ø  SELECT  *  FROM   dfs.yelp.`review.json`   LIMIT  1;   Ø  USE  dfs.yelp;   Ø  SELECT  *  FROM   `review.json`  LIMIT  1;   Ø  SELECT  *  FROM  hbase.users   LIMIT  1;   Storage Plugin Provider Workspace Table files Path Path relative to workspace mongo Database Collection hive Database Table hbase Namespace Table Coordinates: Currently Supported Providers . . …
  • 13. ® © 2015 MapR Technologies 13 {      "votes":  {"funny":  0,  "useful":  2,  "cool":  1},      "user_id":  "Xqd0DzHaiyRqVH3WRG7hzg",      "review_id":  "15SdjuK7DmYqUAj6rjGowg",      "stars":  5,      "date":  "2007-­‐05-­‐17",      "text":  "dr.  goldberg  offers  everything  ...",      "type":  "review",      "business_id":  "vcNAWiLM4dR7D2nwwJ7nCA"   }   Inventory: DFS Files
  • 14. ® © 2015 MapR Technologies 14 business.json (1) {    "business_id":  "4bEjOyTaDG24SY5TxsaUNQ",    "full_address":  "3655  Las  Vegas  Blvd  SnThe  StripnLas  Vegas,  NV  89109",    "hours":  {      "Monday":  {"close":  "23:00",  "open":  "07:00"},      "Tuesday":  {"close":  "23:00",  "open":  "07:00"},      "Friday":  {"close":  "00:00",  "open":  "07:00"},      "Wednesday":  {"close":  "23:00",  "open":  "07:00"},      "Thursday":  {"close":  "23:00",  "open":  "07:00"},      "Sunday":  {"close":  "23:00",  "open":  "07:00"},      "Saturday":  {"close":  "00:00",  "open":  "07:00"}    },    "open":  true,    "categories":  ["Breakfast  &  Brunch",  "Steakhouses",  "French",  "Restaurants"],    "city":  "Las  Vegas",    "review_count":  4084,    "name":  "Mon  Ami  Gabi",    "neighborhoods":  ["The  Strip"],    "longitude":  -­‐115.172588519464,  
  • 15. ® © 2015 MapR Technologies 15 business.json (2)  "state":  "NV",    "stars":  4.0,      "attributes":  {      "Alcohol":  "full_bar”,        "Noise  Level":  "average",      "Has  TV":  false,      "Attire":  "casual",      "Ambience":  {        "romantic":  true,        "intimate":  false,        "touristy":  false,        "hipster":  false,          "classy":  true,        "trendy":  false,          "casual":  false      },      "Good  For":  {"dessert":  false,  "latenight":  false,  "lunch":  false,                                                  "dinner":  true,  "breakfast":  false,  "brunch":  false},    }   }  
  • 16. ® © 2015 MapR Technologies 16 Use cases LAS VEGAS NEW RESTAURANT
  • 17. ® © 2015 MapR Technologies 17 NEW RESTAURANT Customers for opening party >  SELECT  name,  review_count      FROM  dfs.yelp.`user.json`      ORDER  BY  review_count  DESC      LIMIT  50;     +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |        name        |  review_count  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  Victor          |  8062                  |   |  Jennifer      |  4244                  |   |  Anita            |  3829                  |   |  ......          |  ....                  |   |  Eileen          |  1947                  |   |  J                    |  1946                  |   |  Matt              |  1942                  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   50  rows  selected  (1.16  seconds)  
  • 18. ® © 2015 MapR Technologies 18 Cities with most businesses NEW RESTAURANT >  SELECT  state,  city,  COUNT(*)  AS  businesses          FROM  dfs.yelp.`business.json`          GROUP  BY  state,  city          ORDER  BY  reviews  DESC  LIMIT  10;   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |      state        |        city        |  businesses  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  NV                  |  Las  Vegas    |  12021            |   |  AZ                  |  Phoenix        |  7499              |   |  AZ                  |  Scottsdale  |  3605              |   |  EDH                |  Edinburgh    |  2804              |   |  AZ                  |  Mesa              |  2041              |   |  AZ                  |  Tempe            |  2025              |   |  NV                  |  Henderson    |  1914              |   |  AZ                  |  Chandler      |  1637              |   |  WI                  |  Madison        |  1630              |   |  AZ                  |  Glendale      |  1196              |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+  
  • 19. ® © 2015 MapR Technologies 19 Use cases LAS VEGAS LAS VEGAS RESTAURANT
  • 20. ® © 2015 MapR Technologies 20 Open restaurants at 22:00 LAS VEGAS RESTAURANT >  SELECT  name,  b.hours      FROM  dfs.yelp.`business.json`  b      WHERE  b.hours.Saturday.`open`  <  '22:00'  AND                  b.hours.Saturday.`close`  >  '22:00'      LIMIT  1;     +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |        name        |      hours        |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  Chang  Jiang  Chinese  Kitchen  |  {"Tuesday": {"close":"22:00","open":"11:00"},"Friday": {"close":"22:30","open":"11:00"},"Monday": {"close":"22:00","open":"11:00"},"Wednesday": {"close":"22:00","open":"11:00"},"Thursday": {"close":"22:00","open":"11:00"},"Sunday": {"close":"21:00","open":"16:00"},"Saturday":{"close":"22:30","open":"11:00"}}  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   1  row  selected  (0.013  seconds)    
  • 21. ® © 2015 MapR Technologies 21 Finding hummus at 22:00 LAS VEGAS RESTAURANT >  SELECT  name,  stars,  b.hours.Wednesday,  categories      FROM  dfs.yelp.`business.json`  b      WHERE  b.hours.Wednesday.`open`  <  '22:00'  AND                  b.hours.Wednesday.`close`  >  '22:00'  AND                  REPEATED_CONTAINS(categories,  'Mediterranean')   AND                  city  =  'Las  Vegas'          ORDER  BY  stars  DESC          LIMIT  1;     +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |        name        |      stars        |      EXPR$2      |  categories  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  Marrakech  Moroccan  Restaurant  |  4.0                |  {"close":"23:00","open":"17:30"}  |   ["Mediterranean","Middle  Eastern","Moroccan","Restaurants"]  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   1  row  selected  (2.185  seconds)  
  • 22. ® © 2015 MapR Technologies 22 • Working with repeated values APACHE DRILL Unique benefits
  • 23. ® © 2015 MapR Technologies 23 Flatten Repeated Values >  SELECT  name,  categories      FROM  dfs.yelp.`business.json`  LIMIT  2;   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |        name        |  categories  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  Eric  Goldberg,  MD  |  ["Doctors","Health  &  Medical"]  |   |  Pine  Cone  Restaurant  |  ["Restaurants"]  |   |  Deforest  Family  Restaurant  |  ["American  (Traditional)","Restaurants"]  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+     >  SELECT  name,  FLATTEN(categories)  AS  categories      FROM  dfs.yelp.`business.json`  LIMIT  3;   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |        name        |  categories  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  Eric  Goldberg,  MD  |  Doctors        |   |  Eric  Goldberg,  MD  |  Health  &  Medical  |   |  Pine  Cone  Restaurant  |  Restaurants  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+  
  • 24. ® © 2015 MapR Technologies 24 Most and Least Common Business Categories >  SELECT  category,  COUNT(*)  AS  businesses      FROM  (SELECT  name,  FLATTEN(categories)  AS  category                          FROM  dfs.yelp.`business.json`)      GROUP  BY  category  ORDER  BY  businesses  DESC;   +------------+------------+ | category | businesses | +------------+------------+ | Restaurants | 14303 | | ............... | | Firewood | 1 | +------------+------------+ 715 rows selected (3.439 seconds)     >  SELECT  name,  categories  FROM  dfs.yelp.`business.json`      WHERE  true  AND  REPEATED_CONTAINS(categories,  'Australian');   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |        name        |  categories  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  The  Australian  AZ  |  ["Bars","Burgers","Nightlife","Australian","Sports  Bars","Restaurants"]  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+  
  • 25. ® © 2015 MapR Technologies 25 • Views - Dynamic and Materialized APACHE DRILL
  • 26. ® © 2015 MapR Technologies 26 Create a view combining business and reviews datasets. >  CREATE  OR  REPLACE  VIEW  dfs.tmp.BusinessReviews  AS          SELECT  b.name,  b.stars,  r.votes.funny,                        r.votes.useful,  r.votes.cool,  r.`date`              FROM  dfs.yelp.`business.json`  b,  dfs.yelp.`review.json`  r              WHERE  r.business_id  =  b.business_id;     +------------+------------+ | ok | summary | +------------+------------+ | true | View 'BusinessReviews' created successfully in 'dfs.tmp' schema | +------------+------------+   >  SELECT  COUNT(*)  AS  Total  FROM  dfs.tmp.BusinessReviews;     +------------+ | Total | +------------+ | 1125458 | +------------+
  • 27. ® © 2015 MapR Technologies 27 Materialized Views AKA Tables >  ALTER  SESSION  SET  `store.format`  =  'parquet';     >  CREATE  TABLE  dfs.tmp.BusinessReviewsTbl  AS          SELECT  b.name,  b.stars,  r.votes.funny  funny,                        r.votes.useful  useful,  r.votes.cool  cool,  r.`date`              FROM  dfs.yelp.`business.json`  b,  dfs.yelp.`review.json`  r              WHERE  r.business_id  =  b.business_id;     +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |    Fragment    |  Number  of  records  written  |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+   |  1_0                |  176448                                        |   |  1_1                |  192439                                        |   |  1_2                |  198625                                        |   |  1_3                |  200863                                        |   |  1_4                |  181420                                        |   |  1_5                |  175663                                        |   +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+  
  • 28. ® © 2015 MapR Technologies 28 DRILL ARCHITECTURE Under the hood
  • 29. ® © 2015 MapR Technologies 29 High Level Architecture Cluster of commodity servers –  Daemon (drillbit) on each node ZooKeeper maintains ephemeral cluster membership information –  Drillbit uses ZooKeeper to find other drillbits in the cluster –  Client uses ZooKeeper to find drillbits Built-in, optimistic query execution engine. Doesn’t require a particular storage or execution system (MapReduce, Spark, Tez) –  Better performance and manageability Data processing unit is columnar record batches   –  Enables schema flexibility with negligible performance impact
  • 30. ® © 2015 MapR Technologies 30 Drill Maximizes Data Locality Data Source Best Practice HDFS or MapR-FS drillbit on each DataNode HBase or MapR-DB drillbit on each RegionServer MongoDB drillbit on each mongod node (when using replicas, run it on the replica node) drillbit DataNode/ RegionServer/ mongod drillbit DataNode/ RegionServer/ mongod drillbit DataNode/ RegionServer/ mongod ZooKeeper ZooKeeper ZooKeeper …
  • 31. ® © 2015 MapR Technologies 31 Core Modules within drillbit   SQL Parser Hive HBase Distributed Cache StoragePlugins MongoDB DFS PhysicalPlan ExecutionLogicalPlan Optimizer RPC Endpoint
  • 32. ® © 2015 MapR Technologies 32 SELECT * Query Execution drillbit   ZooKeeper Client (JDBC, ODBC, REST) 1.  Find drillbits (once per session) 3.  Create logical and physical execution plans 4.  Farm out execution of fragments to cluster (completely distributed execution) ZooKeeper ZooKeeper drillbit  drillbit   2.  Submit query to drillbit 5.  Return results to client * CTAS (CREATE TABLE AS SELECT) queries include steps 1-4
  • 33. ® © 2015 MapR Technologies 33 Participate •  Learn: http://drill.apache.org/ •  Download: http://drill.apache.org/download/ •  Ask Questions: user@drill.apache.org •  Engage on Twitter: @ApacheDrill
  • 34. ® © 2015 MapR Technologies 34 Thank You @mapr maprtech MapRTechnologies maprtech mapr-technologies
  • 35. ® © 2015 MapR Technologies 35 Or Run Drill in Distributed Mode… $  zkServer  start   •  Make sure ZooKeeper (zkServer) is running: •  Access the Web UI: http://localhost:8047 •  Connect a client to the cluster (e.g., sqlline): •  Clients (like sqlline) connect to ZooKeeper to discover the cluster nodes •  If you have multiple Drill clusters registered in one ZooKeeper ensemble, specify the desired cluster in the JDBC connection string: jdbc:drill:zk=localhost:2181/drill/ <clustername> •  Not sure if ZooKeeper is running? Run telnet  localhost  2181 and make sure it connects •  Define the Drill cluster name and ZooKeeper nodes in conf/drill-­‐override.conf •  Start drillbit:   $  bin/drillbit.sh  start   $  bin/sqlline  -­‐u  jdbc:drill:zk=localhost:2181  
  • 36. ® © 2015 MapR Technologies 36 user.json {    "yelping_since":  "2007-­‐08",    "votes":  {      "funny":  198,      "useful":  415,      "cool":  206    },    "review_count":  283,    "name":  "Adele",    "user_id":  "9NJdKpRNwwaL4cvKq0cN6g",    "friends":  ["DrKQzBFAvxhyjLgbPSW2Qw",  "ebXx-­‐G5eFqWkfDuk22f81w",  "qWLezzHxOXN-­‐ GQdInixZzw"],    "fans":  10,    "average_stars":  3.6499999999999999,    "compliments":  {      "funny":  4,      "hot":  17,      "cool":  20    },    "elite":  [2008,  2009,  2010,  2011,  2012,  2013,  2014]   }