SlideShare a Scribd company logo
1 of 41
Download to read offline
©  2015.   All  Rights  Reserved.1
SQL-­on-­Everything  with  
Apache  Drill
Julien  Le  Dem,  Principal  Architect  at  Dremio
VP  Apache  Parquet
Apache  Pig  PMC
julien@dremio.com |  @J_
Big  Data  Apps  meetup
January  27,  2016
©  2015.   All  Rights  Reserved.2
From  Proprietary  to  Open  Source…
Teradata
Oracle
SQL Server
…
Apache Hive
Apache Drill
Open source technologies are emerging as
the preferred approach for large-scale SQL-
based analytics
©  2015.   All  Rights  Reserved.3
Apache  Drill:  Open  Source  Schema-­Free  SQL  Engine
Tableau,  Excel,  Qlik,  … Custom  Applications
MongoDB
CLI
HBase Elasticsearch* MapR
HDFS NAS Local  Files Amazon  S3
*  Currently  being  developed
RDBMS
Azure  Blob  Storage
Apache  Drill
Kudu Phoenix
©  2015.   All  Rights  Reserved.4
Apache  Drill:  Open  Source  Schema-­Free  SQL  Engine
Open  Source  Apache  Project
•Contributors  from  many  companies  including  
Dremio,  MapR and  Hortonworks
•3-­year  engineering  effort,  200K+  lines  of  
code
Innovative  Schema-­free  Engine
•Point-­and-­query  vs.  schema-­first
•No data loading, schemas or ETL
•Handles complex (eg, JSON) data natively
Extreme  Scale  &  Performance
•Scales  from  one  laptop  to  1000s  of  servers
•High  performance  via  columnar  execution  &  
dynamic  query  compilation
Extensible  Architecture
•Pluggable  high-­speed  datastore
connectors  (eg,  MongoDB,  Amazon  S3)
•Custom  operators  and  UDFs
©  2015.   All  Rights  Reserved.5
JSON  Model,  Columnar  Speed
JSON
BSON
MongoDB
HBase
NoSQL
Parquet
Avro
CSV
TSV
Schema-­lessFixed  schema
Flat  
Complex
Name Gender Age
Michael M 6
Jennifer F 3
{
name:  {
first:  Michael,
last:  Smith
},
hobbies:  [ski,  soccer],
district:  Los  Altos
}
{
name:  {
first:  Jennifer,
last:  Gates
},
hobbies:  [sing],
preschool:  CCLC
}
RDBMS/SQL-­on-­Hadoop table
Apache  Drill  table
©  2015.   All  Rights  Reserved.6
Drill  Supports  Schema  Discovery  On-­The-­Fly
• Fixed  schema
• Leverage  schema  in  centralized  
repository  (Hive  Metastore)
• Fixed  schema,  evolving  schema  or  
schema-­free
• Leverage  schema  in  centralized  
repository  or  self-­describing  data
2
SCHEMA  
BEFORE  READ
SCHEMA  ON  
THE  FLY
Schema  Discovered  On-­The-­FlySchema  Declared  In  Advance
SCHEMA  ON  
WRITE
©  2015.   All  Rights  Reserved.7
Apache  Drill  is  Not  Just  SQL-­on-­Hadoop
Drill SQL-­on-­Hadoop (Hive, Impala,  etc.)
Use  case Self-­service,  in-­situ,  SQL-­based analytics Teradata  offload
Deployment  model Standalone  or  co-­located  with  
NoSQL/Hadoop
Hadoop service
User  experience Point-­and-­query Ingest  data à define  schemas  à query
Data  model Schema-­free JSON  (like  MongoDB) Relational  (like Postgres)
Data  sources NoSQL,  Cloud  Storage, Hadoop,  SaaS,  
local  files  (including  multiple  instances)
A  single  Hadoop cluster
Data management Logical,  by  IT or  end-­users (self-­service) Physical, by  IT  only
1.0 availability Q2  2015 Q2  2013  or  earlier
©  2015.   All  Rights  Reserved.8
Omni-­SQL  (“SQL-­on-­Everything”)
Drill:  Omni-­SQL
Whereas  the  other  engines  we're  discussing  here  create  a  relational  database  
environment  on  top  of  Hadoop,  Drill  instead  enables  a  SQL  language  interface  to  
data  in  numerous  formats,  without  requiring  a  formal  schema  to  be  declared.  This  
enables  plug-­and-­play   discovery  over  a  huge  universe  of  data  without  
prerequisites  and  preparation.  So  while  Drill  uses  SQL,  and  can  connect  to  
Hadoop,  calling  it  SQL-­on-­Hadoop kind  of  misses  the  point.  A  better  name  might  
be  SQL-­on-­Everything,   with  very  low  setup  requirements.
Andrew  Brust,
“
”
©  2015.   All  Rights  Reserved.9
ARCHITECTURE
©  2015.   All  Rights  Reserved.10
Everything  Starts  With  a  Drillbit…
• High  performance  query  executor
• In-­memory  columnar  execution
• Directly  interacts  with  data,  acquiring  
knowledge  as  it  reads
• Built  to  leverage  large  amounts  of  memory
• Networked  or  not
• Exposes  ODBC,  JDBC,  REST
• Built-­in  Web  UI  and  CLI
• Extensible
Single  process  
(daemon  or  
CLI)
drillbit
©  2015.   All  Rights  Reserved.11
Data  Lake,  More  Like  Data  Maelstrom
Clustered  Services Desktops
HDFS HDFS HDFS
HBase HBase HBase
HDFS HDFS HDFS
ES ES ES
MongoDB MongoDB MongoDB
Cloud  Services
DynamoDB
Amazon  S3
Linux
Mac
Windows
MongoDB Cluster
Elasticsearch Cluster
Hadoop  Cluster
HBase Cluster
©  2015.   All  Rights  Reserved.12
Run  Drillbits Wherever;;  Whatever  Your  Data
Clustered  Services Desktops
HDFS HDFS HDFS
HBase HBase HBase
HDFS HDFS HDFS
ES ES ES
MongoDB MongoDB MongoDB
Cloud  Services
DynamoDB
Amazon  S3
Linux
Mac
Windows
drillbit drillbit drillbit
drillbit drillbit drillbit
drillbit drillbit drillbit
drillbit drillbit drillbit
drillbit drillbit
drillbit drillbit
drillbit drillbit
drillbit drillbit
drillbit
drillbit
drillbit
©  2015.   All  Rights  Reserved.13
Deployment  Modes
drillbit
sqlline
drillbit drillbit drillbit drillbit
zkServer zkServer zkServer
Embedded  Mode Distributed  Mode  (aka  Drill  Cluster)
sqlline BI  tools
1,  3  or  5  ZooKeeper nodes
1-­1000  Drill  nodes
Remote  clients
©  2015.   All  Rights  Reserved.14
Extensible  Datastore Architecture
Storage  Plugin  API
MongoDB
Plugin
File  Plugin
Execution  Engine
Format  Plugin  APIFileSystem API
HDFS
S3
MapR-­FS
Parquet
JSON
CSV
HBase
Plugin
Hive
Plugin
Chapter  2:  Connecting  to  Datastores
Kudu
Plugin
Phoenix
Plugin
©  2015.   All  Rights  Reserved.15
GETTING  STARTED
©  2015.   All  Rights  Reserved.16
Install  Drill
From  the  browser:  http://drill.apache.org/download
From  the  command  line:
(Also  make  sure  you  have  Java  1.7+  installed…)
$  curl  -­‐L  http://www.dremio.com/drill-­‐latest.tgz |  tar  xz
©  2015.   All  Rights  Reserved.17
Install  a  Local  MongoDB Instance
Install  and  run  MongoDB:  http://docs.mongodb.org/manual/installation/
Import  the  dataset  of  Yelp  businesses:
$  mongoimport -­‐-­‐db yelp  -­‐-­‐collection  business  -­‐-­‐drop  –file  
yelp/business.json
$  mongo
>  use  yelp;
>  db.business.findOne().pretty();
{
"_id"  :  ObjectId("55921ddfc6c0a4a2d8ef700c"),
…
}
©  2015.   All  Rights  Reserved.18
Test  Your  Drill  Setup
$  apache-­‐drill-­‐x.y.z/bin/drill-­‐embedded
Enable  the  MongoDB plugin  by  clicking  Enable  
in  the  Storage  tab:
Access  the  Drill  web  UI  at  localhost:8047:
Start  Drill  shell  with  an  embedded  drillbit daemon:
(If  you  are  using  the  tutorial  EC2  instance,  type  drill instead  of  …/drill-­‐embedded)
©  2015.   All  Rights  Reserved.19
Run  Your  First  Query
>  SELECT  name  FROM  mongo.yelp.business LIMIT  1;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   name               |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  Eric  Goldberg,   MD   |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
>  SELECT  name  FROM  dfs.root.`/opt/tutorial/yelp/business.json`  
LIMIT  1;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   name               |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  Eric  Goldberg,   MD   |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
©  2015.   All  Rights  Reserved.20
Referencing  a  Table
SELECT  *  FROM  production.website.users;
Chapter  3:  The  Universal  Namespace
Datastore Workspace Table
©  2015.   All  Rights  Reserved.21
Namespaces  &  Tables
Storage  Plugin Type Workspace Table
mongo Database Collection
hive Database Table
hbase Namespace Table
file  (HDFS cluster,  S3,  …) Directory File  or  directory
… … …
User  defines  these  in  the  
datastore configuration
©  2015.   All  Rights  Reserved.22
>  SELECT  *
FROM  dfs.root.`/opt/tutorial/yelp/review.json`   r,
mongo.yelp.business b
WHERE  r.business_id =  b.business_id;
Joining  Across  Datastores is  Easy!
Alias  to  a  specific  file  system  (S3,  HDFS,  local,  NAS)  
Alias  to  a  specific  MongoDB cluster
©  2015.   All  Rights  Reserved.23
Joining  Across  Datastores
• Local  file:  yelp/review.json (dfs.yelp.`review.json`)
• MongoDB collection:  yelp.business
(mongo.yelp.business)
Data
• What’s  the  name  of  the  business  with  the  most  reviews  
on  Yelp?
Question
©  2015.   All  Rights  Reserved.24
>  SELECT  b.name AS  name,  COUNT(*)  AS  reviews
FROM  dfs.tutorial.`yelp/review.json`  r,
mongo.yelp.business b
WHERE  r.business_id =  b.business_id
GROUP  BY  b.business_id,  b.name
ORDER  BY  reviews  DESC
LIMIT  3;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   name               |  reviews   |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  Mon Ami Gabi |  3695         |
|  Earl  of  Sandwich   |  3263         |
|  Wicked  Spoon           |  3011         |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
©  2015.   All  Rights  Reserved.25
Accessing  Array  Elements
>  SELECT  categories  FROM  business  LIMIT  2;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   categories                                 |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  ["American  (Traditional)","Restaurants"]   |
|  ["Chinese","Restaurants"]                                 |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
>  SELECT  categories[0]  FROM  business  LIMIT  2;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   EXPR$0                   |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  American  (Traditional)   |
|  Chinese                                 |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
©  2015.   All  Rights  Reserved.26
Accessing  Data  in  Maps
>  SELECT  attributes  FROM  business;
>  SELECT  business.attributes FROM  business;
>  SELECT  b.attributes FROM  business  b;
>  SELECT  attributes.Parking FROM  business;
Error:  PARSE  ERROR:  Table  'attributes'  not  found
>  SELECT  business.attributes.Parking FROM  business;
>  SELECT  b.attributes.Parking FROM  business  b;
• Use  the  dot  (.)  notation  to  access  nested  fields
• Must  specify  the  table  name/alias  when  accessing  nested  fields
• <table>.<field>.<nested  field  1>.<nested  field  2>
• When  Drill  sees  one  or  more  dots,  it  assumes  the  first  is  a  table  
name/alias
©  2015.   All  Rights  Reserved.27
FLATTEN
• FLATTEN converts  single  record  with  array  field  into  multiple  records
– One  output  record  for  each  array  element
• Non  FLATTENed fields  are  repeated  in  each  of  the  output  records
>  SELECT  categories
FROM  business  LIMIT  2;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   categories                                 |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  ["American  (Traditional)","Restaurants"]   |
|  ["Chinese","Restaurants"]                                  |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
>  SELECT  FLATTEN(categories)
FROM  business  LIMIT  4;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   EXPR$0                   |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  American  (Traditional)   |
|  Restaurants                         |
|  Chinese                                 |
|  Restaurants                         |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
©  2015.   All  Rights  Reserved.28
Non-­FLATTENed Fields  are  Repeated
>  SELECT  name,  categories  FROM  business  LIMIT  2;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   name                         |   categories                                 |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  Deforest  Family  Restaurant     |  ["American  (Traditional)","Restaurants"]   |
|  Chang  Jiang  Chinese  Kitchen   |  ["Chinese","Restaurants"]                                  |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
>  SELECT  name,  FLATTEN(categories)  FROM  business  LIMIT  4;
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|   name                         |   EXPR$1                   |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
|  Deforest  Family  Restaurant     |  American  (Traditional)   |
|  Deforest  Family  Restaurant     |  Restaurants                         |
|  Chang  Jiang  Chinese  Kitchen   |  Chinese                                 |
|  Chang  Jiang  Chinese  Kitchen   |  Restaurants                         |
+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
©  2015.   All  Rights  Reserved.29
BI
©  2015.   All  Rights  Reserved.30
ODBC  and  JDBC
• Drill  includes  standard  
ODBC/JDBC  drivers
– ODBC  for  native  apps
– JDBC  for  Java  apps
• User  installs  the  driver  
on  the  client
– The  same  machine  as  
the  BI  tool
• Driver  communicates  
with  Drill  cluster(s)
• Make  sure  driver  and  
cluster  are  compatible  
versions
Drill  Cluster
Drill  JDBC  Driver
TIBCO  Spotfire
Client
Drill  ODBC  Driver
Tableau
Client  (eg,  Laptop)
©  2015.   All  Rights  Reserved.31
Install  the  ODBC  Driver
• Download  &  install  the  Drill  
ODBC  driver  from  the  Drill  
website  (drill.apache.org)
• If  you  will  use  Tableau,  install  
Tableau  before  the  ODBC  driver  
or  get  the  TDC  file  separately
• Open  the  system’s  ODBC  Data  
Source  Administrator
©  2015.   All  Rights  Reserved.32
Configure  the  ODBC  Driver
• Two  connection  options:
– Connect  to  a  specific  node  
(drillbit)  in  the  Drill  cluster
– Connect  to  any  node  (drillbit)  in  
the  Drill  cluster
• Click  “Test…”  to  make  sure  the  
ODBC  driver  can  connect  to  Drill
©  2015.   All  Rights  Reserved.33
Connect  with  Tableau
©  2015.   All  Rights  Reserved.34
Choose  ODBC  DSN
©  2015.   All  Rights  Reserved.35
Choose  Drill  Workspace  &  Table
• sdfsd
All  <datastore>.<workspace>  
combinations  will  be  shown  here
In  a  file  system  datastore,  currently  
only  views  will  be  shown  here  (an  
upcoming  feature  will  enable  
selection  of  any  file/directory  
representing  a  dataset)
©  2015.   All  Rights  Reserved.36
©  2015.   All  Rights  Reserved.37
©  2015.   All  Rights  Reserved.38
REST
©  2015.   All  Rights  Reserved.39
Using  the  REST  API
Client  application  sends  a  properly  structured  HTTP  
request  to  any  drillbit in  the  cluster
curl  
-­‐-­‐header  "Content-­‐Type:  application/json"  
-­‐-­‐request  POST  
-­‐-­‐data  '{"queryType":  "SQL",  "query":  "SELECT  *  FROM  …"}'  
http://localhost:8047/query.json
Part Required Content
Method POST
Headers Content-­‐Type:  application/json
Payload JSON
©  2015.   All  Rights  Reserved.40
Run  a  Query  via  REST
$  curl  -­‐-­‐header  "Content-­‐Type:  application/json"  
-­‐-­‐request  POST  
-­‐-­‐data  '{"queryType":  "SQL",  "query":  "SELECT  name  FROM  
dfs.tutorial.`yelp/business.json`  LIMIT  3"}'  
http://localhost:8047/query.json |  python  -­‐m  json.tool
{
"columns":  ["name"],
"rows":  [
{"name":  "Eric  Goldberg,  MD"  },
{"name":  "Pine  Cone  Restaurant"},
{"name":  "Deforest  Family  Restaurant"}
]
}
©  2015.   All  Rights  Reserved.41
Thank  You!
• Download  at  drill.apache.org
• Ask  questions:
• user@drill.apache.org
• Tweet
• @ApacheDrill

More Related Content

What's hot

Efficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and ArrowEfficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and ArrowDataWorks Summit/Hadoop Summit
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitDataWorks Summit
 
Data Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet ArrowData Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet ArrowJulien Le Dem
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDataWorks Summit
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...The Hive
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonDataWorks Summit/Hadoop Summit
 
Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...Julien Le Dem
 

What's hot (20)

Efficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and ArrowEfficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and Arrow
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
Empower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and HadoopEmpower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and Hadoop
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Data Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet ArrowData Eng Conf NY Nov 2016 Parquet Arrow
Data Eng Conf NY Nov 2016 Parquet Arrow
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
 
Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...Strata London 2016: The future of column oriented data processing with Arrow ...
Strata London 2016: The future of column oriented data processing with Arrow ...
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
 
October 2014 HUG : Hive On Spark
October 2014 HUG : Hive On SparkOctober 2014 HUG : Hive On Spark
October 2014 HUG : Hive On Spark
 

Viewers also liked

Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File IntroductionOwen O'Malley
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVROairisData
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
If you have your own Columnar format, stop now and use Parquet 😛
If you have your own Columnar format,  stop now and use Parquet  😛If you have your own Columnar format,  stop now and use Parquet  😛
If you have your own Columnar format, stop now and use Parquet 😛Julien Le Dem
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Wes McKinney
 

Viewers also liked (11)

Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
 
Parquet overview
Parquet overviewParquet overview
Parquet overview
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
 
ORC Files
ORC FilesORC Files
ORC Files
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
If you have your own Columnar format, stop now and use Parquet 😛
If you have your own Columnar format,  stop now and use Parquet  😛If you have your own Columnar format,  stop now and use Parquet  😛
If you have your own Columnar format, stop now and use Parquet 😛
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
 

Similar to Sql on everything with drill

Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionMapR Technologies
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Big SQL NYC Event December by Virender
Big SQL NYC Event December by VirenderBig SQL NYC Event December by Virender
Big SQL NYC Event December by Virendervithakur
 
Ibm integrated analytics system
Ibm integrated analytics systemIbm integrated analytics system
Ibm integrated analytics systemModusOptimum
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
The Central View of your Data with Postgres
The Central View of your Data with PostgresThe Central View of your Data with Postgres
The Central View of your Data with PostgresEDB
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of ChoicePaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of ChoiceIsaac Christoffersen
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
What is A Cloud Stack in 2017
What is A Cloud Stack in 2017What is A Cloud Stack in 2017
What is A Cloud Stack in 2017Gaurav Roy
 
Docker based Hadoop Deployment
Docker based Hadoop DeploymentDocker based Hadoop Deployment
Docker based Hadoop DeploymentRakesh Saha
 
DEVNET-1141 Dynamic Dockerized Hadoop Provisioning
DEVNET-1141	Dynamic Dockerized Hadoop ProvisioningDEVNET-1141	Dynamic Dockerized Hadoop Provisioning
DEVNET-1141 Dynamic Dockerized Hadoop ProvisioningCisco DevNet
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 

Similar to Sql on everything with drill (20)

Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Big SQL NYC Event December by Virender
Big SQL NYC Event December by VirenderBig SQL NYC Event December by Virender
Big SQL NYC Event December by Virender
 
Ibm integrated analytics system
Ibm integrated analytics systemIbm integrated analytics system
Ibm integrated analytics system
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
The Central View of your Data with Postgres
The Central View of your Data with PostgresThe Central View of your Data with Postgres
The Central View of your Data with Postgres
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of ChoicePaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
What is A Cloud Stack in 2017
What is A Cloud Stack in 2017What is A Cloud Stack in 2017
What is A Cloud Stack in 2017
 
Docker based Hadoop Deployment
Docker based Hadoop DeploymentDocker based Hadoop Deployment
Docker based Hadoop Deployment
 
DEVNET-1141 Dynamic Dockerized Hadoop Provisioning
DEVNET-1141	Dynamic Dockerized Hadoop ProvisioningDEVNET-1141	Dynamic Dockerized Hadoop Provisioning
DEVNET-1141 Dynamic Dockerized Hadoop Provisioning
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 

More from Julien Le Dem

Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageJulien Le Dem
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & MarquezJulien Le Dem
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageJulien Le Dem
 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Julien Le Dem
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Julien Le Dem
 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseJulien Le Dem
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed databaseJulien Le Dem
 
Strata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmapStrata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmapJulien Le Dem
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowJulien Le Dem
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowJulien Le Dem
 
Mule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet ArrowMule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet ArrowJulien Le Dem
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsJulien Le Dem
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Julien Le Dem
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
Parquet Twitter Seattle open house
Parquet Twitter Seattle open houseParquet Twitter Seattle open house
Parquet Twitter Seattle open houseJulien Le Dem
 
Poster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languagesPoster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languagesJulien Le Dem
 
Embedding Pig in scripting languages
Embedding Pig in scripting languagesEmbedding Pig in scripting languages
Embedding Pig in scripting languagesJulien Le Dem
 

More from Julien Le Dem (17)

Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineage
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
 
Strata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed databaseStrata NY 2018: The deconstructed database
Strata NY 2018: The deconstructed database
 
From flat files to deconstructed database
From flat files to deconstructed databaseFrom flat files to deconstructed database
From flat files to deconstructed database
 
Strata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmapStrata NY 2017 Parquet Arrow roadmap
Strata NY 2017 Parquet Arrow roadmap
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
 
Mule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet ArrowMule soft mar 2017 Parquet Arrow
Mule soft mar 2017 Parquet Arrow
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
Parquet Twitter Seattle open house
Parquet Twitter Seattle open houseParquet Twitter Seattle open house
Parquet Twitter Seattle open house
 
Poster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languagesPoster Hadoop summit 2011: pig embedding in scripting languages
Poster Hadoop summit 2011: pig embedding in scripting languages
 
Embedding Pig in scripting languages
Embedding Pig in scripting languagesEmbedding Pig in scripting languages
Embedding Pig in scripting languages
 

Recently uploaded

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Sql on everything with drill

  • 1. ©  2015.   All  Rights  Reserved.1 SQL-­on-­Everything  with   Apache  Drill Julien  Le  Dem,  Principal  Architect  at  Dremio VP  Apache  Parquet Apache  Pig  PMC julien@dremio.com |  @J_ Big  Data  Apps  meetup January  27,  2016
  • 2. ©  2015.   All  Rights  Reserved.2 From  Proprietary  to  Open  Source… Teradata Oracle SQL Server … Apache Hive Apache Drill Open source technologies are emerging as the preferred approach for large-scale SQL- based analytics
  • 3. ©  2015.   All  Rights  Reserved.3 Apache  Drill:  Open  Source  Schema-­Free  SQL  Engine Tableau,  Excel,  Qlik,  … Custom  Applications MongoDB CLI HBase Elasticsearch* MapR HDFS NAS Local  Files Amazon  S3 *  Currently  being  developed RDBMS Azure  Blob  Storage Apache  Drill Kudu Phoenix
  • 4. ©  2015.   All  Rights  Reserved.4 Apache  Drill:  Open  Source  Schema-­Free  SQL  Engine Open  Source  Apache  Project •Contributors  from  many  companies  including   Dremio,  MapR and  Hortonworks •3-­year  engineering  effort,  200K+  lines  of   code Innovative  Schema-­free  Engine •Point-­and-­query  vs.  schema-­first •No data loading, schemas or ETL •Handles complex (eg, JSON) data natively Extreme  Scale  &  Performance •Scales  from  one  laptop  to  1000s  of  servers •High  performance  via  columnar  execution  &   dynamic  query  compilation Extensible  Architecture •Pluggable  high-­speed  datastore connectors  (eg,  MongoDB,  Amazon  S3) •Custom  operators  and  UDFs
  • 5. ©  2015.   All  Rights  Reserved.5 JSON  Model,  Columnar  Speed JSON BSON MongoDB HBase NoSQL Parquet Avro CSV TSV Schema-­lessFixed  schema Flat   Complex Name Gender Age Michael M 6 Jennifer F 3 { name:  { first:  Michael, last:  Smith }, hobbies:  [ski,  soccer], district:  Los  Altos } { name:  { first:  Jennifer, last:  Gates }, hobbies:  [sing], preschool:  CCLC } RDBMS/SQL-­on-­Hadoop table Apache  Drill  table
  • 6. ©  2015.   All  Rights  Reserved.6 Drill  Supports  Schema  Discovery  On-­The-­Fly • Fixed  schema • Leverage  schema  in  centralized   repository  (Hive  Metastore) • Fixed  schema,  evolving  schema  or   schema-­free • Leverage  schema  in  centralized   repository  or  self-­describing  data 2 SCHEMA   BEFORE  READ SCHEMA  ON   THE  FLY Schema  Discovered  On-­The-­FlySchema  Declared  In  Advance SCHEMA  ON   WRITE
  • 7. ©  2015.   All  Rights  Reserved.7 Apache  Drill  is  Not  Just  SQL-­on-­Hadoop Drill SQL-­on-­Hadoop (Hive, Impala,  etc.) Use  case Self-­service,  in-­situ,  SQL-­based analytics Teradata  offload Deployment  model Standalone  or  co-­located  with   NoSQL/Hadoop Hadoop service User  experience Point-­and-­query Ingest  data à define  schemas  à query Data  model Schema-­free JSON  (like  MongoDB) Relational  (like Postgres) Data  sources NoSQL,  Cloud  Storage, Hadoop,  SaaS,   local  files  (including  multiple  instances) A  single  Hadoop cluster Data management Logical,  by  IT or  end-­users (self-­service) Physical, by  IT  only 1.0 availability Q2  2015 Q2  2013  or  earlier
  • 8. ©  2015.   All  Rights  Reserved.8 Omni-­SQL  (“SQL-­on-­Everything”) Drill:  Omni-­SQL Whereas  the  other  engines  we're  discussing  here  create  a  relational  database   environment  on  top  of  Hadoop,  Drill  instead  enables  a  SQL  language  interface  to   data  in  numerous  formats,  without  requiring  a  formal  schema  to  be  declared.  This   enables  plug-­and-­play   discovery  over  a  huge  universe  of  data  without   prerequisites  and  preparation.  So  while  Drill  uses  SQL,  and  can  connect  to   Hadoop,  calling  it  SQL-­on-­Hadoop kind  of  misses  the  point.  A  better  name  might   be  SQL-­on-­Everything,   with  very  low  setup  requirements. Andrew  Brust, “ ”
  • 9. ©  2015.   All  Rights  Reserved.9 ARCHITECTURE
  • 10. ©  2015.   All  Rights  Reserved.10 Everything  Starts  With  a  Drillbit… • High  performance  query  executor • In-­memory  columnar  execution • Directly  interacts  with  data,  acquiring   knowledge  as  it  reads • Built  to  leverage  large  amounts  of  memory • Networked  or  not • Exposes  ODBC,  JDBC,  REST • Built-­in  Web  UI  and  CLI • Extensible Single  process   (daemon  or   CLI) drillbit
  • 11. ©  2015.   All  Rights  Reserved.11 Data  Lake,  More  Like  Data  Maelstrom Clustered  Services Desktops HDFS HDFS HDFS HBase HBase HBase HDFS HDFS HDFS ES ES ES MongoDB MongoDB MongoDB Cloud  Services DynamoDB Amazon  S3 Linux Mac Windows MongoDB Cluster Elasticsearch Cluster Hadoop  Cluster HBase Cluster
  • 12. ©  2015.   All  Rights  Reserved.12 Run  Drillbits Wherever;;  Whatever  Your  Data Clustered  Services Desktops HDFS HDFS HDFS HBase HBase HBase HDFS HDFS HDFS ES ES ES MongoDB MongoDB MongoDB Cloud  Services DynamoDB Amazon  S3 Linux Mac Windows drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit drillbit
  • 13. ©  2015.   All  Rights  Reserved.13 Deployment  Modes drillbit sqlline drillbit drillbit drillbit drillbit zkServer zkServer zkServer Embedded  Mode Distributed  Mode  (aka  Drill  Cluster) sqlline BI  tools 1,  3  or  5  ZooKeeper nodes 1-­1000  Drill  nodes Remote  clients
  • 14. ©  2015.   All  Rights  Reserved.14 Extensible  Datastore Architecture Storage  Plugin  API MongoDB Plugin File  Plugin Execution  Engine Format  Plugin  APIFileSystem API HDFS S3 MapR-­FS Parquet JSON CSV HBase Plugin Hive Plugin Chapter  2:  Connecting  to  Datastores Kudu Plugin Phoenix Plugin
  • 15. ©  2015.   All  Rights  Reserved.15 GETTING  STARTED
  • 16. ©  2015.   All  Rights  Reserved.16 Install  Drill From  the  browser:  http://drill.apache.org/download From  the  command  line: (Also  make  sure  you  have  Java  1.7+  installed…) $  curl  -­‐L  http://www.dremio.com/drill-­‐latest.tgz |  tar  xz
  • 17. ©  2015.   All  Rights  Reserved.17 Install  a  Local  MongoDB Instance Install  and  run  MongoDB:  http://docs.mongodb.org/manual/installation/ Import  the  dataset  of  Yelp  businesses: $  mongoimport -­‐-­‐db yelp  -­‐-­‐collection  business  -­‐-­‐drop  –file   yelp/business.json $  mongo >  use  yelp; >  db.business.findOne().pretty(); { "_id"  :  ObjectId("55921ddfc6c0a4a2d8ef700c"), … }
  • 18. ©  2015.   All  Rights  Reserved.18 Test  Your  Drill  Setup $  apache-­‐drill-­‐x.y.z/bin/drill-­‐embedded Enable  the  MongoDB plugin  by  clicking  Enable   in  the  Storage  tab: Access  the  Drill  web  UI  at  localhost:8047: Start  Drill  shell  with  an  embedded  drillbit daemon: (If  you  are  using  the  tutorial  EC2  instance,  type  drill instead  of  …/drill-­‐embedded)
  • 19. ©  2015.   All  Rights  Reserved.19 Run  Your  First  Query >  SELECT  name  FROM  mongo.yelp.business LIMIT  1; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   name               | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  Eric  Goldberg,   MD   | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ >  SELECT  name  FROM  dfs.root.`/opt/tutorial/yelp/business.json`   LIMIT  1; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   name               | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  Eric  Goldberg,   MD   | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
  • 20. ©  2015.   All  Rights  Reserved.20 Referencing  a  Table SELECT  *  FROM  production.website.users; Chapter  3:  The  Universal  Namespace Datastore Workspace Table
  • 21. ©  2015.   All  Rights  Reserved.21 Namespaces  &  Tables Storage  Plugin Type Workspace Table mongo Database Collection hive Database Table hbase Namespace Table file  (HDFS cluster,  S3,  …) Directory File  or  directory … … … User  defines  these  in  the   datastore configuration
  • 22. ©  2015.   All  Rights  Reserved.22 >  SELECT  * FROM  dfs.root.`/opt/tutorial/yelp/review.json`   r, mongo.yelp.business b WHERE  r.business_id =  b.business_id; Joining  Across  Datastores is  Easy! Alias  to  a  specific  file  system  (S3,  HDFS,  local,  NAS)   Alias  to  a  specific  MongoDB cluster
  • 23. ©  2015.   All  Rights  Reserved.23 Joining  Across  Datastores • Local  file:  yelp/review.json (dfs.yelp.`review.json`) • MongoDB collection:  yelp.business (mongo.yelp.business) Data • What’s  the  name  of  the  business  with  the  most  reviews   on  Yelp? Question
  • 24. ©  2015.   All  Rights  Reserved.24 >  SELECT  b.name AS  name,  COUNT(*)  AS  reviews FROM  dfs.tutorial.`yelp/review.json`  r, mongo.yelp.business b WHERE  r.business_id =  b.business_id GROUP  BY  b.business_id,  b.name ORDER  BY  reviews  DESC LIMIT  3; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   name               |  reviews   | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  Mon Ami Gabi |  3695         | |  Earl  of  Sandwich   |  3263         | |  Wicked  Spoon           |  3011         | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
  • 25. ©  2015.   All  Rights  Reserved.25 Accessing  Array  Elements >  SELECT  categories  FROM  business  LIMIT  2; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   categories                                 | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  ["American  (Traditional)","Restaurants"]   | |  ["Chinese","Restaurants"]                                 | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ >  SELECT  categories[0]  FROM  business  LIMIT  2; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   EXPR$0                   | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  American  (Traditional)   | |  Chinese                                 | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
  • 26. ©  2015.   All  Rights  Reserved.26 Accessing  Data  in  Maps >  SELECT  attributes  FROM  business; >  SELECT  business.attributes FROM  business; >  SELECT  b.attributes FROM  business  b; >  SELECT  attributes.Parking FROM  business; Error:  PARSE  ERROR:  Table  'attributes'  not  found >  SELECT  business.attributes.Parking FROM  business; >  SELECT  b.attributes.Parking FROM  business  b; • Use  the  dot  (.)  notation  to  access  nested  fields • Must  specify  the  table  name/alias  when  accessing  nested  fields • <table>.<field>.<nested  field  1>.<nested  field  2> • When  Drill  sees  one  or  more  dots,  it  assumes  the  first  is  a  table   name/alias
  • 27. ©  2015.   All  Rights  Reserved.27 FLATTEN • FLATTEN converts  single  record  with  array  field  into  multiple  records – One  output  record  for  each  array  element • Non  FLATTENed fields  are  repeated  in  each  of  the  output  records >  SELECT  categories FROM  business  LIMIT  2; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   categories                                 | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  ["American  (Traditional)","Restaurants"]   | |  ["Chinese","Restaurants"]                                 | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ >  SELECT  FLATTEN(categories) FROM  business  LIMIT  4; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   EXPR$0                   | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  American  (Traditional)   | |  Restaurants                         | |  Chinese                                 | |  Restaurants                         | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
  • 28. ©  2015.   All  Rights  Reserved.28 Non-­FLATTENed Fields  are  Repeated >  SELECT  name,  categories  FROM  business  LIMIT  2; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   name                         |   categories                                 | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  Deforest  Family  Restaurant     |  ["American  (Traditional)","Restaurants"]   | |  Chang  Jiang  Chinese  Kitchen   |  ["Chinese","Restaurants"]                                 | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ >  SELECT  name,  FLATTEN(categories)  FROM  business  LIMIT  4; +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |   name                         |   EXPR$1                   | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+ |  Deforest  Family  Restaurant     |  American  (Traditional)   | |  Deforest  Family  Restaurant     |  Restaurants                         | |  Chang  Jiang  Chinese  Kitchen   |  Chinese                                 | |  Chang  Jiang  Chinese  Kitchen   |  Restaurants                         | +-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐+
  • 29. ©  2015.   All  Rights  Reserved.29 BI
  • 30. ©  2015.   All  Rights  Reserved.30 ODBC  and  JDBC • Drill  includes  standard   ODBC/JDBC  drivers – ODBC  for  native  apps – JDBC  for  Java  apps • User  installs  the  driver   on  the  client – The  same  machine  as   the  BI  tool • Driver  communicates   with  Drill  cluster(s) • Make  sure  driver  and   cluster  are  compatible   versions Drill  Cluster Drill  JDBC  Driver TIBCO  Spotfire Client Drill  ODBC  Driver Tableau Client  (eg,  Laptop)
  • 31. ©  2015.   All  Rights  Reserved.31 Install  the  ODBC  Driver • Download  &  install  the  Drill   ODBC  driver  from  the  Drill   website  (drill.apache.org) • If  you  will  use  Tableau,  install   Tableau  before  the  ODBC  driver   or  get  the  TDC  file  separately • Open  the  system’s  ODBC  Data   Source  Administrator
  • 32. ©  2015.   All  Rights  Reserved.32 Configure  the  ODBC  Driver • Two  connection  options: – Connect  to  a  specific  node   (drillbit)  in  the  Drill  cluster – Connect  to  any  node  (drillbit)  in   the  Drill  cluster • Click  “Test…”  to  make  sure  the   ODBC  driver  can  connect  to  Drill
  • 33. ©  2015.   All  Rights  Reserved.33 Connect  with  Tableau
  • 34. ©  2015.   All  Rights  Reserved.34 Choose  ODBC  DSN
  • 35. ©  2015.   All  Rights  Reserved.35 Choose  Drill  Workspace  &  Table • sdfsd All  <datastore>.<workspace>   combinations  will  be  shown  here In  a  file  system  datastore,  currently   only  views  will  be  shown  here  (an   upcoming  feature  will  enable   selection  of  any  file/directory   representing  a  dataset)
  • 36. ©  2015.   All  Rights  Reserved.36
  • 37. ©  2015.   All  Rights  Reserved.37
  • 38. ©  2015.   All  Rights  Reserved.38 REST
  • 39. ©  2015.   All  Rights  Reserved.39 Using  the  REST  API Client  application  sends  a  properly  structured  HTTP   request  to  any  drillbit in  the  cluster curl   -­‐-­‐header  "Content-­‐Type:  application/json"   -­‐-­‐request  POST   -­‐-­‐data  '{"queryType":  "SQL",  "query":  "SELECT  *  FROM  …"}'   http://localhost:8047/query.json Part Required Content Method POST Headers Content-­‐Type:  application/json Payload JSON
  • 40. ©  2015.   All  Rights  Reserved.40 Run  a  Query  via  REST $  curl  -­‐-­‐header  "Content-­‐Type:  application/json"   -­‐-­‐request  POST   -­‐-­‐data  '{"queryType":  "SQL",  "query":  "SELECT  name  FROM   dfs.tutorial.`yelp/business.json`  LIMIT  3"}'   http://localhost:8047/query.json |  python  -­‐m  json.tool { "columns":  ["name"], "rows":  [ {"name":  "Eric  Goldberg,  MD"  }, {"name":  "Pine  Cone  Restaurant"}, {"name":  "Deforest  Family  Restaurant"} ] }
  • 41. ©  2015.   All  Rights  Reserved.41 Thank  You! • Download  at  drill.apache.org • Ask  questions: • user@drill.apache.org • Tweet • @ApacheDrill