Drilling into data with Apache Drill
MapR Technologies
Tug at the Java User Group in Ukraine
1.
© 2015 MapR Technologies
Drilling into data with @ApacheDrill
Tugdual Grall, Technical Evangelist, @tgrall
Java User Group of Ukraine (JUG UA), Dec 16, 2015
2.
{“about” : “me”}
Tugdual “Tug” Grall
• MapR • Technical Evangelist
• MongoDB • Technical Evangelist
• Couchbase • Technical Evangelist
• eXo • CTO
• Oracle • Developer/Product Manager • Mainly Java/SOA
• Developer in consulting firms
• Web
  – @tgrall
  – http://tgrall.github.io
  – tgrall
• NantesJUG co-founder
• Pet Project: http://www.resultri.com
• tug@mapr.com
• tugdual@gmail.com
3.
Ingest, Store, Process, Consume
4.
The MapR Distribution including Apache Hadoop
[Architecture diagram. Section labels: Apache Hadoop and OSS Ecosystem; Execution Engines; Data Governance and Operations. Category labels: Batch, Streaming, NoSQL & Search, ML/Graph, SQL, Security, Workflow & Data Governance, Provisioning & Coordination, Data Integration & Access. Components shown: YARN, Spark, Spark Streaming, Storm, Pig, MapReduce v1 & v2, HBase, Solr, Mahout, MLLib, GraphX, Hive, Impala, Spark SQL, Drill, Sentry, Oozie, ZooKeeper, Sahara, Sqoop, Flume, HttpFS, Hue. Data platform: MapR-FS, MapR-DB, Management (Enterprise Grade, Data Hub, Operational).]
5.
Agenda
• Why Drill?
• Some Drilling with Open Data
• How does it work?
6.
Data Increasingly Stored in Non-Relational Datastores
Timeline: 1980, 1990, 2000, 2010, 2020

                 RELATIONAL DATABASES                     NON-RELATIONAL DATASTORES
    Volume       GBs-TBs                                  TBs-PBs
    Database     Fixed schema; DBA controls structure     Dynamic / Flexible schema; Application controls structure
    Structure    Structured                               Structured, semi-structured and unstructured
    Development  Planned (release cycle = months-years)   Iterative (release cycle = days-weeks)
7.
How To Bring SQL to Non-Relational Data Stores?
Familiarity of SQL
• ANSI SQL semantics
• Low latency
• Integrated with Tools/Applications
Agility of NoSQL
• No schema management
  – HDFS (Parquet, JSON, etc.)
  – HBase
  – …
• No transformation
  – No silos of data
• Ease of use
8.
Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
9.
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
50+ contributors
150+ years of experience building databases and distributed systems
Contributing to Apache Drill
10.
Drill Enables ‘SQL-on-Everything’
    SELECT * FROM dfs.yelp.`business.json`
Storage plugin instance (dfs)
- DFS (Text, Parquet, JSON)
- HBase/MapR-DB
- Hive Metastore/HCatalog
- RDBMS using JDBC
- Easy API to go beyond Hadoop
Workspace (yelp)
- Sub-directory
- HBase namespace
- Hive database
- Database
Table (`business.json`)
- Pathnames
- Hive table
- HBase table
- Table
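The three-part name in the query above reads as storage plugin, workspace, table. The following is a minimal Python sketch of that decomposition, assuming back-ticks quote a name part that may itself contain dots. `split_drill_name` is a hypothetical helper written for illustration; it is not part of Drill.

```python
def split_drill_name(qualified_name):
    """Split a dotted name into its parts, honoring back-tick quoting."""
    parts, current, quoted = [], [], False
    for ch in qualified_name:
        if ch == "`":
            quoted = not quoted          # toggle quoting, drop the back-tick itself
        elif ch == "." and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# Hypothetical usage with the name from the slide.
plugin, workspace, table = split_drill_name("dfs.yelp.`business.json`")
print(plugin, workspace, table)
```

The dot inside the back-ticks stays part of the table name, which is why file names such as `business.json` need quoting in the query.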
11.
Drill’s Data Model is Flexible
[Chart. Axes: Flat to Complex; Fixed schema to Dynamic schema. Formats plotted: JSON, BSON, HBase, Parquet, Avro, CSV, TSV.]
RDBMS/SQL-on-Hadoop table:
    Name      Gender  Age
    Michael   M       6
    Jennifer  F       3
Apache Drill table:
    { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos }
    { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
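To make the idea of self-describing records concrete, here is a small Python sketch that derives a schema from the two records on the slide at read time. It only illustrates the concept; it is not how Drill discovers schemas internally, and `discover_schema` is a hypothetical name.

```python
def discover_schema(records, prefix=""):
    """Return a dict mapping dotted field paths to the set of type names seen."""
    schema = {}
    for record in records:
        for key, value in record.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                # Recurse into nested maps, extending the dotted path.
                for sub_path, types in discover_schema([value], path + ".").items():
                    schema.setdefault(sub_path, set()).update(types)
            else:
                schema.setdefault(path, set()).add(type(value).__name__)
    return schema


records = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]
schema = discover_schema(records)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Fields that appear in only one record (`district`, `preschool`) simply show up as extra columns, with no declaration needed in advance.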
12.
Drill into Data
http://www.yelp.com/dataset_challenge
13.
Business dataset
    {
      "business_id": "4bEjOyTaDG24SY5TxsaUNQ",
      "full_address": "3655 Las Vegas Blvd S\nThe Strip\nLas Vegas, NV 89109",
      "hours": {
        "Monday": {"close": "23:00", "open": "07:00"},
        "Tuesday": {"close": "23:00", "open": "07:00"},
        "Friday": {"close": "00:00", "open": "07:00"},
        "Wednesday": {"close": "23:00", "open": "07:00"},
        "Thursday": {"close": "23:00", "open": "07:00"},
        "Sunday": {"close": "23:00", "open": "07:00"},
        "Saturday": {"close": "00:00", "open": "07:00"}
      },
      "open": true,
      "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"],
      "city": "Las Vegas",
      "review_count": 4084,
      "name": "Mon Ami Gabi",
      "neighborhoods": ["The Strip"],
      "longitude": -115.172588519464,
      "state": "NV",
      "stars": 4.0,
      "attributes": {
        "Alcohol": "full_bar",
        "Noise Level": "average",
        "Has TV": false,
        "Attire": "casual",
        "Ambience": {
          "romantic": true, "intimate": false, "touristy": false, "hipster": false,
          "classy": true, "trendy": false, "casual": false
        },
        "Good For": {
          "dessert": false, "latenight": false, "lunch": false,
          "dinner": true, "breakfast": false, "brunch": false
        }
      }
    }
14.
Reviews dataset
    {
      "votes": {"funny": 0, "useful": 2, "cool": 1},
      "user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
      "review_id": "15SdjuK7DmYqUAj6rjGowg",
      "stars": 5,
      "date": "2007-05-17",
      "text": "dr. goldberg offers everything ...",
      "type": "review",
      "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
    }
15.
    $ tar -xvzf apache-drill-1.3.0.tar.gz
    $ bin/sqlline -u jdbc:drill:zk=local
    $ bin/drill-embedded
    > SELECT state, city, count(*) AS businesses
      FROM dfs.yelp.`business.json.gz`
      GROUP BY state, city
      ORDER BY businesses DESC
      LIMIT 10;
    +-------+------------+------------+
    | state | city       | businesses |
    +-------+------------+------------+
    | NV    | Las Vegas  | 12021      |
    | AZ    | Phoenix    | 7499       |
    | AZ    | Scottsdale | 3605       |
    | EDH   | Edinburgh  | 2804       |
    | AZ    | Mesa       | 2041       |
    | AZ    | Tempe      | 2025       |
    | NV    | Henderson  | 1914       |
    | AZ    | Chandler   | 1637       |
    | WI    | Madison    | 1630       |
    | AZ    | Glendale   | 1196       |
    +-------+------------+------------+
16.
Intuitive SQL Access to Complex Data
    // It’s Friday 10pm in Vegas and looking for Hummus
    > SELECT name, stars, b.hours.Friday friday, categories
      FROM dfs.yelp.`business.json` b
      WHERE b.hours.Friday.`open` < '22:00'
        AND b.hours.Friday.`close` > '22:00'
        AND REPEATED_CONTAINS(categories, 'Mediterranean')
        AND city = 'Las Vegas'
      ORDER BY stars DESC
      LIMIT 2;
    +-----------------------------------+-------+----------------------------------+----------------------------------------------------------+
    | name                              | stars | friday                           | categories                                               |
    +-----------------------------------+-------+----------------------------------+----------------------------------------------------------+
    | Khoury's Mediterranean Restaurant | 4.0   | {"close":"23:00","open":"11:00"} | ["Greek","Mediterranean","Middle Eastern","Restaurants"] |
    | Olives                            | 4.0   | {"close":"22:30","open":"11:00"} | ["Bars","Mediterranean","Nightlife","Restaurants"]       |
    +-----------------------------------+-------+----------------------------------+----------------------------------------------------------+
17.
ANSI SQL Compatibility
    // Get top cool rated businesses
    > SELECT b.name
      FROM dfs.yelp.`business.json` b
      WHERE b.business_id IN (
        SELECT r.business_id
        FROM dfs.yelp.`review.json` r
        GROUP BY r.business_id
        HAVING SUM(r.votes.cool) > 2000
        ORDER BY SUM(r.votes.cool) DESC);
    +-------------------------------+
    | name                          |
    +-------------------------------+
    | Earl of Sandwich              |
    | XS Nightclub                  |
    | The Cosmopolitan of Las Vegas |
    | Wicked Spoon                  |
    | Bacchanal Buffet              |
    +-------------------------------+
18.
Logical Views
    // Create a view combining business and reviews datasets
    > CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS
      SELECT b.name, b.stars, r.votes.funny, r.votes.useful, r.votes.cool, r.`date`
      FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
      WHERE r.business_id = b.business_id;
    +------+------------------------------------------------------------------+
    | ok   | summary                                                          |
    +------+------------------------------------------------------------------+
    | true | View 'BusinessReviews' replaced successfully in 'dfs.tmp' schema |
    +------+------------------------------------------------------------------+
    > SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews;
    +---------+
    | Total   |
    +---------+
    | 1125458 |
    +---------+
19.
Materialized Views AKA Tables
    > ALTER SESSION SET `store.format` = 'parquet';
    > CREATE TABLE dfs.yelp.BusinessReviewsTbl AS
      SELECT b.name, b.stars, r.votes.funny funny, r.votes.useful useful, r.votes.cool cool, r.`date`
      FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
      WHERE r.business_id = b.business_id;
    +----------+---------------------------+
    | Fragment | Number of records written |
    +----------+---------------------------+
    | 1_0      | 176448                    |
    | 1_1      | 192439                    |
    | 1_2      | 198625                    |
    | 1_3      | 200863                    |
    | 1_4      | 181420                    |
    | 1_5      | 175663                    |
    +----------+---------------------------+
20.
Repeated Values Support
    // Flatten repeated categories
    > SELECT name, categories FROM dfs.yelp.`business.json` LIMIT 3;
    +--------------------------+------------------------------------+
    | name                     | categories                         |
    +--------------------------+------------------------------------+
    | Eric Goldberg, MD        | ["Doctors","Health & Medical"]     |
    | Clancy's Pub             | ["Nightlife"]                      |
    | Cool Springs Golf Center | ["Active Life","Mini Golf","Golf"] |
    +--------------------------+------------------------------------+
    > SELECT name, FLATTEN(categories) AS categories FROM dfs.yelp.`business.json` LIMIT 5;
    +--------------------------+------------------+
    | name                     | categories       |
    +--------------------------+------------------+
    | Eric Goldberg, MD        | Doctors          |
    | Eric Goldberg, MD        | Health & Medical |
    | Clancy's Pub             | Nightlife        |
    | Cool Springs Golf Center | Active Life      |
    | Cool Springs Golf Center | Mini Golf        |
    +--------------------------+------------------+
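The effect of FLATTEN in the second query can be sketched in a few lines of plain Python: each element of the repeated column becomes its own row, with the other columns copied. This is an illustration of the semantics only, using the three rows from the slide; the `flatten` function is hypothetical and not Drill's implementation.

```python
def flatten(rows, column):
    """Yield one output row per element of the list stored in `column`."""
    for row in rows:
        for element in row[column]:
            flat = dict(row)         # copy the other columns unchanged
            flat[column] = element   # replace the list with a single element
            yield flat


businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Clancy's Pub", "categories": ["Nightlife"]},
    {"name": "Cool Springs Golf Center", "categories": ["Active Life", "Mini Golf", "Golf"]},
]
flat_rows = list(flatten(businesses, "categories"))
for row in flat_rows[:5]:            # mirrors LIMIT 5
    print(row["name"], "|", row["categories"])
```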
21.
Extensions to ANSI SQL to work with repeated values
    // Get most common business categories
    > SELECT category, count(*) AS categorycount
      FROM (SELECT name, FLATTEN(categories) AS category
            FROM dfs.yelp.`business.json`) c
      GROUP BY category
      ORDER BY categorycount DESC;
    +-------------------------------+---------------+
    | category                      | categorycount |
    +-------------------------------+---------------+
    | Restaurants                   | 21892         |
    | Shopping                      | 8919          |
    | Automotive                    | 2965          |
    | …                             | …             |
    | Oriental                      | 1             |
    | High Fidelity Audio Equipment | 1             |
    +-------------------------------+---------------+
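The flatten-then-group pattern above has a direct analogue in plain Python. The sketch below uses a tiny made-up sample, not the Yelp data, and a hypothetical `category_counts` helper, purely to show the shape of the computation.

```python
from collections import Counter


def category_counts(businesses):
    """Count how many businesses list each category, most common first."""
    counts = Counter()
    for business in businesses:
        # Updating with the list is the FLATTEN step; the Counter is the GROUP BY.
        counts.update(business.get("categories", []))
    return counts.most_common()


# Hypothetical sample rows for illustration only.
sample = [
    {"name": "A", "categories": ["Restaurants", "French"]},
    {"name": "B", "categories": ["Restaurants"]},
    {"name": "C", "categories": ["Shopping"]},
    {"name": "D", "categories": ["Restaurants", "Shopping"]},
]
for category, count in category_counts(sample):
    print(category, count)
```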
22.
Checkins dataset
    {
      "checkin_info": {
        "3-4":1, "13-5":1, "6-6":1, "14-5":1, "14-6":1, "14-2":1, "14-3":1,
        "19-0":1, "11-5":1, "13-2":1, "11-6":2, "11-3":1, "12-6":1, "6-5":1,
        "5-5":1, "9-2":1, "9-5":1, "9-6":1, "5-2":1, "7-6":1, "7-5":1,
        "7-4":1, "17-5":1, "8-5":1, "10-2":1, "10-5":1, "10-6":1
      },
      "type": "checkin",
      "business_id": "JwUE5GmEO-sH1FuwJgKBlQ"
    }
23.
Supports Dynamic / Unknown Columns
    > SELECT KVGEN(checkin_info) checkins FROM dfs.yelp.`checkin.json` LIMIT 1;
    +----------+
    | checkins |
    +----------+
    | [{"key":"3-4","value":1},{"key":"13-5","value":1},{"key":"6-6","value":1},{"key":"14-5","value":1},
       {"key":"14-6","value":1},{"key":"14-2","value":1},{"key":"14-3","value":1},{"key":"19-0","value":1},
       {"key":"11-5","value":1},{"key":"13-2","value":1},{"key":"11-6","value":2},{"key":"11-3","value":1},
       {"key":"12-6","value":1},{"key":"6-5","value":1},{"key":"5-5","value":1},{"key":"9-2","value":1},
       {"key":"9-5","value":1},{"key":"9-6","value":1},{"key":"5-2","value":1},{"key":"7-6","value":1},
       {"key":"7-5","value":1},{"key":"7-4","value":1},{"key":"17-5","value":1},{"key":"8-5","value":1},
       {"key":"10-2","value":1},{"key":"10-5","value":1},{"key":"10-6","value":1}] |
    +----------+
    > SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM dfs.yelp.`checkin.json` LIMIT 6;
    +--------------------------+
    | checkins                 |
    +--------------------------+
    | {"key":"9-5","value":1}  |
    | {"key":"7-5","value":1}  |
    | {"key":"13-3","value":1} |
    | {"key":"17-6","value":1} |
    | {"key":"13-0","value":1} |
    | {"key":"17-3","value":1} |
    +--------------------------+
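What KVGEN does to a map with unknown keys can be sketched in plain Python: the map becomes a list of key/value pairs, which can then be flattened like any other repeated value. The `kvgen` function below is a hypothetical stand-in for illustration, applied to the first few entries of the check-in record shown earlier.

```python
def kvgen(mapping):
    """Turn a map with arbitrary keys into a list of {"key", "value"} pairs."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


checkin_info = {"3-4": 1, "13-5": 1, "6-6": 1, "11-6": 2}
pairs = kvgen(checkin_info)
for pair in pairs:                   # iterating the list plays the role of FLATTEN
    print(pair)
```

Once the keys are data rather than column names, they can be filtered and aggregated with ordinary predicates.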
24.
Makes it easy to work with dynamic/unknown columns
    // Count total number of checkins on Sunday midnight
    > SELECT SUM(checkintbl.checkins.`value`) AS SundayMidnightCheckins
      FROM (SELECT FLATTEN(KVGEN(checkin_info)) checkins
            FROM dfs.yelp.`checkin.json`) checkintbl
      WHERE checkintbl.checkins.key = '23-0';
    +------------------------+
    | SundayMidnightCheckins |
    +------------------------+
    | 8575                   |
    +------------------------+
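The same aggregation can be sketched in plain Python to show what the nested query computes: expand every `checkin_info` map into key/value pairs, keep the pairs whose key is the wanted slot, and sum the values. The sample records and business ids below are made up for illustration, and `sum_checkins` is a hypothetical helper.

```python
def sum_checkins(checkin_records, slot):
    """Sum the check-in counts stored under `slot` across all records."""
    total = 0
    for record in checkin_records:
        # Iterating the map's items corresponds to FLATTEN(KVGEN(checkin_info)).
        for key, value in record["checkin_info"].items():
            if key == slot:          # the WHERE clause on checkins.key
                total += value       # the SUM over checkins.value
    return total


# Hypothetical sample data, not taken from the Yelp dataset.
sample = [
    {"checkin_info": {"23-0": 2, "9-5": 1}, "business_id": "a"},
    {"checkin_info": {"23-0": 3}, "business_id": "b"},
    {"checkin_info": {"7-4": 1}, "business_id": "c"},
]
print(sum_checkins(sample, "23-0"))  # prints 5
```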
25.
Federated Queries
    // Join JSON File, Parquet and MongoDB collection
    > SELECT u.name, b.category, count(1) nb_review
      FROM mongo.yelp.`user` u,
           dfs.yelp.`review.parquet` r,
           (select business_id, flatten(categories) category
            from dfs.yelp.`business.json`) b
      WHERE u.user_id = r.user_id
        AND b.business_id = r.business_id
      GROUP BY u.user_id, u.name, b.category
      ORDER BY nb_review DESC
      LIMIT 10;
    +----------+-------------+-----------+
    | name     | category    | nb_review |
    +----------+-------------+-----------+
    | Rand     | Restaurants | 1086      |
    | J        | Restaurants | 661       |
    | Jennifer | Restaurants | 657       |
    | Brad     | Restaurants | 638       |
    | Jade     | Restaurants | 586       |
    | …        | …           | …         |
    | Emily    | Restaurants | 560       |
    | Norm     | Restaurants | 543       |
    | Aileen   | Restaurants | 499       |
    | Michael  | Restaurants | 496       |
    +----------+-------------+-----------+
26.
Directories are implicit partitions
    select dir0, count(1)
    from dfs.logs.`*`
    where dir1 in (1,2,3)
    group by dir0

    logs
    ├── 2014
    │   ├── 1
    │   ├── 2
    │   ├── 3
    │   └── 4
    └── 2015
        └── 1
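The `dir0` and `dir1` columns in the query correspond to the first and second directory levels below the workspace root. The Python sketch below shows that mapping for the tree on the slide. The file names are made up, the helper `dir_columns` is hypothetical, and for simplicity the sketch counts files rather than rows and compares directory names as strings.

```python
from collections import Counter
from pathlib import PurePosixPath


def dir_columns(root, file_path):
    """Map a file path under `root` to {'dir0': ..., 'dir1': ...} from its sub-directories."""
    relative = PurePosixPath(file_path).relative_to(root)
    directories = relative.parts[:-1]            # drop the file name
    return {f"dir{i}": name for i, name in enumerate(directories)}


# Hypothetical files laid out like the tree on the slide.
files = [
    "/logs/2014/1/a.log", "/logs/2014/2/b.log", "/logs/2014/3/c.log",
    "/logs/2014/4/d.log", "/logs/2015/1/e.log",
]
counts = Counter()
for path in files:
    columns = dir_columns("/logs", path)
    if columns.get("dir1") in {"1", "2", "3"}:   # where dir1 in (1,2,3)
        counts[columns["dir0"]] += 1             # group by dir0
print(dict(counts))
```

Because the filter is on a directory name, a query engine can skip whole directories (here `2014/4`) without opening any file in them.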
27.
Hands-On
https://github.com/tgrall/drill-workshop
28.
How does it work?
29.
Everything is a “Drillbit”
• “Drillbits” run on each node
• In-memory columnar execution
• Coordination, Planning, Execution
• Networked or not
• Exposes JDBC, ODBC, REST
• Built In Web UI and CLI
• Extensible
  – Custom Functions
  – Data Sources
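As an illustration of the REST interface mentioned above, the sketch below builds (but does not send) a query request using only the Python standard library. It assumes a Drillbit listening on `localhost` at port 8047 and the `/query.json` endpoint described in the Drill documentation; adjust both for a real deployment. `build_query_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Build a POST request that submits `sql` to a Drillbit's REST endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("SELECT * FROM dfs.yelp.`business.json` LIMIT 1")
print(request.get_method(), request.full_url)

# To run it against a live Drillbit (assumed to be reachable at base_url):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```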
30.
Data Locality
[Diagram. Clusters: HDFS & HBase cluster (HBase and HDFS nodes), HDFS cluster (HDFS nodes), MongoDB cluster (mongod nodes). Desktop: Mac, Windows.]
31.
Run drillbits close to your data!
[Diagram. The same clusters and desktops (HDFS & HBase cluster, HDFS cluster, MongoDB cluster, Mac, Windows), with Drillbits placed alongside the data nodes.]
32.
Query Execution
[Diagram: Client, three Drillbits]
• Client connects to “a” Drillbit
• This Drillbit becomes the foreman
• Foreman generates execution plan
  – Cost-based query optimisation
33.
Query Execution
[Diagram: Client, three Drillbits]
• Execution fragments are farmed out to other Drillbits
• Drillbits exchange data when necessary
34.
Query Execution
[Diagram: Client, three Drillbits]
• Results are returned to the user through the foreman
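The three query-execution steps can be mimicked with a toy scatter/gather in Python: one function plays the foreman, splits the work into fragments, hands them to workers, and merges the partial results before returning them. This is a deliberately simplified analogy using threads in one process, not a model of Drill's planner or network exchange; all names and data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_fragment(rows):
    """Work done by one simulated Drillbit: a partial count per city."""
    return Counter(row["city"] for row in rows)


def foreman(rows, drillbits=3):
    """Split rows into fragments, run them in parallel, merge the partial results."""
    fragments = [rows[i::drillbits] for i in range(drillbits)]
    with ThreadPoolExecutor(max_workers=drillbits) as pool:
        partials = list(pool.map(run_fragment, fragments))
    total = Counter()
    for partial in partials:         # the foreman combines what the workers return
        total.update(partial)
    return total


# Hypothetical rows for illustration.
rows = [{"city": c} for c in ["Las Vegas", "Phoenix", "Las Vegas", "Phoenix", "Las Vegas"]]
print(foreman(rows))
```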
35.
Security?
36.
Granular Security via Drill Views
Raw File (/raw/cards.csv). Owner: Admins. Permission: Admins.
    Name  City      State  Credit Card #
    Dave  San Jose  CA     1374-7914-3865-4817
    John  Boulder   CO     1374-9735-1794-9711
Data Scientist View (/views/maskedcards.csv). Owner: Admins. Permission: Data Scientists. Not a physical data copy.
    Name  City      State  Credit Card #
    Dave  San Jose  CA     1374-1111-1111-1111
    John  Boulder   CO     1374-1111-1111-1111
Business Analyst View. Owner: Admins. Permission: Business Analysts.
    Name  City      State
    Dave  San Jose  CA
    John  Boulder   CO
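The two views on this slide can be sketched as functions computed over the raw rows at read time, which is what “not a physical data copy” means in practice. The Python below is an analogy only: it uses the two rows from the slide, hypothetical function names, and a masking rule inferred from the example output (keep the first four digits, replace the rest with 1s). Access permissions are not modelled.

```python
import csv
import io

RAW_CARDS = (
    "Name,City,State,Credit Card #\n"
    "Dave,San Jose,CA,1374-7914-3865-4817\n"
    "John,Boulder,CO,1374-9735-1794-9711\n"
)


def raw_rows():
    """Read the raw file; in this sketch it lives in a string."""
    return csv.DictReader(io.StringIO(RAW_CARDS))


def data_scientist_view(rows):
    """Same columns, with all but the first four card digits masked."""
    for row in rows:
        masked = dict(row)
        masked["Credit Card #"] = row["Credit Card #"][:4] + "-1111-1111-1111"
        yield masked


def business_analyst_view(rows):
    """Drop the card number column entirely."""
    for row in rows:
        yield {key: value for key, value in row.items() if key != "Credit Card #"}


for row in data_scientist_view(raw_rows()):
    print(row)
for row in business_analyst_view(raw_rows()):
    print(row)
```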
37.
Extensibility
38.
Extending Apache Drill
• File Format
  – ORC
  – XML
  – …
• Data Sources
  – NoSQL Databases
  – Search Engines
  – REST
  – …
• Custom Functions
https://github.com/mapr-demos/simple-drill-functions
39.
Recommendations for Getting Started with Drill
New to Drill?
– Get started with Free MapR On Demand training
– Test Drive Drill on cloud with AWS
– Learn how to use Drill with Hadoop using MapR sandbox
Ready to play with your data?
– Try out the “Apache Drill in 10 Minutes” guide on your desktop
– Download Drill for your cluster and start exploration
– Comprehensive tutorials and documentation available
Ask questions
– user@drill.apache.org
– dev@drill.apache.org
40.
Find my presentation and other related resources here: http://events.mapr.com/UkraineJUG
• Today’s Presentation
• Whiteboard & demo videos
• Free On-Demand Training
• Free Cheat Sheet
• Articles
And more…
42.
Q&A
@tgrall, maprtech, tug@mapr.com
Engage with us! MapR, maprtech, mapr-technologies