Gera Shegalov @PJUG, Jan 15, 2013
/home/gera: whoami

■ Saarland University
■ 1st intern in Immortal DB @ Microsoft Research
■ JMS, RDBMS HA @ Oracle




■ Hadoop MapReduce / Hadoop Core
■ Founding member of Apache Drill
■ Open enterprise-grade distribution for Hadoop
 ● Easy, dependable and fast
 ● Open source with standards-based extensions


■ MapR is deployed at thousands of companies
 ● From small Internet startups to Fortune 100


■ MapR customers analyze massive amounts of data:
 ● Hundreds of billions of events daily
 ● 90% of the world’s Internet population monthly
 ● $1 trillion in retail purchases annually


■ MapR in the Cloud:
 ● partnered with Google: Hadoop on Google Compute Engine
 ● partnered with Amazon: M3/M5 options for Elastic Map Reduce
Agenda
■ What?
 ● What exactly does Drill do?


■ Why?
 ● Why do we need Apache Drill?


■ Who?
 ● Who is doing this?


■ How?
 ● How does Drill work inside?


■ Conclusion
 ● How can you help?
 ● Where can you find out more?
Apache Drill Overview

■ Drill overview
  ● Low latency interactive queries
  ● Standard ANSI SQL support
  ● Domain Specific Languages / Your own QL

■ Open-Source
  ● Apache Incubator
  ● Hundreds of people involved across the US and Europe
  ● Community consensus on API, functionality
Big Data Processing
                      Batch processing    Interactive analysis       Stream processing
Query runtime         Minutes to hours    Milliseconds to minutes    Never-ending
Data volume           TBs to PBs          GBs to PBs                 Continuous stream
Programming model     MapReduce           Queries                    DAG
Users                 Developers          Analysts and developers    Developers
Google project        MapReduce           Dremel
Open source project   Hadoop MapReduce    Apache Drill               Storm and S4
Latency Matters

■ Ad-hoc analysis with interactive tools


■ Real-time dashboards




■ Event/trend detection and analysis
  ●   Network intrusions
  ●   Fraud
  ●   Failures
Nested Query Languages

■ DrQL
  ●   SQL-like query language for nested data

  ●   Compatible with Google BigQuery/Dremel
      ● BigQuery applications should work with Drill



  ●   Designed to support efficient column-based processing
      ● No record assembly during query processing




■ Mongo Query Language
  ●   {$query: {x: 3, y: "abc"}, $orderby: {x: 1}}

■ Other languages/programming models can plug in
Nested Data Model
■ The data model in Dremel is Protocol Buffers
  ●   Nested
  ●   Schema
■ Apache Drill is designed to support multiple data models
  ●   Schema: Protocol Buffers, Apache Avro, …
  ●   Schema-less: JSON, BSON, …
■ Flat records are supported as a special case of nested data
  ●   CSV, TSV, …
Avro IDL:
      enum Gender {
        MALE, FEMALE
      }

      record User {
        string name;
        Gender gender;
        long followers;
      }

JSON:
      {
          "name": "Srivas",
          "gender": "Male",
          "followers": 100
      }
      {
          "name": "Raina",
          "gender": "Female",
          "followers": 200,
          "zip": "94305"
      }
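As a minimal sketch of what "schema-less" implies for a reader, the Python below takes the two JSON records from this slide and derives field names and types from the data itself, since no Avro-style declaration is available. The helper names `infer_fields` and `optional_fields` are hypothetical and are not part of Drill.

```python
import json

# The two schema-less records shown in the JSON column of the slide.
RECORDS = [
    '{"name": "Srivas", "gender": "Male", "followers": 100}',
    '{"name": "Raina", "gender": "Female", "followers": 200, "zip": "94305"}',
]


def infer_fields(json_lines):
    """Return {field: set of Python type names} observed across all records."""
    fields = {}
    for line in json_lines:
        record = json.loads(line)
        for key, value in record.items():
            fields.setdefault(key, set()).add(type(value).__name__)
    return fields


def optional_fields(json_lines):
    """Return the sorted fields that are missing from at least one record."""
    records = [json.loads(line) for line in json_lines]
    all_keys = set().union(*(r.keys() for r in records))
    return sorted(k for k in all_keys if any(k not in r for r in records))


if __name__ == "__main__":
    print(infer_fields(RECORDS))
    print(optional_fields(RECORDS))
```

With a schema (the Avro IDL case) this discovery step is unnecessary, because the field list and types are declared up front.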
Extensibility
■ Nested query languages
  ● Pluggable model

  ● DrQL

  ● Mongo Query Language

  ● Cascading



■ Distributed execution engine
  ● Extensible model (e.g., Dryad)

  ● Low-latency

  ● Fault tolerant



■ Nested data formats
  ● Pluggable model

  ● Column-based (ColumnIO/Dremel, Trevni, RCFile) and row-based (RecordIO, Avro, JSON, CSV)
  ● Schema (Protocol Buffers, Avro, CSV) and schema-less (JSON, BSON)



■ Scalable data sources
  ● Pluggable model

  ● Hadoop

  ● HBase
Design Principles

  Flexible
  ●   Pluggable query languages
  ●   Extensible execution engine
  ●   Pluggable data formats
      ● Column-based and row-based
      ● Schema and schema-less
  ●   Pluggable data sources
  ●   N(ot)O(nly) Hadoop

  Easy
  ●   Unzip and run
  ●   Zero configuration
  ●   Reverse DNS not needed
  ●   IP addresses can change
  ●   Clear and concise log messages

  Dependable
  ●   No SPOF
  ●   Instant recovery from crashes

  Fast
  ●   Minimum Java core
  ●   C/C++ core with Java support
      ● Google C++ style guide
  ●   Min latency and max throughput (limited only by hardware)
Architecture
Execution Engine
Operator layer is serialization-aware
   Processes individual records

Execution layer is not serialization-aware
   Processes batches of records (blobs/JSON trees)
   Responsible for communication, dependencies and fault tolerance
DrQL Example
local-logs = donuts.json:
{
     "id": "0003",
     "type": "donut",
     "name": "Old Fashioned",
     "ppu": 0.55,
     "sales": 300,
     "batters":
       {
         "batter":
           [
             { "id": "1001", "type": "Regular" },
             { "id": "1002", "type": "Chocolate" }
           ]
       },
     "topping":
       [
         { "id": "5001", "type": "None" },
         { "id": "5002", "type": "Glazed" },
         { "id": "5003", "type": "Chocolate" },
         { "id": "5004", "type": "Maple" }
       ]
 }

Query:
SELECT
  ppu,
  typeCount =
    COUNT(*) OVER PARTITION BY ppu,
  quantity =
    SUM(sales) OVER PARTITION BY ppu,
  sales =
    SUM(ppu*sales) OVER PARTITION BY ppu
FROM local-logs donuts
WHERE donuts.ppu < 1.00
ORDER BY donuts.ppu DESC;
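To make the query's intent concrete, here is a rough Python sketch of what it would compute, assuming `OVER PARTITION BY` behaves like a standard SQL window function (one output row per input row, with aggregates taken over all rows sharing the same `ppu`). The sample rows are hypothetical apart from record "0003", and `run_query` is an illustrative name, not a Drill API.

```python
from collections import defaultdict

# Hypothetical sample rows; only record "0003" appears on the slide.
DONUTS = [
    {"id": "0001", "ppu": 0.55, "sales": 35},
    {"id": "0003", "ppu": 0.55, "sales": 300},
    {"id": "0004", "ppu": 0.69, "sales": 14},
    {"id": "0005", "ppu": 1.00, "sales": 700},
]


def run_query(rows):
    """Emulate the DrQL example with plain Python data structures."""
    # WHERE donuts.ppu < 1.00
    kept = [r for r in rows if r["ppu"] < 1.00]

    # PARTITION BY ppu: collect the rows that share each price.
    partitions = defaultdict(list)
    for r in kept:
        partitions[r["ppu"]].append(r)

    # One output row per surviving input row, as with SQL window functions.
    out = []
    for r in kept:
        part = partitions[r["ppu"]]
        out.append({
            "ppu": r["ppu"],
            "typeCount": len(part),
            "quantity": sum(p["sales"] for p in part),
            "sales": round(sum(p["ppu"] * p["sales"] for p in part), 2),
        })

    # ORDER BY donuts.ppu DESC (Python's sort is stable for equal prices).
    out.sort(key=lambda row: row["ppu"], reverse=True)
    return out


if __name__ == "__main__":
    for row in run_query(DONUTS):
        print(row)
```

The record priced at 1.00 is filtered out, and both 0.55 rows carry the same partition-wide totals.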
Query Components

■ User Query (DrQL) components:
  ● SELECT

  ● FROM

  ● WHERE

  ● GROUP BY

  ● HAVING

  ● (JOIN)




■ Logical operators:
  ● Scan

  ● Filter

  ● Aggregate

  ● (Join)
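The mapping from query clauses to logical operators can be sketched as a small pipeline in Python. This is only an illustration under simple assumptions (in-memory records, a single SUM aggregate); the function names `scan`, `filter_op`, and `aggregate` are hypothetical stand-ins for the Scan, Filter, and Aggregate operators listed above.

```python
from collections import defaultdict


def scan(source):
    """Scan: yield records one at a time from an in-memory source (FROM)."""
    yield from source


def filter_op(records, predicate):
    """Filter: keep only records for which the predicate holds (WHERE)."""
    return (r for r in records if predicate(r))


def aggregate(records, key, value):
    """Aggregate: sum the `value` field per distinct `key` (GROUP BY + SUM)."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += r[value]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical input rows, not taken from the slides.
    rows = [
        {"type": "donut", "sales": 300},
        {"type": "donut", "sales": 5},
        {"type": "bagel", "sales": 40},
    ]
    pipeline = filter_op(scan(rows), lambda r: r["sales"] > 10)
    print(aggregate(pipeline, key="type", value="sales"))
```

Because each operator consumes the output of the previous one lazily, records stream through scan and filter before the aggregate materializes its totals.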
Logical Plan
Logical Plan Syntax: Operators & Expressions
        query:[
         {
           op:"sequence",
           do:[
           {
             op: "scan",
             memo: "initial_scan",
             ref: "donuts",
             source: "local-logs",
             selection: {data: "activity"}
           },
           {
             op: "transform",
             transforms: [
               { ref: "donuts.quantity", expr: "donuts.sales"}
             ]
           },
           {
             op: "filter",
             expr: "donuts.ppu < 1.00"
           },
           ---
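A toy interpreter helps show how a "sequence" plan like this could be executed step by step. The Python below is a sketch under several assumptions: the plan is written as Python dicts, the `memo` and `selection` fields are omitted, `transform` expressions are plain field references, and `filter` supports only the form `<ref> < <number>`. None of the function names come from Drill.

```python
def get_ref(record, ref):
    """Resolve a dotted reference such as 'donuts.ppu' in a nested dict."""
    value = record
    for part in ref.split("."):
        value = value[part]
    return value


def set_ref(record, ref, value):
    """Assign a value at a dotted reference, creating parents as needed."""
    *parents, leaf = ref.split(".")
    target = record
    for part in parents:
        target = target.setdefault(part, {})
    target[leaf] = value


def run_sequence(plan, sources):
    """Apply the operators of a 'sequence' plan in order."""
    records = []
    for step in plan:
        op = step["op"]
        if op == "scan":
            # Each scanned row is exposed under the name given by 'ref'.
            records = [{step["ref"]: dict(row)} for row in sources[step["source"]]]
        elif op == "transform":
            for rec in records:
                for t in step["transforms"]:
                    set_ref(rec, t["ref"], get_ref(rec, t["expr"]))
        elif op == "filter":
            left, operator, right = step["expr"].split()
            if operator != "<":
                raise ValueError("this sketch only supports '<'")
            records = [r for r in records if get_ref(r, left) < float(right)]
        else:
            raise ValueError(f"unsupported op: {op}")
    return records


# The three operators from the slide, minus 'memo' and 'selection'.
PLAN = [
    {"op": "scan", "ref": "donuts", "source": "local-logs"},
    {"op": "transform",
     "transforms": [{"ref": "donuts.quantity", "expr": "donuts.sales"}]},
    {"op": "filter", "expr": "donuts.ppu < 1.00"},
]

if __name__ == "__main__":
    # Hypothetical source rows.
    sources = {"local-logs": [{"ppu": 0.55, "sales": 300},
                              {"ppu": 1.25, "sales": 10}]}
    print(run_sequence(PLAN, sources))
```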
Logical Streaming Example

Input records (in arrival order): 0, 1, 2, 3, 4

{ @id: <refnum>, op: "window-frame",
 input: <input>,
 keys: [
   <name>,...
 ],
 ref: <name>,
 before: 2,
 after: here
}

Output window frames (one per input record): 0 | 0 1 | 0 1 2 | 1 2 3 | 2 3 4
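The window-frame behaviour illustrated on this slide can be reproduced with a few lines of Python. This is a sketch that assumes `before: 2` means "include up to two preceding records" and that `after: here` means "include nothing after the current record", modelled here as `after=0`. The function name `window_frames` is hypothetical.

```python
def window_frames(records, before, after=0):
    """Return one frame per record: up to `before` earlier records,
    the record itself, and up to `after` later records."""
    frames = []
    for i in range(len(records)):
        start = max(0, i - before)
        end = min(len(records), i + after + 1)
        frames.append(records[start:end])
    return frames


if __name__ == "__main__":
    for frame in window_frames([0, 1, 2, 3, 4], before=2):
        print(frame)
```

The first two frames are shorter than three records because fewer than two predecessors exist at the start of the stream.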
Representing a DAG




          { @id: 19, op: "aggregate",
            input: 18,
            type: <simple|running|repeat>,
            keys: [<name>,...],
            aggregations: [
              {ref: <name>, expr: <aggexpr> },...
            ]
          }
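Since each operator names its input by `@id`, a plan forms a graph that can be evaluated by following those references. The Python below is a minimal sketch of that idea, assuming the plan is held as a dict keyed by id and that `aggregate` simply sums one field per key; the handler names, the `rows` field on the scan node, and the use of a bare field name in `expr` are all hypothetical simplifications.

```python
def evaluate(plan, op_id, handlers, cache=None):
    """Evaluate operator `op_id`, resolving its inputs first.
    Results are cached so a shared input is computed only once."""
    cache = {} if cache is None else cache
    if op_id in cache:
        return cache[op_id]
    node = plan[op_id]
    inputs = node.get("input", [])
    if not isinstance(inputs, list):
        inputs = [inputs]
    resolved = [evaluate(plan, i, handlers, cache) for i in inputs]
    cache[op_id] = handlers[node["op"]](node, *resolved)
    return cache[op_id]


def scan_handler(node):
    """Hypothetical scan: rows are embedded directly in the plan node."""
    return list(node["rows"])


def aggregate_handler(node, rows):
    """Simplified aggregate: sum the field named by 'expr' per first key."""
    key = node["keys"][0]
    result = {}
    for agg in node["aggregations"]:
        for row in rows:
            bucket = result.setdefault(row[key], {})
            bucket[agg["ref"]] = bucket.get(agg["ref"], 0) + row[agg["expr"]]
    return result


HANDLERS = {"scan": scan_handler, "aggregate": aggregate_handler}

# Operator 19 reads from operator 18, mirroring 'input: 18' on the slide.
PLAN = {
    18: {"op": "scan", "rows": [
        {"type": "donut", "sales": 300},
        {"type": "donut", "sales": 35},
        {"type": "bagel", "sales": 40},
    ]},
    19: {"op": "aggregate", "input": 18, "keys": ["type"],
         "aggregations": [{"ref": "total", "expr": "sales"}]},
}

if __name__ == "__main__":
    print(evaluate(PLAN, 19, HANDLERS))
```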
Multiple Inputs




                  { @id: 25, op: "cogroup",
                    groupings: [
                      {ref: 23, expr: "id"}, {ref: 24, expr: "id"}
                    ]
                  }
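The slide does not spell out cogroup semantics, so the sketch below assumes the usual meaning from systems such as Pig: for every key, collect the matching rows from each input side by side, including keys that appear in only one input. The function name `cogroup` and the sample rows are illustrative only.

```python
from collections import defaultdict


def cogroup(left, right, left_key, right_key):
    """Group two inputs by key; each key maps to (left_rows, right_rows)."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[left_key]][0].append(row)
    for row in right:
        groups[row[right_key]][1].append(row)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical outputs of operators 23 and 24, both grouped on "id".
    input_23 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
    input_24 = [{"id": 1, "b": "p"}, {"id": 1, "b": "q"}, {"id": 3, "b": "r"}]
    for key, (lhs, rhs) in sorted(cogroup(input_23, input_24, "id", "id").items()):
        print(key, lhs, rhs)
```

A join can be derived from this result by pairing the two lists for each key, which is one reason a cogroup operator is useful as a multi-input building block.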
Physical Scan Operators


                         Scan with schema                  Scan without schema
Operator output          Protocol Buffers                  JSON-like (MessagePack)
Supported data formats   ColumnIO (column-based            JSON
                         protobuf/Dremel)                  HBase
                         RecordIO (row-based protobuf)
                         CSV
SELECT … FROM …          ColumnIO(proto URI, data URI)     Json(data URI)
                         RecordIO(proto URI, data URI)     HBase(table name)
Hadoop Integration

■   Hadoop data sources
    ●   Hadoop FileSystem API (HDFS/MapR-FS)
    ●   HBase

■   Hadoop data formats
    ●   Apache Avro
    ●   RCFile

■   MapReduce-based tools to create column-based formats

■   Table registry in HCatalog

■   Run long-running services in YARN
Where is Drill now?

■ API Definition


■ Reference Implementation for Logical Plan Interpreter
 ● 1:1 mapping logical/physical op
 ● Single JVM


■ Demo
Contribute!

■ Participate in Design discussions: JIRA, ML, Wiki, Google Doc!


■ Write a parser for your favorite QL / Domain-Specific Language


■ Write Storage Engine API implementations
 ● HDFS, HBase, relational, XML DB.


■ Write Physical Operators
 ● scan-hbase, scan-cassandra, scan-mongo
 ● scan-jdbc, scan-odbc, scan-jms (browse topic/queue), scan-*
 ● combined functionality operators: group-aggregate, ...
 ● sort-merge-join, hash-join, index-lookup-join

■ Etc...
Thanks, Q&A

■ Download these slides
  ●   http://www.mapr.com/company/events/pjug-1-15-2013

■ Join the project
  ●   drill-dev-subscribe@incubator.apache.org
  ●   #apachedrill

■ Contact me:
  ●   gshegalov@maprtech.com

■ Join MapR
  ●   jobs@mapr.com

More Related Content

What's hot

Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache DrillMapR Technologies
 
Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache Drill
MapR Technologies
 
Apache drill
Apache drillApache drill
Apache drill
Jakub Pieprzyk
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
Chicago Hadoop Users Group
 
Apache Drill
Apache DrillApache Drill
Apache Drill
Ted Dunning
 
SQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache DrillSQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache Drill
MapR Technologies
 
Rethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache DrillRethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache Drill
MapR Technologies
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
MapR Technologies
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
The Hive
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
MapR Technologies
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
mcsrivas
 
Killing ETL with Apache Drill
Killing ETL with Apache DrillKilling ETL with Apache Drill
Killing ETL with Apache Drill
Charles Givre
 
Introduction to Apache Drill - interactive query and analysis at scale
Introduction to Apache Drill - interactive query and analysis at scaleIntroduction to Apache Drill - interactive query and analysis at scale
Introduction to Apache Drill - interactive query and analysis at scaleMapR Technologies
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Charles Givre
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoop
datasalt
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
Siva Pandeti
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill Carol McDonald
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
DataWorks Summit/Hadoop Summit
 
Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0
Vince Gonzalez
 

What's hot (20)

Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache Drill
 
Apache drill
Apache drillApache drill
Apache drill
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
SQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache DrillSQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache Drill
 
Rethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache DrillRethinking SQL for Big Data with Apache Drill
Rethinking SQL for Big Data with Apache Drill
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
 
Putting Apache Drill into Production
Putting Apache Drill into ProductionPutting Apache Drill into Production
Putting Apache Drill into Production
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
Killing ETL with Apache Drill
Killing ETL with Apache DrillKilling ETL with Apache Drill
Killing ETL with Apache Drill
 
Introduction to Apache Drill - interactive query and analysis at scale
Introduction to Apache Drill - interactive query and analysis at scaleIntroduction to Apache Drill - interactive query and analysis at scale
Introduction to Apache Drill - interactive query and analysis at scale
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoop
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0
 

Viewers also liked

Why Being a Creeper is Awesome
Why Being a Creeper is AwesomeWhy Being a Creeper is Awesome
Why Being a Creeper is Awesomerelak213
 
Materi 2 teori teori belajar
Materi 2 teori teori belajarMateri 2 teori teori belajar
Materi 2 teori teori belajar
Nhia Item
 
cara membuat fotfolio sains tahun 6
cara membuat fotfolio sains tahun 6cara membuat fotfolio sains tahun 6
cara membuat fotfolio sains tahun 6Muadzam Peace
 
Thermo part 2
Thermo part 2Thermo part 2
Thermo part 2
elly_q3a
 
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...Gera Shegalov
 
The Role of Database Systems in the Era of Big Data
The Role  of Database Systems  in the Era of Big DataThe Role  of Database Systems  in the Era of Big Data
The Role of Database Systems in the Era of Big Data
Gera Shegalov
 
CTL Model Checking in Database Cloud
CTL Model Checking in Database CloudCTL Model Checking in Database Cloud
CTL Model Checking in Database CloudGera Shegalov
 
Place
PlacePlace
Place
Nhia Item
 
Materi 1 hakekat psikologi
Materi 1 hakekat psikologiMateri 1 hakekat psikologi
Materi 1 hakekat psikologi
Nhia Item
 
Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale
Gera Shegalov
 
Responsive Web Design – Best Practice Approach
Responsive Web Design – Best Practice ApproachResponsive Web Design – Best Practice Approach
Responsive Web Design – Best Practice Approach
let's dev GmbH & Co. KG
 

Viewers also liked (17)

Why Being a Creeper is Awesome
Why Being a Creeper is AwesomeWhy Being a Creeper is Awesome
Why Being a Creeper is Awesome
 
Materi 2 teori teori belajar
Materi 2 teori teori belajarMateri 2 teori teori belajar
Materi 2 teori teori belajar
 
Ppr1
Ppr1Ppr1
Ppr1
 
Fr
FrFr
Fr
 
cara membuat fotfolio sains tahun 6
cara membuat fotfolio sains tahun 6cara membuat fotfolio sains tahun 6
cara membuat fotfolio sains tahun 6
 
Presentación2
Presentación2Presentación2
Presentación2
 
Thermo part 2
Thermo part 2Thermo part 2
Thermo part 2
 
Regolamento tarsu
Regolamento tarsuRegolamento tarsu
Regolamento tarsu
 
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
 
The Role of Database Systems in the Era of Big Data
The Role  of Database Systems  in the Era of Big DataThe Role  of Database Systems  in the Era of Big Data
The Role of Database Systems in the Era of Big Data
 
CTL Model Checking in Database Cloud
CTL Model Checking in Database CloudCTL Model Checking in Database Cloud
CTL Model Checking in Database Cloud
 
Usl6
Usl6Usl6
Usl6
 
Place
PlacePlace
Place
 
Materi 1 hakekat psikologi
Materi 1 hakekat psikologiMateri 1 hakekat psikologi
Materi 1 hakekat psikologi
 
Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale
 
Biynees khemjee awah
Biynees khemjee awahBiynees khemjee awah
Biynees khemjee awah
 
Responsive Web Design – Best Practice Approach
Responsive Web Design – Best Practice ApproachResponsive Web Design – Best Practice Approach
Responsive Web Design – Best Practice Approach
 

Similar to Apache Drill @ PJUG, Jan 15, 2013

MongoDB 3.0
MongoDB 3.0 MongoDB 3.0
MongoDB 3.0
Victoria Malaya
 
Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19jasonfrantz
 
Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis
Yahoo Developer Network
 
Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Drill at the Chug 9-19-12
Drill at the Chug 9-19-12
Ted Dunning
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
Jose Mº Muñoz
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
Holden Karau
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
DataStax Academy
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for Cassandra
Edward Capriolo
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
MapR Technologies
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
Neville Li
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Holden Karau
 
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data SetsApache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
MapR Technologies
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Data Con LA
 
Rust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMSRust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMS
Andy Grove
 
LOADays 2015 - syslog-ng - from log collection to processing and infomation e...
LOADays 2015 - syslog-ng - from log collection to processing and infomation e...LOADays 2015 - syslog-ng - from log collection to processing and infomation e...
LOADays 2015 - syslog-ng - from log collection to processing and infomation e...
BalaBit
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
Daniel Cousineau
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
MongoDB
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
javier ramirez
 
MongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL DatabaseMongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL DatabaseRuben Inoto Soto
 

Similar to Apache Drill @ PJUG, Jan 15, 2013 (20)

MongoDB 3.0
MongoDB 3.0 MongoDB 3.0
MongoDB 3.0
 
Drill dchug-29 nov2012
Drill dchug-29 nov2012Drill dchug-29 nov2012
Drill dchug-29 nov2012
 
Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19
 
Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis Sep 2012 HUG: Apache Drill for Interactive Analysis
Sep 2012 HUG: Apache Drill for Interactive Analysis
 
Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Drill at the Chug 9-19-12
Drill at the Chug 9-19-12
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for Cassandra
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop seriesIntroducing Apache Spark's Data Frames and Dataset APIs workshop series
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
 
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data SetsApache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
 
Rust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMSRust & Apache Arrow @ RMS
Rust & Apache Arrow @ RMS
 
LOADays 2015 - syslog-ng - from log collection to processing and infomation e...
LOADays 2015 - syslog-ng - from log collection to processing and infomation e...LOADays 2015 - syslog-ng - from log collection to processing and infomation e...
LOADays 2015 - syslog-ng - from log collection to processing and infomation e...
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
 
MongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL DatabaseMongoDB - A Document NoSQL Database
MongoDB - A Document NoSQL Database
 

More from Gera Shegalov

#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity#SlimScalding - Less Memory is More Capacity
#SlimScalding - Less Memory is More Capacity
Gera Shegalov
 
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...
Integrated Data, Message, and Process Recovery for Failure Masking in Web Ser...Gera Shegalov
 
Logging Last Resource Optimization for Distributed Transactions in Oracle We...
Logging Last Resource Optimization for Distributed Transactions in  Oracle We...Logging Last Resource Optimization for Distributed Transactions in  Oracle We...
Logging Last Resource Optimization for Distributed Transactions in Oracle We...Gera Shegalov
 
Logging Last Resource Optimization for Distributed Transactions in Oracle…
Logging Last Resource Optimization for Distributed Transactions in  Oracle…Logging Last Resource Optimization for Distributed Transactions in  Oracle…
Logging Last Resource Optimization for Distributed Transactions in Oracle…Gera Shegalov
 
Transaction Timestamping in Temporal Databases
Transaction Timestamping in Temporal DatabasesTransaction Timestamping in Temporal Databases
Transaction Timestamping in Temporal DatabasesGera Shegalov
 
Unstoppable Stateful PHP Web Services
Unstoppable Stateful PHP Web ServicesUnstoppable Stateful PHP Web Services
Unstoppable Stateful PHP Web ServicesGera Shegalov
 
Formal Verification of Transactional Interaction Contract
Formal Verification of Transactional Interaction ContractFormal Verification of Transactional Interaction Contract
Formal Verification of Transactional Interaction ContractGera Shegalov
 
Formal Verification of Web Service Interaction Contracts
Formal Verification of Web Service Interaction ContractsFormal Verification of Web Service Interaction Contracts
Formal Verification of Web Service Interaction ContractsGera Shegalov
 


Apache Drill @ PJUG, Jan 15, 2013

  • 1. Gera Shegalov @PJUG, Jan 15, 2013
  • 2. /home/gera: whoami ■ Saarland University ■ 1st intern in Immortal DB @ Microsoft Research ■ JMS, RDBMS HA @ Oracle ■ Hadoop MapReduce / Hadoop Core ■ Founding member of Apache Drill
  • 3. ■ Open enterprise-grade distribution for Hadoop ● Easy, dependable and fast ● Open source with standards-based extensions ■ MapR is deployed at 1000’s of companies ● From small Internet startups to Fortune 100 ■ MapR customers analyze massive amounts of data: ● Hundreds of billions of events daily ● 90% of the world’s Internet population monthly ● $1 trillion in retail purchases annually ■ MapR in the Cloud: ● partnered with Google: Hadoop on Google Compute Engine ● partnered with Amazon: M3/M5 options for Elastic Map Reduce
  • 4. Agenda ■ What? ● What exactly does Drill do? ■ Why? ● Why do we need Apache Drill? ■ Who? ● Who is doing this? ■ How? ● How does Drill work inside? ■ Conclusion ● How can you help? ● Where can you find out more?
  • 5. Apache Drill Overview ■ Drill overview ● Low latency interactive queries ● Standard ANSI SQL support ● Domain Specific Languages / Your own QL ■ Open-Source ● Apache Incubator ● 100’s involved across US and Europe ● Community consensus on API, functionality
  • 6. Big Data Processing
                            Batch processing    Interactive analysis      Stream processing
      Query runtime         Minutes to hours    Milliseconds to minutes   Never-ending
      Data volume           TBs to PBs          GBs to PBs                Continuous stream
      Programming model     MapReduce           Queries                   DAG
      Users                 Developers          Analysts and developers   Developers
      Google project        MapReduce           Dremel
      Open source project   Hadoop MapReduce    Apache Drill              Storm and S4
  • 7. Latency Matters ■ Ad-hoc analysis with interactive tools ■ Real-time dashboards ■ Event/trend detection and analysis ● Network intrusions ● Fraud ● Failures
  • 8. Nested Query Languages ■ DrQL ● SQL-like query language for nested data ● Compatible with Google BigQuery/Dremel ● BigQuery applications should work with Drill ● Designed to support efficient column-based processing ● No record assembly during query processing ■ Mongo Query Language ● {$query: {x: 3, y: "abc"}, $orderby: {x: 1}} ■ Other languages/programming models can plug in
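The Mongo Query Language example on this slide asks for documents with x equal to 3 and y equal to "abc", ordered by x ascending. A minimal sketch of those semantics over in-memory Python dicts follows. The function name `run_query` and the sample documents are hypothetical, only equality matching and `$orderby` are handled, and this is neither Drill's nor MongoDB's implementation.

```python
def run_query(docs, spec):
    """Evaluate a tiny subset of a Mongo-style query: equality match plus $orderby."""
    criteria = spec.get("$query", {})
    order = spec.get("$orderby", {})
    # Keep documents whose fields equal every value in the criteria.
    matched = [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
    # Apply sort keys from last to first; Python's sort is stable,
    # so the first key ends up dominating.
    for field, direction in reversed(list(order.items())):
        matched.sort(key=lambda d: d[field], reverse=(direction == -1))
    return matched


# Hypothetical sample documents.
docs = [
    {"x": 3, "y": "abc", "z": 1},
    {"x": 3, "y": "def", "z": 2},
    {"x": 5, "y": "abc", "z": 3},
    {"x": 3, "y": "abc", "z": 4},
]
result = run_query(docs, {"$query": {"x": 3, "y": "abc"}, "$orderby": {"x": 1}})
```

Only the first and last sample documents satisfy both equality conditions, so `result` holds those two in their original order.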
  • 9. Nested Data Model
      ■ The data model in Dremel is Protocol Buffers
        ● Nested
        ● Schema
      ■ Apache Drill is designed to support multiple data models
        ● Schema: Protocol Buffers, Apache Avro, …
        ● Schema-less: JSON, BSON, …
      ■ Flat records are supported as a special case of nested data
        ● CSV, TSV, …
      Avro IDL:
        enum Gender {
          MALE, FEMALE
        }
        record User {
          string name;
          Gender gender;
          long followers;
        }
      JSON:
        {
          "name": "Srivas",
          "gender": "Male",
          "followers": 100
        }
        {
          "name": "Raina",
          "gender": "Female",
          "followers": 200,
          "zip": "94305"
        }
  • 10. Extensibility ■ Nested query languages ● Pluggable model ● DrQL ● Mongo Query Language ● Cascading ■ Distributed execution engine ● Extensible model (eg, Dryad) ● Low-latency ● Fault tolerant ■ Nested data formats ● Pluggable model ● Column-based (ColumnIO/Dremel, Trevni, RCFile) and row-based (RecordIO, Avro, JSON, CSV) ● Schema (Protocol Buffers, Avro, CSV) and schema-less (JSON, BSON) ■ Scalable data sources ● Pluggable model ● Hadoop ● HBase
  • 11. Design Principles
      Flexible
        ● Pluggable query languages
        ● Extensible execution engine
        ● Pluggable data formats
        ● Column-based and row-based
        ● Schema and schema-less
        ● Pluggable data sources
        ● N(ot)O(nly) Hadoop
      Easy
        ● Unzip and run
        ● Zero configuration
        ● Reverse DNS not needed
        ● IP addresses can change
        ● Clear and concise log messages
      Dependable
        ● No SPOF
        ● Instant recovery from crashes
      Fast
        ● Minimum Java core
        ● C/C++ core with Java support
        ● Google C++ style guide
        ● Min latency and max throughput (limited only by hardware)
  • 13. Execution Engine
      ■ Operator layer is serialization-aware
        ● Processes individual records
      ■ Execution layer is not serialization-aware
        ● Processes batches of records (blobs/JSON trees)
        ● Responsible for communication, dependencies and fault tolerance
  • 14. DrQL Example
      Query:
        SELECT
          ppu,
          typeCount = COUNT(*) OVER PARTITION BY ppu,
          quantity = SUM(sales) OVER PARTITION BY ppu,
          sales = SUM(ppu*sales) OVER PARTITION BY ppu
        FROM local-logs donuts
        WHERE donuts.ppu < 1.00
        ORDER BY donuts.ppu DESC;
      local-logs = donuts.json:
        {
          "id": "0003",
          "type": "donut",
          "name": "Old Fashioned",
          "ppu": 0.55,
          "sales": 300,
          "batters": {
            "batter": [
              { "id": "1001", "type": "Regular" },
              { "id": "1002", "type": "Chocolate" }
            ]
          },
          "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
          ]
        }
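To make the DrQL example concrete, here is a rough Python sketch of what the query computes, assuming `OVER PARTITION BY ppu` behaves like an SQL window function (every surviving row is annotated with aggregates over the rows sharing its ppu). The function name `windowed_donut_stats` is hypothetical, and only the first sample record (ppu 0.55, sales 300) comes from the slide; the other records are invented for illustration.

```python
from collections import defaultdict


def windowed_donut_stats(donuts, max_ppu=1.00):
    """Annotate each cheap donut with aggregates over its ppu partition."""
    rows = [d for d in donuts if d["ppu"] < max_ppu]       # WHERE donuts.ppu < 1.00
    partitions = defaultdict(list)
    for d in rows:
        partitions[d["ppu"]].append(d)                     # PARTITION BY ppu
    out = []
    for d in rows:
        part = partitions[d["ppu"]]
        out.append({
            "ppu": d["ppu"],
            "typeCount": len(part),                        # COUNT(*)
            "quantity": sum(p["sales"] for p in part),     # SUM(sales)
            "sales": sum(p["ppu"] * p["sales"] for p in part),  # SUM(ppu*sales)
        })
    out.sort(key=lambda r: r["ppu"], reverse=True)         # ORDER BY ppu DESC
    return out


donuts = [
    {"name": "Old Fashioned", "ppu": 0.55, "sales": 300},  # from the slide
    {"name": "Glazed", "ppu": 0.55, "sales": 100},         # hypothetical
    {"name": "Mini", "ppu": 0.25, "sales": 40},            # hypothetical
    {"name": "Deluxe", "ppu": 1.25, "sales": 10},          # hypothetical, filtered out
]
rows = windowed_donut_stats(donuts)
```

With this sample data the two 0.55 donuts share one partition, so both of their output rows carry a `typeCount` of 2 and a `quantity` of 400.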
  • 15. Query Components ■ User Query (DrQL) components: ● SELECT ● FROM ● WHERE ● GROUP BY ● HAVING ● (JOIN) ■ Logical operators: ● Scan ● Filter ● Aggregate ● (Join)
  • 17. Logical Plan Syntax: Operators & Expressions
      query:[
        {
          op:"sequence",
          do:[
            {
              op: "scan",
              memo: "initial_scan",
              ref: "donuts",
              source: "local-logs",
              selection: {data: "activity"}
            },
            {
              op: "transform",
              transforms: [
                { ref: "donuts.quantity", expr: "donuts.sales"}
              ]
            },
            {
              op: "filter",
              expr: "donuts.ppu < 1.00"
            },
            ---
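A toy interpreter can illustrate how a "sequence" of logical operators like the one above might be executed step by step. This is a sketch under stated assumptions and not the Drill reference interpreter: `run_sequence`, `field`, and the `sources` mapping are hypothetical names, expressions are limited to a bare field reference or `<ref> <op> <number>`, and the `memo` and `selection` attributes are ignored.

```python
import operator

# Comparison symbols this sketch understands (an assumption, not DrQL's grammar).
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def field(record, ref):
    """Resolve a dotted reference such as 'donuts.ppu' against a nested record."""
    value = record
    for part in ref.split("."):
        value = value[part]
    return value


def run_sequence(plan, sources):
    """Apply each operator in plan['do'] to the output of the previous one."""
    records = []
    for step in plan["do"]:
        if step["op"] == "scan":
            # Wrap each source row under the scan's ref name.
            records = [{step["ref"]: dict(row)} for row in sources[step["source"]]]
        elif step["op"] == "transform":
            for t in step["transforms"]:
                root, name = t["ref"].split(".")
                for rec in records:
                    rec[root][name] = field(rec, t["expr"])
        elif step["op"] == "filter":
            left, symbol, right = step["expr"].split()
            records = [r for r in records if OPS[symbol](field(r, left), float(right))]
        else:
            raise ValueError(f"unsupported op: {step['op']}")
    return records


plan = {"op": "sequence", "do": [
    {"op": "scan", "ref": "donuts", "source": "local-logs"},
    {"op": "transform", "transforms": [{"ref": "donuts.quantity", "expr": "donuts.sales"}]},
    {"op": "filter", "expr": "donuts.ppu < 1.00"},
]}
sources = {"local-logs": [{"ppu": 0.55, "sales": 300}, {"ppu": 1.25, "sales": 10}]}
result = run_sequence(plan, sources)
```

Each operator consumes the full output of the previous one here, which keeps the sketch short; a streaming engine would pass records or batches along instead.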
  • 18. Logical Streaming Example
      Input stream: 0 1 2 3 4
      {
        @id: <refnum>,
        op: "window-frame",
        input: <input>,
        keys: [
          <name>,...
        ],
        ref: <name>,
        before: 2,
        after: here
      }
      Output frames: 0, 01, 012, 123, 234
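The window-frame numbers on this slide can be reproduced with a short sliding-window sketch. It assumes that `before: 2` means "include up to two preceding records" and that `after: here` means "stop at the current record", which is modeled below as `after=0`; the function name `window_frames` is hypothetical.

```python
def window_frames(stream, before=2, after=0):
    """Return one frame per record: up to `before` earlier items through `after` later ones."""
    items = list(stream)
    frames = []
    for i in range(len(items)):
        lo = max(0, i - before)               # clamp at the start of the stream
        hi = min(len(items), i + after + 1)   # clamp at the end of the stream
        frames.append(items[lo:hi])
    return frames


frames = window_frames([0, 1, 2, 3, 4], before=2)
# frames == [[0], [0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

The first two frames are shorter because fewer than two predecessors exist, matching the 0, 01, 012, 123, 234 progression shown on the slide.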
  • 19. Representing a DAG { @id: 19, op: "aggregate", input: 18, type: <simple|running|repeat>, keys: [<name>,...], aggregations: [ {ref: <name>, expr: <aggexpr> },... ] }
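Because each operator names its input by `@id`, a plan forms a graph that can be evaluated by following those references. The sketch below is one hedged way to do that with memoization so a shared input is computed only once. The `evaluate` function, the `scan` operator shape, and the `sum_field` attribute are hypothetical simplifications standing in for the slide's `aggregations` list.

```python
def evaluate(ops, target_id, sources, cache=None):
    """Evaluate the operator with the given id, resolving its input recursively."""
    cache = {} if cache is None else cache
    if target_id in cache:
        return cache[target_id]
    op = ops[target_id]
    if op["op"] == "scan":
        result = list(sources[op["source"]])
    elif op["op"] == "aggregate":
        rows = evaluate(ops, op["input"], sources, cache)   # follow the input edge
        groups = {}
        for row in rows:
            key = tuple(row[k] for k in op["keys"])
            groups[key] = groups.get(key, 0) + row[op["sum_field"]]
        result = [dict(zip(op["keys"], key), total=total) for key, total in groups.items()]
    else:
        raise ValueError(f"unsupported op: {op['op']}")
    cache[target_id] = result
    return result


# Operators keyed by their id; operator 19 reads from operator 18.
ops = {
    18: {"op": "scan", "source": "local-logs"},
    19: {"op": "aggregate", "input": 18, "keys": ["ppu"], "sum_field": "sales"},
}
sources = {"local-logs": [
    {"ppu": 0.55, "sales": 300},
    {"ppu": 0.25, "sales": 40},
    {"ppu": 0.55, "sales": 100},
]}
totals = evaluate(ops, 19, sources)
```

The cache matters once two operators share an upstream input, which is what distinguishes a DAG from a simple chain.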
  • 20. Multiple Inputs { @id: 25, op: "cogroup", groupings: [ {ref: 23, expr: "id"}, {ref: 24, expr: "id"} ] }
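A cogroup takes several inputs and groups them together by a key expression. Assuming the semantics familiar from Pig-style COGROUP (one output entry per key, holding the matching rows from each input), a two-input version might look like the sketch below. The `cogroup` function and the sample rows are hypothetical.

```python
from collections import defaultdict


def cogroup(left, right, left_key, right_key):
    """Group two inputs by key; each key maps to (rows from left, rows from right)."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[left_key]][0].append(row)
    for row in right:
        groups[row[right_key]][1].append(row)
    return dict(groups)


# Stand-ins for the outputs of operators 23 and 24.
left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 1, "b": "p"}, {"id": 1, "b": "q"}, {"id": 3, "b": "r"}]
grouped = cogroup(left, right, "id", "id")
```

Unlike a join, keys present in only one input still appear, paired with an empty list for the other side.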
  • 21. Physical Scan Operators
                               Scan with schema                          Scan without schema
      Operator output          Protocol Buffers                          JSON-like (MessagePack)
      Supported data formats   ColumnIO (column-based protobuf/Dremel)   JSON
                               RecordIO (row-based protobuf)             HBase
                               CSV
      SELECT … FROM …          ColumnIO(proto URI, data URI)             Json(data URI)
                               RecordIO(proto URI, data URI)             HBase(table name)
  • 22. Hadoop Integration ■ Hadoop data sources ● Hadoop FileSystem API (HDFS/MapR-FS) ● HBase ■ Hadoop data formats ● Apache Avro ● RCFile ■ MapReduce-based tools to create column-based formats ■ Table registry in HCatalog ■ Run long-running services in YARN
  • 23. Where is Drill now? ■ API Definition ■ Reference Implementation for Logical Plan Interpreter ● 1:1 mapping logical/physical op ● Single JVM ■ Demo
  • 24. Contribute! ■ Participate in Design discussions: JIRA, ML, Wiki, Google Doc! ■ Write a parser for your favorite QL / Domain-Specific Language ■ Write Storage Engine API implementations ● HDFS, HBase, relational, XML DB ■ Write Physical Operators ● scan-hbase, scan-cassandra, scan-mongo ● scan-jdbc, scan-odbc, scan-jms (browse topic/queue), scan-* ● combined functionality operators: group-aggregate, ... ● sort-merge-join, hash-join, index-lookup-join ■ Etc...
  • 25. Thanks, Q&A ■ Download these slides ● http://www.mapr.com/company/events/pjug-1-15-2013 ■ Join the project ● drill-dev-subscribe@incubator.apache.org ● #apachedrill ■ Contact me: ● gshegalov@maprtech.com ■ Join MapR ● jobs@mapr.com