Gera Shegalov @PJUG, Jan 15, 2013
/home/gera: whoami

■ Saarland University
■ 1st intern in Immortal DB @ Microsoft Research
■ JMS, RDBMS HA @ Oracle




■ Hadoop MapReduce / Hadoop Core
■ Founding member of Apache Drill
■ Open enterprise-grade distribution for Hadoop
 ● Easy, dependable and fast
 ● Open source with standards-based extensions


■ MapR is deployed at thousands of companies
 ● From small Internet startups to Fortune 100


■ MapR customers analyze massive amounts of data:
 ● Hundreds of billions of events daily
 ● 90% of the world’s Internet population monthly
 ● $1 trillion in retail purchases annually


■ MapR in the Cloud:
 ● partnered with Google: Hadoop on Google Compute Engine
 ● partnered with Amazon: M3/M5 options for Elastic MapReduce
Agenda
■ What?
 ● What exactly does Drill do?


■ Why?
 ● Why do we need Apache Drill?


■ Who?
 ● Who is doing this?


■ How?
 ● How does Drill work inside?


■ Conclusion
 ● How can you help?
 ● Where can you find out more?
Apache Drill Overview

■ Drill overview
  ● Low latency interactive queries
  ● Standard ANSI SQL support
  ● Domain Specific Languages / Your own QL

■ Open-Source
  ● Apache Incubator
  ● Hundreds involved across the US and Europe
  ● Community consensus on API, functionality
Big Data Processing

                      Batch processing     Interactive analysis      Stream processing
Query runtime         Minutes to hours     Milliseconds to minutes   Never-ending
Data volume           TBs to PBs           GBs to PBs                Continuous stream
Programming model     MapReduce            Queries                   DAG
Users                 Developers           Analysts and developers   Developers
Google project        MapReduce            Dremel
Open source project   Hadoop MapReduce     Apache Drill              Storm and S4
Latency Matters

■ Ad-hoc analysis with interactive tools


■ Real-time dashboards




■ Event/trend detection and analysis
  ●   Network intrusions
  ●   Fraud
  ●   Failures
Nested Query Languages

■ DrQL
  ●   SQL-like query language for nested data

  ●   Compatible with Google BigQuery/Dremel
      ● BigQuery applications should work with Drill



  ●   Designed to support efficient column-based processing
      ● No record assembly during query processing




■ Mongo Query Language
  ●   {$query: {x: 3, y: "abc"}, $orderby: {x: 1}}

■ Other languages/programming models can plug in
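
To make the Mongo-style query on this slide concrete, here is a minimal Python sketch of what {$query: {x: 3, y: "abc"}, $orderby: {x: 1}} asks for: keep the documents whose fields equal the given values, then sort by the listed fields. The function name run_query and the sample documents are hypothetical, only equality matching and simple ordering are covered, and this is neither MongoDB's nor Drill's implementation.

```python
def run_query(docs, spec):
    """Filter docs by equality on the $query fields, then sort by the $orderby fields."""
    criteria = spec.get("$query", {})
    matched = [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
    # Sort by the last key first; Python's sort is stable, so earlier keys win.
    # Assumes every matched document contains each $orderby field.
    for field, direction in reversed(list(spec.get("$orderby", {}).items())):
        matched.sort(key=lambda d, f=field: d[f], reverse=(direction == -1))
    return matched


# Hypothetical sample documents; the field "n" only tags them for inspection.
docs = [
    {"x": 3, "y": "abc", "n": 2},
    {"x": 3, "y": "zzz", "n": 9},
    {"x": 1, "y": "abc", "n": 5},
    {"x": 3, "y": "abc", "n": 1},
]
spec = {"$query": {"x": 3, "y": "abc"}, "$orderby": {"x": 1}}
result = run_query(docs, spec)
print(result)
```

A pluggable front end would translate such a document into the same logical operators (filter, order) that a DrQL query produces.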
Nested Data Model
■ The data model in Dremel is Protocol Buffers
  ●   Nested
  ●   Schema
■ Apache Drill is designed to support multiple data models
  ●   Schema: Protocol Buffers, Apache Avro, …
  ●   Schema-less: JSON, BSON, …
■ Flat records are supported as a special case of nested data
  ●   CSV, TSV, …
Avro IDL

      enum Gender {
        MALE, FEMALE
      }

      record User {
        string name;
        Gender gender;
        long followers;
      }

JSON

      {
          "name": "Srivas",
          "gender": "Male",
          "followers": 100
      }
      {
          "name": "Raina",
          "gender": "Female",
          "followers": 200,
          "zip": "94305"
      }
Extensibility
■ Nested query languages
  ● Pluggable model

  ● DrQL

  ● Mongo Query Language

  ● Cascading



■ Distributed execution engine
  ● Extensible model (e.g., Dryad)

  ● Low-latency

  ● Fault tolerant



■ Nested data formats
  ● Pluggable model

  ● Column-based (ColumnIO/Dremel, Trevni, RCFile) and row-based (RecordIO, Avro, JSON, CSV)

  ● Schema (Protocol Buffers, Avro, CSV) and schema-less (JSON, BSON)



■ Scalable data sources
  ● Pluggable model

  ● Hadoop

  ● HBase
Design Principles

  Flexible
  ●   Pluggable query languages
  ●   Extensible execution engine
  ●   Pluggable data formats
      ● Column-based and row-based
      ● Schema and schema-less
  ●   Pluggable data sources
  ●   N(ot)O(nly) Hadoop

  Easy
  ●   Unzip and run
  ●   Zero configuration
  ●   Reverse DNS not needed
  ●   IP addresses can change
  ●   Clear and concise log messages

  Dependable
  ●   No SPOF
  ●   Instant recovery from crashes

  Fast
  ●   Minimum Java core
  ●   C/C++ core with Java support
      ● Google C++ style guide
  ●   Min latency and max throughput
      (limited only by hardware)
Architecture
Execution Engine
Operator layer is serialization-aware
   Processes individual records

Execution layer is not serialization-aware
   Processes batches of records (blobs/JSON trees)
   Responsible for communication, dependencies and fault tolerance
DrQL Example
local-logs = donuts.json:

{
     "id": "0003",
     "type": "donut",
     "name": "Old Fashioned",
     "ppu": 0.55,
     "sales": 300,
     "batters":
       {
         "batter":
           [
             { "id": "1001", "type": "Regular" },
             { "id": "1002", "type": "Chocolate" }
           ]
       },
     "topping":
       [
         { "id": "5001", "type": "None" },
         { "id": "5002", "type": "Glazed" },
         { "id": "5003", "type": "Chocolate" },
         { "id": "5004", "type": "Maple" }
       ]
 }

Query:

SELECT
  ppu,
  typeCount =
    COUNT(*) OVER PARTITION BY ppu,
  quantity =
    SUM(sales) OVER PARTITION BY ppu,
  sales =
    SUM(ppu*sales) OVER PARTITION BY ppu
FROM local-logs donuts
WHERE donuts.ppu < 1.00
ORDER BY donuts.ppu DESC;
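
As a rough illustration of what the DrQL example computes, the following Python sketch filters donut records with ppu < 1.00, partitions them by ppu, and derives typeCount, quantity and sales per partition, ordered by ppu descending. It assumes one output row per partition, which is a simplification (standard SQL window functions would emit one row per input record). Only the first record comes from the slide; the other two records and the function name summarize_by_ppu are hypothetical.

```python
from collections import defaultdict

# First record mirrors the slide (nested fields omitted); the rest is hypothetical data.
donuts = [
    {"id": "0003", "type": "donut", "name": "Old Fashioned", "ppu": 0.55, "sales": 300},
    {"id": "9001", "type": "donut", "name": "Hypothetical Glazed", "ppu": 0.55, "sales": 100},
    {"id": "9002", "type": "donut", "name": "Hypothetical Filled", "ppu": 1.25, "sales": 50},
]


def summarize_by_ppu(records, max_ppu=1.00):
    """Return one summary row per ppu value below max_ppu, highest ppu first."""
    partitions = defaultdict(list)
    for rec in records:
        if rec["ppu"] < max_ppu:                 # WHERE donuts.ppu < 1.00
            partitions[rec["ppu"]].append(rec)   # PARTITION BY ppu
    rows = [
        {
            "ppu": ppu,
            "typeCount": len(members),                            # COUNT(*)
            "quantity": sum(m["sales"] for m in members),         # SUM(sales)
            "sales": sum(m["ppu"] * m["sales"] for m in members),  # SUM(ppu*sales)
        }
        for ppu, members in partitions.items()
    ]
    return sorted(rows, key=lambda r: r["ppu"], reverse=True)     # ORDER BY ppu DESC


rows = summarize_by_ppu(donuts)
print(rows)
```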
Query Components

■ User Query (DrQL) components:
  ● SELECT

  ● FROM

  ● WHERE

  ● GROUP BY

  ● HAVING

  ● (JOIN)




■ Logical operators:
  ● Scan

  ● Filter

  ● Aggregate

  ● (Join)
Logical Plan
Logical Plan Syntax:
Operators & Expressions
        query:[
         {
           op:"sequence",
           do:[
           {
             op: "scan",
             memo: "initial_scan",
             ref: "donuts",
             source: "local-logs",
             selection: {data: "activity"}
           },
           {
             op: "transform",
             transforms: [
               { ref: "donuts.quantity", expr: "donuts.sales"}
             ]
           },
           {
             op: "filter",
             expr: "donuts.ppu < 1.00"
           },
           ---
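
One way to picture how a "sequence" of logical operators could be interpreted is the Python sketch below, which feeds the output of each operator into the next. It is not Drill's reference interpreter: expressions are given as Python callables instead of parsed strings, the function name run_sequence is hypothetical, and the second source record is made-up sample data.

```python
def run_sequence(plan, sources):
    """Apply each operator in plan["do"] to the output of the previous one."""
    records = []
    for step in plan["do"]:
        op = step["op"]
        if op == "scan":
            # Wrap every source record under the scan's ref name, e.g. {"donuts": {...}}.
            records = [{step["ref"]: dict(rec)} for rec in sources[step["source"]]]
        elif op == "transform":
            for rec in records:
                for t in step["transforms"]:
                    owner, field = t["ref"].split(".")
                    rec[owner][field] = t["expr"](rec)
        elif op == "filter":
            records = [rec for rec in records if step["expr"](rec)]
        else:
            raise ValueError(f"unsupported operator: {op}")
    return records


# Assumption: expressions are callables here; the slide writes them as strings.
plan = {
    "op": "sequence",
    "do": [
        {"op": "scan", "ref": "donuts", "source": "local-logs"},
        {"op": "transform",
         "transforms": [{"ref": "donuts.quantity",
                         "expr": lambda r: r["donuts"]["sales"]}]},
        {"op": "filter", "expr": lambda r: r["donuts"]["ppu"] < 1.00},
    ],
}

sources = {"local-logs": [
    {"id": "0003", "ppu": 0.55, "sales": 300},
    {"id": "9002", "ppu": 1.25, "sales": 50},   # hypothetical record
]}

result = run_sequence(plan, sources)
print(result)
```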
Logical Streaming Example

Input stream:    0, 1, 2, 3, 4
Window frames:   0 | 0 1 | 0 1 2 | 1 2 3 | 2 3 4

{ @id: <refnum>, op: "window-frame",
  input: <input>,
  keys: [
    <name>,...
  ],
  ref: <name>,
  before: 2,
  after: here
}
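
The frames shown for before: 2, after: here can be reproduced with a small sliding-window generator. This is a minimal Python sketch, assuming each frame holds at most `before` preceding records plus the current one and ignoring the keys and ref fields; the function name window_frames is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def window_frames(records: Iterable[T], before: int = 2) -> Iterator[List[T]]:
    """For each record, yield up to `before` preceding records followed by the record itself."""
    frame: deque = deque(maxlen=before + 1)  # older records fall out automatically
    for record in records:
        frame.append(record)
        yield list(frame)


frames = list(window_frames(range(5), before=2))
print(frames)  # [[0], [0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

Because the generator only keeps the current frame in memory, it works on a never-ending input as well as on a finite one.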
Representing a DAG




          { @id: 19, op: "aggregate",
            input: 18,
            type: <simple|running|repeat>,
            keys: [<name>,...],
            aggregations: [
              {ref: <name>, expr: <aggexpr> },...
            ]
          }
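
Since operators refer to their inputs by @id, a plan forms a graph that can be evaluated by resolving those references recursively. The Python sketch below shows the idea for a scan feeding an aggregate, with the ids 18 and 19 taken from the slide. Everything else is an assumption: the scan operator with inline records, the single summing aggregation described by the hypothetical fields sum_field and ref, and the function name evaluate.

```python
from collections import defaultdict


def evaluate(op_id, operators, cache=None):
    """Evaluate operator `op_id`, evaluating the operator named by its `input` first."""
    cache = {} if cache is None else cache
    if op_id in cache:                      # shared inputs are computed only once
        return cache[op_id]
    op = operators[op_id]
    if op["op"] == "scan":
        result = list(op["records"])
    elif op["op"] == "aggregate":
        rows = evaluate(op["input"], operators, cache)
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[k] for k in op["keys"])
            totals[key] += row[op["sum_field"]]
        result = [dict(zip(op["keys"], key), **{op["ref"]: total})
                  for key, total in totals.items()]
    else:
        raise ValueError(f"unsupported operator: {op['op']}")
    cache[op_id] = result
    return result


# Hypothetical plan: operator 19 aggregates the output of operator 18.
operators = {
    18: {"op": "scan", "records": [
        {"type": "donut", "sales": 300},
        {"type": "donut", "sales": 100},
        {"type": "bagel", "sales": 50},
    ]},
    19: {"op": "aggregate", "input": 18, "keys": ["type"],
         "ref": "total_sales", "sum_field": "sales"},
}

result = evaluate(19, operators)
print(result)
```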
Multiple Inputs




                  { @id: 25, op: "cogroup",
                    groupings: [
                      {ref: 23, expr: "id"}, {ref: 24, expr: "id"}
                    ]
                  }
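
A cogroup takes two inputs and groups the records of both by a key, keeping the two sides separate within each group. The following Python sketch shows that behavior for the "id" key used on the slide; the function name cogroup, the "left"/"right" labels and the sample records are hypothetical.

```python
from collections import defaultdict


def cogroup(left, right, left_key="id", right_key="id"):
    """Group records from both inputs by key; each group keeps the two sides apart."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[left_key]][0].append(rec)
    for rec in right:
        groups[rec[right_key]][1].append(rec)
    return {key: {"left": l, "right": r} for key, (l, r) in groups.items()}


# Hypothetical inputs standing in for the outputs of operators 23 and 24.
orders = [{"id": 1, "item": "donut"}, {"id": 2, "item": "bagel"}]
payments = [{"id": 1, "amount": 0.55}, {"id": 1, "amount": 0.45}, {"id": 3, "amount": 2.0}]

grouped = cogroup(orders, payments)
print(grouped)
```

Unlike an inner join, keys that appear in only one input still produce a group, with an empty list for the other side.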
Physical Scan Operators


                         Scan with schema                Scan without schema
Operator output          Protocol Buffers                JSON-like (MessagePack)
Supported data formats   ColumnIO (column-based          JSON
                         protobuf/Dremel)                HBase
                         RecordIO (row-based protobuf)
                         CSV
SELECT … FROM …          ColumnIO(proto URI, data URI)   Json(data URI)
                         RecordIO(proto URI, data URI)   HBase(table name)
Hadoop Integration

■   Hadoop data sources
    ●   Hadoop FileSystem API (HDFS/MapR-FS)
    ●   HBase

■   Hadoop data formats
    ●   Apache Avro
    ●   RCFile

■   MapReduce-based tools to create column-based formats

■   Table registry in HCatalog

■   Run long-running services in YARN
Where is Drill now?

■ API Definition


■ Reference Implementation for Logical Plan Interpreter
 ● 1:1 mapping logical/physical op
 ● Single JVM


■ Demo
Contribute!

■ Participate in Design discussions: JIRA, ML, Wiki, Google Doc!


■ Write a parser for your favorite QL / Domain-Specific Language


■ Write Storage Engine API implementations
 ● HDFS, HBase, relational, XML DB.


■ Write Physical Operators
 ● scan-hbase, scan-cassandra, scan-mongo
 ● scan-jdbc, scan-odbc, scan-jms (browse topic/queue), scan-*
 ● combined functionality operators: group-aggregate, ...
 ● sort-merge-join, hash-join, index-lookup-join

■ Etc...
Thanks, Q&A

■ Download these slides
  ●   http://www.mapr.com/company/events/pjug-1-15-2013

■ Join the project
  ●   drill-dev-subscribe@incubator.apache.org
  ●   #apachedrill

■ Contact me:
  ●   gshegalov@maprtech.com

■ Join MapR
  ●   jobs@mapr.com

Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Recently uploaded (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Apache Drill @ PJUG, Jan 15, 2013

  • 1. Gera Shegalov @PJUG, Jan 15, 2013
  • 2. /home/gera: whoami ■ Saarland University ■ 1st intern in Immortal DB @ Microsoft Research ■ JMS, RDBMS HA @ Oracle ■ Hadoop MapReduce / Hadoop Core ■ Founding member of Apache Drill
  • 3. ■ Open enterprise-grade distribution for Hadoop ● Easy, dependable and fast ● Open source with standards-based extensions ■ MapR is deployed at 1000’s of companies ● From small Internet startups to Fortune 100 ■ MapR customers analyze massive amounts of data: ● Hundreds of billions of events daily ● 90% of the world’s Internet population monthly ● $1 trillion in retail purchases annually ■ MapR in the Cloud: ● partnered with Google: Hadoop on Google Compute Engine ● partnered with Amazon: M3/M5 options for Elastic Map Reduce
  • 4. Agenda ■ What? ● What exactly does Drill do? ■ Why? ● Why do we need Apache Drill? ■ Who? ● Who is doing this? ■ How? ● How does Drill work inside? ■ Conclusion ● How can you help? ● Where can you find out more?
  • 5. Apache Drill Overview ■ Drill overview ● Low latency interactive queries ● Standard ANSI SQL support ● Domain Specific Languages / Your own QL ■ Open-Source ● Apache Incubator ● 100’s involved across US and Europe ● Community consensus on API, functionality
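  • 6. Big Data Processing ■ Batch processing ● Query runtime: minutes to hours ● Data volume: TBs to PBs ● Programming model: MapReduce ● Users: developers ● Google project: MapReduce ● Open source project: Hadoop MapReduce ■ Interactive analysis ● Query runtime: milliseconds to minutes ● Data volume: GBs to PBs ● Programming model: queries ● Users: analysts and developers ● Google project: Dremel ● Open source project: Apache Drill ■ Stream processing ● Query runtime: never-ending ● Data volume: continuous stream ● Programming model: DAG ● Users: developers ● Open source project: Storm and S4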
  • 7. Latency Matters ■ Ad-hoc analysis with interactive tools ■ Real-time dashboards ■ Event/trend detection and analysis ● Network intrusions ● Fraud ● Failures
  • 8. Nested Query Languages ■ DrQL ● SQL-like query language for nested data ● Compatible with Google BigQuery/Dremel ● BigQuery applications should work with Drill ● Designed to support efficient column-based processing ● No record assembly during query processing ■ Mongo Query Language ● {$query: {x: 3, y: "abc"}, $orderby: {x: 1}} ■ Other languages/programming models can plug in
  • 9. Nested Data Model ■ The data model in Dremel is Protocol Buffers ● Nested ● Schema ■ Apache Drill is designed to support multiple data models ● Schema: Protocol Buffers, Apache Avro, … ● Schema-less: JSON, BSON, … ■ Flat records are supported as a special case of nested data ● CSV, TSV, … ■ Avro IDL example: enum Gender { MALE, FEMALE } record User { string name; Gender gender; long followers; } ■ JSON example: { "name": "Srivas", "gender": "Male", "followers": 100 } { "name": "Raina", "gender": "Female", "followers": 200, "zip": "94305" }
  • 10. Extensibility ■ Nested query languages ● Pluggable model ● DrQL ● Mongo Query Language ● Cascading ■ Distributed execution engine ● Extensible model (e.g., Dryad) ● Low-latency ● Fault tolerant ■ Nested data formats ● Pluggable model ● Column-based (ColumnIO/Dremel, Trevni, RCFile) and row-based (RecordIO, Avro, JSON, CSV) ● Schema (Protocol Buffers, Avro, CSV) and schema-less (JSON, BSON) ■ Scalable data sources ● Pluggable model ● Hadoop ● HBase
  • 11. Design Principles ■ Flexible ● Pluggable query languages ● Extensible execution engine ● Pluggable data formats: column-based and row-based, schema and schema-less ● Pluggable data sources ● N(ot)O(nly) Hadoop ■ Easy ● Unzip and run ● Zero configuration ● Reverse DNS not needed ● IP addresses can change ● Clear and concise log messages ■ Dependable ● No SPOF ● Instant recovery from crashes ■ Fast ● Minimum Java core ● C/C++ core with Java support ● Google C++ style guide ● Min latency and max throughput (limited only by hardware)
  • 13. Execution Engine ■ Operator layer is serialization-aware ● Processes individual records ■ Execution layer is not serialization-aware ● Processes batches of records (blobs/JSON trees) ● Responsible for communication, dependencies and fault tolerance
  • 14. DrQL Example ■ Query: SELECT ppu, typeCount = COUNT(*) OVER PARTITION BY ppu, quantity = SUM(sales) OVER PARTITION BY ppu, sales = SUM(ppu*sales) OVER PARTITION BY ppu FROM local-logs donuts WHERE donuts.ppu < 1.00 ORDER BY donuts.ppu DESC; ■ local-logs = donuts.json: { "id": "0003", "type": "donut", "name": "Old Fashioned", "ppu": 0.55, "sales": 300, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5003", "type": "Chocolate" }, { "id": "5004", "type": "Maple" } ] }
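To make the intent of the DrQL example on slide 14 concrete, here is a minimal Python sketch of the same computation over in-memory records. It is not Drill code. The `summarize` function and the second and third sample records are hypothetical, and the sketch collapses each `ppu` partition into a single output row, which simplifies the window semantics of `OVER PARTITION BY`.

```python
from collections import defaultdict

# Hypothetical sample records; only the first mirrors the donut on the slide.
donuts = [
    {"id": "0003", "name": "Old Fashioned", "ppu": 0.55, "sales": 300},
    {"id": "0004", "name": "Glazed", "ppu": 0.55, "sales": 100},
    {"id": "0005", "name": "Filled", "ppu": 1.25, "sales": 50},
]


def summarize(records, max_ppu=1.00):
    """Filter by ppu, group by ppu, and compute the three aggregates per group."""
    groups = defaultdict(list)
    for rec in records:
        if rec["ppu"] < max_ppu:  # WHERE donuts.ppu < 1.00
            groups[rec["ppu"]].append(rec)

    rows = []
    for ppu, recs in groups.items():
        rows.append({
            "ppu": ppu,
            "typeCount": len(recs),                             # COUNT(*)
            "quantity": sum(r["sales"] for r in recs),          # SUM(sales)
            "sales": sum(r["ppu"] * r["sales"] for r in recs),  # SUM(ppu*sales)
        })
    # ORDER BY donuts.ppu DESC
    return sorted(rows, key=lambda row: row["ppu"], reverse=True)


result = summarize(donuts)
```

With the sample data above, only the two records priced at 0.55 pass the filter, so `result` holds one row.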
  • 15. Query Components ■ User Query (DrQL) components: ● SELECT ● FROM ● WHERE ● GROUP BY ● HAVING ● (JOIN) ■ Logical operators: ● Scan ● Filter ● Aggregate ● (Join)
  • 17. Logical Plan Syntax: Operators & Expressions query:[ { op:"sequence", do:[ { op: "scan", memo: "initial_scan", ref: "donuts", source: "local-logs", selection: {data: "activity"} }, { op: "transform", transforms: [ { ref: "donuts.quantity", expr: "donuts.sales"} ] }, { op: "filter", expr: "donuts.ppu < 1.00" }, ---
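The scan, transform and filter sequence on slide 17 can be read as a small pipeline. The following Python sketch interprets such a sequence over in-memory records. It is an illustration under stated assumptions rather than Drill's interpreter: `run_sequence`, the `SOURCES` table, the `copy` mapping and the Python callable used as the filter predicate are all hypothetical stand-ins for the plan's string expressions.

```python
# Hypothetical in-memory data standing in for the "local-logs" source.
SOURCES = {
    "local-logs": [
        {"ppu": 0.55, "sales": 300},
        {"ppu": 1.25, "sales": 50},
    ]
}


def run_sequence(ops, sources):
    """Apply scan, transform and filter operators in order to a list of records."""
    records = []
    for op in ops:
        kind = op["op"]
        if kind == "scan":
            # Copy the records so later operators do not mutate the source.
            records = [dict(rec) for rec in sources[op["source"]]]
        elif kind == "transform":
            # Assumption: a transform copies one field into a new field.
            for rec in records:
                for src_field, dst_field in op["copy"].items():
                    rec[dst_field] = rec[src_field]
        elif kind == "filter":
            # Assumption: the predicate is a Python callable, not a parsed string.
            records = [rec for rec in records if op["predicate"](rec)]
        else:
            raise ValueError(f"unsupported operator: {kind}")
    return records


plan = [
    {"op": "scan", "source": "local-logs"},
    {"op": "transform", "copy": {"sales": "quantity"}},
    {"op": "filter", "predicate": lambda rec: rec["ppu"] < 1.00},
]
output = run_sequence(plan, SOURCES)
```

A real interpreter would parse the expression strings from the plan; callables are used here only to keep the sketch short.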
  • 18. Logical Streaming Example ■ Operator: { @id: <refnum>, op: "window-frame", input: <input>, keys: [ <name>,... ], ref: <name>, before: 2, after: here } ■ Input records: 0 1 2 3 4 ■ Resulting window frames: 0, 01, 012, 123, 234
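The window-frame operator on slide 18 uses `before: 2` and `after: here`, which reads as "each record plus up to two records before it". A minimal Python sketch of that framing rule follows. The function name `window_frames` is hypothetical, and the sketch ignores the `keys` and `ref` fields of the operator.

```python
def window_frames(records, before=2):
    """Return, for each record, the frame of up to `before` preceding records plus itself."""
    frames = []
    for i in range(len(records)):
        start = max(0, i - before)  # clamp at the start of the stream
        frames.append(records[start:i + 1])
    return frames


frames = window_frames([0, 1, 2, 3, 4], before=2)
# frames == [[0], [0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

For the input 0 1 2 3 4 this yields the frames 0, 01, 012, 123 and 234 shown on the slide.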
  • 19. Representing a DAG { @id: 19, op: "aggregate", input: 18, type: <simple|running|repeat>, keys: [<name>,...], aggregations: [ {ref: <name>, expr: <aggexpr> },... ] }
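Slide 19 shows operators referring to their inputs by `@id`, which is what turns a flat list of operators into a DAG. The Python sketch below evaluates such a graph by following `input` references and caching each operator's result. The `evaluate` function, the handler table, the `rows` field on the scan and the fixed sum over a `sales` field are hypothetical simplifications; the slide's `type` and `aggregations` fields are not modeled.

```python
def evaluate(plan, op_id, handlers, cache=None):
    """Evaluate operator `op_id`, resolving its inputs by @id and memoizing results."""
    cache = {} if cache is None else cache
    if op_id in cache:
        return cache[op_id]
    op = plan[op_id]
    inputs = op.get("input", [])
    if not isinstance(inputs, list):
        inputs = [inputs]  # a single input is written as a bare id
    upstream = [evaluate(plan, i, handlers, cache) for i in inputs]
    result = handlers[op["op"]](op, *upstream)
    cache[op_id] = result
    return result


def scan(op):
    # Assumption: the scan carries its rows inline instead of reading a source.
    return op["rows"]


def aggregate(op, rows):
    # Assumption: one grouping key and a fixed SUM over the "sales" field.
    key = op["keys"][0]
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row["sales"]
    return totals


plan = {
    18: {"@id": 18, "op": "scan", "rows": [
        {"ppu": 0.55, "sales": 300},
        {"ppu": 0.55, "sales": 100},
        {"ppu": 1.25, "sales": 50},
    ]},
    19: {"@id": 19, "op": "aggregate", "input": 18, "keys": ["ppu"]},
}
totals = evaluate(plan, 19, {"scan": scan, "aggregate": aggregate})
```

Because results are cached per `@id`, an operator that feeds several downstream operators is evaluated only once within a single `evaluate` call tree.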
  • 20. Multiple Inputs { @id: 25, op: "cogroup", groupings: [ {ref: 23, expr: "id"}, {ref: 24, expr: "id"} ] }
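The cogroup operator on slide 20 takes two inputs and groups both by the same expression, here `id`. As a rough illustration, the Python sketch below groups two record lists by a shared key and keeps each side's records separate per key. The `cogroup` function and the sample records are hypothetical, and the grouping expression is assumed to be a plain field name.

```python
from collections import defaultdict


def cogroup(left, right, key="id"):
    """Group two record lists by `key`; each key maps to (left_records, right_records)."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[key]][0].append(rec)
    for rec in right:
        groups[rec[key]][1].append(rec)
    return dict(groups)


# Hypothetical inputs standing in for the operators with @id 23 and 24.
left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 1, "b": "z"}]
grouped = cogroup(left, right)
```

Unlike a join, a key that appears on only one side is kept, with an empty list for the other side.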
  • 21. Physical Scan Operators ■ Scan with schema ● Operator output: Protocol Buffers ● Supported data formats: ColumnIO (column-based protobuf/Dremel), RecordIO (row-based protobuf), CSV ● SELECT … FROM …: ColumnIO(proto URI, data URI), RecordIO(proto URI, data URI) ■ Scan without schema ● Operator output: JSON-like (MessagePack) ● Supported data formats: JSON, HBase ● SELECT … FROM …: Json(data URI), HBase(table name)
  • 22. Hadoop Integration ■ Hadoop data sources ● Hadoop FileSystem API (HDFS/MapR-FS) ● HBase ■ Hadoop data formats ● Apache Avro ● RCFile ■ MapReduce-based tools to create column-based formats ■ Table registry in HCatalog ■ Run long-running services in YARN
  • 23. Where is Drill now? ■ API Definition ■ Reference Implementation for Logical Plan Interpreter ● 1:1 mapping logical/physical op ● Single JVM ■ Demo
  • 24. Contribute! ■ Participate in design discussions: JIRA, ML, Wiki, Google Doc! ■ Write a parser for your favorite QL / Domain-Specific Language ■ Write Storage Engine API implementations ● HDFS, HBase, relational, XML DB ■ Write Physical Operators ● scan-hbase, scan-cassandra, scan-mongo ● scan-jdbc, scan-odbc, scan-jms (browse topic/queue), scan-* ● combined functionality operators: group-aggregate, ... ● sort-merge-join, hash-join, index-lookup-join ■ Etc...
  • 25. Thanks, Q&A ■ Download these slides ● http://www.mapr.com/company/events/pjug-1-15-2013 ■ Join the project ● drill-dev-subscribe@incubator.apache.org ● #apachedrill ■ Contact me: ● gshegalov@maprtech.com ■ Join MapR ● jobs@mapr.com