Google's Dremel

Course: Advanced Topics in Distributed Computing
30-minute presentation

Published in: Technology

  1. Dremel: Interactive Analysis of Web-Scale Datasets
     Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis
     Presented by Maria Stylianou (marsty5@gmail.com), November 8th, 2012, KTH – Royal Institute of Technology
  2. Outline
     ● Motivation
     ● Dremel: basic information
     ● Dremel's key aspects
       – Columnar format
       – Query execution
     ● Evaluation & conclusions
  3. Motivation (Data → Big Data)
     ● Web-scale datasets → more frequent
     ● Large-scale data analysis → essential, but NOT FAST
     Speed matters!
  4. Dremel to the rescue!
     ● Interactive ad-hoc query system: scalable, fault tolerant, fast
     ● Analysis on in situ nested data: data is accessed in place and is non-relational
  5. MapReduce or Dremel, or both?
  6. Key Aspects of Dremel
     ● Storage format
       – Columnar storage representation for nested data
     ● Query language & execution
       – SQL & multi-level serving tree
  7. Storage Format: Columnar Storage Representation
  8. Data Model
     ● Based on strongly-typed nested records
     [Figure labels: schema, records, repetition level, definition level]
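A minimal sketch of how repetition and definition levels could be derived for one column of a nested record. The field names (`Name`, `Language`, `Code`) and the sample record follow the shape of the running example in the Dremel paper and are used here only for illustration. The function `stripe_code_column` is hypothetical and handles just this one path, where `Name` and `Language` are repeated and `Code` is required.

```python
from typing import Any, Dict, List, Optional, Tuple

# (value, repetition_level, definition_level)
Triple = Tuple[Optional[str], int, int]


def stripe_code_column(record: Dict[str, Any]) -> List[Triple]:
    """Emit the column Name.Language.Code for a single record.

    Repetition level: which repeated field in the path repeated last
    (0 = new record, 1 = Name, 2 = Language).
    Definition level: how many optional/repeated fields in the path
    are actually present (0, 1 = Name only, 2 = Name and Language).
    """
    names = record.get("Name", [])
    if not names:
        # Nothing on the path is defined: a single NULL at level 0.
        return [(None, 0, 0)]

    out: List[Triple] = []
    for i, name in enumerate(names):
        # First Name starts the record (0); later ones repeat at Name (1).
        r_name = 0 if i == 0 else 1
        languages = name.get("Language", [])
        if not languages:
            # Name exists but Language does not: NULL with definition 1.
            out.append((None, r_name, 1))
            continue
        for j, language in enumerate(languages):
            # First Language inherits the Name-level repetition;
            # later ones repeat at Language (2).
            r = r_name if j == 0 else 2
            out.append((language["Code"], r, 2))
    return out


# Illustrative record (assumed shape, not taken from the slides).
r1 = {
    "Name": [
        {"Language": [{"Code": "en-us"}, {"Code": "en"}], "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb"}]},
    ]
}
column = stripe_code_column(r1)
```

The NULL entry records that a `Name` without any `Language` occurred between the two others, which is what lets a reader reassemble the original nesting from the column alone.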
  9. Query Language & Execution: SQL & Multi-level Serving Tree
     [Figure annotation: a tablet contains N rows from the table]
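The idea of a multi-level serving tree can be sketched as leaves that each scan one tablet and return a partial aggregate, with every level above combining the partials of its children. This is a simplified illustration, not Dremel's implementation: the names `leaf_scan`, `combine`, and `serve`, the fan-out parameter, and the (sum, count) aggregate are all assumptions.

```python
from typing import List, Sequence, Tuple

Partial = Tuple[int, int]  # (sum of values, number of rows)


def leaf_scan(tablet: Sequence[int]) -> Partial:
    """A leaf server scans its tablet and returns a partial aggregate."""
    return (sum(tablet), len(tablet))


def combine(partials: Sequence[Partial]) -> Partial:
    """An intermediate or root server merges its children's partials."""
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))


def serve(tablets: Sequence[Sequence[int]], fanout: int = 2) -> Partial:
    """Aggregate bottom-up through a tree with the given fan-out."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level: List[Partial] = [leaf_scan(t) for t in tablets]
    if not level:
        return (0, 0)
    # Each pass of the loop is one level of the tree.
    while len(level) > 1:
        level = [combine(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


tablets = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
total, rows = serve(tablets)
```

Because sums and counts merge associatively, no server above the leaves ever sees raw rows, only small partial results.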
  10. Query Execution: Query Dispatcher
     ● Schedules queries based on their priorities
     ● Balances the load
     ● Provides fault tolerance
       – Handles stragglers (servers running slow)
       – Tablets are three-way replicated
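One way to picture how replication helps with stragglers: when the server holding one replica of a tablet is slow or unreachable, the work can go to another replica. The sketch below is a static simplification under stated assumptions. `pick_replica`, the latency table, and the 100 ms threshold are hypothetical, and a real dispatcher would react to in-flight work rather than to a fixed table.

```python
from typing import Dict, List, Optional


def pick_replica(replicas: List[str],
                 latency_ms: Dict[str, float],
                 straggler_ms: float = 100.0) -> Optional[str]:
    """Return the first replica server that is reachable and not slow.

    A server missing from latency_ms is treated as unreachable.
    Returns None if every replica is slow or unreachable.
    """
    for server in replicas:
        latency = latency_ms.get(server)
        if latency is not None and latency <= straggler_ms:
            return server
    return None


# Three-way replicated tablet; s1 is a straggler in this made-up snapshot.
replicas = ["s1", "s2", "s3"]
observed = {"s1": 900.0, "s2": 40.0, "s3": 35.0}
chosen = pick_replica(replicas, observed)
```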
  11. Experiments: Environment
  12. Experiments: Local Disk Performance
  13. Experiments: MapReduce and Dremel
     Counts the average number of terms in a specific field, using 3000 workers.
     [Chart annotations: hours, minutes, seconds]
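To make the workload concrete, here is a small sketch of "average number of terms in a field" computed over a column-oriented layout, where only the one field being analyzed is read. The table contents, the field names, and the whitespace-based definition of a term are assumptions for illustration.

```python
from typing import Dict, List


def average_term_count(column: List[str]) -> float:
    """Average number of whitespace-separated terms per value."""
    if not column:
        return 0.0
    total_terms = sum(len(text.split()) for text in column)
    return total_terms / len(column)


# Column-oriented layout: one list per field (hypothetical data).
table: Dict[str, List[str]] = {
    "doc_id": ["d1", "d2", "d3"],
    "txt": ["big data needs speed", "columnar storage", "dremel"],
}

# Only the "txt" column is touched; "doc_id" is never read.
avg_terms = average_term_count(table["txt"])
```

With a record-oriented layout the same query would have to read every field of every record, which is the contrast this experiment draws.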
  14. Experiments: Impact of Stragglers
  15. Experiments: Scalability
     Selects the top-20 adverts and their number of occurrences in T4.
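A top-N query of this kind can be sketched as per-tablet counting followed by a merge. The version below computes exact counts on made-up advert identifiers; the function names and data are hypothetical, and a production system may use an approximate top-k algorithm instead.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def leaf_counts(tablet: Iterable[str]) -> Counter:
    """Count occurrences of each advert id within one tablet."""
    return Counter(tablet)


def top_adverts(tablets: Sequence[Iterable[str]],
                n: int = 20) -> List[Tuple[str, int]]:
    """Merge per-tablet counts and return the n most frequent adverts."""
    merged: Counter = Counter()
    for tablet in tablets:
        merged.update(leaf_counts(tablet))
    return merged.most_common(n)


# Hypothetical advert ids spread over three tablets.
tablets = [["a", "b", "a"], ["a", "c", "b"], ["a"]]
top2 = top_adverts(tablets, n=2)
```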
  16. What's happening today?
     ● Google BigQuery
       – Web service [pay-per-query]
     ● Open Dremel → Apache Drill
       – Open source implementation of Google BigQuery
       – Flexibility: broader range of query languages
  17. MapReduce or Dremel, or both?

                          MR                Dremel
     Data processing      Record oriented   Column oriented
     In-situ processing   No                Yes!
     Size of queries      Large             Small/Medium

     MapReduce AND Dremel
  18. Conclusions
     ● Multi-level execution trees
     ● Columnar data layout
     ● Scalable & efficient
     ● MapReduce benefits
     ● Near-linear scalability
  20. References
     ● S. Melnik et al. Dremel: Interactive Analysis of Web-Scale Datasets. PVLDB, 3(1):330–339, 2010.
     ● G. Czajkowski. Sorting 1PB with MapReduce. http://googleblog.blogspot.se/2008/11/sorting-1pb-with-mapreduce.html
     ● Apache Drill. http://wiki.apache.org/incubator/DrillProposal
     ● Google BigQuery. https://developers.google.com/bigquery/
