Dremel: interactive analysis of web-scale datasets

    Presentation Transcript

    • Dremel: Interactive Analysis of Web-Scale Datasets Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis Google, Inc. VLDB, 2010 Presentation by ensky (enskylin@gmail.com)
    • Outline Introduction Nested columnar storage Query processing Experiments Conclusion 2
    • Introduction Dremel is an query system For analysis of read-only nested data Use case Interactive Trends Tools Detection Web Spam Network Dashboards Optimization 3
    • Sample record: DocId: 10; Links { Forward: 20 }; Name { Language { Code: en-us; Country: us }; Url: http://A }; Name { Url: http://B }
      ◦ Multi-level structure
      ◦ SQL-like query language, e.g. SELECT A, COUNT(B) FROM T GROUP BY A, with T = {/gfs/1, /gfs/2, …, /gfs/100000}
      ◦ Fast: trillion-row tables in seconds
      ◦ Big scale: thousands of CPUs and petabytes of data
      ◦ Widely used by Google: in production since 2006
    • Contribution This paper presented two major technique in Dremel: Nested columnar storage ◦ store / split into columns ◦ Assembly to record Query ◦ Language ◦ Execution 5
    • Outline Introduction Nested columnar storage Query processing Experiments Conclusion 6
    • Why nested? Flexible ◦ Data in web and scientific is often non- relational Reduce cost ◦ Normalizing and recombining nested data is often prohibited in TB, PB scale of data. 7
    • Why column?
      ◦ Column-oriented storage: read less, cheaper decompression
      ◦ Challenge: preserve structure, reconstruct from a subset of fields
      ◦ [Figure: records r1 and r2 laid out record-oriented versus column-oriented, over fields A, B, C, D, E]
    • Nested data model (http://code.google.com/apis/protocolbuffers)

        message Document {
          required int64 DocId;           [1,1]
          optional group Links {
            repeated int64 Backward;      [0,*]
            repeated int64 Forward;
          }
          repeated group Name {
            repeated group Language {
              required string Code;
              optional string Country;    [0,1]
            }
            optional string Url;
          }
        }

        Record r1:
          DocId: 10
          Links
            Forward: 20
            Forward: 40
            Forward: 60
          Name
            Language
              Code: en-us
              Country: us
            Language
              Code: en
            Url: http://A
          Name
            Url: http://B
          Name
            Language
              Code: en-gb
              Country: gb

        Record r2:
          DocId: 20
          Links
            Backward: 10
            Backward: 30
            Forward: 80
          Name
            Url: http://C
    • Column-striped representation (per column: value, repetition level r, definition level d)

        DocId        Name.Url          Links.Forward    Links.Backward
        value r d    value    r d      value r d        value r d
        10    0 0    http://A 0 2      20    0 2        NULL  0 1
        20    0 0    http://B 1 2      40    1 2        10    0 2
                     NULL     1 1      60    1 2        30    1 2
                     http://C 0 2      80    0 2

        Name.Language.Code    Name.Language.Country
        value r d             value r d
        en-us 0 2             us    0 3
        en    2 2             NULL  2 2
        NULL  1 1             NULL  1 1
        en-gb 1 2             gb    1 3
        NULL  0 1             NULL  0 1
    • Column-oriented problem
      ◦ When a query reads only a sub-tree of the fields, a column value alone says nothing about the structure of the whole record, not even its own position in it ("Where am I?")
      ◦ To solve this problem, the paper presents repetition and definition levels
    • Repetition and definition levels: r
      ◦ r: at what repeated field in the field's path the value has repeated
      ◦ Example column Name.Language.Code over records r1 (DocId: 10) and r2 (DocId: 20):

        value r d
        en-us 0 2    first value in the record, r = 0
        en    2 2    Language repeats, r = 2
        NULL  1 1
        en-gb 1 2
        NULL  0 1    nothing repeats, r = 0
    • Repetition and definition levels: d
      ◦ d: how many fields in the path that could be undefined (optional or repeated) are actually present
      ◦ Example column Name.Language.Country over records r1 (DocId: 10) and r2 (DocId: 20):

        value r d
        us    0 3
        NULL  2 2
        NULL  1 1
        gb    1 3
        NULL  0 1
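The two level definitions above can be checked against the sample records with a small sketch. This is not Dremel's column-striping algorithm as implemented at Google; it is a minimal recursive walk, assuming records are held as Python dicts and lists, and the helper name `stripe` and the path descriptions are made up for illustration.

```python
# Field kinds along each path, taken from the Document schema in the slides.
CODE_PATH = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
COUNTRY_PATH = [("Name", "repeated"), ("Language", "repeated"), ("Country", "optional")]

# The two sample records from the slides, written as dicts (assumed encoding).
r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}


def stripe(records, path):
    """Return (value, r, d) triples for one field path (hypothetical helper)."""
    column = []

    def walk(node, depth, r, d, rep_depth):
        name, kind = path[depth]
        value = node.get(name)
        if kind == "repeated":
            items = value or []
            level = rep_depth + 1      # repetition level of this field
        else:
            items = [] if value is None else [value]
            level = rep_depth
        if not items:
            # Path is cut off here: emit NULL with the levels reached so far.
            column.append((None, r, d))
            return
        # Optional and repeated fields count toward the definition level.
        child_d = d + (kind != "required")
        for i, item in enumerate(items):
            # The first item inherits r; later items repeat at this field.
            item_r = r if i == 0 else level
            if depth == len(path) - 1:
                column.append((item, item_r, child_d))
            else:
                walk(item, depth + 1, item_r, child_d, level)

    for record in records:
        walk(record, 0, 0, 0, 0)
    return column


print(stripe([r1, r2], CODE_PATH))
print(stripe([r1, r2], COUNTRY_PATH))
```

With `None` standing in for NULL, the two printed columns should line up row by row with the Name.Language.Code and Name.Language.Country tables from the slides.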
    • Column-oriented storage is all right, but what if I still need record-oriented processing (e.g., MapReduce)?
    • Record assembly FSM
      ◦ For record-oriented data processing (e.g., MapReduce)
      ◦ Transitions labeled with repetition levels
      ◦ [Figure: finite state machine over the fields DocId, Links.Backward, Links.Forward, Name.Language.Code, Name.Language.Country, Name.Url]
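To make the role of the levels in assembly concrete, here is a much-reduced sketch: it decodes a single column, Name.Language.Code, back into nested lists. It is not the multi-field FSM from the slide. The function name `assemble_codes` and the list-of-lists output shape are assumptions chosen for illustration.

```python
# (value, r, d) triples of the Name.Language.Code column from the slides;
# None stands in for NULL.
code_column = [
    ("en-us", 0, 2),
    ("en", 2, 2),
    (None, 1, 1),
    ("en-gb", 1, 2),
    (None, 0, 1),
]


def assemble_codes(column):
    """Rebuild records as lists of Names, each a list of Language codes.

    Hypothetical helper, valid only for the path Name.Language.Code, where
    Name and Language are repeated and Code is required.
    """
    records = []
    for value, r, d in column:
        if r == 0:
            records.append([])      # r = 0 starts a new record
        names = records[-1]
        if r <= 1 and d >= 1:
            names.append([])        # a new Name starts (and Name is present)
        if d >= 2:
            names[-1].append(value)  # Language present, so Code is present
    return records


print(assemble_codes(code_column))
```

For the sample column this should give three Names for the first record (two codes, none, one code) and a single empty Name for the second, which is the parent structure the levels are meant to preserve.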
    • Reading two fields (DocId and Name.Language.Country)
      ◦ Result s1: DocId: 10; Name { Language { Country: us }; Language }; Name; Name { Language { Country: gb } }
      ◦ Result s2: DocId: 20; Name
      ◦ Structure of parent fields is preserved
      ◦ Useful for queries like /Name[3]/Language[1]/Country
      ◦ [Figure: FSM over DocId and Name.Language.Country, transitions labeled 0 and 1,2]
    • Outline Introduction Nested columnar storage Query processing Experiments Conclusion 17
    • Query language Based on SQL Performs ◦ Projection ◦ Selection ◦ Nested subqueries ◦ inner and intra-record aggregation ◦ Top-k ◦ Joins ◦ User-defined functions 18
    • Example usage (input: the sample record with DocId: 10)

        SELECT DocId AS Id,
          COUNT(Name.Language.Code) WITHIN Name AS Cnt,
          Name.Url + ',' + Name.Language.Code AS Str
        FROM t
        WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

        Output table t1:
          Id: 10
          Name
            Cnt: 2
            Language
              Str: http://A,en-us
            Language
              Str: http://A,en
          Name
            Cnt: 0

        Output schema:
          message QueryResult {
            required int64 Id;
            repeated group Name {
              optional uint64 Cnt;
              repeated group Language {
                optional string Str;
              }
            }
          }
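The example query can be mimicked over plain Python dicts to see where the output above comes from. This is only a sketch that hard-codes this one query. It assumes that Names whose Url is missing or does not match are pruned and that `WITHIN Name` counts Codes per Name. The function `run_query` is hypothetical and is not Dremel's evaluator.

```python
import re

# Sample records from the slides, encoded as dicts (assumed encoding).
r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}


def run_query(records):
    """Hand-written equivalent of the slide's SELECT for dict records."""
    results = []
    for rec in records:
        if not rec["DocId"] < 20:               # WHERE DocId < 20
            continue
        names_out = []
        for name in rec.get("Name", []):
            url = name.get("Url")
            # WHERE REGEXP(Name.Url, '^http'): drop Names that do not match.
            if url is None or not re.search(r"^http", url):
                continue
            langs = name.get("Language", [])
            # COUNT(Name.Language.Code) WITHIN Name
            out = {"Cnt": sum(1 for lang in langs if "Code" in lang)}
            if langs:
                # Name.Url + ',' + Name.Language.Code AS Str
                out["Language"] = [{"Str": url + "," + lang["Code"]}
                                   for lang in langs]
            names_out.append(out)
        results.append({"Id": rec["DocId"], "Name": names_out})
    return results


print(run_query([r1, r2]))
```

The third Name of the first record has no Url and is dropped, and the second record fails `DocId < 20`, so only one result record with two Names remains.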
    • Query execution
      ◦ Serving tree: client, root server, intermediate servers, leaf servers (with local storage), storage layer (e.g., GFS)
      ◦ Parallelizes scheduling and aggregation
      ◦ Fault tolerance
      ◦ A query dispatcher dispatches queries to the servers
    • Example: count()

        Query:    SELECT A, COUNT(B) FROM T GROUP BY A
                  T = {/gfs/1, /gfs/2, …, /gfs/100000}

        Level 0:  SELECT A, SUM(c) FROM (R11 UNION ALL … R110) GROUP BY A

        Level 1:  R11: SELECT A, COUNT(B) AS c FROM T11 GROUP BY A
                       T11 = {/gfs/1, …, /gfs/10000}
                  R12: SELECT A, COUNT(B) AS c FROM T12 GROUP BY A
                       T12 = {/gfs/10001, …, /gfs/20000}
                  ...

        Level 3:  SELECT A, COUNT(B) AS c FROM T31 GROUP BY A
                  T31 = {/gfs/1}
                  (data access ops)
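The rewriting in this example (partial COUNT at the leaves, SUM of partial counts higher up) can be sketched in a few lines. This is a single-process toy, assuming each tablet is a list of `(A, B)` rows. The names `leaf_query`, `combine`, and `serving_tree`, and the fixed fan-out, are illustrative assumptions, not Dremel's API.

```python
from collections import Counter


def leaf_query(tablet):
    """SELECT A, COUNT(B) AS c FROM tablet GROUP BY A (NULL B is not counted)."""
    counts = Counter()
    for a, b in tablet:
        if b is not None:
            counts[a] += 1
    return counts


def combine(partials):
    """SELECT A, SUM(c) FROM (partial results UNION ALL ...) GROUP BY A."""
    total = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds counts per key
    return total


def serving_tree(tablets, fanout=2):
    """Split the tablets among `fanout` children, recurse, then combine."""
    if not tablets:
        return Counter()
    if len(tablets) == 1:
        return leaf_query(tablets[0])
    chunk = -(-len(tablets) // fanout)  # ceiling division
    groups = [tablets[i:i + chunk] for i in range(0, len(tablets), chunk)]
    return combine(serving_tree(group, fanout) for group in groups)


# Three tiny made-up tablets of (A, B) rows.
tablets = [
    [("x", 1), ("y", None)],
    [("x", 2), ("y", 3)],
    [("x", None)],
]
print(serving_tree(tablets))
```

Because COUNT decomposes into a sum of partial counts, the result is the same for any tree shape. That property is what lets intermediate servers aggregate without seeing the raw rows.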
    • Outline Introduction Nested columnar storage Query processing Experiments Conclusion 22
    • Experiment data
      ◦ 1 PB of real data (uncompressed, non-replicated)
      ◦ 100K-800K tablets per table
      ◦ Experiments run during business hours

        Table  Number of    Size (unrepl.,  Number     Data    Repl.
        name   records      compressed)     of fields  center  factor
        T1     85 billion   87 TB           270        A       3×
        T2     24 billion   13 TB           530        A       3×
        T3     4 billion    70 TB           1200       A       3×
        T4     1+ trillion  105 TB          50         B       3×
        T5     1+ trillion  20 TB           30         B       2×
    • Column vs. record
      ◦ "Cold" time on local disk, averaged over 30 runs
      ◦ Table partition: 375 MB (compressed), 300K rows, 125 columns
      ◦ From columns: (a) read + decompress, (b) assemble records, (c) parse as C++ objects
      ◦ From records: (d) read + decompress, (e) parse as C++ objects
      ◦ 10x speedup using columnar storage; 2-4x overhead of using records
      ◦ [Figure: time (sec) versus number of fields]
    • MR and Dremel execution
      ◦ Task: average number of terms in txtField in the 85-billion-record table T1
      ◦ Sawzall program run on MR: num_recs: table sum of int; num_words: table sum of int; emit num_recs <- 1; emit num_words <- count_words(input.txtField);
      ◦ Q1: SELECT SUM(count_words(txtField)) / COUNT(*) FROM T1
      ◦ MR overheads: launching jobs, scheduling 0.5M tasks, assembling records
      ◦ [Figure: execution time (sec) on 3000 nodes; bars annotated 87 TB, 0.5 TB, 0.5 TB]
    • Impact of serving tree depth
      ◦ Q2 (returns 100s of records): SELECT country, SUM(item.amount) FROM T2 GROUP BY country
      ◦ Q3 (returns 1M records): SELECT domain, SUM(item.amount) FROM T2 WHERE domain CONTAINS '.net' GROUP BY domain
      ◦ 40 billion nested items
      ◦ [Figure: execution time (sec)]
    • Scalability
      ◦ Q5 on a trillion-row table T4: SELECT TOP(aid, 20), COUNT(*) FROM T4
      ◦ [Figure: execution time (sec) versus number of leaf servers]
    • Outline Introduction Nested columnar storage Query processing Experiments Conclusion 28
    • Observation
      ◦ Monthly query workload of one 3000-node Dremel instance
      ◦ Most queries complete in under 10 sec
      ◦ [Figure: percentage of queries versus execution time (sec)]
    • Conclusion Dremel is an query system For analysis of read-only nested data Main feature is fast(interactive response time), column-oriented, SQL-like Query language Introduced two major method: ◦ Nested columnar storage – in order to solve partial query problem. ◦ Query processing – parallel processing & distributing, decompressing queries 30
    • Comments Google is awesome! Pros ◦ Nested storage gives us more flexibility. ◦ repetition & definition is an novel idea and it can solve the locality problem easily. ◦ Distributed serving tree is awesome, faster than MR. 31
    • Comments Cons ◦ Read-only may not fit every requirement ◦ Dremel is not a database, so you’ll need to convert your real data into dremel when analyzing  Converting may cost lost of time and space  Google doesn’t care this problem, they have GFS and many servers. 32
    • Thank you for listening.