Invisible loading

Invisible Loading Talk by Azza Abouzied at the VLDB Workshop on End-to-end Management of Big Data 2012

Published in: Technology, Business

Transcript

  • 1. Invisible Loading. Yalies: Azza Abouzied, Daniel Abadi, Avi Silberschatz. BigData 2012
  • 2. Problem: The Crying Baby
  • 3. Two ways to deal with this: (1) Immediate gratification, with long-term $$$ costs. (2) Misery & sleep deprivation, with long-term benefits.
  • 4. The Crying Baby Problem (Wants Attention Now!) ≈ The Impatient Boss Problem (Wants Answers Now!)
  • 5. Two ways to analyze data. MapReduce way (immediate gratification; long-term cumulative costs because MR is slow!): Hack it: Locate File, Determine Key Attributes, Parse + Map + Reduce. DB & HadoopDB way (misery & sleep deprivation; long-term benefits): Locate File, Figure out schema, Load File, Organize or Index DB tables, then Query: Determine Key Attributes, Process without Parse.
  • 6. The Problem: Can we get the immediate gratification of working with MapReduce and make progress towards the performance advantages of working with databases?
  • 7. Our Solution: Begin with the MapReduce way. File System: Locate File, Determine Key Attributes, Write Map/Reduce Scripts, Run it! Database System (BEHIND THE SCENES, PER JOB, INCREMENTALLY): Figure out schema, Load File, Organize or Index DB tables.
  • 8. Figure out schema. P1) How to automatically figure out a schema? Short answer: DON'T. Split the map phase into Parse and Map phases. Enforce a simple Parse API: Parser has one output method: getField(int id). Name a table after its Parser implementation and label attributes with their field id. Different parsers on the same file result in different tables.
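The Parse/Map split on slide 8 can be sketched as follows. This is a minimal illustration in Python, not the talk's implementation: the class names, the `parse` input method, the `f<id>` column label, and the comma-delimited integer format are all assumptions. Only the single output method (`get_field`, mirroring `getField(int id)`) and the table-naming rule come from the slide.

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    """Hypothetical Parse-phase interface with a single output method."""

    @abstractmethod
    def parse(self, line: str) -> None:
        """Consume one raw input line (input side; assumed, not on the slide)."""

    @abstractmethod
    def get_field(self, field_id: int) -> int:
        """The only output method, mirroring getField(int id)."""


class CsvIntParser(Parser):
    """Assumed example parser for comma-separated integer records."""

    def __init__(self) -> None:
        self._fields: list[str] = []

    def parse(self, line: str) -> None:
        self._fields = line.rstrip("\n").split(",")

    def get_field(self, field_id: int) -> int:
        return int(self._fields[field_id])


def table_name(parser: Parser) -> str:
    # The table is named after the Parser implementation.
    return type(parser).__name__


def column_name(field_id: int) -> str:
    # Attributes are labelled with their field id (the "f" prefix is assumed).
    return f"f{field_id}"


def map_fn(parser: Parser, line: str) -> tuple[int, int]:
    # The Map phase never sees raw text; it only asks for fields by id.
    parser.parse(line)
    return parser.get_field(0), parser.get_field(2)
```

Because the Map code reaches data only through `get_field`, the system can observe which field ids a job touches without knowing anything else about the schema.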
  • 9. Incrementally: Load File. P2) How to load files with minimal marginal costs? • Load only touched attributes (vertical partition) – Requires a column-store • Load only parts of a column (horizontal partition) – After a file-split is processed by Map, its touched attributes are loaded entirely – The number of splits of a file is a tunable parameter.
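A toy sketch of the loading policy on slide 9, under stated assumptions: the in-memory `ColumnStore`, the `run_job` function, and the `splits_per_job` knob are hypothetical names, and splits are modelled as lists of comma-separated integer lines. Each job loads only the attributes it touches (vertical partitioning) and only the next few splits of each (horizontal partitioning), in file order.

```python
from collections import defaultdict


class ColumnStore:
    """Toy in-memory column store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # (table, field_id) -> values loaded so far, in insertion order
        self.columns = defaultdict(list)
        # (table, field_id) -> number of file splits already loaded
        self.splits_loaded = defaultdict(int)


def run_job(store, table, splits, touched_fields, splits_per_job=1):
    """Load the next `splits_per_job` splits of each touched attribute.

    `splits` is the whole file as a list of splits; each split is a list
    of comma-separated lines. Untouched attributes are never loaded.
    """
    for field_id in touched_fields:
        key = (table, field_id)
        start = store.splits_loaded[key]
        end = min(len(splits), start + splits_per_job)
        for split in splits[start:end]:
            store.columns[key].extend(
                int(line.split(",")[field_id]) for line in split
            )
        store.splits_loaded[key] = end
```

Running two jobs over a two-split file, the first touching fields 0 and 2 and the second touching only field 0, leaves field 0 fully loaded, field 2 half loaded, and field 1 absent from the store.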
  • 10. Tuple construction: Some columns are at different loading stages. – Maintain OIDs for each column: an address column • The OIDs assigned are equivalent to the insertion order – Keep a catalog to track loading progress. [Figure: columns a, b, c, d at different loading stages; the loaded part is processed in the DB, the remainder uses the file system.]
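One way to picture tuple construction over columns at different loading stages is sketched below. It assumes the columns have not been reorganized yet, so a value's position in its column equals its OID (insertion order), and that the raw file is available as a list of comma-separated lines in the same order. The function name `scan` and the data layout are illustrative only.

```python
def scan(columns, file_rows, fields):
    """Yield tuples over `fields`, mixing the database and the raw file.

    columns:   field_id -> list of loaded values; position == OID here
               (assumed: no reorganization has happened yet).
    file_rows: raw comma-separated lines, in insertion order.
    fields:    non-empty list of requested field ids.
    """
    # Catalog lookup: how many leading rows every requested column covers.
    loaded = min(len(columns.get(f, [])) for f in fields)

    # Rows fully covered by the loaded columns are served from the DB.
    for oid in range(loaded):
        yield tuple(columns[f][oid] for f in fields)

    # The remaining rows fall back to parsing the file.
    for line in file_rows[loaded:]:
        parts = line.split(",")
        yield tuple(int(parts[f]) for f in fields)
```

Here the minimum loaded length plays the role of the catalog entry that tracks loading progress; a real system would track this per column and per split.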
  • 11. Incrementally: Organize file. P3) How to index a partially-loaded table? If a selection filter is applied on an attribute, we organize it. Dealing with partially loaded attributes: [Figure: columns c1 and c2, each paired with an address column, combined by a JOIN on the address.]
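The role of the address column on slide 11 can be illustrated with a small sketch. It assumes the simplest possible organization, a full sort of the loaded values (the talk itself uses an incremental scheme, which is not reproduced here), and it keeps the second column in insertion order so that an address can be used directly as a position. All function names are hypothetical.

```python
import bisect


def organize(values):
    """Sort a loaded column by value, carrying each value's address (OID)."""
    return sorted((value, oid) for oid, value in enumerate(values))


def select_range(organized, low, high):
    """Return the addresses of all values in [low, high] via binary search."""
    lo = bisect.bisect_left(organized, (low, -1))
    hi = bisect.bisect_right(organized, (high, float("inf")))
    return [oid for _, oid in organized[lo:hi]]


def join_on_address(addresses, other_column):
    """Fetch the matching values of another column by address."""
    return [other_column[oid] for oid in addresses]


c1 = [11, 2, 5, 1]      # attribute with a selection filter: gets organized
c2 = [110, 20, 50, 10]  # another attribute, still in insertion order

c1_organized = organize(c1)
matches = join_on_address(select_range(c1_organized, 2, 5), c2)
```

Once c1 is sorted, positions no longer line up across columns, which is why each value has to carry its address for the join.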
  • 12. Choosing an organization strategy • Why not use merge sort? [Figure: load & sort; splits in HDFS; slices of the column in the database; file system.]
  • 13. Incremental Merge Sort [Figure: load & sort, phase 1, phase 2, with split-bit labels; splits in HDFS; slices of the column in the database; simple index; file system.]
  • 14. EVALUATION
  • 15. Setup • Single-machine experiments – Embarrassingly parallel – No distributed reorganization or partitioning • MonetDB (hacked to support incremental merge sort, IMS) • Hadoop • 2 GB file of 5 integer attributes: 107,374,182 tuples. • See paper for more details
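The tuple count on slide 15 is consistent with a simple back-of-the-envelope check. The check assumes that "2 GB" means 2 GiB and that each integer occupies 4 bytes with no delimiters; neither assumption is stated on the slide.

```python
FILE_BYTES = 2 * 1024**3   # assumed: 2 GB read as 2 GiB
ATTRIBUTES = 5             # from the slide
BYTES_PER_INT = 4          # assumed: fixed-width 32-bit integers

# Whole tuples that fit in the file under these assumptions.
tuples = FILE_BYTES // (ATTRIBUTES * BYTES_PER_INT)
print(f"{tuples:,}")
```

Under these assumptions the result matches the 107,374,182 tuples quoted on the slide.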
  • 16. The big picture [Figure: time in seconds (0 to 800) vs. job sequence (1 to 100). Series: SQL Pre-load, Incremental Reorganization (5/5), Incremental Reorganization (2/5), Invisible Loading (5/5), Invisible Loading (2/5), MapReduce.]
  • 17. Cumulative costs [Figure: cumulative time spent in seconds (100 to 100,000) vs. job sequence (1 to 100). Series: SQL Pre-load, Incremental Reorganization (5/5), Incremental Reorganization (2/5), Invisible Loading (5/5), Invisible Loading (2/5), MapReduce.]
  • 18. Change the access pattern [Figure: time in seconds (0 to 800) vs. job sequence, log scale for jobs 1 to 10 and linear scale for jobs 83 to 93. Series: SQL Pre-load, Incremental Reorganization (5/5), Incremental Reorganization (2/5), Invisible Loading (5/5), Invisible Loading (2/5), MapReduce.]
  • 19. Further Evaluation (Paper) • In-depth study of IMS – Comparison with Cracking and Pre-sorting – Effect of integrating lightweight compression into IMS • Little mini-experiments – Insertion vs. Copy – Processing in DB vs. using DB as a fast access medium with all processing in MapReduce
  • 20. Conclusion: Lessons Learned • Engineering nightmare – Many complementary technologies • Manimal, Adaptive Merging … – In the era of Big Data we need to design more modular, plug-n-play tools • Can of worms – Most Big Data problems look deceptively simple until you start mucking around.
  • 21. Some problems are easier than others
  • 22. Thanks! Questions?
  • 23. Why is loading this log file hard? [Figure: excerpt of an Apache error log with [error], [notice], and [warn] entries.] The message field varies depending on the application! What is the base schema? Time, Type, Message? Different tables for each type? Context-dependent schema awareness: different analysts know the schema of what they are looking for and don't care about other log messages.
