Introduction to DISQL, a distributed programming framework widely used in Baidu


  1. Introduction to DISQL: Chen Xiaoming (陈晓鸣), Senior Engineer, Baidu IBASE Dept. (百度基础平台部)
  2. What is DISQL?
  3. DISQL is a distributed programming framework widely used in Baidu
  4. Contents: Problems; Solution; Examples; Rationales; Adoption
  5. Problems
  6. Problems: statistical analysis of logs; extraction of fields; in order to generate reports
  7. Problems: statistical analysis of features (features of web pages, web sites, ads, user preferences, etc.), in order to provide data for data mining and machine learning
  8. Problems: common operations (selecting, filtering, grouping, sorting, joining, etc.)
  9. Solution
  10. A Platform: named Log Statistical Platform, a.k.a. LSP; web-based; convenient for secondary development; convenient for task/data/rights management
  11. A Programming Framework: named DIstributed SQL, a.k.a. DISQL; provides SQL-like operators which can be combined arbitrarily; encapsulates distributed algorithms; generates code automatically
  12. Application Programming Interfaces: named Distributed Query, a.k.a. DQuery; DSL-style APIs embedded in well-known programming languages (PHP so far; C++, Python, ... in the future); using the method chaining technique to provide a fluent interface; data-flow in the form of a DAG composed by chains of methods
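
The transcript does not include any DQuery code, so the following is only a minimal sketch of the idea on this slide: method chaining as a fluent interface, with each call adding a node to a data-flow graph. It is written in Python rather than PHP, and every name in it (`Query`, `select`, `filter`, `count_by`, `run`) is hypothetical, not DQuery's actual API. It runs on a single machine and makes no attempt at distribution.

```python
from collections import Counter


class Query:
    """Toy fluent query: each method returns a new node pointing at its parent."""

    def __init__(self, source=None, parent=None, op=None):
        self.source = source  # input rows; only set on the root node
        self.parent = parent  # upstream node in the data-flow graph
        self.op = op          # function: iterable of rows -> iterable of rows

    def _then(self, op):
        # Chaining never mutates a node, so one node can feed several branches.
        return Query(parent=self, op=op)

    def select(self, fn):
        return self._then(lambda rows: (fn(r) for r in rows))

    def filter(self, pred):
        return self._then(lambda rows: (r for r in rows if pred(r)))

    def count_by(self, key):
        return self._then(lambda rows: sorted(Counter(key(r) for r in rows).items()))

    def run(self):
        # Nothing executes until run(); it walks back to the root and replays ops.
        if self.parent is None:
            return list(self.source)
        return list(self.op(self.parent.run()))


logs = [
    {"user": "a", "status": 200},
    {"user": "b", "status": 500},
    {"user": "a", "status": 200},
]
ok = Query(logs).filter(lambda r: r["status"] == 200)
per_user = ok.select(lambda r: r["user"]).count_by(lambda u: u)
statuses = ok.select(lambda r: r["status"])  # second branch off the same node

print(per_user.run())
print(statuses.run())
```

Because `ok` feeds two chains, the nodes form a small DAG rather than a single pipeline, which is presumably what "DAG composed by chains of methods" refers to.
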
  13. Three Edit Modes – Simple Mode
  14. Three Edit Modes – DQuery Mode
  15. Three Edit Modes – Complex Mode
  16. Hierarchy
  17. DISQL Architecture: Edit Modes (Simple Mode, DQuery Mode, Complex Mode); APIs (PHP, C++, Python); Translators (Normalizer, Optimizer, Splitter, Planner, Coder); Runtimes (Data-flow, Schema, Storage APIs, Computing APIs)
  18. LSP Architecture: data presentation & monitoring; third party apps; data access layer; data management layer; computing layer; storage systems; computing systems
  19. Examples
  20. Example 1 – word count
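
The code shown on this slide was not captured in the transcript. As a rough illustration only, word count can be written as the map, shuffle and reduce phases that a framework of this kind would be expected to hide behind its operators. This is a single-machine Python sketch, not the DISQL version, and `word_count` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def word_count(lines):
    # Map: emit one (word, 1) pair per occurrence.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle: bring equal keys together; a distributed runtime would
    # sort and partition the pairs across machines at this point.
    pairs.sort(key=itemgetter(0))
    # Reduce: sum the ones for each word.
    return {
        word: sum(n for _, n in group)
        for word, group in groupby(pairs, key=itemgetter(0))
    }


counts = word_count(["to be or not to be", "to do"])
print(counts)
```
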
  21. Example 2: given a log of query and ad shows; extract the site field from the url field; filter sites with a regex; calculate the amount of query and ad shows per site; output in JSON format
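
The steps of Example 2 can be sketched in plain Python as below. The log layout is an assumption: each record is taken to be one query with a `url` field and an `ad_shows` count, neither of which is specified in the slides. `site_report` is a hypothetical helper, and the sketch runs on one machine.

```python
import json
import re
from collections import defaultdict
from urllib.parse import urlsplit


def site_report(records, site_pattern):
    site_re = re.compile(site_pattern)
    totals = defaultdict(lambda: {"queries": 0, "ad_shows": 0})
    for rec in records:
        # Step 1: extract the site (host name) from the url field.
        site = urlsplit(rec["url"]).hostname
        # Step 2: keep only sites matching the regex.
        if site is None or not site_re.search(site):
            continue
        # Step 3: accumulate query and ad-show totals per site.
        totals[site]["queries"] += 1
        totals[site]["ad_shows"] += rec["ad_shows"]
    # Step 4: output in JSON format.
    return json.dumps(totals, sort_keys=True)


records = [
    {"url": "http://news.example.com/a", "ad_shows": 2},
    {"url": "http://news.example.com/b", "ad_shows": 1},
    {"url": "http://shop.example.org/x", "ad_shows": 5},
]
report = site_report(records, r"\.example\.com$")
print(report)
```
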
  22. Code in DQuery Mode
  23. Rationales
  24. Use Case Driven vs. Completeness (diagram labels: Our Solution; Problem ×4)
  25. Internal DSL vs. External DSL: take advantage of the host languages' parsers, libraries and VMs, their users and communities, and their language features; different from Pig, Hive, Sawzall, etc.
  26. Open/Closed Principle: “open for extension, closed for modification”; open for single machine algorithms, closed for distributed algorithms; also different from Pig, Hive, Sawzall, ...
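
One way to read "open for single machine algorithms, closed for distributed algorithms" is that users plug ordinary functions into operators whose distribution logic they cannot change. The Python sketch below illustrates that split under that assumption. `group_reduce` and `longest` are hypothetical names, and the partitioning is only simulated in one process.

```python
from collections import defaultdict
from functools import reduce


def group_reduce(records, key_fn, reduce_fn, num_partitions=3):
    """Framework-owned part (closed): partition by key, then reduce per key."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for rec in records:
        key = key_fn(rec)
        # All records with the same key land in the same partition.
        partitions[hash(key) % num_partitions][key].append(rec)
    out = {}
    for part in partitions:
        for key, group in part.items():
            out[key] = reduce(reduce_fn, group)
    return out


# User-owned part (open): an ordinary single-machine function.
def longest(a, b):
    return a if len(a) >= len(b) else b


words = ["pear", "plum", "apple", "apricot", "fig"]
result = group_reduce(words, key_fn=lambda w: w[0], reduce_fn=longest)
print(result)
```

Swapping `longest` for another two-argument function changes what is computed per group without touching `group_reduce`.
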
  27. Adoption
  28. Users: ……
  29. Usage: throughput/day: hundreds of TB; tasks/day: thousands; total tasks: > 1 million
  30. Q&A: also welcome to contact me via Twitter: @acumon; Email: chenxiaoming@baidu.com; Gmail/Gtalk: acumoncxm@gmail.com
  31. The End: THANK YOU!
