20th. 陈晓鸣 (Chen Xiaoming): Baidu's Massive-Scale Log Analysis Architecture and Processing Experience

Transcript

  • 1. Chen Xiaoming, chenxiaoming@baidu.com
  • 2. LOG (outline: LOG / LSP / DISQL)
  • 3. 46.70.93.94 - - [11/Nov/2011:11:11:11 -1100] "GET /book/1984.html HTTP/1.1" 404 2326 "http://www.baidu.com/s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"
  • 4–9. The same line, split into its fields: 46.70.93.94 - - | [11/Nov/2011:11:11:11 -1100] | "GET /book/1984.html HTTP/1.1" | 404 | 2326 | "http://www.baidu.com/s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" | "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"
  • 12. LOG LSP DISQL
  • 13. …; Ad hoc; ……
  • 14. LOG LSP DISQL
  • 15. …; B*S; …
  • 16. C++ …; C++ …; SQL …; PHP + C …; …; Schema …; …; PHP .so …
  • 17. LSP (outline: LOG / LSP / DISQL)
  • 18. UI
  • 19. DQuery
  • 20. DISQL (outline: LOG / LSP / DISQL)
  • 21. …; _Url _Res( ); _Url _Site; …; JSON
  • 22. DQuery: …; _Url _Res( ); _Url _S; …; JSON
  • 23. PHP-Callback
  • 24. C-callback
  • 25. PHP SQL ( ); SQL … M/R; DAG; MapReduce; SQL; PHP; C++ + C-Runtime (NEW!); RAII + …; Copy On Write; schema; C++ PHP
  • 26. … parser; JSON: [ { "cmd": "load", "path": null, "using": "SchemaReader", "from": 17, "options": {"max_item_in_mem": 100000}, "include": [25] }, {"cmd": "filter" ……}, {"cmd": "join" ……}, …… ]
  • 27. SQL: [ { "cmd": "load", "path": null, "using": "SchemaReader", "from": 17, "options": {"max_item_in_mem": 100000}, "include": [25] }, {"cmd": "filter" ……}, {"cmd": "join" ……}, …… ]
  • 28. … ( ); MapReduce; Schema; schema; C++ PHP DOT
  • 29. [Diagram: GroupUnique and Group Count executed across Map Phase, Shuffle, and Reduce Phase; labels include Limit 1, Group, Combine, Count, Sum]
  • 30. Schema
      Input 1: ID (uint64, index 2), name (string, index 5), age (int32, index 9)
      Input 2: ID (uint64, index 0), score (double, index 1)
      After join: ID (uint64, index 2), name (string, index 5), age (int32, index 9), score (double, index 10)
  • 31. …; Combiner; Cached Combiner; key Join; …; I/O
  • 32. PHP; C++; DOT; … / MapReduce
  • 33. Processor: Pipes & Filter. Base class Processor with init(), process(), fini(); subclasses Selector, Filter, Counter, and UserProcessor, each with its own init(), process(), fini().
  • 34. Columns "4 1" → "10 27": … 3540 → 4761 (+1221, +34.5%); DQuery 1153 → 3359 (+2206, +191%); … 1569 → 2963 (+1394, +88.9%)
      By role: PM 1352 (47.4%); RD 1174 (41.2%); OP 190 (6.66%); … 136 (4.77%); total 2852 (100%)
      LSP 24% and DQuery 43% (67% combined); 33%
  • 35. LOG LSP DISQL
  • 36. … LSP … UI … DISQL …; (@ ) (chenxiaoming@baidu.com); Hadoop in China 12 2 2 20; DISQL2.0
  • 37. …… …… chenxiaoming@baidu.com
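  • 38. Follow us: t.baidu-tech.com. Slides and details: infoq.com/cn/zones/baidu-salon. "Envision • Exchange • Debate • Gather" is the motto of the Baidu Tech Salon, a regular offline technical event organized by Baidu and InfoQ China. Its goal is to give mid- and senior-level engineers a relatively open platform for exchanging ideas and meeting peers. Each session focuses on a single topic and has two main parts: speaker talks and OpenSpace. The talks and live Q&A present advanced practices from Baidu and other well-known websites. OpenSpace extends the session's theme into an open forum, where any participant can propose a topic related to it and start a discussion. Planned, organized, and run by InfoQ. Follow us: weibo.com/infoqchina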
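Slides 3 to 9 take a single web-server access-log line apart field by field. The line is in the Apache/NCSA combined log format: client IP, identity, user, timestamp, request line, status, response bytes, referer, and user agent.

Below is a minimal C++17 sketch of that split. It assumes well-formed lines in exactly this format. `AccessRecord` and `ParseCombinedLog` are hypothetical names, not part of Baidu's LOG system.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// One line of the Apache/NCSA combined log format (hypothetical type).
struct AccessRecord {
  std::string ip, ident, user, time, request, referer, user_agent;
  int status = 0;
  long bytes = 0;  // a "-" byte count (no body) is stored as 0
};

// Splits a combined-format line into its fields, or returns std::nullopt
// when the line does not have the expected layout.
std::optional<AccessRecord> ParseCombinedLog(const std::string& line) {
  // ip ident user [time] "request" status bytes "referer" "user-agent"
  static const std::regex kPattern(
      R"re(^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$)re");
  std::smatch m;
  if (!std::regex_match(line, m, kPattern)) return std::nullopt;
  AccessRecord r;
  r.ip = m[1].str();
  r.ident = m[2].str();
  r.user = m[3].str();
  r.time = m[4].str();
  r.request = m[5].str();
  r.status = std::stoi(m[6].str());
  r.bytes = m[7].str() == "-" ? 0 : std::stol(m[7].str());
  r.referer = m[8].str();
  r.user_agent = m[9].str();
  return r;
}
```

std::regex keeps the sketch short. At log volumes like those discussed in the talk, a hand-written single-pass field splitter is usually much faster.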
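Slide 25 lists RAII and Copy On Write among the design points of the C++ runtime, without further detail. The sketch below is a hypothetical illustration of the technique only, not DISQL's actual code. It shows a record type whose copies share one field buffer until one copy is written to, with std::shared_ptr providing the RAII ownership. `CowRecord` and its methods are assumed names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Copy-on-write record (hypothetical class): copies share one field buffer
// until a copy is modified. std::shared_ptr supplies the RAII part: the
// buffer is released automatically when its last owner is destroyed.
class CowRecord {
 public:
  explicit CowRecord(std::vector<std::string> fields)
      : data_(std::make_shared<std::vector<std::string>>(std::move(fields))) {}

  const std::string& Get(std::size_t i) const {
    assert(i < data_->size());
    return (*data_)[i];
  }

  // A write detaches first when the buffer is shared, so other copies never
  // observe it. use_count() is only a reliable sharing test when copies are
  // not being made concurrently on other threads.
  void Set(std::size_t i, std::string value) {
    assert(i < data_->size());
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<std::string>>(*data_);
    }
    (*data_)[i] = std::move(value);
  }

  bool SharesBufferWith(const CowRecord& other) const { return data_ == other.data_; }

 private:
  std::shared_ptr<std::vector<std::string>> data_;
};
```

Passing such a record between pipeline stages costs one pointer copy. A full copy of the fields happens only when a stage actually modifies a shared record.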
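Slides 28 and 32 mention DOT alongside PHP and C++. DOT is Graphviz's graph description language, presumably used here to visualize the execution DAG built from the JSON plan on slides 26 and 27.

The surviving text does not explain the plan's "from" and "include" fields, so the sketch below assumes its own node structure. Each hypothetical `PlanNode` lists the ids of its upstream nodes, and the hypothetical `ToDot` prints one labeled node per step and one edge per dependency.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one plan step such as {"cmd": "load", ...}.
struct PlanNode {
  int id;
  std::string cmd;          // e.g. "load", "filter", "join"
  std::vector<int> inputs;  // ids of upstream nodes (assumed edge semantics)
};

// Renders the plan DAG as Graphviz DOT text: one labeled node per step,
// then one edge from each input to the step that consumes it.
std::string ToDot(const std::vector<PlanNode>& plan) {
  std::ostringstream out;
  out << "digraph plan {\n";
  for (const PlanNode& n : plan) {
    assert(n.cmd.find('"') == std::string::npos);  // labels are not escaped here
    out << "  n" << n.id << " [label=\"" << n.cmd << "\"];\n";
  }
  for (const PlanNode& n : plan) {
    for (int in : n.inputs) out << "  n" << in << " -> n" << n.id << ";\n";
  }
  out << "}\n";
  return out.str();
}
```

The output can be saved to a file and rendered with Graphviz, for example `dot -Tpng plan.dot -o plan.png`.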
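The diagram on slide 29 survives only as labels: Map Phase, Shuffle, Reduce Phase, Combine, Count, Sum, GroupUnique, Limit 1. One reading, assumed here rather than taken from the slide:

- Group Count counts records per key on the map side (the combine step), shuffles the partial counts, and sums them in the reduce phase.
- GroupUnique keeps a single value per key (Limit 1).

Below is a small in-memory simulation of that reading. All function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;  // (group key, value)

// Map phase plus combine for Group Count: each mapper counts its own records
// per key, so at most one partial count per key leaves a mapper.
std::map<std::string, long> MapAndCombineCount(const std::vector<KV>& split) {
  std::map<std::string, long> partial;
  for (const auto& kv : split) ++partial[kv.first];
  return partial;
}

// Shuffle plus reduce: partial counts for the same key are summed.
std::map<std::string, long> ReduceCount(const std::vector<std::map<std::string, long>>& partials) {
  std::map<std::string, long> total;
  for (const auto& p : partials)
    for (const auto& [key, n] : p) total[key] += n;
  return total;
}

// GroupUnique: keep one value per key (Limit 1). emplace() does not
// overwrite an existing key, so the first value in input order wins.
std::map<std::string, std::string> GroupUnique(const std::vector<KV>& records) {
  std::map<std::string, std::string> first;
  for (const auto& [key, value] : records) first.emplace(key, value);
  return first;
}
```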
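Slide 30's example joins two schemas:

- A three-field schema: ID, name, age at indexes 2, 5, 9.
- A two-field schema: ID, score at indexes 0, 1.

The result is ID, name, age at 2, 5, 9 plus score at index 10. A rule consistent with that example, inferred here and not stated on the slide: the left schema keeps its slots, the join key is not duplicated, and each remaining right-hand field takes the next slot after the left schema's highest index. The C++ sketch below uses hypothetical `Field`, `Schema`, and `JoinSchema` names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Field {
  std::string name;
  std::string type;  // e.g. "uint64", "string", "int32", "double"
  int index;         // slot position inside a record
};

using Schema = std::vector<Field>;

// Hypothetical merge rule inferred from slide 30: left fields keep their
// slots; right fields other than the join key get new slots after the
// left schema's highest index.
Schema JoinSchema(const Schema& left, const Schema& right, const std::string& key) {
  assert(std::any_of(left.begin(), left.end(),
                     [&](const Field& f) { return f.name == key; }));
  Schema out = left;
  int next = 0;
  for (const Field& f : left) next = std::max(next, f.index + 1);
  for (const Field& f : right) {
    if (f.name == key) continue;  // the join key already exists on the left
    out.push_back({f.name, f.type, next++});
  }
  return out;
}
```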
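Slide 31 lists a Cached Combiner next to the ordinary Combiner. A common form of this idea is in-mapper combining: partial aggregates are held in a hash table inside the map task and flushed downstream when the table grows past a bound.

The JSON plan on slides 26 and 27 carries an option named max_item_in_mem, but nothing in the surviving text ties it to the combiner. The bound below is therefore an assumption, as are the class and method names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// In-mapper ("cached") combiner, hypothetical sketch: values are summed per
// key in a hash map, and the whole map is flushed to `emit` once it holds
// max_items distinct keys, which bounds the mapper's memory use.
class CachedCombiner {
 public:
  using Emit = std::function<void(const std::string&, long)>;

  CachedCombiner(std::size_t max_items, Emit emit)
      : max_items_(max_items), emit_(std::move(emit)) {
    assert(max_items_ > 0);
  }

  void Add(const std::string& key, long value) {
    cache_[key] += value;
    if (cache_.size() >= max_items_) Flush();
  }

  // Must also be called once at the end of the map task.
  void Flush() {
    for (const auto& [key, sum] : cache_) emit_(key, sum);
    cache_.clear();
  }

 private:
  std::size_t max_items_;
  Emit emit_;
  std::unordered_map<std::string, long> cache_;
};
```

A key can be flushed more than once, so the reduce side must still sum the partial values for each key.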
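Slide 33 shows a Pipes & Filter design: a Processor base class with init(), process(), and fini(), and Selector, Filter, Counter, and UserProcessor stages implementing the same three methods.

Below is a minimal C++ sketch of that structure. The record type, the semantics given to Selector, Filter, and Counter, and the hypothetical `RunPipeline` driver are all assumptions for illustration. A user-defined stage (UserProcessor on the slide) would subclass Processor in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Record = std::vector<std::string>;  // one log record as a list of fields

// Base stage with the init() / process() / fini() lifecycle from slide 33.
// process() may rewrite the record and returns false to drop it.
class Processor {
 public:
  virtual ~Processor() = default;
  virtual void init() {}
  virtual bool process(Record& rec) = 0;
  virtual void fini() {}
};

// Projection: keeps only the listed columns, in the listed order.
class Selector : public Processor {
 public:
  explicit Selector(std::vector<std::size_t> columns) : columns_(std::move(columns)) {}
  bool process(Record& rec) override {
    Record out;
    for (std::size_t c : columns_) {
      assert(c < rec.size());
      out.push_back(rec[c]);
    }
    rec = std::move(out);
    return true;
  }

 private:
  std::vector<std::size_t> columns_;
};

// Predicate: passes records whose given column equals a value.
class Filter : public Processor {
 public:
  Filter(std::size_t column, std::string value) : column_(column), value_(std::move(value)) {}
  bool process(Record& rec) override { return column_ < rec.size() && rec[column_] == value_; }

 private:
  std::size_t column_;
  std::string value_;
};

// Counts the records that reach this stage.
class Counter : public Processor {
 public:
  void init() override { count_ = 0; }
  bool process(Record&) override {
    ++count_;
    return true;
  }
  std::size_t count() const { return count_; }

 private:
  std::size_t count_ = 0;
};

// Calls init() on every stage, pushes each record through the stages in
// order (stopping at the first stage that drops it), then calls fini().
void RunPipeline(const std::vector<Processor*>& stages, std::vector<Record> input) {
  for (Processor* s : stages) s->init();
  for (Record& rec : input) {
    for (Processor* s : stages) {
      if (!s->process(rec)) break;
    }
  }
  for (Processor* s : stages) s->fini();
}
```

Because every stage shares one interface, stages can be reordered or added without changing the driver.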
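The figures on slide 34 are internally consistent:

- The growth column equals (new − old)/old, where "old" is the "4 1" column and "new" is the "10 27" column.
- The four role counts add up to the stated total of 2852.

A worked check on two of the growth rows and on the PM share:

```latex
\begin{aligned}
\text{growth} &= \frac{\text{new} - \text{old}}{\text{old}} \\
\frac{4761 - 3540}{3540} &= \frac{1221}{3540} \approx 34.5\% \\
\frac{3359 - 1153}{1153} &= \frac{2206}{1153} \approx 191\% \quad (\text{DQuery}) \\
1352 + 1174 + 190 + 136 &= 2852, \qquad \frac{1352}{2852} \approx 47.4\% \quad (\text{PM})
\end{aligned}
```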