
20th. Chen Xiaoming (陈晓鸣): Baidu's Massive-Scale Log Analysis Architecture and Lessons from Processing


  1. chenxiaoming@baidu.com
  2. LOG. Agenda: LOG, LSP, DISQL
  3. 46.70.93.94 - - [11/Nov/2011:11:11:11 -1100] "GET /book/1984.html HTTP/1.1" 404 2326 "http://www.baidu.com/s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"
  4. The same entry split into fields (slides 4 to 9 repeat this breakdown):
     • 46.70.93.94 - -
     • [11/Nov/2011:11:11:11 -1100]
     • "GET /book/1984.html HTTP/1.1"
     • 404
     • 2326
     • "http://www.baidu.com/s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947"
     • "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"
  12. LOG, LSP, DISQL
  13. Ad-hoc; ……
  14. LOG, LSP, DISQL
  15. …; B*S
  16. C++; C++; SQL; PHP + C; Schema; PHP .so
  17. LSP. Agenda: LOG, LSP, DISQL
  18. UI
  19. DQuery
  20. DISQL. Agenda: LOG, LSP, DISQL
  21. _Url, _Res( ); _Url, _Site; JSON
  22. DQuery: _Url, _Res( ); _Url, _S; JSON
  23. PHP-Callback
  24. C-callback
  25. PHP, SQL ( ); SQL, M/R; DAG; MapReduce; SQL; PHP; C++ + C-Runtime (NEW); RAII +; Copy On Write; schema; C++, PHP
  26. parser; JSON: [ { "cmd": "load", "path": null, "using": "SchemaReader", "from": 17, "options": {"max_item_in_mem": 100000}, "include": [25] }, {"cmd":"filter"……}, {"cmd":"join"……}, …… ]
  27. SQL: [ { "cmd": "load", "path": null, "using": "SchemaReader", "from": 17, "options": {"max_item_in_mem": 100000}, "include": [25] }, {"cmd":"filter"……}, {"cmd":"join"……}, …… ]
  28. ( ); MapReduce; Schema; schema; C++, PHP, DOT
  29. [Diagram; surviving labels: Group, Unique, Shuffle, Map Phase, Reduce, Limit 1, Combine, Count, Sum, Reduce Phase]
  30. Schema
      field: ID, name, age | type: uint64, string, int32 | index: 2, 5, 9
      field: ID, score | type: uint64, double | index: 0, 1
      join
      field: ID, name, age, score | type: uint64, string, int32, double | index: 2, 5, 9, 10
  31. Combiner; Cached Combiner; key Join; I/O
  32. PHP; C++; DOT; / MapReduce
  33. Processor (Pipes & Filter). class Processor: init(), process(), fini(). Classes Selector, Filter, Counter, and UserProcessor each provide init(), process(), fini().
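  34. Two value columns headed "4 1" and "10 27" (row labels partly lost): 3540 to 4761 (+1221, +34.5%); DQuery 1153 to 3359 (+2206, +191%); 1569 to 2963 (+1394, +88.9%). Shares: LSP 24% and DQuery 43% (67% together), 33% (label lost). By role: PM 1352 (47.4%), RD 1174 (41.2%), OP 190 (6.66%), 136 (4.77%, label lost), total 2852 (100%).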
  35. LOG, LSP, DISQL
  36. LSP; UI; DISQL; (chenxiaoming@baidu.com); Hadoop in China 12 2 2 20; DISQL2.0
  37. chenxiaoming@baidu.com
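  38. Follow us: t.baidu-tech.com. Downloads and details: infoq.com/cn/zones/baidu-salon. "Imagine, exchange, debate, gather" is the motto of the Baidu Technology Salon. The salon is an offline technical exchange event organized regularly by Baidu and InfoQ China. Its aim is to give mid-level and senior engineers a relatively free platform for exchanging ideas and meeting peers. Each session has two main parts, speaker talks and OpenSpace, and focuses on a single topic. The talks and live Q&A present advanced practices from the technology behind Baidu and other well-known websites. OpenSpace extends and deepens the session's theme and offers an open forum where any participant can raise a topic on that theme and start a discussion. Planned, organized, and run by InfoQ. Follow us: weibo.com/infoqchina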
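
The log entry on slide 3 and its field breakdown on slides 4 to 9 follow the layout of the common "combined" web access log format. A minimal Python sketch of splitting such a line into named fields is shown here. The regular expression, the field names, and the `parse_line` helper are illustrative assumptions; the slides do not say how LSP or DISQL parse logs.

```python
import re

# Assumed pattern for the entry shown on the slides: client, two
# placeholder fields, timestamp, request, status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; treat it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# The sample entry from slide 3.
SAMPLE = (
    '46.70.93.94 - - [11/Nov/2011:11:11:11 -1100] '
    '"GET /book/1984.html HTTP/1.1" 404 2326 '
    '"http://www.baidu.com/s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" '
    '"Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) '
    'AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 '
    'Mobile/7B314 Safari/531.21.10"'
)

record = parse_line(SAMPLE)
```

Lines that do not match return `None` rather than raising, on the assumption that a real access log contains malformed entries which the caller may want to count and skip.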

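
Slide 30 shows two schemas being joined: the result keeps the first schema's fields and indexes and appends `score` with index 10. A minimal sketch of one rule that reproduces those numbers is shown here. The `Field` type, the `join_schemas` function, and the rule "next index is the largest existing index plus one" are assumptions inferred from the slide's figures, not a description of how DISQL assigns indexes.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    name: str
    type: str
    index: int


def join_schemas(left: List[Field], right: List[Field], key: str) -> List[Field]:
    """Merge two schemas on `key`, keeping the left side's indexes.

    Assumption: fields from the right side (other than the key) are
    appended with indexes continuing after the largest left index.
    """
    left_by_name = {f.name: f for f in left}
    right_by_name = {f.name: f for f in right}
    if key not in left_by_name or key not in right_by_name:
        raise KeyError(f"join key {key!r} missing from one schema")
    if left_by_name[key].type != right_by_name[key].type:
        raise TypeError(f"join key {key!r} has mismatched types")

    joined = list(left)
    next_index = max(f.index for f in left) + 1
    for f in right:
        if f.name == key:
            continue  # the key column appears only once in the result
        joined.append(Field(f.name, f.type, next_index))
        next_index += 1
    return joined


# The two schemas from slide 30.
users = [Field("ID", "uint64", 2), Field("name", "string", 5), Field("age", "int32", 9)]
scores = [Field("ID", "uint64", 0), Field("score", "double", 1)]

joined = join_schemas(users, scores, key="ID")
```

The sketch does not handle a non-key field name that appears in both schemas; a real implementation would need a renaming or conflict rule.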
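
Slide 33 names a `Processor` class with `init()`, `process()`, and `fini()`, and four classes (Selector, Filter, Counter, UserProcessor) arranged as pipes and filters. A minimal Python sketch of that pattern follows. The constructor arguments, the convention that `process()` returns `None` to drop a record, and the `run_pipeline` driver are assumptions added for illustration; the slide gives only the class and method names, and the mention of C++ elsewhere in the deck suggests the original is not written in Python.

```python
class Processor:
    """Base stage: init() before the first record, fini() after the last."""

    def init(self):
        pass

    def process(self, record):
        # Return the (possibly modified) record, or None to drop it.
        return record

    def fini(self):
        pass


class Selector(Processor):
    """Keeps only the named fields of each record."""

    def __init__(self, fields):
        self.fields = list(fields)

    def process(self, record):
        return {name: record[name] for name in self.fields}


class Filter(Processor):
    """Drops records for which the predicate is false."""

    def __init__(self, predicate):
        self.predicate = predicate

    def process(self, record):
        return record if self.predicate(record) else None


class Counter(Processor):
    """Counts the records that reach this stage and passes them on."""

    def __init__(self):
        self.count = 0

    def init(self):
        self.count = 0

    def process(self, record):
        self.count += 1
        return record


def run_pipeline(processors, records):
    """Push each record through the stages in order (hypothetical driver)."""
    for p in processors:
        p.init()
    output = []
    for record in records:
        for p in processors:
            record = p.process(record)
            if record is None:
                break  # a stage dropped the record
        else:
            output.append(record)
    for p in processors:
        p.fini()
    return output


counter = Counter()
stages = [Filter(lambda r: r["status"] == 404), counter, Selector(["url"])]
rows = [
    {"url": "/book/1984.html", "status": 404},
    {"url": "/index.html", "status": 200},
    {"url": "/book/missing.html", "status": 404},
]
result = run_pipeline(stages, rows)
```

A user-defined stage, corresponding to the slide's UserProcessor, would subclass `Processor` in the same way and be placed anywhere in the `stages` list.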