D-Cloud 2011 A Cloud-based Scalable Aggregation and Query Platform for Network Log Analysis

This is Jianwen Wei's presentation at the 2011 International Workshop on Data Cloud (D-CLOUD 2011).

This presentation introduces Analysis Farm, a scalable cloud-based platform for network log analysis. Analysis Farm meets our need to store and analyze more than 400 million log records every day.

D-Cloud 2011 http://www.cse.ust.hk/~lingu/D-CLOUD/ was held in affiliation with the 2011 International Conference on Cloud and Service Computing (IEEE CSC 2011) http://csc2011.comp.polyu.edu.hk/ . D-Cloud was held in Hong Kong on Dec 12.

Email me if you would like the full-length paper. Please introduce yourself in the message :-)

  1. Analysis Farm: A Cloud-based Scalable Aggregation and Query Platform for Network Log Analysis
     Jianwen Wei, Yusu Zhao, Kaida Jiang, Rui Xie, Yaohui Jin
     School of Electronic Information and Electrical Engineering, SJTU; Network and Information Center, SJTU
     Shanghai Jiaotong University
     wei.jianwen@gmail.com
     Dec 12th, 2011. The 2011 International Workshop on Data Cloud (D-CLOUD 2011), Hong Kong
  2. Outline: Background • Design and Implementation • Experimental Results • Summary
  3. Outline: Background • Design and Implementation • Experimental Results • Summary
  4. Background: Motivation, An Overview of SJTU Networks
     • Serving 50,000 people
     • 10Gb WDM, MPLS
     • Network monitoring
  5. Background: Motivation, An Overview of SJTU Networks (cont.)
     • Applications such as BT use too much border bandwidth!
  6. Background: Deployment of the Network Log Analysis System
     • Pipeline: Border Router to DPI to Syslog Collector to Analysis Farm
     • Border Router to DPI: mirrored traffic (raw data), 6Gbps
     • DPI to Syslog Collector: syslog (plain text), ~3MBytes/s
     • Syslog Collector to Analysis Farm: 5000 per sec
  7. Background: Network Log Analysis System (pipeline diagram as on slide 6)
  8. Background: Network Log Analysis System: Border Router
     • Handles all incoming and outgoing traffic
     • Connects to multiple ISPs
     • Traffic at 6Gbps
  9. Background: Network Log Analysis System: DPI Engine
     • Input: 6Gbps raw network traffic
     • Output: 3MBytes/s syslog messages
     • Runs on an x86 server
     • Analyzes every network session
  10. Background: Network Log Analysis System: Syslog Collector
      • Syslog collector written in Java
      • Runs on a virtual machine
      • Insertion rate: 5000/s on average, 12000/s at peak
  11. Background: Network Log Analysis System: Analysis Farm
      • Stores logs
      • Analyzes logs
  12. Background: Log Analysis Tasks
      • Aggregating: get the overall usage of the network border
      • Querying: inspect network activities
      • Sample log record: http.tcp 1320155721-1320155731 202.120.2.102:54285-8.8.4.4:80 374 24021
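The sample record on slide 12 can be split into typed fields before it is stored. The following is a minimal sketch, not the collector's actual code (which the slides say is written in Java). It assumes the layout is application label, start-end Unix timestamps, source-destination endpoints, and two trailing counters. The slides do not say what the last two numbers mean, so `counter_a` and `counter_b` are hypothetical placeholder names.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    app: str
    start_t: int
    end_t: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    counter_a: int  # hypothetical name: meaning not given in the slides
    counter_b: int  # hypothetical name: meaning not given in the slides


def parse_record(line: str) -> LogRecord:
    """Parse one whitespace-separated log line (assumed five-field layout)."""
    app, times, endpoints, a, b = line.split()
    start_t, end_t = (int(t) for t in times.split("-"))
    src, dst = endpoints.split("-")
    # rsplit keeps the address intact and takes only the trailing port.
    src_ip, src_port = src.rsplit(":", 1)
    dst_ip, dst_port = dst.rsplit(":", 1)
    return LogRecord(app, start_t, end_t, src_ip, int(src_port),
                     dst_ip, int(dst_port), int(a), int(b))


record = parse_record(
    "http.tcp 1320155721-1320155731 202.120.2.102:54285-8.8.4.4:80 374 24021"
)
print(record)
```

The field names `app`, `start_t`, and `end_t` follow the index fields listed on slide 27; the IP fields are lower-cased here only to follow Python naming conventions.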
  13. Background: Log Analysis Tasks
      • 400 million log records per day (350GByte)!
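The daily volume on slide 13 can be cross-checked against the collector's insertion rate on slide 10 with a little arithmetic. This sketch assumes decimal gigabytes (1 GB = 10^9 bytes). The slides do not say whether the 350GByte figure includes index or storage overhead, so the per-record size is only a rough average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

records_per_day = 400_000_000
bytes_per_day = 350 * 10**9  # assumption: decimal gigabytes

# Average insertion rate implied by the daily record count.
avg_records_per_sec = records_per_day / SECONDS_PER_DAY

# Average stored size per record implied by the daily volume.
avg_bytes_per_record = bytes_per_day / records_per_day

print(f"{avg_records_per_sec:.0f} records/s on average")
print(f"{avg_bytes_per_record:.0f} bytes per record on average")
```

The implied average of roughly 4,600 records per second is in line with the "5000/s on average" insertion rate reported for the syslog collector.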
  14. Background: Research Challenges
      • Storage Scalability
      • Computation Scalability
      • Query Agility
  15. Background: Related Work
      • loggly.com: "Logging as a Service"
      • Yottaa.com: log-based website performance analysis
      • Both use cloud-based solutions for scalability
  16. Outline: Background • Design and Implementation • Experimental Results • Summary
  17. Design and Implementation: Our Approach: Cloud Computing + NoSQL
      • Cloud Computing: manageable, scalable, on-demand resources
        • OpenStack: open source toolset for building clouds
      • NoSQL (Not Only SQL): weakens ACID to improve performance
        • MongoDB: document-oriented distributed database
  18. Design and Implementation: The Architecture of Analysis Farm
      • Users send requests to mongos
      • Application Layer: mongos, configuration server, four mongod instances
      • IaaS Layer: four VMs
      • Hardware Resource Pool: CPU, memory, network, iSCSI storage
  19. Design and Implementation: How do we tackle the three challenges?
      • Storage Scalability: online storage expansion
      • Computation Scalability: MongoDB scale-out
      • Query Agility: MongoDB handles ad hoc queries effectively
  20. Design and Implementation: Addressing Storage Scalability (Online Storage Expansion)
      1. The application servers ask the IaaS layer for more disk space.
      2. The IaaS layer asks the hardware resource pool to attach new block devices.
      3. The application servers execute online filesystem expansion.
      • No service interruption
  21. Design and Implementation: Addressing Computation Scalability (MongoDB Scale-out)
      1. The IaaS provides a new server to the cluster.
      2. The MongoDB cluster rebalances data automatically.
      • No service interruption
      • Figure: a MapReduce request goes to mongos (combiner), which fans out to four mongod instances (each running a mapper and a combiner)
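The mapper and combiner arrangement in the slide 21 figure can be illustrated with a small simulation. This is a plain-Python sketch of the data flow only, not MongoDB's MapReduce API: each inner list stands in for the records held by one mongod, and the final merge stands in for the combine step at mongos. The `bytes` field and the `bt.tcp` label are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_record(record):
    """Mapper: emit one (key, value) pair per log record."""
    yield record["app"], record["bytes"]


def combine(pairs):
    """Combiner: sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def aggregate(shards):
    """Map and combine on each shard, then combine the partial results."""
    partials = []
    for shard in shards:  # on a real cluster each shard works in parallel
        emitted = (pair for record in shard for pair in map_record(record))
        partials.append(combine(emitted))
    # Final step: merge the per-shard partial sums, as mongos would.
    merged = (item for partial in partials for item in partial.items())
    return combine(merged)


shards = [
    [{"app": "http.tcp", "bytes": 100}, {"app": "bt.tcp", "bytes": 900}],
    [{"app": "http.tcp", "bytes": 50}],
]
result = aggregate(shards)
print(result)
```

Because the combiner runs on each shard first, only one partial sum per key crosses the network, which is why adding servers can raise aggregation throughput.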
  22. Design and Implementation: Addressing Query Agility
      MongoDB handles ad hoc queries effectively:
      • Expressive data model
      • Building blocks for compound queries
      • Aggregating tools such as Group and MapReduce
      • Effective optimization methods, such as indexes
  23. Outline: Background • Design and Implementation • Experimental Results • Summary
  24. Experimental Results: Aggregating and Querying
      • Aggregating logs
      • Ad hoc querying
      • SPEED is our primary focus.
  25. Experimental Results: Experimental Setup for Aggregating
      • Method: aggregate 10-minute logs with MongoDB MapReduce
      • Dataset: one day's log records, ~400 million records
      • Configurations for comparison:
        • 1x farm: 4 mongod threads on a single server
        • 4x farm: 4 mongod threads on four servers
        • 8x farm: 8 mongod threads on eight servers
  26. Experimental Results: Experimental Results for Aggregating

      Type   Records Processed   Time   Rate (records/s)
      1x     3201454             523s   6119
      4x     3103742             200s   15568
      8x     3317013             111s   29883

      Table: Experimental Results for 10-minute Log Aggregating
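The scaling behaviour in the slide 26 table is easier to read as speedup relative to the 1x farm. The sketch below uses only the rates reported in the table. Each run processed a slightly different number of records, so the ratios are indicative rather than a strict like-for-like comparison, and "efficiency" here simply means speedup divided by server count.

```python
# Servers -> aggregation rate in records/s, copied from the slide 26 table.
rates = {1: 6119, 4: 15568, 8: 29883}

baseline = rates[1]
speedups = {servers: rate / baseline for servers, rate in rates.items()}
efficiency = {servers: s / servers for servers, s in speedups.items()}

for servers in rates:
    print(f"{servers}x farm: speedup {speedups[servers]:.2f}, "
          f"efficiency {efficiency[servers]:.2f}")
```

Going from one server to eight raises the rate a little under fivefold, so the aggregation scales out but not linearly.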
  27. Experimental Results: Experimental Setup for Ad Hoc Querying
      • Method: execute ad hoc queries
      • Dataset: one day's log records, ~400 million records
      • Index: (start_t, end_t, src_IP, dst_IP, app)
      • Configuration for Analysis Farm: 8x farm, 8 mongod threads on eight servers
  28. Experimental Results: Experimental Setup for Ad Hoc Querying (cont.)
      • Query types:
        • IP-initial query: src_IP == IP
        • IP-engaging query: src_IP == IP OR dst_IP == IP
        • IP-pair query: the IP pair is engaged AND app == HTTP
      • Time scopes: 10 minutes, 30 minutes, 60 minutes
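The three query types on slide 28 can be written as MongoDB query documents. This is a sketch under assumptions, not the queries used in the experiments: the field names come from the index on slide 27, `$or`, `$gte`, and `$lt` are standard MongoDB query operators, but applying the time scope to `start_t`, treating an IP pair as matching in either direction, and the `"http"` value for `app` are all assumptions. The functions only build dictionaries, so they run without a database.

```python
def time_scope(start, minutes):
    """Restrict to records that start inside the window (assumed semantics)."""
    return {"start_t": {"$gte": start, "$lt": start + minutes * 60}}


def ip_initial(ip, start, minutes):
    """IP-initial query: src_IP == IP."""
    return {**time_scope(start, minutes), "src_IP": ip}


def ip_engaging(ip, start, minutes):
    """IP-engaging query: src_IP == IP OR dst_IP == IP."""
    return {**time_scope(start, minutes),
            "$or": [{"src_IP": ip}, {"dst_IP": ip}]}


def ip_pair(ip_a, ip_b, start, minutes, app="http"):
    """IP-pair query: both IPs engaged (either direction, assumed) AND app."""
    return {**time_scope(start, minutes),
            "$or": [{"src_IP": ip_a, "dst_IP": ip_b},
                    {"src_IP": ip_b, "dst_IP": ip_a}],
            "app": app}


query = ip_engaging("202.120.2.102", 1320155400, 10)
print(query)
```

With a driver such as PyMongo, a document like `query` would be passed to a collection's `find` method; the collection name and connection details are deployment-specific and not given in the slides.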
  29. Experimental Results: Experimental Results for IP-initial Query

      Time Scope   Execution Time   Records Scanned   Rate (records/s)
      10min        3.085s           227581            73770
      30min        8.816s           643259            72965
      60min        18.517s          1370443           73795

      Table: Experimental Results for IP-initial Query (src_IP == IP)
  30. Experimental Results: Experimental Results for IP-engaging Query

      Time Scope   Execution Time   Records Scanned   Rate (records/s)
      10min        18.012s          1234582           68542
      30min        54.708s          3673304           67144
      60min        119.034s         7912644           66474

      Table: Experimental Results for IP-engaging Query (src_IP == IP OR dst_IP == IP)
  31. Experimental Results: Experimental Results for IP-pair Query

      Time Scope   Execution Time   Records Scanned   Rate (records/s)
      10min        5.670s           296772            52340
      30min        6.267s           324813            51829
      60min        19.327s          1027513           53165

      Table: Experimental Results for IP-pair Query (the IP pair is engaged AND app == http)
  32. Outline: Background • Design and Implementation • Experimental Results • Summary
  33. Summary
      • Analysis Farm is built on OpenStack and MongoDB
      • Analysis Farm is easy to manage and easy to scale out
      • Feasibility in aggregating and querying is verified
      • We use Analysis Farm to analyze 400 million log records (350GB) every day
  34. Acknowledgement
      • 973 program and NSFC
      • My partners at Shanghai Jiaotong Univ.
      • Dr. Lin Gu at HKUST
      • Workshop organizers and reviewers
  35. Thank you!
      Analysis Farm: A Cloud-based Scalable Aggregation and Query Platform for Network Log Analysis
      Shanghai Jiaotong University, wei.jianwen@gmail.com, @JianwenWEI
      The 2011 International Workshop on Data Cloud (D-CLOUD 2011), Hong Kong
