An Integrated Framework on Mining Logs Files for Computing System Management


  1. An Integrated Framework on Mining Logs Files for Computing System Management
     • Tao Li, School of Computer Science, Florida International University, Miami, FL 33199
     • Wei Peng, School of Computer Science, Florida International University, Miami, FL 33199
     • Feng Liang, Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708
     • Sheng Ma, Machine Learning for Systems, IBM T.J. Watson Research Center, Hawthorne, NY 10532
  2. Agenda
     • Introduction
     • System log categorization
         • Text mining techniques to categorize text messages into a set of common categories
     • Incorporating temporal information
         • Two approaches to incorporating temporal information to improve categorization performance
     • Mining event relationships
         • Discovering the relationships between different events
     • Experiments
     • Conclusion and future work
  3. Introduction
     • Traditional approaches to troubleshooting rely on the knowledge and experience of domain experts.
     • Modern computing systems are instrumented to generate huge amounts of system log data.
     • The data in log files describe:
         • Status of each component
         • System operational changes, such as starting and stopping of services
         • Detection of network applications
         • Software configuration modifications
         • Software execution errors
     • Log data are complicated by their heterogeneous sources:
         • Different devices (e.g., routers, processors, adapters)
         • Different software components (e.g., OS, middleware, user applications)
         • Different providers (e.g., Cisco, IBM, Microsoft)
     • Different report descriptions
  4. Introduction (cont.)
     • Automated analysis is difficult to perform.
     • Method:
         • Categorize text messages with disparate formats into common situations.
         • Timestamps
             • Temporal characteristics provide additional context information about the messages.
             • They can be used to facilitate data analysis.
  5. An overview of the integrated framework
  6. System log categorization
     • Common categories
         • Based on the CBE (Common Base Event) format established by an IBM initiative.
         • The set of categories: start, stop, dependency, create, connection, report, request, configuration, and other.
     • Message categorization
         • Use naive Bayes as the classification approach for learning in text categorization.
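
The message categorization step can be illustrated with a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. The slides do not show the authors' implementation, so everything here is an assumption made for illustration: the whitespace tokenizer, the function names `train_naive_bayes` and `classify`, and the six hypothetical training messages, which cover only three of the common categories.

```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return message.lower().split()


def train_naive_bayes(labeled_messages):
    """Count messages per category and words per category."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for message, category in labeled_messages:
        class_counts[category] += 1
        for word in tokenize(message):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(message, model):
    """Return the category with the highest log posterior for the message."""
    class_counts, word_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category, count in class_counts.items():
        score = math.log(count / total_messages)  # log prior
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for word in tokenize(message):
            if word in vocabulary:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical labeled messages, not taken from the experiments in the slides.
TRAINING = [
    ("service spooler started successfully", "start"),
    ("the browser service has started", "start"),
    ("service spooler stopped", "stop"),
    ("the messenger service was stopped", "stop"),
    ("connection to server lost", "connection"),
    ("unable to open connection to host", "connection"),
]

model = train_naive_bayes(TRAINING)
print(classify("print service started", model))
print(classify("messenger service stopped", model))
```

Working in log space avoids numeric underflow on long messages, and the smoothing keeps a single unseen word from zeroing out a category.
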
  7. Incorporating temporal information
     • Two approaches:
         • Naive Bayes algorithm
         • Hidden Markov model
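
The slide names a hidden Markov model but gives no details, so the following is only a sketch of how an HMM could let the order of messages influence their categories. It assumes that the hidden states are message categories and that each message has been reduced to a single keyword. The `viterbi` function is the standard Viterbi decoder; all probabilities below are hypothetical values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence, computed in log space."""
    # best[t][s] = (log probability of the best path ending in s at step t, previous state)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Hypothetical model: two categories, three keywords, services tend to alternate.
STATES = ["start", "stop"]
START_P = {"start": 0.5, "stop": 0.5}
TRANS_P = {
    "start": {"start": 0.3, "stop": 0.7},
    "stop": {"start": 0.7, "stop": 0.3},
}
EMIT_P = {
    "start": {"started": 0.6, "service": 0.3, "stopped": 0.1},
    "stop": {"started": 0.1, "service": 0.3, "stopped": 0.6},
}

print(viterbi(["started", "service", "started"], STATES, START_P, TRANS_P, EMIT_P))
```

In this toy model the keyword "service" is equally likely under both categories, so the middle message is labeled from its neighbors alone, which is the kind of context a purely per-message classifier cannot use.
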
  8. Mining event relationships: Introduction
     • After log files are transformed into common categories, discover interesting patterns embedded in the data.
     • Mine temporal patterns using the log timestamps.
     • Temporal patterns of interest appear in system management applications.
     • Sequences of events propagate from their origin in a low layer to higher software layers through the dependency tree.
     • Knowing the temporal patterns can help pinpoint the root cause and take proper action.
  9. Mining event relationships: Notations and problem formulations
     • Temporal patterns:
         • Temporal patterns assert dependencies between events and specify the timing information. They can usually be described as “event a happens about 5 minutes after event b”.
         • We refer to this type of pattern as a t-pattern.
  10. Mining event relationships: Discovering t-patterns
     • Let Ta and Tb be two point processes for events a and b, respectively.
     • The distribution can be interpreted as the probability of having an event of type b within time r.
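
One way to make "the probability of having an event of type b within time r" concrete is an empirical estimate from two timestamp lists. The sketch below is not the authors' test for t-patterns, whose formula is not in the transcript. It assumes that Ta and Tb are plain lists of timestamps in seconds and measures how often an event a is followed by an event b within r seconds. The function names and the sample timestamps are hypothetical.

```python
from bisect import bisect_right


def waiting_times(ta, tb):
    """For each timestamp in ta, the delay until the next timestamp in tb.

    Occurrences of a with no later b are dropped (an assumed way to handle censoring).
    """
    tb = sorted(tb)
    waits = []
    for t in ta:
        i = bisect_right(tb, t)  # index of the first b strictly after t
        if i < len(tb):
            waits.append(tb[i] - t)
    return waits


def fraction_within(ta, tb, r):
    """Empirical fraction of a events followed by a b event within time r."""
    waits = waiting_times(ta, tb)
    if not waits:
        return 0.0
    return sum(w <= r for w in waits) / len(waits)


# Hypothetical timestamps in seconds.
TA = [0, 100, 200, 300]
TB = [5, 104, 206, 350, 400]

print(waiting_times(TA, TB))
print(fraction_within(TA, TB, 10))
```

A fraction much higher than what arbitrary time points would give suggests that b depends on a. Comparing the two fractions with a statistical test is the natural next step, and it is left out here.
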
  11. Experiments
     • Log data generation
         • Log files are collected from different machines with different operating systems in the School of Computer Science at Florida International University.
         • Logdump2td, an NT data collection tool developed by the Event Mining team at the IBM research center, is used.
     • Message categorization
  12. Discover and Visualize Event Relationships
  13. Conclusion and future work
     • Automatically infer the set of common categories from historical data.
     • The number of common categories can be significantly large.