Weiyi Shang
http://users.encs.concordia.ca/~shang/
Improving State-of-the-art Log Analytics Infrastructures
Software Engineering for Ultra-large-scale Systems
Amazon’s massive AWS outage on Feb. 28th, 2017 took more than 4 hours to recover.
Logs are one of the few sources of information available to operators and developers.
[Figure: workflow diagram with the labels: make logging decisions, release, produce (logs) at run-time, log processing apps (RegEx), report, system issues.]
How to analyze logs?
How to make logging statements?
Logging infrastructure
[Figure: the same workflow diagram, with two added labels: Compression and Parsing.]
Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression
Workflow of Logzip
Ref: Liu, Jinyang, et al. "Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression." 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019.
Logzip is optimized for large log files!
Logs are typically stored in small blocks
[Figure: a log file is split into log blocks by time/size-based log rolling, and the log blocks are then compressed. Block sizes shown: 16KB/60KB, 128KB, 256KB, 384KB ~ 1024KB, 64KB, 64KB.]
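As a rough illustration of the block-wise storage described above, the sketch below splits a log into fixed-size blocks and compresses each block on its own. It assumes `zlib` as a stand-in general-purpose compressor (the slides do not name one) and cuts on byte offsets rather than at real log-rolling boundaries; the function names and the synthetic log are hypothetical.

```python
import zlib

def split_into_blocks(data, block_size):
    """Cut the log into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def compress_blocks(blocks):
    """Compress every block independently, so each can be read back alone."""
    return [zlib.compress(block) for block in blocks]

# Synthetic log, for illustration only.
line = b"17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally\n"
log = line * 2000

blocks = split_into_blocks(log, 16 * 1024)  # 16KB blocks
compressed = compress_blocks(blocks)
restored = b"".join(zlib.decompress(c) for c in compressed)

print(len(blocks), "blocks,", sum(len(c) for c in compressed), "compressed bytes")
```

Because every block is compressed separately, repetition that spans two blocks cannot be exploited, which is why small blocks tend to compress less well.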
Logzip does not perform well on small log blocks.
With Logzip, the compression ratio is only 4% to 98% (median 63%) of the compression ratio achieved without it.
• Too little data to extract templates accurately
• Not enough repetitiveness
• Preprocessing heavily impacts speed (up to 42s to compress a 128KB log block)
• Inter-file repetitiveness is not used
Initial investigation on log data
We observe 4 types of repetitiveness in the non-content part of our selected log data.
• T1: Identical tokens: Tokens with the same information (e.g., Year component).
• T2: Similar numeric tokens: Long numeric tokens (e.g., Timestamp).
• T3: Repetitive tokens: A few tokens that repeat a lot (e.g., Log level).
• T4: Tokens with common prefix string: Tokens that start with the same information (e.g., Module).
H1: Extract identical tokens: Extract the identical token and its number of occurrences.
H2: Delta encoding for numbers: Save the delta between the current token and its prior token (first token preserved).
H3: Build dictionary for repetitive tokens: Build a dictionary of the repeated tokens and replace each token with its index.
H4: Extract common prefix string: Save the prefix string and store the remaining part of each token.
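The four heuristics can be sketched on a single column of tokens each. This is a minimal illustration under stated assumptions, not LogBlock's actual implementation: the function names, the column-per-field layout, and the sample values are all hypothetical.

```python
import os

def extract_identical(column):
    """H1: if all tokens are identical, keep one copy and the occurrence count."""
    if column and all(tok == column[0] for tok in column):
        return column[0], len(column)
    return None  # heuristic does not apply

def delta_encode(column):
    """H2: keep the first number, then each difference to the prior number."""
    nums = [int(tok) for tok in column]
    return [nums[0]] + [cur - prev for prev, cur in zip(nums, nums[1:])]

def dictionary_encode(column):
    """H3: build a dictionary of distinct tokens and replace tokens by index."""
    dictionary, index, encoded = [], {}, []
    for tok in column:
        if tok not in index:
            index[tok] = len(dictionary)
            dictionary.append(tok)
        encoded.append(index[tok])
    return dictionary, encoded

def split_common_prefix(column):
    """H4: store the shared prefix once and only the remainder of each token."""
    prefix = os.path.commonprefix(column)
    return prefix, [tok[len(prefix):] for tok in column]

# Illustrative columns (hypothetical values).
years = ["17", "17", "17", "17"]
times = ["201046", "201111", "201111", "201111"]
levels = ["INFO", "INFO", "WARN", "INFO"]
modules = ["storage.BlockManager", "storage.MemoryStore", "storage.BlockManager"]

h1 = extract_identical(years)
h2 = delta_encode(times)
h3 = dictionary_encode(levels)
h4 = split_common_prefix(modules)
print(h1, h2, h3, h4, sep="\n")
```

Each transformation is reversible, so the original tokens can be rebuilt after decompression.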
Design of our preprocessing approach: LogBlock
To keep compression fast, we do not apply extra information-reduction steps to the log content part.
An example of preprocessing heuristics
[Figure: LogBlock’s preprocessing example]
LogBlock improves the compression ratio on small log blocks
Our approach improves the compression ratio by a median of 5%, 9%, 15%, and 21% on 16KB, 32KB, 64KB, and 128KB blocks, respectively, compared with compression without any preprocessing.
Our approach is 31.0 to 50.1 times faster than Logzip in preprocessing and compressing small log blocks.
Log Parsing
Logging statement: logInfo("Found block $blockId locally")
Generated log line: 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally
The log line contains:
Timestamp: 17/06/09 20:11:11; Level: INFO
Logger: storage.BlockManager
Static template: Found block <*> locally
Dynamic variable(s): rdd_42_20
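To make the parsing target concrete, the sketch below splits the example line into its header fields and content, then recovers the dynamic variable by matching the content against the known template. The regular expression for the header and the helper `template_to_regex` are assumptions written for this one log format; an automated parser has to discover the template rather than receive it.

```python
import re

# Header layout assumed from the example line: timestamp, level, logger, content.
HEADER = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>[^:]+): "
    r"(?P<content>.*)$"
)

def template_to_regex(template):
    """Turn a template such as 'Found block <*> locally' into a regex
    that captures one group per <*> placeholder."""
    parts = [re.escape(part) for part in template.split("<*>")]
    return re.compile("^" + "(.+?)".join(parts) + "$")

line = "17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally"
fields = HEADER.match(line).groupdict()
variables = template_to_regex("Found block <*> locally").match(fields["content"]).groups()

print(fields)
print(variables)
```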
Automated log parsing suffers from low efficiency
Efficiency is an important concern for log parsing.
The main idea of Logram
Raw log
(Unstructured)
Found block rdd_42_20 locally
Found block rdd_42_22 locally
Found block rdd_42_23 locally
Found block rdd_42_24 locally
In these messages, “Found”, “block”, and “locally” are static; the block ID (e.g., “rdd_42_20”) is dynamic.
The goal of log parsing is to identify whether each token is a static token or a dynamic token.
Each static token has a higher number of appearances: the token “Found” appears 4 times.
Each dynamic token has a lower number of appearances: the token “rdd_42_20” appears only once.
We use the number of appearances to
distinguish static and dynamic tokens.
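A minimal sketch of this frequency idea, using the four messages above. The threshold of 2 and the helper name `classify` are assumptions chosen for this tiny example, not values from the slides.

```python
from collections import Counter

messages = [
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]

# Count how often every whitespace-separated token appears.
token_counts = Counter(tok for msg in messages for tok in msg.split())

def classify(token, threshold=2):
    """Label a token static if it appears at least `threshold` times."""
    return "static" if token_counts[token] >= threshold else "dynamic"

for tok in messages[0].split():
    print(tok, token_counts[tok], classify(tok))
```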
Raw log
(Unstructured)
Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]
Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]
A dynamic token may also appear frequently.
If we consider 3-grams instead of individual tokens, the 3-grams that join such a token to its preceding context (e.g., “name [0x800f080d -”) each appear only once.
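The effect can be checked on the two messages above. In this sketch the 3-grams are built inside each message only, which is a simplification made for the example; the helper `ngrams` is hypothetical.

```python
from collections import Counter

messages = [
    "Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]",
    "Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]",
]

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

token_counts = Counter(tok for msg in messages for tok in msg.split())
three_gram_counts = Counter(g for msg in messages for g in ngrams(msg.split(), 3))

print(token_counts["[0x800f080d"])
print(three_gram_counts[("name", "[0x800f080d", "-")])
print(three_gram_counts[("element", "[0x800f080d", "-")])
```

The token `[0x800f080d` alone appears twice, but once the preceding token is part of the 3-gram, each combination appears once. The trailing 3-gram without preceding context still appears twice, so the context is what separates the two messages.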
Step 1: Dictionary Setup for n-grams
17/06/09 20:10:46 INFO rdd.HadoopRDD: Input split: hdfs://hostname/2kSOSP.log:29168+7292
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_22 locally
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_23 locally
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_24 locally
Each line consists of a header (timestamp, level, and logger) and the content. The content of the lines above is:
Input split: hdfs://hostname/2kSOSP.log:29168+7292
Found block rdd_42_20 locally
Found block rdd_42_22 locally
Found block rdd_42_23 locally
Found block rdd_42_24 locally
For the content “Found block rdd_42_20 locally”:
Generate 2-gram: “Found block”, “block rdd_42_20”, “rdd_42_20 locally”
Generate 3-gram: “Found block rdd_42_20”, “block rdd_42_20 locally”
3-grams                                                # appearance
Input split: hdfs://hostname/2kSOSP.log:21876+7292     1
split: hdfs://hostname/2kSOSP.log:21876+7292 Input     1
hdfs://hostname/2kSOSP.log:21876+7292 Input split:     1
…
split: hdfs://hostname/2kSOSP.log:29168+7292 Found     1
hdfs://hostname/2kSOSP.log:29168+7292 Found block      1
Found block rdd_42_20                                  1
block rdd_42_20 locally                                1
rdd_42_20 locally Found                                1
locally Found block                                    3
...

2-grams                                                # appearance
Input split:                                           5
split: hdfs://hostname/2kSOSP.log:21876+7292           1
hdfs://hostname/2kSOSP.log:21876+7292 Input            1
…
hdfs://hostname/2kSOSP.log:29168+7292 Found            1
Found block                                            4
block rdd_42_20                                        1
rdd_42_20 locally                                      1
locally Found                                          4
...
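The dictionary setup can be sketched as counting n-grams over the stream of content tokens. Following the tables, where n-grams such as “rdd_42_20 locally Found” span two messages, the sketch concatenates the messages before generating n-grams. The function names are hypothetical, and the counts differ from the tables because only the five lines shown on the slide are used here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_dictionaries(messages):
    """Count 2-grams and 3-grams over the concatenated token stream,
    so n-grams may cross the boundary between adjacent messages."""
    stream = [tok for msg in messages for tok in msg.split()]
    return Counter(ngrams(stream, 2)), Counter(ngrams(stream, 3))

contents = [
    "Input split: hdfs://hostname/2kSOSP.log:29168+7292",
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]

two_grams, three_grams = build_dictionaries(contents)
print(two_grams[("Found", "block")])
print(three_grams[("locally", "Found", "block")])
```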
Step 2: Parsing logs with n-gram dictionaries
3-grams                                                # appearance
split: hdfs://hostname/2kSOSP.log:29168+7292 Found     1
hdfs://hostname/2kSOSP.log:29168+7292 Found block      1
Found block rdd_42_20                                  1
block rdd_42_20 locally                                1
rdd_42_20 locally Found                                1
locally Found block                                    5
Log message to parse: Found block rdd_42_20 locally
Look up its 3-grams, “Found block rdd_42_20” and “block rdd_42_20 locally”, in the 3-gram dictionary.
Both 3-grams may contain dynamic values, since each appears only once.
Look up the 2-grams of these 3-grams in the 2-gram dictionary:
2-grams                                                # appearance
hdfs://hostname/2kSOSP.log:29168+7292 Found            1
Found block                                            4
block rdd_42_20                                        1
rdd_42_20 locally                                      1
locally Found                                          4
The 2-gram “Found block” appears 4 times, so it contains only static tokens.
The 2-grams “block rdd_42_20” and “rdd_42_20 locally” each appear only once, so they may contain dynamic tokens.
Step 2: Parsing logs with n-gram dictionaries
block rdd_42_20
rdd_42_20 locally
Finding overlapping
token
36
2-grams # appearance
hdfs://hostname/2kSOSP.log:29168+7292 Found
Found block
block rdd_42_20
rdd_42_20 locally
locally Found
1
4
1
1
4
Step 2: Parsing logs with n-gram dictionaries
block rdd_42_20
rdd_42_20 locally
Finding overlapping
token
Dynamic value 37
2-grams # appearance
hdfs://hostname/2kSOSP.log:29168+7292 Found
Found block
block rdd_42_20
rdd_42_20 locally
locally Found
1
4
1
1
4
Step 2: Parsing logs with n-gram dictionaries
block rdd_42_20
rdd_42_20 locally
Found block $1 locally
$1=rdd_42_20
Finding overlapping
token
Generating
template
Dynamic value 38
2-grams # appearance
hdfs://hostname/2kSOSP.log:29168+7292 Found
Found block
block rdd_42_20
rdd_42_20 locally
locally Found
1
4
1
1
4
Step 2: Parsing logs with n-gram dictionaries
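The lookup steps can be put together in a simplified sketch: rare 3-grams are broken into 2-grams, rare 2-grams are kept, and a token shared by two consecutive rare 2-grams is treated as dynamic. The threshold of 2, the function names, and the choice to treat the first and last token of a message as static are assumptions of this sketch, not details given in the slides.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_dictionaries(messages):
    """Count 2-grams and 3-grams over the concatenated token stream."""
    stream = [tok for msg in messages for tok in msg.split()]
    return Counter(ngrams(stream, 2)), Counter(ngrams(stream, 3))

def parse(message, two_grams, three_grams, threshold=2):
    """Return (template, dynamic values) for one message."""
    tokens = message.split()
    # 1. Start positions of the 2-grams inside every rare 3-gram.
    candidates = set()
    for i, gram in enumerate(ngrams(tokens, 3)):
        if three_grams[gram] < threshold:
            candidates.update((i, i + 1))
    # 2. Keep only the candidate 2-grams that are rare themselves.
    rare = {i for i in candidates
            if two_grams[tuple(tokens[i:i + 2])] < threshold}
    # 3. Token i+1 is shared by the 2-grams starting at i and at i+1.
    dynamic = sorted(i + 1 for i in rare if i + 1 in rare)
    # 4. Replace dynamic tokens by numbered placeholders.
    template, values = list(tokens), []
    for k, pos in enumerate(dynamic, start=1):
        values.append(tokens[pos])
        template[pos] = f"${k}"
    return " ".join(template), values

contents = [
    "Input split: hdfs://hostname/2kSOSP.log:29168+7292",
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]
two_grams, three_grams = build_dictionaries(contents)
result = parse("Found block rdd_42_20 locally", two_grams, three_grams)
print(result)
```

Both steps are dictionary lookups, so the cost per message grows with the number of tokens in it rather than with the number of templates seen so far.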
Evaluation
Accuracy, Efficiency, Stabilisation, Scalability
[Figure: average accuracy, in percentage (%), of Drain, AEL, Lenma, Spell, IPLoM, and Logram]
Logram achieves stable parsing results using a dictionary generated from a small portion of log data.
Logram achieves near-linear scalability without sacrificing parsing accuracy.
http://users.encs.concordia.ca/~shang/
Weiyi Shang

SEMLA_logging_infra

  • 2. Software Engineering for Ultra-large- scale Systems Amazon’s massive AWS outage on Feb. 28th 2017 took more than 4 hours to recover. 2
  • 3. Logs are one of the only resources of information Operator Developer 3
  • 4. 4 Make logging decisions Log processing apps System issues Report RegEx [^…] Release Produce at run-time How to analyze logs How to make loggin g statem ents?
  • 8. Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression 8 Workflow of Logzip Ref:Liu,J inyang,et al."Logzip:Extracting Hidden S tructures via Iterative Clustering for Log Compression." 2019 34th IEEE/ACM International Conference onAutomated S oftware Engineering (AS E).IEEE,2019. Optimized if logs are in big sizes!
  • 9. Logs are typically stored in small blocks Log File Split Log Blocks Compressed Log Blocks Compress Time/Size-based log rolling 16KB/60KB 128KB 256KB 384KB ~ 1024KB 64KB 64KB
  • 10. Logzip does not perform well on small log blocks. The compression ratios of Logzip are 4% to 98% (by a median of 63%) of the compression ratio without it. • Do not have enough data to accurately extract template • Not enough repetitiveness • Preprocessing largely impact speed (up to 42s to compress a 128KB log block) • Inter-file repetitiveness not used
  • 11. Initial investigation on log data • T1: Identical tokens: Tokens with the same information (e.g., Year component). We observe 4 types of repetitiveness from the non- content part of our selected log data. • T2: Similar numeric tokens: Long & numeric tokens (e.g., Timestamp). • T3: Repetitive tokens: Few tokens repeating a lot. (e.g., Log level) • T4: Tokens with common prefix string: Tokens start with the same information (e.g., Module). H1: Extract identical tokens: Extract the identical token and its number of occurrences. H2: Delta encoding for numbers: Save the delta between the current token and its prior token (first token preserved). H3: Build dictionary for repetitive tokens: Build a dictionary for each identical token and replace tokens with their indexes. H4: Extract common prefix string: Save the prefix string and store the remaining part of each token.
  • 12. Design of our preprocessing approach: LogBlock We do not perform extra information reduction steps to log content part for compression performance concern.
  • 13. An example of preprocessing heuristics LogBlock’s preprocessing example
  • 14. Our approach improves the compression ratio by a median of 5%, 9%, 15% and 21% on 16KB, 32KB, 64KB, and 128KB blocks in comparison to compression without any preprocessing. LogBlock improves the compression ratio on small log blocks Our approach is 31.0 to 50.1 times faster than Logzip in preprocessing and compressing small-sized log blocks.
  • 16. Log Parsing 16 logInfo("Found block $blockId locally") 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally Timestamp: 17/06/09 20:11:11; Level: INFO Logger: storage.BlockManager Static template: Found block <*> locally Dynamic variable(s): rdd_42_20 Generate Contain
  • 17. Automated log parsing suffers from low efficiency 17 Efficiency is an important concern for log parsing
  • 18. The main idea of Logram. Raw log (unstructured): Found block rdd_42_20 locally; Found block rdd_42_22 locally; Found block rdd_42_23 locally; Found block rdd_42_24 locally.
  • 19. The main idea of Logram. Raw log (unstructured): Found block rdd_42_20 locally; Found block rdd_42_22 locally; Found block rdd_42_23 locally; Found block rdd_42_24 locally. Labels: static; dynamic.
  • 20. The main idea of Logram. Raw log (unstructured): Found block rdd_42_20 locally; Found block rdd_42_22 locally; Found block rdd_42_23 locally; Found block rdd_42_24 locally. The goal of log parsing is to identify whether each token is static or dynamic. Static tokens appear more often: the token “Found” appears 4 times. Dynamic tokens appear less often: the token “rdd_42_20” appears only once.
  • 21. The main idea of Logram. Raw log (unstructured): Found block rdd_42_20 locally; Found block rdd_42_22 locally; Found block rdd_42_23 locally; Found block rdd_42_24 locally. Static tokens appear more often: the token “Found” appears 4 times. Dynamic tokens appear less often: the token “rdd_42_20” appears only once. We use the number of appearances to distinguish static from dynamic tokens.
  • 22. The main idea of Logram. Raw log (unstructured): Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]; Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]. A dynamic token may also appear frequently.
  • 23. The main idea of Logram. Raw log (unstructured): Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]; Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]. A dynamic token may also appear frequently. If we consider 3-grams instead of individual tokens, each 3-gram appears only once.
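The frequency idea from slides 18 to 22, and its pitfall, can be illustrated with a simple token counter. This is a sketch under two assumptions that are not stated in the slides: messages are tokenized on whitespace, and the bracketed error code is the dynamic part of the second example.

```python
from collections import Counter


def token_counts(messages):
    """Count how often each whitespace-separated token appears."""
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts


spark_counts = token_counts([
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
])
# Static tokens are frequent, dynamic tokens are rare.
print(spark_counts["Found"], spark_counts["rdd_42_20"])

cbs_counts = token_counts([
    "Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]",
    "Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]",
])
# Pitfall: the (assumed dynamic) error code is more frequent than static words.
print(cbs_counts["[0x800f080d"], cbs_counts["Expecting"])
```

In the second example the error code is counted twice while words such as “Expecting” are counted once, so a rule based on single-token frequency alone would misclassify them.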
  • 24. Step 1: Dictionary Setup for n-grams. Raw logs: 17/06/09 20:10:46 INFO rdd.HadoopRDD: Input split: hdfs://hostname/2kSOSP.log:29168+7292; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_22 locally; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_23 locally; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_24 locally.
  • 25. Step 1: Dictionary Setup for n-grams. Raw logs: 17/06/09 20:10:46 INFO rdd.HadoopRDD: Input split: hdfs://hostname/2kSOSP.log:29168+7292; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_22 locally; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_23 locally; 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_24 locally. Each log is split into Header and Content. Content: Input split: hdfs://hostname/2kSOSP.log:29168+7292; Found block rdd_42_20 locally; Found block rdd_42_22 locally; Found block rdd_42_23 locally; Found block rdd_42_24 locally.
  • 26. Step 1: Dictionary Setup for n-grams. Found block rdd_42_20 locally.
  • 27. Step 1: Dictionary Setup for n-grams. Found block rdd_42_20 locally. Generate 2-grams: “Found block”; “block rdd_42_20”.
  • 28. Step 1: Dictionary Setup for n-grams. Found block rdd_42_20 locally. Generate 3-grams: “Found block rdd_42_20”; “block rdd_42_20 locally”. Generate 2-grams: “Found block”; “block rdd_42_20”.
  • 29. Step 1: Dictionary Setup for n-grams. 3-grams (# appearances): “Input split: hdfs://hostname/2kSOSP.log:21876+7292” (1); “split: hdfs://hostname/2kSOSP.log:21876+7292 Input” (1); “hdfs://hostname/2kSOSP.log:21876+7292 Input split:” (1); …; “split: hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “hdfs://hostname/2kSOSP.log:29168+7292 Found block” (1); “Found block rdd_42_20” (1); “block rdd_42_20 locally” (1); “rdd_42_20 locally Found” (1); “locally Found block” (3); ... 2-grams (# appearances): “Input split:” (5); “split: hdfs://hostname/2kSOSP.log:21876+7292” (1); “hdfs://hostname/2kSOSP.log:21876+7292 Input” (1); …; “hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “Found block” (4); “block rdd_42_20” (1); “rdd_42_20 locally” (1); “locally Found” (4); ...
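Step 1 can be sketched as counting n-grams over the log contents. The function name `build_ngram_dictionary` is hypothetical, and treating consecutive messages as one token stream is an assumption inferred from dictionary entries such as “locally Found block”, which span two messages. The counts below come only from the five content lines shown on slide 25, so they differ from the slide's dictionary, which appears to be built from a longer log.

```python
from collections import Counter


def build_ngram_dictionary(messages, n):
    """Count every n-gram in the token stream formed by consecutive messages."""
    stream = [token for message in messages for token in message.split()]
    return Counter(tuple(stream[i:i + n]) for i in range(len(stream) - n + 1))


contents = [
    "Input split: hdfs://hostname/2kSOSP.log:29168+7292",
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]

two_grams = build_ngram_dictionary(contents, 2)
three_grams = build_ngram_dictionary(contents, 3)

print(two_grams[("Found", "block")])                 # frequent: static tokens only
print(two_grams[("block", "rdd_42_20")])             # rare: contains a dynamic token
print(three_grams[("Found", "block", "rdd_42_20")])  # rare: contains a dynamic token
```

A `Counter` keyed by token tuples keeps lookups constant-time, which matters because Step 2 queries these dictionaries for every log message.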
  • 30. Step 2: Parsing logs with n-gram dictionaries. Input split: hdfs://hostname/2kSOSP.log:29168+7292; Found block rdd_42_20 locally; Found block rdd_42_22 locally.
  • 31. Step 2: Parsing logs with n-gram dictionaries. Log to parse: Found block rdd_42_20 locally. Look up its 3-grams, “Found block rdd_42_20” and “block rdd_42_20 locally”. 3-grams (# appearances): “split: hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “hdfs://hostname/2kSOSP.log:29168+7292 Found block” (1); “Found block rdd_42_20” (1); “block rdd_42_20 locally” (1); “rdd_42_20 locally Found” (1); “locally Found block” (5).
  • 32. Step 2: Parsing logs with n-gram dictionaries. Log to parse: Found block rdd_42_20 locally. Look up its 3-grams, “Found block rdd_42_20” and “block rdd_42_20 locally”. 3-grams (# appearances): “split: hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “hdfs://hostname/2kSOSP.log:29168+7292 Found block” (1); “Found block rdd_42_20” (1); “block rdd_42_20 locally” (1); “rdd_42_20 locally Found” (1); “locally Found block” (5). Both 3-grams may contain dynamic values, since each appears only once.
  • 33. Step 2: Parsing logs with n-gram dictionaries. Look up. 3-grams (# appearances): “split: hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “hdfs://hostname/2kSOSP.log:29168+7292 Found block” (1); “Found block rdd_42_20” (1); “block rdd_42_20 locally” (1); “rdd_42_20 locally Found” (1); “locally Found block” (5). 2-grams (# appearances): “hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “Found block” (4); “block rdd_42_20” (1); “rdd_42_20 locally” (1); “locally Found” (4).
  • 34. Step 2: Parsing logs with n-gram dictionaries. Look up. 3-grams (# appearances): “split: hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “hdfs://hostname/2kSOSP.log:29168+7292 Found block” (1); “Found block rdd_42_20” (1); “block rdd_42_20 locally” (1); “rdd_42_20 locally Found” (1); “locally Found block” (5). 2-grams (# appearances): “hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “Found block” (4); “block rdd_42_20” (1); “rdd_42_20 locally” (1); “locally Found” (4). This 2-gram contains only static tokens.
  • 35. Step 2: Parsing logs with n-gram dictionaries. Look up. 3-grams (# appearances): “split: hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “hdfs://hostname/2kSOSP.log:29168+7292 Found block” (1); “Found block rdd_42_20” (1); “block rdd_42_20 locally” (1); “rdd_42_20 locally Found” (1); “locally Found block” (5). 2-grams (# appearances): “hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “Found block” (4); “block rdd_42_20” (1); “rdd_42_20 locally” (1); “locally Found” (4). These 2-grams may contain dynamic tokens.
  • 36. Step 2: Parsing logs with n-gram dictionaries. Finding the overlapping token of “block rdd_42_20” and “rdd_42_20 locally”. 2-grams (# appearances): “hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “Found block” (4); “block rdd_42_20” (1); “rdd_42_20 locally” (1); “locally Found” (4).
  • 37. Step 2: Parsing logs with n-gram dictionaries. Finding the overlapping token of “block rdd_42_20” and “rdd_42_20 locally”: the dynamic value. 2-grams (# appearances): “hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “Found block” (4); “block rdd_42_20” (1); “rdd_42_20 locally” (1); “locally Found” (4).
  • 38. Step 2: Parsing logs with n-gram dictionaries. Finding the overlapping token of “block rdd_42_20” and “rdd_42_20 locally”: the dynamic value. Generating template: Found block $1 locally, with $1=rdd_42_20. 2-grams (# appearances): “hdfs://hostname/2kSOSP.log:29168+7292 Found” (1); “Found block” (4); “block rdd_42_20” (1); “rdd_42_20 locally” (1); “locally Found” (4).
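The lookup, overlap, and template steps of Step 2 can be sketched as follows. The dictionary values are copied from the slides; the function name `parse`, the fixed threshold of 2, and the restriction to n-grams inside a single message are assumptions made for this illustration. In particular, a dynamic token at the very start or end of a message is not detected by this simplified version.

```python
# Dictionary entries as shown on the Step 2 slides.
DICT3 = {
    ("Found", "block", "rdd_42_20"): 1,
    ("block", "rdd_42_20", "locally"): 1,
    ("rdd_42_20", "locally", "Found"): 1,
    ("locally", "Found", "block"): 5,
}
DICT2 = {
    ("Found", "block"): 4,
    ("block", "rdd_42_20"): 1,
    ("rdd_42_20", "locally"): 1,
    ("locally", "Found"): 4,
}


def parse(message, dict3, dict2, threshold=2):
    """Return (template, variables) for one log message.

    threshold is a hypothetical cut-off: n-grams seen fewer times than this
    are treated as possibly containing a dynamic value.
    """
    tokens = message.split()
    # Start indexes of 3-grams that are rare in the 3-gram dictionary.
    rare3 = [i for i in range(len(tokens) - 2)
             if dict3.get(tuple(tokens[i:i + 3]), 0) < threshold]
    # Break each rare 3-gram into its two 2-grams and keep the rare ones.
    rare2 = {j for i in rare3 for j in (i, i + 1)
             if dict2.get(tuple(tokens[j:j + 2]), 0) < threshold}
    # A token shared by two overlapping rare 2-grams is marked as dynamic.
    dynamic = {j + 1 for j in rare2 if j + 1 in rare2}

    template, variables = [], {}
    for index, token in enumerate(tokens):
        if index in dynamic:
            name = f"${len(variables) + 1}"
            variables[name] = token
            template.append(name)
        else:
            template.append(token)
    return " ".join(template), variables


result = parse("Found block rdd_42_20 locally", DICT3, DICT2)
print(result)
```

Working with token positions rather than token strings avoids marking every occurrence of a repeated word as dynamic when only one position overlaps.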
  • 40. [Figure: average accuracy (percentage, %) of Drain, AEL, Lenma, Spell, IPLoM, and Logram.] Logram achieves stable parsing results using a dictionary generated from a small portion of log data. Logram achieves near-linear scalability without sacrificing parsing accuracy.

Editor's Notes

  1. Introduce myself and the topic. Title large.
  2. Field workloads continually change as the user base changes (e.g., as more users use the system), as user feature preferences change (e.g., preferences change from desktop to mobile access), as features are activated or disabled, and as the deployment configuration changes (e.g., new servers are added). Changing field workloads may have a major impact on the performance of the system. Therefore, as the field workloads change, so must the load test workloads.
  3. Shortcoming: query efficiency. Scenario 2: storing logs that need to be queried frequently.
  4. Sort: breaks the inner similarity of other components; recording line numbers introduces extra non-repetitive information. Text replacement: needs extra processing steps, which slow down processing.
  5. The goal of log parsing is to extract the static template, the dynamic variables, and the header information from a raw log message into a structured format.
  6. As the size of logs grows rapidly and the need for low-latency log analysis increases, efficiency becomes an important concern for log parsing. To increase efficiency, we propose Logram, which uses an n-gram model for log parsing.
  7. Here are some raw logs from a software system
  8. The tokens in the blue box are the static parts; the tokens in the red box are the dynamic parts.
  9. Just like the example in this slide
  10. However, a dynamic token can appear frequently in different log events. Here we show an example in which depending only on the frequency of individual tokens is not sufficient.
  11. We use an n-gram model to limit the appearance of this kind of dynamic token.
  12. At the beginning, we need to preprocess the raw logs
  13. The header part is followed by the content part. Because the header parts of logs often follow a common format within the same software system, we can directly use a predefined regular expression to extract the header.
  14. After getting the content of each log message, we split the log message into tokens.
  15. After getting the content of each log message, we split the log message into tokens and then combine the tokens into 2-grams.
  16. We use the same method to generate 3-grams.
  17. The value is the corresponding number of appearances.
  18. Transform the log into 3-grams and look up their numbers of appearances in the 3-gram dictionary.
  19. Then, we can find
  20. Transform the candidate 3-grams into 2-grams and look up their numbers of appearances in the 2-gram dictionary.
  21. Find the overlapping part of the candidate 2-grams
  22. Find the overlapping part of the candidate 2-grams
  23. Find the overlapping part of the candidate 2-grams
  24. We evaluate Logram on 16 datasets from LogPai along 4 aspects.