Weiyi Shang
http://users.encs.concordia.ca/~shang/
Improving State-of-the-art Log Analytics Infrastructures
Software Engineering for Ultra-large-scale Systems
Amazon’s massive AWS outage on Feb. 28, 2017 took more than 4 hours to recover from.
Logs are one of the only sources of information available to operators and developers.
[Figure: the logging life cycle. Labels: Make logging decisions; Release; Produce at run-time; Log processing apps (RegEx [^…]); Report; System issues. Open questions: How to make logging statements? How to analyze logs?]
Logging infrastructure
[Figure: the same logging life cycle, now highlighting two logging-infrastructure components: Compression and Parsing.]
Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression
[Figure: Workflow of Logzip]
Ref: Liu, Jinyang, et al. "Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression." 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019.
Optimized for large log files!
Logs are typically stored in small blocks
[Figure: a log file is split into log blocks by time/size-based log rolling, and the log blocks are then compressed. Example block sizes shown: 16KB/60KB, 64KB, 128KB, 256KB, 384KB ~ 1024KB.]
Logzip does not perform well on small log blocks.
On small blocks, the compression ratio with Logzip is only 4% to 98% (median: 63%) of the compression ratio achieved without it.
• Not enough data to accurately extract templates
• Not enough repetitiveness
• Preprocessing heavily impacts speed (up to 42s to compress a 128KB log block)
• Inter-file repetitiveness is not used
Initial investigation on log data
We observe 4 types of repetitiveness in the non-content part of our selected log data:
• T1: Identical tokens: tokens carrying the same information (e.g., the year component).
• T2: Similar numeric tokens: long numeric tokens (e.g., timestamps).
• T3: Repetitive tokens: a few distinct tokens that repeat many times (e.g., log level).
• T4: Tokens with a common prefix string: tokens that start with the same information (e.g., module).
H1: Extract identical tokens: store the identical token once, together with its number of occurrences.
H2: Delta encoding for numbers: store the delta between the current token and its prior token (the first token is preserved).
H3: Build a dictionary for repetitive tokens: build a dictionary of the distinct tokens and replace each token with its index.
H4: Extract common prefix string: store the prefix string once and keep only the remaining part of each token.
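A minimal Python sketch of the four heuristics H1 to H4, each applied to one column of header tokens. The function names, the column-wise layout, and the sample columns (including the millisecond values and the logger `storage.MemoryStore`) are hypothetical illustrations, not LogBlock's actual implementation; how LogBlock splits headers into columns and serializes the results is not shown here.

```python
from __future__ import annotations

from os.path import commonprefix


def extract_identical(column: list[str]) -> tuple[str, int] | None:
    """H1: if every token in the column is the same, keep one copy plus its count."""
    if column and all(tok == column[0] for tok in column):
        return column[0], len(column)
    return None


def delta_encode(column: list[int]) -> list[int]:
    """H2: keep the first number, then store each value's difference from the previous one."""
    if not column:
        return []
    return [column[0]] + [cur - prev for prev, cur in zip(column, column[1:])]


def build_dictionary(column: list[str]) -> tuple[list[str], list[int]]:
    """H3: assign each distinct token an index and replace every token by its index."""
    vocab: dict[str, int] = {}
    indexes = [vocab.setdefault(tok, len(vocab)) for tok in column]
    return list(vocab), indexes


def extract_prefix(column: list[str]) -> tuple[str, list[str]]:
    """H4: store the shared prefix once and keep only each token's remainder."""
    prefix = commonprefix(column)
    return prefix, [tok[len(prefix):] for tok in column]


# Hypothetical header columns, for illustration only.
years = ["17", "17", "17"]
millis = [1000, 1003, 1010]
levels = ["INFO", "INFO", "WARN", "INFO"]
loggers = ["storage.BlockManager", "storage.MemoryStore"]
```

Each heuristic turns a column into something smaller or more repetitive (a single value, small deltas, small integers, short suffixes), which is the kind of input a general-purpose compressor handles well.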
Design of our preprocessing approach: LogBlock
We do not apply extra information-reduction steps to the log content part, for compression-performance reasons.
An example of preprocessing heuristics
LogBlock’s preprocessing example
LogBlock improves the compression ratio on small log blocks
Our approach improves the compression ratio by a median of 5%, 9%, 15%, and 21% on 16KB, 32KB, 64KB, and 128KB blocks, compared with compression without any preprocessing.
Our approach is 31.0 to 50.1 times faster than Logzip in preprocessing and compressing small log blocks.
Log Parsing
Logging statement: logInfo("Found block $blockId locally")
Generated log line: 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally
The log line contains:
Timestamp: 17/06/09 20:11:11; Level: INFO
Logger: storage.BlockManager
Static template: Found block <*> locally
Dynamic variable(s): rdd_42_20
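A minimal sketch of separating the header fields from the content with a regular expression, assuming the Spark-style header format of the example line (date, time, level, logger). The pattern and the name `parse_header` are illustrative assumptions; logs from other systems need their own header format.

```python
from __future__ import annotations

import re

# Assumed Spark-style header: "yy/MM/dd HH:mm:ss LEVEL logger: content".
HEADER = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>[\w.$]+): "
    r"(?P<content>.*)$"
)


def parse_header(line: str) -> dict[str, str] | None:
    """Split one log line into header fields and free-text content, or return None."""
    match = HEADER.match(line)
    return match.groupdict() if match else None


fields = parse_header(
    "17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally"
)
```

The header is easy to extract with a fixed pattern; the hard part of log parsing is the content (here “Found block rdd_42_20 locally”), which must be split into a static template and dynamic variables.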
Automated log parsing suffers from low efficiency
Efficiency is an important concern for log parsing.
The main idea of Logram
Raw log (unstructured):
Found block rdd_42_20 locally
Found block rdd_42_22 locally
Found block rdd_42_23 locally
Found block rdd_42_24 locally
The goal of log parsing is to identify whether each token is static or dynamic: here “Found”, “block”, and “locally” are static tokens, while rdd_42_20, rdd_42_22, rdd_42_23, and rdd_42_24 are dynamic tokens.
• Each static token appears many times: the token “Found” appears 4 times.
• Each dynamic token appears few times: the token “rdd_42_20” appears only once.
We use the number of appearances to distinguish static tokens from dynamic tokens.
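A minimal sketch of the frequency idea on the four lines above. Whitespace tokenization and the cut-off of 2 appearances are assumptions chosen for illustration, not Logram's settings.

```python
from collections import Counter

lines = [
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]

# How often each whitespace-separated token appears across all lines.
token_counts = Counter(tok for line in lines for tok in line.split())

# Hypothetical cut-off: tokens seen at least this many times are treated as static.
THRESHOLD = 2


def classify(token: str) -> str:
    return "static" if token_counts[token] >= THRESHOLD else "dynamic"


# Replace dynamic tokens with a wildcard to recover each line's template.
templates = [
    " ".join(tok if classify(tok) == "static" else "<*>" for tok in line.split())
    for line in lines
]
```

All four lines collapse to the template “Found block <*> locally”. Counting single tokens breaks down, however, when a dynamic value itself repeats across lines.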
The main idea of Logram
Raw log (unstructured):
Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]
Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]
A dynamic token may also appear frequently: e.g., 0x800f080d appears in both lines.
If we consider 3-grams instead of individual tokens, the 3-grams that combine the dynamic value with its preceding tokens (e.g., “name [0x800f080d -” and “element [0x800f080d -”) each appear only once.
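A small sketch that counts single tokens and 3-grams for the two lines above, assuming plain whitespace tokenization (so the bracket stays attached, as in “[0x800f080d”). The helper name `ngrams` is hypothetical.

```python
from __future__ import annotations

from collections import Counter

lines = [
    "Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]",
    "Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]",
]


def ngrams(tokens: list[str], n: int) -> list[str]:
    """All contiguous n-token windows, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


token_counts = Counter(tok for line in lines for tok in line.split())
trigram_counts = Counter(g for line in lines for g in ngrams(line.split(), 3))
```

The dynamic token “[0x800f080d” appears twice, but the 3-grams that reach into the static text before it appear once each. Under this tokenization, the 3-gram made only of the repeated bracketed part still appears twice, so the signal comes from the windows that overlap the neighboring static tokens.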
Step 1: Dictionary Setup for n-grams
17/06/09 20:10:46 INFO rdd.HadoopRDD: Input split: hdfs://hostname/2kSOSP.log:29168+7292
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_22 locally
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_23 locally
17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_24 locally
Each log line is split into a header and its content; the content is used to build the n-gram dictionaries:
Input split: hdfs://hostname/2kSOSP.log:29168+7292
Found block rdd_42_20 locally
Found block rdd_42_22 locally
Found block rdd_42_23 locally
Found block rdd_42_24 locally
Generating n-grams from the line “Found block rdd_42_20 locally”:
2-grams: “Found block”, “block rdd_42_20”, “rdd_42_20 locally”
3-grams: “Found block rdd_42_20”, “block rdd_42_20 locally”
Resulting n-gram dictionaries (excerpt):
3-gram | # appearances
Input split: hdfs://hostname/2kSOSP.log:21876+7292 | 1
split: hdfs://hostname/2kSOSP.log:21876+7292 Input | 1
hdfs://hostname/2kSOSP.log:21876+7292 Input split: | 1
… |
split: hdfs://hostname/2kSOSP.log:29168+7292 Found | 1
hdfs://hostname/2kSOSP.log:29168+7292 Found block | 1
Found block rdd_42_20 | 1
block rdd_42_20 locally | 1
rdd_42_20 locally Found | 1
locally Found block | 3
... |

2-gram | # appearances
Input split: | 5
split: hdfs://hostname/2kSOSP.log:21876+7292 | 1
hdfs://hostname/2kSOSP.log:21876+7292 Input | 1
… |
hdfs://hostname/2kSOSP.log:29168+7292 Found | 1
Found block | 4
block rdd_42_20 | 1
rdd_42_20 locally | 1
locally Found | 4
... |
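The tables include n-grams that span adjacent lines (e.g., “rdd_42_20 locally Found”), so a minimal sketch builds the dictionaries over the concatenated token stream of all log contents. Treating the contents as one stream is an assumption inferred from those table entries, and the names `ngrams` and `build_dictionaries` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """All contiguous n-token windows, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_dictionaries(contents: list[str]) -> tuple[Counter, Counter]:
    """Count 2-grams and 3-grams over the token stream of all log contents.

    Windows may cross line boundaries (e.g., "locally Found block").
    """
    stream = [tok for content in contents for tok in content.split()]
    return Counter(ngrams(stream, 2)), Counter(ngrams(stream, 3))


contents = [
    "Input split: hdfs://hostname/2kSOSP.log:29168+7292",
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]
bigrams, trigrams = build_dictionaries(contents)
```

On this five-line excerpt some counts differ from the tables above (for example, “Input split:” appears once here but 5 times there), so the tables evidently come from a larger portion of the log.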
Step 2: Parsing logs with n-gram dictionaries
3-gram | # appearances
split: hdfs://hostname/2kSOSP.log:29168+7292 Found | 1
hdfs://hostname/2kSOSP.log:29168+7292 Found block | 1
Found block rdd_42_20 | 1
block rdd_42_20 locally | 1
rdd_42_20 locally Found | 1
locally Found block | 5
To parse the line “Found block rdd_42_20 locally”, we first look up its 3-grams, “Found block rdd_42_20” and “block rdd_42_20 locally”, in the 3-gram dictionary.
Both 3-grams may contain dynamic values, since each appears only once.
Next, we look up the 2-grams of the line in the 2-gram dictionary:
2-gram | # appearances
hdfs://hostname/2kSOSP.log:29168+7292 Found | 1
Found block | 4
block rdd_42_20 | 1
rdd_42_20 locally | 1
locally Found | 4
The 2-gram “Found block” (4 appearances) contains only static tokens.
The 2-grams “block rdd_42_20” and “rdd_42_20 locally” (1 appearance each) may contain dynamic tokens.
Finding the overlapping token: the token shared by “block rdd_42_20” and “rdd_42_20 locally” is rdd_42_20, which is identified as the dynamic value.
Generating the template: Found block $1 locally, where $1=rdd_42_20.
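A simplified Python sketch of the Step 2 walk-through, under stated assumptions: whitespace tokenization, a hypothetical threshold of 2 appearances for both dictionaries, and only the “overlapping rare 2-grams” rule shown on the slides. The function name `parse` and the threshold are illustrative; Logram's handling of cases the slides do not cover (for example, a dynamic token at the start or end of a line, which has no overlapping partner) is not reproduced here.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical threshold: an n-gram seen fewer times than this may contain dynamic tokens.
THRESHOLD = 2


def parse(content: str, bigrams: Counter, trigrams: Counter) -> tuple[str, list[str]]:
    """Flag rare 3-grams, check their 2-grams, and mark the token shared by
    two overlapping rare 2-grams as dynamic."""
    tokens = content.split()
    rare_bigram_starts: set[int] = set()
    for i in range(len(tokens) - 2):
        if trigrams[" ".join(tokens[i:i + 3])] >= THRESHOLD:
            continue  # frequent 3-gram: its tokens are treated as static
        for j in (i, i + 1):  # the two 2-grams inside this rare 3-gram
            if bigrams[" ".join(tokens[j:j + 2])] < THRESHOLD:
                rare_bigram_starts.add(j)
    # 2-grams starting at j and j + 1 overlap on token j + 1.
    dynamic = {j + 1 for j in rare_bigram_starts if j + 1 in rare_bigram_starts}

    template, variables = [], []
    for k, tok in enumerate(tokens):
        if k in dynamic:
            variables.append(tok)
            template.append(f"${len(variables)}")
        else:
            template.append(tok)
    return " ".join(template), variables


# Dictionary entries copied from the Step 2 tables above.
bigrams = Counter({"Found block": 4, "block rdd_42_20": 1,
                   "rdd_42_20 locally": 1, "locally Found": 4})
trigrams = Counter({"Found block rdd_42_20": 1, "block rdd_42_20 locally": 1,
                    "locally Found block": 5})
template, variables = parse("Found block rdd_42_20 locally", bigrams, trigrams)
```

Because `Counter` returns 0 for unseen n-grams, anything missing from the dictionaries is treated as rare; with the table entries above, the sketch reproduces the slide's result, “Found block $1 locally” with $1=rdd_42_20.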
Evaluation
Criteria: Accuracy, Efficiency, Stabilisation, Scalability
[Figure: average accuracy (percentage, %) of Drain, AEL, LenMa, Spell, IPLoM, and Logram.]
Logram achieves stable parsing results using a dictionary generated from a small portion of the log data.
Logram achieves near-linear scalability without sacrificing parsing accuracy.
http://users.encs.concordia.ca/~shang/
Weiyi Shang

More Related Content

Similar to SEMLA_logging_infra

Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in D
Mithun Hunsur
 
Fantastic caches and where to find them
Fantastic caches and where to find themFantastic caches and where to find them
Fantastic caches and where to find them
Alexey Tokar
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
Itamar Haber
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
Pavel Klimiankou
 
Hierarchical free monads and software design in fp
Hierarchical free monads and software design in fpHierarchical free monads and software design in fp
Hierarchical free monads and software design in fp
Alexander Granin
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time stream
Roberto Franchini
 
Redis - for duplicate detection on real time stream
Redis - for duplicate detection on real time streamRedis - for duplicate detection on real time stream
Redis - for duplicate detection on real time stream
Codemotion
 
Swug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainathSwug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainath
Dennis Chung
 
Building Your First App with Shawn Mcarthy
Building Your First App with Shawn Mcarthy Building Your First App with Shawn Mcarthy
Building Your First App with Shawn Mcarthy
MongoDB
 
From logs to metrics
From logs to metricsFrom logs to metrics
From logs to metrics
Leonardo Di Donato
 
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
Felipe Prado
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
Kostas Tzoumas
 
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideBKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
Linaro
 
Troubleshooting real production problems
Troubleshooting real production problemsTroubleshooting real production problems
Troubleshooting real production problems
Tier1 app
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with Sherlock
ScyllaDB
 
Improving go-git performance
Improving go-git performanceImproving go-git performance
Improving go-git performance
source{d}
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
VMware Tanzu
 
Restfs internals
Restfs internalsRestfs internals
Restfs internals
Manfred Furuholmen
 
Get expertise with mongo db
Get expertise with mongo dbGet expertise with mongo db
Get expertise with mongo db
Amit Thakkar
 
Eric Lafortune - Fighting application size with ProGuard and beyond
Eric Lafortune - Fighting application size with ProGuard and beyondEric Lafortune - Fighting application size with ProGuard and beyond
Eric Lafortune - Fighting application size with ProGuard and beyond
GuardSquare
 

Similar to SEMLA_logging_infra (20)

Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in D
 
Fantastic caches and where to find them
Fantastic caches and where to find themFantastic caches and where to find them
Fantastic caches and where to find them
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
 
Hierarchical free monads and software design in fp
Hierarchical free monads and software design in fpHierarchical free monads and software design in fp
Hierarchical free monads and software design in fp
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time stream
 
Redis - for duplicate detection on real time stream
Redis - for duplicate detection on real time streamRedis - for duplicate detection on real time stream
Redis - for duplicate detection on real time stream
 
Swug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainathSwug July 2010 - windows debugging by sainath
Swug July 2010 - windows debugging by sainath
 
Building Your First App with Shawn Mcarthy
Building Your First App with Shawn Mcarthy Building Your First App with Shawn Mcarthy
Building Your First App with Shawn Mcarthy
 
From logs to metrics
From logs to metricsFrom logs to metrics
From logs to metrics
 
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
DEF CON 27 - workshop - ISAAC EVANS - discover exploit and eradicate entire v...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideBKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
 
Troubleshooting real production problems
Troubleshooting real production problemsTroubleshooting real production problems
Troubleshooting real production problems
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with Sherlock
 
Improving go-git performance
Improving go-git performanceImproving go-git performance
Improving go-git performance
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
 
Restfs internals
Restfs internalsRestfs internals
Restfs internals
 
Get expertise with mongo db
Get expertise with mongo dbGet expertise with mongo db
Get expertise with mongo db
 
Eric Lafortune - Fighting application size with ProGuard and beyond
Eric Lafortune - Fighting application size with ProGuard and beyondEric Lafortune - Fighting application size with ProGuard and beyond
Eric Lafortune - Fighting application size with ProGuard and beyond
 

More from swy351

Msr2016 tarek
Msr2016 tarek Msr2016 tarek
Msr2016 tarek
swy351
 
MSR 2015
MSR 2015MSR 2015
MSR 2015
swy351
 
MSR 2009
MSR 2009MSR 2009
MSR 2009
swy351
 
ASE2010
ASE2010ASE2010
ASE2010
swy351
 
WCRE2011
WCRE2011WCRE2011
WCRE2011
swy351
 
ICSE2013
ICSE2013ICSE2013
ICSE2013
swy351
 
ICSE2014
ICSE2014ICSE2014
ICSE2014
swy351
 
ICSME2014
ICSME2014ICSME2014
ICSME2014
swy351
 
ICPE2015
ICPE2015ICPE2015
ICPE2015
swy351
 

More from swy351 (9)

Msr2016 tarek
Msr2016 tarek Msr2016 tarek
Msr2016 tarek
 
MSR 2015
MSR 2015MSR 2015
MSR 2015
 
MSR 2009
MSR 2009MSR 2009
MSR 2009
 
ASE2010
ASE2010ASE2010
ASE2010
 
WCRE2011
WCRE2011WCRE2011
WCRE2011
 
ICSE2013
ICSE2013ICSE2013
ICSE2013
 
ICSE2014
ICSE2014ICSE2014
ICSE2014
 
ICSME2014
ICSME2014ICSME2014
ICSME2014
 
ICPE2015
ICPE2015ICPE2015
ICPE2015
 

Recently uploaded

UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
Peter Muessig
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
gapen1
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
AnkitaPandya11
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 

Recently uploaded (20)

UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 

SEMLA_logging_infra

  • 2. Software Engineering for Ultra-large- scale Systems Amazon’s massive AWS outage on Feb. 28th 2017 took more than 4 hours to recover. 2
  • 3. Logs are one of the only resources of information Operator Developer 3
  • 4. 4 Make logging decisions Log processing apps System issues Report RegEx [^…] Release Produce at run-time How to analyze logs How to make loggin g statem ents?
  • 8. Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression 8 Workflow of Logzip Ref:Liu,J inyang,et al."Logzip:Extracting Hidden S tructures via Iterative Clustering for Log Compression." 2019 34th IEEE/ACM International Conference onAutomated S oftware Engineering (AS E).IEEE,2019. Optimized if logs are in big sizes!
  • 9. Logs are typically stored in small blocks Log File Split Log Blocks Compressed Log Blocks Compress Time/Size-based log rolling 16KB/60KB 128KB 256KB 384KB ~ 1024KB 64KB 64KB
  • 10. Logzip does not perform well on small log blocks. The compression ratios of Logzip are 4% to 98% (by a median of 63%) of the compression ratio without it. • Do not have enough data to accurately extract template • Not enough repetitiveness • Preprocessing largely impact speed (up to 42s to compress a 128KB log block) • Inter-file repetitiveness not used
  • 11. Initial investigation on log data • T1: Identical tokens: Tokens with the same information (e.g., Year component). We observe 4 types of repetitiveness from the non- content part of our selected log data. • T2: Similar numeric tokens: Long & numeric tokens (e.g., Timestamp). • T3: Repetitive tokens: Few tokens repeating a lot. (e.g., Log level) • T4: Tokens with common prefix string: Tokens start with the same information (e.g., Module). H1: Extract identical tokens: Extract the identical token and its number of occurrences. H2: Delta encoding for numbers: Save the delta between the current token and its prior token (first token preserved). H3: Build dictionary for repetitive tokens: Build a dictionary for each identical token and replace tokens with their indexes. H4: Extract common prefix string: Save the prefix string and store the remaining part of each token.
  • 12. Design of our preprocessing approach: LogBlock We do not perform extra information reduction steps to log content part for compression performance concern.
  • 13. An example of preprocessing heuristics LogBlock’s preprocessing example
  • 14. Our approach improves the compression ratio by a median of 5%, 9%, 15% and 21% on 16KB, 32KB, 64KB, and 128KB blocks in comparison to compression without any preprocessing. LogBlock improves the compression ratio on small log blocks Our approach is 31.0 to 50.1 times faster than Logzip in preprocessing and compressing small-sized log blocks.
  • 16. Log Parsing 16 logInfo("Found block $blockId locally") 17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally Timestamp: 17/06/09 20:11:11; Level: INFO Logger: storage.BlockManager Static template: Found block <*> locally Dynamic variable(s): rdd_42_20 Generate Contain
  • 17. Automated log parsing suffers from low efficiency 17 Efficiency is an important concern for log parsing
  • 18. The main idea of Logram 18 Raw log (Unstructured) Found block rdd_42_20 locally Found block rdd_42_22 locally Found block rdd_42_23 locally Found block rdd_42_24 locally
  • 19. 19 Raw log (Unstructured) Found block rdd_42_20 locally Found block rdd_42_22 locally Found block rdd_42_23 locally Found block rdd_42_24 locally static dynamic The main idea of Logram
  • 20. 20 Raw log (Unstructured) Found block rdd_42_20 locally Found block rdd_42_22 locally Found block rdd_42_23 locally Found block rdd_42_24 locally The goal of log parsing is to identify whether a token is a static token or a dynamic token Each static token has a higher number of appearance. Token “Found” appears 4 times. Each dynamic token has a lower number of appearance. Token “rdd_42_20” appears only once. The main idea of Logram
  • 21. The main idea of Logram. Raw log (unstructured): "Found block rdd_42_20 locally", "Found block rdd_42_22 locally", "Found block rdd_42_23 locally", "Found block rdd_42_24 locally". Static tokens appear more often: the token "Found" appears 4 times. Dynamic tokens appear less often: the token "rdd_42_20" appears only once. We use the number of appearances to distinguish static tokens from dynamic tokens.
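The single-token idea on slides 20 and 21 can be sketched as a frequency count with a cut-off. The threshold of 2 is a hypothetical value chosen for this toy input; the slides do not give one.

```python
from collections import Counter

def classify_tokens(contents, threshold):
    """Label tokens static or dynamic by how often they appear (naive sketch)."""
    counts = Counter(tok for line in contents for tok in line.split())
    static = {tok for tok, c in counts.items() if c >= threshold}
    dynamic = set(counts) - static  # tokens rarer than the threshold
    return counts, static, dynamic

LOGS = [
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]
counts, static, dynamic = classify_tokens(LOGS, threshold=2)
print(counts["Found"], counts["rdd_42_20"])  # 4 1
print(sorted(static))   # ['Found', 'block', 'locally']
print(sorted(dynamic))  # ['rdd_42_20', 'rdd_42_22', 'rdd_42_23', 'rdd_42_24']
```

On the logs of slide 22 this breaks down: the token containing 0x800f080d appears twice, so at the same threshold it would be labelled static even though it is dynamic.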
  • 22. The main idea of Logram. Raw log (unstructured): "Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]", "Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]". A dynamic token may also appear frequently: here the value 0x800f080d appears in both log lines.
  • 23. The main idea of Logram. Raw log (unstructured): "Expecting attribute name [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]", "Failed to get next element [0x800f080d - CBS_E_MANIFEST_INVALID_ITEM]". A dynamic token may also appear frequently. If we consider 3-grams instead of individual tokens, each 3-gram appears only once.
  • 24. Step 1: Dictionary Setup for n-grams. Raw logs: "17/06/09 20:10:46 INFO rdd.HadoopRDD: Input split: hdfs://hostname/2kSOSP.log:29168+7292", "17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_20 locally", "17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_22 locally", "17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_23 locally", "17/06/09 20:11:11 INFO storage.BlockManager: Found block rdd_42_24 locally".
  • 25. Step 1: Dictionary Setup for n-grams. Each raw log from slide 24 is split into a header (e.g., "17/06/09 20:10:46 INFO rdd.HadoopRDD:") and its content. Contents: "Input split: hdfs://hostname/2kSOSP.log:29168+7292", "Found block rdd_42_20 locally", "Found block rdd_42_22 locally", "Found block rdd_42_23 locally", "Found block rdd_42_24 locally".
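Because the headers in one system follow a common format, the header/content split on slide 25 can be done with a pre-defined regular expression. The pattern below is an assumption fitted to the Spark sample on slide 24 (date, time, level, logger, then content); other systems would need their own pattern.

```python
import re

# Assumed header layout for the Spark sample: date, time, level, logger, then content.
HEADER_RE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>[^:]+): (?P<content>.*)"
)

def split_header(line):
    """Return (header fields, content), or None if the line does not match."""
    match = HEADER_RE.fullmatch(line)
    if match is None:
        return None
    fields = match.groupdict()
    content = fields.pop("content")
    return fields, content

header, content = split_header(
    "17/06/09 20:10:46 INFO rdd.HadoopRDD: Input split: hdfs://hostname/2kSOSP.log:29168+7292"
)
print(header)   # {'date': '17/06/09', 'time': '20:10:46', 'level': 'INFO', 'logger': 'rdd.HadoopRDD'}
print(content)  # Input split: hdfs://hostname/2kSOSP.log:29168+7292
```

The logger group stops at the first colon, so colons inside the content (as in the HDFS path) stay in the content.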
  • 26. Step 1: Dictionary Setup for n-grams. Log content: "Found block rdd_42_20 locally".
  • 27. Step 1: Dictionary Setup for n-grams. Generate 2-grams from "Found block rdd_42_20 locally": "Found block", "block rdd_42_20".
  • 28. Step 1: Dictionary Setup for n-grams. From "Found block rdd_42_20 locally", generate 3-grams: "Found block rdd_42_20", "block rdd_42_20 locally"; and 2-grams: "Found block", "block rdd_42_20".
  • 29. Step 1: Dictionary Setup for n-grams. 3-gram dictionary (3-gram = # appearances): "Input split: hdfs://hostname/2kSOSP.log:21876+7292" = 1, "split: hdfs://hostname/2kSOSP.log:21876+7292 Input" = 1, "hdfs://hostname/2kSOSP.log:21876+7292 Input split:" = 1, …, "split: hdfs://hostname/2kSOSP.log:29168+7292 Found" = 1, "hdfs://hostname/2kSOSP.log:29168+7292 Found block" = 1, "Found block rdd_42_20" = 1, "block rdd_42_20 locally" = 1, "rdd_42_20 locally Found" = 1, "locally Found block" = 3, …. 2-gram dictionary (2-gram = # appearances): "Input split:" = 5, "split: hdfs://hostname/2kSOSP.log:21876+7292" = 1, "hdfs://hostname/2kSOSP.log:21876+7292 Input" = 1, …, "hdfs://hostname/2kSOSP.log:29168+7292 Found" = 1, "Found block" = 4, "block rdd_42_20" = 1, "rdd_42_20 locally" = 1, "locally Found" = 4, ….
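Step 1 (slides 24 to 29) can be sketched as counting 2-grams and 3-grams over the log contents. One assumption is worth flagging: the dictionaries on slide 29 contain n-grams such as "locally Found block" that cross a line boundary, so this sketch treats the tokens of consecutive log lines as a single stream. The helper names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token windows, each joined back into a string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_dictionaries(contents):
    """Count 2-grams and 3-grams over log contents (headers already removed).

    Assumption: consecutive log lines form one token stream, so n-grams can
    cross a line boundary, like "locally Found block" in the slide's dictionary.
    """
    stream = [tok for content in contents for tok in content.split()]
    return Counter(ngrams(stream, 2)), Counter(ngrams(stream, 3))

contents = [
    "Input split: hdfs://hostname/2kSOSP.log:29168+7292",
    "Found block rdd_42_20 locally",
    "Found block rdd_42_22 locally",
    "Found block rdd_42_23 locally",
    "Found block rdd_42_24 locally",
]
dict2, dict3 = build_dictionaries(contents)
print(ngrams("Found block rdd_42_20 locally".split(), 3))
# ['Found block rdd_42_20', 'block rdd_42_20 locally']
print(dict2["Found block"], dict2["locally Found"])  # 4 3
print(dict3["Found block rdd_42_20"], dict3["locally Found block"])  # 1 3
```

The slide's dictionaries also include other log lines (e.g., the 21876+7292 split), so their counts are not expected to match this toy input exactly.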
  • 30. Step 2: Parsing logs with n-gram dictionaries. Logs to parse: "Input split: hdfs://hostname/2kSOSP.log:29168+7292", "Found block rdd_42_20 locally", "Found block rdd_42_22 locally".
  • 31. Step 2: Parsing logs with n-gram dictionaries. The log "Found block rdd_42_20 locally" is split into the 3-grams "Found block rdd_42_20" and "block rdd_42_20 locally", which are looked up in the 3-gram dictionary (3-gram = # appearances): "split: hdfs://hostname/2kSOSP.log:29168+7292 Found" = 1, "hdfs://hostname/2kSOSP.log:29168+7292 Found block" = 1, "Found block rdd_42_20" = 1, "block rdd_42_20 locally" = 1, "rdd_42_20 locally Found" = 1, "locally Found block" = 5.
  • 32. Step 2: Parsing logs with n-gram dictionaries. Looking up "Found block rdd_42_20" and "block rdd_42_20 locally" in the 3-gram dictionary (as on slide 31): both 3-grams may contain dynamic values, since each appears only once.
  • 33. Step 2: Parsing logs with n-gram dictionaries. The candidate 3-grams are split into 2-grams, which are looked up in the 2-gram dictionary (2-gram = # appearances): "hdfs://hostname/2kSOSP.log:29168+7292 Found" = 1, "Found block" = 4, "block rdd_42_20" = 1, "rdd_42_20 locally" = 1, "locally Found" = 4. (3-gram dictionary as on slide 31.)
  • 34. Step 2: Parsing logs with n-gram dictionaries. Looking up the 2-grams (dictionaries as on slides 31 and 33): this 2-gram ("Found block", 4 appearances) contains only static tokens.
  • 35. Step 2: Parsing logs with n-gram dictionaries. Looking up the 2-grams (dictionaries as on slides 31 and 33): these 2-grams ("block rdd_42_20" and "rdd_42_20 locally", 1 appearance each) may contain dynamic tokens.
  • 36. Step 2: Parsing logs with n-gram dictionaries. Finding the overlapping token of the candidate 2-grams "block rdd_42_20" and "rdd_42_20 locally" (2-gram dictionary as on slide 33).
  • 37. Step 2: Parsing logs with n-gram dictionaries. Finding the overlapping token of the candidate 2-grams "block rdd_42_20" and "rdd_42_20 locally": the overlapping token, rdd_42_20, is a dynamic value.
  • 38. Step 2: Parsing logs with n-gram dictionaries. Generating the template: the overlapping token rdd_42_20 of "block rdd_42_20" and "rdd_42_20 locally" is a dynamic value, so the template is "Found block $1 locally" with $1=rdd_42_20.
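Step 2 (slides 30 to 38) can be sketched end to end using the dictionary counts copied from slides 31 and 33. Assumptions, labelled as such: an n-gram counts as "rare" when it appears at most `threshold` times, with `threshold=1` taken from slide 32's "appearances are only 1"; `parse` is a hypothetical name, and this is not presented as Logram's exact algorithm.

```python
# 3-gram and 2-gram counts copied from slides 31 and 33.
DICT3 = {
    "split: hdfs://hostname/2kSOSP.log:29168+7292 Found": 1,
    "hdfs://hostname/2kSOSP.log:29168+7292 Found block": 1,
    "Found block rdd_42_20": 1,
    "block rdd_42_20 locally": 1,
    "rdd_42_20 locally Found": 1,
    "locally Found block": 5,
}
DICT2 = {
    "hdfs://hostname/2kSOSP.log:29168+7292 Found": 1,
    "Found block": 4,
    "block rdd_42_20": 1,
    "rdd_42_20 locally": 1,
    "locally Found": 4,
}

def parse(content, dict3, dict2, threshold=1):
    """Return (template, variables) for one log content string."""
    tokens = content.split()

    def gram(start, n):
        return " ".join(tokens[start:start + n])

    # Rare 3-grams (by start index) may contain dynamic tokens.
    rare3 = [i for i in range(len(tokens) - 2) if dict3.get(gram(i, 3), 0) <= threshold]
    # Split each rare 3-gram into its two 2-grams and keep the rare ones.
    rare2 = {j for i in rare3 for j in (i, i + 1) if dict2.get(gram(j, 2), 0) <= threshold}
    # The token shared by rare 2-grams starting at j-1 and j is a dynamic value.
    dynamic = {j for j in rare2 if j - 1 in rare2}

    template, variables = [], []
    for k, tok in enumerate(tokens):
        if k in dynamic:
            variables.append(tok)
            template.append(f"${len(variables)}")  # numbered placeholder
        else:
            template.append(tok)
    return " ".join(template), variables

print(parse("Found block rdd_42_20 locally", DICT3, DICT2))
# ('Found block $1 locally', ['rdd_42_20'])
```

Within a single line, the first and last tokens belong to only one 2-gram, so this sketch never flags them as dynamic; the cross-line n-grams in the slide's dictionaries (e.g., "rdd_42_20 locally Found") hint at how boundary tokens could be covered, but the sketch does not attempt that.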
  • 40. [Figure: average parsing accuracy (%) of Drain, AEL, Lenma, Spell, IPLoM and Logram] Logram achieves stable parsing results using a dictionary generated from a small portion of log data. Logram achieves near-linear scalability without sacrificing parsing accuracy.

Editor's Notes

  1. Introduce myself and the topic. Title large.
  2. Field workloads continually change as the user base changes (e.g., as more users use the system), as user feature preferences change (e.g., preferences shift from desktop to mobile access), as features are activated or disabled, and as the deployment configuration changes (e.g., new servers are added). Changing field workloads may have a major impact on the performance of the system. Therefore, as the field workload changes, so must the load-test workloads.
  3. Shortcoming: query efficiency. Scenario 2: storing logs that need to be queried frequently.
  4. Sorting breaks the inner similarity of the other components, and recording line numbers introduces extra non-repetitive information. Text replacement needs extra processing steps, which slow down processing.
  5. The goal of log parsing is to extract the static template, the dynamic variables, and the header information from a raw log message into a structured format.
  6. As the size of logs grows rapidly and the need for low-latency log analysis increases, efficiency becomes an important concern for log parsing. To increase efficiency, we propose Logram, which uses an n-gram model for log parsing.
  7. Here are some raw logs from a software system.
  8. The tokens in the blue box are the static parts; the tokens in the red box are the dynamic parts.
  9. Just like the example in this slide.
  10. However, a dynamic token can appear frequently in different log events, as the example here shows. So relying only on the frequency of individual tokens may not be sufficient.
  11. We use an n-gram model to limit the number of appearances of this kind of dynamic token.
  12. At the beginning, we need to preprocess the raw logs.
  13. Each log consists of a header followed by the content. Because the headers of logs in the same software system often follow a common format, we can directly use a pre-defined regular expression to extract the header.
  14. After getting the content of each log message, we split the log message into tokens.
  15. After getting the content of each log message, we split the log message into tokens. Then, we combine the tokens into 2-grams.
  16. We use the same method to generate 3-grams.
  17. Each value is the number of appearances of the corresponding n-gram.
  18. Transform the log into 3-grams, and look up their numbers of appearances in the 3-gram dictionary.
  19. Then, we can find that both 3-grams may contain dynamic values, since each appears only once.
  20. Transform the candidate 3-grams into 2-grams, and look up their numbers of appearances in the 2-gram dictionary.
  21. Find the overlapping part of the candidate 2-grams
  22. Find the overlapping part of the candidate 2-grams
  23. Find the overlapping part of the candidate 2-grams
  24. We evaluate Logram on 16 datasets from LogPai, along 4 aspects.