Mining Development Knowledge to Understand and Support Software Logging Practices
1. Mining Development Knowledge to Understand and Support Software Logging Practices
Heng Li
Supervisor: Dr. Ahmed E. Hassan
Software Analysis & Intelligence Lab (SAIL)
Queen’s University, Canada
2. Developers insert logging code that produces log messages at runtime
Logging code in the software system, e.g., Log.info("Stopping server on " + port);
produces log messages at runtime, e.g., 2016-07-23 17:56:16 INFO Stopping server on 8032
Log messages record valuable runtime information
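The link between the logging statement and the log line above can be sketched with the JDK alone. This is a minimal sketch, not how any particular logging framework works internally: `LogLineFormatter` is a hypothetical helper, and the `yyyy-MM-dd HH:mm:ss LEVEL message` layout is assumed from the example line. Frameworks such as Log4j or Logback make this layout configurable.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that renders one log line in the layout
// "yyyy-MM-dd HH:mm:ss LEVEL message" (layout assumed from the slide's example).
final class LogLineFormatter {
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private LogLineFormatter() {}

    static String format(LocalDateTime time, String level, String message) {
        return TIMESTAMP.format(time) + " " + level + " " + message;
    }

    public static void main(String[] args) {
        int port = 8032;
        // The logging code combines a static text with a dynamic variable.
        String message = "Stopping server on " + port;
        LocalDateTime time = LocalDateTime.of(2016, 7, 23, 17, 56, 16);
        System.out.println(format(time, "INFO", message));
        // prints: 2016-07-23 17:56:16 INFO Stopping server on 8032
    }
}
```

The static part of the message ("Stopping server on") comes from the source code, while the timestamp, level, and port value are only known at runtime. This is why log messages carry runtime information that the code alone does not.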
3. Logging is critical for software maintenance
Log messages are widely used in software maintenance efforts:
§ Diagnose failures [Yuan et al., SherLog: Error diagnosis by connecting clues from run-time logs. ASPLOS ’10]
§ Detect anomalies [Xu et al., Detecting large-scale system problems by mining console logs. SOSP ’09]
§ Understand runtime behaviors [Fu et al., Contextual analysis of program logs for understanding system behaviors. MSR ’13]
4. Developers have difficulty deciding on appropriate logging code
For example: “A lot of log noise”, “Slowing down perf by 20%”, “Missing an error log”
Developers spend a significant amount of effort maintaining their logging code
Prior work:
§ Logging practices in open source projects [Yuan et al., 2012; Chen and Jiang, 2017]
§ Logging practices in industry [Shang et al., 2014; Fu et al., 2014]
5. Development knowledge explains the development of logging code
What (source code): LOG.warn(msg);
How (change history):
- LOG.info(msg);
+ LOG.warn(msg);
Why (issue reports): To help users identify a problem
6. Thesis statement
Development knowledge (source code, change history, and issue reports) can help us understand current logging practices and develop useful tools to support such logging practices
7. Mining development knowledge to understand and support logging practices
§ Developers’ logging concerns? [TSE under review]
§ Where to log? [EMSE 2018]
§ When to update log? [EMSE 2017]
§ How to log? (e.g., log levels such as Error, Warn, Info) [EMSE 2017]
9. Developers communicate their logging concerns in issue reports
Example issue: remove a logging statement (logging cost: performance overhead)
10. Developers communicate their logging concerns in issue reports
Example issue: add a logging statement (logging benefit: exposing runtime problems)
11. We study logging-related issue reports to understand developers’ logging concerns
Logging issue reports → automated & manual filtering → qualitative analysis → logging concerns
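The automated part of the filtering step could start with a keyword match over issue titles and descriptions. Manual filtering then removes false positives. This is a minimal sketch under stated assumptions: the `Issue` record (Java 16+), the `IssueFilter` class, and the keyword pattern are all hypothetical and are not the study's actual filtering criteria.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical issue report with only the fields needed for keyword filtering.
record Issue(String key, String title, String description) {}

final class IssueFilter {
    // Assumed keyword pattern; \b keeps words like "catalog" or "blog" from matching.
    private static final Pattern LOGGING_KEYWORDS = Pattern.compile(
            "\\b(log(s|ging|ger)?|verbose)\\b", Pattern.CASE_INSENSITIVE);

    private IssueFilter() {}

    static boolean isLoggingCandidate(Issue issue) {
        String text = issue.title() + " " + issue.description();
        return LOGGING_KEYWORDS.matcher(text).find();
    }

    // Automated step only: every candidate still needs manual verification.
    static List<Issue> candidates(List<Issue> issues) {
        return issues.stream()
                .filter(IssueFilter::isLoggingCandidate)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Issue> issues = List.of(
                new Issue("A-1", "Remove noisy debug logging in RPC client", ""),
                new Issue("A-2", "NPE in catalog service", "Crash on startup"));
        candidates(issues).forEach(i -> System.out.println(i.key())); // prints A-1
    }
}
```

A keyword filter favors recall over precision. This fits a pipeline where a manual filtering pass follows and discards issues that only mention logging in passing.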
12. What are developers’ logging concerns?
Logging benefits (research opportunity: leverage them):
§ Assisting in debugging
§ Providing runtime performance data
§ Exposing runtime problems
§ Bookkeeping
§ Showing execution progress
Logging costs (research opportunity: minimize them):
§ Excessive log information
§ Exposing unnecessary details
§ Misleading end users
§ Performance overhead
§ Exposing sensitive info
13. Mining development knowledge to understand and support logging practices
§ Developers’ logging concerns? [TSE under review]: 10 categories of logging concerns (e.g., misleading users)
15. Some code topics are more likely to need logging statements [EMSE 2018]
Examples of JIRA issues that require developers to add logging to code about the topic of “connections”
16. Can code topics explain where to log? [EMSE 2018]
We extract the code topics (e.g., “connection”) and the logging statements of each code snippet (at the method level)
17. We use LDA to extract code topics [EMSE 2018]
Code snippet → tokenization (e.g., queue, connection) → topic model (LDA) → topic: “connection”
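Before LDA can be applied, each method's source code has to become a bag of words. This is a minimal sketch of one plausible tokenization step. It assumes identifiers are split on camelCase and non-letters, lowercased, and filtered against a small stop list of Java keywords. The study's exact preprocessing and the LDA step itself are not shown, and `CodeTokenizer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CodeTokenizer {
    // Assumed stop list: Java keywords that carry no topical meaning.
    private static final Set<String> STOP_WORDS = Set.of(
            "public", "private", "protected", "static", "final", "void",
            "return", "new", "if", "else", "for", "while", "try", "catch",
            "throw", "throws", "this", "null", "int", "boolean");

    private CodeTokenizer() {}

    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter (punctuation, digits, underscores).
        for (String word : code.split("[^A-Za-z]+")) {
            // Split camelCase ("closeQueue" -> "close", "Queue") and acronyms ("HTTPServer" -> "HTTP", "Server").
            for (String part : word.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
                String token = part.toLowerCase(Locale.ROOT);
                if (token.length() > 1 && !STOP_WORDS.contains(token)) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("void closeQueueConnection(Queue queue)"));
        // prints [close, queue, connection, queue, queue]
    }
}
```

The resulting token lists, one per method, would then be the documents fed to an LDA implementation, which assigns each method a distribution over topics such as “connection”.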
18. A small number of topics are much more likely to be logged [EMSE 2018]
The most log-intensive topics usually capture communication between machines (e.g., “connection”) or interactions between threads (e.g., “thread interruption”)
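“Log-intensive” can be made concrete by computing, for each topic, the share of its methods that contain at least one logging statement. This is a minimal sketch that assumes each method has already been assigned a single dominant topic. The study's own measure may instead weight methods by their full topic distributions. `TopicLogDensity` and the `MethodInfo` record (Java 16+) are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-method record: its dominant LDA topic and whether it contains a logging statement.
record MethodInfo(String dominantTopic, boolean hasLoggingStatement) {}

final class TopicLogDensity {
    private TopicLogDensity() {}

    // Returns, per topic, the fraction of its methods that contain a logging statement.
    static Map<String, Double> density(List<MethodInfo> methods) {
        Map<String, int[]> counts = new HashMap<>(); // topic -> {logged, total}
        for (MethodInfo m : methods) {
            int[] c = counts.computeIfAbsent(m.dominantTopic(), t -> new int[2]);
            if (m.hasLoggingStatement()) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> result = new HashMap<>();
        counts.forEach((topic, c) -> result.put(topic, (double) c[0] / c[1]));
        return result;
    }

    public static void main(String[] args) {
        List<MethodInfo> methods = List.of(
                new MethodInfo("connection", true),
                new MethodInfo("connection", false),
                new MethodInfo("string", false));
        System.out.println(density(methods).get("connection")); // prints 0.5
    }
}
```

Sorting topics by this ratio would surface the small set of topics that are much more likely to be logged than the rest.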
20. We combine both the structure and topic info to explain where to log [EMSE 2018]
Topic info (e.g., “connection”) + structure info (lines of code, complexity, control flow statements, etc.) → LASSO model → logging statement
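For reference, assume the LASSO model is an L1-penalized logistic regression. The slides name LASSO but do not spell out the formulation. Under that assumption, the coefficients minimize the negative log-likelihood plus an L1 penalty. Here x_i holds the structure and topic metrics of method i, y_i is 1 if the method contains a logging statement and 0 otherwise, and λ ≥ 0 sets the penalty strength:

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i \log \sigma(\beta_0 + x_i^{\top}\beta)
 + (1 - y_i)\log\big(1 - \sigma(\beta_0 + x_i^{\top}\beta)\big)\Big]
 + \lambda \lVert \beta \rVert_1,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Larger values of λ drive more coefficients to exactly zero. The fitted model therefore keeps only the metrics (and topics) that add explanatory power.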
21. Code topics bring additional explanatory power (up to 13% AUC improvement) [EMSE 2018]
Figure: The performance (AUC) of our LASSO models, comparing structure info with structure & topic info; a random guess has an AUC of 0.5
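AUC can be read as the probability that a randomly chosen positive (e.g., a logged method) receives a higher predicted score than a randomly chosen negative. This is why a random guess scores 0.5. The sketch below shows that pairwise (Mann-Whitney) computation, with ties counted as one half. `Auc.compute` is a hypothetical helper, and the quadratic pair loop is for clarity only; a rank-based version scales better.

```java
final class Auc {
    private Auc() {}

    // scores[i] is the model's predicted probability for item i; labels[i] is its true class.
    static double compute(double[] scores, boolean[] labels) {
        if (scores.length != labels.length) {
            throw new IllegalArgumentException("scores and labels differ in length");
        }
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!labels[i]) continue;          // i ranges over positives
            for (int j = 0; j < scores.length; j++) {
                if (labels[j]) continue;       // j ranges over negatives
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.4, 0.3};
        boolean[] labels = {true, false, true, false};
        System.out.println(compute(scores, labels)); // prints 0.75
    }
}
```

In the example, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.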
22. Mining development knowledge to understand and support logging practices
§ Where to log? [EMSE 2018]: logging varies across code topics
24. Developers have difficulty making appropriate log changes
Developers usually forget to change logging code when they change their code; in many cases, logging code is written as “after-thoughts” after a failure happens and logs are needed [Yuan et al., 2012]
Code change history: code changes in commit n, log changes only in commit n+1 → debugging difficulties (version k) and maintenance efforts
25. Learning from the code change history to provide log change suggestions [EMSE 2017]
Code change history: Commit 1 (code changes without log changes), Commit 2 (code changes with log changes), …, Commit n (code changes: do we need to change logs?)
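To learn from the change history, each commit first has to be labelled as a code change with or without log changes. This is a minimal sketch under stated assumptions. It takes unified diff lines as input and treats any added or removed line that calls a method like `LOG.info(...)` or `logger.warn(...)` as a log change. The `LogChangeDetector` name and the logging-call pattern are hypothetical, and the regex would miss project-specific logging wrappers.

```java
import java.util.List;
import java.util.regex.Pattern;

final class LogChangeDetector {
    // Assumed pattern for a logging call: a receiver named log/logger (any case)
    // followed by a level-named method, e.g. "LOG.warn(" or "logger.debug(".
    private static final Pattern LOGGING_CALL = Pattern.compile(
            "\\b(?i:log|logger)\\.(trace|debug|info|warn|error|fatal)\\s*\\(");

    private LogChangeDetector() {}

    // True if any added ("+") or removed ("-") diff line touches a logging call.
    // Lines starting with "+++" or "---" are file headers and are skipped.
    static boolean hasLogChange(List<String> unifiedDiffLines) {
        for (String line : unifiedDiffLines) {
            boolean changed = (line.startsWith("+") && !line.startsWith("+++"))
                    || (line.startsWith("-") && !line.startsWith("---"));
            if (changed && LOGGING_CALL.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> diff = List.of("-    LOG.info(msg);", "+    LOG.warn(msg);");
        System.out.println(hasLogChange(diff)); // prints true
    }
}
```

Running such a detector over every commit splits the history into the two groups on the slide. A model can then learn which code changes tend to come with log changes.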
26. Providing automated suggestions for log changes when developers change the code [EMSE 2017]
Code → 25 metrics in three dimensions (change metrics, historical metrics, product metrics) → random forest classifier → log change suggestions (“LOG?”)
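The classifier step can be illustrated with a toy ensemble. The sketch below is not a full random forest. It bags decision stumps (one-split trees), each trained on a bootstrap sample using one randomly chosen metric, and reports the fraction of stumps voting “log change needed”. It assumes all metrics are numeric. `StumpForest` and the toy data are hypothetical, the nested record needs Java 16+, and a real study would rely on a mature random forest implementation with full-depth trees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class StumpForest {
    // One-split tree: predicts `above` if x[feature] > threshold, else `below`.
    private record Stump(int feature, double threshold, boolean above, boolean below) {
        boolean predict(double[] x) {
            return x[feature] > threshold ? above : below;
        }
    }

    private final List<Stump> stumps = new ArrayList<>();

    StumpForest(double[][] xs, boolean[] ys, int nStumps, long seed) {
        Random rnd = new Random(seed);
        int n = xs.length;
        for (int s = 0; s < nStumps; s++) {
            int[] idx = new int[n];                  // bootstrap sample: n draws with replacement
            for (int i = 0; i < n; i++) idx[i] = rnd.nextInt(n);
            int f = rnd.nextInt(xs[0].length);       // random feature selection: one metric per stump
            stumps.add(bestStump(xs, ys, idx, f));
        }
    }

    // Picks the threshold (and orientation) on feature f with the fewest errors on the sample.
    private static Stump bestStump(double[][] xs, boolean[] ys, int[] idx, int f) {
        Stump best = null;
        int bestErrors = Integer.MAX_VALUE;
        for (int t : idx) {
            for (boolean above : new boolean[] {true, false}) {
                Stump candidate = new Stump(f, xs[t][f], above, !above);
                int errors = 0;
                for (int i : idx) {
                    if (candidate.predict(xs[i]) != ys[i]) errors++;
                }
                if (errors < bestErrors) {
                    bestErrors = errors;
                    best = candidate;
                }
            }
        }
        return best;
    }

    // Fraction of stumps voting true (here: "a log change is needed").
    double probability(double[] x) {
        int votes = 0;
        for (Stump s : stumps) {
            if (s.predict(x)) votes++;
        }
        return (double) votes / stumps.size();
    }

    public static void main(String[] args) {
        // Toy data: two hypothetical metrics per code change; true = the change came with a log change.
        double[][] xs = {{1, 10}, {2, 20}, {3, 30}, {4, 40}, {6, 60}, {7, 70}, {8, 80}, {9, 90}};
        boolean[] ys = {false, false, false, false, true, true, true, true};
        StumpForest forest = new StumpForest(xs, ys, 25, 42L);
        System.out.println(forest.probability(new double[] {9, 90}));
    }
}
```

The two ingredients that give random forests their name, bootstrap sampling and random feature selection, are both visible here. The vote fraction doubles as a score that an AUC computation can consume.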
27. Our models can effectively suggest whether a log change is needed [EMSE 2017]
Figure: The performance (AUC) of our random forest models (AUC values: 0.84, 0.91, 0.86, 0.88; a random guess has an AUC of 0.5)
28. The source code and code changes are important for explaining log changes
Explaining log change suggestions through the 25 metrics along three dimensions: change metrics, historical metrics, and product metrics
[EMSE 2017]
29. Mining development knowledge to understand and support logging practices
Developers' logging concerns? [TSE under review]
Where to log? [EMSE 2018]
When to update log? [EMSE 2017]: the source code & code changes can explain log changes
How to log? (log levels: Error, Warn, Info) [EMSE 2017]
31. Log levels are used to disable some verbose log messages while enabling important ones
Log levels, from more verbose (lower levels) to less verbose (higher levels): Trace, Debug, Info, Warn, Error, Fatal
In Log.error("message"), the log level is Error
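The mechanism on this slide is a threshold comparison over ordered levels. Once a logger is configured with a threshold, messages at more verbose (lower) levels are dropped and messages at or above it are kept. The following is a minimal sketch using the six levels from the slide. `Level` and `LevelFilteringLogger` are hypothetical names, not the API of any particular logging framework.

```python
from enum import IntEnum


class Level(IntEnum):
    """Log levels ordered from most verbose (lowest) to least verbose (highest)."""
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARN = 3
    ERROR = 4
    FATAL = 5


class LevelFilteringLogger:
    """Hypothetical minimal logger that keeps only messages at or above a threshold level."""

    def __init__(self, threshold: Level) -> None:
        self.threshold = threshold
        self.records: list[str] = []

    def log(self, level: Level, message: str) -> None:
        # Messages below the threshold (more verbose levels) are dropped.
        if level >= self.threshold:
            self.records.append(f"{level.name} {message}")

    def debug(self, message: str) -> None:
        self.log(Level.DEBUG, message)

    def info(self, message: str) -> None:
        self.log(Level.INFO, message)

    def warn(self, message: str) -> None:
        self.log(Level.WARN, message)

    def error(self, message: str) -> None:
        self.log(Level.ERROR, message)
```

With `LevelFilteringLogger(Level.WARN)`, calls to `debug` and `info` are silently discarded, while `warn` and `error` are recorded. A statement logged at too low a level therefore disappears in production, and one logged at too high a level adds noise. This is the trade-off behind the next slide.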
32. Improper log levels can have many negative impacts
"…tends to generate a lot of log noise…"
"These warnings worry users"
Developers spend much effort adjusting log levels [Yuan et al., 2012]
33. Learning from the code change history to provide log level suggestions
[EMSE 2017]
Code change history (Commit 1, Commit 2, ..., Commit n) contains logging statements such as Log.warn(msg), Log.info(msg), and Log.error(msg)
For a new logging statement Log.?(msg): which log level to use?
34. Providing automated suggestions for log levels when developers add logging code
An ordinal regression model uses five groups of metrics (logging statement, containing block, containing file, code change, and historical change metrics) to suggest one of the ordered log levels: Trace, Debug, Info, Warn, Error, Fatal
[EMSE 2017]
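Ordinal regression fits here because log levels are ordered categories. A common form is the cumulative-logit (proportional odds) model. A linear score η combines the metrics, and increasing thresholds θ₁ < … < θ₅ split it into the six levels, with P(level ≤ k) = σ(θₖ − η). Below is a minimal prediction-side sketch that assumes already-fitted weights and thresholds. The function names and the toy numbers are hypothetical, and the fitted model of [EMSE 2017] is not reproduced.

```python
import math

# Ordered log levels from the slides, lowest (most verbose) first.
LEVELS = ["trace", "debug", "info", "warn", "error", "fatal"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def level_probabilities(eta: float, thresholds: list[float]) -> dict[str, float]:
    """Cumulative-logit (proportional odds) model: P(level <= k) = sigmoid(theta_k - eta)."""
    if len(thresholds) != len(LEVELS) - 1:
        raise ValueError("need one threshold between each pair of adjacent levels")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [sigmoid(theta - eta) for theta in thresholds] + [1.0]
    probs, previous = {}, 0.0
    for level, c in zip(LEVELS, cumulative):
        probs[level] = c - previous  # P(level = k) = P(<= k) - P(<= k-1)
        previous = c
    return probs


def suggest_level(features, weights, thresholds) -> str:
    """Suggest the most probable level; a higher score eta favors less verbose levels."""
    eta = sum(w * x for w, x in zip(weights, features))
    probs = level_probabilities(eta, thresholds)
    return max(probs, key=probs.get)
```

With thresholds `[-4.0, -2.0, 0.0, 2.0, 4.0]` and η = 1, the most probable level is `warn`. Because one score η is shared across all cut points, the model respects the ordering Trace < … < Fatal, which a plain multi-class classifier would ignore.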
35. Ordinal regression models can effectively model log levels
[Figure: The performance (AUC) of our ordinal regression models: 0.76, 0.78, 0.81, and 0.75, against a random-guess baseline (AUC = 0.5)]
[EMSE 2017]
36. The content of a logging statement and its containing block/file explain its log level
Explaining log levels (Trace, Debug, Info, Warn, Error, Fatal) through five groups of metrics: logging statement, containing block, containing file, code change, and historical change metrics
[EMSE 2017]
37. Mining development knowledge to understand and support logging practices
Developers' logging concerns? [TSE under review]
Where to log? [EMSE 2018]
When to update log? [EMSE 2017]
How to log? (log levels: Error, Warn, Info) [EMSE 2017]: the log content & containing blocks/files can explain log levels
38. Mining development knowledge to understand and support logging practices
Developers' logging concerns? [TSE under review]: 10 categories of logging concerns (e.g., misleading users)
Where to log? [EMSE 2018]: logging varies across code topics
When to update log? [EMSE 2017]: the source code & code changes can explain log changes
How to log? (log levels: Error, Warn, Info) [EMSE 2017]: the log content & containing blocks/files can explain log levels
39. References
§ Fu, Q., Lou, J. G., Lin, Q., Ding, R., Zhang, D., and Xie, T. (2013). Contextual analysis of program logs for understanding system behaviors. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pages 397–400.
§ Xu, W., Huang, L., Fox, A., Patterson, D., and Jordan, M. I. (2009). Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pages 117–132.
§ Yuan, D., Mai, H., Xiong, W., Tan, L., Zhou, Y., and Pasupathy, S. (2010). Sherlog: Error diagnosis by connecting clues from run-time logs. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’10, pages 143–154.
§ Yuan, D., Park, S., and Zhou, Y. (2012). Characterizing logging practices in open source software. In Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pages 102–112.
§ Chen, B. and Jiang, Z. M. J. (2017). Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation. Empirical Software Engineering, 22(1):330–374.
§ Shang, W., Jiang, Z. M., Adams, B., Hassan, A. E., Godfrey, M. W., Nasser, M., and Flora, P. (2014). An exploratory study of the evolution of communicated information about the execution of large software systems. Journal of Software: Evolution and Process, 26(1):3–26.
§ Fukushima, T., Kamei, Y., McIntosh, S., Yamashita, K., and Ubayashi, N. (2014). An empirical study of just-in-time defect prediction using cross-project models. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pages 172–181.
42. Mining log messages
Understanding runtime behaviors [Fu et al., 2013; Hassan et al., 2008; Shang et al., 2013]
Detecting anomalous conditions [Xu et al., 2008, 2009; Fu et al., 2009; Jiang et al., 2008]
Diagnosing system failures [Yuan et al., 2010; Syer et al., 2013]
Prior work highlights the importance of improving logging quality
43. Mining logging code
Logging practices in open source projects [Yuan et al., 2012; Chen and Jiang, 2017]
Logging practices in industry [Fu et al., 2014; Pecchia et al., 2015]
Evolution of logging code [Shang et al., 2011; Kabinna et al., 2016]
Software logging is a common practice, and developers spend much effort maintaining their logging code
44. Improving logging code: proactive logging
Proactively adding logging information to the source code [Yuan et al., 2011, 2012; Zhao et al., 2017]
Drawbacks: producing excessive log information; developers' expertise and concerns are not considered
45. Improving logging code: learning to log
Learning statistical models to suggest where to log [Zhu et al., 2015; Lal and Sureka, 2016; Jia et al., 2018]
Drawbacks: ignoring logging patterns (e.g., log level, stack trace); focusing on one dimension of development knowledge (source code); providing logging suggestions as a post-development process
46. Logging stack traces can grow log files very fast
Log.warn(msg): logs a log message
Log.warn(msg, e): logs a log message + the full stack trace
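The size difference on this slide is easy to reproduce. In Python's standard `logging` module, `exc_info=True` appends the current exception's traceback to a record, which roughly plays the role of passing `e` in `Log.warn(msg, e)`. The sketch below logs the same failure with and without the traceback into an in-memory stream and measures the output. The helper name `logged_chars` and the logger names are illustrative.

```python
import io
import logging


def logged_chars(with_stack_trace: bool, repetitions: int = 1) -> int:
    """Log the same failure `repetitions` times and return how many characters were written."""
    stream = io.StringIO()
    logger = logging.getLogger(f"stacktrace_demo.{with_stack_trace}")
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    logger.propagate = False         # keep output out of the root logger
    logger.setLevel(logging.WARNING)
    logger.addHandler(logging.StreamHandler(stream))  # default format: message only
    for _ in range(repetitions):
        try:
            int("not a number")
        except ValueError:
            # exc_info=True appends the traceback, like Log.warn(msg, e);
            # exc_info=False logs only the message, like Log.warn(msg).
            logger.warning("Failed to parse input", exc_info=with_stack_trace)
    return len(stream.getvalue())
```

Without the traceback, each record is just the message line. With it, every record also carries a multi-line traceback, so the log grows several times faster for the same event. The effect multiplies for frequently repeated failures.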
47. Developers have difficulty deciding whether to log stack traces
Problems: missing stack traces; improper logging of stack traces
48. Learning from existing source code to suggest whether to log a stack trace
Existing source code contains both Log(msg) and Log(msg, e) statements
For a new Log(msg, ?): a Random Forest classifier using six dimensions of features decides whether to log the stack trace (e.g., suggesting Log(msg, e))
49. Our models can effectively suggest whether a stack trace is needed
[Figure: The performance (AUC) of our Random Forest models: 0.85, 0.94, 0.90, and 0.86, against a random-guess baseline (AUC = 0.5)]