Boosting Log Observability in Production Systems through Bytecode-Driven Fault Variable Tracking.pptx

1
Boosting Log Observability in Production Systems through Bytecode-Driven Fault Variable Tracking
Taizheng Wang, Yutong Wang, Wei Chang, Chunyang Ye*, Hui Zhou
Hainan University, China
International Conference on Software Maintenance and Evolution (ICSME) 2025

2
Contents
01 Introduction
02 Contribution
03 Framework
04 Evaluation
05 Conclusion

4
Background
• Maintaining software reliability is a core challenge in large-scale production systems [1].
• As system complexity increases, software failures become inevitable [2].
• The role of logs in observability [3]:
  • Supporting fault diagnosis
  • Monitoring system behavior
  • Supporting software maintenance and evolution
[1] C. V. Ramamoorthy and S.-b. F. Ho, “Testing large software with automated software evaluation systems,” in Proceedings of the International Conference on Reliable Software, 1975, pp. 382–394.
[2] K.-L. Peng and C.-Y. Huang, “Reliability analysis of on-demand service-based software systems considering failure dependencies,” IEEE Transactions on Services Computing, vol. 10, no. 3, pp. 423–435, 2015.
[3] Z. Liu, X. Xia, D. Lo, Z. Xing, A. E. Hassan, and S. Li, “Which variables should I log?” IEEE Transactions on Software Engineering, vol. 47, no. 9, pp. 2012–2031, 2019.

5
Problem Statement
• Log records are often insufficient:
  • They lack fault-related variables (FRVs) [4]
  • Diagnostic information is incomplete
• Debugging takes longer.
• Faults become difficult to reproduce and locate.
[4] S. He, P. He, Z. Chen, T. Yang, Y. Su, and M. R. Lyu, “A survey on automated log analysis for reliability engineering,” ACM Computing Surveys (CSUR), vol. 54, no. 6, pp. 1–37, 2021.

6
Case Study: HADOOP-12795
• Problem: KMS server error (HTTP 500) → No stack trace
• Consequence: Client exception lacks details → Hard to debug
• Solution: Added logs (status, throwable) → Easier fault diagnosis
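The kind of fix summarized above can be sketched in Java as follows. This is a hypothetical illustration, not the actual Hadoop patch: the class name `ErrorResponseLogging` and both method names are assumptions made for this example. It shows the two fault-related variables named on the slide (status, throwable) being written to the log, with the throwable passed to the logger so the stack trace is preserved.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; this is NOT the actual HADOOP-12795 patch.
final class ErrorResponseLogging {
    private static final Logger LOG =
            Logger.getLogger(ErrorResponseLogging.class.getName());

    /** Builds a message that carries both fault-related variables. */
    static String describeFailure(int status, Throwable cause) {
        return "Request failed: status=" + status
                + ", cause=" + cause.getClass().getName()
                + ": " + cause.getMessage();
    }

    /** Logs the status and hands the throwable to the logger so the stack trace is kept. */
    static void logFailure(int status, Throwable cause) {
        LOG.log(Level.SEVERE, describeFailure(status, cause), cause);
    }

    public static void main(String[] args) {
        // Example values are made up for demonstration.
        logFailure(500, new IllegalStateException("key provider unavailable"));
    }
}
```

Without the status and the throwable, the same failure would surface only as a generic client exception, which is the situation the case study describes.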

7
Empirical Evidence
• Three authors manually analyzed Bugs.jar (1,158 bug reports) [5]:
  • Only 13.16% of the FRVs were recorded in the logs.
• Traditional logging:
  • Manual insertion → inconsistent and error-prone [6]
  • Lacks adaptability
[5] R. K. Saha, Y. Lyu, W. Lam, H. Yoshida, and M. R. Prasad, “Bugs.jar: A large-scale, diverse dataset of real-world Java bugs,” in Proceedings of the 15th International Conference on Mining Software Repositories, 2018, pp. 10–13.

8
Limitations of Existing Work
• Current research: automated log enhancement [3], [7]
  • Mainly based on static source code instrumentation
• Limitations:
  • Requires recompilation and redeployment → high cost
  • Lacks flexibility after deployment
  • Generic log enhancement makes it hard to capture fault-specific variables
[3] Z. Liu, X. Xia, D. Lo, Z. Xing, A. E. Hassan, and S. Li, “Which variables should I log?” IEEE Transactions on
Software Engineering, vol. 47, no. 9, pp. 2012–2031, 2019.
[7] D. Yuan, J. Zheng, S. Park, Y. Zhou, and S. Savage, “Improving software diagnosability via log enhancement,”
ACM Transactions on Computer Systems (TOCS), vol. 30, no. 1, pp. 1–28, 2012.

9
Our Approach: VarFR
• VarFR: a bytecode-driven fault variable tracking framework
• Features:
  • No source code modification → direct bytecode instrumentation
  • Automatic tracking of runtime FRVs
• Advantages:
  • No recompilation or redeployment required
  • Captures runtime states → improves observability
  • Enables fine-grained, adaptive log augmentation post-deployment
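For readers unfamiliar with bytecode instrumentation on the JVM, the standard hook is `java.lang.instrument.ClassFileTransformer`. The skeleton below is a minimal sketch under stated assumptions: the class name `LoadTimeHook`, the `targetPrefix` parameter, and the `seen` list are hypothetical, and the slides do not show how VarFR itself rewrites classes. The sketch only selects classes and returns `null` (which tells the JVM to leave the class unchanged); a real tool would return rewritten bytes at the marked point.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent skeleton; VarFR's real instrumentation logic is not shown in the slides.
final class LoadTimeHook implements ClassFileTransformer {
    private final String targetPrefix;  // internal-name prefix, e.g. "com/example/"
    final List<String> seen = new CopyOnWriteArrayList<String>();

    LoadTimeHook(String targetPrefix) {
        this.targetPrefix = targetPrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className == null || !className.startsWith(targetPrefix)) {
            return null;  // null means "no change" for classes outside the target
        }
        seen.add(className);
        // A real tool would rewrite classfileBuffer here (for example with a
        // bytecode library) so that selected variables are emitted to the log.
        // This sketch returns null, so the class is loaded unmodified.
        return null;
    }

    // Entry point the JVM calls when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoadTimeHook(agentArgs == null ? "" : agentArgs));
    }
}
```

The agent jar's manifest must name the class in a `Premain-Class` entry. The transformer sees each class as bytes at load time, so the application's source code and build are not touched.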

10
Challenges
• Variable selection:
  • 25% of log changes involve variables [8], [9]
  • Over-monitoring → overhead, noise
  • Under-monitoring → missing important information
  • No standardized logging guidelines [10]
• Bytecode complexity:
  • Lacks semantics and structure
  • Obfuscated or compiler-generated names
  • Same ID, different entities
  • Hard to capture context
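One plausible reading of "Same ID, different entities" is local-variable slot reuse: in JVM bytecode a slot index can hold different source variables in different instruction ranges. The sketch below is a simplified model, not VarFR's code. It assumes entries shaped like those of the class file's `LocalVariableTable` attribute (start offset, length, slot, name); the class `SlotResolver` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of LocalVariableTable entries.
final class SlotResolver {
    static final class Entry {
        final int startPc;   // first bytecode offset where the variable is valid
        final int length;    // valid range is [startPc, startPc + length)
        final int slot;      // local variable index used by load/store instructions
        final String name;   // source-level name, if debug info is present

        Entry(int startPc, int length, int slot, String name) {
            this.startPc = startPc;
            this.length = length;
            this.slot = slot;
            this.name = name;
        }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    SlotResolver add(int startPc, int length, int slot, String name) {
        entries.add(new Entry(startPc, length, slot, name));
        return this;
    }

    /** Returns the name of the variable occupying the slot at pc, or null if unknown. */
    String resolve(int slot, int pc) {
        for (Entry e : entries) {
            if (e.slot == slot && pc >= e.startPc && pc < e.startPc + e.length) {
                return e.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up ranges: slot 1 holds "i" early in the method and "msg" later.
        SlotResolver r = new SlotResolver().add(2, 10, 1, "i").add(14, 8, 1, "msg");
        System.out.println(r.resolve(1, 5));
        System.out.println(r.resolve(1, 20));
    }
}
```

The `LocalVariableTable` attribute is optional debug information, so when it is stripped or the code is obfuscated, a lookup like this returns nothing useful, which matches the naming challenge listed above.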

12
Contribution
• Bytecode instrumentation tool: fine-grained variable monitoring without source code modification
• Fault-related variable recommendation model: LSTM + GNN + CNN → bytecode feature fusion
• Annotated dataset: based on Bugs.jar, 1,173 bugs, 873 buggy methods
• Evaluation: outperforms baseline methods, enhances log observability and fault diagnosis
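The slides do not state how the three encoder outputs are fused or which encoder handles which feature. As a rough sketch only, the code below assumes the simplest option: concatenate three feature vectors and score the result with a logistic unit. The class `FusionSketch`, the method names, and all weights are hypothetical.

```java
// Hypothetical sketch: fusion by concatenation is an ASSUMPTION, not the paper's stated design.
final class FusionSketch {
    /** Concatenates feature vectors (for example, one per encoder) into one vector. */
    static double[] concat(double[]... parts) {
        int total = 0;
        for (double[] p : parts) {
            total += p.length;
        }
        double[] out = new double[total];
        int pos = 0;
        for (double[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }

    /** Logistic score in (0, 1): higher means "more likely fault-related". */
    static double score(double[] fused, double[] weights, double bias) {
        if (fused.length != weights.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double z = bias;
        for (int i = 0; i < fused.length; i++) {
            z += fused[i] * weights[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for three encoder outputs.
        double[] fused = concat(new double[] {0.2, 0.7}, new double[] {0.1}, new double[] {0.9, 0.4});
        double[] weights = {0.5, -0.3, 0.8, 1.1, -0.6};
        System.out.println(score(fused, weights, 0.0));
    }
}
```

Ranking a method's variables by such a score and keeping only the highest ones is one way to balance the over-monitoring and under-monitoring trade-off raised in the Challenges slide.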

20
Evaluation
• Dataset choice
  • Uses Bugs.jar (1,158 bugs, 8 projects)
  • Broader coverage, closer to real-world defects
• Dataset features
  • Each bug: defective code + patch
  • Focus on method-level fault-related variables
• Construction process
  • Based on fine-grained fault localization
  • Three authors manually annotated all instances

21
Evaluation
• How does the proposed method compare to existing approaches
in terms of performance? (RQ1)
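The slide does not name the metrics behind "performance". Assuming set-based precision, recall, and F1 over recommended versus annotated variables (a common choice for recommendation tasks, but an assumption here), a comparison could be computed as in this sketch. The class `RecommendationMetrics` and the variable names in the example are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; the metrics actually used in the evaluation are not stated on the slide.
final class RecommendationMetrics {
    /** Returns {precision, recall, F1} for one method's recommended variables. */
    static double[] precisionRecallF1(Set<String> recommended, Set<String> annotated) {
        Set<String> hits = new HashSet<String>(recommended);
        hits.retainAll(annotated);  // variables that are both recommended and annotated
        double p = recommended.isEmpty() ? 0.0 : (double) hits.size() / recommended.size();
        double r = annotated.isEmpty() ? 0.0 : (double) hits.size() / annotated.size();
        double f1 = (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
        return new double[] {p, r, f1};
    }

    public static void main(String[] args) {
        // Made-up example: three variables recommended, two annotated as fault-related.
        Set<String> recommended = new HashSet<String>(Arrays.asList("status", "throwable", "conf"));
        Set<String> annotated = new HashSet<String>(Arrays.asList("status", "throwable"));
        System.out.println(Arrays.toString(precisionRecallF1(recommended, annotated)));
    }
}
```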

22
Evaluation
• Does combining bytecode, variable metadata, and method structure outperform using only a subset of these features? (RQ2)

23
Evaluation
• Case studies on real bugs (RQ3)
  • Test VarFR's generalization performance
• Defect cases
  • HADOOP-12795: missing logs (status, throwable) in exception handling
  • HDFS-14793: missing log (this) during initialization

25
Conclusion
• Bytecode-level monitoring of fault-related variables, no source code
needed
• Multi-feature fusion: bytecode semantics + method flow + variable
metadata
• Improves log completeness and fault observability
• Outperforms baselines on Bugs.jar
• Benefits: faster debugging, lower maintenance, better software
adaptability

26
Questions
Thanks for Your Time!
Taizheng Wang taizhengwang@hainanu.edu.cn
