1
Taizheng Wang, Yutong Wang, Wei Chang, Chunyang Ye*,
Hui Zhou
Hainan University, CHINA
Boosting Log Observability in
Production Systems through Bytecode-
Driven Fault Variable Tracking
International Conference on Software Maintenance and Evolution
(ICSME) 2025
2
Contents
01 Introduction
02 Contribution
03 Framework
04 Evaluation
05 Conclusion
3
01 Introduction
4
Background
• Maintaining software reliability is a core challenge in large-scale production systems [1].
• As system complexity increases, software failures become inevitable [2].
• Logs play a central role in observability [3]:
• Fault diagnosis
• System behavior monitoring
• Software maintenance and evolution
[1] C. V. Ramamoorthy and S.-b. F. Ho, “Testing large software with automated software evaluation systems,” in
Proceedings of the international conference on Reliable software, 1975, pp. 382–394.
[2] K.-L. Peng and C.-Y. Huang, “Reliability analysis of on-demand service-based software systems considering failure
dependencies,” IEEE Transactions on Services Computing, vol. 10, no. 3, pp. 423–435, 2015.
[3] Z. Liu, X. Xia, D. Lo, Z. Xing, A. E. Hassan, and S. Li, “Which variables should I log?” IEEE Transactions on
Software Engineering, vol. 47, no. 9, pp. 2012–2031, 2019.
5
Problem Statement
• Log records are often insufficient:
• They lack fault-related variables (FRVs) [4]
• Diagnostic information is incomplete
• Debugging takes longer
• Faults become hard to reproduce and locate
[4] S. He, P. He, Z. Chen, T. Yang, Y. Su, and M. R. Lyu, “A survey on
automated log analysis for reliability engineering,” ACM computing
surveys (CSUR), vol. 54, no. 6, pp. 1–37, 2021.
6
Case Study: HADOOP-12795
• Problem: KMS server error (HTTP 500) → No stack trace
• Consequence: Client exception lacks details → Hard to debug
• Solution: Added logs (status, throwable) → Easier fault diagnosis
7
Empirical Evidence
• Three authors manually analyzed Bugs.jar (1,158 bug reports) [5]:
• Only 13.16% of the FRVs were recorded.
• Traditional logging:
• Manual insertion → inconsistent and error-prone [6]
• Lack of adaptability
[5] R. K. Saha, Y. Lyu, W. Lam, H. Yoshida, and M. R. Prasad,
“Bugs.jar: A large-scale, diverse dataset of real-world java bugs,”
in Proceedings of the 15th international conference on mining
software repositories, 2018, pp. 10–13.
8
Limitations of Existing Work
• Current research: Automated log enhancement [3],[7]
• Mainly based on static source code instrumentation
• Limitations:
• Recompilation and redeployment required → high cost
• No flexibility after deployment
• Generic log enhancement rarely captures the variables specific to a fault
[3] Z. Liu, X. Xia, D. Lo, Z. Xing, A. E. Hassan, and S. Li, “Which variables should I log?” IEEE Transactions on
Software Engineering, vol. 47, no. 9, pp. 2012–2031, 2019.
[7] D. Yuan, J. Zheng, S. Park, Y. Zhou, and S. Savage, “Improving software diagnosability via log enhancement,”
ACM Transactions on Computer Systems (TOCS), vol. 30, no. 1, pp. 1–28, 2012.
9
Our Approach: VarFR
• VarFR: A bytecode-driven fault variable tracking framework
• Features:
• No source code modification → direct bytecode
instrumentation
• Automatic tracking of runtime FRVs
• Advantages:
• No recompilation or redeployment required
• Captures runtime states → improves observability
• Enables fine-grained & adaptive log augmentation post-
deployment
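The slides do not say which mechanism VarFR uses to load instrumented bytecode. One standard JVM route that needs no recompilation is a Java agent built on the `java.lang.instrument` API. The sketch below shows only the hook point; the class name `TrackingAgent` and the prefix `com/example/` are hypothetical, and the transformer returns the class unchanged rather than rewriting it.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

final class TrackingAgent {
    // Hypothetical filter: only classes whose internal name starts with this prefix.
    static final String TARGET_PREFIX = "com/example/";

    static boolean isTarget(String internalName) {
        return internalName != null && internalName.startsWith(TARGET_PREFIX);
    }

    // Invoked by the JVM at startup when launched with -javaagent:<agent jar>
    // (the jar manifest must name this class in Premain-Class).
    public static void premain(String args, Instrumentation inst) {
        install(inst);
    }

    // Invoked when the agent is attached to an already running JVM
    // (the jar manifest must name this class in Agent-Class).
    public static void agentmain(String args, Instrumentation inst) {
        install(inst);
    }

    private static void install(Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain domain, byte[] classfileBuffer) {
                if (!isTarget(className)) {
                    return null; // null tells the JVM to keep the class as-is
                }
                // A real tool would rewrite classfileBuffer here so that the selected
                // variables are logged; this sketch leaves the class untouched.
                return null;
            }
        });
    }
}
```

The `agentmain` entry point is what makes adjustment after deployment possible in principle, since an agent can be attached while the application keeps running.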
10
Challenges
• Variable selection:
• Over 25% of log changes involve variables [8], [9]
• Over-monitoring → overhead, noise
• Under-monitoring → missing important information
• No standardized logging guidelines [10]
• Bytecode complexity:
• Lacks explicit semantics and structure
• Obfuscated or compiler-generated names
• Same identifier reused for different entities
• Hard to capture context
11
02 Contribution
12
Contribution
• Bytecode Instrumentation Tool – Fine-grained variable monitoring
without source code modification
• Fault-Related Variable Recommendation Model – LSTM + GNN +
CNN → Bytecode Feature Fusion
• Annotated Dataset – Based on Bugs.jar, 1,173 bugs, 873 buggy
methods
• Evaluation – Outperforms baseline methods, enhances log
observability and fault diagnosis
13
03 Framework
14
Overview
15
Bytecode2Insn module
16
Bytecode2Localtable module
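According to the speaker notes, load/store instructions carry only a slot index, different variables can share a slot, and the modules resolve names by checking variable scopes against the local variable table. A minimal sketch of that lookup follows. The class `SlotResolver`, its fields, and the demo table are illustrative assumptions, not VarFR's code; the entry layout mirrors the JVM `LocalVariableTable` attribute (name, descriptor, slot index, start offset, length).

```java
import java.util.Arrays;
import java.util.List;

final class SlotResolver {
    /** One table entry: `name` occupies `slot` for offsets in [startPc, startPc + length). */
    static final class LocalVar {
        final String name;
        final String descriptor;
        final int slot;
        final int startPc;
        final int length;

        LocalVar(String name, String descriptor, int slot, int startPc, int length) {
            this.name = name;
            this.descriptor = descriptor;
            this.slot = slot;
            this.startPc = startPc;
            this.length = length;
        }
    }

    /** Returns the variable name live in `slot` at bytecode offset `pc`, or a placeholder. */
    static String resolve(List<LocalVar> table, int slot, int pc) {
        for (LocalVar v : table) {
            if (v.slot == slot && pc >= v.startPc && pc < v.startPc + v.length) {
                return v.name;
            }
        }
        return "slot" + slot; // no matching scope, e.g. class compiled without debug info
    }

    /** Hypothetical method in which slot 1 holds `i` first and `msg` later. */
    static List<LocalVar> demoTable() {
        return Arrays.asList(
            new LocalVar("this", "LDemo;", 0, 0, 20),
            new LocalVar("i", "I", 1, 2, 8),
            new LocalVar("msg", "Ljava/lang/String;", 1, 12, 8));
    }
}
```

With the demo table, offset 5 in slot 1 resolves to `i` and offset 15 in the same slot resolves to `msg`. One caveat for a real resolver: the scope of a variable usually starts at the instruction after the store that initializes it, so the initializing store itself falls just outside the range.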
17
Bytecode2Cfg module
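The speaker notes state that this module relies on the Soot framework to build control flow graphs over basic blocks. The sketch below does not use Soot. It only illustrates the textbook first step, finding basic-block leaders, on a toy instruction list. The `Insn` type and the sample code are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class BasicBlocks {
    /** Toy instruction: `target` is the jump destination index, or -1 if it does not jump. */
    static final class Insn {
        final String text;
        final int target;
        final boolean endsBlock; // true for conditional branches, goto, return, throw

        Insn(String text, int target, boolean endsBlock) {
            this.text = text;
            this.target = target;
            this.endsBlock = endsBlock;
        }
    }

    /** Leaders: the first instruction, every jump target, and every instruction after a block end. */
    static List<Integer> leaders(List<Insn> code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        if (!code.isEmpty()) {
            leaders.add(0);
        }
        for (int i = 0; i < code.size(); i++) {
            Insn insn = code.get(i);
            if (insn.target >= 0) {
                leaders.add(insn.target);
            }
            if (insn.endsBlock && i + 1 < code.size()) {
                leaders.add(i + 1);
            }
        }
        return new ArrayList<>(leaders); // sorted; each leader starts one basic block
    }

    /** Toy loop: while (x > 0) { x--; } return; */
    static List<Insn> demo() {
        return Arrays.asList(
            new Insn("iload x", -1, false),
            new Insn("ifle 4", 4, true),
            new Insn("iinc x -1", -1, false),
            new Insn("goto 0", 0, true),
            new Insn("return", -1, true));
    }
}
```

For the toy loop the leaders are instructions 0, 2, and 4, giving three basic blocks: the loop test, the loop body, and the exit. CFG edges then connect each block to its jump target and, where execution can fall through, to the next block.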
18
Bytecode instrumentation
19
04 Evaluation
20
Evaluation
• Dataset Choice
• Use Bugs.jar (1,158 bugs, 8 projects): broader coverage, closer to real-world defects
• Dataset Features
• Each bug: defective code + patch; focus on method-level fault-related variables
• Construction Process
• Based on fine-grained fault localization; 3 authors manually annotated all instances
21
Evaluation
• How does the proposed method compare to existing approaches
in terms of performance? (RQ1)
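The speaker notes report MRR, MAP, Top-1, and Top-2 accuracy for this comparison but do not define them. The sketch below assumes the standard definitions of MRR and Top-k, computed from the 1-based rank of the first truly fault-related variable in each method's ranked list. The class name and the sample ranks are made up for illustration, and MAP is omitted for brevity.

```java
final class RankingMetrics {
    /**
     * Mean reciprocal rank. firstHitRanks[i] is the 1-based rank of the first
     * fault-related variable for method i, or 0 if none was ranked.
     */
    static double mrr(int[] firstHitRanks) {
        if (firstHitRanks.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int rank : firstHitRanks) {
            if (rank > 0) {
                sum += 1.0 / rank;
            }
        }
        return sum / firstHitRanks.length;
    }

    /** Fraction of methods whose first fault-related variable appears within the top k. */
    static double topK(int[] firstHitRanks, int k) {
        if (firstHitRanks.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (int rank : firstHitRanks) {
            if (rank > 0 && rank <= k) {
                hits++;
            }
        }
        return (double) hits / firstHitRanks.length;
    }
}
```

For the made-up ranks {1, 2, 4, 0}, MRR is (1 + 1/2 + 1/4 + 0) / 4 = 0.4375, Top-1 is 0.25, and Top-2 is 0.5.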
22
Evaluation
• Does combining bytecode, variable metadata, and method
structure outperform using only part of these features? (RQ2)
23
Evaluation
• Case studies on real bugs (RQ3)
• Test VarFR generalization performance
• Defect Cases
• HADOOP-12795: Missing logs (status, throwable) in exception handling
• HDFS-14793: Missing log (this) during initialization
24
05 Conclusion
25
Conclusion
• Bytecode-level monitoring of fault-related variables, no source code
needed
• Multi-feature fusion: bytecode semantics + method flow + variable
metadata
• Improves log completeness and fault observability
• Outperforms baselines on Bugs.jar
• Benefits: faster debugging, lower maintenance, better software
adaptability
26
Questions
Thanks for Your Time!
Taizheng Wang taizhengwang@hainanu.edu.cn


Editor's Notes

  • #1 Good afternoon, everyone. My name is Taizheng Wang from Hainan University. Today, I’ll be presenting on Boosting Log Observability in Production Systems through Bytecode-Driven Fault Variable Tracking.
  • #2 This is the outline of today’s presentation. I will introduce our method in five parts.
  • #4 Ensuring software reliability in large-scale production systems is a persistent challenge. As software grows increasingly complex, failures become inevitable, often caused by intricate dependencies, evolving codebases, and dynamic execution environments. In this context, logs play an important role in software observability. Logs not only help developers diagnose faults, but also support system monitoring and maintenance.
  • #5 Despite the importance of logging, real-world log statements are often insufficient. A major issue is the omission of fault-related variables. Without these variables, diagnostic information is incomplete, which prolongs debugging and makes fault localization much more difficult and time-consuming.
  • #6 Let me illustrate with a Hadoop case. In an earlier version, when the KMS server returned an HTTP 500 error, no stack trace was logged. The client only received an exception with no further details, making troubleshooting extremely difficult. In the updated version, developers added new log statements to record key variables such as status and throwable.
  • #7 This is not an isolated case. We analyzed 1,158 bug reports from the Bugs.jar dataset and found that only 13.16% of fault-related variables were logged. This means that critical execution details were often missing. Furthermore, traditional logging practices rely heavily on manual insertion of log statements. This process is error-prone and lacks adaptability in evolving software systems.
  • #8 Several studies have explored automated log enhancement. However, most of these methods rely on static source code instrumentation, which has three major limitations. First, modifying source code requires recompilation and redeployment, which incurs high operational overhead. Second, once deployed, systems lack the flexibility to adjust logging granularity. Third, these enhancements are often generic and fail to capture fault-related variables effectively.
  • #9 To address these limitations, we propose VarFR, a bytecode-driven fault variable tracking framework. Unlike traditional approaches, VarFR does not require source code modification. Instead, it leverages bytecode instrumentation to automatically track and log fault-related variables at runtime. This design provides three main benefits. First, it eliminates the need for recompilation and redeployment, reducing operational burden. Second, by capturing runtime execution states, it improves observability beyond what existing log statements provide. Third, it enables fine-grained, adaptive log augmentation after deployment, which is critical for long-running production systems.
  • #10 Implementing bytecode-driven variable tracking faces two main challenges. The first is variable selection. Studies show that over 25% of log changes involve adjusting variable monitoring. Too much monitoring adds overhead and hides real issues; too little risks missing key debugging signals. Even companies like Microsoft lack clear guidelines. The second is bytecode complexity. Bytecode has no explicit semantics. Variable names may be obfuscated or compiler-generated, the same identifier may be reused for different entities, and related variables may look different. This makes accurate identification very hard.
  • #12 Our contributions in this work can be summarized in four points. First, we developed a bytecode instrumentation tool, which enables fine-grained variable monitoring without modifying source code. Second, we propose a novel fault-related variable recommendation model. Third, we construct and release a manually labeled dataset of fault-related variables based on Bugs.jar. Finally, we conduct extensive experiments and demonstrate that VarFR significantly outperforms baseline models.
  • #14 Let me introduce our VarFR framework. VarFR is a bytecode-driven model for recommending fault-related variables, combining bytecode analysis with deep learning. It works in two main phases. First, during training, the model learns fault-related patterns from bytecode files. VarFR extracts three kinds of features from each method: the instruction sequence, the control flow graph, and the variable metadata. These are processed by different neural networks and then fused into a unified feature graph. A 3D convolution layer captures graph features. We trained the model on a manually labeled dataset built from Bugs.jar. Then, in the inference phase, the model computes a fault probability for each variable in the target method and generates a ranked list of suspicious variables.
  • #15 Let’s go through the bytecode processing modules. We use the Bytecode2Insn module to parse class files and generate each method’s instruction sequence. The problem is that bytecode load/store instructions don’t tell you the real variable name, and sometimes different variables share the same slot, which makes things confusing. To fix this, we check the variable scopes and match each slot to its name. This makes the bytecode easier to read, which helps a lot with tracking fault-related variables.
  • #16 Besides bytecode instructions, each method also has a local variable table that stores variable details such as name, type, scope, and index. This is important because raw bytecode instructions only show the variable index. In VarFR, we use the Bytecode2Localtable module to generate the variable metadata.
  • #17 The Bytecode2Cfg module processes the structural information in bytecode. Using the Soot framework, it builds control flow graphs for buggy methods, with basic blocks as nodes. It first converts methods into Jimple IR, then into a CFG.
  • #18 This is the bytecode instrumentation step.
  • #20 We chose Bugs.jar instead of Defects4J because it’s larger and more representative, covering 8 projects and 1,158 bugs, which better reflects real-world software defects. Each bug instance contains both the defective code and its patch. We focused on annotating method-level fault-related variables. For construction, we built on the latest fine-grained localization techniques, and three authors manually annotated all bug instances to ensure reliability.
  • #21 We evaluated VarFR against baseline models using MRR, MAP, Top-1, and Top-2 accuracy. VarFR outperforms all baselines, achieving the highest scores across every metric. These results confirm that our model significantly improves fault-related variable prediction.
  • #22 We also conducted ablation experiments. Single-feature models performed the worst. Two-feature models improved results, especially bytecode plus variable metadata. However, simply concatenating the three features did not help much. VarFR achieved the best performance, showing that the feature fusion network is important for fault-related variable prediction.
  • #25 We present VarFR, a bytecode-level approach for tracking fault-related variables to improve log observability. Unlike traditional methods, it doesn’t require changing the source code. By combining bytecode semantics, control flow graphs, and variable metadata, VarFR can accurately predict fault-related variables. Our experiments show it outperforms baseline models, making logs more complete, debugging faster, and maintenance cheaper.