Tracing Back Log Data to
its Log Statement:
From Research to Practice
Daan Schipper (Adyen)
Maurício F. Aniche, Arie van Deursen (Delft University of Technology)
DEV OPS
20170101160001 Adyen version: ******
20170101160002 Starting TX/amt=10001/currency=978
20170101160003 Starting EMV
20170101160004 EMV started
20170101160005 Magswipe opened
20170101160006 CTLS started
20170101160007 Transaction initialised
20170101160008 Run TX as EMV transaction
20170101160009 Application selected app:******
20170101160010 read_application_data succeeded
20170101160011 data_authentication succeeded
20170101160012 validate 0
20170101160013 DCC rejected
20170101160014 terminal_risk_management succeeded
20170101160015 verify_card_holder succeeded
20170101160016 generate_first_ac succeeded
20170101160017 Authorizing online
20170101160018 Data returned by the host succeeded
20170101160019 Transaction authorized by card
20170101160020 Approved receipt printed
20170101160021 pos_result_code:APPROVED
20170101160022 Final status: Approved
log.info("Customer " + customer +
" paying " + paymentValue);
[2019-02-03 15:43:24] [MagicPayment.java][L456] Customer Maurício paying 235.67
(MagicPayment.java: class name; L456: line number)
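To make the class name and line number in the log line above concrete, here is a minimal sketch of how a logger could look up its caller at runtime. The class `CallSite` and its methods are hypothetical names for illustration, not Adyen's logging code. Capturing a stack trace for every message is one common reason why this information is costly to log.

```java
final class CallSite {

    // Returns "ClassName:line" of the method that called here().
    // Creating a Throwable captures the current call stack; element 0 is
    // here() itself and element 1 is its caller.
    static String here() {
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        return caller.getClassName() + ":" + caller.getLineNumber();
    }

    // A caller whose location ends up in the log prefix.
    static String demo() {
        return here();
    }

    public static void main(String[] args) {
        // Hypothetical message; the prefix is the class and line of demo().
        System.out.println("[" + demo() + "] Customer paying");
    }
}
```

The line number is only available when the class was compiled with line-number debug information, which is the default for javac.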
In practice…
• Adyen can’t log the class and the line number that
originates a log message.
• Developers ‘grep’ the source code
• Can we automatically detect the origin of the log
lines?
• W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan.
Detecting large-scale system problems by mining console
logs. In Proceedings of the ACM SIGOPS 22nd symposium
on Operating systems principles, pages 117–132. ACM,
2009.
[Figure: overview of the approach.
Template creation process: Source code → Find log statements → Create template → Enrich template → Create index → Template database.
Matching process: Logs → Match message to template (querying the templates in the template database) → Link.]
log.info("Customer " + customer + " ID " + id); → "Customer .* ID \d*"
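As a rough illustration of the template idea on this slide, the sketch below builds a regular expression from the constant parts of a log statement and looks up which template matches a log message. The class `TemplateSketch`, the helpers `toTemplate` and `findOrigin`, the location `CustomerLookup.java:12`, and the use of `.*` for every variable are assumptions made for illustration. The approach in the slides finds log statements through the AST and also enriches and indexes the templates, which this sketch skips.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

final class TemplateSketch {

    // Builds a regex from the constant string parts of a log statement.
    // Each gap between two constants stands for a variable and becomes ".*".
    static Pattern toTemplate(List<String> constantParts, boolean endsWithVariable) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < constantParts.size(); i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(constantParts.get(i)));
        }
        if (endsWithVariable) {
            regex.append(".*");
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the location of the first template that matches the whole message.
    static Optional<String> findOrigin(Map<String, Pattern> templatesByLocation, String message) {
        return templatesByLocation.entrySet().stream()
                .filter(e -> e.getValue().matcher(message).matches())
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        Map<String, Pattern> templates = new LinkedHashMap<>();
        // MagicPayment.java:456 is taken from the slides; the second location is hypothetical.
        templates.put("MagicPayment.java:456", toTemplate(Arrays.asList("Customer ", " paying "), true));
        templates.put("CustomerLookup.java:12", toTemplate(Arrays.asList("Customer ", " ID "), true));
        // \u00ed is the accented i in the customer name used in the slides.
        System.out.println(findOrigin(templates, "Customer Maur\u00edcio paying 235.67"));
        // prints Optional[MagicPayment.java:456]
    }
}
```

Quoting the constants with `Pattern.quote` keeps characters such as `.` or `(` in a log message from being read as regex syntax.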
RQ1: Accuracy
• We collected 100k messages from a weekday.
• These 100k messages pointed to 676 different locations in the source
code.
• A sample at a 95% confidence level and a 5% confidence interval = 245 links.
• Manual investigation = 97.6% accuracy (239 out of 245 links correct).
• 99.8% accuracy in the original paper (two projects: HDFS and Darkstar,
millions of log lines, number of log statements not specified).
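The sample size and accuracy above can be checked with a few lines of arithmetic. The slides do not say which formula was used. The sketch below assumes Cochran's formula with a finite population correction, a z-score of 1.96 for the 95% confidence level, and a proportion of 0.5, applied to the 676 locations. Under those assumptions it reproduces the 245 links. `SampleSize` is a hypothetical class name.

```java
final class SampleSize {

    // Cochran's sample size with finite population correction.
    // z: z-score of the confidence level, margin: margin of error,
    // p: assumed proportion (0.5 gives the largest sample).
    static long sampleSize(long population, double z, double margin, double p) {
        double n0 = (z * z * p * (1 - p)) / (margin * margin);
        double corrected = n0 / (1 + (n0 - 1) / population);
        return Math.round(corrected);
    }

    // Accuracy in percent, rounded to one decimal place.
    static double accuracyPercent(int correct, int total) {
        return Math.round(1000.0 * correct / total) / 10.0;
    }

    public static void main(String[] args) {
        System.out.println(sampleSize(676, 1.96, 0.05, 0.5)); // prints 245
        System.out.println(accuracyPercent(239, 245));        // prints 97.6
    }
}
```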
RQ2: Performance
Creating the ASTs of the entire
codebase is the most expensive
part of the process.
Step                     Time (mm:ss)   Percentage
Finding log statements   37:25          92.1%
Creating template        00:32          1.3%
Enriching template       02:27          5.9%
Creating index           00:16          0.7%
Total                    40:42          100%
Related work: performance is not really reported per step.
Overall process: logs from 200 HDFS nodes (with aggressive logging)
over 48 hours => 3 minutes with 50 nodes, or less than 10
minutes with 10 nodes.
When does it fail?
• JSON-based logs
  • The developer logs a JSON object that is created at runtime, so the log
    statement is simply "log.info(json)" and our template is inaccurate.
• Unknown logging method
  • Developers create their own logging methods, which our tool can’t recognize.
• Log strings created on the fly
  • Some log messages are too complex for a single expression, so developers build them
    over multiple lines of code (e.g. String log = "content1"; log = log + "content 2"; …),
    which makes the analysis harder.
• Related work: “Almost all of these (failed) messages contain long
string variables.” => same for us.
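The JSON case above can be sketched as follows. The sketch assumes that a statement with no constant text, such as log.info(json), reduces to a bare wildcard template. How the actual tool represents such statements is not described in the slides, and `FailureSketch` and its fields are hypothetical names.

```java
import java.util.regex.Pattern;

final class FailureSketch {

    // Assumed template for log.info(json): no constant text, only a wildcard.
    static final Pattern JSON_TEMPLATE = Pattern.compile(".*");

    // Template for log.info("Customer " + customer + " paying " + paymentValue).
    static final Pattern PAYING_TEMPLATE = Pattern.compile("Customer .* paying .*");

    // Counts how many of the given templates match the whole message.
    static int matchCount(String message, Pattern... templates) {
        int count = 0;
        for (Pattern template : templates) {
            if (template.matcher(message).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Both templates match, so the message cannot be linked to one statement.
        System.out.println(matchCount("Customer Alice paying 10", JSON_TEMPLATE, PAYING_TEMPLATE)); // prints 2
        // The wildcard also matches a line that did not come from log.info(json).
        System.out.println(matchCount("EMV started", JSON_TEMPLATE, PAYING_TEMPLATE)); // prints 1
    }
}
```

The less constant text a template contains, the more unrelated messages it matches, which is consistent with the observation that failed links mostly involve long string variables.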
Summary
• Logging the class and line number that originates a log message can be
expensive.
• Heuristics to make this link are available.
• We evaluated Xu et al.’s proposal on an industry dataset:
  • High accuracy (~97%).
  • Reasonable performance, where most of the cost comes from AST generation.
  • Complex log statements make the analysis hard.
Schipper, Aniche, van Deursen. Tracing Back Log Data to its Log Statement:
From Research to Practice.
Contact: Daan.Schipper@adyen.com, M.FinavaroAniche@tudelft.nl,
Arie.VanDeursen@tudelft.nl
Paper: http://bit.ly/msr19-tracing-log-statements
