Qsic2008 jiang
1. Zhen Ming Jiang, Ahmed E. Hassan
Software Analysis and Intelligence (SAIL) Lab
Queen’s University, Canada
Gilbert Hamann, Parminder Flora
Enterprise Performance Engineering,
Research In Motion (RIM), Canada
Abstracting Execution Logs to Execution
Events for Enterprise Applications
2. How many types of “errors” are there?
■ One RIM application generates 1.6 million
log lines (in 8 hours) and 23,000 lines
contain “fail” or “failure”
– 319 execution events in total, of which 16
contain “fail” or “failure”
Events                                          Frequency
Error occurred during purchasing, item=$v       500
Error! Cannot retrieve catalogs for user=$v     300
Authentication error for user=$v                100
3. Abstracting Log Lines to Execution Events
1. User checkout for accountID(Tom), item=100
2. User checkout for accountID(Jenny), item=100
3. Item shipped for accountID(Tom), item=100
4. User checkout for accountID(John), item=100
Events                                          Lines
User checkout for accountID($v), item=$v        1, 2, 4
Item shipped for accountID($v), item=$v         3
5. Running CCFinder on Logs
■ Won’t work for large files
■ Unsatisfying results
■ Because log lines do not have
– Delimiters like “;” or “}”
– Keywords like “if”, “for”
6. Working Example
1. Start check out
2. Paid for, item=bag, quantity=1, amount=100
3. Paid for, item=book, quantity=3, amount=150
4. Check out, total amount is 250
5. Check out done
7. Our Log Abstraction
Approach
3_0_1 1. Start check out
3_0_2 5. Check out done
5_1_1 4. Check out, total amount=$v
8_3_1 2. Paid for, item=$v, quantity=$v, amount=$v
8_3_1 3. Paid for, item=$v, quantity=$v, amount=$v
8. Anonymize
1. Start check out
2. Paid for, item=bag, quantity=1, amount=100
3. Paid for, item=book, quantity=3, amount=150
4. Check out, total amount is 250
5. Check out done
1. Start check out
2. Paid for, item=$v, quantity=$v, amount=$v
3. Paid for, item=$v, quantity=$v, amount=$v
4. Check out, total amount=$v
5. Check out done
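The Anonymize step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that dynamic values appear only after "=" or after "is/are/was/were", which is enough to reproduce the working example. The function name `anonymize` and both regular expressions are assumptions made for this sketch.

```python
import re

def anonymize(line: str) -> str:
    """Replace likely dynamic values with the generic token $v."""
    # Heuristic 1 (assumed): "key=value" becomes "key=$v".
    line = re.sub(r"=\s*[^\s,]+", "=$v", line)
    # Heuristic 2 (assumed): "key is value" becomes "key=$v".
    line = re.sub(r"\s+(?:is|are|was|were)\s+[^\s,]+", "=$v", line)
    return line

log_lines = [
    "Start check out",
    "Paid for, item=bag, quantity=1, amount=100",
    "Paid for, item=book, quantity=3, amount=150",
    "Check out, total amount is 250",
    "Check out done",
]
anonymized = [anonymize(line) for line in log_lines]
```

Lines without a recognizable value, such as "Start check out", pass through unchanged.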
9. Tokenize
1. Start check out
2. Paid for, item=$v, quantity=$v, amount=$v
3. Paid for, item=$v, quantity=$v, amount=$v
4. Check out, total amount=$v
5. Check out done
(3, 0) 1. Start check out
5. Check out done
(5, 1) 4. Check out, total amount=$v
(8, 3) 2. Paid for, item=$v, quantity=$v, amount=$v
3. Paid for, item=$v, quantity=$v, amount=$v
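The Tokenize step can be sketched as below. The bin key is the pair (number of words, number of $v parameters) shown on the slide. To reproduce the counts in the working example, the sketch assumes that words are separated by whitespace, commas, and "="; the names `tokenize` and `bin_lines` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(line: str) -> list:
    """Split an anonymized line into words (assumed separators: space, ',' and '=')."""
    return [tok for tok in re.split(r"[\s,=]+", line) if tok]

def bin_lines(lines):
    """Group lines into bins keyed by (word count, parameter count)."""
    bins = defaultdict(list)
    for line in lines:
        tokens = tokenize(line)
        bins[(len(tokens), tokens.count("$v"))].append(line)
    return dict(bins)

bins = bin_lines([
    "Start check out",
    "Paid for, item=$v, quantity=$v, amount=$v",
    "Paid for, item=$v, quantity=$v, amount=$v",
    "Check out, total amount=$v",
    "Check out done",
])
```

Only lines that land in the same bin need to be compared with each other in the next step, which keeps the comparison cheap on large logs.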
10. Categorize
3_0_1 1. Start check out
3_0_2 5. Check out done
5_1_1 4. Check out, total amount=$v
8_3_1 2. Paid for, item=$v, quantity=$v, amount=$v
8_3_1 3. Paid for, item=$v, quantity=$v, amount=$v
(3, 0) 1. Start check out
5. Check out done
(5, 1) 4. Check out, total amount=$v
(8, 3) 2. Paid for, item=$v, quantity=$v, amount=$v
3. Paid for, item=$v, quantity=$v, amount=$v
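The Categorize step can be sketched as below, assuming that two lines in the same bin belong to the same execution event when their anonymized text is identical. Event ids follow the words_params_n pattern shown on the slide; the function name `categorize` and the input layout are assumptions of this sketch.

```python
def categorize(bins):
    """Map each anonymized line to an event id of the form words_params_n."""
    events = {}
    for (words, params), lines in sorted(bins.items()):
        seen = []  # distinct lines of this bin, in order of first appearance
        for line in lines:
            if line not in seen:
                seen.append(line)
            events[line] = f"{words}_{params}_{seen.index(line) + 1}"
    return events

paid = "Paid for, item=$v, quantity=$v, amount=$v"
events = categorize({
    (3, 0): ["Start check out", "Check out done"],
    (5, 1): ["Check out, total amount=$v"],
    (8, 3): [paid, paid],
})
```

Both "Paid for" lines receive the same id because they are identical once anonymized.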
11. Reconcile
5_0_1 Start processing for user Jen
5_0_2 Start processing for user Tom
5_0_3 Start processing for user Henry
5_0_4 Start processing for user Jack
5_0_5 Start processing for user Peter
5_0_1 Start processing for user $v
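One way to sketch this Reconcile case: within a bin, events that differ at exactly one word position are merged into a single event once enough of them exist. The threshold `min_group=5` and the function name `reconcile` are assumptions for illustration; the slides do not state the actual rule or threshold.

```python
from collections import defaultdict

def reconcile(lines, min_group=5):
    """Merge same-length events that differ at a single word position."""
    merged = {line: line for line in lines}
    if not lines:
        return merged
    width = len(lines[0].split())
    for pos in range(width):
        groups = defaultdict(set)
        for line in lines:
            tokens = line.split()
            if len(tokens) != width:
                continue  # not from the same bin, skip
            template = " ".join(tokens[:pos] + ["$v"] + tokens[pos + 1:])
            groups[template].add(line)
        for template, members in groups.items():
            # Assumed rule: many events differing at one position hide a parameter.
            if len(members) >= min_group:
                for member in members:
                    merged[member] = template
    return merged

users = ["Jen", "Tom", "Henry", "Jack", "Peter"]
lines = [f"Start processing for user {u}" for u in users]
result = reconcile(lines)
```

With fewer than `min_group` variants the events are left apart, so a threshold that is too high leaves missed parameters in place and one that is too low may merge genuinely different events.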
12. Reconcile
(6, 2) User shopping basket contains: 1, 2
(7, 3) User shopping basket contains: 1, 2, 3
(8, 4) User shopping basket contains: 1, 2, 3, 4
6_2_1 User shopping basket contains: $v
6_2_1 User shopping basket contains: $v
7_3_1 User shopping basket contains: $v
8_4_1 User shopping basket contains: $v
14. Measuring the Performance
- Getting the Correct Execution Events
■ Simply searching for “printf” or “System.out” won’t work
■ We use
– Internationalization file
– Random sampling
15. Case Study
RIM App 1       723,608
RIM App 2       1,688,876
LoadSim         67,651
Blue Gene/L     2,994,986
■ 4 Applications
■ Other similar log abstraction tools
– Terrify
– SLCT
18. Discussion
- SLCT Performance
■ SLCT performance is not high, because
– Infrequent log lines are not abstracted
– Line patterns are not abstracted any further
20. Conclusions
■ How many types of “errors” are there?
■ Our Log Abstraction Approach
■ Measuring the Performance
■ Performance Comparison