Leveraging Performance Counters and Execution Logs to Diagnose Memory-Related Performance Issues
  1. Leveraging Performance Counters and Execution Logs to Diagnose Memory-Related Performance Issues. Mark D. Syer, Zhen Ming Jiang, Meiyappan Nagappan, Ahmed E. Hassan, Mohamed Nasser and Parminder Flora (mdsyer@cs.queensu.ca)
  2. [No text on this slide]
  3. Failures in ultra-large-scale (ULS) systems are typically due to performance issues
  4. [No text on this slide]
  5. “...triggered a latent memory leak… By Monday morning, the rate of memory loss became quite high and consumed enough memory on the affected storage servers that they were unable to keep up with normal request handling processes.”
  6. Load testing may detect failures before they occur in the field
  7. Performance analysts collect counters & logs
  8. Diagnosing memory issues requires counters and logs [Figure: Memory Usage vs. Time, annotated “Memory Leak!”]
  9. Diagnosing memory issues is difficult: huge amount of data, rapidly evolving systems
  10. Combining counters and logs is difficult [Figure: Memory Usage vs. Time, annotated “Memory Leak!”]
  11. Our approach identifies the events causing performance issues: generate signatures, detect outliers, inspect outliers
  12. We generate a signature each time memory is sampled [Figure: Memory (MB) vs. Time, 00:00 to 00:24]
  13. Abstract log lines to events
      00:01, Alice starts a conversation with Bob
      00:01, Alice says `hi' to Bob
      00:02, Alice says `are you busy?' to Bob
      00:11, Bob says `yes' to Alice
      00:12, Alice says `ok' to Bob
      00:18, Alice ends a conversation with Bob
  14. Combine the counters and events
      Counters:
      00:00, 5MB
      00:08, 15MB
      00:16, 15MB
      00:24, 5MB
      Events:
      00:01, USER starts a conversation with USER
      00:01, USER says MSG to USER
      00:02, USER says MSG to USER
      00:11, USER says MSG to USER
      00:12, USER says MSG to USER
      00:18, USER ends a conversation with USER
  15. Count the events and calculate the memory delta in each time interval
      Event                                   00:08   00:16   00:24
      USER starts a conversation with USER        1       0       0
      USER says MSG to USER                       2       2       0
      USER ends a conversation with USER          0       0       1
      Δ Memory                                 10MB       0   -10MB
  16. We identify and inspect outlying signatures: detect outliers, inspect outliers
  17. Can we diagnose... Memory bloat? Memory leaks? Memory spikes?
  18. Precision, Effort Reduction
  19. Our approach flags events with high precision [Figure: Precision (0 to 100) for memory bloat, memory leak and memory spike]
  20. Precision: +80%. Effort Reduction
  21. Our approach flags a small number of events for expert analysis: 5,303 log lines vs. 1 flagged event (99.98%)
  22. Our approach flags a small number of events for expert analysis [Figure: axis from 99.9 to 100 for memory bloat, memory leak and memory spike]
  23. Precision: +80%. Effort Reduction: >99.98%
  24. [No text on this slide]
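
The signature-generation steps on slides 12 to 15 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular expressions, the fixed list of user names, and the helper names (`abstract`, `build_signatures`) are hypothetical, and the data is the toy example from the slides. Each event is assigned to the interval that ends at the first memory sample taken at or after the event's timestamp.

```python
import re
from bisect import bisect_left
from collections import Counter

# Counter samples from slide 14: (timestamp "MM:SS", memory in MB).
SAMPLES = [("00:00", 5), ("00:08", 15), ("00:16", 15), ("00:24", 5)]

# Raw log lines from slide 13.
LOG_LINES = [
    "00:01, Alice starts a conversation with Bob",
    "00:01, Alice says `hi' to Bob",
    "00:02, Alice says `are you busy?' to Bob",
    "00:11, Bob says `yes' to Alice",
    "00:12, Alice says `ok' to Bob",
    "00:18, Alice ends a conversation with Bob",
]

# Hypothetical abstraction rules (assumptions for this sketch):
# quoted text becomes MSG, and names from a fixed list become USER.
KNOWN_USERS = ("Alice", "Bob")
MSG_PATTERN = re.compile(r"`[^']*'")
USER_PATTERN = re.compile(r"\b(?:%s)\b" % "|".join(KNOWN_USERS))


def to_seconds(stamp):
    """Convert an 'MM:SS' timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def abstract(line):
    """Split 'MM:SS, text' and replace dynamic values with placeholders."""
    stamp, text = line.split(", ", 1)
    text = MSG_PATTERN.sub("MSG", text)    # messages first, so names inside them vanish
    text = USER_PATTERN.sub("USER", text)
    return to_seconds(stamp), text


def build_signatures(samples, log_lines):
    """Return one signature (event counts + memory delta) per sampling interval."""
    times = [to_seconds(stamp) for stamp, _ in samples]
    counts = [Counter() for _ in samples[1:]]
    for line in log_lines:
        when, event = abstract(line)
        idx = bisect_left(times, when)     # first sample at or after the event
        if 1 <= idx < len(times):          # ignore events outside the sampled window
            counts[idx - 1][event] += 1
    signatures = []
    for i in range(1, len(samples)):
        signatures.append({
            "interval_end": samples[i][0],
            "events": dict(counts[i - 1]),
            "delta_mb": samples[i][1] - samples[i - 1][1],
        })
    return signatures


if __name__ == "__main__":
    for signature in build_signatures(SAMPLES, LOG_LINES):
        print(signature)
```

For the example data, this yields memory deltas of 10, 0 and -10 MB for the intervals ending at 00:08, 00:16 and 00:24, with the same event counts as the table on slide 15.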
