Daniel Dauwe, ECE 561: Benchmarking Results
1.
2. Presentation Outline
• Project Goals
• Tools for Benchmarking:
• Performance counters, PAPI,
• HPC Toolkit, Phoronix Test Suite,
• Power Measurement
• How testing was accomplished
• List of additional data points for application to processor affinity
• A simple continuation of Ryan’s test work
• Results from Memory/Cache Interference Testing for multiple applications
run simultaneously pinned to specific cores
3. Project Goals
• Benchmarking Processors
– Monitor both performance counters and the system's power
usage
– Gather more data on application affinity, i.e. how well an
application performs on a particular processor architecture
• Memory Intensive Applications
• CPU Intensive Applications
– Analyze the Interaction/Interference of multiple applications run
simultaneously on different cores of the same processor
• This data collection is intermediate work for future unspecified
projects
4. Performance Counters and PAPI
• Performance counters
– Counters built into processor hardware that record the number
of occurrences of user specified events in hardware
• PAPI – Performance Application Programming Interface
– PAPI was developed in the hope of identifying bottlenecks in
current architectural development of high performance
computing
– A standardized list of performance counters available for most
processors
– PAPI makes it easier to have consistent tests across multiple
processor architectures
5. What do the Performance Counter Measurements mean?
• The meaning depends on which counters are being monitored.
Examples:
– PAPI_L1_DCA - Level 1 data cache accesses
– PAPI_FAD_INS - Floating point add instructions
– PAPI_L2_DCM - Level 2 data cache misses
• The raw count data provided by the Performance Counter will need
to be meaningfully interpreted by the user
6. Matching Performance counters to Processor
Architectures
• Performance Counters used for these tests :
– PAPI_TOT_INS – Total Instructions Executed
– PAPI_L2_TCM – Data and Instruction Level 2 Cache Misses
• These should be pretty universally available across different
processor architectures
• Future inclusion of other tests may require other Performance
Counters, but available Performance Counters vary greatly between
processor architectures…
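A raw count only becomes comparable across runs once it is normalized. One common way to combine the two counters above is misses per thousand instructions (MPKI). The sketch below is an illustration, not the method used for these tests; the function name and the sample counts are hypothetical.

```python
def misses_per_kilo_instruction(l2_misses, total_instructions):
    """Return L2 cache misses per 1,000 executed instructions (MPKI)."""
    if total_instructions <= 0:
        raise ValueError("total_instructions must be positive")
    return 1000.0 * l2_misses / total_instructions


# Hypothetical raw counts, NOT measurements from this presentation.
sample = {"PAPI_TOT_INS": 2_000_000_000, "PAPI_L2_TCM": 5_000_000}

mpki = misses_per_kilo_instruction(sample["PAPI_L2_TCM"], sample["PAPI_TOT_INS"])
print(f"L2 MPKI: {mpki:.2f}")
```

Normalizing by instruction count lets a short run and a long run of the same application be compared directly.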
7. HPC Toolkit
• “An Integrated suite of tools for measurement and analysis of
program performance”
• Essentially
– HPC Toolkit makes it easier to interface with the local machine's
performance counters
– Makes collecting program performance data easier
8. Phoronix Test Suite
• Phoronix provides many test applications capable of testing many
aspects of processor performance
– Phoronix tests are responsible for all of the benchmarking data
gathered for this presentation
• However, many other groups write application suites useful for
benchmarking
– SPEC CPU2000 / 2006
– PARSEC
• Several resources such as “OpenBenchmarking.org” provide a
substantial number of results from tests run from these suites on
many processor architectures
– This could prove to be a useful resource, but the results do not
include information about power usage
9. Applications used for testing cross-core cache interference
• C-Ray
– A Ray Tracing Program
– CPU Intensive
– Many Floating Point Calculation Operations
– Relatively Little Memory Access
• Ramspeed
– Integer and Floating Point Writes and Reads to memory
– Memory Intensive
– More interaction with the caches
10. Monitoring Power Usage
• “Watts Up? PRO” power meter
– Measures power consumption from a single standard power
outlet
– Has a USB port to interface with a computer and dump recorded
power measurements
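The result charts report energy, while a power meter reports power over time. A minimal sketch of turning logged power samples into energy is shown below, assuming the log has already been parsed into (time in seconds, watts) pairs. The meter's actual log format is not assumed here, and the slides do not state how their energy figures were computed; the readings are made up.

```python
def energy_joules(samples):
    """Integrate (time_s, watts) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical readings at one-second intervals, NOT data from these tests.
readings = [(0, 20.0), (1, 22.0), (2, 24.0), (3, 22.0)]

energy = energy_joules(readings)
duration = readings[-1][0] - readings[0][0]
print(f"energy: {energy:.1f} J, average power: {energy / duration:.2f} W")
```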
11. How tests were run
• Minimalist Ubuntu Operating System allows the processor's
attention to be dedicated to the test applications
– Terminal Based User Interface
– Unnecessary background processes not included in the
operating system
• Power usage and the selected performance counters are recorded
and saved while the various test applications are run.
• For Testing Interference between programs:
– “taskset” was used to pin the applications to specific processor
cores
– The applications were run concurrently, while performance
counter results were measured
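The pinning step can be sketched as follows. This is an illustration, not the scripts used for these tests: `taskset -c <core>` is the standard way to bind a command to a core on Linux, but the helper names and the two benchmark paths are placeholders.

```python
import subprocess


def pinned_command(core, argv):
    """Prefix argv so that `taskset -c <core>` binds it to one core."""
    if core < 0:
        raise ValueError("core index must be non-negative")
    return ["taskset", "-c", str(core), *argv]


def run_concurrently(commands):
    """Start every command at once, then wait for all; return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Placeholder executables, one CPU-intensive and one memory-intensive.
jobs = [
    pinned_command(0, ["./cpu_benchmark"]),
    pinned_command(1, ["./memory_benchmark"]),
]
for job in jobs:
    print(" ".join(job))
# run_concurrently(jobs)  # uncomment on a Linux machine with real binaries
```

Starting both processes before waiting on either keeps the two applications overlapping for as much of the run as possible.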
12. Measuring Memory Interference
between Applications
• How this is tested:
– Simultaneously pin different types of applications to run only on specific cores in the
processor
– Then use performance counters and the power meter to measure the interference
• Interference could be defined as:
– An increase in application execution time
– An increase in the number of cache misses
– Possibly an increase in power consumption
• Test plan:
– Tests were run:
• First on an AMD Turion II Dual-Core M520 Processor (2 cores, 5 P-states)
• Later also on an Intel Pentium Dual Core CPU (2 cores, 4 P-states)
– Run control tests with each application running alone (pinned to a single core)
– Run the tests together and analyze the differences
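The three candidate definitions of interference can each be expressed as a relative increase of the joint run over the control run. The sketch below assumes that framing; the metric names and all numbers are hypothetical, not results from these tests.

```python
def relative_increase(control, joint):
    """Fractional increase of the joint-run value over the control value."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (joint - control) / control


def interference_report(control_run, joint_run):
    """Compare every metric present in the control run."""
    return {
        metric: relative_increase(control_run[metric], joint_run[metric])
        for metric in control_run
    }


# Hypothetical measurements for one application at one P-state.
control_run = {"exec_time": 200.0, "l2_misses": 50_000, "energy": 8_000.0}
joint_run = {"exec_time": 220.0, "l2_misses": 55_000, "energy": 8_400.0}

for metric, change in interference_report(control_run, joint_run).items():
    print(f"{metric}: {change:+.1%}")
```

A positive value means the application did worse when sharing the processor; a value near zero suggests no measurable interference for that metric.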
13. Control Results:
Intel Pentium dual CPU T2330
[Figure: six control-result charts for the Intel Pentium Dual Core; plotted values not recoverable]
– C-Ray: L2 Cache Miss, Execution Time, and Power Usage Control Results
(series: CPU Control Test, CPU Control Energy)
– Ramspeed: L2 Cache Miss, Execution Time, and Power Usage Control Results
(series: Memory Control Test, Memory Control Energy)
14. Control Results:
AMD Turion II Dual Core Mobile M520
[Figure: six control-result charts for the AMD Turion II Dual-Core; plotted values not recoverable]
– C-Ray: Execution Time, L2 Cache Miss, and Power Usage Control Results
(series: CPU Control Test, CPU Control Energy)
– Ramspeed: Execution Time, L2 Cache Miss, and Power Usage Control Results
(series: Memory Control Test, Memory Control Energy)
15. Taking a Closer Look at the AMD
Control Results from the previous slide:
• The AMD control test produced nearly the same C-Ray execution time across all P-states, which seems
suspect. This result was consistent over multiple runs on the AMD Turion II processor, but the same test
on a second machine, an Intel Pentium Dual Core processor, produced results that were closer to what
seems realistic:
[Figure: three charts; plotted values not recoverable]
– C-Ray Execution Time (AMD First Run)
– C-Ray Execution Time (AMD Second Run)
– C-Ray Execution Time (Intel Run)
(series in each chart: CPU Control Test, CPU Interference Test)
16. Interference Results
(Joint Pinning Results on C-Ray):
Intel Pentium dual CPU T2330
• The third column of data represents adjusted interference results
[Figure: three charts; plotted values not recoverable]
– C-Ray Execution Time Interference (Ramspeed test on second core)
(series: CPU Control Test, Original CPU Interference Test, Adjusted CPU Interference Test)
– C-Ray L2 Cache Misses Interference (Ramspeed test on second core)
(series: CPU Control Test, Original CPU Interference Test, Adjusted CPU Interference Test)
– Power usage for C-Ray and Ramspeed tests run together
(series: CPU Control Energy, 1 CPU and 1 Memory Interference Test Energy)
17. Interference Results
(Joint Pinning Results on Ramspeed):
Intel Pentium dual CPU T2330
[Figure: two charts; plotted values not recoverable]
– Ramspeed Execution Time Interference (C-Ray test on second core)
– Ramspeed L2 Cache Misses Interference (C-Ray test on second core)
(series in each chart: Memory Control Test, Memory Interference Test)
18. Interference Results
(2 CPU Intensive Application Pinning Results):
Intel Pentium dual CPU T2330
[Figure: three charts; plotted values not recoverable]
– C-Ray Execution Time Interference (C-Ray test on second core)
(series: CPU Control Test, CPU Interference Test, CPU Interference Test)
– C-Ray L2 Cache Misses Interference (C-Ray test on second core)
(series: CPU Control Test, CPU Interference Test, CPU Interference Test)
– Power usage for 2 C-Ray tests running on separate cores
(series: CPU Control Energy, 2 CPU Interference Test Energy)
19. Interference Results
(2 Memory Intensive Application
Pinning Results):
Intel Pentium dual CPU T2330
[Figure: three charts; plotted values not recoverable]
– Ramspeed Execution Time Interference (Ramspeed test on second core)
(series: Memory Control Test, Memory Interference Test, Memory Interference Test)
– Ramspeed L2 Cache Misses Interference (Ramspeed test on second core)
(series: Memory Control Test, Memory Interference Test, Memory Interference Test)
– Power usage for 2 Ramspeed tests running on separate cores
(series: Memory Control Energy, 2 Memory Interference Test Energy)
20. Interference between
simultaneous applications:
Future Tests
The foundation scripts have been written so in the future it will
be very easy to add support for testing:
– Interference of 1 type of application pinned to N cores for a processor
with a substantial number of cores (i.e., more than 2)
– Interference from 2 CPU intensive or 2 Memory intensive test
applications
– Measure memory interference with M applications mapped to N cores
(Obviously N > 2)
– Testing a larger sample size might produce more interesting results
– Find which application to core mappings can provide the best
performance for specific architectures/cache sizes
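Searching for the best application-to-core mapping starts with enumerating the candidates. The sketch below is one possible approach, assuming at most one application per core (M ≤ N); the function name and application labels are hypothetical. Each mapping it yields would then be run and measured as in the tests above.

```python
from itertools import permutations


def core_mappings(apps, cores):
    """Yield each assignment of apps to distinct cores as {app: core}."""
    if len(apps) > len(cores):
        raise ValueError("this sketch assumes at most one app per core")
    for chosen in permutations(cores, len(apps)):
        yield dict(zip(apps, chosen))


# Hypothetical labels; two instances of one program need distinct labels.
apps = ["cpu_app", "memory_app"]
cores = [0, 1, 2, 3]

mappings = list(core_mappings(apps, cores))
print(f"{len(mappings)} candidate mappings")
for mapping in mappings[:3]:
    print(mapping)
```

The number of mappings grows as N! / (N − M)!, so an exhaustive sweep is only practical for small core counts.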
21. Presentation Outline
• Project Goals
• Tools for Benchmarking:
• Performance counters, PAPI,
• HPC Toolkit, Phoronix Test Suite,
• Power Measurement
• How testing was accomplished
• List of additional data points for application to processor affinity
• A simple continuation of Ryan’s test work
• Results from Interference Testing for applications pinned to specific cores