Silent stores

SILENT STORES
HARISH CHETTY, SUJAY GANDHAM & POORNA CHANDRA VELADI

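Figure: SILENT STORE vs SILENT BYTES (8-byte store, byte values in decimal)
SILENT STORE: existing data 255 0 0 0 0 0 0 0, stored data 255 0 0 0 0 0 0 0 (every byte is unchanged, so the whole store is silent)
SILENT BYTES: 147 0 0 0 0 0 0 0 vs 0 0 0 0 0 0 0 0 (only the first byte differs; the other seven bytes are silent)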
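As a minimal sketch (not the authors' code), the two definitions illustrated above can be written in Python, assuming each store is given as two equal-length lists of byte values: the old data at the destination and the new data being written. The function names `silent_byte_count` and `is_silent_store` are hypothetical.

```python
def silent_byte_count(old: list[int], new: list[int]) -> int:
    """Number of bytes whose value the store would leave unchanged."""
    if len(old) != len(new):
        raise ValueError("old and new data must have the same length")
    return sum(1 for o, n in zip(old, new) if o == n)


def is_silent_store(old: list[int], new: list[int]) -> bool:
    """A store is silent when every byte it writes already holds that value."""
    return silent_byte_count(old, new) == len(new)


# The two examples from the figure (byte values in decimal).
print(is_silent_store([255, 0, 0, 0, 0, 0, 0, 0], [255, 0, 0, 0, 0, 0, 0, 0]))  # True
print(silent_byte_count([147, 0, 0, 0, 0, 0, 0, 0], [0] * 8))                  # 7
```

Counting silent bytes separately from silent stores is what lets a partially silent store (such as the second example) still contribute to the silent byte ratio.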

RESEARCH QUESTIONS
 1] To determine the ratio of silent stores to total stores in different benchmarks.
 2] To determine the clustering and pattern behavior of silent stores:
 Clustering behavior of silent stores only
 Clustering behavior of silent and non-silent stores together

MODIFICATIONS
 We had to make two modifications to acquire the required data.
 1] Modified lsq_unit_impl.hh to write every store to a file (Store.txt).
 This file contains two lines per store:
 The first line is the address the store writes to.
 The second line is the data the store is about to write.
 2] Modified packet.hh to write every packet to a file (Cache.txt).
 This file contains four lines per packet:
 The first line is the address the packet writes to.
 The second line is the number of bytes being written.
 The third line is the old data at the destination.
 The fourth line is the new data being written to the destination.

Example Store.txt
Addr : 0x1d5cf8
Data : 0x0
Addr : 0x1d5cf0
Data : 0x248
Addr : 0x1d5ce8
Data : 0x231
Addr : 0x1d5ce0
Data : 0x0
Addr : 0x1d5cd8
Data : 0x0

Example Cache.txt
Addr : 0x1d5cf8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cf0
Size : 8
0 0 0 0 0 0 0 0
248 0 0 0 0 0 0 0
Addr : 0x1d5ce8
Size : 8
0 0 0 0 0 0 0 0
231 0 0 0 0 0 0 0
Addr : 0x1d5ce0
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cd8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
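The Cache.txt format can be read lazily, one four-line record at a time, so that even a multi-gigabyte file never has to fit in main memory. The Python sketch below is illustrative, not the authors' tool. It assumes the layout shown in the sample (`Addr : <hex>`, `Size : <n>`, then the old and new bytes as space-separated decimal values, which the sample suggests since every value is at most 255). The names `CacheWrite`, `read_cache_records` and `count_silent` are hypothetical.

```python
import itertools
from typing import Iterable, Iterator, NamedTuple


class CacheWrite(NamedTuple):
    addr: int
    size: int
    old: list[int]
    new: list[int]


def read_cache_records(lines: Iterable[str]) -> Iterator[CacheWrite]:
    """Lazily yield one record per four non-blank lines: Addr, Size, old data, new data."""
    nonblank = (line.strip() for line in lines if line.strip())
    while True:
        group = list(itertools.islice(nonblank, 4))
        if len(group) < 4:
            return  # end of input; a truncated final record is ignored
        addr_line, size_line, old_line, new_line = group
        yield CacheWrite(
            addr=int(addr_line.split(":", 1)[1], 16),  # "Addr : 0x1d5cf8"
            size=int(size_line.split(":", 1)[1]),      # "Size : 8"
            old=[int(b) for b in old_line.split()],
            new=[int(b) for b in new_line.split()],
        )


def count_silent(records: Iterable[CacheWrite]) -> tuple[int, int, int, int]:
    """Return (total stores, silent stores, total bytes, silent bytes) in one pass."""
    stores = silent_stores = total_bytes = silent_bytes = 0
    for rec in records:
        same = sum(1 for o, n in zip(rec.old, rec.new) if o == n)
        stores += 1
        total_bytes += len(rec.new)
        silent_bytes += same
        if same == len(rec.new):
            silent_stores += 1
    return stores, silent_stores, total_bytes, silent_bytes


SAMPLE = """\
Addr : 0x1d5cf8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cf0
Size : 8
0 0 0 0 0 0 0 0
248 0 0 0 0 0 0 0
Addr : 0x1d5ce8
Size : 8
0 0 0 0 0 0 0 0
231 0 0 0 0 0 0 0
Addr : 0x1d5ce0
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cd8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
"""

stores, silent, nbytes, silent_b = count_silent(read_cache_records(SAMPLE.splitlines()))
print(silent / stores, silent_b / nbytes)  # 0.6 0.95
```

For a real run, passing an open file object (`with open("Cache.txt") as f: count_silent(read_cache_records(f))`) keeps the same constant-memory behavior, because Python iterates over a file one line at a time.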

SETUP
 All benchmarks were run with an 8 KB L1 cache (4-way set associative, 64-byte lines).
 All tests were run on the detailed CPU model.
 Each test took an enormous amount of time to run.
 To speed this up, we parallelized the runs across cloud machines.
 Each machine had 4 cores, 8 GB of RAM and an 80 GB SSD.
 Completed benchmarks took between 33 minutes (soplex) and 3897 minutes (omnetpp).
 Many benchmarks did not complete (they ran for more than 6000 minutes).
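For reference, the cache geometry in this setup works out to 8 KB / (4 ways × 64 B) = 32 sets, i.e. 6 byte-offset bits and 5 set-index bits. The Python sketch below computes this and splits an address into tag, set index and offset. It assumes conventional modulo set indexing applied directly to the logged addresses; the simulator's actual indexing (and whether the logged addresses are the ones the cache indexes with) is not described in the slides. All names are hypothetical.

```python
CACHE_SIZE = 8 * 1024  # bytes (8 KB L1 cache)
ASSOC = 4              # 4-way set associative
LINE_SIZE = 64         # bytes per line

NUM_SETS = CACHE_SIZE // (ASSOC * LINE_SIZE)  # 8192 // 256 = 32
OFFSET_BITS = LINE_SIZE.bit_length() - 1      # log2(64) = 6 (sizes are powers of two)
INDEX_BITS = NUM_SETS.bit_length() - 1        # log2(32) = 5


def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset) with modulo set indexing."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(NUM_SETS, OFFSET_BITS, INDEX_BITS)  # 32 6 5
print(split_address(0x1d5cf8))            # (939, 19, 56)
```

Under these assumptions, all five addresses in the sample logs (0x1d5cd8 to 0x1d5cf8) fall in the same 64-byte line, in set 19.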

PROCESSING DATA
 Processing the data was very difficult!
 The files were much larger than main memory.
 It was impossible to read them whole and carry out any sort of mapping or modification in memory.
 Some benchmarks produced files larger than 25 GB.
 It took a lot of coding:
 Two different forms of lazy reading
 Sampling logic for plotting
 Lazy selective sorting

SILENT STORE RATIO
Configuration              Total Stores  Silent Stores  Ratio      Status
specrand_i_X86_8KB_4_64    11993059      5939535        0.495248   Completed
povray_X86_8KB_4_64        2006962       1060460        0.528391   Completed
soplex_X86_8KB_4_64        5855911       2174472        0.371329   Completed
perlbench_X86_8KB_4_64     322898        77418          0.23976    Completed
gobmk_X86_8KB_4_64         3320091       3195427        0.962452   Completed
libquantum_X86_8kBKB_4_64  13980555      2106616        0.150682   Completed
bzip2_X86_8kB_4_64         226490984     24108946       0.106445   Completed
gamess_X86_8kB_4_64        333742515     60388278       0.180943   Completed
omnetpp_X86_8KB_4_64       333742515     60388278       0.180943   Completed
gcc_X86_8KB_4_64           72284661      31533701       0.436243   Aborted
namd_X86_8KB_4_64          169225684     76474519       0.451908   Incomplete
lbm_X86_8KB_4_64           371077787     172324400      0.464389   Incomplete
mcf_X86_8kBKB_4_64         98439312      22986711       0.233511   Incomplete
milc_X86_8kBKB_4_64        286986509     17784410       0.0619694  Incomplete

SILENT BYTE RATIO
Configuration              Total Store Bytes  Silent Bytes  Ratio     Status
specrand_i_X86_8KB_4_64    76518563           61601237      0.80505   Completed
povray_X86_8KB_4_64        15282175           13221672      0.86517   Completed
soplex_X86_8KB_4_64        36234532           24984419      0.68952   Completed
perlbench_X86_8KB_4_64     2311327            1738174       0.752024  Completed
gobmk_X86_8KB_4_64         21637745           21249594      0.982061  Completed
libquantum_X86_8kBKB_4_64  109697353          96032613      0.875432  Completed
bzip2_X86_8kB_4_64         742892581          458953854     0.617793  Completed
gamess_X86_8kB_4_64        2422301950         1704319167    0.703595  Completed
omnetpp_X86_8KB_4_64       535292751          434227897     0.811197  Completed
gcc_X86_8KB_4_64           2422301950         1704319167    0.703595  Aborted
namd_X86_8KB_4_64          1082700667         903980569     0.834931  Incomplete
lbm_X86_8KB_4_64           2911874103         1978336222    0.679403  Incomplete
mcf_X86_8kBKB_4_64         752304117          573852760     0.762794  Incomplete
milc_X86_8kBKB_4_64        ???                ???           ???       Incomplete

PLOTTING DATA
 Plotting the stores was necessary to determine clustering behavior.
 Our first idea was to plot every store against its store number.
 This was impossible because the number of stores was enormous.
 We did not have enough main memory to create such a plot.
 Even if we had been able to plot it, the information would have been practically useless at that scale.
 So we created a sampling technique:
 Divided the entire sequence of stores into 500 subparts.
 Plotted only the first store in each subpart.
 Created charts from these samples using Python.
 There was still one major problem!
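The sampling step can be done in a single streaming pass once the total number of stores is known (for example, from the counts in the tables). This Python sketch assumes the 500 subparts are of near-equal size, which the slides do not state explicitly; `sample_first_of_each_part` is a hypothetical name.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_first_of_each_part(items: Iterable[T], total: int,
                              parts: int = 500) -> Iterator[tuple[int, T]]:
    """Yield (position, item) for the first item of each of `parts` near-equal parts.

    `total` (the number of items) must be known beforehand, e.g. from a
    counting pass, so the stream is read once and never held in memory.
    """
    if total <= 0 or parts <= 0:
        return
    parts = min(parts, total)   # never more parts than items
    part = 0
    next_start = 0              # position where the current part begins
    for pos, item in enumerate(items):
        if pos == next_start:
            yield pos, item
            part += 1
            if part == parts:
                return          # every part has been sampled
            next_start = part * total // parts


print(list(sample_first_of_each_part("abcdefghij", total=10, parts=4)))
# [(0, 'a'), (2, 'c'), (5, 'f'), (7, 'h')]
```

Each yielded pair (for instance, store number against a 1/0 silent flag) can then be handed to any plotting library, since at most 500 points are ever kept.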

RUN LENGTH ENCODING
 We needed a new way to identify clusters.
 We noticed that each store has only two possible states: silent or non-silent.
 This is equivalent to a true/false condition (1s and 0s).
 Logically, our data was therefore one very large binary string.
 This is similar to image formats such as JPEG, where long runs of repeated values are compressed with run-length encoding.
 The same idea, Run Length Encoding (RLE), could therefore be applied here.
 Since storing the entire RLE was also not feasible, we kept only the 200 longest runs.
 To make sure silent runs were not dominated by non-silent runs, we produced two forms of RLE:
 1] Top 200 runs of both silent and non-silent stores
 2] Top 200 runs of silent stores only

Example RLE of size 5 (1 = silent store, 0 = non-silent store)
Input: 1111111111000001111111111000111111111100000111110001110001111111111111111111100000000000000
RLE: 10 X 1, 05 X 0, 10 X 1, 03 X 0, 10 X 1, 05 X 0, 05 X 1, 03 X 0, 03 X 1, 03 X 0, 20 X 1, 14 X 0
Sorted: 20 X 1, 14 X 0, 10 X 1, 10 X 1, 10 X 1, 05 X 0, 05 X 0, 05 X 1, 03 X 0, 03 X 0, 03 X 0, 03 X 1
Trimmed: 20 X 1, 14 X 0, 10 X 1, 10 X 1, 10 X 1
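The encode, sort and trim steps of this example can be sketched in Python. The sketch treats the store stream as bits (1 = silent, 0 = non-silent), uses `itertools.groupby` to produce runs lazily, and uses `heapq.nlargest` to keep only the N longest runs without sorting the full RLE, so memory stays proportional to N (200 in the study). It is an illustration under those assumptions, not the authors' implementation, and the function names are hypothetical. Runs of equal length keep their program order, so ties further down the list may be ordered differently from the example.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator


def run_lengths(bits: Iterable[int]) -> Iterator[tuple[int, int]]:
    """Lazily yield (length, type) for each run; type 1 = silent, 0 = non-silent."""
    for bit, run in groupby(bits):
        yield sum(1 for _ in run), bit


def top_runs(bits: Iterable[int], n: int = 200,
             silent_only: bool = False) -> list[tuple[int, int]]:
    """Keep only the n longest runs (ties stay in program order), using O(n) memory."""
    runs = run_lengths(bits)
    if silent_only:
        runs = (r for r in runs if r[1] == 1)
    return heapq.nlargest(n, runs, key=lambda r: r[0])


# Rebuild the example bit string from its run lengths (length, type).
EXAMPLE_RUNS = [(10, 1), (5, 0), (10, 1), (3, 0), (10, 1), (5, 0),
                (5, 1), (3, 0), (3, 1), (3, 0), (20, 1), (14, 0)]
bits = "".join(str(t) * length for length, t in EXAMPLE_RUNS)

print(top_runs(map(int, bits), n=5))
# [(20, 1), (14, 0), (10, 1), (10, 1), (10, 1)]
print(top_runs(map(int, bits), n=5, silent_only=True))
# [(20, 1), (10, 1), (10, 1), (10, 1), (5, 1)]
```

The `silent_only` flag corresponds to the second form of RLE: filtering before selection ensures long non-silent runs cannot push silent runs out of the top N.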

Top runs, silent and non-silent stores combined (Type 1 = silent, 0 = non-silent; T = total stores, S = silent stores)

bzip2 (T 226490984, S 24108946)
Type Length
0 1865497
0 1799967
0 1465497
0 1399967
0 1065499
0 999969
0 999967
0 740025
0 674501
0 366149
0 342447

specrand (T 11993059, S 5939535)
Type Length
0 263
1 152
0 39
0 30
0 28
0 25
0 23
0 22
0 19
0 18
0 17

gobmk (T 3320091, S 3195427)
Type Length
1 1560002
1 1560002
1 22889
1 22528
0 12341
0 8823
0 5289
0 1368
0 1368
0 1368
0 1368

mcf (T 98439312, S 22986711)
Type Length
0 102406
1 84450
0 23942
0 11987
0 11986
1 11973
1 11973
1 11973
1 11973
1 11973
1 11973

Top runs, silent stores only (Type 1 = silent; T = total stores, S = silent stores)

bzip2 (T 226490984, S 24108946)
Type Length
1 65538
1 5576
1 5460
1 4200
1 3288
1 3260
1 3138
1 3094
1 2965
1 2962
1 2814

specrand (T 11993059, S 5939535)
Type Length
1 152
1 15
1 15
1 14
1 14
1 14
1 14
1 14
1 14
1 14
1 14

gobmk (T 3320091, S 3195427)
Type Length
1 1560002
1 1560002
1 22889
1 22528
1 152
1 107
1 58
1 58
1 58
1 58
1 58

mcf (T 98439312, S 22986711)
Type Length
1 84450
1 11973
1 11973
1 11973
1 11973
1 11973
1 11973
1 11973
1 11973
1 11972
1 11972

CONCLUSION
 Silent stores make up a significant fraction of all stores in almost all benchmarks.
 Silent bytes also deserve attention: the silent byte ratio exceeds the silent store ratio in every benchmark with data.
 Silent stores do show some observable clustering in programs, with long runs of consecutive silent stores in some benchmarks (e.g. gobmk).
 More evaluation is needed to determine in which phase of the program these sequences occur.
 It is also necessary to evaluate how the nature of the program affects silent stores.
