The Address Trace Compressor (ATC) uses two methods for compression: lossy compression (based on the concept of phase) and lossless compression (bytesort). Key properties: simplicity, speed, memory efficiency. Traces can consume a lot of memory; trace compression therefore saves space!
What exactly does ATC do? • It generates compact address traces. • The traces that ATC takes as input have the simplest format an address trace can have: they are just sequences of 64-bit values.
Lossless compression techniques: Mache lossless compression. The original trace is transformed by replacing the full address with the difference between the address and the previous address having the same label. The transformation is reversible.
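The Mache transform can be sketched in C as below. It assumes, for illustration, that each address carries a small integer label (three labels here, for example instruction fetch, data read and data write) and that the "previous address" for each label starts at 0. The function names are hypothetical. Unsigned 64-bit subtraction wraps modulo 2^64, so decoding restores every address exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed label set for this sketch: 0, 1, 2 (e.g. ifetch, read, write). */
#define NUM_LABELS 3

/* Replace each address by its difference from the previous address that
   carried the same label. Wrap-around arithmetic keeps this reversible. */
void mache_encode(const uint64_t *addr, const int *label, uint64_t *delta, size_t n)
{
    uint64_t last[NUM_LABELS] = {0};   /* previous address per label */
    for (size_t k = 0; k < n; k++) {
        assert(label[k] >= 0 && label[k] < NUM_LABELS);
        delta[k] = addr[k] - last[label[k]];
        last[label[k]] = addr[k];
    }
}

/* Inverse: add each difference back to the previous address of that label. */
void mache_decode(const uint64_t *delta, const int *label, uint64_t *addr, size_t n)
{
    uint64_t last[NUM_LABELS] = {0};
    for (size_t k = 0; k < n; k++) {
        assert(label[k] >= 0 && label[k] < NUM_LABELS);
        addr[k] = last[label[k]] + delta[k];
        last[label[k]] = addr[k];
    }
}
```

Interleaved streams with different labels each turn into small, repetitive differences (here 4 and 8), which a byte-level compressor handles well.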
Continued... The bytesort way of ATC: an address sequence is easier to compress with a byte-level compressor if its bytes are unshuffled. The input is N address blocks of 8 bytes each; the output is 8 blocks of N bytes each, where the first block holds the 1st byte of each address, the second block holds the 2nd byte of each address, and so on.
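Unshuffling and its inverse can be sketched as follows. The block order (most-significant byte first) and the function names are assumptions; the slide only says that the output groups all 1st bytes together, then all 2nd bytes, and so on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unshuffle n 64-bit addresses into 8 blocks of n bytes each.
   Block j occupies out[j*n .. j*n + n - 1]; in this sketch block 0 holds
   the most-significant byte of every address. */
void unshuffle(const uint64_t *addr, size_t n, uint8_t *out)
{
    assert(n <= SIZE_MAX / 8);        /* out must hold 8*n bytes */
    for (int j = 0; j < 8; j++)
        for (size_t k = 0; k < n; k++)
            out[(size_t)j * n + k] = (uint8_t)(addr[k] >> (8 * (7 - j)));
}

/* Inverse transform: gather the 8 bytes of each address back together. */
void reshuffle(const uint8_t *in, size_t n, uint64_t *addr)
{
    for (size_t k = 0; k < n; k++) {
        uint64_t a = 0;
        for (int j = 0; j < 8; j++)
            a |= (uint64_t)in[(size_t)j * n + k] << (8 * (7 - j));
        addr[k] = a;
    }
}
```

High-order bytes of nearby addresses are usually identical, so the first blocks become long runs that bzip2 compresses very well.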
This is how bytesort uses unshuffling. Consider an example of 384 16-bit addresses: F200, F201, A100, F202, F203, A101, F204, F205, A102, ..., F2FE, F2FF, A17F. Byte unshuffling gives F2, F2, A1, F2, F2, A1, ..., F2, F2, A1 | 00, 01, 00, 02, 03, 01, ..., FE, FF, 7F. The second block looks irregular. To expose more regularity in the second block, sort the addresses according to their high-order byte before outputting the second block, keeping the original order among addresses that share the same high-order byte. After sorting, the address sequence becomes A100, A101, ..., A17F, F200, F201, F202, ..., F2FF, and the unshuffled low-order byte block is 00, 01, ..., 7F, 00, 01, 02, ..., FF. Overall, the transformed trace is F2, F2, A1, F2, F2, A1, ..., F2, F2, A1 | 00, 01, ..., 7F, 00, 01, 02, ..., FF. The second block is now more regular because the sequence 00, 01, ..., 7F appears twice. The first block is left in trace order, so the decoder can recompute the same sort and restore the original sequence.
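The two-block transform in this example can be sketched in C as below. This is a minimal sketch for 16-bit addresses only, not ATC's bytesort code, and the names bytesort16 and unbytesort16 are hypothetical. A counting sort is used because it is stable: it keeps the original order within each high-order byte, which lets the decoder replay the same permutation from the first block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* out[0..n-1]  : high-order bytes, in original trace order
   out[n..2n-1] : low-order bytes, after a stable sort by high-order byte */
void bytesort16(const uint16_t *addr, size_t n, uint8_t *out)
{
    size_t start[257] = {0};            /* start[v]: first slot of bucket v */
    for (size_t k = 0; k < n; k++) {
        out[k] = (uint8_t)(addr[k] >> 8);
        start[out[k] + 1]++;            /* count addresses per high byte */
    }
    for (int v = 0; v < 256; v++)       /* prefix sums give bucket offsets */
        start[v + 1] += start[v];
    for (size_t k = 0; k < n; k++)      /* scanning in trace order => stable */
        out[n + start[addr[k] >> 8]++] = (uint8_t)addr[k];
}

/* Inverse: rebuild the bucket offsets from the first block, then take the
   low-order bytes back in the same stable order. */
void unbytesort16(const uint8_t *in, size_t n, uint16_t *addr)
{
    size_t start[257] = {0};
    for (size_t k = 0; k < n; k++)
        start[in[k] + 1]++;
    for (int v = 0; v < 256; v++)
        start[v + 1] += start[v];
    for (size_t k = 0; k < n; k++)
        addr[k] = (uint16_t)((in[k] << 8) | in[n + start[in[k]]++]);
}
```

On the 384-address example, the second half of the output is exactly 00, 01, ..., 7F followed by 00, 01, ..., FF.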
Pragmatically: the filtering is done with a level-1 instruction cache and a level-1 data cache. The filtered address sequence contains the missing instruction and data block addresses in sequential order. For compressing traces, we use the bzip2 compressor. For each compressed trace, we measure the average number of bits per address (BPA). The smaller the BPA, the higher the compression ratio. The total space taken by a compressed trace, in bits, is BPA × number of addresses in the trace.
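A level-1 cache filter and the BPA metric can be sketched as follows. The cache geometry (direct-mapped, 64-byte blocks, 512 sets) and the names l1_cache, cache_filter and bits_per_address are assumptions for illustration; the slides do not specify them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: 512 sets x 64-byte blocks = 32 KB, direct-mapped. */
#define BLOCK_BITS 6
#define NUM_SETS   512

typedef struct {
    uint64_t tag[NUM_SETS];   /* block address held by each set */
    bool valid[NUM_SETS];
} l1_cache;

void cache_init(l1_cache *c)
{
    for (size_t s = 0; s < NUM_SETS; s++)
        c->valid[s] = false;
}

/* Returns true on a miss and stores the missing block address in *block;
   hits return false and are filtered out of the trace. */
bool cache_filter(l1_cache *c, uint64_t addr, uint64_t *block)
{
    uint64_t b = addr >> BLOCK_BITS;
    size_t set = (size_t)(b % NUM_SETS);
    if (c->valid[set] && c->tag[set] == b)
        return false;                  /* hit */
    c->valid[set] = true;              /* miss: install the block */
    c->tag[set] = b;
    *block = b;
    return true;
}

/* Bits per address: compressed size in bits / number of trace addresses. */
double bits_per_address(uint64_t compressed_bytes, uint64_t num_addresses)
{
    assert(num_addresses > 0);
    return 8.0 * (double)compressed_bytes / (double)num_addresses;
}
```

One instance filters instruction fetches and a second one filters data accesses; appending every returned block address to a single output array, in trace order, gives the filtered sequence that is then compressed.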
Comparison: Overall, the small bytesort and the big bytesort are respectively 40% and 26% faster than TCgen. We also measured the contribution of bzip2 to the decompression time and found that bzip2 contributes about 50% of the decompression time for TCgen and almost 65% for bytesort.
[Chart: TCgen vs. big bytesort vs. small bytesort, with the bzip2 contribution shown]
Lossy compression: Even with a very effective lossless compression method, some address traces are inherently difficult to compress, e.g., because the address sequence looks random. To make these traces significantly more compact, some form of lossy compression is necessary. For an image, lossy compression is acceptable provided the compressed image looks like the original image. This is achieved by removing the details that the eye cannot see or that the brain ignores.
If there are N different addresses, and if we partition the trace into intervals of L consecutive addresses each, with L much larger than N, all intervals look alike and a single interval can be used to characterize the whole trace. In other words, we replace the original trace I1, I2, I3, ..., Ik with a compressed trace I1, I1, I1, ..., I1. The new trace is more compact because we can represent it as a sequence of interval IDs. Sorted byte-histograms permit detecting when two intervals A and B are likely to have similar temporal structures, and provide a way to transform the addresses of A so as to imitate the spatiotemporal structure of B.
Let us consider an interval of L consecutive 64-bit addresses A(k), with k ∈ [1, L]. Each address A(k) is coded with eight bytes b[j](k) ∈ [0, 255]:
$$A(k) = \sum_{j=0}^{7} b[j](k) \times 2^{8j}$$
We define the byte-histograms as follows: for j ∈ [0, 7] and i ∈ [0, 255],
$$h[j](i) = \sum_{k=1}^{L} \delta_i\big(b[j](k)\big), \qquad \delta_i(x) = \begin{cases} 1 & \text{if } x = i \\ 0 & \text{if } x \neq i \end{cases}$$
In other words, the byte-histogram value h[j](i) is the number of addresses in the interval whose byte of order j is equal to i (note that $\sum_{i=0}^{255} h[j](i) = L$). For a given j, the sorted byte-histogram h'[j] is obtained from h[j] by sorting the 256 values h[j](i) in decreasing order. That is,
$$h'[j](i) = h[j]\big(p[j](i)\big) \qquad (1)$$
where p[j] is the permutation of [0, 255] that puts the values h[j](i) in decreasing order. The distance D between intervals A and B is
$$D(A, B) = \max_{j \in [0,7]} d\big(h'_A[j],\, h'_B[j]\big) \qquad (2)$$
where d is a distance between two sorted byte-histograms.
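The byte-histograms, the sorted histograms of formula (1) and the distance of formula (2) can be sketched in C as follows. The slides do not define d, so this sketch assumes d is the L1 distance between the two sorted histograms divided by 2L, which lies in [0, 1]. The struct and function names, and the tie-break on equal counts, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t h[8][256];       /* h[j][i]: addresses whose byte j equals i */
    uint32_t sorted[8][256];  /* h'[j][i] = h[j][p[j][i]], decreasing */
    uint8_t  p[8][256];       /* p[j][i]: byte value of rank i */
} byte_hist;

typedef struct { uint32_t count; uint8_t value; } rank_entry;

static int by_count_desc(const void *x, const void *y)
{
    const rank_entry *a = x, *b = y;
    if (a->count != b->count)
        return a->count < b->count ? 1 : -1;             /* larger counts first */
    return (a->value > b->value) - (a->value < b->value); /* deterministic ties */
}

/* Byte j is the one weighted by 2^(8j), so j = 0 is the least-significant byte. */
void byte_hist_compute(const uint64_t *A, size_t L, byte_hist *bh)
{
    for (int j = 0; j < 8; j++) {
        rank_entry r[256];
        for (int i = 0; i < 256; i++) { r[i].count = 0; r[i].value = (uint8_t)i; }
        for (size_t k = 0; k < L; k++)
            r[(A[k] >> (8 * j)) & 0xFF].count++;
        for (int i = 0; i < 256; i++)
            bh->h[j][i] = r[i].count;
        qsort(r, 256, sizeof r[0], by_count_desc);
        for (int i = 0; i < 256; i++) {
            bh->sorted[j][i] = r[i].count;
            bh->p[j][i] = r[i].value;
        }
    }
}

/* D(A,B) = max over j of d; d here is the assumed normalised L1 distance. */
double hist_distance(const byte_hist *a, const byte_hist *b, size_t L)
{
    double D = 0.0;
    assert(L > 0);
    for (int j = 0; j < 8; j++) {
        uint64_t diff = 0;
        for (int i = 0; i < 256; i++) {
            uint32_t x = a->sorted[j][i], y = b->sorted[j][i];
            diff += x > y ? x - y : y - x;
        }
        double d = (double)diff / (2.0 * (double)L);
        if (d > D) D = d;
    }
    return D;
}
```

Because only sorted counts are compared, two intervals whose addresses differ but repeat with the same pattern, such as 0x1000, 0x1001, 0x1000, 0x1002 and 0x7700, 0x7755, 0x7700, 0x7799, are at distance 0.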
The compressed trace consists of a set of chunks and an interval trace. A chunk is an interval of the original trace that we compress with a lossless compression scheme (all chunks are compressed with the bytesort method). The interval trace is compressed with bzip2. Each time we create a chunk, we record an entry for it in a histogram table in memory, where we store the histograms for that chunk. When the table is full, we evict the entry belonging to the oldest chunk.
We always create a chunk for the first interval in a trace. At the end of the first interval, we compute the histograms for the interval and store them in the histogram table. Then we compute the histograms for the second interval and the distance between intervals 1 and 2 using formula (2). If the distance is greater than the threshold, we create a chunk for the second interval and store its histograms in the histogram table. Otherwise, if the distance is less than the threshold, we do not create a chunk: we only record in the interval trace the fact that interval 1 can be used to imitate interval 2, along with the byte translations t[j] (translations are completely described with 8 × 256 bytes). Then we compute the distance between the third interval and the previous chunks, and so on. When several chunks match the current interval, we imitate the interval using the chunk having the smallest distance with the interval. The fewer chunks are created, the more compact the trace.
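The chunk-creation loop described above can be sketched as follows. Several details are assumptions, not taken from the slides: the table capacity (TABLE_SIZE), the distance d (normalised L1, since the slides leave it undefined), the tie-break between equal counts, and the construction of t[j], which here maps the byte value of rank i in the chunk to the byte value of rank i in the current interval. All names are hypothetical, and the lossless compression of chunks and of the interval trace is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 4   /* hypothetical histogram-table capacity */

typedef struct { uint32_t count; uint8_t value; } rank_entry;

typedef struct {
    uint32_t sorted[8][256];  /* sorted byte-histograms h'[j] */
    uint8_t  p[8][256];       /* p[j][i]: byte value of rank i */
    int      chunk_id;
} hist_entry;

typedef struct {
    hist_entry e[TABLE_SIZE];
    int used, oldest, next_chunk;   /* all zero for an empty table */
} hist_table;

static int by_count_desc(const void *x, const void *y)
{
    const rank_entry *a = x, *b = y;
    if (a->count != b->count)
        return a->count < b->count ? 1 : -1;
    return (a->value > b->value) - (a->value < b->value);
}

static void sorted_hist(const uint64_t *A, size_t L, hist_entry *out)
{
    for (int j = 0; j < 8; j++) {
        rank_entry r[256];
        for (int i = 0; i < 256; i++) { r[i].count = 0; r[i].value = (uint8_t)i; }
        for (size_t k = 0; k < L; k++)
            r[(A[k] >> (8 * j)) & 0xFF].count++;
        qsort(r, 256, sizeof r[0], by_count_desc);
        for (int i = 0; i < 256; i++) {
            out->sorted[j][i] = r[i].count;
            out->p[j][i] = r[i].value;
        }
    }
}

/* D = max over j of the assumed d (L1 distance normalised by 2L). */
static double distance(const hist_entry *a, const hist_entry *b, size_t L)
{
    double D = 0.0;
    for (int j = 0; j < 8; j++) {
        uint64_t diff = 0;
        for (int i = 0; i < 256; i++) {
            uint32_t x = a->sorted[j][i], y = b->sorted[j][i];
            diff += x > y ? x - y : y - x;
        }
        double d = (double)diff / (2.0 * (double)L);
        if (d > D) D = d;
    }
    return D;
}

/* Returns a new chunk id (*is_new = 1), or the id of the closest chunk
   (*is_new = 0) together with the byte translations t[j]. */
int process_interval(hist_table *tab, const uint64_t *A, size_t L,
                     double threshold, int *is_new, uint8_t t[8][256])
{
    hist_entry cur;
    int best = -1, slot;
    double best_d = 0.0;
    assert(L > 0);
    sorted_hist(A, L, &cur);
    for (int s = 0; s < tab->used; s++) {      /* closest stored chunk */
        double d = distance(&tab->e[s], &cur, L);
        if (best < 0 || d < best_d) { best = s; best_d = d; }
    }
    if (best >= 0 && best_d < threshold) {     /* imitate, no new chunk */
        for (int j = 0; j < 8; j++)
            for (int i = 0; i < 256; i++)
                t[j][tab->e[best].p[j][i]] = cur.p[j][i];
        *is_new = 0;
        return tab->e[best].chunk_id;
    }
    if (tab->used < TABLE_SIZE) {
        slot = tab->used++;
    } else {                                   /* full: evict the oldest chunk */
        slot = tab->oldest;
        tab->oldest = (tab->oldest + 1) % TABLE_SIZE;
    }
    cur.chunk_id = tab->next_chunk++;
    tab->e[slot] = cur;
    *is_new = 1;
    return cur.chunk_id;
}
```

Applying t[j] to byte j of every address of the chosen chunk yields an interval with the chunk's temporal structure whose most frequent byte values, rank for rank, are those of the current interval.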
The compression ratio achieved with lossy compression depends on the trace length and on the trace characteristics. Without compression, we would need 4 terabytes of disk space to store the traces. With lossy compression, the 22 traces take only 9 gigabytes of disk space.
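As a rough check of these two figures, the overall reduction factor for the 22 traces works out as follows, assuming decimal units (with binary units, 4096/9 gives about 455):

$$\frac{4\ \text{TB}}{9\ \text{GB}} = \frac{4000\ \text{GB}}{9\ \text{GB}} \approx 444$$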
The ATC compressor is written in C. It consists basically of 4 functions: atc_open, atc_close, atc_code and atc_decode. In the program of Figure 6, the atc_open function is called with the argument 'k', which means lossy compression. For lossless compression, the argument would be 'c'. The atc_open function creates a directory, whose name is given by the argument dirname, in which the compressed trace will be stored. The argument bz2 means that we want compressed chunks to have the suffix .bz2, and the last argument is the command that is used to compress bytesorted chunks. In this example, we use bzip2, but we could use another compressor, like gzip. After the compressed trace has been opened with atc_open, the compression is done by calling atc_code for each 64-bit input value. An example program for decompressing traces is shown in Figure 7. Here, the atc_open function is called with the argument 'd' (decompression). The last argument is the command for decompressing bytesorted chunks.
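Passing the compressor as a command string means any program that reads stdin and writes stdout can be plugged in. The sketch below illustrates that idea with POSIX popen. It is a hypothetical illustration, not ATC's implementation and not its API: write_chunk, read_chunk and the chunk<id>.<suffix> file naming are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen/pclose */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: pipe one bytesorted chunk through an external
   compressor command such as "bzip2" into dirname/chunk<id>.<suffix>.
   The path is single-quoted for the shell, so it must not contain quotes. */
int write_chunk(const char *dirname, int id, const char *suffix,
                const char *cmd, const uint8_t *data, size_t n)
{
    char line[1024];
    assert(dirname != NULL && suffix != NULL && cmd != NULL);
    int len = snprintf(line, sizeof line, "%s > '%s/chunk%d.%s'",
                       cmd, dirname, id, suffix);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                       /* command line would be truncated */
    FILE *p = popen(line, "w");          /* the command reads the chunk on stdin */
    if (p == NULL)
        return -1;
    size_t written = fwrite(data, 1, n, p);
    int status = pclose(p);              /* waits for the command to exit */
    return (written == n && status == 0) ? 0 : -1;
}

/* Decompression counterpart: run a command such as "bzip2 -dc" on the chunk
   file and read up to cap bytes of its output. Returns the byte count or -1. */
long read_chunk(const char *dirname, int id, const char *suffix,
                const char *cmd, uint8_t *buf, size_t cap)
{
    char line[1024];
    assert(dirname != NULL && suffix != NULL && cmd != NULL);
    int len = snprintf(line, sizeof line, "%s < '%s/chunk%d.%s'",
                       cmd, dirname, id, suffix);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;
    FILE *p = popen(line, "r");          /* the command writes the chunk on stdout */
    if (p == NULL)
        return -1;
    size_t got = fread(buf, 1, cap, p);
    int status = pclose(p);
    return status == 0 ? (long)got : -1;
}
```

For example, write_chunk(dir, 0, "bz2", "bzip2", bytes, n) followed by read_chunk(dir, 0, "bz2", "bzip2 -dc", buf, cap) round-trips a chunk, provided dir already exists. Passing "gzip" and "gzip -dc" with the suffix "gz" switches compressors without changing the C code.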