SILENT STORES
HARISH CHETTY, SUJAY GANDHAM & POORNA CHANDRA VELADI
SILENT STORE
255 0 0 0 0 0 0 0
255 0 0 0 0 0 0 0
SILENT BYTES
147 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
RESEARCH QUESTIONS
 1] Determine the ratio of silent stores to total stores in different benchmarks.
 2] Determine the clustering and pattern behavior of silent stores:
 Clustering behavior of silent stores only
 Clustering behavior of silent and non-silent stores together
MODIFICATIONS
 Two modifications were needed to acquire the required data.
 1] Modified lsq_unit_impl.hh to write the store data to a file (Store.txt).
 This file contains 2 lines for each store.
 The first line is the address the store writes to.
 The second line is the data the store is about to write.
 2] Modified packet.hh to write the packet data to a file (Cache.txt).
 This file contains 4 lines for each packet.
 The first line is the address the packet writes to.
 The second line is the number of bytes being written.
 The third line is the old data at the destination.
 The fourth line is the new data being written to the destination.
Store.txt
Addr : 0x1d5cf8
Data : 0x0
Addr : 0x1d5cf0
Data : 0x248
Addr : 0x1d5ce8
Data : 0x231
Addr : 0x1d5ce0
Data : 0x0
Addr : 0x1d5cd8
Data : 0x0

Cache.txt
Addr : 0x1d5cf8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cf0
Size : 8
0 0 0 0 0 0 0 0
248 0 0 0 0 0 0 0
Addr : 0x1d5ce8
Size : 8
0 0 0 0 0 0 0 0
231 0 0 0 0 0 0 0
Addr : 0x1d5ce0
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cd8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
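The 4-line Cache.txt record described above could be classified roughly as follows. This is a minimal sketch, not the authors' code: the function names parse_record and classify are hypothetical, and byte values are compared as text tokens because the slides do not state which number base the dump uses. A write is treated as a silent store when every new byte equals the old byte at the same position, and each matching byte is counted as a silent byte.

```python
from typing import List, Tuple


def parse_record(lines: List[str]) -> Tuple[str, int, List[str], List[str]]:
    """Parse one 4-line Cache.txt record: address, size, old bytes, new bytes."""
    addr = lines[0].split(":", 1)[1].strip()
    size = int(lines[1].split(":", 1)[1])
    old = lines[2].split()  # byte tokens kept as text (number base is an unknown)
    new = lines[3].split()
    return addr, size, old, new


def classify(old: List[str], new: List[str]) -> Tuple[bool, int]:
    """Return (is_silent_store, silent_byte_count) for one write."""
    same = sum(1 for o, n in zip(old, new) if o == n)
    return same == len(new), same


# Second record of the Cache.txt sample shown on the slide.
record = ["Addr : 0x1d5cf0", "Size : 8", "0 0 0 0 0 0 0 0", "248 0 0 0 0 0 0 0"]
addr, size, old, new = parse_record(record)
is_silent, n_silent = classify(old, new)
print(addr, is_silent, n_silent)  # 0x1d5cf0 False 7
```

In this record the store is not silent, yet 7 of its 8 bytes are, which is why store counts and byte counts are worth tracking separately.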
SETUP
 All benchmarks were run with an 8 KB L1 cache (4-way set associative, 64-byte line size).
 All tests were carried out on the detailed CPU model.
 Each test took an enormous amount of time to run.
 To speed things up, we ran the tests in parallel on cloud computers.
 Each computer had 4 cores, 8 GB of RAM, and an 80 GB SSD.
 Completed benchmarks took between 33 minutes (soplex) and 3897 minutes (omnetpp).
 Many benchmarks did not complete (run time exceeded 6000 minutes).
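As a quick check on the cache configuration above, the geometry implied by 8 KB, 4 ways, and 64-byte lines can be worked out directly. The sketch below assumes conventional power-of-two set indexing (offset bits, then index bits, then tag); the slides do not say how the simulator indexes its cache, and split_address is a hypothetical helper.

```python
CACHE_BYTES = 8 * 1024  # 8 KB L1 cache
WAYS = 4                # 4-way set associative
LINE_BYTES = 64         # 64-byte line size

num_lines = CACHE_BYTES // LINE_BYTES      # 128 lines in total
num_sets = num_lines // WAYS               # 32 sets
offset_bits = LINE_BYTES.bit_length() - 1  # 6 bits select the byte in a line
index_bits = num_sets.bit_length() - 1     # 5 bits select the set


def split_address(addr: int):
    """Split an address into (tag, set index, line offset); assumed indexing."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(num_lines, num_sets)      # 128 32
print(split_address(0x1D5CF8))  # (939, 19, 56)
```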
PROCESSING DATA
 Processing the data was very difficult!
 The files were much larger than main memory.
 It was impossible to read them in whole and carry out any sort of mapping or modification.
 File sizes exceeded 25 GB for some benchmarks.
 A lot of coding was required:
 Two different forms of lazy reading
 Sampling logic for plotting
 Lazy selective sorting
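One way lazy reading can look in Python is a generator that hands out one fixed-size record at a time, so memory use stays constant regardless of file size. This is only an illustration under assumptions: the helper names records and count_silent are hypothetical, the input is assumed to contain no blank lines, and the slides do not describe the two forms of lazy reading the authors actually implemented.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def records(lines: Iterable[str], lines_per_record: int) -> Iterator[List[str]]:
    """Yield fixed-size groups of stripped lines without loading the whole input."""
    it = iter(lines)
    while True:
        group = [ln.strip() for ln in islice(it, lines_per_record)]
        if len(group) < lines_per_record:
            return  # end of input (a trailing partial record is dropped)
        yield group


def count_silent(lines: Iterable[str]) -> Tuple[int, int]:
    """Stream 4-line Cache.txt records and return (total stores, silent stores)."""
    total = silent = 0
    for _addr, _size, old, new in records(lines, 4):
        total += 1
        silent += old.split() == new.split()  # True counts as 1
    return total, silent


# In-memory stand-in for a file; an open file object is iterated the same way.
sample = io.StringIO(
    "Addr : 0x1d5cf8\nSize : 8\n0 0 0 0 0 0 0 0\n0 0 0 0 0 0 0 0\n"
    "Addr : 0x1d5cf0\nSize : 8\n0 0 0 0 0 0 0 0\n248 0 0 0 0 0 0 0\n"
)
print(count_silent(sample))  # (2, 1)
```

For a real trace the same function could be called as `with open("Cache.txt") as f: count_silent(f)`, since a file object yields its lines one at a time.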
SILENT STORE RATIO
Configuration              Total Stores  Silent Stores  Ratio      Status
specrand_i_X86_8KB_4_64    11993059      5939535        0.495248   Completed
povray_X86_8KB_4_64        2006962       1060460        0.528391   Completed
soplex_X86_8KB_4_64        5855911       2174472        0.371329   Completed
perlbench_X86_8KB_4_64     322898        77418          0.23976    Completed
gobmk_X86_8KB_4_64         3320091       3195427        0.962452   Completed
libquantum_X86_8kBKB_4_64  13980555      2106616        0.150682   Completed
bzip2_X86_8kB_4_64         226490984     24108946       0.106445   Completed
gamess_X86_8kB_4_64        333742515     60388278       0.180943   Completed
omnetpp_X86_8KB_4_64       333742515     60388278       0.180943   Completed
gcc_X86_8KB_4_64           72284661      31533701       0.436243   Aborted
namd_X86_8KB_4_64          169225684     76474519       0.451908   Incomplete
lbm_X86_8KB_4_64           371077787     172324400      0.464389   Incomplete
mcf_X86_8kBKB_4_64         98439312      22986711       0.233511   Incomplete
milc_X86_8kBKB_4_64        286986509     17784410       0.0619694  Incomplete
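The Ratio column is the silent count divided by the total count. A small sketch that recomputes it for three rows copied from the table above (silent_ratio is a hypothetical helper name); the same division applies to byte counts.

```python
# (total stores, silent stores) copied from the table for three benchmarks.
rows = {
    "gobmk_X86_8KB_4_64": (3320091, 3195427),
    "perlbench_X86_8KB_4_64": (322898, 77418),
    "bzip2_X86_8kB_4_64": (226490984, 24108946),
}


def silent_ratio(total: int, silent: int) -> float:
    """Fraction of stores (or bytes) that were silent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return silent / total


for name, (total, silent) in rows.items():
    print(f"{name}: {silent_ratio(total, silent):.6f}")
```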
SILENT BYTE RATIO
Configuration              Total Store Bytes  Silent Bytes  Ratio     Status
specrand_i_X86_8KB_4_64    76518563           61601237      0.80505   Completed
povray_X86_8KB_4_64        15282175           13221672      0.86517   Completed
soplex_X86_8KB_4_64        36234532           24984419      0.68952   Completed
perlbench_X86_8KB_4_64     2311327            1738174       0.752024  Completed
gobmk_X86_8KB_4_64         21637745           21249594      0.982061  Completed
libquantum_X86_8kBKB_4_64  109697353          96032613      0.875432  Completed
bzip2_X86_8kB_4_64         742892581          458953854     0.617793  Completed
gamess_X86_8kB_4_64        2422301950         1704319167    0.703595  Completed
omnetpp_X86_8KB_4_64       535292751          434227897     0.811197  Completed
gcc_X86_8KB_4_64           2422301950         1704319167    0.703595  Aborted
namd_X86_8KB_4_64          1082700667         903980569     0.834931  Incomplete
lbm_X86_8KB_4_64           2911874103         1978336222    0.679403  Incomplete
mcf_X86_8kBKB_4_64         752304117          573852760     0.762794  Incomplete
milc_X86_8kBKB_4_64        ???                ???           ???       Incomplete
PLOTTING DATA
 Plotting the stores was necessary to determine clustering behavior.
 The first idea was to plot every store against its store number.
 This was impossible because the number of stores was enormous.
 We did not have enough main memory to create such a plot.
 Even if we had been able to plot it, the information would have been practically useless at that scale.
 We created a sampling technique:
 Divided the entire sequence of stores into 500 subparts
 Plotted only the first store in each subpart
 Created the charts in Python
 There was still one major problem!
[Plots: sampled stores (values 1 and 0) for hmmer_X86_8KB_4_64, gobmk_X86_8KB_4_64, and lbm_X86_8KB_4_64, annotated "Incorrect Sequence"]
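The sampling technique from the PLOTTING DATA slide (split the stores into 500 subparts and keep only the first store of each) could be sketched as below. The function name is hypothetical, and the sketch assumes the silent/non-silent flags are available as an indexable sequence, which would not hold for the full traces; it is meant only to show the selection rule.

```python
from typing import Iterator, Sequence, Tuple


def sample_first_of_each_part(
    flags: Sequence[int], parts: int = 500
) -> Iterator[Tuple[int, int]]:
    """Yield (store number, flag) for the first store of each equal-sized part."""
    n = len(flags)
    parts = min(parts, n)  # never ask for more parts than there are stores
    for k in range(parts):
        idx = k * n // parts  # first store number of part k
        yield idx, flags[idx]


# Toy sequence: 10 silent (1), 5 non-silent (0), 5 silent stores.
flags = [1] * 10 + [0] * 5 + [1] * 5
print(list(sample_first_of_each_part(flags, parts=4)))
# [(0, 1), (5, 1), (10, 0), (15, 1)]
```

Each sampled point says nothing about the stores between it and the next sample, so the lengths of silent and non-silent runs cannot be read off such a plot.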
RUN LENGTH ENCODING
 We needed a new way to identify clusters.
 We noticed that a store can be in only 2 states: silent or non-silent.
 This is equivalent to a true/false condition (1’s and 0’s).
 Thus, logically, our data was one very large binary string.
 This is similar to JPEG images, where data compression is applied to such data.
 The same idea, run-length encoding (RLE), could be applied here.
 Since storing the entire RLE output was also not feasible, we capped it at the top 200 runs.
 To make sure silent stores were not dominated by non-silent stores, we computed 2 forms of RLE:
 1] Top 200 runs over both silent and non-silent stores
 2] Top 200 runs over silent stores only
Example RLE of size 5

Binary string:
1111111111000001111111111000111111111100000111110001110001111111111111111111100000000000000

Run-length encoded:
10 X 1
05 X 0
10 X 1
03 X 0
10 X 1
05 X 0
05 X 1
03 X 0
03 X 1
03 X 0
20 X 1
14 X 0

Sorted:
20 X 1
14 X 0
10 X 1
10 X 1
10 X 1
05 X 0
05 X 0
05 X 1
03 X 0
03 X 0
03 X 0
03 X 1

Trimmed:
20 X 1
14 X 0
10 X 1
10 X 1
10 X 1
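The encode, sort, and trim steps of the example above can be sketched in a streaming style. This is an illustrative reconstruction rather than the authors' implementation: run_lengths and top_runs are hypothetical names, itertools.groupby produces the runs one at a time, and heapq.nsmallest keeps only the n longest runs, which is one possible reading of "lazy selective sorting".

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, List, Optional, Tuple


def run_lengths(flags: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Yield (length, type) for each maximal run; works on a lazy stream."""
    for value, group in groupby(flags):
        yield sum(1 for _ in group), value


def top_runs(
    flags: Iterable[int], n: int = 200, only_type: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return the n longest runs, optionally restricted to one type."""
    runs = run_lengths(flags)
    if only_type is not None:
        runs = (r for r in runs if r[1] == only_type)
    # Longest first; ties are ordered by type, as in the sorted list on the slide.
    return heapq.nsmallest(n, runs, key=lambda r: (-r[0], r[1]))


# The binary string from the slide, built from its run lengths.
bits = ("1" * 10 + "0" * 5 + "1" * 10 + "0" * 3 + "1" * 10 + "0" * 5
        + "1" * 5 + "0" * 3 + "1" * 3 + "0" * 3 + "1" * 20 + "0" * 14)
flags = [int(c) for c in bits]

print(top_runs(flags, n=5))
# [(20, 1), (14, 0), (10, 1), (10, 1), (10, 1)]
print(top_runs(flags, n=5, only_type=1))
# [(20, 1), (10, 1), (10, 1), (10, 1), (5, 1)]
```

The first call corresponds to form 1 (silent and non-silent runs together) and the second to form 2 (silent runs only), with the cap set to 5 instead of 200.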
Top runs over silent and non-silent stores (Type 1 = silent, Type 0 = non-silent)

bzip2            specrand         gobmk            mcf
Type Length      Type Length      Type Length      Type Length
0    1865497     0    263         1    1560002     0    102406
0    1799967     1    152         1    1560002     1    84450
0    1465497     0    39          1    22889       0    23942
0    1399967     0    30          1    22528       0    11987
0    1065499     0    28          0    12341       0    11986
0    999969      0    25          0    8823        1    11973
0    999967      0    23          0    5289        1    11973
0    740025      0    22          0    1368        1    11973
0    674501      0    19          0    1368        1    11973
0    366149      0    18          0    1368        1    11973
0    342447      0    17          0    1368        1    11973

Totals (T = total stores, S = silent stores):
bzip2     T 226490984  S 24108946
specrand  T 11993059   S 5939535
gobmk     T 3320091    S 3195427
mcf       T 98439312   S 22986711
Top runs over silent stores only (Type 1 = silent)

bzip2            specrand         gobmk            mcf
Type Length      Type Length      Type Length      Type Length
1    65538       1    152         1    1560002     1    84450
1    5576        1    15          1    1560002     1    11973
1    5460        1    15          1    22889       1    11973
1    4200        1    14          1    22528       1    11973
1    3288        1    14          1    152         1    11973
1    3260        1    14          1    107         1    11973
1    3138        1    14          1    58          1    11973
1    3094        1    14          1    58          1    11973
1    2965        1    14          1    58          1    11973
1    2962        1    14          1    58          1    11972
1    2814        1    14          1    58          1    11972

Totals (T = total stores, S = silent stores):
bzip2     T 226490984  S 24108946
specrand  T 11993059   S 5939535
gobmk     T 3320091    S 3195427
mcf       T 98439312   S 22986711
CONCLUSION
 The number of silent stores is significant in almost all benchmarks.
 Silent bytes also deserve attention.
 Silent stores do show some observable patterns within programs.
 More evaluation is needed to determine in which phase of a program these sequences occur.
 It is also necessary to evaluate how the nature of the program affects silent stores.
