This document discusses research into analyzing the behavior of silent stores in CPU benchmarks. The key findings are:
1) Significant ratios (up to 98%) of total stores were found to be silent in most benchmarks tested.
2) Due to the enormous amount of store data, innovative processing techniques like run length encoding had to be used to identify clusters and patterns in the data.
3) Preliminary analysis found evidence of clustering in silent stores for some benchmarks, but more evaluation is needed to understand how program characteristics impact silent store behavior.
3. RESEARCH QUESTION
1] To determine the ratio of silent stores vs total stores in different benchmarks
2] To determine clustering and pattern behavior of silent stores.
To determine clustering behavior of only silent stores
To determine clustering behavior of silent and non-silent stores
4. MODIFICATIONS
We had to make two modifications to acquire the required data.
1] Modified lsq_unit_impl.hh to write the store data to a file (Store.txt)
This file consists of 2 lines for each store.
The first line is the address the store writes to
The second line is the data the store is about to write
2] Modified packet.hh to write the packet data to a file (Cache.txt)
This file consists of 4 lines for each packet
The first line is the address the packet writes to
The second line is the number of bytes being written
The third line is the old data at the destination
The fourth line is the new data being written at the destination
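A minimal sketch of how records in the Cache.txt layout might be classified. It assumes the file holds plain-text lines in the order listed above, and that a store counts as silent when the old and new data are identical. The names `Packet`, `iter_packets` and `silent_ratio` are hypothetical and not taken from the original tooling.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    address: str
    size: int
    old_data: str
    new_data: str


def iter_packets(lines: Iterable[str]) -> Iterator[Packet]:
    """Yield one Packet per group of 4 lines (address, size, old, new)."""
    it = iter(lines)
    while True:
        group = [line.strip() for line in islice(it, 4)]
        if len(group) < 4:  # end of input or a truncated trailing record
            return
        address, size, old, new = group
        yield Packet(address, int(size), old, new)


def silent_ratio(packets: Iterable[Packet]) -> float:
    """Fraction of stores whose new data equals the old data."""
    total = silent = 0
    for p in packets:
        total += 1
        if p.old_data == p.new_data:
            silent += 1
    return silent / total if total else 0.0


# Made-up sample records, for illustration only.
sample = [
    "0x1000", "4", "deadbeef", "deadbeef",  # silent: value unchanged
    "0x1004", "4", "00000000", "00000001",  # non-silent
]
print(silent_ratio(iter_packets(sample)))  # 0.5
```

Because `iter_packets` is a generator, the same code could be fed an open file object instead of a list, so the whole file never has to be in memory at once.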
6. SETUP
All benchmarks were tested with an 8 KB L1 cache (4-way set associative, 64-byte line size).
All tests were carried out on the detailed CPU model.
Each test took an enormous amount of time to run.
To speed things up, we used cloud computers to run the tests in parallel.
All the computers had 4 cores, 8 GB of RAM and an 80 GB SSD.
Completion times ranged from 33 minutes (soplex) to 3897 minutes (omnetpp).
Many benchmarks did not complete (their run time exceeded 6000 minutes).
7. PROCESSING DATA
Processing the data was very difficult!
The files were much larger than main memory.
It was impossible to read them in full and carry out any sort of mapping or modification.
File sizes exceeded 25 GB for some benchmarks.
A lot of coding was required:
Two different forms of lazy reading
Sampling logic for plotting
Lazy selective sorting
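The slides do not describe the two forms of lazy reading, so the following is only a sketch of one possible form: streaming Store.txt record by record with a generator, so that only one store is in memory at a time. The function names and the sample file contents are hypothetical.

```python
import os
import tempfile
from typing import Iterator, Tuple


def iter_stores(path: str) -> Iterator[Tuple[str, str]]:
    """Stream (address, data) pairs from a 2-lines-per-store file.

    Only one record is held in memory at a time, so the file can be
    far larger than RAM.
    """
    with open(path, "r") as f:
        while True:
            address = f.readline()
            data = f.readline()
            if not data:  # end of file (or a dangling half record)
                return
            yield address.strip(), data.strip()


def count_stores(path: str) -> int:
    """Count records without materialising the file."""
    return sum(1 for _ in iter_stores(path))


# Usage with a tiny stand-in file (the real files were > 25 GB).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0x1000\nff\n0x1004\n00\n0x1008\n7f\n")
    demo_path = tmp.name

print(count_stores(demo_path))  # 3
os.remove(demo_path)
```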
10. PLOTTING DATA
Plotting the stores was necessary to determine clustering behavior
The first idea was to plot each and every store against its store number.
This was impossible to do as the number of stores was enormous.
We did not have enough main memory to create such a plot.
Even if we were able to plot it, the information would be practically useless due to the scale.
We created a sampling technique instead:
Divided the entire store space into 500 subparts
Plotted only the first store in each subpart.
Created charts from these samples using Python
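The sampling step above can be sketched as follows. This is an illustration under assumptions, not the original script: it assumes the total number of stores is known from an earlier counting pass, and the function name and parameters are hypothetical.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sample_first_of_each_part(
    items: Iterable[T], total: int, parts: int = 500
) -> Iterator[Tuple[int, T]]:
    """Yield (index, item) for the first item of each of `parts` equal slices.

    `total` is the number of items, obtained from an earlier counting pass,
    so the stream itself never has to be held in memory.
    """
    parts = min(parts, total)
    if parts <= 0:
        return
    # Start index of slice k is floor(k * total / parts).
    starts = (k * total // parts for k in range(parts))
    next_start = next(starts)
    for i, item in enumerate(items):
        if i == next_start:
            yield i, item
            next_start = next(starts, None)
            if next_start is None:  # all slices sampled
                return


print(list(sample_first_of_each_part(range(10), total=10, parts=5)))
# [(0, 0), (2, 2), (4, 4), (6, 6), (8, 8)]
```

The resulting (index, item) pairs are few enough to hand to any plotting library.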
There was still one major problem!!!
15. RUN LENGTH ENCODING
We had to come up with a new way to identify clusters.
We noticed that a store can be in only 2 states: silent or non-silent.
This is equivalent to a true/false condition (1s and 0s).
Thus, logically, our data was a very large string of binary data.
This is similar to image formats such as JPEG, where run-length encoding is used to compress long runs of repeated values.
It was possible to apply the same idea, Run Length Encoding (RLE), here.
Since storing the entire RLE was also not feasible, we capped it at the top 200 runs.
To make sure silent stores were not dominated by non-silent ones, we did 2 forms of RLE:
1] Top 200 RLE of both silent and non-silent stores
2] Top 200 RLE of only silent stores.
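A minimal sketch of the two forms of RLE described above. It assumes each store has already been reduced to a flag, with 1 meaning silent and 0 meaning non-silent (an assumed encoding), and it uses `heapq.nlargest` to keep only the longest runs. The function names are hypothetical.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Run = Tuple[int, int]  # (type, length); assumed: 1 = silent, 0 = non-silent


def run_lengths(flags: Iterable[int]) -> Iterator[Run]:
    """Yield (type, length) for each maximal run of equal flags."""
    for value, run in groupby(flags):
        yield value, sum(1 for _ in run)


def top_runs(flags: Iterable[int], k: int = 200,
             silent_only: bool = False) -> List[Run]:
    """Return the k longest runs, optionally restricted to silent runs."""
    runs = run_lengths(flags)
    if silent_only:
        runs = (r for r in runs if r[0] == 1)
    # nlargest keeps at most k runs in memory while streaming the rest.
    return heapq.nlargest(k, runs, key=lambda r: r[1])


# Made-up flag sequence, for illustration only.
flags = [1] * 3 + [0] * 2 + [1] + [0] * 4 + [1] * 5
print(top_runs(flags, k=3))                    # [(1, 5), (0, 4), (1, 3)]
print(top_runs(flags, k=2, silent_only=True))  # [(1, 5), (1, 3)]
```

Both functions consume their input lazily, so the flag stream can come straight from a file reader without building the full run list first.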
17.
bzip2            specrand         gobmk            mcf
Type  Length     Type  Length     Type  Length     Type  Length
0     1865497    0     263        1     1560002    0     102406
0     1799967    1     152        1     1560002    1     84450
0     1465497    0     39         1     22889      0     23942
0     1399967    0     30         1     22528      0     11987
0     1065499    0     28         0     12341      0     11986
0     999969     0     25         0     8823       1     11973
0     999967     0     23         0     5289       1     11973
0     740025     0     22         0     1368       1     11973
0     674501     0     19         0     1368       1     11973
0     366149     0     18         0     1368       1     11973
0     342447     0     17         0     1368       1     11973

T 3320091    S 3195427
T 11993059   S 5939535
T 98439312   S 22986711
T 226490984  S 24108946
18.
bzip2            specrand         gobmk            mcf
Type  Length     Type  Length     Type  Length     Type  Length
1     65538      1     152        1     1560002    1     84450
1     5576       1     15         1     1560002    1     11973
1     5460       1     15         1     22889      1     11973
1     4200       1     14         1     22528      1     11973
1     3288       1     14         1     152        1     11973
1     3260       1     14         1     107        1     11973
1     3138       1     14         1     58         1     11973
1     3094       1     14         1     58         1     11973
1     2965       1     14         1     58         1     11973
1     2962       1     14         1     58         1     11972
1     2814       1     14         1     58         1     11972

T 3320091    S 3195427
T 11993059   S 5939535
T 98439312   S 22986711
T 226490984  S 24108946
19. CONCLUSION
The number of silent stores is significant in almost all benchmarks.
Silent bytes also need attention.
Silent stores do show some observable clustering within programs.
More evaluation is necessary to determine in which phase of the program these sequences occur.
It is also necessary to evaluate how the nature of the program impacts silent stores.