This document discusses the Java Memory Model (JMM) and how it describes how threads interact through memory in Java. It covers key aspects of the JMM including happens-before ordering, memory barriers, visibility rules, and how final fields and atomic instructions interact with the memory model. It also discusses performance considerations and how different processor architectures implement memory ordering.
At first glance, writing concurrent programs in Java seems like a straight-forward task. But the devil is in the detail. Fortunately, these details are strictly regulated by the Java memory model which, roughly speaking, decides what values a program can observe for a field at any given time. Without respecting the memory model, a Java program might behave erratic and yield bugs that only occure on some hardware platforms. This presentation summarizes the guarantees that are given by Java's memory model and teaches how to properly use volatile and final fields or synchronized code blocks. Instead of discussing the model in terms of memory model formalisms, this presentation builds on easy-to follow Java code examples.
SpringOne Platform 2017
Stéphane Maldini, Pivotal; Simon Basle, Pivotal
"In 2016, Project Reactor was the foundation before Spring Reactive story, in particular with Reactor Core 3.0 fueling our initial Spring Framework 5 development.
2017 and 2018 are the years Project Reactor empowers the final Spring Framework 5 GA and an entire ecosystem, thus including further refinement, feedbacks and incredible new features. In fact, the new Reactor Core 3.1 and Reactor Netty 0.7 are the very major versions used by the like of Spring Boot 2.0, and they have dramatically consolidated around a simple but yet coherent API.
Discover those changes and the new Reactor capabilities including support for Reactive AOP, Observability, Tracing, Error Strategies for long-running streams, new Netty driver, improved test support, community driven initiatives and much more
Finally, the first java framework & ecosystem gets the reactive library it needs !"
Slides for my talk event-sourced architectures with Akka. Discusses Akka Persistence as mechanism to do event-sourcing. Presented at Javaone 2014 and Jfokus 2015.
At first glance, writing concurrent programs in Java seems like a straight-forward task. But the devil is in the detail. Fortunately, these details are strictly regulated by the Java memory model which, roughly speaking, decides what values a program can observe for a field at any given time. Without respecting the memory model, a Java program might behave erratic and yield bugs that only occure on some hardware platforms. This presentation summarizes the guarantees that are given by Java's memory model and teaches how to properly use volatile and final fields or synchronized code blocks. Instead of discussing the model in terms of memory model formalisms, this presentation builds on easy-to follow Java code examples.
SpringOne Platform 2017
Stéphane Maldini, Pivotal; Simon Basle, Pivotal
"In 2016, Project Reactor was the foundation before Spring Reactive story, in particular with Reactor Core 3.0 fueling our initial Spring Framework 5 development.
2017 and 2018 are the years Project Reactor empowers the final Spring Framework 5 GA and an entire ecosystem, thus including further refinement, feedbacks and incredible new features. In fact, the new Reactor Core 3.1 and Reactor Netty 0.7 are the very major versions used by the like of Spring Boot 2.0, and they have dramatically consolidated around a simple but yet coherent API.
Discover those changes and the new Reactor capabilities including support for Reactive AOP, Observability, Tracing, Error Strategies for long-running streams, new Netty driver, improved test support, community driven initiatives and much more
Finally, the first java framework & ecosystem gets the reactive library it needs !"
Slides for my talk event-sourced architectures with Akka. Discusses Akka Persistence as mechanism to do event-sourcing. Presented at Javaone 2014 and Jfokus 2015.
Presentation to describe about Circuit Breakers, where to apply, how and examples. Using the Netflix Hystrix and Spring Retry to demonstrate how and examples available on Github.
https://github.com/BHRother/spring-hystrix-example
https://github.com/BHRother/spring-circuit-breaker-example
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/28XnVtb.
Felix Klock describe the core concepts of the Rust language (ownership, borrowing, and lifetimes), as well as the tools beyond the compiler for open source software component distribution (cargo, crates.io). Filmed at qconlondon.com.
Felix Klock is a research engineer at Mozilla, where he works on the Rust compiler, runtime libraries, and language design. He previously worked on the ActionScript Virtual Machine for the Adobe Flash runtime. Klock is one of the developers of the Larceny Scheme language runtime.
Java is Object Oriented Programming. Java 8 is the latest version of the Java which is used by many companies for the development in many areas. Mobile, Web, Standalone applications.
Being Functional on Reactive Streams with Spring ReactorMax Huang
The journey begins with using Java 8 introduced Optional/Stream/CompletableFuture more functional, after which Reactive Streams is introduced with a homemade implementation that is ultimately made functional to increase usability. Finally Spring Reactor (Project Reactor) is presented and used for building a device simulator periodically reporting data to device controller.
Reactive programming is a general programming term focused on reacting to changes, such as data values or events. It can and often is done imperatively. A callback, delegate is an approach to reactive programming done imperatively.
Memory Management: What You Need to Know When Moving to Java 8AppDynamics
This presentation will compare and contrast application behavior in Java 7 with Java 8, particularly focusing on memory management and usage. Several code examples are presented to show how to recognize and respond to common pitfalls.
Presentation to describe about Circuit Breakers, where to apply, how and examples. Using the Netflix Hystrix and Spring Retry to demonstrate how and examples available on Github.
https://github.com/BHRother/spring-hystrix-example
https://github.com/BHRother/spring-circuit-breaker-example
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/28XnVtb.
Felix Klock describe the core concepts of the Rust language (ownership, borrowing, and lifetimes), as well as the tools beyond the compiler for open source software component distribution (cargo, crates.io). Filmed at qconlondon.com.
Felix Klock is a research engineer at Mozilla, where he works on the Rust compiler, runtime libraries, and language design. He previously worked on the ActionScript Virtual Machine for the Adobe Flash runtime. Klock is one of the developers of the Larceny Scheme language runtime.
Java is Object Oriented Programming. Java 8 is the latest version of the Java which is used by many companies for the development in many areas. Mobile, Web, Standalone applications.
Being Functional on Reactive Streams with Spring ReactorMax Huang
The journey begins with using Java 8 introduced Optional/Stream/CompletableFuture more functional, after which Reactive Streams is introduced with a homemade implementation that is ultimately made functional to increase usability. Finally Spring Reactor (Project Reactor) is presented and used for building a device simulator periodically reporting data to device controller.
Reactive programming is a general programming term focused on reacting to changes, such as data values or events. It can and often is done imperatively. A callback, delegate is an approach to reactive programming done imperatively.
Memory Management: What You Need to Know When Moving to Java 8AppDynamics
This presentation will compare and contrast application behavior in Java 7 with Java 8, particularly focusing on memory management and usage. Several code examples are presented to show how to recognize and respond to common pitfalls.
The workshop is based on several Nikita Salnikov-Tarnovski lectures + my own research. The workshop consists of 2 parts. The first part covers:
- different Java GCs, their main features, advantages and disadvantages;
- principles of GC tuning;
- work with GC Viewer as tool for GC analysis;
- first steps tuning demo;
- comparison primary GCs on Java 1.7 and Java 1.8
The second part covers:
- work with Off-Heap: ByteBuffer / Direct ByteBuffer / Unsafe / MapDB;
- examples and comparison of approaches;
The off-heap-demo: https://github.com/moisieienko-valerii/off-heap-demo
English version of the presentation we gave at Devoxx FR 2012.
In depth analysis on how java Garbage collector works and how to minimise pause in your application.
The Java Memory Model describes how threads in the Java programming language interact through memory. Together with the description of single-threaded execution of code, the memory model provides the semantics of the Java programming language.
It is crucial for a programmer to know how, according to Java Language Specification, write correctly synchronized, race free programs.
Java Memory Consistency Model - concepts and contextTomek Borek
Java Memory Consistency Model is a difficult topic.
It's useful in making sure that multi-threaded programs on multi-threaded cores will interact with each other (and through memory) in a consistent manner.
It's specification is damn hard (even according to folks with lots of concurrent experience, like Doug Lea) to read, understand and routinely follow without error.
This presentation talks about some fallacies surrounding memory model, explains it, offers definitions and reasons for it's existence. It ain't deep, it's more entry level stuff.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2msgBlb.
Kavya Joshi explores when and why locks affect performance, delves into Go’s lock implementation as a case study, and discusses strategies one can use when locks are actually a problem. Filmed at qconnewyork.com.
Kavya Joshi works as a software engineer at Samsara - a start-up in San Francisco. She particularly enjoys architecting and building highly concurrent, highly scalable systems.
Java has a solid Memory Model, and there are a couple of excellent libraries for concurrency. When you start working with threads however, pitfalls start appearing - especially if the program is supposed to be fast and correct. This session shows proven solutions for some typical problems, showing how to view program code from a concurrency perspective: Which threads share which data, and how? How to reduce the impact of locks? How to avoid them altogether - and when is that worth it?
Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions. These instructions are executed on a central processing unit on one computer. Only one instruction may execute at a time—after that instruction is finished, the next is executed.
CSCI 2121- Computer Organization and Assembly Language Labor.docxannettsparrow
CSCI 2121- Computer Organization and Assembly Language
Laboratory No. 5
Week of March 12th, 2018
Submission Instructions:
1. Save the files as shift.sv and vending.sv
2. Put all files into one folder.
3. Compress the folder into a .zip file.
4. Submit the zip file on Brightspace.
5. Submission Deadline: Sunday, April 15th, 2018, 11.55PM
SRC CHIP PROJECT
In order to proceed on the project for this course, there are several key Verilog concepts which are
necessary in the implementation of this code.
Part 0: More Verilog concepts.
Shared busses and tri-state buffers
Before we begin implementing the CPU, let's go over a few concepts which will be needed for the lab
assignment. The first is the concept of a shared bus. We've already seen busses in Verilog in terms of an
array of wires or registers, however in the context of our CPU, the word "bus" has another meaning. Our
CPU uses a shared "bus" which is a wire connected between the inputs and outputs of many different
registers, in order to share data:
This can be easily modelled in Verilog, using the inout keyword for a module. This defines a wire which
can act as both an input and an output to our module. However, there is one critically important design
concept which goes along with the usage of shared busses:
Only one signal can be written to a shared bus at any given time.
If two signals are written to the bus, the result is a bus collision, and the value is undefined. In the
simulation, this is represented as "X", but in real life this would cause garbage data to exist on the bus,
potentially corrupting any process which is running on the CPU. As Verilog programmers, it's our job to
ensure that a bus collision never happens. To facilitate this, we'll use a digital logic component called a
tri-state buffer:
The purpose of a tri-state buffer is to act as a switch. If B is 1, then the data can pass through the buffer,
but if not, it simply disconnects the wire, outputting a high-impedance state (in Verilog, denoted by Z).
hz943141
A B C
0 0 Z
0 1 0
1 0 Z
1 1 1
In Verilog, we can't directly write a tri-state buffer, but it can be easily synthesized using a ternary
operator as shown on line 11:
Line 11 shows the construction of a ternary operator to use as a tri-state buffer. If the input called
activate_out is set to 1, then the tri-state buffer is activated, and the circuit will put my_result on the
bus. Otherwise, it puts the value 32'bz on the bus, which will disconnect my_result from the bus, and
allow another circuit to write to the bus.
Note: It is important that during your lab assignment, any circuits which write to the bus have a tri-
state buffer implemented to prevent bus collisions.
Verilog Tasks:
Verilog tasks are like object functions in Java. Read more about them here. You may find that they are
useful in separating the code for instructions within a module, particularly for th.
Agenda:
This talk will provide an in-depth review of the usage of canaries in the kernel and the interaction with userspace, as well as a short review of canaries and why they are needed in general so don't be afraid if you never heard of them.
Speaker:
Gil Yankovitch, CEO, Chief Security Researcher from Nyx Security Solutions
IRQs: the Hard, the Soft, the Threaded and the PreemptibleAlison Chaiken
The Linux kernel supports a diverse set of interrupt handlers that partition work into immediate and deferred tasks. The talk introduces the major varieties and explains how IRQs differ in the real-time kernel.
ECECS 472572 Final Exam ProjectRemember to check the errat.docxtidwellveronique
ECE/CS 472/572 Final Exam Project
Remember to check the errata section (at the very bottom of the page) for updates.
Your submission should be comprised of two items:
a
.pdf
file containing your written report and a
.tar
file containing a directory structure with your C or C++ source code. Your grade will be reduced if you do not follow the submission instructions.
All written reports (for both 472 and 572 students) must be composed in MS Word, LaTeX, or some other word processor and submitted as a PDF file.
Please take the time to read this entire document. If you have questions there is a high likelihood that another section of the document provides answers.
Introduction
In this final project you will implement a cache simulator. Your simulator will be configurable and will be able to handle caches with varying capacities, block sizes, levels of associativity, replacement policies, and write policies. The simulator will operate on trace files that indicate memory access properties. All input files to your simulator will follow a specific structure so that you can parse the contents and use the information to set the properties of your simulator.
After execution is finished, your simulator will generate an output file containing information on the number of cache misses, hits, and miss evictions (i.e. the number of block replacements). In addition, the file will also record the total number of (simulated) clock cycles used during the situation. Lastly, the file will indicate how many read and write operations were requested by the CPU.
It is important to note that your simulator is required to make several significant assumptions for the sake of simplicity.
1. You do not have to simulate the actual data contents. We simply pretend that we copied data from main memory and keep track of the hypothetical time that would have elapsed.
2. Accessing a sub-portion of a cache block takes the exact same time as it would require to access the entire block. Imagine that you are working with a cache that uses a 32 byte block size and has an access time of 15 clock cycles. Reading a 32 byte block from this cache will require 15 clock cycles. However, the same amount of time is required to read 1 byte from the cache.
3. In this project assume that main memory RAM is always accessed in units of 8 bytes (i.e. 64 bits at a time).
When accessing main memory, it's expensive to access the first unit. However, DDR memory typically includes buffering which means that the RAM can provide access to the successive memory (in 8 byte chunks) with minimal overhead. In this project we assume an
overhead of 1 additional clock cycle per contiguous unit
.
For example, suppose that it costs 255 clock cycles to access the first unit from main memory. Based on our assumption, it would only cost 257 clock cycles to access 24 bytes of memory.
4. Assume that all caches utilize a "fetch-on-write" scheme if a miss occurs on a Store operation. This means that .
ECECS 472572 Final Exam ProjectRemember to check the err.docxtidwellveronique
ECE/CS 472/572 Final Exam Project
Remember to check the errata section (at the very bottom of the page) for updates.
Your submission should be comprised of two items:
a
.pdf
file containing your written report and a
.tar
file containing a directory structure with your C or C++ source code. Your grade will be reduced if you do not follow the submission instructions.
All written reports (for both 472 and 572 students) must be composed in MS Word, LaTeX, or some other word processor and submitted as a PDF file.
Please take the time to read this entire document. If you have questions there is a high likelihood that another section of the document provides answers.
Introduction
In this final project you will implement a cache simulator. Your simulator will be configurable and will be able to handle caches with varying capacities, block sizes, levels of associativity, replacement policies, and write policies. The simulator will operate on trace files that indicate memory access properties. All input files to your simulator will follow a specific structure so that you can parse the contents and use the information to set the properties of your simulator.
After execution is finished, your simulator will generate an output file containing information on the number of cache misses, hits, and miss evictions (i.e. the number of block replacements). In addition, the file will also record the total number of (simulated) clock cycles used during the situation. Lastly, the file will indicate how many read and write operations were requested by the CPU.
It is important to note that your simulator is required to make several significant assumptions for the sake of simplicity.
1. You do not have to simulate the actual data contents. We simply pretend that we copied data from main memory and keep track of the hypothetical time that would have elapsed.
2. Accessing a sub-portion of a cache block takes the exact same time as it would require to access the entire block. Imagine that you are working with a cache that uses a 32 byte block size and has an access time of 15 clock cycles. Reading a 32 byte block from this cache will require 15 clock cycles. However, the same amount of time is required to read 1 byte from the cache.
3. In this project assume that main memory RAM is always accessed in units of 8 bytes (i.e. 64 bits at a time).
When accessing main memory, it's expensive to access the first unit. However, DDR memory typically includes buffering which means that the RAM can provide access to the successive memory (in 8 byte chunks) with minimal overhead. In this project we assume an
overhead of 1 additional clock cycle per contiguous unit
.
For example, suppose that it costs 255 clock cycles to access the first unit from main memory. Based on our assumption, it would only cost 257 clock cycles to access 24 bytes of memory.
4. Assume that all caches utilize a "fetch-on-write" scheme if a miss occurs on a Store operation. This means that.
* Memory types (RAM, ROM, EEPROM, etc).
* Program memory segments.
* Static vs. Dynamic memory allocation.
* Static vs. Dynamic linking.
* Function call with respect to stack, i/p, o/p and i/o parameters and return value.
* Functions types (Synch. vs. ASynch, Reentrant vs. non-Reentrant, Recursive, Inline function vs. function-like macro).
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
2. Outline
● Introduction to JMM
● Happens-before
● Memory barriers
● Performance issues
● Atomicity
● JEP 171
● Non blocking algorithms
Java
C++
ASM
3. Java Memory Model
● Instructions reordering
● Visibility
● Final fields
● Interaction with atomic instructions
4. Java Memory Model
● The Java memory model (JMM) describes how threads in
the Java programming language interact through
memory.
● Provides sequential consistency for data race free
programs.
5. Instructions reordering
Program order:
int a = 1;
int b = 2;
int c = 3;
int d = 4;
int e = a + b;
int f = c – d;
Execution order:
int d = 4;
int c = 3;
int f = c – d;
int b = 2;
int a = 1;
int e = a + b;
6. Quiz
x = y = 0
x = 1
j = y
y = 1
i = x
What could be the result?
Thread 1 Thread 2
7. Answer(s)
● i = 1; j = 1
● i = 0; j = 1
● i = 1; j = 0
● i = 0; j = 0
8. Happens-before order
Two actions can be ordered by a happens-before
relationship. If one action happens-before another, then the
first is visible to and ordered before the second.
Java Language Specification, Java SE 7 Edition
9. Happens-before rules
● A monitor release and matching later monitor acquire
establish a happens before ordering.
● A write to a volatile field happens-before every
subsequent read of that field.
● Execution order within a thread also establishes a
happens before order.
● Happens before order is transitive.
10. Java tools
● Volatile variables
volatile boolean running = true;
● Monitors
synchronized (this) {
i = a;
a = i;
}
ReentrantLock lock = new ReentrantLock();
lock.lock();
lock.unlock();
11. What does volatile do?
● Volatile reads/writes can not be reordered
● Compilers and runtime are not allowed to allocate volatile
variables in registers
● Volatile longs and doubles are atomic
14. Volatiles and monitors ordering
Can Reorder 2nd operation
1st operation Normal Load
Normal Store
Volatile Load
MonitorEnter
Volatile Store
MonitorExit
Normal Load
Normal Store
No
Volatile Load
MonitorEnter
No No No
Volatile store
MonitorExit
No No
The JSR-133 Cookbook for Compiler Writers
15. Visibility
Thread 1:
public void run() {
int counter = 0;
while (running) {
counter++;
}
System.out.println("Counted up
to " + counter);
}
Thread 2:
public void run() {
try {
Thread.sleep(100);
} catch (InterruptedException
ignored) { }
running = false;
}
LoopFlag
17. How is it possible?
● Compiler can reorder instructions.
● Compiler can keep values in registers.
● Processor can reorder instructions.
● Values may not be synchronized to main memory.
● JMM is designed to allow aggressive optimizations.
LoopFlag - volatile
23. Memory barrier - LoadLoad
The sequence: Load1; LoadLoad; Load2
Ensures that Load1's data are loaded before data accessed
by Load2 and all subsequent load instructions are loaded. In
general, explicit LoadLoad barriers are needed on
processors that perform speculative loads and/or out-of-
order processing in which waiting load instructions can
bypass waiting stores. On processors that guarantee to
always preserve load ordering, the barriers amount to no-
ops.
The JSR-133 Cookbook for Compiler Writers
24. Memory barrier - StoreStore
The sequence: Store1; StoreStore; Store2
Ensures that Store1's data are visible to other processors
(i.e., flushed to memory) before the data associated with
Store2 and all subsequent store instructions. In general,
StoreStore barriers are needed on processors that do not
otherwise guarantee strict ordering of flushes from write
buffers and/or caches to other processors or main memory.
The JSR-133 Cookbook for Compiler Writers
25. Memory barrier - LoadStore
The sequence: Load1; LoadStore; Store2
Ensures that Load1's data are loaded before all data
associated with Store2 and subsequent store instructions
are flushed. LoadStore barriers are needed only on those
out-of-order procesors in which waiting store instructions
can bypass loads.
The JSR-133 Cookbook for Compiler Writers
26. Memory barrier - StoreLoad
The sequence: Store1; StoreLoad; Load2
Ensures that Store1's data are made visible to other processors (i.e., flushed to
main memory) before data accessed by Load2 and all subsequent load instructions
are loaded. StoreLoad barriers protect against a subsequent load incorrectly using
Store1's data value rather than that from a more recent store to the same location
performed by a different processor. Because of this, on the processors discussed
below, a StoreLoad is strictly necessary only for separating stores from subsequent
loads of the same location(s) as were stored before the barrier. StoreLoad barriers
are needed on nearly all recent multiprocessors, and are usually the most
expensive kind. Part of the reason they are expensive is that they must disable
mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This
might be implemented by letting the buffer fully flush, among other possible stalls.
The JSR-133 Cookbook for Compiler Writers
27. Memory barriers
Required
barriers
2nd operation
1st operation
Normal Load Normal Store Volatile Load
MonitorEnter
Volatile Store
MonitorExit
Normal Load LoadStore
Normal Store StoreStore
Volatile Load
MonitorEnter
LoadLoad LoadStore LoadLoad LoadStore
Volatile Store
MonitorExit
StoreLoad StoreStore
The JSR-133 Cookbook for Compiler Writers
28. Intel X86/64 Memory Model
● Loads are not reordered with other loads.
● Stores are not reordered with other stores.
● Stores are not reordered with older loads.
● Loads may be reordered with older stores to different locations but
not with older stores to the same location.
● In a multiprocessor system, memory ordering obeys causality (memory
ordering respects transitive visibility).
● In a multiprocessor system, stores to the same location have a total order.
● In a multiprocessor system, locked instructions have a total order.
● Loads and stores are not reordered with locked instructions.
LoopFlag – asm - store, MemoryBarriers – asm
29. StoreLoad on Intel Ivy Bridge
lock addl $0x0,(%rsp)
Intel's IA-32 developer manual: Locked operations are
atomic with respect to all other memory operations and all
externally visible events. [...] Locked instructions can be
used to synchronize data written by one processor and read
by another processor.
32. Memory barriers - architecture
Processor LoadStore LoadLoad StoreStore StoreLoad Data
dependency
orders
loads?
Atomic
Conditional
Other
Atomics
Atomics
provide
barrier?
sparc-TSO no-op no-op no-op membar
(StoreLoad)
yes CAS:
casa
swap,
ldstub
full
x86 no-op no-op no-op mfence or
cpuid or
locked
insn
yes CAS:
cmpxchg
xchg,
locked
insn
full
ia64 combine
with
st.rel or
ld.acq
ld.acq st.rel mf yes CAS:
cmpxchg
xchg,
fetchadd
target +
acq/rel
arm dmb
(see below)
dmb
(see below)
dmb-st dmb indirection
only
LL/SC:
ldrex/strex
target
only
ppc lwsync
(see below)
lwsync
(see below)
lwsync hwsync indirection
only
LL/SC:
ldarx/stwcx
target
only
alpha mb mb wmb mb no LL/SC:
ldx_l/stx_c
target
only
pa-risc no-op no-op no-op no-op yes build
from
ldcw
ldcw (NA)
The JSR-133 Cookbook for Compiler Writers
* The x86 processors supporting "streaming SIMD" SSE2 extensions require LoadLoad "lfence" only only in connection with these
streaming instructions.
33. Final fields
● Act as a normal field, but:
– A store of a final field (inside a constructor) and, if the field
is a reference, any store that this final can reference, cannot
be reordered with a subsequent store (outside that
constructor) of the reference to the object holding that field
into a variable accessible to other threads. (x.finalField =
v; ... ; sharedRef = x;)
– The initial load (i.e., the very first encounter by a thread) of
a final field cannot be reordered with the initial load of the
reference to the object containing the final field. (v.afield =
1; x.finalField = v; ... ; sharedRef = x;)
34. Final field example
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x;
int j = f.y;
}
}
}
35. Final field example
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x;
int j = f.y;
}
}
}
Guaranteed value 3
4 or 0 !!
37. AtomicInteger
public class AtomicInteger extends Number implements java.io.Serializable {
//...
private volatile int value;
public final void set(int newValue) {
value = newValue;
}
//...
public final void lazySet(int newValue) {
unsafe.putOrderedInt(this, valueOffset, newValue);
}
//...
public final boolean compareAndSet(int expect, int update) {
return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}
Atomic - asm