This document discusses techniques for deterministic replay of multithreaded programs. It describes how recording shared memory ordering information can enable replay that reproduces data races and concurrency bugs. Specifically, it outlines using a directory-based approach to track read-write dependencies between threads and reduce the log size through transitive reduction of dependencies.
Data race
1. 1
What is a Data Race?
Two concurrent accesses to a shared location, at least one of them for writing.
Indicative of a bug.

Thread 1: X++   Z=2
Thread 2: T=Y   T=X
2. 2
How Can Data Races be Prevented?
Explicit synchronization between threads:
Locks
Critical Sections
Barriers
Mutexes
Semaphores
Monitors
Events
Etc.

Thread 1: Lock(m); X++; Unlock(m)
Thread 2: Lock(m); T=X; Unlock(m)
3. 3
Is This Sufficient?
Yes! No! It is programmer dependent:
Correctness – the programmer may forget to synchronize.
Need tools to detect data races.
Efficiency – synchronization is expensive, and to achieve correctness the programmer may overdo it.
Need tools to remove excessive synchronizations.
4. 4
Where is Waldo?

#define N 100
Type* g_stack = new Type[N];
int g_counter = 0;
Lock g_lock;

void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }

void popAll( ) {
  lock(g_lock);
  delete[] g_stack;
  g_stack = new Type[N];
  g_counter = 0;
  unlock(g_lock);
}

int find( Type& obj, int number ) {
  lock(g_lock);
  int i;
  for (i = 0; i < number; i++)
    if (obj == g_stack[i]) break;  // Found!!!
  if (i == number) i = -1;         // Not found… Return -1 to caller
  unlock(g_lock);
  return i;
}

int find( Type& obj ) {
  return find( obj, g_counter );
}
5. 5
Can You Find the Race?

#define N 100
Type* g_stack = new Type[N];
int g_counter = 0;
Lock g_lock;

void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }

void popAll( ) {
  lock(g_lock);
  delete[] g_stack;
  g_stack = new Type[N];
  g_counter = 0;
  unlock(g_lock);
}

int find( Type& obj, int number ) {
  lock(g_lock);
  int i;
  for (i = 0; i < number; i++)
    if (obj == g_stack[i]) break;  // Found!!!
  if (i == number) i = -1;         // Not found… Return -1 to caller
  unlock(g_lock);
  return i;
}

int find( Type& obj ) {
  return find( obj, g_counter );   // unsynchronized read of g_counter
}

The race: the zero-argument find() reads g_counter without holding g_lock, while popAll() writes it under the lock. A similar problem was found in java.util.Vector.
6. 6
Detecting Data Races?
NP-hard [Netzer&Miller 1990]
Input size = # instructions performed
Even for 3 threads only
Even with no loops/recursion
Execution orders/schedulings: (#threads)^(thread_length)
# inputs
Detection-code's side effects
Weak memory, instruction reordering, atomicity
7. 7
Motivation
Run-time framework goals:
Collect a complete trace of a program's user-mode execution
Keep the tracing overhead for both space and time low
Re-simulate the traced execution deterministically based on the collected trace with full fidelity down to the instruction level
Full fidelity: user mode only, no tracing of kernel, only user-mode I/O callbacks
Advantages:
Complete program trace that can be analyzed from multiple perspectives (replay analyzers: debuggers, locality, etc.)
Trace can be collected on one machine and re-played on other machines (or perform live analysis by streaming)
Challenges: trace size and performance
8. 8
Original Record-Replay Approaches
InstantReplay '87: record the order of memory accesses; overhead may affect program behavior
RecPlay '00: record only synchronizations; not deterministic if there are data races
Netzer '93: record an optimal trace; too expensive to keep track of all memory locations
Bacon & Goldstein '91: record memory bus transactions with hardware; high logging bandwidth
9. 9
Motivation
Increasing use of and development for multi-core processors
MT program behavior is non-deterministic: shared memory updates can happen in a different order on each run
To effectively debug software, developers must be able to replay executions that exhibit concurrency bugs
10. 10
Related Concepts
Runtime interpretation/translation of binary instructions:
Requires no static instrumentation or special symbol information
Handles dynamically generated code and self-modifying code
Recording/logging overhead: ~100-200x
More recent logging – proposed hardware support (for the MT domain):
FDR (Flight Data Recorder)
BugNet (cache bits set on first load)
RTR (Regulated Transitive Reduction)
DeLorean (ISCA 2008 – chunks of instructions)
Strata (time layer across all the logs for the running threads)
iDNA (diagnostic infrastructure using Nirvana – Microsoft)
11. 11
Deterministic Replay
Re-execute the exact same sequence of instructions as recorded in a previous run.
Single-threaded programs:
Record load values needed for reproducing the behavior of a run (Load Log)
Registers updated by system calls and signal handlers (Reg Log)
Output of special instructions: RDTSC, CPUID (Reg Log)
System calls (virtualization – cloning arguments, updates)
Checkpointing (log summary ~10 million)
Multi-threaded programs:
Log the interleaving among threads (shared memory update ordering – SMO Log)
12. 12
PinSEL – System Effect Log (SEL)
Logging program load values needed for deterministic replay:
– First access from a memory location
– Values modified by the system (system effect) and read by the program
– Machine- and time-sensitive instructions (cpuid, rdtsc)

Program execution:
Load A; (A = 111)   logged (first access)
Load C; (C = 9)     logged (first access)
Load D; (D = 10)    logged (first access)
Store A; (A = 111)
Store B; (B = 55)
system call modifies locations (B -> 0) and (C -> 99)
Load B; (B = 0)     logged (system effect)
Load C; (C = 99)    logged (system effect)
Load D; (D = 10)    not logged

Trace size is ~4-5 bytes per instruction.
13. 13
Reducing Logged Reads
Observation: hardware caches eliminate most off-chip reads.
Optimize logging:
Logger and replayer simulate identical cache memories
A simple cache (the memory copy structure) decides which values to log: no tags or valid bits to check; if the values mismatch, they are logged
Average trace size is <1 bit per instruction

i = 1;
for (j = 0; j < 10; j++)
{
  i = i + j;
}
k = i; // value read is 46
System_call();
k = i; // value read is 0 (not predicted)

The only read not predicted, and therefore logged, follows the system call.
14. 14
Example Overhead
PinSEL and PinPLAY
Initial work (2006) with single-threaded programs:
SPEC2000 ref runs: 130x slowdown for PinSEL and ~80x for PinPLAY (w/o in-lining)
Working with a subset of SPLASH2 benchmarks: 230x slowdown for PinSEL
Now (geo-mean on SPEC2006):
Pin 1.4x
Logger 83.6x
Replayer 1.4x
15. 15
Example: Microsoft iDNA Trace Writer Performance

Application  Simulated Instructions (millions)  Trace File Size  Trace Bits/Instruction  Native Execution Time  Execution Time While Tracing  Execution Overhead
Gzip         24,097                             245 MB           0.09                    11.7s                  187s                          15.98
Excel         1,781                              99 MB           0.47                    18.2s                  105s                           5.76
PowerPoint    7,392                             528 MB           0.60                    43.6s                  247s                           5.66
IE              116                               5 MB           0.50                    0.499s                 6.94s                         13.90
Vulcan        2,408                             152 MB           0.53                    2.74s                  46.6s                         17.01
Satsolver     9,431                            1300 MB           1.16                    9.78s                  127s                          12.98

Memchecker and valgrind are in the 30-40x range on CPU 2006.
iDNA is ~11x (it does not log shared-memory dependences explicitly).
It uses a sequential number for every lock-prefixed memory operation: offline data race analysis.
16. 16
Logging Shared Memory Ordering
(Cristiano's PinSEL/PLAY Overview)
Emulation of directory-based cache coherence
Identifies RAW, WAR, WAW dependences
Indexed by hashing the effective address; each directory entry represents an address range
[Diagram: each memory access in the program execution (e.g., Store A, Load B) hashes to one of the entries in the directory.]
17. 17
Directory Entries
Every DirEntry maintains:
Thread id of the last_writer
A timestamp: the # of memory references the thread has executed
A vector of timestamps of the last access by each thread to that entry
On loads: update the timestamp for the thread in the entry
On stores: update the timestamp and the last_writer fields
[Diagram: threads T1 (1: Store A, 2: Load A, 3: Load F) and T2 (1: Load F, 2: Store A, 3: Store F) update the last_writer field and the per-thread timestamp vectors of DirEntry [A:D] and DirEntry [E:H].]
18. 18
Detecting Dependences
A RAW dependence between threads T and T' is established if:
T executes a load that maps to directory entry A
T' is the last_writer for the same entry
A WAW dependence between T and T' is established if:
T executes a store that maps to directory entry A
T' is the last_writer for the same entry
A WAR dependence between T and T' is established if:
T executes a store that maps to directory entry A
T' has accessed the same entry in the past and T' is not the last_writer
19. 19
Example
Threads T1 (1: Store A, 2: Load A, 3: Load F) and T2 (1: Load F, 2: Store A, 3: Store F):
WAW: T2 cannot execute its memory reference 2 (Store A) until T1 executes its memory reference 1 (Store A).
RAW: T1 cannot execute its memory reference 2 (Load A) until T2 executes its memory reference 2 (Store A).
WAR: T2 cannot execute its memory reference 3 (Store F) until T1 executes its memory reference 3 (Load F).
SMO log entries: T1 2 T2 2; T1 3 T2 3; T2 2 T1 1.
[Diagram: the last_writer and per-thread last-access timestamps of DirEntry [A:D] and DirEntry [E:H] as the accesses are processed.]
20. 20
Ordering Memory Accesses
(Reducing log size)
Preserving the order will reproduce the execution.
a→b: "a happens-before b"
Ordering is transitive: a→b and b→c imply a→c.
Two instructions must be ordered if:
they both access the same memory, and
one of them is a write.
21. 21
Constraints: Enforcing Order
[Diagram: thread P1 executes a then b; thread P2 executes c then d.]
To guarantee a→d, any of the edges a→d, b→d, a→c, b→c suffices; recording all of them is overconstrained.
Suppose we need b→c:
b→c is necessary
a→d is then redundant (implied by b→c plus program order)
22. 22
Problem Formulation
Reproduce the exact same conflicts: no more, no less.
[Diagram: during recording, threads I and J execute a mix of accesses (ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D); conflicting accesses across threads (red) are connected by dependence edges (black). During replay, the log must enforce the recorded dependences so the same conflicts recur.]
23. 23
Log All Conflicts
Detect conflicts, write the log. Each thread's memory references are assigned an instruction count (a logical timestamp, 1-6 per thread in the example).
Log J: 2→3, 1→4, 3→5, 4→6
Log I: 2→3
Each dependence entry is 16 bytes (2 integers), so the log size is 5×16 = 80 bytes (10 integers).
But there are too many conflicts.
25. 25
RTR (Regulated Transitive Reduction):
Stricter Dependences to Aid Vectorization
Replacing dependences with stricter ones enables further reduction:
Log J: 2→3, 4→5 (the stricter edge 4→5 implies both 3→5 and 4→6 via program order)
Log I: 2→3
New reduced log size: 48 bytes (6 integers)
4% overhead for RTR+FDR (simulated on GEMS)
0.2 MB/core/second logging (Apache)
Editor's Notes
Talking about previous solutions, let's have a short survey. Most previous record-replay solutions are in software.
For example, InstantReplay and Netzer both try to record the execution in software, but both suffered from high performance overhead due to high data bandwidth or high computation cost. Bacon and Goldstein proposed a hardware recorder, but it has high logging bandwidth and requires a central memory bus. More recently, RecPlay took a unique approach to reducing the performance overhead by recording only synchronizations. Unfortunately, it does not work for programs that contain data races, so we have to record more than just synchronizations.
Talk about the big picture first, we need to log and recreate dependencies, then we need to reduce the log size.
Define dependencies using words
Mention assume in-order replay
Go slow, define everything
Reproduce the same conflict, as we will see a naïve way to do that is to record all conflicts