This document summarizes key concepts related to the CAP theorem and achieving consistency in distributed systems. It discusses how the CAP theorem proves it is impossible to guarantee consistency, availability, and partition tolerance simultaneously. It then explores ways to weaken consistency models, such as eventual consistency, to work around CAP. The main technique discussed is using quorums to ensure consistency when partitions do not occur, and allowing for eventual consistency when they do to maintain availability.
2. CAP conjecture [reminder]
• Can only have two of:
– Consistency
– Availability
– Partition-tolerance
• Examples
– Databases, 2PC, centralized algo (C & A)
– Distributed databases, majority protocols (C & P)
– DNS, Bayou (A & P)
3. CAP theorem
• Formalization by Gilbert & Lynch
• What does impossible mean?
– There exists an execution that violates one of C, A, P
– It is not possible to guarantee that an algorithm has all three at all times
• Shard data with different CAP tradeoffs
• Detect partitions and weaken consistency
4. Partition-tolerance & availability
• What is partition-tolerance?
– Consistency and Availability are provided by the algorithm
– Partitions are external events (scheduler/oracle)
• Partition-tolerance is really a failure model
• Partition-tolerance is equivalent to omission failures
• In the CAP theorem
– Proof rests on partitions that never heal
– Datacenters can guarantee recovery of partitions!
• Can guarantee that conflict resolution eventually happens
5. How do we ensure consistency?
• Main technique to be consistent
– Quorum principle
– Example: Majority quorums
• Always write to and read from a majority of nodes
• At least one node knows most recent value
– Figure: with N = 9 nodes, majority(9) = 5; WRITE(v) updates a 5-node quorum and READ returns v from a 5-node quorum
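The majority-quorum rule can be sketched as follows; `Node`, `write`, and `read` are hypothetical names for illustration, not from the slides. The invariant is that any two majorities of N nodes share at least one node, so a read always sees the latest completed write.

```python
import random

class Node:
    """One replica holding a (timestamp, value) pair."""
    def __init__(self):
        self.ts, self.value = 0, None

def majority(n):
    return n // 2 + 1  # majority(9) == 5

def write(nodes, ts, value):
    # Write to a majority of nodes; any later read quorum must
    # include at least one of them.
    for node in random.sample(nodes, majority(len(nodes))):
        if ts > node.ts:
            node.ts, node.value = ts, value

def read(nodes):
    # Read from a majority and keep the freshest timestamp, so the
    # most recent completed write is always observed.
    quorum = random.sample(nodes, majority(len(nodes)))
    return max(quorum, key=lambda n: n.ts).value

nodes = [Node() for _ in range(9)]
write(nodes, ts=1, value="v")
print(read(nodes))  # "v": any two majorities of 9 nodes overlap
```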
6. Quorum Principle
• Majority Quorum
– Pro: tolerate up to N/2 -1 crashes
– Con: Have to read/write N/2 +1 values
• Read/write quorums (Dynamo, ZooKeeper, Chain Repl)
– Read from R nodes, write to W nodes, s.t. R + W > N (and W > N/2)
– Pro: adjust performance of reads/writes
– Con: availability can suffer
• Maekawa Quorum
– Arrange the N nodes in an M×M grid (M = √N); e.g. P1…P9 in a 3×3 grid
– Write to a row + a column, read a column (read and write quorums always overlap)
– Pro: Only need to read/write O(√N) nodes
– Con: Tolerate at most O(√N) crashes (reconfiguration)
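A minimal sketch of the Maekawa-style grid construction, assuming a perfect-square N and the row+column write / column read scheme described above (function and variable names are made up for illustration):

```python
import math

def grid_quorums(n):
    """Arrange n nodes (n a perfect square) in a sqrt(n) x sqrt(n) grid.
    Write quorum = one row plus one column; read quorum = one column."""
    m = math.isqrt(n)
    assert m * m == n, "n must be a perfect square"
    grid = [[r * m + c for c in range(m)] for r in range(m)]
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(m)} for c in range(m)]
    write_quorums = [rows[r] | cols[c] for r in range(m) for c in range(m)]
    read_quorums = cols
    return write_quorums, read_quorums

writes, reads = grid_quorums(9)  # P1..P9 as node ids 0..8 in a 3x3 grid
# A write quorum (row + column) crosses every column in at least one
# cell, so every read quorum intersects every write quorum.
assert all(w & r for w in writes for r in reads)
print(len(writes[0]))  # 5 nodes per write quorum, i.e. O(sqrt(N)) for N=9
```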
7. Probabilistic Quorums
• Quorums of size α√N (α > 1) intersect with probability ≈ 1 − exp(−α²)
– Example: N = 16 nodes, quorum size 7, intersects 95%, tolerates 9 failures
– Maekawa: N = 16 nodes, quorum size 7, intersects 100%, tolerates 4 failures
– Pro: Small quorums, high fault-tolerance
– Con: Could fail to intersect; N usually large
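The intersection probability can be checked empirically; the sketch below (with a hypothetical helper `intersect_prob`) samples pairs of random 7-node quorums out of N = 16. Note that 1 − exp(−α²) is an asymptotic estimate; for small N the actual overlap probability is higher.

```python
import math, random

def intersect_prob(n, q, trials=20000):
    """Estimate the probability that two independent, uniformly random
    q-node quorums out of n nodes share at least one node."""
    nodes = range(n)
    hits = sum(
        bool(set(random.sample(nodes, q)) & set(random.sample(nodes, q)))
        for _ in range(trials)
    )
    return hits / trials

n, q = 16, 7                             # quorum size ~ alpha * sqrt(N)
alpha = q / math.sqrt(n)                 # alpha = 1.75
estimate = 1 - math.exp(-alpha ** 2)     # ~0.953, the slide's figure
print(f"1 - exp(-alpha^2) = {estimate:.3f}")
print(f"simulated         = {intersect_prob(n, q):.3f}")
# For small N the simulated probability exceeds the asymptotic estimate.
```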
8. Quorums and CAP
• With quorums we can get
– C & P: partition can make quorum unavailable
– C & A: no-partition ensures availability and atomicity
• Faced with a decision when we fail to get a quorum [Brewer'11]
– Sacrifice availability by waiting for merger
– Sacrifice atomicity by ignoring the quorum
• Can we get CAP for weaker consistency?
9. What does atomicity really mean?
– Figure: timeline of processes P1, P2, P3; P1 and P2 each issue a read R while P3 issues W(5) and W(6); each operation spans the interval from its invocation to its response
• Linearization Points
– Read ops appear as if they happened immediately at all nodes at some time between invocation and response
– Write ops appear as if they happened immediately at all nodes at some time between invocation and response
10. Definition of Atomicity
• Linearization Points
– Read ops appear as if they happened immediately at all nodes at some time between invocation and response
– Write ops appear as if they happened immediately at all nodes at some time between invocation and response
– Figure: P3 writes W(5) then W(6); P1 reads 6 and P2 reads 5; this execution is atomic
12. Atomicity too strong?
– Figure: P3 writes W(5) then W(6); P1 reads 5 even though P2 has already read 6; not atomic
• Linearization points too strong?
– Why not just have R:5 appear atomically right after W(5)?
– Lamport: "If P2's operator phones P1 and tells her 'I just read 6'"
13. Atomicity too strong?
– Figure: the same execution (P3 writes W(5) then W(6); P2 reads 6; P1 reads 5) is not atomic, but it is sequentially consistent
• Sequential consistency
– Weaker than atomicity
– Removes the "real-time" requirement
– Any global ordering is OK as long as it respects each process's local ordering
– Does Gilbert's proof fall apart for sequential consistency?
• Causal memory
– Weaker than sequential consistency
– No need for a global view; each process may have a different view
– Local: reads/writes return immediately to the caller
– The CAP theorem does not apply to causal memory
– Figure: P1 performs W(0) then R:1; P2 performs W(1) then R:0; causally consistent
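Causal memory is commonly built on vector clocks; the sketch below (names are illustrative, not from the slides) shows why an execution like the figure's is acceptable: the two writes are concurrent, so each process may observe them in a different order.

```python
def merge(a, b):
    # Component-wise max: the clock after observing both histories.
    return [max(x, y) for x, y in zip(a, b)]

def happens_before(a, b):
    # a -> b iff a <= b component-wise and a != b.
    return all(x <= y for x, y in zip(a, b)) and a != b

# Two processes, one write each, neither having seen the other's write:
w0 = [1, 0]  # vector clock attached to P1's W(0)
w1 = [0, 1]  # vector clock attached to P2's W(1)

concurrent = not happens_before(w0, w1) and not happens_before(w1, w0)
print(concurrent)  # True: each process may apply the writes in its own order
```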
14. Going really weak
• Eventual consistency
– When the network is not partitioned, all nodes eventually hold the same value
– I.e. don't be "consistent" at all times, but only after partitions heal!
• Based on a powerful technique: gossiping
– Periodically exchange "logs" with one random node
– Each exchange uses constant-sized packets
– Set reconciliation, Merkle trees, etc.
– Use (clock, node_id) to break ties between events in the log
• Properties of gossiping
– All nodes have the same value within O(log N) time
– No positive-feedback cycles that congest the network
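A toy push-pull gossip round with (clock, node_id) tie-breaking can be sketched as follows (all names are illustrative); with N = 64 nodes, one fresh write typically reaches everyone in on the order of log N rounds:

```python
import random

def gossip_round(states):
    """Each node contacts one random peer; both keep the largest
    (clock, node_id, value) entry, so (clock, node_id) deterministically
    breaks ties between events."""
    n = len(states)
    for i in range(n):
        j = random.randrange(n)
        best = max(states[i], states[j])
        states[i] = states[j] = best

n = 64
states = [(0, i, "old") for i in range(n)]
states[3] = (1, 3, "new")  # node 3 holds the latest write

rounds = 0
while len(set(states)) > 1:
    gossip_round(states)
    rounds += 1

print(rounds)  # typically close to log2(64) = 6 rounds
assert all(s == (1, 3, "new") for s in states)
```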
15. BASE
• A catch-all for any consistency model C′ that enables C′-A-P
– Eventual consistency
– PRAM consistency
– Causal consistency
• Main ingredients
– Stale data
– Soft state (regenerable state)
– Approximate answers
16. Summary
• No need to ensure CAP at all times
– Switch between algorithms or satisfy subset at different times
• Weaken consistency model
– Choose weaker consistency:
• Causal memory (relatively strong) works around CAP
– Only be consistent when network isn’t partitioned:
• Eventual consistency (very weak) works around CAP
• Weaken partition-tolerance
– Some environments never partition, e.g. datacenters
– Tolerate unavailability in small quorums
– Some environments have recovery guarantees (partitions heal within X hours); perform conflict resolution once they do
17. Related Work (ignored in talk)
• PRAM consistency (Pipelined RAM)
– Weaker than causal and non-blocking
• Eventual Linearizability (PODC’10)
– Becomes atomic after quiescent periods
• Gossiping & set reconciliation
– Lots of related work
Editor's Notes
Failed ops appear as completed at every node, XOR never occurred at any node.