The document describes lineage-driven fault injection, an approach to testing the fault tolerance of distributed systems. Instead of asking whether a bad outcome could ever happen, it reasons backwards from good outcomes: why did a good thing happen, and what could have gone wrong? A reliable broadcast protocol illustrates how lineage analysis uncovers potential failures. Repeated attempts by the broadcaster to relay a message provide redundancy over time that can mask failures. Executions are modeled as forests of proof trees, each tree representing one way in which redundancy supplies evidence that an outcome was achieved.
Lineage-driven Fault Injection, SIGMOD'15
2. The future is disorder
• Data-intensive systems are increasingly
distributed and heterogeneous
• Distributed systems suffer partial failures
• Fault-tolerant code is hard to get right
• Composing FT components is hard too!
3. Motivation: Kafka replication bug
Three correct components:
1. Primary/backup replication
2. Timeout-based failure detectors
3. ZooKeeper
One nasty bug:
Acknowledged writes are lost
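A toy Python model of this scenario shows how three individually reasonable components combine into the bug. Everything here is a hypothetical sketch, not Kafka's code: the `Cluster` class, the in-sync set, the rule that the primary acknowledges once every in-sync replica has the write, and the `elect` step standing in for ZooKeeper-driven leader election (promoting any live replica) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)
    alive: bool = True


class Cluster:
    """Hypothetical primary/backup group (illustration only, not Kafka)."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.primary = names[0]
        self.in_sync = set(names)  # assumed in-sync replica set

    def detector_timeout(self, unreachable):
        # Timeout-based failure detector: backups that miss heartbeats are
        # dropped from the in-sync set, even though they are still alive.
        self.in_sync -= set(unreachable)

    def write(self, value):
        # Assumed rule: append to every in-sync replica, then ACK the client.
        for n in self.in_sync:
            self.replicas[n].log.append(value)
        return True

    def crash(self, name):
        self.replicas[name].alive = False
        self.in_sync.discard(name)

    def elect(self):
        # Assumption: the coordinator promotes any live replica.
        live = sorted(n for n, r in self.replicas.items() if r.alive)
        self.primary = live[0]
        self.in_sync = set(live)

    def committed(self):
        return self.replicas[self.primary].log


cluster = Cluster(["a", "b", "c"])
cluster.detector_timeout(["b", "c"])  # brief network partition
acked = cluster.write("w")            # a is now primary and sole replica
cluster.crash("a")
cluster.elect()
print(acked, "w" in cluster.committed())  # True False
```

Dropping the partitioned backups from the in-sync set is what lets a acknowledge the write alone; the later election then promotes a replica that never saw it.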
4–8. ‘Molly’ witnesses the bug
[Figure: space-time diagram with lanes for Replica b, Replica c, ZooKeeper, Replica a, and Client; one process is marked CRASHED. Successive slides add these annotations:]
• Brief network partition
• a becomes primary and sole replica
• a ACKs client write
• Data loss
9–14. Fault-tolerance: the state of the art
1. Bottom-up approaches (e.g. verification)
2. Top-down approaches (e.g. fault injection)
[Figure: investment vs. returns for each approach]
15. Lineage-driven fault injection
Goal: whole-system testing that
• finds all of the fault-tolerance bugs, or
• certifies that none exist
Main idea: fault-tolerance is redundancy.
16. Lineage-driven fault injection
Approach: think backwards from outcomes
Use lineage to find evidence of redundancy
Original question:
• Could a bad thing ever happen?
Reframed question:
• Why did a good thing happen?
• What could have gone wrong?
17. A game
Protocol:
Reliable broadcast
Specification:
Pre: A correct process delivers a message m
Post: All correct processes deliver m
Failure Model:
(Permanent) crash failures
Message loss / partitions
[Figure labels: Program; Output constraints]
18. Round 1
The broadcaster makes an attempt to
relay the message to the other nodes
“An effort” delivery protocol:
19. Round 1 in space / time
[Figure: space-time diagram of Process a, Process b, and Process c; at time 1, a sends log messages that reach b and c at time 2.]
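A minimal Python sketch of this round and the specification above, restricted to message loss (no crashes are simulated). `broadcast` and `reliable_broadcast_holds` are hypothetical helpers; the single attempt at time 1 models the “An effort” protocol, and the `retry=True` mode, where the broadcaster re-sends at every step, is an illustrative variant showing redundancy in time.

```python
def broadcast(nodes, sender, max_time, retry, dropped):
    """Return the set of nodes that have logged the message by max_time.

    A message sent at time t is logged by its receiver at t + 1.
    `dropped` holds (src, dst, t) triples that the adversary omits.
    With retry=False the sender makes a single attempt at t = 1;
    with retry=True it re-sends at every step before max_time.
    """
    logged = {sender}
    send_times = range(1, max_time) if retry else [1]
    for t in send_times:
        for dst in nodes:
            if dst != sender and (sender, dst, t) not in dropped:
                logged.add(dst)
    return logged


def reliable_broadcast_holds(nodes, logged, crashed=frozenset()):
    """Pre: some correct process delivers m. Post: all correct processes do."""
    correct = set(nodes) - set(crashed)
    pre = bool(logged & correct)
    post = correct <= logged
    return (not pre) or post


NODES = ["A", "B", "C"]
one_shot = broadcast(NODES, "A", 5, retry=False, dropped={("A", "C", 1)})
retried = broadcast(NODES, "A", 5, retry=True, dropped={("A", "C", 1)})
all_lost = broadcast(NODES, "A", 5, retry=True,
                     dropped={("A", "C", t) for t in range(1, 5)})
print(reliable_broadcast_holds(NODES, one_shot))   # False
print(reliable_broadcast_holds(NODES, retried))    # True
print(reliable_broadcast_holds(NODES, all_lost))   # False
```

Dropping one message breaks the single-attempt protocol; with retries, the adversary has to drop every copy sent to C before the specification fails.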
26. An execution is a (fragile) “proof”
of an outcome
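[Figure: proof trees built from rules r1, r2, r3 and messages AB1, AB2, AB3. The main tree, rendered as text:]
log(A, data)@1 ∧ node(A, B)@1 --r2 (message AB1)--> log(B, data)@2
log(B, data)@2 --r1--> log(B, data)@3 --r1--> log(B, data)@4 --r1--> log(B, data)@5
(which required a message from A to B at time 1)
[The figure also shows log(A, data) persisting to time 3 via r1 and node(A, B) persisting to time 3 via r3, which together with message AB3 support another application of r2; message AB2 also appears.]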
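A small Python sketch that enumerates proof trees for log(B, data)@5 and records which messages each tree depends on. The reading of the rules is an assumption taken from the figure: r1 persists log facts, r3 persists node facts, and r2 derives log(n, data)@t from a message sent at t-1. The `derivations` function and the "AB1" message encoding are hypothetical.

```python
from functools import lru_cache

# Base facts (from the slide): log(A, data)@1 and node(A, B)@1.
BASE = frozenset({("log", "A", 1), ("node", "A", "B", 1)})


@lru_cache(maxsize=None)
def derivations(fact, sent):
    """Return one support per proof tree of `fact`.

    A support is the frozenset of messages (e.g. "AB1" = A -> B at time 1)
    that the tree depends on. `sent` is the frozenset of messages delivered.
    """
    if fact in BASE:
        return frozenset({frozenset()})
    kind, *args, t = fact
    if t <= 1:
        return frozenset()
    proofs = set()
    if kind == "log":
        (n,) = args
        # r1 (persistence): log(n, data)@t <- log(n, data)@(t-1)
        proofs |= derivations(("log", n, t - 1), sent)
        # r2 (message): log(n, data)@t <- log(s, data)@(t-1), node(s, n)@(t-1)
        for msg in sent:
            src, dst, when = msg[0], msg[1], int(msg[2:])
            if dst == n and when == t - 1:
                for p in derivations(("log", src, t - 1), sent):
                    for q in derivations(("node", src, n, t - 1), sent):
                        proofs.add(p | q | {msg})
    elif kind == "node":
        a, b = args
        # r3 (persistence): node(a, b)@t <- node(a, b)@(t-1)
        proofs |= derivations(("node", a, b, t - 1), sent)
    return frozenset(proofs)


fragile = derivations(("log", "B", 5), frozenset({"AB1"}))
redundant = derivations(("log", "B", 5), frozenset({"AB1", "AB2", "AB3", "AB4"}))
print(sorted(sorted(s) for s in fragile))    # [['AB1']]
print(sorted(sorted(s) for s in redundant))  # [['AB1'], ['AB2'], ['AB3'], ['AB4']]
```

With one message there is a single fragile proof; each additional send adds an independent tree, which is the redundancy that lineage makes visible.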
50. Let’s reflect
Intuition:
Fault-tolerance is redundancy in space and time.
Strategy:
Reason backwards from outcomes using lineage
Lineage exposes redundancy of outcome support.
Finding bugs: choose failures that “break” all derivations
Fixing bugs: add additional derivations
51. Automating the role of the adversary
1. Break a proof by dropping any
contributing message.
(AB1 ∨ BC2)
52. Automating the role of the adversary
1. Break a proof by dropping any
contributing message.
2. Find a set of failures that breaks all proofs
of a good outcome.
Disjunction: (AB1 ∨ BC2)
Conjunction of disjunctions (AKA CNF): (AB1 ∨ BC2) ∧ (AC1) ∧ (AC2)
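A brute-force Python sketch of the adversary's search: each clause lists the messages of one proof, and a fault set satisfies the CNF when it drops at least one message from every clause (a minimal hitting set). The `minimal_fault_sets` helper is hypothetical, and exhaustive enumeration stands in for a real SAT solver; it is only practical for a handful of messages.

```python
from itertools import combinations


def minimal_fault_sets(cnf):
    """Return the minimal sets of message omissions that satisfy `cnf`.

    `cnf` is a list of clauses; each clause is the set of messages in one
    proof, and dropping any one of them breaks that proof. A fault set
    satisfies the CNF when it hits every clause. Candidates are tried in
    order of increasing size, so any superset of an earlier hit is skipped.
    """
    variables = sorted(set().union(*cnf))
    found = []
    for k in range(1, len(variables) + 1):
        for combo in combinations(variables, k):
            faults = set(combo)
            hits_all = all(faults & clause for clause in cnf)
            if hits_all and not any(f <= faults for f in found):
                found.append(faults)
    return found


cnf = [{"AB1", "BC2"}, {"AC1"}, {"AC2"}]
for faults in minimal_fault_sets(cnf):
    print(sorted(faults))
# ['AB1', 'AC1', 'AC2']
# ['AC1', 'AC2', 'BC2']
```

Each result is a candidate set of faults to inject in the next run.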
56. By injecting only “interesting” faults…
Molly provides guarantees that
outcomes are fault-tolerant
Program        Bound   Combinations     Executions
redun-deliv    11      8.07 × 10^18     11
ack-deliv      8       3.08 × 10^13     673
paxos-synod    7       4.81 × 10^11     173
bully-leader   10      1.26 × 10^17     2
flux           22      6.20 × 10^76     187
57. Molly, the LDFI prototype
Molly finds fault-tolerance violations
quickly or guarantees that none exist.
Molly uses data lineage to reason about
redundancy of support (or lack thereof)
for system outcomes.
59. Case study: commit protocols
[Figure: 2-Phase commit. Space-time diagram of Agent a, Agent b, Coordinator, and Agent d; messages p and v are exchanged; one process is marked CRASHED.]
[Figure: Collaborative termination. Same participants; prepare, vote, and decision_req messages; one process is marked CRASHED.]
[Figure: 3-Phase commit. Space-time diagram of Process a, Process b, Process C, and Process d; messages cancommit, vote_msg, precommit, ack, commit, and abort, with two abort messages LOST; one process is marked CRASHED.]
60. 3PC in an asynchronous network
[Figure: space-time diagram of Process a, Process b, Process C, and Process d running 3-Phase commit (cancommit, vote_msg, precommit, ack, commit, and abort messages; two abort messages LOST; one process marked CRASHED). Annotations:]
• Brief network partition
• Agent crash
• Agents learn commit decision
• d is dead; coordinator decides to abort
• Agents A & B decide to commit