Unit IV
Dr. Lenin SB
Associate Professor/ECE
• Introduction to Reliability Evaluation Techniques
• Reliability Models for Hardware Redundancy
• Permanent faults only; Transient faults
• Introduction to clock synchronization
• A Non-Fault-Tolerant Synchronization Algorithm
• Fault-Tolerant Synchronization in Hardware
• Completely connected zero propagation time system
• Sparse interconnection zero propagation time system
• Fault tolerant analysis with signal propagation delays
What is Reliability Evaluation?
• The process of determining whether an existing system or entity has achieved a specified level of operational reliability (desired, agreed-upon, or contracted behaviour).
Software Reliability Definition
The probability that the software will operate as required (i.e., without failure) for a specified time in a specified environment.
Software Reliability – features
• Failures in software are caused by design faults.
• Reliability changes continually during testing (new problems are found as old ones are fixed, and new code is never perfect).
• The phenomenon of software reliability growth.
• The environment (platform, inputs) is important.
• A new environment may require the software to be retested.
Hardware Reliability - features
• Failure is usually due to physical deterioration.
• Hardware reliability tends, more than software reliability, towards a constant value.
• Hardware reliability usually follows the ‘bathtub’ curve.
• Again, the environment is important; a proportion of hardware faults are design faults.
When we talk of reliability measures, the irony is that we invariably talk about failure measures.
There are four general ways of measuring failures against time:
• Time of failure
• Interval between failures
• Cumulative failures experienced up to a given time
• Failures experienced in a time interval
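The four measures are different views of the same failure record. A minimal Python sketch, assuming a hypothetical sorted list of failure times in hours, computes each of them:

```python
from bisect import bisect_right

# Hypothetical failure record: times (hours) at which failures were observed, sorted.
failure_times = [12.0, 30.5, 47.0, 88.0, 91.5]

def time_of_failure(times, k):
    """Time of the k-th failure (k counts from 1)."""
    return times[k - 1]

def intervals_between_failures(times):
    """Interval between successive failures; the first is measured from t = 0."""
    previous = [0.0] + times[:-1]
    return [t - p for t, p in zip(times, previous)]

def cumulative_failures(times, t):
    """Number of failures experienced up to and including time t."""
    return bisect_right(times, t)

def failures_in_interval(times, start, end):
    """Number of failures experienced in the interval (start, end]."""
    return cumulative_failures(times, end) - cumulative_failures(times, start)

print(time_of_failure(failure_times, 3))                # 47.0
print(intervals_between_failures(failure_times))        # [12.0, 18.5, 16.5, 41.0, 3.5]
print(cumulative_failures(failure_times, 50.0))         # 3
print(failures_in_interval(failure_times, 40.0, 90.0))  # 2
```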
Figure: Mistakes (made by a person), faults, errors, and failures form a chain. Each can lead to zero or many of the next (a fault needs a revealing mechanism, such as the environment, the operator, or an input, to produce errors), and each can be attributed to one or many of the previous.
Hardware reliability is ensured by conducting the following tests:
• Fault Tree Analysis
• Failure Modes, Effects and Criticality Analysis
• Failsafe Tests
• Fault Injection Tests
• PCB Trace Analysis and Circuit Simulation
• Environmental Tests
Software reliability is ensured by the following techniques:
• Defensive Programming
To produce programs that detect anomalous control flow, data flow, or data values during their execution and react to these in a predetermined and acceptable manner.
• Fault Detection & Diagnosis
To detect faults in a system that might lead to a failure, thus providing the basis for countermeasures that minimize the consequences of failures.
• Error Detecting and Correcting Codes
To detect and correct errors in sensitive information.
• Diverse Programming
To detect and mask residual software design faults during execution of a program, in order to prevent safety-critical failures of the system and to continue operation for high reliability.
• Software Error Effect Analysis
To identify software modules and their criticality; to propose means for detecting software errors and enhancing software robustness; and to evaluate the amount of validation needed on the various software components.
• Software Quality Audit
• Software Rule Checking
• Unit Testing
• Software Integration Tests
• Software/Hardware Integration Tests
• Fault Injection Tests
• System Validation
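Diverse programming is commonly realized by running independently written versions of a computation and voting on their outputs. A minimal sketch, assuming three hypothetical versions of a squaring function (one with a deliberate design fault) and exact-match majority voting:

```python
from collections import Counter

# Three hypothetical, independently written versions of the same function (x squared).
def version_a(x):
    return x * x

def version_b(x):
    return x ** 2

def version_c(x):
    # Simulated residual design fault: wrong sign for negative x.
    return abs(x) * x

def majority_vote(results):
    """Return the value produced by a strict majority of versions, or None if there is none."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

def run_diverse(x, versions=(version_a, version_b, version_c)):
    """Run every version on the same input and mask a minority of faulty outputs."""
    return majority_vote([version(x) for version in versions])

print(run_diverse(-3))  # 9: version_c returns -9 but is outvoted
```

Exact-match voting suits discrete outputs; floating-point results usually need a tolerance-based (inexact) voter instead.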
• Computers used in life-critical applications must be so reliable that their reliability cannot be validated by experiment alone.
• Unlike for most commercial computer products, a purely experimental approach is impractical in such a case. To get around this difficulty, we use mathematical models of reliability.
• We construct a mathematical model of the real-time computer and solve it. By doing this, we add one more possible source of error: the assumptions of the mathematical model.
• The correctness of the assumptions is a necessary condition for the correctness of the model's predictions.
• Reliability is one of the most important characteristics of a real-time system, since real-time systems are mostly used in critical applications where the margin of error should be non-existent.
• Because of the potential loss of life or damage to the system or process at hand, the degradation of such systems is closely monitored to minimize risks and failures. This keeps downtime as close to zero as possible and also limits the impact on profits.
• For example, an embedded pacemaker that is not completely accurate and reliable could alter the patient's regular heartbeat, which could cost the patient's life.
• The most difficult problem in reliability modeling is keeping the complexity of models sufficiently small.
• Even when all the parameters of the model are exponentially distributed, the resulting models can be of unacceptable complexity. Current techniques used to reduce the complexity of such models consist largely of state aggregation and decomposition.
• In state aggregation, multiple states are grouped together and treated as a single state. In decomposition, the overall model is broken down into submodels, and each submodel is solved.
• These techniques are approximations only, but approximations mandated by the underlying difficulty of the problem.
• The reliability of components is usually specified through a probability distribution function of the lifetime of those components.
For example,
• If failures occur as a Poisson process with rate λ, the lifetime distribution is given by
$$F_l(t) = 1 - e^{-\lambda t}$$
• If failures follow a Weibull distribution with shape parameter α and scale parameter λ, the lifetime distribution is given by
$$F_l(t) = 1 - e^{-(\lambda t)^{\alpha}}$$
• We denote by $f_l(t)$ the associated density function (we assume here that $F_l(t)$ is differentiable).
• The hazard rate h(t) of a component of age t is defined as the rate of failure at time t, given that it has not failed up to time t.
• We can use Bayes's law to express the hazard rate as a function of the lifetime distribution function:
$$h(t)\,dt = \text{Prob}\{\text{system fails in } [t, t+dt] \mid \text{system has not failed up to } t\}$$
$$= \frac{\text{Prob}\{\text{system fails in } [t, t+dt] \cap \text{system has not failed up to } t\}}{\text{Prob}\{\text{system has not failed up to } t\}} = \frac{f_l(t)\,dt}{1 - F_l(t)}$$
$$\therefore\; h(t) = \frac{f_l(t)}{1 - F_l(t)}$$
Note: Bayes's law describes the probability of an event based on prior knowledge of conditions that might be related to the event.
• If the failure process is Poisson with rate λ,
$$h(t) = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$
• If the failure process is Weibull with shape and scale parameters α and λ,
$$h(t) = \alpha\lambda(\lambda t)^{\alpha-1}$$
0 < α < 1: h(t) decreases with time
α = 1: the failure process is Poisson
α > 1: h(t) increases with time
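The relation h(t) = f_l(t)/(1 − F_l(t)) can be checked numerically. A small sketch, assuming the Weibull parameterization above (α = 1 gives the Poisson case), computes the ratio, compares it with the closed form, and shows the three regimes of α:

```python
import math

def weibull_cdf(t, lam, alpha):
    """Lifetime distribution F_l(t) = 1 - exp(-(lam*t)**alpha)."""
    return 1.0 - math.exp(-((lam * t) ** alpha))

def weibull_pdf(t, lam, alpha):
    """Density f_l(t): the derivative of weibull_cdf with respect to t."""
    return alpha * lam * (lam * t) ** (alpha - 1) * math.exp(-((lam * t) ** alpha))

def hazard(t, lam, alpha):
    """Hazard rate computed as f_l(t) / (1 - F_l(t)); requires t > 0."""
    return weibull_pdf(t, lam, alpha) / (1.0 - weibull_cdf(t, lam, alpha))

def hazard_closed_form(t, lam, alpha):
    """h(t) = alpha * lam * (lam*t)**(alpha - 1)."""
    return alpha * lam * (lam * t) ** (alpha - 1)

# alpha < 1: decreasing; alpha = 1: constant (equal to lam); alpha > 1: increasing.
for alpha in (0.5, 1.0, 2.0):
    rates = [round(hazard(t, 1.0, alpha), 4) for t in (0.5, 1.0, 2.0)]
    print(alpha, rates)
```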
Figure: Bathtub curve; lifetime distributions for λ = 1.
• Many real-life components have a hazard rate shaped according to the bathtub curve shown in the figure. In the beginning the hazard rate is quite high, and then it begins to drop.
• This is known as the infant-mortality phase, in which components with manufacturing defects are weeded out.
• The rate then becomes approximately constant, before aging effects set in and cause the hazard rate to rise with age.
Note: a plot of the empirical cumulative distribution function of data on special axes is a type of Q-Q plot.
• Series-parallel systems
• NMR clusters
• Combinatorial model
• Markov chain model
• Voter reliability
• In a series connection, the failure of any component results in system failure.
• In a parallel connection, all the components must fail before the system fails. R(c_i) denotes the reliability of component c_i over a given interval [0, t].
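For independent components, the series rule is the product of the R(c_i), and the parallel rule is one minus the product of the component unreliabilities. A minimal sketch, assuming statistically independent failures and, for `component_reliability`, a constant failure rate (Python 3.8+ for `math.prod`):

```python
import math

def component_reliability(lam, t):
    """R(c_i) over [0, t] for a component with constant failure rate lam."""
    return math.exp(-lam * t)

def series_reliability(reliabilities):
    """Series: the system works only if every component works."""
    return math.prod(reliabilities)

def parallel_reliability(reliabilities):
    """Parallel: the system fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

print(series_reliability([0.9, 0.9]))    # about 0.81
print(parallel_reliability([0.9, 0.9]))  # about 0.99
print(series_reliability([component_reliability(1e-4, 100.0)] * 3))
```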
• Consider an N-modular redundant (NMR) cluster, under the following assumptions:
• Faulty processors are immediately identified and disconnected from the system.
• The system therefore always consists of good processors only.
• There is no repair.
• The system fails only if there are fewer than two functional processors left in the system.
• Since there is no repair, all failures are assumed to be permanent. The probability of system failure over the interval [0, t] is given by
$$\text{Prob}\{\text{system failure in } [0,t]\} = \sum_{i=0}^{1} \text{Prob}\{\text{exactly } i \text{ processors functional at } t\}$$
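If we additionally assume (this is not stated above) that processors fail independently with a common constant rate λ, then each survives to time t with R = e^{−λt}, and "exactly i functional" is a binomial term. A sketch under that assumption:

```python
import math

def nmr_failure_probability(n, lam, t):
    """Prob{system failure in [0, t]} for an N-processor cluster that fails when
    fewer than two processors remain functional.

    Assumes independent, permanent failures (no repair) with constant rate lam,
    so each processor survives to time t with probability R = exp(-lam * t).
    """
    r = math.exp(-lam * t)
    # Sum the binomial probabilities of exactly i = 0 and i = 1 functional processors.
    return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(2))

for n in (3, 5, 7):
    print(n, nmr_failure_probability(n, 1e-3, 1000.0))
```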
Stage | Error sources | Error detection
Specification & design | Algorithm design, formal specification | Consistency checks, simulation
Prototype | Algorithm design, wiring & assembly, timing, component failure | Stimulus/response testing
Manufacture | Wiring & assembly, component failure | System testing, diagnostics
Installation | Assembly, component failure | System testing, diagnostics
Field operation | Component failure, operator errors, environmental factors | Diagnostics
• MTTF: Mean Time To (first) Failure, or expected life, is defined as the expected value of the time to failure t_f. The MTTF of a system is the expected time of the first failure in a sample of identical, initially perfect systems:
$$\text{MTTF} = E[t_f] = \int_0^{\infty} R(t)\,dt = \frac{1}{\lambda}$$
where λ is the (constant) failure rate.
• MTTR: Mean Time To Repair is defined as the expected time for repair.
• MTBF: Mean Time Between Failures
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
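The MTTF integral can be checked numerically for a constant failure rate, where R(t) = e^{−λt} and the result should approach 1/λ. A sketch with a hypothetical λ and hypothetical MTBF/MTTR figures:

```python
import math

def mttf_numeric(reliability, upper, steps=100_000):
    """Approximate MTTF = integral of R(t) from 0 to infinity with the trapezoidal
    rule, truncated at `upper` (choose it so that R(upper) is negligible)."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    total += sum(reliability(k * h) for k in range(1, steps))
    return total * h

def availability(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

lam = 0.01  # hypothetical constant failure rate, failures per hour
print(mttf_numeric(lambda t: math.exp(-lam * t), upper=5000.0))  # close to 1/lam = 100
print(availability(mtbf=1000.0, mttr=10.0))                      # about 0.990
```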
• Building a reliable serial system is extraordinarily difficult and expensive.
• For example, if one were to build a serial system of 100 components, each with a reliability of 0.999, the overall system reliability would be $0.999^{100} \approx 0.905$.
• Reliability of a system of components
• Minimal path set:
• A minimal set of components whose functioning ensures the functioning of the system. For the example system: {1,3,4}, {2,3,4}, {1,5}, {2,5}
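• Parallel connected components
• The unreliability Q_k(t) is 1 − R_k(t); for a component with constant failure rate λ_k:
$$Q_k(t) = 1 - e^{-\lambda_k t}$$
• Assuming the failures of the n components are statistically independent:
$$Q_{par}(t) = \prod_{i=1}^{n} Q_i(t)$$
• Overall system reliability:
$$R_{par}(t) = 1 - \prod_{i=1}^{n} \left(1 - R_i(t)\right)$$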
• Parallel and serial connected components
• Total reliability is the reliability of the first half in series with the second half.
• Given R1 = 0.9, R2 = 0.9, R3 = 0.99, R4 = 0.99, R5 = 0.87:
$$R_t = \left(1 - (1-0.9)(1-0.9)\right)\left(1 - (1-0.87)(1 - 0.99 \times 0.99)\right) \approx 0.987$$
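The same result can be cross-checked from the minimal path sets {1,3,4}, {2,3,4}, {1,5}, {2,5} by enumerating every up/down state of the five components. This assumes (consistently with the structure above) that those path sets describe this same system, and that components fail independently:

```python
from itertools import product

# Component reliabilities from the worked example.
R = {1: 0.9, 2: 0.9, 3: 0.99, 4: 0.99, 5: 0.87}

# Minimal path sets of the same five-component system.
MIN_PATH_SETS = [{1, 3, 4}, {2, 3, 4}, {1, 5}, {2, 5}]

def system_works(working):
    """The system functions if every component of at least one minimal path set works."""
    return any(path <= working for path in MIN_PATH_SETS)

def exact_reliability():
    """Sum the probabilities of all component states in which the system works."""
    components = sorted(R)
    total = 0.0
    for states in product((True, False), repeat=len(components)):
        working = {c for c, up in zip(components, states) if up}
        if system_works(working):
            p = 1.0
            for c, up in zip(components, states):
                p *= R[c] if up else 1.0 - R[c]
            total += p
    return total

print(round(exact_reliability(), 5))  # 0.98744, matching the series-parallel formula
```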
What is a fault?
Fault is an erroneous state of software or hardware resulting from failures of its
components
• Fault Sources
• Design errors
• Manufacturing Problems
• External disturbances
• Harsh environmental conditions
• System Misuse
• Mechanical -- “wears out”
• Deterioration: wear, fatigue, corrosion
• Shock: fractures, overload, etc.
• Electronic Hardware -- “bad fabrication; wears out”
• Latent manufacturing defects
• Operating environment: noise, heat, ESD, electro-migration
• Design defects
• Software -- “bad design”
• Design defects
• “Code rot” -- accumulated run-time faults
• People
• (Enough material for a whole lecture.)
Failure: a component does not provide its service
Fault: a defect within a system
Error: a deviation from the required operation of the system or subsystem
Extent: local (independent) or distributed (related)
Value: determinate, or indeterminate (varying values)
Duration: transient, intermittent, or permanent
There is a four-fold categorization for dealing with system faults and increasing system reliability and/or availability.
• Methods for minimizing faults:
• Fault Avoidance: how to prevent faults from occurring. Reliability is increased by conservative design and the use of high-reliability components.
• Fault Tolerance: how to provide service complying with the specification in spite of faults having occurred or occurring.
• Fault Removal: how to minimize the presence of faults.
• Fault Forecasting: how to estimate the presence, occurrence, and consequences of faults.
• Fault tolerance is the ability of a computer system to survive in the presence of faults.
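Figure: Recovery scheme. The input is processed by the primary version; if the result passes, it becomes the output. If it fails, the process rolls back (using the recovery memory) and tries an alternate version; the scheme fails when the alternates are exhausted.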
• The success of a fault recovery technique depends on detecting faults accurately and as early as possible.
• Three classes of recovery procedures:
• Full recovery
• It requires all the aspects of fault-tolerant computing.
• Degraded recovery: also referred to as graceful degradation. Similar to full recovery, but no subsystem is switched in.
• The defective component is taken out of service.
• Suited for multiprocessors.
• Safe shutdown
Forward Recovery
• Produces correct results through continuation of normal processing.
• Highly application dependent
Backward Recovery
• Some redundant process and state information is recorded with the progress of
computation.
• Rollback the interrupted process to a point for which the correct information is
available.
• E.g., retry, checkpointing, journaling
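Backward recovery with alternates is the idea behind the recovery scheme diagrammed earlier (primary, "rollback and try alternate version", recovery memory). A minimal sketch, assuming the process state is a Python dict, checkpointing is a deep copy, and the versions and acceptance test are hypothetical:

```python
import copy

class RecoveryBlockFailure(Exception):
    """Raised when the primary and every alternate fail the acceptance test."""

def recovery_block(state, versions, acceptance_test):
    """Backward recovery with a checkpoint (sketch).

    state: dict holding the process state; versions: primary first, then alternates;
    acceptance_test(saved_state, result) -> bool.
    """
    checkpoint = copy.deepcopy(state)            # recovery memory
    for version in versions:
        try:
            result = version(state)
            if acceptance_test(checkpoint, result):
                return result                    # passed: deliver the output
        except Exception:
            pass                                 # treat a crash like a failed test
        state.clear()                            # roll back to the checkpoint
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("primary failed and alternates exhausted")

# Hypothetical versions of a sorting step.
def primary(state):
    state["data"].append(-1)                     # simulated fault that corrupts state
    return state["data"][:2]

def alternate(state):
    return sorted(state["data"])

def looks_sorted(saved, result):
    """Cheap plausibility check: same length as the input and non-decreasing."""
    return len(result) == len(saved["data"]) and all(
        a <= b for a, b in zip(result, result[1:])
    )

s = {"data": [3, 1, 2]}
print(recovery_block(s, [primary, alternate], looks_sorted))  # [1, 2, 3]
print(s)                                                      # {'data': [3, 1, 2]}
```

The acceptance test is deliberately cheaper than recomputing the answer; it catches the primary's wrong-length output, and the rollback undoes the state it corrupted.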
• Reliability
• Serial Reliability, Parallel Reliability, System Reliability
• Fault Tolerance
• Hardware, Software
Issue
• Synchronization within one system is hard enough
• Semaphores
• Messages
• Monitors
• Synchronization among processes in a distributed system is much harder
• Time is an interesting and important issue
• E.g., at what time of day a particular event occurred at a particular computer; consistency (use of timestamps for serialization), e-commerce, authentication, etc.
• Algorithms that depend upon clock synchronization have been developed for several
problems.
• Due to loose synchrony, the notion of physical time is problematic in DS
• There is no absolute physical “global time” in DS
• How is time really measured?
• Earlier: solar day, solar second, mean solar second
• Solar day: time between two consecutive transits of the sun
• Solar second: 1/86400 of a solar day
• Mean solar day: average length of a solar day
• Problem: the solar day gets longer because the Earth's rotation is slowing down due to friction (300 million years ago there were 400 days per year)
• International Atomic Time (TAI): number of ticks of the cesium-133 clock since 1 January 1958 (atomic seconds)
• Atomic clock: since 1967, one second has been defined as 9,192,631,770 transitions of the cesium-133 atom
• Because of the slowdown of the Earth's rotation, leap seconds have to be introduced
• TAI corrected with leap seconds is called Coordinated Universal Time (UTC); 27 leap seconds have been introduced since 1972
• The Network Time Protocol (NTP) can synchronize globally with an accuracy of up to 50 msec
• TAI seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep in phase with the sun.
• Let C(t) be a perfect clock
• A clock Ci(t) is called correct at time t if Ci(t) = C(t)
• A clock Ci(t) is called accurate at time t if dCi(t)/dt = dC(t)/dt = 1
• Two clocks Ci(t) and Ck(t) are synchronized at time t if Ci(t) = Ck(t)
• Computers contain a physical clock (crystal oscillator)
• Physical time t, hardware time H_i(t), software time C_i(t)
• The clock output can be read by software and scaled into a suitable time unit, and the value can be used to timestamp any event: C_i(t) = αH_i(t) + β
• Clock skew: The instantaneous difference between the readings of any
two clocks
• Clock drift: Crystal-based clocks count time at different rates, and so diverge.
• Underlying oscillators are subject to physical variations, with the consequence that their
frequencies of oscillation differ
• Even the same clock's frequency varies with temperature.
• Designs exist that attempt to compensate for this variation, but they cannot eliminate it.
• The difference in the oscillations between two clocks might be small, but the difference accumulated over many oscillations leads to an observable difference.
• For clocks based on a quartz crystal, the drift is about 10^-6 sec/sec, giving a difference of one second every 1,000,000 sec, or 11.6 days.
You want to catch the bus at 5 pm at the stop, but your watch is off by 15 minutes.
• What if your watch is Late by 15 minutes?
• What if your watch is Fast by 15 minutes?
Synchronization is required for
• Correctness
• Fairness
Airline reservation system
• Server A receives a client request to purchase last ticket on flight ABC 123.
• Server A timestamps purchase using local clock 9h:15m:32.45s, and logs it. Replies ok
to client.
• That was the last seat. Server A sends message to Server B saying “flight full.”
• B enters “Flight ABC 123 full” + local clock value (which reads 9h:10m:10.11s) into its
log.
• Server C queries A’s and B’s logs. Is confused that a client purchased a ticket after the
flight became full.
• May execute incorrect or unfair actions.
• An Asynchronous Distributed System (DS) consists of a number of processes.
• Each process has a state (values of variables).
• Each process takes actions to change its state, which may be an instruction or a communication
action (send, receive).
• An event is the occurrence of an action.
• Each process has a local clock – events within a process can be assigned timestamps, and thus
ordered linearly.
• But – in a DS, we also need to know the time order of events across different processes.
• Clocks across processes are not synchronized in an asynchronous DS (unlike in a multiprocessor/parallel system, where they are). So…
1. Process clocks can be different
2. Need algorithms for either (a) time synchronization, or (b) for telling which event happened before which
• In a DS, each process has its own clock.
• Clock Skew versus Drift
• Clock Skew = relative difference in clock values of two processes
• Clock Drift = relative difference in clock frequencies (rates) of two processes
• A non-zero clock drift causes skew to increase (eventually).
• Maximum Drift Rate (MDR) of a clock
• Absolute MDR is defined relative to Coordinated Universal Time (UTC). UTC is the
“correct” time at any point of time.
• MDR of a process depends on the environment.
• Max drift rate between two clocks with similar MDR is 2 * MDR
• Max-Synch-Interval = (MaxAcceptableSkew - CurrentSkew) / (MDR × 2)
• (i.e., time = distance/speed)
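The Max-Synch-Interval formula is a one-line "distance over speed" calculation. A sketch with hypothetical figures (1 ms tolerance, quartz clocks with MDR = 10^-6 s/s); with zero current skew it reduces to δ/(2ρ):

```python
def max_synch_interval(max_acceptable_skew, current_skew, mdr):
    """Time (s) until two clocks, each with maximum drift rate `mdr` (s/s), could
    exceed `max_acceptable_skew`; their relative drift rate is at most 2 * mdr."""
    if current_skew >= max_acceptable_skew:
        return 0.0                              # already out of tolerance: resync now
    return (max_acceptable_skew - current_skew) / (2 * mdr)

print(max_synch_interval(1e-3, 0.0, 1e-6))      # about 500 s
print(max_synch_interval(1e-3, 0.2e-3, 1e-6))   # about 400 s
```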
• If the UTC time is t and process i's time is C_i(t), then ideally we would like to have C_i(t) = t, or dC/dt = 1.
• In practice, we use a tolerance ρ such that
$$1 - \rho \le \frac{dC}{dt} \le 1 + \rho$$
• C_i(t): the reading of the software clock at process i when the real time is t.
• External synchronization: for a synchronization bound D > 0 and a source S of UTC time, $|S(t) - C_i(t)| < D$ for i = 1, 2, ..., N and for all real times t. The clocks C_i are externally accurate to within the bound D.
• In external synchronization, a clock is synchronized with an authoritative external source of time.
• Internal synchronization: for a synchronization bound D > 0, $|C_i(t) - C_j(t)| < D$ for i, j = 1, 2, ..., N and for all real times t. The clocks C_i are internally accurate within the bound D.
• In internal synchronization, clocks are synchronized with one another with a known degree of accuracy.
• External synchronization with D ⇒ internal synchronization with 2D
• Internal synchronization with D ⇒ external synchronization with ??
• UTC signals are synchronized and broadcast regularly from land-based
radio stations and satellites covering many parts of the world
• E.g. in the US the radio station WWV broadcasts time signals on several short-wave
frequencies
• Satellite sources include Geo-stationary Operational Environmental Satellites (GOES)
and the GPS
• Radio waves travel at near the speed of light. The propagation delay can be accounted
for if the exact speed and the distance from the source are known
• Unfortunately, the propagation speed varies with atmospheric conditions – leading to
inaccuracy
• Accuracy of a received signal is a function of both the accuracy of the source and its
distance from the source through the atmosphere
Figure: The relation between clock time and UTC when clocks tick at different rates.
Problem: Show that, in order to guarantee that no two clocks differ by more than δ, clocks must be resynchronized at least every δ/(2ρ) seconds.
• The constant ρ is specified by the manufacturer and is known as the maximum drift rate.
• If two clocks are drifting from Coordinated Universal Time (UTC) in opposite directions, at a time Δt after they are synchronized they may be as much as 2ρΔt apart.
• If the operating system designer wants to guarantee that no two clocks ever differ by more than δ, clocks must be synchronized at least every δ/(2ρ) seconds.
Remember the definition of synchronous distributed system?
• Known bounds for message delay, clock drift rate and execution time.
• Clock synchronization is easy in this case
• In practice most DS are asynchronous.
• Cristian’s Algorithm
• The Berkeley Algorithm
• Consider internal synchronization between two processes in a synchronous DS
• P sends time t on its local clock to Q in a msg m
• In principle, Q could set its clock to the time t + Ttrans, where Ttrans is the time taken to
transmit m between them
• The two processes would then agree (internal synch)
• Unfortunately, Ttrans is subject to variation and is unknown
• All processes are competing for resources with P and Q and other messages are
competing with m for the network
• But there is always a minimum transmission time min that would be obtained if no other
processes executed and no other network traffic existed
• min can be measured or conservatively estimated
• In synch system, by definition, there is also an upper bound max on the time taken to
transmit any message
• Let the uncertainty in the msg transmission time be u, so that u = (max – min)
• If Q sets its clock to be (t + min), then clock skew may be as much as u (since the message may
in fact have taken time max to arrive).
• If Q sets it to (t + max), the skew may again be as large as u.
• If, however, Q sets it clock to (t + (max + min)/2), then the skew is at most u/2.
• In general, for a synch system, the optimum bound that can be achieved on clock skew when
synchronizing N clocks is u(1-1/N)
• For an asynchronous system Ttrans = min + x, where x >=0
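The skew bounds for the synchronous case follow from where Q places its guess inside the known delay range [min, max]. A small sketch with hypothetical bounds shows that assuming min or max gives worst-case skew u, while the midpoint gives u/2:

```python
def adopted_time(t, t_min, t_max):
    """Clock value Q sets on receiving time t from P: t plus the midpoint of the
    possible transmission times [t_min, t_max]."""
    return t + (t_max + t_min) / 2

def worst_case_skew(assumed_delay, t_min, t_max):
    """Largest possible skew if Q assumes `assumed_delay` while the actual delay
    can be anywhere in [t_min, t_max]."""
    return max(abs(assumed_delay - t_min), abs(t_max - assumed_delay))

# Hypothetical bounds (ms): min = 1, max = 5, so u = 4.
for assumed in (1.0, 3.0, 5.0):
    print(assumed, worst_case_skew(assumed, 1.0, 5.0))  # u, u/2, u
```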
Asynchronous system
• Achieves synchronization only if the observed RTT between the client and server is sufficiently
short compared with the required accuracy.
Observations:
• RTT between processes are reasonably short in practice, yet theoretically unbounded
• Practical estimate possible if RTT is sufficiently short in comparison to required accuracy
• In a LAN, the RTT should be around 1-10 ms, during which a clock with a drift rate of 10^-6 s/s varies by at most 10^-5 ms. Hence the estimate of the RTT is reasonably accurate.
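The slides do not spell out Cristian's update rule, so the sketch below uses its standard formulation as an assumption: the client sets its clock to the server's time plus half the observed RTT, and the estimate is accurate to ±(RTT/2 − min), where min is the minimum one-way transmission time. The exchange values are hypothetical:

```python
def cristian_estimate(t_server, t_sent, t_received, t_min=0.0):
    """Cristian's algorithm: estimate the current time from one request/reply.

    t_sent, t_received: client-clock times at which the request left and the reply
    (carrying the server's time t_server) arrived. t_min: minimum one-way delay.
    Returns (estimate, accuracy): when the reply arrives, the server's clock reads
    within estimate +/- accuracy.
    """
    rtt = t_received - t_sent
    if rtt < 2 * t_min:
        raise ValueError("observed RTT is shorter than twice the minimum delay")
    return t_server + rtt / 2, rtt / 2 - t_min

# Hypothetical exchange: 20 ms round trip on a LAN, 2 ms minimum one-way delay.
estimate, accuracy = cristian_estimate(1000.0, 10.000, 10.020, t_min=0.002)
print(estimate, accuracy)  # roughly 1000.01 and 0.008
```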
• A coordinator (time server): master
• Just the opposite approach of Cristian’s algorithm
• Periodically the master polls the time of each client (slave) whose clocks are to be synchronized.
• Based on the answer (by observing the RTT as in Cristian’s algorithm), it computes the average
(including its own clock value) and broadcasts the new time.
• This method is suitable for a system in which no machine has a WWV receiver getting the
UTC.
• The time daemon’s time must be set manually by the operator periodically.
• The balance of probabilities is that the average cancels out the individual clock’s
tendencies to run fast or slow
• The accuracy depends upon a nominal maximum RTT between the master and the slaves
• The master eliminates any occasional readings associated with larger times than this
maximum
• Instead of sending the updated current time back to the computers (which would introduce further uncertainty due to message transmission time), the master sends the amount by which each individual slave's clock requires adjustment (+ or -)
• The algorithm eliminates readings from faulty clocks (since these could have significant adverse effects if an ordinary average were taken): a subset of clocks that do not differ from one another by more than a specified amount is chosen, and then the average is taken.
• The time daemon asks all the other machines for their clock values
• The machines answer
• The time daemon tells everyone how to adjust their clock
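One round of the master's computation can be sketched as follows. The sketch assumes the readings have already been corrected for transmission time (e.g. by adding RTT/2) and that readings with excessive RTT were dropped beforehand; the "largest subset within a tolerance" rule is one hypothetical way to realize the fault-tolerant average described above:

```python
def berkeley_adjustments(master_time, readings, tolerance):
    """One round of a Berkeley-style synchronization (sketch).

    master_time: the master's own clock; readings: {slave: estimated clock value},
    already corrected for transmission time.
    Averages the largest subset of clocks that differ by at most `tolerance`, and
    returns the adjustment (+ or -) each clock, master included, should apply.
    """
    clocks = {"master": master_time, **readings}
    ordered = sorted(clocks.values())
    best, lo = [], 0
    for hi in range(len(ordered)):               # sliding window over sorted values
        while ordered[hi] - ordered[lo] > tolerance:
            lo += 1
        if hi - lo + 1 > len(best):
            best = ordered[lo:hi + 1]
    average = sum(best) / len(best)
    return {name: average - value for name, value in clocks.items()}

# Minutes after midnight: master 3:00, slaves 3:25 and 2:50, plus a faulty clock at 5:00.
print(berkeley_adjustments(180.0, {"a": 205.0, "b": 170.0, "c": 300.0}, tolerance=40.0))
```

The faulty clock "c" is excluded from the average but is still told how far to adjust.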
• Both Cristian’s and Berkeley’s methods are highly centralized, with the usual
disadvantages - single point of failure, congestion around the server, … etc.
• One class of decentralized clock synchronization algorithms works by dividing time into
fixed-length re-synchronization intervals.
• The ith interval starts at T0 + iR and runs until T0 + (i+1)R, where T0 is an agreed upon
moment in the past, and R is a system parameter.
• At the beginning of each interval, every machine broadcasts the current time according to
its clock.
• After a machine broadcasts its time, it starts a local timer to collect all other broadcasts
that arrive during some interval S.
• When all broadcasts arrive, an algorithm is run to compute a new time.
• Some algorithms:
• average out the time.
• discard the m highest and m lowest and average the rest -- this is to prevent up to m
faulty clocks sending out nonsense
• correct each message by adding to it an estimate of propagation time from the source.
This estimate can be made from the known topology of the network, or by timing how
long it takes for a probe message to be echoed.
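The "discard the m highest and m lowest" rule, combined with propagation-time correction, can be sketched as below. The broadcast values and per-sender delay estimates are hypothetical, as are the function names:

```python
def corrected(broadcasts, propagation_estimate):
    """Add each sender's estimated propagation time to the time it broadcast."""
    return [t + propagation_estimate[sender] for sender, t in broadcasts.items()]

def trimmed_average(readings, m):
    """Discard the m highest and m lowest readings and average the rest, so that
    up to m faulty clocks cannot drag the result arbitrarily far."""
    if len(readings) <= 2 * m:
        raise ValueError("need more than 2*m readings")
    kept = sorted(readings)[m:len(readings) - m]
    return sum(kept) / len(kept)

# Hypothetical broadcasts (seconds) collected during interval S; clock "e" is faulty.
broadcasts = {"a": 10.00, "b": 10.19, "c": 9.89, "d": 10.10, "e": 99.0}
delays = {"a": 0.0, "b": 0.01, "c": 0.01, "d": 0.0, "e": 0.0}
print(trimmed_average(corrected(broadcasts, delays), m=1))  # close to 10.1
```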
• Cristian’s and Berkeley algorithms: synchronization within an intranet
• NTP – defines an architecture for a time service and a protocol to distribute
time information over the Internet
• Provides a service enabling clients across the Internet to be synchronized accurately to
UTC
• Provides a reliable service that can survive lengthy losses of connectivity
• Enables client to resynchronize sufficiently frequently to offset clock drifts
• Provides protection against interference with the time service
• Uses a network of time servers to synchronize all processes on a network.
• Time servers are connected by a synchronization subnet tree. The root is in touch with
UTC. Each node synchronizes its children nodes.
Figure: NTP synchronization subnet. Stratum 1: primary server, synchronized directly with UTC; stratum 2: secondary servers, synchronized by the primary server; stratum 3: synchronized by the secondary servers.
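• t and t’: actual transmission times for m and m’ (unknown)
• o: true offset of the clock at B relative to the clock at A
• o_i: estimate of the actual offset between the two clocks
• d_i: estimate of the accuracy of o_i; the total transmission time for m and m’, d_i = t + t’
Figure: Messages m and m’ exchanged between server A and server B, with timestamps T_{i-3} (A sends m), T_{i-2} (B receives m), T_{i-1} (B sends m’), and T_i (A receives m’).
$$T_{i-2} = T_{i-3} + t + o$$
$$T_i = T_{i-1} + t' - o$$
This leads to
$$d_i = t + t' = T_{i-2} - T_{i-3} + T_i - T_{i-1}$$
$$o = o_i + \frac{t' - t}{2}, \quad \text{where} \quad o_i = \frac{T_{i-2} - T_{i-3} + T_{i-1} - T_i}{2}$$
It can then be shown that
$$o_i - \frac{d_i}{2} \le o \le o_i + \frac{d_i}{2}$$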
• NTP servers apply a data filtering algorithm to successive pairs < oi , di> which estimates
the offset o and calculates the quality of this estimate as a statistical quantity called the
filter dispersion.
• The eight most recent pairs < oi , di> are retained
• The value of oi that corresponds to the min value of di is chosen to estimate o.
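The offset/delay equations and the minimum-delay selection rule can be sketched directly. This covers only what the slides state; the real NTP clock-filter and dispersion calculations are more elaborate. The timestamps are a hypothetical exchange with o = 5 s, t = 0.1 s, and t' = 0.3 s:

```python
def ntp_sample(t_i3, t_i2, t_i1, t_i):
    """Offset estimate o_i and delay d_i from one exchange of messages m and m'.

    t_i3: A sends m (A's clock)     t_i2: B receives m (B's clock)
    t_i1: B sends m' (B's clock)    t_i:  A receives m' (A's clock)
    """
    o_i = (t_i2 - t_i3 + t_i1 - t_i) / 2
    d_i = (t_i2 - t_i3) + (t_i - t_i1)
    return o_i, d_i

def select_offset(samples, window=8):
    """Keep the `window` most recent <o_i, d_i> pairs; return the o_i with the smallest d_i."""
    return min(samples[-window:], key=lambda pair: pair[1])[0]

o_i, d_i = ntp_sample(100.0, 105.1, 105.2, 100.5)
print(o_i, d_i)                      # about 4.9 and 0.4
print(o_i - d_i / 2, o_i + d_i / 2)  # bounds on o: about 4.7 and 5.1
```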
• Compare time Ts provided by the time server to time Tc at computer C
• If Ts > Tc (e.g. 9:07 am vs 9:05 am), could advance C’s time to Ts
• May miss some clock ticks, probably OK
• If Ts < Tc (e.g. 9:07 am vs 9:10 am), cannot rollback C’s time to Ts
• Many applications assume that time always advances
• The solution is not to set C’s clock back, but to make C’s clock run slowly until it resynchronizes with the time server
• This can be achieved in software, without changing the rate at which the hardware clock ticks (an operation that is not always supported by hardware clocks)
• Calculation …
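A minimal sketch of the run-slowly approach, assuming a hypothetical software clock layered on a monotonically increasing hardware time and a hypothetical maximum slew rate. Forward corrections are applied at once; backward ones are absorbed gradually so the clock never runs backwards:

```python
class SlewedClock:
    """Software clock C = hardware time + offset that never runs backwards (sketch).

    A positive correction is applied at once (advancing the clock is acceptable);
    a negative one is spread out, slowing C by at most `max_slew` seconds per
    second until the correction has been absorbed.
    """

    def __init__(self, max_slew=0.0005):
        self.max_slew = max_slew
        self.offset = 0.0       # correction already applied (s)
        self.pending = 0.0      # negative correction still to apply (s)
        self.last_hw = None

    def adjust(self, correction):
        total = self.pending + correction
        if total >= 0:
            self.offset += total
            self.pending = 0.0
        else:
            self.pending = total

    def read(self, hw_time):
        if self.last_hw is not None and self.pending < 0:
            elapsed = hw_time - self.last_hw
            step = max(self.pending, -self.max_slew * elapsed)  # bounded slow-down
            self.offset += step
            self.pending -= step
        self.last_hw = hw_time
        return hw_time + self.offset

clock = SlewedClock(max_slew=0.1)   # exaggerated rate so the effect is visible
clock.read(0.0)
clock.adjust(-1.0)                  # the time server says we are 1 s fast
readings = [clock.read(float(t)) for t in range(1, 21)]
print(readings[-1])                 # about 19.0, reached without ever stepping back
```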
• Value received from UTC receiver is only accurate to within 0.1–10 milliseconds
• At best, we can synchronize clocks to within 10–30 milliseconds of each other
• We have to synchronize frequently, to avoid local clock drift
• Time synchronization important for distributed systems
• Cristian’s algorithm
• Berkeley algorithm
• NTP
• Relative order of events enough for practical purposes
• Lamport’s logical clocks
• Vector clocks

 
Quality Assurance
Quality AssuranceQuality Assurance
Quality AssuranceKiran Kumar
 
Continuous Performance Testing
Continuous Performance TestingContinuous Performance Testing
Continuous Performance TestingMark Price
 
Ch15 software reliability
Ch15 software reliabilityCh15 software reliability
Ch15 software reliabilityAbraham Paul
 
2011-05-02 - VU Amsterdam - Testing safety critical systems
2011-05-02 - VU Amsterdam - Testing safety critical systems2011-05-02 - VU Amsterdam - Testing safety critical systems
2011-05-02 - VU Amsterdam - Testing safety critical systemsJaap van Ekris
 
Smc EDA System Rotary Machine Insulation Analysis
Smc EDA System Rotary Machine Insulation AnalysisSmc EDA System Rotary Machine Insulation Analysis
Smc EDA System Rotary Machine Insulation AnalysisErika Herbozo
 
JMeter - Performance testing your webapp
JMeter - Performance testing your webappJMeter - Performance testing your webapp
JMeter - Performance testing your webappAmit Solanki
 
Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17koolkampus
 
2010-03-31 - VU Amsterdam - Experiences testing safety critical systems
2010-03-31 - VU Amsterdam - Experiences testing safety critical systems2010-03-31 - VU Amsterdam - Experiences testing safety critical systems
2010-03-31 - VU Amsterdam - Experiences testing safety critical systemsJaap van Ekris
 
2015 05-07 - vu amsterdam - testing safety critical systems
2015 05-07 - vu amsterdam - testing safety critical systems2015 05-07 - vu amsterdam - testing safety critical systems
2015 05-07 - vu amsterdam - testing safety critical systemsJaap van Ekris
 
SE2_Lec 20_Software Testing
SE2_Lec 20_Software TestingSE2_Lec 20_Software Testing
SE2_Lec 20_Software TestingAmr E. Mohamed
 
Netcetera Proactive Management Service
Netcetera Proactive Management ServiceNetcetera Proactive Management Service
Netcetera Proactive Management ServicePeter Skelton
 
2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReport2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReportabhishekroushan
 
A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...
A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...
A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...ijccmsjournal
 

Similar to Reliability Evaluation Techniques (20)

Application of theorem proving for safety-critical vehicle software
Application of theorem proving for safety-critical vehicle softwareApplication of theorem proving for safety-critical vehicle software
Application of theorem proving for safety-critical vehicle software
 
Software reliability
Software reliabilitySoftware reliability
Software reliability
 
SE2018_Lec 19_ Software Testing
SE2018_Lec 19_ Software TestingSE2018_Lec 19_ Software Testing
SE2018_Lec 19_ Software Testing
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
[2019]Version-based Microservice Analysis Monitoring and Visualization
[2019]Version-based Microservice Analysis Monitoring and Visualization[2019]Version-based Microservice Analysis Monitoring and Visualization
[2019]Version-based Microservice Analysis Monitoring and Visualization
 
Quality Assurance
Quality AssuranceQuality Assurance
Quality Assurance
 
Continuous Performance Testing
Continuous Performance TestingContinuous Performance Testing
Continuous Performance Testing
 
Ch15 software reliability
Ch15 software reliabilityCh15 software reliability
Ch15 software reliability
 
2011-05-02 - VU Amsterdam - Testing safety critical systems
2011-05-02 - VU Amsterdam - Testing safety critical systems2011-05-02 - VU Amsterdam - Testing safety critical systems
2011-05-02 - VU Amsterdam - Testing safety critical systems
 
Smc EDA System Rotary Machine Insulation Analysis
Smc EDA System Rotary Machine Insulation AnalysisSmc EDA System Rotary Machine Insulation Analysis
Smc EDA System Rotary Machine Insulation Analysis
 
JMeter - Performance testing your webapp
JMeter - Performance testing your webappJMeter - Performance testing your webapp
JMeter - Performance testing your webapp
 
FMEA Presentation V1.1
FMEA Presentation V1.1FMEA Presentation V1.1
FMEA Presentation V1.1
 
Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17
 
2010-03-31 - VU Amsterdam - Experiences testing safety critical systems
2010-03-31 - VU Amsterdam - Experiences testing safety critical systems2010-03-31 - VU Amsterdam - Experiences testing safety critical systems
2010-03-31 - VU Amsterdam - Experiences testing safety critical systems
 
2015 05-07 - vu amsterdam - testing safety critical systems
2015 05-07 - vu amsterdam - testing safety critical systems2015 05-07 - vu amsterdam - testing safety critical systems
2015 05-07 - vu amsterdam - testing safety critical systems
 
SE2_Lec 20_Software Testing
SE2_Lec 20_Software TestingSE2_Lec 20_Software Testing
SE2_Lec 20_Software Testing
 
Netcetera Proactive Management Service
Netcetera Proactive Management ServiceNetcetera Proactive Management Service
Netcetera Proactive Management Service
 
2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReport2012A8PS309P_AbhishekKumar_FinalReport
2012A8PS309P_AbhishekKumar_FinalReport
 
A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...
A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...
A Novel Approach to Derive the Average-Case Behavior of Distributed Embedded ...
 

More from Sri Manakula Vinayagar Engineering College

Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...Sri Manakula Vinayagar Engineering College
 

More from Sri Manakula Vinayagar Engineering College (20)

IoT Methodology.pptx
IoT Methodology.pptxIoT Methodology.pptx
IoT Methodology.pptx
 
ACNS UNIT-5.pdf
ACNS UNIT-5.pdfACNS UNIT-5.pdf
ACNS UNIT-5.pdf
 
2. ACNS UNIT-1.pptx
2. ACNS UNIT-1.pptx2. ACNS UNIT-1.pptx
2. ACNS UNIT-1.pptx
 
1. ACNS UNIT-1.pptx
1. ACNS UNIT-1.pptx1. ACNS UNIT-1.pptx
1. ACNS UNIT-1.pptx
 
7. Multi-operator D2D communication.pptx
7. Multi-operator D2D communication.pptx7. Multi-operator D2D communication.pptx
7. Multi-operator D2D communication.pptx
 
11. New challenges in the 5G modelling.pptx
11. New challenges in the 5G modelling.pptx11. New challenges in the 5G modelling.pptx
11. New challenges in the 5G modelling.pptx
 
8. Simulation methodology.pptx
8. Simulation methodology.pptx8. Simulation methodology.pptx
8. Simulation methodology.pptx
 
10. Calibration.pptx
10. Calibration.pptx10. Calibration.pptx
10. Calibration.pptx
 
9. Evaluation methodology.pptx
9. Evaluation methodology.pptx9. Evaluation methodology.pptx
9. Evaluation methodology.pptx
 
4. Ultra Reliable and Low Latency Communications.pptx
4. Ultra Reliable and Low Latency Communications.pptx4. Ultra Reliable and Low Latency Communications.pptx
4. Ultra Reliable and Low Latency Communications.pptx
 
1. Massive Machine-Type Communication.pptx
1. Massive Machine-Type Communication.pptx1. Massive Machine-Type Communication.pptx
1. Massive Machine-Type Communication.pptx
 
1. Coordinated Multi-Point Transmission in 5G.pptx
1. Coordinated Multi-Point Transmission in 5G.pptx1. Coordinated Multi-Point Transmission in 5G.pptx
1. Coordinated Multi-Point Transmission in 5G.pptx
 
Real time operating systems
Real time operating systemsReal time operating systems
Real time operating systems
 
Low power embedded system design
Low power embedded system designLow power embedded system design
Low power embedded system design
 
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
Performance Analysis of MIMO–OFDM for PCHBF , RELAY Technique with MMSE For T...
 
Telecommunication systems
Telecommunication systemsTelecommunication systems
Telecommunication systems
 
Home appliances
Home appliancesHome appliances
Home appliances
 
loudspeakers and microphones
loudspeakers and microphonesloudspeakers and microphones
loudspeakers and microphones
 
Television standards and systems
Television standards and systemsTelevision standards and systems
Television standards and systems
 
Optical recording and reproduction
Optical recording and reproductionOptical recording and reproduction
Optical recording and reproduction
 

Recently uploaded

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 

Recently uploaded (20)

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

Reliability Evaluation Techniques

  • 1. Unit IV Dr. Lenin SB Associate Professor/ECE
  • 2. • Introduction to Reliability Evaluation Techniques • Reliability Models for Hardware Redundancy: permanent faults only; transient faults • Introduction to clock synchronization • A Non-Fault-Tolerant Synchronization Algorithm • Fault-Tolerant Synchronization in Hardware • Completely connected, zero-propagation-time system • Sparse-interconnection, zero-propagation-time system • Fault-tolerance analysis with signal propagation delays
  • 3. What is Reliability Evaluation? • The process of determining whether an existing system or entity has achieved a specified level of operational reliability (desired, agreed-upon or contracted behaviour).
  • 4. Software Reliability Definition: The probability that the software will operate as required (i.e., without failure) for a specified time in a specified environment. Software Reliability: Features • Failures in software are due to design faults. • Reliability during test changes continually (new problems are found as old ones are fixed; new code is never perfect). • This gives rise to the phenomenon of software reliability growth. • The environment is important (platform, inputs). • A new environment may require the software to be retested.
  • 5. Hardware Reliability: Features • Failure is usually due to physical deterioration. • Hardware reliability tends, more than software reliability, towards a constant value. • Hardware reliability usually follows the 'bathtub' curve. • Again, the environment is important; a proportion of hardware faults are design faults.
  • 6. When we talk of reliability measures, the irony is that we invariably talk about failure measures. There are four general ways of measuring failures against time: • Time of failure • Interval between failures • Cumulative failures experienced up to a given time • Failures experienced in a time interval
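All four measures can be derived from one sorted list of failure times. The Python sketch below is a minimal illustration. The failure times are made-up values, and the helper names (`intervals_between_failures`, `cumulative_failures`, `failures_in_interval`) are hypothetical, not part of the source.

```python
from bisect import bisect_right

# Hypothetical failure times (hours since the start of observation), sorted ascending.
# This list is measure 1 itself: the time of each failure.
failure_times = [12.0, 30.5, 31.0, 70.0, 95.5]


def intervals_between_failures(times):
    """Measure 2: time from t = 0 to the first failure, then between successive failures."""
    previous = [0.0] + times[:-1]
    return [t - prev for prev, t in zip(previous, times)]


def cumulative_failures(times, t):
    """Measure 3: number of failures experienced up to and including time t."""
    return bisect_right(times, t)


def failures_in_interval(times, t0, t1):
    """Measure 4: number of failures experienced in the interval (t0, t1]."""
    return cumulative_failures(times, t1) - cumulative_failures(times, t0)


print(intervals_between_failures(failure_times))      # [12.0, 18.5, 0.5, 39.0, 25.5]
print(cumulative_failures(failure_times, 50.0))       # 3
print(failures_in_interval(failure_times, 30.0, 80.0))  # 3
```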
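  • 7. [Diagram: the mistake, fault, error, failure chain] A mistake (made by a person) leads to zero or many faults; a fault can be attributed to one or many mistakes. A fault, together with a revealing mechanism (the environment or operator input), leads to zero or many errors; an error can be attributed to one or many faults. An error potentially leads to zero or many failures; a failure can be attributed to one or many errors.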
  • 8. Hardware reliability is ensured by conducting the following tests: • Fault Tree Analysis • Failure Modes, Effects and Criticality Analysis (FMECA) • Fail-safe tests • Fault injection tests • PCB trace analysis and circuit simulation • Environmental tests
  • 9. Software reliability is ensured using the following techniques: • Defensive programming: produce programs that detect anomalous control flow, data flow or data values during their execution and react to them in a predetermined and acceptable manner. • Fault detection and diagnosis: detect faults in a system that might lead to a failure, providing the basis for countermeasures that minimize the consequences of failures.
  • 10. • Error detecting and correcting codes: detect and correct errors in sensitive information. • Diverse programming: detect and mask residual software design faults during the execution of a program. This prevents safety-critical failures of the system and allows operation to continue, giving high reliability. • Software error effect analysis: identify software modules and their criticality, propose means of detecting software errors and enhancing software robustness, and evaluate the amount of validation needed for the various software components.
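The Hamming(7,4) code is a classic instance of an error-correcting code. It protects 4 data bits with 3 parity bits and can correct any single-bit error. The slides do not name a specific code, so this is a swapped-in illustration. The function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into the 7-bit codeword [p1, p2, d1, p4, d2, d3, d4].

    Parity bits sit at positions 1, 2 and 4 (1-based). Each covers the positions
    whose binary index has the corresponding bit set.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Return (data bits, syndrome). A non-zero syndrome is the 1-based position
    of the single corrupted bit, which is flipped back before extracting the data."""
    c = list(code)  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], syndrome


codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulate a single-bit error at position 5
print(hamming74_decode(codeword))  # ([1, 0, 1, 1], 5)
```

This code cannot correct two simultaneous bit errors. Detecting them as well requires an extra overall parity bit.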
  • 11. • Software quality audit • Software rule checking • Unit testing • Software integration tests • Software/hardware integration tests • Fault injection tests • System validation
  • 12. • Computers used in life-critical applications must be so reliable that their reliability cannot be validated by experiment alone. • A purely experimental approach works for most computer companies' products but is impractical in such a case. To get around this difficulty, we use mathematical models of reliability. • We construct a mathematical model of the real-time computer and solve it. Doing so adds one more possible source of error: the assumptions of the mathematical model. • The correctness of these assumptions is a necessary condition for the correctness of the model's predictions.
  • 13. • Reliability is one of the most important characteristics of a real-time system. Real-time systems are mostly used in critical applications, where the margin of error should be non-existent because of the potential loss of life or damage to the system or process at hand. • Degradation of such systems is closely monitored to minimize risks and failures and to keep downtime as close to zero as possible. This also helps limit the impact on profits. • For example, consider an embedded pacemaker. If it were not completely accurate and reliable, it could alter the patient's regular heartbeat, which could cost the patient's life.
  • 14. • The most difficult problem in reliability modeling is keeping the complexity of the models sufficiently small. • Even when the various parameters of the model are exponentially distributed, realistic models can be too complex for all current techniques. The techniques used to reduce the complexity of such models consist largely of two methods. State aggregation groups multiple states together and treats them as a single state. Decomposition breaks the overall model down into submodels, and each submodel is solved separately. • These techniques are only approximations, but the underlying difficulty of the problem makes such approximations necessary.
  • 15. • The reliability of components is usually specified through a probability distribution function of their lifetime. For example: • If failures occur as a Poisson process with rate $\lambda$, the lifetime distribution is $F_l(t) = 1 - \exp(-\lambda t)$. • If failures follow a Weibull distribution with shape parameter $\alpha$ and scale parameter $\lambda$, the lifetime distribution is $F_l(t) = 1 - \exp\left(-(\lambda t)^{\alpha}\right)$. • We denote by $f_l(t)$ the associated density function (we assume here that $F_l(t)$ is differentiable).
  • 16. • The hazard rate $h(t)$ of a component of age $t$ is defined as the rate of failure at time $t$, given that the component has not failed up to time $t$. • We can use Bayes's law to express the hazard rate as a function of the lifetime distribution function: $h(t)\,dt = \text{Prob}\{\text{system fails in } [t, t+dt] \mid \text{system has not failed up to } t\} = \dfrac{\text{Prob}\{\text{system fails in } [t, t+dt] \cap \text{system has not failed up to } t\}}{\text{Prob}\{\text{system has not failed up to } t\}} = \dfrac{f_l(t)\,dt}{1 - F_l(t)}$, $\therefore\ h(t) = \dfrac{f_l(t)}{1 - F_l(t)}$. • If the failure process is Poisson with rate $\lambda$, then $h(t) = \dfrac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$. Note: Bayes's law describes the probability of an event based on prior knowledge of conditions that might be related to the event.
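  • 17. • If the failure process is Weibull with shape and scale parameters $\alpha$ and $\lambda$, then $h(t) = \alpha\lambda(\lambda t)^{\alpha-1}$. • If $0 < \alpha < 1$, $h(t)$ decreases with time. • If $\alpha = 1$, the failure process is Poisson (constant hazard rate $\lambda$). • If $\alpha > 1$, $h(t)$ increases with time. [Figures: the bathtub curve; Weibull lifetime distributions for $\lambda = 1$]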
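The relation $h(t) = f_l(t)/(1 - F_l(t))$ can be checked numerically against the closed-form Weibull hazard rate. The Python sketch below does this. The function names are hypothetical, and $t > 0$ is assumed because $(\lambda t)^{\alpha - 1}$ is undefined at $t = 0$ for $\alpha < 1$.

```python
import math


def weibull_cdf(t, lam, alpha):
    """Lifetime distribution F_l(t) = 1 - exp(-(lam*t)**alpha)."""
    return 1.0 - math.exp(-((lam * t) ** alpha))


def weibull_pdf(t, lam, alpha):
    """Density f_l(t) = dF_l/dt = alpha*lam*(lam*t)**(alpha-1) * exp(-(lam*t)**alpha)."""
    return alpha * lam * (lam * t) ** (alpha - 1) * math.exp(-((lam * t) ** alpha))


def hazard_from_distribution(t, lam, alpha):
    """Hazard rate from its definition: h(t) = f_l(t) / (1 - F_l(t))."""
    return weibull_pdf(t, lam, alpha) / (1.0 - weibull_cdf(t, lam, alpha))


def weibull_hazard(t, lam, alpha):
    """Closed form: h(t) = alpha*lam*(lam*t)**(alpha-1)."""
    return alpha * lam * (lam * t) ** (alpha - 1)


# lam = 1 as in the figure: alpha < 1 decreasing, alpha = 1 constant, alpha > 1 increasing.
for alpha in (0.5, 1.0, 2.0):
    print(alpha, [round(weibull_hazard(t, 1.0, alpha), 3) for t in (0.5, 1.0, 2.0)])
```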
  • 18. • Many real-life components have a hazard rate shaped like the bathtub curve shown in the figure. In the beginning the hazard rate is quite high, and then it begins to drop. • This is known as the infant-mortality phase, in which components with manufacturing defects are weeded out. • The rate then becomes approximately constant, before aging effects set in and cause the hazard rate to rise with age. Note: a plot of the empirical cumulative distribution function of data on special axes is a type of Q-Q plot.
  • 22. • Series-parallel systems • NMR clusters • Combinatorial model • Markov chain model • Voter reliability
  • 23. • In a series connection, the failure of any component results in system failure. • In a parallel connection, all the components must fail before the system fails. $R(c_i)$ denotes the reliability of component $c_i$ over a given interval $[0, t]$.
  • 24. • Consider an N-modular redundant (NMR) cluster. • Faulty processors are immediately identified and disconnected from the system. • The system therefore always consists of good processors only. • There is no repair.
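  • 25. • The system will fail only if there are fewer than two functional processors left in the system. • Since there is no repair, all failures are assumed to be permanent. The probability of system failure over the interval $[0, t]$ is given by $\text{Prob}\{\text{system failure in } [0,t]\} = \sum_{i=0}^{l} \text{Prob}\{\text{exactly } i \text{ processors functional at } t\}$, where $l$ is the largest number of functional processors at which the system has failed ($l = 1$ for the condition above).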
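The slide does not give a per-processor reliability, so this sketch makes two assumptions. First, processors fail independently. Second, each has an exponential lifetime, $R(t) = e^{-\lambda t}$. Under these assumptions the number of functional processors at time $t$ is binomial, and the sum above can be evaluated directly. The function names and the failure rate `lam` are hypothetical.

```python
import math


def prob_exactly_functional(n, i, r):
    """Prob{exactly i of n independent processors functional}, each with reliability r."""
    return math.comb(n, i) * r**i * (1 - r) ** (n - i)


def nmr_failure_probability(n, lam, t, min_functional=2):
    """Prob{system failure in [0, t]}: sum of Prob{exactly i functional at t}
    for i = 0 .. min_functional - 1 (fewer than two functional processors by default)."""
    r = math.exp(-lam * t)  # assumed exponential lifetime per processor
    return sum(prob_exactly_functional(n, i, r) for i in range(min_functional))


lam = 1e-4  # assumed per-processor failure rate (per hour)
for n in (3, 5, 7):
    print(n, nmr_failure_probability(n, lam, t=1000.0))
```

The failure probability falls as N grows, because more processors must fail before fewer than two remain.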
  • 26. Error sources and error detection at each stage:
    Stage | Error sources | Error detection
    Specification & design | Algorithm design; formal specification | Consistency checks; simulation
    Prototype | Algorithm design; wiring & assembly; timing; component failure | Stimulus/response testing
    Manufacture | Wiring & assembly; component failure | System testing; diagnostics
    Installation | Assembly; component failure | System testing; diagnostics
    Field operation | Component failure; operator errors; environmental factors | Diagnostics
  • 27. • MTTF: Mean Time To (first) Failure, or expected life, is defined as the expected value of the time to failure $t_f$. For a component with constant failure rate $\lambda$ (reliability $R(t) = e^{-\lambda t}$): $\text{MTTF} = E(t_f) = \int_0^{\infty} R(t)\,dt = 1/\lambda$. • The MTTF of a system is the expected time of the first failure in a sample of identical, initially perfect systems. • MTTR: Mean Time To Repair is defined as the expected time for repair. • MTBF: Mean Time Between Failures; for a repairable system, MTBF = MTTF + MTTR.
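The identity $\text{MTTF} = \int_0^{\infty} R(t)\,dt = 1/\lambda$ can be checked numerically with the trapezoidal rule. This is a minimal sketch with three assumptions: a failure rate of $\lambda = 0.002$ per hour, a hypothetical helper `mttf_numeric`, and truncation of the integral at $20/\lambda$, where $e^{-20}$ makes the remaining tail negligible.

```python
import math


def mttf_numeric(reliability, t_max, steps=200_000):
    """Approximate MTTF = integral_0^inf R(t) dt by the trapezoidal rule on [0, t_max]."""
    h = t_max / steps
    total = 0.5 * (reliability(0.0) + reliability(t_max))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h


lam = 0.002  # assumed constant failure rate, failures per hour
approx = mttf_numeric(lambda t: math.exp(-lam * t), t_max=20 / lam)
print(approx, 1 / lam)  # both close to 500 hours
```

The same helper works for any reliability function, for example a Weibull $R(t) = \exp(-(\lambda t)^{\alpha})$, where no simple $1/\lambda$ formula applies.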
  • 29. • Building a reliable serial system is extraordinarily difficult and expensive. • For example, suppose one were to build a serial system with 100 components, each with a reliability of 0.999. The overall system reliability would be $0.999^{100} \approx 0.905$. • Reliability of a system of components: • Minimal path set: a minimal set of components whose functioning ensures the functioning of the system, e.g. {1,3,4}, {2,3,4}, {1,5}, {2,5}
  • 30. • Parallel connected components: • The unreliability of component $k$ is $Q_k(t) = 1 - R_k(t) = 1 - e^{-\lambda_k t}$. • Assuming the component failures are statistically independent, $Q_{par}(t) = \prod_{i=1}^{n} Q_i(t)$. • Overall system reliability: $R_{par}(t) = 1 - \prod_{i=1}^{n} \left(1 - R_i(t)\right)$.
  • 31. • Parallel and serial connected components: • The total reliability is the reliability of the first half in series with the second half. • Given $R_1 = 0.9$, $R_2 = 0.9$, $R_3 = 0.99$, $R_4 = 0.99$, $R_5 = 0.87$: $R_t = \left(1 - (1 - 0.9)(1 - 0.9)\right)\left(1 - (1 - 0.87)(1 - 0.99 \times 0.99)\right) \approx 0.987$
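The series and parallel formulas, and the worked example, can be reproduced in a few lines of Python. The sketch also computes the same reliability a second way, by inclusion-exclusion over the minimal path sets {1,3,4}, {2,3,4}, {1,5}, {2,5}. This assumes independent component failures. The helper names are hypothetical.

```python
from itertools import combinations
from math import prod


def series(*rs):
    """Every component must work: R = product of R_i."""
    return prod(rs)


def parallel(*rs):
    """At least one component must work: R = 1 - product of (1 - R_i)."""
    return 1 - prod(1 - r for r in rs)


def reliability_from_path_sets(path_sets, r):
    """Exact system reliability by inclusion-exclusion over minimal path sets:
    the system works iff all components of at least one path set work."""
    total = 0.0
    for k in range(1, len(path_sets) + 1):
        for group in combinations(path_sets, k):
            components = set().union(*group)
            total += (-1) ** (k + 1) * prod(r[c] for c in components)
    return total


# Component reliabilities from the example above.
R = {1: 0.9, 2: 0.9, 3: 0.99, 4: 0.99, 5: 0.87}

# Structure: (1 parallel 2) in series with ((3 series 4) parallel 5).
r_total = series(parallel(R[1], R[2]), parallel(series(R[3], R[4]), R[5]))
paths = [{1, 3, 4}, {2, 3, 4}, {1, 5}, {2, 5}]

print(round(r_total, 3), round(reliability_from_path_sets(paths, R), 3))  # 0.987 0.987
print(round(series(*[0.999] * 100), 3))  # 0.905
```

The structural formula is cheaper, but the path-set method also handles networks that are not pure series-parallel compositions.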
  • 32. What is a fault? A fault is an erroneous state of software or hardware resulting from failures of its components. • Fault sources: • Design errors • Manufacturing problems • External disturbances • Harsh environmental conditions • System misuse
  • 33. • Mechanical: “wears out” • Deterioration: wear, fatigue, corrosion • Shock: fractures, overload, etc. • Electronic hardware: “bad fabrication; wears out” • Latent manufacturing defects • Operating environment: noise, heat, ESD, electromigration • Design defects • Software: “bad design” • Design defects • “Code rot”: accumulated run-time faults • People • (a topic that could fill a whole lecture)
  • 34. Failure: a component does not provide its service. Fault: a defect within a system. Error: a deviation from the required operation of the system or subsystem. Fault characteristics: • Extent: local (independent) or distributed (related) • Value: determinate or indeterminate (varying values) • Duration: transient, intermittent or permanent
  • 35. There is a four-fold categorization of ways to deal with system faults and increase system reliability and/or availability. • Methods for minimizing faults: • Fault avoidance: how to prevent faults from occurring. Increase reliability by conservative design and the use of high-reliability components. • Fault tolerance: how to provide a service that complies with the specification in spite of faults having occurred or occurring.
  • 36. • Fault tolerance: how to provide a service that complies with the specification in spite of faults having occurred or occurring. • Fault removal: how to minimize the presence of faults. • Fault forecasting: how to estimate the presence, occurrence and consequences of faults. • Fault tolerance is the ability of a computer system to survive in the presence of faults.
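  • 37. [Figure: recovery block scheme] The input is processed by the primary version, and its result is checked. If the check is passed, the result is output. If it fails, the system rolls back to the state saved in recovery memory and tries an alternate version. If the check has failed and all alternates are exhausted, the block fails.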
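The recovery block flow in the figure can be sketched as a loop over versions. Before the loop, the input state is saved in recovery memory. Each attempt starts from a rolled-back copy of that state, and each result is checked by an acceptance test. This is a minimal illustration in Python, not the source's implementation. The names `recovery_block`, `RecoveryBlockFailure`, `buggy_sort`, `simple_sort` and `is_sorted_permutation` are hypothetical, and the sketch treats an exception raised by a version like a failed test.

```python
import copy
from typing import Any, Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when the primary and every alternate fail the acceptance test."""


def recovery_block(state: Any,
                   versions: Sequence[Callable[[Any], Any]],
                   acceptance_test: Callable[[Any, Any], bool]) -> Any:
    """Run the primary (versions[0]), then each alternate in turn, until one passes."""
    recovery_memory = copy.deepcopy(state)  # checkpoint taken before the block runs
    for version in versions:
        # Roll back: every version starts from a fresh copy of the checkpoint.
        working_state = copy.deepcopy(recovery_memory)
        try:
            result = version(working_state)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(recovery_memory, result):
            return result
    raise RecoveryBlockFailure("primary and all alternates exhausted")


# Hypothetical versions of a sorting routine, for illustration only.
def buggy_sort(xs):
    xs.sort(reverse=True)  # design fault: wrong order (and it mutates its own copy)
    return xs


def simple_sort(xs):
    return sorted(xs)


def is_sorted_permutation(original, result):
    """Acceptance test: the result must be the ascending arrangement of the original input."""
    return result == sorted(original)


print(recovery_block([3, 1, 2], [buggy_sort, simple_sort], is_sorted_permutation))  # [1, 2, 3]
```

Deep-copying the checkpoint keeps a faulty version from corrupting the state that later alternates start from. In a real system, a journal or hardware support might do this job more cheaply.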
  • 38. • The success of fault recovery techniques depends on detecting faults accurately and as early as possible. • Three classes of recovery procedures: • Full recovery: requires all aspects of fault-tolerant computing. • Degraded recovery (also referred to as graceful degradation): similar to full recovery, but no subsystem is switched in. • The defective component is taken out of service. • Suited to multiprocessors. • Safe shutdown
  • 39. Forward recovery • Produces correct results through continuation of normal processing. • Highly application dependent. Backward recovery • Some redundant process and state information is recorded as the computation progresses. • The interrupted process is rolled back to a point for which correct information is available. • E.g., retry, checkpointing, journaling
  • 40. • Reliability: serial reliability, parallel reliability, system reliability • Fault tolerance: hardware, software
  • 41. Issue • Synchronization within one system is hard enough • Semaphores • Messages • Monitors • Synchronization among processes in a distributed system is much harder
  • 42. • Time is an interesting and important issue. • E.g., at what time of day did a particular event occur at a particular computer? Time matters for consistency (timestamps used for serialization), e-commerce, authentication, etc. • Algorithms that depend upon clock synchronization have been developed for several problems. • Because clocks are only loosely synchronized, the notion of physical time is problematic in a distributed system (DS). • There is no absolute physical “global time” in a DS.
  • 43. • How time is really measured? • Earlier: Solar day, solar second, mean solar second • Solar day: time between two consequtive transits of the sun • Solar second: 1/86400 of a solar day • Mean solar day: average length of a solar day • Problem: solar day gets longer because of slowdown of earth rotation due to friction (300 million years ago there were 400 days per year)
  • 44. • International Atomic Time (TAI): number of ticks of Cesium 133 atom since 1/1/58 (atomic second) • Atom clock: one second defined as (since 1967) 9,192,631,770 transitions of the atom Cesium 133 • Because of slowdown of earth, leap seconds have to be introduced • Correction of TAI is called Universal Coordinated Time (UTC): 30 leap seconds introduced so far • Network Time Protocol (NTP) can synchronize globally with an accuracy of up to 50 msec
  • 45. TAI seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep UTC in phase with the sun.
  • 46. • Let C(t) be a perfect clock • A clock Ci(t) is called correct at time t if Ci(t) = C(t) • A clock Ci(t) is called accurate at time t if dCi(t)/dt = dC(t)/dt = 1 • Two clocks Ci(t) and Ck(t) are synchronized at time t if Ci(t) = Ck(t)
  • 47. • Computers contain a physical clock (crystal oscillator) • Physical time t, hardware time Hi(t), software time Ci(t) • The clock output can be read by software and scaled into a suitable time unit, and the value can be used to timestamp any event: $C_i(t) = \alpha H_i(t) + \beta$ • Clock skew: the instantaneous difference between the readings of any two clocks • Clock drift: crystal-based clocks count time at different rates, and so diverge.
  • 48. • Underlying oscillators are subject to physical variations, so their frequencies of oscillation differ • Even the same clock's frequency varies with temperature • Designs exist that attempt to compensate for this variation, but they cannot eliminate it • The difference in the oscillation rates of two clocks might be small, but the difference accumulated over many oscillations leads to an observable difference • For clocks based on a quartz crystal, the drift rate is about 10^-6 s/s, giving a difference of one second every 1,000,000 s, or 11.6 days.
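The quartz figure on slide 48 is simple arithmetic: skew grows linearly with elapsed time at the drift rate, so the time to accumulate a given skew is skew divided by drift rate. A small sketch, assuming a constant drift rate (real oscillators vary with temperature, as noted above); the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def accumulated_skew(drift_rate: float, elapsed_s: float) -> float:
    """Skew (s) a clock gains or loses relative to real time after elapsed_s seconds."""
    return drift_rate * elapsed_s


def time_to_skew(drift_rate: float, skew_s: float) -> float:
    """Real time (s) until a clock drifting at drift_rate is off by skew_s seconds."""
    return skew_s / drift_rate


quartz = 1e-6  # typical quartz drift rate from the slide, in s/s
seconds = time_to_skew(quartz, 1.0)
print(f"{seconds:.0f} s = {seconds / SECONDS_PER_DAY:.1f} days")  # 1000000 s = 11.6 days
print(f"skew after one day: {accumulated_skew(quartz, SECONDS_PER_DAY):.4f} s")
```

At 10^-6 s/s a quartz clock is already about 86 ms off after a single day, which is why periodic resynchronization is unavoidable.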
  • 49. You want to catch the bus at 5 pm at the stop, but your watch is off by 15 minutes • What if your watch is slow (late) by 15 minutes? • What if your watch is fast by 15 minutes? Synchronization is required for • Correctness • Fairness
  • 50. Airline reservation system • Server A receives a client request to purchase the last ticket on flight ABC 123. • Server A timestamps the purchase using its local clock (9h:15m:32.45s) and logs it. It replies OK to the client. • That was the last seat. Server A sends a message to Server B saying "flight full." • B enters "Flight ABC 123 full" + its local clock value (which reads 9h:10m:10.11s) into its log. • Server C queries A's and B's logs and is confused that a client purchased a ticket after the flight became full. • C may execute incorrect or unfair actions.
  • 51. • An asynchronous distributed system (DS) consists of a number of processes. • Each process has a state (values of variables). • Each process takes actions to change its state, which may be an instruction or a communication action (send, receive). • An event is the occurrence of an action. • Each process has a local clock, so events within a process can be assigned timestamps and thus ordered linearly. • But in a DS we also need to know the time order of events across different processes. • Clocks across processes are not synchronized in an asynchronous DS (unlike in a multiprocessor/parallel system, where they are). So… 1. Process clocks can be different 2. We need algorithms either (a) for time synchronization, or (b) for telling which event happened before which
  • 52. • In a DS, each process has its own clock. • Clock Skew versus Drift • Clock skew = relative difference in the clock values of two processes • Clock drift = relative difference in the clock frequencies (rates) of two processes • A non-zero clock drift causes skew to increase (eventually). • Maximum Drift Rate (MDR) of a clock • Absolute MDR is defined relative to Coordinated Universal Time (UTC). UTC is the "correct" time at any point of time. • The MDR of a process depends on its environment. • The maximum drift rate between two clocks with similar MDR is 2 × MDR • Max-Synch-Interval = (MaxAcceptableSkew - CurrentSkew) / (2 × MDR) • (i.e., time = distance/speed)
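The Max-Synch-Interval formula on slide 52 can be evaluated directly. A minimal sketch, assuming both clocks share the same MDR and drift in opposite directions in the worst case; the function name and the example numbers are hypothetical. With CurrentSkew = 0 the formula reduces to the δ/(2ρ) bound of slide 59.

```python
def max_synch_interval(max_acceptable_skew: float, current_skew: float, mdr: float) -> float:
    """Longest wait (s) before two clocks, each drifting at most mdr s/s, must resynchronize.

    In the worst case the clocks drift in opposite directions, so their skew
    grows at 2 * mdr seconds per second (time = distance / speed).
    """
    if current_skew >= max_acceptable_skew:
        return 0.0  # already at or beyond the bound: resynchronize now
    return (max_acceptable_skew - current_skew) / (2 * mdr)


# Example: keep skew under 10 ms, clocks currently 2 ms apart, MDR = 1e-6 s/s.
print(max_synch_interval(0.010, 0.002, 1e-6))  # ≈ 4000 seconds (about 67 minutes)
```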
  • 53. • If the UTC time is t and process i's time is Ci(t), then ideally we would like to have Ci(t) = t, or dC/dt = 1. • In practice, we use a tolerance variable ρ such that $1 - \rho \le \frac{dC}{dt} \le 1 + \rho$ • In external synchronization, a clock is synchronized with an authoritative external source of time • In internal synchronization, clocks are synchronized with one another with a known degree of accuracy
  • 54. • Ci(t): the reading of the software clock at process i when the real time is t. • External synchronization: for a synchronization bound D > 0 and a source S of UTC time, $|S(t) - C_i(t)| < D$ for i = 1, 2, ..., N and for all real times t. Clocks Ci are externally accurate to within the bound D. • In external synchronization, a clock is synchronized with an authoritative external source of time • Internal synchronization: for a synchronization bound D > 0, $|C_i(t) - C_j(t)| < D$ for i, j = 1, 2, ..., N and for all real times t. Clocks Ci are internally accurate within the bound D.
  • 55. • In internal synchronization, clocks are synchronized with one another with a known degree of accuracy • External synchronization with D ⇒ internal synchronization with 2D • Internal synchronization with D ⇒ external synchronization with ??
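The two definitions on slide 54 can be checked on a snapshot of clock readings taken at the same real time t. A small sketch, assuming the snapshot and the UTC value are given (the function names and readings are hypothetical). The 2D claim follows from the triangle inequality: $|C_i - C_j| \le |C_i - S| + |S - C_j| < 2D$.

```python
from itertools import combinations
from typing import Sequence


def externally_accurate(readings: Sequence[float], utc: float, bound: float) -> bool:
    """|S(t) - C_i(t)| < D for every clock i, with utc playing the role of S(t)."""
    return all(abs(utc - c) < bound for c in readings)


def internally_accurate(readings: Sequence[float], bound: float) -> bool:
    """|C_i(t) - C_j(t)| < D for every pair of clocks i, j."""
    return all(abs(a - b) < bound for a, b in combinations(readings, 2))


# Readings (s) of three clocks taken at the same real time, when UTC = 1000.0 s.
readings = [999.6, 1000.3, 1000.45]
print(externally_accurate(readings, 1000.0, 0.5))  # True
print(internally_accurate(readings, 2 * 0.5))      # True: external D implies internal 2D
print(internally_accurate(readings, 0.5))          # False: 1000.45 - 999.6 = 0.85
```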
  • 56. • UTC signals are synchronized and broadcast regularly from land-based radio stations and satellites covering many parts of the world • E.g., in the US the radio station WWV broadcasts time signals on several short-wave frequencies • Satellite sources include the Geostationary Operational Environmental Satellites (GOES) and GPS
  • 57. • Radio waves travel at nearly the speed of light. The propagation delay can be accounted for if the exact speed and the distance from the source are known • Unfortunately, the propagation speed varies with atmospheric conditions, leading to inaccuracy • The accuracy of a received signal is a function of both the accuracy of the source and the receiver's distance from the source through the atmosphere
  • 58. [Figure: the relation between clock time and UTC when clocks tick at different rates.] Problem: show that, to guarantee that no two clocks differ by more than δ, clocks must be resynchronized at least every δ/(2ρ) seconds.
  • 59. • The constant ρ is specified by the manufacturer and is known as the maximum drift rate. • If two clocks drift from Coordinated Universal Time (UTC) in opposite directions, then at a time Δt after they are synchronized they may be as much as 2ρΔt apart. • If the operating system designer wants to guarantee that no two clocks ever differ by more than δ, then 2ρΔt ≤ δ must hold, so clocks must be resynchronized at least every δ/(2ρ) seconds.
  • 60. Remember the definition of a synchronous distributed system? • Known bounds for message delay, clock drift rate and execution time. • Clock synchronization is easy in this case • In practice, most DSs are asynchronous. • Cristian's Algorithm • The Berkeley Algorithm
  • 61. • Consider internal synchronization between two processes in a synchronous DS • P sends the time t on its local clock to Q in a message m • In principle, Q could set its clock to t + Ttrans, where Ttrans is the time taken to transmit m between them • The two processes would then agree (internal synchronization)
  • 62. • Unfortunately, Ttrans is subject to variation and is unknown • Other processes compete for resources with P and Q, and other messages compete with m for the network • But there is always a minimum transmission time min that would be obtained if no other processes executed and no other network traffic existed • min can be measured or conservatively estimated
  • 63. • In a synchronous system, by definition, there is also an upper bound max on the time taken to transmit any message • Let the uncertainty in the message transmission time be u, so that u = (max - min) • If Q sets its clock to (t + min), the clock skew may be as much as u (since the message may in fact have taken time max to arrive). • If Q sets it to (t + max), the skew may again be as large as u (since the message may have taken only min). • If, however, Q sets its clock to (t + (max + min)/2), the skew is at most u/2. • In general, for a synchronous system, the optimum bound that can be achieved on clock skew when synchronizing N clocks is u(1 - 1/N) • For an asynchronous system, Ttrans = min + x, where x ≥ 0 has no known upper bound
  • 64. Asynchronous system • Achieves synchronization only if the observed round-trip time (RTT) between the client and server is sufficiently short compared with the required accuracy. Observations: • RTTs between processes are reasonably short in practice, yet theoretically unbounded • A practical estimate is possible if the RTT is sufficiently short in comparison with the required accuracy • In a LAN, the RTT should be around 1–10 ms, during which a clock with a drift rate of 10^-6 s/s varies by at most 10^-5 ms. Hence the estimate of the RTT is reasonably accurate
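Slide 64 gives the conditions for Cristian's method but not the estimate itself. In its commonly described form, the client measures the round-trip time T_round of a request/reply exchange and sets its clock to t + T_round/2, where t is the server time carried in the reply. Since the reply arrives between t + min and t + T_round - min, the estimate is accurate to ±(T_round/2 - min). The sketch below assumes a caller-supplied function standing in for the network exchange (`request_server_time` is hypothetical), and all times must use the same unit.

```python
import time
from typing import Callable, Tuple


def cristian_sync(request_server_time: Callable[[], float],
                  local_clock: Callable[[], float] = time.monotonic,
                  t_min: float = 0.0) -> Tuple[float, float]:
    """Return (estimated server time when the reply arrives, accuracy bound).

    request_server_time: hypothetical stand-in for the request/reply exchange;
                         it returns t, the server clock value put in the reply.
    local_clock:         used only to measure the round-trip time T_round.
    t_min:               known minimum one-way transmission time (same units).
    """
    start = local_clock()
    t = request_server_time()
    t_round = local_clock() - start
    estimate = t + t_round / 2        # assume the reply took half the round trip
    accuracy = t_round / 2 - t_min    # true server time lies in estimate +/- accuracy
    return estimate, accuracy


# Simulated exchange in milliseconds: request leaves at 0, reply arrives at 20,
# and the server's clock read 5000 when it sent the reply.
ticks = iter([0.0, 20.0])
print(cristian_sync(lambda: 5000.0, lambda: next(ticks), t_min=2.0))  # (5010.0, 8.0)
```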
  • 65. • A coordinator (time server): master • The opposite approach to Cristian's algorithm • Periodically, the master polls the time of each client (slave) whose clock is to be synchronized. • Based on the answers (observing the RTT as in Cristian's algorithm), it computes the average (including its own clock value) and broadcasts the new time. • This method is suitable for a system in which no machine has a WWV receiver providing UTC. • The time daemon's time must be set manually by the operator periodically. • The balance of probabilities is that the average cancels out the individual clocks' tendencies to run fast or slow
  • 66. • The accuracy depends on a nominal maximum RTT between the master and the slaves • The master eliminates any occasional readings associated with RTTs larger than this maximum • Instead of sending the updated current time back to the computers, which would introduce further uncertainty due to message transmission time, the master sends the amount by which each individual slave's clock requires adjustment (+ or -) • The algorithm eliminates readings from faulty clocks (since these could have significant adverse effects if an ordinary average were taken): a subset of clocks that do not differ from one another by more than a specified amount is chosen, and then the average is taken.
  • 67. • The time daemon asks all the other machines for their clock values • The machines answer • The time daemon tells everyone how to adjust their clock
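The three Berkeley steps above can be sketched as a single master-side computation. This is a simplified sketch under stated assumptions: the readings are taken to be already corrected for half the observed RTT, readings with excessive RTTs are taken to be already discarded, and the "subset that do not differ by more than a specified amount" is approximated by keeping readings within a hypothetical `max_deviation` of the median. The real algorithm's subset choice may differ.

```python
from typing import Dict


def berkeley_adjustments(master_time: float,
                         slave_times: Dict[str, float],
                         max_deviation: float) -> Dict[str, float]:
    """Return the +/- adjustment the master sends to each machine (itself included).

    Readings further than max_deviation from the median are treated as faulty
    and excluded from the average, but their machines still get a correction.
    """
    readings = {"master": master_time, **slave_times}
    values = sorted(readings.values())
    median = values[len(values) // 2]
    trusted = [v for v in values if abs(v - median) <= max_deviation]
    target = sum(trusted) / len(trusted)
    # Send each machine the amount to adjust by, not the new absolute time.
    return {name: target - value for name, value in readings.items()}


# Clock values in minutes after midnight: master 3:00, A 3:25, B 2:50.
print(berkeley_adjustments(180, {"A": 205, "B": 170}, max_deviation=40))
# {'master': 5.0, 'A': -20.0, 'B': 15.0}
```

Sending a relative adjustment rather than an absolute time means the correction stays valid even though the message takes an unknown time to arrive.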
  • 68. • Both Cristian’s and Berkeley’s methods are highly centralized, with the usual disadvantages - single point of failure, congestion around the server, … etc. • One class of decentralized clock synchronization algorithms works by dividing time into fixed-length re-synchronization intervals. • The ith interval starts at T0 + iR and runs until T0 + (i+1)R, where T0 is an agreed upon moment in the past, and R is a system parameter. • At the beginning of each interval, every machine broadcasts the current time according to its clock. • After a machine broadcasts its time, it starts a local timer to collect all other broadcasts that arrive during some interval S. • When all broadcasts arrive, an algorithm is run to compute a new time.
  • 69. • Some algorithms: • average out the time. • discard the m highest and m lowest and average the rest -- this is to prevent up to m faulty clocks sending out nonsense • correct each message by adding to it an estimate of propagation time from the source. This estimate can be made from the known topology of the network, or by timing how long it takes for probe message to be echoed.
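The second and third options on slide 69 combine naturally: correct each broadcast by its estimated propagation time, then discard the m highest and m lowest values. A minimal sketch, assuming the delay estimates are already known (the helper names are hypothetical). If at most m clocks are faulty, every kept value has at least one correct reading above or equal to it and one below or equal to it, so the result stays within the range of the correct clocks.

```python
from typing import Iterable, List, Sequence, Tuple


def with_propagation(broadcasts: Iterable[Tuple[float, float]]) -> List[float]:
    """Add the estimated propagation delay to each (timestamp, delay) broadcast."""
    return [timestamp + delay for timestamp, delay in broadcasts]


def trimmed_average(readings: Sequence[float], m: int) -> float:
    """Discard the m highest and m lowest readings and average the rest."""
    if len(readings) <= 2 * m:
        raise ValueError("need more than 2*m readings")
    kept = sorted(readings)[m:len(readings) - m]
    return sum(kept) / len(kept)


# (timestamp, estimated delay) pairs in ms; the fourth clock is faulty.
broadcasts = [(999, 1), (1000, 2), (1000, 1), (2500, 0), (998, 1)]
corrected = with_propagation(broadcasts)   # [1000, 1002, 1001, 2500, 999]
print(trimmed_average(corrected, m=1))     # 1001.0
```

A plain average of the same corrected readings would be 1300.4, which shows how much a single faulty clock distorts the first option on the slide.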
  • 70. Cristian’s and Berkeley algorithms  synch within intranet • NTP – defines an architecture for a time service and a protocol to distribute time information over the Internet • Provides a service enabling clients across the Internet to be synchronized accurately to UTC • Provides a reliable service that can survive lengthy losses of connectivity • Enables client to resynchronize sufficiently frequently to offset clock drifts • Provides protection against interference with the time service
  • 71. • Uses a network of time servers to synchronize all processes on a network. • Time servers are connected by a synchronization subnet tree. The root is in touch with UTC. Each node synchronizes its child nodes. [Figure: NTP synchronization subnet. Stratum 1: primary server, directly synchronized to UTC; stratum 2: secondary servers, synchronized by the primary server; stratum 3: servers synchronized by the secondary servers.]
  • 72. • t and t’: actual transmission times for m and m’(unknown) • o: true offset of clock at B relative to clock at A • oi: estimateof actual offset betweenthe twoclocks • di: estimate of accuracy of oi ; total transmission times for m and m’; di=t+t’ Ti Ti-1Ti-2 Ti-3 Server B Server A Time m m' Time i-2T = i-3T +t + o iT = i-1T +t'-o This leads to id = t +t' = i-2T - i-3T + iT - i-1T o = io +(t'-t) / 2, where io = ( i-2T - i-3T + i-1T - iT ) / 2. It can then be shown that io - id / 2 £ o £ io + id / 2.
  • 73. • NTP servers apply a data filtering algorithm to successive pairs < oi , di> which estimates the offset o and calculates the quality of this estimate as a statistical quantity called the filter dispersion. • The eight most recent pairs < oi , di> are retained • The value of oi that corresponds to the min value of di is chosen to estimate o.
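The selection rule on slide 73 can be sketched with a fixed-size window. This is a deliberately simplified sketch: it keeps the eight most recent ⟨o_i, d_i⟩ pairs and returns the offset with the smallest delay, which has the tightest bound o_i ± d_i/2. It does not compute the filter dispersion or model the rest of NTP's selection logic; `OffsetFilter` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Tuple


class OffsetFilter:
    """Retain the most recent <o_i, d_i> pairs; estimate o from the pair with minimum d_i."""

    def __init__(self, size: int = 8) -> None:
        # deque with maxlen silently drops the oldest pair once full.
        self.pairs: Deque[Tuple[float, float]] = deque(maxlen=size)

    def add(self, offset: float, delay: float) -> None:
        self.pairs.append((offset, delay))

    def best_offset(self) -> float:
        if not self.pairs:
            raise ValueError("no samples yet")
        offset, _delay = min(self.pairs, key=lambda pair: pair[1])
        return offset


f = OffsetFilter()
for o_i, d_i in [(3.0, 10.0), (7.5, 40.0), (4.8, 4.0)]:
    f.add(o_i, d_i)
print(f.best_offset())  # 4.8, the sample with the smallest delay
```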
  • 74. • Compare the time Ts provided by the time server with the time Tc at computer C • If Ts > Tc (e.g., 9:07 am vs 9:05 am), we could advance C's time to Ts • This may skip some clock ticks, which is probably OK • If Ts < Tc (e.g., 9:07 am vs 9:10 am), we cannot roll C's time back to Ts • Many applications assume that time always advances
  • 75. • The solution is not to set C's clock back, but to make C's clock run slowly until it resynchronizes with the time server • This can be achieved in software, without changing the rate at which the hardware clock ticks (an operation that is not always supported by hardware clocks) • Calculation …
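One way to make C's clock "run slowly" in software is to keep the software clock of the form αH + β from slide 47 and lower its rate until the error is absorbed. The sketch below is an assumption-laden illustration, not the calculation elided on the slide: it slows the clock by a fixed hypothetical fraction `slow_fraction`, so the clock never jumps and never goes backwards.

```python
def slewed_clock(h: float, h0: float, c0: float, ahead_by: float, slow_fraction: float) -> float:
    """Software clock reading at hardware time h while removing a positive error.

    At hardware time h0 the software clock reads c0, which is ahead_by seconds
    too fast. Instead of jumping back, the clock advances at rate
    (1 - slow_fraction) until the error is gone, then at rate 1 again.
    Each phase has the form alpha * H + beta.
    """
    if not 0 < slow_fraction < 1:
        raise ValueError("slow_fraction must be between 0 and 1")
    elapsed = h - h0
    catch_up = ahead_by / slow_fraction  # hardware seconds needed to absorb the error
    if elapsed <= catch_up:
        return c0 + (1 - slow_fraction) * elapsed
    return c0 - ahead_by + elapsed


# C reads 9:10 (33000 s after midnight) when the server says 9:07: 180 s ahead.
# Running 10% slow absorbs the error in 1800 s (30 minutes).
print(slewed_clock(1800, 0, 33_000, 180, 0.1))  # ≈ 34620 s, i.e. 9:37, matching the server
```

A smaller `slow_fraction` makes the correction less noticeable to applications but takes proportionally longer.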
  • 76. • Value received from UTC receiver is only accurate to within 0.1–10 milliseconds • At best, we can synchronize clocks to within 10–30 milliseconds of each other • We have to synchronize frequently, to avoid local clock drift
  • 77. • Time synchronization important for distributed systems • Cristian’s algorithm • Berkeley algorithm • NTP • Relative order of events enough for practical purposes • Lamport’s logical clocks • Vector clocks