1
BetaC 2021 by Dr. Paul Battaglia as prepared at Florida Tech
MGT5061-FA21
Systems & Log Support Mgt
note15-W05
Chapter 2 RMA (aka RAM)
Introduction to the Course
Ch 1 - Introduction to logistics
Ch 2 - Reliability, Maintainability, and Availability Measures   <== We are here
Ch 3 - Measures of Logistics and System Support
Ch 4 - The Systems Engineering Process
Ch 5 - Logistics and Supportability Analysis
Ch 6 - Logistics in System Design and Development
Ch 7 - Logistics in the Production/Construction Phase
Ch 8 - Logistics in the System Utilization, Sustaining Support,
and Retirement Phases
Ch 9 - Logistics Management
2
We originally skipped chapter 2 (reliability, maintainability,
and availability measures).
We jumped from chapter 1 (introduction) to chapter 3
(measures of logistics and system support).
Chapter 3 is a more general case of systemic metrics.
On the other hand, Chapter 2 on reliability, maintainability,
and availability is pretty specific.
It seemed to me that Chapters 2 and 3 were in reverse sequence
in the subject matter as addressed by Blanchard.
At any rate, here we are in chapter 2.
3
On page iv look at how chapter 2 is developed.
** We know there is an introduction
2.1 Then Reliability measures & factors
2.2 Maintainability measures and factors
2.3 Availability (measures) and factors
That is RMA.
-=-=
Often referred to in a slightly different sequence, yes?
RAM.
4
On page 73, in section 2.4 look at the summary.
** Back on page 11 in figure 1.4 we saw that there are a wide
variety of logistics and system support activities for both
** forward; and
** reverse flows.
** How well the system successfully accomplishes these activities
is due, in large part, to how the system is designed.
*** BUT do note that design is not the entire story.
There are other factors!
5
One big consideration is the behavior of employees/people in
general.
An example?
GTYA!
We can make an electrical connector so that it only fits one
way.
Thereby helping to ensure that the circuit is “always”
connected properly and will work.
But inevitably [some] people will try to force the connector in
an improper orientation.
** It is very hard to make a system or product that is
entirely employee proof (aka GI proof).
-=-=
Glad that you asked.
** Availability is a function of reliability and maintainability
and “other” considerations
A = f (R,M,O)
Now we see why the chapter is titled reliability,
maintainability, and then availability. It aligns better with the
equation.
In this view, the term RAM does not quite have the sequence
correct.
** Also we should look at a “systems approach”. Consider
other factors such as
** software
** people
** facilities
** data
** etc (all the factors mentioned in the work so far)
6
** Chapter 2 focuses on terms and metrics for reliability &
maintainability & (in invisible ink) availability.
After all, availability has a section in the chapter; but for
some reason availability was omitted from the summary.
7
8
On page 46 we have a few introductory paragraphs.
** One view of logistics is all considerations needed to (help; see
note) ensure effective and economical support of a system
throughout the system's life cycle.
** We want the system to:
** perform (work) as intended (to meet the need)
** be available when required
** be cost-effective when being used
Indeed, be cost-effective even when not being used!
** Further, Blanchard contends (which seems reasonable) that
meeting these characteristics is heavily influenced by the goals and
characteristics in the system design.
-=-=-=-
Note: More than wordsmithing? In a probabilistic world we can only
“help ensure” a particular action. We cannot ensure (make 100%
certain!) that an event will happen. Even 6-sigma is not 100%.
The best laid plans of mice and men .......
9
** A key measure is availability.
** And availability is a function of reliability and
maintainability (and as he said in the summary, the “others” of
the system).
Hence our “old friend” A = f (R, M, O)
The sequence in the book is reliability, maintainability,
and then availability.
So we flip the equation and have f (R, M, O) = A
Of course, the book then drops “other” so f (R, M) = A
10
On page 46 in footnote 1 Blanchard comments on the use of statistics,
probability density functions, and the like.
Hopefully you recall some of this.
11
In the following notes I am not going to make an extensive
effort to cover every point or equation.
You can read along and follow the algebra (and where he has
quantitative examples you should do the math).
I will try to hit the major takeaways.
12
On page 47,
2.1 addresses reliability.
Reliability is the
** probability that a system or product
** will perform in a satisfactory manner
** for a given period of time
** when used under specified operating conditions.
Reliability can be expressed as R(t).
13
Recall that the probabilities of all possible outcomes should
total to 1.
*** Actually, over any finite range most probability functions
will only get close to 1.
Remember that, depending upon the probability distribution
being used, the tail or tails are asymptotic to the
x-axis (or the y-axis if so oriented).
So the tail or tails stretch to infinity.
So we will get close to 1, but never actually reach 1.
Ta-da. For our purposes the difference is so small that we
can use 1!
So if we define F(t) as the probability that the system will fail
by time t, then the reliability is 1 less the probability of failure
R(t) = 1 - F(t) (2.1)
14
On page 47,
follow the progression from equation 2.1 to equation 2.5.
If you cannot see the math, then just take it on faith that
equation 2.5 is valid.
It might be easier to see if we eliminate that “minus” in the
exponent. We do that by moving the term from the numerator
to the denominator.
R(t) = 1 / e^(t/m) (modified 2.5)
Reliability at time t is
the constant number one
divided by the natural log base (e) raised to the power
of t divided by m (the mean time between failure).
15
As a slight detour (an aside) ....
This is one of the main reasons why logisticians who are
involved with maintenance place so much emphasis on the
mean time between failure (MTBF).
MTBF is an integral part of the reliability function!!!
And as we shall see MTBF appears in many, many of the other
key formulas.
16
Recall that e has a value of about 2.718 (the digits go on
indefinitely without repeating).
At the bottom of page 47 we ask the equivalent question ---
What is the reliability if the running time (t) is equal to
the mean time between failure (m)?
The exponential function applies.
R(t) = 1 / e^(t/m)
If the run time is equal to the MTBF (or m),
then t/m = 1
Example.
MTBF for a system/product is 5 hours.
We start the system/product at time zero.
At exactly 5 hours the system/product fails.
t/m = 5 hours / 5 hours = 1
17
So R(t) = 1 / e^1
Get out your spreadsheet or calculator.
2.718 raised to the first power => 2.718^1 = 2.718
R(t) = 1/2.718 = 0.3679, or about 0.37 (the answer in the book!)
If we operate the system starting at time zero and until t
reaches the MTBF, the reliability is 37% (not, as we might
expect, say 50%).
Figure 2.1 is simply the graph of the function R(t).
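If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the modified equation 2.5. The function name reliability is my own label (not from the book), and it assumes the exponential case discussed above.

```python
import math

def reliability(t, m):
    """R(t) = 1 / e^(t/m), where m is the mean time between failure (MTBF)."""
    return 1.0 / math.exp(t / m)

# Run time equal to the MTBF, as in the 5-hour example above.
r_at_mtbf = reliability(5, 5)
print(round(r_at_mtbf, 4))  # 0.3679
```

Try a few other run times (say t = 1 hour or t = 10 hours with the same m) to trace out the shape of the curve in figure 2.1.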
18
On page 48,
the book goes on to generalize the situation in terms of lambda
(or the number of failures divided by the total operating hours).
lambda = (# failures / total time)
Do the problem at the bottom of page 48.
REVISED
Component   Hours
 1             75
 2            125
 3            130
 4            325
 5            525
 6            525
 7            525
 8            525
 9            525
10            525
Total       3,805
** The 5 failures are easy to see.
** We set up 10 components.
** All 10 start running at the same time.
** First failure after 75 hours.
** Second failure after 125 hours.
** etc to fifth failure at 525 hours.
** Stop the test.
** 3,805 hours is the total running hours of
the TEN components that were being
tested.
The test stops after the fifth failure.
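The same arithmetic as a short Python sketch, using the hours from the revised table above. The last step (MTBF as total hours divided by failures, i.e., 1/lambda) assumes the constant failure rate case; the variable names are mine.

```python
# Hours logged by each of the ten components in the test.
hours = [75, 125, 130, 325, 525, 525, 525, 525, 525, 525]
failures = 5  # the test stops after the fifth failure

total_hours = sum(hours)                # 3805
failure_rate = failures / total_hours   # lambda, in failures per hour
mtbf = total_hours / failures           # 1 / lambda, in hours (constant-rate assumption)

print(total_hours)             # 3805
print(round(failure_rate, 6))  # 0.001314
print(mtbf)                    # 761.0
```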
19
On page 49,
we note that this is an equation with THREE variables.
lambda = (# failures) / (total time)
In any equation,
if we know the values of any TWO variables,
then we can solve for the value of the THIRD variable.
-=-=
Simple example of 3 variables and we know 2
distance = rate * time
d=10 miles and rate = 5 mph
what is the time?
10 miles = 5 mph * time
time = 10 miles/ 5 mph = 2 hours
Piece of cake
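The same "know two, solve for the third" idea applied to the lambda equation, as a minimal sketch. The function solve_failure_rate is a hypothetical helper of my own, not something from the book.

```python
def solve_failure_rate(failure_rate=None, failures=None, total_time=None):
    """Solve failure_rate = failures / total_time for whichever value is None."""
    missing = [v is None for v in (failure_rate, failures, total_time)]
    if sum(missing) != 1:
        raise ValueError("leave exactly one of the three values as None")
    if failure_rate is None:
        return failures / total_time
    if failures is None:
        return failure_rate * total_time
    return failures / failure_rate  # total_time is the unknown

# Know failures and total time, solve for lambda.
lam = solve_failure_rate(failures=5, total_time=3805)
# Know lambda and failures, solve for total time (gets us back to about 3805).
hours_needed = solve_failure_rate(failure_rate=lam, failures=5)
```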
20
Lesson for the rest of eternity
No matter the number of variables, as long as there is only 1
unknown we can solve for the unknown.
It might not be easy, but it can be done.
If there are 100 variables and we know the values for 99 of
them, then we can solve for number 100!
21
On page 52,
in figure 2.4 for semi-obvious reasons the top part is referred to
as the “bathtub” curve.
Note that we have THREE sections.
22
The left-hand section is often referred to as the “infant mortality
period”.
Time 0 is when the system or product starts.
If not DOA, then it takes time to “work out the bugs”.
If DOA, then the curve starts at the y-axis!
So the failure rate is expected to decrease as the system/product is
debugged.
23
The center section has the period that we just discussed.
A (relatively) constant failure rate. Or we can assume pretty
constant. As a system/product ages, the failure rate typically
increases a bit. Hence the slight upward slope at the bottom.
And the exponential failure function applies.
MTBF is most important in this section.
24
The right-hand section shows the period as the system/product is
getting “older”.
The failure rate is increasing.
It is getting more difficult to keep “that old car repaired”.
The maintenance needs are increasing!
Getting near system/product retirement?
25
Put the three sections together and we have the full “bathtub” curve
(the top part of figure 2.4).
26
On page 52,
at the bottom of Figure 2.4 we see some “subsets”.
The failure rates for electronic equipment tend to be different
from the rates for mechanical equipment.
*** Electronic equipment tends to “burn out” (operate until it just
gives up the ghost).
Like most light bulbs.
No example is perfect? We know that some fluorescent
bulbs tend to dim over time.
*** In large part, mechanical equipment tends to “wear out”.
There is a gradual decrease in performance.
e.g., piston rings wearing out
27
As an aside, here is the Centennial Light Bulb.
It has been burning almost continuously since 1901.
Gaps of only minutes in that period!
This is an extreme example of reliability for a particular item.
Really “messing up” that reliability function!
The bulb is in the fire department in Livermore, California.
There is a live cam where you can watch this.
Search for Livermore Centennial light bulb.
28
On page 52,
in 2.1.2 we take note that most systems do not have only ONE
component.
There are multiple components.
In a simple case (electrical circuits are easy to use as an example)
these can be either
#1 SERIES
or
#2 PARALLEL
Later #3 a combination of series and parallel
29
On page 53,
#1 covers a SERIES system with THREE components (A, B, C)
Reliability = (Ra) (Rb) (Rc) (2.8)
Reliability is the product of the reliability of the three
components.
Ra, Rb, and Rc do NOT need to be equal.
Do the quick problem at the bottom of page 53.
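A minimal Python sketch of equation 2.8 follows. The component reliabilities are hypothetical values of my own (not the numbers from the problem in the book), and series_reliability is just my label.

```python
import math

def series_reliability(component_reliabilities):
    """Series system: every component must work, so multiply the reliabilities."""
    return math.prod(component_reliabilities)

# Hypothetical Ra, Rb, Rc (they do NOT need to be equal).
r_series = series_reliability([0.99, 0.98, 0.97])
print(round(r_series, 4))  # 0.9411
```

Note that the system reliability is lower than that of the weakest component. Adding components in series can only pull the product down.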
30
On page 54,
#2 covers PARALLEL networks with THREE components
(A, B, C)
Reliability = 1 - (1-Ra) (1-Rb) (1-Rc) (2.11)
Reliability is 1 minus the PRODUCT of the three (1-R) terms, where
each (1-R) term is the probability that one component fails.
Again, Ra, Rb, and Rc do NOT need to be equal.
Do the quick problem at the bottom of page 55.
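Equation 2.11 as a minimal Python sketch. Again the component values are hypothetical (my own, not the book problem) and parallel_reliability is my label.

```python
import math

def parallel_reliability(component_reliabilities):
    """Parallel system: it fails only if ALL components fail."""
    all_fail = math.prod(1.0 - r for r in component_reliabilities)
    return 1.0 - all_fail

# Hypothetical Ra, Rb, Rc (they do NOT need to be equal).
r_parallel = parallel_reliability([0.9, 0.8, 0.7])
print(round(r_parallel, 3))  # 0.994
```

Here the system does better than the best single component (0.9), which is the whole point of redundancy.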
31
On page 55,
in #3 we have a system with BOTH SERIES and PARALLEL.
Follow along.
Note that this is typical of complex systems or products.
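For a combined network, the usual approach is to collapse each parallel group into a single equivalent reliability and then multiply along the series path. Here is a minimal sketch with a hypothetical layout of my own (component A in series with a parallel pair B and C); it is not the network drawn in the book.

```python
import math

def series_reliability(rs):
    """All components must work."""
    return math.prod(rs)

def parallel_reliability(rs):
    """The group fails only if every component in it fails."""
    return 1.0 - math.prod(1.0 - r for r in rs)

# Hypothetical values: A = 0.99, B = C = 0.90.
ra, rb, rc = 0.99, 0.90, 0.90

bc_group = parallel_reliability([rb, rc])      # collapse the parallel pair first
r_system = series_reliability([ra, bc_group])  # then treat it as one series element
print(round(r_system, 4))  # 0.9801
```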
32
The main takeaways are
** Reliability is not “a given” or “fixed”.
** We can manage (to varying extent) the reliability of the
system and/or the product!
33
BUT (a big but) improving the reliability is not totally “free”.
For example, we can increase the reliability of a system by
incorporating a redundant system.
** Have parallel circuits so if one fails the other still can carry
the load.
** Alas, that means we need to have at least TWO of everything.
** If three deep, then we need 3, and so on.
More weight
Greater cost
Likely increased size (need to house all the paths).
Makes sense for high impact systems.
** Such as hydraulics on an airplane.
Even a non-aviator strongly suspects that it is NOT a good
idea to lose hydraulic control at 30,000 feet.
34
We can also improve reliability by using heavier, more
capable parts.
** Such as a higher rated capacitor, but the capacitor also
costs just a bit more.
** A flat-free tire, but again it costs a bit more (and might
be heavier, degrading performance).
35
Of course it is also possible that a slightly more
expensive part will help to avoid costly, inconvenient repairs.
** The electronic equipment with the bigger capacitor does
not burn out when being used on the job.
** The flat-free tire means your bike does not get a flat in
the middle of a century ride (100-mile ride) in a remote area.
-=-=-
So we need to weigh costs and benefits!
As in so many of our actions.
36
An example of better reliability and the use of available data.
The Boeing 737 Max had TWO exterior Angle of Attack (AOA)
sensors, one on each side of the cockpit.
In simple terms, alas, Boeing made the decision to use the signal
from only ONE sensor on a given flight. Then on the next flight
they switched sensors.
On flight #1 use sensor A.
On flight #2 use sensor B.
On flight #3 use sensor A.
In retrospect (but even in advance?) why not use redundancy for
sensors that were already installed?
Certainly the slightly more complex software needed to use
both sensors and compare results would not seem to be a major
problem!
37
In other cases improving the reliability can also
help to reduce costs (or at least make “life easier”).
Recall the olden days when going from the US to another country
usually meant lugging around a heavy transformer to change
220-240 VAC to US 110-120 VAC?
Nowadays, many electronics (e.g., laptop computer) are dual voltage.
** Make the wiring able to withstand 240 VAC.
** Then with an inexpensive circuit design the system can
automatically adjust to the input voltage 110-120 or 220-240.
** Easier: no special equipment (transformer).
The power supply is a matter of ounces (rather than pounds).
** Might be able to get negative (or at least low) marginal cost.
Due to operating with a battery as an electric supply, most smaller
electronics (e.g., laptop computer) operate at lower voltage than
even 110. So an in-line step-down transformer was needed anyway.
One power supply can work world-wide. Less need for multiple models.
** The user may still need a plug adapter for the wall socket.
But that is a much smaller matter.
38
On page 58,
2.2 maintainability.
Maintainability (from page 34) pertains to the ease, accuracy,
safety, and economy when performing maintenance actions.
Ideally, the design should enable maintenance to be done without
major investments in time, cost, or other resources.
Maintainability is the ability of the system or product to be
maintained.
[A very poor academic definition, no?]
The good news is that much of this can be expressed in terms of
probability!
39
On page 58, maintenance can be classified into two major
categories.
#1 Corrective maintenance to restore to required
level of performance.
On page 59, figure 2.11 shows a typical “corrective
maintenance cycle”.
#2 Preventative maintenance to retain a system or
product at the required level of performance.
40
Again statistics can play a big role in quantifying these.
So we should not find it unusual to see terms such as
** mean (average)
** median (half above, half below)
** maximum
41
Rearranging the sequence of the “measures” or “metrics” might
make a bit more sense. The definitions, etc. are pretty
straightforward.
p58 #1 mean corrective maintenance time
p66 #3 median corrective maintenance time
p67 #5 mean active maintenance time
p67 #6 maximum active corrective maintenance time
p65 #2 mean preventative maintenance time
p65 #4 median active preventative maintenance time
Note also some “other” time periods that should be addressed.
p68 #7 logistics delay time
p68 #8 administrative delay time
p68 #9 maintenance downtime
p68 maintenance labor-hour factors
    per operating hour/cycle/month/action
p70 maintenance frequency factors
    mean time between maintenance/replacement
p72 maintenance costs
    per action/operating hour/month/mission/life-cycle
42
On page 60,
speaking of repair times, these normally follow one of three
distributions.
-- If for some reason another distribution is a better fit, then use
the better distribution.
Distribution                   Typical characteristics
#1 normal distribution         * simple repair
                               * remove & replace with fairly constant time
#2 exponential distribution    * excellent built-in test or otherwise easy to diagnose
                               * remove & replace
                               * maintenance rate is constant
#3 log-normal distribution     * most actions
                               * task times vary; and/or
                               * frequency of repair varies
43
For a log-normal curve, the logarithm of the value is
normally distributed.
Which means the shape is a bit different from the usual
“normal curve”.
See next slide
44
[Figure: sketches of the normal, exponential, and log-normal curves]
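To see the log-normal idea in numbers, here is a small simulation sketch using only the Python standard library. The parameters mu and sigma are hypothetical values I picked for illustration; they are not repair-time data from the book.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

mu, sigma = 3.5, 0.6  # hypothetical parameters of the underlying normal
repair_times = [random.lognormvariate(mu, sigma) for _ in range(10_000)]

# Take the logarithm of every repair time; these logs should look normal.
logs = [math.log(t) for t in repair_times]
mean_log = statistics.mean(logs)    # should land near mu
stdev_log = statistics.stdev(logs)  # should land near sigma

# The raw times are skewed to the right: the mean sits above the median.
mean_time = statistics.mean(repair_times)
median_time = statistics.median(repair_times)
print(mean_time > median_time)  # True
```

The long right-hand tail is what you would expect from "most" repair actions: many are done fairly quickly, and a few drag on.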
45
From pages 58 to 67 there are a number of examples.
You should be able to follow along pretty easily.
-=-=-
One comment on data collection.
On page 60 in table 2.2 I was confused at first.
This is supposed to be a sample of 50 observations of the time to
perform corrective maintenance (in minutes).
The 50 points are displayed as a 10 x 5 array. The array itself is
meaningless.
Better to collect and display as the 50 data points.
Even better in the sequence collected.
46
That would be something like so .....
Observation   Time in min.
 1            40
 2            71
 3            75
 4            67
.....
49            63
50            63
Six reasons to collect and present the information in this way.
#1 It makes more “sense”.
#2 It is easier to plot without resorting to midpoints, etc.
#3 It is easier to test for the distribution, compute the
mean-median-mode-variance-standard deviation-etc.
#4 It is easier to translate into “quality oriented metrics”
assuming we are in a “quality management environment”.
#5 Easier to spot a time-or-cycle related trend.
#6 In case of a question on a value (e.g., seems high, an outlier),
then it is much easier to track to the paperwork.
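As a rough illustration of reasons #3 and #6, here is a sketch using Python's statistics module. It uses ONLY the six observations shown above (the other 44 are elided in these notes), so the results are for practice and will not match the book's answers for the full sample.

```python
import statistics

# (observation number, minutes) kept in the sequence collected.
# Only the six rows shown in the notes; the middle 44 are not reproduced.
observations = [(1, 40), (2, 71), (3, 75), (4, 67), (49, 63), (50, 63)]
times = [minutes for _, minutes in observations]

mean_time = statistics.mean(times)
median_time = statistics.median(times)
stdev_time = statistics.stdev(times)
time_range = max(times) - min(times)

# Reason #6: a questionable value traces straight back to its observation number.
lowest = min(observations, key=lambda obs: obs[1])

print(median_time)  # 65.0
print(time_range)   # 35
print(lowest)       # (1, 40)
```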
47
I suspect that it would also be easier to spot some “apparent
contradictions, or at least questions”.
For example,
** on page 60 the normal distribution is described as usually
applying to straightforward maintenance with a (relatively)
standard time to “remove and replace”.
** However, a quick review shows a range in value of 67 minutes
(between 30 minutes and 97 minutes).
** That does not strike me as being “relatively standard time”.
** If we expected this repair to take whatever minutes, then we
might want to look into “Why the large variability in time to
repair?”
-=-=-
Fascinating stuff, no?
48
On page 72,
2.3 availability factors
#1 Can be used in the sense of READINESS (3 views)
** Probability will be available for use when needed.
** Probability will be available for use when needed AND
will complete the mission.
** Probability that all is satisfactory at any given point in a
mission.
#2 Always depends on the scenario.
49
As implied by the sequence R-M-A, the availability factors are
based on the metrics for
** reliability; and
** maintainability
Metrics for reliability & maintainability used to calculate availability
Mean time between failure          MTBF
Mean corrective maintenance time   M-bar ct
Mean time between maintenance      MTBM
Mean active maintenance time       M-bar
Maintenance down time              MDT
50
Here is a quick summary table
2.3.1 Inherent availability
    Support environment: ideal
    Probability: will operate as required at any point in time
    Excludes scheduled preventative maintenance, logistics delay time,
    admin delay time
2.3.2 Achieved availability
    Support environment: ideal
    Probability: will operate as required at any point in time
    Includes scheduled preventative maintenance
2.3.3 Operational availability
    Support environment: actual
    Probability: will operate as required when called upon (needed)
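The three availability measures are usually written as simple ratios of the metrics listed earlier (uptime divided by uptime plus downtime). The sketch below uses those usual textbook forms; check them against the equations in section 2.3. The function names and the input hours are hypothetical values of my own.

```python
def inherent_availability(mtbf, mct):
    """Ai = MTBF / (MTBF + M-bar ct): corrective maintenance only, ideal support."""
    return mtbf / (mtbf + mct)

def achieved_availability(mtbm, m_active):
    """Aa = MTBM / (MTBM + M-bar): adds scheduled preventative maintenance."""
    return mtbm / (mtbm + m_active)

def operational_availability(mtbm, mdt):
    """Ao = MTBM / (MTBM + MDT): actual environment, delays included in MDT."""
    return mtbm / (mtbm + mdt)

# Hypothetical inputs, all in hours.
ai = inherent_availability(mtbf=500, mct=2)
aa = achieved_availability(mtbm=250, m_active=3)
ao = operational_availability(mtbm=250, mdt=10)

print(round(ai, 4), round(aa, 4), round(ao, 4))  # 0.996 0.9881 0.9615
```

With these made-up numbers the availability drops as we move from the ideal environment toward the actual one, which is the pattern to expect: each step counts more of the downtime.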
51
On page 73.
Already back to 2.4 and the summary.
That is it for Chapter 2.