- The document discusses maximizing uptime in mission critical facilities through predictive analysis and reliability centered maintenance (RCM).
- Key factors that impact uptime are reliability, availability, maintainability, predictability, and scalability (RAMPS).
- To implement an effective RCM program, facilities must collect comprehensive operational data from design through commissioning and ongoing maintenance to predict and prevent future equipment failures.
- Integrating startup and commissioning data into ongoing maintenance is critical for understanding past performance issues and developing an accurate predictive model.
Predictive RCM
1. ZDT Group LLC
5460 Sandstone Ct
Cumming, Ga. 30040
P – 770-886-9555
www.zdtgroup.com
By Joe Soroka Copyright 2009
Maximizing Uptime through Predictive Analysis
Operating Mission Critical Facilities in today’s environment means continuing to strive to
meet and exceed our uptime requirements, as our operating budgets continue to shrink.
Taking a holistic approach is needed to track our facility information to accomplish this.
As I have discussed in previous whitepapers and other presentations, uptime is achieved
through a process that I call “RAMPS”. The tier rating of a facility is only the first thing
that affects uptime. You may have heard stories of people who designed and built
Tier IV facilities and had outage after outage, while a Tier I facility has been
operating for over ten years without a single outage. That’s because more than
just the design of the facility affects uptime, and that is “RAMPS”.
Reliability, Availability, Maintainability, Predictability, and Scalability are the keys to
success. Without paying attention to all facets of the facility, your uptime requirements
cannot be realized.
Reliability Centered Maintenance (RCM) can help in meeting the demands of a
shrinking operations budget. An RCM program that is not properly implemented is
really an “LMP”, a Lack of Maintenance Program. To implement an RCM program
correctly, you need to answer the seven questions in SAE JA1011, Evaluation Criteria for
Reliability-Centered Maintenance (RCM) Processes. These questions are as follows:
1. What is the equipment supposed to do and what is the performance standard?
2. In what ways can it fail to meet the performance standard?
3. What are the events that will lead to failure?
4. What happens when the piece of equipment fails?
5. In what way does each of the failure modes matter to system operations?
6. What systematic task can be performed to prevent the failure?
7. What must be done if a suitable preventative task cannot be implemented?
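One way to keep the seven answers together for each piece of equipment is a simple record, sketched below. This is only an illustration: the field names, the `RcmAnalysis` class, and the sample tower fan answers are assumptions made for this example and are not taken from SAE JA1011.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RcmAnalysis:
    """Answers to the seven RCM questions for one piece of equipment.

    Field names are illustrative only; they are not defined by SAE JA1011.
    """
    equipment: str
    function_and_standard: str = ""   # Question 1
    functional_failures: str = ""     # Question 2
    failure_causes: str = ""          # Question 3
    failure_effects: str = ""         # Question 4
    failure_consequences: str = ""    # Question 5
    preventive_task: str = ""         # Question 6
    default_action: str = ""          # Question 7

    def unanswered(self) -> List[str]:
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self)
                if f.name != "equipment" and not getattr(self, f.name).strip()]


# Hypothetical, partially completed analysis for a cooling tower fan.
tower_fan = RcmAnalysis(
    equipment="Cooling tower fan 1",
    function_and_standard="Reject condenser heat at design load",
    functional_failures="Fan does not start",
)
print(tower_fan.unanswered())
```

A record like this makes it obvious when an analysis has been started but not finished, which is the difference between an RCM program and an “LMP”.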
Prior to implementing an RCM program, one must have a comprehensive Predictive
Analysis program in effect. By understanding the past performance and current operating
conditions of the equipment, we can develop predictions on failure modes for the future.
Tracking more items over a longer period will result in greater accuracy in our
predictions. This is why it is important to start at the beginning of the project with
design, moving to construction, commissioning and then ongoing operation and
maintenance. The information that is gained in each of these phases needs to be
captured in a manner that allows you to look at specific details across the entire life of the
equipment.
Data gathered without purpose is just that: data. It is important to define how you need
to use this data and how it can assist your operation, from increasing reliability to
reducing operating costs. In the design phase, the requirements the facility must meet
are set out in the Basis of Design (BoD) and the Sequence of Operation (SoO). These two
documents need to be fully developed in the design phase and updated during the life of
the facility, for they act as the road map for how the facility shall operate. They
establish a baseline for the performance of your equipment, and when the site is started
up and commissioned, it is this baseline that the facility needs to meet. During the
startup and commissioning phases, the BoD and SoO are updated to capture changes that
occurred during construction and startup. They need to be living documents, reviewed and
updated as often as required.
The information gathered during the commissioning process is rarely integrated properly
into operations, yet the data discovered during startup and
commissioning is invaluable. Not having this data would be similar to taking all the
photos of your children prior to their high school graduation and locking them up
somewhere. Then years later, when you are sitting down with your son or daughter’s
future spouse, trying to show them your child’s life story, you have to begin with high
school pictures due to lack of data. All of the data collected during birth, startup, and
commissioning is part of the equipment life story. The whole story needs to be looked at
and shared between the parents and the spouse. As the spouse adds new photos to the
album, the story of the person’s life is evolving and you are able to look through the
album and see the entire story.
Predictability is one part of the Uptime RAMPS that often gets ignored. We may have
some trends we look at, and some testing that allows us to do some forecasting, but a
fully predictive program is often lacking. The data gathered and collected
during construction and commissioning should be fully integrated into the operation and
maintenance program. This data becomes invaluable to you as you develop a
predictability program. Imagine your data center is fully loaded with servers and you are
operating at the design load of 10 kW per cabinet. The CFD modeling you did for your
heat load is proving correct and everything is operating as designed. You decide to
leave early for the fishing trip up in the mountains that you had planned for some time
now. As you are enjoying your well-deserved trout fishing you realize how cool it is and
you think “wow it must be hot back in town”. You pull out your cellphone to check the
weather in town and you see there is no cell coverage. You don’t worry, you’re not on call
and you have a good staff at the site, so you go back to fishing. The fish are biting, there
is great weather, and you decide to stay late on Sunday.
When you come home Sunday you are shocked by the number of voicemails and emails
you missed. It’s the one from your team that really hits you hard in the stomach: “Boss,
we’re not sure where you are and we hope the fishing is good, but we just lost the site, the
load dropped, all servers are down.” You immediately turn your car toward the site and start
calling everyone. When you arrive you spend the next couple of hours explaining where
you were and trying to figure out what happened.
The next day you review the incident reports and start to put together a detailed
explanation of why your chiller plant failed, so you can complete a failure analysis report.
When the failure occurred, both your primary chiller and redundant chiller failed due to
high head pressure caused by high condenser water temperature. The cooling tower fans
were off and would not start. The tower fan VFDs were working, and your team tried to run
them in hand (manual) mode, but the tower fans still would not operate. It took your team
some time to figure out the problem, and in the time it took to correct the issue the data
center floor overheated and servers started to fail. The cause was the vibration switches
mounted on the fans, which had tripped. Three of the four tower fans were locked out and
the fourth fan could not support the load. Once the problem was identified, the vibration
switches were jumpered out, the fans started, and the chillers were
brought back online, but unfortunately not in time to keep the site up.
You remember that you were the only one on the job when the startup and
commissioning occurred, the rest of the team was hired after the site was turned over.
You remember something about those vibration limit switches during startup 3 years ago.
You look in the startup documents and there is no mention of any issue so you make
some calls and track down the person who did the startup. You reach this person and ask
him if he remembers what the issue with the vibration switch was, and as luck would
have it he remembers that project. He tells you, “Yes, we had a lot of issues with those
switches during the startup; we replaced one and the others were adjusted.” The startup
documents make no mention of the part that was replaced. Had it been recorded in
your system, you would have known you had a failed device and you would have looked
into the issues during startup. Whether a part fails in the factory, during startup, or
during the life of the equipment, tracking those failures and analyzing the nature of each
failure will allow you to predict future issues. During the PM visits since the site was
turned over to Operations, the vibration switches were never part of the maintenance
procedures, so they were not tested or adjusted. This led to the switches tripping before
any significant vibration occurred. The Operations team was never trained on the vibration
switches and did not know they were installed or what their function was. Equipment
accessories are often missed in training and preventative maintenance.
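The kind of record that would have surfaced the startup replacement can be sketched as a small lifecycle event log. This is a minimal illustration, not a description of any particular CMMS: the `LifecycleEvent` and `EquipmentHistory` names, the phase list, and the action strings are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Assumed lifecycle phases, following the factory/startup/operation split in the text.
PHASES = ("factory", "startup", "commissioning", "operations")


@dataclass(frozen=True)
class LifecycleEvent:
    component: str
    phase: str
    action: str      # e.g. "replaced", "adjusted", "tested"
    note: str = ""


class EquipmentHistory:
    """Single log of events for a component across every phase of its life."""

    def __init__(self) -> None:
        self._events: List[LifecycleEvent] = []

    def record(self, event: LifecycleEvent) -> None:
        if event.phase not in PHASES:
            raise ValueError(f"unknown phase: {event.phase}")
        self._events.append(event)

    def for_component(self, component: str) -> List[LifecycleEvent]:
        return [e for e in self._events if e.component == component]

    def failures_before_turnover(self, component: str) -> List[LifecycleEvent]:
        """Replacements logged before the site was handed to Operations."""
        return [e for e in self.for_component(component)
                if e.phase != "operations" and e.action == "replaced"]


history = EquipmentHistory()
history.record(LifecycleEvent("tower fan vibration switch", "startup", "replaced"))
history.record(LifecycleEvent("tower fan vibration switch", "startup", "adjusted"))

early = history.failures_before_turnover("tower fan vibration switch")
print(len(early))
```

With the startup events in the same log as the PM records, a query on the component shows a replacement before turnover, which is a prompt to add the switches to the maintenance procedures and the training plan.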
A week later you are at home. You get that dreaded phone call and rush back to site.
This time it’s the UPS system Module A. It has failed and dropped offline, however the
site remains protected by the other modules. You call out your service rep and they find
that the AC filter capacitors failed. Again you sit down in your office the next morning to
fill out another failure analysis report, and in the process of reviewing past maintenance
records you notice that the AC filter current, which is measured and recorded at each visit,
was about the same every time until the last PM, when it was 35% higher than the previous
readings. Three months ago those AC filter capacitors were telling you it was time to
replace them, but the data was tracked in a manner that couldn’t be analyzed.
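As a rough sketch of what tracking the data in an analyzable manner could look like, the example below stores each PM reading as a number in a small SQLite table and compares the latest value with the average of the earlier ones. The table layout, the point name, and the ampere values are hypothetical; the values are chosen so that the last reading sits about 35% above the earlier average, as in the story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pm_reading (visit INTEGER, point TEXT, value REAL)")

# Hypothetical AC filter current readings in amperes, one per PM visit.
readings = [(1, 20.0), (2, 20.2), (3, 19.8), (4, 20.0), (5, 27.0)]
conn.executemany(
    "INSERT INTO pm_reading VALUES (?, 'ac_filter_current', ?)", readings
)


def percent_above_baseline(conn, point):
    """Compare the latest reading with the mean of all earlier readings.

    Returns None when there are not at least two readings to compare.
    """
    row = conn.execute(
        "SELECT visit, value FROM pm_reading WHERE point = ? "
        "ORDER BY visit DESC LIMIT 1",
        (point,),
    ).fetchone()
    if row is None:
        return None
    latest_visit, latest = row
    (baseline,) = conn.execute(
        "SELECT AVG(value) FROM pm_reading WHERE point = ? AND visit < ?",
        (point, latest_visit),
    ).fetchone()
    if not baseline:
        return None
    return (latest - baseline) / baseline * 100.0


change = percent_above_baseline(conn, "ac_filter_current")
print(f"{change:.1f}% above baseline")
```

A reading kept on a paper form or as a checkbox cannot be queried this way; once it is a number in a table, the jump is visible on the day it is recorded rather than three months later.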
That evening at home you start thinking about how much information you were able to
gather from the past documentation that you used to complete your failure analysis
report. Why not use the data to predict what might fail rather than to explain why it did?
After all, a failure analysis report is just a predictive analysis report written after the
fact. So you start to identify your potential issues and figure out a way to capture and
analyze the data.
When performing startup and ongoing maintenance, the test scripts need to be written
at a switch-level degree of detail. The results should be stored in a database so you
can analyze the data and set thresholds. For example, do not write a test script that states
“check the oil pressure” with just a checkbox next to it. A checkbox is not
sufficient data. You might have checked it, but if the oil pressure was 2 PSI, I’m sure that
your engine will not like that for very long. Your test script should instead state “Record
the oil pressure” and list a tolerance. You inspect the oil pressure, it’s 35
PSI, and you record it. Your system should have minimum and maximum limits, and if the
reading is outside the acceptable range you should get a failed condition and an issue log
should be automatically updated. You should also have a percent-change check: the system
averages all of the past readings for the oil pressure and raises a failed condition if the
new reading is more than x% away from that average. A third check is rate of change: a
failed condition occurs if the reading has changed by more than x% from the last reading.
With these three types of automation added to your logging tool, you will start to capture
and analyze items automatically, which will allow you to predict failure rates.
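The three checks can be sketched in a few lines. This is an illustration under stated assumptions rather than a finished tool: the function name `evaluate_reading`, the failure labels, the 25 to 60 PSI band, and the 15% and 10% thresholds are all hypothetical values picked for the example; only the 35 PSI and 2 PSI readings come from the text.

```python
from statistics import mean
from typing import List, Sequence


def evaluate_reading(
    value: float,
    history: Sequence[float],
    low: float,
    high: float,
    max_avg_dev_pct: float,
    max_step_pct: float,
) -> List[str]:
    """Return the names of the checks a new reading fails (empty if it passes)."""
    failures = []
    # 1. Min/max tolerance band.
    if not low <= value <= high:
        failures.append("out_of_tolerance")
    if history:
        # 2. Percent deviation from the average of all past readings.
        avg = mean(history)
        if avg and abs(value - avg) / abs(avg) * 100 > max_avg_dev_pct:
            failures.append("deviation_from_average")
        # 3. Percent change from the most recent reading.
        last = history[-1]
        if last and abs(value - last) / abs(last) * 100 > max_step_pct:
            failures.append("rate_of_change")
    return failures


issue_log = []
past = [35.0, 36.0, 34.0, 35.0]   # hypothetical earlier oil pressure readings, PSI
for reading in (35.0, 2.0):
    failed = evaluate_reading(reading, past, low=25.0, high=60.0,
                              max_avg_dev_pct=15.0, max_step_pct=10.0)
    if failed:
        # The issue log is updated automatically whenever any check fails.
        issue_log.append({"point": "oil_pressure_psi",
                          "value": reading, "failed": failed})
    past.append(reading)

print(issue_log)
```

Here the 35 PSI reading passes all three checks, while the 2 PSI reading fails all of them and lands in the issue log without anyone having to notice it on a checklist.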
With a database as the backend for your commissioning and operations tool, you can
intelligently analyze critical points over a period of time. A set-point can drift over time
and may not be noticed because of the small changes from week to week, but with the
proper automation tools these points can be analyzed and alarms can be generated
automatically. You could have your building automation system run a script every
morning for the critical points that you are trending, and then write the result to an XML
file or a SOAP message that your automation tool could read as input for a daily PM task.
Now, not only is your team performing inspections and maintenance at the site, but
your Building Automation System is talking directly with your operations automation
tool (or CMMS), and you are gaining valuable data that is analyzed automatically,
alerting you to potential issues.
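A minimal sketch of the receiving side of that exchange is shown below, assuming the building automation system writes a simple XML file each morning. The element and attribute names, the point names, the commissioned baseline values, and the 0.5 degree drift limit are all hypothetical; a real BAS export will have its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format; a real BAS will define its own schema.
SAMPLE_EXPORT = """
<daily_export>
  <point name="chw_supply_setpoint_f" value="44.9"/>
  <point name="condenser_water_temp_f" value="85.0"/>
</daily_export>
"""


def parse_export(xml_text):
    """Return {point name: float value} from one daily export."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("name"): float(p.get("value")) for p in root.iter("point")}


def drifted(reference, current, max_drift):
    """True when a point has moved more than max_drift from its reference value."""
    return abs(current - reference) > max_drift


# Assumed commissioned values, e.g. taken from the BoD and SoO baseline.
baseline = {"chw_supply_setpoint_f": 44.0, "condenser_water_temp_f": 85.0}

today = parse_export(SAMPLE_EXPORT)
alarms = [name for name, value in today.items()
          if drifted(baseline[name], value, max_drift=0.5)]
print(alarms)
```

Comparing each point with its commissioned baseline, rather than with last week's value, is what catches slow drift: each weekly step may be too small to notice, but the distance from the baseline keeps growing.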
Joe Soroka has been working in the mission-critical field for the past 26 years. He has
worked with many clients over the years with conceptual designs, commissioning,
operation and maintenance and failure analysis. Joe has commissioned over 8 million
square feet of mission-critical facilities. He has worked on developing and reviewing
Operations & Maintenance, Training and Safety programs, including Method Operating
Procedures for mission-critical facilities. If you have any questions you may contact Joe
Soroka at joe@zdtgroup.com or visit www.zdtgroup.com