ZDT Group LLC
5460 Sandstone Ct
Cumming, Ga. 30040
P – 770-886-9555
www.zdtgroup.com
By Joe Soroka Copyright 2009
Maximizing Uptime through Predictive Analysis
Operating mission-critical facilities in today’s environment means striving to meet and
exceed our uptime requirements even as our operating budgets continue to shrink.
Accomplishing this requires a holistic approach to tracking our facility information.
As I have discussed in previous whitepapers and other presentations, uptime is achieved
through a process that I call “RAMPS”. The tier rating of a facility is only the first thing
that affects uptime. You may have heard stories of Tier IV facilities that have had outage
after outage, and of a Tier I facility that has operated for over ten years without a single
outage. That’s because more than the design of the facility affects its uptime, and that
“more” is RAMPS: Reliability, Availability, Maintainability, Predictability, and Scalability
are the keys to success. Without paying attention to all facets of the facility, your uptime
requirements cannot be realized.
Reliability-Centered Maintenance (RCM) can help in meeting the demands of a shrinking
operations budget. If you do not properly implement an RCM program, what you have is
an “LMP”, a Lack of Maintenance Program, not RCM. To correctly implement an RCM
program you need to answer the 7 questions in SAE JA1011, Evaluation Criteria for
Reliability-Centered Maintenance (RCM) Processes. These questions are as follows:
1. What is the equipment supposed to do and what is the performance standard?
2. In what ways can it fail to meet the performance standard?
3. What are the events that will lead to failure?
4. What happens when the piece of equipment fails?
5. In what way does each of the failure modes matter to system operations?
6. What systematic task can be performed to prevent the failure?
7. What must be done if a suitable preventative task cannot be implemented?
Prior to implementing an RCM program, one must have a comprehensive Predictive
Analysis program in effect. By understanding the past performance and current operating
conditions of the equipment, we can develop predictions on failure modes for the future.
Tracking more items over a longer duration will result in greater accuracy in our
predictions. This is why it is important to start at the beginning of the project with
design, moving to construction, commissioning and then ongoing operation and
maintenance. The information that is gained in each of these phases needs to be
captured in a manner that allows you to look at specific details across the entire life of the
equipment.
Data gathered without purpose is just that: data. It is important to define how you need
to use this data and how it can assist your operation, from increasing reliability to
reducing operating costs. In the design phase, the requirements the facility must meet are
laid out in the Basis of Design (BoD) and the Sequence of Operation (SoO). These two
documents need to be fully developed in the design phase and updated during the life of
the facility, for they act as the road map for how the facility shall operate. They establish
a baseline for the performance of your equipment, and when the site is started up and
commissioned it is this baseline that the facility needs to meet. During the startup and
commissioning phases these baseline documents are updated so that any changes made
during construction and startup are captured. The BoD and SoO need to be living
documents and should be reviewed and updated as often as required.
The information gathered during the commissioning process is rarely integrated properly
into operations. Having the data that was discovered during startup and
commissioning is invaluable. Not having this data would be similar to taking all the
photos of your children prior to their high school graduation and locking them up
somewhere. Then years later, when you are sitting down with your son or daughter’s
future spouse, trying to show them your child’s life story, you have to begin with high
school pictures due to lack of data. All of the data collected during birth, startup, and
commissioning is part of the equipment life story. The whole story needs to be looked at
and shared between the parents and the spouse. As the spouse adds new photos to the
album, the story of the person’s life is evolving and you are able to look through the
album and see the entire story.
Predictability is the part of the Uptime RAMPS that often gets ignored. We may have
some trends we look at, and some testing that allows us to do some forecasting, but many
times a fully predictive program is lacking. The data gathered during construction and
commissioning should be fully integrated into the operation and
maintenance program. This data becomes invaluable to you as you develop a
predictability program. Imagine your data center is fully loaded with servers and you are
operating at the design load of 10 kW per cabinet. The CFD modeling you did with your
heat load is proving to be correct and everything is operating as designed. You decide to
leave early for the fishing trip up in the mountains that you have had planned for some
time. As you are enjoying your well-deserved trout fishing you realize how cool it is and
you think, “Wow, it must be hot back in town.” You pull out your cellphone to check the
weather in town and you see there is no cell coverage. You don’t worry; you’re not on call
and you have a good staff at the site, so you go back to fishing. The fish are biting, the
weather is great, and you decide to stay late on Sunday.
When you come home Sunday you are shocked by the number of voicemails and emails
you missed. It’s the one from your team that really hits you hard in the stomach: “Boss,
we’re not sure where you are and we hope the fishing is good, but we just lost the site.
The load dropped; all servers are down.” You immediately turn your car toward the site
and start calling everyone. When you arrive you spend the next couple of hours explaining
where you were and trying to figure out what happened.
The next day you review the incident reports and start to put together a detailed
explanation of why your chiller plant failed, so you can complete a failure analysis report.
When the failure occurred, both your primary chiller and redundant chiller tripped on
high head pressure caused by high condenser water temperature. The cooling tower fans
were off and would not start. The tower fan VFDs were working, and your team tried
putting them in hand (manual) mode, but still the tower fans would not operate. The
cause was the vibration switches mounted on the fans, which had tripped. Three of the
four tower fans were down and the fourth fan could not support the load. It took your
team some time to identify the problem; once it was discovered, the vibration switches
were jumpered out, the fans started, and the chillers were brought back online. By then
the data center floor had overheated and servers had started to fail, so it was not in time
to keep the site up.
You remember that you were the only one on the job when the startup and
commissioning occurred; the rest of the team was hired after the site was turned over.
You remember something about those vibration limit switches during startup 3 years ago.
You look in the startup documents and there is no mention of any issue, so you make
some calls and track down the person who did the startup. You reach this person and ask
him if he remembers what the issue with the vibration switches was, and as luck would
have it he remembers that project. He tells you, “Yes, we had a lot of issues with those
switches during the startup; we replaced one and the others were adjusted.” The startup
documents make no mention of the parts that were replaced. If the replacement had been
noted and listed in your system, you would have known you had a failed device and you
would have looked into the issues from startup. No matter whether a part fails in the
factory, during startup, or during the life of the equipment, tracking those failures and
analyzing the nature of each failure will allow you to predict future issues. During the PM
visits since the site was turned over to Operations, the vibration switches were never part
of the maintenance procedures, so they were not tested or adjusted. This led to the
switches tripping prior to any significant vibration. The Operations team was never
trained on the vibration switches and did not know they were installed or what their
function was. Many times equipment accessories are missed in training and preventative
maintenance.
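To make the point about tracking failures across the whole life of the equipment concrete, here is a minimal sketch of the kind of cross-check a tracking system could run. Everything in it is an assumption for illustration: the record layout, the phase names, the component names, and the function name are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleEvent:
    """One recorded failure or adjustment (hypothetical record layout)."""
    component: str
    phase: str        # e.g. "factory", "startup", "commissioning", "operation"
    description: str


def unmaintained_components_with_history(events, pm_components):
    """Return components that have a recorded failure or adjustment
    but do not appear in any preventative maintenance procedure."""
    with_history = {event.component for event in events}
    return sorted(with_history - set(pm_components))


# Illustrative data based on the cooling tower story above.
events = [
    LifecycleEvent(
        component="cooling tower fan vibration switch",
        phase="startup",
        description="one switch replaced, others adjusted",
    ),
]
pm_components = {"cooling tower fan motor", "chiller", "tower fan VFD"}

for name in unmaintained_components_with_history(events, pm_components):
    print(f"Has failure history but no PM task: {name}")
```

Had the startup replacement been logged this way, a check like this would have pointed at the vibration switches long before they tripped.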
A week later you are at home. You get that dreaded phone call and rush back to site.
This time it’s UPS system Module A. It has failed and dropped offline; however, the site
remains protected by the other modules. You call out your service rep and they find that
the AC filter capacitors failed. Again you sit down in your office the next morning to fill
out another failure analysis report, and in the process of reviewing past maintenance
records you notice that the AC filter current, which is measured and recorded each visit,
was about the same every time until the last PM, when it was 35% higher than the
previous readings. Three months ago those AC filter capacitors were telling you it was
time to replace them, but the data was tracked in a manner that couldn’t be analyzed.
That evening at home you start thinking about how much information you were able to
gather from the past documentation that you used to complete your failure analysis
report. Why not use the data to predict what might fail rather than to explain why it did?
After all, a failure analysis report is just like a predictive analysis report; it’s just after the
fact. So you start to identify your potential issues and figure out a way to capture and
analyze the data.
When performing startup and ongoing maintenance, the test scripts need to be written
at a switch level of detail. The system that you use should be backed by a database so you
can analyze the data and set thresholds. For example, do not write a test script that states
“check the oil pressure” and just has a checkbox next to it. A checkbox is not sufficient
data. You might have checked it, but if the oil pressure was 2 PSI, I’m sure that
your engine will not like that for very long. Your test script should have stated “Record
the oil pressure” and it should list a tolerance. You inspect the oil pressure, it’s 35 PSI,
and you record it. Your system should have a minimum and maximum tolerance, and if
the reading is outside the acceptable range, you should get a failed condition and an issue
log should be automatically updated. You should also have a percent-change failed
condition: the system averages all of the past readings for the oil pressure and reports a
failed condition if the new reading is more than x% away from that average. A third
approach is rate of change: a failed condition occurs if the reading has changed by more
than x% from the last reading. With these three types of automation added to your
logging tool, you will start to capture and analyze items automatically, which will allow
you to predict failures.
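The three checks described above (min/max tolerance, percent deviation from the average of past readings, and rate of change from the last reading) can be sketched in a few lines. This is only an illustration under stated assumptions: the function and field names are made up, and the tolerance values and reading histories are hypothetical numbers chosen to echo the oil pressure and AC filter current examples, not recommended settings.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Limits:
    """Thresholds for one recorded point (all values are hypothetical)."""
    minimum: float
    maximum: float
    max_pct_from_average: float   # the "x%" for the percent-change check
    max_pct_from_last: float      # the "x%" for the rate-of-change check


def evaluate_reading(value, history, limits):
    """Return a list of failed conditions for a new reading.

    `history` holds the earlier readings for the same point, oldest first.
    An empty list means the reading passed all three checks.
    """
    issues = []

    # 1. Min/max tolerance.
    if not (limits.minimum <= value <= limits.maximum):
        issues.append("outside min/max tolerance")

    if history:
        # 2. Percent deviation from the average of all past readings.
        average = mean(history)
        if average != 0:
            pct_from_average = abs(value - average) / abs(average) * 100
            if pct_from_average > limits.max_pct_from_average:
                issues.append("deviates from historical average")

        # 3. Rate of change relative to the most recent reading.
        last = history[-1]
        if last != 0:
            pct_from_last = abs(value - last) / abs(last) * 100
            if pct_from_last > limits.max_pct_from_last:
                issues.append("changed too much since last reading")

    return issues


# Oil pressure in PSI, with assumed limits.
oil_limits = Limits(minimum=25, maximum=65,
                    max_pct_from_average=15, max_pct_from_last=10)
print(evaluate_reading(35, [36, 35, 34, 35], oil_limits))  # passes
print(evaluate_reading(2, [36, 35, 34, 35], oil_limits))   # fails all three

# AC filter current in amps: steady readings, then one 35% higher.
filter_limits = Limits(minimum=0, maximum=40,
                       max_pct_from_average=15, max_pct_from_last=10)
print(evaluate_reading(27, [20, 20, 20], filter_limits))
```

Note that the last reading is still inside its min/max tolerance; only the two trend-based checks catch it, which is why a checkbox or a simple pass/fail range is not enough. In a real tool, each non-empty result would open an entry in the issue log.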
With a database as the backend for your commissioning and operations tool, you can
intelligently analyze critical points over a period of time. A set-point can drift over time
and may not be noticed because of the small changes from week to week, but with the
proper automation tools these points can be analyzed and alarms can be generated
automatically. You could have your building automation system run a script every
morning for the critical points that you are trending, and then write the result to an XML
file or a SOAP message that your automation tool could read as input for a daily PM task.
Now, not only is your team performing inspection and maintenance at the site, but your
Building Automation System is talking directly with your operations automation tool (or
CMMS), and you are gaining valuable data that is being analyzed automatically and
alerting you to potential issues.
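As a rough sketch of that daily hand-off, the snippet below reads an XML export of trended points and lists the ones outside their limits. The XML layout, the point names, the values, and the limits are all assumptions made up for this example; a real building automation system will have its own export format, and the parsing would need to be adapted to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical morning export from the building automation system.
SAMPLE_EXPORT = """<trend_export>
  <point name="CHW_supply_temp_F" value="44.8" low="42.0" high="46.0"/>
  <point name="CW_supply_temp_F" value="91.5" low="70.0" high="85.0"/>
  <point name="UPS_A_filter_current_A" value="20.1" low="0.0" high="40.0"/>
</trend_export>"""


def out_of_range_points(xml_text):
    """Return (name, value) for every point outside its low/high limits."""
    root = ET.fromstring(xml_text)
    alerts = []
    for point in root.iter("point"):
        value = float(point.get("value"))
        low = float(point.get("low"))
        high = float(point.get("high"))
        if not (low <= value <= high):
            alerts.append((point.get("name"), value))
    return alerts


for name, value in out_of_range_points(SAMPLE_EXPORT):
    print(f"Daily PM input: {name} reads {value}, outside limits")
```

The same readings could also be stored in the database each morning so that the average and rate-of-change checks have a history to work from.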
Joe Soroka has been working in the mission-critical field for the past 26 years. He has
worked with many clients over the years with conceptual designs, commissioning,
operation and maintenance and failure analysis. Joe has commissioned over 8 million
square feet of mission-critical facilities. He has worked on developing and reviewing
Operations & Maintenance, Training and Safety programs, including Method Operating
Procedures for mission-critical facilities. If you have any questions you may contact Joe
Soroka at joe@zdtgroup.com or visit www.zdtgroup.com

More Related Content

Viewers also liked

LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.
LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.
LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.jjpj61
 
Tema 10 herramientas presentación
Tema 10 herramientas presentaciónTema 10 herramientas presentación
Tema 10 herramientas presentaciónmanolete revivalero
 
Mahaprasthanam
MahaprasthanamMahaprasthanam
Mahaprasthanamjaganchary
 
PJ Part 5_Team#9
PJ Part 5_Team#9PJ Part 5_Team#9
PJ Part 5_Team#9Wang Dylan
 
LA GUERRA CIVIL ESPAÑOLA.
LA GUERRA CIVIL ESPAÑOLA.LA GUERRA CIVIL ESPAÑOLA.
LA GUERRA CIVIL ESPAÑOLA.jjpj61
 
EL REINADO DE ALFONSO XII (1874 1885).
EL REINADO DE ALFONSO XII (1874 1885).EL REINADO DE ALFONSO XII (1874 1885).
EL REINADO DE ALFONSO XII (1874 1885).jjpj61
 
Geografia guía de estudio 1er parcial
Geografia guía de estudio 1er parcialGeografia guía de estudio 1er parcial
Geografia guía de estudio 1er parcialDelfina Moroyoqui
 
SAP BusinessObjects Web Intelligence Report
SAP BusinessObjects Web Intelligence ReportSAP BusinessObjects Web Intelligence Report
SAP BusinessObjects Web Intelligence ReportBigClasses Com
 
2100. 3 класс. Урок 2.23 Решение задач
2100. 3 класс. Урок 2.23 Решение задач2100. 3 класс. Урок 2.23 Решение задач
2100. 3 класс. Урок 2.23 Решение задачavtatuzova
 

Viewers also liked (11)

LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.
LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.
LA CIENCIA Y LA SOCIEDAD DEL SIGLO XXI.
 
Tema 10 herramientas presentación
Tema 10 herramientas presentaciónTema 10 herramientas presentación
Tema 10 herramientas presentación
 
Mahaprasthanam
MahaprasthanamMahaprasthanam
Mahaprasthanam
 
PJ Part 5_Team#9
PJ Part 5_Team#9PJ Part 5_Team#9
PJ Part 5_Team#9
 
LA GUERRA CIVIL ESPAÑOLA.
LA GUERRA CIVIL ESPAÑOLA.LA GUERRA CIVIL ESPAÑOLA.
LA GUERRA CIVIL ESPAÑOLA.
 
SAP MRP: Introduction to "Phantom Assembly"
SAP MRP: Introduction to "Phantom Assembly"SAP MRP: Introduction to "Phantom Assembly"
SAP MRP: Introduction to "Phantom Assembly"
 
EL REINADO DE ALFONSO XII (1874 1885).
EL REINADO DE ALFONSO XII (1874 1885).EL REINADO DE ALFONSO XII (1874 1885).
EL REINADO DE ALFONSO XII (1874 1885).
 
Geografia guía de estudio 1er parcial
Geografia guía de estudio 1er parcialGeografia guía de estudio 1er parcial
Geografia guía de estudio 1er parcial
 
SAP BusinessObjects Web Intelligence Report
SAP BusinessObjects Web Intelligence ReportSAP BusinessObjects Web Intelligence Report
SAP BusinessObjects Web Intelligence Report
 
2100. 3 класс. Урок 2.23 Решение задач
2100. 3 класс. Урок 2.23 Решение задач2100. 3 класс. Урок 2.23 Решение задач
2100. 3 класс. Урок 2.23 Решение задач
 
Tarea iii de seminario
Tarea iii de seminarioTarea iii de seminario
Tarea iii de seminario
 

Similar to Maximizing Uptime Through Predictive Analysis and Data Integration

Reliability predictions essay FMS Reliability
Reliability predictions  essay FMS ReliabilityReliability predictions  essay FMS Reliability
Reliability predictions essay FMS ReliabilityAccendo Reliability
 
5 Single Shift CI Projects (1)
5 Single Shift CI Projects (1)5 Single Shift CI Projects (1)
5 Single Shift CI Projects (1)Jaime Alboim
 
Performance Testing Web 2.0 Applications—in an Agile World
Performance Testing Web 2.0 Applications—in an Agile WorldPerformance Testing Web 2.0 Applications—in an Agile World
Performance Testing Web 2.0 Applications—in an Agile WorldTechWell
 
DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...
DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...
DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...Gene Kim
 
Some Rules for Successful Data Center Operations
Some Rules for Successful Data Center OperationsSome Rules for Successful Data Center Operations
Some Rules for Successful Data Center OperationsThomas Goulding
 
If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...
If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...
If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...SolarWinds
 
Value add: Single User Performance Testing (http://managingperformancetesting...
Value add: Single User Performance Testing (http://managingperformancetesting...Value add: Single User Performance Testing (http://managingperformancetesting...
Value add: Single User Performance Testing (http://managingperformancetesting...akbollinger
 
Doing agile with an ISO-20000 Telco (AgilePT 2015)
Doing agile with an ISO-20000 Telco (AgilePT 2015)Doing agile with an ISO-20000 Telco (AgilePT 2015)
Doing agile with an ISO-20000 Telco (AgilePT 2015)Manuel Padilha
 
When to Do a Reliability Prediction
When to Do a Reliability PredictionWhen to Do a Reliability Prediction
When to Do a Reliability PredictionAccendo Reliability
 
A Day In the Life Of a Proactive Maintenance PdM Tech
A Day In the Life Of a Proactive Maintenance PdM TechA Day In the Life Of a Proactive Maintenance PdM Tech
A Day In the Life Of a Proactive Maintenance PdM TechRicky Smith CMRP, CMRT
 
Tech Talk: The New CA Application Performance Management Team Center—Faster T...
Tech Talk: The New CA Application Performance Management Team Center—Faster T...Tech Talk: The New CA Application Performance Management Team Center—Faster T...
Tech Talk: The New CA Application Performance Management Team Center—Faster T...CA Technologies
 
Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...
Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...
Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...Brie Hoblin
 
DR Planning and Testing
DR Planning and TestingDR Planning and Testing
DR Planning and TestingJason Dea
 
Some of the shocking facts concerning underground utility strikes
Some of the shocking facts concerning underground utility strikesSome of the shocking facts concerning underground utility strikes
Some of the shocking facts concerning underground utility strikessygmasolutions
 
Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...
Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...
Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...Nagios
 
Top 5 DB2 Support Nightmares 2018 #2
Top 5 DB2 Support Nightmares 2018 #2Top 5 DB2 Support Nightmares 2018 #2
Top 5 DB2 Support Nightmares 2018 #2Carol Davis-Mann
 

Similar to Maximizing Uptime Through Predictive Analysis and Data Integration (20)

Reliability predictions essay FMS Reliability
Reliability predictions  essay FMS ReliabilityReliability predictions  essay FMS Reliability
Reliability predictions essay FMS Reliability
 
5 Single Shift CI Projects (1)
5 Single Shift CI Projects (1)5 Single Shift CI Projects (1)
5 Single Shift CI Projects (1)
 
Performance Testing Web 2.0 Applications—in an Agile World
Performance Testing Web 2.0 Applications—in an Agile WorldPerformance Testing Web 2.0 Applications—in an Agile World
Performance Testing Web 2.0 Applications—in an Agile World
 
DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...
DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...
DOES14 - Dominica Degrandis - How we used Kanban in Operations to Get Things ...
 
Some Rules for Successful Data Center Operations
Some Rules for Successful Data Center OperationsSome Rules for Successful Data Center Operations
Some Rules for Successful Data Center Operations
 
salamanca_carlos_report
salamanca_carlos_reportsalamanca_carlos_report
salamanca_carlos_report
 
If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...
If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...
If an Application Fails in the Datacenter and No Users Are On It, Will it Cut...
 
Value add: Single User Performance Testing (http://managingperformancetesting...
Value add: Single User Performance Testing (http://managingperformancetesting...Value add: Single User Performance Testing (http://managingperformancetesting...
Value add: Single User Performance Testing (http://managingperformancetesting...
 
Doing agile with an ISO-20000 Telco (AgilePT 2015)
Doing agile with an ISO-20000 Telco (AgilePT 2015)Doing agile with an ISO-20000 Telco (AgilePT 2015)
Doing agile with an ISO-20000 Telco (AgilePT 2015)
 
When to Do a Reliability Prediction
When to Do a Reliability PredictionWhen to Do a Reliability Prediction
When to Do a Reliability Prediction
 
Decision Making
Decision MakingDecision Making
Decision Making
 
A Day In the Life Of a Proactive Maintenance PdM Tech
A Day In the Life Of a Proactive Maintenance PdM TechA Day In the Life Of a Proactive Maintenance PdM Tech
A Day In the Life Of a Proactive Maintenance PdM Tech
 
Tech Talk: The New CA Application Performance Management Team Center—Faster T...
Tech Talk: The New CA Application Performance Management Team Center—Faster T...Tech Talk: The New CA Application Performance Management Team Center—Faster T...
Tech Talk: The New CA Application Performance Management Team Center—Faster T...
 
Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...
Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...
Flight checks -QA for Releases that Prevent Disasters from Escaping into the ...
 
DR Planning and Testing
DR Planning and TestingDR Planning and Testing
DR Planning and Testing
 
Bpg 5 s Lean
Bpg 5 s LeanBpg 5 s Lean
Bpg 5 s Lean
 
Some of the shocking facts concerning underground utility strikes
Some of the shocking facts concerning underground utility strikesSome of the shocking facts concerning underground utility strikes
Some of the shocking facts concerning underground utility strikes
 
2011 05 Ms
2011 05 Ms2011 05 Ms
2011 05 Ms
 
Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...
Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...
Nagios Conference 2014 - Nate Broderick - SLA - The Marriage of an Effective ...
 
Top 5 DB2 Support Nightmares 2018 #2
Top 5 DB2 Support Nightmares 2018 #2Top 5 DB2 Support Nightmares 2018 #2
Top 5 DB2 Support Nightmares 2018 #2
 

Maximizing Uptime Through Predictive Analysis and Data Integration

  • 1. ZDT Group LLC 5460 Sandstone Ct Cumming, Ga. 30040 P – 770-886-9555 www.zdtgroup.com By Joe Soroka Copyright 2009 Maximizing Uptime through Predictive Analysis Operating Mission Critical Facilities in today’s environment means continuing to strive to meet and exceed our uptime requirements, as our operating budgets continue to shrink. Taking a holistic approach is needed to track our facility information to accomplish this. As I have discussed in previous whitepapers and other presentations, uptime is achieved through a process that I call “RAMPS”. The tier rating of a facility is only the first thing that affects uptime. You may have heard stories of people who have designed and built tier IV facilities and have had outages time after time and where a tier I facility has been operating for over ten years without a single outage. That’s because there is more than just the design of the facility that affects the facility uptime, and that is “RAMPS”. Reliability, Availability, Maintainability, Predictability, and Scalability are the keys to success. Without paying attention to all facets of the facility, your uptime requirements cannot be realized. Reliability Center Maintenance (RCM) can be used to help in meeting the demands of a shrinking operations budget. If you do not properly implement a RCM program it is called a “LMP” Lack of Maintenance Program not RCM. However, to correctly implement an RCM program you need to answer 7 questions per SAE JA1011 Evaluation Criteria for RCM process. These questions are as follows: 1. What is the equipment supposed to do and what is the performance standard? 2. In what ways can it fail to meet the performance standard? 3. What are the events that will lead to failure? 4. What happens when the piece of equipment fails? 5. In what way does each of the failure modes matter to system operations? 6. What systematic task can be performed to prevent the failure? 7. 
What must be done if a suitable preventative task cannot be implemented? Prior to implementing an RCM program, one must have a comprehensive Predictive Analysis program in affect. By understanding the past performance and current operating conditions of the equipment, we can develop predictions on failure modes for the future. Tracking more items with a longer duration will result in a greater accuracy in our
  • 2. ZDT Group LLC 5460 Sandstone Ct Cumming, Ga. 30040 P – 770-886-9555 www.zdtgroup.com By Joe Soroka Copyright 2009 predictions. This is why it is important to start at the beginning of the project with design, moving to construction, commissioning and then ongoing operation and maintenance. The information that is gained in each of these phases needs to be captured in a manner that allows you to look at specific details across the entire life of the equipment. Data gathered without purpose is just that, data. It is important to define how you need to use this data, and how it can assist you in your operation - from increasing reliability to reducing operating costs. When in the design phase, the requirements that the facility needs to operate at are explained in the Basis of Design (BoD) and the Sequence of Operation (SoO). It is these two documents that need to be fully developed in the design phase and updated during the life of the facility, for they act as the road map on how the facility shall operate. With these documents a baseline is established for the performance of your equipment, and when the site is started up and commissioned it is this baseline that the facility needs to operate at. During the startup and commissioning phases these baseline documents are updated so that changes that may have occurred during construction and startup phases are captured in the updated BoD and SoO documents. These documents need to be living documents and should be reviewed and updated as often as required. The information gathered during the commissioning process is typically never properly integrated into the operations. Having the data that was discovered during startup and commissioning is invaluable. Not having this data would be similar to taking all the photos of your children prior to their high school graduation and locking them up somewhere. 
Then years later, when you are sitting down with your son or daughter’s future spouse, trying to show them your child’s life story, you have to begin with high school pictures due to lack of data. All of the data collected during birth, startup, and commissioning is part of the equipment life story. The whole story needs to be looked at and shared between the parents and the spouse. As the spouse adds new photos to the album, the story of the person’s life is evolving and you are able to look through the album and see the entire story. Predictability is one part of the Uptime RAMPS that often gets ignored. We may have some trends we look at, and some testing that allows us to do some forecasting, but many times having a fully predictive program is lacking. The data gathered and collected during construction and commissioning should be fully integrated into the operation and
  • 3. ZDT Group LLC 5460 Sandstone Ct Cumming, Ga. 30040 P – 770-886-9555 www.zdtgroup.com By Joe Soroka Copyright 2009 maintenance program. This data becomes invaluable to you as you develop a predictability program. Imagine your data center is fully loaded with servers and you are operating at the design load of 10kW per cabinet. The CFD modeling you did with your heat load is proving out to be correct and everything is operating correctly. You decide to leave early for the fishing trip up in the mountains that you had planned for some time now. As you are enjoying your well-deserved trout fishing you realize how cool it is and you think “wow it must be hot back in town”. You pull out your cellphone to check the weather in town and you see there is no cell coverage. You don’t worry, you’re not on call and you have a good staff at the site, so you go back to fishing. The fish are biting, there is great weather, and you decide to stay late on Sunday. When you come home Sunday you are shocked by the amount of voicemails and emails you missed. It’s the one from your team that really hits you hard in the stomach. “Boss we not sure where you are and we hope the fishing is good but we just lost the site, the load dropped, all servers are down”. You immediately turn your car to the site and start calling everyone. When you arrive you spend the next couple of hours explaining where you were and trying to figure out what happened. The next day you review the incident reports and start to put together a detailed explanation of why your chiller plant failed, so you can complete a failure analysis report. When the failure occurred, both your primary chiller and redundant chiller failed due to high head pressure caused by high condenser water temperature. The cooling tower fans were off and would not start. The Tower Fan VFDs were working and they tried to put them in hand and still the tower fans would not operate. 
It took your team some time to figure out the problem, but in the time it took to correct the issue the data center floor overheated and servers started to fail. It was the vibration switches that were mounted on the fans that tripped off. Three of the four tower fans failed and the fourth fan could not support the load. It took some time to identify the problem but once it was discovered the vibration sensors were jumped out the fans started and the chillers were brought back online, but unfortunately not in time to keep the site up. You remember that you were the only one on the job when the startup and commissioning occurred, the rest of the team was hired after the site was turned over. You remember something about those vibration limit switches during startup 3 years ago. You look in the startup documents and there is no mention of any issue so you make some calls and track down the person who did the startup. You reach this person and ask
him if he remembers what the issue with the vibration switches was, and as luck would have it he remembers that project. He tells you, “Yes, we had a lot of issues with those switches during the startup. We replaced one and the others were adjusted.”

There is no mention in the startup documents of the parts that were replaced. If the replacement had been noted and listed in your system, you would have known you had a failed device and you would have looked into the issues from startup. Whether a part fails in the factory, during startup, or during the life of the equipment, tracking those failures and analyzing their nature will allow you to predict future issues.

During the PM visits since the site was turned over to Operations, the vibration switches were never part of the maintenance procedures, so they were not tested or adjusted. This led to the switches tripping before any significant vibration occurred. The Operations team was never trained on the vibration switches and did not know they were installed or what their function was. Equipment accessories are often missed in training and preventative maintenance.

A week later you are at home. You get that dreaded phone call and rush back to the site. This time it’s UPS system Module A. It has failed and dropped offline; however, the site remains protected by the other modules. You call out your service rep and they find that the AC filter capacitors failed. Again you sit down in your office the next morning to fill out another failure analysis report, and in the process of reviewing past maintenance records you notice that the AC filter current, which is measured and recorded each visit, was about the same every time until the last PM, when it was 35% higher than the previous readings.
Three months ago those AC filter capacitors were telling you it was time to replace them, but the data was recorded in a manner that couldn’t be analyzed.

That evening at home you start thinking about how much information you were able to gather from the past documentation that you used to complete your failure analysis report. Why not use the data to predict what might fail rather than to explain why it did? After all, a failure analysis report is just a predictive analysis report written after the fact. So you start to identify your potential issues and figure out a way to capture and analyze the data.

When performing startup and ongoing maintenance, the testing scripts need to be written in a switch-level, detailed fashion. The system that you use should be backed by a database so you can analyze the data and set thresholds. For example, do not write a test script that states “check the oil pressure” and just has a checkbox next to it. A checkbox is not sufficient data. You might have checked it, but if the oil pressure was 2 PSI, I’m sure that
your engine will not like that for very long. Your test script should instead state “Record the oil pressure” and list a tolerance. You inspect the oil pressure, it’s 35 PSI, and you record it. Your system should have a minimum and maximum tolerance, and if the reading is outside the acceptable range you should get a failed condition and an issue log should be updated automatically. You should also have a percent-change failed condition: the system averages all of the past readings for the oil pressure and produces a failed condition if the new reading is more than x% away from that average. Another approach is rate of change: a failed condition occurs if the reading changes by more than x% from the last reading. With these three types of automation added to your logging tool, you will start to capture and analyze items automatically, which will allow you to predict failures.

With a database as the backend for your commissioning and operations tool, you can intelligently analyze critical points over a period of time. A set-point can drift over time and may not be noticed because of the small changes from week to week, but with the proper automation tools these points can be analyzed and alarms can be generated automatically. You could have your building automation system run a script every morning for the critical points that you are trending, and then write the result to an XML file or a SOAP report that your automation tool could read as input for a daily PM task. Now, not only is your team performing inspection and maintenance at the site, but your Building Automation System is talking directly with your operations automation tool (or CMMS), and you are gaining valuable data that is analyzed automatically and alerts you to potential issues.

Joe Soroka has been working in the mission-critical field for the past 26 years.
He has worked with many clients over the years on conceptual designs, commissioning, operation and maintenance, and failure analysis. Joe has commissioned over 8 million square feet of mission-critical facilities. He has worked on developing and reviewing Operations & Maintenance, Training and Safety programs, including Method Operating Procedures for mission-critical facilities. If you have any questions, you may contact Joe Soroka at joe@zdtgroup.com or visit www.zdtgroup.com
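
As an illustration of the three logging-tool checks described in the paper (min/max tolerance, percent deviation from the average of past readings, and rate of change from the last reading), here is a minimal Python sketch. It is not taken from any particular CMMS product; the names `PointLimits` and `evaluate_reading`, the failure labels, and every threshold value are assumptions chosen for the example, and a real tool would read the history from its database and write failures to its issue log.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class PointLimits:
    """Hypothetical threshold settings for one recorded test point."""
    min_value: float             # lowest acceptable reading
    max_value: float             # highest acceptable reading
    max_pct_from_average: float  # the "x%" allowed away from the historical average
    max_pct_from_last: float     # the "x%" allowed away from the previous reading


def evaluate_reading(history: List[float], reading: float,
                     limits: PointLimits) -> List[str]:
    """Return the failed conditions for a new reading (empty list if it passes)."""
    failures = []

    # 1. Min/max tolerance: is the reading inside the acceptable range?
    if not (limits.min_value <= reading <= limits.max_value):
        failures.append("out_of_tolerance")

    if history:
        # 2. Percent change: compare against the average of all past readings.
        average = mean(history)
        if average != 0:
            pct_from_average = abs(reading - average) / abs(average) * 100
            if pct_from_average > limits.max_pct_from_average:
                failures.append("deviation_from_average")

        # 3. Rate of change: compare against the most recent reading only.
        last = history[-1]
        if last != 0:
            pct_from_last = abs(reading - last) / abs(last) * 100
            if pct_from_last > limits.max_pct_from_last:
                failures.append("rate_of_change")

    return failures


# Assumed example values; the limits are illustrative, not manufacturer data.
oil_limits = PointLimits(min_value=25, max_value=65,
                         max_pct_from_average=15, max_pct_from_last=15)
filter_limits = PointLimits(min_value=0, max_value=30,
                            max_pct_from_average=10, max_pct_from_last=10)

# A normal oil pressure reading of 35 PSI against similar past readings.
print(evaluate_reading([36, 35, 34], 35, oil_limits))

# An AC filter current about 35% above earlier, steady readings.
print(evaluate_reading([20.0, 20.2, 19.9, 20.1], 27.0, filter_limits))
```

In the second call the reading is still inside the assumed min/max range, so a tolerance check alone would pass it; it is the comparison against the average and against the last reading that flags the change, which is the situation described for the AC filter capacitors.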