Essential Elements of Data Center Facility Operations

This presentation describes unique management principles and provides a comprehensive, high-level overview of the necessary program elements for operating a mission critical facility efficiently and reliably throughout its life cycle. Practical management tips and advice are also given.

Published in: Technology

Essential Elements of Data Center Facility Operations

  1. Essential Elements of Data Center Facility Operations
     Schneider Electric Data Center Science Center, White Paper 196
     Schneider Electric – Data Center Science Center WP 196 Presentation – February 2014
  2. 70% of data center outages are directly attributable to human error, according to the Uptime Institute’s analysis of their “abnormal incident” reporting (AIR) database. This figure highlights the critical importance of having an effective operations and maintenance (O&M) program. This presentation describes unique management principles and provides a comprehensive, high-level overview of the necessary program elements for operating a mission critical facility efficiently and reliably throughout its life cycle. Practical management tips and advice are also given.
  3. Introduction
     Importance of operations and maintenance (O&M) program
     • Most facility outages are attributable to human (operator) error
     • Majority of data center facility TCO is in OPEX, not CAPEX; OPEX is where the greatest potential cost savings reside
     • Largest portion of OPEX is energy costs, which are rising
     • Drive for energy efficiency is reducing capacity safety margins and system redundancy, increasing the importance of proactive maintenance and data center infrastructure management (DCIM)
     • High levels of facility automation and equipment performance data have created new opportunities for enhancing reliability while reducing costs, when properly managed
  4. Mission Critical Mentality
     Failure is not an option
     ● Focuses on risk mitigation
     ● Grasps interconnectedness of facility and IT systems
     ● Data center availability is paramount
     ● Highly complex, fast-paced changes in mission critical facility
     ● Challenging to manage
     ● Unique outside pressures
        ● Government regulations
        ● Customer audits
     NOTE: In this paper, only system planning is covered. System planning refers to the power, cooling, racks, and other support infrastructure systems. Planning related to the IT equipment is not discussed here.
  5. Mission Critical Mentality
     Code of Conduct
     “Mission Critical Mindset” principles | Impact
     Focused on risk mitigation in all operational and maintenance activities, work processes, and procedures | Proactively deals with all potential threats to system availability and worker/occupant safety
     Acting with confidence and patience that is an outgrowth of careful planning and preparation | Prevents risks from becoming problems; enables faster response times and fewer errors if problems do arise
     Analytical, process-driven approach to risk avoidance and problem solving | Helps identify and mitigate risk in complex environments; ensures predictable and safe operation
     Comprehensive understanding of the function and interconnectedness of facility systems and components | Quickly identify and resolve potential threats or actual problems; avoid or reduce system downtime
     Commitment to continuous learning and process improvement | Increases skills and operational efficiency to maintain an edge in a constantly changing environment
  6. 12 Essential Elements of an O&M Program
     Environmental Health and Safety
     ● Key components include
        ● Injury, illness prevention
        ● Electrical safety
        ● Hazard analysis
        ● Hazard communication
  7. 12 Essential Elements of an O&M Program
     Environmental Health and Safety
     Key Program Attributes | Description
     Safety plans and training | Written safety plans must be established that describe the safe work practices and procedures to be observed by all workers. Regular training on the program elements must also be conducted.
     Hazard analysis | All operational procedures shall start with an analysis of the possible hazards involved. Risks must be identified and safety measures assigned.
     Lockout/tagout procedures | Proper procedures to prevent the unexpected energizing or startup of machines or equipment (or the release of stored energy) shall be used when servicing or maintaining equipment.
     Personal protective equipment (PPE) | Appropriate protective equipment should be provided, properly sized, stored, maintained, and utilized as required to mitigate identified safety hazards.
     Hazardous material handling | Hazardous materials must be properly identified, labeled, stored, maintained, and used in conformance with manufacturer’s requirements, local laws, and ordinances.
     Hazard communications program | Includes a list of hazardous chemicals, use of material safety data sheets (MSDS), proper labeling of all hazardous materials containers, and employee training on use of and protection from hazardous materials.
     Compliance with all applicable health and safety laws and regulations | Requirements will likely vary by region and by level of government (e.g., local, state, federal).
  8. 12 Essential Elements of an O&M Program
     Personnel Management
     ● Hiring and training
     ● Competent, team-oriented people with mission critical mentality
     ● Well-rounded team
     ● Develop staffing model
     ● Clearly defined roles and responsibilities
  9. 12 Essential Elements of an O&M Program
     Emergency Preparedness and Response
     ● Develop emergency operating procedures (EOPs) for all high-risk failure scenarios
     ● Develop, rehearse escalation procedures
     ● Conduct regular scenario drills
     ● Formal failure analysis for significant facility events
     See White Paper 199, “Data Center Emergency Preparedness and Response”, for more information.
  10. 12 Essential Elements of an O&M Program
      Maintenance Management
      ● Key tasks
         ● Asset management
         ● Work order management
         ● Spare parts management
      ● Ensure continual power and cooling performance
      ● Improved reliability with
         ● Good asset intelligence
         ● Proactive and preventative predictive maintenance plan
      ● Results in
         ● More accurate maintenance budget forecasts
         ● Minimized TCO and downtime
  11. 12 Essential Elements of an O&M Program
      Maintenance Management > Asset Management
      ● Accurate, consistent tracking of critical facility assets
      ● Computerized maintenance management system (CMMS)
         ● Record, track, and manage asset data and maintenance history
      ● Scope of service (SOS)
         ● Defines maintenance frequency, specific activities, # of man hours
         ● Establishes standard for procurement of
         ● Service agreements
         ● Maintenance scheduling
         ● Procedure development
         ● Continuous program improvement
  12. 12 Essential Elements of an O&M Program
      Maintenance Management > Asset Management
      ● Recommended asset management information
         ● Type - top level classification (e.g. electrical, mechanical, fire system)
         ● Sub-type (e.g. PDU, UPS, CRAH)
         ● Text description of asset
         ● Make - asset manufacturer name
         ● Model - manufacturer model #
         ● Size or rating
         ● Location ID (room/area)
         ● Trade responsible for maintenance
         ● Manufacturer serial #
         ● Install date
         ● Warranty expiration date
         ● Date asset to be replaced
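The recommended asset fields on this slide map naturally onto a record type. The following is a minimal Python sketch, not a CMMS schema: the class name `AssetRecord`, the field names, the two helper methods, and every value in the sample record are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AssetRecord:
    """One tracked facility asset, using the fields listed on the slide."""
    asset_type: str            # top level classification, e.g. "electrical"
    sub_type: str              # e.g. "PDU", "UPS", "CRAH"
    description: str           # text description of asset
    make: str                  # asset manufacturer name
    model: str                 # manufacturer model #
    rating: str                # size or rating
    location_id: str           # room/area
    trade: str                 # trade responsible for maintenance
    serial_number: str         # manufacturer serial #
    install_date: date
    warranty_expiration: date
    replacement_date: date     # date asset to be replaced

    def under_warranty(self, today: date) -> bool:
        # Hypothetical helper: warranty counts through its expiration day.
        return today <= self.warranty_expiration

    def due_for_replacement(self, today: date) -> bool:
        # Hypothetical helper: flags assets at or past planned replacement.
        return today >= self.replacement_date


# Sample record with made-up values, purely for illustration.
ups = AssetRecord(
    asset_type="electrical",
    sub_type="UPS",
    description="UPS feeding row A",
    make="ExampleMake",
    model="EX-100",
    rating="100 kVA",
    location_id="ROOM-1",
    trade="electrical",
    serial_number="SN-0001",
    install_date=date(2014, 2, 1),
    warranty_expiration=date(2017, 2, 1),
    replacement_date=date(2029, 2, 1),
)

print(ups.under_warranty(date(2016, 1, 1)))
print(ups.due_for_replacement(date(2016, 1, 1)))
```

Keeping the record immutable (`frozen=True`) is one possible design choice: changes to an asset would then be recorded as new entries, which suits the maintenance history tracking the previous slide describes.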
  13. 12 Essential Elements of an O&M Program
      Maintenance Management > Work Order Management
      ● Tool for service process management
      ● Allows work to be
         ● Correctly prioritized
         ● Assigned the right resources
         ● Completed on schedule
      ● Standalone ticketing system OR
      ● Integrated work order module in a CMMS or DCIM system
      ● Provides valuable information to facility personnel
  14. 12 Essential Elements of an O&M Program
      Maintenance Management > Spare Parts Management
      ● Shortens mean time to recovery (MTTR)
      ● Inventory should include parts with lead times longer than acceptable downtime
      ● Maintain spare parts list
      ● Stock frequently used items
      ● Re-evaluate annually
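The lead-time rule on this slide can be expressed as a simple filter. This is a sketch under stated assumptions: the `Part` type, the function name `parts_to_stock`, the hour units, and all sample figures are hypothetical. It covers only the lead-time criterion, not the separate advice to stock frequently used items.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Part:
    name: str
    lead_time_hours: float  # supplier lead time, assumed to be in hours


def parts_to_stock(parts: Iterable[Part],
                   acceptable_downtime_hours: float) -> List[str]:
    """Return names of parts whose lead time exceeds acceptable downtime."""
    return [p.name for p in parts
            if p.lead_time_hours > acceptable_downtime_hours]


# Made-up catalog and downtime tolerance, for illustration only.
catalog = [
    Part("UPS battery string", 336.0),
    Part("CRAH fan motor", 72.0),
    Part("Air filter", 4.0),
]
must_stock = parts_to_stock(catalog, acceptable_downtime_hours=24.0)
print(must_stock)
```

Because the slide recommends re-evaluating annually, a function like this would be rerun whenever lead times or the acceptable downtime figure change.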
  15. 12 Essential Elements of an O&M Program
      Change Management
      ● Method of Procedure (MOP) process
      ● Detailed checklist of specified tasks
      ● MOP helps control work activity along with
         ● Operational procedure development and review
         ● Risk analysis and communication
         ● Structured work practices
         ● Vendor/contractor supervision
  16. 12 Essential Elements of an O&M Program
      Documentation Management
      ● Facilitates development of
         ● Accurate procedures
         ● Proper training
         ● Workplace safety
         ● Process improvement
      ● Document management software application
      ● System to keep critical infrastructure records organized, up-to-date
      ● Detailed checklist of specified tasks
      ● Manual process can also work
  17. 12 Essential Elements of an O&M Program
      Training
      ● Establish training program that organizes operational and maintenance tasks into categories
      ● Mapped to capability levels – basic, intermediate, advanced
      ● Train and evaluate personnel to certify them
      ● Require annual recertification exams
      ● Ongoing education keeps personnel current
  18. 12 Essential Elements of an O&M Program
      Infrastructure Management
      ● System to match facility resources with changing IT requirements
      ● Prevent downtime
      ● Improve resiliency and response
      ● Reduce operating expenses
      ● Provide a sound basis for capacity planning decisions
      ● Three key tasks
         ● Facility monitoring
         ● Capacity management
         ● IT/Facilities integration
  19. 12 Essential Elements of an O&M Program
      Quality Management
      ● Key components
         ● Quality Assurance (QA): Typified by process and procedure standardization
         ● Quality Control (QC): Quality checks, inspections, and audits
         ● Continuous Quality Improvement
  20. 12 Essential Elements of an O&M Program
      Energy Management
      ● Energy typically the single largest data center expense
      ● 3 core tasks of an effective energy management program
         ● Performance benchmarking
         ● Efficiency analysis
         ● Strategic energy sourcing
      ● Optimized energy sourcing
         ● Reduce exposure to price volatility
         ● Secure pricing that fits budget and business objectives
  21. 12 Essential Elements of an O&M Program
      Financial Management
      ● Financial-related issues can impact facility’s day-to-day availability and resiliency
      ● Processes should focus on
         ● Purchasing
         ● Invoice matching
         ● Financial reporting/analysis
      ● Facility managers and purchasing department should maintain close relationship
  22. 12 Essential Elements of an O&M Program
      Performance Monitoring and Review
      ● Regularly monitor and review facility performance
      ● Determines health and effectiveness of O&M program
      ● Shows where it is trending
      ● Quality process should incorporate facility KPIs
      ● Benefits
         ● Aligns operational activities with business goals
         ● Positive reinforcement for innovation and process improvement
  23. Common Mistakes
      Common Mistakes | Description
      Maintenance program is not driven by metrics | Often the result of poor asset management; no linkage made between break/fix maintenance activities and preventative maintenance
      Poor training | Training is not formalized and/or is not taken seriously; over-reliance on technician “shadowing”; no linkage between certification level and tasking
      Ineffective change management | Inadequate risk analysis; poor or non-existent procedures; no defined process for performing critical work tasks
      Failure to consistently test & evaluate skills | Existing skills/training level not formally evaluated; scenario drills are not employed; incident and drill results are not evaluated
      Poor documentation | No coherent sequence of operations; drawings and schedules are outdated; lack of revision control and/or lack of digitization
      Failure to develop and implement a quality control system | Lack of governance or resources to measure, monitor, and review performance
      Stuck in manual mode | Failure to implement CMMS, EDMS, DCIM, etc.
      Overconfidence | Assumption that future performance can be predicted by past experience
  24. Facility Operations Services
      Using Outside Vendors for O&M Programs
      ● Offer services for both existing and new data centers
      ● Advise on
      ● Develop
      ● Implement
      ● Operate
      See White Paper 198, “How to Write an Effective RFP for Data Center Facility Operations Services”, for more information.
  25. 12 Essential Elements of an O&M Program
      Performance Monitoring and Review > Recommended Facility KPIs
      ● Critical load uptime
      ● Load redundancy maintained
      ● Support system uptime
      ● Safety policy and procedure adherence
      ● Procedure development, management and use
      ● Maintenance completion
      ● Staffing coverage
      ● Security policy conformance
      ● Emergency preparedness drills
      ● Emergency response procedure adherence
      ● Quality control/improvement
      ● Training compliance
      ● Process improvement
      ● Operational reporting
      ● Proper event notification and escalation
      ● Timely and accurate cost reporting
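The slide names the KPIs but does not define how to calculate them. As one possible reading, two of them (critical load uptime and maintenance completion) can be expressed as percentages. The formulas, function names, and sample figures below are assumptions for illustration, not definitions from the white paper.

```python
def uptime_percent(period_hours: float, outage_hours: float) -> float:
    """Share of the reporting period with the critical load served.

    Assumed formula: (period - outage) / period, as a percentage.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= outage_hours <= period_hours:
        raise ValueError("outage_hours must be within the period")
    return 100.0 * (period_hours - outage_hours) / period_hours


def completion_percent(completed: int, scheduled: int) -> float:
    """Share of scheduled maintenance tasks that were completed.

    Assumed formula: completed / scheduled, as a percentage.
    """
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    if not 0 <= completed <= scheduled:
        raise ValueError("completed must be between 0 and scheduled")
    return 100.0 * completed / scheduled


# Made-up figures: one year (8760 h) with 8.76 h of critical load outage,
# and 45 of 50 scheduled maintenance tasks completed.
yearly_uptime = uptime_percent(8760.0, 8.76)
maintenance_done = completion_percent(45, 50)
print(f"{yearly_uptime:.3f}% uptime, {maintenance_done:.1f}% maintenance completion")
```

Tracking such figures per reporting period is one way to show where the program is trending, as the Performance Monitoring and Review slide recommends.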
  26. Conclusion
      ● Efficient Operations & Maintenance program
      ● Mitigates threats, effects of human error
      ● Focus on 12 essential elements of O&M program
      ● Must have facilities operation team with “mission critical” mindset
      ● Operational philosophy focuses on
         ● Risk mitigation
         ● Preparedness
         ● Standardized processes
         ● Continuous improvement
  27. Resources
      ● Facility Operations Maturity Model for Data Centers, White Paper 197
      ● How To Write an Effective RFP For Data Center Facility Operations Services, White Paper 198
      ● Data Center Emergency Preparedness and Response, White Paper 199
      ● Classification of Data Center Infrastructure Management (DCIM) Tools, White Paper 104
      ● How Data Center Infrastructure Management (DCIM) Software Improves Planning and Cuts Operational Costs, White Paper 107
      ● Avoiding Common Pitfalls of Evaluating and Implementing DCIM Software, White Paper 170
      Browse all APC white papers: whitepapers.apc.com
      Browse all APC TradeOff Tools™: tools.apc.com
