Essential Elements of Data Center Facility Operations

This presentation describes unique management principles and provides a comprehensive, high-level overview of the necessary program elements for operating a mission critical facility efficiently and reliably throughout its life cycle. Practical management tips and advice are also given.

  1. Essential Elements of Data Center Facility Operations. Schneider Electric Data Center Science Center, White Paper 196. Schneider Electric – Data Center Science Center WP 196 Presentation – February 2014
  2. 70% of data center outages are directly attributable to human error, according to the Uptime Institute’s analysis of its “abnormal incident” reporting (AIR) database. This figure highlights the critical importance of having an effective operations and maintenance (O&M) program. This presentation describes unique management principles and provides a comprehensive, high-level overview of the necessary program elements for operating a mission critical facility efficiently and reliably throughout its life cycle. Practical management tips and advice are also given.
  3. Introduction: Importance of an operations and maintenance (O&M) program
     • Most facility outages are attributable to human (operator) error
     • The majority of data center facility TCO is in OPEX, not CAPEX; OPEX is where the greatest potential cost savings reside
     • The largest portion of OPEX is energy costs, which are rising
     • The drive for energy efficiency is reducing capacity safety margins and system redundancy, increasing the importance of proactive maintenance and data center infrastructure management (DCIM)
     • High levels of facility automation and equipment performance data have created new opportunities for enhancing reliability while reducing costs, when properly managed
  4. Mission Critical Mentality: Failure is not an option
     ● Focuses on risk mitigation
     ● Grasps interconnectedness of facility and IT systems
     ● Data center availability is paramount
     ● Highly complex, fast-paced changes in mission critical facility
     ● Challenging to manage
     ● Unique outside pressures
       ● Government regulations
       ● Customer audits
     NOTE: In this paper, only system planning is covered. System planning refers to the power, cooling, racks, and other support infrastructure systems. Planning related to the IT equipment is not discussed here.
  5. Mission Critical Mentality: Code of Conduct
     Table columns: “Mission Critical Mindset” principle, and its impact
     ● Principle: Focused on risk mitigation in all operational and maintenance activities, work processes, and procedures
       Impact: Proactively deals with all potential threats to system availability and worker/occupant safety
     ● Principle: Acting with confidence and patience that is an outgrowth of careful planning and preparation
       Impact: Prevents risks from becoming problems; enables faster response times and fewer errors if problems do arise
     ● Principle: Analytical, process-driven approach to risk avoidance and problem solving
       Impact: Helps identify and mitigate risk in complex environments; ensures predictable and safe operation
     ● Principle: Comprehensive understanding of the function and interconnectedness of facility systems and components
       Impact: Quickly identify and resolve potential threats or actual problems; avoid or reduce system downtime
     ● Principle: Commitment to continuous learning and process improvement
       Impact: Increases skills and operational efficiency to maintain an edge in a constantly changing environment
  6. 12 Essential Elements of an O&M Program: Environmental Health and Safety
     ● Key components include
       ● Injury, illness prevention
       ● Electrical safety
       ● Hazard analysis
       ● Hazard communication
  7. 12 Essential Elements of an O&M Program: Environmental Health and Safety
     Table columns: key program attribute, and its description
     ● Safety plans and training: Written safety plans must be established that describe the safe work practices and procedures to be observed by all workers. Regular training on the program elements must also be conducted.
     ● Hazard analysis: All operational procedures shall start with an analysis of the possible hazards involved. Risks must be identified and safety measures assigned.
     ● Lockout/tagout procedures: Proper procedures to prevent the unexpected energizing or startup of machines or equipment (or the release of stored energy) shall be used when servicing or maintaining equipment.
     ● Personal protective equipment (PPE): Appropriate protective equipment should be provided, properly sized, stored, maintained, and utilized as required to mitigate identified safety hazards.
     ● Hazardous material handling: Hazardous materials must be properly identified, labeled, stored, maintained, and used in conformance with manufacturer’s requirements, local laws, and ordinances.
     ● Hazard communications program: Includes a list of hazardous chemicals, use of material safety data sheets (MSDS), proper labeling of all hazardous materials containers, and employee training on use of and protection from hazardous materials.
     ● Compliance with all applicable health and safety laws and regulations: Requirements will likely vary by region and by level of government (e.g., local, state, federal).
  8. 12 Essential Elements of an O&M Program: Personnel Management
     ● Hiring and training
     ● Competent, team-oriented people with mission critical mentality
     ● Well-rounded team
     ● Develop staffing model
     ● Clearly defined roles and responsibilities
  9. 12 Essential Elements of an O&M Program: Emergency Preparedness and Response
     ● Develop emergency operating procedures (EOPs) for all high-risk failure scenarios
     ● Develop and rehearse escalation procedures
     ● Conduct regular scenario drills
     ● Perform formal failure analysis for significant facility events
     See White Paper 199, “Data Center Emergency Preparedness and Response”, for more information.
  10. 12 Essential Elements of an O&M Program: Maintenance Management
     ● Key tasks
       ● Asset management
       ● Work order management
       ● Spare parts management
     ● Ensure continual power and cooling performance
     ● Improved reliability with
       ● Good asset intelligence
       ● Proactive preventative and predictive maintenance plan
     ● Results in
       ● More accurate maintenance budget forecasts
       ● Minimized TCO and downtime
  11. 12 Essential Elements of an O&M Program: Maintenance Management > Asset Management
     ● Accurate, consistent tracking of critical facility assets
     ● Computerized maintenance management system (CMMS)
       ● Record, track, and manage asset data and maintenance history
     ● Scope of service (SOS)
       ● Defines maintenance frequency, specific activities, and number of man hours
       ● Establishes standard for procurement of
         ● Service agreements
         ● Maintenance scheduling
         ● Procedure development
         ● Continuous program improvement
  12. 12 Essential Elements of an O&M Program: Maintenance Management > Asset Management
     ● Recommended asset management information
       ● Type: top-level classification (e.g., electrical, mechanical, fire system)
       ● Sub-type (e.g., PDU, UPS, CRAH)
       ● Text description of asset
       ● Make: asset manufacturer name
       ● Model: manufacturer model number
       ● Size or rating
       ● Location ID (room/area)
       ● Trade responsible for maintenance
       ● Manufacturer serial number
       ● Install date
       ● Warranty expiration date
       ● Date asset to be replaced
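
The recommended asset fields on slide 12 map naturally onto a single record type. The following is a minimal sketch, assuming Python and a hypothetical `AssetRecord` class; the field names, the two helper methods, and the sample values (including the maker "ExampleCo") are illustrative and do not come from the white paper or from any CMMS product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AssetRecord:
    """One asset entry holding the fields recommended on slide 12 (names are hypothetical)."""
    asset_type: str                      # top-level classification, e.g. "electrical"
    sub_type: str                        # e.g. "PDU", "UPS", "CRAH"
    description: str                     # free-text description of the asset
    make: str                            # manufacturer name
    model: str                           # manufacturer model number
    rating: str                          # size or rating, kept as free text
    location_id: str                     # room/area identifier
    trade: str                           # trade responsible for maintenance
    serial_number: str                   # manufacturer serial number
    install_date: date
    warranty_expiration: Optional[date] = None
    replacement_date: Optional[date] = None

    def under_warranty(self, today: date) -> bool:
        """True while a warranty date is recorded and has not yet passed."""
        return self.warranty_expiration is not None and today <= self.warranty_expiration

    def due_for_replacement(self, today: date) -> bool:
        """True once the planned replacement date has been reached."""
        return self.replacement_date is not None and today >= self.replacement_date


# Sample record with made-up values, for illustration only.
ups = AssetRecord(
    asset_type="electrical", sub_type="UPS", description="UPS A, hall 1",
    make="ExampleCo", model="X-100", rating="500 kVA", location_id="ELEC-1",
    trade="electrical", serial_number="SN-0001", install_date=date(2014, 2, 1),
    warranty_expiration=date(2016, 2, 1), replacement_date=date(2029, 2, 1),
)
```

Keeping the warranty and replacement dates as structured fields, rather than free text, is what lets a system answer questions such as "which assets leave warranty this year" without manual review.
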
  13. 12 Essential Elements of an O&M Program: Maintenance Management > Work Order Management
     ● Tool for service process management
     ● Allows work to be
       ● Correctly prioritized
       ● Assigned the right resources
       ● Completed on schedule
     ● Standalone ticketing system OR
     ● Integrated work order module in a CMMS or DCIM system
     ● Provides valuable information to facility personnel
  14. 12 Essential Elements of an O&M Program: Maintenance Management > Spare Parts Management
     ● Shortens mean time to recovery (MTTR)
     ● Inventory should include parts with lead times longer than acceptable downtime
     ● Maintain spare parts list
     ● Stock frequently used items
     ● Re-evaluate annually
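
The stocking guidance on slide 14 (hold parts whose lead time exceeds acceptable downtime, and stock frequently used items) can be expressed as a simple rule. This is a rough sketch in Python under stated assumptions: the `Part` type, the `should_stock` function, the "4 uses per year" threshold for "frequently used", and all sample figures are hypothetical and not taken from the white paper.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    lead_time_hours: float   # supplier lead time (hypothetical figure)
    uses_per_year: int       # how often the part is consumed (hypothetical figure)


def should_stock(part: Part, acceptable_downtime_hours: float,
                 frequent_use_threshold: int = 4) -> bool:
    """Stock a part if waiting for delivery would exceed acceptable downtime,
    or if the part is used often. The threshold is an assumed value."""
    long_lead = part.lead_time_hours > acceptable_downtime_hours
    frequent = part.uses_per_year >= frequent_use_threshold
    return long_lead or frequent


# Made-up inventory candidates, evaluated against 4 hours of acceptable downtime.
candidates = [
    Part("UPS fan assembly", lead_time_hours=72, uses_per_year=1),
    Part("air filter", lead_time_hours=2, uses_per_year=12),
    Part("blanking panel", lead_time_hours=2, uses_per_year=0),
]
stock_list = [p.name for p in candidates if should_stock(p, acceptable_downtime_hours=4)]
```

Re-running the same rule with updated lead times and usage counts is one way to carry out the annual re-evaluation the slide recommends.
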
  15. 12 Essential Elements of an O&M Program: Change Management
     ● Method of Procedure (MOP) process
       ● Detailed checklist of specified tasks
     ● MOP helps control work activity along with
       ● Operational procedure development and review
       ● Risk analysis and communication
       ● Structured work practices
       ● Vendor/contractor supervision
  16. 12 Essential Elements of an O&M Program: Documentation Management
     ● Facilitates development of
       ● Accurate procedures
       ● Proper training
       ● Workplace safety
       ● Process improvement
     ● Document management software application
       ● System to keep critical infrastructure records organized and up-to-date
       ● Detailed checklist of specified tasks
     ● Manual process can also work
  17. 12 Essential Elements of an O&M Program: Training
     ● Establish training program that organizes operational and maintenance tasks into categories
     ● Mapped to capability levels: basic, intermediate, advanced
     ● Train and evaluate personnel to certify them
     ● Require annual recertification exams
     ● Ongoing education keeps personnel current
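
Slide 17 ties tasks to capability levels and requires annual recertification; slide 23 later lists "no linkage between certification level and tasking" as a common mistake. One way to enforce that linkage is sketched below, assuming Python. The `may_perform` function, the numeric ranking of levels, and the 365-day validity window are assumptions made for illustration only.

```python
from datetime import date, timedelta

# Capability levels named on the slide; the numeric ranks are an assumption.
LEVELS = {"basic": 1, "intermediate": 2, "advanced": 3}


def may_perform(task_level: str, cert_level: str, last_certified: date,
                today: date, validity_days: int = 365) -> bool:
    """Allow a task only if the certification is still current and its level
    is at least the level the task requires."""
    is_current = (today - last_certified) <= timedelta(days=validity_days)
    is_qualified = LEVELS[cert_level] >= LEVELS[task_level]
    return is_current and is_qualified


today = date(2024, 6, 1)
ok = may_perform("intermediate", "advanced", date(2024, 1, 10), today)
too_junior = may_perform("advanced", "basic", date(2024, 1, 10), today)
expired = may_perform("basic", "advanced", date(2022, 1, 1), today)
```

A check like this could run when a work order is assigned, so that an expired or insufficient certification is caught before critical work starts rather than after.
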
  18. 12 Essential Elements of an O&M Program: Infrastructure Management
     ● System to match facility resources with changing IT requirements
       ● Prevent downtime
       ● Improve resiliency and response
       ● Reduce operating expenses
       ● Provide a sound basis for capacity planning decisions
     ● Three key tasks
       ● Facility monitoring
       ● Capacity management
       ● IT/Facilities integration
  19. 12 Essential Elements of an O&M Program: Quality Management
     ● Key components
       ● Quality Assurance (QA): Typified by process and procedure standardization
       ● Quality Control (QC): Quality checks, inspections, and audits
       ● Continuous Quality Improvement
  20. 12 Essential Elements of an O&M Program: Energy Management
     ● Energy is typically the single largest data center expense
     ● 3 core tasks of an effective energy management program
       ● Performance benchmarking
       ● Efficiency analysis
       ● Strategic energy sourcing
     ● Optimized energy sourcing
       ● Reduce exposure to price volatility
       ● Secure pricing that fits budget and business objectives
  21. 12 Essential Elements of an O&M Program: Financial Management
     ● Finance-related issues can impact a facility’s day-to-day availability and resiliency
     ● Processes should focus on
       ● Purchasing
       ● Invoice matching
       ● Financial reporting/analysis
     ● Facility managers and purchasing department should maintain a close relationship
  22. 12 Essential Elements of an O&M Program: Performance Monitoring and Review
     ● Regularly monitor and review facility performance
       ● Determines health and effectiveness of O&M program
       ● Shows where it is trending
     ● Quality process should incorporate facility KPIs
     ● Benefits
       ● Aligns operational activities with business goals
       ● Positive reinforcement for innovation and process improvement
  23. Common Mistakes
     Table columns: common mistake, and its description
     ● Maintenance program is not driven by metrics
       ● Often the result of poor asset management
       ● No linkage made between break/fix maintenance activities and preventative maintenance
     ● Poor training
       ● Training is not formalized and/or is not taken seriously
       ● Over-reliance on technician “shadowing”
       ● No linkage between certification level and tasking
     ● Ineffective change management
       ● Inadequate risk analysis
       ● Poor or non-existent procedures
       ● No defined process for performing critical work tasks
     ● Failure to consistently test & evaluate skills
       ● Existing skills/training level not formally evaluated
       ● Scenario drills are not employed
       ● Incident and drill results are not evaluated
     ● Poor documentation
       ● No coherent sequence of operations
       ● Drawings and schedules are outdated
       ● Lack of revision control and/or lack of digitization
     ● Failure to develop and implement a quality control system
       ● Lack of governance or resources to measure, monitor, and review performance
     ● Stuck in manual mode
       ● Failure to implement CMMS, EDMS, DCIM, etc.
     ● Overconfidence
       ● Assumption that future performance can be predicted by past experience
  24. Facility Operations Services: Using Outside Vendors for O&M Programs
     ● Offer services for both existing and new data centers
       ● Advise on
       ● Develop
       ● Implement
       ● Operate
     See White Paper 198, “How to Write an Effective RFP for Data Center Facility Operations Services”, for more information.
  25. 12 Essential Elements of an O&M Program: Performance Monitoring and Review > Recommended Facility KPIs
     ● Critical load uptime
     ● Load redundancy maintained
     ● Support system uptime
     ● Safety policy and procedure adherence
     ● Procedure development, management and use
     ● Maintenance completion
     ● Staffing coverage
     ● Security policy conformance
     ● Emergency preparedness drills
     ● Emergency response procedure adherence
     ● Quality control/improvement
     ● Training compliance
     ● Process improvement
     ● Operational reporting
     ● Proper event notification and escalation
     ● Timely and accurate cost reporting
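
Two of the KPIs on slide 25, critical load uptime and maintenance completion, reduce to simple ratios. The sketch below, in Python, shows one plausible way to compute them; the slides do not define the formulas, so the function names, the percentage convention, and the sample figures are all assumptions.

```python
def uptime_percent(period_hours: float, outage_hours: float) -> float:
    """Share of the reporting period during which the critical load was served."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= outage_hours <= period_hours:
        raise ValueError("outage_hours must lie within the period")
    return 100.0 * (period_hours - outage_hours) / period_hours


def completion_percent(completed: int, scheduled: int) -> float:
    """Share of scheduled maintenance tasks completed; treats an empty schedule as 100%."""
    if scheduled == 0:
        return 100.0
    return 100.0 * completed / scheduled


# Made-up figures: one year (8,760 hours) with 0.876 hours of critical load outage,
# and 45 of 50 scheduled maintenance tasks completed.
annual_uptime = uptime_percent(8760, 0.876)
maintenance_done = completion_percent(45, 50)
```

Reporting these as percentages over a fixed period makes month-to-month trends comparable, which supports the trending review described on slide 22.
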
  26. Conclusion
     ● Efficient Operations & Maintenance program
       ● Mitigates threats and the effects of human error
     ● Focus on 12 essential elements of O&M program
     ● Must have facilities operation team with “mission critical” mindset
     ● Operational philosophy focuses on
       ● Risk mitigation
       ● Preparedness
       ● Standardized processes
       ● Continuous improvement
  27. Resources
     ● Facility Operations Maturity Model for Data Centers (White Paper 197)
     ● How To Write an Effective RFP For Data Center Facility Operations Services (White Paper 198)
     ● Data Center Emergency Preparedness and Response (White Paper 199)
     ● Classification of Data Center Infrastructure Management (DCIM) Tools (White Paper 104)
     ● How Data Center Infrastructure Management (DCIM) Software Improves Planning and Cuts Operational Costs (White Paper 107)
     ● Avoiding Common Pitfalls of Evaluating and Implementing DCIM Software (White Paper 170)
     ● Browse all APC white papers: whitepapers.apc.com
     ● Browse all APC TradeOff Tools™: tools.apc.com
