1. Dim-to-Dark Datacenter Operations
Emerging Methods & Technology Trends Towards
Greater IT Efficiency & Business Impact
Matthew J. Mansell
www.linkedin.com/in/mjmansell
May 2014
2. Attitudes are changing about what defines a well-run
datacenter as cloud computing gains momentum.
Oct 2011: Newly installed U.S. CIO, Steven VanRoekel
May 2011: Vivek Kundra, first U.S. CIO, Federal Cloud First Strategy
Matthew J. Mansell www.linkedin.com/in/mjmansell
3. …But federal agency missions, commercial companies'
business priorities, and customers are NOT all exactly the same!
Terry Halvorsen, US Navy CIO, at the US Capitol on 10 July 2013, speaking to a forum of
government and industry executives…
“Stop worrying about the number of datacenters. It's absolutely not the metric that
matters. The metric should be, how much am I paying to store all the data collectively?
My approach would be, rather than reducing the number of datacenters, the Navy
would take out $1.4B in targeted cost reductions and then rechannel it into more
productive IT investments.”
(U) Our new IC IT architecture will enable better integration of the IC and build trust
in information sharing. Missions will benefit from improved agility, scalability, and
security while realizing lower operating costs. We will employ cloud technologies,
wide-spread virtualization, thin-client desktops, application stores and improved
security as we evolve from an agency-centric to a modern, data-centric IC IT
Enterprise model.
Al Tarasiuk, IC CIO & Assistant Director of National Intelligence
(U) IC ITE Strategy 2012-2017 [ www.dni.gov/files/documents/IC_ITE_Strategy.pdf ]
“Doing in Common What is Commonly Done”
4. Navy Federal Credit Union: “Global Presence – Small IT Footprint”
Bank Systems & Technology, 31 August 2010
Jerry Hermes, NFCU CIO (Retired):
The NFCU global footprint is not overly large, but it can be very remote to
reach our deployed customers.
NFCU cannot and does not have IT support people on station in many of
these globally remote AOAs.
Our goal is to automate as much IT support as possible, using remote IT
network monitoring tools to keep tabs, and as a priority to continue to
implement new monitoring and business management software.
We don’t just want to manage/monitor IT infrastructure components, but
to know how IT assets deliver business services.
With new capabilities, tools, and skillsets our intent will be to shift from
just analyzing a failing/failed server to having better optics into, if not
proactively addressing, the business service that could be interrupted.
Rather than just monitor, IT will use analytics to make IT more resilient.
We are heavily virtualized, but cloud solutions are the next step.
5. General State of Affairs in Traditional Legacy Datacenters
• Average datacenter is only 30% efficient; 70% of electricity is lost to overhead (HVAC, lighting, etc.)
• Average datacenter in USA has PUE* = 2.0 (i.e., 1 watt of overhead power for every watt
of IT equipment)
• State-of-the-art datacenter is believed to have a PUE of 1.2 or less
• Growth in average RU density:
o 1996 = 7 servers per rack
o 2010 = 20 servers per rack; with high-end at 60+ servers per rack.
o Rack level power consumption has increased 8x in same timespan
• As a result, only about ½, and usually less, of all power consumed in the datacenter gets
to rack level for server power.
• It's estimated that, since its inception, VMware server virtualization has enabled
legacy datacenter power reductions of 70-80%
Foundations of Green IT: Consolidation, Virtualization, Efficiency, & ROI in the Datacenter, Poniatowski, 2010
* PUE = Power Usage Effectiveness | 1.0 is ideal in terms of total power consumed versus that delivered to IT equipment.
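As a rough illustration of the PUE figures on this slide, the sketch below computes PUE as total facility power divided by IT equipment power, plus two derived quantities. The function names and the 1,000 kW / 500 kW facility are hypothetical, chosen only to reproduce the PUE = 2.0 case; they do not come from the cited source.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts of non-IT overhead (cooling, lighting, losses) per watt of IT load."""
    return pue_value - 1.0


def it_share_of_total(pue_value: float) -> float:
    """Fraction of total facility power that actually reaches IT equipment."""
    return 1.0 / pue_value


# Hypothetical facility: 1,000 kW at the meter, 500 kW reaching IT equipment.
average_site = pue(1000.0, 500.0)
print("PUE:", average_site)
print("Overhead watts per IT watt:", overhead_per_it_watt(average_site))
print("IT share of total power:", it_share_of_total(average_site))
```

Under this definition a PUE of 2.0 means half of all power reaches IT equipment, while a PUE of 1.2 means roughly 83% does.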
6. The Capability Climb to Dim/Dark Centers
• Architecture Standards Evolve to 80/20 Rule
• Continuous Monitoring, Analytics & Process Discipline
• Automation Efficiency for Workloads & Business Services
• SDDC / Clouds Emerge to Enable Aligned Business and IT Strategy
• Dim/Dark Center/Modules as NextGen IT Capital Mgmt Capability and Competitive Advantage for the Enterprise
Define I&O Portfolio Playbooks for each layer of the
I&O physical-to-virtual architecture:
• SWOT /Readiness Assessment
• Systems Eng Transition Plan Dependencies & Gaps
• Value-Add I&O Process
• KPIs for IT Efficiency & Business Effectiveness
** Have customer-centric views mapped of how IT assets
support IT services linked to business strategy, core
mission functions, and emerging competitive priorities.
Not necessarily a single threaded migration
7. 5 Datacenter and IT Infrastructure Lessons from the Cloud Giants
– Forrester Research | August 2013
SIZE MATTERS: LARGE SCALE PROVIDES ECONOMIC LEVERAGE
Server-to-administrator ratios
Infrastructure utilization rates
Power usage effectiveness (PUE)
Resiliency
HOW DO THEY DO IT: STANDARDIZE, OPTIMIZE, & AUTOMATE
Hardware Standardization And Simplification Sets The Stage For Operations At Scale
Implementation is, more often than not, via standardized, minimalist infrastructure
Drive pre-defined system configurations by use case at the time of purchase
Drive out complexity: homogeneous, simplified infrastructure with extreme cost savings
Workload Optimization Drives Infrastructure Consistency (Economies-of-Scope)
Automation is the Key Enabler to Productivity and Efficiency
Scripting enables management by automation
Minimal requirements let scripted automation proliferate
Hire fewer, but more sophisticated/multifunctional engineers
Aggressive Management of Power and Cooling is Universal
8. 5 Datacenter and IT Infrastructure Lessons from the Cloud Giants
– Forrester Research | August 2013 (2)
Lesson #1: Bring IT Process Automation to the Facilities Level
Lesson #2: Prioritize Speed-to-Market and Standards Over Customization
Lesson #3: Automate Basic IT Infrastructure Processes
Configuration / Rack Pre-Configuration / Rapid Deployment Architectures
Operating system and software installation
Ongoing monitoring
Lesson #4: Shift from Infrastructure Mgmt to Infrastructure Service Delivery
Think like a service provider and embrace customer service
Shift your administrators to service managers
Mask infrastructure complexity with service-level capabilities
Remember to give administrators a career path in the service delivery world
Lesson #5: Break Down Organizational Silos
Work across silos to ensure consistent operations – monitor process, not just servers!
Engage AD&D and EA and shift to modern application architectures
9. 2014 Predictions: How Hyperscale Computing Demands & Flash will
Reshape Server and Datacenter Architectures – Forrester Research | Dec 2013
1. The next major turn of the x86 clock will happen in 2014
• Intel completes its high-end 22 nm server transition; CPU core counts could be as high as 15 cores per socket and offer
at least a 50% throughput improvement at iso-power and iso-socket. Prediction: 16-socket servers will be
introduced by all major vendors and could provide up to 3x the throughput of today's topline 8-socket Xeon server.
• I&O needs to: If extremely tight on floor and rack space, or on a limited power budget, note that a systems refresh could
produce 50% more workload than old servers and stretch operational budgets further, deferring new datacenter investment.
2. Microservers and ultra-dense servers will proliferate
• Expect most 1U/2U rack form factors to grow rapidly. For example, HP is introducing the Moonshot 1500 server, packing
45 cartridges in a 4.3U enclosure for an effective density greater than 10 servers/RU, giving HP an initial
performance-vs-density and performance-per-watt advantage over Dell and IBM products.
• I&O needs to: Re-evaluate plans for near-term server upgrades, especially where power limitations to rack units
have limited capacity. Workload-to-power efficiency assessment is critical.
3. Commodity servers, driven by Open Compute Project (OCP) designs, will proliferate
• ODMs will continue to drive the market to replace name-brand servers with whitebox equivalents, and momentum
continues with initiatives like OCP for commodity performance in datacenters. Multiple large enterprises (e.g.,
Facebook) have already purchased 10,000 units of custom-specified servers, and new OCP designs are routinely
introduced and only show signs of accelerating through 2014.
• I&O needs to: Develop OCP acquisition and management strategies if the OCP approach fits the price-performance/
architecture roadmap for the datacenter. If so, also consider starting acquisition of vendor-neutral DCIM tools like
Nagios, Chef, or Puppet.
10. 2014 Predictions: How Hyperscale Computing Demands & Flash will
Reshape Server and Datacenter Architectures – Forrester Research (2)
4. Memory channel storage will disrupt server flash architecture
• Diablo Technologies introduces the Memory Channel Storage (MCS) architecture: high-performance flash memory in
pluggable DIMM modules, connected directly to the CPU like main memory. Three transformational benefits are anticipated:
(i) by connecting directly to a DIMM socket, MCS has far less space/power overhead than vendor-specific PCIe
cards; (ii) significant application and I/O performance gains from the low-latency connection through the system memory
controller; (iii) minimal hardware redesign is required of system board vendors. Expect major server
vendors to announce MCS flash as an option in high-end server product lines.
• I&O needs to: Determine what your server vendor’s plans are for MCS flash product integration and begin to
evaluate price-performance options as soon as they are available.
5. ARM-based servers must outperform x86 in 2014 or they’ll remain niche products
• Intel x86 technology continues to improve with a 14-nanometer CPU in 2014 and continues to widen the gap on ARM,
which will likely eliminate ARM from the mid- to high-end server market.
• I&O needs to: Work with DevOps to evaluate the pros and cons of each chip architecture's performance relative to software
architecture.
6. AMD and Intel will battle for low-power designs; Intel will prevail in enterprise servers
• Ultra-low-power (ULP) server market will remain competitive.
• I&O needs to: Determine if the ARM chip product line can occupy a niche in the enterprise server architecture.
7. IBM will introduce faster Power 8 and will likely retain its performance edge
• Forrester believes that Power 8 (RISC CPU line) will continue to give IBM a per-core and per-socket CPU
performance lead over industry competitors.
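The density and refresh figures in predictions 1 and 2 above reduce to simple arithmetic, sketched below. The function names and the 300-server fleet are hypothetical, and the refresh estimate assumes workload scales linearly with per-server throughput, which real workloads may not.

```python
import math


def servers_per_rack_unit(server_count: int, enclosure_height_ru: float) -> float:
    """Effective density: servers (or cartridges) per rack unit of enclosure height."""
    if enclosure_height_ru <= 0:
        raise ValueError("enclosure height must be positive")
    return server_count / enclosure_height_ru


def servers_after_refresh(old_server_count: int, throughput_gain: float) -> int:
    """Servers needed to carry the same workload after a refresh.

    throughput_gain is fractional (0.5 means 50% more work per server).
    Assumes linear scaling, which is a simplification.
    """
    return math.ceil(old_server_count / (1.0 + throughput_gain))


# 45 cartridges in a 4.3U enclosure, as quoted for the Moonshot 1500.
print("Servers per RU:", round(servers_per_rack_unit(45, 4.3), 2))

# Hypothetical fleet of 300 old servers refreshed with 50% faster systems.
print("Servers after refresh:", servers_after_refresh(300, 0.5))
```

The same two functions can be reused to compare candidate enclosures or refresh cycles against a rack's power and space limits.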
11. Considerations for Developing the Software-Defined Datacenter
SDDC changes everything, including cybersecurity!
…start retooling security team skills and methods for the “white-boxed/OCP” architectures that
will proliferate 3-5 years from now.
Prepare for hypervisor/API sprawl: with multiple virtualization, cloud/IaaS, and network
controller/correlator methods in play, converging on a common VM fabric and DCIM plane is key.
…check out HotLink SuperVISOR – software-defined hypervisor/middleware
Scripting and coding proficiency needs to improve for SDDC engineers to manage a more
heterogeneous approach. Start evaluating tools to leverage:
Puppet Labs – Bank of America
Veeam
OpenFlow (SDN)
Jeda Networks (SDNS) – intelligent Fabric Network Controller that converts standard Ethernet
switches into an enterprise storage fabric.
12. Notable Dark Datacenter Innovators
• Uses 38% less energy to do the same work, at 24%
less cost, than other Facebook datacenters
• PUE of 1.07 vs. other Facebook datacenters
averaging 1.5
• 480-volt electric distribution to reduce energy
load
• Incorporates OCP Server Design
• Hot aisle air integrated to winter HVAC
• Eliminate central UPS
• Reduced non-computing overhead to 12%
• Google’s more stringent global datacenter PUE
overhead for 12 mos average = 1.12
• Highest site PUE = 1.21 / Lowest = 1.09
• Best site at PUE < 1.06 if using the industry's looser
interpretation of the Green Grid metric
• Incorporates OCP Server Design
• Runs cold aisle at 80°F
13. Notable Dark Datacenter Innovators (2)
4 July 2012: Project Nibiru
AOL’s ATC Microdata Center (box) goes live
AOL Objectives for ATC
• The ability to deliver that capacity anywhere in the world with
minimal to no staffing.
• Redefines software architecture for greater resiliency
• Delivering extremely dense compute capacity to give us the longest
possible use of these assets once deployed into the field.
• The ability to deliver a “Microdata Center” anywhere on the planet
regardless of temperature and humidity settings
• The ability to support/maintain/and administer remotely.
• The ability to fit into the power envelope of a normal office
building; deploy in increments of 500 kW to 1 MW of power.
• Participation in our cloud environment and capabilities
• Provision a server (VM) in 8 seconds. If servers fail, move them to planned
maintenance mode or create new ones.
• 100% pre-racked from vendor
• Innovation on Configuration Management Systems
• Virtually 100% non-blocked network topology
• In 2012, helped decommission 8,253 servers, saving almost $3M,
compared with Barclays Bank saving over $4M by shutting down
5,515 servers, per Uptime Institute's Server Roundup Competition.
14. Where are the Innovators Headed?
Infrastructure predictive analytics still seems to be a good idea, but it has to be managed and is
likely to have a long learning curve to implement and operate effectively.
The Netflix model ASSUMES stuff will break and architects for failure, even going to the
point of using "Chaos Monkeys". Chaos Monkeys are applications that go around and
intentionally break systems in the PRODUCTION datacenter to create random failure
situations and see if the infrastructure can handle it. They don't do this during prime time
hours (….like Friday nights when everyone is streaming) but they do cause failures during
other regular production times.
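A minimal sketch of the idea described above, not Netflix's actual implementation: pick a random production instance to terminate, but never during a peak window. The Friday-evening window, the instance names, and both function names are assumptions made for illustration; the actual termination call is left out because it depends on the platform.

```python
import random
from datetime import datetime
from typing import Optional, Sequence

PEAK_WEEKDAY = 4       # assumption: Friday (Monday == 0)
PEAK_START_HOUR = 18   # assumption: evening peak starts at 18:00


def in_peak_window(now: datetime) -> bool:
    """True during the hypothetical prime-time window when no failures are injected."""
    return now.weekday() == PEAK_WEEKDAY and now.hour >= PEAK_START_HOUR


def pick_victim(instances: Sequence[str], now: datetime,
                rng: random.Random) -> Optional[str]:
    """Choose one instance to terminate, or None if it is peak time or the fleet is empty."""
    if not instances or in_peak_window(now):
        return None
    return rng.choice(list(instances))


# Hypothetical fleet; a real tool would query the platform's inventory API.
fleet = ["web-01", "web-02", "api-01"]
victim = pick_victim(fleet, datetime(2014, 5, 6, 10, 0), random.Random())
print("Would terminate:", victim)
```

Passing the clock and the random generator in as arguments keeps the selection logic testable without touching a live environment.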
Build uber-elastic, clustered systems where it doesn't matter if a single system fails, and have the
procedures/discipline to take advantage of that investment. It is a “Pets versus Cattle”
approach for servers etc! If you treat servers like pets, you have to monitor carefully,
manage them, and take them to the vet when they get sick. If they are treated like cattle,
weelllll…you just shoot them. When they get a little sick, replace them – the server has to
keep up with my business and its operations – not me with it!
With elastic architectures built on IaaS fabrics you just don't care about individual servers.
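The "cattle" approach can be sketched as a reconcile step that replaces any unhealthy server with a fresh one rather than repairing it. The Server class, the reconcile function, and the naming scheme are hypothetical; a real IaaS fabric would do this through its own provisioning API.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List


@dataclass
class Server:
    name: str
    healthy: bool = True


def reconcile(pool: List[Server], make_server: Callable[[], Server]) -> List[Server]:
    """Return a pool of the same size in which every unhealthy server is replaced."""
    return [s if s.healthy else make_server() for s in pool]


# Hypothetical pool with one sick server.
ids = count(1)
pool = [Server("web-a"), Server("web-b", healthy=False), Server("web-c")]
pool = reconcile(pool, lambda: Server(f"web-new-{next(ids)}"))
print([s.name for s in pool])
```

Nothing in the loop inspects why a server is sick, which is the point: capacity is restored by replacement, and diagnosis can happen offline if at all.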
15. SDDC Reference Architecture Example:
VMware vCloud Automation Center
16. Oracle Optimized Datacenter Reference Architecture
Logical view – December 2012
17. HP FlexFabric Reference Architecture
with Intelligent Resilient Framework
18. Juniper Cloud Ready Datacenter
21. Enterprise Cloud Broker Model
Leveraging Vendor Agnostic API SuperVisor by HotLink
22. Federated Enterprise Services Bus Architecture
A Conceptual Model for Enabling Strategic Capital Markets Integration via
Composable Business Domains and Enterprise Shared Service Utilities
23. Integrated Enterprise Services Deployed via
Common Component Federation Framework