©2014 Extreme Networks, Inc. All rights reserved.
How to move a data center in 35 days
Rich Casselberry
@richcasselberry
www.linkedin.com/in/richcasselberry
Hmmm
Don’t take this the wrong way, but….
Lesson 1: Read the fine print….
pursuant to section 3, paragraph c
Lesson 2
“Don’t throw out the dirty water until you have clean water”
Lesson 3: Crazy isn’t always wrong
Just because you’re crazy doesn’t mean you’re wrong
Week 1 – Retire, Consolidate, Virtualize
 Communicated the plan to the whole company
 Retired anything and everything we could
 Started virtualizing
 Developed IT application test plans
 Met with other data centers to see if there were any other places to go
Week 2 (2/10 to 2/16)
 Designed new backup architecture
 Finished cabinet layout and ordered cabinets
 Started planning for the production move; ordered trucks and got a quote from EMC
 Moved 35 servers from Boston (mostly non-production iSCSI)
Week 3 (2/17 to 2/24)
 Ordered backup hardware
 Received new cabinets and redesigned the room for placement
 More power issues, and resolutions
 Servers retired and virtualized this week
 Network up and running in some cabinets
Space and cooling planned out
By week 3 we had the space pretty well figured out, though we soon found out some cabinets would need to move so that the permanent AC could be installed. Though the AC wouldn't be used until June, we needed to make sure it could go in its permanent spot.
Week 4 (2/24 to 3/1)
 Backup gear arrived and was installed
 Move schedule close to finalized
 Temporary cooling units in and online
 Network gear up and running
 Cabinet layout finalized
 First production move
Week 4 – Backups arrived, set up, running and tested
Since we had used managed backups, we had no gear of our own to back up or restore with. Working with our vendors, we were able to get it on site, configured, cabled, powered and tested in time for the move that weekend.
Cooling units arrive
Four 3-ton cooling units arrived and were set up and running to keep the room at a constant 65°F. Some spots were warmer, but with fans we were able to keep the servers cool enough.
Servers pulled and ready to load on the truck
Friday we started at 2:00 PM and had the servers pulled and on the truck by 9:00 PM. Each server went into a pile based on the cabinet it would be racked in once it reached Andover, which let us get things running again quickly.
Week 5 (3/2 to 3/9)
 Reviewed what worked/what didn’t
 Reviewed power usage and moved engineering servers to lab
 Finalized schedule for final move
 Built DMZ for externally facing boxes
 Defined outage plan for web sites
Some servers moved to the lab
Saturday AM
We had most of the servers racked, powered and attached to the network by Saturday night. These were all cabled and ready to be turned on.
Sunday testing
Of course the last thing we wanted to see was a power truck at the end of Minuteman Road on Sunday, but nevertheless that's what happened. High winds caused power issues in the park. Luckily our generator and UPS allowed us to continue testing.
Data Center update
 Move was successful
– Virtualized 37 servers
– Retired approximately 15 servers
– Moved 110 servers from Boston to Andover and another 36 within Andover
– Removed and reinstalled approximately 160 fiber connections, 700 copper connections and 260 power cords
– Designed, installed, deployed and tested backups for 80 servers (23 TB)
– Virtual servers (all 37) moved in under 2 hours; 90 minutes of that was waiting for snapshots to finish
– Over 2,500 extra hours of planning, designing, testing, moving and troubleshooting went into this move
Tip 1 - Plan everything
“Plans are useless, planning is essential”
Tip 2 - The plan is only as good as the documentation
“If the documentation is 90% correct, 10% of the stuff won’t work”
Tip 3 – Communicate early and consistently.
“When people don’t hear, they assume the worst”
Tip 4 – People are, well, people
“Sleep is for the weak”
Tip 5 – Check supplies
“Getting a roll of tape Tuesday afternoon is easy. At 3:00AM on Saturday in the middle of nowhere is not”
Tip 6 – Review process
“Wait for the admin to shut down the box first”
Tip 7 - Timing
A team of two people will take
–5 minutes to rack a server in the cabinet
–5 minutes per Ethernet cable
–2 minutes per power cable
Tip 8 – Always have plan B and C
 “If the elevator breaks can you really move all of that up the stairs?”
Tip 9 – Do what you can early
 “Anything that can already be done, should be”
Tip 10 – Test and troubleshoot
“You aren’t done until it’s tested”
Tip 11 – Review and recognize
“You never know when the next move will be”
Bonus Tip – Because 12 tips sounds better than 11
Be there
Editor's Notes
  1. My name is Rich Casselberry. I run the network and security for Extreme Networks. We are the fourth largest network equipment provider in the world. We have some unique features that give great visibility and control in the data center that you aren’t going to hear about today. Because today isn’t about that. We all hear about super cool new or upcoming technology too. I read an article last month about Disaster avoidance. Apparently it’s not cool to be able to recover from a disaster, instead people are just moving their virtual data centers somewhere else while the hurricane, snowstorm or tornado goes through and then move it back on the fly. Very cool. I wish I could do that, but like a lot of people I still have a data center. A real one with air conditioning and UPSes. I’m not talking about that either
  2. Thanks. “Don’t take this the wrong way, but…” Yeah, we all know what that means, right? It means the person telling you that is about to call you an idiot, and they’re hoping that by saying it first you won’t realize it, giving them enough time to get away before you figure it out. A friend of mine is a freelance writer and was doing a story on the dumb things we have done in IT. He asked for stories, and having been in IT for a long time I quickly fired off a list of some of the top-of-mind blunders. He emailed me back in 5 minutes with “Dude, we need to talk.” We scheduled a 30-minute call, and 90 minutes later he said “Don’t take this the wrong way, but how is it you haven’t been fired?” How could I take that the wrong way, right? When Sherry first asked me to talk here I was flattered. Deep down, though, I knew it was because she knows how many incredibly stupid things I have done, and really what she wanted was for me to share them so you don’t make the same dumb mistakes. This is the story of one of those.
  3. Instead I wanted to share one of my biggest blunders, well, a collection of blunders: what happens when the company that provides your colo data center space decides that you aren’t a strategic customer and actually asks you to leave… in 45 days. We were using colo data center space and on February 4th, my sister’s birthday actually, 8 years ago, got a letter that basically said “pursuant to section 3, paragraph c, we are required to give you not less than 45 days to vacate the facility.” Lesson 1 – Read the fine print… Everyone knows it takes 12-18 months to move a data center; if you are aggressive, maybe 6 months. Yet most contracts can be terminated with 45 days’ notice from either party. Now, to be fair, we had been in the data center for over 2 years and our contract had expired, but we were paying month to month. We were also pretty open that we were moving our data center back in house so we could use it to show customers how we build data center networks and manage our data center. But we were also clear that we were going to move in late fall, not February. So I called our sales rep and said “Really, 45 days? I can’t even get a circuit that fast.”
  4. He replied with “Oh yeah, I meant to give you a heads up on that. I did talk to corporate and they agreed to let you stay if you sign a 2-year contract, at double the square footage cost.” He went on to explain that the price they were charging us was causing them to lose money and that it was significantly below market rates. Now, I’m pretty in touch with the market rates in Boston, and while we have a good rate, it’s not half price. To put it in perspective, for us this meant almost $1.2M. We didn’t really have a plan, so I politely got out of the call. Which brings up Lesson 2.
  5. We had plans and had started building out new power feeds, UPS, switchboards, 360 tons of cooling and a 4,000 sq. ft. data center on site, but there was no way it would be ready in time. We didn’t have enough space, power or cooling to adequately fit in our existing data rooms and didn’t have any idea if we could do it. I met with the team that night and remarkably no one said “Can’t be done.” To me that was the most amazing part: no one gave up. We took a quick look at our existing room and, like probably a lot of data centers, there was a lot of old stuff, some of it turned off, and cabinets that were half full. We had our electrician measure the power used and found someone that would rent us 4-ton portable air conditioners. We thought it might be possible, so I went to the CIO. He thought I was nuts, but he was open to it. We had 3 days to convince him we could do it. We did a power audit and a space audit and decided it was close, but we were also able to free up 5 cabinets from an old remote lab that we had planned to decommission but had never really been important enough to do. We made it important enough to do. Those 5 cabinets gave us the extra room we needed and the momentum to convince the executives we could do it. So I called our sales rep back and said “You know, we thought about what corporate offered, and what I’d like you to do is go back to corporate and tell them to take a big bite out of my ass. We’re leaving and will be out in 5 weeks.”
  6. We knew that everything else we were doing was on hold for the next 6 weeks, so the first thing we did was send an email to the entire company letting them know what was going on. We were still really tight on power, not just power in the data center, but power to the whole building. At one point we calculated we had a spare 0.7 amps if we moved everything and the ACs were running. That was a bit too tight, so we spent a lot of the first week sweeping the building and were pretty ruthless about turning off anything: personal refrigerators, old monitors, space heaters. We also had another team look to see what we could virtualize. VMware was still pretty new, but we had been using it for test and development, just not production. We virtualized 35 systems that first week. We also designed our backup system, since the data center had managed all of that for us. Luckily we have good relationships with most of our vendors and they jumped through hoops to help us. We were also halfway through a storage migration from EMC to LeftHand storage. We pretty quickly realized we could use that extra space and move our virtual machines over the network. And we met with the applications team and the rest of IT to make sure everyone knew the schedule and plan, and worked up a pretty robust application list broken up by criticality. We had planned to meet some local data center companies, including one that was in the same business park as us, just in case, but decided to cancel once we were sure we could pull it off. Hedging our bets seemed to make it more risky and less certain, so we doubled down on the move.
  7. Like I sa
  8. We started week 2 fully committed to the plan and with the full support of the company. We ordered our backup environment, new cabinets, network gear and AC units. We identified 35 test and development servers that we could move. With no objections, we loaded them in a truck on Wednesday and by 11:00 they were in our data center. By 5:00 they were racked and had power cables plugged in. The next day we ran all the network cables and powered them on. We also designed the cabinet layout in Visio, built a temperature monitoring system so we could make sure we weren’t cooking anything, figured out how to get the networks to live in both places, and got final approval for the move weekends, one on 2/29 and one the following week.
  9. Week 3: We virtualized more systems over the weekend, figured out power was harder than we thought, and found a few more cabinets that we could move to our lab area if needed. They will need rack-mount UPSes if we do that, since there isn’t redundant power in the labs. The first third of our cabinets showed up, along with some custom-length power cords. 18” is the right length; 8’ is too long. We also built our ESX environment on new hardware in Andover. This will allow us to move the virtual machines nearly live. We also started testing our backup system to make sure we could import the catalog in case we needed to restore old tapes.
  10. We did figure out that where we planned to place cabinets was also where the permanent AC unit was going so we had to redo the design. It all still fit, but required a bit of work.
  11. The backup gear and the remainder of the cabinets arrived and we started placing them, installing network switches and running fiber. We got our 4 portable ACs and, after running some extension cords, got them powered up and cooling. We were also still removing old systems and pulled another 15 out, reducing power draw by 30 amps.
  12. Many of the bus ducts were fed from a 200-amp circuit, but much of the power we used came from a 400-amp circuit, so we had to run extension cords across the various rooms to get power where it was needed.
  13. Move 1 started Friday at 2. We had figured out which systems could move: anything critical or attached to the Fibre Channel SAN had to wait; anything else was fair game. We even broke redundancy on some systems to even out the load and reduce the risk for the more critical move. We started breaking down at 3PM with the admins on the phone. They would tell us when a system was down, and we’d run over, pull it out and put it on the pile. The pile was decided based on where it was going, not where it came from. We had a worksheet for each one that we would use to keep us on track. Saturday AM, team 2 started racking servers, and by the time team 1 got back on site we could start network cabling. By Saturday night we had brought up some of the systems, and Sunday we were able to finish it all up.
  14. We reviewed on Monday afternoon what we did well and what didn’t go well. When we looked at power we knew there was no way we had enough so we bit the bullet and moved the 2 racks of engineering servers to the lab.
  15. Moving these 35 servers from the MDF to the lab freed up 50 amps of power. They went to an area of the lab with very little equipment running so power and cooling was not an issue. We had to run fiber and an N1 to provide connectivity.
  16. Friday night we broke down everything and got it on the truck, then followed the truck from Boston to Andover and unloaded the pallets. Then we went to bed and the next team started racking the servers in the cabinets. When we got back up around 8 and got in the office, we started cabling. By the end of Saturday all the servers were pretty much ready to turn on. We brought up some of the base infrastructure but left most of it off until Sunday AM.
  17. Driving in Sunday and seeing a power truck at the end of the road was not how I wanted to start the day. Luckily almost all of the machines came up, in the right order, and with no major problems. One or two drives needed to be reseated and a power supply changed, but nothing catastrophic. By noon we were largely done testing and feeling pretty good
  18. At the end of it we moved around 150 servers, 160 fiber, 700 copper, and 260 power cords and added 80 backups to a new backup environment
  19. OK, so what did we learn? Tip 1: Plan everything. We have a guy that does these crazy detailed plans. This was his whiteboard for the first move. Literally, he would have a schedule that says “From 7:45 to 8:37, you will be cabling in cabinet r1c3.” Really down to the minute. And of course 5 minutes in, the plan was already off track, but it allowed us to space people out and have a base to start from.
  20. Tip #2: you absolutely have to have perfect documentation. We helped one customer move their data center, and when we asked about their docs they said they were good, probably 90% right… We spent 3 weeks helping them get it to 100% correct, literally using a paperclip on the cable to make sure we didn’t get anything wrong. A few other parts to this tip: if everyone has a different copy of the docs, throw all of them out and start over. One last bit on this: the docs can’t be online, at least not on the servers you are moving…
  21. We were really good about communicating our progress through the project. This kept almost everyone aware of what was going on. Almost. We did have one VP from South America call us just after we turned off the ERP system because he needed to process a hot order. Luckily we were able to power it back up and it didn’t delay us too long, but we also knew that if we didn’t bring it back we were covered. Also, during the moves we had everyone dial into a conference call and stay muted, so if we needed someone we could just ask for them. It also kept everyone tuned in to what was going on. Tip 3b: really make sure you are muted when going to the bathroom… different story.
  22. Tip 4: remember people are people. They don’t like cold temps, wind, or noise, so if you can turn stuff off, good. They also need to be fed regularly and stop for sleep. We did one data center move where we worked 38 hours straight. Everyone was completely useless halfway through it. Now we do teams: one team breaks down, then sleeps for 6-8 hours while team 2 racks and cables. We also try to have a separate team for testing and troubleshooting, but we’re a small team and that rarely happens as well as we would like. Everyone who helps is a team member. Many times other departments, contractors or subcontractors will help. Treat them as if they were your own employees. The success of your project depends on them too.
  23. Tip 5: sort of obvious, but make sure you have what you need. This is also why it is good practice to do a test move early. Finding a roll of tape, or carts, or screwdrivers Tuesday afternoon is easy; Saturday at 3AM, not so much… Some of the things we forgot: cable ties, tools, carts, a pallet jack, tape, paper (yes, including toilet paper).
  24. Go over the process ahead of time. A walkthrough the day before, and again an hour before, will make sure people don’t get confused when they start putting servers back where they came from. Shutdown and startup order is important. If you try to bring up servers before the domain, or some applications before the database servers, you can cause issues. If there is an order, make sure the people racking and cabling the servers leave them off until they are ready to come up. Make sure everyone understands the port numbering. If the switches go 1-24 on the top row and some people think it is odd on the top and even on the bottom, you will have problems. Same goes for power. We had some admins think A power was on the left and B power was on the right, but some cabinets had 4 PDUs, and some servers got plugged in with the thinking that front was A and back was B. When we did a power test and turned off B power, half of the servers went down. Luckily one of our admins was smart enough to point out that if a server didn’t lose either power supply you were still wrong, since you should have lost one and you had plugged both supplies into A.
  25. Probably one of the most important slides. A team of two will take 5 minutes to rack a server in the cabinet, 5 minutes per Ethernet cable and 2 minutes per power cable. On our last move, because we didn’t want to have to buy new cabinets and network gear, the total downtime was going to be 67 hours, so we would shut down at noon on Friday and be back up Monday at 8AM. Everyone seemed good with this, until 2 weeks before, when suddenly that was too long. Luckily we have moved enough that no one questioned the timings; instead we just agreed to buy new cabinets and network gear, and we were back up Saturday night instead. Once we walked them through the math it was an easy discussion.
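  To make that math concrete, here is a minimal, hypothetical batch-file sketch (in the spirit of the .bat tooling mentioned in the note for Tip 10). The server, cable and team counts are illustrative placeholders, not the actual move inventory; only the 5/5/2-minute figures come from Tip 7.

  @echo off
  rem Hypothetical downtime estimate using the per-task timings from Tip 7.
  rem SERVERS, NET_CABLES, PWR_CABLES and TEAMS are made-up example counts.
  set /a SERVERS=110
  set /a NET_CABLES=700
  set /a PWR_CABLES=260
  set /a TEAMS=2
  rem 5 min to rack a server, 5 min per Ethernet cable, 2 min per power cable
  set /a TOTAL_MIN=SERVERS*5 + NET_CABLES*5 + PWR_CABLES*2
  set /a WALL_HRS=TOTAL_MIN / TEAMS / 60
  echo Total work: %TOTAL_MIN% task-minutes
  echo Roughly %WALL_HRS% hours of wall-clock time with %TEAMS% teams in parallel

  Plugging in the real counts ahead of time makes the "why will this take all weekend?" conversation a short one.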
  26. Always have a plan B and plan C. When we moved our data center the building we were moving to had these elevators that would pretty regularly stop working and could be down for 6 hours or more while we got the repair guys in. We actually had a backup plan where we would put the servers on carts, and then have teams that would carry them one by one up to the third floor. In our plan we calculated how much more time this would take and had people on standby just in case we needed people to be “runners”.
  27. Some of this may sound silly, but as much as you can do prior to the shutdown, you should do. Even things like unwrapping the patch cables and taking the ties off of them can save a lot of time. 30 seconds per cable doesn’t sound like much, but when there are 360 of them that’s almost 3 hours’ worth of time. Interestingly enough, labeling the cables ahead of time is a horrible idea. It seems like a really good idea, but inevitably they will get mixed up and you will spend much more time trying to find the right cable than if you just label it at the time; having the labels printed helps. Also, if you can reboot the servers a week early, this helps in case you have applied patches and forgotten to reboot. Problems that have nothing to do with the move will get caught early and you won’t run down a rat hole thinking they have something to do with the move when they don’t. Develop a priority list ahead of time. When you get to the end and stuff isn’t all working is not when you want someone deciding what’s more important. It’s likely that whoever is making that call is already tired and probably going to make a stupid decision. Avoid that: plan early on what can wait and what needs to be running.
  28. Plan time to test and troubleshoot. We actually built a quick, dirty little script that just pinged all the servers before the move and showed what was still up or still down, so at a glance we could tell how much was left to do. It was a .bat file (for those that remember those), since we wanted it to run when nothing else was online. It even pinged by IP, not hostname, since DNS might not be running. We also had a full test plan, including placing orders, releasing shipments, printing reports, etc. Testing and the corresponding troubleshooting always take longer than you expect, but the more testing you do before, the fewer problems you will have on Monday. When we finally moved into our “permanent” home, the CFO didn’t believe we had actually moved, because everything was working on Monday.
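  The original script wasn’t preserved with the deck, but a minimal sketch of the idea might look like the following; the servers.txt file name and its one-IP-per-line format are assumptions for illustration.

  @echo off
  rem Sketch of the "what is still down?" check described above.
  rem Assumes a hypothetical servers.txt with one IP address per line.
  rem Pings by IP on purpose, since DNS may not be back up yet.
  for /f "usebackq delims=" %%A in ("servers.txt") do (
      ping -n 1 -w 1000 %%A >nul 2>&1 && (echo UP    %%A) || (echo DOWN  %%A)
  )

  Run it once before the shutdown to capture a baseline, then rerun it during bring-up and watch the DOWN list shrink.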
  29. Finally, after you are done, review and recognize what you accomplished. We track every issue we had and its resolution, and how much time each step took, which is why we know how many minutes an Ethernet cable takes. When we complete our moves we send out a summary email to the company on the amount of stuff we moved and the issues we found before we unleashed the users on the systems. One of the things I’ve found is that the more users know about the effort that goes into these moves, the more understanding they are when things go wrong. We actually use Chatter, which is sort of an internal Facebook-style app for informal communications, and during the move we post photos and updates to the company. It’s a SaaS application from Salesforce, so it works even when we are moving our data center… and it’s paid dividends with our user community. In fact, we have even had departments bring us donuts on the Monday after the moves to show their appreciation.
  30. One last tip I’ve learned. If you are the manager for the project or team, be on site and involved in the move, not in the way, and don’t try to help, but get coffee, food, snacks, coil up the old cables, sweep the floor, etc.  Let’s be honest, you probably aren’t that much help, but having you there emphasizes how important the project is. Besides, if it goes really badly at least you know you can sleep in late on Monday. You’re probably going to get fired anyway, might as well be rested.