Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public Cloud Scale
1.
© Copyright 2015
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
The OpenStack™ attribution statement should be used: The OpenStack wordmark and the Square O Design, together or in part, are trademarks or registered trademarks of OpenStack Foundation in the United States and other countries, and are used with the OpenStack Foundation’s permission.
Vancouver OpenStack® Summit
2.
Maintaining and Operating Swift at Public Cloud Scale
Lorcan Browne, Donagh McCabe
May 18, 2015
3.
Agenda
• Swift in Helion Public Cloud
• Monitoring
• Swift Runbook
• Deployments/Operations
• Q & A
4.
Swift in HP Helion Public Cloud
5.
HP Cloud Swift Ecosystem
[Diagram elements: Swift; Service Monitoring/Customer Issues; Deployments/Operations; Operations Runbook; Data Center Operators; Tech Ops; Network Operations Center; Swift Service Team; OpenStack Core Team]
6.
Swift at Scale: HP Helion Public Cloud
Two data centers
• Over 3 years operational
• 18 PB of raw storage
• 3.5 billion objects
• 130 proxy servers
• ~700 storage nodes
• ~8,000 storage drives
Features
• 3 replicas
• 3 "availability" zones
• Single storage policy
• Upstream Swift, except for:
  – Content Delivery Network
  – Support for legacy SWAUTH accounts
7.
Server Racking Example
• 1x rack: 1,620 "TB" raw; replica count: 3; usable: 80%
• 3x rack: ~1 PB usable
• Proxy/Account/Container: 6x HP DL360p Gen9, 4x HP 800 GB SSD
• Swift object servers: 18x HP DL380 G9, 15x HP 6 TB SAS 7.2K drives
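The rack figures can be checked with a little arithmetic. The sketch below assumes that the 80% "usable" figure is a fill ratio applied after dividing raw capacity by the replica count, and that drive sizes are vendor (decimal) TB. Under those assumptions, three racks give about 1,296 TB (about 1.15 PiB), which the slide rounds to ~1 PB.

```python
# Rack capacity arithmetic for the racking example (a sketch).
# Assumption: "usable: 80%" is a fill ratio applied after dividing raw
# capacity by the replica count; drive sizes are vendor (decimal) TB.

SERVERS_PER_RACK = 18    # HP DL380 G9 object servers per rack
DRIVES_PER_SERVER = 15   # 6 TB SAS 7.2K drives per server
DRIVE_TB = 6
REPLICAS = 3
FILL_RATIO = 0.80


def raw_tb(racks: int) -> float:
    """Raw object-server capacity in decimal TB."""
    return racks * SERVERS_PER_RACK * DRIVES_PER_SERVER * DRIVE_TB


def usable_tb(racks: int) -> float:
    """Capacity left for user data after replication and fill headroom."""
    return raw_tb(racks) / REPLICAS * FILL_RATIO


def tb_to_pib(tb: float) -> float:
    """Convert decimal TB (10**12 bytes) to PiB (2**50 bytes)."""
    return tb * 10**12 / 2**50


print(f"1 rack raw:     {raw_tb(1):.0f} TB")
print(f"3 racks usable: {usable_tb(3):.0f} TB ({tb_to_pib(usable_tb(3)):.2f} PiB)")
```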
8.
Data Center – Failure Zones
[Diagram: three load balancers (LB) and three availability zones (AZ1, AZ2, AZ3)]
9.
Objects – Number and Size
• Most objects are small
• The bulk of objects are 1 KB to 64 KB
• A tiny fraction (0.01%) of objects are very large
[Pie chart: share of objects by size bucket (< 1 KB, 1 KB–100 KB, 100 KB–100 MB, > 100 MB); values shown: 37%, 55%, 8%, 0%]
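A distribution like this can be reproduced from a list of object sizes. One possible source of sizes is the `bytes` field of JSON container listings. The sketch below buckets sizes using the chart's boundaries. Two assumptions apply: units are binary (1 KB = 1024 bytes), and each bucket includes its lower bound.

```python
import bisect
from collections import Counter
from typing import Dict, Iterable

KB = 1024
MB = 1024 * KB

# Lower bounds of the upper three buckets; each bucket includes its lower bound.
# Assumption: binary units (1 KB = 1024 bytes).
BOUNDS = [1 * KB, 100 * KB, 100 * MB]
LABELS = ["< 1KB", "1KB-100KB", "100KB-100MB", "> 100MB"]


def bucket(size: int) -> str:
    """Return the size-bucket label for an object of `size` bytes."""
    return LABELS[bisect.bisect_right(BOUNDS, size)]


def size_histogram(sizes: Iterable[int]) -> Dict[str, float]:
    """Percentage of objects in each size bucket."""
    counts = Counter(bucket(s) for s in sizes)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}
```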
10.
Objects – Space Usage
[Pie chart of capacity used: small objects 52%, large objects 47%, containers 1%]
• Half of capacity is used by 0.01% of objects
• Account and container databases take ~1% of the space used by objects
11.
Millions of Operations per Day (normalized to 1 PB of user data)
[Bar chart: PUTs 4M, GETs 5.8M, DELETEs 1.5M, Other 10M]
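Daily counts are easier to compare with proxy capacity when they are expressed as request rates. The sketch below converts the charted figures to average operations per second. It assumes that load scales linearly with the amount of stored data. Averages hide daily peaks, so real provisioning needs headroom above these numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Operations per day per 1 PB of user data, read off the chart above.
OPS_PER_DAY_PER_PB = {"PUT": 4.0e6, "GET": 5.8e6, "DELETE": 1.5e6, "Other": 10.0e6}


def ops_per_second(user_data_pb: float) -> dict:
    """Average request rate per operation type, assuming linear scaling with stored data."""
    return {op: n * user_data_pb / SECONDS_PER_DAY for op, n in OPS_PER_DAY_PER_PB.items()}


rates = ops_per_second(1.0)
print({op: round(r, 1) for op, r in rates.items()})
print("total:", round(sum(rates.values()), 1), "ops/s per PB")
```

At 1 PB this comes to roughly 250 requests per second on average, across all operation types.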
12.
PUT/GET Size – per 1 PB of User Data
[Pie chart of bytes transferred: PUTs 83%, GETs 17%]
• Normalized to 1 PB of user data
• PUTs: 2.5 TB per day
• GETs: 508 GB per day
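The byte totals can be combined with the request counts from the operations-per-day chart to estimate mean request sizes. This sketch makes two assumptions: decimal units, and that the byte totals and the request counts describe the same set of PUT and GET requests. Under them, the mean PUT is about 625 KB and the mean GET about 88 KB. Because of the heavy tail in object sizes, these means sit far above the typical object.

```python
# Daily figures per 1 PB of user data (decimal units assumed).
PUT_BYTES_PER_DAY = 2.5e12   # 2.5 TB
GET_BYTES_PER_DAY = 508e9    # 508 GB
PUTS_PER_DAY = 4.0e6         # from the operations-per-day chart
GETS_PER_DAY = 5.8e6

put_share = PUT_BYTES_PER_DAY / (PUT_BYTES_PER_DAY + GET_BYTES_PER_DAY)
avg_put_kb = PUT_BYTES_PER_DAY / PUTS_PER_DAY / 1e3
avg_get_kb = GET_BYTES_PER_DAY / GETS_PER_DAY / 1e3

print(f"PUT share of bytes: {put_share:.0%}")   # matches the 83% / 17% split
print(f"mean PUT size: {avg_put_kb:.0f} KB, mean GET size: {avg_get_kb:.0f} KB")
```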
13.
Monitoring
Service Monitoring, Monitoring/Alerting
14.
Uptime/Latency Monitor: swift-uptime-mon
Over several cycles, we visit every server in the system.
• 60 seconds between cycles
• ~100 requests:
  – GET/PUT/DELETE object requests
  – GET /healthcheck requests
• Measures and logs:
  – "Soft" failure: any failure
  – "Hard" failure: still fails after a retry
  – Latency (average/max in cycle)
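The slide does not show how swift-uptime-mon works internally. The sketch below illustrates only the soft/hard failure distinction it describes, and several parts of it are assumptions:
• Each probe is a hypothetical callable that returns True on success.
• A probe is retried exactly once.
• Exceptions count as failures.
• Latency is recorded for the first attempt only.
The 60-second scheduling and the actual HTTP requests are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

Probe = Callable[[], bool]   # hypothetical: returns True if the request succeeded


@dataclass
class CycleStats:
    requests: int = 0
    soft_failures: int = 0     # failed on the first attempt
    hard_failures: int = 0     # still failed after one retry
    latencies: List[float] = field(default_factory=list)

    @property
    def avg_latency(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0

    @property
    def max_latency(self) -> float:
        return max(self.latencies, default=0.0)


def _attempt(probe: Probe) -> bool:
    """Run a probe, treating any exception as a failure."""
    try:
        return bool(probe())
    except Exception:
        return False


def run_cycle(probes: List[Probe]) -> CycleStats:
    """Run one monitoring cycle; latency is taken from the first attempt only."""
    stats = CycleStats()
    for probe in probes:
        stats.requests += 1
        start = time.monotonic()
        ok = _attempt(probe)
        stats.latencies.append(time.monotonic() - start)
        if not ok:
            stats.soft_failures += 1
            if not _attempt(probe):    # single retry
                stats.hard_failures += 1
    return stats
```

A scheduler would call `run_cycle` every 60 seconds with a list of probes. Each probe would be an object GET/PUT/DELETE or a `GET /healthcheck` against one server. After each cycle, the scheduler would log the failure counts and `avg_latency`/`max_latency`.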
15.
External View of Uptime – 2 Years of Pingdom
[Chart: two years of external uptime monitoring from Pingdom]
Rounded! Actually 99.998%.
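To put 99.998% in perspective, an availability fraction can be converted into total downtime over the measurement period. The sketch below assumes 365-day years and ignores how Pingdom samples. Under those assumptions, 99.998% over two years corresponds to about 21 minutes of downtime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # leap days ignored


def downtime_minutes(availability: float, years: float) -> float:
    """Total downtime implied by an availability fraction over a period."""
    return (1.0 - availability) * years * MINUTES_PER_YEAR


print(round(downtime_minutes(0.99998, 2), 1))
```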
16.
Smoke Tests: Jenkins Job
• Emphasis on features that require external support (specifically Keystone); runs in a few minutes
• Not regression or functional testing: we cover that elsewhere in the development cycle
• Runs hourly, and more frequently before, during and after software deploys
17.
Monitoring/Alerts
Obvious
• swift-recon: async_pending
• Hardware:
  – Drive status
  – NIC status
• Services running?
  – PIDs still there?
  – Responds to /healthcheck?
• Keystone validating tokens?
Less obvious
• Hardware:
  – NIC speeds
  – I/O wait times (next slide)
  – Firmware versions
• Replication time
• SSL certs approaching end of life?
• Number of file descriptors
• NTP operating?
• Connectivity to port 11211 (memcached)
• DNS failures
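Two of the "less obvious" checks can be written with the Python standard library alone: certificate expiry and memcached connectivity. The sketch below illustrates them and is not the monitoring the authors used. The host names, ports and any alert threshold are the operator's choice.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (getpeercert() string format)."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def peer_cert_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter field of a server certificate over a verified TLS connection."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def port_reachable(host: str, port: int = 11211, timeout: float = 2.0) -> bool:
    """True if a TCP connection can be opened (11211 is memcached's usual port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, an alert could fire when `days_until_expiry(peer_cert_not_after("swift.example.com")) < 30`. Both the host name and the 30-day threshold in that example are hypothetical.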
18.
Drive I/O Wait Times
From the collectl toolkit: a histogram of the number of requests in each time-range bucket.
% disk-anal.pl -d
Disk: sda Wait: 62804 8 1 0 0 0 0 0 0 0 0
Disk: sdb Wait: 54901 886 138 14 70 52 1 0 15 0 0
...
Disk: sdk Time: 63410 9 11 0 0 0 0 0 0 0 0
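Output like this can be turned into an alert by flagging drives that have an unusual share of requests outside the fastest bucket. The slide does not give the bucket boundaries, so both parts of this heuristic are assumptions: the "tail fraction" measure and the 1% threshold. The parser accepts any label after the device name, because the `sdk` line shows `Time:` rather than `Wait:`.

```python
import re
from typing import Dict, List

# "Disk: <dev> <label>: <count> <count> ..."; the label is not assumed to be "Wait".
LINE_RE = re.compile(r"^Disk:\s+(\S+)\s+\w+:\s+(\d+(?:\s+\d+)*)$")


def parse_histograms(text: str) -> Dict[str, List[int]]:
    """Map each device to its list of per-bucket request counts."""
    hists: Dict[str, List[int]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            hists[m.group(1)] = [int(n) for n in m.group(2).split()]
    return hists


def tail_fraction(counts: List[int]) -> float:
    """Fraction of requests outside the first (fastest) bucket."""
    total = sum(counts)
    return (total - counts[0]) / total if total else 0.0


def slow_drives(text: str, threshold: float = 0.01) -> List[str]:
    """Devices whose tail fraction exceeds the (assumed) threshold."""
    hists = parse_histograms(text)
    return sorted(d for d, c in hists.items() if tail_fraction(c) > threshold)
```

With the sample output above, only `sdb` stands out: about 2% of its requests fall outside the first bucket, against well under 0.1% for `sda` and `sdk`.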
19.
Viewing Monitoring/Alert Data
20.
Dashboard – for Metrics
• Trend spotting
• 24 hours and 7 days
• Uses Public Cloud analytics pipeline
• collectd/rrdtool
• Vertica database
21.
Swift Runbook
22.
Runbook
[Diagram: Swift Runbook contents: CMDB, basic systems checks, Swift operations, monitoring results explanation, log interpretation, how-to-fix procedures; audiences: Data Center Operators, Tech Ops, Network Operations Center]
• Essential for scale and continued development
• Populated by the service team
• Consumed by NOC/Ops teams
• Continuously updated
• Automated where possible
23.
Case Study – Unrecoverable Read Errors
Background:
• Data, once written, cannot later be read (URE)
• As seen in Swift:
  – In objects: Swift automatically quarantines (renames) the object, and the object-replicator creates a new copy
  – In filesystem metadata: swift-drive-audit scans kern.log for evidence
Issue:
• The NOC was spending significant time fixing problems manually
• Warnings were not going away immediately after the fix
Action:
• Improve the efficiency and reports of swift-drive-audit
• Automate repair
• Revalidate "problem" sectors (clears old alarms)
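swift-drive-audit looks for evidence of failing drives in kern.log. The sketch below shows the general idea by counting, per device, the kernel log lines that mention an error. The two regular expressions are illustrative assumptions, not a copy of the tool's configuration. Real kernels and drivers word these messages differently. The automated repair and sector revalidation described above are not shown.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative patterns: "error" and an sdX device name on the same line,
# in either order. Tune per kernel/driver in practice.
ERROR_PATTERNS = [
    re.compile(r"\berror\b.*\b(sd[a-z]{1,2}\d?)\b"),
    re.compile(r"\b(sd[a-z]{1,2}\d?)\b.*\berror\b"),
]


def devices_with_errors(lines: Iterable[str]) -> Counter:
    """Count kernel-log lines that look like I/O errors, per device."""
    hits: Counter = Counter()
    for line in lines:
        for pattern in ERROR_PATTERNS:
            m = pattern.search(line)
            if m:
                hits[m.group(1)] += 1
                break  # count each line once
    return hits
```

In practice the lines would come from reading `/var/log/kern.log`. A device whose count keeps rising is a candidate for the repair procedure in the runbook.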
24.
Deployments/Operations
25.
Making Software/Configuration Changes
[Diagram: Development/Test systems → QA production-mimic system → Production system]
26.
Deploying Changes to Production
[Flowchart: Start → system pre-check → deploy to a single AZ → system operating? (No: revert back to original code) → more AZs? (Yes: repeat for the next AZ; No: end deploy)]
• System pre-check:
  – Smoke tests
  – Icinga
  – Dashboard
• Check before, during and after the deploy
• Currently using Chef infrastructure
• 1 availability zone at a time
• 2/3 of the system always in a usable state
• Some deploys require rolling restarts:
  – Reload (not restart)
  – Limit to ~10 servers at a time
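The flow above can be sketched as a loop over availability zones. The slide says the real deploys use Chef. In this sketch, `deploy`, `reload`, `revert` and `system_ok` are hypothetical hooks standing in for a Chef run, a service reload, a rollback, and the smoke tests/Icinga/dashboard checks. The rollback policy is also an assumption: on any failed check, every server touched so far is reverted.

```python
from typing import Callable, Dict, List, Sequence

Hook = Callable[[str], None]   # hypothetical per-server action


def batches(items: Sequence[str], size: int) -> List[List[str]]:
    """Split a server list into groups of at most `size`."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def rolling_deploy(zones: Dict[str, List[str]], deploy: Hook, reload: Hook,
                   revert: Hook, system_ok: Callable[[], bool],
                   batch_size: int = 10) -> bool:
    """Deploy one availability zone at a time, ~batch_size servers at a time."""
    touched: List[str] = []

    def fail() -> bool:
        for server in reversed(touched):     # revert back to original code
            revert(server)
        return False

    for servers in zones.values():
        if not system_ok():                  # pre-check before each AZ
            return fail()
        for group in batches(servers, batch_size):
            for server in group:
                deploy(server)
                reload(server)               # reload rather than restart
                touched.append(server)
            if not system_ok():              # check during the deploy
                return fail()
    return True
```

Because only one zone changes at a time, the other two zones, and therefore two of the three replicas, stay on known-good code throughout.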
27.
Managing the Ring
[Flowchart: CSV file → diff; CSV-builder generates add/remove commands → swift-ring-builder → package rings into a Debian package → deploy rings to /var/cache/swift → checksum match? (Yes: copy rings to /etc/swift; No: deploy fails). Checksum path: add checksum to system config → deploy checksum to nodes]
Example CSV:
# Ring parameters
acc_conf,15,3,24
con_conf,15,3,24
obj_conf,15,3,24
# IP, ZONE, TYPE
*10.184.9.123,1,SE1170s_3
*10.184.9.124,1,SE1170s_3
*10.184.10.1,1,SE1170s_3
*10.184.10.2,1,SE1170s_3
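The ring-parameter rows line up with the arguments of `swift-ring-builder <builder> create <part_power> <replicas> <min_part_hours>`: partition power 15, 3 replicas and 24 hours. The sketch below shows one way a "CSV-builder" diff step could produce commands. Several parts of it are hypothetical:
• the mapping from `acc_conf`/`con_conf`/`obj_conf` to builder files;
• the device list and weight for each hardware type;
• the region, and the object-server port;
• the handling of the leading `*`, which the slide does not explain and the sketch simply ignores.
The `remove <ip>` and `add r<region>z<zone>-<ip>:<port>/<device> <weight>` forms follow swift-ring-builder's search-value and device syntax.

```python
import csv
from typing import Dict, List, Tuple

# Assumed mapping from the CSV's ring-parameter keys to builder files.
BUILDERS = {"acc_conf": "account.builder", "con_conf": "container.builder",
            "obj_conf": "object.builder"}
# Hypothetical: devices and weight per hardware type, plus region and port.
DEVICES_BY_TYPE = {"SE1170s_3": (["sdb", "sdc"], 100)}
REGION = 1
OBJECT_PORT = 6000

Hosts = Dict[str, Tuple[int, str]]   # ip -> (zone, hardware type)


def parse_ring_csv(text: str) -> Tuple[Dict[str, List[int]], Hosts]:
    """Split the CSV into ring parameters and hosts, skipping comments."""
    params: Dict[str, List[int]] = {}
    hosts: Hosts = {}
    lines = [ln.strip() for ln in text.splitlines()]
    for row in csv.reader(ln for ln in lines if ln and not ln.startswith("#")):
        key = row[0].strip()
        if key in BUILDERS:
            # part_power, replicas, min_part_hours
            params[key] = [int(v) for v in row[1:4]]
        else:
            ip = key.lstrip("*")   # meaning of the leading '*' not given; ignored
            hosts[ip] = (int(row[1]), row[2].strip())
    return params, hosts


def create_commands(params: Dict[str, List[int]]) -> List[str]:
    return [f"swift-ring-builder {BUILDERS[k]} create {p} {r} {h}"
            for k, (p, r, h) in params.items()]


def diff_commands(old: Hosts, new: Hosts, builder: str = "object.builder") -> List[str]:
    """swift-ring-builder commands for hosts removed from or added to the CSV."""
    cmds = [f"swift-ring-builder {builder} remove {ip}" for ip in sorted(set(old) - set(new))]
    for ip in sorted(set(new) - set(old)):
        zone, hw_type = new[ip]
        devices, weight = DEVICES_BY_TYPE[hw_type]
        for dev in devices:
            cmds.append(f"swift-ring-builder {builder} add "
                        f"r{REGION}z{zone}-{ip}:{OBJECT_PORT}/{dev} {weight}")
    cmds.append(f"swift-ring-builder {builder} rebalance")
    return cmds
```

When a search value matches several devices, swift-ring-builder may ask for confirmation. Any generated command list should therefore be reviewed before it is run.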
28.
Deploying Racks – Sequencing Ring Changes
• A series of ring changes is often required
• Swift Dispersion Report: checks a sample of containers and objects. Is each replica in its canonical location?
• Replication time: increases dramatically after a ring deploy, because partitions are shifted between devices
• We don't change rings for drive/server failures
• Swift proxies are not in the ring, so they are much easier to add or remove
[Flowchart: system pre-check → replication time? (active: wait) → dispersion report? (branches labelled "NO 100%" and ">3 copies") → deploy ring changes → wait → more ring changes? (Yes: repeat; No: end)]
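The gating logic in this flowchart reduces to a small decision function. In the sketch below, `ClusterState` is a hypothetical summary, and the inputs are assumptions. The replication flag would come from the operator's own replication-time monitoring. The percentage would come from swift-dispersion-report. The ">3 copies" branch is not modelled because the slide does not spell out its handling. swift-ring-builder separately enforces `min_part_hours` (24 in the ring CSV above), which limits how often any one partition can be moved. That acts as a complementary guard.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    replication_active: bool   # replication still moving partitions after the last change
    dispersion_pct: float      # % of sampled copies found in their canonical location


def next_ring_step(state: ClusterState) -> str:
    """Decide whether the next ring change in a series may be deployed."""
    if state.replication_active:
        return "wait: replication still running"
    if state.dispersion_pct < 100.0:
        return "wait: dispersion report below 100%"
    return "deploy next ring change"
```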
29.
Rack Deployments over One Month
[Chart: replication time during one month of rack deployments]
30.
Operating Swift – Summary
Lessons
• Make procedures repeatable
• Monitor everything
• Keep on top of problems and faults
• However, don't panic when async_pending is high
Day to day
• General break-fix
• UREs (swift-drive-audit)
• Reviewing system state
• User queries
31.
Q & A