CEPH@DeutscheTelekom
A 2+ Years Production Liaison
Ievgen Nelen, Gerd Pruessmann - Deutsche Telekom AG, DBU Cloud Services, P&I
Speakers
Ievgen Nelen & Gerd Prüßmann
• Cloud Operations Engineer
• Ceph cuttlefish
• Openstack diablo
• @eugene_nelen
• i.nelen@telekom.de
• Head of Platform Engineering
• CEPH argonaut
• Openstack cactus
• @2digitsLeft
• g.pruessmann@telekom.de
Overview
the business case
Overview
Business Marketplace
• https://portal.telekomcloud.com/
• SaaS applications from software partners (ISVs) and DT, offered to SME customers
• e.g. Saperion, Sage, PadCloud, Teamlike, Fastbill, Imeet, Weclapp, SilverERP, Teamdisk ...
• Complements other cloud offerings from Deutsche Telekom (Enterprise Cloud from T-Systems, Cisco Intercloud, Mediencenter etc.)
• IaaS platform based only on Open Source technologies like OpenStack, CEPH and Linux
• Project started in 2012 with OS Essex, CEPH in production since 3/2013 (bobtail)
Overview
Why open source? Why Ceph?
• no vendor lock-in!
• easier to change and adopt new technologies / concepts - more independent from vendor priorities
• low cost of ownership and operation, utilizing commodity hardware and Open Source
• no license fees - but professional support
• modular and horizontally scalable platform
• automation and flexibility allow for faster deployment cycles than in traditional hosting
• control over open source code - faster bug fixing and feature delivery
DETAILS
BASICS
DETAILS
ceph basics
• Bobtail > Cuttlefish > Dumpling > Firefly (0.80.9)
• Multiple CEPH clusters
• overall raw capacity 4.8 PB
• One S3 cluster (~810 TB raw capacity - 15 storage nodes - 3 MONs)
• multiple smaller RBD clusters for REF, LIFE and DEV
• S3 storage for cloud native apps (Teamdisk, Teamlike) and for backups (e.g. of RBD volumes)
• RBD for persistent volumes / data via OpenStack Cinder (e.g. DB volumes)
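As a sketch of how RBD is wired into Cinder, a minimal RBD backend section could look like the following; the pool name, cephx user and secret UUID are placeholders for illustration, not the production values:

# /etc/cinder/cinder.conf (illustrative sketch)
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = <libvirt secret uuid>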
Details
ceph basics
DETAILS
Hardware
DETAILS
hardware
• Supermicro
2x Intel Xeon E5-2640 v2
@ 2.00GHz
64GB RAM
7x SSDs
18x HDDs
• Seagate Terascale
ST4000NC000
4TB HDDs
• LSI MegaRAID SAS 9271-8i
• 18 OSDs per node: RAID1 with 2 SSDs for /, 3x RAID0 with 1 SSD each for journals, 18x RAID0 with 1 HDD each for OSDs
• 2x10Gb network adapters
DETAILS
hardware
• Supermicro
1x Intel Xeon E5-2650L
@ 1.80GHz
64GB RAM
36x HDDs
• Seagate Barracuda
ST3000DM001
3TB HDDs
• LSI MegaRAID
SAS 9271-8i
• 10 OSDs per node: RAID1 for /, 10x RAID0 with 1 HDD each for journals, 10x RAID0 with 2 HDDs each for OSDs
• 2x10Gb network adapters
Details
Configuration & deployment
details
configuration & deployment
• Razor
• Puppet
• https://github.com/TelekomCloud/puppet-ceph
• dm-crypt disk encryption
• osd location
• XFS
• 3 replicas (a config sketch for these settings follows this list)
• OMD/Check_mk http://omdistro.org/
• ceph-dash https://github.com/TelekomCloud/ceph-dash for dashboard and API
• check_mk plugins (cluster health, OSDs, S3)
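A minimal ceph.conf sketch covering the settings listed above (3 replicas, XFS, OSD location); the concrete values are assumptions for illustration, not the production configuration:

# /etc/ceph/ceph.conf (illustrative)
[global]
osd pool default size = 3                 # 3 replicas
[osd]
osd mkfs type = xfs
osd mount options xfs = noatime,inode64
osd crush location = root=default room=room1 rack=rack01   # example OSD location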
Details
performance tuning
details
performance tuning
• Problem - low IOPS, IOPS drops
• fio
• Enable RAID0 writeback cache
• Use separate disks for Ceph journals (better: use SSDs - scale-out project)
• Problem - recovery / backfilling consumes a lot of CPU, decreases performance
• osd_recovery_max_active = 1 (number of active recovery requests per OSD at one time)
• osd_max_backfills = 1 (maximum number of backfills allowed to or from a single OSD)
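A sketch of how these throttles can be applied, both persistently in ceph.conf and at runtime, plus a typical fio random-write run for the IOPS measurements (file name and sizes are illustrative):

# /etc/ceph/ceph.conf
[osd]
osd recovery max active = 1
osd max backfills = 1

# apply at runtime without restarting the OSDs
ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1'

# example fio benchmark (4k random writes, 32 outstanding IOs)
fio --name=randwrite --filename=/mnt/test/fio.dat --size=4G --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based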
details
performance tests – current hardware / IO
details
performance tests – current hardware / bandwidth
lessons learned
lessons learned
operational experience
• Choose your hardware well!
• i.e. RAID and hard disks -> enterprise grade disks (desktop HDDs are missing important features like TLER/ERC)
• CPU/RAM planning: calculate 1 GHz CPU power and 2 GB RAM per single OSD
• pick nodes with low storage capacity density for smaller clusters
• At least 5 nodes for a 3-replica cluster (e.g. for PoC, testing and development purposes)
• Cluster configuration "adjustments":
• increasing PG num -> impact on the cluster because of massive data migration (see the commands sketched after this list)
• Rolling software updates / upgrades worked perfectly
• CEPH has a character - but highly reliable - never lost data
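The PG increase mentioned above is done in two steps; a minimal sketch, assuming a pool named rbd and a target of 2048 PGs (both values are illustrative, not the production numbers):

ceph osd pool set rbd pg_num 2048    # create the additional placement groups
ceph osd pool set rbd pgp_num 2048   # start rebalancing data onto them - this triggers the migration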
lessons learned
operational experience
• Failed / "slow" disks
• Inconsistent PGs
• Incomplete PGs
• RBD pool configured with min_size=2
• Blocks IO operations to the pool / cluster
• fixed in Hammer (allows PG recovery while the replica count is below the pool's min_size)
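Before Hammer, the usual workaround for IO blocked by min_size was to lower it temporarily so the remaining replica can serve requests while recovery runs; a sketch, assuming the pool is named rbd:

ceph osd pool get rbd min_size    # shows: min_size: 2
ceph osd pool set rbd min_size 1  # temporarily allow IO with a single replica
# ... wait for recovery to complete, then restore the original value:
ceph osd pool set rbd min_size 2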
/var/log/syslog.log
Apr 12 04:59:47 cephosd5 kernel: [12473860.669262] sd 6:2:10:0: [sdk] Unhandled error code

root@cephosd5:/var/log# mount | grep sdk
/dev/mapper/cephosd5-journal-sdk on /var/lib/ceph/osd/journal-disk9

root@cephosd5:/var/log# grep journal-disk9 /etc/ceph/ceph.conf
osd journal = /var/lib/ceph/osd/journal-disk9/osd.151-journal

/var/log/ceph/ceph-osd.151.log.1.gz
2015-04-12 04:59:47.891284 7f8a10c76700 -1 journal FileJournal::do_write:
pwrite(fd=25, hbp.length=4096) failed: (5) Input/output error
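A sketch of the triage steps for a failed journal disk like the one above, using standard Ceph CLI commands (the OSD id 151 is taken from the log; the exact replacement procedure depends on the node layout):

ceph health detail          # lists inconsistent / incomplete PGs and slow requests
ceph pg dump_stuck unclean  # shows PGs affected by the failed OSD
ceph osd out 151            # take the affected OSD out so its data re-replicates
# after replacing the disk and recreating the journal, bring the OSD back:
ceph osd in 151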
lessons learned
incomplete PGs - what happened?
[Diagram: three OSD nodes, each with two OSDs and their journals, and the placement groups spread across them - illustrating how the incomplete PGs arose]
glimpse of the future
Overview
SCALE OUT Project
+40%
Current overall capacity:
• ~60 storage nodes
• 5.4 PB storage gross
• ~0.5 PB S3 storage net
Planned capacity for 2015:
• ~90 storage nodes
• 7.5 PB storage gross
• ~1.5 PB S3 storage net
Future setup
scale out project
• 2 physically separated rooms
• Data distributed according to the rule:
• not more than 2 replicas in one room
• not more than 1 replica in one rack
Future setup
New crushmap rules
rule myrule {
ruleset 3
type replicated
min_size 1
max_size 10
step take default
step choose firstn 2 type room
step chooseleaf firstn 2 type rack
step emit
}
crushtool -i real7 --test --show-statistics --rule 3 --min-x 1 --max-x 1024 --num-rep 3 --show-mappings
CRUSH rule 3 x 1 [12,19,15]
CRUSH rule 3 x 2 [14,16,13]
CRUSH rule 3 x 3 [3,0,7]
…
Listing 1: crushmap rule / Listing 2: simulate 1024 objects
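For reference, the usual round trip to test and activate such a rule on a running cluster (file names are illustrative):

ceph osd getcrushmap -o crushmap.bin       # export the current map
crushtool -d crushmap.bin -o crushmap.txt  # decompile, then add the rule from Listing 1
crushtool -c crushmap.txt -o crushmap.new  # recompile
crushtool -i crushmap.new --test --rule 3 --num-rep 3 --show-mappings   # dry run as in Listing 2
ceph osd setcrushmap -i crushmap.new       # activate on the cluster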
Future setup
dreams
• cache tiering (a command sketch follows this list)
• make use of shiny new SSDs in a hot zone / cache pool
• SSD pools
• OpenStack live migration for VMs (boot from RBD volume)
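A sketch of what the cache tiering setup could look like once the SSD pool exists; the pool names are assumptions, and a production setup would also need hit_set and target sizing parameters:

ceph osd tier add rbd ssd-cache               # attach ssd-cache as a tier of the rbd pool
ceph osd tier cache-mode ssd-cache writeback  # cache writes, flush to the backing pool
ceph osd tier set-overlay rbd ssd-cache       # route client IO through the cache tier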
Q & A
QUESTIONS & ANSWERS
• Ievgen Nelen
• @eugene_nelen
• i.nelen@telekom.de
• Gerd Prüßmann
• @2digitsLeft
• g.pruessmann@telekom.de
