SF BAY AREA CEPH USERS GROUP

INAUGURAL MEETUP

Thursday, January 16, 14
AGENDA
Intro to Ceph
Ceph Networking
Public Topologies
Cluster Topologies
Network Hardware

THE FORECAST

By 2020, over 39 ZB of data will be stored.
1.5 ZB are stored today.

THE PROBLEM

 Existing systems don’t scale
 Increasing cost and complexity
 Need to invest in new platforms ahead of time

[Chart: growth of data vs. IT storage budget, 2010 to 2020]

THE SOLUTION

PAST: SCALE UP

FUTURE: SCALE OUT

CEPH
INTRO TO CEPH
 Distributed storage system
 Horizontally scalable
 No single point of failure
 Self healing and self managing
 Runs on commodity hardware
 GPLv2 License

ARCHITECTURE

SERVICE COMPONENTS, PART 1
MONITOR
 PAXOS for consensus
 Maintain cluster state
 Typically 3-5 nodes
 NOT in write path

OSD
 Object storage interface
 Gossips with peers
 Data lives here

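
The "typically 3-5 nodes" guidance for monitors follows from majority voting: a Paxos-style quorum needs more than half of the monitors to agree. A minimal sketch of that arithmetic, with illustrative function names that are not part of Ceph:

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """Monitors that can be lost while a majority can still form."""
    return monitors - quorum_size(monitors)


for n in (3, 4, 5):
    print(f"{n} monitors: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this model four monitors tolerate no more failures than three, which is one reason odd monitor counts are the usual choice.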
SERVICE COMPONENTS, PART 2
RADOS GATEWAY
 Provides S3/Swift compatibility
 Scale out

METADATA
 Object storage interface
 Gossips with peers
 Dynamic subtree partitioning

CRUSH
 Ceph uses CRUSH for data placement
 Aware of cluster topology
 Statistically even distribution across pool
 Supports asymmetric nodes and devices
 Hierarchical weighting
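
To make "statistically even distribution" and "asymmetric devices" concrete, here is a toy placement function. It is not CRUSH: it swaps in weighted rendezvous hashing over a flat list of OSDs and ignores the hierarchy entirely. The OSD names and weights are hypothetical. What it shares with CRUSH is the idea that any client can compute placement deterministically from the object name and the device weights, with no lookup table.

```python
import hashlib
import math


def _unit_hash(key: str) -> float:
    """Deterministically map a string to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (x + 0.5) / 2**52


def place(obj: str, weights: dict[str, float], replicas: int) -> list[str]:
    """Pick `replicas` distinct OSDs for `obj`, favouring heavier OSDs.

    Weighted rendezvous hashing: every OSD gets a score derived from the
    (object, OSD) pair and its weight; the highest scores win.
    """
    scores = {
        osd: -weight / math.log(_unit_hash(f"{obj}/{osd}"))
        for osd, weight in weights.items()
        if weight > 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:replicas]


# Hypothetical cluster: osd.2 has twice the capacity of the others.
weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
print(place("rbd_data.1234", weights, replicas=3))
```

Over many objects, osd.2 should be chosen as the first replica roughly twice as often as each of its peers.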

DATA PLACEMENT

POOLS
 Groupings of OSDs
 Both physical and logical
 Volumes / Images
 Hot SSD pool
 Cold SATA pool
 DMCrypt pool

REPLICATION
 Original data durability mechanism
 Ceph creates N replicas of each RADOS object
 Uses CRUSH to determine replica placement
 Required for mutable objects (RBD, CephFS)
 More reasonable for smaller installations

ERASURE CODING (Firefly Release)
 (8:4) MDS code in example
 1.5x overhead
 8 units of client data to write
 4 parity units generated using FEC
 All 12 units placed with CRUSH
 8/12 total units to satisfy a read

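
The 1.5x figure is (8 + 4) / 8. A short sketch comparing it with replication; the replica count of 3 is an assumption chosen for illustration, since the slides only say "N replicas":

```python
from fractions import Fraction


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw units stored per unit of client data with k data + m parity units."""
    return Fraction(k + m, k)


def replication_overhead(replicas: int) -> Fraction:
    """Raw units stored per unit of client data with full copies."""
    return Fraction(replicas)


ec = erasure_overhead(8, 4)
rep = replication_overhead(3)  # assumed replica count
print(f"EC 8+4: {float(ec)}x raw, survives loss of 4 units")
print(f"3 replicas: {float(rep)}x raw, survives loss of 2 copies")
print(f"Raw capacity for 100 TB usable: EC {float(100 * ec)} TB, "
      f"replication {float(100 * rep)} TB")
```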
CLIENT COMPONENTS
Native API
 Mutable object store
 Many language bindings
 Object classes

CephFS
 Linux Kernel CephFS client since 2.6.34
 FUSE client
 Hadoop JNI bindings

CLIENT COMPONENTS
Block Storage
 Linux Kernel RBD client since 2.6.37
 KVM/QEMU integration
 Xen integration

S3/Swift
 RESTful interfaces (HTTP)
 CRUD operations
 Usage accounting for billing

Ceph Networking
INFINIBAND
 Currently only supported via IPoIB
 Accelio (libxio) integration in Ceph is in early stages
 Accelio supports multiple transports: RDMA, TCP, and Shared-Memory
 Accelio supports multiple RDMA transports (IB, RoCE, iWARP)

ETHERNET
 Tried and true
 Proven at scale
 Economical
 Many suitable vendors

10GbE or 1GbE
 Cost of 10GbE trending downward
 White box switches turning up heat on vendors
 Twinax relatively inexpensive and low power
 SFP+ is versatile wrt distance
 Single 10GbE for object
 Dual 10GbE for block storage (public/cluster)
 Bonding many 1GbE links adds lots of complexity
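
A back-of-the-envelope way to weigh 1GbE against 10GbE is to ask how long it takes to move one drive's worth of data. The sketch below assumes a hypothetical 4 TB payload and an ideal link running at line rate with no protocol overhead, so real numbers will be worse:

```python
def transfer_hours(data_tb: float, link_gbps: float, links: int = 1) -> float:
    """Hours to move `data_tb` terabytes over `links` ideal links of `link_gbps`."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * links)
    return seconds / 3600


for label, gbps, links in (("1GbE", 1, 1), ("4 x 1GbE bond", 1, 4), ("10GbE", 10, 1)):
    print(f"{label}: {transfer_hours(4, gbps, links):.2f} h for 4 TB")
```

The bonded figure is a best case: bonds usually hash each flow onto a single member link, so one stream may still be limited to 1 Gb/s.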

IPv4 or IPv6 Native
 It’s 2014, is this really a question?
 Ceph fully supports both modes of operation
 Hierarchical allocation models allow “roll up” of routes
 Optimal efficiency in RIB
 Some tools believe the earth is flat
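
The "roll up" of routes can be shown with Python's standard `ipaddress` module. The prefixes below come from the IPv6 documentation range and the one-/64-per-rack layout is an assumption for illustration:

```python
import ipaddress


def summarize(prefixes: list[str]) -> list:
    """Collapse contiguous prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(networks))


# Hypothetical plan: four racks, one /64 each, allocated contiguously.
racks = [f"2001:db8:0:{i:x}::/64" for i in range(4)]
print(summarize(racks))  # one /62 covers all four racks
```

With contiguous allocation the upstream routers carry one route instead of four, which is the RIB efficiency the slide refers to.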

LAYER 2
 Spanning tree
 Switch table size
 Broadcast domains (ARP)
 MAC frame checksum
 Storage protocols (FCoE, ATAoE)
 TRILL, MLAG
 Layer 2 DCI is crazy pants
 Layer 2 tunneled over internet is super crazy pants

LAYER 3
 Address and subnet planning
 Proven scale at big web shops
 Error detection only on IP header (IPv4 header checksum)
 Equal cost multi-path (ECMP)
 Reasonable for inter-site connectivity
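
ECMP is typically implemented by hashing each flow's 5-tuple and using the result to pick one of several equal-cost next hops, so packets of one flow stay in order on one path. A minimal sketch, assuming SHA-256 as a stand-in for whatever hash a given switch implements in hardware; the addresses, ports, and spine names are hypothetical:

```python
import hashlib


def ecmp_next_hop(src_ip: str, dst_ip: str, proto: int,
                  src_port: int, dst_port: int, next_hops: list[str]) -> str:
    """Pick a next hop for a flow by hashing its 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(next_hops)
    return next_hops[index]


spines = ["spine1", "spine2", "spine3", "spine4"]
# Same flow always maps to the same spine; different source ports spread out.
for port in (40000, 40001, 40002):
    print(port, ecmp_next_hop("10.0.0.1", "10.0.1.1", 6, port, 6800, spines))
```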

Public Topologies
CLIENT TOPOLOGIES
 Path diversity for resiliency
 Minimize network diameter
 Consistent hop count to minimize net long tail latency
 Ease of scaling
 Tolerate adversarial traffic patterns (fan-in/fan-out)

FOLDED CLOS
 Sometimes called Fat Tree or Spine and Leaf
 Minimum 4 fixed switches, grows to 10k+ node fabrics
 Rack or cluster oversubscription possible
 Non-blocking also possible
 Path diversity

[Diagram: folded Clos fabric; spine switches (S) above groups of nodes numbered 1, 2, ..., N]

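
Oversubscription and fabric size in a two-tier spine and leaf design reduce to port arithmetic. The sketch below assumes each leaf has one uplink to every spine, and the port counts and speeds are hypothetical examples, not recommendations from the slides:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    down_ports: int    # ports facing servers
    down_gbps: float
    up_ports: int      # ports facing spines (one per spine assumed)
    up_gbps: float


def oversubscription(leaf: Leaf) -> float:
    """Ratio of server-facing to spine-facing bandwidth; 1.0 is non-blocking."""
    return (leaf.down_ports * leaf.down_gbps) / (leaf.up_ports * leaf.up_gbps)


def max_nodes(leaf: Leaf, spine_ports: int) -> int:
    """Each spine port connects one leaf, so spine port count caps leaf count."""
    return spine_ports * leaf.down_ports


leaf = Leaf(down_ports=48, down_gbps=10, up_ports=4, up_gbps=40)
print(f"oversubscription {oversubscription(leaf):.1f}:1")
print(f"max nodes with 32-port spines: {max_nodes(leaf, 32)}")
print(f"equal-cost paths between two leaves: {leaf.up_ports}")
```

Halving the server-facing ports per leaf, or doubling the uplinks, moves the same design toward non-blocking at the cost of more switch ports per node.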
Cluster Topologies
REPLICA TOPOLOGIES
 Replica and erasure fan-out
 Recovery and remap impact on cluster bandwidth
 OSD peering
 Backfill served from primary
 Tune backfills to avoid large fan-in
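
Fan-out on the cluster network can be estimated with a simplified model: the primary OSD receives a client write and ships the remaining copies or chunks to its peers. The sketch ignores recovery, journaling, and metadata traffic, and the replica count of 3 is an assumption for illustration:

```python
def replica_cluster_traffic(client_write_gb: float, replicas: int) -> float:
    """GB sent by the primary to its replica peers for one client write."""
    return client_write_gb * (replicas - 1)


def erasure_cluster_traffic(client_write_gb: float, k: int, m: int) -> float:
    """GB sent by the primary when it keeps one of k+m chunks and ships the rest.

    Each chunk is 1/k of the client write.
    """
    return client_write_gb * (k + m - 1) / k


print("3 replicas:", replica_cluster_traffic(1.0, 3), "GB to 2 peers per GB written")
print("EC 8+4:", erasure_cluster_traffic(1.0, 8, 4), "GB to 11 peers per GB written")
```

In this model erasure coding moves fewer bytes than 3x replication but touches many more peers per write, which is why the topology needs to tolerate wide fan-out.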

FOLDED CLOS
 Sometimes called Fat Tree or Spine and Leaf
 Minimum 4, grows to 10k+ node fabrics
 Rack or cluster oversubscription possible
 Non-blocking also possible
 Path diversity

[Diagram: folded Clos fabric; spine switches (S) above groups of nodes numbered 1, 2, ..., N]

N-WAY PARTIAL MESH

EVALUATE
 Replication
 Erasure coding
 Special purpose vs general purpose
 Extra port cost

Network Hardware
Features
 Buffer sizes
 Cut through vs store and forward
 Oversubscribed vs non-blocking
 Automation and monitoring
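
The cut through vs store and forward trade-off comes down to serialization delay: a store and forward switch must receive the whole frame before sending it on, while a cut through switch can start forwarding once it has read the header. A rough sketch of the per-hop cost, with frame sizes and hop count chosen for illustration:

```python
def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Microseconds needed to clock one frame onto a link."""
    return frame_bytes * 8 / (link_gbps * 1e3)


def store_and_forward_us(frame_bytes: int, link_gbps: float, hops: int) -> float:
    """Added latency when every switch hop waits for the full frame."""
    return hops * serialization_delay_us(frame_bytes, link_gbps)


for size in (1500, 9000):
    for gbps in (1, 10):
        print(f"{size} B at {gbps} GbE: "
              f"{serialization_delay_us(size, gbps):.1f} us per hop, "
              f"{store_and_forward_us(size, gbps, 3):.1f} us over 3 hops")
```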

FIXED
 Fixed switches can easily build large clusters
 Easier to source
 Smaller failure domains
 Fixed designs have many control planes
 Virtual chassis... L3 split-brain hilarity?

FEWER SKUs
 Utilize as few vendor SKUs as possible
 If permitted, use same fixed switch for spine and leaf
 More affordable to have spares on site or more spares
 Quicker MTTR when gear is ready to go

Thanks to our host!

Kyle Bader
Sr. Solutions Architect

kyle@inktank.com

 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 

SF Ceph Users Jan. 2014

  • 1. SF BAY AREA CEPH USERS GROUP INAUGURAL MEETUP Thursday, January 16, 14
  • 2. AGENDA: Intro to Ceph; Ceph Networking; Public Topologies; Cluster Topologies; Network Hardware
  • 3. THE FORECAST: By 2020 over 39 ZB of data will be stored. 1.5 ZB are stored today.
  • 4. THE PROBLEM: Existing systems don’t scale; Increasing cost and complexity; Need to invest in new platforms ahead of time. [Chart: growth of data vs. IT storage budget, 2010 to 2020]
  • 5. THE SOLUTION: PAST: SCALE UP; FUTURE: SCALE OUT
  • 7. INTRO TO CEPH: Distributed storage system; Horizontally scalable; No single point of failure; Self healing and self managing; Runs on commodity hardware; GPLv2 License
  • 9. SERVICE COMPONENTS (PART 1): MONITOR: PAXOS for consensus; Maintain cluster state; Typically 3-5 nodes; NOT in write path. OSD: Object storage interface; Gossips with peers; Data lives here
  • 10. SERVICE COMPONENTS (PART 2): RADOS GATEWAY: Provides S3/Swift compatibility; Scale out. METADATA: Object storage interface; Gossips with peers; Dynamic subtree partitioning
  • 11. CRUSH: Ceph uses CRUSH for data placement; Aware of cluster topography; Statistically even distribution across pool; Supports asymmetric nodes and devices; Hierarchical weighting
  • 13. POOLS: Groupings of OSDs; Both physical and logical; Volumes / Images; Hot SSD pool; Cold SATA pool; DMCrypt pool
  • 14. REPLICATION: Original data durability mechanism; Ceph creates N replicas of each RADOS object; Uses CRUSH to determine replica placement; Required for mutable objects (RBD, CephFS); More reasonable for smaller installations
  • 15. ERASURE CODING (Firefly Release): (8:4) MDS code in example; 1.5x overhead; 8 units of client data to write; 4 parity units generated using FEC; All 12 units placed with CRUSH; 8/12 total units to satisfy a read
  • 16. CLIENT COMPONENTS: Native API: Mutable object store; Many language bindings; Object classes. CephFS: Linux Kernel CephFS client since 2.6.34; FUSE client; Hadoop JNI bindings
  • 17. CLIENT COMPONENTS: Block Storage: Linux Kernel RBD client since 2.6.37+; KVM/QEMU integration; Xen integration. S3/Swift: RESTful interfaces (HTTP); CRUD operations; Usage accounting for billing
  • 19. INFINIBAND: Currently only supported via IPoIB; Accelio (libxio) integration in Ceph is in early stages; Accelio supports multiple transports (RDMA, TCP and Shared-Memory); Accelio supports multiple RDMA transports (IB, RoCE, iWARP)
  • 20. ETHERNET: Tried and true; Proven at scale; Economical; Many suitable vendors
  • 21. 10GbE or 1GbE: Cost of 10GbE trending downward; White box switches turning up heat on vendors; Twinax relatively inexpensive and low power; SFP+ is versatile wrt distance; Single 10GbE for object; Dual 10GbE for block storage (public/cluster); Bonding many 1GbE links adds lots of complexity
  • 22. IPv4 or IPv6 Native: It’s 2014, is this really a question?; Ceph fully supports both modes of operation; Hierarchical allocation models allow “roll up” of routes; Optimal efficiency in RIB; Some tools believe the earth is flat
  • 23. LAYER 2: Spanning tree; Switch table size; Broadcast domains (ARP); MAC frame checksum; Storage protocols (FCoE, ATAoE); TRILL, MLAG; Layer 2 DCI is crazy pants; Layer 2 tunneled over internet is super crazy pants
  • 24. LAYER 3: Address and subnet planning; Proven scale at big web shops; Error detection only on TCP header; Equal cost multi-path (ECMP); Reasonable for inter-site connectivity
  • 26. CLIENT TOPOLOGIES: Path diversity for resiliency; Minimize network diameter; Consistent hop count to minimize net long tail latency; Ease of scaling; Tolerate adversarial traffic patterns (fan-in/fan-out)
  • 27. FOLDED CLOS: Sometimes called Fat Tree or Spine and Leaf; Minimum 4 fixed switches, grows to 10k+ node fabrics; Rack or cluster oversubscription possible; Non-blocking also possible; Path diversity. [Diagram: switches labeled S above groups labeled 1, 2, ... N]
  • 29. REPLICA TOPOLOGIES: Replica and erasure fan-out; Recovery and remap impact on cluster bandwidth; OSD peering; Backfill served from primary; Tune backfills to avoid large fan-in
  • 30. FOLDED CLOS: Sometimes called Fat Tree or Spine and Leaf; Minimum 4, grows to 10k+ node fabrics; Rack or cluster oversubscription possible; Non-blocking also possible; Path diversity. [Diagram: switches labeled S above groups labeled 1, 2, ... N]
  • 32. EVALUATE: Replication; Erasure coding; Special purpose vs general purpose; Extra port cost
  • 34. Features: Buffer sizes; Cut through vs store and forward; Oversubscribed vs non-blocking; Automation and monitoring
  • 35. FIXED: Fixed switches can easily build large clusters; Easier to source; Smaller failure domains; Fixed designs have many control planes; Virtual chassis.. L3 split brain hilarity?
  • 36. LESS SKU: Utilize as few vendor SKUs as possible; If permitted, use same fixed switch for spine and leaf; More affordable to have spares on site or more spares; Quicker MTTR when gear is ready to go
  • 37. Thanks to our host!
  • 38. Kyle Bader, Sr. Solutions Architect, kyle@inktank.com
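
The storage overhead quoted on the erasure coding slide (slide 15) can be checked with a short calculation. The sketch below is only an illustration of that arithmetic, not anything taken from Ceph itself: the function names are hypothetical, and the replica count of 3 used for comparison is an assumption, since the replication slide only says "N replicas".

```python
from fractions import Fraction


def erasure_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw units stored per unit of client data for a (data:parity) code."""
    return Fraction(data_units + parity_units, data_units)


def replication_overhead(replicas: int) -> Fraction:
    """Raw units stored per unit of client data with N full replicas."""
    return Fraction(replicas)


# The (8:4) example from slide 15: 8 data units plus 4 parity units.
k, m = 8, 4
ec = erasure_overhead(k, m)
print(f"erasure coding ({k}:{m}) overhead: {float(ec)}x")
print(f"units placed: {k + m}, units needed to satisfy a read: {k}")
# With an MDS code any k of the k + m units are enough, so up to m may be lost.
print(f"units that can be lost: {m}")

# Assumption: N = 3 is picked here purely for comparison.
rep = replication_overhead(3)
print(f"3x replication overhead: {float(rep)}x")
```

Under these assumptions the (8:4) code stores 12 units for every 8 units of client data, which matches the 1.5x overhead on the slide, while three full replicas store 3x.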