Scaling an Academic Cloud with Ceph
28.04.2015 | Berlin, Germany
Ceph Day Berlin
Christian Spindeldreher
Enterprise Technologist
Dell EMEA
The Cloud
The Software-Defined Datacenter
Defining “software-defined”
The capabilities
• Compute
• Storage/availability
• Networking/security & management
The benefits
• Automated & simplified
• Unlimited agility
• Maximum efficiency
SDN, SDS, SDC, SDE
The basics
Data plane vs. control plane:
• Traditional system: purpose-built hardware & software
• Software-defined: general-purpose hardware; open standard, e.g., OpenFlow
• Next-gen compute block: purpose-built function virtualized in general-purpose hardware, delivered as a service
The Cloud Operating System
Manage the Resources…
Ceph and OpenStack
Ceph in Academia & Research
CLIMB project
picture from http://westcampus.yale.edu
• Collaboration between 4 universities: Birmingham, Cardiff, Swansea & Warwick
• Ceph environment across the 4 sites
– part of an HPC cloud that deploys virtual resources for microbial bioinformatics (e.g. DNA sequencer output,…)
– data shared across the sites
– robust solution with a low €/TB ratio for mid/long-term storage
– Ceph solution by OCF, Inktank* & Dell
– more information:
http://www.climb.ac.uk
* now Red Hat
CLIMB project
• 4 Ceph Clusters
– 6.9PB raw capacity (total)
– 3 replicas (at least 1 remote): 2.3PB usable capacity
– server infrastructure (per site)
› 5 MON nodes
› 2 Gateway nodes
– R420, 4x 10GbE
› 27 OSD nodes
– R730xd, 16x 4TB, 2 SSDs, 2x 10GbE
– network infrastructure
› Brocade VDX6740T switches
– 48x 10GbE, 4x 40GbE
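The CLIMB capacity figures are consistent with each other, which a few lines of arithmetic make easy to check. This is an illustrative sketch, assuming that the 6.9PB raw figure counts only the HDDs in the OSD nodes (27 nodes × 16 × 4TB per site, decimal units) and that all data lives in 3-replica pools; SSDs and free-space headroom are ignored, and all constant and function names are hypothetical.

```python
# Capacity arithmetic for the CLIMB figures (decimal units: 1 PB = 1000 TB).
# Assumption: raw capacity = HDDs in the OSD nodes only; SSDs are not counted.

SITES = 4
OSD_NODES_PER_SITE = 27
DRIVES_PER_NODE = 16
DRIVE_TB = 4
REPLICAS = 3  # every object is stored 3 times (at least 1 copy remote)


def raw_capacity_tb(sites: int, nodes: int, drives: int, drive_tb: int) -> int:
    """Total raw HDD capacity across all sites, in TB."""
    return sites * nodes * drives * drive_tb


def usable_capacity_tb(raw_tb: float, replicas: int) -> float:
    """Usable capacity of a replicated pool, before any free-space headroom."""
    return raw_tb / replicas


raw = raw_capacity_tb(SITES, OSD_NODES_PER_SITE, DRIVES_PER_NODE, DRIVE_TB)
usable = usable_capacity_tb(raw, REPLICAS)
print(f"raw: {raw / 1000:.1f} PB, usable: {usable / 1000:.1f} PB")
```

Running it prints `raw: 6.9 PB, usable: 2.3 PB`, matching the slide: 6,912TB raw divided by 3 replicas gives 2,304TB usable.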
S3IT – Central IT, University of Zurich (UZH)
• UZH – some interesting facts
– 26,000 enrolled students; Switzerland's largest university
– member of the League of European Research Universities (LERU)
– internationally renowned in medicine, immunology, genetics, neuroscience, structural biology, economics,…
› 12 UZH scholars have been awarded the Nobel Prize
• Scale-out storage for a scientific cloud (based on OpenStack)
– based on Ceph
– commodity components
– Ethernet network
– good balance between performance, capacity & cost
picture: http://www.hausarztmedizin.uzh.ch/index.html
S3IT – Central IT, University of Zurich (UZH)
• Requirements for High-Capacity Tier
– 4.2PB raw capacity (1st batch)
› Cinder volumes, Glance images, ephemeral VM disks, radosgw (S3-like object storage)
› replication, erasure coding & cache tiering
– R630 + 2x MD1400 JBOD
› 24x 4TB NL-SAS
› 6x 800GB SSD (in R630)
• Requirements for High-Performance Tier
– 112TB raw capacity (1st batch)
› block access
› SSD pool, replicated
– R630
› 8x 1.6TB SSD
• Network
– scale-out 40GbE back-bone:
2x Z9500 (132x 40GbE in 3RU)
– ToR: S4810 (48x 10GbE, 4x 40GbE)
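The high-capacity tier combines replication and erasure coding, and the two trade raw capacity for fault tolerance very differently. A minimal sketch, assuming the whole 4.2PB raw batch backs a single pool and using a hypothetical 4+2 erasure-code profile (the slide does not state k and m):

```python
# Usable capacity vs. fault tolerance: replication vs. erasure coding.
# Assumptions: the whole 4.2 PB raw batch backs one pool; k=4, m=2 is a
# hypothetical erasure-code profile, not taken from the slide.

RAW_TB = 4200


def replicated(raw_tb: float, size: int) -> tuple[float, int]:
    """Return (usable TB, copies that may be lost) for a replicated pool."""
    return raw_tb / size, size - 1


def erasure_coded(raw_tb: float, k: int, m: int) -> tuple[float, int]:
    """Return (usable TB, chunks that may be lost) for a k+m erasure-coded pool."""
    # Each object is split into k data chunks plus m coding chunks.
    return raw_tb * k / (k + m), m


rep_usable, rep_loss = replicated(RAW_TB, size=3)
ec_usable, ec_loss = erasure_coded(RAW_TB, k=4, m=2)
print(f"3x replication: {rep_usable:.0f} TB usable, survives {rep_loss} lost copies")
print(f"EC 4+2:         {ec_usable:.0f} TB usable, survives {ec_loss} lost chunks")
```

Under these assumptions both layouts survive two failures, but the erasure-coded pool yields twice the usable capacity (2,800TB vs. 1,400TB). In Ceph releases of this period, RBD images could not be placed directly on an erasure-coded pool, which is one reason erasure coding usually appears together with a replicated cache tier.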
Requirements in Academia, Science & Research today
What we see…
• Ceph Stand-Alone vs. OpenStack-related
• Large Scale Environments
– 5PB / 20PB / 100PB target capacity
– usually object storage
• Multi-Site Environments
– cross-site replication
– unified object space
– searchable metadata
› out-of-scope for Ceph?!
Design Considerations
Infrastructure Considerations – Storage Nodes
• Form Factors
– Small Nodes vs. Big Nodes vs. Super-Nodes
– Node Count
– Ethernet-based Drives
• Use of SSDs
– Journaling
– Cache Tiering
– SSD-only Pools
– Check new SSD Types
› PCIe, form factors (1.8" size), write endurance,…
Infrastructure Considerations – Storage Node Example
• Storage Node: R730xd
– 2 RU
– 1 or 2 CPUs
– local drives
› 16x 3.5" HDD slots (+ 2x 2.5" for boot)
– up to 6TB per drive today (96TB total)
› 24x 2.5" HDD slots (+ 2x 2.5" for boot)
› 8x 3.5" HDD slots + 18x 1.8" SSDs (+ 2x 2.5" for boot)
– highly flexible system
– JBOD expansion optional
Infrastructure Considerations – Storage Node Example
• Head Node: R630
– 1 RU
– 1 or 2 CPUs
– local drives
› 10x 2.5" HDD slots or
› 24x 1.8" SSDs
› could host write journals, a cache tier or SSD-only pools (then without a JBOD)
• JBOD: MD3060e
– 4 RUs
– SAS attach
– 60x 3.5" HDD slots
› up to 6TB per drive today (360TB total)
• VoC (voice of the customer), example:
– “Write journal on SSD has no real impact with 60 HDDs”
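The choice between small and big nodes largely determines how much data has to move when a single node fails. A rough sketch comparing the two node examples (R730xd with 16x 6TB, R630 + MD3060e with 60x 6TB), assuming a hypothetical 2.88PB raw cluster built from either node type, a hypothetical 70% fill level, and data spread evenly across nodes; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    raw_tb: int  # raw HDD capacity of one node, TB


def failure_impact(node: NodeType, cluster_raw_tb: int, fill_pct: float):
    """Return (nodes, % of capacity lost, TB to re-replicate, fill % afterwards)."""
    nodes = cluster_raw_tb // node.raw_tb
    share_pct = 100 / nodes
    # Only the data actually stored on the failed node has to be recovered.
    to_recover_tb = node.raw_tb * fill_pct / 100
    # The same amount of data must now fit on one node fewer.
    fill_after_pct = fill_pct * nodes / (nodes - 1)
    return nodes, share_pct, to_recover_tb, fill_after_pct


CLUSTER_RAW_TB = 2880  # hypothetical: same raw capacity built from either node type
FILL_PCT = 70          # hypothetical fill level before the failure

for node in (NodeType("R730xd, 16x 6TB", 96), NodeType("R630 + MD3060e, 60x 6TB", 360)):
    n, share, tb, after = failure_impact(node, CLUSTER_RAW_TB, FILL_PCT)
    print(f"{node.name}: {n} nodes, one failure = {share:.1f}% of capacity, "
          f"~{tb:.0f} TB to recover, {after:.1f}% full afterwards")
```

Under these assumptions a failed R730xd means roughly 67TB to recover across 29 survivors, while a failed MD3060e building block means roughly 252TB, and the remaining seven blocks climb from 70% to 80% full. Big nodes therefore need more spare capacity and a faster network to recover within the same time window.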
Infrastructure Considerations – Network
• Client-facing vs. Cluster-internal IO
– be aware of replication traffic
• ToR
– 1x or 2x 10GbE Switch
› failure domain?!
– 40GbE Uplinks
• Distributed Core
– Scale-Out Core-Switch Design
– 40/50/100GbE Mesh
– Virtual Link Trunking (VLT) for HA/Load-Balancing
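Replication traffic is easy to underestimate because the client sends each write only once. A back-of-envelope sketch, assuming a replicated pool in which the primary OSD forwards every write to the other replicas over the cluster network; the function names are hypothetical, and the switch figures reuse the 48x 10GbE / 4x 40GbE port counts quoted for the ToR switches earlier.

```python
# Back-of-envelope network sizing for a replicated Ceph cluster.


def cluster_write_traffic(client_write_gbps: float, size: int) -> float:
    """Cluster-network traffic caused by client writes to a replicated pool.

    The client sends each write once to the primary OSD over the public
    network; the primary then forwards it to the other (size - 1) replicas.
    """
    return client_write_gbps * (size - 1)


def oversubscription(ports: int, port_gbps: int, uplinks: int, uplink_gbps: int) -> float:
    """Ratio of downstream to upstream bandwidth on a top-of-rack switch."""
    return (ports * port_gbps) / (uplinks * uplink_gbps)


print(cluster_write_traffic(10.0, 3))     # replication traffic for 10 Gbit/s of writes
print(oversubscription(48, 10, 4, 40))    # 48x 10GbE down, 4x 40GbE up
```

So 10Gbit/s of client writes into a size-3 pool adds about 20Gbit/s on the cluster network, before any recovery or backfill traffic, and a fully used ToR switch with those port counts is oversubscribed 3:1 towards the core. When one replica sits at a remote site, as in CLIMB, part of that forwarded traffic also crosses the inter-site link.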
Infrastructure Considerations – the Site/DC…
• Power & Cooling
– high density affects power, cooling & floor load
– example for 1 rack (42 RUs)
› R630 & MD3060e building block / 8 units
› input power: 21kW
› weight: ~1000kg
› raw capacity: 2.9PB
• Fresh Air Technology
– use higher air temperature for cooling
– 25°C vs. 30°C vs. 40°C
High Density: TACC Stampede Cluster
Dell Fresh Air Hot House,
Round Rock TX
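The rack example on the Power & Cooling slide can be reproduced, and extended to the 5PB / 20PB / 100PB targets mentioned earlier, with simple arithmetic. The sketch assumes 6TB drives, decimal units, 3x replication, and that those targets are usable (not raw) capacity; it ignores the free-space headroom Ceph needs before reaching its full ratio, so real rack counts would be higher. Names are hypothetical.

```python
import math

RACK_RU = 42
BLOCK_RU = 1 + 4          # R630 head node (1 RU) + MD3060e JBOD (4 RU)
BLOCKS_PER_RACK = 8
DRIVES_PER_JBOD = 60
DRIVE_TB = 6
REPLICAS = 3              # assumption: 3x replicated pools


def rack_raw_tb(blocks: int) -> int:
    """Raw HDD capacity of one rack in TB (decimal units)."""
    return blocks * DRIVES_PER_JBOD * DRIVE_TB


def racks_needed(usable_target_pb: float, rack_tb: int, replicas: int) -> int:
    """Racks needed to reach a usable capacity target with replicated pools."""
    raw_needed_tb = usable_target_pb * 1000 * replicas
    return math.ceil(raw_needed_tb / rack_tb)


# 8 blocks use 40 RU, leaving 2 RU (e.g. for ToR switches).
assert BLOCKS_PER_RACK * BLOCK_RU <= RACK_RU
per_rack = rack_raw_tb(BLOCKS_PER_RACK)
print(f"raw capacity per rack: {per_rack / 1000:.2f} PB")
for target in (5, 20, 100):
    print(f"{target} PB usable -> {racks_needed(target, per_rack, REPLICAS)} racks")
```

With 2,880TB raw per rack (the ~2.9PB on the slide), the three targets come out at 6, 21 and 105 racks respectively, which makes the 21kW and ~1000kg per rack figures a real site-planning constraint at the larger scales.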
Dell | Inktank (now Red Hat) Ceph Reference Architecture
HW + SW + Services
Hardware
HW Reference Architecture
• R730xd Servers
• Storage and compute
• Dell S/Z-Series Switches
Configuration
• Min. of 6 nodes: 3x MON + 3x Data
Software
Software
• Inktank ICE platform
• optional OpenStack cloud software
Operating System
• RHEL
• SUSE, Ubuntu,…
Access
• Object & Block (today)
Services
Deployment
• Onsite HW Install
• Onsite SW Install
• Whiteboard session & training
Support
• HW: Dell ProSupport
• SW: OpenStack support
Solution based on (e.g.):
• Server nodes:
• R730xd,…
• Fully populated drives
• Dell F10 10/40GbE switches
• Modules are flexible
Dell Solution Centers
• 30-90 minute briefings
• 1-4 hour Design Workshops
• 5-10 day Proofs-of-Concept for hands-on “prove-it”
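The reference architecture starts with 3 MON nodes, while CLIMB runs 5 per site. Odd counts are used because Ceph monitors only operate while a strict majority of them can form a quorum. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
# Monitor quorum arithmetic: Ceph monitors need a strict majority to operate.


def mon_quorum(n_mons: int) -> tuple[int, int]:
    """Return (quorum size, monitors that may fail) for n_mons monitors."""
    quorum = n_mons // 2 + 1
    return quorum, n_mons - quorum


for n in (3, 4, 5):
    q, f = mon_quorum(n)
    print(f"{n} MONs: quorum of {q}, tolerates {f} failure(s)")
```

Three monitors tolerate one failure and five tolerate two; a fourth monitor raises the quorum size without tolerating an extra failure, which is why monitor counts are normally odd.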
Thank You!
Christian_Spindeldreher@Dell.com

Anypoint Exchange: It’s Not Just a Repo!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Ceph Day Berlin: Scaling an Academic Cloud

  • 1. Scaling an Academic Cloud with Ceph
      Ceph Day Berlin, 28 April 2015, Berlin, Germany
      Christian Spindeldreher, Enterprise Technologist, Dell EMEA
  • 4. Defining “software-defined”
      • The capabilities
          – compute
          – storage/availability
          – networking/security & management
      • The benefits
          – automated & simplified
          – unlimited agility
          – maximum efficiency
      • SDN, SDS, SDC, SDE
      (Diagram, data plane vs. control plane: traditional system: purpose-built hardware & software; software-defined: general-purpose hardware with an open standard, e.g., OpenFlow; next-gen compute block: purpose-built function virtualized in general-purpose hardware, delivered as a service. Caption: The basics)
  • 5. The Cloud Operating System: Manage the Resources…
  • 7. Ceph in Academia & Research
  • 8. CLIMB project (picture from http://westcampus.yale.edu)
      • Collaboration between 4 universities: Birmingham, Cardiff, Swansea & Warwick
      • Ceph environment across the 4 sites
          – part of an HPC cloud that deploys virtual resources for microbial bioinformatics (e.g., DNA sequencer output, …)
          – shared data across the sites
          – robust solution with a low €/TB ratio for mid- to long-term storage
          – Ceph solution by OCF, Inktank* & Dell
          – more information: http://www.climb.ac.uk
      * now Red Hat
  • 9. CLIMB project
      • 4 Ceph clusters
          – 6.9PB raw capacity (total)
          – 3 replicas, at least 1 remote: 2.3PB usable capacity
          – server infrastructure (per site)
              › 5 MON nodes
              › 2 gateway nodes: R420, 4x 10GbE
              › 27 OSD nodes: R730xd, 16x 4TB, 2 SSDs, 2x 10GbE
          – network infrastructure
              › Brocade VDX6740T switches: 48x 10GbE, 4x 40GbE
  • 10. S3IT − Central IT, University of Zurich (UZH) (picture: http://www.hausarztmedizin.uzh.ch/index.html)
      • UZH: some interesting facts
          – 26,000 enrolled students: Switzerland's largest university
          – member of the League of European Research Universities (LERU)
          – international renown in medicine, immunology, genetics, neuroscience, structural biology, economics, …
              › 12 UZH scholars have been awarded the Nobel Prize
      • Scale-out storage for a scientific cloud (based on OpenStack)
          – based on Ceph
          – commodity components
          – Ethernet network
          – good balance between performance, capacity & cost
  • 11. S3IT − Central IT, University of Zurich (UZH)
      • Requirements for the high-capacity tier
          – 4.2PB raw capacity (1st batch)
              › Cinder volumes, Glance images, ephemeral disks of VMs, radosgw (S3-like object storage)
              › replication, erasure coding & cache tiering
          – R630 + 2x MD1400 JBOD
              › 24x 4TB nSAS
              › 6x 800GB SSD (in R630)
      • Requirements for the high-performance tier
          – 112TB raw capacity (1st batch)
              › block access
              › SSD pool, replicated
          – R630
              › 8x 1.6TB SSD
      • Network
          – scale-out 40GbE backbone: 2x Z9500 (132x 40GbE in 3RU)
          – ToR: S4810 (48x 10GbE, 4x 40GbE)
  • 12. Requirements in Academia, Science & Research Today: What We See…
      • Ceph stand-alone vs. OpenStack-related
      • Large-scale environments
          – 5PB / 20PB / 100PB target capacity
          – usually object
      • Multi-site environments
          – cross-site replication
          – unified object space
          – searchable metadata
              › out of scope for Ceph?!
  • 14. Infrastructure Considerations – Storage Nodes
      • Form factors
          – small nodes vs. big nodes vs. super-nodes
          – node count
          – Ethernet-based drives
      • Use of SSDs
          – journaling
          – cache tiering
          – SSD-only pools
          – check new SSD types
              › PCIe, form factors (1.8" size), write endurance, …
  • 15. Infrastructure Considerations – Storage Node Example
      • Storage node: R730xd
          – 2 RU
          – 1 or 2 CPUs
          – local drives
              › 16x 3.5" HDD slots (+ 2x 2.5" for boot): up to 6TB per drive today (96TB total)
              › 24x 2.5" HDD slots (+ 2x 2.5" for boot)
              › 8x 3.5" HDD slots + 18x 1.8" SSDs (+ 2x 2.5" for boot)
          – highly flexible system
          – JBOD expansion optional
  • 16. Infrastructure Considerations – Storage Node Example
      • Head node: R630
          – 1 RU
          – 1 or 2 CPUs
          – local drives
              › 10x 2.5" HDD slots or
              › 24x 1.8" SSDs
              › could host write journaling, cache tiering or SSD-only pools (then without a JBOD)
      • JBOD: MD3060e
          – 4 RU
          – SAS-attached to the head node
          – 60x 3.5" HDD slots
              › up to 6TB per drive today (360TB total)
      • VoC (voice of the customer, example)
          – "Write journal on SSD has no real impact with 60 HDDs"
  • 17. Infrastructure Considerations – Network
      • Client-facing vs. cluster-internal I/O
          – be aware of replication traffic
      • ToR
          – 1x or 2x 10GbE switch
              › failure domain?!
          – 40GbE uplinks
      • Distributed core
          – scale-out core-switch design
          – 40/50/100GbE mesh
          – Virtual Link Trunking (VLT) for HA/load balancing
  • 18. Infrastructure Considerations – the Site/DC…
      • Power & cooling
          – high density has some impacts
          – example for 1 rack (42 RUs)
              › R630 & MD3060e building block / 8 units
              › input power: 21kW
              › weight: ~1000kg
              › raw capacity: 2.9PB
      • Fresh Air Technology
          – use higher air temperature for cooling
          – 25°C vs. 30°C vs. 40°C
      (Pictures: High density: TACC Stampede Cluster; Dell Fresh Air Hot House, Round Rock TX)
  • 19. Dell|Inktank (now Red Hat) Ceph Reference Architecture: HW + SW + Services
      • Hardware
          – HW reference architecture: R730xd servers (storage and compute); Dell S/Z-Series switches
          – Configuration: min. of 6 nodes: 3x MON + 3x data
      • Software
          – Software: Inktank ICE platform; optional OpenStack cloud software
          – Operating system: RHEL; SUSE, Ubuntu, …
          – Access: object & block (today)
      • Services
          – Deployment: onsite HW install; onsite SW install; whiteboard session & training
          – Support: HW: Dell ProSupport; SW: OpenStack support
      • Solution based on (e.g.):
          – server nodes: R730xd, …
          – fully populated drives
          – Dell F10 10/40GbE switches
          – modules are flexible
  • 20. Dell Solution Centers
      • 30-90 minute briefings
      • 1-4 hour design workshops
      • 5-10 day proofs of concept for hands-on "prove-it"
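The CLIMB capacity figures on slide 9 (6.9PB raw, 2.3PB usable with 3 replicas) follow directly from the per-site hardware list. The short Python sketch below recomputes them, assuming decimal units (1 PB = 1000 TB) and counting only the 16x 4TB data drives per OSD node, not the SSDs.

```python
# Recompute the CLIMB capacity figures from slide 9.
# Assumption: decimal units (1 PB = 1000 TB), as drive vendors quote them.

SITES = 4
OSD_NODES_PER_SITE = 27
DRIVES_PER_NODE = 16   # 4TB data drives; the 2 SSDs per node are not counted
DRIVE_TB = 4
REPLICAS = 3           # "3 replicas, at least 1 remote"


def raw_capacity_tb(sites: int, nodes_per_site: int,
                    drives_per_node: int, drive_tb: float) -> float:
    """Raw capacity of all OSD data drives across every site."""
    return sites * nodes_per_site * drives_per_node * drive_tb


def usable_capacity_tb(raw_tb: float, replicas: int) -> float:
    """With size-N replication every object is stored N times."""
    return raw_tb / replicas


raw = raw_capacity_tb(SITES, OSD_NODES_PER_SITE, DRIVES_PER_NODE, DRIVE_TB)
usable = usable_capacity_tb(raw, REPLICAS)
print(f"raw:    {raw / 1000:.1f} PB")     # raw:    6.9 PB
print(f"usable: {usable / 1000:.1f} PB")  # usable: 2.3 PB
```

In practice the usable figure is an upper bound: by default Ceph starts warning when an OSD reaches the nearfull ratio (0.85) and stops accepting writes at the full ratio (0.95), so real clusters are planned with headroom below 2.3PB.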
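Slide 11 lists replication, erasure coding and cache tiering for the UZH high-capacity tier but does not say which erasure-code profile is used. The sketch below compares storage efficiency and fault tolerance for 3x replication and two hypothetical EC profiles (k=4, m=2 and k=8, m=3), applied to the 4.2PB raw first batch. The profiles are illustrative assumptions, not UZH's configuration, and the failure counts assume each replica or chunk sits in a separate failure domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProtection:
    name: str
    data_units: int    # k for erasure coding, 1 for replication
    coding_units: int  # m for erasure coding, size - 1 for replication

    @property
    def efficiency(self) -> float:
        """Fraction of raw capacity that holds user data."""
        return self.data_units / (self.data_units + self.coding_units)

    @property
    def failures_tolerated(self) -> int:
        """Replicas or chunks that can be lost without losing data."""
        return self.coding_units


def replicated(size: int) -> DataProtection:
    return DataProtection(f"replicated size={size}", 1, size - 1)


def erasure_coded(k: int, m: int) -> DataProtection:
    return DataProtection(f"EC k={k} m={m}", k, m)


RAW_PB = 4.2  # first batch of the high-capacity tier (slide 11)

# Hypothetical EC profiles, chosen only for comparison.
for scheme in (replicated(3), erasure_coded(4, 2), erasure_coded(8, 3)):
    print(f"{scheme.name:22} usable {RAW_PB * scheme.efficiency:.2f} PB, "
          f"tolerates {scheme.failures_tolerated} failure(s)")
```

The trade-off is capacity against performance and flexibility: at the time of this talk, RBD block images could not be placed directly on an erasure-coded pool (overwrite support for EC pools came later, in the Luminous release), which is one common reason EC was paired with a cache tier.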
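Slide 12 quotes 5PB, 20PB and 100PB target capacities, and slide 15 gives 96TB raw for a fully populated R730xd (16x 6TB). The sketch below turns those into node counts, which feeds the small-vs-big-node question on slide 14. The slides do not say whether the targets are raw or usable, so the default treats them as raw; the optional protection and fill-ratio parameters are assumptions you would set for a usable-capacity target.

```python
import math


def nodes_needed(target_pb: float, raw_tb_per_node: float,
                 protection_efficiency: float = 1.0,
                 fill_ratio: float = 1.0) -> int:
    """Smallest node count whose capacity reaches target_pb.

    With the defaults the target is treated as raw capacity. Pass, e.g.,
    protection_efficiency=1/3 (3x replication) and fill_ratio=0.8 to size
    for usable capacity with headroom instead (both values are assumptions).
    """
    effective_tb_per_node = raw_tb_per_node * protection_efficiency * fill_ratio
    return math.ceil(target_pb * 1000 / effective_tb_per_node)


R730XD_RAW_TB = 16 * 6  # 16x 3.5" slots with 6TB drives (slide 15)

for target_pb in (5, 20, 100):
    n = nodes_needed(target_pb, R730XD_RAW_TB)
    # Share of the cluster that must be re-protected if one node fails.
    print(f"{target_pb:>3} PB raw -> {n:>4} nodes, "
          f"one node = {100 / n:.2f}% of capacity")
```

The second column is the node-size trade-off in numbers: bigger nodes mean fewer boxes to manage, but each node failure triggers a larger share of the cluster to be re-replicated.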
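One plausible reading of the customer quote on slide 16 ("Write journal on SSD has no real impact with 60 HDDs") is a bandwidth argument. The sketch below models a FileStore-style OSD node, where every write goes first to a journal and then to the data disk, and compares journals co-located on the HDDs with journals on SSDs. All device speeds (120 MB/s per HDD, 450 MB/s per journal SSD, 2x 10GbE per node) and SSD counts are hypothetical round numbers, not measurements from the talk, and the model ignores replication traffic and small-I/O latency, where SSD journals usually help most.

```python
def node_write_ceiling_mb_s(hdd_count: int, hdd_mb_s: float, nic_gbit_s: float,
                            ssd_count: int = 0, ssd_mb_s: float = 0.0) -> float:
    """Rough sequential-write ceiling of one OSD node, in MB/s.

    FileStore-style journaling writes every byte twice: to the journal, then
    to the data disk.
      ssd_count == 0: journals co-located on the HDDs, halving their bandwidth.
      ssd_count > 0:  journals on SSDs, which must absorb the whole write stream.
    """
    network = nic_gbit_s * 1000 / 8  # Gbit/s -> MB/s, ignoring protocol overhead
    if ssd_count == 0:
        return min(hdd_count * hdd_mb_s / 2, network)
    return min(hdd_count * hdd_mb_s, ssd_count * ssd_mb_s, network)


# Hypothetical device speeds, not measurements from the talk.
HDD_MB_S, SSD_MB_S, NIC_GBIT_S = 120, 450, 20

for hdds, ssds in ((60, 0), (60, 4), (12, 0), (12, 2)):
    ceiling = node_write_ceiling_mb_s(hdds, HDD_MB_S, NIC_GBIT_S, ssds, SSD_MB_S)
    where = f"{ssds} SSD journals" if ssds else "co-located journals"
    print(f"{hdds:>2} HDDs, {where:20} -> {ceiling:6.0f} MB/s")
```

With these assumed numbers, the 60-HDD node is network-bound with co-located journals, and a handful of journal SSDs would become the new bottleneck, while a 12-HDD node does gain from SSD journals. Later Ceph releases replaced FileStore with BlueStore, which handles writes differently, so this model only applies to the journaling setup discussed in the talk.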
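Slide 17 warns about replication traffic on the cluster-internal network. In a replicated Ceph pool the client sends each write once to the primary OSD, and the primary forwards a copy to every other replica; with the default CRUSH rule placing replicas on different hosts, that forwarding always crosses the network (the cluster network, if one is configured). The sketch below shows the resulting split for a hypothetical 10 Gbit/s of client writes.

```python
def write_traffic_gbit_s(client_write_gbit_s: float, replica_size: int) -> dict:
    """Split of network load for replicated writes.

    The client sends each object once to its primary OSD (client-facing
    network); the primary forwards one copy to each of the other
    replica_size - 1 replicas (cluster-internal network).
    """
    return {
        "public": client_write_gbit_s,
        "cluster": client_write_gbit_s * (replica_size - 1),
    }


CLIENT_WRITES_GBIT_S = 10  # hypothetical aggregate client ingest

for size in (2, 3):
    traffic = write_traffic_gbit_s(CLIENT_WRITES_GBIT_S, size)
    print(f"size={size}: public {traffic['public']} Gbit/s, "
          f"cluster {traffic['cluster']} Gbit/s")
```

With 3 replicas the cluster network carries twice the client write rate before any recovery or backfill traffic is added, which is why a 1x vs. 2x 10GbE ToR decision and 40GbE uplinks matter more for the internal side than the client-facing one.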
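The single-rack example on slide 18 (8x R630 + MD3060e, 21kW, 2.9PB raw) can be cross-checked against the node details on slide 16: a 1 RU head node plus a 4 RU JBOD with 60x 6TB drives. The sketch below verifies that 8 building blocks fit in 42 RU and derives a power-per-capacity figure; it counts only the JBOD drives, not any drives in the head nodes.

```python
RACK_UNITS = 42
RACK_POWER_KW = 21  # input power quoted on slide 18


def rack_building_block(blocks: int, head_ru: int, jbod_ru: int,
                        drives_per_jbod: int, drive_tb: float) -> tuple[int, float]:
    """Rack units used and raw TB for `blocks` head-node + JBOD pairs."""
    used_ru = blocks * (head_ru + jbod_ru)
    raw_tb = blocks * drives_per_jbod * drive_tb
    return used_ru, raw_tb


# 8x (R630, 1 RU + MD3060e, 4 RU, 60x 6TB); head-node drives are not counted.
used_ru, raw_tb = rack_building_block(8, 1, 4, 60, 6)
assert used_ru <= RACK_UNITS, "building blocks do not fit in the rack"
raw_pb = raw_tb / 1000
print(f"{used_ru}/{RACK_UNITS} RU, {raw_pb:.2f} PB raw, "
      f"{RACK_POWER_KW / raw_pb:.1f} kW per raw PB")
# 40/42 RU, 2.88 PB raw, 7.3 kW per raw PB
```

The 2.88PB result matches the rounded 2.9PB on the slide, and the roughly 7 kW per raw PB is the kind of density figure that drives the power, cooling and floor-load (about 1000kg per rack) planning mentioned there.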
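The reference architecture on slide 19 starts at 3 MON nodes, and CLIMB (slide 9) runs 5 per site. Ceph monitors keep the cluster map consistent with a Paxos-based quorum that needs a strict majority of monitors to be up, so the monitor count determines how many monitor failures the cluster survives. The sketch below computes that majority.

```python
def monitor_quorum(mon_count: int) -> tuple[int, int]:
    """Monitors needed for a majority quorum, and monitor failures survivable."""
    majority = mon_count // 2 + 1
    return majority, mon_count - majority


for mons in (1, 3, 4, 5):
    needed, tolerated = monitor_quorum(mons)
    print(f"{mons} MONs: quorum needs {needed}, survives {tolerated} failure(s)")
```

Three monitors survive one failure and five survive two, while a fourth monitor adds no extra tolerance over three. This is why monitor counts are usually odd.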