This document summarizes a presentation about Ceph, an open-source distributed storage system. It introduces Ceph and its components, benchmarks Ceph's block and object storage performance on Intel architecture, and describes optimizations such as cache tiering and erasure coding. It also outlines Intel's product portfolio for supporting Ceph: optimized CPUs, flash storage, networking, server boards, software libraries, and contributions to the open source Ceph community.
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution (Karan Singh)
In this presentation, I explain how Ceph Object Storage performance can be improved drastically, together with object storage best practices, recommendations, and tips. I also cover the Ceph shared data lake, which is becoming very popular.
Ceph scale testing with 10 Billion Objects (Karan Singh)
In this performance test, we ingested 10 billion objects into the Ceph Object Storage system and measured its performance. We observed deterministic performance; check out this presentation for the details.
This presentation provides an overview of Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of running Red Hat Ceph Storage on Dell servers with proven hardware components that provide high scalability, improved ROI, and support for unstructured data.
Ceph is an open source project that provides software-defined, unified storage solutions. Ceph is a massively scalable, high-performing distributed storage system with no single point of failure. From its roots, it has been designed to scale to the exabyte level and beyond while running on general-purpose commodity hardware.
Ceph Object Storage Reference Architecture Performance and Sizing Guide (Karan Singh)
Together with my colleagues on the Red Hat Storage team, I am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
CRUSH is the powerful, highly configurable algorithm Red Hat Ceph Storage uses to determine how data is stored across the many servers in a cluster. A healthy Red Hat Ceph Storage deployment depends on a properly configured CRUSH map. In this session, we will review the Red Hat Ceph Storage architecture and explain the purpose of CRUSH. Using example CRUSH maps, we will show you what works and what does not, and explain why.
Presented at Red Hat Summit 2016-06-29.
BlueStore: a new, faster storage backend for Ceph (Sage Weil)
Traditionally, Ceph has made use of local file systems like XFS or btrfs to store its data. However, the mismatch between the OSD's requirements and the POSIX interface provided by kernel file systems has a huge performance cost and requires a lot of complexity. BlueStore, an entirely new OSD storage backend, utilizes block devices directly, doubling performance for most workloads. This talk will cover the motivation for a new backend, the design and implementation, the improved performance on HDDs, SSDs, and NVMe, and some of the thornier issues we had to overcome when replacing tried-and-true kernel file systems with entirely new code running in userspace.
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER (Ceph Community)
This talk will introduce the ceph-mgr balancer and the placement group "upmap" features added in Luminous. Experienced Ceph operators will learn practical methods to:
- achieve perfectly uniform OSD distributions
- painlessly migrate data between servers
- easily add capacity to clusters big or small
- transparently modify CRUSH rules or tunables without fear
Storage tiering and erasure coding in Ceph (SCaLE13x) (Sage Weil)
Ceph is designed around the assumption that all components of the system (disks, hosts, networks) can fail, and has traditionally leveraged replication to provide data durability and reliability. The CRUSH placement algorithm is used to allow failure domains to be defined across hosts, racks, rows, or datacenters, depending on the deployment scale and requirements.
Recent releases have added support for erasure coding, which can provide much higher data durability and lower storage overheads. However, in practice erasure codes have different performance characteristics than traditional replication and, under some workloads, come at some expense. At the same time, we have introduced a storage tiering infrastructure and cache pools that allow alternate hardware backends (like high-end flash) to be leveraged for active data sets while cold data are transparently migrated to slower backends. The combination of these two features enables a surprisingly broad range of new applications and deployment configurations.
This talk will cover a few Ceph fundamentals, discuss the new tiering and erasure coding features, and then discuss a variety of ways that the new capabilities can be leveraged.
In this session, you'll learn how RBD works, including how it:
Uses RADOS classes to make access easier from user space and within the Linux kernel.
Implements thin provisioning.
Builds on RADOS self-managed snapshots for cloning and differential backups.
Increases performance with caching of various kinds.
Uses watch/notify RADOS primitives to handle online management operations.
Integrates with QEMU, libvirt, and OpenStack.
BlueStore, A New Storage Backend for Ceph, One Year In (Sage Weil)
BlueStore is a new storage backend for Ceph OSDs that consumes block devices directly, bypassing the local XFS file system that is currently used today. Its design is motivated by everything we've learned about OSD workloads and interface requirements over the last decade, and by everything that has worked well and not so well when storing objects as files in local file systems like XFS, btrfs, or ext4. BlueStore has been under development for a bit more than a year now and has reached a state where it is becoming usable in production. This talk will cover the BlueStore design, how it has evolved over the last year, and what challenges remain before it can become the new default storage backend.
Ceph
1. 1
Make the Future with China!
Ceph: Open Source Storage Software Optimizations
on Intel® Architecture for Cloud Workloads
Jian Zhang - Software Engineer, Intel Corporation
DATS005
2. 2
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
3. 3
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
4. 4
The Problem: Data Big Bang
From 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 ZB to 44 ZB.
It more than doubles every two years.
Data needs are growing at a rate unsustainable with today's infrastructure and labor costs.
Source: IDC - The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things - April 2014
[Chart: projected annual data growth, 2013-2023]
Cost challenges continue to grow - the storage cost structure needs a fundamental shift: storage capacity is growing at a 62% CAGR while IT budgets grow at only a 2% CAGR.
IT pros will shoulder a greater storage burden: from 230 GB per IT pro to 1,231 GB per IT pro.
5. 5
Diverse Workloads & Cost Drive Need for Distributed Storage
[Diagram: traditional workloads (OLTP, email, ERP, CRM apps on a storage fabric) versus today's workloads and trends (mobile, HPC, big data & analytics, and cloud apps on a network fabric)]
Challenges: diverse workloads, management, scale on demand, increasing complexity, cost
6. 6
Distributed Storage
Traditional scale-up model - a SAN (Storage Area Network) behind servers:
• High availability (failover)
• High-performance workloads (e.g., database)
• Enterprise, mission-critical, hybrid cloud
• Limited scale
• Costly (cap-ex and op-ex)
Distributed storage model - application servers with a storage client, connected over a converged network to storage nodes and metadata servers:
• Pay as you grow, massive on-demand scale
• Cost and performance optimized
• Open and commercial solutions on x86 servers
• Applicable to cloud workloads
• Not a good fit for traditional high-performance workloads
• Ceph is the most popular† open source virtual block storage option. It also provides object and file (experimental) storage.
• Strong customer interest - several production implementations already.
† OpenStack User Survey Insights: http://superuser.openstack.org/articles/openstack-user-survey-insights-november-2014
7. 7
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
8. 8
Ceph Introduction
• Ceph is an open-source, massively scalable, software-defined storage system which provides object, block and file system storage in a single platform. It runs on commodity hardware, saving you costs and giving you flexibility, and because it's in the Linux* kernel, it's easy to consume.
• Object store (RADOSGW)
  - A bucket-based REST gateway
  - Compatible with S3 and Swift
• File system (CephFS)
  - A POSIX-compliant distributed file system
  - Kernel client and FUSE
• Block device service (RBD)
  - OpenStack* native support
  - Kernel client and QEMU/KVM driver
[Architecture diagram]
RADOS: a software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors.
LIBRADOS: a library allowing apps to directly access RADOS.
RGW: a web services gateway for object storage (consumed by applications).
RBD: a reliable, fully distributed block device (consumed by hosts/VMs).
CephFS: a distributed file system with POSIX semantics (consumed by clients).
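As a concrete illustration of the LIBRADOS layer, here is a minimal sketch using the librados Python bindings (the rados module that ships with Ceph); the config path and the pool name "data" are assumptions for the example.

```python
# Minimal librados sketch: store and fetch one object through LIBRADOS/RADOS.
# Assumes a reachable cluster, a readable ceph.conf/keyring, and a pool named "data".
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("data")          # I/O context bound to one pool
    try:
        ioctx.write_full("hello-object", b"hello from librados")
        print(ioctx.read("hello-object"))       # b'hello from librados'
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```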
9. 9
Ceph Cluster Overview
[Diagram: client servers running KVM guests that use the RBD, object, and file interfaces over RADOS, connected by Ethernet to storage servers running OSDs on SSDs plus monitors (MON)]
• Ceph clients
  - Block/object/file system storage
  - User space or kernel driver
• Peer to peer via Ethernet
  - Direct access to storage
  - No centralized metadata = no bottlenecks
• Ceph storage nodes
  - Data distributed and replicated across nodes
  - No single point of failure
  - Scale capacity and performance with additional nodes
Ceph scales to 1000s of nodes
10. 10
Object Store Daemon (OSD) Read and Write Flow
[Diagram: compute nodes (app, RBD, RADOS) writing to and reading from OSDs and their disks across three storage servers]
Write flow:
1. Client app writes data; RADOS sends the data to the primary OSD.
2. The primary OSD identifies the replica OSDs, sends them the data, and writes the data to its local disk.
3. The replica OSDs write the data to their local disks and signal completion to the primary.
4. The primary OSD signals completion to the client app.
Read flow:
1. Client app issues a read request; RADOS sends the request to the primary OSD.
2. The primary OSD reads the data from its local disk and completes the read request.
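The write path above is primary-copy replication. Below is a simplified, illustrative Python sketch of that acknowledgement flow; it is not Ceph code, and the class and method names are invented for the example.

```python
# Toy model of the write flow above (not real Ceph code): the primary OSD only
# acknowledges the client after it and every replica have written the data locally.
class OSD:
    def __init__(self, name: str):
        self.name = name
        self.disk = {}                      # stands in for the local data disk

    def write_local(self, oid: str, data: bytes) -> str:
        self.disk[oid] = data               # persist the object locally
        return self.name                    # "completion signal"

class PrimaryOSD(OSD):
    def __init__(self, name: str, replicas: list):
        super().__init__(name)
        self.replicas = replicas

    def handle_client_write(self, oid: str, data: bytes) -> bool:
        self.write_local(oid, data)                               # step 2: primary writes locally
        acks = [r.write_local(oid, data) for r in self.replicas]  # steps 2-3: replicas write and ack
        return len(acks) == len(self.replicas)                    # step 4: ack the client

primary = PrimaryOSD("osd.0", replicas=[OSD("osd.1")])        # 2 copies total, as in the block tests
print(primary.handle_client_write("obj-1", b"payload"))       # True once all copies are durable
```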
12. 12
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
13. 13
Ceph Block Performance - Configuration
Test environment:
Compute (client) nodes:
• 2 nodes with Intel® Xeon™ processor X5570 @ 2.93GHz, 128GB memory
• 1 node with Intel Xeon processor E5-2680 @ 2.8GHz, 56GB memory
• 1x 10Gb NIC per client
• Each client (CLIENT 1-3) runs KVM with VM1-VM40 driving FIO
Storage nodes (CEPH1-CEPH4, MON on CEPH1, OSD1-OSD10 per node):
• Intel Xeon processor E3-1275 v2 @ 3.5 GHz
• 32GB memory
• 1x SSD for OS
• 10x 3 TB 7200rpm HDDs
• 2x 400GB Intel® SSD DC S3700
• 2x 10Gb NICs
Note: See page #37, #38, #39 for system configuration and benchmark data
14. 14
Ceph Block Performance - Measure Raw Performance
1. Run FIO on one HDD (3 TB, 7200 rpm) and collect disk IO performance.
   Per-HDD results: random 512K write 70 MB/s, random 512K read 80 MB/s; random 4K write 270 IOPS, random 4K read 160 IOPS.
   Note: sequential 64K at the client corresponds to random 512K at the Ceph OSD.
2. Estimate cluster performance, including the replication overhead for writes (2x in this test), so 20 HDDs effectively serve writes and all 40 HDDs serve reads.
   Cluster estimate: random 512K write 1400 MB/s, read 3200 MB/s; random 4K write 5400 IOPS, read 6400 IOPS.
Note: See page #37, #38, #39 for system configuration and benchmark data
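For readers who want to reproduce the estimate, here is a small illustrative calculation; the per-disk figures are taken from the slide, and the scaling is simply disks times per-disk throughput, with writes divided by the replication factor.

```python
# Back-of-the-envelope cluster limits from the single-HDD FIO results above
# (40 HDDs total, 2x replication in this test).
hdds = 40
replication = 2                                  # each client write lands on 2 OSD disks

per_hdd = {
    "rand512k_write_MBps": 70,                   # ~ sequential 64K at the client
    "rand512k_read_MBps": 80,
    "rand4k_write_iops": 270,
    "rand4k_read_iops": 160,
}

cluster_estimate = {
    "write_MBps": hdds * per_hdd["rand512k_write_MBps"] / replication,  # 1400
    "read_MBps":  hdds * per_hdd["rand512k_read_MBps"],                 # 3200
    "write_iops": hdds * per_hdd["rand4k_write_iops"] / replication,    # 5400
    "read_iops":  hdds * per_hdd["rand4k_read_iops"],                   # 6400
}
print(cluster_estimate)
```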
15. 15
Ceph Block Performance - Test Results
Test procedure:
1. Drop the OSD cache, prepare data (dd), run FIO.
2. 60GB access span per image.
3. 4 IO patterns: sequential write/read, random write/read.
4. 100s warm-up, 600s test.
5. RBD images scaled from 1 to 120.
Note: Random tests use Queue Depth=8, sequential tests use Queue Depth=64.
See page #39, #40, #41 for system configuration and benchmark data.
Ceph cluster performance - peak observed as a share of the maximum possible cluster IO:
• 4K random read (IOPS): 88%
• 4K random write (IOPS): 64%
• 64K sequential read (MB/s): 102%
• 64K sequential write (MB/s): 94%
Ceph performance is close to the max cluster IO limit for all but random writes - room for further optimizations.
16. 16
Ceph Block Performance - Tuning Effects
Ceph block performance tuning impact, compared with the default ceph.conf (throughput with tunings vs. default throughput):
• 64K sequential write: 2.36x
• 64K sequential read: 1.88x
• 4K random write: 3.59x
• 4K random read: 2.00x
Best tuning knobs:
• read_ahead_kb = 2048
• I/O merge and write cache
• omap data on a separate disk
• Large PG number: 81920
Note: See page #37, #38, #39 for system configuration and benchmark data
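As an example of applying the first knob, read-ahead is a per-device sysfs setting; a minimal sketch follows, assuming a data disk named sdb and root privileges (the device name is an assumption).

```python
# Illustrative sketch: apply the read_ahead_kb tuning from the slide to one OSD data disk.
# The device name "sdb" is an assumption; run as root and repeat for each data disk.
from pathlib import Path

def set_read_ahead(device: str, kb: int = 2048) -> None:
    sysfs = Path(f"/sys/block/{device}/queue/read_ahead_kb")
    sysfs.write_text(str(kb))                       # equivalent to: echo 2048 > .../read_ahead_kb
    print(device, "read_ahead_kb =", sysfs.read_text().strip())

set_read_ahead("sdb")
```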
17. 17
Ceph Object Performance - Configuration
Test environment:
Client nodes:
• 2 nodes with Intel® Xeon™ processor X5570 @ 2.93GHz, 24GB memory
• 1x 10Gb NIC per client
• Each client (CLIENT 1-2) runs KVM with VM1-VM20 as load generators
Rados Gateway (RGW):
• 1 node with Intel Xeon processor E5-2670 @ 2.6GHz, 64GB memory
Storage nodes (CEPH1-CEPH4, MON on CEPH1, OSD1-OSD10 per node):
• Intel Xeon processor E3-1280 v2 @ 3.6 GHz
• 16 GB memory
• 1x SSD for OS
• 10x 1 TB 7200rpm HDDs
• 3x 480GB Intel® SSD S530
• 2x 10Gb NICs
Note: See page #40, #41 for system configuration and benchmark data
18. 18
Ceph Object Performance - Test Results
Test procedure:
1. Prepare the data, then run COSBench.
2. 100 containers x 100 objects each.
3. 4 IO patterns: 128K read/write and 10M read/write.
4. 100s warm-up, 600s test.
5. COSBench workers scaled from 1 to 2048.
Note: COSBench is an Intel-developed open source cloud object storage benchmark: https://github.com/intel-cloud/cosbench
Note: See page #42, #43 for system configuration and benchmark data
Ceph cluster performance (100 containers x 100 objects):
Object size | RW mode | Workers | Avg resp. time (ms) | 95% resp. time (ms) | Throughput (op/s) | Bandwidth (MB/s) | Bottleneck
128KB | Read  | 80  | 10    | 20    | 7,951 | 971   | RGW CPU
128KB | Write | 320 | 143   | 340   | 2,243 | 274   | OSD CPU
10MB  | Read  | 160 | 1,365 | 3,870 | 117   | 1,118 | RGW NIC
10MB  | Write | 160 | 3,819 | 6,530 | 42    | 397   | OSD NIC
Ceph object performance is close to the max cluster IO limit.
19. 19
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
20. 20
Ceph Cache Tiering
• An important feature toward Ceph enterprise readiness
  - Cost-effective
  - High performance tier with more SSDs
  - Use a pool of fast storage devices (typically SSDs) as a cache for an existing larger pool
    e.g., reads would first check the cache pool for a copy of the object, and then fall through to the existing pool if there is a miss
• Cache tiering modes
  - Read only
  - Write back
[Diagram: application accessing a Ceph storage cluster with a replicated cache pool in front of an erasure-coded backing pool]
21. 21
Ceph Cache Tiering Optimization - Proxy Read/Write
Proxy the read/write operations while the object is missing in the cache tier; promotion happens in the background.
• 3.8x and 3.6x performance improvement respectively with the proxy-write and proxy-read optimizations.
[Diagram: current design vs. proxy design - client, cache tier (promotion/proxy logic), base tier]
[Chart: 4K random write IOPS and latency with cache tiering, with and without the proxy optimization]
Proxy-read and proxy-write significantly improved cache tiering performance.
Note: See page #37, #38, #39 for system configuration and benchmark data
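To illustrate the proxy-read idea, here is a simplified Python sketch (not Ceph's implementation; the pool objects and names are invented): on a miss the read is proxied straight to the base tier, and the promotion into the cache happens in the background instead of blocking the client.

```python
# Toy illustration of proxy-read with background promotion (not real Ceph code).
import threading

class CacheTieredPool:
    def __init__(self, base: dict):
        self.cache = {}                         # fast tier (e.g., SSD-backed replicated pool)
        self.base = base                        # large tier (e.g., erasure-coded backing pool)

    def _promote(self, oid: str, data: bytes) -> None:
        self.cache[oid] = data                  # copy the object into the cache tier

    def read(self, oid: str) -> bytes:
        if oid in self.cache:                   # cache hit: serve from the fast tier
            return self.cache[oid]
        data = self.base[oid]                   # cache miss: proxy the read to the base tier
        threading.Thread(target=self._promote, args=(oid, data)).start()  # promote in background
        return data                             # the client is not blocked on the promotion

pool = CacheTieredPool(base={"obj-1": b"cold data"})
print(pool.read("obj-1"))                       # miss: proxied from the base tier
# once the promotion thread finishes, later reads of "obj-1" are served from the cache tier
```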
22. 22
Ceph Erasure Coding
Replicated pool - full copies of stored objects:
• Very high durability
• 3x (200% storage overhead)
• Quicker recovery
Erasure-coded pool - one copy plus parity:
• Cost-effective durability
• 1.5x (50% storage overhead)
• Expensive recovery
[Diagram: an object stored as three full copies in a replicated pool vs. split into data chunks 1-4 plus coding chunks X and Y in an erasure-coded pool]
23. 23
Ceph EC Optimization - I/O Hint
• The ISA-L EC library was merged into Firefly
• EC performance
  - Acceptable performance impact: <10% degradation for 10M-object large-scale read/write tests
  - But: compared with a 3x replica, we now tolerate 40% object loss with 1.6x space
• RADOS I/O hint
  - Provide a hint to the storage system to classify the operation type, enabling differentiated storage services
  - Balance throughput and cost, and boost storage performance
  - With the RADOS I/O hint optimization, we can get even higher throughput compared with the configuration without EC
[Diagram: client node (guest VM, application, QEMU/virtio, RBD, RADOS) sending hints over the RADOS protocol to OSD nodes, where a hint policy engine feeds the FileStore/NewStore/MemStore/KVStore backends on SSDs]
35% performance improvement for EC write with the RADOS I/O hint.
Note: See page #40, #41 for system configuration and benchmark data
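The overhead and loss-tolerance figures quoted on the last two slides follow directly from the pool geometry; below is a small illustrative calculation. The k/m profiles are assumptions chosen to reproduce the quoted ratios, since the slides do not name them.

```python
# Storage overhead and loss tolerance for replication vs. erasure coding (illustrative).
def replication_overhead(copies: int) -> float:
    return float(copies)                         # 3 copies -> 3.0x raw space (200% overhead)

def ec_overhead(k: int, m: int) -> float:
    return (k + m) / k                           # k data chunks + m coding chunks

def ec_loss_tolerance(k: int, m: int) -> float:
    return m / (k + m)                           # fraction of chunks that may be lost

print(replication_overhead(3))                     # 3.0  -> the "3x, 200% overhead" replicated pool
print(ec_overhead(4, 2), ec_loss_tolerance(4, 2))  # 1.5x, ~33% -> the "1.5x" example (chunks 1-4 + X, Y)
print(ec_overhead(5, 3), ec_loss_tolerance(5, 3))  # 1.6x, ~37.5% -> an assumed profile roughly matching
                                                   # the "~40% loss at 1.6x space" figure quoted above
```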
24. 24
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
25. 25
How is Intel helping?
• Deliver storage-workload-optimized products and technologies
  - Optimize for silicon, Flash* and networking technologies
• Work with the community to optimize Ceph on IA
  - Make Ceph run best on Intel® Architecture (IA) based platforms (performance, reliability and management)
• Publish IA reference architectures
  - Share best practices with the ecosystem and community
26. 26
Intel's Product Portfolio for Ceph
[Diagram: portfolio spanning CPU platforms, NVM, networking, server boards & systems, software libraries, software products, and the open source ecosystem]
• Intel® Cache Acceleration Software (Intel® CAS)
• Intel® Storage Acceleration Library (Intel® ISA-L)
• Virtual Storage Manager (VSM)
• Ceph contributions: No. 2
• Cephperf†
• Reference architectures
• OEMs / ODMs, ISVs
Solution focus with Intel® platform and software ingredients. Deep collaboration with Red Hat* and Inktank* (by Red Hat) to deliver enterprise-ready Ceph solutions.
† Cephperf to be open sourced in Q2'15
27. 27
VSM - Ceph Simplified
VSM (Virtual Storage Manager) is an open source Ceph management tool developed by Intel, announced at the November 2014 OpenStack* Paris summit, and designed to help make day-to-day management of Ceph easier for storage administrators.
Home page: https://01.org/virtual-storage-manager
Code repository: https://github.com/01org/virtual-storage-manager
Issue tracking: https://01.org/jira/browse/VSM
Mailing list: http://vsm-discuss.33411.n7.nabble.com/
28. 28
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
29. 29
Ceph - Best Deployment Practices
[Diagram: compute node with RBD caching connecting over the RADOS protocol to RADOS nodes (OSD with journal on SSD and FileStore on a file system) and MON nodes]
• XFS, deadline scheduler
• Tune read_ahead_kb for sequential read I/O
• Queue depth: 64 for sequential, 8 for random
• Jumbo frames for 10Gbps
• Use 10x the default queue params
• In a small cluster, monitors can be co-located with OSDs (4GB-6GB)
• Beyond 100 OSDs (100 HDDs or 10 nodes), deploy monitors on separate nodes
• 3 monitors for < 200 nodes
• One OSD process per disk
• Approximately 1GHz of Intel® Xeon™-class CPU per OSD
• 1GB memory per OSD
Note: Refer to backup for detailed tuning recommendations.
30. 30
Agenda
• The Problem
• Ceph Introduction
• Ceph Performance
• Ceph Cache Tiering and Erasure Code
• Intel Product Portfolio for Ceph
• Ceph Best Practices
• Summary
31. 31
Summary
• Cloud workloads and cost are driving the need for distributed storage solutions
• Strong customer interest and lots of production implementations of Ceph
• Intel is optimizing Ceph for Intel® Architecture
32. 32
Next Steps
• Take advantage of Intel software optimizations and reference architectures for your production deployments
  - Pilot "cephperf" in Q2'15 and give us feedback: http://01.org/cephperf
• Engage with open source communities for delivering enterprise features
• Innovate storage offerings using value-added features with Ceph
33. 33
Additional Sources of Information
• A PDF of this presentation is available from our Technical Session Catalog: www.intel.com/idfsessionsSZ. This URL is also printed on the top of Session Agenda Pages in the Pocket Guide.
• More web-based info: http://ceph.com
• Intel® Solutions Reference Architectures: www.intel.com/storage
• Intel® Storage Acceleration Library (open source version): https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
34. 34
Legal Notices and Disclaimers
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com,
or from the OEM or retailer.
No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual
performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and
benchmark results, visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future
costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice.
Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Statements in this document that refer to Intel's plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a
number of risks and uncertainties. A detailed discussion of the factors that could affect Intel's results and plans is included in Intel's SEC filings, including the annual
report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current
characterized errata are available on request.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether
referenced data are accurate.
Intel, Xeon and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
© 2015 Intel Corporation.
35. 35
Risk Factors
The above statements and any others in this document that refer to plans and expectations for the first quarter, the year and the future are forward-
looking statements that involve a number of risks and uncertainties. Words such as "anticipates," "expects," "intends," "plans," "believes," "seeks,"
"estimates," "may," "will," "should" and their variations identify forward-looking statements. Statements that refer to or are based on projections,
uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel's actual results, and variances from Intel's
current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements.
Intel presently considers the following to be important factors that could cause actual results to differ materially from the company's expectations.
Demand for Intel's products is highly variable and could differ from expectations due to factors including changes in the business and economic
conditions; consumer confidence or income levels; customer acceptance of Intel's and competitors' products; competitive and pricing pressures,
including actions taken by competitors; supply constraints and other disruptions affecting customers; changes in customer order patterns including
order cancellations; and changes in the level of inventory at customers. Intel's gross margin percentage could vary significantly from expectations based
on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue
levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; excess or obsolete inventory; changes in unit
costs; defects or disruptions in the supply of materials or resources; and product manufacturing quality/yields. Variations in gross margin may also be
caused by the timing of Intel product introductions and related expenses, including marketing expenses, and Intel's ability to respond quickly to
technological developments and to introduce new features into existing products, which may result in restructuring and asset impairment charges.
Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its
suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in
currency exchange rates. Results may also be affected by the formal or informal imposition by countries of new or revised export and/or import and
doing-business regulations, which could be changed without prior notice. Intel operates in highly competitive industries and its operations have high
costs that are either fixed or difficult to reduce in the short term. The amount, timing and execution of Intel's stock repurchase program and dividend
program could be affected by changes in Intel's priorities for the use of cash, such as operational spending, capital spending, acquisitions, and as a result
of changes to Intel's cash flows and changes in tax laws. Product defects or errata (deviations from published specifications) may adversely impact our
expenses, revenues and reputation. Intel's results could be affected by litigation or regulatory matters involving intellectual property, stockholder,
consumer, antitrust, disclosure and other issues. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from
manufacturing or selling one or more products, precluding particular business practices, impacting Intel's ability to design its products, or requiring
other remedies such as compulsory licensing of intellectual property. Intel's results may be affected by the timing of closing of acquisitions, divestitures
and other significant transactions. A detailed discussion of these and other factors that could affect Intel's results is included in Intel's SEC filings,
including the company's most recent reports on Form 10-Q, Form 10-K and earnings release.
Rev. 1/15/15
37. 37
Block Test Environment - Configuration Details
Client Nodes
CPU 2 x Intel® Xeon™ X5570 @ 2.93GHz (4-core, 8 threads) (Qty: 2)
2x Intel Xeon E5 2680 @2.8GHz (16-core, 32 threads) (Qty: 1)
Memory 128 GB (8GB * 16 DDR3 1333 MHZ) or 56GB (8GB * 7) for E5 server
NIC 10Gb 82599EB
Disks 1 HDD for OS
Client VM
CPU 1 X VCPU VCPUPIN
Memory 512 MB
Ceph Nodes
CPU 1 x Intel Xeon E3-1275 V2 @ 3.5 GHz (4-core, 8
threads)
Memory 32 GB (4 x 8GB DDR3 @ 1600 MHz)
NIC 2 X 82599 10GbE
HBA/C204 {SAS2008 PCI Express* Fusion-MPT SAS-2} / {6
Series/C200 Series Chipset Family SATA AHCI
Controller}
Disks 1 x SSDSA2SH064G1GC 2.5" 64GB for OS
2 x Intel SSDSC2BA40 400 GB SSD (Journal)
10 x Seagate* ST3000NM0033-9ZM 3.5" 3TB 7200rpm
SATA HDD (Data)
Ceph cluster
OS CentOS 6.5
Kernel 2.6.32-431
Ceph 0.61.2 built from source
Client host
OS Ubuntu* 12.10
Kernel 3.6.3
Client VM
OS Ubuntu 12.10
Kernel 3.5.0-17
• XFS as the file system for data disks
• Each data disk (SATA HDD) was partitioned into 1 partition for its OSD daemon
• Default replication setting (2 replicas), 7872 PGs
• Tunings
  - Set read_ahead_kb=2048
  - MTU = 8000
• Change the I/O scheduler to [deadline]:
  # echo deadline > /sys/block/[dev]/queue/scheduler
39. 39
Testing Methodology
Storage interface
• Use QEMU RBD as the storage interface
Tool
• Use "dd" to prepare data for R/W tests
• Use fio (ioengine=libaio, direct=1) to generate 4 IO patterns: sequential write/read, random write/read
• Access span: 60GB
• For capping tests, Seq Read/Write (60MB/s), and Rand Read/Write (100 ops/s)
• QoS compliance:
  - For random 4K read/write cases: latency <= 20ms
  - For sequential 64K read/write cases: BW >= 54 MB/s
Run rules
• Drop OSD page caches ("1" > /proc/sys/vm/drop_caches)
• 100 secs for warm up, 600 secs for data collection
• Run 4KB/64KB tests under different # of RBDs (1 to 120)
Space allocation (per node)
• Data drive:
  - Sits on 10x 3TB HDD drives
  - So 4800GB / 40 * 2 = 240GB of data space will be used on each data disk at 80 VMs
• Journal:
  - Sits on 2x 400GB SSD drives
  - One journal partition per data drive, 10GB
40. 40
Object Test Environment - Configuration Details
Client & RGW
CPU Client: 2 x Intel® Xeon™ X5570 @ 2.93GHz (4-core, 8 threads) (Qty: 2)
RGW: 2x Intel Xeon E5-2670 @ 2.6GHz (16-core, 32 threads) (Qty: 1)
Memory 128 GB (8GB * 16 DDR3 1333 MHZ) or 56GB (8GB * 7) for E5 server
NIC 10Gb 82599EB
Disks 1 HDD for OS
Ceph OSD Nodes
CPU 1 x Intel Xeon E3-1280 V2 @ 3.6 GHz (4-core, 8
threads)
Memory 32 GB (4 x 8GB DDR3 @ 1600 MHz)
NIC 2 X 82599 10GbE
HBA/C204 {SAS2308 PCI Express* Fusion-MPT SAS-2} / {6
Series/C200 Series Chipset Family SATA AHCI
Controller}
Disks 1 x SSDSA2SH064G1GC 2.5" 64GB for OS
3 x Intel SSDSC2CW480A3 480 GB SSD (Journal)
10 x Seagate* ST1000NM0011 3.5" 1TB 7200rpm
SATA HDD (Data)
Ceph cluster
OS Ubuntu 14.04
Kernel 3.13.0
Ceph 0.61.8 built from source
Client host
OS Ubuntu* 12.10
Kernel 3.6.3
• XFS as the file system for data disks
• Each data disk (SATA HDD) was partitioned into 1 partition for its OSD daemon
• Default replication setting (3 replicas), 12416 PGs
• Tunings
  - Set read_ahead_kb=2048
  - MTU = 8000