Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
Together with my colleagues at Red Hat Storage Team, i am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
Ceph is designed around the assumption that all components of the system (disks, hosts, networks) can fail, and has traditionally leveraged replication to provide data durability and reliability. The CRUSH placement algorithm is used to allow failure domains to be defined across hosts, racks, rows, or datacenters, depending on the deployment scale and requirements.
Recent releases have added support for erasure coding, which can provide much higher data durability and lower storage overheads. However, in practice erasure codes have different performance characteristics than traditional replication and, under some workloads, come at some expense. At the same time, we have introduced a storage tiering infrastructure and cache pools that allow alternate hardware backends (like high-end flash) to be leveraged for active data sets while cold data are transparently migrated to slower backends. The combination of these two features enables a surprisingly broad range of new applications and deployment configurations.
This talk will cover a few Ceph fundamentals, discuss the new tiering and erasure coding features, and then discuss a variety of ways that the new capabilities can be leveraged.
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionKaran Singh
In this presentation, i have explained how Ceph Object Storage Performance can be improved drastically together with some object storage best practices, recommendations tips. I have also covered Ceph Shared Data Lake which is getting very popular.
Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
Together with my colleagues at Red Hat Storage Team, i am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
Ceph is designed around the assumption that all components of the system (disks, hosts, networks) can fail, and has traditionally leveraged replication to provide data durability and reliability. The CRUSH placement algorithm is used to allow failure domains to be defined across hosts, racks, rows, or datacenters, depending on the deployment scale and requirements.
Recent releases have added support for erasure coding, which can provide much higher data durability and lower storage overheads. However, in practice erasure codes have different performance characteristics than traditional replication and, under some workloads, come at some expense. At the same time, we have introduced a storage tiering infrastructure and cache pools that allow alternate hardware backends (like high-end flash) to be leveraged for active data sets while cold data are transparently migrated to slower backends. The combination of these two features enables a surprisingly broad range of new applications and deployment configurations.
This talk will cover a few Ceph fundamentals, discuss the new tiering and erasure coding features, and then discuss a variety of ways that the new capabilities can be leveraged.
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionKaran Singh
In this presentation, i have explained how Ceph Object Storage Performance can be improved drastically together with some object storage best practices, recommendations tips. I have also covered Ceph Shared Data Lake which is getting very popular.
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Sean Cohen
Starting from the basics, we explore the advantages of using Rook as a Storage operator to serve Ceph storage, the leading Software-Defined Storage platform in the Open Source world. Ceph automates the internal storage management, while Rook automates the user-facing operations and effectively turns a storage technology into a service transparent to the user. The combination delivers an impressive improvement in UX and provides the ideal storage platform for Kubernetes.
A comprehensive examination of use cases and open problems will complement our review of the Rook architecture. We will deep-dive into what Rook does well, what it does not do (yet), and what trade-offs using a storage operator involves operationally. With live access to a running cluster, we will showcase Rook in action as we discuss its capabilities.
https://www.openstack.org/summit/denver-2019/summit-schedule/events/23515/storage-101-rook-and-ceph
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
This presentation covers the basics about OpenvSwitch and its components. OpenvSwitch is a Open Source implementation of OpenFlow by the Nicira team.
It also also talks about OpenvSwitch and its role in OpenStack Networking
KVM High Availability Regardless of Storage - Gabriel Brascher, VP of Apache ...ShapeBlue
Having High Availability enabled for KVM Hosts can improve greatly the QoS by handling (fence/recover) a problematic Host as well as re-starting its stopped VMs on healthy hosts. However, there is a limitation on CloudStack HA for KVM; it relies mainly on NFS heartbeat script checks. This Talk illustrates how CloudStack HA works for KVM hosts and it presents a way of improving its implementation in a way that KVM HA works with any storage system pluggable on KVM, not just NFS.
About Gabriel Brasher - https://blogs.apache.org/cloudstack/
------------------------------------------
CloudStack European User Group Virtual happened on May 27th. The first CSEUG Virtual proved to be a huge success. It collected people from 23 countries – Germany, the United Kingdom, Switzerland, India, Bulgaria, Greece, Poland, Serbia, Brazil, Chile, Russia, USA, Canada, Japan, France, Uruguay, Korea …
We also had a record number of registrations and attendees for a CloudStack User Group Event. The physical distance was not a stopper for our speakers, who joined the event from 6 different countries.
------------------------------------------
About CloudStack: https://cloudstack.apache.org/
Building Multi-Site and Multi-OpenStack Cloud with OpenStack CascadingJoe Huang
The slides used in the speech "Building multi-site and multi-openstack cloud with OpenStack cascading" in OpenStack Paris summit 2014. The slides cover the requirement and driving forces, case study of VDF, technologies eloboration and demo of OpenStack cascading.
What CloudStackers Need To Know About LINSTOR/DRBDShapeBlue
Philipp explains the best performing Open Source software-defined storage software available to Apache CloudStack today. It consists of two well-concerted components. LINSTOR and DRBD. Each of them also has its independent use cases, where it is deployed alone. In this presentation, the combination of these two is examined. They form the control plane and the data plane of the SDS. We will touch on: Performance, scalability, hyper-convergence (data-locality for high IO performance), resiliency through data replication (synchronous within a site, 2-way, 3-way, or more), snapshots, backup (to S3), encryption at rest, deduplication, compression, placement policies (regarding failure domains), management CLI and webGUI, monitoring interface, self-healing (restoring redundancy after device/node failure), the federation of multiple sites (async mirroring and repeatedly snapshot difference shipping), QoS control (noisy neighbors limitation) and of course: complete integration with CloudStack for KVM guests. It is Open Source software following the Unix philosophy. Each component solves one task, made for maximal re-usability. The solution leverages the Linux kernel, LVM and/or ZFS, and many Open Source software libraries. Building on these giant Open Source foundations, not only saves LINBIT from re-inventing the wheels, it also empowers your day 2 operation teams since they are already familiar with these technologies.
Philipp Reisner is one of the founders and CEO of LINBIT in Vienna/Austria. He holds a Dipl.-Ing. (comparable to MSc) degree in computer science from Technical University in Vienna. His professional career has been dominated by developing DRBD, a storage replication software for Linux. While in the early years (2001) this was writing kernel code, today he leads a company of 30 employees with locations in Austria and the USA. LINBIT is an Open Source company offering enterprise-level support subscriptions for its Open Source technologies.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
[Open Infrastructure & Cloud Native Days Korea 2019]
커뮤니티 버전의 OpenStack 과 Ceph를 활용하여 대고객서비스를 구축한 사례를 공유합니다. 유연성을 확보한 기업용 클라우드 서비스 구축 사례와 높은 수준의 보안을 요구하는 거래소 서비스를 구축, 운영한 사례를 소개합니다. 또한 이 프로젝트에 사용된 기술 스택 및 장애 해결사례와 최적화 방안을 소개합니다. 오픈스택은 역시 오픈소스컨설팅입니다.
#openstack #ceph #openinfraday #cloudnative #opensourceconsulting
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...HostedbyConfluent
Active-Active, Active-Passive, and stretch clusters are hallmark patterns that have been the gold standard in Apache Kafka® disaster recovery architectures for years. Moving to Kubernetes requires unpacking these patterns and choosing a configuration that allows you to meet the same RTO and RPO requirements.
In this talk, we will cover how Active-Active/Active-Passive modes for disaster recovery have worked in the past and how the architecture evolves with deploying Apache Kafka on Kubernetes. We'll also look at how stretch clusters sitting on this architecture give a disaster recovery solution that's built-in!
Armed with this information, you will be able to architect your new Apache Kafka Kubernetes deployment (or retool your existing one) to achieve the resilience you require.
Storage 101: Rook and Ceph - Open Infrastructure Denver 2019Sean Cohen
Starting from the basics, we explore the advantages of using Rook as a Storage operator to serve Ceph storage, the leading Software-Defined Storage platform in the Open Source world. Ceph automates the internal storage management, while Rook automates the user-facing operations and effectively turns a storage technology into a service transparent to the user. The combination delivers an impressive improvement in UX and provides the ideal storage platform for Kubernetes.
A comprehensive examination of use cases and open problems will complement our review of the Rook architecture. We will deep-dive into what Rook does well, what it does not do (yet), and what trade-offs using a storage operator involves operationally. With live access to a running cluster, we will showcase Rook in action as we discuss its capabilities.
https://www.openstack.org/summit/denver-2019/summit-schedule/events/23515/storage-101-rook-and-ceph
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
This presentation covers the basics about OpenvSwitch and its components. OpenvSwitch is a Open Source implementation of OpenFlow by the Nicira team.
It also also talks about OpenvSwitch and its role in OpenStack Networking
KVM High Availability Regardless of Storage - Gabriel Brascher, VP of Apache ...ShapeBlue
Having High Availability enabled for KVM Hosts can improve greatly the QoS by handling (fence/recover) a problematic Host as well as re-starting its stopped VMs on healthy hosts. However, there is a limitation on CloudStack HA for KVM; it relies mainly on NFS heartbeat script checks. This Talk illustrates how CloudStack HA works for KVM hosts and it presents a way of improving its implementation in a way that KVM HA works with any storage system pluggable on KVM, not just NFS.
About Gabriel Brasher - https://blogs.apache.org/cloudstack/
------------------------------------------
CloudStack European User Group Virtual happened on May 27th. The first CSEUG Virtual proved to be a huge success. It collected people from 23 countries – Germany, the United Kingdom, Switzerland, India, Bulgaria, Greece, Poland, Serbia, Brazil, Chile, Russia, USA, Canada, Japan, France, Uruguay, Korea …
We also had a record number of registrations and attendees for a CloudStack User Group Event. The physical distance was not a stopper for our speakers, who joined the event from 6 different countries.
------------------------------------------
About CloudStack: https://cloudstack.apache.org/
Building Multi-Site and Multi-OpenStack Cloud with OpenStack CascadingJoe Huang
The slides used in the speech "Building multi-site and multi-openstack cloud with OpenStack cascading" in OpenStack Paris summit 2014. The slides cover the requirement and driving forces, case study of VDF, technologies eloboration and demo of OpenStack cascading.
What CloudStackers Need To Know About LINSTOR/DRBDShapeBlue
Philipp explains the best performing Open Source software-defined storage software available to Apache CloudStack today. It consists of two well-concerted components. LINSTOR and DRBD. Each of them also has its independent use cases, where it is deployed alone. In this presentation, the combination of these two is examined. They form the control plane and the data plane of the SDS. We will touch on: Performance, scalability, hyper-convergence (data-locality for high IO performance), resiliency through data replication (synchronous within a site, 2-way, 3-way, or more), snapshots, backup (to S3), encryption at rest, deduplication, compression, placement policies (regarding failure domains), management CLI and webGUI, monitoring interface, self-healing (restoring redundancy after device/node failure), the federation of multiple sites (async mirroring and repeatedly snapshot difference shipping), QoS control (noisy neighbors limitation) and of course: complete integration with CloudStack for KVM guests. It is Open Source software following the Unix philosophy. Each component solves one task, made for maximal re-usability. The solution leverages the Linux kernel, LVM and/or ZFS, and many Open Source software libraries. Building on these giant Open Source foundations, not only saves LINBIT from re-inventing the wheels, it also empowers your day 2 operation teams since they are already familiar with these technologies.
Philipp Reisner is one of the founders and CEO of LINBIT in Vienna/Austria. He holds a Dipl.-Ing. (comparable to MSc) degree in computer science from Technical University in Vienna. His professional career has been dominated by developing DRBD, a storage replication software for Linux. While in the early years (2001) this was writing kernel code, today he leads a company of 30 employees with locations in Austria and the USA. LINBIT is an Open Source company offering enterprise-level support subscriptions for its Open Source technologies.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
[Open Infrastructure & Cloud Native Days Korea 2019]
커뮤니티 버전의 OpenStack 과 Ceph를 활용하여 대고객서비스를 구축한 사례를 공유합니다. 유연성을 확보한 기업용 클라우드 서비스 구축 사례와 높은 수준의 보안을 요구하는 거래소 서비스를 구축, 운영한 사례를 소개합니다. 또한 이 프로젝트에 사용된 기술 스택 및 장애 해결사례와 최적화 방안을 소개합니다. 오픈스택은 역시 오픈소스컨설팅입니다.
#openstack #ceph #openinfraday #cloudnative #opensourceconsulting
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...HostedbyConfluent
Active-Active, Active-Passive, and stretch clusters are hallmark patterns that have been the gold standard in Apache Kafka® disaster recovery architectures for years. Moving to Kubernetes requires unpacking these patterns and choosing a configuration that allows you to meet the same RTO and RPO requirements.
In this talk, we will cover how Active-Active/Active-Passive modes for disaster recovery have worked in the past and how the architecture evolves with deploying Apache Kafka on Kubernetes. We'll also look at how stretch clusters sitting on this architecture give a disaster recovery solution that's built-in!
Armed with this information, you will be able to architect your new Apache Kafka Kubernetes deployment (or retool your existing one) to achieve the resilience you require.
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...Tobias Schneck
Have you ever thought about migrating your Kubernetes clusters to Google Cloud to get your services closer to your customers? Yes? We too! Join us on an interactive journey to discover the main challenges of live migration at scale of etcd's, traffic routing and application workloads from your on-premise platform to GCP. The talk will discuss the current state of the technical concept, known problems and insides of the already proven migration steps for stateless workload.
As part of the journey, we'll see the differences between migrating one or one hundred clusters with productive workloads; What parts can be automated? What steps may need to be manual? Let's see how an automated solution could look like in the future and what steps are missing.
How to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtimeloodse
Have you ever thought about migrating your Kubernetes clusters to Google Cloud to get your services closer to your customers? Yes? Us too! Join us on an interactive journey to discover the main challenges of live migration at scale of etcd’s, traffic routing and application workloads from your on-premise platform to GCP. The talk will discuss the current state of the technical concept, known problems and insides of the already proven migration steps for stateless workloads.
As part of the journey, we'll see
- The differences between migrating one or one hundred clusters with productive workloads
- What parts can be automated?
- What steps may need to be done manually?
Presentation delivered at LinuxCon China 2017
Real-Time is used for deadline-oriented applications and time-sensitive workloads. Real-Time KVM is the extension of KVM(Linux Kernel-based Virtual Machine) to allow the virtual machines(VM) to be a truly Real-Time operating system.Users sometimes need to run low-latency applications(such as audio/video streaming, highly interactive systems, etc) to meet their requirements in clouds. NFV is a new network concept which uses virtualization and software instead of dedicated network appliances. For some use cases of telecommunications, network latency must be within a certain range of values. Real-Time KVM can help NFV meet this requirements.
In this presentation, Pei Zhang will talk about:
(1)Real-Time KVM introduction
(2)Real-Time cloud building
(3)Real-Time KVM in NFV: VM with openvswitch, dpdk and qemu’s vhostuser
(4)Performance testing results show
Slides at OpenStack Summit 2017 Sydney
Session Info and Video: https://www.openstack.org/videos/sydney-2017/100gbps-openstack-for-providing-high-performance-nfv
This talk will show how to build your own simple, cheap and scalable CGN solutions with stateful-failover with commodity servers with a decent NIC running Linux, nftables, and bird.
We were in need to introduce NAT into the network and a commercial solution would have required a 6 figure invest, so we build it ourselves for <10% of that cost.
Two Dell servers with a recent CPU, two Mellanox NICs and nftables as well as bird do the trick and make for a simple, cheap and scalable CGN box, supporting ECMP, simple draining and orchestration by your usual Linux tool chain as well as stateful-failover.
Video at: https://www.youtube.com/watch?v=qHsHkjhGibA
Many companies build new-age KVM clouds, only to find out that their applications & workloads do not perform well. In this talk we’ll show you how to get the most out of your KVM cloud and how to optimize it for performance: You’ll understand why performance matters and how to measure it properly. We’ll teach you how to optimize CPU and memory for ultimate performance and how to tune the storage layer for performance. You’ll find out what are the main components of an efficient new-age cloud and which network components work best. In addition, you’ll learn how to select the right hardware to achieve unmatched performance for your new-age cloud and applications.
Venko Moyankov is an experienced system administrator and solutions architect at StorPool storage. He has experience with managing large virtualizations, working in telcos, designing and supporting the infrastructure of large enterprises. In the last year, his focus has been in helping companies globally to build the best storage solution according to their needs and projects.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Unrevealed Story Behind Viettel Network Cloud Hotpot | Đặng Văn Đại, Hà Mạnh Đông, Nguyễn Tuấn Anh
1. Unrevealed Story
behind Biggest Cloud at Vietnam
Đặng Văn Đại @daikk115 Viettel Networks
Hà Mạnh Đông @donghm Viettel Networks
Nguyễn Tuấn Anh @anhnt425 Viettel Networks
1
11. Mixing Compute Resources
Dealing with different CPU models
❖ CPU model we have: SandyBridge,
SandyBridge-IBRS, Broadwell, Skylake,...
❖ OpenStack configurations
11
12. Mixing Compute Resources
Dealing with different CPU models
❖ Same computes should have same BIOS and
firmware version
❖ Check flags and cpu model mapping:
/usr/share/libvirt/cpu_map.xml
12
17. Mixing Storage Resources
Schedule VMs to different storage backends
❖ In fact, not all our compute nodes have HBA
to connect SAN storage
❖ Use Host Aggregates to determine which
compute have HBA or not
➢ Metadata: hba=false or hba=true
❖ Flavor properties
➢ aggregate_instance_extra_specs:hba=true
➢ aggregate_instance_extra_specs:hba=false
17
18. 18
Ceph
❖ One of our Ceph cluster with
difference OSD type, OSD size, need
to be adjusted:
➢ CRUSH rules
➢ OSD weight
19. 19
Ceph
❖ None-raid mode for osd
❖ Three network: public-net, replica-net and
monitoring-net; all are bonding mode 4
(802.3ad)
❖ With HDD OSD: using SSD for rocksdb + wal
➢ Create SSD partition (LVM)
➢ 40GB SSD per HDD OSD (RockDB level 1 only)
20. 20
SAN
❖ Using FC (Fibre Channel)
❖ Fabric mode (Storage controller <-> Switch
<-> Server)
❖ Redundancy: multipath
➢ Update HBA (qlogic/emulex/...) config to
fit with all SAN storage
➢ Update multipath.conf for multiple SAN
storage if need
22. Tuning Sensitive Points in OpenStack
HAProxy
❖ By default, HAProxy will not accept over 2000
established connection for one backend per
thread
❖ In our case, when HAProxy not accept more
connection to MariaDB backend, then most of
services which are connecting to MariaDB
over HAProxy went down
22
23. Tuning Sensitive Points in OpenStack
HAProxy
❖ Increasing threshold
➢ Increase maxconn
➢ Use multi-thread for HAProxy
23
24. Tuning Sensitive Points in OpenStack
HAProxy
❖ Increasing timeout for cinder-api when
creating multiple VMs simultaneously on SAN.
❖ Regarding to timeout: RPC timeout for Nova
and Cinder also need to be increased
24
25. Tuning Sensitive Points in OpenStack
RabbitMQ
❖ Some services of OpenStack published messages
to queues that does not have any consumer
➢ Need to set TTL for them
❖ OpenStack Ceilometer with notification bus
can sent too many connections to RabbitMQ
➢ Separating RabbitMQ cluster for Ceilometer
➢ Or destroy Ceilometer and all Telemetry
services as we done
25
27. Tuning Sensitive Points in OpenStack
RabbitMQ
❖ Delete all messages in queues if need
$ rabbitmqctl purge_queue notifications.error -p "/"
$ rabbitmqctl purge_queue versioned_notifications.error -p "/"
$ rabbitmqctl purge_queue versioned_notifications.info -p "/"
27
28. 28
Thank You for Listening!
Any question?
Be Part of Our Story!
Đặng Văn Đại @daikk115 Viettel Networks
Hà Mạnh Đông @donghm Viettel Networks
Nguyễn Tuấn Anh @anhnt425 Viettel Networks
30. Live migration problems
❖ 100% CPU usage, virtual machine hangs
➢ Rarely
➢ Not fix yet
30
Bugs
VM have 30 vCPUs
31. Live migration problems
❖ VM live migration abort due to high memory usage
➢ Sometimes
➢ Solution
■ Force live migrate
● nova live-migration-force-complete <instance_id> <migration_id>
■ Slow down CPU
● set live_migration_permit_auto_converge=true in nova.conf
31
Bugs
Refer:
https://docs.openstack.org/nova/pike/admin/live-migration-usage.html
32. Live migration problems
❖ Unacceptable CPU info: CPU doesn't have compatibility
➢ Sometimes
➢ Solution
■ cpu_mode = custom in nova.conf
■ Use host aggregates to schedule VM into compatible compute node
32
Bugs