Corosync and Pacemaker
A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system.
The components of a cluster are usually connected to each other through fast local area networks ("LAN"), with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of convergence of a number of computing trends including the availability of low cost microprocessors, high speed networks, and software for high performance distributed computing.
Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
Computer clusters have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest supercomputers in the world such as IBM's Sequoia
XPDS13: VIRTUAL DISK INTEGRITY IN REAL TIME JP BLAKE, ASSURED INFORMATION SE...The Linux Foundation
This paper introduces the Virtual Disk Integrity in Real Time (vDIRT) monitor, a mechanism to measure virtual hard disks in real time from the Dom0 trusted computing base. vDIRT is an improvement over traditional methods for auditing file integrity which rely on a service in a potentially compromised host. It also overcomes the limitations of existing methods for assuring disk integrity that are coarse grained and do not scale to large disks. vDIRT is a capability to measure disk reads and writes in real time, allowing for fine grained tracking of sectors within files, as well as the overall disk. The vDIRT implementation and its impact on performance is discussed to show that disk operation monitoring from Dom0 is practical.
Corosync and Pacemaker
A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system.
The components of a cluster are usually connected to each other through fast local area networks ("LAN"), with each node (computer used as a server) running its own instance of an operating system. Computer clusters emerged as a result of convergence of a number of computing trends including the availability of low cost microprocessors, high speed networks, and software for high performance distributed computing.
Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
Computer clusters have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest supercomputers in the world such as IBM's Sequoia
XPDS13: VIRTUAL DISK INTEGRITY IN REAL TIME JP BLAKE, ASSURED INFORMATION SE...The Linux Foundation
This paper introduces the Virtual Disk Integrity in Real Time (vDIRT) monitor, a mechanism to measure virtual hard disks in real time from the Dom0 trusted computing base. vDIRT is an improvement over traditional methods for auditing file integrity which rely on a service in a potentially compromised host. It also overcomes the limitations of existing methods for assuring disk integrity that are coarse grained and do not scale to large disks. vDIRT is a capability to measure disk reads and writes in real time, allowing for fine grained tracking of sectors within files, as well as the overall disk. The vDIRT implementation and its impact on performance is discussed to show that disk operation monitoring from Dom0 is practical.
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBITOpenNebula Project
You will learn what DRBD is, where it came from in its 15 years of existence. How it evolved into a software defined storage solution interesting for users of OpenNebula and why it is very well suited for hyperconverged deployment architectures. The presentation will contain IO performance results and (if time permits) a live demo.
Live migrating a container: pros, cons and gotchasDocker, Inc.
In this talk I will briefly show why you might want to live migrate a container, why you might want to avoid doing this and what can be done instead. The main topic of the talk would to demonstrate why live migrating a container is more complex than live migrating a virtual machines and what can be done with this complexity.
This is to introduce the related components in SUSE Linux Enterprise High Availability Extension product to build High Available Storage (ha-lvm/drbd/iscsi/nfs, clvm, ocfs2, cluster-raid1).
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBITOpenNebula Project
You will learn what DRBD is, where it came from in its 15 years of existence. How it evolved into a software defined storage solution interesting for users of OpenNebula and why it is very well suited for hyperconverged deployment architectures. The presentation will contain IO performance results and (if time permits) a live demo.
Live migrating a container: pros, cons and gotchasDocker, Inc.
In this talk I will briefly show why you might want to live migrate a container, why you might want to avoid doing this and what can be done instead. The main topic of the talk would to demonstrate why live migrating a container is more complex than live migrating a virtual machines and what can be done with this complexity.
This is to introduce the related components in SUSE Linux Enterprise High Availability Extension product to build High Available Storage (ha-lvm/drbd/iscsi/nfs, clvm, ocfs2, cluster-raid1).
Linux Container Brief for IEEE WG P2302Boden Russell
A brief into to Linux Containers presented to IEEE working group P2302 (InterCloud standards and portability). This deck covers:
- Definitions and motivations for containers
- Container technology stack
- Containers vs Hypervisor VMs
- Cgroups
- Namespaces
- Pivot root vs chroot
- Linux Container image basics
- Linux Container security topics
- Overview of Linux Container tooling functionality
- Thoughts on container portability and runtime configuration
- Container tooling in the industry
- Container gaps
- Sample use cases for traditional VMs
Overall, a bulk of this deck is covered in other material I have posted here. However there are a few new slides in this deck, most notability some thoughts on container portability and runtime config.
Performant and Resilient Storage: The Open Source & Linux WayOpenNebula Project
OpenNebula users have a range of storage options available to them, including proprietary appliances, proprietary software and Open Source software projects. This session will present a fully Open Source approach, that tightly integrates with Linux, and makes full use of the mature building blocks within the Linux kernel (LVM, Software RAID, DM-crypt, NVMe-oF Target, DRBD, etc...), and delivers one of the highest performance open source storage stacks currently available. The core goal is to expose the improved performance of NVMe storage devices to VMs and containers. The solution covers both local NVMe drives and NVMe-oF. For interacting with NVMe-oF targets it supports the Swordfish-API and LVM & Linux’s software NVMe-oF target. The solution contains a storage addon for OpenNebula.
OpenNebulaConf2019 - Performant and Resilient Storage the Open Source & Linux...OpenNebula Project
OpenNebula users have a range of storage options available to them, including proprietary appliances, proprietary software and Open Source software projects. This session will present a fully Open Source approach, that tightly integrates with Linux, and makes full use of the mature building blocks within the Linux kernel (LVM, Software RAID, DM-crypt, NVMe-oF Target, DRBD, etc...), and delivers one of the highest performance open source storage stacks currently available.
The core goal is to expose the improved performance of NVMe storage devices to VMs and containers. The solution covers both local NVMe drives and NVMe-oF. For interacting with NVMe-oF targets it supports the Swordfish-API and LVM & Linux’s software NVMe-oF target. The solution contains a storage addon for OpenNebula.
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebula Project
Hyperconvergence is one of the big topics in datacenters at the moment. But is it more than an old wine in new bottles? Why we at Runtastic built an hyperconverged datacenter based on Opennebula with Ceph and what we learned.
YouTube: https://youtu.be/50Z4bmevTpg
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
In this session, the speakers will discuss their experiences porting Apache Spark to the Cray XC family of supercomputers. One scalability bottleneck is in handling the global file system present in all large-scale HPC installations. Using two techniques (file open pooling, and mounting the Spark file hierarchy in a specific manner), they were able to improve scalability from O(100) cores to O(10,000) cores. This is the first result at such a large scale on HPC systems, and it had a transformative impact on research, enabling their colleagues to run on 50,000 cores.
With this baseline performance fixed, they will then discuss the impact of the storage hierarchy and of the network on Spark performance. They will contrast a Cray system with two levels of storage with a “data intensive” system with fast local SSDs. The Cray contains a back-end global file system and a mid-tier fast SSD storage. One conclusion is that local SSDs are not needed for good performance on a very broad workload, including spark-perf, TeraSort, genomics, etc.
They will also provide a detailed analysis of the impact of latency of file and network I/O operations on Spark scalability. This analysis is very useful to both system procurements and Spark core developers. By examining the mean/median value in conjunction with variability, one can infer the expected scalability on a given system. For example, the Cray mid-tier storage has been marketed as the magic bullet for data intensive applications. Initially, it did improve scalability and end-to-end performance. After understanding and eliminating variability in I/O operations, they were able to outperform any configurations involving mid-tier storage by using the back-end file system directly. They will also discuss the impact of network performance and contrast results on the Cray Aries HPC network with results on InfiniBand.
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
In this session, the speakers will discuss their experiences porting Apache Spark to the Cray XC family of supercomputers. One scalability bottleneck is in handling the global file system present in all large-scale HPC installations. Using two techniques (file open pooling, and mounting the Spark file hierarchy in a specific manner), they were able to improve scalability from O(100) cores to O(10,000) cores. This is the first result at such a large scale on HPC systems, and it had a transformative impact on research, enabling their colleagues to run on 50,000 cores.
With this baseline performance fixed, they will then discuss the impact of the storage hierarchy and of the network on Spark performance. They will contrast a Cray system with two levels of storage with a “data intensive” system with fast local SSDs. The Cray contains a back-end global file system and a mid-tier fast SSD storage. One conclusion is that local SSDs are not needed for good performance on a very broad workload, including spark-perf, TeraSort, genomics, etc.
They will also provide a detailed analysis of the impact of latency of file and network I/O operations on Spark scalability. This analysis is very useful to both system procurements and Spark core developers. By examining the mean/median value in conjunction with variability, one can infer the expected scalability on a given system. For example, the Cray mid-tier storage has been marketed as the magic bullet for data intensive applications. Initially, it did improve scalability and end-to-end performance. After understanding and eliminating variability in I/O operations, they were able to outperform any configurations involving mid-tier storage by using the back-end file system directly. They will also discuss the impact of network performance and contrast results on the Cray Aries HPC network with results on InfiniBand.
What CloudStackers Need To Know About LINSTOR/DRBDShapeBlue
Philipp explains the best performing Open Source software-defined storage software available to Apache CloudStack today. It consists of two well-concerted components. LINSTOR and DRBD. Each of them also has its independent use cases, where it is deployed alone. In this presentation, the combination of these two is examined. They form the control plane and the data plane of the SDS. We will touch on: Performance, scalability, hyper-convergence (data-locality for high IO performance), resiliency through data replication (synchronous within a site, 2-way, 3-way, or more), snapshots, backup (to S3), encryption at rest, deduplication, compression, placement policies (regarding failure domains), management CLI and webGUI, monitoring interface, self-healing (restoring redundancy after device/node failure), the federation of multiple sites (async mirroring and repeatedly snapshot difference shipping), QoS control (noisy neighbors limitation) and of course: complete integration with CloudStack for KVM guests. It is Open Source software following the Unix philosophy. Each component solves one task, made for maximal re-usability. The solution leverages the Linux kernel, LVM and/or ZFS, and many Open Source software libraries. Building on these giant Open Source foundations, not only saves LINBIT from re-inventing the wheels, it also empowers your day 2 operation teams since they are already familiar with these technologies.
Philipp Reisner is one of the founders and CEO of LINBIT in Vienna/Austria. He holds a Dipl.-Ing. (comparable to MSc) degree in computer science from Technical University in Vienna. His professional career has been dominated by developing DRBD, a storage replication software for Linux. While in the early years (2001) this was writing kernel code, today he leads a company of 30 employees with locations in Austria and the USA. LINBIT is an Open Source company offering enterprise-level support subscriptions for its Open Source technologies.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
Similar to Linux High Availability Overview - openSUSE.Asia Summit 2015 (20)
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
4. 4
What is Cluster?
• HPC (super computing)
• Load Balancer (Very high capacity)
• High Availability
‒ 99.999% = 5 m/year MTTR
‒ SPOF(single point of failure)
‒ Murphy's Law
"Everything that can go wrong will
go wrong"
5. 5
"HA", widely used, often confusing
• VMWare vSphere HA
‒ hypervisor and hardware level. Close-source.
‒ Agnostic on Guest OS inside the VM.
• SUSE HA
‒ Inside Linux OS.
‒ That said, Windows need Windows HA solution.
• Different Industries
‒ We are Enterprise.
‒ HADOOP HA (dot com)
‒ OpenStack HA ( paas )
‒ OpenSAF (telecom)
6. 6
History of HA in Linux OS
• 1990s, Heartbeat project. Simply two nodes.
• Early 2000s, Heartbeat 2.0 too complex.
‒ Industry demands to split.
1) one for cluster membership
2) one for resource management
• Today, ClusterLabs.org
‒ A completely different solution in early days,
pacemaker + corosnc
‒ While merged Heartbeat project.
‒ 2015 HA Summit
7. 7
Typical HA Problem - Split Brain
• Clustering
‒ multiple nodes share the same resources.
• Split partitions run the same service
‒ It just breaks data integrity !!!
• Two key concepts as the solution:
‒ Fencing
Cluster doesn't accept any confusing state.
STONITH - "shoot the other node in the head".
‒ Quorum
It stands for "majority". No quorum, then no actions,
no resource management, no fencing.
8. 8
HA Hardware Components
• Multiple networks
‒ A user network for end user access.
‒ A dedicated network for cluster communication/heartbeat.
‒ A dedicated storage network infrastructure.
• Network Bonding
‒ aka. Link Aggregation
• Fencing/STONITH devices
‒ remote “powerswitch”
• Shared storage
‒ NAS(nfs/cifs), SAN(fc/iscsi)
11. 11
Corosync: messaging and membership
• Consensus algorithm
‒ "Totem Single Ring Ordering and Membership protocol"
• Closed Process Group
‒ Analogue “TCP/IP 3-way hand shaking”
‒ Membership handling.
‒ Message ordering.
• A quorum system
‒ notifies apps when quorum is achieved or lost.
• In-memory object database
‒ for Configuration engines and Service engines.
‒ Shared-nothing cluster.
12. 12
Pacemaker: the resources manager
• The brain of the cluster.
• Policy engine for decision making.
‒ To start/stop resources on a node according to the score.
‒ To monitor resources according to interval.
‒ To restart resources if monitor fails.
‒ To fence/STONITH a node if stop operation fails.
13. 13
Shoot The Other Node In The Head
• Data integrity does not
tolerate any confusing state.
Before migrating resources
to another node in the
cluster, the cluster must
confirm the suspicious node
really is down.
• STONITH is mandatory for
*enterprise* Linux HA
clusters.
14. 14
Popular STONITH devices
• APC PDU
‒ network based powerswitch
• Standard Protocols Integrated with Servers
‒ Intel AMT, HP iLO, Dell DRAC, IBM IMM, IPMI Alliance
• Software libraries
‒ to deal with KVM, Xen and VMware Vms.
• Software based
‒ SBD (STONITH Block Device) to do self termination.
The last implicit option in the fencing topology.
• NOTE: Fencing devices can be chained.
15. 15
Resources Agents (RAs)
• Write RA for your applications
• LSB shell scripts:
start / stop / monitor
• More than hundred contributors in upstream github.
16. 16
Cluster Filesystem
• OCFS2 / GFS2
‒ On the shared storage.
‒ Multiple nodes concurrently access the same filesystem.
http://clusterlabs.org/doc/fr/Pacemaker/1.1/html/Clusters_from_Scratch/_pacemaker_architecture.html
17. 17
Cluster Block Device
• DRBD
‒ network based raid1.
‒ high performance data replication over network.
• cLVM2 + cmirrord
‒ Clustered lvm2 mirroring.
‒ Multiple nodes can manipulate volumes on the shared disk.
‒ clvmd distributes LVM metadata updates in the cluster.
‒ Data replication speed is way too slow.
• Cluster md raid1
‒ multiple nodes use the shared disks as md-raid1.
‒ High performance raid1 solution in cluster.
19. 19
NFS Server ( High Available NAS )
Cluster Example in Diagram
VM1
DRBD MasterDRBD Master
LVMLVM
Filesystem(ext4)Filesystem(ext4)
NFS exportsNFS exports
Virtual IPVirtual IP
VM2
DRBD SlaveDRBD Slave
LVMLVM
Filesystem(ext4)Filesystem(ext4)
NFS exportsNFS exports
Virtual IPVirtual IP
Failover
Pacemaker + CorosyncPacemaker + Corosync
KernelKernel KernelKernel
network raid1
20. 20
HA iSCSI Server ( Active/Passive )
Cluster Example in Diagram
VM1
DRBD MasterDRBD Master
LVMLVM
iSCSITargetiSCSITarget
iSCSILogicalUnitiSCSILogicalUnit
Virtual IPVirtual IP
VM2
DRBD SlaveDRBD Slave
LVMLVM
iSCSITargetiSCSITarget
iSCSILogicalUnitiSCSILogicalUnit
Virtual IPVirtual IP
Failover
Pacemaker + CorosyncPacemaker + Corosync
network raid1
KernelKernel KernelKernel
21. 21
Cluster FS - OCFS2 on shared disk
Cluster Example in Diagram
Host3Host1
OCFS2OCFS2
Host2
OCFS2OCFS2
KernelKernel KernelKernel
Virtual Machine Images and Configuration ( /mnt/images/ )Virtual Machine Images and Configuration ( /mnt/images/ )
Cluster MD-RAID1Cluster MD-RAID1 Cluster MD-RAID1Cluster MD-RAID1
Replication / Backup
/dev/md127 /dev/md127
VMVM VMmigration
Pacemaker + CorosyncPacemaker + Corosync
22. 22
Future outlook
• Upstream activities
‒ OpenStack: from the control plane into the compute domain.
‒ Scalability of corosync/pacemaker
‒ Docker adoption
‒ “Zero” Downtime HA VM
‒ ...
23. 23
Join us ( all *open-source* )
• Play with Leap 42.1: http://www.opensuse.org
Doc: https://www.suse.com/documentation/sle-ha-12/
• Report and Fix Bugs: http://bugzilla.opensuse.org
• Discussion: opensuse-ha@opensuse.org
• HA ClusterLabs: http://clusterlabs.org/
• General HA Users: users@clusterlabs.org