- MySQL HA can be achieved with shared storage, DRBD, replication, MySQL Cluster, or Linux-HA/Pacemaker.
- Linux-HA/Pacemaker provides high availability by managing resources across nodes and bringing a service up on an available node when the node running it fails.
- Pacemaker keeps a central configuration, the Cluster Information Base (CIB), which defines resources and the constraints between them; it monitors resource status to decide where each resource should run.
2. Kris Buytaert
● I used to be a Dev, Then Became an Op,
● Today I feel like a dev again
● Senior Linux and Open Source Consultant @inuits.be
● „Infrastructure Architect“
● Building Clouds since before the Cloud
● Surviving the 10th floor test
● Co-Author of some books
● Guest Editor at some sites
4. What is HA Clustering ?
● One service goes down
=> others take over its work
● IP address takeover, service takeover,
● Not designed for high performance
● Not designed for high throughput (load balancing)
5. Lies, Damn Lies, and Statistics
Counting nines
(slide by Alan R)
99.9999% 30 sec
99.999% 5 min
99.99% 52 min
99.9% 9 hr
99% 3.5 day
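The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the downtime is counted per 365-day year, which the slide does not state explicitly; the function name is made up for illustration.

```python
# Downtime allowed per year for a given availability percentage.
# Assumption: a 365-day year (the slide does not name the period).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the seconds of downtime per year that the availability allows."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.9999, 99.999, 99.99, 99.9, 99.0):
        seconds = downtime_seconds_per_year(pct)
        print(f"{pct}% -> {seconds:.0f} s ({seconds / 60:.1f} min, {seconds / 3600:.2f} h)")
```

The results land close to the rounded values on the slide: roughly half a minute for six nines and a little over three and a half days for two nines.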
6. The Rules of HA
● Keep it Simple
● Keep it Simple
● Prepare for Failure
● Complexity is the enemy of reliability
● Test your HA setup
7. Eliminating the SPOF
● Find out what Will Fail
• Disks
• Fans
• Power (Supplies)
● Find out what Can Fail
• Network
• Going Out Of Memory
8. Data vs Connection
● DATA :
• Replication
• Shared storage
• DRBD
● Connection
• LVS
• Proxy
• Heartbeat / Pacemaker
10. DRBD
● Distributed Replicated Block Device
● In the Linux Kernel
● Usually only 1 mount
• Multi mount as of 8.X
• Requires GFS / OCFS2
● Regular FS ext3 ...
● Only 1 MySQL instance Active accessing data
● Upon Failover MySQL needs to be started on other node
11. DRBD(2)
● What happens when you pull the plug of a Physical machine ?
• Minimal Timeout
• Why did the crash happen ?
• Is my data still correct ?
• InnoDB Consistency Checks ?
• Lengthy ?
• Check your BinLog size
12. Other Solutions Today
● MySQL Cluster NDBD
● Multi Master Replication
● MySQL Proxy
● MMM
● Flipper
● BYO
● ....
13. Pulling Traffic
● Eg. for Cluster, MultiMaster setups
• DNS
• Advanced Routing
• LVS
• Or the upcoming slides
14. Linux-HA Pacemaker
● Plays well with others
● Manages more than MySQL
● ...v3 .. don't even think about the rest anymore
● http://clusterlabs.org/
15. Heartbeat v1
• Max 2 nodes
• No finegrained resources
• Monitoring using “mon”
/etc/ha.d/ha.cf
/etc/ha.d/haresources
mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 IPaddr2::10.16.0.13/16/bond0.16 mon
/etc/ha.d/authkeys
18. Heartbeat v3
• No more /etc/ha.d/haresources
• No more xml
• Better integrated monitoring
• /etc/ha.d/ha.cf has crm=yes
19. Pacemaker ?
● Not a fork
● Only CRM Code taken out of Heartbeat
● As of Heartbeat 2.1.3
• Support for both OpenAIS / Heartbeat
• Different release cycle than Heartbeat
20. Heartbeat, OpenAIS, Corosync ?
● All Messaging Layers
● Initially only Heartbeat
● OpenAIS
● Heartbeat became unmaintained
● OpenAIS had heisenbugs :(
● Corosync
● Heartbeat maintenance taken over by LinBit
● CRM Detects which layer
22. Pacemaker Architecture
● stonithd: The Heartbeat fencing subsystem.
● lrmd: Local Resource Management Daemon. Interacts directly with resource agents (scripts).
● pengine: Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
● cib: Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
● crmd: Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to co-ordinate the activities of the cluster.
● openais: messaging and membership layer.
● heartbeat: messaging layer, an alternative to OpenAIS.
● ccm: Short for Consensus Cluster Membership. The Heartbeat membership layer.
24. CRM
● Cluster Resource Manager
● Keeps Nodes in Sync
● XML Based
● cibadmin
● Cli manageable
● Crm
    configure
    property $id="cib-bootstrap-options" \
        stonith-enabled="FALSE" \
        no-quorum-policy=ignore \
        start-failure-is-fatal="FALSE"
    rsc_defaults $id="rsc_defaults-options" \
        migration-threshold="1" \
        failure-timeout="1"
    primitive d_mysql ocf:local:mysql \
        op monitor interval="30s" \
        params test_user="sure" test_passwd="illtell" test_table="test.table"
    primitive ip_db ocf:heartbeat:IPaddr2 \
        params ip="172.17.4.202" nic="bond0" \
        op monitor interval="10s"
    group svc_db d_mysql ip_db
    commit
25. Heartbeat Resources
● LSB
● Heartbeat resource (+status)
● OCF (Open Cluster Framework) (+monitor)
● Clones (don't use in HAv2)
● Multi State Resources
26. LSB Resource Agents
● LSB == Linux Standard Base
● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes
● LSB init scripts are stored under /etc/init.d/
● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script
● It's straightforward to change an LSB script to an OCF script
27. OCF
● OCF == Open Cluster Framework
● OCF Resource agents are the most powerful type of
resource agent we support
● OCF RAs are extended init scripts
• They have additional actions:
• monitor – for monitoring resource health
• meta-data – for providing information about the RA
● OCF RAs are located in
/usr/lib/ocf/resource.d/provider-name/
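The action interface described above can be sketched as follows. This is only an illustration: real OCF resource agents are usually shell scripts that receive the action as their first argument, and the class `DummyAgent`, the agent name `dummy_sketch` and the in-memory `running` flag are assumptions made for this example. The exit codes are the standard OCF values.

```python
"""Sketch of the action interface an OCF resource agent exposes."""

# Standard OCF exit codes used below.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Trimmed-down metadata; a real agent also describes its parameters here.
META_DATA = """<?xml version="1.0"?>
<resource-agent name="dummy_sketch">
  <version>0.1</version>
  <shortdesc lang="en">Illustrative in-memory agent</shortdesc>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>"""


class DummyAgent:
    """Hypothetical agent that manages nothing but an in-memory flag."""

    def __init__(self) -> None:
        self.running = False

    def run(self, action: str):
        """Return (exit_code, output) for one requested action."""
        if action == "start":
            self.running = True
            return OCF_SUCCESS, ""
        if action == "stop":
            self.running = False
            return OCF_SUCCESS, ""
        if action == "monitor":
            # The extra action that plain init scripts do not have.
            code = OCF_SUCCESS if self.running else OCF_NOT_RUNNING
            return code, ""
        if action == "meta-data":
            # Self-description consumed by the cluster tools.
            return OCF_SUCCESS, META_DATA
        return OCF_ERR_UNIMPLEMENTED, ""
```

The start and stop branches are what an init script already provides; monitor and meta-data are the additions that make it an OCF agent.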
28. Monitoring
● Defined in the OCF Resource script
● Configured in the parameters
● You have to support multiple states
• Not running
• Running
• Failed
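A minimal sketch of how a monitor action might map the three states above to OCF exit codes. The two boolean inputs (`process_alive`, `health_check_ok`) are assumptions standing in for whatever checks a real agent performs; the exit code values are the standard OCF ones.

```python
from enum import Enum

# Standard OCF exit codes relevant to the monitor action.
OCF_SUCCESS = 0       # resource is running and healthy
OCF_ERR_GENERIC = 1   # resource is running but failed its check
OCF_NOT_RUNNING = 7   # resource is cleanly stopped


class ResourceState(Enum):
    NOT_RUNNING = "not running"
    RUNNING = "running"
    FAILED = "failed"


EXIT_CODES = {
    ResourceState.NOT_RUNNING: OCF_NOT_RUNNING,
    ResourceState.RUNNING: OCF_SUCCESS,
    ResourceState.FAILED: OCF_ERR_GENERIC,
}


def classify(process_alive: bool, health_check_ok: bool) -> ResourceState:
    """Decide which of the three states the resource is in."""
    if not process_alive:
        return ResourceState.NOT_RUNNING
    if health_check_ok:
        return ResourceState.RUNNING
    return ResourceState.FAILED


def monitor_exit_code(process_alive: bool, health_check_ok: bool) -> int:
    """Exit code a monitor action would return for the given observations."""
    return EXIT_CODES[classify(process_alive, health_check_ok)]
```

Distinguishing "not running" from "failed" matters because the cluster treats a cleanly stopped resource differently from one that is up but broken.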
29. Anatomy of a Cluster config
• Cluster properties
• Resource Defaults
• Primitive Definitions
• Resource Groups and Constraints
30. Cluster Properties
property $id="cib-bootstrap-options" \
    stonith-enabled="FALSE" \
    no-quorum-policy="ignore" \
    start-failure-is-fatal="FALSE"
no-quorum-policy: we ignore the loss of quorum on a 2-node cluster.
start-failure-is-fatal: when set to FALSE, a failed start is no longer fatal for that node; the cluster instead uses the resource's failcount and migration-threshold (resource-failure-stickiness in older releases).
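Why quorum has to be ignored on two nodes can be seen with a little arithmetic. The sketch below assumes the usual majority rule (a partition has quorum when it sees more than half of the configured nodes); the function name `has_quorum` is illustrative and not part of Pacemaker.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Majority rule: quorum requires more than half of all nodes."""
    return visible_nodes > total_nodes / 2


if __name__ == "__main__":
    # Two-node cluster, one node lost: the survivor sees 1 of 2 nodes.
    print("2 nodes, 1 visible:", has_quorum(1, 2))
    # Three-node cluster, one node lost: the survivors see 2 of 3 nodes.
    print("3 nodes, 2 visible:", has_quorum(2, 3))
```

With only two nodes, the surviving node can never hold a majority on its own, so without no-quorum-policy="ignore" it would not take over the resources.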
31. Resource Defaults
rsc_defaults $id="rsc_defaults-options" \
    migration-threshold="1" \
    failure-timeout="1" \
    resource-stickiness="INFINITY"
failure-timeout sets how long after a failure the resource must wait before it can come back to the node on which it failed. The value is in seconds, so a 60 second timeout needs failure-timeout="60".
migration-threshold=1 means that after 1 failure the resource will try to start on the other node.
resource-stickiness=INFINITY means that the resource really wants to stay where it is now.
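The interaction between migration-threshold and failure-timeout can be modelled roughly as below. This is a simplified sketch, not Pacemaker's implementation: the class `FailureTracker` and its methods are invented for illustration, timestamps are plain seconds, and a real cluster only applies the expiry the next time it recomputes its state.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FailureTracker:
    """Toy model of per-node failcounts for one resource."""
    migration_threshold: int
    failure_timeout: float  # seconds
    # node name -> (failcount, time of last failure)
    state: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def failcount(self, node: str, now: float) -> int:
        """Current failcount, forgetting failures older than the timeout."""
        count, last_failure = self.state.get(node, (0, 0.0))
        if count and now - last_failure >= self.failure_timeout:
            del self.state[node]
            return 0
        return count

    def record_failure(self, node: str, now: float) -> None:
        """Register one more failure of the resource on this node."""
        self.state[node] = (self.failcount(node, now) + 1, now)

    def may_run_on(self, node: str, now: float) -> bool:
        """The node stays eligible while failcount is below the threshold."""
        return self.failcount(node, now) < self.migration_threshold


if __name__ == "__main__":
    tracker = FailureTracker(migration_threshold=1, failure_timeout=60.0)
    tracker.record_failure("db-a", now=10.0)
    print(tracker.may_run_on("db-a", now=20.0))  # still within the timeout
    print(tracker.may_run_on("db-a", now=70.0))  # timeout has expired
```

With migration-threshold=1 a single failure is enough to push the resource to the other node, and the failed node only becomes eligible again once the failure has expired.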
33. Parsing a config
● Isn't always done correctly
● Even a verify won't find all issues
● Unexpected behaviour might occur
34. Where a resource runs
• Multi-state resources
  • Master/Slave
  • e.g. MySQL master-slave, DRBD
• Clones
  • Resources that can run on multiple nodes, e.g.
    • Multi-master MySQL servers
    • MySQL slaves
    • Stateless applications
• location
  • Preferred location to run a resource, e.g. based on hostname
• colocation
  • Resources that have to live together
  • e.g. IP address + service
• order
  • Defines which resource has to start first, or wait for another resource
• groups
  • Colocation + order
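The statement that a group is colocation plus order can be made concrete with a small sketch. The function `expand_group` is hypothetical; it only lists the pairwise constraints a group implies and does not talk to a cluster.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def expand_group(members: List[str]) -> Tuple[List[Pair], List[Pair]]:
    """Return (colocations, orders) implied by a group's member list.

    A colocation pair (a, b) reads "a must run where b runs".
    An order pair (a, b) reads "start a before b".
    """
    colocations: List[Pair] = []
    orders: List[Pair] = []
    for first, then in zip(members, members[1:]):
        colocations.append((then, first))
        orders.append((first, then))
    return colocations, orders


if __name__ == "__main__":
    colocations, orders = expand_group(["d_mysql", "ip_db"])
    print("colocation:", colocations)
    print("order:", orders)
```

For the group svc_db used earlier, this means ip_db has to run on the same node as d_mysql and is started after it.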
35. E.g. a Service on DRBD
● DRBD can only be active on 1 node
● The filesystem needs to be mounted on that
active DRBD node
group svc_mine d_mine ip_mine
ms ms_drbd_storage drbd_storage \
    meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" \
    notify="true"
colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master
order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start
location cli-prefer-svc_db svc_db \
    rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a
36. A MySQL Resource
● OCF
• Clone
• Where do you hook up the IP ?
• Multi State
• But we have Master Master replication
• Meta Resource
• Dummy resource that can monitor
• Connection
• Replication state
37. Simple 2 node example
primitive d_mysql ocf:ntc:mysql \
    op monitor interval="30s" \
    params test_user="just" test_passwd="kidding" test_table="really"
primitive ip_mysql_svc ocf:heartbeat:IPaddr2 \
    params ip="10.8.0.30" cidr_netmask="255.255.255.0" \
    nic="bond0" \
    op monitor interval="10s"
group svc_mysql d_mysql ip_mysql_svc
38. Monitor your Setup
● Not just connectivity
● Also functional
• Query data
• Check resultset is correct
● Check replication
• MaatKit
• OpenARK
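A functional check of the kind described above could look roughly like this. It is a sketch under assumptions: `run_query` is a hypothetical callable standing in for a real MySQL client call, and the query and expected rows are whatever test data you maintain for this purpose.

```python
from typing import Any, Callable, Iterable, Sequence, Tuple

Row = Tuple[Any, ...]


def functional_check(
    run_query: Callable[[str], Iterable[Row]],
    query: str,
    expected_rows: Sequence[Row],
) -> bool:
    """Return True only if the query runs and yields the expected rows."""
    try:
        rows = list(run_query(query))
    except Exception:
        # Connection refused, timeout, SQL error: all count as unhealthy.
        return False
    return rows == list(expected_rows)


if __name__ == "__main__":
    # Stand-in for a database that answers correctly.
    def fake_db(sql: str):
        return [(1, "ok")]

    print(functional_check(fake_db, "SELECT id, status FROM test.table", [(1, "ok")]))
```

A plain TCP connect would pass even when the server returns wrong or no data; comparing the result set catches that case too.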
39. How to deal with replication state?
● Multiple slaves
  • Use the DRBD OCF resource
● Only 2 masters: use your own script
  • Replication is slow on the active node
    • Shouldn't happen; talk to HR / config management people
  • Replication is slow on the passive node
    • Decrease its weight
  • Replication breaks on the active node
    • Send out a warning, don't modify weights, and check the other node
  • Replication breaks on the passive node
    • Fence off the passive node
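The two-master policy above is essentially a small decision table, which a custom script might encode along these lines. The role, problem and action names are labels invented for this sketch; how each action is actually carried out (alerting, adjusting scores, fencing) is left out.

```python
# (node role, replication problem) -> action to take
POLICY = {
    ("active", "slow"): "escalate",                  # shouldn't happen
    ("passive", "slow"): "decrease_weight",
    ("active", "broken"): "warn_and_check_other_node",
    ("passive", "broken"): "fence_passive_node",
}


def replication_action(role: str, problem: str) -> str:
    """Look up what to do for a replication problem on a given node."""
    try:
        return POLICY[(role, problem)]
    except KeyError:
        raise ValueError(f"no policy for role={role!r}, problem={problem!r}")


if __name__ == "__main__":
    for key in POLICY:
        print(key, "->", replication_action(*key))
```

Keeping the policy in one table makes it easy to review that the active node is never fenced or down-weighted automatically.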
40. Adding MySQL to the stack
[Figure: layered stack diagram across Node A and Node B. Layers as labelled: Hardware (Node A, Node B); Cluster Stack (HeartBeat, Pacemaker); Resource MySQL ("MySQLd" on each node); MySQL (Service IP, Replication).]
41. Pitfalls & Solutions
● Monitor
• Replication state
• Replication Lag
● MaatKit
● OpenARK
42. Conclusion
● Plenty of Alternatives
● Think about your Data
● Think about getting Queries to that Data
● Complexity is the enemy of reliability
● Keep it Simple
● Monitor inside the DB
43. Contact
Kris Buytaert Kris.Buytaert@inuits.be
Further Reading
@KrisBuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.be/
http://www.virtualization.com/
http://www.oreillygmt.com/
Inuits
't Hemeltje
Gemeentepark 2
2930 Brasschaat
891.514.231
+32 473 441 636
Esquimaux
Kheops Business Center
Avenue Georges Lemaître 54
6041 Gosselies
889.780.406
+32 495 698 668