This presentation was delivered in a fault tolerance class and discusses achieving fault tolerance in databases through replication. Several commercial databases were studied to examine the approaches they take to replication. Based on that study, an architecture was suggested for a military database design using an asynchronous approach and cluster patterns.
Architecture for building a scalable and highly available Postgres Cluster - Ashnikbiz
As PostgreSQL has made its way into business-critical applications, many customers who use Oracle RAC for high availability and load balancing have asked for similar functionality in PostgreSQL.
In this Hangout session we discuss architectures and alternatives, based on real-life experience, for achieving high availability and load balancing when you deploy PostgreSQL. We also present some of the key tools and how to deploy them effectively in this architecture.
Cross Datacenter Replication (CDCR) has been a long-requested feature in Apache Solr. In this talk, we will discuss CDCR as released in Apache Solr 6.0 and beyond to understand its use cases, limitations, setup, and performance. We will also take a quick look at future enhancements that can further simplify and scale this feature.
Tungsten Connector / Proxy is truly the secret sauce for the Tungsten Clustering solution. Watch this webinar to learn how the Tungsten Connector enables zero-downtime MySQL maintenance via the manual switch operation, and gain an understanding of the various configuration options for doing local reads in remote composite clusters.
AGENDA
- Review the cluster architecture
- Understand the role of the Connector
- Describe Connector deployment best practices (app, dedicated with lb, db with lb)
- Explore zero-downtime MySQL maintenance using the manual role switch procedure
- Learn about Connector routing patterns inside a composite cluster
- Illustrate a manual site switch
- Explain read affinity and the vast performance improvement of local reads
- Examine Connector multi-cluster support
Resilience Planning & How the Empire Strikes Back - C4Media
Video and slides synchronized, mp3 and slide download available at http://bit.ly/1pGpnbd.
Bhakti Mehta presents best practices for building resilient, stable, and predictable services: preventing cascading failures, the timeout pattern, the retry pattern, circuit breakers, and other techniques used pervasively at Blue Jeans Network. Filmed at qconsf.com.
Bhakti Mehta is the author of "RESTful Java Patterns and Best Practices" and "Developing RESTful Services with JAX-RS 2.0, WebSockets, and JSON". Bhakti is a Senior Software Engineer at Blue Jeans Network. As part of her current role, she works on developing RESTful services that can be consumed by ISV partners and the developer community.
MariaDB Galera Cluster for High Availability - OSSCube
Want to understand how to set up high availability solutions for MySQL using MariaDB Galera Cluster? Join this webinar and learn from experts. During this webinar, you will also get guidance on how to implement MariaDB Galera Cluster.
Cloud Deployment of Data Harmony
Jeffrey Gordon, Lead Developer, Access Innovations, Inc.
Jeffrey will describe the cloud deployment of the Data Harmony software.
Transforming Legacy Applications Into Dynamically Scalable Web Services - Adam Takvam
The tools and technologies used to power the modern data center are evolving at a pace faster than most companies can keep up with. Aging web services built on LAMP, WAMP, or ASP cannot readily take advantage of the latest scalable web platforms and technologies. In this presentation, we will discuss the factors that must be considered for your aging web service to take advantage of technologies such as Apache Mesos, Marathon, Docker, Apache Kafka, and more.
This talk is intended for software developers, operations engineers, and IT managers who are looking to modernize existing privately hosted web applications. We will look at the transformation of the data center from a high-level perspective, examining before-and-after topology examples using Key Performance Indicators and Key Performance Metrics to show how leveraging modern design principles can both improve application performance and reduce operational costs. We will then look at some example applications and show what needs to be done, from both the software development and infrastructure perspectives, to accomplish the transformation successfully.
Database as a Service (DBaaS) on Kubernetes - ObjectRocket
Learn about ObjectRocket's adventures in Kubernetes. We'll cover why we chose Kubernetes for our DBaaS platform, the challenges we faced, and how we overcame them. A presentation for DevWeek Austin 2018.
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo - OpenNebula Project
The Science and Technology Facilities Council is a UK Research Council which funds research and provides large facilities to the UK scientific community. This includes running a Tier 1 site for the LHC computing project, the JASMIN Super Data Cluster, and a number of other HPC and HTC facilities. The Scientific Computing Department at the Rutherford Appleton Laboratory has been developing a cloud for use across both sites of the Department and in the wider scientific community. This is an OpenNebula cloud backed by Ceph block storage. I will give a brief background of the project, describe our setup, some use cases, and the work we have done around OpenNebula (including a simplified web front-end and a number of hooks to provide us with traceability). I will also discuss how we are creating an elastic boundary between our HTC batch farm and cloud.
Author Biography
I am a Systems Administrator in the Scientific Computing Department of the UK’s Science and Technology Facilities Council. I work as part of the cloud team and I also work on a number of Grid services including our HTC batch farm for the LHC computing project.
Prior to my position here I worked in IT at an SMB focusing on storage and virtualisation, in particular Hyper-V and VMware.
- Introduction to Kubernetes features
- A look at Kubernetes Networking and Service Discovery
- New features in Kubernetes 1.6
- Kubernetes Installation options
To know more about our Kubernetes expertise, visit our center of excellence at: http://www.opcito.com/kubernetes/
Continuous Delivery of Cloud Applications: Blue/Green and Canary Deployments - Praveen Yalagandula
Continuous delivery is becoming increasingly critical; however, its implementation remains a hard problem many enterprises struggle with. Canary upgrades and Blue/Green deployment are the two commonly used patterns to implement continuous delivery. In Canary upgrades, a small portion of the production traffic is sent to the new version under test. In Blue/Green deployments, all the traffic is switched to the new version at once.
We will show how to fully automate the above steps to achieve true continuous delivery in K8s. We will show how to use analytics to express and automate application evaluation and ML-based traffic switching without any downtime.
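The canary pattern described above boils down to weighted traffic routing; a minimal sketch (the 5% weight and version names are illustrative, not from the talk):

```python
import random

def make_canary_router(canary_weight):
    """Return a router sending a `canary_weight` fraction of traffic to the canary."""
    def route(request):
        # Each request independently has a `canary_weight` chance of
        # hitting the new version under test.
        return "canary" if random.random() < canary_weight else "stable"
    return route

route = make_canary_router(0.05)  # send ~5% of production traffic to the canary
targets = [route(i) for i in range(10_000)]
print(targets.count("canary"))   # roughly 500 of the 10,000 requests
```

A Blue/Green switch is the degenerate case of the same router: flip the weight from 0.0 to 1.0 in a single step.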
Exchange Server 2013: High Availability and Redundancy Mechanisms - Microsoft Technet France
The new version of Exchange Server 2013 includes a host of new features that make it the most secure and reliable messaging server on the market today. The experience gained by Microsoft teams in operating cloud messaging solutions has been fed directly into this new version of the product, enabling you to deploy an ultra-resilient messaging system. Scott Schnoll, Principal Technical Writer on the Exchange team at Microsoft Corp., will walk you through all of the high availability mechanisms and cross-site resilience solutions in the finest detail. Come learn directly from the expert who worked on these topics at Microsoft! Note: a very technical session, in English.
Globus Connect Server Deep Dive - GlobusWorld 2024 - Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
May Marketo Masterclass, London MUG, May 22, 2024 - Adele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Prosigns: Transforming Business with Tailored Technology Solutions - Prosigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
An Enterprise Resource Planning (ERP) system includes various modules that reduce a business's workload. Additionally, it organizes workflows, which drives enhanced productivity. Here is a detailed explanation of the ERP modules; going through the points will help you understand how the software is changing work dynamics.
For more details, visit: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... - Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Utilocate offers a comprehensive solution for locate ticket management by automating and streamlining the entire process. By integrating with Geospatial Information Systems (GIS), it provides accurate mapping and visualization of utility locations, enhancing decision-making and reducing the risk of errors. The system's advanced data analytics tools help identify trends, predict potential issues, and optimize resource allocation, making the locate ticket management process smarter and more efficient. Additionally, automated ticket management ensures consistency and reduces human error, while real-time notifications keep all relevant personnel informed and ready to respond promptly.
The system's ability to streamline workflows and automate ticket routing significantly reduces the time taken to process each ticket, making the process faster and more efficient. Mobile access allows field technicians to update ticket information on the go, ensuring that the latest information is always available and accelerating the locate process. Overall, Utilocate not only enhances the efficiency and accuracy of locate ticket management but also improves safety by minimizing the risk of utility damage through precise and timely locates.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
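One common way to get the atomicity of database updates and event production that the talk mentions is a transactional outbox; a minimal sqlite3 sketch (table names and event format are hypothetical, not Wix's actual mechanism):

```python
import sqlite3

# Minimal transactional-outbox sketch: the row change and its domain event
# are written in one transaction, so a relay can publish events later
# without ever losing one or publishing one for a rolled-back update.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event TEXT, published INTEGER DEFAULT 0);
""")

def update_product(conn, pid, name):
    # Either both statements commit or neither does.
    with conn:
        conn.execute("INSERT OR REPLACE INTO products VALUES (?, ?)", (pid, name))
        conn.execute("INSERT INTO outbox (event) VALUES (?)",
                     (f"ProductUpdated:{pid}",))

update_product(conn, 1, "mug")
print(conn.execute("SELECT event FROM outbox WHERE published = 0").fetchall())
# [('ProductUpdated:1',)]
```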
Software Engineering, Software Consulting, Tech Lead. Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security, Spring Transaction, Spring MVC, Log4j, REST/SOAP web services.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... - Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. This is where custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
How Recreation Management Software Can Streamline Your Operations - wottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Developing Distributed High-performance Computing Capabilities of an Open Sci... - Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... - Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution | ... - informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam - takuyayamamoto1800
In these slides, we show a simulation example and how to compile the solver.
The Helmholtz equation can be solved with helmholtzFoam. The Helmholtz equation with uniformly dispersed bubbles can be simulated with helmholtzBubbleFoam.
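For reference, the Helmholtz equation these solvers address has the standard form (notation is the textbook one, not taken from the slides):

```latex
\nabla^2 u(\mathbf{x}) + k^2\, u(\mathbf{x}) = f(\mathbf{x})
```

where \(u\) is the unknown field, \(k\) the wavenumber, and \(f\) a source term; the bubble variant additionally accounts for a uniformly dispersed bubble phase.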
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... - Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Top Features to Include in Your Winzo Clone App for Business Growth - rickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
2. • Committer on the Apache Pulsar project.
• Former Principal Software Engineer on Splunk’s Pulsar-as-a-Service team.
• Global Streaming Practice Director at Streamlio & Hortonworks
3. • Author of Pulsar in Action
• Co-author, Practical Hive
4. When Failure is Not an Option
Introducing Pulsar’s Failover Client
5. Defining Availability
• Availability is measured as the ratio of uptime to total time (uptime plus downtime) within a year.
• Each layer builds on the previous one.
6. Multifaceted Availability
• Availability is a concern across multiple layers.
• Each of these layers has its own uptime metric.
• Application uptime is equal to the lowest uptime metric across all layers.
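The "weakest link" rule on this slide is just a minimum over the per-layer metrics; a quick sketch with illustrative numbers (not from the deck):

```python
# Application availability is bounded by the least-available layer it depends on.
layer_uptime = {
    "platform": 0.9999,   # illustrative figures only
    "data":     0.99995,
    "client":   0.999,
}

app_uptime = min(layer_uptime.values())
weakest = min(layer_uptime, key=layer_uptime.get)
print(f"application uptime <= {app_uptime} (limited by the {weakest} layer)")
```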
8. Platform Availability Features
• Stateless brokers
• Redundant components across all layers
• Ability to leverage cloud-native features such as StatefulSets to maintain a minimum replica count
9. Data Availability Features
• Self-healing replicated data storage.
• Rack placement policies
• Geo-replication of data
10. Application Availability Features
• Connection-aware clients that automatically detect and recover when a client disconnects from one of the brokers.
• Completely transparent to the application.
11. Availability in Pulsar Before 2.10
• Apache Pulsar could only provide high availability.
• Application availability is the weakest link.
12. What was missing?
• Until now, Pulsar clients could only interact with a single Pulsar cluster and were unable to detect and respond to a cluster-level failure event.
• In the event of a complete cluster failure, these clients could not reroute their messages to a secondary/standby cluster automatically.
• In such a scenario, any application that uses the Pulsar client is vulnerable to a prolonged outage, since the client cannot establish a connection to an active cluster.
13. Pre-2.10 Cluster Failover
• To redirect the clients from the “active” cluster to the standby cluster, the DNS entry for the Pulsar endpoint that the client applications use must be updated to point to the load balancer of the standby cluster.
• Pulsar clients are configured to use a single static URL to connect.
• The DNS record is updated to point to the regional load balancer.
14. What is wrong with this approach?
• It requires your DevOps team to monitor the health of your Pulsar clusters and manually update the DNS record to point to the standby cluster when the active cluster is down.
• This cutover is not automatic, and the recovery time is determined by the response time of your DevOps team.
• Even after the DNS record has been changed, it will take some additional time before the DNS cache is refreshed.
16. Two new approaches
• There are two new cluster failover strategies included in the upcoming 2.10 release.
• One supports automatic failover in the event of a cluster outage, while the other lets you control the switch-over through an HTTP endpoint.
17. Automated Failover
• The AutoClusterFailover strategy automatically switches from the primary cluster to a stand-by cluster in the event of a cluster outage.
• This behavior is controlled by a probe task that monitors the primary cluster.
• When it finds the primary cluster is unavailable for more than failoverDelayMs, it will switch the client connections over to the secondary cluster.
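The probe-and-delay logic above can be sketched as a small state machine. This is an illustration of the mechanism, not the actual Pulsar AutoClusterFailover implementation; the probe and clock are injected callables so the behavior can be driven deterministically:

```python
class AutoFailover:
    """Sketch of the AutoClusterFailover logic: a probe monitors the
    primary cluster, and once the primary has been unavailable for
    longer than failover_delay_ms the client switches to the secondary."""

    def __init__(self, primary, secondary, failover_delay_ms, probe, clock):
        self.primary = primary
        self.secondary = secondary
        self.failover_delay_ms = failover_delay_ms
        self.probe = probe          # returns True if the given cluster is healthy
        self.clock = clock          # returns the current time in milliseconds
        self.current = primary
        self.down_since = None      # when the primary was first seen down

    def check(self):
        if self.probe(self.primary):
            self.down_since = None  # primary healthy: reset the outage timer
            return self.current
        if self.down_since is None:
            self.down_since = self.clock()          # outage just detected
        elif self.clock() - self.down_since > self.failover_delay_ms:
            self.current = self.secondary           # switch client connections
        return self.current
```

Because the switch only happens after the delay has elapsed, a transient blip shorter than failover_delay_ms never triggers a failover, which mirrors the guard described above.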
18.
19. Controlled Failover
• The ControlledClusterFailover strategy supports switching from the primary cluster to a stand-by cluster in response to a signal sent from an external service.
• This strategy enables your administrators to trigger the cluster switch-over.
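The client side of this strategy reduces to a simple polling loop. The sketch below illustrates the idea only; in the real ControlledClusterFailover the provider is an HTTP endpoint, which is stood in for here by a plain callable:

```python
class ControlledFailover:
    """Sketch of the ControlledClusterFailover idea: the client
    periodically asks an external provider which cluster it should be
    connected to, and switches whenever the answer changes."""

    def __init__(self, default_url, url_provider):
        self.current = default_url
        self.url_provider = url_provider  # stands in for the HTTP endpoint

    def poll(self):
        target = self.url_provider()
        if target and target != self.current:
            self.current = target         # administrator-triggered switch
        return self.current
```

The switch-over happens only when the provider's answer changes, so the administrators, not the client, decide when and where to fail over.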
23. What am I going to demo?
• Automatic Failover:
• Step 1: Start an application that uses the Automatic Failover client to produce data to a topic.
• Step 2: Start consumers on both the active & standby clusters.
• Step 3: Stop the active Pulsar cluster.
• Step 4: Observe the flow of data shift from the active to the standby cluster.
• Step 5: Restart the primary cluster.
• Step 6: Observe the flow of data shift back to the primary cluster.
24. What am I going to demo?
• Controlled Failover:
• Step 1: Start the REST Endpoint service.
• Step 2: Start an application that uses the Controlled Failover client to produce data to a topic.
• Step 3: Start consumers on both the active & standby clusters.
• Step 4: Trigger the controller to switch to a different Pulsar cluster after approximately 20 messages.
• Step 5: Observe the flow of data shift from the active to the standby cluster.
• Step 6: Trigger the controller to switch to the original Pulsar cluster after approximately 30 messages.
25. Summary
• Release 2.10 of Pulsar includes two new failover clients that provide continuous availability for your Pulsar applications.
• I demonstrated how to configure and use the Automatic Failover client when producing messages.
• The Controlled Failover client is harder to implement because it requires an additional service to be written, but it does provide more flexibility.
26. Thanks for Attending
Scan the QR Code to learn more about Apache Pulsar.
Explore the Code
https://github.com/david-streamlio/cluster-failover-demo
Welcome to my talk entitled “when failure is not an option”.
Today I will be discussing the additions to the Apache Pulsar project that can help provide continuous availability for your applications that interact with Pulsar.
My name is David Kjerrumgaard, and I am proud to be a committer on the Apache Pulsar project.
I am currently a Developer Advocate at StreamNative, the company behind Apache Pulsar
Previously I was a principal software engineer at Splunk, where I worked on their Pulsar-as-a-Service team
I am also the author of Pulsar in Action from Manning and co-author of Practical Hive from Apress.
Developing a continuously-available application requires more than just utilizing fault-tolerant services such as Apache Pulsar in your software stack.
It also requires immediate failure detection and resolution including built-in failover when there are data center outages.
Up until now, Pulsar clients could only interact with a single Pulsar cluster and were unable to detect and respond to a cluster-level failure event. In the event of a complete cluster failure, these clients cannot reroute their messages to a secondary/standby cluster automatically.
This can lead to application failure, which for many is not an option.
Uptime is typically measured by calculating the ratio of uptime to total time within a year, then expressing that ratio as a percentage.
The concept of “five-nines” — availability of 99.999% — has been an industry gold standard for many years.
Systems that can only survive failures at the hardware layer (including individual server outages) are considered "fault-tolerant"
Systems that can survive an AZ outage are considered “highly-available”
The ability to survive one or more regional outages is considered “continuously available”
When people use the term availability, they tend to think only of PLATFORM availability, i.e., is the system up or down?
This is because availability is generally considered a DevOps concern, but it is an APPLICATION and DATA concern as well.
One approach to providing high-availability is to distribute the platform resources across different zones and/or geographical regions.
While necessary, this isn't enough. The data used by the system must be kept in sync across those zones and regions as well.
A system with a missing or incomplete dataset is often worse than not having the system available at all, as it can lead to incorrect information, duplicate processing, etc.
From an application perspective, it is incumbent upon your application to immediately detect a failure in the system and automatically switch over to the "active" platform in a seamless manner.
Let’s start with a quick review of all of Pulsar’s availability features already inside the platform.
Let’s look at Pulsar’s platform availability features.
Pulsar’s multi-tiered design makes it highly-available by default.
Separating the serving layer from the data storage layer allows Pulsar’s brokers to be 100% stateless.
Consequently, any broker can serve data from any topic by reading it from the separate storage layer instead of from local disk (as other messaging systems such as Kafka do).
Additionally, stateless brokers that fail can be easily replaced with new broker instances without any additional setup steps.
Pulsar’s storage layer maintains multiple replicas of the data on different bookie nodes to ensure that the loss of one or more bookies does not result in a loss of the data.
From a Data availability perspective,
Pulsar’s storage layer is self-healing. It will automatically detect any under-replicated data and re-create new copies of the data for you.
This allows us to easily replace any failed bookies with new bookie instances and let the self-healing mechanism re-populate the new bookie with data.
This ensures data availability within an individual cluster.
Furthermore, Pulsar supports rack-placement to ensure that at least one replica of the data in the storage layer is stored in a different AZ within the same geographical region.
Pulsar’s geo-replication mechanism allows you to asynchronously replicate data across multiple clusters to maintain consistent copies of your datasets between regions.
These capabilities combine to provide continuous data availability.
At the application level, Pulsar provides connection-aware clients that insulate the application from intermittent network outages.
The Pulsar client automatically detects these network issues and re-establishes the connection rather than throwing an exception that (if uncaught) could cause the application to crash.
This behavior is completely hidden from the application code and provides resiliency to broker failures.
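The reconnect-and-retry behavior described here can be sketched in a few lines. This is a generic illustration of what a connection-aware client does internally, not the Pulsar client's actual code; `send` and `reconnect` are hypothetical callables standing in for the client internals:

```python
import time

def send_with_reconnect(send, reconnect, max_attempts=5, base_delay_s=0.1):
    # Instead of surfacing a connection error to the application, the
    # client re-establishes the connection and retries with exponential
    # backoff, keeping the failure invisible to the application code.
    for attempt in range(max_attempts):
        try:
            return send()
        except ConnectionError:
            time.sleep(base_delay_s * (2 ** attempt))  # back off before retrying
            reconnect()                                # transparent to the app
    raise ConnectionError("broker unreachable after retries")
```

Only after the retry budget is exhausted does the failure become visible, which is why intermittent broker outages never reach the application.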
Prior to the 2.10 release Pulsar was able to provide continuous availability at only the platform and data level.
Pulsar’s geo-replication mechanism allows you to replicate the data across multiple geographic regions, ensuring that your data will remain available even in the event of a regional failure.
Similarly, Pulsar’s architecture supports multiple clusters spread across different geographical regions, ensuring that a complete Pulsar cluster will be readily available in the event of a regional failure.
The one missing piece to the continuous availability story was the application layer.
As noted earlier, Pulsar clients could only interact with a single cluster and could not reroute their messages to a standby cluster after a complete cluster failure.
This would eventually lead to prolonged outages at the application level.
Prior to the 2.10 release of Pulsar, the best you could do was to provide a single static endpoint for Pulsar as shown here.
Oftentimes, the connection URL to Pulsar is provided by a configuration file. This value is read once and remains static inside the application.
Then when a regional failure occurred, you had to manually change the DNS entry for that URL to point to the stand-by cluster.
Starting with release 2.10 of Pulsar, we have added a new feature called failover clients that solves these problems.
There are two distinct types of failover clients that are available in the 2.10 release
The first is one that will automatically reroute your client connections to a different Pulsar cluster as soon as it detects a cluster outage.
The second one allows you to trigger the failover through an exposed HTTP endpoint. This client will periodically invoke the endpoint to get the connection details of the cluster it is supposed to connect to. This approach gives your admins more control over the failover process.
So, let’s discuss the automated failover client first.
As the name implies, this failover client will automatically switch clients over to a designated standby cluster if and when it detects an outage on the primary cluster.
This is accomplished by a probe task that periodically interrogates the primary cluster to determine if it is running or not.
Once it has detected that the primary cluster is unavailable, it starts a timer to measure the length of the outage. This is to ensure that we don’t inadvertently switch over due to a transient network issue.
If the outage continues for longer than the user-configured duration, then the switch-over occurs.
Let’s look at how this automatic failover client is configured and used
The first thing to note is the creation of a separate set of authentication credentials for the secondary cluster.
Next, note that there are both a primary cluster URL property and a secondary property.
The primary property takes the broker URL for your preferred cluster connection, while the secondary takes a list of one or more alternative clusters to connect to.
This allows you to have multiple stand-by clusters, which matches Pulsar's geo-replication capability of supporting multiple clusters.
The failoverDelay property specifies how long the primary cluster outage must be before switching over to the standby cluster.
The switchback property specifies how long the client waits to switch back to the primary cluster once it detects that the primary cluster is back up and running.
This is because the probe against the primary cluster will continue to run even after the client has failed over to the standby cluster. Once it has detected that the primary cluster is back up it will wait this long to switch back to the primary cluster
The checkInterval controls the frequency at which the probe is executed.
Finally, the failover configuration is then used to build a Pulsar client.
Now let’s discuss the controlled failover client
As the name implies, this client allows you to control when and where your Pulsar client fails over.
This is accomplished via a REST service that YOU must implement.
Let’s look at how this controlled failover client is configured and used
The first thing to note is the creation of a separate set of authentication credentials. These are for accessing the REST endpoint (NOT the standby cluster).
The default service URL property takes the broker URL for your preferred cluster connection.
The checkInterval controls the frequency at which the REST endpoint is invoked.
The urlProvider is where you specify the address of the REST service you implemented, and the urlHeader is where you provide the contents of the HTTP header.
The header can be used to provide authentication credentials, etc.
Finally, the failover configuration is then used to build a Pulsar client.
Let’s look at a simple example of a REST endpoint service
First, notice that the expected return type is a JSON object that contains the four fields shown here.
This data structure allows you to provide all the necessary authentication credentials required to connect to a Pulsar cluster.
Also note that this information is generated dynamically in the code, so in theory it could be read from a database, etc.
This provides much more flexibility than the Automated failover client which requires you to provide a hard-coded list of Pulsar broker URLs.
In this example, I am forcing a switch over to a standby cluster based on the number of times the REST endpoint is called
This is to demonstrate a failover to a standby cluster and back to the active one, as we shall see.
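A minimal stand-in for such a service can be sketched with Python's standard library. The real endpoint returns a JSON object with the four fields shown on the slide (service URL plus authentication details); for brevity this sketch returns only an illustrative serviceUrl field, and, like the demo, it forces a switch to the standby cluster after a fixed number of calls:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PRIMARY = "pulsar://primary:6650"    # illustrative cluster URLs
STANDBY = "pulsar://standby:6650"
SWITCH_AFTER = 3                     # force a switch after this many calls

class FailoverController(BaseHTTPRequestHandler):
    calls = 0

    def do_GET(self):
        FailoverController.calls += 1
        # Advertise the standby cluster once the call count passes the
        # threshold, mimicking the demo's call-count-based switch-over.
        url = STANDBY if FailoverController.calls > SWITCH_AFTER else PRIMARY
        body = json.dumps({"serviceUrl": url}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # keep the demo output quiet
        pass

def fetch_service_url(port):
    # What the failover client does on each checkInterval tick.
    with urllib.request.urlopen(f"http://localhost:{port}/") as resp:
        return json.loads(resp.read())["serviceUrl"]

server = HTTPServer(("localhost", 0), FailoverController)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
```

Each poll before the threshold returns the primary cluster's URL; afterwards the standby's URL is returned, and the client switches accordingly.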
Next I will demonstrate both of these failover clients in action.
For those of you that are interested, the source code for this demo is available in the GitHub repo shown here.