Redis Cluster is a great distributed implementation of Redis, providing high performance, scalability and availability. In this session we will present how to leverage containers and orchestration layers like Kubernetes and OpenShift to extend the operability of Redis Cluster and automate operations such as scaling, recovery after failures, and upgrades.
Hi Everyone, thanks for coming to this session.
Yesterday, my colleague Nenad showed us during his session how we use Redis for client session offloading in our Java applications at Amadeus.
Today I am presenting another aspect of the work we have done around Redis: how we operate Redis Cluster in our new internal Platform as a Service based on OpenShift.
Amadeus is a technology company dedicated to the global travel industry.
We are present in more than 190 countries, with a worldwide team of more than 15,000 people.
We offer a range of services and tools for the actors of the travel industry.
You may use our services when you search for the lowest flight price, book a flight, or check into a hotel.
Some Numbers
At Amadeus we process more than 1.6 billion requests per day.
1.4 billion passengers boarded a flight last year thanks to our services.
Approximately 2,000 bookings have been made since I started talking.
We have been running software for 30 years now.
Other technologies and languages like:
Java
a homemade distributed architecture running on Linux and implemented in C++
In 2015, we started another transformation with the introduction of our new PaaS called ACS (Amadeus Cloud Services), based on OpenShift.
Why is Amadeus investing in a new PaaS?
To answer new business requirements: improved SLAs, new business models.
Current infrastructure: each application has its own resources, which means spare capacity and limited elasticity.
Goal: be able to run our applications in a multi-cluster, multi-datacenter environment.
On private or public clouds such as GCE, AWS or Microsoft Azure.
Solution based on OPENSHIFT
We already have applications running on ACS and processing requests in production: Amadeus Airline Cloud Availability and Digital Ecommerce are two of them.
Current applications each manage their resiliency differently.
We compared Redis with other solutions on the market.
Who in this room doesn't know Kubernetes? And OpenShift?
Kubernetes is an open-source system for managing containerized applications across multiple hosts;
it provides basic mechanisms for the deployment, maintenance, and scaling of applications.
Openshift Container Platform
It eases application management: enhanced security, built-in continuous delivery and deployment, an administration UI.
OpenShift is a distribution of Kubernetes that provides a complete Platform as a Service.
Now let's discuss Redis and Kubernetes.
When you look at the Kubernetes documentation and examples, you could say that Kubernetes loves Redis: the majority of the documentation examples run a Redis server process. This is mainly thanks to one of the main advantages of Redis: it is a lightweight single-process server with only one port allocated, and its full configuration is stored in a single file.
One of the first examples using Redis was the "guestbook": a PHP frontend serving a form that lets users add a comment and then lists all previous comments. This frontend used Redis as its backend storage: one master for write operations and several slaves for reads.
The second example that comes to my mind was presented at this conference last year. It showed how to run Redis with cluster mode enabled. It was the first example I saw with a cluster configuration, and it was very didactic.
Other examples can be found on the web, but none provided all the features we wanted, since we really wanted to automate Redis Cluster operations as much as possible.
So the key missing features were:
a simple deployment
the ability to easily change the cluster topology
a simple rolling-update mechanism
resilience to failures
That is why, more than a year ago, we started to look at how we could achieve this and validate all our requirements with our own solution.
First, we decided to use Redis with cluster mode enabled. We thought that a self-managed cluster would remove part of the Redis operational support burden. Also, having the sharding mechanism provided directly by Redis Cluster would remove the need for a proxy like twemproxy.
The second decision was to automate the configuration of the cluster as much as possible. Deploying a new cluster should be as easy as creating a new pod.
The same goes for scaling the cluster up or down.
So we created a new component called "Redis-Manager" that links the Kubernetes world to the Redis Cluster configuration. The Redis-Manager understands and communicates with the Kubernetes API, and it is also able to interact with the Redis processes.
The cluster configuration, which is specific to each cluster, is stored in a ConfigMap that is constantly watched by the Redis-Manager. In this configuration you can find the number of Redis masters and the replication factor.
So, when the Manager sees a difference between the current cluster configuration and the ConfigMap, it starts to take decisions in order to reconcile the cluster state with the desired configuration. It can scale the Deployment to get more or fewer Redis nodes. When some Redis slots need to be reassigned, the Manager does it without any human intervention.
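As a rough illustration, the scaling side of this reconciliation can be sketched as follows. This is a minimal sketch in Go, not the actual Redis-Manager code; the type and field names are hypothetical, assuming the ConfigMap carries a number of masters and a replication factor.

```go
package main

import "fmt"

// ClusterSpec mirrors the information stored in the ConfigMap
// (hypothetical field names, for illustration only).
type ClusterSpec struct {
	NumberOfMasters   int
	ReplicationFactor int // number of slaves per master
}

// DesiredNodes computes how many Redis pods the cluster needs:
// each master plus its replicas.
func DesiredNodes(spec ClusterSpec) int {
	return spec.NumberOfMasters * (1 + spec.ReplicationFactor)
}

// ScaleDelta is the adjustment the Manager would request on the
// Deployment to reconcile the current state with the desired spec.
func ScaleDelta(spec ClusterSpec, currentNodes int) int {
	return DesiredNodes(spec) - currentNodes
}

func main() {
	spec := ClusterSpec{NumberOfMasters: 3, ReplicationFactor: 1}
	fmt.Println(DesiredNodes(spec))  // 3 masters + 3 slaves = 6 pods
	fmt.Println(ScaleDelta(spec, 4)) // scale up by 2 pods
}
```

A positive delta means scaling the Deployment up; a negative one means scaling it down, after which slot reassignment kicks in.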
Another strong requirement was rolling updates, as seamless as possible.
So how does it work?
We implemented it thanks to an OpenShift feature: in OpenShift, a Deployment object is called a DeploymentConfig. Like the Deployment in Kubernetes, this object manages the migration from one version of an application to another, with the possibility to choose between different migration strategies (e.g. rolling update). But in addition, in OpenShift the migration between the two versions can be handled by a "custom deployer" that implements your own update logic.
So let's see how we take advantage of this feature:
First, when a DevOps engineer updates the DeploymentConfig, for example with a new version of the Redis binary, OpenShift detects this modification and creates a new ReplicationController with the new pod template. OpenShift also starts a new pod that contains the custom deployer, called "Redis-Deployer" in this example.
This deployer is configured to have access to the information from the DeploymentConfig and the associated ReplicationControllers.
In our case, the Redis-Deployer starts by scaling up the new ReplicationController to add new Redis nodes of version V2 to the cluster. When a new pod is up and running, the deployer uses annotations on the ReplicationController to inform the Redis-Manager that it can start migrating the slots of an old Redis node to a new one. This operation is repeated for each node.
When all slots have been migrated to the new Redis node pods, the Redis-Manager informs the deployer that it can start scaling down the old ReplicationController.
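The node-by-node handshake above can be sketched as a simple migration plan: each old-version node hands its slots to exactly one new-version node. Again, this is an illustrative sketch, not the actual Redis-Deployer code; the names are hypothetical.

```go
package main

import "fmt"

// MigrationStep records which old node hands its slots to which new node.
type MigrationStep struct {
	From, To string
}

// PlanRollingUpdate pairs each old-version node with a new-version node.
// Slots are migrated one node at a time, so we need one new node for
// each old node being replaced.
func PlanRollingUpdate(oldNodes, newNodes []string) ([]MigrationStep, error) {
	if len(oldNodes) != len(newNodes) {
		return nil, fmt.Errorf("need %d new nodes, got %d", len(oldNodes), len(newNodes))
	}
	steps := make([]MigrationStep, 0, len(oldNodes))
	for i, from := range oldNodes {
		steps = append(steps, MigrationStep{From: from, To: newNodes[i]})
	}
	return steps, nil
}

func main() {
	steps, _ := PlanRollingUpdate(
		[]string{"redis-v1-a", "redis-v1-b"},
		[]string{"redis-v2-a", "redis-v2-b"},
	)
	for _, s := range steps {
		fmt.Printf("migrate slots %s -> %s\n", s.From, s.To)
	}
}
```

In the real flow, each step only starts once the deployer has confirmed (via the ReplicationController annotations) that the target pod is up and running.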
I hope you liked the demo.
So what are the advantages of this solution?
First, as we saw, the basic Redis Cluster operations are completely automated.
Then, this solution doesn't rely on persistent volumes.
And finally, a Redis Cluster created and managed by the Redis-Manager is seen as just another Kubernetes application.
But we already know that we can still improve the current solution; we have identified some limitations.
The first limitation:
we need to deploy one Redis-Manager per Redis Cluster. The Redis-Manager is a small Go process, but it would still be better to be able to handle several clusters with one Manager.
Currently, the Redis-Manager reacts to actions triggered by Kubernetes, like pod deletion when we scale down. It is not possible to tell Kubernetes which pod to delete first in a scale-down. So we have some additional logic in the Redis node pod to catch the SIGTERM signal and start a node failover if the current pod is a master, or just issue an eviction command if it is a slave.
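The graceful-shutdown logic in the Redis node pod could look roughly like this. It is a minimal sketch: the role lookup and the exact failover/eviction commands are assumptions, not the actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// ShutdownAction decides what must happen before the pod stops:
// a master first fails over to one of its slaves, while a slave
// can simply be evicted from the cluster view.
func ShutdownAction(role string) string {
	if role == "master" {
		return "CLUSTER FAILOVER" // promote a slave before stopping
	}
	return "evict" // remove the slave from the cluster
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)

	// Simulate Kubernetes sending SIGTERM on pod deletion, so this
	// sketch terminates on its own when run standalone.
	syscall.Kill(syscall.Getpid(), syscall.SIGTERM)

	role := "master" // in reality, queried from the local Redis node
	<-sigs
	fmt.Println("running:", ShutdownAction(role))
}
```

The point is that the decision lives in the pod itself, because Kubernetes gives no way to influence which pod is deleted first during a scale-down.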
The current rolling-update solution depends on the custom deployer feature offered by OpenShift, but we would like to be Kubernetes compliant.
Last point: the user experience could be improved, as some actions can be done in several ways.
For example, scaling can be done either by updating the ConfigMap or by using the scale command on the DeploymentConfig.
So how do we solve our current limitations?
The Operator concept.
Since we started to implement our solution, other companies have faced the same problem with other databases, like CoreOS with etcd.
CoreOS solved this issue with a new kind of Kubernetes application that they call an "Operator". They have already officially proposed two Operators: the etcd Operator and the Prometheus Operator.
This is the official definition of an Operator: "An Operator represents human operational knowledge in software, to reliably manage an application."
If we compare the etcd Operator with the Redis-Manager I presented previously, we can see that they follow the same logic: providing a daemon process that acts as the glue between Kubernetes resource management and the specificities of the application we want to manage.
But Operators benefit from a new kind of Kubernetes API object that didn't exist when we started the Redis-Manager implementation: ThirdPartyResources.
ThirdPartyResource objects are a way to extend the Kubernetes API with a new API object type. Like other native API object types, the new object supports CRUD operations and watch.
Thanks to this ability to add new object types, everyone can implement a custom controller that reacts to the lifecycle of a ThirdPartyResource: creation, update, deletion.
So how do we resolve our current limitations? With a Redis-Operator!
Let's see how we can transform the Redis-Manager into a Redis-Operator.
First, instead of using a ConfigMap to store the cluster configuration, the Redis-Operator will define a Redis-Cluster TPR that stores the same information. In addition to the cluster topology parameters, the Redis-Cluster TPR will also store the PodTemplate that was previously provided in the Deployment object.
This way, we tightly couple all the information needed to manage the cluster.
Most of our Redis-Manager logic will be reused in the Redis-Operator. The main change is that instead of watching events on a Deployment object, the Redis-Operator will now react to Redis-Cluster TPR events.
It will also be the role of the Redis-Operator to request new pod instances from the Kubernetes scheduler.
All of this new architecture will result in fewer hacks, and thus more overall stability.
The main advantage of following the Operator logic is to hide the specificities of Redis Cluster in a new kind of Kubernetes object, thanks to ThirdPartyResources.
It also improves the user experience: users interact with only one object that represents the full cluster.
This approach improves the stability of the solution, because the Redis-Operator no longer merely reacts to decisions taken by the Kubernetes controllers; the Operator is itself the controller dedicated to the Redis-Cluster TPR.
By design, a controller/Operator reacts to instances of an object kind, so one Redis-Operator will be able to manage several Redis Clusters.
So what next?
First, we want to open-source what we have already done, as it is. We know it is not the final solution we want to propose, but we think it can already be interesting for some people, and we are interested in their feedback.
We have already defined a roadmap to improve the initial open-source version:
We think the Operator approach is the right direction, so the first improvement will be to migrate our logic into an Operator.
Then we can think about proposing a Helm chart for this Redis-Operator.
Another interesting development in the Kubernetes community is the implementation of the Service Catalog, which defines an Open Service Broker API to consume external services in Kubernetes. We could then consider Redis Cluster as an external service: when an application needs a Redis Cluster, a request is sent to a RedisCluster broker, which creates the proper Redis-Cluster ThirdPartyResource for that application. At the end of the process, the application gets the name of the service targeting its Redis Cluster instance, along with the credentials.
Thanks for your attention. If we still have some time, I can answer your questions.