Self-Healing AWS-Based Open Source Distributed Rendering System By George Nassef
1. Self-Healing Amazon AWS-Based Open
Source Distributed Rendering System
Presentation to Internet Tech Forum
George Nassef
Chief Technology Officer
June 2, 2014
2. Our Environment
LAMPR (Linux Apache MySQL PHP And Redis)
Millions of iOS and Android client mobile devices
RESTful API, MVC
Ad-hoc video rendering jobs submitted with real-time expectations.
Need for job queue, dispatch, processing and, most importantly, self-healing
recovery.
3. The Problem
I wanted to deploy only Amazon Spot instances to save money.
However, Spot instances can come and go, and can be terminated mid-job at
any time due to price/availability changes.
I also wanted full horizontal scaling with queue-based communication.
Any render job which did not finish must be rapidly identified and
requeued.
However, geo-distribution makes traditional requeue software moot.
4. Solution
Use Redis for the queue by creating work queue keys.
Place to-be-done work in the “Messages” key.
Have each render server pick up work and move the job ID to the “In Work” queue.
Created an outboard MongoDB system to capture all system and application
logs.
A separate process intelligently watches the MongoDB system and interprets program
alerts, system messages, warnings and errors.
The “watching” process requeues render jobs before they fail.
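The queue flow on this slide can be sketched roughly as follows. This is a minimal illustration, not the production code: the key names “Messages” and “In Work” come from the slide, but everything else is an assumption, including the use of Redis lists with LPUSH, RPUSH, RPOPLPUSH and LREM, the log fields `job_id` and `level`, and the helper names. A small in-memory class stands in for the Redis client, and plain dicts stand in for MongoDB log documents, so the sketch runs without either server.

```python
from collections import defaultdict

MESSAGES = "Messages"                # to-be-done work (key name from the slide)
IN_WORK = "In Work"                  # jobs currently claimed by a render server
ALERT_LEVELS = {"warning", "error"}  # hypothetical severity labels


class InMemoryLists:
    """Stand-in for a Redis client; models only the four list commands
    used below so the sketch runs without a server."""

    def __init__(self):
        self.lists = defaultdict(list)

    def lpush(self, key, *values):
        for value in values:
            self.lists[key].insert(0, value)
        return len(self.lists[key])

    def rpush(self, key, *values):
        self.lists[key].extend(values)
        return len(self.lists[key])

    def rpoplpush(self, src, dst):
        if not self.lists[src]:
            return None
        value = self.lists[src].pop()
        self.lists[dst].insert(0, value)
        return value

    def lrem(self, key, count, value):
        # Removes up to `count` matches from the head (all if count == 0).
        kept, removed = [], 0
        for item in self.lists[key]:
            if item == value and (count == 0 or removed < count):
                removed += 1
            else:
                kept.append(item)
        self.lists[key] = kept
        return removed


def submit(client, job_id):
    """Add a job at the head; workers pop from the tail, giving FIFO order."""
    client.lpush(MESSAGES, job_id)


def claim(client):
    """Render server: move one job from Messages to In Work in one step."""
    return client.rpoplpush(MESSAGES, IN_WORK)


def complete(client, job_id):
    """Render server: drop a finished job from In Work."""
    client.lrem(IN_WORK, 1, job_id)


def requeue(client, job_id):
    """Watcher: put an in-work job back at the tail so it is claimed next.
    On a real Redis server these two calls would need a transaction or
    script to be atomic."""
    if client.lrem(IN_WORK, 1, job_id):
        client.rpush(MESSAGES, job_id)
        return True
    return False


def watch(client, log_entries):
    """Watcher: requeue every in-work job named in an alerting log entry."""
    requeued = []
    for entry in log_entries:
        if entry.get("level") in ALERT_LEVELS and requeue(client, entry.get("job_id")):
            requeued.append(entry["job_id"])
    return requeued


client = InMemoryLists()
for job in ("job-1", "job-2", "job-3"):
    submit(client, job)

first = claim(client)
second = claim(client)
complete(client, first)

# Hypothetical log documents, as the watcher might read them from the log store.
logs = [
    {"job_id": "job-1", "level": "info"},
    {"job_id": "job-2", "level": "error"},
]
requeued = watch(client, logs)
```

Against a real server, the same four method names exist on the redis-py client, so `InMemoryLists()` could in principle be swapped for a `redis.Redis(...)` connection. Note that newer Redis versions deprecate RPOPLPUSH in favor of LMOVE.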
5. Progress
After hundreds of thousands of submitted render jobs, only 2 failures.
Failures were unrelated to “chaos” of environment or spot instances.
Real-time analysis of MongoDB logs catches and closes security holes and system
errors, and provides data and user response time information.
Huge savings in AWS costs by being able to use ANY availability zone
worldwide, and to spin instances up or down at will without regard to
workload.
System “self heals” as it scales.
6. Challenges / Choices
AWS's hosted Redis service is ElastiCache.
Accessible seamlessly across Availability Zones, but not Regions.
Autoscale groups for spot instances take time to “spin up.”
CloudWatch metrics may not fit your workload for rapid spin-up.