DevOps Sensors 360°
High Availability in the Cloud
Lahav Savir, Architect & CEO
Emind Systems Ltd.
email@example.com
About
Lahav Savir
• 15+ years' experience
• Architect and CEO @ Emind Systems
Emind Systems (est. 2006)
• Highly professional system integrator
• Dedicated Cloud Architecture and DevOps teams
• 24x7 SLA by DevOps specialists
• ~100 AWS customers
• Partnerships with leading cloud vendors
What is Availability?
The ability to provide quality service that can support the online service
Downtime
The term downtime refers to periods when a system is unavailable.
Downtime or outage duration refers to a period of time that a system/service fails to provide or perform its primary function.
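The definition of downtime above maps directly onto an availability figure: the share of a measurement period during which the service performed its primary function. The following is a minimal sketch of that arithmetic; the function name `availability_percent` and the sample outage durations are hypothetical and do not come from the slides.

```python
from datetime import timedelta
from typing import Iterable


def availability_percent(period: timedelta, outages: Iterable[timedelta]) -> float:
    """Return the percentage of `period` during which the service was up."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    # Total downtime is the sum of all outage durations in the period.
    downtime = sum(outages, timedelta())
    if downtime > period:
        raise ValueError("total downtime exceeds the measurement period")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (1 - downtime / period)


# Hypothetical month: two outages of 30 and 13 minutes in a 30-day period.
month = timedelta(days=30)
outages = [timedelta(minutes=30), timedelta(minutes=13)]
print(f"{availability_percent(month, outages):.3f}%")  # prints 99.900%
```

In this made-up example, 43 minutes of outage in a 30-day month still rounds to 99.9% availability, which gives a feel for how little downtime a "three nines" target leaves.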
Unavailability / Causes of Downtime
• Hardware failure – 55%
• Human error – 22%
• Software failure – 18%
• Natural disasters – 5%
http://www.continuitycentral.com/news06645.html
Reputable studies have concluded that as much as 75% of downtime is the result of some sort of human error.
http://searchdatacenter.techtarget.com/feature/The-causes-and-costs-of-data-center-system-downtime-Advisory-Board-QA
Hardware / Infrastructure
AWS SLA: 99.9% – 100%
• Redundant servers
  – Multiple servers of each type
  – EBS, Snapshots
• Multi AZ
  – ELB, VPC
• Multi Region
• PaaS
  – S3, SQS, DynamoDB, RDS, Route53 . . .
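To make the SLA figure and the redundancy bullets above more concrete, the sketch below converts an SLA percentage into a yearly downtime budget and estimates the availability of several redundant copies. It assumes that replicas fail independently, which is an idealization (real failures are often correlated), and the function names are hypothetical. Only the 99.9% figure comes from the slide.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(sla_percent: float) -> float:
    """Minutes per year a service may be down while still meeting the SLA."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


def combined_availability(single_percent: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, ASSUMING independent failures.

    The group is down only when every replica is down at the same time.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    failure_probability = 1 - single_percent / 100
    return 100 * (1 - failure_probability ** replicas)


budget = downtime_budget_minutes(99.9)
print(f"99.9% SLA allows about {budget:.1f} minutes of downtime per year")
print(f"Two independent 99.9% replicas: {combined_availability(99.9, 2):.4f}%")
```

Under the independence assumption, a second replica turns roughly 8.76 hours of allowed downtime per year into well under a minute, which is the arithmetic motivation for the multi-server, multi-AZ, and multi-region layers.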
Architect
• Plan based on experience and best practices
• Continuously review and correct
Application Counters / Metrics
net-snmp sub-agent
http://www.emind.co/open-source/
• net-snmp_shell_subagent
# Syntax
# < oid > ; < type > ; < script path >
.1.3.6.1.4.1.39731.2100.1:string:/usr/local/emind/sync_manager/sync_manager.sh status status
.1.3.6.1.4.1.39731.2100.2:string:/usr/local/emind/sync_manager/sync_manager.sh status state
.1.3.6.1.4.1.39731.2100.3:integer:/usr/local/emind/sync_manager/sync_manager.sh status sync_duration_min
.1.3.6.1.4.1.39731.2100.4:integer:/usr/local/emind/sync_manager/sync_manager.sh status idle_duration_h
.1.3.6.1.4.1.39731.2100.5:string:/usr/local/emind/sync_manager/sync_manager.sh status last
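The sub-agent configuration above pairs an OID with a value type and a script whose output becomes the metric. As a rough illustration of that mapping, the sketch below parses such entries into structured records. It is not the net-snmp_shell_subagent implementation: the parser, the `SubagentEntry` type, and the colon-separated reading of each line (taken from the entries themselves rather than from the syntax comment) are assumptions, as is the standard enterprise OID prefix `.1.3.6.1.4.1` used in the sample.

```python
from typing import List, NamedTuple


class SubagentEntry(NamedTuple):
    oid: str          # SNMP object identifier the metric is exposed under
    value_type: str   # "string" or "integer"
    command: str      # script (with arguments) whose output is the value


def parse_entries(text: str) -> List[SubagentEntry]:
    """Parse '<oid>:<type>:<script path>' lines, skipping blanks and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first two colons only, so the command is kept whole.
        parts = line.split(":", 2)
        if len(parts) != 3:
            raise ValueError(f"malformed entry: {line!r}")
        oid, value_type, command = (part.strip() for part in parts)
        if value_type not in ("string", "integer"):
            raise ValueError(f"unsupported type {value_type!r} in: {line!r}")
        entries.append(SubagentEntry(oid, value_type, command))
    return entries


# Sample entries modeled on the slide; the OID prefix is an assumption.
SAMPLE = """
# < oid > ; < type > ; < script path >
.1.3.6.1.4.1.39731.2100.1:string:/usr/local/emind/sync_manager/sync_manager.sh status status
.1.3.6.1.4.1.39731.2100.3:integer:/usr/local/emind/sync_manager/sync_manager.sh status sync_duration_min
"""

for entry in parse_entries(SAMPLE):
    print(entry.oid, entry.value_type, entry.command)
```

Keeping each metric as an OID, type, and command triple means a new application counter only needs a script that prints one value, with no change to the agent itself.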