Architectures for High Availability - QConSF

Architecture talk aimed at a well informed developer audience (i.e. QConSF Real Use Cases for NoSQL track), focused mainly on availability. Skips the Netflix cloud migration stuff that is in other talks.



  • Complete connection pool abstraction. Queries and mutations are wrapped in objects created by the Keyspace implementation, making it possible to retry failed operations. This differs from other connection pool implementations, in which the operation is created on a specific connection and must be completely redone if it fails.
    Simplified serialization via method overloading. The low-level Thrift library only understands data that is serialized to a byte array. Hector requires serializers to be specified for nearly every call. Astyanax minimizes the places where serializers are specified by using predefined ColumnFamily and ColumnPath definitions which specify the serializers. The API also overloads set and get operations for common data types.
    The internal library does not log anything. All internal events are instead ... calls to a ConnectionPoolMonitor interface. This allows customization of log levels and filtering of repeating events outside of the scope of the connection pool.
    Super columns will soon be replaced by composite column names. As such it is recommended to not use super columns at all and to use composite column names instead. There is some support for super columns in Astyanax, but those methods have been deprecated and will eventually be removed.

Architectures for High Availability - QConSF: Presentation Transcript

  • Architectural Patterns for High Anxiety Availability, November 2012. Adrian Cockcroft, @adrianco, #netflixcloud #qconsf, http://www.linkedin.com/in/adriancockcroft
  • The Netflix Streaming Service: now in USA, Canada, Latin America, UK, Ireland, Sweden, Denmark, Norway and Finland
  • US Non-Member Web Site: Advertising and Marketing Driven
  • Member Web Site: Personalization Driven
  • Streaming Device API
  • Content Delivery Service: distributed storage nodes controlled by Netflix cloud services
  • November 2012 Traffic
  • Abstract
    • Netflix on Cloud – What, Why and When
    • Globally Distributed Architecture
    • Benchmarks and Scalability
    • Open Source Components
    • High Anxiety
  • Blah Blah Blah (I’m skipping all the cloud intro etc., did that last year… Netflix runs in the cloud; if you hadn’t figured that out already you aren’t paying attention and should go read InfoQ and slideshare.net/netflix)
  • Things we don’t do
  • Things We Do Do… In production at Netflix
    • Big Data/Hadoop 2009
    • AWS Cloud 2009
    • Application Performance Management 2010
    • Integrated DevOps Practices 2010
    • Continuous Integration/Delivery 2010
    • NoSQL, Globally Distributed 2010
    • Platform as a Service; Micro-Services 2010
    • Social coding, open development/github 2011
  • How Netflix Works
    [Diagram. Labels: Consumer Electronics; AWS Cloud Services; CDN Edge Locations; Customer Device (PC, PS3, TV…); Web Site or Discovery API; User Data; Personalization; Streaming API; DRM; QoS Logging; OpenConnect CDN Boxes; CDN Management and Steering; Content Encoding]
  • Web Server Dependencies Flow (Home page business transaction as seen by AppDynamics)
    Each icon is three to a few hundred instances across three AWS zones
    [Diagram. Labels: Start Here; Web service; memcached; Cassandra; S3 bucket; Personalization movie group chooser]
  • Component Micro-Services: test with Chaos Monkey, Latency Monkey
  • Three Balanced Availability Zones: test with Chaos Gorilla
    [Diagram: Load Balancers in front of Zone A, Zone B and Zone C, each with Cassandra and Evcache Replicas]
  • Triple Replicated Persistence: Cassandra maintenance affects individual replicas
    [Diagram: Load Balancers in front of Zone A, Zone B and Zone C, each with Cassandra and Evcache Replicas]
  • Isolated Regions
    [Diagram: US-East Load Balancers and EU-West Load Balancers, each in front of Zone A, Zone B and Zone C with Cassandra Replicas]
  • Failure Modes and Effects
    Failure Mode          Probability   Mitigation Plan
    Application Failure   High          Automatic degraded response
    AWS Region Failure    Low           Wait for region to recover
    AWS Zone Failure      Medium        Continue to run on 2 out of 3 zones
    Datacenter Failure    Medium        Migrate more functions to cloud
    Data store failure    Low           Restore from S3 backups
    S3 failure            Low           Restore from remote archive
  • Zone Failure Modes
    • Power Outage
      – Instances lost, ephemeral state lost
      – Clean break and recovery, fail fast, “no route to host”
    • Network Outage
      – Instances isolated, state inconsistent
      – More complex symptoms, recovery issues, transients
    • Dependent Service Outage
      – Cascading failures, misbehaving instances, human errors
      – Confusing symptoms, recovery issues, byzantine effects
    More detail on this topic at AWS Re:Invent later this month…
  • Cassandra backed Micro-Services: a highly scalable, available and durable deployment pattern
  • Micro-Service Pattern
    [Diagram. Labels: Many Different Single-Function REST Clients; Stateless Data Access REST Service; Astyanax Cassandra Client; Single function Cassandra Cluster Managed by Priam, Between 6 and 72 nodes; One keyspace, replaces a single table or materialized view; Optional Datacenter Update Flow; Appdynamics Service Flow Visualization]
    Each icon represents a horizontally scaled service of three to hundreds of instances deployed over three availability zones
  • Stateless Micro-Service Architecture
    • Linux Base AMI (CentOS or Ubuntu)
    • Optional Apache frontend, memcached, non-java apps
    • Monitoring: Log rotation to S3, AppDynamics machineagent, Epic/Atlas
    • Java (JDK 6 or 7)
    • AppDynamics appagent monitoring
    • GC and thread dump logging
    • Tomcat
      – Application war file, base servlet, platform, client interface jars, Astyanax
      – Healthcheck, status servlets, JMX interface, Servo autoscale
  • Astyanax: available at http://github.com/netflix
    • Features
      – Complete abstraction of connection pool from RPC protocol
      – Fluent Style API
      – Operation retry with backoff
      – Token aware
    • Recipes
      – Distributed row lock (without zookeeper)
      – Multi-DC row lock
      – Uniqueness constraint
      – Multi-row uniqueness constraint
      – Chunked and multi-threaded large file storage
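
The "operation retry with backoff" feature can be pictured with a small generic helper. This is only a sketch under stated assumptions: `RetryWithBackoff` and its `call` method are hypothetical names invented here, they are not Astyanax classes, and the doubling delay is one common backoff policy rather than the one Astyanax necessarily uses.

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration; this is not Astyanax code. */
final class RetryWithBackoff {

    /**
     * Calls op up to maxAttempts times. After each failure it sleeps, doubling
     * the delay every time (baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...).
     * Rethrows the last failure once the attempts are used up.
     */
    static <T> T call(Supplier<T> op, int maxAttempts, long baseDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = baseDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the backoff
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the thread is interrupted
                }
                delayMs *= 2;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated connection failure");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Because the operation is passed in as an object rather than bound to one connection, the same operation can simply be invoked again on the next attempt.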
  • Astyanax Query Example: paginate through all columns in a row
    ColumnList<String> columns;
    int pageSize = 10;
    try {
        RowQuery<String, String> query = keyspace
            .prepareQuery(CF_STANDARD1)
            .getKey("A")
            .setIsPaginating()
            .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());
        // Each execute() returns the next page; an empty page ends the loop.
        while (!(columns = query.execute().getResult()).isEmpty()) {
            for (Column<String> c : columns) {
                // process column c
            }
        }
    } catch (ConnectionException e) {
        // handle or log the connection failure
    }
  • Astyanax - Cassandra Write Data Flows: Single Region, Multiple Availability Zone, Token Aware
    1. Client writes to local coordinator
    2. Coordinator writes to other zones
    3. Nodes return ack
    4. Data written to internal commit log disks (no more than 10 seconds later)
    If a node goes offline, hinted handoff completes the write when the node comes back up.
    Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
    SSTable disk writes and compactions occur asynchronously.
    [Diagram: Token Aware Clients writing to Cassandra nodes with disks in Zone A, Zone B and Zone C; numbers mark the steps above]
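
The choice between waiting for one node, a quorum, or all nodes comes down to simple arithmetic on the replication factor. The sketch below is illustrative only; `WriteAcks` and its methods are hypothetical names, and it assumes one replica per zone as in the three-zone layout of this deck.

```java
/** Hypothetical helper for illustration; not part of Cassandra or Astyanax. */
final class WriteAcks {

    enum Level { ONE, QUORUM, ALL }

    /** Number of replica acks a write waits for at the given level. */
    static int required(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1; // a majority of the replicas
            case ALL:
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }

    /** Replicas that can be unavailable while writes at this level still succeed. */
    static int tolerated(Level level, int replicationFactor) {
        return replicationFactor - required(level, replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3; // one replica in each of three zones
        for (Level level : Level.values()) {
            System.out.println(level + ": wait for " + required(level, rf)
                    + " ack(s), tolerates " + tolerated(level, rf) + " replica(s) down");
        }
    }
}
```

With a replication factor of 3, a quorum write needs 2 acks, so it keeps working with one replica (one zone) down, which matches the "continue to run on 2 out of 3 zones" mitigation.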
  • Data Flows for Multi-Region Writes: Token Aware, Consistency Level = Local Quorum
    1. Client writes to local replicas
    2. Local write acks returned to Client, which continues when 2 of 3 local nodes are committed
    3. Local coordinator writes to remote coordinator
    4. When data arrives, remote coordinator node acks and copies to other remote zones
    5. Remote nodes ack to local coordinator
    6. Data flushed to internal commit log disks (no more than 10 seconds later)
    If a node or region goes offline, hinted handoff completes the write when the node comes back up.
    Nightly global compare and repair jobs ensure everything stays consistent.
    [Diagram: US Clients and EU Clients, each writing to Cassandra nodes with disks in Zone A, Zone B and Zone C of their region; 100+ms latency between regions; numbers mark the steps above]
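
A short calculation shows why Local Quorum matters with 100+ms between regions. The sketch assumes two regions with three replicas each, as drawn on the slide; `QuorumMath` is a hypothetical name used only for this illustration.

```java
/** Hypothetical helper for illustration; not part of Cassandra or Astyanax. */
final class QuorumMath {

    /** Majority of a replica count. */
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    /** Acks needed when only the local region's replicas are counted. */
    static int localQuorum(int localReplicationFactor) {
        return quorum(localReplicationFactor);
    }

    /** Acks needed when replicas in every region are counted together. */
    static int globalQuorum(int... replicationFactorPerRegion) {
        int total = 0;
        for (int rf : replicationFactorPerRegion) {
            total += rf;
        }
        return quorum(total);
    }

    public static void main(String[] args) {
        System.out.println("local quorum:  " + localQuorum(3) + " of 3 local replicas");
        System.out.println("global quorum: " + globalQuorum(3, 3) + " of 6 replicas");
    }
}
```

A global quorum over 6 replicas needs 4 acks, so at least one would have to cross the inter-region link before the client could continue. A local quorum needs only 2 of the 3 local replicas, and the remote copies complete in the background.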
  • Cassandra Instance Architecture
    • Linux Base AMI (CentOS or Ubuntu)
    • Tomcat and Priam on JDK: Healthcheck, Status
    • Java (JDK 7)
    • AppDynamics appagent monitoring
    • Cassandra Server
    • Monitoring: AppDynamics machineagent, Epic/Atlas
    • Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
    • GC and thread dump logging
  • Priam – Cassandra Automation: available at http://github.com/netflix
    • Netflix Platform Tomcat Code
    • Zero touch auto-configuration
    • State management for Cassandra JVM
    • Token allocation and assignment
    • Broken node auto-replacement
    • Full and incremental backup to S3
    • Restore sequencing from S3
    • Grow/Shrink Cassandra “ring”
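
To give a feel for what "token allocation and assignment" automates, here is the textbook way of spacing tokens evenly around a ring. This is a sketch under assumptions, not Priam's actual algorithm: it assumes Cassandra's RandomPartitioner token space of 0 to 2^127 and ignores zones and regions, and `EvenTokens` is a hypothetical name.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; Priam's real allocation logic differs. */
final class EvenTokens {

    /** Size of the RandomPartitioner token space (assumed here): 2^127. */
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Token for node i of n is i * 2^127 / n, which spaces nodes evenly. */
    static List<BigInteger> tokens(int nodeCount) {
        if (nodeCount < 1) {
            throw new IllegalArgumentException("nodeCount must be at least 1");
        }
        List<BigInteger> result = new ArrayList<>();
        BigInteger n = BigInteger.valueOf(nodeCount);
        for (int i = 0; i < nodeCount; i++) {
            result.add(RING_SIZE.multiply(BigInteger.valueOf(i)).divide(n));
        }
        return result;
    }

    public static void main(String[] args) {
        for (BigInteger token : tokens(6)) {
            System.out.println(token);
        }
    }
}
```

Doing this by hand for every node, and redoing it when the ring grows or shrinks, is the kind of work the slide describes as zero touch.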
  • Cassandra Backup
    • Full Backup
      – Time based snapshot
      – SSTable compress -> S3
    • Incremental
      – SSTable write triggers compressed copy to S3
    • Archive
      – Copy cross region
    [Diagram: ring of Cassandra nodes copying to S3 Backup]
  • Deployment at Netflix
    • Over 50 Cassandra Clusters
    • Over 500 m2.4xlg + hi1.4xlg instances
    • Over 30TB of daily backups
    • Biggest cluster 72 nodes
    • 1 cluster over 250K writes/s
  • Cassandra Explorer for Data: open source on github soon
  • ETL for Cassandra
    • Data is de-normalized over many clusters!
    • Too many to restore from backups for ETL
    • Solution – read backup files using Hadoop
    • Aegisthus
      – http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
      – High throughput raw SSTable processing
      – Re-normalizes many clusters to a consistent view
      – Extract, Transform, then Load into Teradata
  • Benchmarks and Scalability
  • Cloud Deployment Scalability
    New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s
    Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)
    Min.   1st Qu.   Median   Mean    3rd Qu.   Max.
    41.0   104.2     149.0    171.8   215.8     562.0
  • Scalability from 48 to 288 nodes on AWS
    http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
    [Chart: Client Writes/s by node count, Replication Factor = 3. Data labels: 174373, 366828, 537172 and 1099837 writes/s]
    Used 288 of m1.xlarge (4 CPU, 15 GB RAM, 8 ECU), Cassandra 0.8.6. Benchmark config only existed for about 1hr.
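
The near-linear scaling can be checked with a little arithmetic on the chart's end points. The sketch assumes that the smallest data label (174373 writes/s) belongs to 48 nodes and the largest (1099837 writes/s) to 288 nodes, which is how the slide title and chart read; `ScalingCheck` is a hypothetical name.

```java
/** Hypothetical helper for illustration; the inputs are read off the slide. */
final class ScalingCheck {

    /** Average client writes per second handled per node. */
    static double writesPerNode(long writesPerSecond, int nodes) {
        return (double) writesPerSecond / nodes;
    }

    /**
     * Throughput growth divided by node growth.
     * 1.0 means perfectly linear scaling; above 1.0 is better than linear.
     */
    static double scalingEfficiency(int nodesBefore, long writesBefore,
                                    int nodesAfter, long writesAfter) {
        double throughputRatio = (double) writesAfter / writesBefore;
        double nodeRatio = (double) nodesAfter / nodesBefore;
        return throughputRatio / nodeRatio;
    }

    public static void main(String[] args) {
        System.out.printf("48 nodes:  %.0f writes/s per node%n",
                writesPerNode(174373L, 48));
        System.out.printf("288 nodes: %.0f writes/s per node%n",
                writesPerNode(1099837L, 288));
        System.out.printf("efficiency: %.3f%n",
                scalingEfficiency(48, 174373L, 288, 1099837L));
    }
}
```

Under that reading, six times the nodes delivered about 6.3 times the writes, so per-node throughput did not fall as the cluster grew.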
  • “Some people skate to the puck, I skate to where the puck is going to be” (Wayne Gretzky)
  • Cassandra on AWS
                 The Past            The Future
    Instance     m2.4xlarge          hi1.4xlarge
    Storage      2 drives, 1.7TB     2 SSD volumes, 2TB
    CPU          8 Cores, 26 ECU     8 HT cores, 35 ECU
    RAM          68GB                64GB
    Network      1Gbit               10Gbit
    IOPS         ~500                ~100,000
    Throughput   ~100Mbyte/s         ~1Gbyte/s
    Cost         $1.80/hr            $3.10/hr
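
One way to read the two instance types side by side is IOPS per dollar-hour. The sketch below uses only the approximate figures from the slide; `InstanceValue` is a hypothetical name and the result is a rough ratio, not a benchmark.

```java
/** Hypothetical helper for illustration; the inputs are the slide's approximate figures. */
final class InstanceValue {

    /** IOPS obtained for each dollar spent per hour. */
    static double iopsPerDollarHour(double iops, double dollarsPerHour) {
        return iops / dollarsPerHour;
    }

    public static void main(String[] args) {
        double past = iopsPerDollarHour(500, 1.80);       // m2.4xlarge
        double future = iopsPerDollarHour(100_000, 3.10); // hi1.4xlarge
        System.out.printf("m2.4xlarge:  %.0f IOPS per $/hr%n", past);
        System.out.printf("hi1.4xlarge: %.0f IOPS per $/hr%n", future);
        System.out.printf("ratio: %.0fx%n", future / past);
    }
}
```

On these figures the SSD instance costs less than twice as much per hour but offers over a hundred times the IOPS per dollar.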
  • Cassandra Disk vs. SSD Benchmark: Same Throughput, Lower Latency, Half Cost
  • Netflix Open Source Strategy
    • Release PaaS Components git-by-git
      – Source at github.com/netflix – we build from it…
      – Intros and techniques at techblog.netflix.com
      – Blog post or new code every few weeks
    • Motivations
      – Give back to Apache licensed OSS community
      – Motivate, retain, hire top engineers
      – “Peer pressure” code cleanup, external contributions
  • Instance creation
    [Diagram. Components: Bakery & Build tools; Asgard; Odin; Autoscaling scripts. Stages: Base AMI; Application Code; Image baked; ASG / Instance started; Instance Running]
  • Application Launch
    [Diagram. Components: Governator (Guice); Eureka; Async logging; Archaius; Edda; Servo. Stages: Application initializing; Service Registry, configuration history]
  • Runtime
    [Diagram. Client Side Components: Astyanax; Curator; NIWS LB; REST client. Server Side Components: Priam; Exhibitor; Cass JMeter; Explorers. Resiliency aids: Chaos Monkey; Latency Monkey; Janitor Monkey; Dependency Command]
  • Open Source Projects
    Legend: Github / Techblog; Apache Contributions; Techblog Post; Coming Soon
    • Priam: Cassandra as a Service
    • Exhibitor: Zookeeper as a Service
    • Servo and Autoscaling Scripts
    • Astyanax: Cassandra client for Java
    • Curator: Zookeeper Patterns
    • Honu: Log4j streaming to Hadoop
    • CassJMeter: Cassandra test suite
    • EVCache: Memcached as a Service
    • Circuit Breaker: Robust service pattern
    • Cassandra Multi-region EC2 datastore support
    • Eureka / Discovery: Service Directory
    • Asgard: AutoScaleGroup based AWS console
    • Aegisthus: Hadoop ETL for Cassandra
    • Archaius: Dynamics Properties Service
    • Chaos Monkey: Robustness verification
    • Edda: Queryable config history
    • Explorers
    • Latency Monkey: Server-side latency/error injection
    • Governator: Library lifecycle and dependency injection
    • Janitor Monkey
    • Odin: Workflow orchestration
    • REST Client + mid-tier LB
    • Bakeries and AMI
    • Async logging
    • Configuration REST endpoints
    • Build dynaslaves
  • Cassandra Next Steps
    • Migrate Production Cassandra to SSD
      – Many clusters done
      – 100+ SSD nodes running
    • Autoscale Cassandra using Priam
      – Cassandra 1.2 Vnodes make this easier
      – Shrink Cassandra cluster every night
    • Automated Zone and Region Operations
      – Add/Remove Zone, split or merge clusters
      – Add/Remove Region, split or merge clusters
  • YOLO
  • Skynet: a Netflix Hackday project that might just terminate the world… (hack currently only implemented in PowerPoint, luckily)
  • The Plot (kinda)
    • Skynet is a sentient computer
    • Skynet defends itself if you try to turn it off
    • Connor is the guy who eventually turns it off
    • Terminator is the robot sent to kill Connor
  • The Hacktors
    • Cass_skynet is a self-managing Cassandra cluster
    • Connor_monkey kills cass_skynet nodes
    • Terminator_monkey kills connor_monkey nodes
  • The Hacktion
    • Cass_skynet stores a history of its world and action scripts that trigger from what it sees
    • Action response to losing a node
      – Auto-replace node and grow cluster size
    • Action response to losing more nodes
      – Replicate cluster into a new zone or region
    • Action response to seeing a Connor_monkey
      – Startup a Terminator_monkey
  • Implementation
    • Priam
      – Autoreplace missing nodes
      – Grow cass_skynet cluster in zone, to new zones or regions
    • Cassandra Keyspaces
      – Actions – scripts to be run
      – Memory – record event log of everything seen
    • Cron job once a minute
      – Extract actions from Cassandra and execute
      – Log actions and results in memory
    • Chaos Monkey configuration
      – Terminator_monkey: pick a zone, kill any connor_monkey
      – Connor_monkey: kill any cass_skynet or terminator_monkey
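
Since the hack exists only in PowerPoint, the once-a-minute cron step can at least be sketched. Everything below is hypothetical: `SkynetLoop`, the event names and the action names are invented for illustration, and in-memory collections stand in for the Actions and Memory keyspaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the cron step; in-memory stand-ins replace Cassandra. */
final class SkynetLoop {

    /** Stand-in for the Actions keyspace: observed event -> action script name. */
    private final Map<String, String> actions = new LinkedHashMap<>();

    /** Stand-in for the Memory keyspace: event log of everything seen and done. */
    private final List<String> memory = new ArrayList<>();

    SkynetLoop() {
        actions.put("node_lost", "replace_node_and_grow_cluster");
        actions.put("more_nodes_lost", "replicate_to_new_zone_or_region");
        actions.put("connor_monkey_seen", "start_terminator_monkey");
    }

    /** One cron tick: look up the action for each event, "run" it, and log both. */
    List<String> tick(List<String> observedEvents) {
        List<String> executed = new ArrayList<>();
        for (String event : observedEvents) {
            memory.add("saw " + event);
            String action = actions.get(event);
            if (action != null) {
                executed.add(action); // a real version would run the script here
                memory.add("ran " + action);
            }
        }
        return executed;
    }

    /** Copy of the event log, oldest entry first. */
    List<String> memory() {
        return new ArrayList<>(memory);
    }

    public static void main(String[] args) {
        SkynetLoop loop = new SkynetLoop();
        List<String> seen = new ArrayList<>();
        seen.add("node_lost");
        seen.add("connor_monkey_seen");
        System.out.println(loop.tick(seen));
        System.out.println(loop.memory());
    }
}
```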
  • “Simulation”
  • High Anxiety
  • Takeaway
    Netflix has built and deployed a scalable global platform based on Cassandra and AWS.
    Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
    SSDs in the cloud are awesome…
    http://github.com/Netflix
    http://techblog.netflix.com
    http://slideshare.net/Netflix
    http://www.linkedin.com/in/adriancockcroft
    @adrianco
    http://perfcap.blogspot.com