Scalable Data Storage Getting You Down? To The Cloud!
This was a three hour workshop given at the 2011 Web 2.0 Expo in San Francisco. Due to the length of the presentation and the number of presenters, portions of the slide deck may appear disjoint without the accompanying narrative.

Abstract: "The hype cycle is at a high for cloud computing, distributed “NoSQL” data storage, and high availability map-reducing eventually consistent distributed data processing frameworks everywhere. Back in the real world we know that these technologies aren’t a cure-all. But they’re not worthless, either. We’ll take a look behind the curtains and share some of our experiences working with these systems in production at SimpleGeo.

Our stack consists of Cassandra, HBase, Hadoop, Flume, node.js, rabbitmq, and Puppet. All running on Amazon EC2. Tying these technologies together has been a challenge, but the result really is worth the work. The rotten truth is that our ops guys still wake up in the middle of the night sometimes, and our engineers face new and novel challenges. Let us share what’s keeping us busy—the folks working in the wee hours of the morning—in the hopes that you won’t have to do so yourself."

Transcript

  • 1. SCALABLE DATA STORAGE GETTING YOU DOWN? TO THE CLOUD! Web 2.0 Expo SF 2011 Mike Malone, Mike Panchenko, Derek Smith, Paul Lathrop
  • 2. THE CAST MIKE MALONE INFRASTRUCTURE ENGINEER @MJMALONE MIKE PANCHENKO INFRASTRUCTURE ENGINEER @MIHASYA DEREK SMITH INFRASTRUCTURE ENGINEER @DSMITTS PAUL LATHROP OPERATIONS @GREYTALYN
  • 3. SIMPLEGEO We originally began as a mobile gaming startup, but quickly discovered that the location services and infrastructure needed to support our ideas didn't exist. So we took matters into our own hands and began building it ourselves. Matt Galligan, CEO & co-founder; Joe Stump, CTO & co-founder
  • 4. THE STACK (architecture diagram: www, AWS RDS, AWS auth/proxy, ELB, HTTP, data centers, api servers, record storage (reads), geocoder, reverse geocoder, GeoIP, pushpin, queues (writes), index storage, Apache Cassandra)
  • 5. BUT WHY NOT POSTGIS?
  • 6. DATABASES: WHAT ARE THEY GOOD FOR? DATA STORAGE Durably persist system state. CONSTRAINT MANAGEMENT Enforce data integrity constraints. EFFICIENT ACCESS Organize data and implement access methods for efficient retrieval and summarization.
  • 7. DATA INDEPENDENCE Data independence shields clients from the details of the storage system and data structure. LOGICAL DATA INDEPENDENCE Clients that operate on a subset of the attributes in a data set should not be affected later when new attributes are added. PHYSICAL DATA INDEPENDENCE Clients that interact with a logical schema remain the same despite physical data structure changes like • File organization • Compression • Indexing strategy
  • 8. TRANSACTIONAL RELATIONAL DATABASE SYSTEMS HIGH DEGREE OF DATA INDEPENDENCE Logical structure: SQL Data Definition Language. Physical structure: managed by the DBMS. OTHER GOODIES They're theoretically pure, well understood, and mostly standardized behind a relatively clean abstraction. They provide robust contracts that make it easy to reason about the structure and nature of the data they contain. They're ubiquitous, battle hardened, robust, durable, etc.
  • 9. ACID These terms are not formally defined - they're a framework, not mathematical axioms. ATOMICITY Either all of a transaction's actions are visible to another transaction, or none are. CONSISTENCY Application-specific constraints must be met for the transaction to succeed. ISOLATION Two concurrent transactions will not see one another's changes while "in flight". DURABILITY The updates made to the database in a committed transaction will be visible to future transactions.
  • 10. ACID HELPS ACID is a sort-of-formal contract that makes it easy to reason about your data, and that's good. IT DOES SOMETHING HARD FOR YOU With ACID, you're guaranteed to maintain a persistent global state as long as you've defined proper constraints and your logical transactions result in a valid system state.
  • 11. CAP THEOREM At PODC 2000 Eric Brewer told us there were three desirable DB characteristics, but we can only have two. CONSISTENCY Every node in the system contains the same data (e.g., replicas are never out of date). AVAILABILITY Every request to a non-failing node in the system returns a response. PARTITION TOLERANCE System properties (consistency and/or availability) hold even when the system is partitioned and data is lost.
  • 12. CAP THEOREM IN 30 SECONDS (diagram: CLIENT, SERVER, REPLICA)
  • 13. CAP THEOREM IN 30 SECONDS The client sends a write to the server.
  • 14. CAP THEOREM IN 30 SECONDS The server replicates the write to the replica.
  • 15. CAP THEOREM IN 30 SECONDS The replica acks the write back to the server.
  • 16. CAP THEOREM IN 30 SECONDS The server accepts the write and acks the client.
  • 17. CAP THEOREM IN 30 SECONDS If replication FAILS and the server denies the write: UNAVAILABLE!
  • 18. CAP THEOREM IN 30 SECONDS If replication FAILS and the server accepts the write anyway: INCONSISTENT!
  • 19. ACID HURTS Certain aspects of ACID encourage (require?) implementors to do "bad things". Unfortunately, ANSI SQL's definition of isolation... relies in subtle ways on an assumption that a locking scheme is used for concurrency control, as opposed to an optimistic or multi-version concurrency scheme. This implies that the proposed semantics are ill-defined. - Joseph M. Hellerstein and Michael Stonebraker, Anatomy of a Database System
  • 20. BALANCE IT'S A QUESTION OF VALUES For traditional databases CAP consistency is the holy grail: it's maximized at the expense of availability and partition tolerance. At scale, failures happen: when you're doing something a million times a second, a one-in-a-million failure happens every second. We're witnessing the birth of a new religion... • CAP consistency is a luxury that must be sacrificed at scale in order to maintain availability when faced with failures
  • 21. NETWORK INDEPENDENCE A distributed system must also manage the network - if it doesn't, the client has to. CLIENT APPLICATIONS ARE LEFT TO HANDLE Partitioning data across multiple machines. Working with loosely defined replication semantics. Detecting, routing around, and correcting network and hardware failures.
  • 22. WHAT'S WRONG WITH MYSQL..? TRADITIONAL RELATIONAL DATABASES They are from an era (er, one of the eras) when Big Iron was the answer to scaling up. In general, the network was not considered part of the system. NEXT GENERATION DATABASES Deconstructing and decoupling the beast. Trying to create a loosely coupled structured storage system • Something that the current generation of database systems never quite accomplished
  • 23. UNDERSTANDING CASSANDRA
  • 24. APACHE CASSANDRA A DISTRIBUTED STRUCTURED STORAGE SYSTEM EMPHASIZING Extremely large data sets. High transaction volumes. High value data that necessitates high availability. TO USE CASSANDRA EFFECTIVELY IT HELPS TO UNDERSTAND WHAT'S GOING ON BEHIND THE SCENES
  • 25. APACHE CASSANDRA A DISTRIBUTED HASH TABLE WITH SOME TRICKS Peer-to-peer architecture with no distinguished nodes, and therefore no single points of failure. Gossip-based cluster management. Generic distributed data placement strategy maps data to nodes • Pluggable partitioning • Pluggable replication strategy. Quorum based consistency, tunable on a per-request basis. Keys map to sparse, multi-dimensional sorted maps. Append-only commit log and SSTables for efficient disk utilization.
  • 26. NETWORK MODEL DYNAMO INSPIRED CONSISTENT HASHING Simple random partitioning mechanism for distribution. Low fuss online rebalancing when operational requirements change. GOSSIP PROTOCOL Simple decentralized cluster configuration and fault detection. Core protocol for determining cluster membership and providing resilience to partial system failure.
  • 27. CONSISTENT HASHING Improves on the shortcomings of modulo-based hashing. With three nodes: hash(alice) % 3 => 23 % 3 => 2, so alice lands on node 2.
  • 28. CONSISTENT HASHING With modulo hashing, a change in the number of nodes reshuffles the entire data set. With four nodes: hash(alice) % 4 => 23 % 4 => 3, so alice moves to node 3.
  • 29. CONSISTENT HASHING Instead, the range of the hash function is mapped to a ring, with each node responsible for a segment. hash(alice) => 23 falls in the segment owned by the node at token 42 (nodes at tokens 0, 42, 84).
  • 30. CONSISTENT HASHING When nodes are added (or removed) most of the data mappings remain the same. Adding a node at token 64 only affects keys between 42 and 64; hash(alice) => 23 still maps to the node at token 42 (nodes at 0, 42, 64, 84).
  • 31. CONSISTENT HASHING Rebalancing the ring requires a minimal amount of data shuffling (nodes move to tokens 0, 32, 64, 96; hash(alice) => 23 now maps to the node at token 32).
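A minimal consistent-hash ring sketch in Python (the HashRing class and the tiny 0-99 token space are illustrative assumptions, not Cassandra's actual partitioner):

    import bisect
    import hashlib

    class HashRing:
        """Toy consistent-hash ring: each node owns the arc of the token space
        from the previous node's token (exclusive) up to its own (inclusive)."""

        def __init__(self, tokens_to_nodes):
            # e.g. {0: "node-0", 42: "node-42", 84: "node-84"}
            self.nodes = dict(tokens_to_nodes)
            self.tokens = sorted(self.nodes)

        @staticmethod
        def key_to_token(key, ring_size=100):
            # Stand-in for the partitioner; Cassandra hashes into a much larger space.
            return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size

        def node_for(self, key):
            token = self.key_to_token(key)
            # First node token >= the key's token, wrapping around the ring.
            idx = bisect.bisect_left(self.tokens, token) % len(self.tokens)
            return self.nodes[self.tokens[idx]]

    ring = HashRing({0: "node-0", 42: "node-42", 84: "node-84"})
    print(ring.node_for("alice"))          # the node whose segment covers hash("alice")

    # Adding a node at token 64 only remaps keys that fall in the (42, 64] segment.
    ring = HashRing({0: "node-0", 42: "node-42", 64: "node-64", 84: "node-84"})
    print(ring.node_for("alice"))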
  • 32. GOSSIP DISSEMINATES CLUSTER MEMBERSHIP AND RELATED CONTROL STATE Gossip is initiated by an interval timer. At each gossip tick a node will • Randomly select a live node in the cluster, sending it a gossip message • Attempt to contact cluster members that were previously marked as down. If the gossip message is unacknowledged for some period of time (statistically adjusted based on the inter-arrival time of previous messages) the remote node is marked as down.
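A minimal sketch of one such gossip tick in Python (the Peer class, fixed timeout, and print placeholder are assumptions; Cassandra's real failure detector adjusts its threshold statistically from message inter-arrival times):

    import random
    import time
    from dataclasses import dataclass, field

    GOSSIP_TIMEOUT = 10.0  # seconds; a fixed value for the sketch, not Cassandra's behavior

    @dataclass
    class Peer:
        name: str
        marked_up: bool = True
        last_ack: float = field(default_factory=time.time)

    def send_gossip(peer):
        # Placeholder for the real network call; an ack would update peer.last_ack.
        print("gossiping with", peer.name)

    def gossip_tick(peers):
        live = [p for p in peers if p.marked_up]
        if live:
            send_gossip(random.choice(live))      # gossip with a random live peer
        for p in peers:
            if not p.marked_up:
                send_gossip(p)                    # retry peers previously marked down
            elif time.time() - p.last_ack > GOSSIP_TIMEOUT:
                p.marked_up = False               # no ack for too long: mark it down

    gossip_tick([Peer("10.0.0.1"), Peer("10.0.0.2"), Peer("10.0.0.3")])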
  • 33. REPLICATION REPLICATION FACTOR Determines how many copies of each piece of data are created in the system. With RF=3, hash(alice) => 23 is stored on the node at token 32 and on the next two nodes around the ring (tokens 0, 32, 64, 96).
  • 34. CONSISTENCY MODEL DYNAMO INSPIRED QUORUM-BASED CONSISTENCY Writes wait for W replica acks and reads wait for R replica responses; if W + R > N (here W=2, R=2, N=3) the read set and write set overlap, so reads are consistent. hash(alice) => 23.
  • 35. TUNABLE CONSISTENCY WRITES: ZERO - don't bother waiting for a response; ANY - wait for some node (not necessarily a replica) to respond; ONE - wait for one replica to respond; QUORUM - wait for a quorum (N/2+1) to respond; ALL - wait for all N replicas to respond
  • 36. TUNABLE CONSISTENCY READS: ONE - wait for one replica to respond; QUORUM - wait for a quorum (N/2+1) to respond; ALL - wait for all N replicas to respond
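A tiny Python check (hypothetical helper, brute force over replica subsets) of why W + R > N yields consistent reads: every possible read quorum must overlap every possible write quorum in at least one replica.

    from itertools import combinations

    def quorums_overlap(n, w, r):
        """True if every write set of size w and read set of size r
        (out of n replicas) share at least one replica."""
        replicas = range(n)
        return all(set(ws) & set(rs)
                   for ws in combinations(replicas, w)
                   for rs in combinations(replicas, r))

    print(quorums_overlap(n=3, w=2, r=2))  # True:  W + R > N, reads see the latest write
    print(quorums_overlap(n=3, w=1, r=1))  # False: W + R <= N, a read may miss the write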
  • 37. CONSISTENCY MODEL DYNAMO INSPIRED: READ REPAIR, HINTED HANDOFF, ANTI-ENTROPY (diagram: a W=2 write where one replica's write fails)
  • 38. CONSISTENCY MODEL READ REPAIR Asynchronously checks replicas during reads and repairs any inconsistencies (diagram: a W=2 write, then a read that also fixes the stale replica)
  • 39. CONSISTENCY MODEL HINTED HANDOFF Sends failed writes to another node with a hint to re-replicate when the failed node returns (diagram: the write is handed to a stand-in replica)
  • 40. CONSISTENCY MODEL HINTED HANDOFF Sends failed writes to another node with a hint to re-replicate when the failed node returns (diagram: the returning node is checked and repaired)
  • 41. CONSISTENCY MODEL ANTI-ENTROPY Manual repair process where nodes generate Merkle trees (hash trees) to detect and repair data inconsistencies
  • 42. DATA MODEL BIGTABLE INSPIRED SPARSE MATRIX It's a hash-map (associative array): a simple, versatile data structure. SCHEMA-FREE data model introduces new freedom and new responsibilities. COLUMN FAMILIES blend row-oriented and column-oriented structure, providing a high level mechanism for clients to manage on-disk and inter-node data locality.
  • 43. DATA MODEL TERMINOLOGY KEYSPACE A named collection of column families (similar to a "database" in MySQL); you only need one and you can mostly ignore it. COLUMN FAMILY A named mapping of keys to rows. ROW A named sorted map of columns or supercolumns. COLUMN A <name, value, timestamp> triple. SUPERCOLUMN A named collection of columns, for people who want to get fancy.
  • 44. DATA MODEL
        {
          "users": {                                   <- column family
            "alice": {                                 <- row key
              "city": ["St. Louis", 1287040737182],    <- column: (name, value, timestamp)
              "name": ["Alice", 1287080340940],
            },
            ...
          },
          "locations": { ... },
          ...
        }
  • 45. IT'S A DISTRIBUTED HASH TABLE WITH A TWIST... COLUMNS ARE STORED TOGETHER ON ONE NODE, IDENTIFIED BY <keyspace, key>
        {
          "users": {                                   <- column family
            "alice": {                                 <- row key
              "city": ["St. Louis", 1287040737182],    <- columns
              "name": ["Alice", 1287080340940],
            },
            ...
          },
        }
        (ring diagram: the row keys alice and bob hash to ring tokens 3e8 and s3b, and each row lives whole on the node that owns its token)
  • 46. HASH TABLE SUPPORTED QUERIES: EXACT MATCH. Not supported: RANGE, PROXIMITY, anything that's not an exact match.
  • 47. COLUMNS SUPPORTED QUERIES: EXACT MATCH, RANGE (columns are sorted within a row). Not supported: PROXIMITY.
        {
          "users": {
            "alice": {
              "city": ["St. Louis", 1287040737182],
              "friend-1": ["Bob", 1287080340940],      <- friends
              "friend-2": ["Joe", 1287080340940],
              "friend-3": ["Meg", 1287080340940],
              "name": ["Alice", 1287080340940],
            },
            ...
          }
        }
  • 48. LOG-STRUCTURED MERGE MEMTABLES are in-memory data structures that contain newly written data. COMMIT LOGS are append-only files where new data is durably written. SSTABLES are serialized memtables, persisted to disk. COMPACTION periodically merges multiple SSTables to improve system performance.
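A toy log-structured merge store in Python to illustrate the pieces named above (the class and file path are made up; timestamps, tombstones, and on-disk SSTable formats are ignored): writes append to the commit log and update the memtable, full memtables flush to sorted SSTables, and compaction merges SSTables.

    import json

    class ToyLSMStore:
        def __init__(self, commit_log_path, memtable_limit=4):
            self.commit_log = open(commit_log_path, "a")
            self.memtable = {}                 # newest values, in memory
            self.sstables = []                 # each is a sorted list of (key, value)
            self.memtable_limit = memtable_limit

        def put(self, key, value):
            # Durably append to the commit log, then update the memtable.
            self.commit_log.write(json.dumps([key, value]) + "\n")
            self.commit_log.flush()
            self.memtable[key] = value
            if len(self.memtable) >= self.memtable_limit:
                self.flush()

        def flush(self):
            # Serialize the memtable as an immutable, sorted SSTable.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

        def get(self, key):
            # Check the memtable first, then SSTables from newest to oldest.
            if key in self.memtable:
                return self.memtable[key]
            for table in reversed(self.sstables):
                for k, v in table:
                    if k == key:
                        return v
            return None

        def compact(self):
            # Merge all SSTables, keeping the newest value for each key.
            merged = {}
            for table in self.sstables:
                merged.update(dict(table))
            self.sstables = [sorted(merged.items())]

    store = ToyLSMStore("/tmp/toy-commit.log")
    store.put("alice", "St. Louis")
    print(store.get("alice"))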
  • 49. CASSANDRA CONCEPTUAL SUMMARY... IT'S A DISTRIBUTED HASH TABLE Gossip-based peer-to-peer "ring" with no distinguished nodes and no single point of failure. Consistent hashing distributes the workload, and a simple replication strategy provides fault tolerance and improved throughput. WITH TUNABLE CONSISTENCY Based on a quorum protocol to ensure consistency, with simple repair mechanisms to stay available during partial system failures. AND A SIMPLE, SCHEMA-FREE DATA MODEL It's just a key-value store whose values are multi-dimensional sorted maps.
  • 50. ADVANCED CASSANDRA - A case study - SPATIAL DATA IN A DHT
  • 51. A FIRST PASS THE ORDER PRESERVING PARTITIONER CASSANDRA'S PARTITIONING STRATEGY IS PLUGGABLE The partitioner maps keys to nodes. The random partitioner destroys locality by hashing. The order preserving partitioner retains locality, storing keys in natural lexicographical order around the ring (diagram: keys alice, bob, sam placed in order on a ring with tokens a, h, m, u, z).
  • 52. ORDER PRESERVING PARTITIONER Supported queries: EXACT MATCH and RANGE, but only on a single dimension? PROXIMITY is still out.
  • 53. SPATIAL DATA IT'S INHERENTLY MULTIDIMENSIONAL (diagram: a point x at coordinates (2, 2) in a two-dimensional plane)
  • 54. DIMENSIONALITY REDUCTION WITH SPACE-FILLING CURVES (diagram: a Z-shaped curve visiting quadrants 1, 2, 3, 4)
  • 55. Z-CURVE SECOND ITERATION
  • 56. Z-VALUE (diagram: the point x has Z-value 14 along the curve)
  • 57. GEOHASH SIMPLE TO COMPUTE Interleave the bits of the decimal coordinates (equivalent to a binary encoding of the pre-order traversal!), then Base32 encode the result. AWESOME CHARACTERISTICS Arbitrary precision. Human readable. Sorts lexicographically. (example: the bits 01101 encode as the Base32 character "e")
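A small Python sketch of that computation (the function name is ours; the bit interleaving and Base32 alphabet follow the standard geohash scheme):

    BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # geohash alphabet (no a, i, l, o)

    def geohash_encode(lat, lon, precision=8):
        """Encode lat/lon by bisecting each range and interleaving the resulting
        bits (longitude first), then packing 5 bits per Base32 character."""
        lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
        bits, use_lon = [], True
        while len(bits) < precision * 5:
            rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
            mid = (rng[0] + rng[1]) / 2
            if value >= mid:
                bits.append(1)
                rng[0] = mid        # keep the upper half
            else:
                bits.append(0)
                rng[1] = mid        # keep the lower half
            use_lon = not use_lon
        chars = []
        for i in range(0, len(bits), 5):
            index = int("".join(map(str, bits[i:i + 5])), 2)
            chars.append(BASE32[index])
        return "".join(chars)

    # The Moonrise Hotel coordinates from slide 58; nearby points share a prefix.
    print(geohash_encode(38.6554420, -90.2992910))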
  • 58. DATA MODEL
        {
          "record-index": {
            "9yzgcjn0:moonrise hotel": {               <- key is <geohash>:<id>
              "": ["", 1287040737182],
            },
            ...
          },
          "records": {
            "moonrise hotel": {
              "latitude": ["38.6554420", 1287040737182],
              "longitude": ["-90.2992910", 1287040737182],
              ...
            }
          }
        }
  • 59. BOUNDING BOX E.G., MULTIDIMENSIONAL RANGE (diagram: "Gimme stuff in the big box!" is decomposed into range queries along the curve, e.g. "gimme 2 to 3" and "gimme 4 to 5" over curve cells 1-4)
  • 60. SPATIAL DATA STILL MULTIDIMENSIONAL DIMENSIONALITY REDUCTION ISN'T PERFECT Clients must • Pre-process to compose multiple queries • Post-process to filter and merge results. Degenerate cases can be bad, particularly for nearest-neighbor queries.
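A sketch of the post-processing step in Python (the record shape follows slide 58; the function names are hypothetical): after issuing the per-curve-segment range queries, drop candidates that fall outside the real box and merge the rest.

    def in_bbox(record, min_lat, min_lon, max_lat, max_lon):
        lat, lon = float(record["latitude"]), float(record["longitude"])
        return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

    def bbox_query(candidate_result_sets, min_lat, min_lon, max_lat, max_lon):
        """Merge the results of several curve-range queries, filtering out
        candidates that matched a curve segment but miss the actual box."""
        seen, merged = set(), []
        for result_set in candidate_result_sets:
            for record_id, record in result_set:
                if record_id not in seen and in_bbox(record, min_lat, min_lon,
                                                     max_lat, max_lon):
                    seen.add(record_id)
                    merged.append((record_id, record))
        return merged

    # Example: two range queries came back with candidates; only one is in the box.
    results = bbox_query(
        [[("moonrise hotel", {"latitude": "38.6554420", "longitude": "-90.2992910"})],
         [("somewhere else", {"latitude": "40.0", "longitude": "-100.0"})]],
        min_lat=38.0, min_lon=-91.0, max_lat=39.0, max_lon=-90.0)
    print(results)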
  • 61. Z-CURVE LOCALITY
  • 62. Z-CURVE LOCALITY (diagram: two points x marked on the curve)
  • 63. Z-CURVE LOCALITY (diagram: two points x that are close in space but far apart on the curve)
  • 64. Z-CURVE LOCALITY (diagram: a query point x and neighboring points o scattered along the curve)
  • 65. THE WORLD IS NOT BALANCED Credit: C. Mayhew & R. Simmon (NASA/GSFC), NOAA/NGDC, DMSP Digital Archive
  • 66. TOO MUCH LOCALITY (diagram: nodes 1-4, with all of SAN FRANCISCO's data landing on one node)
  • 67. TOO MUCH LOCALITY (same diagram)
  • 68. TOO MUCH LOCALITY The overloaded node: "I'm sad." SAN FRANCISCO
  • 69. TOO MUCH LOCALITY The idle nodes: "I'm bored." "Me too." "Let's play xbox." The overloaded node: "I'm sad." SAN FRANCISCO
  • 70. A TURNING POINT
  • 71. HELLO, DRAWING BOARD SURVEY OF DISTRIBUTED P2P INDEXING An overlay-dependent index works directly with the nodes of the peer-to-peer network, defining its own overlay. An over-DHT index overlays a more sophisticated data structure on top of a peer-to-peer distributed hash table.
  • 72. ANOTHER LOOK AT POSTGIS MIGHT WORK, BUT The relational transaction management system (which we'd want to change) and the access methods (which we'd have to change) are tightly coupled (necessarily?) to other parts of the system. Could work at a higher level and treat PostGIS as a black box • Now we're back to implementing a peer-to-peer network with failure recovery, fault detection, etc... and Cassandra already had all that • It's probably clear by now that I think these problems are more difficult than actually storing structured data on disk
  • 73. LET’S TAKE A STEP BACK
  • 74. EARTH
  • 75. EARTH
  • 76. EARTH
  • 77. EARTH
  • 78. EARTH, TREE, RING
  • 79. DATA MODEL
        {
          "record-index": {
            "layer-name:37.875, -90:40.25, -101.25": {
              "38.6554420, -90.2992910:moonrise hotel": ["", 1287040737182],
              ...
            },
          },
          "record-index-meta": {
            "layer-name:37.875, -90:40.25, -101.25": {
              "split": ["false", 1287040737182],
            },
            "layer-name:37.875, -90:42.265, -101.25": {
              "split": ["true", 1287040737182],
              "child-left": ["layer-name:37.875, -90:40.25, -101.25", 1287040737182],
              "child-right": ["layer-name:40.25, -90:42.265, -101.25", 1287040737182],
            }
          }
        }
  • 80. DATA MODEL (same slide as 79, repeated)
  • 81. DATA MODEL (same slide as 79, repeated)
  • 82. SPLITTING IT'S PRETTY MUCH JUST A CONCURRENT TREE Splitting shouldn't lock the tree for reads or writes, and failures shouldn't cause corruption • Splits are optimistic, idempotent, and fail-forward • Instead of locking, writes are replicated to the splitting node and the relevant child[ren] while a split operation is taking place • Cleanup occurs after the split is completed and all interested nodes are aware that the split has occurred • Cassandra writes are idempotent, so splits are too - if a split fails, it is simply retried. Split size: a tunable knob for balancing locality and distributedness. The other hard problem with concurrent trees is rebalancing - we just don't do it! (more on this later)
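A very rough Python sketch of a fail-forward, idempotent split in that spirit (plain dicts stand in for the Cassandra rows; the split size, key helpers, and latitude-only split are assumptions): the children are written first, and the parent's split flag is flipped last, so a failure at any point leaves a valid tree and the split can simply be retried.

    SPLIT_SIZE = 2   # the tunable knob from the slide; tiny here so the demo fires

    def split_node(index_meta, index, node_key):
        """Split one tree node's bounding box in half (along latitude, for
        simplicity). Idempotent and fail-forward: re-running it is harmless."""
        if index_meta[node_key].get("split") == "true":
            return                                    # already split: nothing to do
        if len(index.get(node_key, {})) < SPLIT_SIZE:
            return                                    # not big enough to split yet

        layer, lat1, lon1, lat2, lon2 = parse_node_key(node_key)
        mid_lat = (lat1 + lat2) / 2
        left = make_node_key(layer, lat1, lon1, mid_lat, lon2)
        right = make_node_key(layer, mid_lat, lon1, lat2, lon2)

        # Write the children first (a crash here only leaves harmless,
        # unreferenced children), then flip the parent's split flag.
        index_meta.setdefault(left, {"split": "false"})
        index_meta.setdefault(right, {"split": "false"})
        index_meta[node_key].update(
            {"split": "true", "child-left": left, "child-right": right})

    def parse_node_key(key):
        layer, corner1, corner2 = key.split(":")
        lat1, lon1 = (float(v) for v in corner1.split(","))
        lat2, lon2 = (float(v) for v in corner2.split(","))
        return layer, lat1, lon1, lat2, lon2

    def make_node_key(layer, lat1, lon1, lat2, lon2):
        return "%s:%s, %s:%s, %s" % (layer, lat1, lon1, lat2, lon2)

    meta = {"layer:37.875, -90:42.265, -101.25": {"split": "false"}}
    records = {"layer:37.875, -90:42.265, -101.25": {"a:1": "", "b:2": ""}}
    split_node(meta, records, "layer:37.875, -90:42.265, -101.25")
    print(meta["layer:37.875, -90:42.265, -101.25"]["split"])   # -> "true"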
  • 83. THE ROOT IS HOT MIGHT BE A DEAL BREAKER For a tree to be useful, it has to be traversed • Typically, tree traversal starts at the root • The root is the only discoverable node in our tree. Traversing through the root meant reading the root for every read or write below it - unacceptable • Lots of academic solutions - the most promising was a skip graph, but that required O(n log(n)) data - also unacceptable • A minimum tree depth was proposed, but then you just get multiple hot-spots at your minimum-depth nodes
  • 84. BACK TO THE BOOKS LOTS OF ACADEMIC WORK ON THIS TOPIC But academia is obsessed with provable, deterministic, asymptotically optimal algorithms. And we only need something that is probably fast enough most of the time (for some value of "probably" and "most of the time") • And if the probably-good-enough algorithm is, you know... tractable... one might even consider it qualitatively better!
  • 85. - THE ROOT - A HOT SPOT AND A SPOF
  • 86. We have We want
  • 87. THINKING HOLISTICALLY WE OBSERVED THAT Once a node in the tree exists, it doesn't go away. Node state may change, but that state only really matters locally - thinking a node is a leaf when it really has children is not fatal. SO... WHAT IF WE JUST CACHED NODES THAT WERE OBSERVED IN THE SYSTEM!?
  • 88. CACHE IT STUPID SIMPLE SOLUTION Keep an LRU cache of nodes that have been traversed. Start traversals at the most selective relevant node. If that node doesn't satisfy you, traverse up the tree. Along with your result set, return a list of nodes that were traversed so the caller can add them to its cache.
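A minimal sketch of that cache in Python (bounding-box tuples stand in for node keys; the capacity and helper functions are illustrative assumptions): keep an LRU of observed nodes and start each traversal at the smallest cached node that covers the query.

    from collections import OrderedDict

    ROOT = (-90.0, -180.0, 90.0, 180.0)   # the whole world

    def covers(node_box, query_box):
        return (node_box[0] <= query_box[0] and node_box[1] <= query_box[1] and
                node_box[2] >= query_box[2] and node_box[3] >= query_box[3])

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    class NodeCache:
        """LRU cache of tree nodes (keyed by bounding box) seen in past traversals."""

        def __init__(self, capacity=10000):
            self.capacity = capacity
            self.boxes = OrderedDict()

        def add_all(self, traversed_boxes):
            # The query path returns every node it touched; cache them all.
            for box in traversed_boxes:
                self.boxes[box] = True
                self.boxes.move_to_end(box)
                while len(self.boxes) > self.capacity:
                    self.boxes.popitem(last=False)     # evict least recently used

        def start_node(self, query_box):
            """Most selective (smallest) cached node covering the query, else the root."""
            candidates = [box for box in self.boxes if covers(box, query_box)]
            return min(candidates, key=area) if candidates else ROOT

    cache = NodeCache()
    cache.add_all([ROOT, (37.875, -101.25, 40.25, -90.0)])
    print(cache.start_node((38.5, -91.0, 39.0, -90.5)))   # starts at the smaller node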
  • 89. TRAVERSAL NEAREST NEIGHBOR (diagram: a query point x and nearby points o)
  • 90. TRAVERSAL NEAREST NEIGHBOR (same diagram, traversal continues)
  • 91. TRAVERSAL NEAREST NEIGHBOR (same diagram, traversal continues)
  • 92. KEY CHARACTERISTICS PERFORMANCE Best case, on the happy path (everything cached), there is zero read overhead. Worst case, with nothing cached, O(log(n)) read overhead. RE-BALANCING SEEMS UNNECESSARY! Makes the worst case more worser, but so far so good.
  • 93. DISTRIBUTED TREE SUPPORTED QUERIES EXACT MATCH, RANGE, PROXIMITY, SOMETHING ELSE I HAVEN'T EVEN HEARD OF
  • 94. DISTRIBUTED TREE SUPPORTED QUERIES EXACT MATCH, RANGE, PROXIMITY, SOMETHING ELSE I HAVEN'T EVEN HEARD OF - IN MULTIPLE DIMENSIONS!
  • 95. LIFE OF A REQUEST
  • 96. THE BIRDS 'N THE BEES (diagram: ELB -> gate -> service -> worker pool -> cass / index, duplicated across two paths)
  • 97. THE BIRDS 'N THE BEES (diagram: a single path, ELB -> gate -> service -> worker pool -> cass / index)
  • 98. THE BIRDS 'N THE BEES ELB: load balancing; an AWS service
  • 99. THE BIRDS 'N THE BEES gate: authentication; forwarding
  • 100. THE BIRDS 'N THE BEES service: business logic - basic validation
  • 101. THE BIRDS 'N THE BEES cass: record storage
  • 102. THE BIRDS 'N THE BEES worker pool: business logic - storage/indexing
  • 103. THE BIRDS 'N THE BEES index: awesome sauce for querying
  • 104. ELB
        • Traffic management
          • Control which AZs are serving traffic
          • Upgrades without downtime
          • Able to remove an AZ, upgrade, test, replace
        • API-level failure scenarios
          • Periodically runs healthchecks on nodes
          • Removes nodes that fail
  • 105. GATE
        • Basic auth
        • HTTP proxy to specific services
          • Services are independent of one another
          • Auth is decoupled from business logic
        • First line of defense
          • Very fast, very cheap
          • Keeps services from being overwhelmed by poorly authenticated requests
  • 106. RABBITMQ
        • Decouple accepting writes from performing the heavy lifting
          • Don't block the client while we write to the db/index
        • Flexibility in the event of degradation further down the stack
          • Queues can hold a lot, and can keep accepting writes throughout an incident
        • Heterogeneous consumers - pass the same message through multiple code paths easily
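A sketch of that write-decoupling idea using the pika RabbitMQ client in Python (the queue name, message shape, and connection details are assumptions, not SimpleGeo's actual setup): accept the write, drop it on a durable queue, and return to the client; workers consume from the queue and do the db/index work.

    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="record-writes", durable=True)

    def enqueue_write(record):
        channel.basic_publish(
            exchange="",                      # default exchange routes by queue name
            routing_key="record-writes",
            body=json.dumps(record),
            properties=pika.BasicProperties(delivery_mode=2),  # persist the message
        )

    enqueue_write({"id": "moonrise hotel",
                   "latitude": 38.6554420, "longitude": -90.2992910})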
  • 107. AWS
  • 108. EC2
        • Security groups
        • Static IPs
        • Choose your data center
        • Choose your instance type
        • On-demand vs. Reserved
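As a sketch of driving EC2 from Python with the boto library (the region, AMI ID, key pair, and security group names here are placeholders, and credentials are assumed to come from the environment):

    import boto.ec2

    # Connect to a region and launch an on-demand instance into a security group.
    conn = boto.ec2.connect_to_region("us-east-1")
    reservation = conn.run_instances(
        "ami-12345678",              # placeholder AMI ID
        instance_type="m1.large",
        key_name="ops-keypair",      # placeholder key pair
        security_groups=["api-servers"],
    )
    print(reservation.instances[0].id)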
  • 109. ELASTIC BLOCK STORE
        • Storage devices that can be anywhere from 1GB to 1TB
        • Snapshotting
        • Automatic replication
        • Mobility
  • 110. ELASTIC LOAD BALANCING
        • Distribute incoming traffic
        • Automatic scaling
        • Health checks
  • 111. SIMPLE STORAGE SERVICE
        • File sizes can be up to 5TB
        • Unlimited number of files
        • Individual read/write access credentials
  • 112. RELATIONAL DATABASE SERVICE
        • MySQL in the cloud
        • Manages replication
        • Specify instance types based on need
        • Snapshots
  • 113. CONFIGURATION MANAGEMENT
  • 114. WHY IS THIS NECESSARY?
        • Easier DevOps integration
        • Reusable modules
        • Infrastructure as code
        • One word: WINNING
  • 115. POSSIBLE OPTIONS
  • 116. SAMPLE MANIFEST
        # /root/learning-manifests/apache2.pp
        package { 'apache2':
          ensure => present;
        }
        file { '/etc/apache2/apache2.conf':
          ensure => file,
          mode   => 600,
          notify => Service['apache2'],
          source => '/root/learning-manifests/apache2.conf',
        }
        service { 'apache2':
          ensure    => running,
          enable    => true,
          subscribe => File['/etc/apache2/apache2.conf'],
        }
  • 117. AN EXAMPLE TERMINAL
  • 118. CONTINUOUS INTEGRATION
  • 119. GET 'ER DONE
        • Revision control
        • Automate the build process
        • Automate the testing process
        • Automate deployment
        Local code changes should result in production deployments.
  • 120. DON'T FORGET TO DEBIANIZE
        • All codebases must be debianized
        • If an open source project isn't debianized yet, fork the repo and do it yourself!
        • Take the time to teach others
        • Debian directories can easily be reused after a simple search and replace
  • 121. REPOMAN HTTPS://GITHUB.COM/SYNACK/REPOMAN.GIT
        repoman upload myrepo sg-example.tar.gz
        repoman promote development/sg-example staging
  • 122. HUDSON
  • 123. MAINTAINING MULTIPLE ENVIRONMENTS
        • Run unit tests in a development environment
        • Promote to staging
        • Run system tests in a staging environment
        • Run consumption tests in a staging environment
        • Promote to production
        Congratz, you have now just automated yourself out of a job.
  • 124. THE MONEY MAKER Github Plugin
  • 125. TYING IT ALL TOGETHER TERMINAL
  • 126. FLUME Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. syslog on steroids.
  • 127. DATA-FLOWS
  • 128. AGENTS (physical host i-192df98; each logical node maps a source to a sink)
        twitter_stream:  twitter("dsmitts", "mypassword", "url")  ->  agentSink(35853)
        tail:            tail("/var/log/nginx/access.log")        ->  agentSink(35853)
        hdfs_writer:     collectorSource(35853)                   ->  collectorSink("hdfs://namenode.sg.com/bogus/", "logs")
  • 129. RELIABILITY: END-TO-END, STORE ON FAILURE, BEST EFFORT
  • 130. GETTIN' JIGGY WIT IT Custom Decorators
  • 131. HOW DO WE USE IT?
  • 132. PERSONAL EXPERIENCES
        • #flume
        • Automation was gnarly
        • It's never a good day when Eclipse is involved
        • Resource hog (at first)
  • 133. We're building kick-ass tools for developers to [...], transport, and consume data connected to a location. MARCH 2011