[B6]heroku postgres-hgmnz

  1. Heroku Postgres: The Tale of Conceiving and Building a Leading Cloud Database Service
      Harold Giménez, @hgmnz. Saturday, September 8, 12
  2. Heroku Postgres
        • Database-as-a-service
        • Cloud
        • Fully managed
        • Over 2 years in production
        • From tiny blogs to superbowl commercials
      Heroku Postgres is a database-as-a-service provider. We provision and run databases on cloud infrastructure. It is fully managed, always on and available. It has been in production for over two years and has powered everything from personal blogs to sites backing Super Bowl commercials.
  3. Heroku origins
      Heroku was born with a vision of increasing developer productivity and agility. Anyone remember Heroku Garden? While that product no longer exists, the vision remains part of our core culture. We want to enable developers to bring their creations to market as quickly and pleasantly as possible.
  4. focus on rails
      Heroku got into the business of running web applications. Like any startup, it focused on doing one thing well, and for Heroku that was running Rails applications. The approach empowered developers like never before. As a Heroku customer back then, I was excited to put hobby apps on the internet on a regular basis. It was so easy.
  5. rails apps need a database
      Clearly, Rails apps need a database. Rails got really good at CRUD, after all.
  6. web apps need a database
      But this is true of any web application.
  7. thankfully postgres was chosen
      The story went something like: “Hey, we need a database. What should we use?” Heroku was a very small team. The security expert happened to speak up and recommended Postgres for its track record of correctness and its fine-grained user role management.
  8. otherwise I wouldn’t be here
      I’ve been a Postgres user for years and know it is vastly superior to other open source RDBMS projects. If Postgres had not been chosen, I wouldn’t be here.
  9. “let’s make a production grade postgres service”
      Heroku would give you a free database whenever you created an app, and one database server would hold many users. But this is not sufficient for serious production applications that need exclusive access to more resources and higher-availability operations.
  10. This is our team’s mascot: a slide from IBM used in marketing materials in the 70s. It’s funny how this vision was not true back then, but we are making it a reality over 30 years later.
  11. (hopefully yours)
  12. Heroku Postgres v.0.pre.alpha
        • A sinatra app implementing the heroku addons API
        • create servers
        • install postgres service
        • create databases for users - a “Resource”
        • Sequel talks to postgres
        • stem talks to AWS
      Let’s talk about the tools used to build the very first version of Heroku Postgres. It was built in Ruby. Sinatra is used to expose the APIs. Sequel is used to talk to Postgres databases, and also as an ORM. stem was built for this project: a very minimalistic and pleasant interface to the AWS APIs. stem was made available as open source software.
  13. Two main entities
      There are two main entities in this application.
  14. Resource { database: 'd4f9wdf02', port: 5432, username: 'uf0wjasdf', password: 'pf14fhjas', created_at: '2012-05-02', state: 'available' }
      A resource encapsulates a database: the actual, tangible thing that customers buy. A customer only cares about the database URL used to connect to it.
  15. Server { elastic_ip: '', instance_id: 'i-2efjoiads', ami: 'pg-prod', availability_zone: 'us-east-1', created_at: '2012-05-02', state: 'booting' }
      A server is the box where the resource is installed. Customers don’t have direct access to it; it’s for our own bookkeeping and maintenance. It includes an IP address, an availability zone, and other AWS-related attributes.
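The two records above can be sketched as plain Ruby structs. This is a hypothetical illustration, not the project's code: the field names come from slides 14 and 15, while the sample IP address and the idea that the connection URL is assembled from the resource's credentials plus its server's address are assumptions.

```ruby
# Hypothetical sketch of the two entities from slides 14 and 15.
# The URL format and the server link are assumptions for illustration.
Server = Struct.new(:elastic_ip, :instance_id, :ami, :availability_zone,
                    :created_at, :state, keyword_init: true)

Resource = Struct.new(:database, :port, :username, :password,
                      :created_at, :state, :server, keyword_init: true) do
  # The only thing the customer cares about: a connection URL.
  def url
    "postgres://#{username}:#{password}@#{server.elastic_ip}:#{port}/#{database}"
  end
end

server = Server.new(elastic_ip: "203.0.113.10", instance_id: "i-2efjoiads",
                    ami: "pg-prod", availability_zone: "us-east-1",
                    created_at: "2012-05-02", state: "running")
resource = Resource.new(database: "d4f9wdf02", port: 5432, username: "uf0wjasdf",
                        password: "pf14fhjas", created_at: "2012-05-02",
                        state: "available", server: server)
puts resource.url
```

The split mirrors the notes: the resource is what gets sold, and the server is internal bookkeeping the customer never touches.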
  16. ...and a thin admin web interface (erb templates in sinatra endpoint)
      The early application also had an admin interface in the very same codebase, as erb templates rendered by Sinatra HTTP endpoints.
  17. We are just an add-on
      The Heroku Postgres offering is just a Heroku add-on.
  18. There are numerous components to Heroku, and one of them is the add-ons system. Heroku Postgres is an add-on just like any third-party add-on (such as Papertrail or SendGrid). We don’t use backdoors of any kind. Instead, we interface with the rest of Heroku the same way other add-on providers do. This is a great position to be in: as consumers of the Heroku add-ons ecosystem, we help drive its evolution.
  19. we run on
      Furthermore, the entire Heroku Postgres infrastructure runs on Heroku itself.
  20. the simplest thing that could possibly work, but no less
      Simplicity is key to building any sort of system. In this case, the initial version of the Heroku Postgres management app was as simple as it could be. This allowed us to modify behavior and evolve as quickly as possible, on a smaller, more pleasant code base.
  21. We’ve come a long way since then
      Fast forward a few years: we are now managing a very large number of databases, keeping them alive, and creating new ones at a higher rate than ever. This requires more sophisticated processes and managers. Let’s dive into how it works today.
  22. Monitoring and Workflow
      Monitoring and workflow are key to this type of system.
  23. draw inspiration from gaming
      In programming we often draw inspiration from many places. A good example is OOP itself, which was inspired by the way cells in a biological organism communicate by sending messages to each other. The project lead (@pvh) has a background in gaming.
      Imagine the bad guy in a Diablo game. He’s just wandering around doing nothing, because there’s nothing to attack near him. At some point, he sees your character and charges toward you. You battle him, he fights back, and finally you kill him. He dies a slow and painful death.
      There are many ways to model these kinds of systems. One is an event-based system, where observers listen for events as they occur and react to them appropriately. You could also load all objects that need monitoring and process that queue. Either approach easily becomes too complex or doesn’t scale at all, because of memory constraints and the size of the workload.
      A state machine is another good way to model this. A state machine is, at heart, an entity that is fed some inputs. In return it takes some action, and then it may or may not transition to a different state. The bad guy is in a `wandering around` state when nothing is near him. As soon as he sees your character, he enters a `battle` state, and so on.
      We model what happens in real life: we observe our environment, record it, and react to it.
  24.
        class Resource
          def feel
            # monitoring
            observations.create( )
          end
        end

        class Feeler
          def current_environment
            {
              service_available?: service_available?,
              open_connections: open_connections,
              row_count: row_count,
              table_count: table_count,
              seq_scans: seq_scans,
              index_scans: index_scans
            }
          end
        end
      This is what the actual source code looks like. A Resource has a #feel method, which stores an observation based on what the Feeler sees. A Feeler is an object that observes the current environment around it. It checks things like whether the service is available and how many connections are open, along with many more health checks.
  25.
        class Resource
          include Stateful

          # workflow
          state :available do
            unless service_available?
              transition :unavailable
            end
          end
        end

        resource = resource.transition :available
        resource.feel
        resource.tick
        puts resource.state # 'unavailable'
  26.
        module Stateful
          def self.included(base)
            # workflow
            base.extend ClassMethods
          end

          module ClassMethods
            def state(name, &block)
              states[name] = block
            end

            def states; @states ||= {}; end
          end

          def tick
            self.instance_eval(
              &self.class.states[self.state.to_sym]
            )
          end

          def transition(state)
            # log and assign new state
          end
        end
      In terms of workflow, we built an extremely simple state machine system. It lets you define states via the `state` method, which takes an arbitrary block of code to execute when invoked via the `#tick` method.
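Putting slides 24 to 26 together, here is a minimal runnable sketch of the feel/tick cycle. It assumes that `transition` stores the new state as a string and keeps a history, and that `feel` appends the Feeler's hash to an in-memory array. `FakeFeeler` and the `:unavailable` state's body are hypothetical stand-ins for the real health checks.

```ruby
# Minimal runnable sketch of the Stateful pattern from slides 24-26.
# FakeFeeler, the in-memory observations and the :unavailable state body
# are hypothetical stand-ins.
module Stateful
  def self.included(base)
    base.extend ClassMethods
  end

  module ClassMethods
    # Register the block to run while an object is in state `name`.
    def state(name, &block)
      states[name] = block
    end

    def states
      @states ||= {}
    end
  end

  attr_reader :state, :history

  # Run the current state's block in the context of the object.
  def tick
    instance_eval(&self.class.states.fetch(state.to_sym))
  end

  # Log the change (here: append to history) and assign the new state.
  def transition(new_state)
    (@history ||= []) << [state, new_state]
    @state = new_state.to_s
  end
end

FakeFeeler = Struct.new(:up) do
  def current_environment
    { service_available: up, open_connections: up ? 3 : 0 }
  end
end

class Resource
  include Stateful

  attr_reader :observations

  def initialize(feeler)
    @feeler = feeler
    @observations = []
  end

  # feel: store what the feeler currently observes.
  def feel
    observations << @feeler.current_environment
  end

  def service_available?
    observations.last.fetch(:service_available)
  end

  state :available do
    transition :unavailable unless service_available?
  end

  state :unavailable do
    transition :available if service_available?
  end
end

feeler = FakeFeeler.new(true)
resource = Resource.new(feeler)
resource.transition :available
resource.feel
resource.tick
puts resource.state
feeler.up = false
resource.feel
resource.tick
puts resource.state
```

The first `tick` leaves the resource in `available`. After the feeler reports the service as down, the next `feel`/`tick` pair moves it to `unavailable`. This is the same input, action, transition loop as the Diablo example.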
  27. resource.feel resource.tick
      Need to do this all the time
      We first call #feel on an object, and then call #tick on it. #feel stores newly observed information, while #tick uses this information to make system decisions, such as transitioning to other states, sending alerts, and much more. We must run these two methods continuously.
  28. db1 db2 db3 db4 db5 db6 db7 db8 db9 ... dbn
        db1.feel
        db1.tick
      One way to run it continuously is via a work queue.
  29. db2 db3 db4 db5 db6 db7 db8 db9 ... dbn db1
        db2.feel
        enqueue(db1)
        db2.tick
      We create a queue and place all active resources on it. A set of workers pulls jobs from the queue, invokes feel and tick, and then enqueues the resource again. This is, in essence, a poorly implemented distributed ring buffer, and it has served us well.
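The ring-buffer loop described above can be sketched in-process. The sketch uses Ruby's `Queue` (Thread::Queue) and worker threads in place of QueueClassic and separate worker processes. `FakeDb` and the `ROUNDS` cutoff are hypothetical; the cutoff exists only so that the example terminates, whereas the real loop runs forever.

```ruby
# Hypothetical in-process sketch of the feel/tick work loop (slides 28-29).
# Thread::Queue stands in for QueueClassic; FakeDb stands in for a resource.
FakeDb = Struct.new(:name, :ticks) do
  def feel; end           # would record an observation
  def tick                # would run the current state's block
    self.ticks += 1
  end
end

ROUNDS = 3                # only so the example terminates

queue = Queue.new
processed = Queue.new
dbs = (1..5).map { |i| FakeDb.new("db#{i}", 0) }
dbs.each { |db| queue << db }

workers = Array.new(2) do
  Thread.new do
    loop do
      db = queue.pop
      break if db == :stop
      db.feel
      db.tick
      # Put the resource back at the end of the ring so it is checked again.
      if db.ticks < ROUNDS
        queue << db
      else
        processed << db
      end
    end
  end
end

# Wait until every db has finished its rounds, then stop the workers.
dbs.size.times { processed.pop }
workers.size.times { queue << :stop }
workers.each(&:join)

puts dbs.map(&:ticks).inspect
```

Each resource is on the queue at most once at a time, so only one worker handles a given resource at any moment. This is what lets `feel` and `tick` skip locking in this sketch.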
  30. QueueClassic
      Our queue is implemented on top of the QueueClassic gem, a queue system built in Ruby on top of Postgres with some interesting characteristics.
  31. Let’s look at some of the states on our Resource class; a resource can go through these states. One very important aspect of this system is idempotency. The system must be designed so that each state can run any number of times without affecting the end result. Examples where this is not immediately obvious are the creating and deprovisioning states.
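One common way to make a state such as creating idempotent is check-then-act: each step tests whether its effect already exists before performing it, and plain assignments are used instead of increments. The sketch below is hypothetical. `FakeInfra` stands in for the provider and Postgres calls, and it is not presented as how Heroku Postgres implements the state.

```ruby
# Hypothetical sketch of an idempotent "creating" step. FakeInfra stands in
# for the real infrastructure; its create_database is deliberately NOT
# idempotent (it raises on a second call), so the guard matters.
class FakeInfra
  attr_reader :databases, :create_calls

  def initialize
    @databases = {}
    @create_calls = 0
  end

  def database_exists?(name)
    databases.key?(name)
  end

  def create_database(name)
    @create_calls += 1
    raise "database #{name} already exists" if database_exists?(name)
    databases[name] = { owner: nil }
  end
end

# Safe to run after a crash or a duplicate tick: every action is guarded
# or is a plain assignment that yields the same result each time.
def run_creating_state(infra, name, owner)
  infra.create_database(name) unless infra.database_exists?(name)
  infra.databases[name][:owner] = owner
  :available
end

infra = FakeInfra.new
3.times { run_creating_state(infra, "d4f9wdf02", "uf0wjasdf") }
puts infra.create_calls
```

Running the state three times leaves exactly one database with the same owner, which is the "any number of times without affecting the end result" property.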
  32. Durability and Availability
      Let’s talk about how we handle durability and availability of databases.
  33. In Postgres, as in other similar systems, when you issue a write transaction, it first writes the transaction to the Write-Ahead Log (WAL), and only then does it write to the data files. This ensures that all data committed to the system exists first in the WAL stream.
  34. Of course, if the WAL stream is on the same physical disks as the data files, there’s a high risk of data loss. Many opt to place the WAL segments on a different disk from the data files. This is a great first step (and one we also take). But really, we don’t consider data durable until the WAL segments are replicated across many data centers. We ship WAL segments to multi-datacenter storage every 60 seconds using WAL-E, a Python WAL archiver written at Heroku and now available as open source.
  35. Now that the WAL segments are off the box, we can do many other tricks. For example, creating a “follower” is as easy as fetching the WAL segments from distributed storage and replaying them on a brand new server. Once it has caught up, we set up direct streaming replication between the primary and the follower.
  36. Similarly, a fork of a database pulls down the WAL segments from distributed storage and replays them on a new server. Once it has caught up, this new server does not set up streaming replication as in the follower case. Instead, it starts producing WAL segments of its own (when write transactions occur on it). So the fork is now set up to ship WAL segments to distributed storage, just like its leader.
  37. Continuous Protection
        • Write-Ahead Log segments shipped to durable storage every 60 seconds
        • We can replay these logs on a new server to recover your data
      This is what we call Continuous Protection. Having WAL segments always available is a primary concern of ours. It lets us easily rebuild a server’s data state, and it can be updated continuously, as opposed to capturing full backups of the system.
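The ordering described in slides 33 to 37 can be shown with a toy model: every commit is appended to the log before it is applied to the data, so replaying the archived log on an empty server rebuilds the same state. This is only an illustration of the principle. `ToyServer` and the two-record segment size are hypothetical and say nothing about how Postgres or WAL-E actually store or ship data.

```ruby
# Toy model of write-ahead logging and replay (slides 33-37). ToyServer and
# SEGMENT_SIZE are illustrative assumptions, not Postgres or WAL-E internals.
class ToyServer
  attr_reader :data, :wal

  def initialize
    @data = {}
    @wal = []
  end

  def commit(key, value)
    wal << [key, value]   # 1. record the change in the write-ahead log
    data[key] = value     # 2. only then apply it to the data "files"
  end

  # Re-applies each archived change, rebuilding both the log and the data.
  def replay(records)
    records.each { |key, value| commit(key, value) }
  end
end

SEGMENT_SIZE = 2   # records per toy segment; real WAL segments are 16 MB files

primary = ToyServer.new
primary.commit("a", 1)
primary.commit("b", 2)
primary.commit("a", 3)

# "Ship" the log to durable storage in segments.
archive = primary.wal.each_slice(SEGMENT_SIZE).to_a

# The primary's disks are lost; replay the archive on a brand new server.
replacement = ToyServer.new
archive.each { |segment| replacement.replay(segment) }
puts replacement.data.inspect
```

Because the log holds every change in commit order, the archive alone is enough to reconstruct the data. That is why it matters more to keep the WAL off the box than to take frequent full backups.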
  38. Need a more flexible object model
      Now, introducing all of these functions required us to rethink our object model.
  39. timeline
      We have the concept of a timeline. A timeline at time zero contains no data and no commits.
  40. participant
      Participants are attached to a timeline. Participants can write data to the timeline.
  41. Writing data to the timeline moves the timeline forward in time.
  42.
  43. resource
      A resource is what our users get. It maps to a URL. A resource is attached to one participant.
  44. follower
      This allows us to model followers easily. A follower is just a participant on the same timeline as its leader. The difference is that followers can’t write to that timeline. Only one participant can write to the timeline: the follower’s leader (or primary).
  45. fork
      When we fork a database, the fork gets its own timeline. The new timeline diverges from its parent and is writable, so it follows its own path.
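The timeline model above can be sketched in a few classes. Only the timeline's leader may write, a follower reads the same timeline, and a fork copies the history so far into a new timeline that it leads. The class and method names (`create_fork`, `follow`) are hypothetical. A fork is modeled as copying commits in memory, whereas the real system replays WAL from storage.

```ruby
# Hypothetical sketch of the timeline/participant model (slides 39-45).
class Timeline
  attr_reader :commits
  attr_accessor :leader

  def initialize(commits = [])
    @commits = commits
  end
end

class Participant
  attr_reader :timeline

  def initialize(timeline)
    @timeline = timeline
  end

  # Writing moves the timeline forward; only its leader may do it.
  def write(change)
    raise "only the leader can write to this timeline" unless timeline.leader.equal?(self)
    timeline.commits << change
  end

  def read
    timeline.commits.dup
  end

  # A follower shares this participant's timeline but cannot write to it.
  def follow
    Participant.new(timeline)
  end

  # A fork starts a new, writable timeline from the current history.
  def create_fork
    copy = Timeline.new(timeline.commits.dup)
    Participant.new(copy).tap { |participant| copy.leader = participant }
  end
end

main = Timeline.new
leader = Participant.new(main)
main.leader = leader
leader.write("insert 1")

follower = leader.follow
forked = leader.create_fork
leader.write("insert 2")
forked.write("insert 2b")

p follower.read
p forked.read
```

The follower sees the leader's second write because they share one timeline. The fork diverged after "insert 1" and records its own history.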
  46. disaster
      Finally, this system can be used in the event of catastrophic hardware failure. When a database’s hardware fails completely, instead of trying to recover the server itself, it’s best to create a new node and “STONITH” (
  47. What we do is create a new participant, hidden from the user.
  48. recovery
      Once it is caught up and ready to go, we tie the resource to it. So the user only sees a blip in availability, but behind the scenes they are sitting on entirely new hardware, like magic.
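The recovery flow can be sketched as follows: build a hidden replacement participant, let it catch up from the archived log, fence off the failed node, and re-point the resource. The sketch assumes that the user-facing URL belongs to the resource and therefore survives the swap. `Participant`, `Resource` and `recover` are hypothetical names, and fencing is reduced to a boolean flag.

```ruby
# Hypothetical sketch of the recovery flow (slides 46-48). Assumes the URL
# belongs to the resource, so swapping participants leaves it unchanged.
Participant = Struct.new(:name, :applied, :healthy)
Resource = Struct.new(:url, :participant)

def recover(resource, archived_wal)
  failed = resource.participant
  # New participant, hidden from the user until it is ready.
  replacement = Participant.new("#{failed.name}-replacement", [], true)
  archived_wal.each { |record| replacement.applied << record }  # catch up
  raise "replacement has not caught up" unless replacement.applied == archived_wal
  failed.healthy = false                 # fence the failed node off
  resource.participant = replacement     # tie the resource to the new node
  resource
end

archived_wal = ["insert 1", "insert 2"]
resource = Resource.new("postgres://user:pass@db.example.com:5432/d4f9wdf02",
                        Participant.new("node-a", archived_wal.dup, true))
old_node = resource.participant
recover(resource, archived_wal)
puts resource.participant.name
```

The resource, and therefore what the user connects to, stays the same object throughout. Only the participant behind it changes, which matches the "blip in availability" described above.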
  49. big project
      Needless to say, this has become a big project over time.
  50. lots of moving parts
  51. long test suite
  52. modularize and build APIs
      So it’s time to spread out responsibilities by modularizing the system and building APIs that the components use to talk to each other.
  53. What we’ve built is a constellation of Heroku apps. We may split this even further in the future.
  54. gain in agility
      This gains us agility. The test suite of each individual project is much smaller now, which improves our ability to develop quickly. It also means that each component can be deployed individually. For example, a deploy of the admin front-end UI has no effect on the main system’s APIs.
  55. composable services
      It also allows us to build better abstractions at the system level, which improves our ability to compose services. For example, a system that provisions and manages servers from our infrastructure provider can be used by many other consumers, not only Heroku Postgres.
  56. independently scalable
      The components can also be scaled individually. Some parts of the system have different load and response-time requirements than others, so we can now tune our operations per subsystem, because the subsystems are clearly decoupled.
  57. Logging and Metrics
      Finally, I’d like to talk about visibility into our app.
  58. log generation
      First, let’s talk about logging.
  59. Heroku has a service called Logplex (it’s open source). Your application can send logs to the Logplex service on a specific channel (it uses capability-based security). Then one or more consumers can “drain” the logs for that channel.
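The channel-and-drain idea can be illustrated with a toy router in which one producer fans out to many consumers. This is not the Logplex API. `ToyLogRouter`, the random hex token and the drain blocks are hypothetical. The token only gestures at the capability idea, in which holding the channel token is what lets you use the channel.

```ruby
# Toy model of channels and drains. NOT the Logplex API: every name here is
# hypothetical and exists only to show one producer fanning out to many drains.
require "securerandom"

class ToyLogRouter
  def initialize
    @drains = Hash.new { |hash, token| hash[token] = [] }
  end

  # Holding the token is what lets you write to or drain the channel.
  def create_channel
    SecureRandom.hex(8)
  end

  def add_drain(token, &drain)
    @drains[token] << drain
  end

  # Deliver a line to every drain on the channel.
  def post(token, line)
    @drains[token].each { |drain| drain.call(line) }
  end
end

router = ToyLogRouter.new
token = router.create_channel

terminal = []
archive = []
router.add_drain(token) { |line| terminal << line }   # like `heroku logs --tail`
router.add_drain(token) { |line| archive << line }    # e.g. a database drain

router.post(token, "create_work request_id=afe2-f0d at=start")
puts terminal == archive
```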
  60. logs are event streams
  61. how should you log?
      Having this logging infrastructure available, let’s talk about how to make the best use of it.
  62.
        post "/work" do
          puts "starting to do work"
          worker =
          begin
            worker.lift_things_up
            worker.put_them_down
          rescue WorkerError => e
            puts "Fail :( #{e.message}"
            status 500
          end
          puts "done doing work"
          status 200
        end
      This is an example of terrible logging.
  63.
        $ heroku logs --tail
        2012-07-28T02:43:35 [web.4] starting to do work
        2012-07-28T02:43:35 [web.4] Fail :( invalid worker, nothing to do
        2012-07-28T02:43:35 heroku[router] POST dyno=web.4 queue=0 wait=0ms service=14ms status=500 bytes=643
      There’s no structure to these logs, so they can’t easily be read and interpreted by a computer.
  64. bad logging
        • What exactly happened?
        • When did it happen?
        • How long did it take?
        • How many times has it happened?
  65. good logging
        • parseable
        • consistent
        • plentiful
  66.
        post "/work" do
          log(create_work: true, request_id: uuid) do
            worker = uuid))
            begin
              worker.lift_things_up
              worker.put_them_down
            rescue WorkerError => e
              log_exception(e, create_work: true)
            end
          end
        end

        helpers do
          def uuid
            @uuid ||= SecureRandom.uuid
          end
        end
      Instead, let’s do some more structured logging. Also note how every request gets a UUID. This is critical to tying together all the logs for a given request.
  67.
        require 'scrolls'

        module App
          module Logs
            extend self

            def log(data, &block)
              Scrolls.log(with_env(data), &block)
            end

            # Scrolls.log_exception takes the data hash and the exception.
            def log_exception(exception, data)
              Scrolls.log_exception(with_env(data), exception)
            end

            # Merge the caller's data over the environment name.
            def with_env(hash)
              { environment: ENV['RACK_ENV'] }.merge(hash)
            end
          end
        end
      On the prior slide, we saw the `log` and `log_exception` methods. This small module provides those methods. It is a wrapper around the open source `scrolls` gem, which provides a framework for structured logging. The module merely adds our environment name to the logs, which is useful for parsing later.
  68.
        $ heroku logs --tail
        2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=start
        2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=exception message=invalid worker, nothing to do
        2012-07-28T02:43:35 [web.4] create_work request_id=afe2-f0d at=finish elapsed=53
        2012-07-28T02:43:35 heroku[router] POST dyno=web.4 queue=0 wait=0ms service=14ms status=500 bytes=643
      Now our logs look like this: easy to parse, and still easy for a human to read.
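To show what the wrapper produces without depending on the gem, here is a dependency-free sketch of the same key=value style. It uses one UUID per request and `at=start` / `at=finish elapsed=...` lines around a block. `TinyLog` is hypothetical. Its formatting rules (a bare key for `true`, quoting values that contain spaces, elapsed time in milliseconds) are assumptions modeled on the output above, not the scrolls gem's exact behavior.

```ruby
# Dependency-free sketch of key=value structured logging. TinyLog and its
# formatting rules are assumptions, not the scrolls gem's actual behavior.
require "securerandom"

module TinyLog
  module_function

  # {create_work: true, at: "start"} -> "create_work at=start"
  def to_line(data)
    data.map do |key, value|
      value = value.to_s
      value = %("#{value}") if value.include?(" ")
      value == "true" ? key.to_s : "#{key}=#{value}"
    end.join(" ")
  end

  # Emit at=start and at=finish (with elapsed ms) around the block.
  def log(data, out = $stdout)
    out << to_line(data.merge(at: "start"))
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
    out << to_line(data.merge(at: "finish", elapsed: elapsed))
    result
  end
end

request_id = SecureRandom.uuid   # one id per request ties its lines together
lines = []
TinyLog.log({ create_work: true, request_id: request_id }, lines) { 42 }
puts lines
```

Every line for the request carries the same `request_id`, so a consumer can group them without guessing which lines belong together.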
  69. log consumption
      Let’s talk about consuming those logs, which should make it clear why structured logging is so important.
  70. (this is the fun part)
  71. As mentioned before, it’s possible to set up multiple log drains. The Heroku Toolbelt has a utility to print logs to your terminal (`heroku logs --tail`). But why stop there? You can have as many drains as you want! We can set up a drain that stores data locally for further analysis and metrics generation. Here, a Postgres database is set up and the logs are stored in hstore, its key-value data type.
  72. select * from events;
  73. Now that the data is stored in a Postgres database, we can use SQL to query it and generate metrics. A process continuously queries this database and sends aggregated results to a third-party metrics collection service.
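The path from structured logs to metrics and alerts can be sketched in Ruby. Parsing each line into a Hash stands in for an hstore row, and plain Enumerable aggregation stands in for the SQL queries. The second request's lines are made-up sample data, and the 0.25 error-rate threshold is a hypothetical alerting rule.

```ruby
# Sketch: structured log lines -> events -> metrics -> alert. A Hash per line
# stands in for an hstore row; the sample lines after the first two and the
# threshold are made up for illustration.
def parse(line)
  line.scan(/(\w+)(?:=(\S+))?/).to_h { |key, value| [key, value || "true"] }
end

events = [
  "create_work request_id=afe2-f0d at=start",
  "create_work request_id=afe2-f0d at=finish elapsed=53",
  "create_work request_id=b71c-0aa at=start",
  "create_work request_id=b71c-0aa at=exception",
  "create_work request_id=b71c-0aa at=finish elapsed=12"
].map { |line| parse(line) }

finished = events.select { |event| event["at"] == "finish" }
exceptions = events.count { |event| event["at"] == "exception" }
mean_elapsed = finished.sum { |event| event["elapsed"].to_i } / finished.size.to_f
error_rate = exceptions / finished.size.to_f

ERROR_RATE_THRESHOLD = 0.25   # hypothetical alerting rule
alert = error_rate > ERROR_RATE_THRESHOLD

puts "mean_elapsed=#{mean_elapsed} error_rate=#{error_rate} alert=#{alert}"
```

The metrics line is itself in key=value form, so the same parsing approach can feed it to the next consumer.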
  74. good logging metrics alerts
      Visibility into your system starts with good logging. Great logs enable easy metrics collection. Metrics lead to system alerts.
  75. current tooling
        • still using sequel and sinatra
        • fog displaced stem
        • backbone.js for web UIs
        • fernet for auth tokens, valcro for validations
        • python, go and bash in some subsystems
      So, to wrap up, our current tooling includes these pieces of technology.
  76. lessons
        • managing databases is hard
        • start simple
        • extract (and share) reusable code
        • separate concerns into services
        • learn to love your event stream
  77. thanks! @hgmnz @herokupostgres