Constructing Web APIs with Rack, Sinatra and MongoDB

  1. Constructing Web APIs with Rack, Sinatra and mongoDB oisín hurley oi.sin@nis.io @oisin
  2. web API (ecosystem)
  3. web API (mobile access)
  4. web API (revenue)
  5. a good API is...focussed ‣ clear in its intent ‣ epitomizes good coding/behavioural practice ‣ has minimal sugar ‣ has a minimum of control surfaces
  6. a good API is...evolvable ‣ your API will have consumers ‣ you don’t suddenly break the consumers, ever ‣ you control the API lifecycle, you control the expectations
  7. a good web API is...responsive ‣ unchatty ‣ bandwidth sensitive ‣ latency savvy ‣ does paging where appropriate ‣ not unnecessarily fine-grained
  8. a good web API is...resilient ‣ stable in the presence of badness ‣ traps flooding/overload ‣ adapts to surges ‣ makes good on shoddy requests, if possible ‣ authenticates, if appropriate
  9. example application ‣ flavour of the month - location tracker! ‣ now that apple/google no longer do our work for us ‣ register a handset ‣ add a location ‘ping’ signal from handset to server https://github.com/oisin/plink
  10. design (focussed) ‣ PUT a handset for registration ‣ POST location details ‣ DELETE a handset when not in use ‣ focussed and short
  11. design (evolvable) ‣ hit it with a hammer - put a version into URL - /api/v1.3/... ‣ in good company - google, twitter ‣ produce a compatibility statement ‣ what it means to minor/major level up ‣ enforce this in code
  12. design (resilience) ‣ mongoDB for scaling ‣ write code to work around badness ‣ throttling of client activity with minimum call interval ‣ not using auth in this edition...
  13. design (responsiveness) ‣ this API is very fine-grained, but not chatty ‣ we should queue to decouple POST response time from db ‣ but mongo is meant to be super-fast ‣ so maybe we get away with it in this edition :)
  14. technologies (sinatra) ‣ web DSL ‣ low hassle whut? ‣ rack compatible http://www.sinatrarb.com/
  15. technologies (rack) ‣ rack - a ruby webserver interface ‣ we’re going to use this for two things ‣ throttling for bad clients using a Rack middleware ‣ mounting multiple Sinatra apps with Rack::Builder (later on) http://rack.rubyforge.org/
  16. technologies (mongodb) ‣ high performance ‣ non-relational ‣ horizontal scaling ‣ may give us resilience and responsiveness ‣ also nice client on MacOS :) http://www.mongodb.org http://mongohub.todayclose.com/
  17. technologies (mongo_mapper) ‣ ORM for mongoDB ‣ a slight tincture of ActiveRecord : models, associations, dynamic finders ‣ embedded documents ‣ indices ‣ also, I like DataMapper and this is a little similar http://mongomapper.com/
  18. mongoDB (deploys)
  19. mongoDB is document-oriented ‣ collections contain documents, which can contain keys, arrays and other documents ‣ a document is like a JSON dictionary (in fact, it’s BSON) ‣ indices, yes, but no schema in the RDBMS sense - but you do plan!
  20. mongoDB is a database ‣ foreign keys - can reference documents living in other collections ‣ indices - same as RDBMS - use in the same way ‣ datatypes - JSON basics plus some others including regex and code ‣ flexible querying with js, regex, kv matching ‣ but no JOINs all the same
  21. mongoDB can scale ‣ by relaxing some of the constraints of relational DBs, better horizontal scaling can be achieved ‣ replica sets for scaling reads ‣ replica sets & sharding for scaling writes ‣ map/reduce for batch processing of data (like GROUP BY) http://www.mongodb.org/display/DOCS/Replication http://www.mongodb.org/display/DOCS/Sharding
  22. cap/brewer’s theorem ‣ Consistency - all nodes see all data at the same time ‣ Availability - node failures do not prevent operation ‣ Partition Tolerance - only total network failure will cause the system to respond incorrectly ‣ Pick Any Two
  23. consistency model (read) master slave http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
  24. mongoDB is performance oriented ‣ removes features that impede performance ‣ will not replace your SQL store ‣ good for this example app - because we want fast ‘write’ performance and scale (consistency not so much) ‣ GridFS - chunkifies and stores your files - neat!
  25. code (API) ➊ ➋ ➍ ➌ config.ru https://github.com/oisin/plink
  26. code (versioning) ➊ ➋ ➌ https://github.com/oisin/plink
  27. code (mongo) ➊ ➋ ➌ ➍ ➎ https://github.com/jnunemaker/mongomapper https://github.com/oisin/plink
  28. code (mongo configure) ➊ ➋ https://github.com/jnunemaker/mongomapper https://github.com/oisin/plink
  29. mongo (console) ➊ ➋ ➌
  30. code (mongo queries) ➊ where query ➋ & creation dynamic query ➌ ➍ deletion https://github.com/jnunemaker/mongomapper https://github.com/oisin/plink
  31. code (mongo embedded docs) ➊ ➋ ➌ ➍ ➎ https://github.com/jnunemaker/mongomapper https://github.com/oisin/plink
  32. mongo (capped collections) ‣ Fixed size, high performance LRU ‣ Maintains insertion order - great for logs/comments/etc ‣ not in use in this example application ‣ embedded documents - no cap on arrays ‣ putting location data in another collection - not sensible ‣ hacked it in the example app
  33. code (throttling) ➋ ➊ ➌ custom throttle strategy https://github.com/datagraph/rack-throttle https://github.com/oisin/plink
  34. deploy
  35. deploy DEVELOPER CRACK
  36. deploy
  37. deploy BUTT CRACK
  38. viewing the data ➋ ➊ another rack app
  39. fast test (restclient) https://github.com/archiloque/rest-client
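The slide's code isn't in the transcript; a quick manual exercise of the three API calls with rest-client might look like the sketch below. The URL, version and payload keys are assumptions based on the design slides, not the plink repo's actual code, and it needs the server running locally.

```ruby
# Smoke-testing the API by hand with the rest-client gem.
require 'rest_client'
require 'json'

base = 'http://localhost:9292/api/v1.0'

RestClient.put    "#{base}/handset/abc123", ''    # register a handset
RestClient.post   "#{base}/handset/abc123/plink",
                  {:lat => 53.34, :long => -6.26}.to_json,
                  :content_type => :json          # send a location ping
RestClient.delete "#{base}/handset/abc123"        # deregister it
```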
  40. wraps (mongo) ‣ programming is straightforward with mongo_mapper ‣ works well with heroku ‣ haven’t done any work with sharding/replication ‣ complement RDBMS - e.g. for GridFS files storage, logs, profiles ‣ worthy of further study and experimentation
  41. improvements (example) ‣ authentication using Rack::Warden ‣ queued invocations using delayed_job ‣ some eye candy for the tracking data ‣ suggestions welcome :-) http://github.com/oisin/plink

Editor's Notes

  1. In which Oisín talks about the motivation for a web API; what makes an API Good, Right and True; an exemplary application; some useful technologies to achieve the application goals; the great mongo; the cap theorem and consistency; programming mongo through mongomapper; defensive coding for the web API; deployment to Heroku and CloudFoundry; and summarizes some realizations about mongo.
  2. Developers Developers Developers -- a web API gives you a chance to build an ecosystem of developers and products and business based on your stuff.
  3. Chances are if you are writing an app, you’ll need a server-side component to hold data, perform queries and share things. You’ll do this with a Web API.
  4. Shock - some people are actually making money from web APIs - based on a freemium model, companies like UrbanAirship charge for pushing data to phones; other data companies charge subscription access to their data corpora. Next: What makes a good API?
  5. APIs can be a bit difficult to get right. So let’s look at the characteristics of a good API. Clarity - includes the documentation here. Good practice - adhere to naming conventions; no 40-parameter methods. Sugar implies the unsugared route is also possible, which reduces clarity. Minimum - behavioural hints in one place, minimal methods. But this all is tempered by reality.
  6. A thing that is very important for the longevity (and usefulness) of an API is evolvability. APIs have a lifecycle - you release them into the wild and people start using them. They use them in ways you never, ever, would have thought. And they start looking for new approaches, methods, access to internals and new ways to control the behaviour. If they are paying you, it’s usually a good idea in some instances to give them what they need. But you have to do this in a controlled fashion. If you break products that customers are using to make money, then there will be hell to pay. So it’s important you control the lifecycle of the API and the experience of everybody. You need to be able to say we are making changes, and we’re going to change the version, and this is what that means.
  7. Previous characteristics apply to programming APIs, but web APIs have some extra fun things associated with them because they have the network in there, and everybody knows how that makes life difficult. Don’t try to do many fine-grained calls; make sure a typical interaction with the API doesn’t take many calls; but be bandwidth sensitive as well as latency savvy; use paging, with ranges, or iterator style URLs.
  8. This is the thing that will annoy people the most - if your API goes away totally. It may degrade, get slower, but shouldn’t go away. A lot of the resilience here is ops-based, so you need the right kind of scaling, but that doesn’t absolve you from doing some programming work! That’s the theory.
  9. I did a little sample application, which I’d like to keep developing, as there is some interesting stuff from the point of view of scaling and using mongo that I’d like to get into at some point.
  10. From the design perspective - it’s focussed - only does three things!
  11. Ok to hit this with a hammer, not to be subtle, and encode a version number in the URL. We can enforce compatibility rules in the code itself. A little later we can see how something like Rack can help us with this even more so, but we should keep checks in the code. Compatibility statement is something you have in the docs for your developers. But you know how that works already.
  12. I admit I’m taking a few shortcuts here! Mongo is going to do the scaling for us :) We’re going to write some defensive code. One call per 5 minutes is probably plenty for me to find out what’s going on in terms of the handset location. I left out auth to just take off one layer of stuff - it should be in later versions of the example application.
  13. Very small API - fine-grained is ok here. We should use queues to ensure that the synchronous HTTP returns as quickly as possible to the client. This needs an experiment - I’m playing it by ear here - mongo is meant to be fast, so maybe putting in something like a delayed_job may actually mean more overhead. This is a kind of design decision where you need to get some figures and some costs. Now let's look at some of the technologies I’ve put together for this sample app.
  14. Sinatra is my go-to guy for small web applications and web APIs. Zero hassle and easy to work with, and rackness gives it loads of middlewares I can use to modify the request path.
  15. This gives you a stack/interceptor model to run what’s called middlewares before it gets to your Sinatra application. You can also use it to start up and mount multiple applications living off the same root URL, but in different branches - I’ve added a separate tracking application which is meant to show the data gathered, which we’ll see later.
  16. Mongo! Why did I choose it for this - high performance, horizontal scaling, non-relational, and these are all things I wanted to look at (but not so much in this talk!) It might also save my ass on the resilience and responsiveness I was talking about earlier!
  17. There’s a good Ruby driver for Mongo from 10gen, but MongoMapper gives me an ORM, which is nice and lives on top of that driver. It’s a little ActiveRecord-like, with models, associations etc. At this point, it’s probably time to say a little about MongoDB.
  18. There are a few companies using it! Lots of data. You can get all of this information from http://www.mongodb.org/ and there are a number of really good experience blog entries and articles that are linked. Worth a read.
  19. Well, what’s a document anyway? The main choice you need to make with Mongo is whether or not you want something to be an embedded document or a DBRef to a document in another collection.
  20. Embedded documents instead of joins - the efficiency being that when you pull the document, you get all the embedded ones with it and you don’t need to go back to perform a JOIN.
  21. Horizontal scale and performance are the main goals of Mongo - the way to get this was to come back to some of the features and assumptions of the RDBMS and remove them: transactions, JOINs. Take these out, or soften the requirement, and the goals are more easily achieved.

Replica sets involve a master and one or more slaves - you write to the master and this is pushed out to the slaves. It’s an eventual consistency model, so if you write, then immediately read from the slave, you will see stale data. If this works for you, then cool. This will scale reads. Sharding is about partitioning your collections over many replica sets. Multiple masters then means that you can scale your writes. Sharding can be turned on with no downtime. But I haven’t tried this yet - the next talk maybe!

map/reduce is an approach for processing huge datasets on certain kinds of distributable problems using a large number of computers. Map: the master node takes the input, partitions it up into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem, and passes the answer back to its master node. Reduce: the master node then takes the answers to all the sub-problems and combines them in some way to get the output - the answer to the problem it was originally trying to solve.
  22. Any mention of Mongo or any NoSQL database has to mention the CAP Theorem. This is all distributed system academic stuff, but important. Lots of links here - this was a conjecture by Brewer in 2000 that in a distributed system, you can have C, A, or P, but not all three. This was proved to be true in a paper in 2002 - check the links below. These features are all subtly linked and interdependent. Examples - BigTable is CA, Dynamo is AP.

http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
http://highscalability.com/amazon-architecture
http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf
http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
http://blog.dhananjaynene.com/2009/10/nosql-a-fluid-architecture-in-transition/
http://devblog.streamy.com/tag/partition-tolerance/
  23. Here’s where MongoDB sits in terms of read consistency wrt Dynamo/SimpleDB.
  25. 1, 2, 3) Sinatra API. 4) Application is started by Rack::Builder.
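The config.ru screenshot didn't survive the transcript; a minimal sketch of what point 4 describes follows. ApiApp and TrackApp are hypothetical Sinatra app class names, not the ones in the real plink repo.

```ruby
# config.ru -- evaluated inside Rack::Builder, so 'map' and 'run' work directly.
require './api'
require './track'

map '/track' do   # the data-viewing app gets its own URL branch
  run TrackApp
end

map '/' do        # everything else goes to the main API app
  run ApiApp
end
```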
  26. 1) This is the regex that will match the root of the URL path_info for a versioned call. 2) The compatibility statement is implemented by this helper. 3) This filter occurs before every API call and checks that the version expected by the incoming request is compatible with the server’s own.
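The code on the slide isn't in the transcript; here is a pure-Ruby sketch of the three points above. The constant and helper names are assumptions, not the names used in the plink repo.

```ruby
# 1) regex matching /api/vMAJOR.MINOR/ at the root of the URL path_info
API_VERSION = %r{\A/api/v(\d+)\.(\d+)/}

SERVER_MAJOR = 1  # the server's own version
SERVER_MINOR = 3

# 2) the compatibility statement as a helper: same major version, and the
#    requested minor version must not be newer than the server's
def version_compatible?(path)
  m = API_VERSION.match(path)
  return false unless m
  m[1].to_i == SERVER_MAJOR && m[2].to_i <= SERVER_MINOR
end

# 3) in Sinatra this would run as a filter before every API call, e.g.
#    before { halt 400 unless version_compatible?(request.path_info) }
```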
  27. 1) This is a Mongo document. 2) Declare the keys in the document, their type, and say they are mandatory. 3) This is an association - the Handset document should connect to many Location documents. 4) This is a Mongo Embedded Document - it lives inside another document, not in its own collection. 5) The :time key is protected from mass assignment.
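A sketch of the models the note above describes, assuming the mongo_mapper gem and a configured connection. The key names follow the note; everything else is an assumption rather than the real plink code.

```ruby
require 'mongo_mapper'

class Handset
  include MongoMapper::Document              # 1) a Mongo document

  key :code,   String, :required => true     # 2) typed, mandatory keys
  key :status, String, :required => true

  many :locations                            # 3) one handset, many locations
end

class Location
  include MongoMapper::EmbeddedDocument      # 4) stored inside a Handset

  key :lat,  Float
  key :long, Float
  key :time, Time

  attr_protected :time                       # 5) no mass assignment of :time
end
```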
  28. 1) Making a new connection to the database and setting the database name -- this will be very different when you are using a hosted Mongo, like the MongoHQ that’s used by Heroku. Check out the app code on GitHub for details. 2) Telling Mongo to make sure that the handsets collection (which is modeled by Handset) should be indexed on the :code key. Driver: http://api.mongodb.org/ruby/current/file.TUTORIAL.html MongoMapper: http://mongomapper.com/documentation/
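A local-development sketch of the two configuration points above; as the note says, a hosted Mongo (e.g. MongoHQ on Heroku) would build the connection from an environment-supplied URI instead.

```ruby
require 'mongo_mapper'

MongoMapper.connection = Mongo::Connection.new('localhost')  # 1) connect...
MongoMapper.database   = 'plink'                             #    ...and pick the db

Handset.ensure_index(:code)   # 2) index the handsets collection on :code
```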
  29. 1) Starting the Mongo shell client and using the appropriate database. 2) Querying for all the handsets. 3) One of the handsets has an embedded document Location.
  30. 1) Standard MongoMapper ‘where’ query. 2) Creating a Handset and setting the :status and :code keys. 3) Dynamic finder, ActiveRecord stylee. 4) Deleting a document in the handsets collection.
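The four operations from the note above, sketched with MongoMapper against the hypothetical Handset model from earlier. Illustrative only: the values are made up and it needs a live MongoDB behind it.

```ruby
Handset.where(:status => 'active').all            # 1) 'where' query

Handset.create(:code => 'abc123',                 # 2) create, setting keys
               :status => 'registered')

Handset.find_by_code('abc123')                    # 3) dynamic finder

Handset.destroy_all(:code => 'abc123')            # 4) delete from the collection
```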
  31. 1) Making a new Location model instance, but not saving it to the database. 2) Defense Against the Dark Arts: checking for mandatory JSON payload keys. 3) Defense Against the Dark Arts: checking for optional JSON payload keys. 4) Adding a Location to an array of them in the Handset model. 5) Saving the Handset model will write the Location array as embedded documents.
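Steps 1-5 from the note above, sketched inside a Sinatra POST handler. The route, payload keys and model names are assumptions, not the plink repo's actual code.

```ruby
require 'json'

post '/api/v1.0/handset/:code/plink' do
  payload = JSON.parse(request.body.read)

  # 2) mandatory keys: refuse shoddy requests
  halt 400 unless %w(lat long).all? { |k| payload.key?(k) }

  loc = Location.new(:lat  => payload['lat'],    # 1) new model, not yet saved
                     :long => payload['long'])

  # 3) optional key, only set when present
  loc.time = Time.at(payload['time']) if payload.key?('time')

  handset = Handset.find_by_code(params[:code])
  handset.locations << loc                       # 4) append to the array
  handset.save                                   # 5) writes embedded docs
  status 201
end
```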
  32. Unfortunately can’t mix up those capped collections with location information here - it wouldn’t make sense to have the locations in a separate collection - there would be one for each handset and we’re limited on the number of collections in Mongo. Issues with document size - a single doc can be something like 16MB, including all of the embedded documents. Mongo is good for storing LOTS of documents, not HUGE documents. Hence the dumb hack in the code.
  33. 1) Only in production, use the Throttler middleware, and program for a 300 second (5 min) interval. 2) Extend the Rack Throttle interval throttler. 3) Just work the choke on URLs that have ‘plink’ at the end - we don’t want to throttle everything! Throttlees get a 403 if they try to get another plink in within the 5 minute limit.
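A custom strategy along the lines of the note above, assuming the rack-throttle gem; the class name is an assumption, not the one used in the plink repo.

```ruby
require 'rack/throttle'

class PlinkThrottle < Rack::Throttle::Interval
  # 3) only choke URLs ending in 'plink'; let everything else through
  def allowed?(request)
    request.path.end_with?('plink') ? super : true
  end
end

# 1) + 2) in config.ru, production only, with a 5 minute minimum interval:
#   use PlinkThrottle, :min => 300 if ENV['RACK_ENV'] == 'production'
```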
  34. EASY!
  35. NOT EASY!
  36. 1) Grab all the handsets from the database. 2) Send the /track tree off to the Track application - guess how this can help with versioning :)
  38. These are my takeaways from this experiment with mongoDB.
  39. Improvements that could be made to the example application (hint hint).