Building a High-Throughput REST API with Scala



Slides from my talk at the Scala DC Meetup on Jan 15th, 2014.

  @binkabir Yes, but at that time it was not part of the Typesafe stack, and we also needed to build a web admin console, so Play Framework turned out to be the better choice.
    Now that spray is going to be merged into the Typesafe umbrella, we'll surely revisit it.
  1. Building a High-Throughput REST API with Scala + Play + Akka
     Bhaskar V. Karambelkar
     Scala DC-MD-NOVA meetup Jan-15-2014
  2. Status quo
     • APIs used to be built with various protocols such as JDBC (stored procedures), JMS, SOAP/HTTP, XML-RPC and file transfer.
     • Issues:
       – No uniformity.
       – Not firewall friendly.
       – Programming-language dependency (JMS).
       – Not easy to test or document.
       – Not easy to scale, load-balance or fail over.
  3. Why Scala + Play + Akka
     • Needed an API that could tackle the 4 Vs of Big Data, viz. Volume, Velocity, Variety and Veracity.
     • Needed the API to be both horizontally and vertically scalable.
     • Needed an “event-driven” architecture/programming model.
     • Needed easy constructs for HA, fail-over, concurrency and load balancing.
  4. Stack
     • Scala 2.10.3, Play 2.2.1, Akka 2.2.3.
     • Eclipse + ScalaIDE (4.0.0 M1).
     • MongoDB as a config data store + queue.
     • metrics-scala library for metrics.
     • WebJars library to manage JavaScript/CSS dependencies.
     • sbt for building, Jenkins for CI.
  5. 1.0 Architecture
  6. Architecture Cont.
     • Apache reverse proxy (HA, load balancing, fail-over, TLS termination).
     • The API farm receives JSON POSTs, parses the JSON, normalizes it to Scala objects and uploads them to MongoDB acting as a queue.
     • The same API farm de-queues from MongoDB and sends each event to the next hop in the pipeline.
     • A basic admin console written in AngularJS.
     • Eventual destinations: HDFS & Elasticsearch.
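Slide 6 has MongoDB acting as a queue between the enqueue and de-queue sides of the API farm, and slide 7 lists data loss on a crash/hang as a first-run problem. As a minimal sketch, assuming an in-memory map in place of a MongoDB collection (so it runs without MongoDB or Casbah), the queue below leases documents on read and deletes them only on acknowledgement. `Event`, `DocQueue`, `claim` and `ack` are hypothetical names, and lease-then-ack is a common pattern, not necessarily the logic the author adopted.

```scala
import scala.collection.mutable

// Hypothetical normalized event (the talk turns JSON POSTs into Scala objects).
final case class Event(id: String, client: String, payload: String)

// In-memory stand-in for a MongoDB collection used as a queue (assumption).
final class DocQueue {
  // A "document": the event plus an optional lease expiry (None = ready).
  private final case class Doc(event: Event, leaseUntil: Option[Long])
  private val docs = mutable.LinkedHashMap.empty[String, Doc] // insertion order ~ FIFO

  def enqueue(e: Event): Unit = synchronized { docs.update(e.id, Doc(e, None)) }

  // Lease up to `max` ready events until now + leaseMillis. Nothing is deleted here,
  // so a de-queuer that crashes only delays delivery until its leases expire.
  def claim(max: Int, now: Long, leaseMillis: Long): List[Event] = synchronized {
    val ready = docs.valuesIterator
      .filter(_.leaseUntil.forall(_ <= now)) // never leased, or lease expired
      .take(max)
      .toList
    ready.foreach(d => docs.update(d.event.id, d.copy(leaseUntil = Some(now + leaseMillis))))
    ready.map(_.event)
  }

  // Delete only after the next hop has accepted the event.
  def ack(id: String): Unit = synchronized { docs.remove(id) }

  def size: Int = synchronized { docs.size }
}
```

Because an event is removed only after the next hop accepts it, an abandoned lease simply expires and the event is claimed again; the price is at-least-once delivery, so downstream consumers should tolerate duplicates. Against a real MongoDB collection, the claim step would have to be an atomic update (for example via `findAndModify`) rather than a synchronized block.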
  7. Performance in production on first run
     • Slow JSON parsing, frequent OOMs or, even worse, JVM hangs (kill -9).
     • No transactions in MongoDB, so data loss in case of a crash/hang.
     • Not scalable beyond a certain load.
     • CPUs pegged at 60 to 70% utilization, with non-uniform core usage.
     • High heap usage.
     • I/O bottlenecks.
     • Heavy en-queuing slowed down de-queuing, so queues filled up over time.
  8. Architecture 2.0
  9. Architecture 2.0 Cont.
     • Dedicated pipelines for clients.
     • Separate heavy traffic from light traffic.
     • Separate enqueue and de-queue into dedicated API server instances.
     • Compression all the way, even in MongoDB.
     • Incremental JSON parsing.
     • Avoid unnecessary JSON->Object->BSON->Object->Stream conversions.
     • Changed the logic so as not to lose data even if an instance crashes or hangs.
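“Compression all the way” and avoiding the JSON->Object->BSON->Object->Stream round trips fit together: if a payload is compressed once on the enqueue side and stored as an opaque byte array, the de-queue side can forward it without re-serializing. The sketch below shows only the compression part, using the JDK's `java.util.zip` GZIP streams; `Gzip.gzip`/`Gzip.gunzip` are hypothetical helpers, and GZIP itself is an assumption, since the slide does not name a codec.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object Gzip {
  // Compress a payload once on the enqueue side; the bytes can be stored and forwarded as-is.
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(bos)
    try out.write(bytes) finally out.close() // close() writes the GZIP trailer
    bos.toByteArray
  }

  // Decompress only where the content actually has to be inspected.
  def gunzip(bytes: Array[Byte]): Array[Byte] = {
    val in  = new GZIPInputStream(new ByteArrayInputStream(bytes))
    val bos = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n != -1) {
        bos.write(buf, 0, n)
        n = in.read(buf)
      }
    } finally in.close()
    bos.toByteArray
  }
}
```

The savings depend entirely on the payload: JSON with recurring field names usually compresses well, but it is worth measuring on real traffic before sizing storage or bandwidth.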
  10. Results
      • Platform stable.
      • CPU usage steady at 30 to 40%, with uniform distribution across cores.
      • Memory consumption under control, no more OOMs/hangs.
      • Increased throughput and scalability.
      • Very easy to scale further and create more data paths.
  11. Buzzwords/Recommendations
      • Scala
        – Immutability everywhere: use case classes / immutable collections.
        – Monadic patterns everywhere (collections, Try, Option).
      • Akka
        – Prefer ! (tell) over ? (ask).
        – Tune dispatcher parameters; don't rely on the default dispatcher.
        – Give the Scheduler its own dispatcher.
        – Routers with their own dispatcher for load-balancing actors writing to destinations.
        – CircuitBreaker to prevent cascading failures.
        – Throttler actor for throttling when required.
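Akka itself provides `akka.pattern.CircuitBreaker`, which is presumably what the slide refers to. To show the idea without an Akka dependency, the sketch below hand-rolls the same closed/open/half-open behaviour in plain Scala and returns `scala.util.Try`, in line with the monadic-patterns advice. It is single-threaded and illustrative only; `SimpleBreaker`, `maxFailures`, `resetTimeoutMillis` and the injected `clock` are assumptions chosen to keep it small and testable.

```scala
import scala.util.{Failure, Success, Try}

// Hand-rolled circuit breaker (illustrative; not thread-safe, not Akka's implementation).
final class SimpleBreaker(maxFailures: Int, resetTimeoutMillis: Long, clock: () => Long) {
  private var failures = 0                   // consecutive failures
  private var openedAt: Option[Long] = None  // Some(t) = open since time t

  def isOpen: Boolean = openedAt.isDefined

  def call[A](body: => A): Try[A] = openedAt match {
    case Some(t) if clock() - t < resetTimeoutMillis =>
      // Open: fail fast without touching the struggling downstream.
      Failure(new IllegalStateException("circuit open: failing fast"))
    case Some(_) =>
      // Half-open: reset timeout elapsed, so let one trial call through.
      record(Try(body), halfOpen = true)
    case None =>
      // Closed: normal operation.
      record(Try(body), halfOpen = false)
  }

  private def record[A](result: Try[A], halfOpen: Boolean): Try[A] = {
    result match {
      case Success(_) =>
        failures = 0
        openedAt = None // close the circuit
      case Failure(_) =>
        failures += 1
        if (halfOpen || failures >= maxFailures) openedAt = Some(clock()) // (re)open
    }
    result
  }
}
```

After `maxFailures` consecutive failures the breaker rejects calls immediately, which is what keeps one slow destination from tying up every caller; once `resetTimeoutMillis` has passed, a single trial call decides whether it closes again. Akka's version additionally handles concurrent callers and a per-call timeout.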
  12. Buzzwords/Recommendations
      • Play
        – Prefer non-blocking/async calls whenever possible.
        – Use WebJars for managing JavaScript/CSS dependencies.
        – For huge JSONs, use an incremental JSON parser + Play's Iteratee framework.
      • JVM
        – Use Java 7.
        – Profile and tune GC and memory parameters.
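The slide pairs an incremental JSON parser with Play's Iteratee framework; neither is reproduced here. As a dependency-free illustration of the incremental idea only, the sketch below is fed a large JSON array chunk by chunk and hands back each top-level object as soon as its closing brace arrives, so the full payload never has to be buffered. It tracks just brace depth, string literals and escapes, does not validate JSON, and takes already-decoded `String` chunks (UTF-8 decoding across chunk boundaries is out of scope); `JsonObjectSplitter` and `feed` are hypothetical names.

```scala
import scala.collection.mutable.ListBuffer

// Incrementally splits streamed JSON text into top-level objects.
// Works for a JSON array of objects or for concatenated / newline-delimited objects.
final class JsonObjectSplitter {
  private val current = new StringBuilder // text of the object being assembled
  private var depth = 0                   // current {...} nesting depth
  private var inString = false            // inside a "..." literal?
  private var escaped = false             // previous char was a backslash in a string?

  // Feed the next chunk; returns every object completed within this chunk.
  def feed(chunk: String): List[String] = {
    val done = ListBuffer.empty[String]
    chunk.foreach { c =>
      if (depth > 0) current.append(c)       // inside an object: keep every char
      if (inString) {
        if (escaped) escaped = false         // char after a backslash is literal
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false  // end of the string literal
      } else c match {
        case '"' if depth > 0 => inString = true
        case '{' =>
          if (depth == 0) current.append(c)  // first brace of a new top-level object
          depth += 1
        case '}' if depth > 0 =>
          depth -= 1
          if (depth == 0) {                  // object complete: emit and reset
            done += current.toString
            current.clear()
          }
        case _ => ()                         // other chars need no state change
      }
    }
    done.toList
  }
}
```

Each emitted object is small enough to hand to an ordinary JSON parser or to enqueue as-is, so memory stays bounded by the largest single object rather than by the whole request body.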
  13. Some Numbers
      • Current load
        – 2.5 billion events/day (> 30K/sec sustained).
        – 2 to 3 TB/day.
        – Expected to grow by 5x to 10x.
      • Current h/w count
        – 2 data paths with 4 enqueue and 4 de-queue API servers in each path.
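The load figures can be turned into per-second and per-server rates with some back-of-the-envelope arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and an even spread over 24 hours and over the 8 enqueue servers (2 data paths × 4); none of these assumptions come from the slides, and real traffic is burstier.

```scala
object Capacity {
  val SecondsPerDay  = 86400.0
  val eventsPerDay   = 2.5e9
  val enqueueServers = 2 * 4 // 2 data paths x 4 enqueue servers (from the slide)

  // Daily average rates, assuming an even spread over time and servers.
  val avgEventsPerSec  = eventsPerDay / SecondsPerDay
  val perEnqueueServer = avgEventsPerSec / enqueueServers

  // Average event size for a given daily volume (1 TB = 1e12 bytes, assumed).
  def bytesPerEvent(tbPerDay: Double): Double = tbPerDay * 1e12 / eventsPerDay

  // Average ingest bandwidth in MB/s (1 MB = 1e6 bytes, assumed).
  def mbPerSec(tbPerDay: Double): Double = tbPerDay * 1e12 / SecondsPerDay / 1e6

  // Projected average rate after the expected 5x to 10x growth.
  def projected(growth: Double): Double = avgEventsPerSec * growth
}
```

Under those assumptions the daily average comes to roughly 28,900 events/s (so the “> 30K/sec sustained” figure presumably describes the busier part of the day), about 3,600 events/s per enqueue server, 800 to 1,200 bytes per event, about 23 to 35 MB/s of ingest, and roughly 145K to 290K events/s after 5x to 10x growth.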
  14. Future …
      • Waiting for the Typesafe platform to stabilize a bit (akka-io, spray, akka-cluster).
      • More reactive than the current implementation (Play Futures, Iteratees).
      • ReactiveMongo (currently we use Casbah).
      • Evaluating Scala for use in the analytics pipeline (Spark framework, Cascading).
  15. Thank You!
      • Questions? Comments?