Building a high-throughput REST API with Scala


Slides from my talk at the Scala DC Meetup on Jan 15th, 2014.

  1. Building a high-throughput REST API with Scala + Play + Akka
     Bhaskar V. Karambelkar
     Scala DC-MD-NOVA meetup, Jan-15-2014
  2. Status quo
     • APIs used to be built with various protocols such as JDBC (stored procedures), JMS, SOAP/HTTP, XML-RPC, and file transfer.
     • Issues:
       – No uniformity.
       – Not firewall friendly.
       – Programming-language dependency (JMS).
       – Not easy to test / document.
       – Not easy to scale, load-balance, fail over.
  3. Why Scala + Play + Akka
     • Needed an API that could successfully tackle the 4 Vs of Big Data, viz. Volume, Velocity, Variety, Veracity.
     • Needed the API to be horizontally as well as vertically scalable.
     • Needed an “event-driven” architecture / programming model.
     • Needed easy “HA”, “fail-over”, “concurrency”, and “load balancing” constructs.
  4. Stack
     • Scala 2.10.3, Play 2.2.1, Akka 2.2.3.
     • Eclipse + Scala IDE (4.0.0 M1).
     • MongoDB as a config data store + queue.
     • metrics-scala library for metrics.
     • WebJars library to manage JavaScript/CSS dependencies.
     • sbt for building, Jenkins for CI.
  5. 1.0 Architecture [diagram]
  6. Architecture Cont.
     • Apache reverse proxy (HA, load balancing, fail-over, TLS termination).
     • The API farm receives JSON POSTs, parses the JSON, normalizes it to Scala objects, and uploads them to MongoDB acting as a queue.
     • The same API farm de-queues from Mongo and sends each message to the next hop in the pipeline.
     • A basic admin console written in AngularJS.
     • Eventual destinations: HDFS & Elasticsearch.
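The ingest step on this slide (JSON POST, parse, normalize to Scala objects) can be sketched with Play's own JSON library, which ships with Play 2.2. This is a minimal illustration, not the production code: the `Event` case class, its fields, and the `Ingest.normalize` helper are hypothetical, since the deck does not describe the payload schema.

```scala
import play.api.libs.json._
import scala.util.{Failure, Success, Try}

// Hypothetical event shape; the deck does not describe the real payload schema.
case class Event(id: String, source: String, timestamp: Long, tags: Seq[String])

object Event {
  // Macro-generated Reads that maps JSON fields onto the case class fields by name.
  implicit val reads: Reads[Event] = Json.reads[Event]
}

object Ingest {
  // Parse a raw POST body and normalize it into an immutable Event.
  // Left carries a human-readable reason; Right carries the event.
  def normalize(body: String): Either[String, Event] =
    Try(Json.parse(body)) match {
      case Failure(e) => Left("malformed JSON: " + e.getMessage)
      case Success(json) =>
        json.validate[Event] match {
          case JsSuccess(event, _) => Right(event)
          case JsError(errors)     => Left("invalid fields: " + errors.map(_._1).mkString(", "))
        }
    }
}
```

Because `Event` is an immutable case class, the normalized value can be passed to actors or queued without defensive copies.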
  7. Performance in production on first run
     • Slow JSON parsing, frequent OOMs, or even worse, JVM hangs (kill -9).
     • No transactions in MongoDB, so data loss in case of a crash/hang.
     • Not scalable beyond a certain load.
     • CPUs pegged at 60 to 70% utilization, non-uniform core usage.
     • High heap usage.
     • I/O bottlenecks.
     • Heavy en-queuing slowed down de-queuing, so queues filled up over time.
  8. Architecture 2.0 [diagram]
  9. Architecture 2.0 Cont.
     • Dedicated pipelines for clients.
     • Separate heavy traffic from light traffic.
     • Separate enqueue and de-queue into dedicated API server instances.
     • Compression all the way, even in Mongo.
     • Incremental JSON parsing.
     • Avoid unnecessary JSON -> Object -> BSON -> Object -> Stream conversions.
     • Changed logic so as to not lose data even in the event of an instance crash/hang.
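The slide does not say how the de-queue logic was changed to survive a crash. One common approach, sketched here purely as an assumption, is to *claim* a message with a time-limited lease instead of deleting it on read, and to delete it only after the next hop acknowledges; if the instance dies mid-send, the lease expires and the message becomes visible again. In MongoDB the claim step could be a single atomic findAndModify that sets the lease timestamp; below, a synchronized in-memory map stands in for the collection so the sketch runs without a database. Payloads are GZIP-compressed with `java.util.zip` to echo the "compression all the way" bullet. `LeaseQueue`, `Gzip`, `claim`, and `ack` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.collection.mutable

object Gzip {
  def compress(s: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bytes)
    try gz.write(s.getBytes(UTF_8)) finally gz.close() // close() flushes the GZIP trailer
    bytes.toByteArray
  }

  def decompress(b: Array[Byte]): String = {
    val in = new GZIPInputStream(new ByteArrayInputStream(b))
    try new String(readAll(in), UTF_8) finally in.close()
  }

  // Manual copy loop so the sketch also works on Java 7 (no readAllBytes).
  private def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n = in.read(buf)
    while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    out.toByteArray
  }
}

// Hypothetical in-memory stand-in for a MongoDB collection used as a queue.
final class LeaseQueue(leaseMillis: Long, now: () => Long = () => System.currentTimeMillis()) {
  // One stored message: compressed payload plus the time until which it is claimed.
  private case class Entry(payload: Array[Byte], leasedUntil: Long)

  private val entries = mutable.LinkedHashMap.empty[Long, Entry]
  private var nextId = 0L

  def enqueue(json: String): Long = synchronized {
    nextId += 1
    entries(nextId) = Entry(Gzip.compress(json), leasedUntil = 0L)
    nextId
  }

  // Atomically claim the first message whose lease is free or expired.
  // The message stays stored, so a crash after claim() loses nothing.
  def claim(): Option[(Long, String)] = synchronized {
    val t = now()
    entries.find { case (_, e) => e.leasedUntil <= t }.map { case (id, e) =>
      entries(id) = e.copy(leasedUntil = t + leaseMillis)
      (id, Gzip.decompress(e.payload))
    }
  }

  // Delete only after the next hop has confirmed receipt.
  def ack(id: Long): Unit = synchronized { entries.remove(id) }

  def size: Int = synchronized { entries.size }
}
```

The trade-off is at-least-once delivery: a message whose acknowledgement was lost in a crash is sent again, so the next hop should tolerate duplicates, for example by de-duplicating on a message id.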
  10. Results
      • Platform stable.
      • CPU usage steady at 30 to 40%, with uniform distribution across cores.
      • Memory consumption under control, no more OOMs / hangs.
      • Increased throughput and scalability.
      • Very easy to scale further and create more data paths.
  11. Buzzwords/Recommendations
      • Scala
        – Immutability everywhere: use case classes / immutable collections.
        – Monadic patterns everywhere (collections, Try, Option).
      • Akka
        – Prefer ! (tell) over ? (ask).
        – Tune dispatcher parameters; don't rely on the default dispatcher.
        – Give the scheduler its own dispatcher.
        – Routers with their own dispatcher for load-balancing actors writing to destinations.
        – CircuitBreaker to prevent cascading failures.
        – Throttler actor for throttling when required.
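Several of the Akka recommendations above fit in one small program. This is a sketch assuming the Akka 2.2.x API from the Stack slide (in Akka 2.3 `RoundRobinRouter` was superseded by `RoundRobinPool`, and in 2.4 `shutdown()` by `terminate()`). The dispatcher names `writer-dispatcher` and `scheduler-dispatcher`, all pool sizes and thresholds, and the `Writer` actor are hypothetical placeholders; the deck does not give its actual tuning values.

```scala
import akka.actor.{Actor, ActorLogging, ActorSystem, Props}
import akka.pattern.CircuitBreaker
import akka.routing.RoundRobinRouter
import com.typesafe.config.{Config, ConfigFactory}
import scala.concurrent.Future
import scala.concurrent.duration._

object TuningConfig {
  // Hypothetical dedicated dispatchers; the numbers are placeholders to be
  // replaced by values found through profiling, not recommendations.
  val config: Config = ConfigFactory.parseString("""
    writer-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      fork-join-executor {
        parallelism-min = 4
        parallelism-factor = 2.0
        parallelism-max = 16
      }
      # max messages an actor processes before its thread moves to another actor
      throughput = 50
    }
    scheduler-dispatcher {
      type = Dispatcher
      executor = "thread-pool-executor"
      thread-pool-executor {
        core-pool-size-min = 1
        core-pool-size-max = 2
      }
    }
  """).withFallback(ConfigFactory.load())
}

// Hypothetical actor that forwards events to a downstream destination.
class Writer extends Actor with ActorLogging {
  import context.dispatcher // ExecutionContext for the breaker and its futures

  // Opens after 5 consecutive failures/timeouts, fails fast for 30s,
  // then lets a single trial call through (half-open).
  private val breaker =
    new CircuitBreaker(context.system.scheduler,
      maxFailures = 5, callTimeout = 2.seconds, resetTimeout = 30.seconds)
      .onOpen(log.warning("destination failing, circuit opened"))

  def receive = {
    case event: String =>
      // While the circuit is open, calls fail immediately instead of
      // piling up against a dead destination.
      breaker.withCircuitBreaker(sendDownstream(event))
  }

  // Stand-in for the real next-hop call (HTTP, HDFS, Elasticsearch, ...).
  private def sendDownstream(event: String): Future[Unit] = Future(())
}

object AkkaSketch {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("api", TuningConfig.config)

    // Round-robin pool of 8 writers; routees and router both use writer-dispatcher.
    val writers = system.actorOf(
      Props[Writer]
        .withDispatcher("writer-dispatcher")
        .withRouter(RoundRobinRouter(nrOfInstances = 8, routerDispatcher = "writer-dispatcher")),
      "writers")

    // Periodic housekeeping runs on its own dispatcher, not the default one.
    val schedulerEc = system.dispatchers.lookup("scheduler-dispatcher")
    system.scheduler.schedule(1.second, 1.second) {
      system.log.info("periodic housekeeping tick")
    }(schedulerEc)

    // Fire-and-forget tell (!); ask (?) would allocate a Future and
    // schedule a timeout for every message.
    (1 to 100).foreach(i => writers ! s"""{"id":"$i"}""")

    Thread.sleep(2000)
    system.shutdown() // Akka 2.2; use system.terminate() on 2.4+
  }
}
```

The throttler recommendation is not shown; it would sit in front of `writers` to cap the message rate toward a slow destination.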
  12. Buzzwords/Recommendations
      • Play
        – Prefer non-blocking/async calls whenever possible.
        – Use WebJars for managing JavaScript/CSS dependencies.
        – For huge JSONs, use an incremental JSON parser + Play's Iteratee framework.
      • JVM
        – Use Java 7.
        – Profile and tune GC and memory params.
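The core idea behind incremental parsing of huge JSON documents is to walk the token stream and handle one record at a time instead of building the whole tree in memory. The sketch below uses Jackson's streaming API (`JsonFactory` / `JsonParser`, a library Play 2.2 already depends on). It assumes the input is a top-level array of objects carrying an `id` field. The `StreamingJson.forEachId` helper is hypothetical, and feeding it from Play's Iteratee body parser is not shown.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonToken}
import java.io.InputStream

object StreamingJson {
  private val factory = new JsonFactory()

  // Calls `handle` with the "id" of each object in a top-level JSON array,
  // keeping only the current record's fields in memory.
  // Returns the number of objects seen (with or without an id).
  def forEachId(in: InputStream)(handle: String => Unit): Int = {
    val parser = factory.createParser(in)
    try {
      require(parser.nextToken() == JsonToken.START_ARRAY, "expected a JSON array")
      var count = 0
      while (parser.nextToken() == JsonToken.START_OBJECT) {
        var id: Option[String] = None
        // Inner loop ends when the object's END_OBJECT token is reached.
        while (parser.nextToken() == JsonToken.FIELD_NAME) {
          val field = parser.getCurrentName
          parser.nextToken() // advance to the field's value
          if (field == "id") id = Some(parser.getText)
          else parser.skipChildren() // skip nested objects/arrays without materializing them
        }
        id.foreach(handle)
        count += 1
      }
      count
    } finally parser.close()
  }
}
```

For example, `StreamingJson.forEachId(request body stream)(id => ...)` touches each record once, so memory use is bounded by the largest single record rather than by the whole request.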
  13. Some Numbers
      • Current load
        – 2.5 billion events/day (> 30K/sec sustained).
        – 2 to 3 TB/day.
        – Expected to grow by 5x to 10x.
      • Current h/w count
        – 2 data paths with 4 enqueue and 4 de-queue API servers in each path.
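These figures can be cross-checked with a little arithmetic. Assuming decimal terabytes, load spread evenly over 24 hours, and traffic split evenly across the enqueue servers (none of which the slide states), the daily averages work out as:

```latex
\begin{aligned}
\text{average rate} &= \frac{2.5\times 10^{9}\ \text{events}}{86\,400\ \text{s}} \approx 2.9\times 10^{4}\ \text{events/s} \\
\text{average event size} &= \frac{(2\text{ to }3)\times 10^{12}\ \text{bytes}}{2.5\times 10^{9}\ \text{events}} = 800\text{ to }1200\ \text{bytes/event} \\
\text{after } 5\times\text{ to }10\times\text{ growth} &\approx (1.4\text{ to }2.9)\times 10^{5}\ \text{events/s} \\
\text{per enqueue server } (2 \times 4 = 8) &\approx \frac{2.9\times 10^{4}}{8} \approx 3.6\times 10^{3}\ \text{events/s}
\end{aligned}
```

The 24-hour mean of roughly 29K events/s sits just under the slide's "> 30K/sec sustained", which suggests that figure describes busier periods rather than the daily average.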
  14. Future …
      • Waiting for the Typesafe platform to stabilize a bit (akka-io, spray, akka-cluster).
      • More reactive than the current implementation (Play Futures, Iteratees).
      • ReactiveMongo (currently we use Casbah).
      • Evaluating Scala for use in the analytics pipeline (Spark framework, Cascading).
  15. Thank You!
      • Questions? Comments?