
Lambda architecture: from zero to One

The story of one project's architecture evolution from zero to Lambda Architecture. Also covers how we scaled the cluster once the architecture was in place.

Contains nice performance charts after every architecture change.

Published in: Engineering

  1. 1. Lambda Architecture: from zero to One Serhiy Masyutin
  2. 2. Me • Staff Engineer @ Lohika • Passionate Developer • Father • Mountain Biker
  3. 3. Agenda • Project Overview • Architecture Evolution • What is Lambda Architecture? • Cluster Evolution • What We Achieved?
  4. 4. Project Overview
  5. 5. Project Goals • Portfolio-driven R&D project • Focus on Technology • Focus on Knowledge • Focus on a new remote Team
  6. 6. A service designed to offload the highly concurrent scenario of live voting
  7. 7. A service designed to offload the highly concurrent scenario of live voting • User casts a vote • User requests results of a campaign • Manager requests reports on campaigns • Admin controls the system
  8. 8. Architecture Goals • SaaS Solution • High Throughput • Scalability • Low Latency
  9. 9. Essential Data Model • campaign { startDate, endDate } • vote { user, campaign, timestamp }
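A minimal sketch of the data model above, written in Python for brevity rather than in the project's own Java. The field types, the string campaign identifier, and the `accepts` rule are assumptions made for illustration; the slide only names the fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Campaign:
    start_date: datetime  # startDate on the slide
    end_date: datetime    # endDate on the slide

    def accepts(self, moment: datetime) -> bool:
        # Hypothetical rule: votes count only inside [start_date, end_date].
        return self.start_date <= moment <= self.end_date


@dataclass(frozen=True)
class Vote:
    user: str            # assumed to be a user identifier
    campaign: str        # assumed to be a campaign identifier
    timestamp: datetime  # when the vote was cast
```

Both records are frozen, which fits a design where votes are only ever appended and never updated.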
  10. 10. Architecture Evolution
  11. 11. Start Simple [Diagram labels: Votes, Reports]
  12. 12. Start Simple • Java 8 • Spring Boot 1.2.5 • MariaDB 5.5 • AngularJS 1.4
  13. 13. Benchmark it! • Simple throughput scenario: user.vote() user.request(results) • Stop tests when the error rate rises above 5% • Benchmark tool runs locally, targeting a cloud server
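The stopping rule above can be sketched as a small check. This is an illustration only: the function names and the idea of feeding it failed and total request counts are assumptions, not part of the benchmark tool.

```python
ERROR_RATE_LIMIT = 0.05  # the 5% threshold from the slide


def error_rate(failed: int, total: int) -> float:
    """Fraction of failed requests; 0.0 when nothing has been sent yet."""
    return failed / total if total else 0.0


def should_stop(failed: int, total: int, limit: float = ERROR_RATE_LIMIT) -> bool:
    """True once the error rate rises strictly above the limit."""
    return error_rate(failed, total) > limit
```

For example, `should_stop(6, 100)` asks whether a run with 6 failures out of 100 requests has crossed the threshold.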
  14. 14. Gatling • An open-source load testing framework based on Scala, Akka and Netty • High performance • Out-of-box HTTP support • Ready-to-present HTML reports • Scenario recorder and developer-friendly DSL http://gatling.io
  15. 15. Gatling
      // Each virtual user repeats: cast a vote, then request the report.
      scenario("Throughput simulation").repeat(repeatCount) {
        feed(voteFeeder())
          .exec(http("Vote")
            .post(voteLink)
            .headers(sentHeaders).header("Authorization", token)
            .body(StringBody("${vote}"))
            .check(status.is(200)).asJSON)
          .exec(http("Report")
            .get(reportByOptionLink + "/${votingSchemaId}")
            .headers(sentHeaders).header("Authorization", token)
            .check(status.is(200)).asJSON)
      }
  16. 16. Gatling
  17. 17. Benchmark! [Chart: requests per second vs. number of concurrent users; series: Initial, Initial no-joins]
  18. 18. Kafka • Publisher-subscriber • Distributed by design • Scalable • Fast • Durable http://kafka.apache.org
  19. 19. Incoming Queue [Diagram labels: Votes, Reports]
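A rough sketch of what an incoming queue buys: the request path only enqueues, and a separate consumer persists votes later. Here `queue.Queue` stands in for a Kafka topic and a `Counter` stands in for the database; the class and method names are hypothetical and no Kafka client API is used.

```python
import queue
from collections import Counter


class VoteIngest:
    """In-memory stand-in for 'accept now, persist later'."""

    def __init__(self) -> None:
        self.incoming = queue.Queue()  # stand-in for a Kafka topic
        self.store = Counter()         # stand-in for the database: votes per campaign

    def accept_vote(self, user: str, campaign: str) -> None:
        # Request path: enqueue and return without touching the store.
        self.incoming.put((user, campaign))

    def drain(self) -> int:
        # Consumer path: persist everything queued so far, return how many.
        processed = 0
        while True:
            try:
                _user, campaign = self.incoming.get_nowait()
            except queue.Empty:
                return processed
            self.store[campaign] += 1
            processed += 1
```

The point of the split is that a slow store delays only `drain`, not `accept_vote`.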
  20. 20. Benchmark! [Chart: requests per second vs. number of concurrent users; series: Initial, Initial no-joins, Incoming Queue]
  21. 21. Redis • In-memory data structure store (set, map, etc) • Easy leader board implementation • HyperLogLog is its native data structure http://redis.io
  22. 22. In-memory Storage [Diagram labels: Votes, Reports]
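To illustrate why a leaderboard is easy with an in-memory store, here is a plain-Python stand-in for a Redis sorted set. The two methods mirror what the Redis commands `ZINCRBY` and `ZREVRANGE ... WITHSCORES` provide, but no Redis client is used; the class name, the per-option members, and the tie-breaking order are choices made for this sketch.

```python
from collections import Counter


class Leaderboard:
    """In-memory stand-in for a Redis sorted set of vote counts."""

    def __init__(self) -> None:
        self._scores = Counter()

    def incr(self, member: str, amount: int = 1) -> int:
        # Comparable to ZINCRBY: add to the member's score, return the new score.
        self._scores[member] += amount
        return self._scores[member]

    def top(self, n: int) -> list:
        # Comparable to ZREVRANGE 0 n-1 WITHSCORES: highest scores first.
        # Ties are broken alphabetically here, which is this sketch's choice.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]
```

Each vote becomes a single `incr` call, and a results request becomes a single `top` call, so reads never scan individual votes.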
  23. 23. Benchmark! [Chart: requests per second vs. number of concurrent users; series: Initial, Initial no-joins, Incoming Queue, In-memory Storage]
  24. 24. Spark • A fast and general engine for large-scale data processing http://spark.apache.org
  25. 25. Scalable Processing [Diagram labels: Votes, Reports]
  26. 26. TODO: Benchmarks • Processing latency • Latency vs Data Volume
  27. 27. TODO: Scalable Storage [Diagram labels: Reports, Votes]
  28. 28. Architecture Goals Met • High Throughput • Scalable Storage • Scalable Processing • Extensible Processing • Low Latency Reads & Updates
  29. 29. Lambda Architecture
  30. 30. A Single Picture http://lambda-architecture.net/img/la-overview_small.png
  31. 31. A Single Picture • QUERY = f_query(batch_view, realtime_view) • batch_view = f_batch(all_data) • realtime_view = f_speed(new_data, realtime_view)
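The three formulas on this slide can be made concrete with vote counting. The sketch below is one possible instance, assuming votes are `(user, campaign)` pairs and that both views are per-campaign counts; the function names come from the slide, everything else is illustrative.

```python
from collections import Counter


def f_batch(all_data):
    """batch_view: recomputed from scratch over the whole dataset."""
    return Counter(campaign for _user, campaign in all_data)


def f_speed(new_data, realtime_view):
    """realtime_view: the previous view updated incrementally with new data."""
    updated = Counter(realtime_view)
    updated.update(campaign for _user, campaign in new_data)
    return updated


def f_query(batch_view, realtime_view):
    """QUERY: merge both views into the answer served to the user."""
    return batch_view + realtime_view
```

In this sketch the realtime view only has to cover votes that arrived after the last batch run, which is why the speed layer can keep a minimal amount of data.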
  32. 32. Batch Layer • Immutable append-only data store • Batch computations produce batch views
  33. 33. Serving Layer • Random reads/queries on batch views • Batch updates from batch layer • No need for random writes
  34. 34. Batch + Serving Layer • Robustness and fault tolerance • Scalability • Generalization • Extensibility • Minimal maintenance • Debuggability
  35. 35. Speed Layer • Low latency reads and updates • Incremental computation (different from batch one) • Scalability • Fault tolerance • Minimal amount of stored data
  36. 36. Goals • Robustness and fault tolerance • Scalability • Generalization • Extensibility • Minimal maintenance • Debuggability • Low latency reads and updates
  37. 37. Lambda Architecture http://lambda-architecture.net/img/la-overview_small.png
  38. 38. Cluster Evolution
  39. 39. Start Simple • Single box
  40. 40. Optimization: Tomcat Connector • Start with a single machine • The number of threads matters, benchmark it • Fine-tuning can be OS-specific
  41. 41. Benchmark! [Chart: requests per second vs. number of concurrent users; series: Initial, Initial no-joins, Incoming Queue, In-memory Storage, ???]
  42. 42. Haproxy • The Reliable, High Performance TCP/HTTP Load Balancer • A single-process program http://haproxy.org
  43. 43. A cluster of 10 servers
  44. 44. Optimization: Load Balancing [Chart: requests per second and scaling factor (gain by adding another server) vs. number of servers (dev, 1, 2, 3, 6)]
  45. 45. When to Stop? • haproxy: CPU 95%, memory 2.5 GB • tomcat: CPU 397%, memory 6 GB • kafka: CPU 1%, memory 1.3 GB • redis: CPU 55%, memory 3.5 GB
  46. 46. What We Achieved?
  47. 47. Experience • Lambda Architecture: we have One • Cluster Scaling & Optimization • Excellent team
  48. 48. Technology • Java 8 • Spring Boot 1.2.5 • Spring Data 1.2.5 • Tomcat 8 • MariaDB 5.5 • Haproxy 1.5.14 • Kafka 0.8 • Redis 2.8 • Spark 1.4 • HDFS 2.6 • Gatling 2.2 • AngularJS 1.4
  49. 49. Things That Matter • Small steps make a huge difference • Choose the right metrics • Benchmark • Optimize!
  50. 50. Q/A
  51. 51. Thank You!