Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency

  1. Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency ■ Jeffery Utter, Staff Developer at theScore
  2. Jeffery Utter Staff Developer, theScore ■ Built half of a distributed database ■ P99.*s matter to 100% of users ■ Wrote my first line of Java at age 30-something ■ I lead a double-life as a double-bassist
  3. Table of Contents ■ What is Aggregator, Leaf, Tailer? ■ Goals and Constraints ■ Why not <insert your favorite (distributed) database>? ■ Datadex ■ Performance Tips ● Java ● RocksDB ■ Overview ● Architecture ● Future ■ Conclusion
  4. Aggregator, Leaf, Tailer
  5. Aggregator, Leaf, Tailer ■ Term came into use in 2019 ■ Largely promoted by Rockset ● Rockset Concepts, Design & Architecture [1] ● Aggregator Leaf Tailer: An Alternative to Lambda Architecture for Real-Time Analytics [2] ● Rockset’s Aggregator-Leaf-Tailer Architecture for SQL on semi structured data [3] ■ Prior Art ● Facebook: Science and the Social Graph (2008) [4] ● Serving Facebook Multifeed: Efficiency, performance gains through redesign (2015) [5] ● FollowFeed: LinkedIn's Feed Made Faster and Smarter (2016) [6]
  6. Aggregator, Leaf, Tailer ■ Aggregator: low-latency aggregation of data stored in one or more leaves ■ Leaf: stores and indexes all data, spread across one or more leaves ■ Tailer: pulls new data from various sources and inserts it into the leaves
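
The three roles above can be sketched in a few lines of Java. This is only an illustration of the division of labour, not the Datadex implementation: the class and method names, the hash-based routing, and the use of an in-memory `TreeMap` in place of a real storage engine are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Leaf: stores and indexes data. A sorted in-memory map stands in for a real store (assumption). */
final class Leaf {
    private final TreeMap<String, Integer> index = new TreeMap<>();

    void put(String key, int value) {
        index.put(key, value);
    }

    /** Returns the entries whose key starts with the given prefix. */
    Map<String, Integer> scanPrefix(String prefix) {
        // Character.MAX_VALUE sorts after any character expected in a key, so it closes the range.
        return index.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }
}

/** Tailer: takes new records from a source and inserts each one into a leaf. */
final class Tailer {
    private final List<Leaf> leaves;

    Tailer(List<Leaf> leaves) {
        this.leaves = leaves;
    }

    void ingest(String key, int value) {
        // Hypothetical routing rule: pick the leaf by key hash.
        int shard = Math.floorMod(key.hashCode(), leaves.size());
        leaves.get(shard).put(key, value);
    }
}

/** Aggregator: fans a query out to every leaf and combines the partial results. */
final class Aggregator {
    private final List<Leaf> leaves;

    Aggregator(List<Leaf> leaves) {
        this.leaves = leaves;
    }

    int sumByPrefix(String prefix) {
        int total = 0;
        for (Leaf leaf : leaves) {
            for (int value : leaf.scanPrefix(prefix).values()) {
                total += value;
            }
        }
        return total;
    }
}
```

A caller would build a list of leaves, hand the same list to a `Tailer` and an `Aggregator`, write through the former and query through the latter. Keeping writes and queries behind separate roles is what lets each one be scaled independently later.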
  7. Goals & Constraints
  8. Goals & Constraints ■ Low latency ■ Low operational complexity ● Ease of maintenance ● Ease of deployment (in all geolocations) ■ Developer ergonomics ■ Scalability
  9. Why Not …?
  10. Traditional RDBMS (Postgres) ■ Duplication of effort to populate databases ■ Operational overhead - database setup, maintenance, scaling
  11. Cloud NoSQL (MongoDB) ■ Implicit shared “schema” ■ Good scalability via hosted offerings ■ Operational overhead for on-prem
  12. Kafka-native (Rockset/kSQL) ■ Not an exact match with our querying needs (kSQL has no secondary indexes) ■ Both seem geared towards analytic workflows ■ Operational overhead for on-prem
  13. Datadex
  14. Aggregator, Leaf, Tailer ■ Aggregator: low-latency aggregation of data stored in one or more leaves ■ Leaf: stores and indexes all data, spread across one or more leaves ■ Tailer: pulls new data from various sources and inserts it into the leaves
  15. Aggregator, Leaf, Tailer ■ Aggregator — gRPC Java ■ Leaf — RocksDB ■ Tailer — Kafka
  16. Low Latency ■ ZGC ■ Careful memory allocations ■ gRPC Streaming ■ Double-edged sword ■ “Bypasses” service mesh
  17. Low Operational Complexity ■ Single codebase ■ Single deployable unit (for now) ■ Instances managed by Kubernetes Operator ■ Deploy / Release / Upgrade cycle similar to other backend applications
  18. Developer Ergonomics ■ Simple configuration through CRD ■ Elixir Client library ■ Simple query “language” ■ “Watch” feature to stream updates to downstream services
  19. Scalability ■ Fast scale-out through snapshot/backup/restore mechanism ■ Future improvements to independently scale Aggregator/Leaf/Tailer
  20. Performance Tips
  21. Java
  22. Minimize Allocations ■ Re-use buffers for key serialization/deserialization ■ Re-use buffer for reading values, up to a certain size (fastGet) ■ ~30% increase in throughput
              Ops/sec      Error
      Before  3,053.990    ± 742.316
      After   3,964.574    ± 240.020
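
A minimal sketch of the buffer re-use idea for key serialization, assuming a fixed-width key made of a table id and a row id. The key layout and the `KeyCodec` name are invented for illustration; the slide does not show the actual Datadex code.

```java
import java.nio.ByteBuffer;

/** Encodes (tableId, rowId) keys into a per-thread scratch buffer instead of allocating per call. */
final class KeyCodec {
    static final int KEY_SIZE = Integer.BYTES + Long.BYTES;

    // One buffer per thread, created once and then re-used on every call.
    private static final ThreadLocal<ByteBuffer> SCRATCH =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(KEY_SIZE));

    /** Returns the calling thread's shared buffer; its contents are valid only until the next encode(). */
    static ByteBuffer encode(int tableId, long rowId) {
        ByteBuffer buf = SCRATCH.get();
        buf.clear();                        // reset position and limit, keep the backing array
        buf.putInt(tableId).putLong(rowId);
        buf.flip();                         // make the written bytes readable
        return buf;
    }

    /** Reads the row id without moving the buffer's position. */
    static long decodeRowId(ByteBuffer key) {
        return key.getLong(key.position() + Integer.BYTES);
    }
}
```

The trade-off is that the returned buffer is only borrowed: a caller that needs the key after the next `encode()` on the same thread has to copy it first.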
  23. RocksDB
  24. Ruthlessly Narrow Search Range [Figure: Before / After]
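
The figure for this slide did not survive, so the following is only a guess at the general idea: bound a scan over a sorted keyspace on both ends instead of walking it and filtering. A `TreeMap` stands in for the store, and the `RangeScan` class and its `visited` counter are hypothetical. In RocksDB's Java API the comparable knobs would likely be the iterator bounds on `ReadOptions` (`setIterateLowerBound` / `setIterateUpperBound`), but whether the talk used those is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

final class RangeScan {
    /** Wide scan: walks the whole keyspace and filters, touching every entry. */
    static List<String> scanWithFilter(NavigableMap<String, String> db, String prefix, int[] visited) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : db.entrySet()) {
            visited[0]++;
            if (e.getKey().startsWith(prefix)) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    /** Narrow scan: starts at the prefix and stops at an explicit upper bound. */
    static List<String> scanBounded(NavigableMap<String, String> db, String prefix, int[] visited) {
        List<String> out = new ArrayList<>();
        // Upper bound: the prefix followed by the largest char, which sorts after expected keys.
        String upper = prefix + Character.MAX_VALUE;
        for (Map.Entry<String, String> e : db.subMap(prefix, true, upper, false).entrySet()) {
            visited[0]++;
            out.add(e.getValue());
        }
        return out;
    }
}
```

Both methods return the same values for a given prefix; the difference is how many entries each one has to touch, which the `visited` counter makes visible.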
  25. Resources
      1. Rockset Concepts, Design & Architecture: https://rockset.com/Rockset_Concepts_Design_Architecture.pdf
      2. Aggregator Leaf Tailer: An Alternative to Lambda Architecture for Real-Time Analytics: https://rockset.com/blog/aggregator-leaf-tailer-an-architecture-for-live-analytics-on-event-streams/
      3. Rockset’s Aggregator-Leaf-Tailer Architecture for SQL on semi structured data: http://www.hpts.ws/papers/2019/RocksetHPTS19.pdf
      4. Facebook: Science and the Social Graph: https://www.infoq.com/presentations/Facebook-Software-Stack/ (about 53 minutes in)
      5. Serving Facebook Multifeed: Efficiency, performance gains through redesign: https://engineering.fb.com/2015/03/10/production-engineering/serving-facebook-multifeed-efficiency-performance-gains-through-redesign/
      6. FollowFeed: LinkedIn's Feed Made Faster and Smarter: https://engineering.linkedin.com/blog/2016/03/followfeed--linkedin-s-feed-made-faster-and-smarter
      ■ Realtime Indexing for Fast Queries on Massive Semi-Structured Data: https://www.p99conf.io/session/realtime-indexing-for-fast-queries-on-massive-semi-structured-data/
  26. Jeffery Utter ■ jeff@jeffutter.com ■ @jeffutter