
Rebooting design in RavenDB

This talk explores how we rebooted our project design. After a decade of production usage, the RavenDB team addressed a number of long-standing concerns and changed some of RavenDB's core architecture.
We'll investigate the driving forces behind it, the reasoning process, and look at how it all turned out.


  1. REBOOTING DESIGN Oren Eini oren@ravendb.net
  2. JOEL SPOLSKY “…the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch.”
  3. RAVENDB 1.0 • Design & development started around 2009 • First production deployment – 2010 • Esent for storage • Lucene for indexing • MVC .NET architecture • Focused on time to market
  4. RAVENDB 2.0 • Released 2013 • Silverlight UI • New storage engine for testing only – Munin • Lots of tiny features, no architectural changes • Focus on features • Perf
  5. RAVENDB 3.0 • Released 2014 • HTML5 Studio • Introduced own storage engine for production use – Voron • Re-architected to work on OWIN • I/O prefetching • Own thread pool • Focus on distributed work • Perf
  6. JAN 2015 • Two sprints dedicated to performance tuning • Get a profiler, whole-team effort • Started to pay attention to assembly output • Major performance gains
  7. SUPPORT • We are a database… • Complex deployments • Harsh environment • 24/7 • High availability • Lots of data, lots of requests • High sensitivity to hiccups
  8. Q3 2015 – WHAT CAN WE DO? • Backward compatibility • Deployed to (literally, in the literal sense) millions of nodes worldwide at various customer sites • Existing technical debt • Can’t run on Linux • Running on a tech stack that is not fully owned by us • Existing complexity
  9. REWRITE! (& BACKWARD COMPAT!) • September 2015 • 10x perf • Simpler to support • Cross platform
  10. EXPECTED OUTCOME?
  11. EFFECT? • We have a decade+ of knowing how our software is used • Hindsight • Where does it hurt? • UX study • Deep dives with customers • Going over support incident reports
  12. ARCHITECTURAL DECISIONS • Multiple OSes • OWN the stack • Build for performance • Build for operations • Support as a key consideration • Key scenarios impact whole-system design
  13. WE ARE A JSON DOCUMENT DB… • JSON parsing • Parsing means: • Text parsing • Allocating managed memory • Reading data from disk
  14. GOING ALL THE WAY DOWN… • JSON documents are stored using a blittable format • No parsing / manipulation required to process • .NET representation: byte* ptr, int len • Storage engine: Voron • Zero copy • Memory mapped • Can search a given value and return the result as: byte* ptr, int len
  15. CONCEPTUALLY
  16. RESULT? • Zero copy throughout the process • No parsing costs • Therefore, reduced CPU • Therefore, no need to cache in memory • Therefore, reduced managed memory • Memory mapped • Therefore, the OS already keeps it in memory • Avoid duplicate data caching • Reduce memory consumption • Lean on the OS for eviction
  17. MEMORY MANAGEMENT • Need to own that; important for perf! • Reduce managed allocations • Use unmanaged memory and take advantage of internal knowledge • Simple solutions can work well in a well-defined context • Arena allocator • Context
  18. DESIGN FOR THE DEBUGGER • Support-burden optimization • Single-threaded execution for long-running tasks • Named threads • Build data structures for analysis in dumps • Singular architecture focus
  19. REPORTING PROBLEMS • Constant monitoring • Act or alert accordingly
  20. REMOVING PROBLEMS • Some things are known to be issues • Eliminate completely if possible • Authentication sample: • Windows Authentication • Who’s the admin? • X509 client certificates • openssl s_client -connect • Fallacies of distributed computing
  21. WHY NOT REWRITE IT IN RUST? • On the table: Rust • Go • C++ • C • Erlang • Elixir • Big bet on .NET Core • Team familiar with .NET • Even with the added complications, the design, tooling and language support are very good • Cross platform now • Core values aligned with CoreCLR in terms of perf / support
  22. HOW WE ACTUALLY DID IT? [Architecture diagram: Client API (C#, Java, Python, Ruby, Go) • Cluster (replication, ETL) • Documents (operations, load) • Storage • Indexes (indexing, query optimizer) • Queries]
  23. PARALLEL VERSION DEVELOPMENT • 3.5 released while working on 4.0 • Identify bottlenecks in the design • As infrastructure completes, parallelize • Slowly transition team members to the new release • Prioritizing demo-ability of the system • 30% of the team dedicated to the UI
  24. WHERE ARE WE NOW? • Initially budgeted for 1 year of development • Team size: 25 • Started Sep 2015 • OMG, we have so many features! • Mid 2016: changed schedule to June 2017 • Released 3.5 while working on 4.0 • Ramp-up time internally • Supporting older versions simultaneously • Actual release – Feb 2018 • Some features cut • Actual completion of all planned features – Aug 2018 (3 years!) • With a lot of extras, of course
  25. IMPACT ON SUPPORT? • Support call time reduced • Typical 3.x tier-2 support call: 1 week • Typical 4.x tier-2 support call: 2 hours
  26. PERFORMANCE • 100,000+ writes / second • 1,000,000 reads / second • < $1,000 machine • In the wild: • 20x • 52x
  27. PLATFORMS • Windows • Linux • ARM (Raspberry Pi) • Mac OS X • Production deployed to ARM in industrial settings
  28. END RESULT? • However: • 14 months overdue • Deadline extended twice • Much larger in scope than expected, even when taking this into account • Features from 3.0 are only in the 4.1 release, expected next month • 20 months over the expected time • Mostly minor, though • Full support by the whole team and company is essential
  29. QUESTIONS?
