
Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote


Presented at Lucene/Solr Revolution 2014



  1. Search Architecture at Evernote: Not Your Typical Big Data Problem. Christian Kohlschütter, Sr. Search Researcher, Augmented Intelligence @ Evernote
  2. We are the workspace.
  3. Write • Collect • Find • Present
  4. Find
  5. Collect
  6. Serving 100+ Million Users Worldwide
     • 559 shards (200k users per shard), Linux/Tomcat/MySQL
     • 3.2 PB of WebDAV-based storage
     • 224 TB of SSD capacity for system, MySQL, and Lucene
     • 3.1 billion notes stored; 3.8 billion notes ever created
     • 115 million notes created or edited last week
     • 26 million API calls to Context last week
     • 1 Lucene index per user
  7. Evernote’s Three Laws of Data Protection
     • Your data is yours
     • Your data is protected
     • Your data is portable
     We are not a “big data” company and do not try to make money from your content.
  8. Technical Debt
     • I/O over Lucene 2.9 indexes became a bottleneck
     • Search code was woven into our “NoteStore” platform
     • Index changes had to be backwards-compatible
     • Complex re-indexing would have required taking down a shard
     • We needed to rethink the entire architecture while keeping the public API
     • “Make search faster” vs. “Make us move faster”
  9. From Lucene 2.9 to 4.x and Beyond
     • Large refactoring of the search code
     • Lucene is no longer a direct dependency of “NoteStore”
     • Design by Contract
     • Can now run multiple Lucene versions concurrently in one VM…
     • …and one specific version/schema per user
     • Migrated all users to Lucene 4.5; average downtime per user < 1 min
  10. Separate the What from the How
  11. Separation of Concerns (architecture diagram): NoteStore talks only to the UserIndex API, behind a UserIndexManager and UserIndexFactory; on the implementation side sit Lucene29UserIndexImpl, Lucene4UserIndexImpl, CachingUserIndex, BenchmarkingUserIndex, ...
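The separation of concerns above can be sketched as a small facade: callers depend only on an interface, while concrete Lucene implementations and decorators (caching, benchmarking) are swapped in behind a factory. All names here are illustrative assumptions, not Evernote's actual API; the in-memory implementation stands in for a real Lucene-backed one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The only type the NoteStore would see (hypothetical sketch).
interface UserIndex {
    void indexNote(String guid, String content);
    List<String> search(String query); // returns matching note GUIDs
    void close();
}

// A trivial stand-in implementation; a real one would wrap a Lucene index.
class InMemoryUserIndex implements UserIndex {
    private final Map<String, String> notes = new HashMap<>();
    public void indexNote(String guid, String content) { notes.put(guid, content); }
    public List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : notes.entrySet()) {
            if (e.getValue().contains(query)) hits.add(e.getKey());
        }
        return hits;
    }
    public void close() {}
}

// A decorator adds behavior (here: a place for caching) without callers noticing.
class CachingUserIndex implements UserIndex {
    private final UserIndex delegate;
    CachingUserIndex(UserIndex delegate) { this.delegate = delegate; }
    public void indexNote(String guid, String content) { delegate.indexNote(guid, content); }
    public List<String> search(String query) { return delegate.search(query); } // cache lookup would go here
    public void close() { delegate.close(); }
}
```

Because every implementation hides behind the same interface, a per-user choice of Lucene version (or a benchmarking wrapper) is just a different object returned by the factory.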
  12. Hide Lucene Behind ClassLoaders
     • One Maven artifact per major Lucene version; build profiles for code reuse between minor updates
     • Code is packaged with its dependencies into one common fat jar, with a prefix for each implementation:
       - lucene29/org/apache/lucene/... and lucene29/com/evernote/search/lucene2/…
       - lucene43/org/apache/lucene/... and lucene43/com/evernote/search/lucene4/…
       - lucene45/org/apache/lucene/… and lucene45/com/evernote/search/lucene4/…
     • A ResourcePrefixClassLoader, called from outside code, strips the prefix and uses the fat jar as the only dependency
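A minimal sketch of the prefix-stripping idea: a ClassLoader resolves an unprefixed class name against the prefixed entry in the fat jar, then defines the class under its original name, so `lucene29/...` and `lucene45/...` copies of the same classes can coexist in one JVM. This is an assumed reconstruction of the technique the slide names, not Evernote's actual code.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: one instance per Lucene version, e.g. prefix "lucene45/".
public class ResourcePrefixClassLoader extends ClassLoader {
    private final String prefix;

    public ResourcePrefixClassLoader(String prefix, ClassLoader parent) {
        super(parent);
        this.prefix = prefix;
    }

    // Maps "org.apache.lucene.X" to "lucene45/org/apache/lucene/X.class".
    static String prefixedPath(String prefix, String className) {
        return prefix + className.replace('.', '/') + ".class";
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String path = prefixedPath(prefix, name);
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) throw new ClassNotFoundException(name);
            byte[] bytes = in.readAllBytes();
            // Define the class under its unprefixed name in this loader.
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

Each Lucene version then lives in its own loader, and "the same" class loaded via two prefixes is two distinct runtime classes that never collide.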
  13. New Index Structure
     • Each user’s index now comes with a properties file describing its internal structure, such as index type and version, so code can handle different behaviors
     • Changes to the index schema? Just increase the index version and handle the rest in code
     • Re-indexing is triggered automatically when necessary
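The per-user descriptor might look like the following sketch: a properties file next to the index records implementation and schema version, and a mismatch against the current target marks the index for rebuilding. The key names, values, and version numbers are assumptions for illustration.

```java
import java.util.Properties;

// Hypothetical descriptor for one user's index directory.
class IndexDescriptor {
    static final String TARGET_IMPL = "lucene45"; // assumed target implementation
    static final int CURRENT_VERSION = 7;         // bump on any schema change

    final String implementation;
    final int version;

    IndexDescriptor(Properties props) {
        // Missing file/keys fall back to the legacy defaults.
        this.implementation = props.getProperty("index.impl", "lucene29");
        this.version = Integer.parseInt(props.getProperty("index.version", "0"));
    }

    // Any implementation or schema-version mismatch triggers re-indexing.
    boolean needsReindex() {
        return version != CURRENT_VERSION || !TARGET_IMPL.equals(implementation);
    }
}
```

This keeps the migration decision local and cheap: it is a file read at index-open time, not a schema probe against the index itself.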
  14. Index Auto-Migration
     • The target default index implementation is set centrally by DevOps
     • Migration is triggered upon UserIndex access
     • The UserIndex facade determines whether a re-index is necessary
     • “Cruise Control” automates off-peak migration (chart: access load vs. number of migration threads)
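The "Cruise Control" idea can be sketched as load-aware throttling: scale the number of background re-indexing threads with current shard headroom, pausing entirely at peak. The thresholds, class name, and scaling rule are assumptions; the slide only shows that migration concurrency tracks off-peak access.

```java
// Hypothetical sketch: decide how many migration threads to run
// given the shard's current utilization in [0.0, 1.0].
class CruiseControl {
    private final int maxThreads;

    CruiseControl(int maxThreads) { this.maxThreads = maxThreads; }

    int migrationThreads(double load) {
        if (load >= 0.8) return 0;                // peak traffic: pause migrations
        double headroom = 1.0 - load;             // spare capacity
        return Math.max(1, (int) Math.floor(headroom * maxThreads));
    }
}
```

Re-evaluating this periodically lets migrations saturate quiet hours while staying invisible during the daily peak.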
  15. Phase 1: Migration to Lucene 4
     • Changes in disk I/O (CPU correlates):
       - Overall: -81%
       - searchRelatedNotes: -87%
       - Keyword-based search: -96%
     • Saves TBs of I/O
  16. Phase 2: Add Compression
     • User index sizes and access patterns are skewed
     • Optimize large accounts
     • Directory-level compression: compress segment files, invisible to the IndexReader
     • Applied only when re-indexing / every 3 months
     • In-memory caching
  17. LuceneTransform
     • By Mitja Lenič
     • We ported it to Lucene 4.5 (now available upstream for 4.9)
     • Improved LRU caching; added LZ4/Snappy compression
     • We will contribute our changes soon
  18. OverlayDirectory
     • On disk: _23.cfe, c$_23.cfs, segments.gen, segments_2
     • Visible to the IndexReader: _23.cfe, _23.cfs, segments.gen, segments_2
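The file-name mapping shown on this slide can be modeled in a few lines: compressed segment files carry a "c$" prefix on disk, while the IndexReader sees only plain names. The real LuceneTransform wraps a Lucene Directory; this standalone sketch (with an assumed prefix constant) models only the name translation.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the OverlayDirectory name mapping.
class OverlayNames {
    static final String COMPRESSED_PREFIX = "c$";

    // Translate the on-disk listing into the logical view the IndexReader sees.
    static Set<String> logicalView(Set<String> onDisk) {
        Set<String> visible = new TreeSet<>();
        for (String name : onDisk) {
            visible.add(name.startsWith(COMPRESSED_PREFIX)
                    ? name.substring(COMPRESSED_PREFIX.length())
                    : name);
        }
        return visible;
    }
}
```

Reads against a logical name are then served by transparently decompressing the prefixed file, so the index format itself never changes.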
  19. Results
     • Compressed the largest 5% of all indexes using LZ4
     • 1.9 TB of index space saved
     • 100 MB LRU cache; hit rate 79% on average (67%–93%)
     • Saved 0.5 PB of disk reads per week
     • The cache is so effective that we may switch to a better but slower compression algorithm and apply it to more users
     • Saves PBs of I/O
  20. Bugs, Bugs, Bugs :-)
     • We’d been warned about some: the “VInt bug”, “background merge hit exception”, JVM segfaults
     • And then this happened, too: SPI / ContextClassLoader issues (LUCENE-4713); deadlocks from over-optimistic locking; unclosed resources and too many open file handles => HousekeepingDirectory; issues with the FieldCache singleton => LUCENE-831, LUCENE-2133, …
     • UserIndex tracks a “broken” state and allows self-healing (rebuild)
  21. Conclusion
     • Design by Contract, Separation of Concerns
     • Per-user search implementation / multiple Lucene versions
     • Migrated 60M users without noticeable downtime
     • The migration allowed index changes, saving TBs of disk I/O
     • Block-level index compression saves PBs of disk I/O
     • This is just the beginning.
  22. Thank you
  23. We’re hiring