SenseiDB


Tech talk at LinkedIn

Transcript of "SenseiDB"

  1. Sensei
     Volodymyr Zhabiuk
  2. Agenda
     1. History and motivation
     2. High level architecture
     3. Data guarantees
     4. Features detailed overview
     5. Quick demo
  3. What is Sensei
     • Search engine and database
     • Built on top of Lucene
     • Full text search, relevance, faceting
     • Distributed, horizontally scalable
  4. History
     • Technology stack for LinkedIn.com's search, analytics and homepage
     • Open sourced in 2009, first 1.0.0 release in February 2012
     • https://github.com/linkedin/sensei
     • http://senseidb.com
     • sensei-search Google group
     • Used by Xiaomi, several other OS deployments
  5. Why yet another Lucene-based search engine?
  6. Why yet another Lucene-based search engine?
     • Indexing elevates query latency
     • Hard to distribute
  7. Why yet another Lucene-based search engine?
     • Indexing elevates query latency
     • Hard to distribute
     • Large memory overhead
     • Comparatively slow
  8. Why yet another Lucene-based search engine?
     • Indexing elevates query latency
     • Hard to distribute
     • Large memory overhead
     • Comparatively slow
     SenseiDB
     • Designed for LinkedIn search use cases and the homepage
  9. Motivation
     • Indexing/query isolation
     • Structured vs. unstructured data (e.g. full-text search support)
     • Faceted search
  10. Motivation
      • Indexing/query isolation
      • Structured vs. unstructured data (e.g. full-text search support)
      • Faceted search
      • Business intelligence
  11. Sensei's features
      • Fast updates
      • Rich query language: BQL
      • Full-text and faceted search
      • Distributed and elastic
      • Indexing and search customization
      • In-memory M/R
  12. What Sensei doesn't do
      • Transactions and OLTP
      • Dynamic shard rebalancing
      • Multi-tenancy and table joins
      • Dynamic schema
  13. Volume
      • 5-100 million documents per node
      • ~300K updates per minute
      • Query latency < 100 ms
  14. Deployments
      • Search engine for SeaS
      • Backend for USCP: 400 nodes
      • >6 deployments in the team $
      • Other companies (2 deployments at Xiaomi)
  15. Sensei's technologies: Sensei, Lucene
  16. Sensei's technologies: Sensei, Zoie, Lucene
  17. Sensei's technologies: Sensei, Bobo, Zoie, Lucene
  18. Sensei's technologies: Sensei, Bobo, Norbert, Zookeeper, Zoie, Lucene
  19. Vocabulary: Node, Shard/Partition, Replica
  20. Vocabulary: Node, Shard/Partition, Replica
  21. High level architecture
  22. Data injection
      • Sensei node
      • Event w/ version
      • Gateway
      • Get events with a version greater than the existing one
      • JDBC, Databus, RabbitMQ, Kafka
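The rule on this slide, "get events with a version greater than the existing one", can be sketched as a small version gate. This is only an illustration: `VersionedEvent` and `VersionGate` are hypothetical names, not Sensei classes, and the sketch assumes versions are plain, monotonically increasing `long` values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event type: a payload tagged with a version. */
final class VersionedEvent {
    final long version;
    final String payload;

    VersionedEvent(long version, String payload) {
        this.version = version;
        this.payload = payload;
    }
}

/** Hypothetical gate: applies only events newer than the last applied version. */
final class VersionGate {
    private long lastApplied;
    private final List<String> applied = new ArrayList<>();

    /** persistedVersion is the version the node already holds, e.g. after a restart. */
    VersionGate(long persistedVersion) {
        this.lastApplied = persistedVersion;
    }

    /** Returns true if the event was applied, false if it was skipped as already seen. */
    boolean offer(VersionedEvent event) {
        if (event.version <= lastApplied) {
            return false; // replayed or duplicate event: ignore it
        }
        applied.add(event.payload);
        lastApplied = event.version;
        return true;
    }

    long lastApplied() {
        return lastApplied;
    }

    List<String> applied() {
        return new ArrayList<>(applied);
    }
}
```

Under these assumptions, replaying a stream from an earlier offset is harmless, because events at or below the persisted version are dropped.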
  23. Data guarantees
      • Availability: replication
      • Eventually consistent across replicas
      • Write durability: data stream
      • Write consistency: data stream
  24. Configuration
      • schema.xml
        o Indexed fields
        o Forward index customization
      • sensei.properties
        o Ports, plugins, zookeeper urls, etc.
  25. Features
  26. Lucene realtime extension
      • Disk Index
  27. Realtime updates
      • Updates are visible right away (< 1 s after insertion)
      • Handles deletes and updates
      • Indexing latency stays stable as index size grows
      • Incremental and balanced segment merges
  28. Hourglass (Time Series)
  29. Offline indexing and archive
      • Efficient M/R index generation on Hadoop over ETL'd data
      • Bootstrap from HDFS
  30. Query Engine: Bobo
      • Query planning/optimization
      • Access to both inverted and forward data structures
      • High performance faceting
      • Dynamic sorting
      • Dynamic relevance support
      • Map/Reduce analytics engine
  31. Bobo (cont.)
      • Custom (forward) index (shown three times)
      • Lucene segment (shown three times)
      • Result
  32. Sensei API: BQL
        SELECT color, category, year, makemodel
        FROM cars
        WHERE NOT MATCH(color, category) AGAINST("*van")
        GROUP BY category TOP 1
        LIMIT 1000
  33. Dynamic relevance
        SELECT * FROM cars
        WHERE price > 2000.00
        USING RELEVANCE MODEL my_model (favoriteColor:"black", favoriteTag:"cool")
        DEFINED AS (String favoriteColor, String favoriteTag)
        BEGIN
          float boost = 1.0;
          if (tags.contains(favoriteTag)) boost += 0.5;
          if (color.equals(favoriteColor)) boost += 1.2;
          return _INNER_SCORE * boost;
        END
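To make the scoring arithmetic of the relevance model concrete, here is the same boost logic as a plain Java method. This is a sketch, not Sensei code: `RelevanceSketch` and `score` are hypothetical names, `innerScore` stands in for `_INNER_SCORE`, and the color comparison is assumed to use the `favoriteColor` parameter.

```java
import java.util.Set;

final class RelevanceSketch {
    /**
     * Multiplies the inner score by a boost that starts at 1.0,
     * gains 0.5 for a matching tag and 1.2 for a matching color.
     */
    static float score(float innerScore, Set<String> tags, String color,
                       String favoriteColor, String favoriteTag) {
        float boost = 1.0f;
        if (tags.contains(favoriteTag)) {
            boost += 0.5f;
        }
        if (color.equals(favoriteColor)) {
            boost += 1.2f;
        }
        return innerScore * boost;
    }
}
```

A document that matches both the tag and the color has its inner score multiplied by 2.7. A document that matches neither keeps its inner score unchanged.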
  34. Partial updates
      • Storing data outside of Lucene
      • High update rate
      • Perfect for counters
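One way to picture counters kept outside the index is a side array addressed by document id, so that an increment never touches the indexed document. The slide does not say how Sensei stores this data. `CounterStore` is a hypothetical name, and the array-per-docid layout is an assumption made only for illustration.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical side store: one counter per document id, held outside the index. */
final class CounterStore {
    private final AtomicLongArray counters;

    /** maxDoc is the number of document ids the store can address. */
    CounterStore(int maxDoc) {
        this.counters = new AtomicLongArray(maxDoc);
    }

    /** Adds delta to the document's counter in place and returns the new value. */
    long increment(int docId, long delta) {
        return counters.addAndGet(docId, delta);
    }

    long get(int docId) {
        return counters.get(docId);
    }
}
```

Because each update is a single atomic add on one array slot, this layout tolerates a high update rate without rewriting any indexed field.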
  35. Sensei in-memory M/R
      • Node1, Broker, Node2
  36. Sensei in-memory M/R
      • map(IntArray docs, FieldAccessor, FacetCountAccessor)
      • Node1, Broker, Node2, Lucene segments
  37. Sensei in-memory M/R
      • map(IntArray docs, FieldAccessor, FacetCountAccessor)
      • Node1, Broker, Node2, Lucene segments
  38. Sensei in-memory M/R
      • List<MapResult> combine(List<MapResult>)
      • Node1, Broker, Node2, Lucene segments
  39. Sensei in-memory M/R
      • List<MapResult> combine(List<MapResult>)
      • Node1, Broker, Node2, Lucene segments
  40. Sensei in-memory M/R
      • JSONObject reduce(List<MapResult>)
      • Node1, Broker, Node2, Lucene segments
  41. Sensei in-memory M/R
        select distinctCount(memberId), sum(clickCount)
        where geo = 'US/CA/SF'
        group by seniority, age
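The map, combine and reduce steps on the preceding slides can be sketched for a query like `sum(clickCount) ... group by seniority`. This is a simplified illustration, not Sensei's API: `Doc` is a hypothetical stand-in for the field accessors, plain maps replace `MapResult` and `JSONObject`, and the placement of each phase (map per segment, combine per node, reduce on the broker) is an assumption read from the diagrams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InMemoryMapReduceSketch {
    /** Hypothetical document: a group key and a numeric column. */
    static final class Doc {
        final String seniority;
        final long clickCount;

        Doc(String seniority, long clickCount) {
            this.seniority = seniority;
            this.clickCount = clickCount;
        }
    }

    /** map: runs over one segment and produces partial sums per group. */
    static Map<String, Long> map(List<Doc> segment) {
        Map<String, Long> partial = new HashMap<>();
        for (Doc d : segment) {
            partial.merge(d.seniority, d.clickCount, Long::sum);
        }
        return partial;
    }

    /** combine: merges the partial results of one node's segments into a single result. */
    static List<Map<String, Long>> combine(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        List<Map<String, Long>> out = new ArrayList<>();
        out.add(merged);
        return out;
    }

    /** reduce: merges the results from all nodes into the final answer. */
    static Map<String, Long> reduce(List<Map<String, Long>> nodeResults) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> p : nodeResults) {
            p.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }
}
```

Summation merges cleanly because partial sums can be added in any order. An exact `distinctCount` would need each phase to pass along sets of values (or a sketch structure) rather than a single number.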
  42. Roadmap
      • Just finished
        o Sensei aggregation functions
        o Map/Reduce analytics engine
      • Plan
        o Goshawk: for business intelligence (WVMP v2, LI Impressions)
        o Zoie redesign to support fixed-length in-memory segments
  43. Sensei tweets demo
  44. Questions?
      • SeaS homepage: http://go/seas
      • Questions: ask_seas@
      • Sensei homepage: senseidb.com
      • Sensei Google group: sensei-search