SenseiDB
 


Tech talk at LinkedIn

    SenseiDB Presentation Transcript

    • Sensei Volodymyr Zhabiuk
    • Agenda
      1. History and motivation
      2. High level architecture
      3. Data guarantees
      4. Features detailed overview
      5. Quick demo
    • What is Sensei
      • Search engine and database
      • Built on top of Lucene
      • Full text search, relevance, faceting
      • Distributed, horizontally scalable
    • History
      • Technology stack for LinkedIn.com's search, analytics and homepage
      • Open sourced in 2009, first 1.0.0 release February 2012
      • https://github.com/linkedin/sensei
      • http://senseidb.com
      • sensei-search Google group
      • Used by Xiaomi, several other open-source deployments
    • Why yet another Lucene based search engine?
      • Indexing elevates query latency
      • Hard to distribute
      • Large memory overhead
      • Comparatively slow
      • SenseiDB: designed for LinkedIn search use cases and the Homepage
    • Motivation
      • Indexing/Query isolation
      • Structured vs. unstructured data (e.g. fulltext search support)
      • Faceted search
      • Business intelligence
    • Sensei's features
      • Fast updates
      • Rich query language: BQL
      • Fulltext and faceted search
      • Distributed and elastic
      • Indexing and search customization
      • In memory M/R
    • What Sensei doesn't do
      • Transactions and OLTP
      • Dynamic shard rebalancing
      • Multi tenancy and table joins
      • Dynamic schema
    • Volume
      • 5-100 million documents per node
      • ~300K updates per minute
      • Query latency < 100 ms
    • Deployments
      • Search engine for SeaS
      • Backend for USCP: 400 nodes
      • >6 deployments in the team
      • Other companies (2 deployments at Xiaomi)
    • Sensei's technologies (stack diagram): Sensei, Bobo, Norbert, Zookeeper, Zoie, Lucene
    • Vocabulary: Node, Shard/Partition, Replica
    • High level architecture
    • Data injection
      • Gateway (JDBC, Databus, RabbitMQ, Kafka) feeds the Sensei node events, each carrying a version
      • The node gets only events with a version bigger than the one it already has
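The version check on this slide can be pictured with a small sketch. It is a toy model, not Sensei's gateway code: `Event`, `NodeIndex` and `consume` are hypothetical names, and the stream is assumed to arrive ordered by version.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    version: int   # position of the event in the data stream
    doc_id: str
    payload: dict


@dataclass
class NodeIndex:
    """Hypothetical stand-in for a node's index plus the last version it indexed."""
    version: int = 0
    docs: Dict[str, dict] = field(default_factory=dict)

    def consume(self, stream: Iterable[Event]) -> int:
        """Apply only events newer than the version already indexed."""
        applied = 0
        for event in stream:
            if event.version <= self.version:
                continue  # already indexed, e.g. replayed after a restart
            self.docs[event.doc_id] = event.payload
            self.version = event.version
            applied += 1
        return applied


stream = [
    Event(1, "car-1", {"color": "red"}),
    Event(2, "car-2", {"color": "blue"}),
    Event(3, "car-1", {"color": "black"}),
]

node = NodeIndex()
first_pass = node.consume(stream)    # all three events are new
second_pass = node.consume(stream)   # replay: every event is skipped
```

Because the node remembers the last version it applied, replaying the stream after a restart is harmless, which is one way to read the "write durability - data stream" guarantee.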
    • Data guarantees
      • Availability: replication
      • Eventually consistent across replicas
      • Write durability: data stream
      • Write consistency: data stream
    • Configuration
      • schema.xml
        • Indexed fields
        • Forward index customization
      • sensei.properties
        • Ports, plugins, zookeeper urls, etc.
    • Features
    • Lucene realtime extension (diagram: disk index)
    • Realtime updates
      • Updates are visible within 1 s of insertion
      • Handles deletes and updates
      • Indexing latency stays stable as index size grows
      • Incremental and balanced segment merges
    • Hourglass (Time Series)
    • Offline indexing and archive
      • Efficient M/R index generation on Hadoop over ETL'd data
      • Bootstrap from HDFS
    • Query Engine - Bobo
      • Query planning/optimization
      • Access to both inverted and forward data structures
      • High performance faceting
      • Dynamic sorting
      • Dynamic relevance support
      • Map/Reduce analytics engine
    • Bobo (cont.) (diagram: each Lucene segment is paired with a custom (forward) index; the segments feed a single Result)
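A rough sketch of why faceting benefits from both structures: the inverted index finds the matching documents, and the forward index looks up each hit's field value so it can be counted. This is a toy illustration, not Bobo's implementation; the helper names and sample documents are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Toy documents; the doc id is the position in the list.
docs = [
    {"color": "red", "text": "cheap family sedan"},
    {"color": "black", "text": "cargo van"},
    {"color": "red", "text": "family van"},
]


def build_inverted(documents: List[dict], field_name: str) -> Dict[str, Set[int]]:
    """term -> set of doc ids containing it (the Lucene-style structure)."""
    index: Dict[str, Set[int]] = {}
    for doc_id, doc in enumerate(documents):
        for term in doc[field_name].split():
            index.setdefault(term, set()).add(doc_id)
    return index


def build_forward(documents: List[dict], field_name: str) -> List[str]:
    """doc id -> field value (the forward structure used for faceting)."""
    return [doc[field_name] for doc in documents]


def facet_counts(hits: Iterable[int], forward: List[str]) -> Counter:
    """Count facet values for the hit set with one array lookup per hit."""
    return Counter(forward[doc_id] for doc_id in hits)


inverted_text = build_inverted(docs, "text")
forward_color = build_forward(docs, "color")

family_hits = inverted_text.get("family", set())
family_colors = facet_counts(family_hits, forward_color)
van_colors = facet_counts(inverted_text.get("van", set()), forward_color)
```

The forward lookup is a plain array access per hit, which is why facet counting stays cheap even when the hit set is large.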
    • Sensei API - BQL
        SELECT color, category, year, makemodel
        FROM cars
        WHERE NOT MATCH(color, category) AGAINST("*van")
        GROUP BY category TOP 1
        LIMIT 1000
    • Dynamic relevance
        SELECT * FROM cars
        WHERE price > 2000.00
        USING RELEVANCE MODEL my_model (favoriteColor:"black", favoriteTag:"cool")
        DEFINED AS (String favoriteColor, String favoriteTag)
        BEGIN
          float boost = 1.0;
          if (tags.contains(favoriteTag)) boost += 0.5;
          if (color.equals(favoriteColor)) boost += 1.2;
          return _INNER_SCORE * boost;
        END
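To see what such a model does to a ranking, here is the same boost logic replayed in Python over made-up documents. It assumes the color check compares against the `favoriteColor` parameter; the function name, the `inner` field (standing in for `_INNER_SCORE`) and the sample cars are hypothetical.

```python
def relevance_score(inner_score: float, doc: dict,
                    favorite_color: str, favorite_tag: str) -> float:
    """Mirror of the model body: start at 1.0, add a boost per match."""
    boost = 1.0
    if favorite_tag in doc["tags"]:
        boost += 0.5
    if doc["color"] == favorite_color:
        boost += 1.2
    return inner_score * boost


# "inner" plays the role of the base text score computed by the engine.
cars = [
    {"id": "a", "color": "black", "tags": ["cool", "fast"], "inner": 1.0},
    {"id": "b", "color": "red", "tags": ["cool"], "inner": 1.0},
    {"id": "c", "color": "red", "tags": [], "inner": 2.0},
]

ranked = sorted(
    cars,
    key=lambda car: relevance_score(car["inner"], car, "black", "cool"),
    reverse=True,
)
ranking = [car["id"] for car in ranked]
```

Car `a` matches both preferences and overtakes car `c`, whose base score is twice as high, while `b` gets only the tag boost and stays last.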
    • Partial updates
      • Storing data outside of Lucene
      • High update rate
      • Perfect for counters
    • Sensei in memory M/R (diagram: Broker, Node1 and Node2, Lucene segments on each node)
      • map(IntArray docs, FieldAccessor, FacetCountAccessor)
      • List<MapResult> combine(List<MapResult>)
      • JSONObject reduce(List<MapResult>)
    • Sensei in memory M/R
        select distinctCount(memberId), sum(clickCount) where geo = 'US/CA/SF' group by seniority, age
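The query above can be traced through the three phases with a toy model. The placement is an assumption read off the slide diagrams: map runs per Lucene segment, combine per node, reduce on the broker. The function names, the `MapResult` shape and the sample data are hypothetical, and `combine` returns one merged result rather than a list for brevity.

```python
from typing import Dict, List, Set, Tuple

Group = Tuple[str, int]                          # (seniority, age)
MapResult = Dict[Group, Tuple[Set[int], int]]    # group -> (member ids, click total)


def map_segment(segment: List[dict]) -> MapResult:
    """Per-segment phase: filter on geo and aggregate per group."""
    result: MapResult = {}
    for doc in segment:
        if doc["geo"] != "US/CA/SF":
            continue
        key = (doc["seniority"], doc["age"])
        members, clicks = result.get(key, (set(), 0))
        result[key] = (members | {doc["memberId"]}, clicks + doc["clickCount"])
    return result


def combine(results: List[MapResult]) -> MapResult:
    """Per-node phase: merge partial results before they cross the network."""
    merged: MapResult = {}
    for result in results:
        for key, (members, clicks) in result.items():
            seen, total = merged.get(key, (set(), 0))
            merged[key] = (seen | members, total + clicks)
    return merged


def reduce_results(results: List[MapResult]) -> dict:
    """Broker phase: merge node results and emit a JSON-like answer."""
    merged = combine(results)
    return {
        f"{seniority}/{age}": {"distinctCount": len(members), "sum": clicks}
        for (seniority, age), (members, clicks) in merged.items()
    }


def doc(member_id: int, clicks: int, geo: str, seniority: str, age: int) -> dict:
    return {"memberId": member_id, "clickCount": clicks, "geo": geo,
            "seniority": seniority, "age": age}


segment_a = [doc(1, 3, "US/CA/SF", "senior", 30), doc(2, 1, "US/NY/NYC", "senior", 30)]
segment_b = [doc(1, 2, "US/CA/SF", "senior", 30), doc(3, 4, "US/CA/SF", "junior", 25)]
segment_c = [doc(4, 5, "US/CA/SF", "senior", 30), doc(1, 1, "US/CA/SF", "senior", 30)]

node1 = combine([map_segment(segment_a), map_segment(segment_b)])
node2 = combine([map_segment(segment_c)])
answer = reduce_results([node1, node2])
```

Carrying the set of member ids through every phase keeps `distinctCount` exact when the same member appears on several nodes; adding up per-node counts would count member 1 twice.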
    • Roadmap
      • Just finished
        o Sensei aggregation functions
        o Map/Reduce analytics engine
      • Plan
        o Goshawk: for business intelligence (WVMP v2, LI Impressions)
        o Zoie redesign to support fixed length in memory segments
    • Sensei tweets demo
    • Questions?
      • SeaS Homepage: http://go/seas
      • Questions: ask_seas@
      • Sensei homepage: senseidb.com
      • Sensei Google group: sensei-search