HPTS talk on micro sharding with Katta
A talk given by Ted Dunning in the HPTS workshop in Asilomar in 2009. It describes the ideas behind micro-sharding and outlines how Katta can manage micro-shards.

Some builds and spacing are off because this was exported as PowerPoint from Keynote.

Transcript

  • 1. Randomized Clustering
  • 2. Text Retrieval Task • Text viewed as a sequence of terms in fields • Document and position for each term are indexed • Query is a sequence of terms (typically many more than the user actually types)
  • 3. Text Retrieval • Scores computed by merging occurrences of terms in query • Only top scoring documents are kept • Deletion and document edits done by adding new documents and keeping deletion list • (this is all standard ... Lucene is the best known example)
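The scoring step on slide 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy inverted index that maps each term to `{doc_id: term_count}` and a plain count-sum score; the function name `search` and the scoring rule are assumptions for illustration and are not Lucene's scoring.

```python
from collections import defaultdict

def search(index, deleted, query_terms, k=2):
    """Merge postings for the query terms and return the k best (doc_id, score) pairs."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, count in index.get(term, {}).items():
            if doc_id not in deleted:  # deletions are a filter, not an index rewrite
                scores[doc_id] += count
    # Keep only the top-scoring documents; ties are broken by doc_id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

# Hypothetical toy data: d3 was edited, so its old copy sits on the deletion list.
index = {"shard": {"d1": 2, "d2": 1, "d3": 4}, "katta": {"d1": 1, "d3": 1}}
deleted = {"d3"}
top = search(index, deleted, ["shard", "katta"])
```

Because deleted documents are only filtered at query time, the index files themselves never change after they are built.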
  • 4. Traditional Scaling [Diagram: shards 1 to 5 repeated in three replica rows; axes labeled Replication and Sharding; document ranges 1..n/4, n/4+1..n/2, n/2+1..3n/4, 3n/4+1..n]
  • 5. Consistent Hashing [Diagram: only the labels 0 and 1 survive in the transcript]
  • 6. Problems • Presumes objects can be moved individually • Has very high insertion/deletion rate • Has disordered access patterns • Often exhibits content/placement correlations
  • 7. Micro Sharding [Diagram: stages labeled map, reduce, hdfs; Retrieval Indexer #1, #2, ... #n; Content Indexer #1, #2, ... #m] • Pseudocode: for (t in types) yield [key:(t, h(key)%shardCnt), value:doc] • n, m >> number of search nodes
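The pseudocode on slide 7 can be read as a map function that emits one record per index type, keyed by type and shard number. Below is a hedged Python sketch of that reading; `shard_key`, `map_doc`, the MD5-based hash, and the type names are assumptions made for illustration and do not come from the talk.

```python
import hashlib

def shard_key(key: str, shard_count: int) -> int:
    """Stable hash of the document key, reduced to a shard number."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def map_doc(key, doc, types, shard_count):
    """Yield ((type, shard), doc) once per index type, as in the slide's pseudocode."""
    shard = shard_key(key, shard_count)
    for t in types:
        yield (t, shard), doc

# Hypothetical document and index types; 64 stands in for a shard count
# chosen to be much larger than the number of search nodes.
records = list(map_doc("doc-42", {"title": "x"}, ["retrieval", "content"], 64))
```

Each reducer then receives every document for one (type, shard) pair and can build that micro-shard's index independently.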
  • 8. Search Architecture [Diagram: presentation layer; federators; Retrieval Engine #1, #2, ... #n; Content Engine #1, ... #m]
  • 9. Control Architecture [Diagram: Retrieval Engine #2, federator, zookeeper, indexer, katta master, HDFS]
  • 10. Scenario: Node Start ● Node starts, tells ZK it exists and has no shards ● Master notified by ZK, looks at shard placement ● Imbalance exists so Master assigns shards to new node ● Node notified by ZK, downloads shard, tells ZK ● Master notified by ZK, looks at shard placement, unassigns shard somewhere
  • 11. Scenario: Node Crash ● ZK detects node connection loss and session expiration ● Master is notified by ZK that node ephemeral file has vanished, looks at shard placements ● If under-replication exists, Master assigns shards to other nodes ● Nodes are notified by ZK, download shards, tell ZK ● Master is notified by ZK, no action needed
  • 12. Summary of Master ● Master is notified of node set or shard set change ● Master examines current state of cluster ● If shards are under-replicated, add assignments ● If shards are over-replicated, delete assignments ● If cluster is imbalanced, add assignments ● Rinse, repeat
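The rules on slide 12 amount to a reconciliation pass over the current shard placement. The sketch below is a simplified illustration, not Katta's code: the function `plan`, the placement dictionary, and the least-loaded and most-loaded heuristics are assumptions, and the separate imbalance rule is left out for brevity.

```python
def plan(placement, nodes, replication):
    """Return (adds, removes) as lists of (shard, node) pairs.

    placement maps shard -> set of nodes believed to serve it;
    nodes is the list of nodes that are currently alive.
    """
    load = {n: 0 for n in nodes}
    for holders in placement.values():
        for n in holders:
            if n in load:
                load[n] += 1

    adds, removes = [], []
    for shard in sorted(placement):
        live = sorted(n for n in placement[shard] if n in load)
        # Under-replicated: assign to the least-loaded nodes that lack the shard.
        while len(live) < replication:
            candidates = [n for n in sorted(nodes) if n not in live]
            if not candidates:
                break
            target = min(candidates, key=lambda n: load[n])
            adds.append((shard, target))
            live.append(target)
            load[target] += 1
        # Over-replicated: unassign from the most-loaded holders.
        while len(live) > replication:
            victim = max(live, key=lambda n: load[n])
            removes.append((shard, victim))
            live.remove(victim)
            load[victim] -= 1
    return adds, removes

# Hypothetical cluster: node "dead" has crashed and s3 has one copy too many.
placement = {"s1": {"a", "b"}, "s2": {"a", "dead"}, "s3": {"a", "b", "c"}}
adds, removes = plan(placement, ["a", "b", "c"], replication=2)
```

The pass is stateless: it looks only at the current placement, so running it again after every notification is safe.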
  • 13. Quick Results • No deletion/insertion in indexes at runtime • Reloading micro-shards allows large sequential transfers • Multiple shards allow very simple threading of search • Random placement guided by balancing policy gives near optimal motion • Node addition and failure are simple, reliable • Random sharding also near optimal [Slide annotations: local = global statistics, 2x query time improvement; load balancing; uniform management]
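The point about threading on slide 13 can be illustrated with a short sketch: since each micro-shard is an independent index, a node can search its shards in parallel and merge the partial results. The shard representation (a dict of `doc_id -> text`), the term-count score, and the function names are assumptions for illustration only.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, query):
    """Score one shard; returns (score, doc_id) pairs for documents containing the term."""
    hits = []
    for doc_id, text in shard.items():
        count = text.split().count(query)
        if count:
            hits.append((count, doc_id))
    return hits

def search_all(shards, query, k=3):
    """Search every shard on its own thread and keep the k best hits overall."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: search_shard(s, query), shards)
        hits = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, hits)

# Hypothetical shards held by one node.
shards = [{"d1": "katta katta shard"}, {"d2": "shard"}, {"d3": "katta zookeeper"}]
best = search_all(shards, "katta", k=2)
```

No locking is needed because the shards share no state and are read-only at query time.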
  • 14. Building Blocks • EC2 - elastic compute • Zookeeper - reliable coordination • Katta - shard and query management • Hadoop - map-reduce, RPC for Katta • Lucene - candidate set retrieval, index file storage • Deepdyve search algorithms - segment scoring
  • 15. Building Blocks • EC2 - elastic compute • Zookeeper - reliable coordination • Katta - shard and query management • Hadoop - map-reduce, RPC for Katta • Lucene - candidate set retrieval, index file storage • Deepdyve search algorithms - segment scoring
  • 16. Zookeeper • Replicated key-value in-memory store • Minimal semantics: create, read, replace specified version; sequential and ephemeral files; notifications • Very strict correctness guarantees: strict ordering; quorum writes; no blocking operations; no race conditions • High speed: 50,000 updates per second, 200,000 reads per second
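The "replace specified version" operation on slide 16 is a compare-and-set on a version number. The toy in-memory class below illustrates the idea only; `VersionedStore` and its methods are hypothetical and are not the ZooKeeper client API, and the sketch ignores replication, ephemeral files, and notifications.

```python
class VersionedStore:
    """Toy key-value store where every write must name the version it replaces."""

    def __init__(self):
        self._data = {}  # path -> (value, version)

    def create(self, path, value):
        if path in self._data:
            raise KeyError(f"{path} already exists")
        self._data[path] = (value, 0)

    def read(self, path):
        """Return (value, version) so the caller can pass the version back on write."""
        return self._data[path]

    def replace(self, path, value, expected_version):
        _, version = self._data[path]
        if version != expected_version:
            return False  # another writer got there first; re-read and retry
        self._data[path] = (value, version + 1)
        return True

store = VersionedStore()
store.create("/shards/s1", "node-a")
first = store.replace("/shards/s1", "node-b", expected_version=0)
stale = store.replace("/shards/s1", "node-c", expected_version=0)
```

The second write fails because it names a version that has already been replaced, which is how two writers avoid silently overwriting each other without any blocking lock.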
  • 17. Building Blocks • EC2 - elastic compute • Zookeeper - reliable coordination • Katta - shard and query management • Hadoop - map-reduce, RPC for Katta • Lucene - candidate set retrieval, index file storage • Deepdyve search algorithms - segment scoring
  • 18. Katta Interface • Simple Interface: Client - horizontal broadcast for query, vertical broadcast for update; InodeManaged - add/removeShard • Pluggable Application Interface • Pluggable Return Policy: given current return state, return < 0 => done; return 0 => return result, allow updates; return n => wait at most n milliseconds • Comprehensive Results: results, exceptions, arrival times
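The return-policy convention on slide 18 can be illustrated with a small function. This is a sketch under stated assumptions: the name `return_policy`, its parameters, and the fixed deadline are hypothetical, and Katta's actual interface is not reproduced here. Only the meaning of the return value (negative, zero, positive) follows the slide.

```python
def return_policy(answered, total, elapsed_ms, deadline_ms=500):
    """Decide what the client should do given how many shards have answered.

    Follows the slide's convention:
      < 0  the result is complete, stop
      0    return the result now, but allow later updates
      n    wait at most n more milliseconds
    """
    if answered == total:
        return -1                     # every shard has reported
    if elapsed_ms >= deadline_ms:
        return 0                      # out of time: hand back the partial result
    return deadline_ms - elapsed_ms   # keep waiting, but never past the deadline

complete = return_policy(answered=4, total=4, elapsed_ms=10)
partial = return_policy(answered=3, total=4, elapsed_ms=600)
waiting = return_policy(answered=3, total=4, elapsed_ms=200)
```

A policy like this lets the application trade completeness for latency without changing the broadcast code.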
  • 19. Horizontal/Vertical Broadcast [Diagram: shards 1 to 4 repeated in three replica rows; axis labeled Replication; document ranges 1..n/4, n/4+1..n/2, n/2+1..3n/4, 3n/4+1..n]
  • 20. Operations [Diagram: Retrieval Engine #2, federator, zookeeper, indexer, katta master, HDFS]
  • 21. Impact of Cloud Approach • Scale-free programming • Deployed in EC2 (test) or in private farm (production) • No single point of failure • Real-time scale up/down • Extensible to real-time index updates
  • 22. Lessons • Random is good • No correlations in document to shard assignments ⇒ strong bounds on node variations in search time ⇒ local statistics are as good as global statistics • No structure in shard to node assignments ⇒ node failure is not correlated to documents ⇒ load balancing and rebalancing is trivial ⇒ threaded search is trivial
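The claim on slide 22 that uncorrelated assignment bounds the variation between shards can be checked with a small experiment. The sketch below assumes synthetic document keys and a SHA-1-based hash (both chosen for illustration) and measures how far the fullest or emptiest shard strays from the mean size.

```python
import hashlib

def shard_sizes(doc_count, shard_count):
    """Assign synthetic document keys to shards by hash and count each shard."""
    sizes = [0] * shard_count
    for i in range(doc_count):
        digest = hashlib.sha1(f"doc-{i}".encode("utf-8")).digest()
        sizes[int.from_bytes(digest[:8], "big") % shard_count] += 1
    return sizes

sizes = shard_sizes(10_000, 16)
mean = sum(sizes) / len(sizes)
# Largest relative deviation of any shard from the mean shard size.
spread = max(abs(s - mean) for s in sizes) / mean
```

With a hash that behaves like a uniform random choice, the expected deviation shrinks roughly with the square root of the shard size, so larger collections give proportionally more even shards.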
  • 23. More Lessons • Randomized clustering requires good coordination: Zookeeper makes that easy • Good coordination means not having to say you’re sorry: masters coordinate but don’t participate
  • 24. Resources ● My blog – http://tdunning.blogspot.com/ ● The web-site – www.deepdyve.com ● Source code: Katta (SourceForge), Hadoop (Apache), Lucene (Apache)
