HPTS talk on micro-sharding with Katta

This is a talk I gave at the HPTS workshop in Asilomar in 2009. It describes the ideas behind micro-sharding and outlines how Katta can manage micro-shards.

Some builds and spacing are off because this deck was exported to PowerPoint from Keynote.



  1. Randomized Clustering
  2. Text Retrieval Task
     • Text is viewed as a sequence of terms in fields
     • The document and position of each term are indexed
     • A query is a sequence of terms (typically many more than the user actually types)
  3. Text Retrieval
     • Scores are computed by merging occurrences of the query terms
     • Only the top-scoring documents are kept
     • Deletions and document edits are done by adding new documents and keeping a deletion list
     • (this is all standard ... Lucene is the best-known example)
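
The scoring scheme above can be sketched in a few lines. This is a toy inverted index with simple term-frequency scoring, not Lucene's actual implementation; the documents and scoring function are made up for illustration:

```python
from collections import defaultdict
import heapq

# Toy inverted index: term -> list of (doc_id, position) postings.
index = defaultdict(list)

def add_document(doc_id, text):
    for pos, term in enumerate(text.lower().split()):
        index[term].append((doc_id, pos))

def search(query_terms, k=2):
    # Merge occurrences of the query terms and score each document
    # by a simple term-frequency count; keep only the top k.
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, _pos in index.get(term, []):
            scores[doc_id] += 1
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

add_document(1, "micro sharding with katta")
add_document(2, "sharding and replication")
add_document(3, "katta manages micro shards")

print(search(["micro", "sharding"]))
```

A real engine would use a weighted score (tf-idf or similar) and would consult the deletion list before returning results, but the merge-and-keep-top-k shape is the same.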
  4. Traditional Scaling
     [diagram: documents split by key range (1..n/4, n/4+1..n/2, n/2+1..3n/4, 3n/4+1..n) into shards 1-4 plus a shard 5 with an open-ended range, and the whole row of shards replicated across three rows of nodes]
  5. Consistent Hashing
     [diagram: hash ring with node and key hash values placed around the circle]
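
For reference, the scheme the ring diagram depicts can be sketched as follows. This is a minimal consistent-hash ring with virtual nodes; the node names and vnode count are illustrative:

```python
import bisect
import hashlib

def h(key):
    # Map any string to a point on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Place several virtual points per node around the ring so load
        # spreads evenly and removing a node moves only its own keys.
        self.points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.hashes = [p for p, _ in self.points]

    def lookup(self, key):
        # Walk clockwise from the key's hash to the first node point.
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.lookup("doc-42"))
```

The next slide's complaint applies to exactly this picture: it works at the granularity of individual keys, which is the wrong granularity for search indexes.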
  6. Problems
     • Presumes objects can be moved individually
     • Has a very high insertion/deletion rate
     • Has disordered access patterns
     • Often exhibits content/placement correlations
  7. Micro Sharding
     A map-reduce job reads content from HDFS and routes each document to one of many micro-shards:
         for (t in types)
           yield [key: (t, h(key) % shardCnt), value: doc]
     The shards feed Retrieval Indexers #1..#n and Content Indexers #1..#m, where n, m >> the number of search nodes.
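
The mapper pseudocode above translates almost directly to Python. The shard count and type names below are made up; the point is that the (type, hash-mod-shardCnt) key groups documents into many small shards:

```python
import hashlib
from collections import defaultdict

SHARD_CNT = 16  # illustrative; real deployments pick shardCnt >> node count

def h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def map_doc(key, doc, types=("retrieval", "content")):
    # Emit the document once per index type, keyed by micro-shard id.
    for t in types:
        yield (t, h(key) % SHARD_CNT), doc

# Group the mapper output by key, as the reduce phase would.
shards = defaultdict(list)
for i in range(100):
    for shard_key, doc in map_doc(f"doc-{i}", {"id": i}):
        shards[shard_key].append(doc)

print(len(shards), "micro-shards")
```

Each (type, shard) bucket becomes one micro-shard index. Because shards are small and numerous, rebalancing later means copying whole shard files rather than moving individual documents.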
  8. Search Architecture
     [diagram: the presentation layer sends queries to federators, which fan out to Retrieval Engines #1..#n and Content Engines #1..#m]
  9. Control Architecture
     [diagram: the federator and retrieval engines coordinate through Zookeeper with the Katta master; the indexer writes shards to HDFS]
  10. Scenario: Node Start
     • Node starts, tells ZK it exists and has no shards
     • Master notified by ZK, looks at shard placement
     • Imbalance exists, so Master assigns shards to new node
     • Node notified by ZK, downloads shards, tells ZK
     • Master notified by ZK, looks at shard placement, unassigns shards elsewhere
  11. Scenario: Node Crash
     • ZK detects node connection loss and session expiration
     • Master notified by ZK that the node's ephemeral file has vanished, looks at shard placements
     • If under-replication exists, Master assigns shards to other nodes
     • Nodes notified by ZK, download shards, tell ZK
     • Master notified by ZK, no action needed
  12. Summary of Master
     • Master is notified of node set or shard set change
     • Master examines current state of cluster
     • If shards are under-replicated, add assignments
     • If shards are over-replicated, delete assignments
     • If cluster is imbalanced, add assignments
     • Rinse, repeat
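
The master's reconciliation loop on slides 10-12 can be sketched as a pure function of the desired and current state. This is a simplified toy (it handles under- and over-replication but not the imbalance rule, and the real Katta master observes and publishes state through Zookeeper rather than in memory); all names are illustrative:

```python
REPLICATION = 2  # illustrative target replication factor

def rebalance(nodes, placement, shards):
    # placement: shard -> set of nodes currently holding it.
    for shard in shards:
        holders = placement.setdefault(shard, set())
        holders &= set(nodes)                 # drop assignments on dead nodes
        def load(n):
            return sum(n in p for p in placement.values())
        while len(holders) < REPLICATION:     # under-replicated: add to least-loaded
            holders.add(min((n for n in nodes if n not in holders), key=load))
        while len(holders) > REPLICATION:     # over-replicated: drop from most-loaded
            holders.remove(max(holders, key=load))
    return placement

shards = [f"s{i}" for i in range(6)]
placement = rebalance(["n1", "n2", "n3"], {}, shards)
# Simulate a crash of n3: its assignments are re-created elsewhere.
placement = rebalance(["n1", "n2"], placement, shards)
print({s: sorted(p) for s, p in placement.items()})
```

Running the same function after every node-set or shard-set change is exactly the "rinse, repeat" step: the loop is idempotent, so a notification storm just converges to the same placement.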
  13. Quick Results
     • No deletion/insertion in indexes at runtime
     • Reloading micro-shards allows large sequential transfers
     • Multiple shards allow very simple threading of search
     • Random placement guided by a balancing policy gives near-optimal shard motion
     • Node addition and failure handling are simple, reliable
     • Random sharding is also near optimal: local statistics are as good as global statistics (2x query-time improvement), load balancing and management are uniform
  14. Building Blocks
     • EC2 - elastic compute
     • Zookeeper - reliable coordination
     • Katta - shard and query management
     • Hadoop - map-reduce, RPC for Katta
     • Lucene - candidate set retrieval, index file storage
     • Deepdyve search algorithms - segment scoring
  15. Building Blocks (repeated as a section marker)
     • EC2 - elastic compute
     • Zookeeper - reliable coordination
     • Katta - shard and query management
     • Hadoop - map-reduce, RPC for Katta
     • Lucene - candidate set retrieval, index file storage
     • Deepdyve search algorithms - segment scoring
  16. Zookeeper
     • Replicated in-memory key-value store
     • Minimal semantics: create, read, replace at a specified version; sequential and ephemeral files; notifications
     • Very strict correctness guarantees: strict ordering, quorum writes, no blocking operations, no race conditions
     • High speed: 200,000 reads per second, 50,000 updates per second
  17. Building Blocks (repeated as a section marker)
     • EC2 - elastic compute
     • Zookeeper - reliable coordination
     • Katta - shard and query management
     • Hadoop - map-reduce, RPC for Katta
     • Lucene - candidate set retrieval, index file storage
     • Deepdyve search algorithms - segment scoring
  18. Katta Interface
     • Simple interface: Client - horizontal broadcast for query, vertical broadcast for update; INodeManaged - add/removeShard
     • Pluggable application interface
     • Pluggable return policy: given the current state, return < 0 => done; return 0 => return result, allow updates; return n => wait at most n milliseconds
     • Comprehensive results: results, exceptions, arrival times
  19. Horizontal/Vertical Broadcast
     [diagram: shards 1-4 covering key ranges 1..n/4 through 3n/4+1..n, replicated across three rows of nodes; a query broadcasts horizontally across shards, an update broadcasts vertically across the replicas of one shard]
  20. Operations
     [diagram: same control architecture as slide 9 - federator, retrieval engines, Zookeeper, Katta master, HDFS, indexer]
  21. Impact of Cloud Approach
     • Scale-free programming
     • Deployed in EC2 (test) or in a private farm (production)
     • No single point of failure
     • Real-time scale up/down
     • Extensible to real-time index updates
  22. Lessons
     • Random document-to-shard assignment is good: no correlations
       ⇒ strong bounds on node variations in search time
       ⇒ local statistics are as good as global statistics
     • No structure in shard-to-node assignments
       ⇒ node failure is not correlated with documents
       ⇒ load balancing and rebalancing are trivial
       ⇒ threaded search is trivial
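
The load-balancing lesson is easy to check empirically. The sketch below assigns many micro-shards to nodes uniformly at random and reports the per-node load; the shard and node counts are made up, and the seed is fixed only so the run is repeatable:

```python
import random

random.seed(42)

NODES, SHARDS = 10, 1000  # many more shards than nodes, as on slide 7
load = [0] * NODES
for _ in range(SHARDS):
    load[random.randrange(NODES)] += 1

# With ~100 shards per node, relative deviation from the mean is small
# (binomial standard deviation ~ sqrt(100) = 10 shards, i.e. ~10%).
print("min/mean/max load:", min(load), SHARDS // NODES, max(load))
```

This is the quantitative content of "strong bounds on node variations": per-node load concentrates around the mean as the shard-to-node ratio grows, with no placement policy beyond randomness.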
  23. More Lessons
     • Randomized clustering requires good coordination; Zookeeper makes that easy
     • Good coordination means not having to say you're sorry: masters coordinate but don't participate
  24. Resources
     • My blog - http://tdunning.blogspot.com/
     • The web-site - www.deepdyve.com
     • Source code - Katta (sourceforge), Hadoop (Apache), Lucene (Apache)
