Tom White_ Consistent Hashing
http://www.tomewhite.com/2007/11/consistenthashing.html
Problems worthy of attack prove their worth by hitting back. —Piet Hein
Tuesday, 27 November 2007
Consistent Hashing
I've bumped into consistent hashing a couple of times lately. The paper that
introduced the idea (Consistent Hashing and Random Trees: Distributed
Caching Protocols for Relieving Hot Spots on the World Wide Web by David
Karger et al) appeared ten years ago, although recently it seems the idea has
quietly been finding its way into more and more services, from Amazon's
Dynamo to memcached (courtesy of Last.fm). So what is consistent hashing and
why should you care?
The need for consistent hashing arose from limitations experienced while
running collections of caching machines (web caches, for example). If you have a
collection of n cache machines, then a common way of load balancing across
them is to put object o in cache machine number hash(o) mod n. This works well
until you add or remove cache machines (for whatever reason), for then n
changes and every object is hashed to a new location. This can be catastrophic,
since the originating content servers are swamped with requests from the cache
machines. It's as if the cache suddenly disappeared. Which it has, in a sense.
(This is why you should care: consistent hashing is needed to avoid swamping
your servers!)
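To see how bad the mod-n scheme is, here is a quick illustration (the class and numbers below are my own, not from the post): it counts how many of 1000 keys land on a different machine when a tenth cache is added to a cluster of nine.

```java
public class ModNRehash {
    // Count how many keys map to a different machine when the
    // cluster grows from n to n + 1 machines under hash(o) mod n.
    static int moved(int[] keys, int n) {
        int moved = 0;
        for (int key : keys) {
            int before = Math.floorMod(Integer.hashCode(key), n);
            int after = Math.floorMod(Integer.hashCode(key), n + 1);
            if (before != after) moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] keys = new int[1000];
        for (int i = 0; i < keys.length; i++) keys[i] = i;
        // Going from 9 to 10 caches remaps almost 90% of the objects.
        System.out.println(moved(keys, 9) + " of " + keys.length + " keys moved");
    }
}
```

In general, growing from n to n+1 machines remaps an expected n/(n+1) of all objects — nearly everything — which is exactly the cache-wipe effect described above.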
It would be nice if, when a cache machine was added, it took its fair share of
objects from all the other cache machines. Equally, when a cache machine was
removed, it would be nice if its objects were shared between the remaining
machines. This is exactly what consistent hashing does: it consistently maps
objects to the same cache machine, as far as is possible, at least.
The basic idea behind the consistent hashing algorithm is to hash both objects
and caches using the same hash function. The reason to do this is to map the
cache to an interval, which will contain a number of object hashes. If the cache is
removed then its interval is taken over by a cache with an adjacent interval. All
the other caches remain unchanged.
Let's look at this in more detail. The hash function actually maps objects and
caches to a number range. This should be familiar to every Java programmer:
the hashCode method on Object returns an int, which lies in the range -2^31 to
2^31-1. Imagine mapping this range into a circle so the values wrap around. Here's
a picture of the circle with a number of objects (1, 2, 3, 4) and caches (A, B, C)
marked at the points that they hash to (based on a diagram from Web Caching
with Consistent Hashing by David Karger et al):
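The circle can be sketched in Java with a sorted map keyed by hash values: an object belongs to the first cache found moving clockwise from the object's hash. This is a minimal sketch of the idea (the class and method names are illustrative, not the post's actual code, and String.hashCode stands in for a proper hash function):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: caches and objects are hashed onto the
// same integer circle; an object is stored on the first cache at or
// after its hash, wrapping around at the end of the range.
public class ConsistentHashRing {
    private final SortedMap<Integer, String> circle = new TreeMap<>();

    public void addCache(String cache) {
        circle.put(cache.hashCode(), cache);
    }

    public void removeCache(String cache) {
        circle.remove(cache.hashCode());
    }

    public String getCacheFor(String objectKey) {
        if (circle.isEmpty()) return null;
        int hash = objectKey.hashCode();
        // tailMap holds every cache at or after the object's hash;
        // if it is empty, wrap around to the start of the circle.
        SortedMap<Integer, String> tail = circle.tailMap(hash);
        Integer cacheHash = tail.isEmpty() ? circle.firstKey() : tail.firstKey();
        return circle.get(cacheHash);
    }
}
```

Note how removal behaves: deleting a cache's entry means only the objects in its interval are picked up by the next cache clockwise, while every other object's mapping is untouched.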
14 comments:
morrita said...
Good article!
I've made a Japanese translation of your article, which is available at
http://www.hyuki.com/yukiwiki/wiki.cgi?ConsistentHashing .
If you have any trouble with it, please let me know.
Thank you for your work.
1 December 2007 at 06:06
Tom White said...
morrita: Glad you enjoyed the post, and thanks for the translation! Tom
2 December 2007 at 22:26
Marcus said...
Cool! I'm, as we speak, creating a distributed caching and searching system
which uses JGroups for membership. The biggest problem I faced was this
exact thing: what to do on the member-joined/member-left events, and how the
system can know at all times which node to send what command to :)
The caching system strictly follows the Map (and SortedMap) interface,
and a bunch of implementations have been written: LFU, LRU, MRU,
disk-based B+Tree (jdbm), ehcache wrapper, memcached Java client wrapper,
Hibernate support...
I like the Map interface since it is quite clean.
The impl I'm working on now is a cache/persister which uses HDFS as the
persistence layer. We'll see how that turns out. The line between a cache and a
persistence engine is a fine one.
And of course all caches must be searchable: my own indexer/searcher plus
Lucene free-text index/search. Ohh, and it all must be able to work in a
distributed environment... fuck, it is a big task.
25 December 2007 at 09:37
marcusherou said...
Hi. Do you have any clue how to create an algorithm which tracks the
history of joins/leaves of members and delivers the same node for the same key
if it was previously looked up? Perhaps I'm explaining this in bad terms, but
something like a (in-memory or persistent) database in conjunction with a
consistent hash.
Perhaps:
public Address getAddress(key)
{
    if (lookedUpMap.containsKey(key))
    {
        return (Address) lookedUpMap.get(key);
    }
    else
    {
        Address a = get(key);
        lookedUpMap.put(key, a);
        return a;
    }
}
V. But server Y does not know about the actual value of V, right?
Does the concept of consistent hashing only delegate responsibility to servers
that share a common database?
Or does server X somehow transfer all its values to server Y while crashing?
Thanks!
Björn
11 June 2013 at 12:36
Anuj Tripathi said...
@Bjorn: From what I understand, consistent hashing is required to do load
balancing across servers sharing a resource. So to answer your question, I
guess it works only for shared resources. Even if there is an independent
monitor involved to cause a failover to the new server, I don't think the
transition will be seamless, defeating the whole purpose of high availability.
22 September 2013 at 22:58