Consistent Hashing: Principles and Implementation
Outline
• What is a hash?
• Data sharding and routing
• Classic consistent hashing (Karger et al.)
• Jump Consistent Hash
What Is a Hash?
• Wikipedia: A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes.
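To make "arbitrary size to fixed size" concrete, here is a minimal sketch of one well-known non-cryptographic hash, 64-bit FNV-1a. The slides do not name a particular hash function, so the choice of FNV-1a and the function name `fnv1a64` are assumptions made only for illustration.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a: folds a byte string of any length into one 64-bit value.
// (Illustrative choice; the slides do not prescribe a hash function.)
uint64_t fnv1a64(const std::string& data) {
    uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : data) {
        h ^= byte;               // mix in the next input byte
        h *= 1099511628211ULL;   // FNV prime; multiplication wraps modulo 2^64
    }
    return h;
}
```

Whatever the input length, the result always fits in 64 bits, and the same input always produces the same value.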
What Is a Hash: Applications
• Hash tables
• Data integrity checks
• Digital signatures
• Data encryption
• Data sharding
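Data Sharding and Routing
• Sharding & replication
Abstract model of data sharding and routing
Two approaches to data sharding
• Hashed sharding vs. ranged sharding
  • Hashed sharding: data is distributed more evenly, which reduces hot spots
  • Ranged sharding: data with nearby keys is stored together, which makes range queries convenient
• Hashed sharding
  • Hash modulo
  • Virtual buckets
  • Consistent hashing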
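Hash Modulo
• With K machines, the target machine is
  H(key) = hash(key) mod K
• Simple and convenient, but inflexible, because changing K remaps most keys. This happens when
  • a server goes down
  • a new server is added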
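The inflexibility can be quantified with a small sketch. It assumes the identity function as hash(key) so the arithmetic is easy to follow by hand; `ModShard` and `CountMoved` are hypothetical helper names, not part of the slides.

```cpp
#include <cassert>
#include <cstdint>

// Modulo sharding: H(key) = hash(key) mod K. The identity function stands in
// for hash() here (an assumption, to keep the arithmetic easy to check).
uint32_t ModShard(uint64_t key, uint32_t num_servers) {
    assert(num_servers > 0);
    return static_cast<uint32_t>(key % num_servers);
}

// Counts how many of the keys 0..num_keys-1 land on a different server when
// the cluster size changes from old_k to new_k.
uint64_t CountMoved(uint64_t num_keys, uint32_t old_k, uint32_t new_k) {
    uint64_t moved = 0;
    for (uint64_t key = 0; key < num_keys; ++key) {
        if (ModShard(key, old_k) != ModShard(key, new_k)) ++moved;
    }
    return moved;
}
```

For example, `CountMoved(1000, 4, 5)` is 800: growing from 4 to 5 servers relocates 80% of the keys, whereas an ideal scheme would move only the 20% that belong on the new server.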
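Virtual Bucket Sharding
• Fix the number of virtual buckets. Keys hash to buckets and a table maps each bucket to a server, which avoids the inflexibility of plain modulo.
• To rebalance, migrate the data first, then update the virtual-bucket-to-server mapping.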
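A minimal sketch of the two-level lookup, under stated assumptions: the class name `VirtualBucketRouter`, the round-robin initial layout, and the caller supplying an already-hashed key are all illustrative choices rather than anything the slides specify.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Two-level routing: key hash -> virtual bucket (fixed) -> server (editable).
class VirtualBucketRouter {
public:
    VirtualBucketRouter(uint32_t num_buckets, uint32_t num_servers)
        : bucket_to_server_(num_buckets) {
        assert(num_buckets > 0 && num_servers > 0);
        for (uint32_t b = 0; b < num_buckets; ++b) {
            bucket_to_server_[b] = b % num_servers;  // initial round-robin layout
        }
    }

    // The key-to-bucket step never changes because the bucket count is fixed.
    uint32_t BucketOf(uint64_t key_hash) const {
        return static_cast<uint32_t>(key_hash % bucket_to_server_.size());
    }

    uint32_t ServerOf(uint64_t key_hash) const {
        return bucket_to_server_[BucketOf(key_hash)];
    }

    // Call only after the bucket's data has been copied to new_server.
    void Reassign(uint32_t bucket, uint32_t new_server) {
        bucket_to_server_.at(bucket) = new_server;
    }

private:
    std::vector<uint32_t> bucket_to_server_;
};
```

Adding a server then means reassigning some buckets to it; keys in every other bucket keep their server.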
Consistent Hashing
• Proposed by Karger et al. in 1997
  • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
  • Web caching with consistent hashing
• Key concepts
  • Balance
  • Monotonicity
  • Spread
  • Load
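Classic Consistent Hashing (Karger et al.)
• Consistent hashing arranges the entire hash value space into a virtual ring. For example, if hash function H has the value range 0 to 2^32 - 1 (that is, the hash value is a 32-bit unsigned integer), the hash space forms a ring that wraps from 2^32 - 1 back to 0.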
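• Hash each server with H. This fixes each machine's position on the hash ring.
• Hash each data key with the same function H to get a value h, which fixes the key's position on the ring. From that position, walk clockwise around the ring; the first server encountered is the one the key is assigned to.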
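• If a server becomes unavailable, the only keys affected are those between that server and the previous server on the ring; they move to the next server clockwise. No other keys are affected.
• If a server is added, the only keys affected are those between the new server and the previous server on the ring; they move to the new server. No other keys are affected.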
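• With too few server nodes, consistent hashing easily suffers from data skew, because the nodes are unevenly distributed on the ring.
• To address this balance problem, virtual nodes are introduced: several hashes are computed for each server node, and each one is called a virtual node. The lookup algorithm is unchanged, apart from one extra step that maps the virtual node to its physical node.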
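Implementation (Java)
• Use a TreeMap to represent the circular hash space
• Use the ceilingKey method to find the next virtual node for a hash value, wrapping around to firstKey when ceilingKey returns null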
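The slide's Java listing is not part of this transcript. As a rough equivalent, here is a sketch in C++, where `std::map` plays the role of TreeMap and `lower_bound` the role of ceilingKey. The class name `HashRing`, the `node#i` naming of virtual nodes, and the use of 64-bit FNV-1a (rather than the 32-bit space in the slides) are assumptions for illustration. Collisions between virtual-node hashes are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

class HashRing {
public:
    explicit HashRing(int virtual_nodes) : virtual_nodes_(virtual_nodes) {
        assert(virtual_nodes > 0);
    }

    // Places virtual_nodes_ points on the ring for one physical node.
    void AddNode(const std::string& node) {
        for (int i = 0; i < virtual_nodes_; ++i) {
            ring_[Hash(node + "#" + std::to_string(i))] = node;
        }
    }

    void RemoveNode(const std::string& node) {
        for (int i = 0; i < virtual_nodes_; ++i) {
            ring_.erase(Hash(node + "#" + std::to_string(i)));
        }
    }

    // Walks clockwise from the key's position to the first virtual node.
    std::string Locate(const std::string& key) const {
        assert(!ring_.empty());
        auto it = ring_.lower_bound(Hash(key));      // first position >= hash
        if (it == ring_.end()) it = ring_.begin();   // wrap around the ring
        return it->second;                           // virtual -> physical node
    }

private:
    // 64-bit FNV-1a, chosen only to keep the sketch self-contained.
    static uint64_t Hash(const std::string& s) {
        uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return h;
    }

    int virtual_nodes_;
    std::map<uint64_t, std::string> ring_;  // ring position -> physical node
};
```

After `RemoveNode("C")`, only keys that `Locate` previously sent to "C" change owner; every other key keeps its node.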
Jump Consistent Hash
• From a 2014 Google paper
  • A Fast, Minimal Memory, Consistent Hash Algorithm
• Very simple; it fits in about 5 lines of code:
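#include <cstdint>

// Maps a 64-bit key to a bucket in [0, num_buckets).
int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets) {
    int64_t b = -1, j = 0;  // b: last bucket jumped to; j: next jump target
    while (j < num_buckets) {
        b = j;
        // A linear congruential step turns key into the next pseudo-random value.
        key = key * 2862933555777941757ULL + 1;
        // Jump to floor((b + 1) / r), where r = ((key >> 33) + 1) / 2^31 lies in (0, 1].
        j = (b + 1) * (double(1LL << 31) / double((key >> 33) + 1));
    }
    return b;
}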
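A quick way to get a feel for the function is to check what happens to keys when one bucket is added. The sketch below repeats the function so it compiles on its own; `OnlyMovesToNewBucket` is a hypothetical helper, and using the integers 0..num_keys-1 as keys is an assumption for illustration.

```cpp
#include <cstdint>

// Repeated from the listing above so that this sketch compiles on its own.
int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets) {
    int64_t b = -1, j = 0;
    while (j < num_buckets) {
        b = j;
        key = key * 2862933555777941757ULL + 1;
        j = (b + 1) * (double(1LL << 31) / double((key >> 33) + 1));
    }
    return b;
}

// Returns true if growing from n to n + 1 buckets either leaves each of the
// keys 0..num_keys-1 in place or moves it to the new bucket n.
bool OnlyMovesToNewBucket(uint64_t num_keys, int32_t n) {
    for (uint64_t key = 0; key < num_keys; ++key) {
        int32_t before = JumpConsistentHash(key, n);
        int32_t after = JumpConsistentHash(key, n + 1);
        if (after != before && after != n) return false;
    }
    return true;
}
```

The check holds for every n because the sequence of jump targets depends only on the key, not on num_buckets; the bucket count only decides where the sequence is cut off.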
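• Design goals
  • Balance
    • Objects are distributed evenly across the buckets
  • Monotonicity
    • When the number of buckets increases, keys only move from old buckets to the new bucket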
• Let ch(key, num_buckets) denote the hash function when there are num_buckets buckets.
• When num_buckets = 1, ch(k, 1) == 0 for every k.
• When num_buckets = 2, to keep the results uniform, half of the ch(k, 2) results should stay 0 and the other half should change to 1.
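• The general rule: when num_buckets goes from n to n+1, a fraction n/(n+1) of the results should stay the same (ch(k, n+1) == ch(k, n)), while each key jumps to the new bucket n with probability 1/(n+1). Buckets are numbered from 0, so the new bucket is n.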
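The rule above translates directly into a loop that replays every bucket count from 1 up to num_buckets. The following is a minimal sketch under stated assumptions: the key seeds a `std::mt19937_64` generator (any deterministic generator would do), and the name `LinearConsistentHash` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

int32_t LinearConsistentHash(uint64_t key, int32_t num_buckets) {
    assert(num_buckets > 0);
    std::mt19937_64 rng(key);  // seeded by the key, so the draws are reproducible
    int32_t b = 0;             // with one bucket every key is in bucket 0
    for (int32_t j = 1; j < num_buckets; ++j) {
        // Uniform draw in [0, 1) built from the top 53 bits of the output.
        double r = (rng() >> 11) * (1.0 / 9007199254740992.0);
        if (r < 1.0 / (j + 1)) {
            b = j;  // going from j to j + 1 buckets: jump with probability 1/(j+1)
        }
    }
    return b;
}
```

Because the draws for a given key are always the same, calling the function with a larger num_buckets replays the same decisions and then makes a few more, which is what keeps the mapping consistent.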
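• The previous algorithm, which tests every bucket count from 1 to n in turn, runs in O(n) time. How can it be optimized?
• As j grows, the jump step (b = j) is skipped in most iterations. Can the next j at which a jump happens be computed directly from a random number?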
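1. Let b be the result of the previous jump, and let j be the result of the next jump.
2. No jump happens anywhere between b and j. For any i, the probability of not jumping when going from i to i+1 buckets is i/(i+1), so the probability of no jump for every bucket count from b+1 to j-1 is (b+1)/(b+2) * (b+2)/(b+3) * … * (j-1)/j = (b+1)/j.
3. Hence, for any i > b, the probability that j >= i is P(j >= i) = (b+1)/i.
4. Take a random number r uniformly distributed on [0, 1] and declare that j >= i exactly when r < (b+1)/i, which is equivalent to i < (b+1)/r.
5. Since (b+1)/r is the upper bound on i and j is the largest such i, j = floor((b+1)/r).
6. In this way, a single random number r gives the next jump destination j.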
• Finally, we arrive at the following algorithm:
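The slide's own listing is not part of this transcript. The sketch below shows how the derivation, j = floor((b+1)/r), might be coded with an explicit random generator. The use of `std::mt19937_64` seeded by the key, the name `JumpWithRandom`, and the clamping step are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

int32_t JumpWithRandom(uint64_t key, int32_t num_buckets) {
    assert(num_buckets > 0);
    std::mt19937_64 rng(key);  // seeded by the key, so the draws are reproducible
    int64_t b = -1;            // last bucket jumped to
    int64_t j = 0;             // next jump destination
    while (j < num_buckets) {
        b = j;
        // Uniform draw in (0, 1]; excluding 0 avoids dividing by zero.
        double r = ((rng() >> 11) + 1) * (1.0 / 9007199254740992.0);
        double next = std::floor((b + 1) / r);  // always >= b + 1 because r <= 1
        // Clamp before converting so a tiny r cannot overflow int64_t.
        j = next >= num_buckets ? num_buckets : static_cast<int64_t>(next);
    }
    return static_cast<int32_t>(b);
}
```

Each iteration draws one random number and skips straight to the next jump, so the loop visits only the buckets where a jump occurs rather than every bucket count. The five-line version earlier in the deck has the same structure, with a linear congruential step in place of the generator.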
• Balance compared with Karger's method: [figure not included in this transcript]
• Execution efficiency comparison: [figure not included in this transcript]
• Jump Consistent Hash only handles the case where nodes are added. What should be done when nodes are removed?
Thanks

Consistent Hashing Algorithm