Pegasus in Practice for Data Ingestion at Xiaomi
Xiao Fateng (肖发腾), 2021-09-25
What do you think makes a good product?
Agenda
#1 Challenges in Xiaomi's data ingestion
#2 Pegasus or HBase
#3 Pegasus in practice for data ingestion at Xiaomi
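#1 Challenges in Xiaomi's data ingestion
With phone sales ranked second worldwide, how can the massive data they generate be stored in the data warehouse efficiently and compliantly?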
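[Architecture diagram. Recoverable labels: ODS layer (Hive; message queue, Talos); dimension table; DW layer (Hive, Doris); real-time query, offline query, offline write, real-time write. Volume labels: hundreds of billions, billions, hundreds of billions.]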
#2 Pegasus or HBase
My first impressions of Pegasus
What problems did you run into the first time you used HBase?
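Partition design approaches for distributed databases
1. Partitioning by key range
2. Partitioning by hash of the key
3. Composite key: the first part is used for hash partitioning, and the second part can be used for range queries
The composite key makes Pegasus very convenient to use.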
Example of Pegasus partition design
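The composite-key approach can be illustrated with a small in-memory sketch. It is not Pegasus code: the class name `CompositeKeyTable`, the use of CRC32 as the hash function, and the in-memory maps are all assumptions made for illustration. The first part of the key (the hash key) alone decides the partition, and the second part (the sort key) is kept ordered inside that partition so a range query touches only one partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Toy table (hypothetical): hash key picks the partition, sort key is ordered inside it. */
class CompositeKeyTable {
    // One map per partition: hash key -> rows ordered by sort key.
    private final List<Map<String, TreeMap<String, String>>> partitions = new ArrayList<>();

    CompositeKeyTable(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Partition index derived from the hash key only (CRC32 is an assumption). */
    int partitionOf(String hashKey) {
        CRC32 crc = new CRC32();
        crc.update(hashKey.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % partitions.size());
    }

    void put(String hashKey, String sortKey, String value) {
        partitions.get(partitionOf(hashKey))
                .computeIfAbsent(hashKey, k -> new TreeMap<>())
                .put(sortKey, value);
    }

    /** Values whose sort key lies in [fromSortKey, toSortKey) under one hash key. */
    List<String> rangeGet(String hashKey, String fromSortKey, String toSortKey) {
        TreeMap<String, String> rows = partitions.get(partitionOf(hashKey)).get(hashKey);
        if (rows == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.subMap(fromSortKey, toSortKey).values());
    }
}
```

With a device ID as the hash key and a date as the sort key, for instance, all rows of one device land in the same partition and can be scanned by date range without contacting other partitions.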
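Failure recovery
HBase: HBase requires that each Region be served by only one RegionServer at any given time. When a RegionServer goes down, a new RegionServer must be chosen to serve that Region.
Pegasus: highly available
High availability keeps the maintenance cost of Pegasus very low.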
Performance
1. C++
2. Single-node storage engine: RocksDB
3. Backup request
High performance lets Pegasus actually solve business problems.
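The slide names backup request without describing it. As generally understood, the idea is to send a read to a second replica when the first one has not answered within a short delay, and to use whichever reply arrives first, which trims tail latency. The sketch below shows that idea only; `BackupRequest`, `read`, and `backupDelayMillis` are hypothetical names, and nothing here is the Pegasus client's actual implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: issue a backup read if the primary read is slow. */
class BackupRequest {
    // Daemon timer thread so the sketch never keeps the JVM alive.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "backup-request-timer");
                t.setDaemon(true);
                return t;
            });

    static <T> CompletableFuture<T> read(Supplier<CompletableFuture<T>> primary,
                                         Supplier<CompletableFuture<T>> backup,
                                         long backupDelayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // A successful primary reply wins immediately; a primary failure
        // is ignored here so the backup still gets its chance.
        primary.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            }
        });

        // After the delay, send the backup read only if no reply has arrived yet.
        TIMER.schedule(() -> {
            if (!result.isDone()) {
                backup.get().whenComplete((value, error) -> {
                    if (error == null) {
                        result.complete(value);
                    } else {
                        result.completeExceptionally(error);
                    }
                });
            }
        }, backupDelayMillis, TimeUnit.MILLISECONDS);

        return result;
    }
}
```

The delay trades extra load for latency: a shorter delay sends more duplicate reads, a longer one helps only the slowest requests.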
#3 Pegasus in practice for data ingestion at Xiaomi
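[Architecture diagram, shown again. Recoverable labels: ODS layer (Hive; message queue, Talos); dimension table; DW layer (Hive, Doris); real-time query, offline query, offline write, real-time write. Volume labels: hundreds of billions, billions, hundreds of billions.]
What is the key problem that queries need to solve?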
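Offline query optimization
[Diagram. Recoverable labels: device info table; dimension table; mapped device info table; raw log table; mapped log table; real-time query, offline query, offline write, offline JOIN. Volume labels: hundreds of millions, billions, hundreds of millions, hundreds of billions, hundreds of billions.]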
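// Create the client from the Pegasus cluster configuration file
client = PegasusClientFactory.createClient("<Pegasus cluster configuration file>")
// Look up every key in requestList; the results are written to resultList
client.batchGet("table_name", requestList, resultList)
The batchGet function sends asynchronous requests to the server concurrently and waits for the results.
requestList holds several device IDs; any one of them that resolves to a OneID is sufficient.
If any single request fails, the call terminates early and throws an exception.
If an exception is thrown, the results in values (here, resultList) are null.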
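The batchGet behaviour described above (issue all requests concurrently, wait, and fail as soon as any request fails) can be sketched with plain `CompletableFuture`. This is an illustration of the pattern, not the Pegasus client's code: `BatchGetSketch` and the `asyncGet` function parameter are hypothetical stand-ins for a real asynchronous lookup. On failure this sketch leaves the output list empty, and callers should not read it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch of a concurrent, fail-fast batch lookup. */
class BatchGetSketch {
    static <K, V> void batchGet(List<K> keys, List<V> values,
                                Function<K, CompletableFuture<V>> asyncGet) {
        values.clear();

        // Put every request in flight before waiting on any of them.
        List<CompletableFuture<V>> pending = new ArrayList<>();
        for (K key : keys) {
            pending.add(asyncGet.apply(key));
        }

        // failFast completes exceptionally on the first failure,
        // or normally once every request has succeeded.
        CompletableFuture<Void> failFast = new CompletableFuture<>();
        for (CompletableFuture<V> f : pending) {
            f.whenComplete((value, error) -> {
                if (error != null) {
                    failFast.completeExceptionally(error);
                }
            });
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .whenComplete((ignored, error) -> {
                    if (error != null) {
                        failFast.completeExceptionally(error);
                    } else {
                        failFast.complete(null);
                    }
                });

        failFast.join(); // throws CompletionException if any request failed

        // All requests succeeded: values.get(i) corresponds to keys.get(i).
        for (CompletableFuture<V> f : pending) {
            values.add(f.join());
        }
    }
}
```

In the OneID use case each key would be one of a device's IDs, and the caller would take the first non-null entry in the output list.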
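Real-time query optimization: caching
OneID real-time queries
Over a hundred billion queries per day across hundreds of businesses,
up to tens of billions per day for a single business;
the mapping table holds billions of records,
storage in the hundreds of GB, QPS in the millions
Cache: hundreds of millions of daily active devices, storage in the tens of GB
Cache: each node (executor) caches about a million entries, hundreds of MB
Layer 1
Layer 2
Layer 3
Pegasus QPS: hundreds of thousands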
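A layered lookup of this kind can be sketched as follows. The slide does not say which cache sits at which layer, so the order used here (a small per-executor LRU cache, then a larger shared cache, then the backing store) is an assumption, and `TieredLookup`, `LruCache`, and `storeReads` are hypothetical names. The backing store is a plain function standing in for a Pegasus read, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical three-layer lookup: local LRU cache, shared cache, backing store. */
class TieredLookup {
    /** Small LRU map built on LinkedHashMap's access-order mode. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = iterate in access order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    private final LruCache<String, String> local;   // per-executor cache (assumed first layer)
    private final Map<String, String> shared;       // stand-in for a shared cache of active devices
    private final Function<String, String> store;   // stand-in for a Pegasus read
    int storeReads = 0;                             // how many lookups reached the store

    TieredLookup(int localCapacity, Map<String, String> shared, Function<String, String> store) {
        this.local = new LruCache<>(localCapacity);
        this.shared = shared;
        this.store = store;
    }

    String get(String deviceId) {
        String value = local.get(deviceId);
        if (value != null) {
            return value;
        }
        value = shared.get(deviceId);
        if (value == null) {
            storeReads++;
            value = store.apply(deviceId);
            if (value != null) {
                shared.put(deviceId, value);
            }
        }
        if (value != null) {
            local.put(deviceId, value); // may evict the least recently used entry
        }
        return value;
    }
}
```

Only lookups that miss both caches reach the store, which is how caching can bring a query load in the millions of QPS down to a much smaller load on the storage layer.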
Real-time query optimization: asynchronous requests
// Initialize the table handle, reusing it if it has already been opened
pegasusTable =
  if (Nullable.isNull(pegasusTable))
    pegasusClient.openAsyncTable("table_name")
  else
    pegasusTable
// Query Pegasus asynchronously with a 1000 ms timeout
val queryFuture: Future[PegasusResult] =
  pegasusTable.get[String, String](pegasusKey._1, pegasusKey._2, Duration(1000, "millisecond"))
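The same asynchronous pattern, a non-blocking read bounded by a timeout, can be sketched in Java with `CompletableFuture`. This is an analogue of the snippet above, not the Pegasus client API: `AsyncLookup`, `lookup`, and the `asyncGet` parameter are hypothetical, and returning a fallback value on timeout or failure is a design choice of this sketch rather than something stated in the slides. `orTimeout` requires Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch: asynchronous lookup bounded by a timeout. */
class AsyncLookup {
    static CompletableFuture<String> lookup(Function<String, CompletableFuture<String>> asyncGet,
                                            String key, long timeoutMillis, String fallback) {
        return asyncGet.apply(key)
                // Fail the future with TimeoutException if no reply arrives in time.
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                // Assumption of this sketch: map timeouts and errors to a fallback value.
                .exceptionally(error -> fallback);
    }
}
```

Because `lookup` returns immediately, the caller can start many lookups before waiting on any of them, so one slow request does not hold up the rest.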
Performance results
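Team
• Lead: Zhong Yun (钟云)
• Members: Xu Wei (徐威), Li Kunyi (李坤燚), Peng Cheng (彭程), Xiao Fateng (肖发腾)
We are looking for big data engineers to join our team. Please send your résumé to
zhongyun@xiaomi.com
Q&A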

Apache Pegasus's Practice in Data Access Business of Xiaomi