Pegasus Meetup@2022
Xiaomi's Universal Recommendation Algorithm Architecture
and the Application of Pegasus in User Profiling
Xiaomi, Internet Business Department, Liang Wei
CONTENTS
01 Business Background
02 Architecture Practice
03 Core Technical Components
04 Pegasus Applications
1 Business Background
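Background 1: Xiaomi's user data is very rich
• Basic information: gender, age, region, household, ...
• MIUI behavior: app installs and uninstalls, app categories, app usage duration, ...
• In-app behavior: behavior in the app store, video viewing, product purchases, ...
• Hardware behavior: hardware types, hardware usage duration, number of devices, ...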
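Background 2: Xiaomi has many kinds of search and recommendation businesses
• Apps: App Store, Game Center
• Video: Xiaomi Video (long and short), Xiaomi TV (long and short), Zili short video
• Feeds: Game Center community, home-screen content center, browser news feed, Mijia community
• Products: Xiaomi Mall, Xiaomi Youpin, Lingshoutong (retail)
• Finance: Tianxing Digital Technology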
Vision
Build a unified search and recommendation system tailored to Xiaomi's businesses
2 Architecture Practice
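Architecture overview (diagram):
• Unified content pool
• Retrieval Master: queue-based recall, vector recall, model-based recall, coarse-ranking scoring
• Unified user profile
• SparseZoo: model training, model evaluation, model distillation
• Machine learning platform: TensorFlow, CloudML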
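Core modules
• SparseZoo: unified model training
  Recall models: DSSM, DeepFM, YoutubeDNN, MIND, ...
  Ranking models: MMoE, DNN, DIN, Wide&Deep, ...
• Unified recall solution, sea-recall (distributed): classic queue recall, vector recall, model-based recall
• Unified ranking solution, ctr-score (distributed and general-purpose): high-performance feature extraction, high-performance online inference
• Unified user profile: full-domain information, real-time behavior feedback, unified online serving
• Unified content pool: multiple content types (news, long and short videos, apps, e-commerce)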
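The slides name the stages of the pipeline (several recall channels feeding a ranking step) but not their interfaces. A minimal sketch of that flow, assuming each recall channel is simply a function from user id to candidate item ids and the ranker is a per-item scoring function; `recall_and_rank` and the channel names are hypothetical and are not the sea-recall or ctr-score APIs:

```python
from typing import Callable, Dict, List

# Hypothetical recall channel: maps a user id to candidate item ids.
RecallChannel = Callable[[str], List[str]]


def recall_and_rank(
    uid: str,
    channels: Dict[str, RecallChannel],
    score: Callable[[str, str], float],
    top_k: int,
) -> List[str]:
    """Merge candidates from several recall channels, dedupe, then rank them.

    Illustrative only: the channels and the scoring function are stand-ins.
    """
    seen = set()
    candidates: List[str] = []
    for name in sorted(channels):  # fixed channel order keeps the result deterministic
        for item in channels[name](uid):
            if item not in seen:  # the same item can be recalled by several channels
                seen.add(item)
                candidates.append(item)
    # Coarse ranking: order candidates by model score, highest first.
    candidates.sort(key=lambda item: score(uid, item), reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    channels = {
        "queue": lambda uid: ["a", "b", "c"],  # e.g. a hot-items queue
        "vector": lambda uid: ["c", "d"],      # e.g. nearest neighbours of a user embedding
        "model": lambda uid: ["e", "a"],       # e.g. a model-based recall
    }
    fake_scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7, "e": 0.1}
    print(recall_and_rank("u1", channels, lambda u, i: fake_scores[i], top_k=3))
```

Deduplicating before scoring matters because the ranking model is usually the expensive step, so each candidate should be scored only once even if several channels return it.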
3 Core Technical Components
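Core component: SparseZoo, an in-house unified machine learning training platform for search, advertising, and recommendation scenarios
Training objectives: CTR, CVR, watch time, bounce rate, completion rate, ...
Built on: high-dimensional sparse TensorFlow, CloudML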
SparseZoo currently supports 20+ widely used industry models and serves 8+ business lines.
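High-dimensional sparse TensorFlow
• Incremental training: continue training from an existing model
• No one-hot encoding: features do not need to be one-hot encoded and then reduced in dimension
• Concise API: no need for the feature_column family of functions
• Feature expiration: evict features that appear infrequently
• TrainableHashTable: TensorFlow is deeply modified so that its HashTable structure supports backpropagation
• Direct embedding: no need to scan the whole sample set to collect feature statistics first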
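The TrainableHashTable, direct-embedding, and feature-expiration bullets describe keeping embeddings in a hash table keyed by raw feature values instead of a fixed-size, vocabulary-indexed matrix. The plain-Python sketch below illustrates only that idea: it is not SparseZoo's implementation, it does not use TensorFlow, and the class name, learning rate, initialization range, and hit-count eviction rule are all assumptions.

```python
import random
from typing import Dict, List


class SparseEmbeddingTable:
    """Hypothetical hash-map embedding table (illustration, not SparseZoo code).

    - Raw feature keys such as "city=beijing" map straight to vectors, so no
      one-hot encoding or vocabulary-building pass over the data is needed.
    - Rows are created lazily the first time a key is looked up.
    - Rows whose hit count stays below a threshold can be evicted.
    """

    def __init__(self, dim: int, lr: float = 0.1, seed: int = 0) -> None:
        self.dim = dim
        self.lr = lr
        self.rng = random.Random(seed)
        self.rows: Dict[str, List[float]] = {}
        self.hits: Dict[str, int] = {}

    def lookup(self, key: str) -> List[float]:
        if key not in self.rows:  # lazy init: an unseen feature gets a small random vector
            self.rows[key] = [self.rng.uniform(-0.01, 0.01) for _ in range(self.dim)]
            self.hits[key] = 0
        self.hits[key] += 1
        return self.rows[key]

    def apply_gradient(self, key: str, grad: List[float]) -> None:
        # Plain SGD on one row: the gradient flows back into the hash table entry.
        row = self.rows[key]
        for i, g in enumerate(grad):
            row[i] -= self.lr * g

    def expire(self, min_hits: int) -> int:
        """Evict features seen fewer than `min_hits` times; return how many were removed."""
        stale = [k for k, n in self.hits.items() if n < min_hits]
        for k in stale:
            del self.rows[k]
            del self.hits[k]
        return len(stale)


if __name__ == "__main__":
    table = SparseEmbeddingTable(dim=4)
    for _ in range(5):
        table.lookup("app=game_center")
    table.lookup("city=rare_town")
    table.apply_gradient("app=game_center", [1.0, 0.0, 0.0, 0.0])
    print(table.expire(min_hits=2), sorted(table.rows))
```

Because rows only exist for keys that have actually been seen, the table grows with the live feature set rather than with a precomputed vocabulary, which is why some form of expiration is needed to keep memory bounded.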
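Core component: in-house high-performance feature extraction tool (DSL)
• Configuration-driven: lexical and syntactic parsing based on compiler principles
• General-purpose: supports Thrift, JSON, Protobuf (pb), and other data formats
• Rich operators: currently 162+, covering strings, lists, numeric computation, time handling, and more
• High performance: string concatenation is replaced with hash combine, and a DAG caches the features in use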
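The "string concatenation → hash combine" optimization can be sketched as follows: instead of building a string such as `10086_app_42` for every cross feature and hashing it, each component is hashed once (and can be cached) and the hashes are mixed arithmetically. This sketch assumes FNV-1a for single values and a 64-bit adaptation of the `boost::hash_combine` mixing formula; the DSL's actual hash functions are not stated in the slides.

```python
MASK64 = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
GOLDEN = 0x9E3779B97F4A7C15  # 64-bit golden-ratio constant used by boost-style combiners


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash; stable across processes, unlike Python's built-in hash()."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def hash_combine(seed: int, value: int) -> int:
    """Mix `value` into `seed` (64-bit adaptation of boost::hash_combine's formula)."""
    mixed = (value + GOLDEN + ((seed << 6) & MASK64) + (seed >> 2)) & MASK64
    return (seed ^ mixed) & MASK64


def cross_by_concat(*parts: str) -> int:
    # Baseline: allocate and hash a new joined string for every sample.
    return fnv1a_64("_".join(parts).encode("utf-8"))


def cross_by_combine(*part_hashes: int) -> int:
    # Optimized: reuse per-field hashes computed once, then only do integer mixing.
    seed = 0
    for h in part_hashes:
        seed = hash_combine(seed, h)
    return seed


if __name__ == "__main__":
    uid_h = fnv1a_64(b"10086")
    app_h = fnv1a_64(b"app_42")
    print(hex(cross_by_concat("10086", "app_42")))
    print(hex(cross_by_combine(uid_h, app_h)))
```

The two functions produce different ids for the same cross feature; that is acceptable as long as training and online serving use the same scheme, since the model only needs the mapping to be consistent.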
Core component: in-house high-performance feature extraction tool (DSL), operator-based configuration
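The slides do not show the DSL's syntax, so the following is a hypothetical illustration of the idea behind operator-based configuration: each feature is declared as an operator applied to raw fields or to other features, and evaluation walks the resulting DAG while caching every intermediate result, so shared inputs are computed once per sample. The config format, operator names, and the `FeatureDag` class are invented for this sketch.

```python
from typing import Any, Callable, Dict, List

# Hypothetical operator registry (the real DSL reportedly ships 162+ operators).
OPERATORS: Dict[str, Callable[..., Any]] = {
    "lower": lambda s: s.lower(),
    "split": lambda s: s.split(","),
    "length": lambda xs: len(xs),
    "hour": lambda ts: (ts // 3600) % 24,  # hour of day (UTC) from a Unix timestamp
}


class FeatureDag:
    def __init__(self, config: List[Dict[str, Any]]) -> None:
        self.nodes = {node["name"]: node for node in config}
        self.op_calls = 0  # counts operator executions, to make the caching visible

    def extract(self, sample: Dict[str, Any], wanted: List[str]) -> Dict[str, Any]:
        cache: Dict[str, Any] = dict(sample)  # raw fields are the leaves of the DAG

        def evaluate(name: str) -> Any:
            if name in cache:  # every node is computed at most once per sample
                return cache[name]
            node = self.nodes[name]
            args = [evaluate(dep) for dep in node["inputs"]]
            self.op_calls += 1
            cache[name] = OPERATORS[node["op"]](*args)
            return cache[name]

        return {name: evaluate(name) for name in wanted}


if __name__ == "__main__":
    config = [
        {"name": "tags_lower", "op": "lower", "inputs": ["tags"]},
        {"name": "tag_list", "op": "split", "inputs": ["tags_lower"]},
        {"name": "tag_count", "op": "length", "inputs": ["tag_list"]},
        {"name": "event_hour", "op": "hour", "inputs": ["ts"]},
    ]
    dag = FeatureDag(config)
    out = dag.extract({"tags": "Game,Video", "ts": 7200}, ["tag_list", "tag_count", "event_hour"])
    print(out, dag.op_calls)
```

In this example `tag_count` depends on `tag_list`, which was already requested, so the cache keeps the run at four operator calls instead of six.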
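Core component: in-house high-performance online inference library (Prediction-lib)
• High performance: 5× the performance of native TensorFlow, with the embedding lookup split out. Offline training runs at low concurrency with large batches, whereas online inference runs at high concurrency with small batches, so the model graph is split and the embedding lookup runs as an independent operation.
• Large scale: supports distributed online inference for TB-scale models, with embedding parameters kept in external storage plus a local cache.
* prediction-lib Launch Review notes: https://wiki.n.miui.com/pages/viewpage.action?pageId=255724333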
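For TB-scale models, the slide describes keeping embedding parameters in external storage with a local cache in front. A minimal sketch of that lookup path, assuming an LRU eviction policy and a hypothetical batched `fetch_batch` function standing in for the external store (the slides name neither), is shown below. It also deduplicates keys within a request, which fits the small-batch, high-concurrency pattern of online inference.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class CachedEmbeddingLookup:
    """LRU cache in front of an external embedding store (illustrative sketch)."""

    def __init__(self, fetch_batch: Callable[[Sequence[str]], Dict[str, Vector]], capacity: int) -> None:
        # Assumption: fetch_batch returns a vector for every requested key
        # (e.g. a default vector for unknown ids).
        self.fetch_batch = fetch_batch
        self.capacity = capacity
        self.cache: "OrderedDict[str, Vector]" = OrderedDict()
        self.remote_keys_fetched = 0

    def lookup(self, keys: Sequence[str]) -> List[Vector]:
        # Deduplicate while preserving order, and only go remote for cache misses.
        missing = [k for k in dict.fromkeys(keys) if k not in self.cache]
        if missing:
            self.remote_keys_fetched += len(missing)
            for k, v in self.fetch_batch(missing).items():
                self.cache[k] = v
        result = []
        for k in keys:
            self.cache.move_to_end(k)  # mark as most recently used
            result.append(self.cache[k])
        while len(self.cache) > self.capacity:  # evict least recently used entries
            self.cache.popitem(last=False)
        return result


if __name__ == "__main__":
    store = {f"f{i}": [float(i)] for i in range(10)}  # fake external store
    lookup = CachedEmbeddingLookup(lambda ks: {k: store[k] for k in ks}, capacity=2)
    print(lookup.lookup(["f1", "f2", "f1"]), lookup.remote_keys_fetched)
```

Separating this lookup from the dense part of the model is what lets the two scale independently: the cache absorbs hot features, and only cold features cost a remote read.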
4 Pegasus Applications
Scenario                  Capacity   QPS    P99 value size   P99 latency
Offline user profile      10 TB+     13k    50 KB            20 ms
Real-time user profile    4.5 TB     100k   20 KB            5 ms
User history              890 GB     23k    50 B             5 ms
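As a rough sense of scale, multiplying each scenario's QPS by its P99 value size gives the read bandwidth the cluster would serve if every request returned one value of that size. This is a back-of-the-envelope calculation added for illustration, assuming one value per request and decimal units (1 KB = 10^3 bytes); since most values are smaller than the P99 size, typical throughput should be lower.

$$
\begin{aligned}
\text{Offline profile:}\quad & 13{,}000\ \text{req/s} \times 50\ \text{KB} = 650{,}000\ \text{KB/s} = 650\ \text{MB/s} \\
\text{Real-time profile:}\quad & 100{,}000\ \text{req/s} \times 20\ \text{KB} = 2{,}000{,}000\ \text{KB/s} = 2\ \text{GB/s} \\
\text{User history:}\quad & 23{,}000\ \text{req/s} \times 50\ \text{B} = 1{,}150{,}000\ \text{B/s} = 1.15\ \text{MB/s}
\end{aligned}
$$

By this estimate the real-time profile table is the most bandwidth-heavy of the three despite holding less data than the offline profile.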
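Profile usage
• Separate tables per business
• Offline profiles are bulk-loaded daily
• When multiple businesses write, a compaction must be run after the writes finish
• Real-time profiles are written from a Flink data stream
• HashKey: uid + bizType + attribute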
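The slide gives the hash key layout as uid + bizType + attribute. A minimal sketch of building such a key, assuming a `|` separator and UTF-8 encoding (neither is stated in the source), could look like this; `profile_hash_key` and the dict-backed `ProfileTable` are hypothetical stand-ins for illustration, not the Pegasus client API, and the sort key is not covered because the slide does not describe it.

```python
from typing import Dict, Optional

SEP = "|"  # assumed separator; the slide only specifies uid+bizType+attribute


def profile_hash_key(uid: str, biz_type: str, attribute: str) -> bytes:
    """Build a hash key of the form uid|bizType|attribute (hypothetical format)."""
    for part in (uid, biz_type, attribute):
        if SEP in part:  # reject fields that would make the key ambiguous
            raise ValueError(f"field {part!r} contains separator {SEP!r}")
    return SEP.join((uid, biz_type, attribute)).encode("utf-8")


class ProfileTable:
    """Hypothetical in-memory stand-in for one business's profile table."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, uid: str, biz_type: str, attribute: str, value: bytes) -> None:
        # In this stand-in, a later write to the same key replaces the earlier value.
        self._data[profile_hash_key(uid, biz_type, attribute)] = value

    def get(self, uid: str, biz_type: str, attribute: str) -> Optional[bytes]:
        # Reading one attribute of one user is a single point lookup.
        return self._data.get(profile_hash_key(uid, biz_type, attribute))


if __name__ == "__main__":
    table = ProfileTable()
    table.put("10086", "tv", "age", b"25-34")
    print(profile_hash_key("10086", "tv", "age"), table.get("10086", "tv", "age"))
```

A separator keeps keys human-readable but requires the validation above; a length-prefixed or fixed-width encoding of each field would avoid that check at the cost of readability.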
Effect of switching from HBase to Pegasus
Business (P99 latency)   Before (ms)   After (ms)   Reduction
TV long video            35            20           15 ms / 42.9%
TV short video           25            10           15 ms / 60%
Thank you!

How Apache Pegasus Is Used in Xiaomi's Universal Recommendation Algorithm Framework