Large-Scale Distributed Bayesian Matrix
Factorization using Stochastic Gradient MCMC
Sun Peiyuan (孙佩源)
January 6, 2016
Accepted at KDD ’15
Sungjin Ahn, UC Irvine (currently a postdoctoral fellow working with Yoshua Bengio)
Anoop Korattikara, Google
Max Welling, University of Amsterdam
2008: Bayesian PMF using MCMC
2011: SGLD
2015: Distributed BPMF using SGLD
outline
• Introduction
• Bayesian PMF
• SGLD
• Distributed BPMF using SGLD
• Reference
Introduction
Recommender systems
Music recommendation, product recommendation
Introduction
Netflix movie rating prediction competition
Bayesian PMF
• PMF (Probabilistic Matrix Factorization)
considered as a generative process
• Pick user u latent factor: $L_u = \{L_{u1}, L_{u2}, \ldots, L_{uk}\}$
• Pick movie v latent factor: $R_v = \{R_{v1}, R_{v2}, \ldots, R_{vk}\}$
• For each (user, movie) pair observed:
pick rating as: $L_u \cdot R_v + \text{noise}$
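The generative story above can be sketched in a few lines. This is a minimal illustration, assuming standard-normal latent factors and Gaussian noise; the function name `sample_pmf_ratings` and the parameter `noise_std` are hypothetical and not taken from the slides.

```python
import numpy as np

def sample_pmf_ratings(n_users, n_movies, k, pairs, noise_std=0.5, seed=0):
    """Draw latent factors and ratings following the PMF generative story."""
    rng = np.random.default_rng(seed)
    L = rng.normal(0.0, 1.0, size=(n_users, k))   # one k-dim factor per user
    R = rng.normal(0.0, 1.0, size=(n_movies, k))  # one k-dim factor per movie
    ratings = {}
    for u, v in pairs:
        # rating = inner product of the two factors plus Gaussian noise
        ratings[(u, v)] = float(L[u] @ R[v] + rng.normal(0.0, noise_std))
    return L, R, ratings

# observed (user, movie) pairs, 0-based ids (toy values)
pairs = [(0, 2), (1, 0), (1, 1)]
L, R, ratings = sample_pmf_ratings(n_users=6, n_movies=4, k=2, pairs=pairs)
print(ratings)
```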
Bayesian PMF
• PMF graphical model
Bayesian PMF
• PMF learning
MAP:
equivalent to:
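As a hedged reminder, the standard PMF formulation (Salakhutdinov and Mnih) with a Gaussian likelihood and zero-mean Gaussian priors gives the equivalence below. The symbols are assumptions taken from that formulation rather than from this slide: $I_{ij}$ indicates that rating $R_{ij}$ is observed, and $\lambda_U = \sigma^2/\sigma_U^2$, $\lambda_V = \sigma^2/\sigma_V^2$.

```latex
\max_{U,V}\; \log p(U, V \mid R, \sigma^2, \sigma_U^2, \sigma_V^2)
\quad\Longleftrightarrow\quad
\min_{U,V}\; \frac{1}{2}\sum_{i}\sum_{j} I_{ij}\,\bigl(R_{ij} - U_i^{T} V_j\bigr)^2
 + \frac{\lambda_U}{2}\sum_{i}\lVert U_i\rVert^2
 + \frac{\lambda_V}{2}\sum_{j}\lVert V_j\rVert^2
```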
Bayesian PMF
• PMF learning
• the solution can be found by gradient descent in U and V
• the regularization parameters must be tuned manually to avoid overfitting
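A minimal sketch of gradient descent in U and V, assuming a squared-error objective with L2 penalties. The names `pmf_map_gd`, `lam` (the manually chosen regularization weight) and `lr` are hypothetical, and the ratings are toy values.

```python
import numpy as np

def pmf_map_gd(triples, n_users, n_movies, k=2, lam=0.1, lr=0.01, n_iters=2000, seed=0):
    """Gradient descent on squared error plus L2 penalties (MAP for PMF)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_movies, k))
    rows = np.array([t[0] for t in triples])
    cols = np.array([t[1] for t in triples])
    vals = np.array([t[2] for t in triples], dtype=float)
    for _ in range(n_iters):
        err = np.sum(U[rows] * V[cols], axis=1) - vals   # prediction error per rating
        grad_U = lam * U                                  # gradient of the L2 penalty
        grad_V = lam * V
        np.add.at(grad_U, rows, err[:, None] * V[cols])   # accumulate data-term gradients
        np.add.at(grad_V, cols, err[:, None] * U[rows])
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def rmse(triples, U, V):
    """Root mean squared error on the given (user, movie, rating) triples."""
    return float(np.sqrt(np.mean([(U[i] @ V[j] - r) ** 2 for i, j, r in triples])))

# (user, movie, rating) triples with 0-based ids; toy data for illustration
triples = [(0, 2, 5), (1, 0, 3), (1, 1, 1), (1, 3, 2), (2, 1, 2), (2, 2, 4)]
U0, V0 = pmf_map_gd(triples, 3, 4, n_iters=0)   # initial factors only
U, V = pmf_map_gd(triples, 3, 4)
print(rmse(triples, U0, V0), rmse(triples, U, V))
```

The value of `lam` has to be picked by hand (for example by validation), which is the manual control the slide refers to.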
Bayesian PMF
• Bayesian PMF Model
• introduce priors for the parameters
• allow model complexity to be controlled automatically
Bayesian PMF
• Bayesian PMF prediction
• predictive distribution: integrates over the uncertainty in the model parameters
• in contrast with MAP estimation in PMF: $p(R^*_{ij} \mid \{U, V\}_{\mathrm{MAP}}(R))$
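In practice the integral over the parameters is usually approximated by averaging predictions over posterior samples. The sketch below assumes such samples are already available; `predictive_mean` and the two hand-written samples are hypothetical.

```python
import numpy as np

def predictive_mean(U_samples, V_samples, i, j):
    """Monte Carlo estimate of the predicted rating for user i, movie j."""
    preds = [U[i] @ V[j] for U, V in zip(U_samples, V_samples)]
    return float(np.mean(preds))

# two hypothetical posterior samples of (U, V) with latent dimension 2
U_samples = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
V_samples = [np.array([[2.0, 3.0]]), np.array([[2.0, 5.0]])]
print(predictive_mean(U_samples, V_samples, 0, 0))  # (2.0 + 5.0) / 2 = 3.5
```

A MAP predictor would instead use a single point estimate of (U, V), so it carries no information about parameter uncertainty.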
Bayesian PMF
• Bayesian PMF evaluation
• predictive distribution:
• MCMC method: samples are generated by a Markov chain
whose stationary distribution is the posterior
distribution over the model parameters
$p(U, V, \Theta_U, \Theta_V \mid R, \Theta_0)$
$p(U, V \mid R, \Theta_U, \Theta_V)\, p(\Theta_U, \Theta_V \mid \Theta_0)$
Bayesian PMF
• Bayesian PMF inference algorithm
• Gibbs sampling algorithm
$p(U_i \mid R, V, \Theta_U, \Theta_0) \propto p(U_i \mid \Theta_U)\, p(R_{ij} \mid U_i, V_j, \Theta_V, \Theta_0)$
$p(V_i \mid R, U, \Theta_V, \Theta_0)$
…
The matrix inversion has complexity $O(D^3)$
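The cost of one Gibbs step can be seen in a sketch of the Gaussian conditional for a single user factor, following the BPMF formulation of Salakhutdinov and Mnih. The names `sample_user_factor`, `mu_U`, `Lambda_U` (prior mean and precision) and `alpha` (rating precision) are assumptions for illustration.

```python
import numpy as np

def sample_user_factor(V_rated, r, mu_U, Lambda_U, alpha, rng):
    """Draw U_i from its Gaussian conditional given the movies user i rated."""
    # posterior precision: prior precision + alpha * sum_j V_j V_j^T
    precision = Lambda_U + alpha * V_rated.T @ V_rated
    # the O(D^3) step: inverting a D x D matrix for every user
    cov = np.linalg.inv(precision)
    mean = cov @ (alpha * V_rated.T @ r + Lambda_U @ mu_U)
    return rng.multivariate_normal(mean, cov), mean, cov

rng = np.random.default_rng(0)
V_rated = np.eye(2)              # factors of the two movies this user rated (toy values)
r = np.array([1.0, 2.0])         # the corresponding ratings
sample, mean, cov = sample_user_factor(V_rated, r, np.zeros(2), np.eye(2), 1.0, rng)
print(mean, sample)
```

Every user and every movie needs such an inversion in each sweep, which is why the per-iteration cost grows quickly with the latent dimension D.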
drawback of MCMC
• each iteration of MCMC requires computations over the
whole dataset
• each round of sampling requires expensive computation
Stochastic Gradient Langevin Dynamics
• stochastic optimization on the posterior distribution
to find the MAP parameters operates as follows:
general idea: approximate the full-data gradient with a gradient computed on a subset of the data
(labeled terms: prior distribution, likelihood, data subset, step size)
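For reference, the stochastic gradient update in Welling and Teh's formulation has the form below. The notation is an assumption borrowed from that paper: $p(\theta_t)$ is the prior, $p(x_{ti} \mid \theta_t)$ the likelihood, $\{x_{t1}, \ldots, x_{tn}\}$ the data subset of size $n$ drawn from $N$ items, and $\epsilon_t$ the step size.

```latex
\Delta\theta_t = \frac{\epsilon_t}{2}\Bigl(\nabla \log p(\theta_t)
  + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{ti} \mid \theta_t)\Bigr)
```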
Stochastic Gradient Langevin Dynamics
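• MCMC using Langevin Dynamics
Langevin diffusion: $d\theta_t = \frac{1}{2} \nabla \log \pi(\theta_t)\, dt + dw_t$
has stationary distribution $\pi(\theta)$
Substituting the posterior distribution and applying the Euler-Maruyama discretization gives:
Reducing the step size significantly reduces the discretization error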
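A minimal sketch of the Euler-Maruyama discretization of the Langevin diffusion, assuming a one-dimensional standard normal target so that the gradient of the log density is simply $-\theta$. The function name `langevin_samples` and the chosen step size are illustrative, and no Metropolis correction is applied.

```python
import numpy as np

def langevin_samples(grad_log_pi, theta0, step, n_steps, seed=0):
    """Discretized Langevin dynamics: drift of step/2 * gradient plus N(0, step) noise."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    out = np.empty(n_steps)
    for t in range(n_steps):
        noise = rng.normal(0.0, np.sqrt(step))   # Brownian increment over one step
        theta = theta + 0.5 * step * grad_log_pi(theta) + noise
        out[t] = theta
    return out

# target: standard normal, so grad log pi(theta) = -theta
samples = langevin_samples(lambda th: -th, theta0=3.0, step=0.1, n_steps=100000)
print(samples[5000:].mean(), samples[5000:].var())
```

With a finite step the chain's variance is slightly larger than the target's, and the gap shrinks as the step size is reduced.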
Stochastic Gradient Langevin Dynamics
• the two algorithms look very similar
• combining these two ideas
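Combining the mini-batch gradient with the injected Langevin noise gives the SGLD update of Welling and Teh, written here in that paper's notation (an assumption, since the slide's own formula is not available), together with the usual step-size conditions:

```latex
\Delta\theta_t = \frac{\epsilon_t}{2}\Bigl(\nabla \log p(\theta_t)
  + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{ti} \mid \theta_t)\Bigr) + \eta_t,
\qquad \eta_t \sim \mathcal{N}(0, \epsilon_t),
\qquad \sum_{t=1}^{\infty} \epsilon_t = \infty,\quad \sum_{t=1}^{\infty} \epsilon_t^2 < \infty
```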
Stochastic Gradient Langevin Dynamics
• based on a rigorous proof by [Qi He, Jack Xin 2012]:
when $t \to \infty$, the generated sequence converges to the true
posterior distribution.
Stochastic Gradient Langevin Dynamics
• whether the algorithm behaves like SG or LD depends on
whether the SG noise or the LD noise dominates the stochasticity
• when $\epsilon_t$ is large: SG noise dominates
• when $\epsilon_t$ is small: LD noise dominates
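A small SGLD sketch on a toy problem, assuming data $x_i \sim \mathcal{N}(\theta, 1)$ with a $\mathcal{N}(0, 100)$ prior on $\theta$, so that the exact posterior mean is known. The function name, the polynomially decaying schedule `eps0 * (1 + t) ** (-gamma)` and all constants are illustrative assumptions.

```python
import numpy as np

def sgld_gaussian_mean(x, n_batch=10, eps0=1e-3, gamma=0.55, n_steps=5000,
                       prior_var=100.0, seed=0):
    """SGLD for the mean of unit-variance Gaussian data with a Gaussian prior."""
    rng = np.random.default_rng(seed)
    N = len(x)
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        eps = eps0 * (1.0 + t) ** (-gamma)            # decaying step size
        batch = rng.choice(x, size=n_batch, replace=False)
        grad_prior = -theta / prior_var               # gradient of the log prior
        grad_lik = (N / n_batch) * np.sum(batch - theta)  # rescaled mini-batch gradient
        theta += 0.5 * eps * (grad_prior + grad_lik) + rng.normal(0.0, np.sqrt(eps))
        samples[t] = theta
    return samples

x = np.random.default_rng(1).normal(2.0, 1.0, size=1000)
samples = sgld_gaussian_mean(x)
post_mean = x.sum() / (len(x) + 1.0 / 100.0)          # exact posterior mean
print(samples[500:].mean(), post_mean)
```

Early on the step size is large and the mini-batch noise dominates, so the chain behaves like stochastic gradient ascent; as the step size decays, the injected Gaussian noise takes over.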
Distributed BPMF using SGLD
• we will only focus on sampling from: $p(U, V \mid R, \Theta)$
• remember from previous slides, we have:
Distributed BPMF using SGLD
• suppose the rating matrix is represented as:
• the gradient of the log-posterior w.r.t. U is:
Distributed BPMF using SGLD
• we want to update only the parameters of users who have ratings
in the mini-batch data
• we use an unbiased estimate of this gradient that only needs to
update users in the mini-batch:
This way, when parameters are updated from one block of data, only the parameters of the users in that block need to be computed
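One way to build such an estimator, which is not necessarily the exact form used in the paper, is to split each user's prior gradient evenly over that user's ratings and rescale the mini-batch sum. The sketch assumes an isotropic Gaussian prior with precision `lam`, rating precision `alpha`, and that every user has at least one rating; all names are hypothetical.

```python
import itertools
import numpy as np

def full_grad_U(triples, U, V, alpha, lam):
    """Exact log-posterior gradient for every row of U (uses all ratings)."""
    G = -lam * U                          # gradient of the Gaussian prior term
    for i, j, r in triples:
        G[i] += alpha * (r - U[i] @ V[j]) * V[j]
    return G

def minibatch_grad_U(batch, counts, n_total, U, V, alpha, lam):
    """Estimate that only fills rows of users present in the mini-batch."""
    G = np.zeros_like(U)
    scale = n_total / len(batch)          # rescale the batch sum to the full data size
    for i, j, r in batch:
        lik = alpha * (r - U[i] @ V[j]) * V[j]
        prior = -lam * U[i] / counts[i]   # prior gradient split over user i's ratings
        G[i] += scale * (lik + prior)
    return G

# 12 toy (user, movie, rating) triples with 0-based ids
triples = [(0, 2, 5), (1, 0, 3), (1, 1, 1), (1, 3, 2), (2, 1, 2), (2, 2, 4),
           (3, 0, 2), (3, 3, 1), (4, 0, 4), (4, 2, 5), (5, 1, 1), (5, 3, 3)]
rng = np.random.default_rng(0)
U, V = rng.standard_normal((6, 2)), rng.standard_normal((4, 2))
counts = np.bincount([i for i, _, _ in triples], minlength=6)

exact = full_grad_U(triples, U, V, alpha=2.0, lam=0.5)
# average the estimator over every possible size-2 mini-batch
batches = list(itertools.combinations(triples, 2))
avg = sum(minibatch_grad_U(b, counts, len(triples), U, V, 2.0, 0.5) for b in batches) / len(batches)
print(np.allclose(avg, exact))
```

Averaging over all equally likely mini-batches recovers the exact gradient, which is what unbiasedness means here, while any single estimate has nonzero rows only for users that appear in the batch.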
Distributed BPMF using SGLD
• intuitive explanation of the approach
rating matrix (rows: users, columns: movies), stored as (user, movie, rating) triples:
user  movie  rating
1     3      5
2     1      3
2     2      1
2     4      2
3     2      2
3     3      4
4     1      2
4     4      1
5     1      4
5     3      5
6     2      1
6     4      3
Distributed BPMF using SGLD
• run two chains in parallel with parameters:
$\theta_1 = \{U_1^1, V_1^1\}$, $\theta_2 = \{U_1^2, V_1^2\}$
• assume the latent features have dimension 2:
$U_1^1 = \begin{bmatrix} U_{11}, \ldots, U_{61} \\ U_{12}, \ldots, U_{62} \end{bmatrix}^T$, $V_1^1 = \begin{bmatrix} V_{11}, \ldots, V_{14} \\ V_{21}, \ldots, V_{24} \end{bmatrix}$
Distributed BPMF using SGLD
• divide the rating matrix into 4 blocks
the gray blocks form the 1st group
the white blocks form the 2nd group
• start 4 workers corresponding to the 4 blocks respectively
worker1, worker2, worker3, worker4
worker1 and worker2 share $\theta_1$
worker3 and worker4 share $\theta_2$
Distributed BPMF using SGLD
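• the rating data is partitioned into 4 blocks
(user, movie, rating) triples per block:
worker1: (2, 1, 3), (2, 2, 1), (3, 2, 2)
worker3: (1, 3, 5), (2, 4, 2), (3, 3, 4)
worker4: (4, 1, 2), (5, 1, 4), (6, 2, 1)
worker2: (4, 4, 1), (5, 3, 5), (6, 4, 3)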
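The partition can be reproduced with a short sketch, assuming users 1 to 3 and movies 1 to 2 form the first halves of the matrix. The function `partition_blocks` and the block keys are hypothetical names for illustration.

```python
def partition_blocks(triples, user_split, movie_split):
    """Group (user, movie, rating) triples into a 2 x 2 grid of blocks."""
    blocks = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for u, m, r in triples:
        # block row: which half of the users; block column: which half of the movies
        blocks[(int(u > user_split), int(m > movie_split))].append((u, m, r))
    return blocks

# the 12 ratings of the example, with 1-based ids as on the slides
triples = [(1, 3, 5), (2, 1, 3), (2, 2, 1), (2, 4, 2), (3, 2, 2), (3, 3, 4),
           (4, 1, 2), (4, 4, 1), (5, 1, 4), (5, 3, 5), (6, 2, 1), (6, 4, 3)]
blocks = partition_blocks(triples, user_split=3, movie_split=2)

# the two diagonal blocks touch disjoint users and disjoint movies
users_a = {u for u, _, _ in blocks[(0, 0)]}
users_b = {u for u, _, _ in blocks[(1, 1)]}
movies_a = {m for _, m, _ in blocks[(0, 0)]}
movies_b = {m for _, m, _ in blocks[(1, 1)]}
print(users_a & users_b, movies_a & movies_b)
```

Because the blocks in one group share neither users nor movies, the workers assigned to them can update their rows of U and V without conflicting writes.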
Distributed BPMF using SGLD
• each worker (e.g. worker1) works as follows:
1. sample a mini-batch from worker1’s rating data (assume the size is 2):
worker1’s data: (2, 1, 3), (2, 2, 1), (3, 2, 2)
M = {(2, 1, 3), (2, 2, 1)}
2. for each user i and movie j in M, update in parallel using the following rules:
Distributed BPMF using SGLD
• each worker (e.g. worker1) works as follows:
1. worker1 updates U2, U3, V1, V2 using its mini-batch data M
2. similarly, worker2 updates U4, U5, U6, V3, V4 using its mini-batch data
worker1’s data: (2, 1, 3), (2, 2, 1), (3, 2, 2)
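A simplified sketch of one local update on a worker's block. It assumes Gaussian priors with precision `lam`, rating precision `alpha`, gradient scaling by the block size, and noise of variance `eps` on the touched rows only; these choices and the name `worker_sgld_step` are illustrative and may differ from the paper's exact rules.

```python
import numpy as np

def worker_sgld_step(block, U, V, batch_size, eps, alpha, lam, rng):
    """One local SGLD-style update; modifies only rows seen in the mini-batch."""
    idx = rng.choice(len(block), size=batch_size, replace=False)
    batch = [block[t] for t in idx]
    scale = len(block) / batch_size
    user_counts, movie_counts = {}, {}    # ratings per user / movie inside this block
    for i, j, _ in block:
        user_counts[i] = user_counts.get(i, 0) + 1
        movie_counts[j] = movie_counts.get(j, 0) + 1
    grad_U, grad_V = {}, {}
    for i, j, r in batch:                 # gradients use the values before the update
        err = r - U[i] @ V[j]
        gu = alpha * err * V[j] - lam * U[i] / user_counts[i]
        gv = alpha * err * U[i] - lam * V[j] / movie_counts[j]
        grad_U[i] = grad_U.get(i, 0.0) + scale * gu
        grad_V[j] = grad_V.get(j, 0.0) + scale * gv
    for i, g in grad_U.items():           # gradient step plus injected Gaussian noise
        U[i] += 0.5 * eps * g + rng.normal(0.0, np.sqrt(eps), size=U.shape[1])
    for j, g in grad_V.items():
        V[j] += 0.5 * eps * g + rng.normal(0.0, np.sqrt(eps), size=V.shape[1])
    return batch

rng = np.random.default_rng(0)
U, V = rng.standard_normal((6, 2)), rng.standard_normal((4, 2))
U_before, V_before = U.copy(), V.copy()
# worker1's block with 0-based ids: users 1 and 2, movies 0 and 1
block1 = [(1, 0, 3.0), (1, 1, 1.0), (2, 1, 2.0)]
batch = worker_sgld_step(block1, U, V, batch_size=2, eps=1e-3, alpha=2.0, lam=0.5, rng=rng)
print(batch)
```

Rows of U and V that belong to other blocks are never written by this worker, which is what allows worker2 to run the same step on its own block at the same time.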
Distributed BPMF using SGLD
• experiments compared five different algorithms:
• dataset:
Distributed BPMF using SGLD
• results: (plots for the Netflix dataset and the Yahoo music dataset)
Reference
• Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo. Ruslan Salakhutdinov, Andriy Mnih. University of Toronto. ICML 2008.
• Probabilistic Matrix Factorization. Emily Fox. University of Washington. Machine Learning for Big Data, 2014.
• Bayesian Learning via Stochastic Gradient Langevin Dynamics. Max Welling. UC Irvine. ICML 2011.
• Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC. Sungjin Ahn et al. KDD 2015.
• Bayesian Posterior Inference in Big Data Arena. Max Welling. ICML 2014 tutorial.
• Hybrid Deterministic-Stochastic Gradient Langevin Dynamics for Bayesian Learning. Qi He, Jack Xin. Communications in Information and Systems, 2012.
• Prediction
• Trending events

More Related Content

What's hot

GKEL_IGARSS_2011.ppt
GKEL_IGARSS_2011.pptGKEL_IGARSS_2011.ppt
GKEL_IGARSS_2011.pptgrssieee
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewLEE HOSEONG
 
Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님taeseon ryu
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
Fast Iterative Graph Computation with Block Updates
Fast Iterative Graph Computation with Block UpdatesFast Iterative Graph Computation with Block Updates
Fast Iterative Graph Computation with Block UpdatesWenlei Xie
 
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationTaeoh Kim
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Iterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data EraIterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data EraWenlei Xie
 
Pr057 mask rcnn
Pr057 mask rcnnPr057 mask rcnn
Pr057 mask rcnnTaeoh Kim
 
Learning visual representation without human label
Learning visual representation without human labelLearning visual representation without human label
Learning visual representation without human labelKai-Wen Zhao
 
Control System toolbox in Matlab
Control System toolbox in MatlabControl System toolbox in Matlab
Control System toolbox in MatlabAbdul Sami
 
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)Shunta Saito
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNNJunho Cho
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentShaleen Kumar Gupta
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationNamHyuk Ahn
 
Transformation of Random variables & noise concepts
Transformation of Random variables & noise concepts Transformation of Random variables & noise concepts
Transformation of Random variables & noise concepts Darshan Bhatt
 

What's hot (20)

GKEL_IGARSS_2011.ppt
GKEL_IGARSS_2011.pptGKEL_IGARSS_2011.ppt
GKEL_IGARSS_2011.ppt
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님Dear - 딥러닝 논문읽기 모임 김창연님
Dear - 딥러닝 논문읽기 모임 김창연님
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Fast Iterative Graph Computation with Block Updates
Fast Iterative Graph Computation with Block UpdatesFast Iterative Graph Computation with Block Updates
Fast Iterative Graph Computation with Block Updates
 
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentation
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
 
Iterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data EraIterative Graph Computation in the Big Data Era
Iterative Graph Computation in the Big Data Era
 
Pr057 mask rcnn
Pr057 mask rcnnPr057 mask rcnn
Pr057 mask rcnn
 
BIRTE-13-Kawashima
BIRTE-13-KawashimaBIRTE-13-Kawashima
BIRTE-13-Kawashima
 
Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)Recurrent Instance Segmentation (UPC Reading Group)
Recurrent Instance Segmentation (UPC Reading Group)
 
Learning visual representation without human label
Learning visual representation without human labelLearning visual representation without human label
Learning visual representation without human label
 
Control System toolbox in Matlab
Control System toolbox in MatlabControl System toolbox in Matlab
Control System toolbox in Matlab
 
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
ECE 565 FInal Project
ECE 565 FInal ProjectECE 565 FInal Project
ECE 565 FInal Project
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 
Deep Learning for Computer Vision: Segmentation (UPC 2016)
Deep Learning for Computer Vision: Segmentation (UPC 2016)Deep Learning for Computer Vision: Segmentation (UPC 2016)
Deep Learning for Computer Vision: Segmentation (UPC 2016)
 
Transformation of Random variables & noise concepts
Transformation of Random variables & noise concepts Transformation of Random variables & noise concepts
Transformation of Random variables & noise concepts
 

Similar to Dsgld

Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...Lu Jiang
 
Multimedia basic video compression techniques
Multimedia basic video compression techniquesMultimedia basic video compression techniques
Multimedia basic video compression techniquesMazin Alwaaly
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
Deep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementDeep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementSean Moran
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringAllen Wu
 
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxAlyaaMachi
 
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...ActiveEon
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...HostedbyConfluent
 
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Rakebul Hasan
 
Docker, Monitoring and SLURM Specific Visualisations
Docker, Monitoring and SLURM Specific VisualisationsDocker, Monitoring and SLURM Specific Visualisations
Docker, Monitoring and SLURM Specific Visualisationsalherca1
 
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHMAN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHMIJCSEA Journal
 
Report on Action Recognition using Graph cut
Report on Action Recognition using Graph cut Report on Action Recognition using Graph cut
Report on Action Recognition using Graph cut DCU
 
Deep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementDeep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementSean Moran
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningSungchul Kim
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...Jihun Yun
 

Similar to Dsgld (20)

OBDPC 2022
OBDPC 2022OBDPC 2022
OBDPC 2022
 
Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...Leveraging high level and low-level features for multimedia event detection.2...
Leveraging high level and low-level features for multimedia event detection.2...
 
Multimedia basic video compression techniques
Multimedia basic video compression techniquesMultimedia basic video compression techniques
Multimedia basic video compression techniques
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Deep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementDeep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image Enhancement
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
 
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
Online Stochastic Tensor Decomposition for Background Subtraction in Multispe...
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
 
Mmclass5b
Mmclass5bMmclass5b
Mmclass5b
 
Moving object detection on FPGA
Moving object detection on FPGAMoving object detection on FPGA
Moving object detection on FPGA
 
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...Predicting SPARQL query execution time and suggesting SPARQL queries based on...
Predicting SPARQL query execution time and suggesting SPARQL queries based on...
 
Docker, Monitoring and SLURM Specific Visualisations
Docker, Monitoring and SLURM Specific VisualisationsDocker, Monitoring and SLURM Specific Visualisations
Docker, Monitoring and SLURM Specific Visualisations
 
Defense_20140625
Defense_20140625Defense_20140625
Defense_20140625
 
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHMAN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
AN EFFICIENT CODEBOOK INITIALIZATION APPROACH FOR LBG ALGORITHM
 
Report on Action Recognition using Graph cut
Report on Action Recognition using Graph cut Report on Action Recognition using Graph cut
Report on Action Recognition using Graph cut
 
Deep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image EnhancementDeep Local Parametric Filters for Image Enhancement
Deep Local Parametric Filters for Image Enhancement
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
PMF BPMF and BPTF
PMF BPMF and BPTFPMF BPMF and BPTF
PMF BPMF and BPTF
 
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
ProxGen: Adaptive Proximal Gradient Methods for Structured Neural Networks (N...
 

More from sun peiyuan

network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learningsun peiyuan
 
基于Gpu的高性能计算
基于Gpu的高性能计算基于Gpu的高性能计算
基于Gpu的高性能计算sun peiyuan
 
Notes.on.popularity.versus.similarity.model
Notes.on.popularity.versus.similarity.modelNotes.on.popularity.versus.similarity.model
Notes.on.popularity.versus.similarity.modelsun peiyuan
 
A geometric interpretation for growing networks
A geometric interpretation for growing networksA geometric interpretation for growing networks
A geometric interpretation for growing networkssun peiyuan
 
Variational inference
Variational inferenceVariational inference
Variational inferencesun peiyuan
 

More from sun peiyuan (8)

network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
 
基于Gpu的高性能计算
基于Gpu的高性能计算基于Gpu的高性能计算
基于Gpu的高性能计算
 
Notes.on.popularity.versus.similarity.model
Notes.on.popularity.versus.similarity.modelNotes.on.popularity.versus.similarity.model
Notes.on.popularity.versus.similarity.model
 
A geometric interpretation for growing networks
A geometric interpretation for growing networksA geometric interpretation for growing networks
A geometric interpretation for growing networks
 
Variational inference
Variational inferenceVariational inference
Variational inference
 
Lda
LdaLda
Lda
 
Manifold
ManifoldManifold
Manifold
 
HMC
HMCHMC
HMC
 

Recently uploaded

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Dsgld

  • 1. Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC 孙佩源 2016年1月6日
  • 2. accepted in KDD ’15 currently a postdoctoral fellow working with Yoshua Bengio UCIrvine Sungjin Ahn Google Anoop Korattikara University of Amsterdam Max Welling
  • 3. 2008 Bayesian PMF using MCMC 2011 SGLD 2015 Distributed BPMF using SGLD
  • 4. outline • Introduction • Bayesian PMF • SGLD • Distributed BPMF using SGLD • Reference
  • 7. Bayesian PMF • PMF ( Probabilistic Matrix Factorization ) considered as a generative process • Pick user u latent factor: • Pick movie v latent factor: • For each (user, movie) pair observed: pick rating as: 1 2 { , , , }ku u u uL L L L L 1 2 { , , , }kv v v vR R R R L *u vL R noise
  • 8. Bayesian PMF • PMF graphical model
  • 9. Bayesian PMF • PMF learning MAP: equivalent to:
  • 10. Bayesian PMF • PMF learning • the solution can be found by gradient descent in U and V • the regularization parameters must be tuned manually to avoid overfitting
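As a rough sketch of the gradient descent mentioned here, the code below minimizes the usual regularized squared-error form of the PMF MAP objective. The slide's own equation was not captured in the transcript, so the objective, the names `lam_u` and `lam_v`, and the learning rate `lr` are assumptions for illustration; `lam_u` and `lam_v` are the parameters that need manual tuning.

```python
import numpy as np

def pmf_objective(U, V, ratings, lam_u, lam_v):
    """0.5 * squared error over observed ratings + L2 penalties on U and V."""
    err = sum((r - U[i] @ V[j]) ** 2 for i, j, r in ratings)
    return 0.5 * err + 0.5 * lam_u * np.sum(U ** 2) + 0.5 * lam_v * np.sum(V ** 2)

def pmf_gradient_step(U, V, ratings, lam_u, lam_v, lr):
    """One full-batch gradient descent step in U and V."""
    grad_U, grad_V = lam_u * U, lam_v * V   # gradients of the penalty terms
    for i, j, r in ratings:
        e = r - U[i] @ V[j]                 # prediction error for this rating
        grad_U[i] -= e * V[j]
        grad_V[j] -= e * U[i]
    return U - lr * grad_U, V - lr * grad_V
```

Calling `pmf_gradient_step` repeatedly with a small `lr` drives the objective down; every step visits all observed ratings.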
  • 11. Bayesian PMF • Bayesian PMF Model • introduce priors for the parameters • allow model complexity to be controlled automatically
  • 12. Bayesian PMF • Bayesian PMF prediction • predictive distribution: • in contrast with MAP estimation in PMF: $p(R^*_{ij} \mid \{U, V\}_{\mathrm{MAP}}(R))$ • Integrate over uncertainty in model parameters
  • 13. Bayesian PMF • Bayesian PMF evaluation • predictive distribution: • MCMC method: samples are generated by a Markov chain whose stationary distribution is the posterior distribution over the model parameters • $p(U, V, \Theta_U, \Theta_V \mid R, \Theta_0)$ • $p(U, V \mid R, \Theta_U, \Theta_V)\, p(\Theta_U, \Theta_V \mid \Theta_0)$
  • 14. Bayesian PMF • Bayesian PMF inference algorithm • Gibbs sampling algorithm: $p(U_i \mid R, V, \Theta_U, \Theta_0) \propto p(U_i \mid \Theta_U) \prod_j p(R_{ij} \mid U_i, V_j, \Theta_V, \Theta_0)$, $p(V_i \mid R, U, \Theta_V, \Theta_0)$, $\ldots$ • matrix inversion has complexity $O(D^3)$
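The predictive-distribution equation itself did not survive in the transcript. For orientation, the form given in the Salakhutdinov and Mnih paper listed on the Reference slide is reproduced below, together with its Monte Carlo approximation from $K$ MCMC samples. The symbols $\Theta_U$, $\Theta_V$, $\Theta_0$ and $K$ follow that paper and are assumed to match the slide's notation.

```latex
\begin{aligned}
p(R^*_{ij} \mid R, \Theta_0)
  &= \iint p(R^*_{ij} \mid U_i, V_j)\,
     p(U, V \mid R, \Theta_U, \Theta_V)\,
     p(\Theta_U, \Theta_V \mid \Theta_0)\,
     d\{U, V\}\, d\{\Theta_U, \Theta_V\} \\
  &\approx \frac{1}{K} \sum_{k=1}^{K} p\big(R^*_{ij} \mid U_i^{(k)}, V_j^{(k)}\big)
\end{aligned}
```

Each sample $\{U^{(k)}, V^{(k)}\}$ comes from the Markov chain, so the prediction averages over parameter uncertainty instead of committing to a single MAP estimate.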
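To show where the $O(D^3)$ cost comes from, here is a sketch of one Gibbs update for a single user vector. It assumes the Gaussian conditional of the Salakhutdinov and Mnih BPMF paper; the names `alpha` (rating precision), `mu_U` and `Lambda_U` (prior mean and precision) come from that paper, not from this transcript, and the function names are made up for the example.

```python
import numpy as np

def user_conditional(rated_V, rated_r, mu_U, Lambda_U, alpha):
    """Mean and covariance of the Gaussian conditional p(U_i | everything else).

    rated_V: (n_i, D) factors of the movies this user rated
    rated_r: (n_i,)   the corresponding ratings
    """
    precision = Lambda_U + alpha * rated_V.T @ rated_V
    cov = np.linalg.inv(precision)  # D x D inversion: the O(D^3) step, repeated per user
    mean = cov @ (alpha * rated_V.T @ rated_r + Lambda_U @ mu_U)
    return mean, cov

def sample_user_vector(rated_V, rated_r, mu_U, Lambda_U, alpha, rng):
    """Draw a new U_i from its conditional distribution."""
    mean, cov = user_conditional(rated_V, rated_r, mu_U, Lambda_U, alpha)
    return rng.multivariate_normal(mean, cov)
```

One Gibbs sweep repeats this for every user and every movie, and each update reads all ratings of that user or movie.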
  • 15. drawback of MCMC • each iteration of MCMC requires computations over the whole dataset • each round of sampling requires expensive computation
  • 16. Stochastic Gradient Langevin Dynamics • stochastic optimization on the posterior distribution to find the MAP parameters operates as follows: • the general idea: approximate the full-data gradient with a gradient computed on a subset of the data • equation labels: prior distribution, likelihood, data subset, step size
  • 17. Stochastic Gradient Langevin Dynamics • MCMC using Langevin dynamics • Langevin diffusion: $d\theta_t = \frac{1}{2} \nabla \log \pi(\theta_t)\, dt + dw_t$, whose stationary distribution is $\pi(\theta)$ • substituting the posterior distribution and applying the Euler-Maruyama discretization gives: • reducing the step size significantly reduces the discretization error
  • 18. Stochastic Gradient Langevin Dynamics • the two algorithms look very similar • combining these two ideas
  • 19. Stochastic Gradient Langevin Dynamics • based on a rigorous proof by [Qi He, Jack Xin 2012]: as $t \to \infty$, the generated sequence converges to the true posterior distribution.
  • 20. Stochastic Gradient Langevin Dynamics • whether the algorithm behaves like SG or LD depends on whether the SG noise or the LD noise dominates the stochasticity • when $\epsilon_t$ is large: SG noise dominates • when $\epsilon_t$ is small: LD noise dominates
  • 21. Distributed BPMF using SGLD • we will only focus on sampling from: $p(U, V \mid R, \Theta)$ • Remember from previous slides, we have:
  • 22. Distributed BPMF using SGLD • suppose the rating matrix is represented as: • the gradient of the log-posterior w.r.t. U is:
  • 23. Distributed BPMF using SGLD • we want to update only the parameters of users who have ratings in the mini-batch • we find an unbiased estimate of this gradient that only needs to update users in the mini-batch: • this way, when updating parameters with one block of data, only the parameters of users in that block need to be computed
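The update equation on this slide was an image and is missing from the transcript. In the notation of the Welling SGLD paper listed on the Reference slide, the stochastic optimization step has the form below; the symbols ($N$ data points, a subset of size $n$, step size $\epsilon_t$) are taken from that paper and are assumed to match the slide.

```latex
\Delta\theta_t = \frac{\epsilon_t}{2}
  \left( \nabla \log p(\theta_t)
       + \frac{N}{n} \sum_{i=1}^{n} \nabla \log p(x_{ti} \mid \theta_t) \right)
```

Here $p(\theta_t)$ is the prior, $p(x_{ti} \mid \theta_t)$ the likelihood of one point in the data subset, and the factor $N/n$ rescales the subset gradient so that it estimates the full-data gradient.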
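A small numerical sketch of the Euler-Maruyama discretization, $\theta_{t+1} = \theta_t + \frac{\epsilon}{2} \nabla \log \pi(\theta_t) + \sqrt{\epsilon}\, \eta_t$ with $\eta_t \sim N(0, 1)$, run on many independent chains at once. The function name and the standard normal target used in the usage note are assumptions chosen to keep the example checkable; no Metropolis correction is applied.

```python
import numpy as np

def langevin_chains(grad_log_pi, theta0, step, n_steps, rng):
    """Run unadjusted Langevin dynamics; theta0 holds one start value per chain."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=theta.shape)          # eta_t ~ N(0, 1)
        theta = theta + 0.5 * step * grad_log_pi(theta) + np.sqrt(step) * noise
    return theta
```

For a standard normal target, `grad_log_pi = lambda th: -th`, and the discretized chain has stationary variance $1 / (1 - \epsilon/4)$ instead of 1. That is the discretization error, and it shrinks as the step size decreases.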
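Putting the two ideas together gives the SGLD update: a mini-batch gradient step plus injected Gaussian noise whose variance equals the step size. The sketch below applies it to a toy problem that is not from the slides (the mean of Gaussian data with unit variance and a zero-mean Gaussian prior), so that the result can be compared with the exact posterior. The schedule $\epsilon_t = a (b + t)^{-\gamma}$ and its default values are assumptions.

```python
import numpy as np

def sgld_gaussian_mean(x, prior_var, batch_size, n_steps, rng, a=0.01, b=1.0, gamma=0.55):
    """SGLD samples of theta for x_i ~ N(theta, 1) with prior theta ~ N(0, prior_var)."""
    N = len(x)
    theta, samples = 0.0, []
    for t in range(n_steps):
        eps = a * (b + t) ** (-gamma)                  # decaying step size
        batch = x[rng.integers(0, N, size=batch_size)]  # mini-batch drawn with replacement
        grad_prior = -theta / prior_var
        grad_lik = (N / batch_size) * np.sum(batch - theta)  # rescaled mini-batch gradient
        theta += 0.5 * eps * (grad_prior + grad_lik) + np.sqrt(eps) * rng.normal()
        samples.append(theta)
    return np.array(samples)
```

Early on the step size is large and the mini-batch (SG) noise dominates; as the step size decays, the injected (LD) noise takes over and the iterates behave like posterior samples. The exact posterior mean for this toy model is `x.sum() / (len(x) + 1 / prior_var)`, which the average of the later samples should approach.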
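The estimator's formula is not in the transcript, so the following is one possible construction rather than the slide's exact expression. It assumes a Gaussian likelihood with precision `tau`, a zero-mean Gaussian prior with precision `lam`, and mini-batches drawn uniformly with replacement. The likelihood part is rescaled by N/|M|, and the prior part is divided by the probability `h[i]` that user i appears in the mini-batch, which keeps the estimate unbiased while touching only users in the batch.

```python
import numpy as np

def full_grad_U(U, V, ratings, tau, lam):
    """Exact gradient of the log-posterior w.r.t. every user vector."""
    G = -lam * U
    for i, j, r in ratings:
        G[i] += tau * (r - U[i] @ V[j]) * V[j]
    return G

def minibatch_grad_U(U, V, ratings, batch, tau, lam):
    """Gradient estimate that is non-zero only for users present in the batch."""
    N, m = len(ratings), len(batch)
    counts = np.bincount([i for i, _, _ in ratings], minlength=len(U))
    h = 1.0 - (1.0 - counts / N) ** m        # P(user i appears in the batch)
    G = np.zeros_like(U)
    for i, j, r in batch:
        G[i] += (N / m) * tau * (r - U[i] @ V[j]) * V[j]
    for i in {i for i, _, _ in batch}:
        G[i] -= lam * U[i] / h[i]            # prior term, corrected by 1 / h_i
    return G
```

Averaging `minibatch_grad_U` over all possible mini-batches reproduces `full_grad_U` for every user with at least one rating, which is what unbiasedness means here.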
  • 24. Distributed BPMF using SGLD • intuitive explanation of the approach • rating matrix (users × movies), listed as (user, movie, rating) triples: (1, 3, 5), (2, 1, 3), (2, 2, 1), (2, 4, 2), (3, 2, 2), (3, 3, 4), (4, 1, 2), (4, 4, 1), (5, 1, 4), (5, 3, 5), (6, 2, 1), (6, 4, 3)
  • 25. Distributed BPMF using SGLD • run two chains in parallel with parameters: $\theta^1 = \{U^1_1, V^1_1\}$, $\theta^2 = \{U^2_1, V^2_1\}$ • assume the latent features have dimension 2: $U^1_1 = \begin{pmatrix} U_{11} & \cdots & U_{61} \\ U_{12} & \cdots & U_{62} \end{pmatrix}^{T}$, $V^1_1 = \begin{pmatrix} V_{11} & \cdots & V_{14} \\ V_{21} & \cdots & V_{24} \end{pmatrix}$
  • 26. Distributed BPMF using SGLD • divide the rating matrix into 4 blocks: the gray blocks form the 1st group, the white blocks form the 2nd group • start 4 workers, one per block: worker1, worker2, worker3, worker4 • worker1 and worker2 share $\theta^1$; worker3 and worker4 share $\theta^2$
  • 27. Distributed BPMF using SGLD • the rating data is partitioned into 4 blocks of (user, movie, rating) triples • worker1: (2, 1, 3), (2, 2, 1), (3, 2, 2) • worker3: (1, 3, 5), (2, 4, 2), (3, 3, 4) • worker4: (4, 1, 2), (5, 1, 4), (6, 2, 1) • worker2: (4, 4, 1), (5, 3, 5), (6, 4, 3)
  • 28. Distributed BPMF using SGLD • each worker (e.g. worker1) works as follows: 1. sample a mini-batch from worker1’s rating data (2, 1, 3), (2, 2, 1), (3, 2, 2), assuming the size is 2: M = {(2, 1, 3), (2, 2, 1)} 2. for each user i and movie j in M, update in parallel using the following rules:
  • 29. Distributed BPMF using SGLD • each worker (e.g. worker1) works as follows: 1. worker1 updates U2, U3, V1, V2 using mini-batch data M from its rating data (2, 1, 3), (2, 2, 1), (3, 2, 2) 2. similarly, worker2 updates U4, U5, U6, V3, V4 using its mini-batch data
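The 2 × 2 partition can be reproduced with a short sketch. The function names, the split points (users 1 to 3 versus 4 to 6, movies 1 to 2 versus 3 to 4), and the reading of each triple as (user, movie, rating) are inferred from the numbers on the slides and should be treated as assumptions.

```python
def partition_into_blocks(ratings, user_split, movie_split):
    """Assign each (user, movie, rating) triple to one of four blocks.

    Block key (a, b): a = 1 if user > user_split, b = 1 if movie > movie_split.
    """
    blocks = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for user, movie, rating in ratings:
        key = (int(user > user_split), int(movie > movie_split))
        blocks[key].append((user, movie, rating))
    return blocks

def group_blocks(blocks):
    """Blocks in the same group share no user and no movie."""
    diagonal = [blocks[(0, 0)], blocks[(1, 1)]]      # 1st group
    off_diagonal = [blocks[(0, 1)], blocks[(1, 0)]]  # 2nd group
    return diagonal, off_diagonal
```

With `user_split=3` and `movie_split=2`, block `(0, 0)` holds the three ratings shown for worker1 on the slides. Because the two blocks inside a group touch disjoint sets of users and movies, their workers can update parameters at the same time without conflicts.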
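A worker's step might look like the sketch below. The slide's update rules were not captured, so this uses a plain SGLD update under assumed Gaussian likelihood (precision `tau`) and prior (precision `lam`), with indices counted from 0 and without the prior correction an exactly unbiased estimator would need. The function name and all parameters are hypothetical.

```python
import numpy as np

def worker_sgld_step(U, V, batch, n_block, eps, tau, lam, rng):
    """Update, in place, only the rows of U and V that occur in the mini-batch.

    batch: list of (user, movie, rating) triples from this worker's block
    n_block: number of ratings in the worker's block (rescales the gradient)
    """
    m = len(batch)
    grad_U, grad_V = {}, {}
    for i, j, r in batch:
        err = r - U[i] @ V[j]
        # Start each gradient from the prior term, then add the rescaled likelihood term.
        grad_U[i] = grad_U.get(i, -lam * U[i]) + (n_block / m) * tau * err * V[j]
        grad_V[j] = grad_V.get(j, -lam * V[j]) + (n_block / m) * tau * err * U[i]
    for i, g in grad_U.items():
        U[i] += 0.5 * eps * g + np.sqrt(eps) * rng.normal(size=U.shape[1])
    for j, g in grad_V.items():
        V[j] += 0.5 * eps * g + np.sqrt(eps) * rng.normal(size=V.shape[1])
    return set(grad_U), set(grad_V)
```

The returned index sets make the parallelism argument visible: two workers from the same group return disjoint sets, so their in-place updates never collide.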
  • 30. Distributed BPMF using SGLD • experiments compared five different algorithms: • dataset:
  • 31. Distributed BPMF using SGLD • results: Netflix dataset Yahoo music dataset
  • 32. Reference • Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo. Ruslan Salakhutdinov, Andriy Mnih. University of Toronto. ICML 2008. • Probabilistic Matrix Factorization. Emily Fox. University of Washington. Machine Learning for Big Data, 2014. • Bayesian Learning via Stochastic Gradient Langevin Dynamics. Max Welling. UC Irvine. ICML 2011. • Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC. Sungjin Ahn et al. KDD 2015. • Bayesian Posterior Inference in Big Data Arena. Max Welling. ICML 2014 tutorial. • Hybrid Deterministic-Stochastic Gradient Langevin Dynamics for Bayesian Learning. Qi He, Jack Xin. Communications in Information and Systems, 2012.