LEARNING THE STRUCTURE
OF DEEP SPARSE GRAPHICAL
         MODELS
   Ryan P. Adams, Hanna M. Wallach and Zoubin Ghahramani



               Presented by Justinas Mišeikis
             Supervisor: Alexander Vezhnevets




                             1
DEEP BELIEF NETWORKS
             • Deep belief networks
               consist of multiple layers
             • Each layer is made up of
               visible or hidden nodes
             • Visible nodes appear only in
               the outermost layer and
               represent the observed data
             • Nodes are linked by
               directed edges
             • The result is a graphical model




         2
DEEP BELIEF NETWORKS



              [Figure: example deep belief network — several hidden layers above a single visible layer]

         3
DEEP BELIEF NETWORKS
             Properties:

             • Number of layers
             • Number of nodes in each layer
             • Network connectivity: connections
               are allowed only between
               consecutive layers
             • Node types: binary or
               continuous?




         4
THE PROBLEM - DBN STRUCTURE
• What is the best structure for a DBN?
   - Number of hidden units in each layer
   - Number of hidden layers
   - Types of unit behaviour
   - Connectivity
• The paper presents a non-parametric Bayesian approach for
  learning the structure of a layered DBN




                                9
FINITE SINGLE LAYER NETWORK
Network connectivity is represented using binary matrices
(a small NumPy example follows below):
• Columns and rows represent nodes
• Zero (unfilled) - no connection
• One (filled) - a connection

[Figure: a network of visible and hidden units alongside its binary connectivity matrix]
                                 10
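To make the representation concrete, here is a minimal sketch (not from the paper; the matrix and its values are hypothetical) of encoding such connectivity in a binary matrix with NumPy, where rows index visible units and columns index hidden units:

```python
import numpy as np

# Hypothetical example: 4 visible units, 3 hidden units.
# Z[i, k] = 1 means visible unit i receives an edge from hidden unit k.
Z = np.zeros((4, 3), dtype=int)
Z[0, 0] = 1   # visible 0 <- hidden 0
Z[1, 0] = 1   # visible 1 <- hidden 0
Z[1, 2] = 1   # visible 1 <- hidden 2
Z[3, 1] = 1   # visible 3 <- hidden 1

print(Z)
# Parents of visible unit 1 are the non-zero columns of its row:
print(np.flatnonzero(Z[1]))   # -> [0 2]
```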
FINITE SINGLE LAYER NETWORK
• The network dimensions for a prior have to be defined in advance
• How many hidden units should there be?
   - Not known in advance
• Can we allow an unbounded number of hidden units?
• Solution: the Indian Buffet Process




                                15
THE INDIAN BUFFET PROCESS
The Indian buffet process (IBP) is a stochastic process defining a
probability distribution over equivalence classes of sparse binary
matrices with a finite number of rows and an unbounded number
of columns. *
• Rows - customers (visible layer), a finite number of units
• Columns - dishes (hidden layer), an unbounded, countable number
  of units
• The IBP yields sparse matrices whose posterior has a finite
  number of non-zero columns, while the matrix may grow
  column-wise without bound during learning
* Thomas L. Griffiths, Zoubin Ghahramani. The Indian Buffet Process: An Introduction and Review. 2011
http://jmlr.csail.mit.edu/papers/volume12/griffiths11a/griffiths11a.pdf

                                                    16
THE INDIAN BUFFET PROCESS
[Figure: binary matrix being built — rows are customers, columns are dishes; the 1st customer tries 2 new dishes]

Parameters: α and β
η_k - the number of previous customers that have tried dish k

The jth customer:
• tries previously tasted dish k with probability η_k / (j + β - 1)
• tries a number of new dishes drawn from a Poisson distribution
  with parameter αβ / (j + β - 1)
                                  17
THE INDIAN BUFFET PROCESS
[Figure: the completed binary matrix after five customers]
• 1st customer tries 2 new dishes
• 2nd customer tries 1 old dish + 2 new
• 3rd customer tries 2 old dishes + 1 new
• 4th customer tries 2 old dishes + 2 new
• 5th customer tries 4 old dishes + 2 new

If no more customers arrive, the resulting binary matrix defines
the structure of the network (see the sampler sketch below).


                              22
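As a concrete illustration of this generative process, here is a minimal sketch of a two-parameter IBP sampler following the probabilities given above (an illustrative sketch, not the authors' code; the function and variable names are ours):

```python
import numpy as np

def sample_ibp(num_customers, alpha, beta, rng=None):
    """Sample a binary customer-by-dish matrix from the two-parameter IBP."""
    rng = np.random.default_rng(rng)
    dish_counts = []                      # eta_k: previous customers that tried dish k
    rows = []
    for j in range(1, num_customers + 1):
        row = []
        # Previously tasted dish k is taken with probability eta_k / (j + beta - 1).
        for k, eta_k in enumerate(dish_counts):
            take = rng.random() < eta_k / (j + beta - 1)
            row.append(int(take))
            dish_counts[k] += int(take)
        # New dishes: Poisson with parameter alpha * beta / (j + beta - 1).
        n_new = rng.poisson(alpha * beta / (j + beta - 1))
        row.extend([1] * n_new)
        dish_counts.extend([1] * n_new)
        rows.append(row)
    # Pad earlier rows with zeros so every row has the same number of columns.
    Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
    for j, row in enumerate(rows):
        Z[j, :len(row)] = row
    return Z

Z = sample_ibp(num_customers=5, alpha=2.0, beta=1.0, rng=0)
print(Z)            # rows: customers, columns: dishes
```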
MULTI LAYER NETWORK
• Single-layer: hidden units are a priori independent
• Multi-layer: hidden units can be dependent
• Solution: extend the IBP to an unbounded number of layers
  -> a deep belief network with unbounded width and depth



While a belief network with an infinitely-wide hidden layer can represent any probability
distribution arbitrarily closely, it is not necessarily a useful prior on such distributions.
Without intra-layer connections, the hidden units are independent a priori. This
“shallowness” is a strong assumption that weakens the model in practice and the
explosion of recent literature on deep belief networks speaks to the empirical success of
belief networks with more hidden structure.



                                             23
CASCADING IBP
• The cascading Indian buffet process (CIBP) builds a prior on belief
  networks that are unbounded in both width and depth
• The prior has the following properties:
   - Each of the “dishes” in the restaurant of layer m is
     also a “customer” in the restaurant of layer m+1
   - Columns of the layer-m binary matrix correspond to rows of
     the layer-(m+1) binary matrix
• The matrices in the CIBP are constructed in sequence, starting
  with m = 0, the visible layer
• The number of non-zero columns in matrix m+1 is determined
  entirely by the active (non-zero) columns of the previous matrix m

                                 24
CASCADING IBP
• Layer 1 has 5 customers who taste 5 dishes in total
• Layer 2 ‘inherits’ 5 customers <- the 5 dishes of the previous layer
• These 5 customers in layer 2 taste 7 dishes in total
• Layer 3 ‘inherits’ 7 customers <- the 7 dishes of the previous layer
• The cascade continues until a layer’s customers taste zero dishes
  (a code sketch follows below)

[Figure: the binary matrices of Layer 1, Layer 2, Layer 3, ...]

                                  29
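The cascade can be sketched directly on top of the `sample_ibp` sketch above (again an illustrative sketch under our own naming, not the authors' implementation): each layer's dish count becomes the next layer's customer count, and sampling stops once a layer produces no dishes.

```python
import numpy as np

def sample_cibp_structure(num_visible, alpha, beta, rng=None):
    """Sample a list of binary connectivity matrices, one per layer, from the CIBP."""
    rng = np.random.default_rng(rng)
    matrices = []
    num_customers = num_visible           # layer m = 0: the visible units
    while num_customers > 0:
        Z = sample_ibp(num_customers, alpha, beta, rng=rng)   # reuses the sketch above
        if Z.shape[1] == 0:               # no dishes tasted -> absorbing state reached
            break
        matrices.append(Z)
        num_customers = Z.shape[1]        # dishes of layer m become customers of layer m+1
    return matrices

layers = sample_cibp_structure(num_visible=5, alpha=1.0, beta=1.0, rng=0)
print([Z.shape for Z in layers])          # widths of the successive hidden layers
```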
CIBP PARAMETERS
• Two main parameters: α and β
• α - defines the expected in-degree of each unit, i.e. its expected
  number of parents
• β - controls the expected out-degree, or number of children, as a
  function of K(m)
• K(m) is the number of columns in layer m
• α and β are layer-specific rather than constant across the whole
  network; they can be written as α(m) and β(m)




                                30
CIBP CONVERGENCE
• Does the CIBP eventually converge to a finite-depth DBN?
   - Yes!
• How?
   - The layer widths K(m) form a Markov chain whose transition
     distribution is simply a Poisson with mean λ(K(m); α, β)
• The absorbing state, in which no ‘dishes’ are tasted, is always
  reached
• Full mathematical proof of the convergence is given in the
  appendix of the paper

                                 31
CIBP CONVERGENCE

[Figure: α = 3, β = 1]
           32
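To see this convergence empirically, the cascade sketch above can be run repeatedly and the resulting depths tallied (purely illustrative; `sample_cibp_structure` is the hypothetical helper defined earlier, and the exact numbers depend on the random seeds):

```python
from collections import Counter

depths = Counter()
for seed in range(1000):
    layers = sample_cibp_structure(num_visible=5, alpha=3.0, beta=1.0, rng=seed)
    depths[len(layers)] += 1

print(sorted(depths.items()))   # histogram of sampled network depths - always finite
```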
CIBP BASED PRIOR SAMPLES

[Figure: network structures sampled from the CIBP-based prior]
           33
NODE TYPES
• The nonlinear Gaussian belief network (NLGBN) framework is used:
  a unit’s pre-sigmoid value u is its weighted activation sum y plus
  zero-mean Gaussian noise with precision ν
• The noisy sum is then passed through the sigmoid function σ(∙)
  (a small simulation sketch follows below)
• Black line: distribution for a pre-sigmoid mean of zero
• Blue line: pre-sigmoid mean of -1
• Red line: pre-sigmoid mean of +1

[Figure: resulting activation distributions for binary, Gaussian and deterministic unit behaviour]
                                34
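A minimal sketch of how a single NLGBN unit could be simulated under this description (our own illustrative code and naming, not the paper's):

```python
import numpy as np

def nlgbn_unit(parent_activations, weights, bias, noise_precision, rng=None):
    """Activation of one nonlinear Gaussian belief network unit."""
    rng = np.random.default_rng(rng)
    y = float(np.dot(weights, parent_activations)) + bias              # weighted activation sum
    y_noisy = y + rng.normal(0.0, 1.0 / np.sqrt(noise_precision))      # Gaussian noise, precision nu
    return 1.0 / (1.0 + np.exp(-y_noisy))                              # sigmoid squashing

# High precision -> nearly deterministic sigmoid unit; low precision -> nearly binary output.
print(nlgbn_unit([0.3, 0.9], weights=[1.5, -0.7], bias=0.1, noise_precision=100.0, rng=0))
print(nlgbn_unit([0.3, 0.9], weights=[1.5, -0.7], bias=0.1, noise_precision=0.01, rng=0))
```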
INFERENCE: JOINT DISTRIBUTION
[Equation: the joint distribution over the input data, activations, weight matrices,
bias weights and Gaussian noise precisions, annotated by layer number, number of
in-layer units, number of observations and the NLGBN distribution]
                                35
MARKOV CHAIN MONTE CARLO




* Christophe Andreu, Nando de Freitas, Arnaud Doucet, Michael I. Jordan. An Introduction to MCMC for
Machine Learning. 2003.
http://www.cs.princeton.edu/courses/archive/spr06/cos598C/papers/AndrieuFreitasDoucetJordan2003.pdf

                                                 36
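The slide itself only cites the MCMC tutorial above; as a reminder of the basic machinery, a generic Metropolis-Hastings step looks roughly like this (illustrative sketch with a toy target, not the paper's specific samplers):

```python
import numpy as np

def metropolis_hastings_step(x, log_target, propose, rng):
    """One Metropolis-Hastings step with a symmetric proposal."""
    x_new = propose(x, rng)
    log_accept = log_target(x_new) - log_target(x)      # symmetric proposal terms cancel
    if np.log(rng.random()) < log_accept:
        return x_new
    return x

# Example: sample a standard normal with a random-walk proposal.
rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * x * x
propose = lambda x, rng: x + rng.normal(0.0, 0.5)
x, samples = 0.0, []
for _ in range(5000):
    x = metropolis_hastings_step(x, log_target, propose, rng)
    samples.append(x)
print(np.mean(samples), np.std(samples))   # roughly 0 and 1
```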
INFERENCE
• Task: find the posterior distribution over the structure and the
  parameters of the network
• Conditioning is used to update the model part-by-part
  rather than modifying the whole model at every step
• The process is split into four parts:
   - Edges: sample each edge’s weight from its posterior distribution
   - Activations: sample from the posterior distributions over the
     activations and the Gaussian noise precisions
   - Structure: sample the ancestors of the visible units
   - Parameters: closely tied to the hyper-parameters



                                     37
SAMPLING FROM THE STRUCTURE
[Figure: a two-layer example (Layer 1, Layer 2); per-column counts in the example: 1 2 1 1 0]

First phase:
• For each layer m
• For each unit k in the layer
• Check each connected unit in layer m+1, indexed by k’
• Count the non-zero entries in the k’th column of the binary
  matrix, excluding the entry in the kth row
• If that count is zero, unit k’ is a singleton parent (a helper for
  this check is sketched below)

                                46
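The singleton-parent check described above is easy to express on the binary matrices directly; a minimal sketch (our own illustrative helper and example matrix, not the paper's code):

```python
import numpy as np

def singleton_parents(Z, k):
    """Columns k' of binary matrix Z that connect only to row k (singleton parents of unit k)."""
    # Count the non-zero entries in each column, excluding the entry in row k.
    counts = Z.sum(axis=0) - Z[k]
    # A connected column whose remaining count is zero has unit k as its only child.
    return np.flatnonzero((Z[k] == 1) & (counts == 0))

# Hypothetical layer-m matrix: rows = units in layer m, columns = units in layer m+1.
Z = np.array([[1, 1, 0, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 0, 1, 0]])
print(singleton_parents(Z, k=0))   # columns connected to unit 0 and to no other unit -> [4]
```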
SAMPLING FROM THE STRUCTURE
[Figure: the same two-layer example]

Second phase:
• Considers only the singletons
• Option a: add a new parent
• Option b: delete the connection to child k
• Decisions are made by a Metropolis-Hastings operator using a
  birth/death process
• In the end, units that are not ancestors of the visible
  units are discarded

                                47
EXPERIMENTS
• Three image datasets were used for the experiments:
   - Olivetti faces
   - MNIST digit data
   - Frey faces
• Performance test: image reconstruction
• The bottom halves of the images were removed and the model had to
  reconstruct the missing data by ‘seeing’ only the top half
  (a sketch of the split follows below)
• A top-bottom split was chosen instead of left-right because both
  faces and digits are left-right symmetric, which would have made
  the task easier


                               48
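For reference, the masking set-up amounts to hiding the lower half of each image and asking the model to fill it in; a trivial NumPy sketch of the split (illustrative only, with placeholder data):

```python
import numpy as np

image = np.random.rand(64, 64)          # e.g. one 64x64 Olivetti face (placeholder data)
top_half = image[:32, :]                # observed by the model
bottom_half_true = image[32:, :]        # held out; the model must reconstruct this part

# A reconstruction would be scored against bottom_half_true, e.g. by mean squared error.
```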
OLIVETTI FACES
350 + 50 images of 40 distinct subjects, 64x64
~3 hidden layers: around 70 units in each layer




                      49
OLIVETTI FACES
Raw predictive fantasies from the model




                  50
MNIST DIGIT DATA
        50 + 10 images of 10 digits, 28x28
~3 hidden layers: 120, 100, 70 units in hidden layers




                         51
FREY FACES
1865 + 100 images of a single face, different expressions, 20x28
      ~3 hidden layers: 260, 120, 35 units in hidden layers




                               52
DISCUSSION
• Addresses the structure-selection issues of deep belief networks
• Unites two areas of research: nonparametric Bayesian methods
  and deep belief networks
• Introduces the cascading Indian buffet process to allow an
  unbounded number of layers
• The CIBP always converges to a finite structure
• Result: the algorithm learns the effective model complexity




                                53
DISCUSSION
• Very processor-intensive algorithm
  - finding the reconstructions took ‘a few hours of CPU time’
• Is it much better than fixed-dimensionality DBNs?




                               54
THANK YOU!



    55
