Forest Learning from Data
Joe Suzuki
July 17, 2017
Road Map
PART-II: July 24, 2017 (based on PART-I)
1. Estimating Mutual Information (15 mins)
2. Learning Forests from Data (25 mins)
3. Learning Bayesian Networks from Data (5 mins)
4. Exercise (45 mins)
PART-I: July 17, 2017
A Bayesian Approach to Data Compression
Entropy
Mutual Information (MI)
Correlation may not detect independence!
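A small worked example may make this concrete. The sketch below assumes a toy distribution that is not from the slides: X is uniform on {-1, 0, 1} and Y = X^2. The covariance is zero although Y is a function of X, while the mutual information (in nats) is strictly positive.

```r
# Toy joint distribution (an assumption for illustration):
# X uniform on {-1, 0, 1}, Y = X^2, so Y is fully determined by X.
x_vals <- c(-1, 0, 1)
y_vals <- c(0, 1)
joint <- matrix(0, nrow = 3, ncol = 2,
                dimnames = list(as.character(x_vals), as.character(y_vals)))
joint["-1", "1"] <- 1 / 3
joint["0", "0"] <- 1 / 3
joint["1", "1"] <- 1 / 3

px <- rowSums(joint)   # marginal of X
py <- colSums(joint)   # marginal of Y

# Covariance E[XY] - E[X]E[Y], computed exactly from the joint table.
covariance <- sum(outer(x_vals, y_vals) * joint) -
  sum(x_vals * px) * sum(y_vals * py)

# Mutual information: sum over cells with positive probability of
# P(x,y) * log( P(x,y) / (P(x) P(y)) ).
indep <- outer(px, py)
pos <- joint > 0
mi_exact <- sum(joint[pos] * log(joint[pos] / indep[pos]))

print(c(covariance = covariance, mi = mi_exact))
```

Here the mutual information equals the entropy of Y, because Y is determined by X.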
ML Estimator of MI
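A minimal base-R sketch of a plug-in (maximum likelihood) estimate, assuming discrete data and natural logarithms: relative frequencies replace the true probabilities in the definition of MI. The function name `mi_plugin` is hypothetical and is not part of BNSL.

```r
# Plug-in MI estimate in nats: empirical frequencies stand in for P(x,y),
# P(x), and P(y). `mi_plugin` is a hypothetical helper, not a BNSL function.
mi_plugin <- function(x, y) {
  stopifnot(length(x) == length(y))
  joint <- table(x, y) / length(x)   # empirical joint distribution
  px <- rowSums(joint)               # empirical marginal of x
  py <- colSums(joint)               # empirical marginal of y
  indep <- outer(px, py)             # product of the marginals
  pos <- joint > 0                   # skip empty cells (0 * log 0 = 0)
  sum(joint[pos] * log(joint[pos] / indep[pos]))
}

set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.5)
print(mi_plugin(x, y))
```

The plug-in estimate is never negative, and it is zero only when the empirical joint distribution factorizes exactly, so on its own it cannot report independence from a finite sample.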
Bayesian Testing of Independence
Bayesian Estimation of MI
From Stirling’s formula
For large n
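One way to sketch a Bayesian estimate is J_n = (1/n) log [ Q(x,y) / (Q(x) Q(y)) ], where Q is a Bayesian marginal likelihood. The code below assumes a symmetric Dirichlet prior with parameter a = 1/2 and takes the alphabet sizes from the observed values; both choices are assumptions, and BNSL's `mi` may be implemented differently. Under these assumptions, a Stirling-type approximation gives J_n ≈ I_n - (α-1)(β-1) log(n) / (2n) for alphabet sizes α and β, so J_n can be negative.

```r
# Log marginal likelihood of a count vector under a symmetric Dirichlet(a)
# prior. a = 0.5 is an assumption made for this sketch.
log_marginal <- function(counts, a = 0.5) {
  k <- length(counts)
  n <- sum(counts)
  lgamma(k * a) - lgamma(n + k * a) + sum(lgamma(counts + a) - lgamma(a))
}

# Bayesian MI estimate J_n in nats. `mi_bayes` is a hypothetical helper;
# alphabet sizes are inferred from the values observed in x and y.
mi_bayes <- function(x, y, a = 0.5) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  cxy <- table(x, y)
  (log_marginal(as.vector(cxy), a) -
     log_marginal(rowSums(cxy), a) -
     log_marginal(colSums(cxy), a)) / n
}

x <- rep(c(0, 1), 100)         # 200 binary symbols
y <- rep(c(0, 0, 1, 1), 50)    # empirically independent of x
print(mi_bayes(x, y))          # negative: independence is favoured
print(mi_bayes(x, x))          # positive: x determines itself
```

A negative value can be read as evidence for independence, which the plug-in estimate cannot express.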
Experiments: 500 trials for binary sequences of length n=200
BNSL: a CRAN package (J. Suzuki and J. Kawahara, 2017)
Bayesian Network Structure Learning
https://cran.r-project.org/web/packages/BNSL/index.html
collects research results by Joe Suzuki.
install.packages("BNSL")
library(BNSL)
n=200; p=0.5; x=rbinom(n,1,p); y=rbinom(n,1,p) # generate two independent binary sequences
mi(x,y, proc=9) # I_n
mi(x,y) # J_n
Tree Approximation
Factorization w.r.t. A Tree
Find E s.t. D(P||P’) is minimized
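The minimization can be written out as follows. The notation (vertex set V, variables X_1, ..., X_N, entropy H, mutual information I) is assumed here, not taken from the slides. Only the last term depends on E, so minimizing D(P||P') amounts to maximizing the sum of mutual information over the edges of the tree.

```latex
P'(x_1,\dots,x_N)
  = \prod_{i \in V} P(x_i)
    \prod_{\{i,j\} \in E} \frac{P(x_i, x_j)}{P(x_i)\,P(x_j)}

\begin{aligned}
D(P \,\|\, P')
  &= \sum_{x} P(x) \log \frac{P(x)}{P'(x)} \\
  &= -H(X_1,\dots,X_N) + \sum_{i \in V} H(X_i)
     - \sum_{\{i,j\} \in E} I(X_i; X_j)
\end{aligned}
```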
Kruskal’s Algorithm
Chow-Liu Algorithm
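The two steps can be combined in a short base-R sketch: given a symmetric matrix of pairwise MI values, Kruskal's algorithm picks edges in decreasing order of weight and skips any edge that would close a cycle. The function `kruskal_forest` and the weight matrix are hypothetical and do not reproduce BNSL's `kruskal`. Stopping at non-positive weights is an assumption that yields a forest rather than a spanning tree when some estimates are zero or negative.

```r
# Maximum-weight spanning forest by Kruskal's algorithm.
# w: symmetric p x p weight matrix (e.g. pairwise MI estimates).
# Returns a two-column matrix of edges (i, j) with i < j.
kruskal_forest <- function(w) {
  p <- nrow(w)
  parent <- seq_len(p)                       # union-find parent pointers
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  pairs <- which(upper.tri(w), arr.ind = TRUE)   # all pairs with i < j
  pairs <- pairs[order(w[pairs], decreasing = TRUE), , drop = FALSE]
  edges <- matrix(integer(0), ncol = 2)
  for (k in seq_len(nrow(pairs))) {
    i <- unname(pairs[k, 1])
    j <- unname(pairs[k, 2])
    if (w[i, j] <= 0) break                  # remaining weights are not positive
    ri <- find_root(i)
    rj <- find_root(j)
    if (ri != rj) {                          # the edge joins two components
      parent[ri] <- rj
      edges <- rbind(edges, c(i, j))
    }
  }
  edges
}

# Hypothetical MI matrix on four variables.
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 0.5
w[2, 3] <- w[3, 2] <- 0.4
w[1, 3] <- w[3, 1] <- 0.3
w[3, 4] <- w[4, 3] <- 0.1
edges <- kruskal_forest(w)
print(edges)
```

The edge between variables 1 and 3 is skipped because 1 and 3 are already connected through 2 when it is considered.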
Experiments using Asia data set
• library(BNSL)
• library(igraph) # provides graph_from_edgelist()
• mm=mi_matrix(asia, proc=9) # I_n is used
• edge.list=kruskal(mm)
• g=graph_from_edgelist(edge.list, directed=FALSE)
• plot(g)
• mm=mi_matrix(asia) # J_n is used
• edge.list=kruskal(mm)
• g=graph_from_edgelist(edge.list, directed=FALSE)
• plot(g)
Asia (8 variables)
S. Lauritzen, D. Spiegelhalter. Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 50(2):157-224, 1988.
Asia Data Set
Alarm (37 variables)
I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. In Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, pages 247-256. Springer-Verlag, 1989.
Alarm Data Set
Learning Bayesian Networks from Data
The number of candidate structures with p nodes grows faster than exponentially in p
25 DAGs exist for p=3, but only 11 BNs (Markov equivalence classes) are considered
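The count of 25 can be checked with Robinson's recurrence for the number of labelled DAGs, a(m) = sum over k = 1..m of (-1)^(k+1) C(m,k) 2^(k(m-k)) a(m-k), with a(0) = 1. This is a standard result that does not appear in the slides, and the function name `count_dags` is hypothetical.

```r
# Number of DAGs on p labelled nodes via Robinson's recurrence.
# Double-precision arithmetic is used, so results are exact only for small p.
count_dags <- function(p) {
  a <- numeric(p + 1)   # a[m + 1] holds the count for m nodes
  a[1] <- 1             # the empty graph on zero nodes
  for (m in seq_len(p)) {
    k <- seq_len(m)
    a[m + 1] <- sum((-1)^(k + 1) * choose(m, k) * 2^(k * (m - k)) * a[m - k + 1])
  }
  a[p + 1]
}

print(sapply(1:5, count_dags))
```

The rapid growth of these counts is why exhaustive search over structures is feasible only for a small number of nodes.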
7 local scores and 11 global scores
• Estimating Mutual Information
• Learning Forests from Data
• Learning Bayesian Networks from Data
Summary
Problem Set #2
