The document discusses using a dependent Dirichlet process (DDP) to model ecological data measuring microbe abundance across different pollution sites. It first introduces the biology question and data, which contains measurements of various microbes found at sites with different pollution levels. It then summarizes the Dirichlet process and introduces the DDP as a way to model dependence between sites. The DDP defines a process on the beta distribution parameters that determine the weights in the Dirichlet process mixture, allowing weights to vary based on pollution level.
This document summarizes a discussion on the differences between assessing the "causes of effects" versus the "effects of causes". It outlines that the two questions, while related, require different statistical analyses and frameworks. The causes of effects question aims to determine what caused an observed outcome, while effects of causes looks at the impact of a treatment or exposure. Examples from legal cases, epidemiology, and discrimination studies are provided to illustrate how the perspective taken influences statistical analyses and interpretations.
This document provides a list of 35 references related to Bayesian statistics. It includes journal articles published between 1963 and 2013 in publications like The Annals of Probability, Journal of the American Statistical Association, and Bayesian Analysis. These references cover topics such as Markov chain Monte Carlo sampling methods, Bayesian model specification, consistency of Bayes estimates, and nonparametric Bayesian inference.
The Metropolis–Hastings algorithm is an MCMC method for obtaining a sequence of samples from a probability distribution when direct sampling is difficult. It constructs a Markov chain whose stationary distribution is the desired target distribution. At each step, a candidate sample is drawn from a proposal distribution and either accepted, replacing the current state, or rejected, keeping the current state. The acceptance probability is determined by the ratio of target densities at the candidate and current states, corrected by the ratio of proposal densities. The algorithm generalizes the Metropolis algorithm by allowing non-symmetric proposal distributions. When the chain satisfies ergodicity conditions, the empirical distribution of the samples converges to the target distribution as the number of samples increases.
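The accept/reject recursion described above can be sketched in a few lines of Python. This is a generic illustration (a random-walk proposal targeting a standard normal density), not code from any of the summarized documents:

```python
import math
import random

def metropolis_hastings(log_target, propose, log_q, x0, n_samples):
    """Generic Metropolis-Hastings sampler (illustrative sketch).

    log_target(x): log-density of the target, up to an additive constant
    propose(x):    draws a candidate y given the current state x
    log_q(a, b):   log proposal density log q(a | b); for a symmetric
                   proposal the correction term cancels (Metropolis case)
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        y = propose(x)
        # Acceptance ratio: target-density ratio times the
        # proposal-density correction for non-symmetric proposals.
        log_alpha = (log_target(y) - log_target(x)
                     + log_q(x, y) - log_q(y, x))
        if random.random() < math.exp(min(0.0, log_alpha)):
            x = y                 # accept: the candidate replaces the state
        samples.append(x)         # on rejection the current state is kept
    return samples

# Example: random-walk proposal on a standard normal target.
random.seed(1)
draws = metropolis_hastings(
    log_target=lambda x: -0.5 * x * x,
    propose=lambda x: x + random.gauss(0.0, 1.0),
    log_q=lambda a, b: 0.0,       # symmetric: correction term is zero
    x0=0.0,
    n_samples=20000,
)
mean = sum(draws) / len(draws)
```

With the symmetric random-walk proposal the `log_q` terms cancel, recovering the plain Metropolis rule; a non-symmetric proposal would supply a genuine `log_q`.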
The document discusses several key ideas in statistics and modeling:
1. Fisher and Neyman had different views on model specification - Fisher saw it as practical while Neyman emphasized theoretical building blocks.
2. Statistics can contribute a "reservoir of models", model selection techniques, and classification of theoretical vs empirical models.
3. Theoretical models aim to explain underlying mechanisms while empirical models guide actions based on forecasts.
4. Examples like Mendel's inheritance models, Pearson distributions, and Galileo's trial illustrate the development and application of statistical modeling.
This document provides a list of 33 papers related to Bayesian statistics for students to choose from for a presentation. It includes brief descriptions of several theoretical and general audience journals. The papers cover a range of topics in Bayesian statistics published between 1763 and 2013. Students will be evaluated on their understanding and presentation of the chosen paper.
Novel image fusion techniques using global and local Kekre wavelet transforms (IAEME Publication)
This document presents novel image fusion techniques using Kekre wavelet transforms. It describes Kekre transform and generation of local and global Kekre wavelet transforms. A proposed image fusion method is presented that applies local or global Kekre transforms to source images, then fuses coefficients using minimum, maximum or average. Experimentation on six image sets showed local Kekre wavelet transform outperformed other techniques, and averaging fusion was better than minimum or maximum. Local Kekre wavelet transform with averaging produced the best fused images with lowest mean square error compared to source images.
This document discusses novel image fusion techniques using Kekre wavelet transforms. It proposes using both local and global Kekre wavelet transforms for image fusion. The key steps are:
1) Apply the local or global Kekre wavelet transform to each input image separately, generating transformed images.
2) Fuse the transformed images using either the average, minimum or maximum of the coefficient values at each point.
3) Apply the inverse transform to the fused coefficients to obtain the output fused image.
Experiments on six sets of images compare the performance of local vs global Kekre wavelet transforms, and of the averaging, minimum, and maximum fusion rules. The results show that the local Kekre wavelet transform with averaging fusion performs best, producing fused images with the lowest mean squared error relative to the source images.
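The three steps above can be sketched in Python. As a stand-in for the Kekre wavelet transform (which is defined in the paper itself), this sketch uses an orthonormal Hadamard matrix; the fusion logic is the same for any invertible transform:

```python
import numpy as np

def make_hadamard(n):
    """Orthonormal Hadamard matrix of size n (a power of two), used
    here only as a stand-in for the Kekre wavelet transform."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def fuse_images(img_a, img_b, H, rule="average"):
    """The three steps from the summary: forward-transform each image,
    fuse coefficients pointwise, inverse-transform the result."""
    ca = H @ img_a @ H.T               # step 1: transform each image
    cb = H @ img_b @ H.T
    if rule == "average":              # step 2: pointwise fusion
        fused = (ca + cb) / 2.0
    elif rule == "minimum":
        fused = np.minimum(ca, cb)
    else:                              # "maximum"
        fused = np.maximum(ca, cb)
    return H.T @ fused @ H             # step 3: inverse transform

rng = np.random.default_rng(0)
a, b = rng.random((8, 8)), rng.random((8, 8))
H = make_hadamard(8)
out = fuse_images(a, b, H, rule="average")
```

Because the transform is linear, averaging in the coefficient domain equals averaging in the pixel domain; the minimum and maximum rules genuinely differ, which is where the choice of transform matters.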
The document discusses using OpenCL to accelerate genomic analysis through parallelization. It introduces OpenCL and provides examples of using it to parallelize algorithms for copy number inference in tumors, computing relatedness between individuals, and performing variable selection in regression. Key applications discussed include hidden Markov models for copy number inference, principal component analysis on relatedness matrices, and coordinate descent algorithms for lasso regression. Performance gains of up to 155x are reported for the parallel implementations compared to serial code.
The document describes an image processing methodology to detect the nematode C. elegans in microscope images. It aims to automate the identification of individual worms, which is currently done manually but is too labor-intensive. The methodology segments worms from the background, detects endpoints, generates shape descriptors, and performs profile-driven shape fitting to identify worms. It was implemented as a plug-in for the open-source image analysis software Endrov and aims to improve upon previous automated methods by achieving a higher matching accuracy.
This document summarizes novel statistical methods for genetic association studies, including those that account for population structure. It describes methods for detecting gene-gene interactions and inferring copy number variations. For interactions, it proposes using graphics processing units to efficiently search large model spaces. For copy number analysis, it presents a hidden Markov model approach to deconvolve tumor profiles from normal cell contamination. Speedups of over 100x were achieved by parallelizing the model training on a GPU.
Elizabeth Iorns - How Science Exchange promotes Open Science (Science Exchange)
Science Exchange is an online marketplace that connects scientists seeking specialized research services with providers that can perform those services. This allows scientists to outsource experiments and analyses to expert facilities around the world, improving transparency of pricing, access to expertise, efficiency of research, and reproducibility of results. By distributing work among multiple specialized providers, Science Exchange aims to enhance the overall quality and reproducibility of academic research.
This document discusses self-organizing neural networks, including Kohonen networks and Adaptive Resonance Theory (ART). Kohonen networks use competitive learning to form topological mappings between input and output layers. Neighboring units respond to similar inputs, and learning updates weights of both the winning unit and its neighbors. ART networks learn stable recognition codes in response to input sequences and address the stability-plasticity dilemma by resetting matches that fail a vigilance test.
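The competitive-learning update described above (move the winning unit and its grid neighbours toward the input) can be sketched as follows; the grid size, learning rate, and Gaussian neighbourhood are illustrative choices, not taken from the document:

```python
import numpy as np

def best_matching_unit(W, x):
    """Index of the grid unit whose weight vector is closest to x."""
    d = np.linalg.norm(W - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

def som_step(W, x, bmu, lr, sigma):
    """One Kohonen update: the winning unit and its neighbours on the
    grid move toward the input, weighted by a Gaussian neighbourhood."""
    rows, cols, _ = W.shape
    for i in range(rows):
        for j in range(cols):
            d2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # neighbourhood kernel
            W[i, j] += lr * h * (x - W[i, j])
    return W

rng = np.random.default_rng(0)
W = rng.random((5, 5, 2))              # 5x5 grid of 2-d weight vectors
x = np.array([0.9, 0.1])
bmu = best_matching_unit(W, x)
before = np.linalg.norm(W[bmu] - x)
W = som_step(W, x, bmu, lr=0.5, sigma=1.0)
after = np.linalg.norm(W[bmu] - x)
```

Because neighbouring units share each update, nearby grid positions end up responding to similar inputs, which is what produces the topological mapping.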
Terminological cluster trees for Disjointness Axiom Discovery (Giuseppe Rizzo)
The document describes a framework for discovering disjointness axioms from semantic web knowledge bases using terminological cluster trees (TCT). It induces TCTs from knowledge bases to cluster individuals, derives concept descriptions for clusters, and proposes disjointness axioms between non-overlapping concept descriptions. An evaluation on several ontologies shows it can rediscover many existing disjointness axioms and propose new plausible ones, with limited inconsistencies introduced.
This document discusses next generation DNA sequencing technologies. It begins by describing some of the limitations of traditional Sanger sequencing, such as read lengths of 500-1000 bases and throughput of 57,000 bases per run. It then introduces some key next generation sequencing technologies, such as 454 sequencing, which uses emulsion PCR and pyrosequencing to achieve read lengths of 20-100 bases but higher throughput of 20-100 Mb per run. Illumina/Solexa sequencing is also discussed, which uses sequencing by synthesis with reversible terminators and laser-based detection. Finally, third generation sequencing technologies are mentioned, such as Pacific Biosciences' single molecule real time sequencing and nanopore sequencing. In summary, the document provides a high-level overview of current and emerging DNA sequencing technologies.
This PhD thesis describes methods for segmenting cells in phase-contrast microscopy. The work was carried out partly at the Max Planck Institute of Cell Biology in Dresden, in the Buchholz lab.
Flow cytometry analyzes cells by detecting fluorescent markers on individual cells. Nikolas Pontikos' work automatically analyzes flow cytometry data to identify cell phenotypes, such as naive CD25+ cells, and evaluates associations between cell phenotypes and genetic/clinical factors. His method follows from manually gated data and defines thresholds to automatically gate on markers like CD25. This allows evaluating repeatability of cell phenotype identification over time in large sample sets.
Classification of squamous cell cervical cytology (karthigailakshmi)
This document presents a thesis submitted for the degree of Magister en Ingeniería Biomédica (Master's in Biomedical Engineering). The thesis aims to classify squamous cervical cells using color and texture descriptors defined in the MPEG-7 standard. The author first characterizes the transformation zone of cervical smear images using MPEG-7 descriptors like color layout, scalable color, and edge histogram. These descriptors are then used as inputs to binary classifiers to obtain a precision of 90% and sensitivity of 83% for cell classification. Unlike traditional approaches requiring cell segmentation, the proposed method is independent of cell shape. The thesis finds this strategy applicable for pre-screening cervical smear images in conditions with random noise factors that could mislead segmentation.
The document provides an overview of artificial neural networks and bio-inspired algorithms. It discusses various neural network concepts like the perceptron algorithm, backpropagation, genetic algorithms, particle swarm optimization, autoencoders, and deep neural networks. It includes descriptions of key concepts, mathematical equations, examples to illustrate how different algorithms work, and comparisons between algorithms. The document serves as an introduction to neural networks and bio-inspired optimization techniques.
This thesis examines using a data-driven sample generator model to augment real data for improved classification of schizophrenia patients and healthy controls from structural magnetic resonance images (SMRIs). A three-way ANOVA analysis of SMRIs found significant differences between groups based on diagnosis, age, and gender. Various machine learning classifiers were tested on raw data, reduced data, and augmented data. A neural network trained exclusively on synthetic data produced the best classification results, demonstrating that the generator model can produce realistic training data that improves generalization.
Faster, More Effective Flowgraph-based Malware Classification (Silvio Cesare)
Silvio Cesare is a PhD candidate at Deakin University researching malware detection and automated vulnerability discovery. His current work extends his Masters research on fast automated unpacking and classification of malware. He presented this work last year at Ruxcon 2010. His system uses control flow graphs and q-grams of decompiled code as "birthmarks" to detect unknown malware samples that are suspiciously similar to known malware, reducing the need for signatures. He evaluated the system on 10,000 malware samples with only 10 false positives. The system provides improved effectiveness and efficiency over his previous work in 2010.
MMseqs (Many-against-Many sequence searching) is a novel software suite for very fast protein sequence searches and clustering of huge protein sequence data sets, such as sets of predicted protein sequences or 6-frame-translated open reading frames (ORFs) from large metagenomics experiments. MMseqs is around 1000 times faster than protein BLAST and sensitive enough to capture similarities down to less than 30% sequence identity.
At the core of MMseqs are two modules for comparing two sequence sets with each other. The first, prefiltering module computes the similarities between all sequences in one set and all sequences in the other, based on a very fast and sensitive alignment-free metric: the sum of scores of similar 7-mers. The second module runs an AVX2-accelerated Smith-Waterman alignment of all sequence pairs whose prefiltering score passes a cut-off. Due to this combination of speed and sensitivity, searching all predicted ORFs from large metagenomics data sets against the entire UniProt or NCBI-NR databases becomes feasible. This could make it possible to assign to functional clusters and taxonomic clades many reads that are too diverged to be mapped by current software.
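The prefiltering idea can be illustrated with a toy version that counts exact shared 7-mers. MMseqs' real module sums substitution-matrix scores over similar (not only identical) 7-mers and is heavily optimized, so this is only a conceptual sketch with made-up sequences:

```python
def kmer_prefilter_score(query, target, k=7):
    """Toy alignment-free prefilter score: the number of exact k-mers
    two sequences share. MMseqs' actual prefiltering instead sums
    substitution-matrix scores over similar 7-mers; this simplification
    keeps only the alignment-free, k-mer-based idea."""
    def kmer_set(seq):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmer_set(query) & kmer_set(target))

q = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # made-up protein fragment
t = "MKTAYIAKQRQISFVKSHFSRQ"              # shares a long prefix with q
u = "GGGGGGGGGGGGGGGGGGGGGG"              # unrelated low-complexity seq
s_related = kmer_prefilter_score(q, t)    # high score: pass to alignment
s_unrelated = kmer_prefilter_score(q, u)  # zero score: filtered out
```

Only pairs scoring above a cut-off would then be handed to the expensive Smith-Waterman stage, which is what makes the two-module design fast.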
MMseqs' third module can also cluster sequence sets efficiently, based on the similarity graph obtained from comparing the sequence set with itself using modules 1 and 2. MMseqs further supports an updating mode in which sequences can be added to an existing clustering with stable cluster identifiers, without the need to recluster the entire sequence set. MMseqs will therefore be used to offer high-quality clustered versions of the UniProt database down to a 30% sequence similarity threshold.
- The document discusses Bayesian deep learning, including introducing Bayesian approaches, modeling uncertainty, and challenges such as scaling algorithms and building interpretable priors.
- It describes early work showing infinite width neural networks behave as Gaussian processes.
- For wide Bayesian neural networks with certain properties, the marginal prior distribution of units converges to a Gaussian process in the wide limit. This "wide regime" property extends to deep networks.
Bayesian neural networks increasingly sparsify their units with depth (Julyan Arbel)
This document analyzes deep Bayesian neural networks with Gaussian priors on weights and ReLU-like activations. It proves that the marginal prior distributions of hidden units become heavier-tailed (sub-Weibull) with increasing layer depth, with an optimal tail parameter of layer depth divided by 2. This indicates that units in deeper layers will be more sparsely represented under maximum a posteriori estimation, explaining the natural shrinkage properties of these networks.
Species sampling models in Bayesian Nonparametrics (Julyan Arbel)
This document discusses species sampling models and discovery probabilities. It introduces the problem of estimating the probability of observing a new species given a sample. Good and Turing proposed an estimator for this during World War II. Bayesian nonparametric models provide an alternative approach by placing a prior on unknown species proportions. The document outlines BNP estimators for discovery probabilities and how credible intervals can be derived. It applies these methods to genomic datasets of expressed sequence tags to estimate discovery probabilities for observing new genes.
Dependent processes in Bayesian Nonparametrics (Julyan Arbel)
This document summarizes dependent processes in Bayesian nonparametrics. It motivates the need for dependent random probability measures to accommodate temporal dependence structures beyond the exchangeability assumption. It describes modeling collections of random probability measures indexed by time as either discrete-time or continuous-time processes. The diffusive Dirichlet process is introduced as a dependent Dirichlet process with Dirichlet marginal distributions at each time point and continuous sample paths. Simulation and estimation methods are discussed for this model.
Asymptotics for discrete random measures (Julyan Arbel)
This document provides an introduction to asymptotics for discrete random measures, specifically the Dirichlet process and the two-parameter Poisson-Dirichlet process. It covers three key aspects:
1) the stick-breaking construction of the two-parameter Poisson-Dirichlet process and the related notation;
2) the truncation error Rn and how its asymptotic behavior differs between the Dirichlet and two-parameter Poisson-Dirichlet cases;
3) applications of these processes in mixture modeling, and sampling approaches such as blocked Gibbs and slice sampling that rely on truncating the infinite-dimensional distributions.
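The stick-breaking construction mentioned above can be sketched directly. Here V_i ~ Beta(1 − σ, α + iσ), the weights are w_i = V_i ∏_{j<i}(1 − V_j), and the leftover stick mass after n atoms is the truncation error Rn; σ = 0 recovers the Dirichlet process. The parameter values are illustrative:

```python
import random

def stick_breaking(alpha, sigma, n_atoms, rng):
    """Stick-breaking weights of the two-parameter Poisson-Dirichlet
    (Pitman-Yor) process: V_i ~ Beta(1 - sigma, alpha + i * sigma),
    w_i = V_i * prod_{j<i} (1 - V_j). Setting sigma = 0 recovers the
    Dirichlet process."""
    weights, remaining = [], 1.0
    for i in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - sigma, alpha + i * sigma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # `remaining` is the leftover stick mass: the truncation error Rn.
    return weights, remaining

rng = random.Random(42)
w, Rn = stick_breaking(alpha=1.0, sigma=0.25, n_atoms=200, rng=rng)
```

By construction the weights and Rn sum to one, so tracking Rn shows directly how fast a truncated representation captures the total mass, which is the quantity whose asymptotics differ between the two processes.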
Bayesian Nonparametrics, Applications to biology, ecology, and marketing (Julyan Arbel)
This document discusses applications of Bayesian nonparametric methods to various domains including toxicology, ecology, marketing, human fertility, and more. It provides examples of using rounded Gaussian mixtures and Dirichlet process mixtures to model count data from developmental toxicity studies and animal abundance data. Applications to modeling multivariate mobile phone usage data and basal body temperature curves are also described. The document emphasizes that Bayesian nonparametric approaches allow inclusion of prior information and flexible modeling of complex data structures.
A Gentle Introduction to Bayesian Nonparametrics (Julyan Arbel)
The document provides an introduction to Bayesian nonparametrics and the Dirichlet process. It explains that Bayesian nonparametrics aims to fit models that can adapt their complexity based on the data, without strictly imposing a fixed structure. The Dirichlet process is described as a prior distribution on the space of all probability distributions, allowing the model to utilize an infinite number of parameters. Nonparametric mixture models using the Dirichlet process provide a flexible approach to density estimation and clustering.
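One way to see the "infinite number of parameters" at work is the Chinese restaurant process, the partition distribution induced by a Dirichlet process prior: the number of clusters is not fixed in advance but grows with the data. This toy sampler is a standard illustration, not code from the document:

```python
import random

def chinese_restaurant_process(n, alpha, rng):
    """Sample a random partition of n items under a Dirichlet process
    with concentration alpha: item i joins an existing cluster with
    probability proportional to the cluster's size, or opens a new
    cluster with probability proportional to alpha."""
    labels, counts = [], []
    for i in range(n):
        r = rng.uniform(0.0, i + alpha)
        acc, table = 0.0, len(counts)   # default: open a new cluster
        for t, c in enumerate(counts):
            acc += c
            if r < acc:
                table = t
                break
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        labels.append(table)
    return labels, counts

rng = random.Random(7)
labels, sizes = chinese_restaurant_process(100, alpha=2.0, rng=rng)
```

The rich-get-richer dynamic (larger clusters attract more items) is exactly what lets Dirichlet process mixtures adapt their effective complexity to the data.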
Asymptotics for discrete random measuresJulyan Arbel
This document provides an introduction to asymptotics for discrete random measures, specifically the Dirichlet process and two-parameter Poisson-Dirichlet process. It discusses several key aspects in 3 sentences or less:
1) It outlines the stick-breaking construction of the two-parameter Poisson-Dirichlet process and defines related notation. 2) It introduces the truncation error Rn and discusses how its asymptotic behavior differs between the Dirichlet and two-parameter Poisson-Dirichlet cases. 3) It briefly describes some applications of these processes in mixture modeling and summarizes different sampling approaches like blocked Gibbs and slice sampling that rely on truncation of the infinite-dimensional distributions.
Bayesian Nonparametrics, Applications to biology, ecology, and marketingJulyan Arbel
This document discusses applications of Bayesian nonparametric methods to various domains including toxicology, ecology, marketing, human fertility, and more. It provides examples of using rounded Gaussian mixtures and Dirichlet process mixtures to model count data from developmental toxicity studies and animal abundance data. Applications to modeling multivariate mobile phone usage data and basal body temperature curves are also described. The document emphasizes that Bayesian nonparametric approaches allow inclusion of prior information and flexible modeling of complex data structures.
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
The document provides an introduction to Bayesian nonparametrics and the Dirichlet process. It explains that Bayesian nonparametrics aims to fit models that can adapt their complexity based on the data, without strictly imposing a fixed structure. The Dirichlet process is described as a prior distribution on the space of all probability distributions, allowing the model to utilize an infinite number of parameters. Nonparametric mixture models using the Dirichlet process provide a flexible approach to density estimation and clustering.
The document outlines a paper on Bayesian linear models. It introduces a simple example of a linear model with exchangeable priors. It then presents the general Bayesian linear model and theorems for the posterior distribution given multiple stages of priors. It applies this to an experimental design setting, deriving Bayes estimates that shrink treatment and block effects towards zero based on their variances.
This document discusses different approaches to Bayesian analysis including objective, subjective, robust, frequentist, and quasi Bayesian analysis. It provides examples and discusses the advantages and disadvantages of each approach. Objective Bayesian analysis uses objective prior distributions designed to be minimally informative, while subjective Bayesian analysis aims to fully specify subjective priors but has challenges in practice. Robust Bayesian analysis considers classes of models and priors to provide interval estimates. Frequentist Bayesian analysis combines Bayesian and frequentist ideas, and quasi Bayesian analysis uses ad hoc priors. Computational techniques for Bayesian analysis include calculating integrals and posterior modes using Laplace approximation, Monte Carlo sampling, and MCMC methods.
Lewis Carroll wrote "Pillow Problems", a collection of 72 logic and probability puzzles, while lying in bed at night. Many had clever but flawed solutions due to Carroll's limited understanding of modern probability concepts. For example, in one problem about breaking rods, Carroll incorrectly assumed the probability of breaking at the middle was nonzero. Overall, "Pillow Problems" reflects the nascent state of English probability in Carroll's time and his personal difficulties with more rigorous concepts like continuous probabilities.
This document discusses different approaches to specifying prior distributions in Bayesian statistics. It begins by introducing the binomial model for coin tossing and how priors and posteriors are calculated. It then describes three categories of Bayesian priors: classical Bayesians use a flat prior, modern parametric Bayesians use a Beta distribution prior, and subjective Bayesians quantify existing knowledge about a process. The document shows that different priors lead to different posteriors. It further explains that any prior density can be approximated by mixtures of Beta densities, and extends this concept to the exponential family. The exponential family conjugate prior is also discussed. Finally, connections are made between the exponential family, Beta density priors, and a generalization about conditional expected posteriors.
This document discusses the connection between Ockham's Razor and Bayesian analysis. It explains that Ockham's Razor favors the simplest hypothesis consistent with the data, and Bayesian analysis can help determine how much a simpler model should be preferred. It provides Galileo's problem of developing the law of falling bodies as an example. Jeffrey and Wrinch suggested using prior probabilities to represent simplicity, with the hypothesis having fewer parameters being assigned a higher prior probability. However, defining simplicity based solely on prior probabilities is problematic. Alternatively, a simpler hypothesis that makes precise predictions should be given greater credence if those predictions are confirmed. The key idea linking Bayesian analysis and Ockham's Razor is how simplicity in a hypothesis is represented and
This document discusses mixing R source code and documentation in LaTeX documents using knitr. It recommends using knitr in RStudio to embed R code chunks and output (like graphs and tables) in LaTeX documents. Code chunks can include any R code to evaluate, show, or hide. Graphs and tables from R code chunks will be included in the LaTeX output.
This document introduces a dependent Dirichlet process (DDP) model that allows the cluster weights and locations to vary based on a covariate x. It defines a measure of dependence between data points based on x, and derives a Polya urn-style predictive rule. It then presents a novel DDP construction based on simulating gamma random variables, which allows for easy posterior computation. This model generalizes previous dependent DP work and can handle multidimensional covariates.
Bayesian adaptive optimal estimation using a sieve priorJulyan Arbel
This document presents results on Bayesian optimal adaptive estimation using a sieve prior. It derives posterior concentration rates and risk convergence rates for models that accommodate a sieve prior. For the Gaussian white noise model, it shows the rates are adaptive optimal under global loss but a lower bound on the rate is obtained under pointwise loss, indicating the sieve prior is not optimal. Further work on posterior concentration rates under pointwise loss is suggested.
The document describes Positron Emission Tomography (PET), a molecular imaging technique. PET involves injecting a radioactively tagged molecule into a patient and using detectors to record coincident photon pairs from positron annihilation. This data is used to reconstruct the 3D or 4D activity distribution in the patient. Traditional PET reconstruction uses voxel-based discretization and parametric models, which have limitations. As an alternative, the document proposes a nonparametric Bayesian model that places a Dirichlet process prior directly on a random probability measure, avoiding discretization and allowing inference on the continuous activity distribution.
MRS PUNE 2024 - WINNER AMRUTHAA UTTAM JAGDHANEDK PAGEANT
Amruthaa Uttam Jagdhane, a stunning woman from Pune, has won the esteemed title of Mrs. India 2024, which is given out by the Dk Exhibition. Her journey to this prestigious accomplishment is a confirmation of her faithful assurance, extraordinary gifts, and profound commitment to enabling women.
Amid the constant barrage of distractions and dwindling motivation, self-discipline emerges as the unwavering beacon that guides individuals toward triumph. This vital quality serves as the key to unlocking one’s true potential, whether the aspiration is to attain personal goals, ascend the career ladder, or refine everyday habits.
Understanding Self-Discipline
The Fascinating World of Bats: Unveiling the Secrets of the Nightthomasard1122
The Fascinating World of Bats: Unveiling the Secrets of the Night
Bats, the mysterious creatures of the night, have long been a source of fascination and fear for humans. With their eerie squeaks and fluttering wings, they have captured our imagination and sparked our curiosity. Yet, beyond the myths and legends, bats are fascinating creatures that play a vital role in our ecosystem.
There are over 1,300 species of bats, ranging from the tiny Kitti's hog-nosed bat to the majestic flying foxes. These winged mammals are found in almost every corner of the globe, from the scorching deserts to the lush rainforests. Their diversity is a testament to their adaptability and resilience.
Bats are insectivores, feeding on a vast array of insects, from mosquitoes to beetles. A single bat can consume up to 1,200 insects in an hour, making them a crucial part of our pest control system. By preying on insects that damage crops, bats save the agricultural industry billions of dollars each year.
But bats are not just useful; they are also fascinating creatures. Their ability to fly in complete darkness, using echolocation to navigate and hunt, is a remarkable feat of evolution. They are also social animals, living in colonies and communicating with each other through a complex system of calls and body language.
Despite their importance, bats face numerous threats, from habitat destruction to climate change. Many species are endangered, and conservation efforts are necessary to protect these magnificent creatures.
In conclusion, bats are more than just creatures of the night; they are a vital part of our ecosystem, playing a crucial role in maintaining the balance of nature. By learning more about these fascinating animals, we can appreciate their importance and work to protect them for generations to come. So, let us embrace the beauty and mystery of bats, and celebrate their unique place in our world.
Insanony: Watch Instagram Stories Secretly - A Complete GuideTrending Blogers
Welcome to the world of social media, where Instagram reigns supreme! Today, we're going to explore a fascinating tool called Insanony that lets you watch Instagram Stories secretly. If you've ever wanted to view someone's story without them knowing, this blog is for you. We'll delve into everything you need to know about Insanony with Trending Blogers!
Care Instructions for Activewear & Swim Suits.pdfsundazesurf80
SunDaze Surf offers top swimwear tips: choose high-quality, UV-protective fabrics to shield your skin. Opt for secure fits that withstand waves and active movement. Bright colors enhance visibility, while adjustable straps ensure comfort. Prioritize styles with good support, like racerbacks or underwire tops, for active beach days. Always rinse swimwear after use to maintain fabric integrity.
Biography and career history of Bruno AmezcuaBruno Amezcua
Bruno Amezcua's entry into the film and visual arts world seemed predestined. His grandfather, a distinguished film editor from the 1950s through the 1970s, profoundly influenced him. This familial mentorship early on exposed him to the nuances of film production and a broad array of fine arts, igniting a lifelong passion for narrative creation. Over 15 years, Bruno has engaged in diverse projects showcasing his dedication to the arts.
At Affordable Garage Door Repair, we specialize in both residential and commercial garage door services, ensuring your property is secure and your doors are running smoothly.
Types of Garage Doors Explained: Energy Efficiency, Style, and More
Arbel oviedo
1. Dependent Dirichlet processes
and application to ecological data
Julyan Arbel
Joint work with Kerrie Mengersen & Judith Rousseau
CREST-INSEE, Université Paris-Dauphine
2 December 2012
ERCIM 2012
5th International Conference on Computing & Statistics
2. Outline
1 Biology question
Introduction
Data
2 Nonparametric model
Dirichlet process
Dependent Dirichlet process
Julyan Arbel DDP and ecological data
4. Biology introduction
Series of measurements at different places around Casey Station, a permanent base in Antarctica.
At each site: pollution level, and abundance of microbes called OTUs.
Assess the impact of a pollutant on the soil composition / biodiversity.
8. Data
Data consist of measurements of microbe abundance:

Site    TPH   06251 00576 00429 06360 08793 06259 05164 00772
1         80      3   724    88     1     0     0     0   467
2         80      9  2364   252     0     0     2     0   616
3         80     12   443  1655    11     0     0     0   168
...      ...    ...   ...   ...   ...   ...   ...   ...   ...
13      2600   2262   339   229  1100   537   352     0     0
20     10000   1883    23    18   879   224   325     9     1
24     22000   1446     2    27   920  1808  1456     0     0

Sample of abundance of 8 microbes (columns) at 6 sites (rows).
Main covariate is a pollution level called TPH, denoted x.
11. Notations
Microbe species are denoted by j = 1, 2, ... by decreasing total abundance.
At each site x, there are N(x) microbes, denoted Y_i(x), i = 1, ..., N(x).
Data are a frequency matrix:

Site   TPH      j = 1                    ...  j              ...
1      x = 80   #(Y_n(x = 80) = 1) = 3   ...  ...            ...
...    ...      ...                      ...  ...            ...
k      x        ...                      ...  #(Y_n(x) = j)  ...
13. Notations
A standard example of diversity is Shannon diversity, taken as the exponential of Shannon entropy:

D(x) = exp( -Σ_j p_j(x) log p_j(x) ),  with  p_j(x) = #(Y_n(x) = j) / N(x).

Figure: Left: Shannon entropy in raw data. Right: Shannon diversity in raw data (both plotted against TPH).
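The diversity formula above is easy to check numerically. A minimal sketch, not code from the talk: `shannon_diversity` is an illustrative name, and the example counts are the abundances at site 1 (x = 80) from the data table.

```python
import numpy as np

def shannon_diversity(counts):
    """Shannon diversity: exponential of the Shannon entropy of the
    empirical species proportions p_j(x) = #(Y_n(x) = j) / N(x)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()   # drop unobserved species
    entropy = -np.sum(p * np.log(p))
    return np.exp(entropy)

# Abundances of the 8 microbes at site 1 (x = 80) from the data table
site1 = [3, 724, 88, 1, 0, 0, 0, 467]
D1 = shannon_diversity(site1)
```

Shannon diversity can be read as an "effective number of species": for k equally abundant species it equals exactly k.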
17. First model
Pavlovian conditioning associated with the word species leads to the Dirichlet process and/or related processes.
First, we run an independent model at each site with TPH x:

Y_i(x) | G ~ G,
G(·) = Σ_{j=1}^∞ p_j δ_j(·),
(p_j)_j ~ GEM(M).

The GEM(M) distribution is defined in [Pitman, 2002] (GEM stands for Griffiths, Engen and McCloskey) and represents the distribution of the weights in a Dirichlet process:

p_j = V_j Π_{l<j} (1 - V_l),  V_j ~ Beta(1, M).
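The GEM stick-breaking above can be sketched in a few lines. This is an illustrative simulation under a finite truncation, not code from the talk; `gem_weights` and the truncation level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gem_weights(M, trunc=100):
    """Draw (p_1, ..., p_trunc) from a truncated GEM(M) stick-breaking:
    p_j = V_j * prod_{l<j} (1 - V_l), with V_j ~ Beta(1, M)."""
    V = rng.beta(1.0, M, size=trunc)
    # remaining stick length before each break: 1, (1-V_1), (1-V_1)(1-V_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    return V * remaining

p = gem_weights(M=1.0)
```

The weights decay geometrically in expectation, so for a moderate truncation the leftover mass 1 - Σ_j p_j is negligible.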
20. Posterior sampling
We use a blocked Gibbs sampler (truncated version of the infinite sum).
The prior on p is induced by the Beta prior on V: π_⊥(V_j) = Be(1, M).
This prior is conjugate, with a Beta posterior:

π(V_j | Y) = Be(V_j | 1 + #(Y_n = j), M + #(Y_n > j)).
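The conjugate update above makes the blocked Gibbs step a sequence of Beta draws. A minimal sketch, assuming species labels Y_n in {1, 2, ...}; `sample_V_posterior` and the toy labels are illustrative, not code from the talk.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

def sample_V_posterior(Y, M, trunc=10):
    """One blocked-Gibbs draw of V_1, ..., V_trunc given labels Y:
    V_j | Y ~ Beta(1 + #(Y_n = j), M + #(Y_n > j))."""
    Y = np.asarray(Y)
    counts = Counter(Y.tolist())
    V = np.empty(trunc)
    for j in range(1, trunc + 1):
        n_eq = counts.get(j, 0)          # observations equal to species j
        n_gt = int(np.sum(Y > j))        # observations beyond species j
        V[j - 1] = rng.beta(1 + n_eq, M + n_gt)
    return V

Y = [1, 1, 2, 1, 3, 2, 1]                # illustrative species labels
V = sample_V_posterior(Y, M=1.0)
```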
24. Second model
But we want to run a single model across TPH x; it means a predictor-dependent model.
Early references to predictor-dependent DP models include Cifarelli and Regazzini [1978] and Muliere and Petrone [1993].
Increasing interest since MacEachern [1999, 2000, 2001].
Extensions with varying weights include, among others, order-based DDP [Griffin and Steel, 2006], local DP [Chung and Dunson, 2009], weighted mixtures of DP [Dunson and Park, 2008], and kernel stick-breaking processes [Dunson et al., 2007].
28. Second model
Only interested in a dependence in the weights. We worked out a dependent process prior with a simple structure of dependence on the weights:

Y_i(x) | G(x) ~ G(x),
G(x)(·) = Σ_{j=1}^∞ p_j(x) δ_j(·),  p_j(x) = V_j(x) Π_{l<j} (1 - V_l(x)),
(p_j(x))_j ~ DGEM(M),  V_j(x) ~ Beta(1, M),

where DGEM(M) stands for Dependent GEM distribution.
We want a process for each j, (V_j(x))_x, which is marginally Beta(1, M).
34. Process on the beta breaks, V_j(x)
Construction from Trippa, Müller and Johnson [2011].

[Diagram: three covariate values x_1, x_2, x_3, with gamma weights α_1, α_2, α_3, α_12, α_23, α_123 attached to the subsets of sites they cover.]

V(x_1) = Γ(x_1) / (Γ(x_1) + Γ^M(x_1)),
Γ(x_1) = Γ_1 + Γ_12 + Γ_123,           Γ_1 ~ Ga(α_1), ..., Γ_123 ~ Ga(α_123),
Γ^M(x_1) = Γ^M_1 + Γ^M_12 + Γ^M_123,   Γ^M_1 ~ Ga(α_1 M), ..., Γ^M_123 ~ Ga(α_123 M).

In the end: p_j(x) = V_j(x) Π_{l<j} (1 - V_l(x)) ~ DGEM(M).
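The shared-gamma construction can be sketched as follows. This is my reading of the slide, under the assumption that each α_S sits on a subset S of sites and that Σ_{S ∋ x} α_S = 1 at every site, which is what makes V(x) marginally Beta(1, M) (since Ga(a)/(Ga(a) + Ga(aM)) ~ Beta(a, aM)); `dgem_breaks` and the α values are illustrative, not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

def dgem_breaks(sites, alpha, M):
    """One draw of the dependent beta breaks (V(x))_x from shared gammas:
    V(x) = G(x) / (G(x) + G_M(x)), where
      G(x)   = sum of Gamma(alpha_S)     over subsets S containing x,
      G_M(x) = sum of Gamma(alpha_S * M) over the same subsets.
    Sites sharing a subset S share the variables Gamma_S, which
    induces the dependence across x."""
    gam = {S: rng.gamma(a) for S, a in alpha.items()}
    gam_M = {S: rng.gamma(a * M) for S, a in alpha.items()}
    V = {}
    for x in sites:
        G = sum(g for S, g in gam.items() if x in S)
        G_M = sum(g for S, g in gam_M.items() if x in S)
        V[x] = G / (G + G_M)
    return V

# Illustrative alphas on subsets {1},{2},{3},{1,2},{2,3},{1,2,3},
# chosen so that the alphas covering each site sum to 1.
alpha = {(1,): 0.5, (2,): 0.25, (3,): 0.5,
         (1, 2): 0.25, (2, 3): 0.25, (1, 2, 3): 0.25}
V = dgem_breaks([1, 2, 3], alpha, M=1.0)
```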
35. Interesting features
This idea can be extended to large dimensional covariate spaces.

[Diagram: covariate points x_1, x_2, x_3 with weights α_1, α_2, α_3, α_12, α_23, α_123 on overlapping regions.]

Easy to simulate from: one only needs to simulate gamma random variables.
36. Posterior sampling
There is independence across j, so it suffices to be able to simulate from each posterior:

π(V_j | Y) ∝ π(V_j) L(Y | V_j)
           ∝ π(V_j) Π_x V_j(x)^{#(Y_n(x) = j)} (1 - V_j(x))^{#(Y_n(x) > j)}.

Quite uncommon situation: we can sample from the prior π(V_j), but we cannot evaluate it. This is the reverse of Approximate Bayesian Computation (ABC), where the likelihood is intractable but can be sampled from.
38.
A first solution is to use a Metropolis-Hastings algorithm:

Metropolis-Hastings algorithm
1. Given a current value V_j, sample a new one V*_j independently from the prior π(V_j).
2. Accept with probability

ρ = min(1, L(Y | V*_j) / L(Y | V_j)).

But it is not a good idea to propose from the prior: the acceptance rate is low (around 1%).
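Because the proposal is the prior itself, the prior densities cancel in the Metropolis-Hastings ratio, leaving only the likelihood ratio. A generic sketch, with an illustrative toy model (uniform prior, binomial likelihood with 3 successes in 10 trials) in place of the DGEM prior; `independence_mh` is an assumed name, not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

def independence_mh(log_lik, sample_prior, n_iter=1000):
    """Independence Metropolis-Hastings with the prior as proposal:
    acceptance probability rho = min(1, L(Y | V*) / L(Y | V))."""
    V = sample_prior()
    ll = log_lik(V)
    chain, n_acc = [], 0
    for _ in range(n_iter):
        V_star = sample_prior()
        ll_star = log_lik(V_star)
        if np.log(rng.uniform()) < ll_star - ll:   # accept with prob rho
            V, ll = V_star, ll_star
            n_acc += 1
        chain.append(V)
    return np.array(chain), n_acc / n_iter

# Toy model: V ~ Uniform(0, 1) prior, 3 successes out of 10 trials
log_lik = lambda v: 3 * np.log(v) + 7 * np.log(1 - v)
chain, acc = independence_mh(log_lik, lambda: rng.uniform())
```

In this toy model the proposal and target are close, so acceptance is healthy; in the DGEM setting the slide reports the prior proposal accepting only around 1% of the time, which motivates the next slide.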
40.
A better solution is to use importance sampling:

Importance sampling
1. Sample iid values V_j from the prior π(V_j).
2. Weight the sample by the importance weights defined by the likelihood, w(V_j) = L(Y | V_j).

Advantages: an iid sample instead of a Markov chain, and better precision by a Rao-Blackwellisation argument (weights instead of accept-reject).
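The prior-as-proposal importance sampler can be sketched as follows, again with an illustrative toy model (uniform prior, binomial likelihood with 3 successes in 10 trials) standing in for the DGEM prior; `importance_sampling` is an assumed name, not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(4)

def importance_sampling(log_lik, sample_prior, n_draws=5000):
    """Prior-as-proposal importance sampling: draw iid from the prior
    and weight each draw by its likelihood, w(V) = L(Y | V)."""
    draws = np.array([sample_prior() for _ in range(n_draws)])
    log_w = np.array([log_lik(v) for v in draws])
    w = np.exp(log_w - log_w.max())   # stabilise before normalising
    w /= w.sum()                      # self-normalised weights
    return draws, w

# Toy model: V ~ Uniform(0, 1) prior, 3 successes out of 10 trials
log_lik = lambda v: 3 * np.log(v) + 7 * np.log(1 - v)
draws, w = importance_sampling(log_lik, lambda: rng.uniform())
post_mean = np.sum(w * draws)         # self-normalised posterior mean
```

Using every draw through its weight, rather than discarding rejected proposals, is the Rao-Blackwellisation gain mentioned on the slide.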
41. Results

Figure: Left: dependent DP prior: posterior mean of the Shannon diversity by TPH, with 95% centred credible intervals. Right: Shannon diversity in raw data.
42. Conclusion
Such a model allows us to give probabilistic answers to questions about diversity, as we get a posterior sample.
The use of Gaussian processes transformed to beta processes by the inverse CDF might speed up the posterior computations.
Extension to handle other covariates.