This is an introduction to Topic Modeling, covering tf-idf, LSA, pLSA, LDA, EM, and other related material. There are surely some mistakes; feel free to correct them. Thank you~
A talk by Sergey Koltsov (HSE University) at the International Conference on Big Data and its Applications (ICBDA).
ICBDA is a conference for entrepreneurs and developers on how to solve business problems effectively with big data analytics.
http://icbda2015.org/
Paper presentation for the final course Advanced Concepts in Machine Learning.
The paper is "Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data".
http://jmlr.org/proceedings/papers/v32/chenf14.pdf
Deep neural methods have recently demonstrated significant performance improvements in several IR tasks. In this lecture, we will present a brief overview of deep models for ranking and retrieval.
This is a follow-up lecture to "Neural Learning to Rank" (https://www.slideshare.net/BhaskarMitra3/neural-learning-to-rank-231759858)
Neural Models for Information Retrieval, by Bhaskar Mitra
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models will also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
We begin this talk with a discussion on text embedding spaces for modelling different types of relationships between items which makes them suitable for different IR tasks. Next, we present how topic-specific representations can be more effective than learning global embeddings. Finally, we conclude with an emphasis on dealing with rare terms and concepts for IR, and how embedding based approaches can be augmented with neural models for lexical matching for better retrieval performance. While our discussions are grounded in IR tasks, the findings and the insights covered during this talk should be generally applicable to other NLP and machine learning tasks.
A Simple Introduction to Neural Information Retrieval, by Bhaskar Mitra
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. In this lecture, we will cover some of the fundamentals of neural representation learning for text retrieval. We will also discuss some of the recent advances in the applications of deep neural architectures to retrieval tasks.
(These slides were presented at a lecture as part of the Information Retrieval and Data Mining course taught at UCL.)
In this natural language understanding (NLU) project, we implemented and compared various approaches for predicting the topics of paragraph-length texts. This paper explains our methodology and results for the following approaches: Naive Bayes, One-vs-Rest Support Vector Machine (OvR SVM) with GloVe vectors, Latent Dirichlet Allocation (LDA) with OvR SVM, Convolutional Neural Networks (CNN), and Long Short Term Memory networks (LSTM).
This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019. For the document retrieval task, we adapt the Duet model to ingest a "multiple field" view of documents—we refer to the new architecture as Duet with Multiple Fields (DuetMF). A second submission combines the DuetMF model with other neural and traditional relevance estimators in a learning-to-rank framework and achieves improved performance over the DuetMF baseline. For the passage retrieval task, we submit a single run based on an ensemble of eight Duet models.
Neural Models for Information Retrieval, by Bhaskar Mitra
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models may also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
In this talk, I will present my recent work on neural IR models. We begin with a discussion on learning good representations of text for retrieval. I will present visual intuitions about how different embeddings spaces capture different relationships between items, and their usefulness to different types of IR tasks. The second part of this talk is focused on the applications of deep neural architectures to the document ranking task.
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks, by Leonardo Di Donato
Experimental work done regarding the use of Topic Modeling for the implementation and the improvement of some common tasks of Information Retrieval and Word Sense Disambiguation.
It first describes the scenario, the pre-processing pipeline built, and the framework used. We then discuss the investigation of different hyperparameter configurations for the LDA algorithm.
The work continues with the retrieval of relevant documents through two different approaches: inferring the topic distribution of a held-out document (or query) and comparing it with the collection's documents to retrieve similar ones, or an approach driven by probabilistic querying. The last part of this work is devoted to the word sense disambiguation task.
Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on the other hand, terms have discrete or local representations, and the relevance of a document is determined by the exact matches of query terms in the body text. We hypothesize that matching with distributed representations complements matching with traditional local representations, and that a combination of the two is favourable. We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. The two networks are jointly trained as part of a single neural network. We show that this combination or ‘duet’ performs significantly better than either neural network individually on a Web page ranking task, and significantly outperforms traditional baselines and other recently proposed models based on neural networks.
5 Lessons Learned from Designing Neural Models for Information Retrieval, by Bhaskar Mitra
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large scale training data before they can be deployed. Traditional learning to rank models employ supervised ML techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field, and research publications in the area have been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles are emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrate the application of neural approaches to IR that go beyond text matching.
Latent Semantic Analysis (LSA) is a mathematical technique for computationally modeling the meaning of words and larger units of text. LSA works by applying Singular Value Decomposition (SVD) to a term-document matrix containing frequency counts for all words found in all of the documents or passages in the corpus. After this SVD step, the meaning of a word is represented as a vector in a multidimensional semantic space, which makes it possible to compare word meanings, for instance by computing the cosine between two word vectors.
LSA has been successfully used in a large variety of language related applications from automatic grading of student essays to predicting click trails in website navigation. In Coh-Metrix (Graesser et al. 2004), a computational tool that produces indices of the linguistic and discourse representations of a text, LSA was used as a measure of text cohesion by assuming that cohesion increases as a function of higher cosine scores between adjacent sentences.
Besides being interesting as a technique for building programs that need to deal with semantics, LSA is also interesting as a model of human cognition. LSA can match human performance on word association tasks and vocabulary tests. In this talk, Fridolin will focus on LSA as a tool for modeling language acquisition. After framing the area of the talk by sketching the key concepts of learning, information, and competence acquisition, and after outlining presuppositions, an introduction to meaningful interaction analysis (MIA) is given. MIA is a means to inspect learning with the support of language analysis that is geometrical in nature. MIA is a fusion of latent semantic analysis (LSA) and network analysis (NA/SNA). LSA, NA/SNA, and MIA are illustrated with several examples.
Presentation about Tree-LSTMs networks described in "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks" by Kai Sheng Tai, Richard Socher, Christopher D. Manning
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
Detecting paraphrases using recursive autoencoders, by Feynman Liang
Presentation on deep learning applied to natural language processing, presented at University of Cambridge Machine Learning Group's Research and Communication Club 2-11-2015 meeting.
Defining the generative probabilistic topic model for text summarization that aims at extracting a small subset of sentences from the corpus with respect to some given query.
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR..., by mathsjournal
Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains unknown amounts of speech from an unknown number of unknown speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This paper presents a novel approach to speaker diarization, which defines loosely labeled data and employs x-vector embeddings and a formalized approach for threshold searching with a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and genetic algorithms to formulate and solve the optimization problem. Additionally, the algorithm is applied to English, Spanish, and Chinese audio, and the performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including for languages with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios where there are labeling time and cost constraints.
An Approach to Automated Learning of Conceptual Graphs from Text, by Fulvio Rotella
Many document collections are private and accessible only by selected people. Especially in business realities, such collections need to be managed, and the use of an external taxonomic or ontological resource would be very useful. Unfortunately, very often domain-specific resources are not available, and the development of techniques that do not rely on external resources becomes essential.
Automated learning of conceptual graphs from restricted collections needs to be robust with respect to missing or partial knowledge, which does not allow extracting a full conceptual graph and only provides sparse fragments thereof. This work proposes a way to deal with these problems by applying relational clustering and generalization methods. While clustering collects similar concepts, generalization provides additional nodes that can bridge separate pieces of the graph while expressing it at a higher level of abstraction. In this process, considering relational information allows a broader perspective in the similarity assessment for clustering, and ensures more flexible and understandable descriptions of the generalized concepts. The final conceptual graph can be used for better analyzing and understanding the collection, and for performing some kind of reasoning on it.
Recommender system slides for undergraduates, by Yueshen Xu
Slides for undergraduates in an IR class, presented in Chinese.
They mainly cover the background, applications, real cases, ideas, and basic methods of recommender systems.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
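As a point of reference for the optimizations described above, here is a minimal power-iteration PageRank sketch in Python that skips vertices whose ranks have already converged. The toy graph, damping factor, and tolerance are illustrative assumptions, not values from the report.

```python
# Minimal PageRank via power iteration, skipping already-converged vertices.
# The toy graph, damping factor, and tolerance are illustrative assumptions.

def pagerank(out_links, damping=0.85, tol=1e-10, max_iters=100):
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    converged = {v: False for v in out_links}
    # Precompute in-links so each vertex can pull contributions from its sources.
    in_links = {v: [] for v in out_links}
    for u, outs in out_links.items():
        for v in outs:
            in_links[v].append(u)
    for _ in range(max_iters):
        changed = False
        new_rank = dict(rank)
        for v in out_links:
            if converged[v]:          # skip vertices whose rank has stopped moving
                continue
            s = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new_rank[v] = (1.0 - damping) / n + damping * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged[v] = True
            else:
                changed = True
        rank = new_rank
        if not changed:
            break
    return rank

if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(pagerank(graph))
```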
Adjusting OpenMP PageRank: SHORT REPORT / NOTES, by Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Analysis insights about a Flyball dog competition team's performance, by roli9797
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Unleashing the Power of Data: Choosing a Trusted Analytics Platform, by Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Adjusting primitives for graph: SHORT REPORT / NOTES, by Subhajit Sahu
Graph algorithms, like PageRank, operate on representations such as Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Enhanced Enterprise Intelligence with your personal AI Data Copilot, by GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
2. Outline
Basic Concepts
Application and Background
Famous Researchers
Language Model
Vector Space Model (VSM)
Term Frequency-Inverse Document Frequency (TF-IDF)
Latent Semantic Indexing (LSA)
Probabilistic Latent Semantic Indexing (pLSA)
Expectation-Maximization Algorithm (EM) & Maximum-Likelihood Estimation (MLE)
3. Outline
Latent Dirichlet Allocation (LDA)
Conjugate Prior
Poisson Distribution
Variational Distribution and Variational Inference (VD&VI)
Markov Chain Monte Carlo (MCMC)
Metropolis-Hastings Sampling (MH)
Gibbs Sampling and GS for LDA
Bayesian Theory vs. Probability Theory
4. Concepts
Latent Semantic Analysis
Topic Model
Text Mining
Natural Language Processing
Computational Linguistics
Information Retrieval
Dimension Reduction
Expectation-Maximization (EM)
[Figure: diagram relating Information Retrieval, Computational Linguistics, Natural Language Processing, Text Mining, Data Mining, Machine Learning, Machine Translation, Dimension Reduction, and EM, with LSA/Topic Model at their intersection]
Aim: find the topic that a word or a document belongs to
Latent Factor Model
5. Application
LFM (Latent Factor Model) has been a fundamental technique in modern search engines, recommender systems, tag extraction, blog clustering, Twitter topic mining, news (text) summarization, etc.
Search Engine
PageRank: how important is this web page?
LFM: how relevant is this web page? How relevant is the user's query vs. one document?
Recommender System
Opinion Extraction
Spam Detection
Tag Extraction
Text Summarization
Abstract Generation
Twitter Topic Mining
Example text: "Steve Jobs has left us for about two years… Apple's price will fall down…"
6. Famous Researchers
David Blei, Princeton, LDA
Chengxiang Zhai, UIUC, Presidential Early Career Award
W. Bruce Croft, UMass, Language Model
Bing Liu, UIC, Opinion Mining
John D. Lafferty, CMU, CRF & IBM
Thomas Hofmann, Brown, pLSA
Andrew McCallum, UMass, CRF & IBM
Susan Dumais, Microsoft, LSI
7. Language Model
Unigram Language Model == Zero-order Markov Chain
Bigram Language Model == First-order Markov Chain
N-gram Language Model == (N-1)-order Markov Chain
Mixture-unigram Language Model
Unigram: $p(\mathbf{w}|M) = \prod_{w_i \in s} p(w_i|M)$
Bag of Words (BoW): no order, no grammar, only multiplicity
Bigram: $p(\mathbf{w}|M) = \prod_{w_i \in s} p(w_i|w_{i-1}, M)$
Mixture-unigram: $p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n|z)$
[Plate diagrams: the unigram model (w, N, M) and the mixture-unigram model with a latent topic z]
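To make the n-gram idea concrete, here is a minimal sketch (my own illustration, not from the slides) that estimates unigram and bigram probabilities from a toy corpus by counting; the corpus and the use of maximum-likelihood counts without smoothing are assumptions.

```python
from collections import Counter

# Toy corpus; maximum-likelihood estimates without smoothing (an assumption for brevity).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1))
total = sum(unigrams.values())

def p_unigram(w):
    # p(w | M) under the unigram (zero-order Markov) model
    return unigrams[w] / total

def p_bigram(w, prev):
    # p(w | w_prev, M) under the bigram (first-order Markov) model
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

print(p_unigram("cat"))          # 2/9
print(p_bigram("cat", "the"))    # 2/3
```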
8. Vector Space Model
A document is represented as a vector of identifiers
Identifier
Boolean: 0, 1
Term Count: how many times…
Term Frequency: how frequent… in this document
TF-IDF: how important… in the corpus (most used)
Relevance Ranking
First used in SMART (Gerard Salton, Cornell; the SIGIR Gerard Salton Award is named after him)
$d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$, $q = (w_{1q}, w_{2q}, \ldots, w_{tq})$
$\cos\theta_j = \dfrac{d_j \cdot q}{\lVert d_j \rVert\,\lVert q \rVert}$
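Below is a minimal sketch (my own illustration, not from the slides) of VSM relevance ranking with cosine similarity over raw term-count vectors; the tiny collection and query are assumptions for the example.

```python
import math
from collections import Counter

# Toy collection and query, represented as term-count vectors (an assumption for brevity).
docs = {"d1": "topic model for text mining",
        "d2": "neural network for image",
        "d3": "topic model and text"}
query = "topic model text"

def vec(text):
    return Counter(text.split())

def cosine(a, b):
    # cos(theta) = (a . b) / (|a| |b|)
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

q = vec(query)
ranking = sorted(docs, key=lambda d: cosine(vec(docs[d]), q), reverse=True)
print(ranking)  # documents ordered by cosine similarity to the query
```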
9. TF-IDF
Mixture language model: a linear combination of a certain distribution (e.g., Gaussian); better performance
TF: Term Frequency
IDF: Inverse Document Frequency
TF-IDF
$tf_{ij} = \dfrac{n_{ij}}{\sum_k n_{kj}}$ (term $i$, document $j$; $n_{ij}$ = count of $i$ in $j$): how important… in this document
$idf_i = \log\dfrac{N}{1 + |\{d \in D : t_i \in d\}|}$ ($N$ documents in the corpus): how important… in this corpus
$tfidf(t_i, d_j, D) = tf_{ij} \times idf_i$
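Here is a small sketch (my own, following the formulas above) that computes tf, idf, and tf-idf as defined on this slide, including the $1 + |\{d : t \in d\}|$ denominator; the toy documents are assumptions.

```python
import math
from collections import Counter

# Toy corpus (an assumption); tf, idf, and tf-idf follow the slide's definitions.
docs = ["the cat sat on the mat", "the dog sat", "cats and dogs"]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf(term, doc_tokens):
    counts = Counter(doc_tokens)
    return counts[term] / sum(counts.values())          # n_ij / sum_k n_kj

def idf(term):
    df = sum(1 for d in tokenized if term in d)          # |{d in D : t_i in d}|
    return math.log(N / (1 + df))

def tfidf(term, doc_tokens):
    return tf(term, doc_tokens) * idf(term)

print(tfidf("cat", tokenized[0]))
print(tfidf("the", tokenized[0]))   # frequent across documents, so lower idf
```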
10. Latent Semantic Indexing
Challenge
Compare documents in the same concept space
Compare documents across languages
Synonymy, e.g., buy - purchase, user - consumer
Polysemy, e.g., book - book, draw - draw
Key Idea
Dimensionality reduction of the word-document co-occurrence matrix
Construction of a latent semantic space
Defect of VSM: VSM maps words directly to documents; LSI inserts a concept layer in between (word → concept → document)
Concept: also called aspect, topic, or latent factor
11. Singular Value Decomposition
LSI ~= SVD
U, V: orthogonal matrices
Σ: the diagonal matrix of the singular values of N
$N = U \Sigma V^T$
[Figure: N is the t × d term-document matrix (entries: counts, frequencies, or TF-IDF); U is t × m, Σ is m × m, V^T is m × d. Keeping only the k largest singular values (k < m or k ≪ m) gives the truncated factors U_k (t × k), Σ_k (k × k), V_k^T (k × d). Words are exchangeable.]
12. Singular Value Decomposition
The K largest singular values: distinguish the variance between words and documents to the greatest extent
Discarding the lowest dimensions: reduces noise, fills the matrix (prediction) & lowers computational complexity, enlarges the distinctiveness
Decomposition: concept, semantic, topic (aspect)
(Probabilistic) Matrix Factorization / Factorization Model: the analytic solution of SVD
Unsupervised Learning
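Here is a minimal LSA sketch (my own illustration) that builds a small term-document count matrix and truncates its SVD to k dimensions with NumPy; the documents and the choice k = 2 are assumptions.

```python
import numpy as np

# Toy term-document count matrix N (rows = terms, columns = documents); an assumption.
terms = ["cat", "dog", "pet", "car", "engine"]
docs = np.array([
    [2, 1, 0],   # cat
    [1, 2, 0],   # dog
    [1, 1, 0],   # pet
    [0, 0, 2],   # car
    [0, 0, 1],   # engine
], dtype=float)

# Full SVD: N = U Sigma V^T
U, s, Vt = np.linalg.svd(docs, full_matrices=False)

# Keep the k largest singular values (k << m) to get the latent semantic space.
k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

term_vecs = U_k @ S_k            # term representations in the k-dimensional concept space
doc_vecs = (S_k @ Vt_k).T        # document representations in the same space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(term_vecs[0], term_vecs[1]))  # cat vs dog: high similarity
print(cos(term_vecs[0], term_vecs[3]))  # cat vs car: low similarity
```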
13. Probabilistic Latent Semantic Indexing
pLSI Model
[Figure: pLSI as a graphical model: documents d1…dM, latent topics z1…zK, words w1…wN, with probabilities p(d), p(z|d), p(w|z)]
Assumptions
Pairs (d, w) are assumed to be generated independently
Conditioned on z, w is generated independently of d
Words in a document are exchangeable
Documents are exchangeable
Latent topics z are independent
Generative Process/Model
$p(d,w) = p(d)\,p(w|d) = p(d)\sum_{z \in Z} p(w,z|d) = p(d)\sum_{z \in Z} p(z|d)\,p(w|z)$
p(z|d) and p(w|z) are multinomial distributions; p(z|d) is local (per document) and p(w|z) is global (shared across the corpus)
Can be seen as one layer of a 'deep neural network'
14. Probabilistic Latent Semantic Indexing
[Plate diagrams: (1) the asymmetric formulation d → z → w; (2) the symmetric formulation where z generates both d and w]
$p(w|d) = \sum_{z \in Z} p(z|d)\,p(w|z)$
$p(d,w) = \sum_{z \in Z} p(w,d,z) = \sum_{z \in Z} p(w|d,z)\,p(d,z) = \sum_{z \in Z} p(w|z)\,p(d|z)\,p(z)$
These are two ways to formulate pLSA; they are equivalent (via Bayes' rule) but lead to two different inference processes.
Probabilistic graphical model; d is exchangeable; the model is a directed acyclic graph (DAG).
15. Expectation-Maximization
EM is a general algorithm for maximum-likelihood estimation (MLE) where the data are 'incomplete' or contain latent variables: pLSA, GMM, HMM… (cross-domain)
Deduction Process
θ: parameter to be estimated; θ⁰: initialized randomly; θⁿ: the current value; θⁿ⁺¹: the next value
Objective: find $\theta^{n+1}$ that maximizes $L(\theta) - L(\theta^n)$
$L(\theta) = \log p(X|\theta)$; $L_c(\theta) = \log p(X,H|\theta)$ (complete-data log-likelihood, with latent variable $H$)
$L_c(\theta) = \log p(X,H|\theta) = \log p(X|\theta) + \log p(H|X,\theta) = L(\theta) + \log p(H|X,\theta)$
$L(\theta) - L(\theta^n) = L_c(\theta) - L_c(\theta^n) + \log\dfrac{p(H|X,\theta^n)}{p(H|X,\theta)}$
16. Expectation-Maximization
$L(\theta) - L(\theta^n) = \sum_H p(H|X,\theta^n)\,L_c(\theta) - \sum_H p(H|X,\theta^n)\,L_c(\theta^n) + \sum_H p(H|X,\theta^n)\log\dfrac{p(H|X,\theta^n)}{p(H|X,\theta)}$
The last term is a Kullback-Leibler divergence (relative entropy), which is non-negative, so we obtain the lower bound
$L(\theta) - L(\theta^n) \ge \sum_H p(H|X,\theta^n)\,L_c(\theta) - \sum_H p(H|X,\theta^n)\,L_c(\theta^n)$
Q-function: $Q(\theta;\theta^n) = E_{p(H|X,\theta^n)}[L_c(\theta)] = \sum_H p(H|X,\theta^n)\,L_c(\theta)$
E-step (expectation): compute Q
M-step (maximization): re-estimate θ by maximizing Q
Repeat until convergence
How is EM used in pLSA?
17. EM in pLSA
The log-likelihood is $L = \sum_i \sum_j n(d_i, w_j)\log p(w_j|d_i)$, and the Q-function is
$Q(\theta;\theta^n) = E_{p(H|X,\theta^n)}[L_c(\theta)] = \sum_{i=1}^{N}\sum_{j=1}^{M} n(d_i,w_j)\sum_{k=1}^{K} p(z_k|d_i,w_j)\,\log\big(p(w_j|z_k)\,p(z_k|d_i)\big)$
where $p(z_k|d_i,w_j)$ is the posterior (initialized with random values) and the argument of the log is the likelihood.
Constraints:
1. $\sum_{j=1}^{M} p(w_j|z_k) = 1$
2. $\sum_{k=1}^{K} p(z_k|d_i) = 1$
Introduce Lagrange multipliers $\tau_k, \rho_i$ for the constraints, take partial derivatives with respect to each independent variable, and set them to zero:
$H = E[L_c] + \sum_{k=1}^{K}\tau_k\Big(1 - \sum_{j=1}^{M} p(w_j|z_k)\Big) + \sum_{i=1}^{M}\rho_i\Big(1 - \sum_{k=1}^{K} p(z_k|d_i)\Big)$
M-step:
$p(w_j|z_k) = \dfrac{\sum_{i=1}^{N} n(d_i,w_j)\,p(z_k|d_i,w_j)}{\sum_{m=1}^{M}\sum_{i=1}^{N} n(d_i,w_m)\,p(z_k|d_i,w_m)}$
$p(z_k|d_i) = \dfrac{\sum_{j=1}^{M} n(d_i,w_j)\,p(z_k|d_i,w_j)}{n(d_i)}$
E-step (by Bayes' rule, using the associative and distributive laws):
$p(z_k|d_i,w_j) = \dfrac{p(w_j|z_k)\,p(z_k|d_i)}{\sum_{l=1}^{K} p(w_j|z_l)\,p(z_l|d_i)}$
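To make the E-step and M-step above concrete, here is a minimal pLSA EM sketch in NumPy (my own illustration of the update equations, not code from the slides); the random document-word count matrix and the number of topics are assumptions.

```python
import numpy as np

# Toy document-word count matrix n(d, w): N documents x M words (an assumption).
rng = np.random.default_rng(0)
n_dw = rng.integers(0, 5, size=(6, 10)).astype(float)   # N=6 docs, M=10 words
N, M = n_dw.shape
K = 3                                                    # number of latent topics

# Random initialization of p(z|d) and p(w|z), each normalized to sum to 1.
p_z_d = rng.random((N, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
p_w_z = rng.random((K, M)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: p(z|d,w) ∝ p(w|z) p(z|d)
    post = p_z_d[:, :, None] * p_w_z[None, :, :]         # shape (N, K, M)
    post /= post.sum(axis=1, keepdims=True) + 1e-12

    # M-step: re-estimate p(w|z) and p(z|d) from expected counts n(d,w) p(z|d,w)
    expected = n_dw[:, None, :] * post                    # shape (N, K, M)
    p_w_z = expected.sum(axis=0)                          # sum over documents
    p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
    p_z_d = expected.sum(axis=2)                          # sum over words
    p_z_d /= n_dw.sum(axis=1, keepdims=True) + 1e-12

print(p_z_d.round(3))   # per-document topic mixtures
print(p_w_z.round(3))   # per-topic word distributions
```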
18. Bayesian Theory vs. Probability Theory
Estimate θ through the posterior vs. estimate θ through maximization of the likelihood
Bayesian theory: prior vs. probability theory: statistics
When the number of samples → ∞, Bayesian theory == probability theory
Parameter Estimation
$p(\theta|D) \propto p(D|\theta)\,p(\theta)$. What should $p(\theta)$ be? A conjugate prior of the likelihood is helpful, but its use is limited. Otherwise?
Non-parametric Bayesian methods (complicated)
Kernel methods: I just know a little...
VSM → CF → MF → pLSA → LDA → Non-parametric Bayesian → Deep Learning
19. Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA)
David M. Blei, Andrew Y. Ng, Michael I. Jordan (Blei later received the ACM-Infosys Foundation Award)
Journal of Machine Learning Research, 2003, cited > 3000 times
Hierarchical Bayesian model; Bayesian pLSI
[Plate diagram: hyperparameters α and β, per-document topic mixture θ, topic assignments z, words w; N words per document, M documents]
Generative process of a document d in a corpus according to LDA:
Choose N ~ Poisson(ξ); Why?
For each document d = {w₁, w₂ … w_n}:
  Choose θ ~ Dir(α); Why?
  For each of the N words w_n in d:
    a) Choose a topic z_n ~ Multinomial(θ); Why?
    b) Choose a word w_n from p(w_n|z_n, β), a multinomial probability conditioned on z_n; Why?
20. Latent Dirichlet Allocation
LDA (cont.)
[Plate diagram of smoothed LDA: α → θ → z → w, with K topic-word distributions φ drawn from β]
Generative process of a document d in LDA:
Choose N ~ Poisson(ξ); not important
For each document d = {w₁, w₂ … w_n}:
  Choose θ ~ Dir(α); θ = (θ₁, θ₂ … θ_K), K is fixed, Σ_{k=1}^{K} θ_k = 1; Dirichlet is the conjugate prior of the Multinomial
  For each of the N words w_n in d:
    a) Choose a topic z_n ~ Multinomial(θ)
    b) Choose a word w_n from p(w_n|z_n, β), a multinomial probability conditioned on z_n
One word has one topic; one document has multiple topics: θ = (θ₁, θ₂ … θ_K), z = (z₁, z₂ … z_K); for each word w_n there is a z_n
pLSA: the number of p(z|d) parameters is linear in the number of documents → overfitting; the Dirichlet priors act as regularization
LDA has M + K Dirichlet-Multinomial structures
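The generative story above can be simulated directly; here is a small sketch (my own, not from the slides) that samples a document from fixed α and β using NumPy. The vocabulary, topic count, and hyperparameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed model settings for illustration.
vocab = ["gene", "dna", "cell", "ball", "game", "team"]
K, V = 2, len(vocab)
alpha = np.full(K, 0.5)                       # Dirichlet hyperparameter for theta
beta = np.array([
    [0.30, 0.30, 0.30, 0.03, 0.04, 0.03],     # topic 0: biology-ish words
    [0.03, 0.04, 0.03, 0.30, 0.30, 0.30],     # topic 1: sports-ish words
])                                            # K x V topic-word probabilities

def generate_document(xi=12):
    N = rng.poisson(xi)                       # N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)              # theta ~ Dir(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)            # z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])          # w_n ~ p(w | z_n, beta)
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document()
print(theta.round(2), doc)
```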
22. Conjugate Prior & Distributions
Conjugate Prior:
If the posterior p(θ|x) is in the same family as the prior p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior of the likelihood p(x|θ): p(θ|x) ∝ p(x|θ)p(θ)
Distributions
Binomial Distribution ←→ Beta Distribution
Multinomial Distribution ←→ Dirichlet Distribution
Binomial & Beta Distribution
Binomial: $\mathrm{Bin}(m|N,\theta) = C(m,N)\,\theta^m(1-\theta)^{N-m}$ (the likelihood), where $C(m,N) = \dfrac{N!}{(N-m)!\,m!}$
Beta: $\mathrm{Beta}(\theta|a,b) = \dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$, with $\Gamma(a) = \int_0^{\infty} t^{a-1}e^{-t}\,dt$
Why do the prior and posterior need to be conjugate distributions?
23. Conjugate Prior & Distributions
$p(\theta|m,l,a,b) \propto C(m+l,m)\,\theta^m(1-\theta)^l \cdot \dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$
$p(\theta|m,l,a,b) = \dfrac{\Gamma(m+a+l+b)}{\Gamma(m+a)\,\Gamma(l+b)}\,\theta^{m+a-1}(1-\theta)^{l+b-1}$
It is a Beta distribution again! (Parameter estimation)
Multinomial & Dirichlet Distribution
$\mathbf{x}$ is a multivariate indicator, e.g., $\mathbf{x} = (0,0,1,0,0,0)$: the event $x_3$ happens
The probability distribution of $\mathbf{x}$ in a single event: $p(\mathbf{x}|\theta) = \prod_{k=1}^{K}\theta_k^{x_k}$, $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$
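As a quick numerical check of the Beta-Binomial conjugacy above (my own sketch), updating a Beta(a, b) prior with m successes and l failures just adds the counts to the prior parameters; the specific numbers are assumptions.

```python
# Beta-Binomial conjugate update: posterior parameters are (a + m, b + l).
# The prior and the observed counts below are assumptions for illustration.

a, b = 2.0, 2.0          # Beta prior parameters
m, l = 7, 3              # observed successes and failures (Binomial data)

a_post, b_post = a + m, b + l
posterior_mean = a_post / (a_post + b_post)
mle = m / (m + l)

print(f"posterior: Beta({a_post}, {b_post})")
print(f"posterior mean of theta: {posterior_mean:.3f}  (MLE would be {mle:.3f})")
```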
24. Conjugate Prior & Distributions
Multinomial & Dirichlet Distribution (cont.)
$\mathrm{Mult}(m_1, m_2, \ldots, m_K|\boldsymbol\theta, N) = \dfrac{N!}{m_1!\,m_2!\cdots m_K!}\prod_{k=1}^{K}\theta_k^{m_k}$ (equivalently $C_N^{m_1} C_{N-m_1}^{m_2}\cdots C_{N-\sum_{k=1}^{K-1}m_k}^{m_K}\prod_{k=1}^{K}\theta_k^{m_k}$): the likelihood function of θ
Mult is the exact probability distribution of $p(z_k|d_j)$ and $p(w_j|z_k)$
In Bayesian theory, we need to find a conjugate prior of θ for Mult, where $0 < \theta_k < 1$ and $\sum_{k=1}^{K}\theta_k = 1$
Dirichlet Distribution: $\mathrm{Dir}(\theta|\boldsymbol\alpha) = \dfrac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\theta_k^{\alpha_k - 1}$, where $\boldsymbol\alpha$ is a vector and $\alpha_0 = \sum_k \alpha_k$
Hyper-parameter: a parameter in the probability distribution function (pdf)
26. Poisson Distribution
Why Poisson distribution?
Examples: the number of births per hour during a given day; the number of particles emitted by a radioactive source in a given time; the number of cases of a disease in different towns
For Bin(n, p), when n is large and p is small, $p(X=k) \approx \dfrac{\xi^k e^{-\xi}}{k!}$ with $\xi \approx np$
$\mathrm{Gamma}(x|\alpha) = \dfrac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}$; with $\alpha = k+1$: $\mathrm{Gamma}(x|\alpha = k+1) = \dfrac{x^k e^{-x}}{k!}$ (since $\Gamma(k+1) = k!$). Poisson is discrete; Gamma is continuous.
Poisson Distribution: $p(k|\xi) = \dfrac{\xi^k e^{-\xi}}{k!}$
Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc.
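As a small numerical illustration of the limit above (my own sketch, with assumed n, p, and k), the Binomial(n, p) probability approaches the Poisson probability with ξ = np when n is large and p is small:

```python
import math

# Assumed values: large n, small p, so Binomial(n, p) ≈ Poisson(xi) with xi = n * p.
n, p, k = 1000, 0.003, 5
xi = n * p

binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
poisson = xi**k * math.exp(-xi) / math.factorial(k)

print(f"Binomial P(X={k}) = {binom:.6f}")
print(f"Poisson  P(X={k}) = {poisson:.6f}")   # close to the Binomial value
```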
28. Solution for LDA
LDA is one of the most significant generative models in the machine learning community of the past ten years.
$p(\mathbf{w}|\alpha,\beta) = \dfrac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)}\displaystyle\int \Big(\prod_{i=1}^{k}\theta_i^{\alpha_i - 1}\Big)\Big(\prod_{n=1}^{N}\sum_{i=1}^{k}\prod_{j=1}^{V}(\theta_i\beta_{ij})^{w_n^j}\Big)\,d\theta$
$p(\mathbf{w}|\alpha,\beta) = \displaystyle\int p(\theta|\alpha)\Big(\prod_{n=1}^{N}\sum_{z_n} p(z_n|\theta)\,p(w_n|z_n,\beta)\Big)\,d\theta$ (rewritten in terms of model parameters)
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_K)$; $\beta \in \mathbb{R}^{K\times V}$: what we need to solve for
Variational Inference (deterministic inference) vs. Gibbs Sampling (stochastic inference)
Why variational inference? To simplify the dependency structure.
Why sampling? To approximate the statistical properties of the population with those of samples.
29. Variational Inference
Variational Inference (inference through a variational distribution), VI
VI aims to use an approximating distribution that has a simpler dependency structure than that of the exact posterior distribution
$P(H|D) \approx Q(H)$, where $P(H|D)$ is the true posterior distribution and $Q(H)$ is the variational distribution
How do we measure the dissimilarity between P and Q? The Kullback-Leibler divergence:
$KL(Q\|P) = \displaystyle\int Q(H)\log\dfrac{Q(H)\,P(D)}{P(H,D)}\,dH = \int Q(H)\log\dfrac{Q(H)}{P(H,D)}\,dH + \log P(D)$
$L \overset{\text{def}}{=} \displaystyle\int Q(H)\log P(H,D)\,dH - \int Q(H)\log Q(H)\,dH = \langle \log P(H,D)\rangle_{Q(H)} + \mathbb{H}[Q]$, where $\mathbb{H}[Q]$ is the entropy of Q
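To keep the KL term concrete, here is a tiny sketch (my own, with assumed discrete distributions) that computes KL(Q‖P) for two distributions over the same finite support:

```python
import math

# Assumed discrete distributions Q and P over the same support.
Q = [0.1, 0.4, 0.5]
P = [0.2, 0.2, 0.6]

# KL(Q || P) = sum_x Q(x) log(Q(x) / P(x)); non-negative, zero iff Q == P.
kl = sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)
print(f"KL(Q||P) = {kl:.4f}")
```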
33. Variational Inference
You can refer to the original paper for more details.
Variational EM Algorithm
Aim: $(\alpha^*, \beta^*) = \arg\max_{\alpha,\beta}\prod_{d=1}^{M} p(\mathbf{w}_d|\alpha,\beta)$
Initialize α, β
E-step: for each document, compute the variational parameters that best approximate the posterior (likelihood approximation via variational inference)
M-step: maximize the resulting likelihood lower bound with respect to α, β
Repeat until convergence
36. Markov Chain Monte Carlo
MCMC Sampling
We need to construct a relationship between the target distribution π(x) and the Markov chain transition process → the detailed balance condition.
In a common Markov chain with transition matrix P, if $\pi(i)P_{ij} = \pi(j)P_{ji}$ for all $i, j$, then π(x) is the stationary distribution of this Markov chain.
Proof: $\sum_{i=1}^{\infty}\pi(i)P_{ij} = \sum_{i=1}^{\infty}\pi(j)P_{ji} = \pi(j)$, hence $\pi P = \pi$, so π is the solution of the equation $\pi P = \pi$. Done.
For a common Markov chain with proposal q(i, j) (i.e., q(j|i), q(i → j)), and for any probability distribution p(x) (the dimension of x is arbitrary), apply the transformation
$p(i)\,q(i,j)\,\alpha(i,j) = p(j)\,q(j,i)\,\alpha(j,i)$, i.e., $Q'(i,j) = q(i,j)\,\alpha(i,j)$ and $Q'(j,i) = q(j,i)\,\alpha(j,i)$,
with $\alpha(i,j) = p(j)\,q(j,i)$ and $\alpha(j,i) = p(i)\,q(i,j)$ (this choice of α makes the detailed balance condition hold).
37. Markov Chain Monte Carlo
MCMC Sampling(cont.)
Step 1: Initialize $X_0 = x_0$
Step 2: for t = 0, 1, 2, …
  With $X_t = x_t$, sample y from $q(x|x_t)$ (y in the domain of definition)
  Sample u from Uniform[0, 1]
  If $u < \alpha(x_t, y) = p(y)\,q(x_t|y)$, then accept the move $x_t \to y$: $X_{t+1} = y$
  else $X_{t+1} = x_t$
Metropolis-Hastings Sampling
Step 1: Initialize $X_0 = x_0$
Step 2: for t = 0, 1, 2, … n, n+1, n+2, …
  With $X_t = x_t$, sample y from $q(x|x_t)$ (y in the domain of definition)
  (Burn-in period, then convergence)
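Since the slide text is cut off before the Metropolis-Hastings acceptance step, here is a minimal sketch of the standard MH sampler; the usual acceptance ratio min(1, p(y)q(x_t|y) / (p(x_t)q(y|x_t))) is the well-known completion, not text from the slides, and the target density and proposal width are assumptions.

```python
import math
import random

random.seed(0)

# Assumed target density p(x) (unnormalized): a standard normal.
def p(x):
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples=5000, burn_in=1000, step=1.0):
    x = 0.0                                   # Step 1: initialize X_0 = x_0
    samples = []
    for t in range(n_samples + burn_in):      # Step 2: iterate
        y = random.gauss(x, step)             # sample y from q(x | x_t): symmetric Gaussian proposal
        # Standard MH acceptance ratio; q cancels because the proposal is symmetric.
        alpha = min(1.0, p(y) / p(x))
        if random.random() < alpha:           # u ~ Uniform[0, 1]
            x = y
        if t >= burn_in:                      # discard the burn-in period
            samples.append(x)
    return samples

samples = metropolis_hastings()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean ≈ {mean:.2f}, sample variance ≈ {var:.2f}")  # close to 0 and 1
```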