PhD Research Proposal
Title: Bayesian Inference for Big Data with Stochastic MCMC and Variational Bayes
Author: Komlan ATITEY
Abstract—This PhD research proposal describes the research project that the author will pursue for his PhD dissertation. We consider Bayesian inference on big data for multi-target tracking in the presence of an unknown number of targets. This research provides a Bayesian framework for making inferences from big data, based on conceptual models of the transformation, sampling, and censoring processes applied to the big data measurements. Proper inference requires modeling all of these processes, which can be very complex, if it is possible at all. However, when certain sampling and censoring ignorability conditions hold, inference can be made on the big data measurements as if they had been acquired from a random sample.
BACKGROUND
Multitarget tracking has a history spanning more than 50 years; it refers to the problem of jointly estimating the number of targets and their states from sensor data. Today, multitarget tracking has found applications in diverse disciplines, including air traffic control; intelligence, surveillance, and reconnaissance (ISR); space applications; oceanography; autonomous vehicles and robotics; remote sensing; computer vision; and biomedical research. During the last decade, advances in multitarget tracking techniques, along with sensing and computing technologies, have opened up numerous research avenues as well as application areas. As the statistical models used to understand complex systems grow, the methods used to fit these models must scale accordingly. While advanced computational methods are being developed to fit these complex models, their speed and memory requirements often demand enormous computational power in the form of large clusters. This reliance on large-scale computing for big data and high-dimensional systems is rapidly becoming unsustainable, especially for practitioners who do not have access to such resources. Accordingly, there is a substantial and growing need for statistically efficient methods that scale in speed and memory while remaining straightforward to implement and communicate.
PROBLEM STATEMENT
In multiple target tracking, advances in sensor technology make it possible to collect large amounts of real-time observation data from real systems during simulations. Inaccurate simulation results are often unavoidable because of imperfect models and inaccurate inputs. Bayesian analysis is one of the most powerful families of methods for analyzing data, and it is now widely adopted in the statistical sciences as well as in artificial intelligence (AI) technologies such as machine learning. The Bayesian approach offers several attractive advantages over other techniques: flexibility in constructing complex models from simple parts; fully coherent inferences from data; natural incorporation of prior knowledge; explicit modeling assumptions; principled reasoning about uncertainty over model order and parameters; and protection against overfitting.
On the other hand, there is a general perception that the Bayesian approach can be too slow to be practically useful on big data sets. This is because exact Bayesian computations are typically intractable, so a range of more practical approximate algorithms is needed, including variational approximations, sequential Monte Carlo (SMC), and Markov chain Monte Carlo (MCMC). Unfortunately, MCMC methods do not scale well to big data sets: they require many iterations to reduce Monte Carlo noise, and each iteration already involves an expensive sweep through the whole data set.
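The per-iteration cost can be seen in a minimal random-walk Metropolis-Hastings sketch. It assumes a toy model that is not part of this proposal: scalar data x_i ~ N(θ, σ²) with known σ = 1 and a N(0, 10²) prior on θ, implemented with NumPy. The function names and the `datum_evals` counter are hypothetical, added only to make the cost visible: every proposal requires the log-likelihood of all N points.

```python
import numpy as np


def log_posterior(theta, x, sigma=1.0, prior_sd=10.0):
    """Unnormalised log posterior for the mean of N(theta, sigma^2) data
    under a N(0, prior_sd^2) prior. The likelihood sums over ALL N points."""
    log_prior = -0.5 * (theta / prior_sd) ** 2
    log_lik = -0.5 * np.sum((x - theta) ** 2) / sigma ** 2
    return log_prior + log_lik


def random_walk_mh(x, n_iters=2000, step=0.02, seed=0):
    """Random-walk Metropolis-Hastings. Returns the chain and a count of
    per-datum likelihood terms evaluated (hypothetical bookkeeping)."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    current_lp = log_posterior(theta, x)
    datum_evals = len(x)  # cost of the initial evaluation
    samples = np.empty(n_iters)
    for t in range(n_iters):
        proposal = theta + step * rng.standard_normal()
        proposal_lp = log_posterior(proposal, x)
        datum_evals += len(x)  # every iteration sweeps the whole data set
        # Symmetric proposal, so the acceptance ratio is the posterior ratio
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            theta, current_lp = proposal, proposal_lp
        samples[t] = theta
    return samples, datum_evals
```

With N = 10,000 points and 2,000 iterations, the counter reaches 10,000 × 2,001 per-datum evaluations: the cost of one iteration grows linearly with N, on top of the many iterations needed to reduce Monte Carlo noise.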
PREVIOUS RESEARCH
For such big data problems, Scott et al. [1] argue that communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Mihaylova et al. [2] present the various aspects of group and extended object tracking, their underlying difficulties, and the key factors facilitating their solution in the context of Bayesian estimation. They present methods for both small and large groups, including MCMC methods, the random matrices approach, and random finite set statistics methods. MCMC methods arguably form the most popular class of Bayesian computational techniques, owing to their flexibility, general applicability, and asymptotic exactness. Korattikara et al. [3] examined MCMC methods and showed the need for an approximate Metropolis-Hastings algorithm for Bayesian posterior sampling, proposing one that bases each accept/reject decision on a subset of the data. Gelman et al. [4] consider expectation propagation (EP) as a prototype for scalable algorithms that partition big data sets into many parts and analyze each part in parallel to perform inference on shared parameters. EP iteratively approximates the moments of the tilted distributions and incorporates those approximations into a global posterior approximation.
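The consensus Monte Carlo idea of Scott et al. [1] can be sketched in a few lines: the data are split into S shards, each shard is sampled independently under the prior raised to the power 1/S, and the subposterior draws are combined by a weighted average whose weights are the inverse subposterior variances. This is a minimal sketch, assuming a hypothetical conjugate toy model (x_i ~ N(θ, 1), θ ~ N(0, 10²)) chosen so that each subposterior can be sampled exactly with NumPy; in practice each shard would run its own MCMC chain on its own machine, and only the final combination step would require communication.

```python
import numpy as np


def subposterior_draws(x_shard, n_shards, n_draws, rng, sigma=1.0, prior_sd=10.0):
    """Exact draws from p(theta)^(1/S) * p(x_shard | theta) for the toy model
    x ~ N(theta, sigma^2), theta ~ N(0, prior_sd^2). Raising the Gaussian prior
    to the power 1/S multiplies its variance by S."""
    precision = 1.0 / (n_shards * prior_sd ** 2) + len(x_shard) / sigma ** 2
    mean = (np.sum(x_shard) / sigma ** 2) / precision
    return rng.normal(mean, 1.0 / np.sqrt(precision), size=n_draws)


def consensus_monte_carlo(x, n_shards=10, n_draws=5000, seed=0):
    """Sample each shard independently, then combine draw g from every shard
    by a precision-weighted average (the only step needing communication)."""
    rng = np.random.default_rng(seed)
    shards = np.array_split(x, n_shards)
    draws = np.stack([subposterior_draws(s, n_shards, n_draws, rng) for s in shards])  # (S, G)
    weights = 1.0 / draws.var(axis=1, ddof=1)  # inverse subposterior variances, shape (S,)
    return (weights[:, None] * draws).sum(axis=0) / weights.sum()
```

When the subposteriors are Gaussian, as in this toy model, the weighted average recovers the full-data posterior; for non-Gaussian posteriors the combination is only an approximation.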
APPROACH TO PROBLEM
Taking more data into account and considering high-dimensional models usually improves a model's performance. In this project we propose to develop the theoretical foundations for a new class of MCMC inference methods that can scale to billions of data items, thereby making the strengths of Bayesian methods available for big data. The key idea is to use a small subset of the data during each parameter-update iteration of the algorithm, so that many iterations can be performed cheaply.
We propose to lay the mathematical foundations for understanding the theoretical properties of such stochastic MCMC algorithms, and to build on these foundations to develop more sophisticated algorithms. We aim to understand the conditions under which these algorithms are guaranteed to converge, as well as the type and rate of convergence. Using this understanding, we intend to develop algorithmic extensions and generalizations with better convergence properties, including preconditioning, sequential Monte Carlo methods, online Bayesian learning methods, and approximate methods such as variational Bayes with large step sizes. These algorithms will be validated empirically on real-world problems, including large-scale data analysis problems in text processing and collaborative filtering.
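To illustrate the variational Bayes direction, here is a minimal sketch of stochastic variational inference in the style of Hoffman et al. (2013), again for the hypothetical Gaussian-mean toy model and using NumPy. The variational distribution q(θ) = N(m, s²) is stored as (precision × mean, precision), a linear rescaling of the Gaussian natural parameters; each step forms a noisy estimate from a minibatch treated as if it were replicated N/B times, then blends it into the current parameters with a Robbins-Monro step size ρ_t = (t + τ)^(-κ). The constants τ and κ are illustrative assumptions.

```python
import numpy as np


def svi_gaussian_mean(x, n_iters=2000, batch_size=100, tau=1.0, kappa=0.7,
                      sigma=1.0, prior_sd=10.0, seed=0):
    """Stochastic variational inference for q(theta) = N(m, s^2) under the
    model x ~ N(theta, sigma^2), theta ~ N(0, prior_sd^2). Returns (m, s)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Parameters stored as (precision * mean, precision)
    prior_nat = np.array([0.0, 1.0 / prior_sd ** 2])
    lam = prior_nat.copy()
    for t in range(n_iters):
        rho = (t + tau) ** (-kappa)  # Robbins-Monro step size; equals 1 at t = 0 when tau = 1
        batch = x[rng.integers(0, n, size=batch_size)]
        # Noisy estimate: treat the minibatch as if it were replicated n / batch_size times
        lam_hat = prior_nat + np.array([(n / batch_size) * batch.sum(), n]) / sigma ** 2
        lam = (1.0 - rho) * lam + rho * lam_hat
    precision = lam[1]
    return lam[0] / precision, 1.0 / np.sqrt(precision)
```

Because this toy model is conjugate, the variational family contains the exact posterior, so the sketch can be checked against the closed-form answer; in non-conjugate models the result is only an approximation.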
RESEARCH PLAN
The research plan is as follows:
1) Review the existing literature on potential function methods and swarms, including previous work on big data problems.
2) Develop theoretical foundations for a new class of MCMC inference methods that can scale to big data, thereby making the strengths of Bayesian methods available for big data.
3) Study the conditions under which the algorithms converge, and build on these foundations to develop more sophisticated algorithms.
4) Develop extensions of our algorithms based on sequential Monte Carlo methods, online Bayesian learning methods, and approximate methods such as variational Bayes.
5) If possible, validate the performance of our algorithms in real experiments, testing and verifying each of the methods and parameters used.
6) Write up the results of the study in the form of a PhD dissertation.
7) Write and publish research papers in peer-reviewed journals.
This research will be conducted during the first three semesters of the research period.
AUTHOR’S PREVIOUS RESEARCH
In previous related research, the author studied and experimented with the Probability Hypothesis Density (PHD) filter to explore its applications in target tracking. The author also implemented the closed-form solution of the PHD recursion: the Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter. This work identified two drawbacks of the GM-PHD filter: its performance degrades as the number of targets grows and as target trajectories come closer together. To address these drawbacks, the author developed a novel prediction algorithm for the GM-PHD filter, called the Gamma Gaussian Mixture Probability Hypothesis Density (GaGM-PHD) filter. Comparisons between implementations of the new algorithm and the existing GM-PHD filter showed the improvement achieved.
The resulting algorithm was presented at the 8th International Conference on Image and Graphics (ICIG 2015), organized by the China Society of Image and Graphics and Microsoft Research Asia (MSRA) and hosted in Tianjin, China. The paper was published by Springer and indexed by Engineering Village (EI) with accession number 20154201380467.
REFERENCES
[1] Scott et al., “Bayes and big data: The consensus Monte Carlo algorithm,” in EFaB Bayes 250 Conf., vol. 16, 2013.
[2] Mihaylova et al., “Overview of Bayesian sequential Monte Carlo methods for group and extended object tracking,” Digital Signal Processing, vol. 25, pp. 1-16, Elsevier, 2014.
[3] Korattikara et al., “Austerity in MCMC land: Cutting the Metropolis-Hastings budget,” in Proc. Int. Conf. on Machine Learning, 2014.
[4] Gelman et al., “Expectation propagation as a way of life,” preprint, http://arxiv.org/abs/1412.4869, 2014.
