This 10-hour class is intended to give students the basics to empirically solve statistical problems. Talk 1 serves as an introduction to the statistical software R and presents how to calculate basic measures such as the mean, variance, correlation and Gini index. Talk 2 shows how the central limit theorem and the law of large numbers work empirically. Talk 3 presents point estimates, confidence intervals and hypothesis tests for the most important parameters. Talk 4 introduces the linear regression model, and Talk 5 the world of the bootstrap. Talk 5 also presents a simple example of a Markov chain.
All the talks are supported by scripts written in R.
Talk 5
1. Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Lesson 5 - Introduction to Bootstrap (and hints on Markov
Chains) - 27.01.2015
2. Introduction
Let's assume, for a moment, the Central Limit Theorem (CLT):
If a random sample of n observations y1, y2, ..., yn is drawn from a population with mean µ and variance σ², then, for n large enough, the sampling distribution of the sample mean can be approximated by a normal density with mean µ and variance σ²/n.
Averages taken from any distribution will have a normal distribution.
The standard deviation decreases as the number of observations increases.
But... nobody tells us exactly how big the sample has to be.
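The σ²/n behaviour can be checked empirically. The course's own scripts are in R; the following is an equivalent sketch using only Python's standard library (the Uniform(0, 1) population and the sample sizes are illustrative choices):

```python
import random
import statistics

rng = random.Random(0)

def sd_of_sample_means(n, reps=2000):
    """Draw `reps` samples of size n from Uniform(0, 1) and return the
    standard deviation of their sample means."""
    means = [statistics.mean([rng.random() for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(means)

# Uniform(0, 1) has sd ~ 0.289, so the sd of the mean should shrink like 0.289 / sqrt(n)
for n in (10, 40, 160):
    print(n, round(sd_of_sample_means(n), 3))
```

Quadrupling n should roughly halve the standard deviation of the sample mean.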
3. Why Bootstrap?
1. Sometimes we cannot take advantage of the CLT, because:
nobody tells us exactly how big the sample has to be;
empirically, in some cases the sample is really small.
So we are not encouraged to make any distributional assumption: we just have the data, and we let the raw data speak.
The bootstrap method attempts to determine the probability distribution from the data itself, without recourse to the CLT.
2. To better estimate the variance of a parameter, and consequently obtain more accurate confidence intervals and hypothesis tests.
4. Basic Idea of Bootstrap
Use the original sample as the population, and draw M samples from the original sample (the bootstrap samples). Then define the estimator using the bootstrap samples.
Figure: Real World versus Bootstrap World
5. Structure of Bootstrap
1. Originally, from a list of data (the sample), one computes a statistic (an estimate).
2. Then, one creates an artificial list of data (a new sample) by randomly drawing elements from the original list.
3. One computes a new statistic (estimate) from the new sample.
4. One repeats steps 2 and 3, say, M = 1000 times, and looks at the distribution of these 1000 statistics.
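The course's own scripts are in R; as a language-neutral sketch, the resampling loop above can be written with Python's standard library (the sample values and M = 1000 are illustrative):

```python
import random
import statistics

def bootstrap_statistics(sample, stat, M=1000, seed=0):
    """Steps 2-4 above: draw M resamples with replacement (each of the
    same size as the original sample) and compute the statistic on each."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(M)]

data = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8, 3.3, 4.8, 2.2, 3.9]  # illustrative sample
boot_means = bootstrap_statistics(data, statistics.mean)
print(len(boot_means))   # M bootstrap estimates of the mean
```

The same loop works for any statistic: pass `statistics.median` or `statistics.variance` instead of the mean.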
6. Types of resampling methods
1. The Monte Carlo algorithm: we resample with replacement, and the size of each bootstrap sample must be equal to the size of the original data set.
2. The jackknife algorithm: we simply resample from the original sample deleting one value at a time, so the size of each sample is n - 1.
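The jackknife variant is deterministic: there are exactly n leave-one-out subsamples. A minimal Python sketch (illustrative data):

```python
import statistics

def jackknife_statistics(sample, stat):
    """Jackknife: recompute the statistic on every leave-one-out
    subsample of size n - 1; no randomness is involved."""
    n = len(sample)
    return [stat(sample[:i] + sample[i + 1:]) for i in range(n)]

data = [2.1, 3.4, 1.9, 5.0, 4.2]         # illustrative values
jk_means = jackknife_statistics(data, statistics.mean)
print(len(jk_means))                      # n estimates, one per deleted point
```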
7. Estimation of the sample mean
Suppose we extracted a sample x = (x1, x2, ..., xn) from the population X. Let's say the sample size is small: n = 10.
We can compute the sample mean x̄_n using the values of the sample x. But, since n is small, the CLT does not hold, so we cannot say anything about the sample mean distribution.
APPROACH: We extract M samples (or sub-samples) of dimension n from the sample x (with replacement, Monte Carlo).
We can define the bootstrap sample means x̂_{i,b}, for i = 1, ..., M. These become the new sample, of dimension M.
Bootstrap sample mean: M_b(X) = (1/M) Σ_{i=1}^{M} x̂_{i,b}
Bootstrap sample variance: V_b(X) = (1/(M - 1)) Σ_{i=1}^{M} (x̂_{i,b} - M_b(X))²
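These two quantities can be computed directly from the M bootstrap means. A stdlib-Python sketch (the sample of size n = 10 is a hypothetical stand-in for the data; the course scripts are in R):

```python
import random
import statistics

rng = random.Random(1)
x = [4.9, 6.1, 5.3, 7.0, 4.4, 5.8, 6.6, 5.1, 4.7, 6.3]   # hypothetical sample, n = 10
M = 1000

# the M bootstrap sample means x-hat_{i,b}
boot_means = [statistics.mean(rng.choices(x, k=len(x))) for _ in range(M)]

Mb = sum(boot_means) / M                                # bootstrap sample mean
Vb = sum((m - Mb) ** 2 for m in boot_means) / (M - 1)   # bootstrap sample variance
```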
8. Bootstrap confidence interval with variance
estimation
Let's take a random sample of size n = 25 from a normal
distribution with mean 10 and standard deviation 3.
We can consider the sampling distribution of the sample mean.
From that, we estimate the intervals.
The bootstrap estimates the standard error by resampling the data
in our original sample.
Instead of repeatedly drawing samples of size n = 25 from the
population, we repeatedly draw new samples of size n = 25 from
our original sample, resampling with replacement.
We can estimate the standard error of the sample mean using the
standard deviation of the bootstrapped sample means.
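A minimal Python sketch of this procedure, under the slide's assumptions (n = 25 draws from N(10, 3); the seed and the normal-approximation 95% interval are choices made for the example):

```python
import random
import statistics

rng = random.Random(42)
n = 25
# The "original sample" drawn once from the population:
sample = [rng.gauss(10, 3) for _ in range(n)]

# Resample from the sample, not from the population:
M = 1000
boot_means = []
for _ in range(M):
    resample = [rng.choice(sample) for _ in range(n)]
    boot_means.append(statistics.mean(resample))

# Bootstrap estimate of the standard error of the sample mean:
se_boot = statistics.stdev(boot_means)

# Normal-approximation 95% confidence interval for the mean:
xbar = statistics.mean(sample)
ci = (xbar - 1.96 * se_boot, xbar + 1.96 * se_boot)
```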
10. Confidence interval with quantiles
Suppose we have a sample of data from an exponential distribution
with parameter λ:
f(x|λ) = λe^(−λx) (remember: the estimate of λ is λ̂ = 1/x̄n).
An alternative to using bootstrap estimated standard
errors (since estimating the standard errors for an
exponential is not straightforward) is to use bootstrap
quantiles.
We obtain M bootstrap estimates λ̂b and define q*(α) as the α
quantile of the bootstrap distribution of the M λ estimates.
The bootstrap confidence interval for λ will be:
[2λ̂ − q*(1 − α/2); 2λ̂ − q*(α/2)]
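This interval can be sketched in Python as follows; the true λ, sample size, and seed are invented for the illustration:

```python
import random
import statistics

rng = random.Random(1)
n = 50
lam_true = 2.0  # hypothetical true rate, used only to simulate data
sample = [rng.expovariate(lam_true) for _ in range(n)]
lam_hat = 1 / statistics.mean(sample)  # lambda-hat = 1 / sample mean

# M bootstrap estimates of lambda:
M = 2000
boot_lams = sorted(
    1 / statistics.mean([rng.choice(sample) for _ in range(n)])
    for _ in range(M)
)

# Bootstrap quantiles q*(alpha/2) and q*(1 - alpha/2):
alpha = 0.05
q_lo = boot_lams[int(M * alpha / 2)]
q_hi = boot_lams[int(M * (1 - alpha / 2))]

# The interval from the slide: [2*lam_hat - q_hi, 2*lam_hat - q_lo]
ci = (2 * lam_hat - q_hi, 2 * lam_hat - q_lo)
```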
11. Regression model coefficient estimate with Bootstrap
Now we will consider the situation where we have data on two variables.
This is the type of data that arises in linear regression models. It does
not make sense to bootstrap the two variables separately, so they remain
linked when bootstrapped.
If our original n = 4 sample contains the observations (y1=1, x1=3),
(y2=2, x2=6), (y3=4, x3=3), and (y4=6, x4=2), we resample these
original couples in pairs.
Recall that the linear regression model is: yi = β1 + β2xi + εi. We are
going to construct a bootstrap interval for the slope coefficient β2:
1. We draw M bootstrap bivariate samples.
2. We compute the OLS β̂2 coefficient for each bootstrap sample.
3. We define the bootstrap quantiles, and we use the 0.025 (α/2) and
the 0.975 (1 − α/2) quantiles to define the confidence interval for β̂2.
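The pairs bootstrap above can be sketched in Python with the slide's n = 4 sample (the helper `ols_slope`, the seed, and the guard against degenerate resamples are choices made for the sketch; with only 4 pairs, a resample can have all x values equal, so it is skipped):

```python
import random

# The four (y, x) couples from the slide:
pairs = [(1, 3), (2, 6), (4, 3), (6, 2)]

def ols_slope(data):
    """OLS estimate of beta2 in y = beta1 + beta2*x + eps."""
    n = len(data)
    mx = sum(x for _, x in data) / n
    my = sum(y for y, _ in data) / n
    sxy = sum((x - mx) * (y - my) for y, x in data)
    sxx = sum((x - mx) ** 2 for _, x in data)
    return sxy / sxx

rng = random.Random(7)
M = 1000
slopes = []
while len(slopes) < M:
    boot = [rng.choice(pairs) for _ in range(len(pairs))]  # resample pairs
    xs = [x for _, x in boot]
    if len(set(xs)) > 1:  # skip resamples where the slope is undefined
        slopes.append(ols_slope(boot))
slopes.sort()

# Percentile confidence interval for beta2-hat:
ci = (slopes[int(M * 0.025)], slopes[int(M * 0.975)])
```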
12. Regression model coefficient estimate with Bootstrap
(alternative): sampling the residuals
An alternative bootstrap estimate of the regression
coefficient is a two-stage method in which:
1. You run the regression on the original sample and compute the
residual vector (of dimension n); you then draw M bootstrap
residual vectors by resampling the residuals with replacement.
2. You add each of the M residual vectors to the fitted values,
obtaining M new dependent-variable vectors.
3. You perform M new regressions using the new
dependent variables, to estimate M bootstrapped β2.
As before, the (α/2) and the (1 − α/2)
quantiles of the bootstrapped β2 define the confidence interval.
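The residual-resampling variant, sketched in Python on the same n = 4 data (the helper `ols` and the seed are invented for the sketch):

```python
import random

# The slide's n = 4 sample:
y = [1, 2, 4, 6]
x = [3, 6, 3, 2]

def ols(y, x):
    """OLS estimates (beta1, beta2) for y = beta1 + beta2*x + eps."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b1 = my - b2 * mx
    return b1, b2

# Stage 1: fit once, keep fitted values and residuals.
b1, b2 = ols(y, x)
fitted = [b1 + b2 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Stages 2-3: resample residuals, rebuild y, re-estimate beta2.
rng = random.Random(3)
M = 1000
boot_b2 = sorted(
    ols([fi + rng.choice(resid) for fi in fitted], x)[1]
    for _ in range(M)
)
ci = (boot_b2[int(M * 0.025)], boot_b2[int(M * 0.975)])
```

Note that x stays fixed here, so the slope is always defined; only the dependent variable changes across bootstrap replications.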
13. References
Efron, B., & Tibshirani, R. J. (1993). An Introduction to the
Bootstrap (Vol. 57). Chapman & Hall/CRC.
Figure: Efron and Tibshirani's foundational book
14. Routines in R
1. boot, by Brian Ripley.
Functions and datasets for bootstrapping from the book
Bootstrap Methods and Their Applications by A. C. Davison
and D. V. Hinkley (1997, CUP).
2. bootstrap, by Rob Tibshirani.
Software (bootstrap, cross-validation, jackknife) and data for
the book An Introduction to the Bootstrap by B. Efron and
R. Tibshirani, 1993, Chapman and Hall
15. Markov Chain
Markov chains are an important tool in probability and many
other areas of research.
They are used to model the probability of being in a certain state
in a certain period, given that the state in the previous period is
known.
Weather example: what is the probability that tomorrow
will be sunny, given that today is rainy?
The main properties of Markov chain processes are:
Memory of the process (usually the memory is fixed to 1).
Stationarity of the distribution.
16. Chart 1
Figure: A simple example of a Markov chain with two possible
states and its transition probabilities.
17. Notation
We define a stochastic process {Xt, t = 0, 1, 2, ...} that takes on a
finite or countable number of possible values.
Let the possible values be non-negative integers (i.e. Xt ∈ Z+). If
Xt = i, then the process is said to be in state i at time t.
The Markov property (in discrete time) is defined as follows:
Pij = P[Xt+1 = j | Xt = i, Xt−1 = it−1, ..., X0 = i0] = P[Xt+1 = j | Xt = i],
∀i, j ∈ Z+
We call Pij a 1-step transition probability because we move from
time t to time t + 1.
It is a first-order Markov chain (memory = 1) because the
probability of being in state j at time t + 1 only depends on the
state at time t.
18. Notation - 2
The t-step transition probability:
P^t_ij = P[Xt+k = j | Xk = i], ∀t ≥ 0, i, j ≥ 0
The Chapman–Kolmogorov equations allow us to compute these
t-step transition probabilities. They state that:
P^(t+m)_ij = Σk P^t_ik P^m_kj, ∀t, m ≥ 0, ∀i, j ≥ 0
N.B. Basic probability properties:
1. Pij ≥ 0, ∀i, j ≥ 0
2. Σ_{j≥0} Pij = 1, i = 0, 1, 2, ...
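In matrix form, the Chapman–Kolmogorov equations say P^(t+m) = P^t P^m, so t-step probabilities are entries of the matrix power P^t. A small Python check (the two-state matrix is a made-up example):

```python
def mat_mult(A, B):
    """Multiply two transition matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def n_step(P, t):
    """t-step transition matrix P^t by repeated multiplication."""
    result = P
    for _ in range(t - 1):
        result = mat_mult(result, P)
    return result

# A hypothetical two-state transition matrix:
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Chapman-Kolmogorov with t = m = 2: P^4 = P^2 * P^2.
P4_direct = n_step(P, 4)
P4_ck = mat_mult(n_step(P, 2), n_step(P, 2))
```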
19. Example: conditional probability
Consider two states: 0 = rain and 1 = no rain.
Define two probabilities:
α = P00 = P[Xt+1 = 0|Xt = 0], the probability it will rain
tomorrow given that it rains today;
β = P10 = P[Xt+1 = 0|Xt = 1], the probability it will rain
tomorrow given that it does not rain today. What is the probability
it will rain the day after tomorrow given that it rains today, with
α = 0.7 and β = 0.4?
The transition probability matrix will be:
P = [P00, P01; P10, P11], or
P = [α = 0.7, 1 − α = 0.3; β = 0.4, 1 − β = 0.6]
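The answer follows by conditioning on tomorrow's state, as in the Chapman–Kolmogorov equation; in Python:

```python
# Two-state weather chain: state 0 = rain, state 1 = no rain.
P = [[0.7, 0.3],   # today rain:    P00 = alpha, P01 = 1 - alpha
     [0.4, 0.6]]   # today no rain: P10 = beta,  P11 = 1 - beta

# Rain the day after tomorrow given rain today:
# either it rains tomorrow (and again the next day),
# or it does not rain tomorrow (but rains the next day).
p_rain_in_2 = P[0][0] * P[0][0] + P[0][1] * P[1][0]
# 0.7 * 0.7 + 0.3 * 0.4 = 0.61
```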
20. Example: unconditional probability
What is the unconditional probability it will rain the day after
tomorrow?
We need to define the unconditional (marginal) distribution of the
state at time t:
P[Xt = j] = Σi P[Xt = j|X0 = i] P[X0 = i] = Σi P^t_ij αi,
where αi = P[X0 = i], ∀i ≥ 0,
and P[Xt = j|X0 = i] is the conditional probability just computed
before.
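A Python sketch of the marginal computation; the initial distribution P[X0 = 0] = 0.4 is a made-up assumption for the example:

```python
# Two-state weather chain as before.
P = [[0.7, 0.3],
     [0.4, 0.6]]

# Hypothetical initial distribution: alpha_i = P[X0 = i].
a = [0.4, 0.6]

# Two-step transition probabilities into state 0 (rain at t = 2):
p2_00 = P[0][0] * P[0][0] + P[0][1] * P[1][0]  # 0.61
p2_10 = P[1][0] * P[0][0] + P[1][1] * P[1][0]  # 0.52

# Marginal: P[X2 = 0] = sum_i P^2_i0 * alpha_i
p_rain = a[0] * p2_00 + a[1] * p2_10
# 0.4 * 0.61 + 0.6 * 0.52 = 0.556
```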
21. Stationary distributions
A stationary distribution π is a probability distribution such that,
once the Markov chain reaches it, the chain keeps that
distribution forever.
It means we are asking this question: what is the probability of
being in a particular state in the long run?
Let's define πj as the limiting probability that the process will be in
state j at time t, or
πj = lim_{t→∞} P^t_ij
Using Fubini's theorem
(https://www.youtube.com/watch?v=6-sGhUeOOk8), we can
define the stationary distribution as:
πj = Σi πi Pij. For the two-state example this gives:
π0 = β / (1 − α + β);
π1 = (1 − α) / (1 − α + β)
22. Example: stationary distribution
Back to our example.
We can compute the 2-step, 3-step, ..., n-step transition
distributions, and look at WHEN they reach
convergence.
An alternative method to compute the stationary
distribution consists in using these simple formulas:
π0 = β / (1 − α + β)
π1 = (1 − α) / (1 − α + β)
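Both routes can be checked in Python for the weather chain (α = 0.7, β = 0.4); the number of iterations is an arbitrary choice for the sketch:

```python
P = [[0.7, 0.3],
     [0.4, 0.6]]
alpha, beta = P[0][0], P[1][0]

# Closed form for the two-state chain:
pi0 = beta / (1 - alpha + beta)        # 0.4 / 0.7 = 4/7
pi1 = (1 - alpha) / (1 - alpha + beta)  # 0.3 / 0.7 = 3/7

# Check by iterating the marginal distribution from an
# arbitrary starting point until it converges:
dist = [1.0, 0.0]
for _ in range(100):
    dist = [dist[0] * P[0][0] + dist[1] * P[1][0],
            dist[0] * P[0][1] + dist[1] * P[1][1]]
```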
23. References
Ross, S. M. (2006). Introduction to Probability Models.
Academic Press / Elsevier.
Figure: Cover of the 10th edition
24. Routines in R
1. markovchain, by Giorgio Alfredo Spedicato.
A package for easily handling discrete Markov chains.
2. MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and
Jong Hee Park.
Performs Monte Carlo simulations based on the Markov
chain approach (MCMC).