The document provides an overview of panel data methods and commands in Stata. It discusses:
1) Linear panel models including pooled OLS, fixed effects, random effects, and first differences estimators.
2) Panel instrumental variable estimators including Hausman-Taylor for models with time-invariant endogenous variables.
3) Dynamic panel data models and the Arellano-Bond estimator for models with individual fixed effects and a lagged dependent variable.
Panel data methods for microeconometrics using Stata! Short and good one :)
1. Panel data methods for microeconometrics using Stata
A. Colin Cameron
Univ. of California - Davis
Prepared for West Coast Stata Users' Group Meeting
Based on A. Colin Cameron and Pravin K. Trivedi,
Microeconometrics using Stata, Stata Press, forthcoming.
October 25, 2007
A. Colin Cameron Univ. of California - Davis (Prepared for West Coast Stata Users’Group Meeting Based on A. Colin Cameron andPanel methods for Stata October 25, 2007 1 / 39
2. 1. Introduction
Panel data are repeated measures on individuals (i) over time (t).
Regress $y_{it}$ on $x_{it}$ for $i = 1, ..., N$ and $t = 1, ..., T$.
Complications compared to cross-section data:
1 Inference: correct (inflate) standard errors.
This is because each additional year of data is not independent of previous years.
2 Modelling: richer models and estimation methods are possible with repeated measures.
Fixed effects and dynamic models are examples.
3 Methodology: different areas of applied statistics may apply different methods to the same panel data set.
3. This talk: overview of panel data methods and xt commands for Stata 10 most commonly used by microeconometricians.
Three specializations to general panel methods:
1 Short panel: data on many individual units and few time periods.
Then data viewed as clustered on the individual unit.
Many panel methods also apply to clustered data such as cross-section individual-level surveys clustered at the village level.
2 Causation from observational data: use repeated measures to estimate key marginal effects that are causative rather than mere correlation.
Fixed effects: assume time-invariant individual-specific effects.
IV: use data from other periods as instruments.
3 Dynamic models: regressors include lagged dependent variables.
4. Outline
1 Introduction
2 Linear models overview
3 Example: wages
4 Standard linear panel estimators
5 Linear panel IV estimators
6 Linear dynamic models
7 Long panels
8 Random coefficient models
9 Clustered data
10 Nonlinear panel models overview
11 Nonlinear panel models estimators
12 Conclusions
5. 2.1 Some basic considerations
1 Regular time intervals assumed.
2 Unbalanced panel okay (xt commands handle unbalanced data).
[Should then rule out selection/attrition bias].
3 Short panel assumed, with T small and $N \to \infty$.
[Versus long panels, with $T \to \infty$ and N small or $N \to \infty$.]
4 Errors are correlated.
[For short panel: correlated over t for given i, but not over i.]
5 Parameters may vary over individuals or time.
Intercept: Individual-specific effects model (fixed or random effects).
Slopes: Pooling and random coefficients models.
6 Regressors: time-invariant, individual-invariant, or vary over both.
7 Prediction: ignored.
[Not always possible even if marginal effects computed.]
8 Dynamic models: possible.
[Usually static models are estimated.]
6. 2.2 Basic linear panel models
Pooled model (or population-averaged)
$y_{it} = \alpha + x_{it}'\beta + u_{it}$. (1)
Two-way effects model allows intercept to vary over i and t
$y_{it} = \alpha_i + \gamma_t + x_{it}'\beta + \varepsilon_{it}$. (2)
Individual-specific effects model
$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$, (3)
for short panels where time effects are included as dummies in $x_{it}$.
Random coefficients model allows slopes to vary over i
$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}$. (4)
7. 2.2 Fixed effects versus random effects
Individual-specific effects model: $y_{it} = x_{it}'\beta + (\alpha_i + \varepsilon_{it})$.
Fixed effects (FE):
$\alpha_i$ is possibly correlated with $x_{it}$
regressor $x_{it}$ can be endogenous
(though only with respect to a time-invariant component of the error)
can consistently estimate $\beta$ for time-varying $x_{it}$
(mean-differencing or first-differencing eliminates $\alpha_i$)
cannot consistently estimate $\alpha_i$ if short panel
prediction is not possible
$\beta = \partial E[y_{it} | \alpha_i, x_{it}] / \partial x_{it}$
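The claim that mean-differencing or first-differencing eliminates the individual effect can be checked with a short derivation. This is a sketch that starts from the individual-specific effects model stated on this slide and uses $\bar{y}_i$, $\bar{x}_i$, $\bar{\varepsilon}_i$ for averages over t within individual i.

```latex
\begin{align*}
y_{it} &= \alpha_i + x_{it}'\beta + \varepsilon_{it} \\
\bar{y}_i &= \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i
  && \text{(average over } t \text{ for each } i\text{)} \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
  && \text{(mean-differencing)} \\
y_{it} - y_{i,t-1} &= (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})
  && \text{(first-differencing)}
\end{align*}
```

In both transformed equations $\alpha_i$ no longer appears. A time-invariant regressor also has $x_{it} - \bar{x}_i = 0$, which is why its coefficient cannot be estimated by these methods.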
8. Random effects (RE) or population-averaged (PA)
$\alpha_i$ is purely random (usually iid $(0, \sigma_\alpha^2)$).
regressor $x_{it}$ must be exogenous
corrects standard errors for equicorrelated clustered errors
prediction is possible
$\beta = \partial E[y_{it} | x_{it}] / \partial x_{it}$
Fundamental divide
Microeconometricians: fixed effects
Many others: random effects.
9. 2.3 Robust inference
Many methods assume $\varepsilon_{it}$ and $\alpha_i$ (if present) are iid.
Yields wrong standard errors if heteroskedasticity or if errors not equicorrelated over time for a given individual.
For short panel can relax and use cluster-robust inference.
Allows heteroskedasticity and general correlation over time for given i.
Independence over i is still assumed.
Use option vce(cluster) if available (xtreg, xtgee).
This is not available for many xt commands.
Then use option vce(boot) or vce(cluster), but only if the estimator being used is still consistent.
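The effect of cluster-robust inference can be illustrated on simulated data. The sketch below is not from the talk: the variable names (id, t, x, y, xi, alpha), the seed and the data-generating process are all assumptions, and it presumes a Stata version that provides rnormal().

```stata
* Simulated short panel: 500 individuals, 5 periods (all values hypothetical)
clear
set seed 12345
set obs 500
generate id = _n
generate alpha = rnormal()          // individual effect in the error
generate xi = rnormal()             // individual component of the regressor
expand 5
bysort id: generate t = _n
xtset id t
generate x = xi + rnormal()         // x is correlated over time within id
generate y = 1 + x + alpha + rnormal()

* Default standard errors treat all N*T observations as independent
regress y x
scalar se_default = _se[x]

* Cluster-robust standard errors allow general correlation within id
regress y x, vce(cluster id)
scalar se_cluster = _se[x]
display "default se = " scalar(se_default) "   cluster-robust se = " scalar(se_cluster)

* The same option after a panel estimator
xtreg y x, fe vce(cluster id)
```

Because both the regressor and the error are correlated within individual in this design, the cluster-robust standard error from pooled OLS should come out larger than the default one.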
10. 2.4 Stata linear panel commands
Panel summary: xtset; xtdescribe; xtsum; xtdata; xtline; xttab; xttrans
Pooled OLS: regress
Feasible GLS: xtgee, family(gaussian); xtgls; xtpcse
Random effects: xtreg, re; xtregar, re
Fixed effects: xtreg, fe; xtregar, fe
Random slopes: xtmixed; quadchk; xtrc
First differences: regress (with differenced data)
Static IV: xtivreg; xthtaylor
Dynamic IV: xtabond; xtdpdsys; xtdpd
11. 3.1 Example: wages
PSID wage data 1976-82 on 595 individuals. Balanced.
Source: Baltagi and Khanti-Akom (1990).
[Corrected version of Cornwell and Rupert (1988).]
Goal: estimate causative effect of education on wages.
Complication: education is time-invariant in these data.
Rules out fixed effects.
Need to use IV methods (Hausman-Taylor).
12. 3.2 Reading in panel data
xt commands require data to be in long form.
Then each observation is an individual-time pair.
Original data are often in wide form.
Then an observation combines all time periods for an individual, or all individuals for a time period.
Use reshape long to convert from wide to long.
xtset is used to define i and t.
xtset id t is an example
allows use of panel commands and some time series operators.
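The wide-to-long conversion and the xtset declaration can be sketched on a tiny hand-typed dataset. The variable names (id, wage1 to wage3, t, dwage) and the values are made up for illustration and are not the PSID data used in the talk.

```stata
* Hypothetical wide-form data: one row per individual, one wage column per period
clear
input id wage1 wage2 wage3
1 10 11 12
2 20 21 23
3 15 15 16
end

* Wide to long: the stub "wage" gets a time index t taken from the column suffix
reshape long wage, i(id) j(t)

* Declare the individual (id) and time (t) identifiers
xtset id t

* Time-series operators now work within each individual
generate dwage = D.wage
list id t wage dwage, sepby(id)
```

After reshape each row is an individual-time pair, and D.wage is missing in the first period of every individual because there is no earlier observation to difference against.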
13. 3.3 Summarizing panel data
14. describe, summarize and tabulate confound cross-section and time series variation.
Instead use specialized panel commands:
xtdescribe: extent to which panel is unbalanced
xtsum: separate within (over time) and between (over individuals) variation
xttab: tabulations within and between for discrete data e.g. binary
xttrans: transition frequencies for discrete data
xtline: time series plot for each individual on one chart
xtdata: scatterplots for within and between variation.
15. 4.1 Standard linear panel estimators
1 Pooled OLS: OLS of $y_{it}$ on $x_{it}$.
2 Between estimator: OLS of $\bar{y}_i$ on $\bar{x}_i$.
3 Random effects estimator: FGLS in RE model.
Equals OLS of $(y_{it} - \hat{\theta}_i \bar{y}_i)$ on $(x_{it} - \hat{\theta}_i \bar{x}_i)$;
$\theta_i = 1 - \sqrt{\sigma_\varepsilon^2 / (T_i \sigma_\alpha^2 + \sigma_\varepsilon^2)}$.
4 Within estimator or FE estimator: OLS of $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$.
5 First difference estimator: OLS of $(y_{it} - y_{i,t-1})$ on $(x_{it} - x_{i,t-1})$.
Implementation:
xtreg does 2-4 with options be, fe, re
xtgee does 3 (with option exchangeable)
regress does 1 and 5.
Only 4 and 5 give consistent estimates of $\beta$ in FE model.
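The five estimators can be compared side by side on simulated data in which the individual effect is correlated with the regressor, so that the FE model holds. This is a sketch under assumed names (id, t, x, y, alpha) and an assumed data-generating process with true slope 1; it is not the wage example from the talk.

```stata
* Simulated short panel where alpha is correlated with x (FE model holds)
clear
set seed 12345
set obs 500
generate id = _n
generate alpha = rnormal()
expand 5
bysort id: generate t = _n
xtset id t
generate x = 0.5*alpha + rnormal()
generate y = 1 + 1*x + alpha + rnormal()      // true slope is 1

regress y x, vce(cluster id)                  // 1 pooled OLS
estimates store ols
xtreg y x, be                                 // 2 between estimator
estimates store be
xtreg y x, re                                 // 3 random effects (FGLS)
estimates store re
xtreg y x, fe                                 // 4 within / fixed effects
estimates store fe
regress D.y D.x, noconstant vce(cluster id)   // 5 first differences
estimates store fd

* Coefficients and standard errors in one table
estimates table ols be re fe fd, b(%7.3f) se(%7.3f)
```

In this design the within and first-difference slopes should be close to 1, while pooled OLS, between and RE are pulled away from 1 by the correlation between alpha and x. The first-difference slope appears in its own row because its coefficient is named D.x.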
16. 4.2 Example
Coefficients vary considerably across OLS, FE and RE estimators.
Cluster-robust standard errors (suffix rob) larger even for FE and RE.
Coefficient of ed not identified for FE as time-invariant regressor.
17. 4.3 Fixed e¤ects versus random e¤ects
Use Hausman test to discriminate between FE and RE.
If fixed effects: FE consistent and RE inconsistent.
If not fixed effects: FE consistent and RE consistent.
So see whether difference between FE and RE is zero.
$H = (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,FE})' \left[ \widehat{\mathrm{Cov}}[\tilde{\beta}_{1,RE} - \hat{\beta}_{1,FE}] \right]^{-1} (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,FE})$,
where $\beta_1$ corresponds to time-varying regressors (or a subset of these).
Problem: hausman command assumes RE is fully efficient.
But not the case here as robust se's for RE differ from default se's.
So hausman is incorrect.
Instead implement Hausman test using suest or panel bootstrap or Wooldridge (2002) robust version of Hausman test.
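The default hausman command and one cluster-robust alternative can be sketched as follows. The robust part uses a Mundlak-style auxiliary regression (adding individual means of the time-varying regressor to the RE model and testing them); this is a swapped-in, commonly used variant and may differ in detail from the Wooldridge (2002) version cited on the slide. Data and names (id, t, x, y, alpha, xbar) are simulated assumptions.

```stata
* Simulated short panel where alpha is correlated with x
clear
set seed 12345
set obs 500
generate id = _n
generate alpha = rnormal()
expand 5
bysort id: generate t = _n
xtset id t
generate x = 0.5*alpha + rnormal()
generate y = 1 + x + alpha + rnormal()

* Default Hausman test: relies on RE being fully efficient
xtreg y x, fe
estimates store fe
xtreg y x, re
estimates store re
hausman fe re

* Mundlak-style auxiliary regression with cluster-robust inference:
* a nonzero coefficient on xbar signals that RE is inconsistent
bysort id (t): egen xbar = mean(x)
xtreg y x xbar, re vce(cluster id)
test xbar
```

Here the test on xbar should reject strongly, since the simulated individual effect is correlated with x by construction.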
18. 5.1 Panel IV
Consider model with possibly transformed variables:
$y_{it}^* = \alpha + x_{it}^{*\prime}\beta + u_{it}$,
where $y_{it}^* = y_{it}$ or $y_{it}^* = \bar{y}_i$ for BE or $y_{it}^* = (y_{it} - \bar{y}_i)$ for FE or $y_{it}^* = (y_{it} - \theta_i \bar{y}_i)$ for RE.
OLS is inconsistent if $E[u_{it} | x_{it}] \neq 0$.
So do IV estimation with instruments $z_{it}$ that satisfy $E[u_{it} | z_{it}] = 0$.
Command xtivreg is used, with options be, re or fe.
This command does not have option for robust standard errors.
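A minimal xtivreg call can be sketched on simulated data in which the regressor is correlated with both the individual effect and the idiosyncratic error. The names (id, t, x, y, z, e, alpha) and the data-generating process are assumptions chosen so that z is a valid instrument and the true slope is 1.

```stata
* Simulated panel with an endogenous regressor x and an instrument z
clear
set seed 12345
set obs 500
generate id = _n
generate alpha = rnormal()
expand 5
bysort id: generate t = _n
xtset id t
generate z = rnormal()                        // instrument, unrelated to the errors
generate e = rnormal()                        // idiosyncratic error
generate x = z + 0.5*e + alpha + rnormal()    // x depends on e and on alpha
generate y = 1 + 1*x + alpha + e              // true slope is 1

* FE alone removes alpha but not the correlation between x and e
xtreg y x, fe

* FE-IV: within transformation plus instrumenting x by z
xtivreg y (x = z), fe
```

The plain FE slope should sit above 1 in this design, while the FE-IV slope should be close to 1.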
19. 5.2 Hausman-Taylor IV estimator
Problem in the fixed effects model
If an endogenous regressor is time-invariant
Then FE estimator cannot identify $\beta$ (as time-invariant).
Solution:
Assume the endogenous regressor is correlated only with $\alpha_i$ (and not with $\varepsilon_{it}$)
Use exogenous time-varying regressors $x_{it}$ from other periods as instruments
Command xthtaylor does this (and has option amacurdy).
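A Hausman-Taylor call can be sketched on simulated data with one regressor of each type. Everything here is an assumption made for illustration: the names (x1, x2, z1, z2, m, alpha), the coefficients (all 1) and the design, in which the endogenous time-invariant z2 shares the individual-level component m with the exogenous time-varying x1 so that the individual mean of x1 is a usable instrument.

```stata
* Simulated panel with time-varying and time-invariant regressors
clear
set seed 12345
set obs 500
generate id = _n
generate alpha = rnormal()              // individual effect
generate m = rnormal()                  // exogenous individual-level component
generate z1 = rnormal()                 // time-invariant, exogenous
generate z2 = m + alpha + rnormal()     // time-invariant, correlated with alpha
expand 5
bysort id: generate t = _n
xtset id t
generate x1 = m + rnormal()             // time-varying, exogenous
generate x2 = 0.5*alpha + rnormal()     // time-varying, correlated with alpha
generate y = 1 + x1 + x2 + z1 + z2 + alpha + rnormal()

* endog() lists the regressors assumed correlated with alpha
xthtaylor y x1 x2 z1 z2, endog(x2 z2)
```

Unlike xtreg, fe, this returns a coefficient on the time-invariant regressors z1 and z2, which is the reason the talk turns to Hausman-Taylor for the education effect.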
20. 6.1 Linear dynamic panel models
Simple dynamic model regresses $y_{it}$ on a polynomial in time.
e.g. Growth curve of child height or IQ as the child grows older
use previous models with $x_{it}$ polynomial in time or age.
Richer dynamic model regresses $y_{it}$ on lags of $y_{it}$.
21. 6.2 Linear dynamic panel models with individual effects
Leading example: AR(1) model with individual specific effects
$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}$.
Three reasons for $y_{it}$ being serially correlated over time:
True state dependence: via $y_{i,t-1}$
Observed heterogeneity: via $x_{it}$ which may be serially correlated
Unobserved heterogeneity: via $\alpha_i$
Focus on case where $\alpha_i$ is a fixed effect
FE estimator is now inconsistent (if short panel)
Instead use Arellano-Bond estimator
22. 6.3 Arellano-Bond estimator
First-difference to eliminate $\alpha_i$ (rather than mean-difference)
$(y_{it} - y_{i,t-1}) = \gamma(y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})$.
OLS inconsistent as $(y_{i,t-1} - y_{i,t-2})$ correlated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$
(even under assumption $\varepsilon_{it}$ is serially uncorrelated).
But $y_{i,t-2}$ is not correlated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$,
so can use $y_{i,t-2}$ as an instrument for $(y_{i,t-1} - y_{i,t-2})$.
Arellano-Bond is a variation that uses unbalanced set of instruments with further lags as instruments.
For t = 3 can use $y_{i1}$, for t = 4 can use $y_{i1}$ and $y_{i2}$, and so on.
Stata commands
xtabond for Arellano-Bond
xtdpdsys for Blundell-Bond (more efficient than xtabond)
xtdpd for more complicated models than xtabond and xtdpdsys.
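The dynamic commands can be tried on a simulated AR(1) panel with individual effects. The sketch assumes a true $\gamma$ of 0.5 and slope 1, uses made-up names (id, t, x, y, alpha), and discards the first 10 simulated periods as a burn-in so that the retained data are roughly stationary.

```stata
* Simulate y_it = 0.5*y_i,t-1 + x_it + alpha_i + e_it with a burn-in
clear
set seed 12345
set obs 500
generate id = _n
generate alpha = rnormal()
expand 18
bysort id: generate t = _n
generate x = rnormal()
generate y = alpha + rnormal() if t == 1
* replace runs observation by observation, so y[_n-1] is already filled in
bysort id (t): replace y = 0.5*y[_n-1] + x + alpha + rnormal() if t > 1
drop if t <= 10                  // discard burn-in periods
replace t = t - 10
xtset id t

xtreg y L.y x, fe                // within estimator: inconsistent for short T
xtabond y x, lags(1)             // Arellano-Bond
xtdpdsys y x, lags(1)            // Blundell-Bond system estimator
```

With 8 retained periods, the FE coefficient on the lagged dependent variable is expected to be biased downward, while the two GMM estimators should be close to 0.5.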
23. 7.1 Long panels
For short panels asymptotics are T fixed and $N \to \infty$.
For long panels asymptotics are for $T \to \infty$
A dynamic model for the errors is specified, such as AR(1) error
Errors may be correlated over individuals
Individual-specific effects can be just individual dummies
Furthermore if N is small and T large can allow slopes to differ across individuals and test for poolability.
24. 7.2 Commands for long panels
Models with stationary errors:
xtgls allows several different models for the error
xtpcse is a variation of xtgls
xtregar does FE and RE with AR(1) error
Models with nonstationary errors (currently active area):
As yet no Stata commands
Add-on levinlin does Levin-Lin-Chu (2002) panel unit root test
Add-on ipshin does Im-Pesaran-Shin (1997) panel unit root test in heterogeneous panels
Add-on xtpmg does Pesaran-Smith and Pesaran-Shin-Smith estimation for nonstationary heterogeneous panels with both N and T large.
25. 8.1 Random coefficients model
Generalize random effects model to random slopes.
Command xtrc estimates the random coefficients model
$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}$,
where $(\alpha_i, \beta_i)$ are iid with mean $(\alpha, \beta)$ and variance matrix $\Sigma$ and $\varepsilon_{it}$ is iid.
No vce(robust) option but can use vce(boot) if short panel.
26. 8.2 Mixed or multi-level or hierarchical model
Not used in microeconometrics but used in many other disciplines.
Stack all observations for individual i and specify
$y_i = X_i\beta + Z_i u_i + \varepsilon_i$
where $u_i$ is iid $(0, G)$ and $Z_i$ is called a design matrix.
Random effects: $Z_i = e$ (a vector of ones) and $u_i = \alpha_i$
Random coefficients: $Z_i = X_i$.
Other models including multi-level models are possible.
Command xtmixed estimates this model.
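The two special cases listed above map onto xtmixed syntax as sketched below. The data are simulated with hypothetical names (a_i for the random intercept, b_i for the random slope with mean 1). In more recent Stata releases xtmixed has been renamed mixed; the older name is used here to match the talk.

```stata
* Simulated data with a random intercept and a random slope
clear
set seed 12345
set obs 200
generate id = _n
generate a_i = rnormal()              // random intercept
generate b_i = 1 + 0.5*rnormal()      // random slope with mean 1
expand 10
bysort id: generate t = _n
generate x = rnormal()
generate y = a_i + b_i*x + rnormal()

* Random effects: Z_i is a vector of ones (random intercept only)
xtmixed y x || id:

* Random coefficients: Z_i also contains x (random intercept and slope)
xtmixed y x || id: x, covariance(unstructured)
```

The fixed part of the output reports the mean slope, and the random-effects part reports the estimated variances and covariance of the intercept and slope across individuals.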
27. 9.1 Clustered data
Consider data on individual i in village j with clustering on village.
A cluster-specific model (here village-specific) specifies
$y_{ji} = \alpha_i + x_{ji}'\beta + \varepsilon_{ji}$.
Here clustering is on village (not individual) and the repeated
measures are over individuals (not time).
Use xtset village id
Assuming equicorrelated errors can be more reasonable here than
with panel data (where correlation dampens over time).
So perhaps less need for vce(cluster) after xtreg
28. 9.2 Estimators for clustered data
If $\alpha_i$ is random use:
regress with option vce(cluster village)
xtreg, re
xtgee with option exchangeable
xtmixed for richer models of error structure
If $\alpha_i$ is fixed use:
xtreg, fe
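The same commands applied to village-clustered cross-section data can be sketched as follows. The data are simulated: 100 hypothetical villages with 20 individuals each, a village effect a_v, and a true slope of 1.

```stata
* Simulated cross-section of individuals clustered in villages
clear
set seed 12345
set obs 100
generate village = _n
generate a_v = rnormal()              // village-specific effect
expand 20
bysort village: generate id = _n      // individual number within village
generate x = rnormal()
generate y = 1 + x + a_v + rnormal()

* The cluster plays the role of the panel unit, the individual that of time
xtset village id

regress y x, vce(cluster village)     // pooled OLS, cluster-robust se
xtreg y x, re                         // village effect treated as random
xtreg y x, fe                         // village effect treated as fixed
```

Since x is unrelated to the village effect in this simulation, all three slope estimates should be near 1 and differ mainly in their standard errors.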
29. 10.1 Nonlinear panel models overview
General approaches similar to linear case
Pooled estimation or population-averaged
Random effects
Fixed effects
Complications
Random effects often not tractable so need numerical integration
Fixed effects models in short panels are generally not estimable due to the incidental parameters problem.
Here we consider short panels throughout.
Standard nonlinear models are:
Binary: logit and probit
Counts: Poisson and negative binomial
Truncated: Tobit
30. 10.2 Nonlinear panel models
A pooled or population-averaged model may be used.
This is same model as in cross-section case, with adjustment for correlation over time for a given individual.
A fully parametric model may be specified, with conditional density
$f(y_{it} | \alpha_i, x_{it}) = f(y_{it}, \alpha_i + x_{it}'\beta, \gamma)$, $t = 1, ..., T_i$, $i = 1, ..., N$, (5)
where $\gamma$ denotes additional model parameters such as variance parameters and $\alpha_i$ is an individual effect.
A conditional mean model may be specified, with additive effects
$E[y_{it} | \alpha_i, x_{it}] = \alpha_i + g(x_{it}'\beta)$ (6)
or multiplicative effects
$E[y_{it} | \alpha_i, x_{it}] = \alpha_i g(x_{it}'\beta)$. (7)
31. 10.3 Nonlinear panel commands
                Counts                      Binary
Pooled          poisson                     logit
                nbreg                       probit
GEE (PA)        xtgee, family(poisson)      xtgee, family(binomial) link(logit)
                xtgee, family(nbinomial)    xtgee, family(binomial) link(probit)
RE              xtpoisson, re               xtlogit, re
                xtnbreg, re                 xtprobit, re
Random slopes   xtmepoisson                 xtmelogit
FE              xtpoisson, fe               xtlogit, fe
                xtnbreg, fe
plus tobit and xttobit.
32. 11.1 Pooled or Population-averaged estimation
Extend pooled OLS
Use the usual cross-section command for conditional mean models or
conditional density models, then request cluster-robust standard errors
Probit example:
probit y x, vce(cluster id)
or
xtgee y x, fam(binomial) link(probit) corr(ind) vce(cluster id)
Extend pooled feasible GLS
Estimate with an assumed correlation structure over time
Equicorrelated probit example:
xtprobit y x, pa vce(boot)
or
xtgee y x, fam(binomial) link(probit) corr(exch) vce(cluster id)
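The two approaches on this slide can be tried on simulated data. The sketch below is only illustrative: the data-generating process and the names id, t, x, y and alpha are assumptions. The xtgee call uses vce(robust), which for xtgee gives standard errors that are robust to within-panel correlation.

```stata
* Simulated short panel: 500 individuals, 5 periods (hypothetical)
clear
set seed 2007
set obs 500
generate id = _n
generate alpha = rnormal()                    // individual effect
expand 5
bysort id: generate t = _n
generate x = rnormal()
generate y = (0.5*x + alpha + rnormal() > 0)  // binary outcome
xtset id t

* Extend pooled OLS: pooled probit with cluster-robust standard errors
probit y x, vce(cluster id)

* Extend pooled FGLS: population-averaged probit with equicorrelated errors
xtgee y x, family(binomial) link(probit) corr(exchangeable) vce(robust)
```

Both commands estimate population-averaged parameters, so their slope estimates should be close to each other. They will generally differ from the slope of 0.5 used in the simulation, which is the individual-specific (conditional on alpha) coefficient.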
33. 11.2 Random effects estimation
Assume individual-specific effect αi has specified distribution g(αi | η).
Then the unconditional density for the ith individual is
$$f(y_{i1}, \dots, y_{iT} \mid x_{i1}, \dots, x_{iT}, \beta, \gamma, \eta) = \int \Big[ \prod_{t=1}^{T} f(y_{it} \mid x_{it}, \alpha_i, \beta, \gamma) \Big] g(\alpha_i \mid \eta) \, d\alpha_i. \qquad (8)$$
Analytical solution:
For Poisson with gamma random effect
For negative binomial with gamma effect
Use xtpoisson, re and xtnbreg, re
No analytical solution:
For other models.
Instead use numerical integration (only univariate integration is
required).
Assume normally distributed random effects.
Use re option for xtlogit, xtprobit
Use normal option for xtpoisson, re
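A short sketch of the random effects commands on simulated data. The data-generating process and the variable names (id, t, x, ybin, ycount, alpha, u) are assumptions for illustration only, and the choice of 12 integration points is arbitrary.

```stata
* Simulated short panel with a normal individual effect (hypothetical)
clear
set seed 2007
set obs 500
generate id = _n
generate alpha = 0.5*rnormal()                        // random effect
expand 5
bysort id: generate t = _n
generate x = rnormal()
generate u = runiform()
generate ybin = (0.5*x + alpha + ln(u/(1-u)) > 0)     // logistic error gives a logit model
generate ycount = rpoisson(exp(0.2 + 0.5*x + alpha))  // Poisson counts
xtset id t

* No analytical solution: normal random effect, numerical integration
xtlogit ybin x, re intpoints(12)   // intpoints() sets the number of quadrature points
xtprobit ybin x, re
xtpoisson ycount x, re normal

* Analytical solution: Poisson with gamma random effect
xtpoisson ycount x, re
```

Rerunning the quadrature-based commands with a different intpoints() value and checking that the estimates barely change is a simple way to judge whether the numerical integration is adequate.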
34. 11.2 Random slopes estimation
Can extend to random slopes.
Nonlinear generalization of xtmixed
Then higher-dimensional numerical integral.
Use adaptive Gaussian quadrature
Stata commands are:
xtmelogit for binary data
xtmepoisson for counts
Stata add-on that is very rich:
gllamm (generalized linear and latent mixed models)
Developed by Sophia Rabe-Hesketh and Anders Skrondal.
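A minimal random slopes sketch for binary data, again on simulated data. The names id, t, x, y, a, b and u and the data-generating process are assumptions. The command name follows the slide; Stata 13 and later also provide melogit and mepoisson for these models.

```stata
* Simulated panel with a random intercept and a random slope on x (hypothetical)
clear
set seed 2007
set obs 200
generate id = _n
generate a = 0.5*rnormal()                       // random intercept
generate b = 0.5*rnormal()                       // random deviation in the slope on x
expand 8
bysort id: generate t = _n
generate x = rnormal()
generate u = runiform()
generate y = ((0.5 + b)*x + a + ln(u/(1-u)) > 0)

* Random intercept and random slope on x, grouped by id
xtmelogit y x || id: x
```

The part after || names the grouping variable and the regressors whose coefficients vary across groups; a random intercept is included by default.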
35. 11.3 Fixed effects estimation
In general not possible in short panels.
Incidental parameters problem:
N fixed effects αi plus K regressors means (N + K) parameters
But (N + K) → ∞ as N → ∞
Need to eliminate αi by some sort of differencing
possible for Poisson, negative binomial and logit.
Stata commands
xtlogit, fe
xtpoisson, fe (better to use xtpqml, which gives robust standard errors)
xtnbreg, fe
Fixed effects extended to dynamic models for logit and probit.
No Stata command.
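The idea of eliminating αi can be made concrete for the static logit model with T = 2. The derivation below is a standard textbook argument rather than something taken from the slides, and it assumes that y_i1 and y_i2 are independent conditional on αi and the regressors.

```latex
\Pr(y_{it} = 1 \mid \alpha_i, x_{it})
  = \frac{\exp(\alpha_i + x_{it}'\beta)}{1 + \exp(\alpha_i + x_{it}'\beta)},
  \qquad t = 1, 2 .

\Pr(y_{i1} = 0,\, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1)
  = \frac{\exp(\alpha_i + x_{i2}'\beta)}
         {\exp(\alpha_i + x_{i1}'\beta) + \exp(\alpha_i + x_{i2}'\beta)}
  = \frac{\exp\{(x_{i2} - x_{i1})'\beta\}}{1 + \exp\{(x_{i2} - x_{i1})'\beta\}} .
```

The factor exp(αi) cancels, so the conditional probability depends only on β and the change in the regressors. Only individuals whose outcome changes between the two periods contribute to this conditional likelihood.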
36. 12. Conclusion
Stata provides commands for panel models and estimators commonly
used in microeconometrics and biostatistics.
Stata also provides diagnostics and postestimation commands, not
presented here.
The emphasis is on short panels. Some commands provide
cluster-robust standard errors, some do not.
A big distinction is between fixed effects models, emphasized by
microeconometricians, and random effects and mixed models favored
by many others.
Extensions to nonlinear panel models exist, though FE models may
not be estimable with short panels.
This presentation draws on two chapters in Cameron and Trivedi,
Microeconometrics using Stata, forthcoming.
37. Book Outline
For Cameron and Trivedi, Microeconometrics using Stata, forthcoming.
1. Stata basics
2. Data management and graphics
3. Linear regression basics
4. Simulation
5. GLS regression
6. Linear instrumental variable regression
7. Quantile regression
8. Linear panel models
9. Nonlinear regression methods
10. Nonlinear optimization methods
11. Testing methods
12. Bootstrap methods
38. Book Outline (continued)
13. Binary outcome models
14. Multinomial models
15. Tobit and selection models
16. Count models
17. Nonlinear panel models
18. Topics
A. Programming in Stata
B. Mata
39. Econometrics graduate-level panel data texts
Comprehensive panel texts
Baltagi, B.H. (1995, 2001, 200?), Econometric Analysis of Panel Data,
1st and 2nd editions, New York, John Wiley.
Hsiao, C. (1986, 2003), Analysis of Panel Data, 1st and 2nd editions,
Cambridge, UK, Cambridge University Press.
More selective advanced panel texts
Arellano, M. (2003), Panel Data Econometrics, Oxford, Oxford
University Press.
Lee, M.-J. (2002), Panel Data Econometrics: Methods-of-Moments
and Limited Dependent Variables, San Diego, Academic Press.
Texts with several chapters on panel
Cameron, A.C. and P.K. Trivedi (2005), Microeconometrics: Methods
and Applications, New York, Cambridge University Press.
Greene, W.H. (2003), Econometric Analysis, fifth edition, Upper
Saddle River, NJ, Prentice-Hall.
Wooldridge, J.M. (2002, 200?), Econometric Analysis of Cross Section
and Panel Data, Cambridge, MA, MIT Press.