Stata statistics

Topics for the class include multiple regression, dummy variables, interaction effects, hypothesis tests, and model diagnostics. Prerequisites include a general familiarity with Stata (importing and managing datasets, data exploration), the linear regression model, and ordinary least squares estimation.

Workshop materials including do files and example data sets are available from http://projects.iq.harvard.edu/rtc/event/regression-stata

Presentation Transcript

  • Regression in Stata
    Ista Zahn, Harvard MIT Data Center
    January 24 2013
    The Institute for Quantitative Social Science at Harvard University
    Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 1 / 35
  • Outline
    1 Introduction
    2 Univariate regression
    3 Multiple regression
    4 Interactions
    5 Exporting and saving results
    6 Wrap-up
  • Topic 1: Introduction
  • Organization
    Please feel free to ask questions at any point if they are relevant to the current topic (or if you are lost!)
    There will be a Q&A after class for more specific, personalized questions
    Collaboration with your neighbors is encouraged
    If you are using a laptop, you will need to adjust paths accordingly
    Make comments in your Do-file rather than on hand-outs
    Save on flash drive or email to yourself
  • Copy the workshop materials to your home directory
    Log in to an Athena workstation using your Athena user name and password
    Click on the “Ubuntu” button on the upper-left and type “term”
    Click on the “Terminal” icon
    In the terminal, type this line exactly as shown:
        cd; wget http://tinyurl.com/stata-stats-zip; unzip stata-stats-zip
    If you see “ERROR 404: Not Found”, then you mistyped the command; try again, making sure to type the command exactly as shown
  • Launch Stata on Athena
    To start Stata type these commands in the terminal:
        add stata
        xstata
    Open up today’s Stata script
    In Stata, go to Window => New do file => Open
    Locate and open the StatStatistics.do script in the StataStatistics folder in your home directory
    I encourage you to add your own notes to this file!
  • Today’s Dataset
    We have data on a variety of variables for all 50 states
    Population, density, energy use, voting tendencies, graduation rates, income, etc.
    We’re going to be predicting SAT scores
    Univariate Regression: SAT scores and Education Expenditures
    Does the amount of money spent on education affect the mean SAT score in a state?
    Dependent variable: csat
    Independent variable: expense
  • Opening Files in Stata
    Look at the bottom left-hand corner of the Stata screen: this is the directory Stata is currently reading from
    Files are located in the StataStatistics folder
    Start by changing directory and loading the data
        // change directory
        cd "C:/Users/dataclass/Desktop/StataStatistics"
        // use dir to see what is in the directory:
        dir
        // tell Stata to use the states data set
        use states.dta
  • Steps for Running Regression
    1 Examine descriptive statistics
    2 Look at relationship graphically and test correlation(s)
    3 Run and interpret regression
    4 Test regression assumptions
  • Topic 2: Univariate regression
  • Univariate Regression: Preliminaries
    We want to predict csat scores from expense
    First, let’s look at some descriptives
        // generate summary statistics for csat and expense
        sum csat expense
        // look at codebook
        codebook csat expense
  • Univariate Regression
    Look at scatterplots, compute correlation matrix, and regress SAT scores on expenditures
        // graph expense by csat
        twoway scatter expense csat
        // correlate csat and expense
        pwcorr csat expense, star(.05)
        // run the regression
        regress csat expense
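
After a regression like the one above, Stata keeps the estimates in memory so they can be reused. The sketch below is illustrative only: it assumes Stata's built-in auto dataset (loaded with sysuse) rather than the workshop's states.dta, and the variable name yhat_demo is made up for this example.

```stata
* Illustrative sketch; uses the built-in auto data, not states.dta
sysuse auto, clear

* regress price on miles per gallon
regress price mpg

* coefficients and fit statistics are stored after estimation
display "slope     = " _b[mpg]
display "intercept = " _b[_cons]
display "R-squared = " e(r2)

* fitted values for every observation (yhat_demo is a hypothetical name)
predict yhat_demo, xb
```

With the workshop data, the same pattern would apply after regress csat expense, reading the slope from _b[expense].
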
  • Linear Regression Assumptions
    Assumption 1: Normal distribution (the errors of the regression equation are normally distributed)
    Assumption 2: Homoscedasticity (the variance around the regression line is the same for all values of the predictor variable)
    Assumption 3: Errors are independent
    Assumption 4: Relationships are linear
  • Homoscedasticity (figure not included in transcript)
  • Testing Assumptions: Normality
    A simple histogram of the residuals can be informative
        // graph the residual values of csat
        predict resid, residual
        histogram resid, normal
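
A histogram can be supplemented with a quantile plot and a formal test. This is a minimal sketch rather than part of the original workshop script; it assumes the built-in auto dataset, and resid_demo is a hypothetical variable name.

```stata
* Illustrative sketch; uses the built-in auto data, not states.dta
sysuse auto, clear
regress price mpg

* save the residuals (resid_demo is a hypothetical name)
predict double resid_demo, residuals

* quantile-normal plot: points near the diagonal suggest normality
qnorm resid_demo

* Shapiro-Wilk test: a small p-value is evidence against normality
swilk resid_demo
display "Shapiro-Wilk p-value = " r(p)
```
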
  • Testing Assumptions: Homoscedasticity
        rvfplot
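
The residual-versus-fitted plot can be paired with a formal test, and with robust standard errors if the constant-variance assumption looks doubtful. The sketch below is an illustration under the assumption that the built-in auto dataset stands in for states.dta.

```stata
* Illustrative sketch; uses the built-in auto data, not states.dta
sysuse auto, clear
regress price mpg

* residuals vs. fitted values, with a reference line at zero
rvfplot, yline(0)

* Breusch-Pagan / Cook-Weisberg test; a small p-value suggests heteroskedasticity
estat hettest

* one common response: refit with heteroskedasticity-robust standard errors
regress price mpg, vce(robust)
```
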
  • Topic 3: Multiple regression
  • Multiple Regression
    Just keep adding predictors
    Let’s try adding some predictors to the model of SAT scores
        Income (income)
        % students taking SATs (percent)
        % adults with HS diploma (high)
  • Multiple Regression Preliminaries
    As before, start with descriptive statistics and correlations
        // descriptive statistics
        sum income percent high
        // generate correlation matrix
        pwcorr csat expense income percent high
        // regress csat on expense, income, percent, and high
        regress csat expense income percent high
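
With several predictors in the model, two follow-up checks are often useful: a joint hypothesis test on a subset of coefficients and variance inflation factors for collinearity. This sketch is not from the workshop do-file; it assumes the built-in auto dataset in place of states.dta.

```stata
* Illustrative sketch; uses the built-in auto data, not states.dta
sysuse auto, clear
regress price mpg weight length

* joint test that the weight and length coefficients are both zero
test weight length
display "joint test p-value = " r(p)

* variance inflation factors; large values point to collinear predictors
estat vif
```
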
  • Exercise 1: Multiple Regression
    Open the datafile, states.dta.
    1 Select a few variables to use in a multiple regression of your own. Before running the regression, examine descriptive statistics of the variables and generate a few scatterplots.
    2 Run your regression
    3 Examine the plausibility of the assumptions of normality and homoscedasticity
  • Topic 4: Interactions
  • Interactions
    What if we wanted to test an interaction between percent & high?
    Option 1: generate product terms by hand
        // generate product of percent and high
        gen percenthigh = percent*high
        regress csat expense income percent high percenthigh
  • Interactions
    What if we wanted to test an interaction between percent & high?
    Option 2: Let Stata do your dirty work
        // use the # sign to represent interactions
        regress csat percent high c.percent#c.high
        // same as (both variables need the c. prefix to be treated as continuous)
        regress csat c.percent##c.high
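
The two options give the same interaction coefficient; the factor-variable form has the added benefit that margins understands the interaction. The sketch below is an illustration only, assuming the built-in auto dataset; mpg_weight and b_hand are hypothetical names.

```stata
* Illustrative sketch; uses the built-in auto data, not states.dta
sysuse auto, clear

* Option 1: product term by hand (mpg_weight is a hypothetical name)
generate mpg_weight = mpg*weight
regress price mpg weight mpg_weight
scalar b_hand = _b[mpg_weight]

* Option 2: factor-variable notation for two continuous predictors
regress price c.mpg##c.weight
display "by hand: " b_hand "   factor-variable: " _b[c.mpg#c.weight]

* slope of mpg at selected values of weight
margins, dydx(mpg) at(weight=(2000 3000 4000))
```
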
  • Categorical Predictors
    For categorical variables, we first need to dummy code
    Use region as example
    Option 1: create dummy codes before fitting regression model
        // create region dummy codes using tab
        tab region, gen(region)
        // could also use gen / replace
        // regress csat on region
        regress csat region1 region2 region3
  • Categorical Predictors
    For categorical variables, we first need to dummy code
    Use region as example
    Option 2: Let Stata do it for you
        // regress csat on region using fvvarlist syntax
        // see help fvvarlist for details
        regress csat i.region
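
With the i. prefix, the dummy variables can be tested jointly and the reference category can be changed without recoding. A minimal sketch, assuming the built-in auto dataset and its five-level rep78 variable as a stand-in for region:

```stata
* Illustrative sketch; uses the built-in auto data, not states.dta
sysuse auto, clear

* rep78 (repair record, 1-5) treated as categorical; level 1 is the default base
regress price i.rep78

* joint test that all rep78 dummy coefficients are zero
testparm i.rep78

* same model with level 3 as the reference category
regress price ib3.rep78
```
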
  • Exercise 2: Regression, Categorical Predictors, & Interactions
    Open the datafile, states.dta.
    1 Add on to the regression equation that you created in exercise 1 by generating an interaction term and testing the interaction.
    2 Try adding a categorical variable to your regression (remember, it will need to be dummy coded). You could use region or high25, or generate a new categorical variable from one of the continuous variables in the dataset.
  • Topic 5: Exporting and saving results
  • Saving and exporting regression tables
    Usually when we’re running regression, we’ll be testing multiple models at a time
    Can be difficult to compare results
    Stata offers several user-friendly options for storing and viewing regression output from multiple models
    First, download the necessary packages:
        * install outreg2 package
        findit outreg2
  • Saving and replaying
    You can store regression model results in Stata
        // fit two regression models and store the results
        regress csat expense income percent high
        estimates store Model1
        regress csat expense income percent high i.region
        estimates store Model2
  • Saving and replaying
    Stored models can be recalled
        // Display Model1
        estimates replay Model1
  • Saving and replaying
    Stored models can be compared
        // Compare Model1 and Model2 coefficients
        estimates table Model1 Model2
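
The comparison table can also show standard errors and summary statistics. The sketch below runs the store-and-compare workflow from start to finish; it assumes the built-in auto dataset, and the model names Small and Large are hypothetical.

```stata
* Illustrative sketch; uses the built-in auto data, not states.dta
sysuse auto, clear

* fit and store two nested models (Small and Large are hypothetical names)
regress price mpg weight
estimates store Small
regress price mpg weight i.foreign
estimates store Large

* list the stored results
estimates dir

* side-by-side coefficients with standard errors, N, and R-squared
estimates table Small Large, b(%9.3f) se stats(N r2)
```
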
  • Exporting into Excel
    Avoid human error when transferring coefficients into tables
    Excel can be used to format publication-ready tables
        outreg2 [Model1 Model2] using csatprediction.xls, replace
  • Topic 6: Wrap-up
  • Help Us Make This Workshop Better
    Please take a moment to fill out a very short feedback form
    These workshops exist for you; tell us what you need!
    http://tinyurl.com/StataRegressionFeedback
  • Additional resources
    Training and consulting
        IQSS workshops: http://projects.iq.harvard.edu/rtc/filter_by/workshops
        IQSS statistical consulting: http://rtc.iq.harvard.edu
    Stata resources
        UCLA website: http://www.ats.ucla.edu/stat/Stata/ (great for self-study; links to resources)
        Stata website: http://www.stata.com/help.cgi?contents
        Email list: http://www.stata.com/statalist/