2. Why Stata
• Pro
– Aimed at epidemiology
– Many methods, growing
– Graphics
– Structured, programmable
– Coming soon to a course near you
• Con
– Data are held in memory, so available memory must exceed the file size
3. This Course
Date    Level       Topic                                                            Teacher
21.1.   Beginner    Introduction to Stata                                            JW
28.2.   Beginner    Graphics for data and results                                    JW
7.3.    Elementary  Linear regression                                                HS
14.3.   Elementary  Logistic regression                                              HS
21.3.   --------------- No course ---------------
28.3.   Advanced    Survival analysis                                                HS
4.4.    Advanced    Automating analysis: Loops, macros, …                            HS
11.4.   Advanced    Programming: Simulating data, bootstrapping, power calculations  HS
18.4.   Advanced    Individual fixed effects regression                              JW
For more details: https://tinyurl.com/53scv867
4. Stata introduction
• General use
– Interface and menu
– Do-files and syntax
– Data handling
• Analysis
– Descriptive
– Graphs
– Bivariate
5. Exercises
• Course files: https://tinyurl.com/53scv867
– Birth 1 (data file): data for exercises 1-5
– 1 Stata introduction (syntax file): solutions to exercises
13. Smart working
• Data (.dta)
– Master file, keep safe
– Working file for each project
• Syntax (.do)
– Work in progress file
– Manuscript file (Table 1…, Figure 1…, Supplement)
• Output (.smcl or .log)
– Save or discard
15. Use and save data
• Open data
– use "C:\Course\Myfile.dta", clear
– Or two lines:
• cd "C:\Course"
• use "Myfile.dta", clear
• Describe
– describe                 // describe all variables
– list sex age in 1/20     // list observations 1 to 20
• Save data
– save "C:\Course\Myfile_new.dta", replace
– Or two lines:
• cd "C:\Course"
• save "Myfile_new.dta", replace
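The open, describe and save steps above can be tried without the course files. This is a minimal sketch that assumes Stata's built-in auto dataset (not the course data) and a temporary file name; the variable names make, price and mpg belong to that dataset.

```stata
* Sketch only: uses Stata's built-in auto data, not the course file.
sysuse auto, clear              // open a dataset shipped with Stata
describe                        // describe all variables
list make price mpg in 1/5      // list observations 1 to 5
tempfile copy                   // temporary file name, removed when the do-file ends
save "`copy'", replace          // save the data under the new name
use "`copy'", clear             // reopen to confirm the save worked
```

With your own data, replace the temporary file with a path of your choice, as on the slide.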
16. Exercise 1
• Download the birth1-datafile (to desktop/folder):
https://tinyurl.com/53scv867
• Start Stata
• Open a new syntax file (Ctrl-9)
– Write all commands in the syntax file
• Open the dataset (use)
• Describe all variables (describe)
• List the first 10 observations of id, weight, sex and
mother’s age (mage)
• Save the syntax file (to desktop/folder) for later use
19. Generate, replace
• Index (young men)
– generate index=0
– replace index=1 if sex==1 & age<30
• Young/Old
– generate old=(age>50) if age<.
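A small sketch of the two patterns above on made-up data (five invented rows, not the course file). It shows that a missing age fails age<30, and that the "if age<." qualifier leaves old missing where age is missing.

```stata
* Toy data: five invented observations, for illustration only.
clear
input sex age
1 25
1 45
2 28
2 60
1 .
end
generate index = 0
replace index = 1 if sex == 1 & age < 30   // young men only
generate old = (age > 50) if age < .       // stays missing when age is missing
list
```

Without the "if age<." qualifier, the last row would get old=1, because missing counts as larger than 50.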
20. Recode
• Recode 1/2 into 0/1
– recode sex (1=0) (2=1), gen(sex0)
• Alternative
– generate sex0=sex-1
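As a quick check that the two approaches agree, here is a sketch on invented data; the name sex0b is hypothetical and only used for the comparison.

```stata
* Toy data: four invented observations coded 1/2.
clear
input sex
1
2
2
1
end
recode sex (1=0) (2=1), gen(sex0)   // explicit mapping into a new variable
generate sex0b = sex - 1            // arithmetic alternative
assert sex0 == sex0b                // both give the same 0/1 coding
tabulate sex sex0
```

The recode version is easier to read and extends to mappings that are not a simple shift.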
21. Labels
• Assign variable label to variable
– label variable girl "Girl (ref. Boy)"
• Assign value label to variable values
– label define girllbl 0 "boy" 1 "girl"
– label values girl girllbl
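The labelling steps can be sketched on a tiny invented variable. Note that Stata needs straight double quotes around label text.

```stata
* Toy data: three invented observations of a 0/1 variable.
clear
input girl
0
1
1
end
label variable girl "Girl (ref. Boy)"    // label for the variable itself
label define girllbl 0 "boy" 1 "girl"    // define a value label
label values girl girllbl                // attach it to the variable
tabulate girl                            // table now shows boy/girl
tabulate girl, nolabel                   // underlying 0/1 codes
```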
22. Dates
• From numeric to date (3 numeric variables into date variable)
ex: m=12, d=2, y=1987
generate birth=mdy(m,d,y)
format birth %td
• From string to date (1 string variable into date variable)
ex: bstr="02.12.1987"
generate birth=date(bstr,"DMY")
format birth %td
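Both conversions can be checked against each other using the slide's own example values. In this sketch the names birth1 and birth2 are hypothetical, chosen so that both results can sit in one dataset.

```stata
* One invented row holding the example values from the slide.
clear
input m d y str10 bstr
12 2 1987 "02.12.1987"
end
generate birth1 = mdy(m, d, y)         // from three numeric variables
generate birth2 = date(bstr, "DMY")    // from a day.month.year string
format birth1 birth2 %td               // display as dates, e.g. 02dec1987
list
assert birth1 == birth2                // both routes give the same date
```

Without the format line the variables show as the number of days since 1 January 1960.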
23. Exercise 2
• Summarize mother’s age
• Tabulate sex
• Recode sex into sex0 with categories 0, 1
• Generate new gestational age in weeks (the old is in
days)
– Summarize the new variable
– Label the new variable (not its values)
• Generate and format new variable birth in date format
based on the three variables day, month and year
– List day, month, year and birth to check the results
24. Missing
• Note!
– Represented as "." (extended missing: .a, .b, …)
– Missing values are treated as larger than any number
– if age>30 will include observations with missing age
– if age>30 & age<. will not
• Change between values and missings
– replace educ = . if educ == 99
– mvdecode educ, mv(99 = . \ 9999 = .a)
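The effect of missing values on comparisons can be seen in a small sketch on invented data (four made-up rows). The counts in the comments follow directly from those rows.

```stata
* Toy data: four invented observations, one with missing age.
clear
input age educ
25 1
35 99
.  2
45 9999
end
count if age > 30              // 3: the missing age is counted too
count if age > 30 & age < .    // 2: missing age excluded
mvdecode educ, mv(99 = . \ 9999 = .a)   // turn codes 99 and 9999 into missing
count if missing(educ)         // 2: covers both . and .a
```

The missing() function is true for "." and for every extended missing value, so it is a convenient alternative to comparing against ".".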
26. Exercise 3
• Tabulate missing in weight, sex, and gestational age
(gest) with the misstable sum command. Interpret.
• Tabulate gest versus sex and show number of
missing
• Summarize mage if gest is greater than 260 days
– Will this include missing in gest? Prove!
– Summarize mage if gest is greater than 260 days, excluding
missing in gest
27. Help
• General
– help command
– findit keyword     // search Stata and the net
• Examples
– help table
– findit coefplot
• Web resources
– https://www.stata.com/links/resources-for-learning-stata/
30. Weight by gestational age
twoway (scatter bw gest) (fpfitci bw gest) (lfit bw gest)
[Figure: Weight by gestational age. y-axis: gram (2000 to 6000); x-axis: days (250 to 310); legend: scatter, smooth with CI, line fit]
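A comparable overlay can be tried on Stata's built-in auto data. This is a sketch under that assumption: mpg and weight are variables in that dataset, not the course variables, and the graph name fitplot is hypothetical. Here the confidence band is listed first so that the shaded area does not cover the points.

```stata
* Sketch on the built-in auto data, not the birth data.
sysuse auto, clear
twoway (fpfitci mpg weight)    /// smooth curve with confidence band, drawn first
       (scatter mpg weight)    /// points on top of the band
       (lfit mpg weight),      /// straight-line fit for comparison
       title("Mileage by weight") ytitle("mpg") xtitle("lbs") ///
       name(fitplot, replace)
```

Plots are drawn in the order listed, so later plots overlay earlier ones.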
32. Exercise 4
• Make a density plot of birth weight (weight)
• Make a scatter plot of birth weight versus gestational age (gest)
– Replace the outlier in gestational age (gest) with missing
– Restrict the plot to gestational age greater than 250 days (hint: if gest>250)
– Add a linear fit line to the scatter plot to see the trend
– Add a smoothing curve with confidence interval to the plot (fpfitci) to check
for a non-linear pattern. The order of plots matters!
– Add a title, ytitle and xtitle to the plot
34. 2 independent samples
[Figure: kernel density of birth weight, one curve per sex; x-axis: Birth weight (2000 to 6000)]
twoway ( kdensity weight if sex==1, lcolor(blue) ) ///
( kdensity weight if sex==2, lcolor(red) )
Equal means?
Equal variance?
Do boys and girls have the same mean birth weight?
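The two questions map onto two standard commands, sdtest for variances and ttest for means. The sketch below runs them on simulated data; the values, the seed and the built-in 100 gram difference are invented for illustration and say nothing about the real birth data.

```stata
* Toy data only: simulated values, not the course's birth data.
clear
set seed 12345
set obs 200
generate sex = 1 + (_n > 100)                          // 1 = first 100 rows, 2 = last 100
generate weight = rnormal(3500, 500) + 100*(sex == 1)  // invented 100 g difference
sdtest weight, by(sex)            // equal variances?
ttest weight, by(sex)             // equal means, assuming equal variances
ttest weight, by(sex) unequal     // same test without the equal-variance assumption
```

If sdtest suggests unequal variances, the unequal option is the safer choice.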
37. Exercise 5
• The variable “magegr2” contains mother’s age in two groups. Do
tab magegr2 and tab magegr2, nolab to find the groups and the
coding. An alternative to find coding is to list all labels: label list
• Make a plot of the birth weight distribution for each of the two
groups of mother’s age.
• Do a ttest of weight by magegr2. Are the means different?
• Redo the ttest for weight>2000 to get more normal distributions.
– Are the means different?
– Are the p-values different?
• Generate an indicator for high birth weight (>4500).
• Make a table of high birth weight by gestgr2 with columns
percent and chi-square test. Is higher birthweight more likely
with higher gestational age?
38. Extra (if you have time)
• Do a help tabstat and look at the statistics options
• Do a tabstat of weight showing N min p25 p50 p75 max, by
magegr2
41. Copying output
• Copy graphs to Word or PowerPoint
– Save graphs in many formats, or
– Right-click on a graph to copy
• Copy output to Word or PowerPoint
– Select output and right-click
– “Copy as picture”
• Copy tables to Excel
– Select table, Ctrl-Shift-C
42. Save output (Log results)
• Save a portion of the output as a .smcl file
capture log close
log using "results.smcl", replace
…
log close
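A filled-in version of the pattern might look like the sketch below. It assumes the built-in auto data, and the file name results_demo.smcl is hypothetical. The leading capture log close prevents an error if a log is already open.

```stata
* Sketch: log one summary table to a file in the current folder.
capture log close                        // close any open log, ignore error if none
log using "results_demo.smcl", replace   // start logging, overwrite old file
sysuse auto, clear
summarize price mpg                      // this output goes into the log
log close                                // stop logging
```

Use a .log extension instead of .smcl to get a plain-text file that opens in any editor.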
43. Keep plots during session
• Set “tabbed” graphics
• Give each plot a name
set autotabgraphs on, permanently
twoway …, name(scatter, replace)
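Naming can be sketched with two plots from the built-in auto data. The graph names scatter1 and hist1 are hypothetical. Without name(), each new plot replaces the default graph called Graph.

```stata
* Sketch on the built-in auto data; graph names are illustrative.
sysuse auto, clear
scatter price mpg, name(scatter1, replace)   // first plot, kept in memory
histogram price, name(hist1, replace)        // second plot, does not replace the first
graph dir                                    // list graphs currently in memory
graph display scatter1                       // bring the first plot back up
```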
44. Stata via kiosk
• Stata
– https://kiosk.uio.no
– Analyse Stata (single click, wait…)
– VMware Horizon