Faculty of Economics and Political Science, Cairo University 
Stata 9
Instructor:
Samaa H. Hosny 
Ph.D. Candidate 
Sunday-Wednesday 
10-14 May 2009
Section 1:
Introduction and Overview
1. Stata interface: 
- windows, 
- icons vs. syntax, and 
- initial output 
2. Stata community and website: 
www.stata.com
3. Compare Stata with SPSS
- In terms of statistical capabilities, Stata offers
more advanced methods than SPSS.
- SPSS is oriented more towards the business
world.
- Stata is oriented more towards the research
community. It supports a helpful exchange of
ideas and experience among its academic
users.
3. Compare Stata with SPSS (cont'd)
- Stata users can receive updates as well as ado files, i.e. you
don't need to wait for a new version to run new
commands.
- Compare the two websites: www.spss.com and
www.stata.com
- Join the Statalist mailing list: send a message to
majordomo@hsphsun2.harvard.edu and write:
subscribe statalist email@address
- OR, for a daily summary, write:
subscribe statalist-digest email@address
4. Introducing the three Stata
editions:
- Stata/SE: special edition
- Stata/IC: Intercooled
- Stata/small: small (educational)
5. Two dimensions of data: cases and
variables
Limit                                  Stata/SE        Stata/IC        Stata/small
Max. no. of variables                  32,766          2,047           99
Max. no. of observations               2,147,483,647   2,147,483,647   1,000
Max. no. of characters for a string
variable                               244             80              80
Matrices                               1,000 x 1,000   800 x 800       40 x 40
6. Help: Built-in (offline) / internet (online) 
- From Stata: Help >> Search >>Search all 
- findit keyword 
- Very helpful links at: 
http://www.stata.com/links/resources1.html 
7. File Types: 
- Data file: filename.dta 
- Do file: filename.do 
- Ado file: filename.ado
- Log file (only readable in Stata): filename.smcl 
- Log file (text file): filename.log
8. Main tasks
1. Accessing data
2. Entering data in Stata
3. Converting data (via StatTransfer) from formats such
as text, Excel, SPSS, SAS, and other software
4. Saving data in Stata format as a .dta file
5. File and data preparation 
6. Descriptive statistics 
7. Tabulating data: frequency tables and cross 
tabulations 
8. Graphics 
9. Data analysis 
10. Preparing a report using Stata output: (creating a 
Word document)
9. Good Practices
1. Documentation within a program (using the *) 
2. Intuitive variable names, labels, and file names 
3. Avoid destroying or over-writing original data 
4. Appropriate use of command abbreviation 
5. Keeping a record of work: commands (do) and 
outputs (log)
Section 2:
Getting Started
(Hands-On Applications)
1. Opening and closing Stata
Like any other software: 
Start button>>Programs>>Stata 
OR 
Double click on the software icon on the 
desktop
2. Memory issues: checking and
setting memory capacity
- To check memory status (default is 1m = 1
megabyte):
memory
- To change the memory (needed for large
data sets):
set memory 750m //for the current session only
set memory 750m, perm //keeps the setting for future sessions
3. Current directory and
changing it
- The current directory is shown in the status bar as a path
below the 4 windows. To use and save files without
retyping the full path in every "use" or "save" command,
change the current directory to the one you want to
work in, as follows:
cd D:\Stata\Data //no quotes needed: no spaces in the path name
OR
cd "D:\Stata\Data Files" //a space in the path name
requires using quotes
4. Opening and closing files
(data, log, and do)
- Open a file from any directory:
use C:\folder\filename.dta
- Open a file from the current directory:
use filename
- Open specific variables:
use age region using C:\folder\filename.dta
- Open specific cases (if and in go before "using"):
use if male==1 using C:\folder\filename.dta //using a certain category
use in 1/10 using C:\folder\filename.dta //using the first 10 cases only
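A minimal sketch of the "use" variants, meant to be run from a do-file. It assumes the auto dataset that ships with Stata in place of the generic filename.dta; the temporary file name cars is hypothetical.

```stata
* Sketch only: auto.dta stands in for filename.dta (an assumption).
sysuse auto, clear
tempfile cars                          // temporary .dta file, removed when the do-file ends
save `cars'

use make price using `cars', clear     // load two variables only
use if foreign==1 using `cars', clear  // load imported cars only
use in 1/10 using `cars', clear        // load the first 10 cases only
count
```

Loading a subset this way keeps memory use low, which matters with large data sets.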
5. Preparing a report using
Stata output
(creating a Word document)
- Steps:
1. Open a log file to save all contents of the session (commands and
outputs) using:
log using filename.log //to have a text file, not an .smcl file
2. Carry out all the analysis required, then type:
log close
3. Open this file using any .txt reader and copy it to a Word file
4. Format the Word file in the "Courier New" font, size 8 or 9, so the
output is aligned exactly as in the Stata output window
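The logging steps could look like this in a do-file. This is a sketch: session1.log is a hypothetical file name, and the auto dataset is only a stand-in for your own data.

```stata
* Sketch only: session1.log is a hypothetical file name.
capture log close                      // close a log left open earlier; no error if none is open
log using session1.log, text replace   // plain-text log; overwrite an earlier copy
sysuse auto, clear
summarize price mpg
log close
```

The replace option lets the do-file be rerun without stopping on "file already exists".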
6. Viewing existing data in a
data file
- To view all variable names, storage types, labels, and the number of
variables and observations, we use:
describe
d, fullnames //to avoid abbreviations in names
- To select some of them by name, we use:
describe region1 region2
- To select all those starting with the same letters (e.g. reg), we use:
d reg* // describe all variables starting with reg-
d *tion // describe all variables ending with -tion
6. Viewing existing data in
a data file (cont'd)
- To view all existing observations:
list
- To view some selected observations:
list in 1/5 //first 5 observations
- To view some selected variables:
l age gov sex in 1/10 //first 10 observations of the three variables
l age gov sex if male==1 //only males, for the three variables
li X* //all variables starting with X
6. Viewing existing data in
a data file (cont'd)
Important Notes!
- To stop very long output, for example a listing of
all the observations in a very large dataset, click
the Break icon in the toolbar or press Ctrl+Break on
the keyboard (Ctrl+C in console Stata) at any time;
the command then stops producing output.
- To switch off the -more- pause
between pages of output we type:
set more off //for the current session
set more off, permanently //for all future sessions
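A short sketch of describe and list on real variables, assuming the auto dataset that ships with Stata (it is not used in the slides):

```stata
* Sketch only: variable names come from auto.dta (an assumption).
set more off                  // do not pause between pages of output
sysuse auto, clear
describe m*                   // variables starting with m (make, mpg)
describe, fullnames           // all variables, names not abbreviated
list make price mpg in 1/5    // first 5 observations of three variables
list make price if foreign==1 & price>10000
```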
7. Entering and saving
data
1. Manually through the keyboard: string variables
must be declared with str# before the varname (e.g. if
var3 is a string of up to 9 characters, it is str9):
input var1 var2 str9 var3
val11 val12 "val13"
.
.
.
valN1 valN2 "valN3"
end
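A concrete version of the input template, as a sketch. The variable names, the values, and the file name mydata are all hypothetical.

```stata
* Sketch only: hypothetical variables and values.
clear
input id age str9 region
1 34 "Cairo"
2 27 "Giza"
3 45 "Aswan"
end
list
save mydata, replace          // writes mydata.dta in the current directory
```

Values longer than 9 characters would be cut off in region, so choose the str width to fit the longest value.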
7. Entering and saving
data
2. Manually through the data editor
- Enter values in the table cell by cell (at the
cursor, i.e. the colored cell).
- Double-click on the varname to edit its name,
label, and format.
7. Entering and saving
data
3. Download or search for datasets by:
- Typing in the command window:
help datasets
- Searching www.stata.com for datasets
- Searching the internet
7. Entering and saving
data
4. Using StatTransfer to convert a file from
another format (e.g. a spreadsheet) into Stata
format. This is the best way to avoid losing any
data and to keep all variable labels and storage
types (when the file comes from SPSS or any
other statistical package that saves
information about the variables).
7. Entering and saving
data
5. Save an Excel file with a variable header (i.e.
varnames in the first row) >> select all >> copy
from the Excel sheet >> highlight the upper left cell in
the Stata data editor >> paste
6. Save an Excel file using tab-delimited format
(.txt) without variable headers (i.e. all columns
are values).
Then type in the Stata command window:
insheet using Book1.txt
7. Entering and saving data
(Precautions)
- Check for any data that might have been
lost while transferring to Stata without
StatTransfer
- Make sure you label the variables and
rename them in Stata after the insheet
- Also check Stata's infile command
- Note that Stata 10 reads directly from Excel
by using the file icon in the Stata interface.
8. Labeling data, variables,
and values
label data "This is Employment data"
label variable employ "Employment Status"
label define employed 0 "unemployed" 1 "employed"
label values employ employed
label define employed 2 "Other", add
OR
label define employed 0 "0: No" 1 "1: Yes" 2 "2: Other", modify
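The labeling commands can be tried on a small made-up variable. This is a sketch: the data values are hypothetical, and the names employ and employed follow the slide.

```stata
* Sketch only: four hypothetical observations.
clear
input employ
0
1
1
2
end
label data "Hypothetical employment data"
label variable employ "Employment status"
label define employed 0 "unemployed" 1 "employed"
label values employ employed           // attach the value label to the variable
label define employed 2 "other", add   // add a code to the existing value label
label list employed
tabulate employ
```

The value label (employed) and the variable (employ) are separate objects, so one value label can be attached to several variables.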
9. Describing and tabulating 
data 
1. An overview of data 
• The first step is to see the data (variables 
and observations) by the ‘list’ and 
‘describe’ commands. 
• See the full value labels attached to a variable:
label list name
NB! Here we type the name of the value label,
NOT the varname
9. Describing and tabulating 
data 
2. Summarizing data (descriptive statistics):
- For quantitative data (numeric variables only):
summarize
- To show basic descriptives of var X, i.e. no. of obs., mean,
st. dev., min, and max values:
sum X
- To show detailed descriptives of var X: basic + percentiles,
variance, skewness, and kurtosis:
sum X, detail
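A sketch of summarize on the auto dataset that ships with Stata (an assumption, not the slides' data). It also shows that the results stay available in r() after the command:

```stata
* Sketch only: auto.dta is an assumption.
sysuse auto, clear
summarize price
display "mean price = " r(mean)
summarize price, detail
display "median price = " r(p50)     // r(p50) is only stored with the detail option
```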
9. Describing and tabulating 
data 
3. Frequency tables 
tabulate X 
ta X, nolabel //shows codes NOT labels 
tab1 X1 X2 X3 X4 //for each one separately 
ta X, summarize(Y) //summarizes Y for each category of X
9. Describing and
tabulating data
4. Cross tabulations
- tabulate can take up to 2 variables: Y on rows, X on columns, with totals:
ta Y X
ta X1 X2, row //displays row percentages (% within each row category)
ta X1 X2, row nofreq //displays row percentages without
frequencies
bysort X: tab Y, summarize(Z) missing
//for each category of X (including the missing category),
we tabulate Y and calculate basic descriptives of Z
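A sketch of the tabulate options on the auto dataset that ships with Stata (an assumption). The variable rep78 has some missing values, which makes the effect of the missing option visible:

```stata
* Sketch only: auto.dta is an assumption.
sysuse auto, clear
tabulate rep78                        // one-way frequency table
tabulate rep78, missing               // the same, with a row for missing values
tabulate foreign rep78, row nofreq    // two-way table, row percentages only
tabulate foreign, summarize(price)    // mean, st. dev., and frequency of price by origin
```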
9. Describing and
tabulating data
4. Crosstabs (cont'd)
- Another command, table, is more flexible in its options, especially weights.
It can take up to 3 variables, with Y as the rows and X1 as the columns,
repeated for each category of X2.
table Y X1 X2, row col //adds a row and a column for totals
table Y X, by(Z) //a separate table of Y on rows
and X on columns for each category of Z
10. Data manipulation 
1. Creating case number (case id) 
generate id=_n 
2. Deleting existing variables/cases 
drop X //deletes variable X 
keep X Y Z //deletes all other variables 
drop if gov==1 //deletes all cases in this governorate 
drop if ~male //deletes cases with male==0 (females); cases with missing male are kept
keep if age>=15 & age<=60 //deletes all other cases
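A sketch of keep and drop wrapped in preserve/restore, so the data in memory can be recovered afterwards. It assumes the auto dataset that ships with Stata.

```stata
* Sketch only: auto.dta is an assumption.
sysuse auto, clear
generate id = _n                    // case id: 1, 2, ..., 74
preserve                            // keep a copy of the data as it is now
keep id make price foreign          // deletes all other variables
drop if foreign==1                  // deletes imported cars
keep if price>=4000 & price<=6000   // deletes all other cases
count
restore                             // back to the full dataset
```

The file on disk is never touched by drop or keep; only a later save would overwrite it.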
10. Data manipulation
3. Dealing with Variable Groups:
- Grouping variables in a variable set:
global set1 "X1 X2 X3 X4"
- When we use this variable set in any
command, we call it by adding a $ before
the name. For example:
tab1 $set1
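A sketch of a variable set on the auto dataset that ships with Stata (an assumption); the set name size is hypothetical.

```stata
* Sketch only: size is a hypothetical set name.
sysuse auto, clear
global size "weight length displacement"
summarize $size        // same as: summarize weight length displacement
describe $size
```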
10. Data manipulation
3. Dealing with Variable Groups:
- The use of dash (-):
for var X1-Y10: rename X X_2009
//will be executed on each of the variables from X1 to Y10
- The use of star (*): (previously discussed)
describe X*
des demo*99
list *Y*
list *W
10. Data manipulation
4. Creating new variables (gen)
generate y=1
g z=1 if (x==5)
gen samplesize=_N
//column of a constant = total number of
observations in the dataset (total sample size)
bysort family: gen famsize=_N
//column of constants = total number of observations in
each family (these add up to the sample size)
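A small worked example of _n and _N, with and without bysort. The data are hypothetical: two families with 3 and 2 members.

```stata
* Sketch only: five hypothetical observations.
clear
input family age
1 40
1 38
1 10
2 65
2 60
end
generate id = _n                       // running case number, 1 to 5
generate samplesize = _N               // 5 in every row
bysort family: generate famsize = _N   // 3 for family 1, 2 for family 2
bysort family: generate member = _n    // position within the family
list, sepby(family)
```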
10. Data manipulation
4. Creating new variables (gen) (cont'd)
gen l_income=log(income) //natural log
OR
gen l_income=ln(income) //natural log
gen loginc=log10(income) //base 10 log
gen Y=sqrt(X) //square root of X
gen Z=exp(Y) //exponential of Y
gen sqage=age^2 //age squared
gen XY=X*Y //interaction term
gen lagYt = Yt[_n-1] //lagYt = value of Yt in the previous observation
10. Data manipulation
4. Creating new variables (egen and its options)
egen avage = mean(age) //mean age of the sample (one value, repeated in every row)
bysort hh: egen avg = mean(age) //mean age for every hh
egen meddiff = median(var1-var2) //median of the difference
(in an expression, - means subtraction)
egen avginc = rowmean(W X Y Z)
OR
egen avginc = rowmean(W-Z) //(in a varlist, - means W through Z)
egen ttlsales = total(sales), by(region)
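A small worked example of the egen functions, on hypothetical data with two households (hh). The values are chosen so the results can be checked by hand.

```stata
* Sketch only: three hypothetical observations.
clear
input hh age w1 w2 w3
1 40 10 20 30
1 20  5  5  5
2 60  0 10 20
end
egen avage = mean(age)               // (40+20+60)/3 = 40 in every row
bysort hh: egen hhage = mean(age)    // 30 for hh 1, 60 for hh 2
egen avgw = rowmean(w1-w3)           // row-wise mean of w1, w2, w3
egen ttlw1 = total(w1), by(hh)       // 15 for hh 1, 0 for hh 2
list
```

mean() and total() work down a column, while rowmean() works across variables within each observation.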
10. Data manipulation
Dummy variable construction:
- Manual (allowing missing values):
gen female=1 if sex==2
replace female=0 if sex==1
- Automatic (NOT allowing missing values):
gen married=(mrtst==2) //generates a dummy for married; a missing mrtst is coded 0
tab region, gen(region)
//generates one dummy variable per category, e.g. 6 dummies for 6 regions
- xi commands for categorical data:
xi: tab1 i.region
//can be used with any command. Dummies not saved
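A sketch comparing the manual and automatic constructions on hypothetical data in which the last case is missing on both variables. The variable married2 is an addition for illustration, not from the slides.

```stata
* Sketch only: four hypothetical observations.
clear
input sex mrtst
1 2
2 1
2 2
. .
end
generate female = 1 if sex==2
replace female = 0 if sex==1            // female stays missing where sex is missing
generate married = (mrtst==2)           // 0 where mrtst is missing
generate married2 = (mrtst==2) if !missing(mrtst)   // keeps missing as missing
list
```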
11. Graphics
- The commands that draw graphs are:
command         description
--------------------------------------------------------------
graph twoway    scatterplots, line plots, etc.
graph matrix    scatterplot matrices
graph bar       bar charts
graph dot       dot charts
graph box       box-and-whisker plots
graph pie       pie charts
other           more commands to draw statistical graphs
--------------------------------------------------------------
11. Graphics
- The commands that save a previously drawn graph,
redisplay previously saved graphs, and combine graphs are:
command         description
--------------------------------------------------------------
graph save      save graph to disk
graph use       redisplay graph stored on disk
graph display   redisplay graph stored in memory
graph combine   combine graphs into one
--------------------------------------------------------------
11. Graphics 
1. Histograms 
histogram X // draws a histogram for variable X 
histogram X if male==1 in 1/1000 
//histogram for variable X for males only in the first 1000 cases 
histogram X, percent normal
// histogram for variable X in percentages, with a normal curve overlaid
For more info on options: help histogram
11. Graphics 
2. Bar graphs 
graph bar X
// draws a vertical bar showing the mean of X (add over(Z) for one bar per category of Z)
graph hbar Y
// the same with horizontal bars, for variable Y
For more info on options: help graph bar
11. Graphics 
3. Scatterplots 
graph twoway scatter Y X 
twoway scatter Y X 
scatter Y X 
The above three commands are equivalent.
11. Graphics
graph twoway (scatter y1 x) (scatter y2 x)
// draws scatter plots of y1 against x and of y2 against x on the same graph
This is equivalent to typing:
twoway scatter y1 x || scatter y2 x
graph twoway (scatter y x) (lfit y x)
// draws a scatter plot of y against x and adds the linear fit line
graph matrix X1 X2 X3
// scatterplot matrix for the three variables (two at a time)
For more info on options: help scatter
OR help graph_twoway
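A sketch of an overlaid scatter plot and a scatterplot matrix, assuming the auto dataset that ships with Stata:

```stata
* Sketch only: auto.dta is an assumption.
sysuse auto, clear
twoway (scatter price weight) (lfit price weight)   // points plus linear fit line
graph matrix price weight mpg                       // all pairwise scatterplots
```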
11. Graphics 
4. Line graphs 
graph twoway line Y X 
twoway line Y X 
line Y X 
The above three commands are equivalent. 
For more info on options: help line 
OR help graph_twoway
11. Graphics 
5. Labeling graph and graph axes 
scatter lexp region, title("Scatter plot") 
subtitle("Life expectancy at birth, US") 
yvarlabel("life expectancy") 
xvarlabel("Region") 
Note: In the command window, the whole command must be typed on
one line. In a do-file, end each line except the last with /// to continue the command.
11. Graphics 
6. Saving graphs 
This command is written directly after the graph that 
you wish to save: 
e.g. 
scatter yvar xvar 
graph save mygraph //save previous graph 
This will create the file mygraph.gph 
OR 
scatter yvar xvar, saving(mygraph) 
graph use mygraph //use saved graph
11. Graphics 
7. Combining graphs 
e.g. using lifeexp.dta: 
scatter lexp region, saving(figure1, replace) 
scatter gnppc region, saving(figure2, replace) 
graph combine figure1.gph figure2.gph, saving(byregion)
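A sketch that saves two graphs and combines them, using the lifeexp dataset that ships with Stata. The file names figure1, figure2, and combined are hypothetical.

```stata
* Sketch only: hypothetical file names.
sysuse lifeexp, clear
scatter lexp gnppc, title("Life expectancy vs. GNP per capita") saving(figure1, replace)
histogram lexp, percent saving(figure2, replace)
graph combine figure1.gph figure2.gph, saving(combined, replace)
graph use combined                 // redisplay the combined graph from disk
```

The replace suboption overwrites earlier .gph files, so the do-file can be rerun.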
12. Further topics of interest
- Data should be weighted to be
representative of the
population (help weight)
- Stata can merge files (add variables
from one file to another) and append
files (add cases):
(help merge) and (help append)
- Numerous options are available with
every command
13. Matrices
- To input matrix A:
11 530
550 32130
we do the following:
matrix input A=(11,530\550,32130)
mat list A // to show the matrix content
mat define detA=det(A) // to get the determinant of A
mat define invA=inv(A) // to get the inverse of A
mat define transA=A' // to get the transpose of A
mat D=A+B // to get the sum of A and B (B must have the same dimensions as A)
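A worked sketch of the matrix commands, using the matrix A given in the slide. Rows are separated by a backslash. B is not defined in the slides, so the transpose of A is used here as a second 2 x 2 matrix (an assumption).

```stata
* Sketch only: B is taken as the transpose of A (an assumption).
matrix input A = (11, 530 \ 550, 32130)
matrix list A
scalar detA = det(A)          // 11*32130 - 530*550 = 61930
display detA
matrix invA = inv(A)
matrix I2 = A*invA            // should be, numerically, the 2 x 2 identity
matrix list I2
matrix B = A'                 // transpose of A
matrix D = A + B              // element-wise sum
matrix list D
```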

 

Introduction to Stata

  • 1. Faculty of Economics and Political Science, Cairo University. STATA9. Instructor: Samaa H. Hosny, Ph.D. Candidate. Sunday-Wednesday, 10-14 May 2009
  • 2. Section 1: Introduction and Overview
  • 3. 1. Stata interface:
       - windows,
       - icons vs. syntax, and
       - initial output
       2. Stata community and website: www.stata.com
  • 4. 3. Compare Stata with SPSS
       - In terms of statistical capabilities, Stata can do a lot more than SPSS, i.e. it is more advanced.
       - SPSS is more inclined towards the business world.
       - Stata is more inclined towards the research community. It offers a helpful exchange of ideas and experience between its academic users.
  • 5. 3. Compare Stata with SPSS (cont'd)
       - Stata can receive updates as well as ado files, i.e. you don't need to wait for a new version to run new commands.
       - Compare the two websites: www.spss.com and www.stata.com
       - Join the mailing list of updates: send a message to majordomo@hsphsun2.harvard.edu and write:
             subscribe statalist email@address
         OR, for a daily summary, write:
             subscribe statalist-digest email@address
  • 6. 4. Introducing the three Stata editions:
       - Stata/SE: special edition
       - Stata/IC: Intercooled
       - Stata/small: small (educational)
       5. Two dimensions of data: cases and variables
  • 7. Limits of the three editions:
                                                  Stata/SE         Stata/IC         Stata/small
       Max. no. of variables                      32,766           2,047            99
       Max. no. of observations                   2,147,483,647    2,147,483,647    1,000
       Max. no. of characters, string variable    244              80               80
       Matrices                                   1,000 x 1,000    800 x 800        40 x 40
  • 8. 6. Help: Built-in (offline) / internet (online)
       - From Stata: Help >> Search >> Search all
       - findit keyword
       - Very helpful links at: http://www.stata.com/links/resources1.html
       7. File types:
       - Data file: filename.dta
       - Do file: filename.do
       - Ado file: filename.ado
       - Log file (only readable in Stata): filename.smcl
       - Log file (text file): filename.log
  • 9. 8. Main tasks
       1. Accessing data
       2. Entering data in Stata
       3. Converting data (via StatTransfer) from formats such as text, Excel, SPSS, SAS, and other software
       4. Saving data in Stata format as a .dta file
       5. File and data preparation
       6. Descriptive statistics
       7. Tabulating data: frequency tables and cross tabulations
       8. Graphics
       9. Data analysis
       10. Preparing a report using Stata output (creating a Word document)
  • 10. 9. Good practices
       1. Documentation within a program (using the *)
       2. Intuitive variable names, labels, and file names
       3. Avoid destroying or overwriting original data
       4. Appropriate use of command abbreviation
       5. Keeping a record of work: commands (do) and outputs (log)
  • 11. Section 2: Getting Started (Hands-On Applications)
  • 12. 1. Opening and closing Stata
       Like any other software: Start button >> Programs >> Stata
       OR double-click the software icon on the desktop
  • 13. 2. Memory issues: checking and setting memory capacity
       - To check the memory status (default is 1m = 1 megabyte):
             memory
       - To change the memory (needed for large data sets):
             set memory 750m
             set memory 750m, perm //keeps the setting for future sessions
  • 14. 3. Current directory and changing it
       - The current directory is shown in the status bar as a path below the 4 windows.
       - To use and save files without rewriting the whole path in every "use" or "save" command, change the directory to the one you want to work in:
             cd D:\Stata\Data //no spaces in the name, so no quotes are needed
         OR
             cd "D:\Stata\Data Files" //a space in the path name requires quotes
  • 15. 4. Opening and closing files (data, log, and do)
       - Open a file in any directory:
             use C:\folder\filename.dta
       - Open a file in the current directory:
             use filename
       - Open specific variables:
             use age region using C:\folder\filename.dta
       - Open specific cases:
             use C:\folder\filename.dta if male==1 //using a certain category
             use C:\folder\filename.dta in 1/10 //using the first 10 cases only
  • 16. 5. Preparing a report using Stata output (creating a Word document)
       Steps:
       1. Open a log file to save all contents of the session (commands and outputs) using:
             log using filename.log //to have a text file, not an .smcl file
       2. Carry out all the analysis required, then write:
             log close
       3. Open this file using any .txt reader and copy it to a Word file
       4. Format the Word file with the font "Courier New" at size 8 or 9 to get exactly the same layout as in the Stata output window
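The logging steps above can be sketched as a short do-file. This is only an illustration: the log name session1 is hypothetical, and the auto dataset shipped with Stata stands in for your own data.

```stata
* Minimal sketch of a logged session (file name "session1" is an assumption)
sysuse auto, clear                      // example dataset installed with Stata
capture log close                       // close any log left open earlier
log using session1.log, text replace    // plain-text log instead of .smcl
describe                                // everything from here on is recorded
summarize price mpg
log close                               // stop recording; session1.log is ready to copy
```

The replace option overwrites an earlier log of the same name, which is convenient while a do-file is still being revised.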
  • 17. 6. Viewing existing data in a data file
       - To view all existing variable names, specifications, labels, and the number of variables and observations:
             describe
             d, fullnames //to avoid abbreviations in names
       - To select some of them by name:
             describe region1 region2
       - To select all those starting with the same letters (e.g. reg):
             d reg* //describe all variables starting with reg
             d *tion //describe all variables ending with tion
  • 18. 6. Viewing existing data in a data file (cont'd)
       - To view all existing observations:
             list
       - To view some selected observations:
             list in 1/5 //first 5 observations
       - To view some selected variables:
             l age gov sex in 1/10 //first 10 observations of the three variables
             l age gov sex if male==1 //only males, for the three variables
             li X* //all variables starting with X
  • 19. 6. Viewing existing data in a data file (cont'd)
       Important notes!
       - To avoid very long outputs in general, for example listing all the observations of a very large dataset, use the Break icon in the toolbar, or Ctrl+C from the keyboard, at any time to stop further output from the same command.
       - To switch off the -more- pause between pages of output for the rest of the session, type:
             set more off
  • 20. 7. Entering and saving data
       1. Manually through the keyboard: string variables should be specified with str before the varname (e.g. if var3 is a string of 9 places, it is str9):
             input var1 var2 str9 var3
             val11 val12 "val13"
             . . .
             valN1 valN2 valN3
             end
  • 21. 7. Entering and saving data
       2. Manually through the data editor
       - Enter values in the table cell by cell (where the cursor, i.e. the colored cell, is).
       - Double-click on the varname to edit its name, label, and format.
  • 22. 7. Entering and saving data
       3. Download or search for datasets by:
       - typing in the command window: help datasets
       - searching www.stata.com for datasets
       - searching the internet
  • 23. 7. Entering and saving data
       4. Using StatTransfer to transfer any spreadsheet into Stata format. This is the best way to avoid losing any data, and it maintains all variable labels and storage types (in case the file was in SPSS or any other statistical package that saves information about the variables).
  • 24. 7. Entering and saving data
       5. Save an Excel file with a variable header (i.e. varnames in the first row) >> select all >> copy from the Excel sheet >> highlight the upper left cell in the Stata data editor >> paste
       6. Save an Excel file in tab-delimited format (.txt) without variable headers (i.e. all columns are values), then type in the Stata command window:
             insheet using Book1.txt
  • 25. 7. Entering and saving data (Precautions)
       - Check for any data that might have been lost while transferring to Stata without StatTransfer.
       - Make sure you label the variables and rename them in Stata after the insheet.
       - Also check the Stata infile command.
       - Note that Stata10 reads directly from Excel by using the file icon in the Stata interface.
  • 26. 8. Labeling data, variables, and values
             label data "This is Employment data"
             label variable employ "Employment Status"
             label define employed 0 "unemployed" 1 "employed"
             label values employ employed
             label define employed 2 "Other", add
         OR
             label define employed 0 "0: No" 1 "1: Yes" 2 "2: Other", modify
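As a small, self-contained illustration of the labeling commands, the sketch below types in four made-up observations (the values are invented for the example, not taken from the course data) and attaches the labels to them.

```stata
* Minimal sketch with invented data: four cases, one coded variable
clear
input id employ
1 0
2 1
3 2
4 1
end
label data "This is Employment data"
label variable employ "Employment Status"
label define employed 0 "unemployed" 1 "employed"   // create the value label
label values employ employed                        // attach it to the variable
label define employed 2 "Other", add                // add a code without redefining
label list employed                                 // show the code-to-label mapping
tabulate employ                                     // table now shows labels, not codes
```

Note that the value label (employed) and the variable (employ) are separate objects, so the same value label can be attached to several variables.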
  • 27. 9. Describing and tabulating data
       1. An overview of data
       - The first step is to see the data (variables and observations) with the 'list' and 'describe' commands.
       - To see the labels of a variable in full:
             label list name
         NB! Here we type the name of the value label, NOT the varname.
  • 28. 9. Describing and tabulating data
       2. Summarizing data (descriptive statistics)
       - For quantitative data (numeric variables only):
             summarize
       - To show basic descriptives of var X, i.e. no. of obs., mean, st. dev., min, and max values:
             sum X
       - To show detailed descriptives of var X, i.e. basic + percentiles, variance, and skewness:
             sum X, detail
  • 29. 9. Describing and tabulating data
       3. Frequency tables
             tabulate X
             ta X, nolabel //shows codes NOT labels
             tab1 X1 X2 X3 X4 //a table for each one separately
             ta X, summarize(Y) //summarizes Y for each category of X
  • 30. 9. Describing and tabulating data
       4. Cross tabulations
       - Can take up to 2 variables: Y on rows, X on columns, with totals:
             ta Y X
             ta X1 X2, row //displays row percentages (% for each category)
             ta X1 X2, row nofreq //displays row percentages without frequencies
             bysort X: tab Y, summarize(Z) missing //for each category of X (including the missing category), tabulate Y and calculate basic descriptives of Z
  • 31. 9. Describing and tabulating data
       4. Crosstabs (cont'd)
       - Another command, more flexible in options, especially weights. It can take up to 3 variables, with Y as the rows and X2 as the columns for each category of X1.
             table Y X1 X2, row col //a new row and column for totals
             table Y X, by(Z) //a separate table of Y on rows and X on columns for each category of Z
  • 32. 10. Data manipulation
       1. Creating a case number (case id)
             generate id=_n
       2. Deleting existing variables/cases
             drop X //deletes variable X
             keep X Y Z //deletes all other variables
             drop if gov==1 //deletes all cases in this governorate
             drop if ~male //deletes cases with male==0 (females); cases with missing male are kept, because a missing value counts as true
             keep if age>=15 & age<=60 //deletes all other cases
  • 33. 10. Data manipulation
       3. Dealing with variable groups:
       - Grouping variables in a variable set:
             global set1 "X1 X2 X3 X4"
       - To use this variable set in any command, call it by adding a $ before the name. For example:
             tab1 $set1
  • 34. 10. Data manipulation
       3. Dealing with variable groups (cont'd):
       - The use of the dash (-) for variables X1-Y10:
             rename XX_2009 //will be executed on variables X1 to Y10
       - The use of the star (*) (previously discussed):
             describe X*
             des demo*99
             list *Y*
             list *W
  • 35. 10. Data manipulation
       4. Creating new variables (gen)
             generate y=1
             g z=1 if (x==5) //equality tests need ==, not =
             gen samplesize=_N //column of a constant = total number of observations in the dataset (total sample size)
             bysort family: gen famsize=_N //column of constants = number of observations in each family (these add up to the sample size)
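To make the difference between _n and _N concrete, here is a minimal sketch on five invented observations in two families. The variable member is an extra illustration that does not appear on the slide.

```stata
* Minimal sketch with invented data: 3 members in family 1, 2 in family 2
clear
input family age
1 40
1 38
1 10
2 65
2 60
end
generate id = _n                         // running case number, 1 to 5
generate samplesize = _N                 // 5 in every row
bysort family: generate famsize = _N     // 3 for family 1, 2 for family 2
bysort family: generate member = _n      // position within the family (illustrative)
list, sepby(family)
```

With a by prefix, _n and _N are evaluated inside each group, which is why famsize differs across families while samplesize does not.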
  • 36. 10. Data manipulation
       Creating new variables (gen) (cont'd)
             gen l_income=log(income) //natural log
         OR
             gen l_income=ln(income) //natural log
             gen loginc=log10(income) //base 10 log
             gen Y=sqrt(X) //square root of X
             gen Z=exp(Y) //exponential
             gen sqage=age^2 //age squared
             gen XY=X*Y //interaction term
             gen lagYt = Yt[_n-1] //lagYt = Yt-1
  • 37. 10. Data manipulation
       Creating new variables (egen and its options)
             egen avage = mean(age) //mean age of the sample (only 1 value)
             bysort hh: egen avg = mean(age) //mean age for every hh
             egen meddiff = median(var1-var2) //median of the difference (in an expression, - means subtraction)
             egen avginc = rowmean(W X Y Z)
         OR
             egen avginc = rowmean(W - Z) //in a varlist, - means "through"
             egen ttlsales = total(sales), by(region)
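A minimal sketch of the egen functions on invented household data follows. The variable names hhavg and hhtotal are chosen for the example only.

```stata
* Minimal sketch with invented data: two households (hh)
clear
input hh age
1 30
1 10
2 50
2 40
2 6
end
egen avage = mean(age)               // one value for the whole sample
bysort hh: egen hhavg = mean(age)    // one value per household
egen hhtotal = total(age), by(hh)    // by() option, equivalent to the bysort prefix
list, sepby(hh)
```

Unlike generate, egen functions such as mean() and total() work across observations, so every row of a group receives the same group-level value.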
  • 38. 10. Data manipulation
       Dummy variable construction:
       - Manual (allowing missing values):
             gen female=1 if sex==2
             replace female=0 if sex==1
       - Automatic (NOT allowing missing values):
             gen married=(mrtst==2) //generate a dummy for married
             tab region, gen(region) //generate 6 dummy variables for the 6 regions
       - xi commands for categorical data:
             xi: tab1 i.region //can be used with any command. Dummies not saved
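The difference between the manual and the automatic construction shows up only when the source variable has missing values. The sketch below uses four invented observations and assumes sex is coded 1 = male, 2 = female. The third variant (female3) is an addition for illustration, not part of the slide.

```stata
* Minimal sketch with invented data; coding 1 = male, 2 = female is an assumption
clear
input sex
1
2
.
2
end
* Manual: a missing sex stays missing in the dummy
generate female = 1 if sex == 2
replace female = 0 if sex == 1
* Automatic: the logical expression is 0 when sex is missing
generate female2 = (sex == 2)
* Automatic, but guarded so that missing stays missing (illustrative variant)
generate female3 = (sex == 2) if !missing(sex)
list
```

In this sketch the third observation gets a missing female and female3 but a 0 in female2, which is the reason the slide describes the automatic form as not allowing missing values.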
  • 39. 11. Graphics
       The commands that draw graphs are:
             command          description
             ------------------------------------------------------------
             graph twoway     scatterplots, line plots, etc.
             graph matrix     scatterplot matrices
             graph bar        bar charts
             graph dot        dot charts
             graph box        box-and-whisker plots
             graph pie        pie charts
             other            more commands to draw statistical graphs
             ------------------------------------------------------------
  • 40. 11. Graphics
       The commands that save a previously drawn graph, redisplay previously saved graphs, and combine graphs are:
             command          description
             ------------------------------------------------------------
             graph save       save graph to disk
             graph use        redisplay graph stored on disk
             graph display    redisplay graph stored in memory
             graph combine    combine graphs into one
             ------------------------------------------------------------
  • 41. 11. Graphics
       1. Histograms
             histogram X //draws a histogram for variable X
             histogram X if male==1 in 1/1000 //histogram for variable X for males only in the first 1000 cases
             histogram X, percentage normal //histogram for variable X along with the normal curve
       For more info on options: help histogram
  • 42. 11. Graphics
       2. Bar graphs
             graph bar X //draws a bar chart with vertical bars for variable X
             graph hbar Y //draws a bar chart with horizontal bars for variable Y
       For more info on options: help graph bar
  • 43. 11. Graphics
       3. Scatterplots
             graph twoway scatter Y X
             twoway scatter Y X
             scatter Y X
       The above three commands are equivalent.
  • 44. 11. Graphics
             graph twoway (scatter y1 x) (scatter y2 x) //draws a scatter plot for variable y1 against x and for y2 against x
         This is equivalent to typing:
             twoway scatter y1 x || scatter y2 x
             graph twoway (scatter y x) (lfit y x) //draws a scatter plot for variable y against x and adds the linear prediction fit
             graph matrix X1 X2 X3 //scatterplot matrices for the three variables together (two at a time)
       For more info on options: help scatter OR help graph_twoway
  • 45. 11. Graphics
       4. Line graphs
             graph twoway line Y X
             twoway line Y X
             line Y X
       The above three commands are equivalent.
       For more info on options: help line OR help graph_twoway
  • 46. 11. Graphics
       5. Labeling the graph and graph axes
             scatter lexp region, title("Scatter plot") subtitle("Life expectancy at birth, US") ytitle("life expectancy") xtitle("Region")
       Note: The whole command should be written on one line.
  • 47. 11. Graphics
       6. Saving graphs
       This command is written directly after the graph that you wish to save, e.g.:
             scatter yvar xvar
             graph save mygraph //save previous graph
         This will create the file mygraph.gph
         OR
             scatter yvar xvar, saving(mygraph)
             graph use mygraph //use saved graph
  • 48. 11. Graphics
       7. Combining graphs
       e.g. using lifeexp.dta:
             scatter lexp region, saving(figure1, replace)
             scatter gnppc region, saving(figure2, replace)
             graph combine figure1.gph figure2.gph, saving(byregion)
  • 49. 12. Further topics of interest
       - Data should be weighted to be representative of the population (help weight).
       - Stata can merge files (add variables from one file to another) and append files (add cases): (help merge) and (help append).
       - Numerous options are available with every command.
  • 50. 13. Matrices
       To input the 2 x 2 matrix A with rows (11, 530) and (550, 32130), separate the rows with a backslash:
             matrix input A=(11,530\550,32130)
             mat list A //to show the matrix content
             mat define detA=det(A) //to get the determinant of A
             mat define invA=inv(A) //to get the inverse of A
             mat define transA=A' //to get the transpose of A
             mat D=A+B //to get the sum of A and B (B must have the same dimensions as A)
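The matrix commands can be checked end to end with a short sketch. Since the slide does not define B, a 2 x 2 identity matrix is assumed here purely for illustration, and the determinant is stored as a scalar rather than as a 1 x 1 matrix.

```stata
* Minimal sketch; B = I(2) is an assumption, the slide does not define B
matrix input A = (11, 530 \ 550, 32130)   // backslash separates the rows
matrix B = I(2)                           // 2 x 2 identity matrix
matrix list A
scalar detA = det(A)                      // 11*32130 - 530*550 = 61930
matrix invA = inv(A)
matrix transA = A'                        // straight apostrophe, not a curly quote
matrix D = A + B
matrix P = A * invA                       // should be close to the identity
matrix list P
display detA
```

Multiplying A by its inverse is a quick way to confirm that inv() succeeded, since the product should reproduce the identity matrix up to rounding.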