SlideShare a Scribd company logo
Data Management in Stata

                             Ista Zahn

                                 IQSS


                         January 24 2013



                      The Institute
                      for Quantitative Social Science
                      at Harvard University


Ista Zahn (IQSS)        Data Management in Stata   January 24 2013   1 / 52
Outline

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)       Data Management in Stata   January 24 2013   2 / 52
Introduction

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)       Data Management in Stata   January 24 2013   3 / 52
Introduction

Copy the workshop materials to your home directory
   Log in to an Athena workstation using your Athena user name and
   password
   Click on the “Ubuntu” button on the upper-left and type “term” as
   shown below




   Click on the “Terminal” icon as shown above
   In the terminal, type this line exactly as shown:
cd; wget http://tinyurl.com/statadatman-zip; unzip statadatman-zip

   If you see “ERROR 404: Not Found”, then you mistyped the command
   – try again, making sure to type the command exactly as shown
   Ista Zahn (IQSS)       Data Management in Stata     January 24 2013   4 / 52
Introduction

Launch Stata on Athena



   To start Stata type these commands in the terminal:

    add stata
    xstata

   Open up today’s Stata script
         In Stata, go to Window => New do file => Open
         Locate and open the StatDatMan.do script in the StataDatMan folder
         in your home directory
   I encourage you to add your own notes to this file!




   Ista Zahn (IQSS)         Data Management in Stata     January 24 2013   5 / 52
Introduction

Workshop Description




   This is an Introduction to data management in Stata
   Assumes basic knowledge of Stata
   Not appropriate for people already well familiar with Stata
   If you are catching on before the rest of the class, experiment with
   command features described in help files




   Ista Zahn (IQSS)        Data Management in Stata      January 24 2013   6 / 52
Introduction

Organization




   Please feel free to ask questions at any point if they are relevant to
   the current topic (or if you are lost!)
   There will be a Q&A after class for more specific, personalized
   questions
   Collaboration with your neighbors is encouraged
   If you are using a laptop, you will need to adjust paths accordingly




   Ista Zahn (IQSS)        Data Management in Stata       January 24 2013   7 / 52
Introduction

Opening Files in Stata


   Look at bottom left hand corner of Stata screen
         This is the directory Stata is currently reading from
   Files are located in the StataDatMan folder on the Desktop
   Start by telling Stata where to look for these

    // change directory
    cd "C:/Users/dataclass/Desktop/StataDatMan"

    // Use dir to see what is in the directory:
    dir

    // use the gss data set
    use gss.dta




   Ista Zahn (IQSS)           Data Management in Stata           January 24 2013   8 / 52
Generating and replacing variables

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)                  Data Management in Stata   January 24 2013   9 / 52
Generating and replacing variables

Logic Statements Useful Data Manipulation



        == equal to (status quo)
          = used in assigning values
         != not equal to
          > greater than
        >= greater than or equal to
          & and
            | or




   Ista Zahn (IQSS)                  Data Management in Stata   January 24 2013   10 / 52
Generating and replacing variables

Basic Data Manipulation Commands




  Basic commands you’ll use for generating new variables or recoding
  existing variables:
         egen
         replace
         recode
  Many different means of accomplishing the same thing in Stata
         Find what is comfortable (and easy) for you




   Ista Zahn (IQSS)                  Data Management in Stata   January 24 2013   11 / 52
Generating and replacing variables

Generate and Replace



   The gen command is often used with logic statements, as in this
   example:

   // create "hapnew" variable
   gen hapnew = . //set to missing
   //set to 1 if happy equals 1
   replace hapnew=1 if happy==1
   //set to 1 if happy and hapmar = 3
   replace hapnew=1 if happy>3 & hapmar>3
   //set to 3 if happy or hapmar = 4
   replace hapnew=3 if happy>4 | hapmar>4
   tab hapnew // tabulate the new variable




   Ista Zahn (IQSS)                  Data Management in Stata   January 24 2013   12 / 52
Generating and replacing variables

Recode



  The recode command is basically generate and replace combined
  You can recode an existing variable OR use recode to create a new
  variable
   // recode the wrkstat variable
   recode wrkstat (1=8) (2=7) (3=6) (4=5) (5=4) (6=3) (7=2) (8=1)
   // recode wrkstat into a new variable named wrkstat2
   recode wrkstat (1=8), gen(wrkstat2)
   // tabulate workstat
   tab wrkstat




   Ista Zahn (IQSS)                  Data Management in Stata   January 24 2013   13 / 52
Generating and replacing variables

Basic Rules for Recode




         Rule                     Example           Meaning
         #=#                      3=1               3 recoded to 1
         ##=#                     2. =9             2 and . recoded to 9
         #/# = #                  1/5=4             1 through 5 recoded to 4
         nonmissing=#             nonmiss=8         nonmissing recoded to 8
         missing=#                miss=9            missing recoded to 9




   Ista Zahn (IQSS)                  Data Management in Stata       January 24 2013   14 / 52
Generating and replacing variables

egen

   egen means “extension” to generate
   Contains a variety of more sophisticated functions
   Type “help egen” in Stata to get a complete list of functions
   Let’s create a new variable that counts the number of “yes” responses
   on computer, email and internet use:

   // count number of yes on comp email and interwebs
   egen compuser= anycount(usecomp usemail usenet), values(1)
   tab compuser
   // assess how much missing data each participant has:
   egen countmiss = rowmiss(age-wifeft)
   codebook countmiss
   // compare values on multiple variables
   egen ftdiff=diff(wkft//)
   codebook ftdiff


   Ista Zahn (IQSS)                  Data Management in Stata   January 24 2013   15 / 52
By processing

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)       Data Management in Stata   January 24 2013   16 / 52
By processing

The “By” Command


  Sometimes, you’d like to create variables based on different categories
  of a single variable
        For example, say you want to look at happiness based on whether an
        individual is male or female
  The “by” prefix does just this:

   // tabulate happy separately for male and female
   bysort sex: tab happy
   // generate summary statistics using bysort
   bysort state: egen stateincome = mean(income)
   bysort degree: egen degreeincome = mean(income)
   bysort marital: egen marincomesd = sd(income)




  Ista Zahn (IQSS)         Data Management in Stata      January 24 2013   17 / 52
By processing

The “By” Command

  Some commands won’t work with by prefix, but have by options:

   // generate separate histograms for female and male
   hist happy, by(sex)




  Ista Zahn (IQSS)       Data Management in Stata   January 24 2013   18 / 52
Missing values

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)        Data Management in Stata   January 24 2013   19 / 52
Missing values

Missing Values




   Always need to consider how missing values are coded when recoding
   variables
   Stata’s symbol for a missing value is “.”
   Stata interprets “.” as a large value
         What are implications of this?




   Ista Zahn (IQSS)           Data Management in Stata   January 24 2013   20 / 52
Missing values

Missing Values

   If we want to generate a new variable that identifies highly educated
   women, we might use the command:

   // generate and replace without considering missing values
   gen hi_ed=0
   replace hi_ed=1 if wifeduc>15
   // What happens to our missing values when we
   tab hi_ed, mi nola
   // Instead, we might try:
   drop hi_ed
   // gen hi_ed, but don’t set a value if wifeduc is missing
   gen hi_ed = 0 if wifeduc != .
   // only replace non-missing
   replace hi_ed=1 if wifeduc >15 & wifeduc !=.
   tab hi_ed, mi //check to see that missingness is preserved

   Note that you need the “mi” option to tab to view your missing data
   values
   Ista Zahn (IQSS)        Data Management in Stata    January 24 2013   21 / 52
Missing values

Missing Values


   What if you used a numeric value originally to code missing data (e.g.,
   “999”)?
   The mvdecode command will convert all these values to missing

   mvdecode _all, mv(999)

   The “_all” command tells Stata to do this to all variables
   Use this command carefully!
         If you have any variables where “999” is a legitimate value, Stata is
         going to recode it to missing
         As an alternative, you could list var names separately rather than using
         “_all” command




   Ista Zahn (IQSS)           Data Management in Stata       January 24 2013   22 / 52
Variable types

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)        Data Management in Stata   January 24 2013   23 / 52
Variable types

Variable Types




   Stata uses two main types of variables: String and Numeric
   String variables are typically used for text variables
   To be able to perform any mathematical operations, your variables
   need to be in a numeric format




   Ista Zahn (IQSS)          Data Management in Stata       January 24 2013   24 / 52
Variable types

     Variable Types



        Stata’s numeric variable types:

1     type                 Minimum              Maximum    being 0     bytes
2     ----------------------------------------------------------------------
3     byte                    -127                  100    +/-1          1
4     int                  -32,767               32,740    +/-1          2
5     long          -2,147,483,647        2,147,483,620    +/-1          4
6     float   -1.70141173319*10^38 1.70141173319*10^38     +/-10^-38     4
7     double -8.9884656743*10^307 8.9884656743*10^307      +/-10^-323    8
8     ----------------------------------------------------------------------
9     Precision for float is 3.795x10^-8.
10    Precision for double is 1.414x10^-16.




         Ista Zahn (IQSS)       Data Management in Stata   January 24 2013   25 / 52
Variable types

Variable Types




   How can I deal with those annoying string variables?
   Sometimes you need to convert to/from string variables

   // not run; generate "newvar" equal to numeric version of var2
   destring var1, gen(newvar)
   // not run; convert var1 to string.
   tostring var1, gen(newvar)




   Ista Zahn (IQSS)        Data Management in Stata       January 24 2013   26 / 52
Variable types

Date and Time Variable Types




   Stata offers several options for date and time variables
   Generally, Stata will read date/time variables as strings
   You’ll need to convert string variables in order to perform any
   mathematical operations
   Once data is in date/time form, Stata uses several symbols to identify
   these variables
         %tc, %td, %tw, etc.




   Ista Zahn (IQSS)            Data Management in Stata   January 24 2013   27 / 52
Variable types

     Variable Types: Date and Time



1     Format String-to-numeric conversion function
2         -------+-----------------------------------------
3         %tc     clock(string, mask)
4         %tC     Clock(string, mask)
5         %td     date(string, mask)
6         %tw     weekly(string, mask)
7         %tm     monthly(string, mask)
8         %tq     quarterly(string, mask)
9         %th     halfyearly(string, mask)
10        %ty     yearly(string, mask)
11        %tg     no function necessary; read as numeric
12        -------------------------------------------------




         Ista Zahn (IQSS)       Data Management in Stata      January 24 2013   28 / 52
Variable types

Variable Types: Date and Time



    Format Meaning         Value = -1         Value = 0      Value = 1
    %tc    clock           31dec1959          01jan1960      01jan1960
                           23:59:59.999       00:00:00.000   00:00:00.001
    %td         days       31dec1959          01jan1960      02jan1960
    %tw         weeks      1959w52            1960w1         1960w2
    %tm         months     1959m12            1960m1         1960m2
    %tq         quarters 1959q4               1960q1         1960q2
    %th         half-years 1959h2             1960h1         1960h2
    %tg         generic    -1                 0              1




   Ista Zahn (IQSS)          Data Management in Stata        January 24 2013   29 / 52
Variable types

Variable Types: Date and Time


   To convert a string variable into date/time format, first select the
   date/time format you’ll be using (e.g., %tc, %td, %tw, etc.)
   Let’s say we create a string variable, today’s date (today) that we
   want to format
   // create string variable and convert to date
   gen today = "Feb 18, 2011"
   gen date1 = date(today, "MDY")
   tab date1
   // use the format command to change how the date is displayed
   // format so humans can read the date
   format date1 %d
   tab date1




   Ista Zahn (IQSS)         Data Management in Stata    January 24 2013   30 / 52
Variable types

Variable Types



   What if you have a variable “time” formatted as
   DDMMYYYYhhmmss?
   // Not run: conceptual example only
   generate double time2 = clock(time, "DMYHMS")
   tab time2
   format time2 %tc
   tab time2

   “double” command necessary for all clock formats
   basically tells Stata to allow a long string of characters




   Ista Zahn (IQSS)         Data Management in Stata      January 24 2013   31 / 52
Variable types

Exercise 1: Generate, Replace, Recode & Egen
Open the datafile, gss.dta.
  1   Generate a new variable that represents the squared value of age.
  2   Recode values “99” and “98” on the variable, “hrs1” as “missing.”
  3   Generate a new variable equal to “1” if income is greater than “19”.
  4   Recode the marital variable into a “string” variable and then back into
      a numeric variable.
  5   Create a new variable that counts the number of times a respondent
      answered “don’t know” in regard to the following variables: life,
      richwork, hapmar.
  6   Create a new variable that counts the number of missing responses for
      each respondent.
  7   Create a new variable that associates each individual with the average
      number of hours worked among individuals with matching educational
      degrees.
      Ista Zahn (IQSS)         Data Management in Stata    January 24 2013   32 / 52
Merging, appending, and joining

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   33 / 52
Merging, appending, and joining

Merging Datasets




   Merge in Stata is for adding new variables from a second dataset to
   the dataset you’re currently working with
         Current active dataset = master dataset
         Dataset you’d like to merge with master = using dataset
   If you want to add OBSERVATIONS, you’d use “append” (we’ll go
   over that next)




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   34 / 52
Merging, appending, and joining

Merging Datasets




   Several different ways that you might be interested in merging data
         Two datasets with same participant pool, one row per participant (1:1)
         A dataset with one participant per row with a dataset with multiple
         rows per participant (1:many or many:1)




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   35 / 52
Merging, appending, and joining

Merging Datasets




   Stata will create a new variable (“_merge”) that describes the source
   of the data
         Use option, “nogenerate” if you don’t want _merge created
         Use option, “generate(varname)” to give merge your own variable name
   Need to add IDs to your dataset?

   // create a variable "id" equal to the row number
   generate id = _n




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   36 / 52
Merging, appending, and joining

Merging Datasets




   Before you begin:
         Identify the “ID” that you will use to merge your two datasets
         Determine which variables you’d like to merge
         In Stata >= 11, data does NOT have to be sorted
         Variable types must match across datasets
         Can use “force” option to get around this, but not recommended




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   37 / 52
Merging, appending, and joining

Merging Datasets



    Let’s say I want to perform a 1:1 merge using the dataset “data2” and
    the ID, “participant”
merge 1:1 participant using data2.dta
    Now, let’s say that I have one dataset with individual students
    (master) and another dataset with information about the students’
    schools called “school”
     // Not run: conceptual example only. Merge school and student data
     merge m:1 schoolID using school.dta




     Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   38 / 52
Merging, appending, and joining

Merging Datasets




   What if my school dataset was the master and my student dataset
   was the merging dataset?

   // Not run: conceptual example only.
   merge 1:m schoolID using student.dta

   It is also possible to do a many:many merge
         Data needs to be sorted in both the master and using datasets




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   39 / 52
Merging, appending, and joining

Merging Datasets



   Update and replace options:
         In standard merge, the master dataset is the authority and WON’T
         CHANGE
         What if your master dataset has missing data and some of those values
         are not missing in your using dataset?
         Specify “update”- it will fill in missing without changing nonmissing
   What if you want data from your using dataset to overwrite that in
   your master?
         Specify “replace update”- it will replace master data with using data
         UNLESS the value is missing in the using dataset




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   40 / 52
Merging, appending, and joining

Appending Datasets




   Sometimes, you’ll have observations in two different datasets, or you’d
   like to add observations to an existing dataset
   Append will simply add observations to the end of the observations in
   the master
   // Not run: conceptual example. add rows of data from dataset2
   append using dataset2




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   41 / 52
Merging, appending, and joining

Appending Datasets



   Some options with Append:
         generate(newvar) will create variable that identifies source of
         observation

   // Not run: conceptual example.
   append using dataset1, generate(observesource)

   “force” will allow for data type mismatches (again, this is not
   recommended)




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   42 / 52
Merging, appending, and joining

Joinby



   Merge will add new observations from using that do not appear in
   master
   Sometimes, you need to add variables from using but want to be sure
   the list of participants in your master does not change

   // Not run: conceptual exampe. Similar to merge but drops non-matches
   joinby participant using dataset1

   Any observations in using that are NOT in master will be omitted




   Ista Zahn (IQSS)                 Data Management in Stata   January 24 2013   43 / 52
Creating summarized data sets

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)                Data Management in Stata   January 24 2013   44 / 52
Creating summarized data sets

Collapse




   Collapse will take master data and create a new dataset of summary
   statistics
   Useful in hierarchal linear modeling if you’d like to create aggregate,
   summary statistics
   Can generate group summary data for many descriptive stats
         Mean, media, sd, sum, min, max, percentiles, standard errors
   Can also attach weights




   Ista Zahn (IQSS)                Data Management in Stata   January 24 2013   45 / 52
Creating summarized data sets

Collapse




   Before you collapse
         Save your master dataset and then save it again under a new name
               This will prevent collapse from writing over your original data
         Consider issues of missing data. Do you want Stata to use all possible
         observations? If not:
               cw (casewise) option will make casewise deletions




   Ista Zahn (IQSS)                Data Management in Stata        January 24 2013   46 / 52
Creating summarized data sets

Collapse



   Let’s say you have a dataset with patient information from multiple
   hospitals
   You want to generate mean levels of patient satisfaction for EACH
   hospital

   // Not run: conceptual example. calculate average ptsatisfaction by ho
   save originaldata
   collapse (mean) ptsatisfaction, by(hospital)
   save hospitalcollapse




   Ista Zahn (IQSS)                Data Management in Stata   January 24 2013   47 / 52
Creating summarized data sets

Collapse



   You could also generate different statistics for multiple variables

   // create mean ptsatisfaction, median ptincome, sd ptsatisfaction for
   collapse (mean) ptsatisfaction (median) ptincome (sd) ptsatisfaction,

   What if you want to rename your new variables in this process?

   // Same as previous example, but rename variables
   collapse (mean) ptsatmean=ptsatisfaction (median) ptincmed=ptincome
    (sd) sdptsat=ptsatisfaction, by(hospital)




   Ista Zahn (IQSS)                Data Management in Stata   January 24 2013   48 / 52
Creating summarized data sets

Exercise 2: Merge, Append, and Joinby
Open the dataset, gss2.dta
  1   The gss2 dataset contains only half of the variables that are in the
      complete gss dataset. Merge dataset gss1 with dataset gss2. The
      identification variable is “id.”
  2   Open the dataset, gss.dta
  3   Merge in data from the “marital.dta” dataset, which includes income
      information grouped by individuals’ marital status. The marital
      dataset contains collapsed data regarding average statistics of
      individuals based on their marital status.
  4   Additional observations for the gssAppend.dta dataset can be found in
      “gssAddObserve.dta.” Create a new dataset that combines the
      observations in gssAppend.dta with those in gssAddObserve.dta.
  5   Create a new dataset that summarizes mean and standard deviation of
      income based on individuals’ degree status (“degree”). In the process
      of creating this new dataset, rename your three new variables.
      Ista Zahn (IQSS)                Data Management in Stata   January 24 2013   49 / 52
Wrap-up

Topic

1   Introduction

2   Generating and replacing variables

3   By processing

4   Missing values

5   Variable types

6   Merging, appending, and joining

7   Creating summarized data sets

8   Wrap-up


      Ista Zahn (IQSS)       Data Management in Stata   January 24 2013   50 / 52
Wrap-up

Help Us Make This Workshop Better




   Please take a moment to fill out a very short feedback form
   These workshops exist for you–tell us what you need!
   http://tinyurl.com/StataDatManFeedback




   Ista Zahn (IQSS)       Data Management in Stata    January 24 2013   51 / 52
Wrap-up

Additional resources



   training and consulting
         IQSS workshops:
         http://projects.iq.harvard.edu/rtc/filter_by/workshops
         IQSS statistical consulting: http://rtc.iq.harvard.edu
   Stata resources
         UCLA website: http://www.ats.ucla.edu/stat/Stata/
         Great for self-study
         Links to resources
   Stata website: http://www.stata.com/help.cgi?contents
   Email list: http://www.stata.com/statalist/




   Ista Zahn (IQSS)          Data Management in Stata      January 24 2013   52 / 52

More Related Content

What's hot

Data Analysis
Data AnalysisData Analysis
Data Analysis
sikander kushwaha
 
(Manual spss)
(Manual spss)(Manual spss)
(Manual spss)
Enas Ahmed
 
Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1
Taddesse Kassahun
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
Manish Parihar
 
introduction to spss
introduction to spssintroduction to spss
introduction to spss
Omid Minooee
 
Software packages for statistical analysis - SPSS
Software packages for statistical analysis - SPSSSoftware packages for statistical analysis - SPSS
Software packages for statistical analysis - SPSS
ANAND BALAJI
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
Sabbir Tahmidur Rahman
 
Inferential Statistics
Inferential StatisticsInferential Statistics
Inferential Statistics
ewhite00
 
Statistical distributions
Statistical distributionsStatistical distributions
Statistical distributions
TanveerRehman4
 
Time series analysis in Stata
Time series analysis in StataTime series analysis in Stata
Time series analysis in Stata
shahisec1
 
Introduction to Descriptive Statistics
Introduction to Descriptive StatisticsIntroduction to Descriptive Statistics
Introduction to Descriptive Statistics
Sanju Rusara Seneviratne
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
Seth Anandaram Jaipuria College
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
Neny Isharyanti
 
Statistical analysis using spss
Statistical analysis using spssStatistical analysis using spss
Statistical analysis using spss
jpcagphil
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Aileen Balbido
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
DrZahid Khan
 
Spss
SpssSpss
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and Statistics
T.S. Lim
 
SPSS How to use Spss software
SPSS How to use Spss softwareSPSS How to use Spss software
SPSS How to use Spss software
Debashis Baidya
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Dr Resu Neha Reddy
 

What's hot (20)

Data Analysis
Data AnalysisData Analysis
Data Analysis
 
(Manual spss)
(Manual spss)(Manual spss)
(Manual spss)
 
Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1Data Analysis using SPSS: Part 1
Data Analysis using SPSS: Part 1
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
 
introduction to spss
introduction to spssintroduction to spss
introduction to spss
 
Software packages for statistical analysis - SPSS
Software packages for statistical analysis - SPSSSoftware packages for statistical analysis - SPSS
Software packages for statistical analysis - SPSS
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
 
Inferential Statistics
Inferential StatisticsInferential Statistics
Inferential Statistics
 
Statistical distributions
Statistical distributionsStatistical distributions
Statistical distributions
 
Time series analysis in Stata
Time series analysis in StataTime series analysis in Stata
Time series analysis in Stata
 
Introduction to Descriptive Statistics
Introduction to Descriptive StatisticsIntroduction to Descriptive Statistics
Introduction to Descriptive Statistics
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Statistical analysis using spss
Statistical analysis using spssStatistical analysis using spss
Statistical analysis using spss
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Spss
SpssSpss
Spss
 
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and Statistics
 
SPSS How to use Spss software
SPSS How to use Spss softwareSPSS How to use Spss software
SPSS How to use Spss software
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 

Viewers also liked

Model probit
Model probitModel probit
Model probit
Meina Zubayr
 
Probit and logit model
Probit and logit modelProbit and logit model
Probit and logit model
Jithmi Roddrigo
 
STATA - Panel Regressions
STATA - Panel RegressionsSTATA - Panel Regressions
STATA - Panel Regressions
stata_org_uk
 
Logical Framework And Project Proposal
Logical Framework And Project ProposalLogical Framework And Project Proposal
Logical Framework And Project Proposal
rexcris
 
Graphing stata (2 hour course)
Graphing stata (2 hour course)Graphing stata (2 hour course)
Graphing stata (2 hour course)
izahn
 
STATA - Graphing Data
STATA - Graphing DataSTATA - Graphing Data
STATA - Graphing Data
stata_org_uk
 
Probit analysis in toxicological studies
Probit analysis in toxicological studies Probit analysis in toxicological studies
Probit analysis in toxicological studies
kunthavai Nachiyar
 
Probit analysis
Probit analysisProbit analysis
Probit analysis
Pramod935
 
STATA - Probit Analysis
STATA - Probit AnalysisSTATA - Probit Analysis
STATA - Probit Analysis
stata_org_uk
 
STATA - Download Examples
STATA - Download ExamplesSTATA - Download Examples
STATA - Download Examples
stata_org_uk
 
STATA - Introduction
STATA - IntroductionSTATA - Introduction
STATA - Introduction
stata_org_uk
 
STATA - Time Series Analysis
STATA - Time Series AnalysisSTATA - Time Series Analysis
STATA - Time Series Analysis
stata_org_uk
 
Monitoring and Evaluation Framework
Monitoring and Evaluation FrameworkMonitoring and Evaluation Framework
Monitoring and Evaluation Framework
Dr. Joy Kenneth Sala Biasong
 
Project Monitoring & Evaluation
Project Monitoring & EvaluationProject Monitoring & Evaluation
Project Monitoring & Evaluation
Srinivasan Rengasamy
 
Monitoring & evaluation presentation[1]
Monitoring & evaluation presentation[1]Monitoring & evaluation presentation[1]
Monitoring & evaluation presentation[1]
skzarif
 

Viewers also liked (15)

Model probit
Model probitModel probit
Model probit
 
Probit and logit model
Probit and logit modelProbit and logit model
Probit and logit model
 
STATA - Panel Regressions
STATA - Panel RegressionsSTATA - Panel Regressions
STATA - Panel Regressions
 
Logical Framework And Project Proposal
Logical Framework And Project ProposalLogical Framework And Project Proposal
Logical Framework And Project Proposal
 
Graphing stata (2 hour course)
Graphing stata (2 hour course)Graphing stata (2 hour course)
Graphing stata (2 hour course)
 
STATA - Graphing Data
STATA - Graphing DataSTATA - Graphing Data
STATA - Graphing Data
 
Probit analysis in toxicological studies
Probit analysis in toxicological studies Probit analysis in toxicological studies
Probit analysis in toxicological studies
 
Probit analysis
Probit analysisProbit analysis
Probit analysis
 
STATA - Probit Analysis
STATA - Probit AnalysisSTATA - Probit Analysis
STATA - Probit Analysis
 
STATA - Download Examples
STATA - Download ExamplesSTATA - Download Examples
STATA - Download Examples
 
STATA - Introduction
STATA - IntroductionSTATA - Introduction
STATA - Introduction
 
STATA - Time Series Analysis
STATA - Time Series AnalysisSTATA - Time Series Analysis
STATA - Time Series Analysis
 
Monitoring and Evaluation Framework
Monitoring and Evaluation FrameworkMonitoring and Evaluation Framework
Monitoring and Evaluation Framework
 
Project Monitoring & Evaluation
Project Monitoring & EvaluationProject Monitoring & Evaluation
Project Monitoring & Evaluation
 
Monitoring & evaluation presentation[1]
Monitoring & evaluation presentation[1]Monitoring & evaluation presentation[1]
Monitoring & evaluation presentation[1]
 

Similar to Data management in Stata

Stata datman
Stata datmanStata datman
Stata datman
izahn
 
Spss tutorial 1
Spss tutorial 1Spss tutorial 1
Spss tutorial 1
debataraja
 
Important SAS Tips and Tricks for A Grade
Important SAS Tips and Tricks for A GradeImportant SAS Tips and Tricks for A Grade
Important SAS Tips and Tricks for A Grade
Lesa Cote
 
Spss basics tutorial
Spss basics tutorialSpss basics tutorial
Spss basics tutorial
santoshranjan77
 
Metopen 6
Metopen 6Metopen 6
Metopen 6
Ali Murfi
 
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docxIMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
wilcockiris
 
StataTutorial.pdf
StataTutorial.pdfStataTutorial.pdf
StataTutorial.pdf
GeorgeMgendi2
 
Writing maintainable Oracle PL/SQL code
Writing maintainable Oracle PL/SQL codeWriting maintainable Oracle PL/SQL code
Writing maintainable Oracle PL/SQL code
Donald Bales
 
Writing Maintainable Code
Writing Maintainable CodeWriting Maintainable Code
Writing Maintainable Code
Donald Bales
 
Stata tutorial
Stata tutorialStata tutorial
Stata tutorial
Patrick Elyanu
 
Data Creation and Importing in IBM SPSS
Data Creation and Importing in IBM SPSSData Creation and Importing in IBM SPSS
Data Creation and Importing in IBM SPSS
Thiyagu K
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKA
butest
 
10 class.pptx
10 class.pptx10 class.pptx
10 class.pptx
poonam yadav
 
This is the official tutorial from Oracle.httpdocs.oracle.comj.pdf
This is the official tutorial from Oracle.httpdocs.oracle.comj.pdfThis is the official tutorial from Oracle.httpdocs.oracle.comj.pdf
This is the official tutorial from Oracle.httpdocs.oracle.comj.pdf
jillisacebi75827
 
SAS for Beginners
SAS for BeginnersSAS for Beginners
SAS for Beginners
Colby Stoever
 
Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09
guest9d79e073
 
Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09
Mark Ginnebaugh
 
Pig - Analyzing data sets
Pig - Analyzing data setsPig - Analyzing data sets
Pig - Analyzing data sets
Creditas
 
Data structures cs301 power point slides lecture 01
Data structures   cs301 power point slides lecture 01Data structures   cs301 power point slides lecture 01
Data structures cs301 power point slides lecture 01
shaziabibi5
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
Russell Jurney
 

Similar to Data management in Stata (20)

Stata datman
Stata datmanStata datman
Stata datman
 
Spss tutorial 1
Spss tutorial 1Spss tutorial 1
Spss tutorial 1
 
Important SAS Tips and Tricks for A Grade
Important SAS Tips and Tricks for A GradeImportant SAS Tips and Tricks for A Grade
Important SAS Tips and Tricks for A Grade
 
Spss basics tutorial
Spss basics tutorialSpss basics tutorial
Spss basics tutorial
 
Metopen 6
Metopen 6Metopen 6
Metopen 6
 
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docxIMG1.jpgIMG2.jpgIMG3.jpg2016 6 19   156   Page .docx
IMG1.jpgIMG2.jpgIMG3.jpg2016 6 19 156 Page .docx
 
StataTutorial.pdf
StataTutorial.pdfStataTutorial.pdf
StataTutorial.pdf
 
Writing maintainable Oracle PL/SQL code
Writing maintainable Oracle PL/SQL codeWriting maintainable Oracle PL/SQL code
Writing maintainable Oracle PL/SQL code
 
Writing Maintainable Code
Writing Maintainable CodeWriting Maintainable Code
Writing Maintainable Code
 
Stata tutorial
Stata tutorialStata tutorial
Stata tutorial
 
Data Creation and Importing in IBM SPSS
Data Creation and Importing in IBM SPSSData Creation and Importing in IBM SPSS
Data Creation and Importing in IBM SPSS
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKA
 
10 class.pptx
10 class.pptx10 class.pptx
10 class.pptx
 
This is the official tutorial from Oracle.httpdocs.oracle.comj.pdf
This is the official tutorial from Oracle.httpdocs.oracle.comj.pdfThis is the official tutorial from Oracle.httpdocs.oracle.comj.pdf
This is the official tutorial from Oracle.httpdocs.oracle.comj.pdf
 
SAS for Beginners
SAS for BeginnersSAS for Beginners
SAS for Beginners
 
Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09
 
Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09Brad McGehee Intepreting Execution Plans Mar09
Brad McGehee Intepreting Execution Plans Mar09
 
Pig - Analyzing data sets
Pig - Analyzing data setsPig - Analyzing data sets
Pig - Analyzing data sets
 
Data structures cs301 power point slides lecture 01
Data structures   cs301 power point slides lecture 01Data structures   cs301 power point slides lecture 01
Data structures cs301 power point slides lecture 01
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 

Recently uploaded

TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 

Recently uploaded (20)

TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 

Data management in Stata

  • 1. Data Management in Stata Ista Zahn IQSS January 24 2013 The Institute for Quantitative Social Science at Harvard University Ista Zahn (IQSS) Data Management in Stata January 24 2013 1 / 52
  • 2. Outline 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 2 / 52
  • 3. Introduction Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 3 / 52
  • 4. Introduction Copy the workshop materials to your home directory Log in to an Athena workstation using your Athena user name and password Click on the “Ubuntu” button on the upper-left and type “term” as shown below Click on the “Terminal” icon as shown above In the terminal, type this line exactly as shown: cd; wget http://tinyurl.com/statadatman-zip; unzip statadatman-zip If you see “ERROR 404: Not Found”, then you mistyped the command – try again, making sure to type the command exactly as shown Ista Zahn (IQSS) Data Management in Stata January 24 2013 4 / 52
  • 5. Introduction Launch Stata on Athena To start Stata type these commands in the terminal: add stata xstata Open up today’s Stata script In Stata, go to Window => New do file => Open Locate and open the StatDatMan.do script in the StataDatMan folder in your home directory I encourage you to add your own notes to this file! Ista Zahn (IQSS) Data Management in Stata January 24 2013 5 / 52
  • 6. Introduction Workshop Description This is an Introduction to data management in Stata Assumes basic knowledge of Stata Not appropriate for people already well familiar with Stata If you are catching on before the rest of the class, experiment with command features described in help files Ista Zahn (IQSS) Data Management in Stata January 24 2013 6 / 52
  • 7. Introduction Organization Please feel free to ask questions at any point if they are relevant to the current topic (or if you are lost!) There will be a Q&A after class for more specific, personalized questions Collaboration with your neighbors is encouraged If you are using a laptop, you will need to adjust paths accordingly Ista Zahn (IQSS) Data Management in Stata January 24 2013 7 / 52
  • 8. Introduction Opening Files in Stata Look at bottom left hand corner of Stata screen This is the directory Stata is currently reading from Files are located in the StataDatMan folder on the Desktop Start by telling Stata where to look for these // change directory cd "C:/Users/dataclass/Desktop/StataDatMan" // Use dir to see what is in the directory: dir // use the gss data set use gss.dta Ista Zahn (IQSS) Data Management in Stata January 24 2013 8 / 52
  • 9. Generating and replacing variables Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 9 / 52
  • 10. Generating and replacing variables Logic Statements Useful Data Manipulation == equal to (status quo) = used in assigning values != not equal to > greater than >= greater than or equal to & and | or Ista Zahn (IQSS) Data Management in Stata January 24 2013 10 / 52
  • 11. Generating and replacing variables Basic Data Manipulation Commands Basic commands you’ll use for generating new variables or recoding existing variables: egen replace recode Many different means of accomplishing the same thing in Stata Find what is comfortable (and easy) for you Ista Zahn (IQSS) Data Management in Stata January 24 2013 11 / 52
  • 12. Generating and replacing variables Generate and Replace The gen command is often used with logic statements, as in this example: // create "hapnew" variable gen hapnew = . //set to missing //set to 1 if happy equals 1 replace hapnew=1 if happy==1 //set to 1 if happy and hapmar = 3 replace hapnew=1 if happy>3 & hapmar>3 //set to 3 if happy or hapmar = 4 replace hapnew=3 if happy>4 | hapmar>4 tab hapnew // tabulate the new variable Ista Zahn (IQSS) Data Management in Stata January 24 2013 12 / 52
  • 13. Generating and replacing variables Recode The recode command is basically generate and replace combined You can recode an existing variable OR use recode to create a new variable // recode the wrkstat variable recode wrkstat (1=8) (2=7) (3=6) (4=5) (5=4) (6=3) (7=2) (8=1) // recode wrkstat into a new variable named wrkstat2 recode wrkstat (1=8), gen(wrkstat2) // tabulate workstat tab wrkstat Ista Zahn (IQSS) Data Management in Stata January 24 2013 13 / 52
  • 14. Generating and replacing variables Basic Rules for Recode Rule Example Meaning #=# 3=1 3 recoded to 1 ##=# 2. =9 2 and . recoded to 9 #/# = # 1/5=4 1 through 5 recoded to 4 nonmissing=# nonmiss=8 nonmissing recoded to 8 missing=# miss=9 missing recoded to 9 Ista Zahn (IQSS) Data Management in Stata January 24 2013 14 / 52
  • 15. Generating and replacing variables egen egen means “extension” to generate Contains a variety of more sophisticated functions Type “help egen” in Stata to get a complete list of functions Let’s create a new variable that counts the number of “yes” responses on computer, email and internet use: // count number of yes on comp email and interwebs egen compuser= anycount(usecomp usemail usenet), values(1) tab compuser // assess how much missing data each participant has: egen countmiss = rowmiss(age-wifeft) codebook countmiss // compare values on multiple variables egen ftdiff=diff(wkft//) codebook ftdiff Ista Zahn (IQSS) Data Management in Stata January 24 2013 15 / 52
  • 16. By processing Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 16 / 52
  • 17. By processing The “By” Command Sometimes, you’d like to create variables based on different categories of a single variable For example, say you want to look at happiness based on whether an individual is male or female The “by” prefix does just this: // tabulate happy separately for male and female bysort sex: tab happy // generate summary statistics using bysort bysort state: egen stateincome = mean(income) bysort degree: egen degreeincome = mean(income) bysort marital: egen marincomesd = sd(income) Ista Zahn (IQSS) Data Management in Stata January 24 2013 17 / 52
  • 18. By processing The “By” Command Some commands won’t work with by prefix, but have by options: // generate separate histograms for female and male hist happy, by(sex) Ista Zahn (IQSS) Data Management in Stata January 24 2013 18 / 52
  • 19. Missing values Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 19 / 52
  • 20. Missing values Missing Values Always need to consider how missing values are coded when recoding variables Stata’s symbol for a missing value is “.” Stata interprets “.” as a large value What are implications of this? Ista Zahn (IQSS) Data Management in Stata January 24 2013 20 / 52
  • 21. Missing values Missing Values If we want to generate a new variable that identifies highly educated women, we might use the command: // generate and replace without considering missing values gen hi_ed=0 replace hi_ed=1 if wifeduc>15 // What happens to our missing values when we tab hi_ed, mi nola // Instead, we might try: drop hi_ed // gen hi_ed, but don’t set a value if wifeduc is missing gen hi_ed = 0 if wifeduc != . // only replace non-missing replace hi_ed=1 if wifeduc >15 & wifeduc !=. tab hi_ed, mi //check to see that missingness is preserved Note that you need the “mi” option to tab to view your missing data values Ista Zahn (IQSS) Data Management in Stata January 24 2013 21 / 52
  • 22. Missing values Missing Values What if you used a numeric value originally to code missing data (e.g., “999”)? The mvdecode command will convert all these values to missing mvdecode _all, mv(999) The “_all” command tells Stata to do this to all variables Use this command carefully! If you have any variables where “999” is a legitimate value, Stata is going to recode it to missing As an alternative, you could list var names separately rather than using “_all” command Ista Zahn (IQSS) Data Management in Stata January 24 2013 22 / 52
  • 23. Variable types Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 23 / 52
  • 24. Variable types Variable Types Stata uses two main types of variables: String and Numeric String variables are typically used for text variables To be able to perform any mathematical operations, your variables need to be in a numeric format Ista Zahn (IQSS) Data Management in Stata January 24 2013 24 / 52
  • 25. Variable types Variable Types Stata’s numeric variable types: 1 type Minimum Maximum being 0 bytes 2 ---------------------------------------------------------------------- 3 byte -127 100 +/-1 1 4 int -32,767 32,740 +/-1 2 5 long -2,147,483,647 2,147,483,620 +/-1 4 6 float -1.70141173319*10^38 1.70141173319*10^38 +/-10^-38 4 7 double -8.9884656743*10^307 8.9884656743*10^307 +/-10^-323 8 8 ---------------------------------------------------------------------- 9 Precision for float is 3.795x10^-8. 10 Precision for double is 1.414x10^-16. Ista Zahn (IQSS) Data Management in Stata January 24 2013 25 / 52
  • 26. Variable types Variable Types How can I deal with those annoying string variables? Sometimes you need to convert to/from string variables // not run; generate "newvar" equal to numeric version of var2 destring var1, gen(newvar) // not run; convert var1 to string. tostring var1, gen(newvar) Ista Zahn (IQSS) Data Management in Stata January 24 2013 26 / 52
  • 27. Variable types Date and Time Variable Types Stata offers several options for date and time variables Generally, Stata will read date/time variables as strings You’ll need to convert string variables in order to perform any mathematical operations Once data is in date/time form, Stata uses several symbols to identify these variables %tc, %td, %tw, etc. Ista Zahn (IQSS) Data Management in Stata January 24 2013 27 / 52
  • 28. Variable types Variable Types: Date and Time 1 Format String-to-numeric conversion function 2 -------+----------------------------------------- 3 %tc clock(string, mask) 4 %tC Clock(string, mask) 5 %td date(string, mask) 6 %tw weekly(string, mask) 7 %tm monthly(string, mask) 8 %tq quarterly(string, mask) 9 %th halfyearly(string, mask) 10 %ty yearly(string, mask) 11 %tg no function necessary; read as numeric 12 ------------------------------------------------- Ista Zahn (IQSS) Data Management in Stata January 24 2013 28 / 52
  • 29. Variable types Variable Types: Date and Time Format Meaning Value = -1 Value = 0 Value = 1 %tc clock 31dec1959 01jan1960 01jan1960 23:59:59.999 00:00:00.000 00:00:00.001 %td days 31dec1959 01jan1960 02jan1960 %tw weeks 1959w52 1960w1 1960w2 %tm months 1959m12 1960m1 1960m2 %tq quarters 1959q4 1960q1 1960q2 %th half-years 1959h2 1960h1 1960h2 %tg generic -1 0 1 Ista Zahn (IQSS) Data Management in Stata January 24 2013 29 / 52
  • 30. Variable types Variable Types: Date and Time To convert a string variable into date/time format, first select the date/time format you’ll be using (e.g., %tc, %td, %tw, etc.) Let’s say we create a string variable, today’s date (today) that we want to format // create string variable and convert to date gen today = "Feb 18, 2011" gen date1 = date(today, "MDY") tab date1 // use the format command to change how the date is displayed // format so humans can read the date format date1 %d tab date1 Ista Zahn (IQSS) Data Management in Stata January 24 2013 30 / 52
  • 31. Variable types Variable Types What if you have a variable “time” formatted as DDMMYYYYhhmmss? // Not run: conceptual example only generate double time2 = clock(time, "DMYHMS") tab time2 format time2 %tc tab time2 “double” command necessary for all clock formats basically tells Stata to allow a long string of characters Ista Zahn (IQSS) Data Management in Stata January 24 2013 31 / 52
  • 32. Variable types Exercise 1: Generate, Replace, Recode & Egen Open the datafile, gss.dta. 1 Generate a new variable that represents the squared value of age. 2 Recode values “99” and “98” on the variable, “hrs1” as “missing.” 3 Generate a new variable equal to “1” if income is greater than “19”. 4 Recode the marital variable into a “string” variable and then back into a numeric variable. 5 Create a new variable that counts the number of times a respondent answered “don’t know” in regard to the following variables: life, richwork, hapmar. 6 Create a new variable that counts the number of missing responses for each respondent. 7 Create a new variable that associates each individual with the average number of hours worked among individuals with matching educational degrees. Ista Zahn (IQSS) Data Management in Stata January 24 2013 32 / 52
  • 33. Merging, appending, and joining Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 33 / 52
  • 34. Merging, appending, and joining Merging Datasets Merge in Stata is for adding new variables from a second dataset to the dataset you’re currently working with Current active dataset = master dataset Dataset you’d like to merge with master = using dataset If you want to add OBSERVATIONS, you’d use “append” (we’ll go over that next) Ista Zahn (IQSS) Data Management in Stata January 24 2013 34 / 52
  • 35. Merging, appending, and joining Merging Datasets Several different ways that you might be interested in merging data Two datasets with same participant pool, one row per participant (1:1) A dataset with one participant per row with a dataset with multiple rows per participant (1:many or many:1) Ista Zahn (IQSS) Data Management in Stata January 24 2013 35 / 52
  • 36. Merging, appending, and joining Merging Datasets Stata will create a new variable (“_merge”) that describes the source of the data Use option, “nogenerate” if you don’t want _merge created Use option, “generate(varname)” to give merge your own variable name Need to add IDs to your dataset? // create a variable "id" equal to the row number generate id = _n Ista Zahn (IQSS) Data Management in Stata January 24 2013 36 / 52
  • 37. Merging, appending, and joining Merging Datasets Before you begin: Identify the “ID” that you will use to merge your two datasets Determine which variables you’d like to merge In Stata >= 11, data does NOT have to be sorted Variable types must match across datasets Can use “force” option to get around this, but not recommended Ista Zahn (IQSS) Data Management in Stata January 24 2013 37 / 52
  • 38. Merging, appending, and joining Merging Datasets Let’s say I want to perform a 1:1 merge using the dataset “data2” and the ID, “participant” merge 1:1 participant using data2.dta Now, let’s say that I have one dataset with individual students (master) and another dataset with information about the students’ schools called “school” // Not run: conceptual example only. Merge school and student data merge m:1 schoolID using school.dta Ista Zahn (IQSS) Data Management in Stata January 24 2013 38 / 52
  • 39. Merging, appending, and joining Merging Datasets What if my school dataset was the master and my student dataset was the merging dataset? // Not run: conceptual example only. merge 1:m schoolID using student.dta It is also possible to do a many:many merge Data needs to be sorted in both the master and using datasets Ista Zahn (IQSS) Data Management in Stata January 24 2013 39 / 52
  • 40. Merging, appending, and joining Merging Datasets Update and replace options: In standard merge, the master dataset is the authority and WON’T CHANGE What if your master dataset has missing data and some of those values are not missing in your using dataset? Specify “update”- it will fill in missing without changing nonmissing What if you want data from your using dataset to overwrite that in your master? Specify “replace update”- it will replace master data with using data UNLESS the value is missing in the using dataset Ista Zahn (IQSS) Data Management in Stata January 24 2013 40 / 52
  • 41. Merging, appending, and joining Appending Datasets Sometimes, you’ll have observations in two different datasets, or you’d like to add observations to an existing dataset Append will simply add observations to the end of the observations in the master // Not run: conceptual example. add rows of data from dataset2 append using dataset2 Ista Zahn (IQSS) Data Management in Stata January 24 2013 41 / 52
  • 42. Merging, appending, and joining Appending Datasets Some options with Append: generate(newvar) will create variable that identifies source of observation // Not run: conceptual example. append using dataset1, generate(observesource) “force” will allow for data type mismatches (again, this is not recommended) Ista Zahn (IQSS) Data Management in Stata January 24 2013 42 / 52
  • 43. Merging, appending, and joining Joinby Merge will add new observations from using that do not appear in master Sometimes, you need to add variables from using but want to be sure the list of participants in your master does not change // Not run: conceptual exampe. Similar to merge but drops non-matches joinby participant using dataset1 Any observations in using that are NOT in master will be omitted Ista Zahn (IQSS) Data Management in Stata January 24 2013 43 / 52
  • 44. Creating summarized data sets Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 44 / 52
  • 45. Creating summarized data sets Collapse Collapse will take master data and create a new dataset of summary statistics Useful in hierarchal linear modeling if you’d like to create aggregate, summary statistics Can generate group summary data for many descriptive stats Mean, media, sd, sum, min, max, percentiles, standard errors Can also attach weights Ista Zahn (IQSS) Data Management in Stata January 24 2013 45 / 52
  • 46. Creating summarized data sets Collapse Before you collapse Save your master dataset and then save it again under a new name This will prevent collapse from writing over your original data Consider issues of missing data. Do you want Stata to use all possible observations? If not: cw (casewise) option will make casewise deletions Ista Zahn (IQSS) Data Management in Stata January 24 2013 46 / 52
  • 47. Creating summarized data sets Collapse Let’s say you have a dataset with patient information from multiple hospitals You want to generate mean levels of patient satisfaction for EACH hospital // Not run: conceptual example. calculate average ptsatisfaction by ho save originaldata collapse (mean) ptsatisfaction, by(hospital) save hospitalcollapse Ista Zahn (IQSS) Data Management in Stata January 24 2013 47 / 52
  • 48. Creating summarized data sets Collapse You could also generate different statistics for multiple variables // create mean ptsatisfaction, median ptincome, sd ptsatisfaction for collapse (mean) ptsatisfaction (median) ptincome (sd) ptsatisfaction, What if you want to rename your new variables in this process? // Same as previous example, but rename variables collapse (mean) ptsatmean=ptsatisfaction (median) ptincmed=ptincome (sd) sdptsat=ptsatisfaction, by(hospital) Ista Zahn (IQSS) Data Management in Stata January 24 2013 48 / 52
  • 49. Creating summarized data sets Exercise 2: Merge, Append, and Joinby Open the dataset, gss2.dta 1 The gss2 dataset contains only half of the variables that are in the complete gss dataset. Merge dataset gss1 with dataset gss2. The identification variable is “id.” 2 Open the dataset, gss.dta 3 Merge in data from the “marital.dta” dataset, which includes income information grouped by individuals’ marital status. The marital dataset contains collapsed data regarding average statistics of individuals based on their marital status. 4 Additional observations for the gssAppend.dta dataset can be found in “gssAddObserve.dta.” Create a new dataset that combines the observations in gssAppend.dta with those in gssAddObserve.dta. 5 Create a new dataset that summarizes mean and standard deviation of income based on individuals’ degree status (“degree”). In the process of creating this new dataset, rename your three new variables. Ista Zahn (IQSS) Data Management in Stata January 24 2013 49 / 52
  • 50. Wrap-up Topic 1 Introduction 2 Generating and replacing variables 3 By processing 4 Missing values 5 Variable types 6 Merging, appending, and joining 7 Creating summarized data sets 8 Wrap-up Ista Zahn (IQSS) Data Management in Stata January 24 2013 50 / 52
  • 51. Wrap-up Help Us Make This Workshop Better Please take a moment to fill out a very short feedback form These workshops exist for you–tell us what you need! http://tinyurl.com/StataDatManFeedback Ista Zahn (IQSS) Data Management in Stata January 24 2013 51 / 52
  • 52. Wrap-up Additional resources training and consulting IQSS workshops: http://projects.iq.harvard.edu/rtc/filter_by/workshops IQSS statistical consulting: http://rtc.iq.harvard.edu Stata resources UCLA website: http://www.ats.ucla.edu/stat/Stata/ Great for self-study Links to resources Stata website: http://www.stata.com/help.cgi?contents Email list: http://www.stata.com/statalist/ Ista Zahn (IQSS) Data Management in Stata January 24 2013 52 / 52