Stata class lecture


Published in: Technology, Economy & Finance

  1. Definition of Stata: Stata is a general-purpose statistical package developed and maintained by Stata Corporation. It comes in several flavors:
- Intercooled Stata, the standard version
- Small Stata, a more limited version
- Stata/SE (Special Edition), which can handle large data sets
- Stata/MP (Multiprocessor), which runs in parallel on machines with up to 32 processors
These versions exist for Windows (2000, XP and later), Unix and Macintosh.
Windows of Stata: When Stata starts, a screen opens with the following windows:
- Command: commands are issued interactively here.
- Results: output is displayed here.
- Review: lists all commands issued in the current Stata session.
- Variables: lists the variables of the current data set.
Other windows can be opened from the "Window" menu:
- Graph: graphs and charts constructed from a data set are displayed here.
- Viewer: displays help on Stata commands and their options, so you can look up a problem you face or the options a particular command accepts.
- Variables Manager: shows the properties of each variable, such as its name, label, type and format.
- Data Editor: shows the values of each variable, that is, the data of the current Stata session.
- Do-file: a do-file is a file containing the commands needed to carry out a particular analysis. The commands are written in the file and run as a batch, either from the Do-file Editor's menu or by typing a command.
- Log file: to save results and output, Stata writes a special type of file called a log file. There are two types of log file: SMCL and plain text.
Importing Data in Stata: Statistical offices and institutions produce large amounts of data on machine-readable media or on the Internet. Not all data sets stored in other formats can be read directly by Stata.
Some data may be available only in the formats of other statistical packages, such as SAS, SPSS, MINITAB, R or GAUSS. Stata cannot read these formats directly, so the files must be converted into Stata data with third-party software. Stata's own format has the extension .dta. American Standard Code for Information Interchange (ASCII) files can be read directly in Stata. The most important importing methods are discussed below:
1. Reading Stata-format data: If the data are already in Stata format, reading them is as easy as in other statistical software: click File > Open, go to the directory, select the data file and open it.
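The Do-file and Log file entries above fit together in a typical workflow. Below is a minimal sketch of such a do-file, assuming the hypothetical names analysis.do and analysis_log.smcl, and using the auto example data set that ships with Stata (loaded with sysuse so the example needs no external file):

```stata
* analysis.do: a minimal do-file sketch (file and log names are hypothetical)
capture log close                        // close any log left open by an earlier run
log using "analysis_log.smcl", replace   // SMCL log; a .log extension gives plain text
sysuse auto, clear                       // example data set shipped with Stata
describe                                 // list the variables and their storage types
summarize price mpg weight               // descriptive statistics, copied into the log
log close                                // finish writing the log file
```

Save the commands as analysis.do and run them with do analysis from the Command window. Comments that start with // are recognized inside do-files, which is one more reason to keep analyses in a do-file rather than typing them interactively.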
  2. Alternatively, the data can be read by typing a command. For example, if a data file named data.dta is stored as c:\user\desktop\data.dta, the command is:
use c:\user\desktop\data.dta, clear
2. Reading data from other formats: Many freely accessible data sets are available in SAS, SPSS, MINITAB, R, GAUSS, Excel and other formats. Stata has no import filter for the system files of other statistical packages, except for the SAS XPORT transport format (.xpt) and Haver Analytics databases; the commands to read these are fdause and haver, respectively. To read other system files, we need a data conversion program. The most common programs for converting data from one statistical package's format to another are:
- Stat/Transfer by Circle Systems
- DBMS/Copy from DataFlux, a subsidiary of SAS Institute
The advantages of using a conversion program are that you can:
- Keep all of the variables
- Keep all of the value labels assigned in the previous format of the data file
- Even keep the missing-value definitions
3. Reading ASCII text files: Stata has three commands for reading ASCII files: infile, insheet and infix. The last two are simplified special cases of the infile command. Data in "spreadsheet format" stored as ASCII are often tab-delimited with the extension .txt or comma-separated with the extension .csv. Typical commands are:
insheet using data.csv, clear               // comma-separated file
insheet using filename.txt, clear           // tab-delimited file
infile var1 var2 … varn using filename      // free-format file, variables listed in order
infix using filename.dct                    // fixed-format file described by a .dct dictionary
infile and infix also accept if and in qualifiers to read only some observations, for example if sex == "M".
Three major strengths of Stata:
- Data manipulation
- Statistics
- Graphics
Data manipulation: Stata is an excellent tool for data manipulation, which includes:
- Moving data from an external source into the program
- Cleaning it up
  3. - Generating new variables
- Adding variable and value labels
- Generating summary data sets
- Merging and appending data sets
- Checking merge errors
- Collapsing cross-section time-series data on either of its dimensions
- Reshaping data sets, and so on.
In general, Stata provides an answer to almost any data-handling question.
Statistics: Stata provides all of the standard univariate, bivariate and multivariate statistical tools, from descriptive statistics and t-tests through one-, two- and N-way ANOVA, regression, principal components, and time-series and econometric analysis. Stata's regression is full-featured, including regression diagnostics, prediction, robust estimation of standard errors, instrumental variables and two-stage least squares, seemingly unrelated regression, vector autoregression and error-correction models. It has a very powerful set of techniques for the analysis of limited dependent variables, including logit, probit, ordered logit and probit, multinomial logit and many more.
Graphics: Stata graphics are excellent tools for exploratory data analysis and can produce high-quality, publication-standard 2D graphics in several dozen different forms. Every aspect of a graph can be programmed and customized, and new graph types and graph schemes are continuously being produced. Because graphics are programmable, a number of similar graphs can be generated without pointing and clicking on any aspect of them. Stata 12 provides support for contour plots and heat maps.
Creating and changing variables
1. The commands generate and replace. Suppose the variable age takes the values 30, 29, 21, 48, 45, 21, 19, 22:
generate agesq = age*age                 // or age^2
generate agedum = 1 if age > 30
replace agedum = 0 if age <= 30
replace agedum = 0 if agedum != 1        // same result as the previous line
generate var1 = 1 if age == 30
generate newsex = 0 if sex == 1
replace newsex = 1 if sex == 0
summarize income                         // stores the mean in r(mean)
generate inc1 = income - r(mean)
generate incomesum = sum(income)         // running sum
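The generate and replace commands on this slide can be tried on the slide's own age values. A minimal sketch follows; the variable names agedum, agedev and agecum are illustrative choices, and the block is meant to be saved and run as a do-file (it uses input and // comments):

```stata
* Ages from the slide, entered with input
clear
input age
30
29
21
48
45
21
19
22
end

generate agesq = age^2                    // same as age*age
* Dummy: 1 if older than 30, 0 otherwise. Stata treats missing (.) as larger
* than any number, so missing ages are excluded explicitly here.
generate agedum = 1 if age > 30 & !missing(age)
replace agedum = 0 if age <= 30

summarize age                             // stores the mean in r(mean)
generate agedev = age - r(mean)           // deviation from the mean
generate agecum = sum(age)                // running (cumulative) sum
list
```

The two-step generate/replace pattern mirrors the slide; a one-line alternative is generate agedum = (age > 30) if !missing(age), because a logical expression in Stata evaluates to 1 or 0.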
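  4. egen income = rowtotal(inc1 inc2 inc3)      // row sum of inc1-inc3 (rsum() in older Stata versions)
set obs 100
generate unit = runiform()                     // uniform draws; use rnormal() for standard normal draws
Variable names may contain the letters A-Z and a-z, the digits 0-9 and the underscore (_), and must not begin with a digit.
Changing codes with by, _n and _N
hh     1  2  3  4  5  6  7  8  9  10
vill   1  1  1  1  1  2  2  2  2  2
The data set also contains the variables union and hhage.
generate cum1 = sum(income)                    // running sum over all households
bysort vill: generate cum2 = sum(income)       // running sum within each village
bysort vill union: generate cum3 = sum(income) // running sum within each village-union cell
bysort is shorthand for sorting and then using by: sort vill followed by by vill: gives the same result as bysort vill:.
bysort vill: generate count = _N               // number of households in the village
or: bysort vill: egen count = count(vill)
bysort vill: generate index = _n               // position within the village
keep if index == 1                             // one record per village
Before the keep command, index takes the values 1 2 3 4 5 1 2 3 4 5.
* Dealing with missing data:
replace income = . if hh == 9 & income == 3000000
replace income = 30000 if missing(income) & hh == 9
Reading & writing: ASCII text and .dta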
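The by-group commands on the last slide can be tried on a toy version of the household table. A minimal sketch, assuming the hh and vill layout from the slide but hypothetical, made-up income components inc1-inc3 (hh 9 carries an implausible 3000000 to mimic the missing-data rule); the variables cuminc and villtot and the files villages.dta and villages.csv are also hypothetical names. Run it as a do-file:

```stata
* Toy household data: hh/vill layout from the slide, incomes hypothetical
clear
input hh vill inc1 inc2 inc3
 1 1 100 20  5
 2 1 150 10  0
 3 1 120  0 15
 4 1  90 30 10
 5 1 200  0  0
 6 2 110 25  5
 7 2 130 15  0
 8 2  80 40 20
 9 2 3000000 0 0
10 2 160 10 10
end

* Missing data: treat the implausible entry as missing
replace inc1 = . if hh == 9 & inc1 == 3000000

* Row total across variables (rowtotal() counts missing values as 0)
egen income = rowtotal(inc1 inc2 inc3)

* Within each village: _n is the position in the group, _N the group size
bysort vill (hh): generate index = _n
by vill: generate count = _N
by vill: generate cuminc = sum(income)     // running total within the village
by vill: generate villtot = cuminc[_N]     // last running total = village total

* Keep one record per village, then write it as .dta and as comma-separated text
keep if index == 1
list vill hh count villtot
save villages, replace
outsheet vill hh count villtot using villages.csv, comma replace
insheet using villages.csv, comma clear    // read the ASCII copy back in
```

Under by, both _n and explicit subscripts such as cuminc[_N] refer to positions within the current group, which is why villtot holds each village's total. outsheet and insheet match the Stata 12 commands used in this lecture; Stata 13 and later supersede them with export delimited and import delimited.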