Stata class lecture
1. Definition of Stata:
Stata is a general-purpose statistical package developed and maintained by Stata
Corporation. Stata comes in several flavors:
- Intercooled Stata, the standard version
- Small Stata, a more limited version
- Stata/SE (Special Edition), which can handle large data sets
- Stata/MP (multiprocessor), which runs in parallel on up to 32 processors
Stata is available for Windows (2000, XP, and later versions), Unix platforms, and
Macintosh.
Windows of Stata:
When Stata starts, a screen opens with the following four windows:
- Command: commands are entered here interactively
- Results: results are displayed here
- Review: all commands issued during the current Stata session are listed here
- Variables: the variables of the current data set are listed here
Some other windows can be opened from the “Window” menu:
- Graph: graphs or charts constructed from a data set are displayed in the
graph window.
- Viewer: help on Stata commands and other instructions is shown in the
viewer window, so you can look up a problem you face or, more specifically, a
command and its options.
- Variables Manager: shows the properties of each variable.
- Data Editor: shows the values of each variable, that is, the data of the
current Stata session.
- Do-file: it is useful to build up a file containing the commands necessary to
carry out a particular data analysis. The commands are written in the file and
run as a batch, either from the menu of the do-file editor or by typing a command.
- Log file: results or output can be saved in a special type of file, which
Stata calls a log file. There are two types of log file: SMCL and text.
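As a minimal sketch of how a log file is used, the commands below open a text log, run two commands, and close the log again. The file name mysession.log and the auto example data set (shipped with Stata) are assumptions chosen for illustration only.

```stata
* Open a plain-text log; replace overwrites an existing file of the same name.
log using mysession.log, text replace
* Everything typed from here on is recorded together with its output.
sysuse auto, clear
summarize price
* Close the log so that the file is complete on disk.
log close
```

Saving these lines in a do-file, say myanalysis.do (a hypothetical name), and typing do myanalysis.do runs them as a batch. Without the text option, log using writes a SMCL log by default.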
Importing Data in Stata:
Statistical offices and institutions generally produce large amounts of data on
machine-readable media or on the internet. Stata cannot read every data format
directly. Some data may be available only in the formats of other statistical
packages such as SAS, SPSS, MINITAB, R, GAUSS, etc. These formats are not
directly readable, so the data must first be converted by third-party software to
Stata's own format, which has the extension .dta. American Standard Code for
Information Interchange (ASCII) files can be read directly in Stata. The important
importing methods are discussed below:
1. Reading Stata format data:
If the data are already in Stata format, reading them is as easy as in other
statistical software. Click on File, go to the specific directory, select the data
file, and then open it.
Alternatively, the data can be read by typing a command. For example, if we have a
data file named data.dta stored as
c:\user\desktop\data.dta
then the command is
use c:\user\desktop\data.dta
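The following sketch shows the save and use round trip on a file that is guaranteed to exist. It assumes the auto data set shipped with Stata and a hypothetical file name, mydata.dta, written to the current working directory.

```stata
* Start from a data set shipped with Stata (an assumption for illustration).
sysuse auto, clear
* Save it in Stata format; replace overwrites an existing copy.
save mydata.dta, replace
* Remove the data from memory, then read the .dta file back.
clear
use mydata.dta, clear
* Show the number of observations and variables that were read.
describe, short
```

The clear option tells use to discard any data already in memory before loading the file.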
2. Reading data from other formats:
Many freely accessible data sets are available in SAS, SPSS, MINITAB, R,
GAUSS, Excel, and similar formats. Stata has no import filter for reading the
system files of other statistical packages, except for the SAS XPORT transport
format (the format used for FDA submissions) and Haver Analytics databases.
The commands to read these data sets are fdause and haver, respectively. To read
other formats or system files, we need to use a data conversion program.
The most common programs used to convert data from one statistical package's
format to another are
- Stat/Transfer by Circle Systems
- DBMS/Copy by DataFlux, a subsidiary of SAS Institute
The advantages of using a conversion program are that you can
- keep all of the variables,
- keep all of the value labels that had been assigned in the previous format of the
data file, and
- even keep the missing-value definitions.
3. Reading ASCII text files:
Stata has three commands for reading ASCII files: infile, insheet, and infix.
The last two commands are simplified special cases of the “infile” command.
Suppose we have data in “spreadsheet format” stored as ASCII. Such files are
often tab-delimited with the file extension .txt or comma-separated with the
extension .csv. The commands to read the data are then
insheet using data.csv, clear (if the file is .csv)
insheet using filename.txt, clear
infile var1 var2 … varn using filename
infix using filename.txt if sex == "M"
With infile and infix, an if qualifier (for example if sex == "M" or if sex == "0")
or an in qualifier restricts the observations that are read.
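To make the insheet step concrete, here is a small self-contained sketch that first writes a comma-separated file and then reads it back. The file name example_data.csv and the three observations are made up for illustration.

```stata
* Type in a tiny data set by hand (values are hypothetical).
clear
input id age
1 30
2 29
3 21
end
* Write it out as comma-separated ASCII with variable names in the first line.
outsheet using example_data.csv, comma replace
* Read the .csv file back; clear discards the data currently in memory.
insheet using example_data.csv, comma clear
list
```

In Stata 13 and later, import delimited is the documented replacement for insheet, although insheet continues to work.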
Three major strengths of Stata:
- Data Manipulation
- Statistics
- Graphics
Data manipulation
Stata is an excellent tool for data manipulation, which includes
- Moving data from an external source into the program
- Cleaning it up
- Generating new variables
- Adding variable and value labels
- Generating summary data sets
- Merging and appending data sets
- Checking for merging errors
- Collapsing cross-section time-series data on either of its dimensions
- Reshaping data sets, and so on.
In general, Stata can answer most questions that arise about the data.
Statistics
In terms of statistics, Stata provides all of the standard univariate, bivariate, and
multivariate statistical tools, from descriptive statistics and t-tests through one-,
two-, and N-way ANOVA, regression, principal components, time series, econometric
analysis, and so on. Stata's regression capabilities are full-featured, including
regression diagnostics, prediction, robust estimation of standard errors,
instrumental variables and two-stage least squares, seemingly unrelated regression,
vector autoregression, and error correction models. It has a very powerful set of
techniques for the analysis of limited dependent variables, including logit,
probit, ordered logit and probit, multinomial logit, and many more.
Graphics
Stata graphics are excellent tools for exploratory data analysis and can produce
high-quality, publication-standard 2D graphics in several dozen different forms.
Every aspect of a graph may be programmed and customized, and new graph types and
graph schemes are continuously being produced.
The basic programmability of graphics implies that a number of similar graphs
may be generated without any pointing or clicking to alter aspects of the graphs.
Stata 12 provides support for “contour plots” and “heatmaps”.
Creating and changing the variables
1. The commands generate and replace
Example values of the variable age: 30, 29, 21, 48, 45, 21, 19, 22
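generate agesq = age*age
(or equivalently: generate agesq = age^2)
generate agedm = 1 if age > 30
replace agedm = 0 if age <= 30
(or: replace agedm = 0 if agedm != 1)
gen var1 = 1 if age == 30
gen newsex = 0 if sex == 1
replace newsex = 1 if sex == 0
summarize income
generate inc1 = income - r(mean)
generate incomesum = sum(income)
With three income variables inc1, inc2, and inc3:
egen income = rsum(inc1 inc2 inc3)
(rsum() is called rowtotal() in newer versions of Stata)
set obs 100
generate unit = runiform()
(or rnormal() for standard normal draws)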
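The generate and replace steps above can be tried on the eight age values from the example. This is only a sketch: the variable names agedev and agesum are introduced here for illustration and do not appear in the notes.

```stata
* Enter the eight age values from the example.
clear
input age
30
29
21
48
45
21
19
22
end
* Squared age.
generate agesq = age^2
* Dummy variable: 1 if older than 30, 0 otherwise.
generate agedm = 1 if age > 30
replace agedm = 0 if age <= 30
* summarize stores the mean in r(mean); use it to center age.
summarize age
generate agedev = age - r(mean)
* sum() in generate is a running (cumulative) sum over observations.
generate agesum = sum(age)
list
```

Note that generate with an if qualifier leaves the new variable missing wherever the condition is false, which is why the replace step is needed to fill in the zeros.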
Variable names may use the characters A-Z, a-z, 0-9, and the underscore (_);
a name may not begin with a digit.
Changing codes with by, _n and _N
hh    vill
1     1
2     1
3     1
4     1
5     1
6     2
7     2
8     2
9     2
10    2
Other variables in the data set: union, hhage
generate inc1 = sum(income)
bys vill: gen inc1 = sum(income)
bys vill union: gen inc2 = sum(income)
sort vill union
sort vill
bys vill: gen count = _N
or: bys vill: egen count = count(vill)
bys vill: gen index = _n
keep if index == 1
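A small worked sketch of by, _n and _N on the household example follows. The hh and vill values are those of the table in the notes; the income values and the variable name cuminc are hypothetical and serve only to make the example runnable.

```stata
* Ten households in two villages; income values are made up.
clear
input hh vill income
1 1 100
2 1 200
3 1 300
4 1 400
5 1 500
6 2 150
7 2 250
8 2 350
9 2 450
10 2 550
end
* Running sum of income within each village, in household order.
bysort vill (hh): generate cuminc = sum(income)
* _N is the number of observations in the current by-group.
bysort vill: generate count = _N
* _n is the position of the observation within its by-group.
bysort vill (hh): generate index = _n
list
* Keep one observation per village (the first household).
keep if index == 1
```

Listing hh in parentheses after vill sorts within each village without making hh part of the grouping, so the running sum and the index are reproducible from run to run.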
* Dealing with missing data:
replace inc = . if hh == 9 & income == 3000000
replace inc = 30000 if inc == . & hh == 9
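The two steps above can be sketched on a tiny made-up data set. The three households and their income values are hypothetical, and a single variable, income, is used throughout to keep the example self-contained.

```stata
* Three households; household 9 has an implausible income value.
clear
input hh income
8 25000
9 3000000
10 18000
end
* Step 1: set the implausible value to missing (.).
replace income = . if hh == 9 & income == 3000000
* Check how many observations are now missing.
count if missing(income)
* Step 2: fill in the corrected value for that household.
replace income = 30000 if missing(income) & hh == 9
list
```

Stata treats a missing value as larger than any number, so a condition such as income > 1000 is also true for missing observations; testing with missing(income) or income == . avoids that trap.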
Reading & writing: ASCII text, .dta
Columns: inc, index
Values of index: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5