This is a short introductory course on Stata statistical software, version 9; the material still applies to later versions of Stata. The course runs nine hours and has been given at the Faculty of Economics and Political Science, Cairo University.
Learn how to navigate Stata’s graphical user interface, create log files, and import data from a variety of software packages. Includes tips for getting started with Stata including the creation and organization of do-files, examining descriptive statistics, and managing data and value labels. This workshop is designed for individuals who have little or no experience using Stata software.
Full workshop materials including example data sets and .do file are available at http://projects.iq.harvard.edu/rtc/event/introduction-stata
Introduces common data management techniques in Stata. Topics covered include basic data manipulation commands such as recoding variables, creating new variables, working with missing data, generating variables based on complex selection criteria, and merging and collapsing data sets. Intended for users who have an introductory level of knowledge of Stata software.
All workshop materials including slides, do files, and example data sets can be downloaded from http://projects.iq.harvard.edu/rtc/event/data-management-stata
Topics for the class include multiple regression, dummy variables, interaction effects, hypothesis tests, and model diagnostics. Prerequisites include a general familiarity with Stata (including importing and managing datasets and data exploration), the linear regression model, and ordinary least squares estimation.
Workshop materials including do files and example data sets are available from http://projects.iq.harvard.edu/rtc/event/regression-stata
SPSS for Beginners: a short course on how novices can use SPSS to analyze their research findings. With this tutorial, anyone can carry out basic statistical analysis in SPSS; no professional background is required.
Article link: http://iveybusinessjournal.com/publication/managing-global-risk-to-seize-competitive-advantage/
Requirements: Write one summary and one study note, each no longer than one page, covering all the points of the article. Then prepare a PowerPoint presentation and a speaking script for a five-minute talk.
Groups of students will create and deliver an MS PowerPoint presentation summarizing the main points of one of the readings for this course, along with a one-page handout for the students in the class. The aim of the presentation and the handout is to provide the audience with the main ideas of the article and study notes. Groups will bring to class enough copies of the handout for each student. The handout should list the name of the author, the title of the article, the title of the journal, the publication date, and page numbers, along with a summary of the article's main points. Please do not exceed one page for this material.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.StringTokenizer;

/**
 * Read a .dat file and reverse it.
 */
public class Reverse {
    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("Incorrect number of arguments");
            System.err.println("Usage:");
            System.err.println("\tjava Reverse <stack type> <input file> <output file>");
            System.exit(1);
        }
        boolean useList = true;
        if (args[0].compareTo("list") == 0) {
            useList = true;
        } else if (args[0].compareTo("array") == 0) {
            useList = false;
        } else {
            System.err.println("\tSaw " + args[0] + " instead of list or array as first argument");
            System.exit(1);
        }
        try {
            //
            // Set up the input file to read, and the output file to write to
            //
            BufferedReader fileIn = new BufferedReader(new FileReader(args[1]));
            PrintWriter fileOut = new PrintWriter(new BufferedWriter(new FileWriter(args[2])));
            //
            // Read the first line of the .dat file to get the sample rate.
            // We want to store the sample rate value in a variable,
            // but we can ignore the "; Sample Rate" part of the line.
            // Step through the first line one token (word) at a time
            // using the StringTokenizer. The fourth token is the one
            // we want (the sample rate).
            //
            StringTokenizer str;
            String oneLine;
            int sampleRate;
            String strJunk;
            oneLine = fileIn.readLine();
            str = new StringTokenizer(oneLine);
            strJunk = str.nextToken(); // Read in semicolon
            strJunk = str.nextToken(); // Read in "Sample"
            strJunk = str.nextToken(); // Read in "Rate"
            // ...
Learning
Base SAS,
Advanced SAS,
Proc SQL,
ODS,
SAS in financial industry,
Clinical trials,
SAS Macros,
SAS BI,
SAS on Unix,
SAS on Mainframe,
SAS interview Questions and Answers,
SAS Tips and Techniques,
SAS Resources,
SAS Certification questions...
visit http://sastechies.blogspot.com
Week 2 Project - STAT 3001
Student Name: <Type your name here>
Date: <Enter the date on which you began working on this assignment.>
Instructions: To complete this project, you will need the following materials:
· STATDISK User Manual (found in the classroom in DocSharing)
· Access to the Internet to download the STATDISK program.
This assignment is worth a total of 60 points.
Part I. Histograms and Frequency Tables
Instructions
Answers
1. Open the file Diamonds using menu option Datasets and then Elementary Stats, 9th Edition. This file contains some information about diamonds. What are the names of the variables in this file?
2. Create a histogram for the depth of the diamonds using the Auto-fit option. Paste the chart here. Once your histogram displays, click Turn on Labels to get the height of the bars.
3. Using the information in the above histogram, complete this table. Be sure to include frequency, relative frequency, and cumulative frequency.
Depth
Frequency
Relative Frequency
Cumulative Frequency
57-58.9
59-60.9
61-62.9
63-64.9
a. Using the frequency table above, how many of the diamonds have a depth of 60.9 or less? How do you know?
b. Using the frequency table above, how many of the diamonds have a depth between 59 and 62.9? Show your work.
c. What percent of the diamonds have a depth of 61 or more?
Part II. Comparing Datasets
Instructions
Answers
1. Create a boxplot that compares the color and clarity of the diamonds. Paste it here.
2. Describe the similarities and differences in the data sets. Please be specific to the graph created.
Part III. Finding Descriptive Numbers
Instructions
Answers
3. Open the file named Stowaway (using Datasets and then Elementary Stats, 9th Edition). This gives information on the number of stowaways going west vs. east. List all the variables in the dataset.
4. Find the Mean, median, and midrange for the Data in Column 1.
5. Find the Range, variance, and standard deviation for the first column.
6. List any values for the first column that you think may be outliers. Why do you think that?
[Hint: You may want to sort the data and look at the smallest and largest values.]
7. Find the Mean, median, and midrange for the data in Column 2.
8. Find the Range, variance, and standard deviation for the data in Column 2.
9. List any values for the second column that you think may be outliers. Why do you think that?
10. Find the five-number summary for the stowaways data in Columns 1 and 2. You will need to label each of the columns with an appropriate measure in the top row for clarity.
11. Compare number of stowaways going west and east using a boxplot of Columns 1 and 2. Paste your boxplot here
12. Create a histogram for the
Column 1 data and paste it here.
13. Create a histogram for the
Column 2 data and paste it here.
Part IV. Interpreting Statistical Information
The Stowaway data contains two columns, both of which are mea.
Before conducting any statistical analysis, one must have the data in a tractable form for reliable and organized analysis. The procedures used to do this are termed "data handling." Data mining, by contrast, is the analysis step of Knowledge Discovery in Databases: an interdisciplinary subfield of computer engineering that helps discover patterns in large datasets. The main aim of data mining is to extract information from a dataset and transform it into knowledge that is useful for further work, beyond raw-data analysis. In this paper, data handling in SPSS is proposed.
InnerSoft STATS is a descriptive statistics application. InnerSoft STATS computes statistics for parameter estimation and statistical hypothesis testing. Descriptive statistics: mean, variance, standard deviation, coefficient of variation, quartiles, percentiles, skewness, kurtosis, mode, interquartile range, sum of squares. One-sample tests: one-sample z-test, one-sample t-test, chi-squared test for variance. Two-sample tests: Student's t-test for independent samples (pooled t-test for equal variances and unpooled t-test for unequal variances), Student's t-test for paired samples, two-sample F-test of equality of variances. One-way ANOVA with multiple comparison methods: Scheffe, Tukey HSD, Sidak, Fisher LSD, Bonferroni; Welch's test for equality of means; Brown–Forsythe test for equality of means. Homoscedasticity tests: Levene's test, Brown–Forsythe test for equality of variances, Bartlett's test.
3. 1. Stata interface:
- windows,
- icons vs. syntax, and
- initial output
2. Stata community and website:
www.stata.com
4. 3. Compare Stata with SPSS
In terms of statistical capabilities, Stata can do a lot more than SPSS, i.e. it is more advanced.
SPSS is more inclined towards the business world.
Stata is more inclined towards the research community. It offers a helpful exchange of ideas and experience between its academic users.
5. 3. Compare Stata with SPSS (cont'd)
Stata can receive updates as well as ado files, i.e. you don't need to wait for a new version to run new commands.
Compare the two websites: www.spss.com and www.stata.com
Join the mailing list of updates: send a message to majordomo@hsphsun2.harvard.edu and write:
subscribe statalist email@address
OR, for a daily summary, you can write:
subscribe statalist-digest email@address
6. 4. Introducing the three Stata editions:
Stata/SE: Special Edition
Stata/IC: Intercooled
Stata/small: small (educational)
5. Two dimensions of data: cases and variables
7.
                                  Stata/SE        Stata/IC        Stata/small
Max. no. of variables             32,766          2,047           99
Max. no. of observations          2,147,483,647   2,147,483,647   1,000
Max. characters (string variable) 244             80              80
Max. matrix size                  1,000 x 1,000   800 x 800       40 x 40
8. 6. Help: built-in (offline) / internet (online)
- From Stata: Help >> Search >> Search all
- findit keyword
- Very helpful links at:
http://www.stata.com/links/resources1.html
7. File types:
- Data file: filename.dta
- Do file: filename.do
- Ado file: filename.ado
- Log file (only readable in Stata): filename.smcl
- Log file (text file): filename.log
9. 8. Main tasks
1. Accessing data
2. Entering data in Stata
3. Converting data (via StatTransfer) from formats such as text, Excel, SPSS, SAS, and other software
4. Saving Stata format as a .dta file
5. File and data preparation
6. Descriptive statistics
7. Tabulating data: frequency tables and cross tabulations
8. Graphics
9. Data analysis
10. Preparing a report using Stata output (creating a Word document)
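The tasks above can be sketched as a minimal do-file. This is only a sketch: the file names (mydata.txt, session.log, mydata.dta) and the variable region are hypothetical, and the Stata commands used are the ones introduced later in this course.

```stata
* minimal session sketch (hypothetical file names)
log using session.log      // 10. keep a record of the session for the report
insheet using mydata.txt   // 3.  import data converted to tab-delimited text
save mydata.dta            // 4.  save in Stata format
describe                   // 5.  check variables and observations
summarize                  // 6.  descriptive statistics
tabulate region            // 7.  frequency table of a (hypothetical) variable
log close
```

Each numbered comment refers back to the corresponding task in the list above.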
10. 9. Good practices
1. Documentation within a program (using *)
2. Intuitive variable names, labels, and file names
3. Avoid destroying or overwriting original data
4. Appropriate use of command abbreviations
5. Keeping a record of work: commands (do) and outputs (log)
12. 1. Opening and closing Stata
Like any other software:
Start button >> Programs >> Stata
OR
Double-click the software icon on the desktop
13. 2. Memory issues: checking and setting memory capacity
To check memory status (the default is 1m = 1 megabyte):
memory
To change the memory (needed for large data sets):
set memory 750m
set memory 750m, perm
14. 3. Current directory and changing it
The current directory is shown in the status bar as a path below the four windows. To use and save files without rewriting the whole path every time we write a "use" or "save" command, we change the directory to the one we want to work with, as follows:
cd D:\StataData //no spaces in the name
OR
cd "D:\StataData Files" //a space in the path name requires quotes
15. 4. Opening and closing files (data, log, and do)
Open a file in any directory:
use C:\folder\filename.dta
Open a file in the current directory:
use filename
Open specific variables:
use age region using C:\folder\filename.dta
Open specific cases:
use C:\folder\filename.dta if male==1 //a certain category only
use C:\folder\filename.dta in 1/10 //the first 10 cases only
16. 5. Preparing a report using Stata output (creating a Word document)
Steps:
1. Open a log file to save all contents of the session (commands and outputs) using:
log using filename.log //to have a text file, not an .smcl file
2. Carry out all the analysis required, then write:
log close
3. Open this file in any text editor and copy it into a Word file
4. Format the Word file with the font "Courier New" in size 8 or 9 to get exactly the same shape of output as in the Stata output window
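As a sketch, the whole report workflow looks like this; the file name report.log and the variable region are hypothetical:

```stata
log using report.log       // step 1: .log extension gives plain text, not .smcl
summarize                  // step 2: run whatever analysis is required
tabulate region            //         (hypothetical variable)
log close                  //         close the log before opening it elsewhere
* steps 3-4: open report.log in a text editor, paste it into Word,
* and set the font to Courier New 8 or 9 pt to preserve the column alignment
```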
17. 6. Viewing existing data in a data file
To view all existing variable names, specifications, labels, and the number of variables and observations, we use:
describe
d, fullnames //to avoid abbreviations in names
To select some of them by name, we use:
describe region1 region2
To select all those starting with the same letters (e.g. reg), we use:
d reg* // describe all variables starting with reg-
d *tion // describe all variables ending with -tion
18. 6. Viewing existing data in a data file (cont'd)
To view all existing observations:
list
To view some selected observations:
list in 1/5 //1st 5 observations
To view some selected variables:
l age gov sex in 1/10 //1st 10 observations of the three variables
l age gov sex if male==1 //only males in the three variables
li X* //all variables starting with X
19. 6. Viewing existing data in a data file (cont'd)
Important notes!
To avoid running very long outputs in general, for example all the observations (in the case of very large datasets), we can use the Break icon in the toolbar, or Ctrl+C from the keyboard, at any time to stop getting more output from the same command.
To permanently switch off the -more- option between pages of output, we type:
set more off
20. 7. Entering and saving data
1. Manually through the keyboard: string variables should be specified with str before the varname (e.g. if var3 is a string of 9 characters, it is str9):
input var1 var2 str9 var3
val11 val12 "val13"
.
.
.
valN1 valN2 valN3
end
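A concrete (and entirely hypothetical) example of manual entry: two numeric variables and one string variable of up to 9 characters:

```stata
input age income str9 name
25 3000 "Ahmed"
40 5200 "Mona"
end
list    // shows the two observations just entered
```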
21. 7. Entering and saving data (cont'd)
2. Manually through the data editor
Enter values in the table cell by cell (where the cursor (colored cell) is).
Double-click on the varname to edit its name, label, and format.
22. 7. Entering and saving data (cont'd)
3. Download or search for datasets by:
typing in the command window:
help datasets
searching www.stata.com for datasets
searching the internet
23. 7. Entering and saving data (cont'd)
4. Using StatTransfer to transfer any spreadsheet into Stata format (the best way in order not to lose any data), as well as maintaining all variable labels and storage types (in case the file was in SPSS or any other statistical package that saves information about the variables)
24. 7. Entering and saving data (cont'd)
5. Save an Excel file with a variable header (i.e. varnames in the first row) >> select all >> copy from the Excel sheet >> highlight the upper-left cell in the Stata data editor >> paste
6. Save an Excel file in tab-delimited format (.txt) without variable headers (i.e. all columns are values).
Then type in the Stata command window:
insheet using Book1.txt
25. 7. Entering and saving data (precautions)
Take care of any data that might have been missed while transferring to Stata without StatTransfer.
Make sure you label the variables and rename them in Stata after the insheet.
Also check Stata's infile command.
Note that Stata 10 reads directly from Excel via the file icon in the Stata interface.
27. 9. Describing and tabulating data
1. An overview of data
• The first step is to see the data (variables and observations) with the 'list' and 'describe' commands.
• To see the value labels of a variable in full:
label list name
NB! Here we type the name of the label list, NOT the varname.
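For example, assuming a variable region whose value label was defined under the (hypothetical) name regionlbl:

```stata
label define regionlbl 1 "Urban" 2 "Rural"   // define the label list
label values region regionlbl                // attach it to the variable
label list regionlbl                         // list by label name, NOT varname
```

Typing "label list region" would fail here, because region is the variable, while regionlbl is the label list.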
28. 9. Describing and tabulating data
2. Summarizing data (descriptive statistics):
For quantitative data (numeric variables only):
summarize
To show basic descriptives of var X, i.e. no. of obs., mean, st. dev., min, and max values:
sum X
To show detailed descriptives of var X (basic + percentiles, variance, and skewness):
sum X, detail
29. 9. Describing and tabulating data
3. Frequency tables
tabulate X
ta X, nolabel //shows codes NOT labels
tab1 X1 X2 X3 X4 //for each one separately
ta X, summarize(Y) //summarizes Y for each category of X
30. 9. Describing and tabulating data
4. Cross tabulations
Can take up to 2 variables: Y on rows, X on columns, with totals:
ta Y X
ta X1 X2, row //displays row percentages (% for each category)
ta X1 X2, row nofreq //displays row percentages without frequencies
bysort X: tab Y, summarize(Z) missing
//for each category of X (including the missing category), we tabulate Y and calculate basic descriptives of Z
31. 9. Describing and tabulating data
4. Crosstabs (cont'd)
table is another command, more flexible in its options, esp. weights. It can take up to 3 variables, with Y as the rows and X2 as the columns for each category of X1.
table Y X1 X2, row col //a new row and column for totals
table Y X, by(Z) //a separate table of Y on rows and X on columns for each category of Z
32. 10. Data manipulation
1. Creating case number (case id)
generate id=_n
2. Deleting existing variables/cases
drop X //deletes variable X
keep X Y Z //deletes all other variables
drop if gov==1 //deletes all cases in this governorate
drop if ~male //deletes observations with male==0; note that missing values count as true, so they are kept
keep if age>=15 & age<=60 //deletes all other cases
33. 10. Data manipulation
3. Dealing with Variable Groups:
Grouping variables in a variable set
global set1 “X1 X2 X3 X4”
When we use this variable set in any
command, we call it by adding a $ before
the name. For example:
tab1 $set1
34. 10. Data manipulation
3. Dealing with Variable Groups:
The use of the dash (-)
for var X1-Y10: rename X X_2009
//the command runs once for each variable from X1 to
Y10, with X standing in for each variable name
The use of the star (*): (previously discussed)
describe X*
des demo*99
list *Y*
list *W
35. 10. Data manipulation
4. Creating new variables (gen)
generate y=1
g z=1 if (x==5)
gen samplesize=_N
//column of a constant=total number of
observations in the dataset (total sample size)
bysort family: gen famsize=_N
//column of constants=total number of observations in
each family (add up to sample size)
36. 10. Data manipulation
Creating new variables (gen) (cont’d)
gen l_income=log(income) //natural log
OR gen l_income=ln(income) //natural log
gen loginc=log10(income) //base 10 log
gen Y=sqrt(X) //get square-root of X
gen Z=exp(Y) //get the exponential
gen sqage=age^2 //get the square of age
gen XY=X*Y //interaction term
gen lagYt = Yt[_n-1] //lagYt=Yt-1
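The lag line above assumes the data are already in time order; a hedged sketch (the variable names Yt, year, and id are illustrative):

```stata
sort year                               // put observations in time order first
gen lagYt = Yt[_n-1]                    // the first observation gets missing
* for panel data, take the lag within each unit:
bysort id (year): gen lagY2 = Yt[_n-1]
```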
37. 10. Data manipulation
Creating new variables (egen and its options)
egen avage = mean(age) //mean age of sample (only 1 value)
bysort hh: egen avg = mean(age) //mean age for every hh
egen meddiff = median(var1-var2)
//(expression: here - means subtraction) median of the difference
egen avginc = rowmean(W X Y Z)
OR
egen avginc = rowmean(W - Z) //(varlist, - means through)
egen ttlsales = total(sales), by(region)
38. 10. Data manipulation
Dummy variable construction:
Manual (missing values stay missing)
gen female=1 if sex==2
replace female=0 if sex==1
Automatic (missing values become 0, NOT missing)
gen married=(mrtst==2) //generate a dummy for married
tab region, gen(region)
//generate 6 dummy variables for the 6 regions
xi commands for categorical data
xi: tab1 i.region
//can be used with any command. Dummies not saved
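To see the difference the manual and automatic approaches make for missing data, here is a small sketch (the coding 1=male, 2=female is an assumption):

```stata
gen female = 1 if sex==2
replace female = 0 if sex==1       // sex==. stays missing in female
gen female2 = (sex==2)             // sex==. becomes 0, not missing
gen female3 = (sex==2) if sex<.    // one-line version that keeps missings
```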
39. 11. Graphics
The commands that draw graphs are
command description
------------------------------------------------------------------------------------------
graph twoway scatterplots, line plots, etc.
graph matrix scatterplot matrices
graph bar bar charts
graph dot dot charts
graph box box-and-whisker plots
graph pie pie charts
other more commands to draw statistical graphs
------------------------------------------------------------------------------------------
40. 11. Graphics
The commands that save a previously drawn graph,
redisplay previously saved graphs, and combine graphs are
command description
-------------------------------------------------------------------------
graph save save graph to disk
graph use redisplay graph stored on disk
graph display redisplay graph stored in memory
graph combine combine graphs into one
-------------------------------------------------------------------------
41. 11. Graphics
1. Histograms
histogram X // draws a histogram for variable X
histogram X if male==1 in 1/1000
//histogram for variable X for males only in the first 1000 cases
histogram X, percentage normal
// histogram for variable X along with the normal curve
For more info on options: help histogram
42. 11. Graphics
2. Bar graphs
graph bar X
// draws a bar chart with vertical bars for variable X
graph hbar Y
// draws a bar chart with horizontal bars for variable Y
For more info on options: help graph bar
43. 11. Graphics
3. Scatterplots
graph twoway scatter Y X
twoway scatter Y X
scatter Y X
The above three commands are equivalent.
44. 11. Graphics
graph twoway (scatter y1 x) (scatter y2 x)
// draws a scatter plot for variable y1 against x and for y2 against x
This is equivalent to typing
twoway scatter y1 x || scatter y2 x
graph twoway (scatter y x) (lfit y x)
// draws a scatter plot for variable y against x and adds the linear prediction fit
graph matrix X1 X2 X3
// scatterplot matrices for the three variables together (two at a time)
For more info on options: help scatter
OR help graph_twoway
45. 11. Graphics
4. Line graphs
graph twoway line Y X
twoway line Y X
line Y X
The above three commands are equivalent.
For more info on options: help line
OR help graph_twoway
46. 11. Graphics
5. Labeling graph and graph axes
scatter lexp region, title("Scatter plot")
subtitle("Life expectancy at birth, US")
ytitle("Life expectancy")
xtitle("Region")
Note: The whole command should be written on
one line (or continued across lines with /// in a do-file).
47. 11. Graphics
6. Saving graphs
This command is written directly after the graph that
you wish to save:
e.g.
scatter yvar xvar
graph save mygraph //save previous graph
This will create the file mygraph.gph
OR
scatter yvar xvar, saving(mygraph)
graph use mygraph //use saved graph
49. 12. Further topics of interest
It should be noted that data should be
weighted to be representative of the
population (help weight)
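A hedged sketch of weighting in practice (wt is a hypothetical sampling-weight variable):

```stata
summarize income [aweight=wt]      // weighted mean and standard deviation
tabulate region [iweight=wt]       // weighted frequencies
```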
Stata can merge files (add variables
from one file to another) and append
files (add cases).
(help merge) and (help append)
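In Stata 9 syntax, both files must be sorted on the key variable before merging; a sketch with hypothetical file and variable names:

```stata
use master, clear
sort id
merge id using otherfile      // otherfile must also be sorted by id
tab _merge                    // 1=master only, 2=using only, 3=matched
append using morecases        // append adds observations (cases)
```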
Numerous options are present with
every command
50. 13. Matrices
To input matrix A:
11 530
550 32130 , we do the following:
matrix input A=(11,530\550,32130)
mat list A // to show the matrix content
mat define detA=det(A) // to get the determinant of A
mat define invA=inv(A) // to get the inverse of A
mat define transA=A' // to get the transpose of A
mat D=A+B // to get the sum of A and B
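As a quick check that inv() really inverts, multiplying A by its inverse should give (approximately) the 2x2 identity matrix:

```stata
matrix input A=(11,530\550,32130)
matrix invA = inv(A)
matrix I2 = A*invA        // approximately the 2x2 identity matrix
matrix list I2
```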