INTRODUCTION
Review of Statistics. Stata and Excel introduction
HOMEWORK FOR FRIDAY
• Using files grades, peanuts, and unrate…
• Find summary statistics for each variable
• Create histogram chart for grades
• Create line graph for unrate
• Save everything in a do-file.
DESCRIPTIVE STATISTICS
• Mean – arithmetic mean, arithmetic average.
• Sum of the data values divided by the number of observations
• Mode
• Median
• Minimum, maximum
• Variance
• Standard deviation
MEAN
• Mean - arithmetic mean, arithmetic average. Sum of the data values divided by the
number of observations
• Example: Calculate the mean for the hypothetical data for shipments of peanuts from a
U.S. exporter to five Canadian cities
• Montreal – 640,000 pounds
• Ottawa – 15,000 pounds
• Toronto – 285,000 pounds
• Vancouver – 228,000 pounds
• Winnipeg – 45,000 pounds
• Notes: Σ means sum, unit of observation here is a Canadian city
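The formula itself did not survive in the slide text, so here is the calculation written out for the five shipments listed above. The symbols (x-bar for the mean, n for the number of observations, x_i for the shipment to city i) are conventional notation assumed here, not taken from the original slide.

```latex
\[
\bar{x} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i
\;=\; \frac{640{,}000 + 15{,}000 + 285{,}000 + 228{,}000 + 45{,}000}{5}
\;=\; \frac{1{,}213{,}000}{5}
\;=\; 242{,}600 \text{ pounds}
\]
```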
MEAN CONT’D
• In Excel:
• Click on fx and find the function name, or type it in directly
• =AVERAGE(range of data)
• In Stata:
• Import data by clicking “File” (upper left corner) -> “Import” -> pick the format of the file ->
find it by clicking “Browse” -> tick the box “Import first row as variable names”
• mean peanuts (or: summarize peanuts)
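A minimal sketch of the Stata side, assuming the peanuts data are typed in by hand rather than imported. The variable names city and peanuts are illustrative choices; the file on Blackboard may use different names.

```stata
* Hypothetical reconstruction of the peanuts data from the slide
clear
input str10 city long peanuts
"Montreal"  640000
"Ottawa"     15000
"Toronto"   285000
"Vancouver" 228000
"Winnipeg"   45000
end

* Mean with its standard error and confidence interval
mean peanuts

* Mean together with the number of observations, std. dev., min, and max
summarize peanuts

* summarize leaves the mean behind in r(mean)
display r(mean)
```

Both commands report the same mean; summarize is usually the quicker choice when you only want descriptive statistics.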
STATA
• Stata is a powerful tool for researchers and applied economists.
• Highly extensible: users get the same development tools that the company’s professional
programmers use
• Google is your best friend
• Stata has a few windows:
• bottom middle is the command window – this is where you type in the commands;
• top middle – the commands that you submitted appear and so does the output;
• left – all of the commands you have run;
• right – all of the variables you have in your dataset
• To view your dataset you can click on “data editor” or “data browser”
STATA
• Right now there is no data in Stata. We first have to load the data into it. The way you
load data into Stata (or any other type of statistical software) depends on the type of
data file you have
• Text data, such as comma-delimited files (.csv)
• Excel files (.xlsx)
• Stata files (.dta)
• Please find the dataset “grades” on Blackboard. What type of file is it?
• Stata: File -> Import -> type of file. Please tick “Import first row as variable names”
• If you want to load a different dataset to work with, first type “clear” in the command window
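The menu steps above have command equivalents. The sketch below is self-contained: it first writes a tiny made-up file (grades_demo.csv, a hypothetical name) so that there is something to import. With the real “grades” file you would skip straight to the import command that matches its type.

```stata
* Build a tiny made-up dataset and save it as a comma-delimited text file
clear
input classgrade
72
85
91
end
export delimited using "grades_demo.csv", replace

* Remove the data in memory, then read the file back in;
* varnames(1) treats the first row as variable names
clear
import delimited using "grades_demo.csv", varnames(1) clear
describe

* Analogous commands for the other file types (file names are placeholders):
*   import excel using "grades.xlsx", firstrow clear
*   use "grades.dta", clear
```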
STATA LOGS AND DO-FILES
• log – records your work in Stata; start it before you do anything else!
• .do file – lets you record a series of commands
• Try to make your own log and .do file
• Click on “log” -> “begin” -> give it a name -> save in a location convenient for you (this starts a
log; the log is written as you work and is closed when you exit Stata).
• Click on “do-file editor” and start typing up commands. You would save it like any other document
(“save” -> give it a name, save in a convenient location).
• To run the commands in the do-file simply click “run” at the top of the do-file
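The same thing can be done with commands, which is handy as the first and last lines of a do-file. A minimal sketch; the file name session1.log is a placeholder.

```stata
* Close any log left open from earlier work (capture ignores the error if none is open)
capture log close

* Start a plain-text log; replace overwrites an older log with the same name
log using "session1.log", text replace

* Everything run from here on is recorded in the log
display "Hello, Stata"

* Close the log when finished
log close
```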
EXAMPLE
• Calculate the mean of the student grades in Excel and in Stata
• You will find the data set “grades” on Blackboard
• Make sure your work in Stata is recorded in a log
• What is the unit of observation in the dataset (i.e. whose grades are these)?
• How many observations are there?
• What is the average grade in that class?
SMALLEST AND LARGEST OBSERVATION
• You might be wondering if anyone got 100 in the class, or what the highest and the lowest
grades in the class were.
• We can find out by looking at the data, by sorting the data, and by using minimum and maximum
functions in Excel and Stata
• To sort data:
• In Excel: highlight the data you want to sort, “data” -> “sort”
• In Stata: sort variablename (ascending order)
• gsort +variablename (ascending) or gsort -variablename (descending)
• Once you have sorted the data you can see what the first and last observations are
• Functions in Excel: =MIN(data), =MAX(data)
• In Stata: summarize variablename (the output includes the minimum and maximum)
• Minimum and maximum let you know if you have outliers in your data or there are certain
problems with your data
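A short sketch of both approaches in Stata, using five made-up grades (the values and the variable name classgrade are illustrative only):

```stata
* Made-up grades for illustration
clear
input classgrade
78
92
65
100
84
end

* Ascending sort: the lowest grade is first, the highest is last
sort classgrade
display classgrade[1]
display classgrade[_N]

* Descending sort: the highest grade is now first
gsort -classgrade
display classgrade[1]

* Without sorting: summarize reports min and max and stores them
summarize classgrade
display r(min)
display r(max)
```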
APPLICATION 1. USE EXCEL
• Use the UNRATE (unemployment rate) dataset to find out the…
• Average unemployment rate between 1948 and 2020
• What were the maximum and minimum unemployment rates during that period?
• Any thoughts on your findings?
• TIP… Stata can download data directly from FRED. There are two ways of accessing the FRED database…
• freduse command (user-written, might need to be installed)…. freduse UNRATE, clear
• File >> Import >> Federal Reserve Economic Data (FRED)
APPLICATION 2. USE GRADES2 TO ANSWER THE FOLLOWING
• In Stata:
• What is the minimum grade in that class?
• What is the maximum grade in that class?
• What is the average grade in that class?
• How do the minimums, maximums, and averages compare across the two classes?
STANDARD DEVIATION
• I want to calculate how dispersed the students’ grades are around the average
grade in the class
• Standard deviation (square root of variance) – spread of the observations around the
mean value
• Why is it useful? We can find out how much the data fluctuate around the mean in a
dataset and compare datasets. It also helps flag possible outliers, which we can then
check and, if they are errors, correct or remove.
• Examples: income in different cities, unemployment in different regions, returns on
different companies’ stock
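For reference, these are the sample formulas that Excel's =stdev and =var and Stata's summarize use (note the n − 1 in the denominator). The notation is assumed here: s is the standard deviation, s² the variance, x-bar the mean, and n the number of observations.

```latex
\[
s^2 \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
s \;=\; \sqrt{s^2}
\]
```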
STANDARD DEVIATION CONT’D
• In Excel the function for standard deviation is: =STDEV(data)
• In Stata the standard deviation is part of the summarize command output
STANDARD DEVIATION APPLICATIONS
• Find the standard deviation for both of the classes and compare them. What conclusion
can you draw?
• What was the standard deviation of the unemployment rate before and after outliers
were corrected? What conclusion can you draw?
VARIANCE
• Closely tied to standard deviation
• Variance = squared standard deviation
• Measure of how far the observations in a dataset are from the mean
• To find variance in Excel: =VAR(datarange)
• To find variance in Stata: square the standard deviation by hand or use display r(Var)
right after the summarize command
• Stata retains a number of calculations (behind the scenes).
• return list
• There are other tools for calculating summary statistics…
• help tabstat
• tabstat UNRATE, s(var)
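A sketch of how the stored results fit together, using made-up grades in place of the class data. r(Var) and r(sd) only exist after summarize has run, and the next command that stores r() results replaces them.

```stata
* Made-up grades for illustration
clear
input classgrade
78
92
65
100
84
end

* summarize stores its results behind the scenes
summarize classgrade

* Show everything that was stored: r(N), r(mean), r(sd), r(Var), r(min), r(max), ...
return list

* Two ways to get the variance; both display the same number
display r(Var)
display r(sd)^2

* tabstat lets you pick exactly which statistics to show
tabstat classgrade, statistics(mean sd variance min max)
```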
USING STATA AND EXCEL AS A CALCULATOR
• To find variance you can always square standard deviation
• di r(Var)
• di r(sd)^2
• To use Excel as a calculator you have to type “=” into a cell and then what you are
trying to calculate
• In Stata you have to type in the word “display” and then what you are trying to calculate
• For example, if the standard deviation is 1.6 then to calculate the variance:
• Excel: =1.6^2 (or =1.6*1.6)
• Stata: display 1.6^2 (or display 1.6*1.6)
CREATING A NEW VARIABLE
• You can create new variables in Excel and Stata. This skill will be useful later on in the
class
• For now let’s imagine the professor gives everyone in the first class a 1-point curve, and
calculate their new grades
• In Excel, in a new cell type “=”, then the cell with the grade, then “+1” (for example =A2+1).
Hover over the bottom right corner of the new cell and double click; the column should populate
with calculated values. What is the class average now that everyone has received extra credit?
• Let’s import the grades into Stata and do the same. To create a new variable:
• generate var=classgrade+1
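A small self-contained sketch of the curve, with made-up grades. The new variable name curvedgrade is an illustrative choice; any unused name works.

```stata
* Made-up grades for illustration
clear
input classgrade
78
92
65
97
84
end

* Add one point to every student's grade
generate curvedgrade = classgrade + 1

* Compare the two columns side by side
list classgrade curvedgrade
summarize classgrade curvedgrade
```

Adding the same constant to every observation raises the mean by exactly that constant and leaves the standard deviation unchanged, which you can confirm in the summarize output.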
BAR CHARTS
• You would like to find out how many people in the class received an A, B, C, or D.
• The best way to look at that is to create a distribution chart (histogram) that shows
how many students received each grade
• In Excel highlight the data->insert->histogram->right-click on the x-axis label to change
number of bins and their range
• In Stata click on graphics->histogram. There are many options, let’s go through some of
them
• Variable – classgrade
• Width of bins – 10 (this is how “wide” each grade category is)
• Lower limit of first bin – 60 (assuming no one failed the class)
• Y-axis – frequency
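The menu choices above correspond to a single command. A sketch with made-up grades, all of them at or above 60 so that the lower limit of the first bin is valid:

```stata
* Made-up grades for illustration (none below 60)
clear
input classgrade
62
68
71
75
79
83
84
88
91
98
end

* Bins of width 10 starting at 60, with counts (not densities) on the y-axis
histogram classgrade, width(10) start(60) frequency
```

Stata requires start() to be no larger than the smallest value in the data, so lower it if someone scored below 60.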
BAR CHARTS CONT’D
• We can create bar charts to compare the same variable over time (e.g. unemployment) or
across different units (e.g. income across different cities)
• Let’s create an over-time bar chart using unemployment rate data in Excel
• Highlight the unemployment rate column by clicking on the column name twice
• Click “Insert” (in the ribbon at the top) -> pick bar chart (2D column)
• Right-click on the x-axis labels -> select data -> edit -> select range (years column) by
highlighting it
• To add labels to the axes, click on the chart -> “+” symbol at the right corner -> tick axis
titles -> type the titles into the boxes
LINE CHARTS
• Showing the progression of a variable over time is easier with a line chart
• Load the unemployment rate data into Stata
• This is time series data. We have to treat it a bit differently: sort by date first, then
build a monthly date variable from the observation number
sort month
generate daten = tm(1948m1) + _n - 1
format daten %tm
tsset daten, monthly
• Click “Graphics” on the top left -> twoway graph -> create -> line plot type -> Y-variable is
unemployment rate, X-variable is the date variable (daten) -> submit
• To save your graph: file -> save as -> pick the type that will make it easy for you to open
the graph
• https://fred.stlouisfed.org/series/UNRATE/ compare your graphs to FRED data
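A self-contained sketch of the time series setup and the line plot. The unemployment numbers below are generated by a made-up formula purely so the example runs without the UNRATE file; with the real data you would skip the set obs and generate unrate lines.

```stata
* Two years of placeholder monthly data
clear
set obs 24

* Monthly date variable: first observation is 1948m1, then one month per row
generate daten = tm(1948m1) + _n - 1
format daten %tm
tsset daten, monthly

* Made-up unemployment rates (illustration only, not real data)
generate unrate = 3.5 + 0.1*mod(_n, 6)

* Line chart of the series against time
tsline unrate

* Equivalent twoway form:
*   twoway line unrate daten
```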
SIDE NOTE
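• How does Stata work with time series data…
• It stores dates as numbers counted from the start of 1960 (January 1960 is always 0; for
monthly dates each later month adds 1)
• _n refers to the number of the current observation (1 for the first row, 2 for the second, …)
• _N refers to the total number of observations
• Why do I need to subtract 1 when finding the correct month…
• Because _n starts at 1: tm(1948m1) + _n would date the first observation February 1948, so
we subtract 1 to make the first observation January 1948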
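You can check the numbering with display. This is a small sketch; the %tm format tells display to print a number as a monthly date.

```stata
* January 1960 is month 0
display tm(1960m1)

* January 1948 is 12 years = 144 months earlier, so this shows -144
display tm(1948m1)

* Turning numbers back into monthly dates
display %tm 0
display %tm (tm(1948m1) + 1 - 1)

* Without the -1, the first observation (_n = 1) would land on 1948m2
display %tm (tm(1948m1) + 1)
```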
GDP OVER TIME IN THE US, MEXICO, AND CANADA
• Please google “GDP per capita by country world bank” -> pick the one in current US$
(why do we have to use GDP per capita in current dollars?) -> download the csv file
• Use ctrl-F to find GDP for the US, Mexico and Canada. Copy and paste each country’s GDP
into a new document
• Delete the third and fourth columns
• Create a line chart. What conclusion can we draw about the relative economic growth of
these countries?
CORRELATION
• Is it possible to improve your score during the semester or is the grade on the first exam
closely related to the grade at the end of the semester?
• Use the grades3.xlsx data set to answer this question
• Import the dataset into Stata. We are going to plot the observed points on a graph
where the axes are exam grade and class grade
• To do so type in: scatter exam1 classgrade
• We can tell that there is a positive relationship between the two variables
• The graph that you created is called a scatterplot. By looking at a scatterplot we can often
tell whether there is a relationship between two variables in the data. We can also make
an educated guess whether that relationship is positive or negative
• Can you think of two variables that might be positively or negatively related?
CALIFORNIA SCHOOLS DATASET
• The data set includes data on California’s school districts in the 1998-1999 school year
• It includes average test scores for fifth graders in each school district
• The description of the data set is in the Word document titled “California Test Scores”
• Let’s look at the relationship between total enrollment and test scores
• Stata: scatter testscr enrl_tot
• Take a look at the data description and think about what could be related to the test scores.
Is it a positive or a negative relationship?
CORRELATION COEFFICIENT
• We don’t have to guess whether there is a relationship between two variables and
whether the relationship is positive or negative
• We will use something called “correlation coefficient” (usually denoted r) to answer that
• If r is between 0 and 1 the relationship is positive
• If r is between -1 and 0 the relationship is negative
• The closer the absolute value of r is to 1, the stronger the relationship
• The closer the absolute value of r is to 0, the weaker the relationship
• In Stata to find the correlation coefficient type in: correlate variable1 variable2
• In Excel to find the correlation coefficient type in: =CORREL(range of variable 1, range of variable 2)
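A sketch of the scatterplot and the correlation coefficient together, using five made-up pairs of grades. The variable names exam1 and classgrade follow the earlier slide; the values are illustrative, not the grades3 data.

```stata
* Made-up exam and final grades for illustration
clear
input exam1 classgrade
70 74
85 88
60 68
95 93
78 80
end

* Scatterplot: first variable on the y-axis, second on the x-axis
scatter exam1 classgrade

* Correlation coefficient; with two variables it is also stored in r(rho)
correlate exam1 classgrade
display r(rho)
```

The order of the two variables does not matter for correlate, because the correlation of x with y equals the correlation of y with x.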
DO IT YOURSELF TIME
• Try to create a scatterplot for the grades3 dataset in Excel
• Hint: a scatterplot is just a type of chart; your steps would be similar to creating a bar
chart in Excel
• Try to find the correlation coefficient for the grades3 dataset in Excel (on slido)
• Hint: the correlation coefficient is a type of function. This should be similar to finding an
average or a standard deviation in Excel.
LINE OF BEST FIT
• Line of best fit is the line that best represents all of the data points on a scatterplot
• Like any straight line it has an intercept and a slope
• The equation of a straight line is: y=mx+b
• Where b – intercept with the y-axis, m – the slope of the line
• If the line of best fit for a scatterplot is y=-3x+2, this means that 2 is the intercept with
the Y-axis and -3 is the slope of the line.
• When x = 0, y = 2
• Since the slope is negative the relationship between the two variables is negative.
EXAMPLE: LINE OF BEST FIT FOR CLASS GRADES
• Once you have created a scatterplot in Excel you can add the line of best fit to it
• Click on the “+” in the upper-right corner, tick “trendline”
• You can see that the line of best fit is upward-sloping => the relationship between the
two variables is positive
• To find out the equation of the line, right-click on it -> format trendline -> tick “display equation on chart”
• What are the intercept and the slope of the line? What conclusion can we draw from
knowing those numbers?
• Do they make sense?
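The slides do this step in Excel. One way to get the same line in Stata is sketched below with made-up grades. It assumes class grade is on the vertical axis and exam grade on the horizontal axis; swapping them gives a different line.

```stata
* Made-up exam and final grades for illustration
clear
input exam1 classgrade
70 74
85 88
60 68
95 93
78 80
end

* Least-squares line: classgrade = intercept + slope * exam1
regress classgrade exam1

* The intercept (b in y = mx + b) and the slope (m)
display _b[_cons]
display _b[exam1]

* Scatterplot with the fitted line drawn on top
twoway (scatter classgrade exam1) (lfit classgrade exam1)
```

Reading the result works the same way as with the Excel trendline equation: the slope tells you how many points the class grade changes, on average, for each extra point on the first exam.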
CONCLUSION
• We have reviewed descriptive statistics. What are some of the descriptive stats we have
discussed?
• How can we find them in Excel?
• How can we find them in Stata?
• What types of charts have you learned to create? How can you do this in Stata/Excel?
• If the correlation coefficient is -1 what does it mean? 0? 0.2?

More Related Content

What's hot (19)

Introduction to spss 1
Introduction to spss 1Introduction to spss 1
Introduction to spss 1
 
SPS intro
SPS introSPS intro
SPS intro
 
SPSS an intro...
SPSS an intro...SPSS an intro...
SPSS an intro...
 
The Excel ToolKit
The Excel ToolKitThe Excel ToolKit
The Excel ToolKit
 
Elementary Data Analysis with MS Excel_Day-4
Elementary Data Analysis with MS Excel_Day-4Elementary Data Analysis with MS Excel_Day-4
Elementary Data Analysis with MS Excel_Day-4
 
Statistical Package for Social Science (SPSS)
Statistical Package for Social Science (SPSS)Statistical Package for Social Science (SPSS)
Statistical Package for Social Science (SPSS)
 
Datapreprocessing
DatapreprocessingDatapreprocessing
Datapreprocessing
 
New slides access
New slides accessNew slides access
New slides access
 
Spss
SpssSpss
Spss
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload
 
Advanced Filter Concepts in MS-Excel
Advanced Filter Concepts in MS-ExcelAdvanced Filter Concepts in MS-Excel
Advanced Filter Concepts in MS-Excel
 
Various statistical software's in data analysis.
Various statistical software's in data analysis.Various statistical software's in data analysis.
Various statistical software's in data analysis.
 
(Manual spss)
(Manual spss)(Manual spss)
(Manual spss)
 
An Introduction to SPSS
An Introduction to SPSSAn Introduction to SPSS
An Introduction to SPSS
 
Arrays
ArraysArrays
Arrays
 
Introductionto excel2007
Introductionto excel2007Introductionto excel2007
Introductionto excel2007
 
Performing Data Science with HBase
Performing Data Science with HBasePerforming Data Science with HBase
Performing Data Science with HBase
 
sorting and filtering data in excel
sorting and filtering data in excelsorting and filtering data in excel
sorting and filtering data in excel
 
Microsoft Access
Microsoft AccessMicrosoft Access
Microsoft Access
 

Similar to Introduction

L9 using datawarrior for scientific data visualization
L9 using datawarrior for scientific data visualizationL9 using datawarrior for scientific data visualization
L9 using datawarrior for scientific data visualizationSeppo Karrila
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitizationVenkata Reddy Konasani
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spssSubodh Khanal
 
Beginners SPSS.ppt
Beginners SPSS.pptBeginners SPSS.ppt
Beginners SPSS.pptsayahuwaina
 
intro-to-spreadsheets-with-excel (1).pptx
intro-to-spreadsheets-with-excel (1).pptxintro-to-spreadsheets-with-excel (1).pptx
intro-to-spreadsheets-with-excel (1).pptxANAMARIATAMAYODUQUE1
 
Lab 3 Set Working Directory, Scatterplots and Introduction to.docx
Lab 3 Set Working Directory, Scatterplots and Introduction to.docxLab 3 Set Working Directory, Scatterplots and Introduction to.docx
Lab 3 Set Working Directory, Scatterplots and Introduction to.docxDIPESH30
 
ds 1 Introduction to Data Structures.ppt
ds 1 Introduction to Data Structures.pptds 1 Introduction to Data Structures.ppt
ds 1 Introduction to Data Structures.pptAlliVinay1
 
1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.ppt1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.pptAshok280385
 
1a s4 i creating runcharts final
1a s4 i creating runcharts final1a s4 i creating runcharts final
1a s4 i creating runcharts finalABCiABUHB
 
OLD SEVEN TOOLS OF QUALTIY MANAGEMENT
OLD SEVEN TOOLS OF QUALTIY MANAGEMENTOLD SEVEN TOOLS OF QUALTIY MANAGEMENT
OLD SEVEN TOOLS OF QUALTIY MANAGEMENTANNA UNIVERSITY
 
Percentiles and Deciles
Percentiles and DecilesPercentiles and Deciles
Percentiles and DecilesMary Espinar
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data scienceTanujaSomvanshi1
 
1. chapter i(pasw)
1. chapter i(pasw)1. chapter i(pasw)
1. chapter i(pasw)Chhom Karath
 
4 Statistical Software.pptx
4 Statistical Software.pptx4 Statistical Software.pptx
4 Statistical Software.pptxkaleabtegegne
 

Similar to Introduction (20)

L9 using datawarrior for scientific data visualization
L9 using datawarrior for scientific data visualizationL9 using datawarrior for scientific data visualization
L9 using datawarrior for scientific data visualization
 
presentation Updated.pdf
presentation Updated.pdfpresentation Updated.pdf
presentation Updated.pdf
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitization
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
 
Beginners SPSS.ppt
Beginners SPSS.pptBeginners SPSS.ppt
Beginners SPSS.ppt
 
IS100 Week 8
IS100 Week 8IS100 Week 8
IS100 Week 8
 
intro-to-spreadsheets-with-excel (1).pptx
intro-to-spreadsheets-with-excel (1).pptxintro-to-spreadsheets-with-excel (1).pptx
intro-to-spreadsheets-with-excel (1).pptx
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Lab 3 Set Working Directory, Scatterplots and Introduction to.docx
Lab 3 Set Working Directory, Scatterplots and Introduction to.docxLab 3 Set Working Directory, Scatterplots and Introduction to.docx
Lab 3 Set Working Directory, Scatterplots and Introduction to.docx
 
ds 1 Introduction to Data Structures.ppt
ds 1 Introduction to Data Structures.pptds 1 Introduction to Data Structures.ppt
ds 1 Introduction to Data Structures.ppt
 
1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.ppt1.1 introduction to Data Structures.ppt
1.1 introduction to Data Structures.ppt
 
1a s4 i creating runcharts final
1a s4 i creating runcharts final1a s4 i creating runcharts final
1a s4 i creating runcharts final
 
OLD SEVEN TOOLS OF QUALTIY MANAGEMENT
OLD SEVEN TOOLS OF QUALTIY MANAGEMENTOLD SEVEN TOOLS OF QUALTIY MANAGEMENT
OLD SEVEN TOOLS OF QUALTIY MANAGEMENT
 
Tqm old tools
Tqm old toolsTqm old tools
Tqm old tools
 
Lec 3.pptx
Lec 3.pptxLec 3.pptx
Lec 3.pptx
 
Percentiles and Deciles
Percentiles and DecilesPercentiles and Deciles
Percentiles and Deciles
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Spss basics tutorial
Spss basics tutorialSpss basics tutorial
Spss basics tutorial
 
1. chapter i(pasw)
1. chapter i(pasw)1. chapter i(pasw)
1. chapter i(pasw)
 
4 Statistical Software.pptx
4 Statistical Software.pptx4 Statistical Software.pptx
4 Statistical Software.pptx
 

More from Ryan Herzog

Chapter 14 - Great Recession
Chapter 14 - Great RecessionChapter 14 - Great Recession
Chapter 14 - Great RecessionRyan Herzog
 
Chapter 13 - AD/AS
Chapter 13 - AD/ASChapter 13 - AD/AS
Chapter 13 - AD/ASRyan Herzog
 
Chapter 12 - Monetary Policy
Chapter 12 - Monetary PolicyChapter 12 - Monetary Policy
Chapter 12 - Monetary PolicyRyan Herzog
 
Chapter 11 - IS Curve
Chapter 11 - IS CurveChapter 11 - IS Curve
Chapter 11 - IS CurveRyan Herzog
 
Chapter 10 - Great Recession
Chapter 10 - Great RecessionChapter 10 - Great Recession
Chapter 10 - Great RecessionRyan Herzog
 
Chapter 9 - Short Run
Chapter 9 - Short RunChapter 9 - Short Run
Chapter 9 - Short RunRyan Herzog
 
Chapter 8 - Inflation
Chapter 8 - InflationChapter 8 - Inflation
Chapter 8 - InflationRyan Herzog
 
Chapter 7 - Labor Market
Chapter 7 - Labor MarketChapter 7 - Labor Market
Chapter 7 - Labor MarketRyan Herzog
 
Chapter 6 - Romer Model
Chapter 6 - Romer Model Chapter 6 - Romer Model
Chapter 6 - Romer Model Ryan Herzog
 
Chapter 5 - Solow Model for Growth
Chapter 5 - Solow Model for GrowthChapter 5 - Solow Model for Growth
Chapter 5 - Solow Model for GrowthRyan Herzog
 
Chapter 4 - Model of Production
Chapter 4 - Model of ProductionChapter 4 - Model of Production
Chapter 4 - Model of ProductionRyan Herzog
 
Chapter 3 - Long-Run Economic Growth
Chapter 3 - Long-Run Economic GrowthChapter 3 - Long-Run Economic Growth
Chapter 3 - Long-Run Economic GrowthRyan Herzog
 
Chapter 2 - Measuring the Macroeconomy
Chapter 2 - Measuring the MacroeconomyChapter 2 - Measuring the Macroeconomy
Chapter 2 - Measuring the MacroeconomyRyan Herzog
 
Topic 7 (questions)
Topic 7 (questions)Topic 7 (questions)
Topic 7 (questions)Ryan Herzog
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)Ryan Herzog
 
Topic 6 (model specification)
Topic 6 (model specification)Topic 6 (model specification)
Topic 6 (model specification)Ryan Herzog
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)Ryan Herzog
 
Topic 4 (binary)
Topic 4 (binary)Topic 4 (binary)
Topic 4 (binary)Ryan Herzog
 

More from Ryan Herzog (20)

Chapter 14 - Great Recession
Chapter 14 - Great RecessionChapter 14 - Great Recession
Chapter 14 - Great Recession
 
Chapter 13 - AD/AS
Chapter 13 - AD/ASChapter 13 - AD/AS
Chapter 13 - AD/AS
 
Chapter 12 - Monetary Policy
Chapter 12 - Monetary PolicyChapter 12 - Monetary Policy
Chapter 12 - Monetary Policy
 
Chapter 11 - IS Curve
Chapter 11 - IS CurveChapter 11 - IS Curve
Chapter 11 - IS Curve
 
Chapter 10 - Great Recession
Chapter 10 - Great RecessionChapter 10 - Great Recession
Chapter 10 - Great Recession
 
Chapter 9 - Short Run
Chapter 9 - Short RunChapter 9 - Short Run
Chapter 9 - Short Run
 
Chapter 8 - Inflation
Chapter 8 - InflationChapter 8 - Inflation
Chapter 8 - Inflation
 
Chapter 7 - Labor Market
Chapter 7 - Labor MarketChapter 7 - Labor Market
Chapter 7 - Labor Market
 
Chapter 6 - Romer Model
Chapter 6 - Romer Model Chapter 6 - Romer Model
Chapter 6 - Romer Model
 
Chapter 5 - Solow Model for Growth
Chapter 5 - Solow Model for GrowthChapter 5 - Solow Model for Growth
Chapter 5 - Solow Model for Growth
 
Chapter 4 - Model of Production
Chapter 4 - Model of ProductionChapter 4 - Model of Production
Chapter 4 - Model of Production
 
Chapter 3 - Long-Run Economic Growth
Chapter 3 - Long-Run Economic GrowthChapter 3 - Long-Run Economic Growth
Chapter 3 - Long-Run Economic Growth
 
Chapter 2 - Measuring the Macroeconomy
Chapter 2 - Measuring the MacroeconomyChapter 2 - Measuring the Macroeconomy
Chapter 2 - Measuring the Macroeconomy
 
Topic 7 (data)
Topic 7 (data)Topic 7 (data)
Topic 7 (data)
 
Inequality
InequalityInequality
Inequality
 
Topic 7 (questions)
Topic 7 (questions)Topic 7 (questions)
Topic 7 (questions)
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)
 
Topic 6 (model specification)
Topic 6 (model specification)Topic 6 (model specification)
Topic 6 (model specification)
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)
 
Topic 4 (binary)
Topic 4 (binary)Topic 4 (binary)
Topic 4 (binary)
 

Recently uploaded

Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 

Recently uploaded (20)

Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...

Introduction

  • 1. INTRODUCTION Review of Statistics. Stata and Excel introduction
  • 2. HOMEWORK FOR FRIDAY • Using files grades, peanuts, and unrate… • Find summary statistics for each variable • Create a histogram for grades • Create a line graph for unrate • Save everything in a do-file.
  • 3. DESCRIPTIVE STATISTICS • Mean – arithmetic mean, arithmetic average. • Sum of the data values divided by the number of observations • Mode • Median • Minimum, maximum • Variance • Standard deviation
  • 4. MEAN • Mean - arithmetic mean, arithmetic average. Sum of the data values divided by the number of observations • Example: Calculate the mean for the hypothetical data for shipments of peanuts from a U.S. exporter to five Canadian cities • Montreal – 640,000 pounds • Ottawa – 15,000 pounds • Toronto – 285,000 pounds • Vancouver – 228,000 pounds • Winnipeg – 45,000 pounds • Notes: Σ means sum, unit of observation here is a Canadian city
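The calculation for the five shipments can be written out as follows. This is a worked sketch; the symbols x_i (one city's shipment) and n (the number of cities) are notation assumed here, not taken from the slide.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{640{,}000 + 15{,}000 + 285{,}000 + 228{,}000 + 45{,}000}{5}
        = \frac{1{,}213{,}000}{5}
        = 242{,}600 \text{ pounds}
```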
  • 5. MEAN CONT’D • In Excel: • Click on fx and find the function name, or type in • =average(range of data) • In Stata: • Import data by clicking on “file” (upper left corner) -> “import” -> pick the format of the file -> find it by clicking “browse” -> tick the box “import first row as variable names” • mean peanuts (or summarize peanuts)
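A minimal sketch of the Stata side, assuming the variable is called peanuts. Instead of importing a file, it types in the five shipments from the previous slide with input, so it can be run as a do-file without any data file; the variable name city is an assumption.

```stata
* Hypothetical stand-in for the peanuts file: the five shipments from the slide
clear
input str10 city long peanuts
"Montreal"  640000
"Ottawa"     15000
"Toronto"   285000
"Vancouver" 228000
"Winnipeg"   45000
end

* summarize reports the number of observations, mean, std. dev., min and max
summarize peanuts

* The mean is stored in r(mean) after summarize
display r(mean)

* mean reports the mean together with its standard error and confidence interval
mean peanuts
```

The value shown by display r(mean) should match the hand calculation for the five cities, 242,600 pounds.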
  • 6. STATA • Stata is a powerful tool for researchers and applied economists. • Infinitely extensible, gives users the same development tools used by the company’s professional programmers • Google is your best friend • Stata has a few windows: • bottom middle is the command window – this is where you type in the commands; • top middle – the commands that you submitted appear and so does the output; • left – all of the commands you have run; • right – all of the variables you have in your dataset • To view your dataset you can click on “data editor” or “data browser”
  • 7. STATA • Right now there is no data in Stata. We first have to upload the data to it. The way you upload data into Stata (or any other type of statistical software) depends on the type of data file you have • Text data, such as comma-delimited files (.csv) • Excel files (.xlsx) • Stata files (.dta) • Please find the dataset “grades” on blackboard. What type of file is it? • Stata: file-> import->type of file. Please tick “import first row as variable names” • If you want to upload a different dataset to work with it, type in “clear” in the command window
  • 8. STATA LOGS AND DO-FILES • log – records your work in Stata, start it before you do anything else! • .do file – lets you record a series of commands • Try to make your own log and .do file • Click on “log” -> “begin” -> give it a name -> save in a location convenient for you (this starts a log; when you exit Stata the log will automatically save). • Click on “do-file editor” and start typing commands. You would save it like any other document (“save” -> give it a name, save in a convenient location). • To run the commands in the do-file, simply click “run” at the top of the do-file
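The same steps can also be typed as commands rather than clicked. A minimal sketch; the file names intro_session.log and intro_session.do are placeholders, not names from the slides.

```stata
* Close any log that is already open (capture ignores the error if none is)
capture log close

* Start a plain-text log; replace overwrites an older log with the same name
log using "intro_session.log", text replace

* Anything run from here on is recorded in the log
display "logging is on"

* Close the log when you are done
log close

* To run a saved do-file from the command window you would type:
*   do "intro_session.do"
```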
  • 9. EXAMPLE • Calculate the mean of the student grades in Excel and in Stata • You will find the data set “grades” on blackboard • Make sure your work in Stata is recorded in a log • What is the unit of observation in the dataset (i.e. whose grades are these)? • How many observations are there? • What is the average grade in that class?
  • 10. SMALLEST AND LARGEST OBSERVATION • You might be wondering if anyone got 100 in the class, or what the highest and lowest grades in the class were. • We can find out by looking at the data, by sorting the data, and by using minimum and maximum functions in Excel and Stata • To sort data: • In Excel: highlight the data you want to sort, “data” -> “sort” • In Stata: sort variablename (ascending order) • gsort +variablename (ascending) or gsort -variablename (descending) • Once you have sorted the data you can see what the first and last observations are • Functions in Excel: =min(data), =max(data) • Functions in Stata: summarize variablename (the output includes Min and Max) • Minimum and maximum let you know if you have outliers in your data or if there are certain problems with your data
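A short sketch of sorting and finding the extremes in Stata. The six grades are made-up values, and classgrade is assumed as the variable name (it is the name used later in the slides).

```stata
* Hypothetical grades, typed in so the example runs without a data file
clear
input classgrade
72
95
88
61
100
79
end

* Ascending sort: the first observation is the lowest grade, the last is the highest
sort classgrade
display classgrade[1]
display classgrade[_N]

* Descending sort: now the first observation is the highest grade
gsort -classgrade
display classgrade[1]

* summarize shows Min and Max and also stores them in r(min) and r(max)
summarize classgrade
display r(min)
display r(max)
```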
  • 11. APPLICATION 1. USE EXCEL • Use UNRATE – the unemployment rate dataset – to find out the… • Average unemployment rate between 1948 and 2020 • What were the maximum and minimum unemployment rates during that period? • Any thoughts on your findings? • TIP… Stata can pull data directly from FRED. There are two ways of accessing the FRED database… • freduse command (user-written, might need to be installed)…. freduse UNRATE, clear • File >> Import >> Federal Reserve Economic Data (FRED)
  • 12. APPLICATION 2. USE GRADES2 TO ANSWER THE FOLLOWING • In Stata: • What is the minimum grade in that class? • What is the maximum grade in that class? • What is the average grade in that class? • How do the minimums, maximums, and averages compare across the two classes?
  • 13. STANDARD DEVIATION • I want to calculate how dispersed the students’ grades are compared to the average grade in the class • Standard deviation (square root of variance) – spread of the observations around the mean value • Why is it useful? We can find out how much the data fluctuates around the mean in a dataset and compare datasets. It also helps us spot outliers in a dataset so we can deal with them. • Examples: income in different cities, unemployment in different regions, return on different companies’ stock
  • 14. STANDARD DEVIATION CONT’D • In Excel the function for standard deviation is: =stdev(data) • In Stata the standard deviation is part of the summarize command output (the Std. dev. column)
  • 15. STANDARD DEVIATION APPLICATIONS • Find the standard deviation for both of the classes and compare them. What conclusion can you draw? • What was the standard deviation of the unemployment rate before and after outliers were corrected? What conclusion can you draw?
  • 16. VARIANCE • Closely tied to standard deviation • Variance = squared standard deviation • Measure of how far the observations in a dataset are from the mean • To find variance in Excel: =var(datarange) • To find variance in Stata: square the standard deviation by hand or use display r(Var) after the summarize command • Stata retains a number of calculations (behind the scenes). To see them, type: • return list • There are other tools for calculating summary statistics… • help tabstat • tabstat UNRATE, s(var)
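For reference, the link between the two measures can be written out as below. This sketch assumes the sample versions (dividing by n - 1), which is what Stata's summarize and Excel's =var() and =stdev() report; the symbols are notation assumed here, not taken from the slides.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```

Here s^2 is the variance, s is the standard deviation, and \bar{x} is the mean of the n observations.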
  • 17. USING STATA AND EXCEL AS A CALCULATOR • To find variance you can always square the standard deviation (after running summarize) • di r(Var) • di r(sd)^2 • To use Excel as a calculator, type “=” into a cell and then what you are trying to calculate • In Stata, type the word “display” and then what you are trying to calculate • For example, if the standard deviation is 1.6, then to calculate the variance in • Excel: =1.6^2 (or =1.6*1.6) • Stata: display 1.6^2 (or display 1.6*1.6)
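The commands from the last two slides can be put together as follows. A sketch with made-up grades; classgrade is an assumed variable name.

```stata
* Hypothetical grades, typed in so the example runs without a data file
clear
input classgrade
72
95
88
61
100
79
end

* summarize stores its results behind the scenes
summarize classgrade

* return list shows everything that was stored, including r(sd) and r(Var)
return list

* Two ways to get the variance: the stored value, or the squared std. dev.
display r(Var)
display r(sd)^2

* display also works as a plain calculator (1.6 squared is 2.56)
display 1.6^2

* tabstat can report several statistics in one table
tabstat classgrade, statistics(mean sd variance)
```

The two variance lines should print the same number, since the variance is the squared standard deviation.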
  • 18. CREATING A NEW VARIABLE • You can create new variables in Excel and Stata. This skill will be useful later on in the class • For now let’s imagine the professor gives everyone in the first class a 1% curve, and calculate their new grades • In Excel, in a new cell type in: =“cell with data”+1, then hover over the bottom right corner of the new cell and double click; the column should populate with calculated values. What is the class average now that everyone has received extra credit? • Let’s import the grades into Stata and do the same. To create a new variable: • generate var=classgrade+1
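A small sketch of the curve in Stata, using made-up grades. The new variable is called curved here purely for readability; that name is an assumption, and the slide's var works the same way.

```stata
* Hypothetical grades, typed in so the example runs without a data file
clear
input classgrade
72
95
88
61
100
79
end

* Add one point to every grade and store the result in a new variable
generate curved = classgrade + 1

* Compare the two columns: the curved mean is exactly 1 higher,
* while the standard deviation does not change
summarize classgrade curved
```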
  • 19. BAR CHARTS • You would like to find out how many people in the class received an A, B, C, and D. • The best way to look at that is to create a distribution chart (histogram) that will show how many received each grade • In Excel highlight the data->insert->histogram->right-click on the x-axis label to change number of bins and their range • In Stata click on graphics->histogram. There are many options, let’s go through some of them • Variable – classgrade • Width of bins – 10 (this is how “wide” each grade category is) • Lower limit of first bin – 60 (assuming no one failed the class) • Y-axis – frequency
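The menu choices listed for Stata correspond to options of the histogram command. A sketch with made-up grades, assuming the variable is called classgrade and that no grade is below 60 (start() must not be larger than the smallest value).

```stata
* Hypothetical grades, all 60 or above
clear
input classgrade
72
95
88
61
100
79
end

* width(10): each bin covers 10 points
* start(60): the first bin begins at 60
* frequency: the y-axis shows counts instead of densities
histogram classgrade, width(10) start(60) frequency
```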
  • 20. BAR CHARTS CONT’D • We can create bar charts to compare the same variable over time (e.g. unemployment) or across different units (e.g. income across different cities) • Let’s create an over-time bar chart using the unemployment rate data in Excel • Highlight the unemployment rate column by clicking on the column name twice • Click “insert” (top right) -> pick bar chart (2D column) • Right-click on the x-axis labels -> select data -> edit -> select range (years column) by highlighting it • To add labels to the axes, click on the chart -> “+” symbol at the right corner -> tick axis titles -> type the titles into the boxes
  • 21. LINE CHARTS • Showing the progression of a variable over time is easier with a line chart • Load the unemployment rate data into Stata • This is time series data, so we have to treat it a bit differently: • generate daten = tm(1948m1) + _n - 1 • format daten %tm • tsset daten, monthly • sort month • Click “graphics” on the top left -> twoway graph -> create -> line plot type -> Y-variable is the unemployment rate, X-variable is the date variable -> submit • To save your graph: file -> save as -> pick the type that will make it easy for you to open the graph • https://fred.stlouisfed.org/series/UNRATE/ compare your graphs to FRED data
  • 22. SIDE NOTE • How does Stata work with time series data? • It uses a numerical system starting at 1/1/1960 (this value will always be 0) • _n refers to the number of the current observation • _N refers to the total number of observations • Why do I need to subtract 1 when finding the correct month? • _n starts at 1, so subtracting 1 makes the first observation line up with the starting month, tm(1948m1), rather than the month after it
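A sketch of how the monthly date variable is built. The six unemployment values are invented for illustration, and unrate is an assumed variable name; daten follows the slides.

```stata
* Monthly dates are counted from January 1960
* This displays 0
display tm(1960m1)
* January 1948 is 144 months earlier, so this displays -144
display tm(1948m1)

* Six made-up monthly observations standing in for the real series
clear
set obs 6
generate unrate = 3.4 + 0.1*_n

* _n is 1 for the first row, so subtracting 1 makes row 1 equal 1948m1
generate daten = tm(1948m1) + _n - 1
format daten %tm

* Declare the data to be a monthly time series
tsset daten, monthly

* Check the dates, then draw the line chart
list daten unrate
tsline unrate
```

Without the - 1, the first row would be labelled 1948m2 and every date would be one month late.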
  • 23. GDP OVER TIME IN US, MEXICO, AND CANADA • Please google “GDP per capita by country world bank” -> pick the one in current US$ (why do we have to use GDP per capita in current dollars?) -> Download the csv file • Use ctrl-F to find GDP for the US, Mexico and Canada. Copy and paste each country’s GDP into a new document • Delete the third and fourth columns • Create a line chart. What conclusion can we draw about the relative economic growth of these countries?
  • 24. CORRELATION • Is it possible to improve your score during the semester, or is the grade on the first exam closely related to the grade at the end of the semester? • Use the grades3.xlsx data set to answer this question • Import the dataset into Stata. We are going to plot the observed points on a graph where the axes are: exam grade and class grade • To do so type in: scatter exam1 classgrade • We can tell that there is a positive relationship between the two variables • The graph that you created is called a scatterplot. By looking at scatterplots we can get a sense of whether there is a relationship between different variables in the data. We can also make an educated guess whether the relationship between the two variables is positive or negative • Can you think of two variables that might be positively or negatively related?
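A sketch of the scatterplot command with made-up data in place of grades3.xlsx. The variable names exam1 and classgrade come from the slide; the six pairs of values are invented.

```stata
* Hypothetical exam and final grades, typed in so the example runs on its own
clear
input exam1 classgrade
55 68
62 70
70 74
78 85
85 83
93 96
end

* scatter takes the y-axis variable first and the x-axis variable second,
* so this puts exam1 on the vertical axis, as in the slide
scatter exam1 classgrade

* Swapping the order puts the final grade on the vertical axis instead
scatter classgrade exam1
```

Note that scatter is written without parentheses: the variable names simply follow the command.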
  • 25. CALIFORNIA SCHOOLS DATASET • The data set includes data on California’s school districts in the 1998-1999 school year • It includes average test scores for 5th graders in each school district • The description of the data set is in the Word document titled “California Test Scores” • Let’s look at the relationship between total enrollment and test scores • Stata: scatter testscr enrl_tot • Take a look at the data description and think about what could be related to the test scores. Is it a positive or a negative relationship?
  • 26. CORRELATION COEFFICIENT • We don’t have to guess whether there is a relationship between two variables and whether the relationship is positive or negative • We will use something called the “correlation coefficient” (usually denoted r) to answer that • If r is between 0 and 1 the relationship is positive • If r is between -1 and 0 the relationship is negative • The closer the absolute value of r is to 1, the stronger the relationship • The closer the absolute value of r is to 0, the weaker the relationship • In Stata, to find the correlation coefficient type in: correlate variable1 variable2 • In Excel, to find the correlation coefficient type in: =correl(variable1, variable2)
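A sketch of the correlate command on made-up data. The variable names exam1 and classgrade are borrowed from the earlier slide; the values are invented.

```stata
* Hypothetical exam and final grades, typed in so the example runs on its own
clear
input exam1 classgrade
55 68
62 70
70 74
78 85
85 83
93 96
end

* Prints the correlation matrix for the two variables
correlate exam1 classgrade

* The coefficient for the first two variables is stored in r(rho)
display r(rho)
```

With these invented numbers the two grades rise together, so r should come out positive and fairly close to 1.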
  • 27. DO IT YOURSELF TIME • Try to create a scatterplot for the grades3 dataset in Excel • Hint: a scatterplot is just a type of chart, your steps would be similar to creating a bar chart in Excel • Try to find the correlation coefficient for the grades3 dataset in Excel (on slido) • Hint: the correlation coefficient is a type of function. This should be similar to finding an average or a standard deviation in Excel.
  • 28. LINE OF BEST FIT • The line of best fit is the line that best represents all of the data points on a scatterplot • Like any straight line it has an intercept and a slope • The equation of a straight line is: y=mx+b • Where b is the intercept with the y-axis and m is the slope of the line • If the line of best fit for a scatterplot is y=-3x+2, this means that 2 is the intercept with the y-axis and -3 is the slope of the line. • When x = 0, y = 2 • Since the slope is negative, the relationship between the two variables is negative.
  • 29. EXAMPLE: LINE OF BEST FIT FOR CLASS GRADES • Once you have created a scatterplot in Excel you can add the line of best fit to it • Click on the “+” in the upper-right corner, tick “trendline” • You can see that the line of best fit is upward-sloping => the relationship between the two variables is positive • To find out the equation of the line, right-click on it -> format -> display equation on chart • What are the intercept and the slope of the line? What conclusion can we draw from knowing those numbers? • Do they make sense?
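The slides add the line in Excel. A rough Stata counterpart is sketched below; this is a swapped-in approach, not a step from the slides. It uses made-up data with the variable names exam1 and classgrade, and it relies on regress to report the intercept and slope of the fitted line.

```stata
* Hypothetical exam and final grades, typed in so the example runs on its own
clear
input exam1 classgrade
55 68
62 70
70 74
78 85
85 83
93 96
end

* Scatterplot with the fitted straight line drawn on top
twoway (scatter classgrade exam1) (lfit classgrade exam1)

* regress estimates the same line: classgrade = b + m*exam1
regress classgrade exam1

* Intercept (b) and slope (m) of the fitted line
display _b[_cons]
display _b[exam1]
```

A positive value for _b[exam1] corresponds to the upward-sloping trendline described above.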
  • 30. CONCLUSION • We have reviewed descriptive statistics. What are some of the descriptive stats we have discussed? • How can we find them in Excel? • How can we find them in Stata? • What types of charts have you learned to create? How can you do this in Stata/Excel? • If the correlation coefficient is -1, what does it mean? 0? 0.2?