This document provides an introduction to descriptive statistics and how to calculate them in Excel and Stata. It covers measures such as the mean, median, mode, variance, standard deviation, and minimum and maximum values, demonstrated on sample student grade and unemployment rate datasets. It also shows how to create histograms, bar charts, and line charts in both programs, and discusses correlation and how to calculate the correlation coefficient to understand relationships between variables.
Microsoft Excel is a spreadsheet program used for data analysis, building models, calculations, and graphical presentations. It has a title bar, menu bar, toolbars, active cell indicator, and scroll bars. The worksheet contains rows and columns that intersect to form cells. Excel has various functions organized into categories like financial, date/time, text, logical, and statistical functions. Common functions include SUM, IF, COUNT, and AVERAGE.
SPSS is a statistical software package used for analyzing data. It was developed in 1968 at Stanford University. SPSS stands for Statistical Package for the Social Sciences. The document discusses the types of variables in SPSS including qualitative (string) and quantitative (numeric) variables. It also covers defining variables such as variable name, type, width and labels to describe the values. Proper coding and labeling helps facilitate analysis and interpretation of results.
A brief introduction for beginners. Topics include: the background history of SPSS, some basic but effective data management techniques, frequency distributions, descriptive statistics, hypothesis-testing rules, and association/contingency table tests. All of these statistical topics are explained with easy hands-on examples using a basic dataset. The slides also provide a short but effective explanation of the p-value, which is very important for statistical decision making.
This one-day workshop on data analysis using SPSS has two parts. Part 1 covers entering data into SPSS, including preparing datasets, transforming data, and running descriptive statistics. Part 2 provides an overview of statistical analysis techniques and how to choose the appropriate technique for decision making, giving examples. The document introduces SPSS and its four windows: the data editor, output viewer, syntax editor, and script window. It describes how to define variables, enter and manage data files, sort cases, compute new variables, and perform basic analyses like frequencies, descriptives, and linear regression. Proper use of statistical techniques depends on the research question, variable types and definitions, and assumptions.
This document provides an overview of using the statistical software package SPSS. It discusses the four main windows in SPSS - the data editor, output viewer, syntax editor, and script window. It also covers the basics of managing data files, including opening SPSS, defining variables, and saving data. Finally, it demonstrates some common analyses in SPSS including frequencies, descriptives, and linear regression as well as how to interpret the outputs and plot regression lines. The overall purpose is to introduce the basics of using SPSS to perform statistical analysis and data management.
The document discusses several quality control tools including:
1) The seven old quality control tools which include cause and effect diagrams, Pareto analysis, scatter diagrams, decision matrices, control charts and brainstorming techniques.
2) Cause and effect diagrams (Ishikawa or fishbone diagrams) which identify potential causes for a problem or effect.
3) Check sheets which collect and analyze defect data through a structured form.
4) Histograms which show the distribution of data values to analyze process performance.
5) Pareto charts which arrange problems or causes by frequency to focus on the most important ones.
6) Scatter diagrams which look for relationships between variables by plotting paired numerical data.
At the end of this Lesson (Part 1) the students should be able to know the following
Introduction
Data Entry
Variable and Value Label
Entering Data
File management
Descriptive statistics
Editing and modifying the data
This document provides an overview of using the SPSS statistical package for data analysis. It discusses the four main windows in SPSS - the data editor, output viewer, syntax editor, and script window. It also covers the basics of managing data files, including opening SPSS, defining variables, and saving data. Finally, it introduces some basic analysis techniques in SPSS like frequencies, descriptives, and linear regression analysis.
This document provides an introduction to using SPSS (Statistical Package for the Social Sciences) for data analysis. It discusses the four main windows in SPSS - the data editor, output viewer, syntax editor, and script window. It also covers the basics of managing data files, including opening SPSS, defining variables, and sorting data. Several basic analysis techniques are introduced, such as frequencies, descriptives, and linear regression. Examples are provided for how to conduct these analyses and interpret the outputs.
Learn the most important tools of Excel that will enable you to become an Excel master. These skills are the building blocks of any advanced analysis and should be used every time you are in the program.
Elementary Data Analysis with MS Excel_Day-4 (Redwan Ferdous)
This event took place on 12th September 2020 and was arranged by EMK Center (Makerlab). The title was 'Elementary Data Analysis with MS Excel', where very basic data analysis with MS Excel was discussed.
Day 4 covered the MS Excel Data tab, the View and Review tabs, and the Developer tab of the horizontal top ribbon, as well as the Quick Analysis tools, What-If Analysis, data tables, the Scenario Manager, and Pareto charts.
Statistical Package for Social Science (SPSS) (sspink)
This presentation introduces SPSS: its basic features, how to input data manually, descriptive statistics, and how to perform t-tests, ANOVA, and chi-square tests.
This document discusses data preprocessing techniques. It explains that data is often incomplete, noisy, or inconsistent when collected from the real world. Common preprocessing steps include data cleaning to handle these issues, data integration and transformation to combine multiple data sources, and data reduction to reduce the volume of data for analysis while maintaining analytical results. Specific techniques covered include filling in missing values, identifying and smoothing outliers, resolving inconsistencies, schema integration, attribute construction, data cube aggregation, dimensionality reduction, and discretization.
This document provides an overview of creating and working with tables and queries in Microsoft Access. It discusses how to create a new Access database and open an existing one. It describes the key components of the Access interface and how to work with tables, including adding and modifying data, setting field properties, and establishing relationships between tables. The document also covers creating basic selection and summation queries using the Query Wizard and design view, and how to filter tables to view specific records.
This document provides an introduction to SPSS (Statistical Package for Social Sciences) software. It discusses opening and closing SPSS, the structure and windows of SPSS including the Data View and Variable View windows for entering data. It defines key concepts in SPSS like variables, different types of variables (nominal, ordinal, interval, ratio), and the process of defining variables in the Variable View window by specifying name, type, width, labels, values etc. before entering data. Examples are given around designing an experiment with independent and dependent variables and dealing with extraneous variables.
This document discusses biological databases and SQL. It provides an overview of primary and derived data in biological research, as well as different data levels. It then discusses direct querying of selected bioinformatics databases using SQL and provides examples of 3-tier database models. The document proceeds to discuss rationale for learning SQL to query biological databases and provides definitions and explanations of key SQL concepts like tables, records, queries, data types, keys, integrity rules and constraints.
This PowerPoint covers the Advanced Filter, using macros with the Advanced Filter, data validation, creating a data-validation drop-down list, handling external data, Goal Seek, and What-If Analysis.
Various statistical software in data analysis (SelvaMani69)
The document provides an overview of various statistical software used for data analysis. It discusses the history and emergence of statistical software, as well as common software packages for quantitative (e.g. SPSS, STATA, SAS) and qualitative (e.g. Atlas ti, HyperResearch) analysis. SPSS is described in more detail, including its point-and-click interface, ability to perform various analyses like regression and ANOVA, and examples of using it to code data, edit variable names, and create contingency tables. The document emphasizes that statistical software makes data analysis easier by automating calculations and reducing mathematical errors.
The document provides instructions for launching and using the statistical software SPSS. It discusses finding the SPSS icon on the computer and launching the program. Once SPSS is open, the user can start a new data file or open an existing one. Basic steps for using SPSS are outlined, including entering data, defining variables, testing for normality, statistical analysis, and interpreting results. Specific functions and menus in SPSS are demonstrated for descriptive statistics, normality testing, and t-tests.
The document discusses arrays and provides information about what arrays are, different types of arrays, initializing and accessing elements of arrays, and searching arrays. Some key points:
- An array is a group of consecutive memory locations with the same name and data type. It allows storing multiple values of the same type together.
- There are different types of arrays including one-dimensional, two-dimensional, and n-dimensional arrays.
- Elements of an array can be initialized when the array is declared and assigned values. Individual elements can also be accessed using their index.
- Searching an array involves finding a required value or element. Methods like sequential search and binary search can be used to search arrays; sequential search checks elements one at a time, while binary search requires a sorted array.
This document provides an introduction to Microsoft Excel 2007, outlining the tools, skills, and functions covered in the online class. The summary includes:
1. The class will cover the basics of Excel including entering and editing data, formatting cells, copying and pasting, and basic formulas. Students will learn the new features in Excel 2007 like the ribbon interface.
2. Excel is used to perform calculations and analyze data through tools for organizing, sorting, and presenting information in tables and charts. Examples of its uses include personal finance, timesheets, and statistics.
3. Students will practice skills like resizing columns, using autofill, and getting help within Excel to understand its core capabilities for working with data.
This document discusses performing data science on HBase using the WibiData platform. It introduces WibiData Language (WDL), which allows analyzing data stored in HBase columns in a concise and interactive way using Scala and Apache Crunch. The document demonstrates building a histogram of editor metrics by reading user data from an HBase table, filtering and binning average edit deltas, and visualizing the results. WDL aims to make HBase data exploration more accessible for data scientists compared to other frameworks like Hive and Pig.
This document provides instructions for sorting and filtering data in Microsoft Excel spreadsheets. It discusses how to sort data alphabetically or numerically in ascending or descending order. It also describes how to perform multi-level sorts and filter data using AutoFilter. Charts in Excel are introduced as a way to display numeric data graphically. Instructions are provided for inserting charts and modifying chart elements like titles, axes, and formatting.
Microsoft Access is a database management system that allows users to create and manage databases to store and organize information. It contains different objects like tables, queries, forms and reports. Tables store data in rows and columns and can be related to each other. Queries allow users to filter and sort data. Forms provide interfaces for data entry and views. Reports generate printable views of data.
L9 Using DataWarrior for scientific data visualization (Seppo Karrila)
A tutorial for beginning graduate students on data visualization, by hands-on training in using DataWarrior. These are only handout notes so the students can try things out on their own laptops, with the free software, instead of scribbling notes themselves. The instructor needs to demonstrate the options or functions listed in the handout notes.
This document provides an introduction to Excel, Word, and PowerPoint. It discusses the basics of spreadsheets in Excel including creating and formatting worksheets, calculations with formulas, and copying data to other programs. It also covers creating and formatting presentations in PowerPoint including adding slides, text, images, and charts. Finally, it discusses opening and viewing documents in Word and resources for learning more about Microsoft Office applications.
This document discusses preparing data for analysis. It covers the need for data exploration including validation, sanitization, and treatment of missing values and outliers. The main steps in statistical data analysis are also presented. Specific techniques discussed include calculating frequency counts and descriptive statistics to understand the distribution and characteristics of variables in a loan data set with 250,000 observations. SAS procedures like Proc Freq, Proc Univariate, and Proc Means are demonstrated for exploring the data.
SPSS is a widely used statistical software package for analyzing social science and medical data. It provides drop down menus and templates to make data analysis and presentation user-friendly compared to other statistical software. SPSS has tools for importing, cleaning, transforming, and analyzing data. Key functions include sorting cases, merging datasets, recoding variables, and checking for outliers and normal distribution of variables.
SPSS is software used for managing data and calculating statistics. It has three main windows - the Data Editor for viewing data, Output Viewer for viewing results, and Syntax Editor for programming commands. The menu interface contains options for files, editing, viewing data, transforming data, analyzing data, and help. Data can be entered manually or read in from Excel or text files. Common analyses that can be performed include independent and paired t-tests to compare group means.
This document provides an introduction to data analysis using Microsoft Excel. It discusses the importance of data analysis and defines key terms like data, information, and knowledge. It also covers the Excel environment and basic functions for entering, organizing, and analyzing data, including sorting, filtering, formatting, and using formulas with cell references and functions. The goal is to teach students how to summarize, describe, and draw conclusions from raw data by changing it into processed information using Excel's tools and functions.
This document provides an introduction to using spreadsheets in Excel for humanities researchers. It discusses how spreadsheets can be used to store, organize, and manipulate data. Key points covered include: storing data in cells organized into columns and rows, sorting and filtering data to find relationships, using formatting to convey information, and visualizing data through automatic calculations and graphs. The goal is to help researchers think critically about their data and how spreadsheets can help analyze and understand it.
This document provides an overview of statistics concepts including descriptive and inferential statistics. Descriptive statistics are used to summarize and describe data through measures of central tendency (mean, median, mode), dispersion (range, standard deviation), and frequency/percentage. Inferential statistics allow inferences to be made about a population based on a sample through hypothesis testing and other statistical techniques. The document discusses preparing data in Excel and using formulas and functions to calculate descriptive statistics. It also introduces the concepts of normal distribution, kurtosis, and skewness in describing data distributions.
Lab 3 Set Working Directory, Scatterplots and Introduction to.docx (DIPESH30)
Lab 3: Set Working Directory, Scatterplots and Introduction to
Linear Regression
Chao-yo Cheng
[email protected]
Zsuzsanna Magyar
[email protected]
January 16, 2016
1 Section objectives
In this section we will use HW2.dta. This dataset is a small set of variables from the larger
“Maddison dataset”.[1] By the end, you should be comfortable using commands to import
your .dta file, making (somewhat) fancy scatterplots, and running linear regressions.
2 Commands
In this lab, you should become familiar with the following commands.
• cd
• use
• regress
• twoway scatter
• lfit
• scheme()
3 Set working directory and quickly running a .do file
• Log in and open Stata.
• Log in to the class website. Save the Homework 2 data in “My Documents” (or any folder
that works for you).
• Open a .do file. First set the working directory on Stata. Type:
cd "insert path address"
To get the address of your path, right click on “My documents” and press “Copy address”
so you can copy this inside your “” after cd.
[1] See here for more information: http://www.ggdc.net/maddison/maddison-project/home.htm.
• To import the data type the command use. Type:
use HW2.dta
• Check whether or not the data has been imported properly.
• Now you know how to open your data using code. This means you can quickly run your
.do file on a clean dataset using the clear all command at the top of the .do file.
This will save you the trouble of opening a fresh dataset once your do file is finished.
• To sum up, at the top of your .do file, type
clear all
cd "path address"
use HW2.dta
Question 1. How many variables are in the dataset, and how many observations are there?
4 Scatterplots
• Everything after the “,” in a graphical command is an option. The variables being
graphed come before the comma.
• Sometimes it is nice to use a scheme for your scatterplot so it looks simpler. Here we
use the scheme (s1mono). Schemes determine the overall look of a graph.
• To draw scatterplots with observation labels and titles for the y and x axes, type:
twoway (scatter gdppc_2000 gdppc_1500, mlabel(country)), ///
scheme (s1mono) ///
ytitle("GDP per capita 2000") ///
xtitle("GDP per capita 1500") ///
title("Scatterplot of GDP per capita 1500 versus 2000")
• Add a line of best fit using the lfit command. Type:
twoway (scatter gdppc_2000 gdppc_1500, mlabel(country)) (lfit gdppc_2000 gdppc_1500, color(blue)), ///
scheme(s1mono) ///
ytitle("GDP per capita 2000") ///
xtitle("GDP per capita 1500") ///
title("Scatterplot of GDP per capita 1500 versus 2000")
Question 2. What relationship does the slope of the fitted line indicate?
5 Linear regression
• The dependent variable (or outcome variable) is what we are trying to explain. It is
also called the “outcome” or Y.
• The explanatory (or independent) variables are what we use to do the explaining. These
variables are also called predictors, as we think they predict the dependent variable.
• The command for ...
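• The excerpt breaks off here, but the command list in Section 2 includes regress; a minimal sketch consistent with the variables used above (treating gdppc_1500 as the explanatory variable) would be:
regress gdppc_2000 gdppc_1500   // OLS fit of Y on X; the slope matches the lfit line above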
ds 1 Introduction to Data Structures.ppt (AlliVinay1)
This document provides an introduction and overview of data structures. It begins by defining key terms like data, information, and entities. It then discusses how data structures represent logical relationships between data elements and how they should be easy to process and represent relationships. The document classifies common data structures as linear, non-linear, homogeneous, non-homogeneous, dynamic, and static. It also provides examples of basic notations, algorithms, control structures, and applications of different data structure types like arrays, stacks, queues, linked lists, trees, and graphs. Finally, it discusses complexity analysis and the tradeoff between time and space.
1.1 introduction to Data Structures.ppt (Ashok280385)
Here are the algorithms for the given problems:
1. WAA to find largest of three numbers:
1. Start
2. Read three numbers a, b, c
3. If a > b and a > c then largest number is a
4. Else If b > a and b > c then largest number is b
5. Else largest number is c
6. Print largest number
7. Stop
2. WAA to find the sum of first 10 natural numbers using for loop:
1. Start
2. Declare variables i, sum
3. Initialize i=1, sum=0
4. For i=1 to 10
5. sum = sum + i
6. End for
7. Print sum
8. Stop
A runchart is a tool used to assess improvement progress by plotting data over time alongside changes. It has three main elements - the time period, measurement data, and median line. A runchart is created before and during changes to evaluate effectiveness in real-time. Microsoft Excel can be used to easily create runcharts by setting up a data table and inserting a graph. Key elements like titles, labels and the median line should then be added to complete the runchart.
The document discusses several quality control tools including:
1) The seven old quality control tools which include cause and effect diagrams, Pareto analysis, scatter diagrams, decision matrices, control charts and brainstorming techniques.
2) Cause and effect diagrams (Ishikawa or fishbone diagrams) which identify potential causes for a problem or effect.
3) Check sheets which collect and analyze defect data through a structured form.
4) Histograms which show the distribution of data values to analyze process performance.
5) Pareto charts which arrange problems by frequency to focus on the most important few issues.
6) Scatter diagrams which look for relationships between variables.
7) Stratification, which separates data into groups so that patterns within each group can be seen.
The document discusses different types of data sets that can be analyzed in data science. It describes record, graph and network, ordered, spatial, image and multimedia data. It then discusses key concepts related to data sets including data objects, attributes, attribute types (nominal, binary, numeric), and characteristics of data sets like dimensionality and sparsity. The document also lists some repositories for finding publicly available data sets and outlines strategies for getting data like being provided data, downloading data, or scraping data from the web. Finally, it introduces the topic of data visualization.
The document discusses various statistical concepts including deciles, percentiles, coefficient of variation, five number summary, boxplots, skewness, kurtosis, and statistical software such as Excel and SPSS. Deciles and percentiles refer to the cut points when data is ordered and divided into 10 or 100 equal parts. The coefficient of variation measures the variability in a dataset relative to the mean. The five number summary consists of the minimum, first quartile, median, third quartile, and maximum values of a dataset. Boxplots provide a graphical representation of the five number summary. Skewness and kurtosis measure the asymmetry and peakedness of a distribution. Excel and SPSS are commonly used statistical software packages that allow importing and analyzing data.
Data science combines fields like statistics, programming, and domain expertise to extract meaningful insights from data. It involves preparing, analyzing, and modeling data to discover useful information. Exploratory data analysis is the process of investigating data to understand its characteristics and check assumptions before modeling. There are four types of EDA: univariate non-graphical, univariate graphical, multivariate non-graphical, and multivariate graphical. Python and R are popular tools used for EDA due to their data analysis and visualization capabilities.
This document provides an overview of using SPSS (Statistical Package for the Social Sciences) software. It discusses installing sample data files, introduces the main interface windows including the data view, variable view and output view. It also covers how to define variable types, enter and modify data, perform basic analyses like frequencies and cross tabulations, and create charts from the output. The document is intended to help new users learn the basics of navigating the SPSS program and conducting initial analyses.
1. The document discusses PASW Statistics (SPSS), a software package used for statistical analysis. SPSS can be used to summarize, analyze, and visualize data to determine if hypotheses are supported.
2. Key aspects of SPSS covered include the data editor, which allows viewing and editing data in variable or data views, and transforming data using computations or recodes. Descriptive statistics, such as frequencies, means, and standard deviations, can be generated.
3. The median, mode, variance and other statistical techniques are defined to help understand how to analyze data in SPSS. Questions and examples are provided about loading data, recoding variables, and generating frequency tables and histograms.
This document discusses the Great Recession and policy responses to the financial crisis. It introduces financial considerations like a risk premium into the short-run model to understand the crisis. A rising risk premium interfered with monetary policy and shifted the AD curve down. This led to deflation concerns. Policy responses included unconventional monetary policy by expanding the Fed's balance sheet, fiscal stimulus, and the TARP program. Financial reform aimed to prevent future crises and address issues like moral hazard.
The document provides an overview of aggregate demand and aggregate supply (AD/AS) analysis. It discusses how monetary policy rules can be used to derive an aggregate demand curve and how the Phillips curve can be interpreted as an aggregate supply curve. The AD and AS curves can then be combined in a single framework to analyze macroeconomic effects. Specific events like inflation shocks, disinflation, and positive aggregate demand shocks are examined using the AD/AS model. Empirical evidence on inflation-output dynamics and modern monetary policy approaches are also reviewed.
This document provides an overview of monetary policy and the tools used by central banks. It discusses:
- The monetary policy (MP) curve which describes how central banks set the nominal interest rate in the short-run.
- The Phillips curve which shows how inflation responds to changes in economic activity.
- How the MP curve, Phillips curve, and aggregate demand curve (IS curve) make up the short-run macroeconomic model used to analyze the effects of monetary policy.
- How the Federal Reserve uses interest rate adjustments to influence output and inflation by shifting the MP curve and thereby affecting the real interest rate in the short-run.
This document provides an overview of the IS curve model. It begins with an introduction that establishes the relationship between interest rates and output in the short run. The IS curve captures this relationship graphically.
It then goes on to describe how to set up the basic IS curve model, which involves deriving the IS curve equation from the national income identity and consumption, investment, government spending, export, and import functions. It also discusses how to use the IS curve to show the effects of interest rate changes and aggregate demand shocks.
Finally, it discusses the microeconomic foundations of consumption behavior, investment decisions, and multiplier effects that provide the underlying basis for the IS curve relationship.
The document discusses the causes and impacts of the Great Recession that began in December 2007. It analyzes factors like the housing bubble and subprime lending crisis that contributed to the recession. It then examines the macroeconomic outcomes of the recession, including a decline in GDP of over 3%, a rise in unemployment to over 10%, and the loss of over 8 million jobs by February 2010. The recession had larger negative impacts on output and employment than typical past recessions. It also explores international impacts and compares the recession to previous financial crises.
This document provides an overview of key concepts relating to analyzing an economy in the short run, including:
- The difference between potential output in the long run and actual output which can fluctuate in the short run due to economic shocks.
- How the gap between actual and potential GDP indicates the state of the economy and whether it is in a recession.
- The relationship between output and inflation shown through the Phillips Curve, where higher output leads to increased inflation.
- Okun's Law which describes the inverse relationship between changes in output and the unemployment rate.
This document summarizes key concepts about inflation from Chapter 8 of an economics textbook. It defines inflation and discusses the quantity theory of money, explaining how the money supply, velocity of money, and nominal GDP are related. It also covers the relationship between real and nominal interest rates using the Fisher equation. The document discusses the costs of inflation and how fiscal policy and large government deficits can contribute to higher inflation, especially if a central bank lacks independence. It provides examples of hyperinflation and analyzes the causes of the Great Inflation of the 1970s.
This document summarizes key topics from a chapter on labor markets, including:
1) It describes the U.S. labor market trends of rising wages, employment-population ratios, and unemployment rates over time.
2) It explains the basic supply and demand model of the labor market and how taxes, regulations, and wage rigidities can create distortions.
3) It discusses different types of unemployment and the "bathtub model" for how employment and unemployment levels change over time.
4) It provides an overview of international labor market comparisons and differences between the U.S., Europe, and Japan.
5) It covers concepts of valuing human capital using present discounted values and explains the rising return to education.
The document summarizes key concepts from Chapter 6 of an economics textbook on long-run economic growth. It introduces the Romer model of economic growth, which distinguishes between ideas and objects. Ideas are nonrival and lead to increasing returns. The Romer model generates sustained long-run growth through expanding knowledge. The document also combines the Solow and Romer models to develop a full theory of long-run growth accounting for both physical capital and ideas. It shows how growth accounting can be used to analyze sources of economic growth.
The document provides an overview of the Solow growth model, which models economic growth through capital accumulation over time. It describes the key components of the model, including the production function, capital accumulation equation, investment determination, and steady state. The model predicts that economies will eventually stop growing as they approach the steady state, due to diminishing returns to capital. However, it does not fully explain long-run economic growth. The document also discusses how the model can be used to analyze the effects of changes to parameters like the investment and depreciation rates.
This document provides an overview of a macroeconomic model of production. It introduces a Cobb-Douglas production function to model how output is determined by capital and labor inputs. The model assumes constant returns to scale and is solved to find the equilibrium levels of output, capital, labor, wage rates and rental rates. The model predicts that countries with more capital per person will have higher output per person. However, the model initially overpredicts output for many countries. Accounting for differences in total factor productivity across countries significantly improves the model's predictive power.
This document provides an overview of long-run economic growth by covering several topics:
1) Growth has dramatically improved living standards recently but this is a new phenomenon historically. Per capita GDP differs greatly around the world.
2) Sustained growth first emerged in different places and times, leading to a "Great Divergence" where countries now differ in per capita GDP by a factor of 50.
3) Modern economic growth is defined as growth in per capita GDP, which can be used to predict future output levels based on growth rates.
This document summarizes key concepts from a chapter on measuring macroeconomic indicators like GDP. It discusses three methods for calculating GDP - production, expenditure, and income - and how they provide identical measures. It also explains how GDP is measured over time using nominal and real GDP, and different price indexes like Laspeyres, Paasche, and chain-weighting which is preferred. Real GDP growth rates and inflation rates are calculated using these concepts.
This document provides guidance and information for a student project involving data analysis. It discusses selecting a topic related to macroeconomics or international economics and finding related data sources. It also provides instructions on conducting a literature review and summarizing research papers. The document demonstrates how to merge and reshape different data files in Stata and describes other potential data sources for economic research.
This document discusses global income inequality and its causes. It notes that inequality is rising within many countries even as it falls globally. Technological change is a major driver of rising inequality, contributing 55% of the increase. Other factors include declining unions, falling minimum wages, and trade. Racial inequality is also examined, with policies around housing and lending continuing to disadvantage minorities. The document explores potential solutions like redistribution, universal basic income, education access, and increased competition.
This document provides guidance on how to read and understand papers that use regression analysis. It discusses key elements to look for, such as the research question, dependent and independent variables, data sources, results, and interpretation. Examples of published papers are referenced and questions are provided for each that focus on understanding the methodology, variables, data, results, and conclusions. Understanding these components is important for correctly interpreting the analysis and findings presented in regression-based research papers.
This document provides an overview of multiple regression analysis in Stata. It discusses including multiple independent variables in a regression to control for other factors, using commands like preserve and restore. It also covers creating tables of regression results in Stata using outreg2, issues like multicollinearity, and interpreting coefficients in regressions with dummy variables. Examples use housing data to examine the relationship between price, age, size and other characteristics.
The document discusses model specification for multiple regression analysis, focusing on measures of fit including R-squared and standard error of regression, and how to properly interpret these statistics. It emphasizes the importance of random sampling to establish causal relationships and warns of potential biases from non-random samples, such as when evaluating mutual fund performance or estimating political support based on telephone and automobile owners.
1. The document discusses multiple regression analysis in Stata. It covers including multiple independent variables, interpreting regression coefficients, detecting multicollinearity issues, and creating tables to present regression results.
2. Examples show regressing house price on characteristics like size, age, bedrooms and bathrooms. Interpreting coefficients depends on what other variables are held constant.
3. Detecting multicollinearity involves adding variables one by one; it leads to insignificant coefficients but errs on the conservative side rather than false relationships. Perfect multicollinearity occurs when regressors are perfectly correlated.
- Categorical variables are variables that are described by words rather than numbers, like "cat person" vs "dog person". Continuous variables take numerical values like income or test scores.
- To analyze the effect of a categorical variable in a regression, it must be converted into a binary variable using 0s and 1s. This allows a comparison of means between the included group and omitted group.
- Difference-in-differences estimation compares the change in outcomes over time between a treatment group and a control group, allowing researchers to account for other factors to isolate the causal effect of a treatment or policy. It requires the presence of a treatment and control group as well as pre- and post-treatment observations.
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
These slides are intended for master's students (MIBS & MIFB) at UUM and are also useful for readers interested in contemporary Islamic banking.
How to Add Chatter in the Odoo 17 ERP Module (Celine George)
In Odoo, the chatter is like a chat tool that helps you work together on records. You can leave notes and track things, making it easier to talk with your team and partners. Inside chatter, all communication history, activity, and changes will be displayed.
Physiology and chemistry of skin and pigmentation, hairs, scalp, lips and nails; cleansing creams, lotions, face powders, face packs, lipsticks, bath products, soaps, and baby products.
Preparation and standardization of the following: tonics, bleaches, dentifrices, mouthwashes and toothpastes, and cosmetics for nails.
This presentation covers the basics of PCOS, including its pathology and treatment, along with the Ayurvedic correlation of PCOS and the Ayurvedic line of treatment mentioned in the classics.
The simplified electron and muon model, Oscillating Spacetime: The Foundation... (RitikBhardwaj56)
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
Bangladesh Economic Review 2024 [Bangladesh Economic Review 2024 Bangla.pdf]: a complete Bangla e-book/PDF, with versions for computer, tablet, and smartphone, including a table of contents, a bookmark menu, and a hyperlink menu.
A very important book for all of us: it is a highly important subject for BCS, bank, and university admission exams and any competitive examination, and it also contains Bangladesh's recent data and statistics.
As a citizen, you should know this information.
It is useful for BCS and bank written exams, and it will also be of great use to secondary and higher-secondary students.
Executive Directors Chat: Leveraging AI for Diversity, Equity, and Inclusion (TechSoup)
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
2. HOMEWORK FOR FRIDAY
• Using files grades, peanuts, and unrate…
• Find summary statistics for each variable
• Create histogram chart for grades
• Create line graph for unrate
• Save everything in a do file.
3. DESCRIPTIVE STATISTICS
• Mean – arithmetic mean, arithmetic average.
• Sum of the data values divided by the number of observations
• Mode
• Median
• Minimum, maximum
• Variance
• Standard deviation
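• For reference (standard definitions, not spelled out on this slide): the sample variance is s² = Σ(xᵢ − x̄)² / (n − 1), and the standard deviation s is its square root.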
4. MEAN
• Mean - arithmetic mean, arithmetic average. Sum of the data values divided by the
number of observations
• Example: Calculate the mean for the hypothetical data for shipments of peanuts from a
U.S. exporter to five Canadian cities
• Montreal – 640,000 pounds
• Ottawa – 15,000 pounds
• Toronto – 285,000 pounds
• Vancouver – 228,000 pounds
• Winnipeg – 45,000 pounds
• Notes: Σ means sum, unit of observation here is a Canadian city
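• Worked example (using the shipment figures above): mean = (640,000 + 15,000 + 285,000 +
228,000 + 45,000) / 5 = 1,213,000 / 5 = 242,600 pounds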
5. MEAN CONT’D
• In Excel:
• Click on fx and find the function name, or type in
• =AVERAGE(range of data)
• In Stata:
• Import data by clicking on “File” (upper left corner) -> “Import” -> pick the format of the file ->
find it by clicking “Browse” -> tick the box “Import first row as variable names”
• Then type the command: mean peanuts (or summarize peanuts)
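• A minimal command-line sketch of the same steps (the file name “peanuts.xlsx” is an assumption):
import excel "peanuts.xlsx", firstrow clear   // load the data, first row as variable names
mean peanuts                                  // arithmetic mean with its standard error
summarize peanuts                             // mean plus min, max, and standard deviation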
6. STATA
• Stata is a powerful tool for researchers and applied economists.
• Infinitely extensible, gives users the same development tools used by the company’s professional
programmers
• Google is your best friend
• Stata has a few windows:
• bottom middle – the Command window, where you type in commands;
• top middle – the Results window, where submitted commands and their output appear;
• left – the History window, listing all of the commands you have run;
• right – the Variables window, listing all of the variables in your dataset
• To view your dataset you can click on “Data Editor” or “Data Browser”
7. STATA
• Right now there is no data in Stata. We first have to upload the data to it. The way you
upload data into Stata (or any other type of statistical software) depends on the type of
data file you have
• Text data, such as comma-delimited files (.csv)
• Excel files (.xlsx)
• Stata files (.dta)
• Please find the dataset “grades” on Blackboard. What type of file is it?
• Stata: “File” -> “Import” -> type of file. Please tick “Import first row as variable names”
• To work with a different dataset, first type “clear” in the Command window
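• The menu clicks correspond to commands like these (file names are hypothetical):
import delimited "grades.csv", clear         // comma-delimited text file
import excel "grades.xlsx", firstrow clear   // Excel file, first row as variable names
use "grades.dta", clear                      // native Stata file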
8. STATA LOGS AND DO-FILES
• log – records your work in Stata, start before you do anything else!
• .do file – lets you record a series of commands
• Try to make your own log and .do file
• Click on “Log” -> “Begin” -> give it a name -> save it in a location convenient for you (this starts a
log; when you exit Stata the log will save automatically).
• Click on “Do-file Editor” and start typing commands. Save it like any other document
(“Save” -> give it a name, save it in a convenient location).
• To run the commands in the do-file, simply click “Run” at the top of the do-file editor
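• A command-line equivalent (the log file name is hypothetical):
log using week1.log, replace    // start recording the session
* ...your commands...
log close                       // stop recording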
9. EXAMPLE
• Calculate the mean of the student grades in Excel and in Stata
• You will find the dataset “grades” on Blackboard
• Make sure your work in Stata is recorded in a log
• What is the unit of observation in the dataset (i.e. whose grades are these)?
• How many observations are there?
• What is the average grade in that class?
10. SMALLEST AND LARGEST OBSERVATION
• You might be wondering if anyone got 100 in the class, or what the highest grade in the class
was and possibly the lowest.
• We can do so by looking at the data, by sorting data, and by using minimum and maximum
functions in Excel and Stata
• To sort data:
• In Excel: highlight the data you want to sort, then “Data” -> “Sort”
• In Stata: sort variablename (ascending order)
• gsort +variablename (ascending) or gsort -variablename (descending)
• Once you have sorted the data you can see what the first and last observations are
• Functions in Excel: =MIN(data), =MAX(data)
• Function in Stata: summarize variablename
• The minimum and maximum help you spot outliers or other problems in your data
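• A short sketch with the grades data (variable name “classgrade” assumed):
sort classgrade          // ascending order
gsort -classgrade        // descending order
summarize classgrade     // the output includes the minimum and maximum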
11. APPLICATION 1. USE EXCEL
• Use UNRATE – unemployment rate dataset to find out the…
• Average unemployment rate between 1948 and 2020
• What was the maximum and minimum unemployment rate during that period?
• Any thoughts on your findings?
• TIP… Stata has an API for FRED. There are two ways of accessing the FRED database…
• The freduse command (user-written; may need to be installed)… freduse UNRATE, clear
• File >> Import >> Federal Reserve Economic Database
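• A sketch of the command-line route:
ssc install freduse     // one-time installation of the user-written command
freduse UNRATE, clear   // download the unemployment rate series from FRED
summarize UNRATE        // average, minimum, and maximum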
12. APPLICATION 2. USE GRADES2 TO ANSWER THE FOLLOWING
• In Stata:
• What is the minimum grade in that class?
• What is the maximum grade in that class?
• What is the average grade in that class?
• How do the minimums, maximums, and averages compare across the two classes?
13. STANDARD DEVIATION
• I want to calculate how dispersed the students’ grades are compared to the average
grade in the class
• Standard deviation (square root of variance) – spread of the observations around the
mean value
• Why is it useful? It tells us how much the data fluctuate around the mean, lets us
compare the spread across datasets, and flags possible outliers so we can investigate
or remove them.
• Examples: income in different cities, unemployment in different regions, returns on
different companies’ stock
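• For reference, the sample standard deviation (what Excel’s STDEV and Stata’s summarize report) is
s = √( Σ(xᵢ − x̄)² / (n − 1) ), where x̄ is the mean and n is the number of observations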
14. STANDARD DEVIATION CONT’D
• In Excel the function for standard deviation is: =STDEV(data)
• In Stata the standard deviation is part of the summarize command output
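• For example (the cell range and variable name are assumptions):
Excel: =STDEV(A2:A31)
Stata: summarize classgrade    // see the “Std. Dev.” column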
15. STANDARD DEVIATION APPLICATIONS
• Find the standard deviation for both of the classes and compare them. What conclusion
can you draw?
• What was the standard deviation of the unemployment rate before and after outliers
were corrected? What conclusion can you draw?
16. VARIANCE
• Closely tied to standard deviation
• Variance = squared standard deviation
• Measure of how far away the observations are in a dataset from the mean
• To find variance in Excel: =VAR(datarange)
• To find variance in Stata: square the standard deviation by hand, or use display r(Var)
after the summarize command
• Stata retains a number of calculations behind the scenes; to see them, type:
• return list
• There are other tools for calculating summary statistics…
• help tabstat
• tabstat UNRATE, s(var)
17. USING STATA AND EXCEL AS A CALCULATOR
• To find variance you can always square standard deviation
• di r(Var)
• di r(sd)^2
• To use Excel as a calculator, type “=” into a cell followed by the expression you want to
calculate
• In Stata, type the word “display” followed by the expression you want to calculate
• For example, if standard deviation is 1.6 then to calculate variance in
• Excel: =1.6^2 (or =1.6*1.6)
• Stata: display 1.6^2 (or display 1.6*1.6)
18. CREATING A NEW VARIABLE
• You can create new variables in Excel and Stata. This skill will be useful later on in the
class
• For now let’s imagine the professor gives everyone in the first class a 1% curve and
calculate their grades
• In Excel, in a new cell type =A2+1 (where A2 is the cell holding the grade), then hover over
the bottom right corner of the new cell and double-click; the column should populate with
the calculated values. What is the class average now that everyone received extra credit?
• Let’s import the grades into Stata and do the same. To create a new variable:
• generate curvedgrade = classgrade + 1
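• To see the new class average (using the variable created above):
summarize curvedgrade    // the mean should be 1 point higher than before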
19. BAR CHARTS
• You would like to find out how many people in the class received an A, B, C, and D.
• The best way to look at that is to create a distribution chart (histogram) that will show
how many received each grade
• In Excel: highlight the data -> “Insert” -> “Histogram” -> right-click on the x-axis labels to
change the number of bins and their range
• In Stata click on graphics->histogram. There are many options, let’s go through some of
them
• Variable – classgrade
• Width of bins – 10 (this is how “wide” each grade category is)
• Lower limit of first bin – 60 (assuming no one failed the class)
• Y-axis – frequency
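• The dialog builds a command like this one:
histogram classgrade, width(10) start(60) frequency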
20. BAR CHARTS CONT’D
• We can create bar charts to compare the same variable over time (i.e. unemployment) or
across different units (i.e. income across different cities)
• Let’s create an over-time bar chart using the unemployment rate data in Excel
• Highlight the unemployment rate column by clicking on the column name twice
• Click “Insert” (top right) -> pick bar chart (2D column)
• Right-click on the chart -> “Select Data” -> “Edit” (under horizontal axis labels) -> select
the years column by highlighting it
• To add labels to the axes, click on the chart->”+” symbol at the right corner-> tick axis
titles->type the titles into the boxes
21. LINE CHARTS
• Showing the progression of a variable over time is easier with a line chart
• Load the unemployment rate data into Stata
• This is time series data. We have to treat it a bit differently:
generate daten = tm(1948m1) + _n - 1   // monthly date variable; first observation is 1948m1
format daten %tm                       // display as 1948m1, 1948m2, …
tsset daten, monthly                   // declare the dataset as a monthly time series
sort daten
• Click “Graphics” on the top left -> “Twoway graph” -> “Create” -> line plot type -> Y-variable is
the unemployment rate, X-variable is daten -> submit (the equivalent command is sketched at
the end of this slide)
• To save your graph - > file->save as-> pick the type that will make it easy for you to open
the graph
• https://fred.stlouisfed.org/series/UNRATE/ compare your graphs to FRED data
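• Equivalently, the twoway dialog builds a command you can type directly (variable names from
the steps above):
twoway line UNRATE daten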
22. SIDE NOTE
• How does Stata work with time series data…
• It uses a numeric system starting at 1/1/1960 (that date is always 0; for monthly data, 1960m1 = 0)
• _n refers to the current observation number (1, 2, 3, …)
• _N refers to the total number of observations
• Why do we need to subtract 1 when building the month variable?
• Because _n starts at 1: for the first observation, tm(1948m1) + _n − 1 = tm(1948m1). (As a
check, tm(1948m1) = −144, i.e. 144 months before 1960m1.)
23. GDP OVER TIME IN THE US, MEXICO, AND CANADA
• Please google “GDP per capita by country world bank” -> pick the one in current US$
(why do we have to use GDP per capita in current dollars?) -> download the csv file
• Use Ctrl-F to find the GDP for the US, Mexico, and Canada. Copy each country’s GDP series
and paste it into a new document
• Delete the third and fourth columns
• Create a line chart. What conclusion can we draw about the relative economic growth of
these countries?
24. CORRELATION
• Is it possible to improve your score during the semester or is the grade on the first exam
closely related to the grade at the end of the semester?
• Use the grades3.xlsx dataset to answer this question
• Import the dataset into Stata. We are going to plot the observed points on a graph
where the axes are exam grade and class grade
• To do so type in: scatter exam1 classgrade
• We can tell that there is a positive relationship between the two variables
• The graph you created is called a scatterplot. By looking at a scatterplot we can usually
tell whether there is a relationship between two variables, and make an educated guess
about whether that relationship is positive or negative
• Can you think of two variables that might be positively or negatively related?
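• A minimal sketch of the whole sequence (file and variable names as given above):
import excel "grades3.xlsx", firstrow clear
scatter exam1 classgrade    // each point is one student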
25. CALIFORNIA SCHOOL’S DATASET
• The dataset covers California’s school districts in the 1998–1999 school year
• It includes average test scores for 5th graders in each school district
• The description of the dataset is in the Word document titled “California Test Scores”
• Let’s look at the relationship between total enrollment and test scores
• Stata: scatter testscr enrl_tot
• Take a look at the data description and think about what else could be related to the test
scores. Would it be a positive or a negative relationship?
26. CORRELATION COEFFICIENT
• We don’t have to guess whether there is a relationship between two variables and
whether the relationship is positive or negative
• We will use something called “correlation coefficient” (usually denoted r) to answer that
• If r is between 0 and 1 the relationship is positive
• If r is between -1 and 0 the relationship is negative
• The closer the absolute value of r is to 1, the stronger the relationship
• The closer the absolute value of r is to 0, the weaker the relationship
• In Stata, to find the correlation coefficient type: correlate variable1 variable2
• In Excel, to find the correlation coefficient type: =CORREL(range1, range2)
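• For example, with the grades3 variables (the Excel cell ranges are hypothetical):
Stata: correlate exam1 classgrade
Excel: =CORREL(A2:A31, B2:B31)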
27. DO IT YOURSELF TIME
• Try to create a scatterplot for the grades3 dataset in Excel
• Hint: a scatterplot is just a type of chart, so your steps will be similar to creating a bar
chart in Excel
• Try to find the correlation coefficient for the grades3 dataset in Excel (on Slido)
• Hint: the correlation coefficient is a type of function. This should be similar to finding an
average or a standard deviation in Excel.
28. LINE OF BEST FIT
• Line of best fit is the line that best represents all of the data points on a scatterplot
• Like any straight line it has an intercept and a slope
• The equation of a straight line is: y=mx+b
• Where b – intercept with the y-axis, m – the slope of the line
• If the line of best fit for a scatterplot is y = −3x + 2, then 2 is the intercept with the y-axis
and −3 is the slope of the line.
• When x = 0, y = 2; each 1-unit increase in x lowers y by 3
• Since the slope is negative, the relationship between the two variables is negative.
29. EXAMPLE: LINE OF BEST FIT FOR CLASS GRADES
• Once you have created a scatterplot in excel you can add the line of best fit to it
• Click on the “+” in the upper-right corner, tick “trendline”
• You can see that the line of best fit is upward-sloping => the relationship between the
two variables is positive
• To find the equation of the line, right-click on it -> “Format Trendline” -> tick “Display
Equation on chart”
• What are the intercept and the slope of the line? What conclusion can we draw from
knowing those numbers?
• Do they make sense?
30. CONCLUSION
• We have reviewed descriptive statistics. What are some of the descriptive stats we have
discussed?
• How can we find them in Excel?
• How can we find them in Stata?
• What types of charts have you learned to create? How can you do this in Stata/Excel?
• If the correlation coefficient is -1 what does it mean? 0? 0.2?