Introduces common data management techniques in Stata. Topics covered include basic data manipulation commands such as recoding variables, creating new variables, working with missing data, generating variables based on complex selection criteria, and merging and collapsing data sets. Intended for users who have an introductory level of knowledge of Stata software.
All workshop materials including slides, do files, and example data sets can be downloaded from http://projects.iq.harvard.edu/rtc/event/data-management-stata
Learn how to navigate Stata’s graphical user interface, create log files, and import data from a variety of software packages. Includes tips for getting started with Stata including the creation and organization of do-files, examining descriptive statistics, and managing data and value labels. This workshop is designed for individuals who have little or no experience using Stata software.
Full workshop materials including example data sets and .do files are available at http://projects.iq.harvard.edu/rtc/event/introduction-stata
Topics for the class include multiple regression, dummy variables, interaction effects, hypothesis tests, and model diagnostics. Prerequisites include general familiarity with Stata (importing, managing, and exploring data), the linear regression model, and ordinary least squares estimation.
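As a small illustration of the ordinary least squares estimation named in the prerequisites, here is a sketch of the single-predictor case using only built-in Python; the variable names and data values are hypothetical, chosen just to show the computation.

```python
# OLS with one predictor: slope = S_xy / S_xx, intercept = ȳ - slope·x̄.
# Hypothetical data: years of schooling (x) and hourly wage (y).
schooling = [10, 12, 12, 14, 16, 16, 18]
wage = [11.0, 13.5, 14.0, 16.5, 20.0, 19.5, 23.0]

n = len(schooling)
x_bar = sum(schooling) / n
y_bar = sum(wage) / n

# Sums of squared/cross deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(schooling, wage))
s_xx = sum((x - x_bar) ** 2 for x in schooling)

slope = s_xy / s_xx          # estimated effect of one extra year of schooling
intercept = y_bar - slope * x_bar
print(round(slope, 3), round(intercept, 3))
```

Multiple regression generalizes this to several predictors (solved in matrix form), which is what the workshop's `regress` material covers.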
Workshop materials including do files and example data sets are available from http://projects.iq.harvard.edu/rtc/event/regression-stata
This is a short introductory course on Stata statistical software, version 9; the material still applies to later versions. The course runs nine hours and has been taught at the Faculty of Economics and Political Science, Cairo University.
This document provides an overview of using SPSS (Statistical Package for the Social Sciences) software. It introduces the main interfaces for working with data in SPSS, including the data view, variable view, output view, draft view, and syntax view. It also provides instructions for installing sample data files and demonstrates how to generate a basic cross-tabulation output of employment by gender using the automated features.
This document provides an introduction and overview of SPSS (Statistical Package for the Social Sciences). It discusses what SPSS is, the research process it supports, how questionnaires are translated into SPSS, different question and response formats, and levels of measurement. It also briefly outlines some of SPSS's data editing, analysis, and output features.
This document provides a basic guide to using the statistical software package SPSS. It introduces SPSS as a program used by researchers to perform statistical analysis of data. The document explains that SPSS can be used to describe data through descriptive statistics, examine relationships between variables, and compare groups. It also provides instructions on how to open and start SPSS.
This document provides an introduction and overview of Stata, a statistical software package. It discusses what Stata is, the different versions of Stata, the Stata interface and how to navigate it. It also summarizes key functions for working with data in Stata, including importing and exporting data, exploring and managing variables, and conducting statistical analysis. The document is intended as a starting point for learning to use Stata.
Data analysis is a process that involves gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. It describes several major techniques for data analysis, including correlation analysis, regression analysis, factor analysis, cluster analysis, correspondence analysis, conjoint analysis, CHAID analysis, discriminant/logistic regression analysis, multidimensional scaling, and structural equation modeling.
The document provides instructions for launching and using the statistical software SPSS. It discusses finding the SPSS icon on the computer and launching the program. Once SPSS is open, the user can start a new data file or open an existing one. Basic steps for using SPSS are outlined, including entering data, defining variables, testing for normality, statistical analysis, and interpreting results. Specific functions and menus in SPSS are demonstrated for descriptive statistics, normality testing, and t-tests.
This document provides an overview of topics related to research and statistics, including research problems, variables, hypotheses, data collection, presentation, and analysis using SPSS. It discusses key concepts such as descriptive versus inferential statistics, point and interval estimates, and confidence intervals for means and proportions. The document serves as an introduction to research methodology and statistical analysis concepts.
SPSS (Statistical Package for the Social Sciences) is software used for data analysis. It can process questionnaires, report data in tables and graphs, and analyze means, chi-squares, regression, and more. Originally its own company, SPSS is now owned by IBM and integrated into their software portfolio. The document provides an overview of using SPSS, including entering data from questionnaires, different question/response formats, and descriptive statistical analysis functions in SPSS like frequencies, cross-tabs, and graphs.
This document provides an introduction to SPSS (Statistical Package for Social Sciences) software. It discusses the history and ownership of SPSS, its use as a statistical analysis program, and an overview of its basic functions. Key features covered include opening and managing data files, descriptive statistics like frequencies and charts, data cleaning techniques for handling missing values, and methods for data manipulation such as recoding variables and creating new computed variables. The goal is to provide readers with foundational knowledge on using SPSS for statistical analysis in the social sciences.
Software packages for statistical analysis - SPSS (Anand Balaji)
This document provides an overview of the Statistical Package for Social Sciences (SPSS). It discusses what SPSS is, how to define and enter variables, and the four main windows in SPSS including the data editor, output viewer, syntax editor, and script window. Basic functions like frequencies analysis, descriptives, and linear regression are also introduced.
This document discusses time series analysis. It defines a time series as a collection of observations made sequentially over time. Examples include financial, scientific, demographic, and meteorological time series data. The document contrasts time series data with cross-sectional data. It also describes the components of a time series, including trends, seasonal variations, cyclical variations, and irregular/random variations. The purposes and uses of time series analysis are discussed, along with methods for decomposing and measuring trends in time series data.
This document provides an overview of various statistical analysis techniques used in inferential statistics, including t-tests, ANOVA, ANCOVA, chi-square, regression analysis, and interpreting null hypotheses. It defines key terms like alpha levels, effect sizes, and interpreting graphs. The overall purpose is to explain common statistical methods for analyzing data and determining the probability that results occurred by chance or were statistically significant.
This document provides an outline and slides for a presentation on statistical distributions. It begins with an introduction to frequency distributions, measures of central tendency, variability, z-scores, and theoretical distributions. Examples of different types of distributions are shown including normal, binomial, t and chi-square distributions. The document concludes with examples of how to apply these distributions to calculate probabilities and test hypotheses related to health data.
This document provides instructions for working with time series data in Stata. It discusses how to properly format date variables from various formats into a date format recognized by Stata. It also explains how to declare time series data and use time series operators to generate lags, leads, differences, and seasonal values. Statistical commands for exploring autocorrelation and the relationship between two time series are also covered.
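The time series operators described above (Stata's L., F., and D. prefixes for lags, leads, and differences) can be sketched in plain Python to show what each one computes; the series values below are made up for illustration.

```python
# What Stata's time-series operators compute, in plain Python.
def lag(series, k=1):
    """k-period lag (Stata L.): value from k observations earlier; None at the start."""
    return [None] * k + series[:-k]

def lead(series, k=1):
    """k-period lead (Stata F.): value from k observations later; None at the end."""
    return series[k:] + [None] * k

def diff(series):
    """First difference (Stata D.): current value minus the one-period lag."""
    return [None] + [b - a for a, b in zip(series, series[1:])]

y = [100, 103, 101, 106, 110]   # hypothetical quarterly series
print(lag(y))   # [None, 100, 103, 101, 106]
print(lead(y))  # [103, 101, 106, 110, None]
print(diff(y))  # [None, 3, -2, 5, 4]
```

In Stata itself these require the data to be declared with `tsset` first, after which `L.y`, `F.y`, and `D.y` can be used directly in commands.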
Lecture on Introduction to Descriptive Statistics - Part 1 and Part 2. These slides were presented during a lecture at the Colombo Institute of Research and Psychology.
This document provides an overview of statistics concepts including descriptive and inferential statistics. Descriptive statistics are used to summarize and describe data through measures of central tendency (mean, median, mode), dispersion (range, standard deviation), and frequency/percentage. Inferential statistics allow inferences to be made about a population based on a sample through hypothesis testing and other statistical techniques. The document discusses preparing data in Excel and using formulas and functions to calculate descriptive statistics. It also introduces the concepts of normal distribution, kurtosis, and skewness in describing data distributions.
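As a rough illustration of the descriptive measures listed above (central tendency, dispersion, range), here is how they can be computed with Python's standard `statistics` module; the data values are made up.

```python
# Descriptive statistics on a small hypothetical sample of scores.
import statistics as st

scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

print(st.mean(scores))              # arithmetic mean
print(st.median(scores))            # middle value of the sorted data
print(st.mode(scores))              # most frequent value
print(st.stdev(scores))             # sample standard deviation
print(max(scores) - min(scores))    # range
```

Spreadsheet formulas like AVERAGE, MEDIAN, MODE, and STDEV compute the same quantities in Excel.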
This document discusses statistical analysis using SPSS. It describes descriptive statistics, which present data in a usable form by describing frequency, central tendency, and dispersion. Inferential statistics make broader generalizations from samples to populations using hypothesis testing. Hypothesis testing involves research hypotheses, null hypotheses, levels of significance, and type I and II errors. Choosing an appropriate statistical test depends on the hypothesis and measurement levels of the variables. SPSS is a comprehensive system for statistical analysis that can analyze many file types and generate reports and statistics.
1. The document discusses descriptive statistics, which is the study of how to collect, organize, analyze, and interpret numerical data.
2. Descriptive statistics can be used to describe data through measures of central tendency like the mean, median, and mode as well as measures of variability like the range.
3. These statistical techniques help summarize and communicate patterns in data in a concise manner.
This document provides an overview of logistic regression, including when and why it is used, the theory behind it, and how to assess logistic regression models. Logistic regression predicts the probability of categorical outcomes given categorical or continuous predictor variables. It relaxes the normality and linearity assumptions of linear regression. The relationship between predictors and outcomes is modeled using an S-shaped logistic function. Model fit, predictors, and interpretations of coefficients are discussed.
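The S-shaped logistic function mentioned above maps any value of the linear predictor onto a probability between 0 and 1. A minimal sketch, with hypothetical coefficients chosen only to show the shape:

```python
# Logistic (sigmoid) function: P = 1 / (1 + e^(-z)).
import math

def logistic(z):
    """Map a linear predictor value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: z = b0 + b1 * x
b0, b1 = -4.0, 0.8
for x in (0, 5, 10):
    p = logistic(b0 + b1 * x)
    print(x, round(p, 3))   # probability rises along an S-shaped curve
```

At z = 0 the predicted probability is exactly 0.5; large negative z gives probabilities near 0, large positive z probabilities near 1, which is why no normality or linearity assumption on the outcome is needed.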
SPSS is a statistical software package used for data management and analysis. It allows users to enter and manage large amounts of data, perform a wide range of statistical analyses, and output results in tables and graphs. The main SPSS windows are the Data Editor, used to enter and view data, and the Viewer, which displays output of statistical analyses. Common analysis techniques demonstrated in the document include independent and paired t-tests to compare group means. The document provides guidance on using SPSS for questionnaire design and statistical analysis to efficiently analyze social science and business data.
This document provides an overview of data analysis and statistics concepts for a training session. It begins with an agenda outlining topics like descriptive statistics, inferential statistics, and independent vs dependent samples. Descriptive statistics concepts covered include measures of central tendency (mean, median, mode), measures of variability (range, standard deviation), and charts. Inferential statistics discusses estimating population parameters, hypothesis testing, and statistical tests like t-tests, ANOVA, and chi-squared. The document provides examples and online simulation tools. It concludes with some practical tips for data analysis like checking for errors, reviewing findings early, and consulting a statistician on analysis plans.
This document provides an overview of using SPSS to analyze data. It discusses opening data files in SPSS, viewing the data, entering new data values, setting up variable properties like name, type, and label. It also covers running frequency analyses and descriptive statistics, computing new variables, and concludes that SPSS is a powerful tool for statistical analysis.
Descriptive statistics, central tendency, measures of variability, measures of dispersion, skewness, kurtosis, range, standard deviation, mean, median, mode, variance, normal distribution
A summary of the document in three sentences:
The probit model is used to predict the probability that an event occurs based on the factors that influence it. The model uses the cumulative normal distribution function to obtain a probability index. The probit procedure involves estimating probabilities, obtaining the probability index, and running a regression to obtain the model parameters.
Binary outcome models are widely used in many real-world applications. Probit and logit models can be used to analyze this type of data; in particular, dose-response data can be analyzed with either model.
This document provides an outline for tutorials on using STATA software to perform panel regressions. It discusses opening datasets, declaring time and ID variables, performing fixed effects and random effects regressions, conducting Hausman tests, and includes a challenge example adding additional independent variables to an existing panel regression model. The datasets used are based on examples from an econometrics textbook and can be downloaded from the STATA website. Additional tutorial topics on the website include data management, statistical analysis techniques, graphs, and time series analysis.
The document provides guidance on using the logical framework approach (LFA) to design projects in a systematic and logical way. It discusses key aspects of the LFA including problem analysis, objectives analysis, strategy analysis, developing the logframe matrix, activity planning, and resource planning. The LFA helps ensure problems are analyzed systematically, objectives are clearly defined and measurable, and risks and assumptions are considered. Using the LFA helps make project proposals more coherent and increases the chances of securing donor funding.
Provides an introduction to graphics in Stata. Topics include graphing principles, descriptive graphs, and post-estimation graphs. This introductory workshop is appropriate for those with little experience creating graphics in Stata; basic Stata skills are assumed.
All workshop materials including slides, do files, and example data sets can be downloaded from http://projects.iq.harvard.edu/rtc/event/graphing-stata
This document provides an outline for tutorials on graphing data using STATA software. It discusses how to create histograms, scatter plots, overlaid charts, and how to label and format graphs. It also includes a challenge to graph 3-month and 1-year interest rates from macro_2e.dta dataset. The document directs readers to the website www.STATA.org.uk for step-by-step screenshot guides and to download datasets used in the tutorials.
This document provides information about probit analysis, which is a type of regression analysis used for binomial response variables. It can transform a sigmoid dose-response curve into a straight line. The document discusses the history of probit analysis and its applications in fields like toxicology. It describes how probit analysis is used to calculate toxicity values like LD50, which is the dose that kills 50% of test subjects. Finally, it explains different methods for conducting probit analysis, including by hand calculations, regression, or using statistical software.
Probit analysis is used to analyze binomial response experiments, such as testing the toxicity of chemicals. It transforms a sigmoid dose-response curve into a straight line to allow regression analysis. Key steps include:
1) Calculating proportions killed at different doses
2) Determining empirical and expected probits from tables and regression
3) Computing weighting coefficients and working probits
4) Estimating the LC50 (the dose or concentration that kills 50% of the subjects) from the regression equation.
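The steps above can be sketched, in simplified form, with Python's standard library: `NormalDist.inv_cdf` stands in for the printed probit tables, and an unweighted regression stands in for the weighted working-probit iteration of step 3. All doses and kill proportions below are hypothetical.

```python
# Simplified probit analysis: fit probit = a + b*log10(dose), solve for LC50.
from math import log10
from statistics import NormalDist

nd = NormalDist()

doses = [1, 2, 4, 8, 16]                       # hypothetical doses
prop_killed = [0.05, 0.20, 0.50, 0.80, 0.95]   # proportion killed at each dose

# Steps 1-2: log-dose and empirical probits (inverse normal CDF + 5,
# the +5 being the classical convention that keeps probits positive)
x = [log10(d) for d in doses]
y = [nd.inv_cdf(p) + 5 for p in prop_killed]

# Steps 3-4, simplified to an unweighted least-squares fit
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

# The LC50 is the dose at which the fitted probit equals 5 (i.e. P = 0.5)
lc50 = 10 ** ((5 - a) / b)
print(round(lc50, 2))   # ≈ 4.0 for this symmetric toy data
```

The full procedure iterates with weighting coefficients and working probits, which matters when the observed proportions are far from 0.5; the sketch above skips that refinement.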
This document provides an outline for tutorials on using STATA software to conduct probit and logit analysis. It describes datasets used in examples from an econometrics textbook. The tutorials cover opening datasets, running probit and logit models, and conducting a challenge analysis estimating the impact of race on denial rates using a probit model. The website listed provides step-by-step guides and downloads for additional STATA tutorials.
This document provides information about tutorials and datasets for using the STATA data analysis software. It explains that the tutorials use example datasets from an econometrics textbook and how to download the datasets and command files from www.STATA.org.uk. The website also contains step-by-step screenshot guides for using STATA and tutorials on topics like data management, statistical analysis, graphs, regressions, and other analyses.
STATA is data analysis software that can be used via menu options or typed commands. It has a wide range of econometric techniques and can open, examine, and run regressions on datasets. The tutorials on www.STATA.org.uk provide step-by-step guides for using STATA to perform tasks like data management, statistical analysis, importing data, summary statistics, graphs, regressions, and other analyses.
This document outlines tutorials for using STATA software to perform time series analysis. It discusses how to manage time series data, perform Dickey-Fuller tests, estimate ARIMA, VAR, and ARCH/GARCH models, and provides challenges for forecasting with various models. Screenshot guides and sample datasets are available on the STATA website to help users learn how to apply these techniques in STATA. A variety of other statistical analysis topics also have tutorials on the site.
This document provides an outline for a workshop on data management in Stata. The workshop covers topics such as generating and replacing variables, processing data using the "by" command, handling missing values, working with different variable types including dates and times, and merging, appending and summarizing datasets. The workshop is intended to provide an introduction to these core data management tasks in Stata for those with basic knowledge of the software.
2016 6 19 1:56 Page 1
use "/Users/Tina/Downloads/Quebec.dta"
* Map highest degree (HDGREE) to years of education
* (the original file mixed the names YEARSED and YRSED; YRSED is used
* consistently here, matching the later commands)
gen YRSED=7 if HDGREE==1
replace YRSED=12 if HDGREE==2
replace YRSED=13 if HDGREE>=3 & HDGREE<=5
replace YRSED=14 if HDGREE>=6 & HDGREE<=8
replace YRSED=16 if HDGREE>=9 & HDGREE<=10
replace YRSED=20 if HDGREE==11
replace YRSED=17 if HDGREE==12
replace YRSED=23 if HDGREE==13
tab YRSED HDGREE
* SEX==1 is coded as female in this extract
cap gen female = SEX==1
tabstat TOTINC if female==1, by(HDGREE) c(s) s(mean min max)
tabstat TOTINC if female==0, by(HDGREE) c(s) s(mean min max)
* note: missing sorts above all numbers in Stata, so this also drops
* observations with YRSED missing
drop if YRSED>20
tab YRSED HDGREE
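The chain of replace statements above can also be collapsed into a single recode call; a sketch, with YRSED2 as a hypothetical name for the result:

```
* same HDGREE-to-years mapping in one recode, written to a new variable
recode HDGREE (1=7) (2=12) (3/5=13) (6/8=14) (9/10=16) ///
    (11=20) (12=17) (13=23), gen(YRSED2)
tab YRSED2 HDGREE
```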
* Convert AGEGRP codes to approximate midpoint ages
generate age = 7 if AGEGRP==6
replace age = 12 if AGEGRP==7
replace age = 17 if AGEGRP==8
replace age = 22 if AGEGRP==9
replace age = 27 if AGEGRP==10
replace age = 32 if AGEGRP==11
replace age = 37 if AGEGRP==12
replace age = 42 if AGEGRP==13
replace age = 47 if AGEGRP==14
replace age = 52 if AGEGRP==15
replace age = 57 if AGEGRP==16
replace age = 62 if AGEGRP==17
replace age = 67 if AGEGRP==18
replace age = 72 if AGEGRP==19
replace age = 77 if AGEGRP==20
replace age = 82 if AGEGRP==21
tabstat TOTINC if female==1 & age==27, by(YRSED) c(s) s(mean sd min max)
tabstat TOTINC if female==0 & age==27, by(YRSED) c(s) s(mean sd min max)
preserve
collapse (mean) YRSED, by(AGEGRP SEX)
tabstat YRSED if SEX==1, by(AGEGRP)
twoway (line YRSED AGEGRP if SEX==1, lcolor(red)) ///
    (line YRSED AGEGRP if SEX==2, lcolor(blue)), ///
    ytitle("Average years of education") xtitle("Age group")
restore
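As a sketch of an alternative to the preserve/collapse/restore pattern above, egen can attach the group means to every observation without replacing the dataset (mean_yrsed is an illustrative name):

```
* group means as a new column; no collapse, so no preserve/restore needed
egen mean_yrsed = mean(YRSED), by(AGEGRP SEX)
```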
ssc install catplot
catplot AGEGRP, percent(YRSED)
gen YRSED9 = YRSED==9
catplot AGEGRP, percent(YRSED9)
* potential experience, Mincer-style: age minus schooling minus 6
gen expfem = age - YRSED - 6 if SEX==1
gen expfem_sq = expfem^2
gen expmale=age-YRSED-6 if SEX==2
gen expmale_sq=expmale^2
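Since the two experience variables differ only in the sample restriction, a single combined variable is a possible simplification (exper is an illustrative name, not part of the original script):

```
* potential experience for everyone, regardless of sex
gen exper = age - YRSED - 6
gen exper_sq = exper^2
```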
EC328_2014_lnwagereg_STATA-3.do

************ INTRODUCTION TO STATA - EC328 - SPRING 2014 ************;
* This file created by Justin Smith, and adapted by Christine Neill for
* learning basic STATA commands, September 2013 through May 2015.
* Contact: [email protected];
* To run this program, download SLID 2010 data from ODESI, unzip it,
* rename the file "SLID_2010.dta", and change the working directory
* (in (2) below) to wherever you saved the file;

** (1) Preamble;
*---------;
* The command below tells Stata to use ";" to indicate the end of a
* line of code;
# delimit ;

* This file will walk you through the basic commands that you might use
* if you were doing statistics with Stata. The first thing to note is
* that the asterisk .
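A minimal sketch of what `# delimit ;` changes: commands may now span several physical lines and end only at a semicolon (the variable names here are illustrative):

```
# delimit ;
summarize wages
    educ
    age ;
# delimit cr ;
```

`# delimit cr` switches back to the default, where a carriage return ends each command.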
1. Data Management in Stata
Ista Zahn
IQSS
January 24 2013
The Institute
for Quantitative Social Science
at Harvard University
Ista Zahn (IQSS) Data Management in Stata January 24 2013 1 / 52
2. Outline
1 Introduction
2 Generating and replacing variables
3 By processing
4 Missing values
5 Variable types
6 Merging, appending, and joining
7 Creating summarized data sets
8 Wrap-up
3. Introduction
4. Introduction
Copy the workshop materials to your home directory
Log in to an Athena workstation using your Athena user name and
password
Click on the “Ubuntu” button on the upper-left and type “term” in the
search box
Click on the “Terminal” icon that appears
In the terminal, type this line exactly as shown:
cd; wget http://tinyurl.com/statadatman-zip; unzip statadatman-zip
If you see “ERROR 404: Not Found”, then you mistyped the command
– try again, making sure to type the command exactly as shown
5. Introduction
Launch Stata on Athena
To start Stata type these commands in the terminal:
add stata
xstata
Open up today’s Stata script
In Stata, go to Window => New do file => Open
Locate and open the StatDatMan.do script in the StataDatMan folder
in your home directory
I encourage you to add your own notes to this file!
6. Introduction
Workshop Description
This is an Introduction to data management in Stata
Assumes basic knowledge of Stata
Not appropriate for people already well familiar with Stata
If you are catching on before the rest of the class, experiment with
command features described in help files
7. Introduction
Organization
Please feel free to ask questions at any point if they are relevant to
the current topic (or if you are lost!)
There will be a Q&A after class for more specific, personalized
questions
Collaboration with your neighbors is encouraged
If you are using a laptop, you will need to adjust paths accordingly
8. Introduction
Opening Files in Stata
Look at bottom left hand corner of Stata screen
This is the directory Stata is currently reading from
Files are located in the StataDatMan folder on the Desktop
Start by telling Stata where to look for these
// change directory
cd "C:/Users/dataclass/Desktop/StataDatMan"
// Use dir to see what is in the directory:
dir
// use the gss data set
use gss.dta
9. Generating and replacing variables
10. Generating and replacing variables
Logic Statements Useful for Data Manipulation
== equal to (used in conditions)
= used to assign values
!= not equal to
> greater than
>= greater than or equal to
& and
| or
11. Generating and replacing variables
Basic Data Manipulation Commands
Basic commands you’ll use for generating new variables or recoding
existing variables:
egen
replace
recode
Many different means of accomplishing the same thing in Stata
Find what is comfortable (and easy) for you
12. Generating and replacing variables
Generate and Replace
The gen command is often used with logic statements, as in this
example:
// create "hapnew" variable
gen hapnew = . //set to missing
//set to 1 if happy equals 1
replace hapnew=1 if happy==1
// set to 1 if happy and hapmar are both greater than 3
replace hapnew=1 if happy>3 & hapmar>3
// set to 3 if happy or hapmar is greater than 4
replace hapnew=3 if happy>4 | hapmar>4
tab hapnew // tabulate the new variable
13. Generating and replacing variables
Recode
The recode command is basically generate and replace combined
You can recode an existing variable OR use recode to create a new
variable
// recode the wrkstat variable
recode wrkstat (1=8) (2=7) (3=6) (4=5) (5=4) (6=3) (7=2) (8=1)
// recode wrkstat into a new variable named wrkstat2
recode wrkstat (1=8), gen(wrkstat2)
// tabulate wrkstat
tab wrkstat
14. Generating and replacing variables
Basic Rules for Recode
Rule         Example   Meaning
#=#          3=1       3 recoded to 1
# #=#        2 .=9     2 and . recoded to 9
#/#=#        1/5=4     1 through 5 recoded to 4
nonmissing=# nonmiss=8 all nonmissing values recoded to 8
missing=#    miss=9    all missing values recoded to 9
15. Generating and replacing variables
egen
egen stands for “extensions to generate”
Contains a variety of more sophisticated functions
Type “help egen” in Stata to get a complete list of functions
Let’s create a new variable that counts the number of “yes” responses
on computer, email and internet use:
// count number of yes on comp email and interwebs
egen compuser= anycount(usecomp usemail usenet), values(1)
tab compuser
// assess how much missing data each participant has:
egen countmiss = rowmiss(age-wifeft)
codebook countmiss
// compare values across multiple variables (1 if they differ, 0 if equal)
egen ftdiff = diff(wkft*)
codebook ftdiff
17. By processing
The “By” Command
Sometimes, you’d like to create variables based on different categories
of a single variable
For example, say you want to look at happiness based on whether an
individual is male or female
The “by” prefix does just this:
// tabulate happy separately for male and female
bysort sex: tab happy
// generate summary statistics using bysort
bysort state: egen stateincome = mean(income)
bysort degree: egen degreeincome = mean(income)
bysort marital: egen marincomesd = sd(income)
18. By processing
The “By” Command
Some commands won’t work with by prefix, but have by options:
// generate separate histograms for female and male
hist happy, by(sex)
20. Missing values
Missing Values
Always need to consider how missing values are coded when recoding
variables
Stata’s symbol for a missing value is “.”
In numeric comparisons, Stata treats “.” as larger than any number
What are implications of this?
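One implication, sketched below (the variable age is a hypothetical example):

```stata
// A minimal sketch: missing (.) compares as larger than any number
display . > 9999          // displays 1 (true)
// so a naive recode silently lumps missing cases in with high values:
gen old = age > 65        // observations with missing age get old == 1
```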
21. Missing values
Missing Values
If we want to generate a new variable that identifies highly educated
women, we might use the command:
// generate and replace without considering missing values
gen hi_ed=0
replace hi_ed=1 if wifeduc>15
// What happens to our missing values when we tabulate?
tab hi_ed, mi nola
// Instead, we might try:
drop hi_ed
// gen hi_ed, but don’t set a value if wifeduc is missing
gen hi_ed = 0 if wifeduc != .
// only replace non-missing
replace hi_ed=1 if wifeduc >15 & wifeduc !=.
tab hi_ed, mi //check to see that missingness is preserved
Note that you need the “mi” option to tab to view your missing data
values
22. Missing values
Missing Values
What if you used a numeric value originally to code missing data (e.g.,
“999”)?
The mvdecode command will convert all these values to missing
mvdecode _all, mv(999)
The “_all” keyword tells Stata to apply this to all variables
Use this command carefully!
If you have any variables where “999” is a legitimate value, Stata is
going to recode it to missing
As an alternative, list variable names explicitly rather than using
“_all”
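A safer pattern, sketched with hypothetical variable names:

```stata
// decode 999 only in variables where it truly means missing
// (age, hrs1, income are hypothetical examples)
mvdecode age hrs1 income, mv(999)
// mvencode performs the reverse, recoding "." back to a numeric code
mvencode age hrs1 income, mv(999)
```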
24. Variable types
Variable Types
Stata uses two main types of variables: String and Numeric
String variables are typically used for text variables
To be able to perform any mathematical operations, your variables
need to be in a numeric format
25. Variable types
Variable Types
Stata’s numeric variable types:
type    Minimum               Maximum              Closest to 0     bytes
                                                   without being 0
----------------------------------------------------------------------
byte    -127                  100                  +/-1             1
int     -32,767               32,740               +/-1             2
long    -2,147,483,647        2,147,483,620        +/-1             4
float   -1.70141173319*10^38  1.70141173319*10^38  +/-10^-38        4
double  -8.9884656743*10^307  8.9884656743*10^307  +/-10^-323       8
----------------------------------------------------------------------
Precision for float is 3.795*10^-8.
Precision for double is 1.414*10^-16.
26. Variable types
Variable Types
How can I deal with those annoying string variables?
Sometimes you need to convert to/from string variables
// not run; generate "newvar" equal to numeric version of var1
destring var1, gen(newvar)
// not run; convert var1 to string.
tostring var1, gen(newvar)
27. Variable types
Date and Time Variable Types
Stata offers several options for date and time variables
Generally, Stata will read date/time variables as strings
You’ll need to convert string variables in order to perform any
mathematical operations
Once data is in date/time form, Stata uses several symbols to identify
these variables
%tc, %td, %tw, etc.
28. Variable types
Variable Types: Date and Time
Format  String-to-numeric conversion function
-------+-----------------------------------------
%tc     clock(string, mask)
%tC     Clock(string, mask)
%td     date(string, mask)
%tw     weekly(string, mask)
%tm     monthly(string, mask)
%tq     quarterly(string, mask)
%th     halfyearly(string, mask)
%ty     yearly(string, mask)
%tg     no function necessary; read as numeric
-------------------------------------------------
29. Variable types
Variable Types: Date and Time
Format  Meaning     Value = -1              Value = 0               Value = 1
%tc     clock       31dec1959 23:59:59.999  01jan1960 00:00:00.000  01jan1960 00:00:00.001
%td     days        31dec1959               01jan1960               02jan1960
%tw     weeks       1959w52                 1960w1                  1960w2
%tm     months      1959m12                 1960m1                  1960m2
%tq     quarters    1959q4                  1960q1                  1960q2
%th     half-years  1959h2                  1960h1                  1960h2
%tg     generic     -1                      0                       1
30. Variable types
Variable Types: Date and Time
To convert a string variable into date/time format, first select the
date/time format you’ll be using (e.g., %tc, %td, %tw, etc.)
Let’s say we create a string variable, today’s date (today) that we
want to format
// create string variable and convert to date
gen today = "Feb 18, 2011"
gen date1 = date(today, "MDY")
tab date1
// use the format command to change how the date is displayed
// format so humans can read the date
format date1 %d
tab date1
31. Variable types
Variable Types
What if you have a variable “time” formatted as
DDMMYYYYhhmmss?
// Not run: conceptual example only
generate double time2 = clock(time, "DMYHMS")
tab time2
format time2 %tc
tab time2
The double storage type is necessary for all clock formats
because clock values (milliseconds since 01jan1960) are too large
to store precisely as float
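To see why double matters, a quick sketch (the date string here is arbitrary):

```stata
// clock values count milliseconds since 01jan1960, so recent
// dates are on the order of 10^12 -- beyond float's ~7 digits
// of precision
display %20.0f clock("18feb2011 12:00:00", "DMYhms")
```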
32. Variable types
Exercise 1: Generate, Replace, Recode & Egen
Open the datafile, gss.dta.
1 Generate a new variable that represents the squared value of age.
2 Recode values “99” and “98” on the variable, “hrs1” as “missing.”
3 Generate a new variable equal to “1” if income is greater than “19”.
4 Recode the marital variable into a “string” variable and then back into
a numeric variable.
5 Create a new variable that counts the number of times a respondent
answered “don’t know” in regard to the following variables: life,
richwork, hapmar.
6 Create a new variable that counts the number of missing responses for
each respondent.
7 Create a new variable that associates each individual with the average
number of hours worked among individuals with matching educational
degrees.
34. Merging, appending, and joining
Merging Datasets
Merge in Stata is for adding new variables from a second dataset to
the dataset you’re currently working with
Current active dataset = master dataset
Dataset you’d like to merge with master = using dataset
If you want to add OBSERVATIONS, you’d use “append” (we’ll go
over that next)
35. Merging, appending, and joining
Merging Datasets
Several different ways that you might be interested in merging data
Two datasets with same participant pool, one row per participant (1:1)
A dataset with one participant per row with a dataset with multiple
rows per participant (1:many or many:1)
36. Merging, appending, and joining
Merging Datasets
Stata will create a new variable (“_merge”) that describes the source
of the data
Use option, “nogenerate” if you don’t want _merge created
Use option, “generate(varname)” to give merge your own variable name
Need to add IDs to your dataset?
// create a variable "id" equal to the row number
generate id = _n
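After a merge, it is worth inspecting _merge before moving on; a typical pattern (sketch):

```stata
// Not run: conceptual example
tab _merge                // 1 = master only, 2 = using only, 3 = matched
keep if _merge == 3       // keep only matched observations
drop _merge
```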
37. Merging, appending, and joining
Merging Datasets
Before you begin:
Identify the “ID” that you will use to merge your two datasets
Determine which variables you’d like to merge
In Stata >= 11, data does NOT have to be sorted
Variable types must match across datasets
Can use “force” option to get around this, but not recommended
38. Merging, appending, and joining
Merging Datasets
Let’s say I want to perform a 1:1 merge using the dataset “data2” and
the ID, “participant”
merge 1:1 participant using data2.dta
Now, let’s say that I have one dataset with individual students
(master) and another dataset with information about the students’
schools called “school”
// Not run: conceptual example only. Merge school and student data
merge m:1 schoolID using school.dta
39. Merging, appending, and joining
Merging Datasets
What if my school dataset was the master and my student dataset
was the merging dataset?
// Not run: conceptual example only.
merge 1:m schoolID using student.dta
It is also possible to do a many:many merge
Data needs to be sorted in both the master and using datasets
40. Merging, appending, and joining
Merging Datasets
Update and replace options:
In standard merge, the master dataset is the authority and WON’T
CHANGE
What if your master dataset has missing data and some of those values
are not missing in your using dataset?
Specify “update”- it will fill in missing without changing nonmissing
What if you want data from your using dataset to overwrite that in
your master?
Specify “replace update”- it will replace master data with using data
UNLESS the value is missing in the using dataset
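The two patterns side by side (a sketch; the file and ID names are hypothetical):

```stata
// Not run: conceptual example
// fill in the master's missing values from the using dataset:
merge 1:1 id using corrections.dta, update
// let using values overwrite nonmissing master values as well:
merge 1:1 id using corrections.dta, update replace
```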
41. Merging, appending, and joining
Appending Datasets
Sometimes, you’ll have observations in two different datasets, or you’d
like to add observations to an existing dataset
Append will simply add observations to the end of the observations in
the master
// Not run: conceptual example. add rows of data from dataset2
append using dataset2
42. Merging, appending, and joining
Appending Datasets
Some options with Append:
generate(newvar) will create variable that identifies source of
observation
// Not run: conceptual example.
append using dataset1, generate(observesource)
“force” will allow for data type mismatches (again, this is not
recommended)
43. Merging, appending, and joining
Joinby
Merge will add new observations from using that do not appear in
master
Sometimes, you need to add variables from using but want to be sure
the list of participants in your master does not change
// Not run: conceptual example. Similar to merge but drops non-matches
joinby participant using dataset1
Any observations in using that are NOT in master will be omitted
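joinby's unmatched() option controls this behavior; a sketch with hypothetical names:

```stata
// Not run: conceptual example
// default: unmatched observations from both datasets are dropped
joinby participant using dataset1
// keep unmatched observations from the master dataset instead
joinby participant using dataset1, unmatched(master)
```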
45. Creating summarized data sets
Collapse
Collapse will take master data and create a new dataset of summary
statistics
Useful in hierarchical linear modeling if you’d like to create aggregate,
summary statistics
Can generate group summary data for many descriptive stats
Mean, median, sd, sum, min, max, percentiles, standard errors
Can also attach weights
46. Creating summarized data sets
Collapse
Before you collapse
Save your master dataset and then save it again under a new name
This will prevent collapse from writing over your original data
Consider issues of missing data. Do you want Stata to use all possible
observations? If not:
cw (casewise) option will make casewise deletions
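A sketch of the difference (income, hours, and state are hypothetical variables):

```stata
// Not run: conceptual example
// default: each statistic uses all available observations
collapse (mean) income hours, by(state)
// cw: drop an observation if it is missing on ANY listed variable
collapse (mean) income hours, by(state) cw
```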
47. Creating summarized data sets
Collapse
Let’s say you have a dataset with patient information from multiple
hospitals
You want to generate mean levels of patient satisfaction for EACH
hospital
// Not run: conceptual example. calculate average ptsatisfaction by hospital
save originaldata
collapse (mean) ptsatisfaction, by(hospital)
save hospitalcollapse
48. Creating summarized data sets
Collapse
You could also generate different statistics for multiple variables
// create mean ptsatisfaction, median ptincome, and sd ptsatisfaction for each hospital
collapse (mean) ptsatisfaction (median) ptincome (sd) ptsatisfaction, by(hospital)
What if you want to rename your new variables in this process?
// Same as previous example, but rename variables
collapse (mean) ptsatmean=ptsatisfaction (median) ptincmed=ptincome ///
    (sd) sdptsat=ptsatisfaction, by(hospital)
49. Creating summarized data sets
Exercise 2: Merge, Append, and Joinby
Open the dataset, gss2.dta
1 The gss2 dataset contains only half of the variables that are in the
complete gss dataset. Merge dataset gss1 with dataset gss2. The
identification variable is “id.”
2 Open the dataset, gss.dta
3 Merge in data from the “marital.dta” dataset, which includes income
information grouped by individuals’ marital status. The marital
dataset contains collapsed data regarding average statistics of
individuals based on their marital status.
4 Additional observations for the gssAppend.dta dataset can be found in
“gssAddObserve.dta.” Create a new dataset that combines the
observations in gssAppend.dta with those in gssAddObserve.dta.
5 Create a new dataset that summarizes mean and standard deviation of
income based on individuals’ degree status (“degree”). In the process
of creating this new dataset, rename your three new variables.
51. Wrap-up
Help Us Make This Workshop Better
Please take a moment to fill out a very short feedback form
These workshops exist for you–tell us what you need!
http://tinyurl.com/StataDatManFeedback
52. Wrap-up
Additional resources
Training and consulting
IQSS workshops:
http://projects.iq.harvard.edu/rtc/filter_by/workshops
IQSS statistical consulting: http://rtc.iq.harvard.edu
Stata resources
UCLA website: http://www.ats.ucla.edu/stat/Stata/
Great for self-study
Links to resources
Stata website: http://www.stata.com/help.cgi?contents
Email list: http://www.stata.com/statalist/