Stata Learning From Treiman
Doing analysis using Stata 10.0: Stata graphics (continued). Treiman, UCLA. 易黠 于善国家中 2009.2.18
Why do I need it?
Stata, created by StataCorp, is a statistical program used by many businesses and academic institutions around the world. Most of its users work in research, especially in the fields of economics and epidemiology. Stata's full range of capabilities includes:
Data management
Statistical analysis
Graphics
Simulations
Custom programming
Do Everything with -do- Files
capture log close
log using class.log, replace
#delimit;
version 10.0;
set more 1;
clear;
program drop _all;
set mem 100m;
*CLASS.DO (DJT initiated 5/19/99, last revised 2/4/08);
*This do-file creates computations for a paper on literacy in China.;
use d:chinasurveydatachina07.dta;
log close;
Document Your -do- File Exhaustively
The editorial review process often takes a very long time. If you have not documented your work, you may have a great deal of trouble remembering why you did what you did.
Include comments summarizing the outcome of each set of commands.
Chapter 7 (gssy2004), Case 1: curvilinear relationship
reg inc age agesq if good==1
My method:
gen inc1=3088.804*age-27.77605*agesq-15554.28
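Typing rounded coefficients by hand, as in "My method", works but has to be redone whenever the model changes. A minimal sketch of the alternative is given here, using Stata's shipped auto dataset as a stand-in for the survey file. The variables weightsq, yhat and yhat_hand are illustrative names, not ones from the presentation.

```stata
* Sketch only: auto.dta stands in for the survey data used in the slides.
sysuse auto, clear
gen double weightsq = weight^2
regress price weight weightsq

* Option 1: let predict compute the fitted values.
predict double yhat

* Option 2: build them by hand from the stored coefficients _b[].
gen double yhat_hand = _b[_cons] + _b[weight]*weight + _b[weightsq]*weightsq

* The two versions should agree up to rounding error.
assert reldif(yhat, yhat_hand) < 1e-8
```

Either option keeps the full precision of the estimates, which the hand-typed numbers do not.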
*Mark the good data.;
mark good if inc~=. & age>19 & age<65;
*Do the regression and make a predicted value.;
reg inc age agesq if good==1;
predict xinc if good==1;
*Get the transformed coefficients.;
gen m=_b[_cons]-(_b[age]^2)/(4*_b[agesq]);
gen F=(-_b[age])/(2*_b[agesq]);
l m F in 1;
Note: this lists just a single pair of m and F values.
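The two transformed coefficients come from completing the square in the fitted quadratic. Writing a, b and c for _b[_cons], _b[age] and _b[agesq] (shorthand introduced here, not in the slides):

```latex
\hat{y} = a + b\,\mathrm{age} + c\,\mathrm{age}^2
        = m + c\,(\mathrm{age} - F)^2,
\qquad
F = -\frac{b}{2c},
\qquad
m = a - \frac{b^2}{4c}.
```

When c is negative the curve peaks at age F, and m is the expected income at that peak. With the rounded coefficients shown under "My method", F is roughly 3088.804 / (2 × 27.77605), or about 55.6 years.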
Strict graphing syntax: not simple
lab var age "Age in 2004"
lab var xinc "Expected Income in 2003"
graph twoway (scatter xinc age, sort connect(l) clwidth(medthick) clpattern(solid) mcolor(black) msymbol(i)), plotregion(style(none)) xlab(20(4)64) ylab(0(5000)50000) saving(ch07fig1.gph,replace)
Notes: xlab and ylab define the axes. lab = label, for example labeling age as "Age in 2004". sort connect(l): connect the points with straight lines? clwidth? clpattern? mcolor? ms(i)? plotregion?
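The options in the graph command can be read one at a time. The sketch below annotates each of them, again using the auto dataset as a stand-in, with weightsq and yhat as illustrative names. It is meant to be run from a do-file, since /// line continuation does not work when typed interactively. As far as I know, clwidth() and clpattern() are older spellings that Stata still accepts for lwidth() and lpattern().

```stata
* Sketch only: auto.dta and its variables stand in for the slides' data.
sysuse auto, clear
gen double weightsq = weight^2
regress price weight weightsq
predict double yhat
label variable weight "Weight (lbs.)"
label variable yhat   "Expected price (USD)"

twoway (scatter yhat weight,    /// one plot: y variable first, then x
        sort                    /// order points by x before connecting them
        connect(l)              /// join successive points with straight lines
        clwidth(medthick)       /// thickness of the connecting line
        clpattern(solid)        /// line pattern (solid, dash, dot, ...)
        mcolor(black)           /// marker color
        msymbol(i)),            /// i = invisible marker, so only the line shows
       plotregion(style(none))  /// no outline or fill around the plot region
       xlabel(2000(1000)5000)   /// x labels from 2000 to 5000 in steps of 1000
       ylabel(0(5000)15000)     //  y labels from 0 to 15000 in steps of 5000
```

Because the markers are invisible and the points are sorted, the scatter plot is drawn as a single smooth curve of the fitted values.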
Case 2: dichotomous variable, means, mutual controls
reg lninc educ hrs male if good==1;
*Make graphs of the relationship between education and income, by sex, for those who work an average number of hours.;
Evidently everyone uses the mean to handle the combined effect of variables like these.
*First, get the expected values evaluated at mean hours.;
*Get the mean hours worked, which I need below.;
sum hrs if good==1;
gen mhrs=r(mean) if good==1;
This is exactly what we do as well: get the corresponding mean. OMG, this program is not that smooth to write, though: only one mean can be returned at a time.
gen xincm=_b[_cons]+_b[educ]*educ+_b[hrs]*mhrs+_b[male] if male==1 & good==1;
gen xincf=_b[_cons]+_b[educ]*educ+_b[hrs]*mhrs if male==0 & good==1;
Having to write out this much is a kind of pain too. So many _b[] terms, torture.
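The "one mean at a time" complaint arises because every call to summarize overwrites r(). One way around it, sketched here with the auto dataset as a stand-in and with illustrative names throughout (lnprice, mweight, xp_foreign, xp_domestic), is to copy each mean into a scalar as soon as it is computed, rather than storing it in a constant variable:

```stata
* Sketch only: auto.dta stands in for the survey data; all names are illustrative.
sysuse auto, clear
gen double lnprice = ln(price)
regress lnprice mpg weight foreign

* summarize replaces r() on every call, so save the mean immediately.
summarize weight if e(sample), meanonly
scalar mweight = r(mean)

* Expected values at mean weight, separately for each group.
gen double xp_foreign  = _b[_cons] + _b[mpg]*mpg + _b[weight]*scalar(mweight) + _b[foreign] if foreign==1 & e(sample)
gen double xp_domestic = _b[_cons] + _b[mpg]*mpg + _b[weight]*scalar(mweight) if foreign==0 & e(sample)
```

With several control variables, the summarize and scalar pair is simply repeated for each one. The regression results in _b[] are unaffected, because summarize changes r() and not e().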
The lowess command: locally weighted regression
Lowess is a statistical technique for plotting a smooth curve through a set of data points in a scattergram. Stata's lowess command carries out a locally weighted regression of yvar on xvar, displays the graph, and optionally saves the smoothed variable. A scattergram is a plot of data points with a predictor variable on the x-axis and a criterion variable on the y-axis.
Lowess is a locally weighted scatterplot smoothing technique. Each smoothed value comes from a linear polynomial fitted to the data within a particular span of values of the predictor variable, giving most weight to the central value of the span, less and less weight to more distant values, and zero weight to values outside the span. The span is then moved along the x-axis and a new smoothed value is computed. The size of the span is set by a tension factor (the bandwidth), which determines the proportion of the data points included in the span.
Warning: lowess is computationally intensive and may therefore take a long time to run on a slow computer. Lowess calculations on 1,000 observations, for instance, require performing 1,000 regressions.
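A minimal sketch of the command in use, with the auto dataset as a stand-in for real survey data and mpg_smooth as an illustrative variable name. The bwidth() option plays the role of the tension factor described above:

```stata
* Sketch only: auto.dta stands in for the survey data.
sysuse auto, clear

* bwidth() is the share of the data used for each local fit (default 0.8);
* generate() saves the smoothed values in a new variable.
lowess mpg weight, bwidth(0.8) generate(mpg_smooth)

* A narrower bandwidth follows the data more closely.
twoway (scatter mpg weight) (lowess mpg weight, bwidth(0.4))
```

Comparing the two graphs shows the trade-off: a wide span gives a smoother curve, while a narrow one picks up more local detail and more noise.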