Doing Analysis Using Stata 10.0: Graphing with Stata (continued)
Treiman, UCLA
易黠, at home in 善国, 2009.2.18
Contents
Why do I need it?
Stata, created by StataCorp, is a statistical program used by many businesses and academic institutions around the world. Most of its users work in research, especially in economics and epidemiology. Stata's full range of capabilities includes data management, statistical analysis, graphics, simulations, and custom programming.
Do Everything with -do- Files
    capture log close
    log using class.log, replace
    #delimit ;
    version 10.0;
    set more 1;
    clear;
    program drop _all;
    set mem 100m;
    *CLASS.DO (DJT initiated 5/19/99, last revised 2/4/08);
    *This do-file creates computations for a paper on literacy in China.;
    use d:hinaurveyatahina07.dta;
    log close;
Document Your -do- File Exhaustively
The editorial review process often takes a very long time. If you have not documented your work, you may have great trouble remembering why you did what you did. Include comments summarizing the outcome of each set of commands.
Contents
Chapter 7, gssy2004
Case 1: curvilinear relationship
    reg inc age agesq if good==1
My method:
    gen inc1=3088.804*age-27.77605*agesq-15554.28
My graph really does have problems; look at Treiman's graph.
    *Mark the good data.;
    mark good if inc~=. & age>19 & age<65;
    *Do the regression and make a predicted value.;
    reg inc age agesq if good==1;
    *Get the transformed coefficients.;
    gen m=_b[_cons]-(_b[age]^2)/(4*_b[agesq]);
    gen F=(-_b[age])/(2*_b[agesq]);
    l m F in 1;
This lists just the one corresponding pair of m and F values.
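The transformed coefficients m and F come from completing the square in the fitted quadratic. In the sketch below, b_0, b_1, and b_2 are shorthand introduced here (not in the slides) for _b[_cons], _b[age], and _b[agesq]:

```latex
\begin{align*}
\hat{y} &= b_0 + b_1\,\mathrm{age} + b_2\,\mathrm{age}^2 \\
        &= b_2\left(\mathrm{age} + \frac{b_1}{2b_2}\right)^{2} + b_0 - \frac{b_1^{2}}{4b_2} \\
        &= m + b_2\,(\mathrm{age} - F)^{2},
\qquad F = -\frac{b_1}{2b_2}, \quad m = b_0 - \frac{b_1^{2}}{4b_2}.
\end{align*}
```

So F is the age at which the curve turns, and m is the expected income at that age (a maximum when b_2 < 0). Purely as an illustration, plugging in the coefficients quoted in the "My method" line gives F of roughly 3088.804 / (2 × 27.77605) ≈ 55.6 and m of roughly −15554.28 + 3088.804² / (4 × 27.77605) ≈ 70,317.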
Strict graph syntax: not simple
    lab var age "Age in 2004"
    lab var xinc "Expected Income in 2003"
    graph twoway (scatter xinc age, sort connect(l) clwidth(medthick) clpattern(solid) mcolor(black) msymbol(i)), plotregion(style(none)) xlab(20(4)64) ylab(0(5000)50000) saving(ch07fig1.gph,replace)
Defining the axes.
lab = label; for example, label age as "Age in 2004".
sort connect(l): connect the points with straight lines?
clwidth? clpattern? mcolor? ms(i)? plotregion?
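The options questioned on this slide can be read off an annotated version of the same command. This is only a sketch under stated assumptions: it uses Stata's bundled auto data, and xmpg is a hypothetical stand-in for the survey variable xinc, which is not available here. The /// continuation comments work in a do-file, not at the interactive prompt.

```stata
sysuse auto, clear
generate weight2 = weight^2
regress mpg weight weight2
predict xmpg                          // fitted values (stand-in for xinc)
label variable xmpg "Expected mileage (mpg)"

graph twoway (scatter xmpg weight,    /// plot fitted values against x
        sort                          /// order by x before connecting
        connect(l)                    /// join points with straight lines
        clwidth(medthick)             /// width of the connecting line
        clpattern(solid)              /// solid, not dashed or dotted
        mcolor(black)                 /// marker color
        msymbol(i)),                  /// i = invisible marker: line only
    plotregion(style(none))           /// no outline around the plot region
    xlabel(2000(1000)5000)            /// labeled ticks from 2000 to 5000 by 1000
    ylabel(10(5)40)                   //  labeled ticks from 10 to 40 by 5
```

Because msymbol(i) hides the markers, mcolor() has no visible effect here; it matters only when a visible symbol such as O is used.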
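Remember this format:
    gr tw (sc xinc age, sort connect(l) ms(i))
Put a space before sort and a space after tw.
I drew my graph with the simplified form:
    graph tw (sc xinc age, sort connect(l))
How do I get rid of the points? Just add ms(i).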
Case 2: binary variable, means, mutual control
    reg lninc educ hrs male if good==1;
    *Make graphs of the relationship between education and income, by sex, for those who work an average number of hours.;
Evidently everyone uses the mean to handle the combined effects of several variables like this.
    *First, get the expected values evaluated at mean hours.;
    *Get the mean hours worked, which I need below.;
    sum hrs if good==1;
    gen mhrs=r(mean) if good==1;
This is exactly what we do as well: get the corresponding mean.
Writing this is not so smooth either, since each sum returns only one mean at a time.
    gen xincm=_b[_cons]+_b[educ]*educ+_b[hrs]*mhrs+_b[male] if male==1 & good==1;
    gen xincf=_b[_cons]+_b[educ]*educ+_b[hrs]*mhrs if male==0 & good==1;
Having to write out this much is painful too. So many _b[] terms: torture.
Getting a mean by combining the sum, gen, and return commands
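A minimal sketch of that combination, using Stata's bundled auto data as an assumption in place of the survey file. The scalar names mpg_dom and mpg_for and the variable mmpg are introduced here for illustration only. It also shows one way around the "one mean at a time" limit: copy each r(mean) into a scalar before the next summarize overwrites it.

```stata
sysuse auto, clear

* summarize is an r-class command: it leaves its results in r().
summarize mpg if foreign==0
return list                      // shows r(N), r(mean), r(sd), ...

* r() is overwritten by the next r-class command, so copy the mean out first.
scalar mpg_dom = r(mean)

summarize mpg if foreign==1
scalar mpg_for = r(mean)

display "domestic mean = " mpg_dom "   foreign mean = " mpg_for

* A constant column holding the mean, in the style of gen mhrs=r(mean).
generate mmpg = mpg_for if foreign==1
```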
    graph twoway (scatter xincm educ, sort connect(l) clwidth(medthick) clpattern(solid) mcolor(black) msymbol(i)) (scatter xincf educ, sort connect(l) clwidth(medthick) clpattern(solid) mcolor(black) msymbol(O)), plotregion(style(none)) legend(label(1 "Males") label(2 "Females") cols(1) ring(0) position(11)) xlab(0(4)20) ylab(8(1)11) xtick(1(1)20) ytick(8(.25)11.5) l1("Expected ln(Income) in 2003") saving(ch07fig2.gph,replace)
I hate to clean the syntax
The predict command
Case 3: didn't expect it? Squared terms are everywhere.
    sysuse auto, clear
    generate weight2 = weight^2
    regress mpg weight weight2 foreign
    webuse newautos, clear
    generate weight2 = weight^2
    *Obtain out-of-sample prediction using another dataset
    predict mpg
But how would a Stata user know that?
    sysuse auto, clear
    generate weight2 = weight^2
    regress mpg weight weight2 foreign
    gen mpgf=_b[_cons]+_b[weight]*weight+_b[weight2]*weight2+_b[foreign]
    gen mpgd=_b[_cons]+_b[weight]*weight+_b[weight2]*weight2
    gr tw (sc mpgf weight, sort connect(l) ms(i)) (sc mpgd weight, sort connect(l) ms(i))
Plotting:
    lowess mpg weight
    gr tw (sc mpgf weight, sort connect(l) ms(i)) (sc mpgd weight, sort connect(l) ms(i)), legend(label(1 "foreign") label(2 "domestic"))
Stata's exploratory method: the lowess command
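The two hand-written _b[] lines can also be produced with predict. The following is a sketch under the same auto-data setup, not the method used in the slides; the names mpgf2, mpgd2, and foreign0 are introduced here for illustration. The idea is to set foreign to the value of interest, call predict, and then put the observed values back.

```stata
sysuse auto, clear
generate weight2 = weight^2
regress mpg weight weight2 foreign

* Keep the observed values so they can be restored afterwards.
generate byte foreign0 = foreign

replace foreign = 1
predict mpgf2                 // every car treated as foreign
replace foreign = 0
predict mpgd2                 // every car treated as domestic
replace foreign = foreign0
drop foreign0

twoway (line mpgf2 weight, sort) (line mpgd2 weight, sort), ///
    legend(label(1 "foreign") label(2 "domestic")) ytitle("y=mileage") xtitle("x=weight")
```

Here twoway line draws the same picture as a scatter with connect(l) and ms(i): a connected line with no visible markers.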
Edit the graph
    set scheme economist
    gr tw (sc mpgf weight, sort connect(l) ms(i)) (sc mpgd weight, sort connect(l) ms(i)), legend(label(1 "foreign") label(2 "domestic")) ytitle("y=mileage") xtitle("x=weight")
Contents
The lowess command: locally weighted regression
Lowess is a statistical technique for plotting a smooth curve through a set of data points in a scattergram. lowess carries out a locally weighted regression of yvar on xvar, displays the graph, and optionally saves the smoothed variable.
A scattergram is a plot of data points with a predictor variable on the x-axis and a criterion variable on the y-axis. Lowess is a locally weighted scatterplot smoothing technique. Each smoothed value is determined by a linear polynomial fitted to the data within a particular span of values of the predictor variable, giving the most weight to the central value of the span, less and less weight to more distant values, and zero weight to values outside the span. The span is then moved along the x-axis and a new smoothed value is computed. The size of the span is set by a tension factor that determines the proportion of the data points included in the span.
Warning: lowess is computationally intensive and may therefore take a long time to run on a slow computer. Lowess calculations on 1,000 observations, for instance, require performing 1,000 regressions.
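In Stata the span described above is controlled by the bwidth() option of lowess. A minimal sketch on the bundled auto data (an assumption; the variable names mpg_lo4 and mpg_lo8 are introduced here) that saves two smooths with different spans and overlays them:

```stata
sysuse auto, clear

* bwidth() is the share of the data used for each local fit (Stata's default is 0.8).
lowess mpg weight, bwidth(0.4) generate(mpg_lo4) nograph
lowess mpg weight, bwidth(0.8) generate(mpg_lo8) nograph

* Overlay the two smooths on the raw scatter to see the effect of the span.
twoway (scatter mpg weight)          ///
       (line mpg_lo4 weight, sort)   ///
       (line mpg_lo8 weight, sort),  ///
       legend(order(2 "bwidth(0.4)" 3 "bwidth(0.8)"))
```

A smaller span follows the data more closely; a larger span gives a smoother curve.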
Case 1: the relationship between frequency of mobile internet use and personal monthly income
    gr tw (sc b2am inc, sort connect(l) ms(i)), ytitle("y= b2am=_b[_cons]+_b[inc]*inc+_b[inc2]*inc2") xtitle("x=inc")
Case 2: China 07.dta
    gr tw lowess newsnow income
    lowess newsnow income
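The approaches are actually all much the same. It seems everyone works from the mean in this way; there is nothing mysterious about it. What differs is that Treiman's do-file is a great help: it lets you reproduce his work completely. That is a remarkable insight.
Lawrence C. Hamilton, Statistics with Stata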
Use graph combine to compare the effects of different x variables
    graph combine fig08_11.gph fig08_12.gph, ycommon cols(2) scale(1.25)
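The .gph files named on the slide are not available here, so the following is only a sketch of the same idea on the bundled auto data, with hypothetical in-memory graph names by_weight and by_length. It builds two graphs that share a y variable and places them side by side on a common y scale.

```stata
sysuse auto, clear

* Two graphs of the same outcome against different x variables, kept in memory.
twoway (scatter mpg weight) (lfit mpg weight), legend(off) name(by_weight, replace)
twoway (scatter mpg length) (lfit mpg length), legend(off) name(by_length, replace)

* ycommon forces a shared y-axis scale so the two slopes can be compared by eye.
graph combine by_weight by_length, ycommon cols(2)
```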
Contents
The nl command: I don't like it; it is too rigid.
Case 1
    nl exp2: y1 x
    predict yhat1
    gr tw (sc y1 x) (line yhat1 x, sort), legend(off) ytitle("y1=10*1.03^x+e") xtitle("x")
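The data behind this case are not included in the slides, so the sketch below simulates them from the formula in the y-axis title. The sample size, seed, and error distribution (normal with standard deviation 5) are assumptions made here, and rnormal() requires a Stata release that provides it (10.1 or later).

```stata
clear
set obs 100
set seed 2009
generate x = _n

* Assumed error term: normal with sd 5 (the slides do not say).
generate y1 = 10*1.03^x + rnormal(0, 5)

nl exp2: y1 x          // built-in two-parameter exponential: y1 = b1*b2^x
predict yhat1          // fitted values from the nonlinear fit
twoway (scatter y1 x) (line yhat1 x, sort), legend(off) ytitle("y1=10*1.03^x+e") xtitle("x")
```

If the fit converges, the estimates of b1 and b2 should land near 10 and 1.03, the values used to generate the data.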
Case 2
Stata Learning From Treiman
