R workshop i r basic (4th time)

NYC Data Science Academy, NYC Open Data Meetup, Big Data, Data Science, NYC, Vivian Zhang, SupStat Inc, R programming, R workshop, ggplot2

Published in: Technology, Business
Transcript

  • 1. R Workshop I: get to know the NYC open data portal and start to use R. Vivian Zhang for the NYC-open-data meetup, http://www.meetup.com/NYC-Open-Data/. R Workshop I, http://www.nycopendata.com/RworkshopI/index.html#1 (1 of 27, 6/13/14, 1:50 PM)
  • 2. Overview: NYC open data portal; RStudio; R; GitHub; hack time
  • 3. Advantage of using RStudio. Ease of use: install and load R packages; keep track of the R dev version; download GitHub repositories; debug faster
  • 4. diamonds subsetting example
      require(ggplot2)
      head(diamonds)
      ##   carat       cut color clarity depth table price    x    y    z
      ## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
      ## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
      ## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
      ## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
      ## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
      ## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
  • 5. diamonds subsetting example
      head(diamonds[-1, ])
      ##   carat       cut color clarity depth table price    x    y    z
      ## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
      ## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
      ## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
      ## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
      ## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
      ## 7  0.24 Very Good     I    VVS1  62.3    57   336 3.95 3.98 2.47
  • 6. diamonds subsetting example
      head(diamonds[, -1])
      ##         cut color clarity depth table price    x    y    z
      ## 1     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
      ## 2   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
      ## 3      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
      ## 4   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
      ## 5      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
      ## 6 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
  • 7. diamonds subsetting example
      head(diamonds[c(1, 2), ])
      ##   carat     cut color clarity depth table price    x    y    z
      ## 1  0.23   Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
      ## 2  0.21 Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
  • 8. diamonds subsetting example
      names(diamonds)
      ##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"
      ##  [8] "x"       "y"       "z"
      head(diamonds[, c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)])
      ##   carat       cut price
      ## 1  0.23     Ideal   326
      ## 2  0.21   Premium   326
      ## 3  0.23      Good   327
      ## 4  0.29   Premium   334
      ## 5  0.31      Good   335
      ## 6  0.24 Very Good   336
  • 9. diamonds subsetting example
      names(diamonds)
      ##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"
      ##  [8] "x"       "y"       "z"
      head(diamonds$carat)
      ## [1] 0.23 0.21 0.23 0.29 0.31 0.24
      diamonds[diamonds$price == max(diamonds$price), ]
      ##       carat     cut color clarity depth table price   x    y    z
      ## 27750  2.29 Premium     I     VS2  60.8    60 18823 8.5 8.47 5.16
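The row lookup by maximum price on slide 9 can be sketched in base R as follows. This is only an illustration: the `gems` data frame and its values are made up so the snippet runs without ggplot2. It contrasts the logical comparison used on the slide, which keeps every tied row, with `which.max()`, which returns only the first maximum.

```r
# Hypothetical toy data standing in for diamonds (values are made up).
gems <- data.frame(
  carat = c(0.3, 1.2, 2.0, 2.1),
  price = c(400, 5000, 18000, 18000)
)

# Logical comparison: keeps every row whose price equals the maximum (ties included).
all_max <- gems[gems$price == max(gems$price), ]

# which.max(): returns the index of the first maximum only.
first_max <- gems[which.max(gems$price), ]

print(all_max)
print(first_max)
```

If the column may contain missing values, `max(gems$price, na.rm = TRUE)` is probably the safer call, since `max()` otherwise returns `NA`.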
  • 10. reading and subsetting data in R. Index types:
      - blank: include all
      - integer: +: include; -: exclude
      - logical: include TRUEs
      - character: lookup by name
    Source: Hadley Wickham
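A minimal sketch of the four index types from slide 10 (blank, integer, logical, character), applied to a small data frame. The `df` object and its values are hypothetical and exist only so the example runs in base R without any packages.

```r
# Hypothetical toy data frame (made-up values), used only to illustrate indexing.
df <- data.frame(
  carat = c(0.23, 0.21, 0.29),
  cut   = c("Ideal", "Premium", "Good"),
  price = c(326, 326, 334),
  stringsAsFactors = FALSE
)

all_rows   <- df[, ]                      # blank: include all rows and columns
first_two  <- df[c(1, 2), ]               # positive integers: include these rows
drop_first <- df[-1, ]                    # negative integers: exclude these rows
by_logical <- df[, c(TRUE, FALSE, TRUE)]  # logical: include columns where TRUE
by_name    <- df[, c("carat", "price")]   # character: look up columns by name

print(by_logical)
print(by_name)
```

Here the logical and character forms select the same two columns; lookup by name is usually easier to read and keeps working if the column order changes.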
  • 11. data structure in R (figure not captured in the transcript). Source: Hadley Wickham
  • 12. read in the open data. Functions: read.table(); read.csv()
      rodent1year <- read.csv("C:UserszhangsGoogle DriveR codeRworkshop311_Service_Requests_from_2010_...",
                              header = TRUE, sep = ",")
      # the file path above is truncated in the transcript and its path separators were lost
      dim(rodent1year)
      summary(rodent1year)
      table(rodent1year$Borough)
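Because the file path on slide 12 is truncated, the same read-and-inspect steps can be tried on a stand-in file. The sketch below writes a tiny CSV to a temporary file first; the column names and rows are hypothetical, not the real 311 service request data.

```r
# Write a small made-up CSV to a temporary file so the example is self-contained.
# Column names and values are hypothetical, not the real 311 data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Complaint.Type,Borough",
  "Rodent,BROOKLYN",
  "Rodent,MANHATTAN",
  "Rodent,BROOKLYN",
  "Rodent,QUEENS"
), csv_path)

# Read the file; header = TRUE and sep = "," are already the read.csv() defaults.
requests <- read.csv(csv_path, header = TRUE, sep = ",")

dim(requests)                              # number of rows and columns
summary(requests)                          # per-column summary
borough_counts <- table(requests$Borough)  # requests per borough
print(borough_counts)
```

On Windows, a real path needs either forward slashes or doubled backslashes inside the string, for example "C:/Users/..." rather than single backslashes.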
  • 13. plot diamonds. with() is a generic function that evaluates an expression in a local environment constructed from data. In ggplot2, aes() stands for "aesthetics", and the geom geom_point() is used to create scatterplots.
      with(diamonds, plot(carat, price))
      ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
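To make the description of with() concrete, here is a small base R sketch. The `stones` data frame is hypothetical. It shows that with() lets column names be used directly inside the expression, giving the same result as spelling out `stones$` each time.

```r
# Hypothetical toy data (made-up values).
stones <- data.frame(carat = c(0.5, 1.0, 1.5), price = c(1500, 5000, 11000))

# Without with(): every column must be prefixed with the data frame name.
ppc_dollar <- stones$price / stones$carat

# With with(): the expression is evaluated using the columns of `stones`.
ppc_with <- with(stones, price / carat)

print(ppc_with)
```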
  • 14. plot diamonds. ggplot2 generates more sophisticated graphs than the traditional graphics package. Let us play with some color.
      ggplot(diamonds, aes(x = carat, y = price, colour = cut)) + geom_point()
  • 15. plot diamonds. Instead of fitting a linear relation, we try to fit a log-linear relation. log(price) is quite linear in log(carat). Bingo!
      ggplot(diamonds, aes(x = log(carat), y = log(price), colour = cut)) + geom_point()
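One way to see why the log-log plot looks linear: if price follows a power law, price = a * carat^b, then log(price) = log(a) + b * log(carat), which is a straight line with slope b. The sketch below checks this on simulated data only; the constants 4000 and 1.7 and the noise level are arbitrary assumptions, not estimates from the diamonds dataset.

```r
# Simulated data following an assumed power law (constants are arbitrary).
set.seed(1)
carat <- runif(200, min = 0.2, max = 3)
price <- 4000 * carat^1.7 * exp(rnorm(200, sd = 0.1))

# Fit a straight line on the log-log scale.
fit <- lm(log(price) ~ log(carat))

slope     <- coef(fit)[["log(carat)"]]   # should be close to the exponent, 1.7
intercept <- coef(fit)[["(Intercept)"]]  # should be close to log(4000)

print(coef(fit))
```

The same call with `data = diamonds` would fit the workshop data, assuming ggplot2 is loaded.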
  • 16. plot diamonds. As the color letters go from D to J, the diamond becomes more and more yellow. The clarity grades "SI" (slightly included), "VS" (very slightly included) and "VVS" (very, very slightly included) describe the size of "internal imperfections" in the diamonds; the number beside each grade ranks stones within it. "IF" is internally flawless.
      ggplot(diamonds, aes(x = log(carat), y = log(price), colour = cut)) + geom_point() + facet_grid(clarity ~ color)
  • 17. plot diamonds. Let us go back to the original scale. The bottom-left panel shows price vs carat for the whitest, internally flawless diamonds. The upper-right panel shows price vs carat for the yellowest and most flawed diamonds.
      ggplot(diamonds, aes(x = carat, y = price, colour = cut)) + geom_point() + facet_grid(clarity ~ color)
  • 18. plot diamonds. As we would expect, for diamonds at the same level of clarity (compare along a row), the price rises faster with carat for white stones (bottom left) than for yellow stones (bottom right). And for diamonds of the same color (compare along a column), the price rises faster with carat for clear stones (bottom left) than for heavily flawed stones (upper left).
  • 19. plot diamonds. We facet the plot by one of these factor variables, clarity.
      ggplot(diamonds, aes(x = carat, y = price, colour = cut)) + geom_point() + facet_grid(clarity ~ .)
  • 20. good tip to generate plots. The same type of graph is used over and over again while each new component of ggplot2 is introduced and interpreted. It is a very effective way to display complex relationships in large, high-dimensional data. Remember, the key is to bring in only one change at a time. Source: http://gettinggeneticsdone.blogspot.com/2010/01/ggplot2-tutorial-scatterplots-in-series.html
  • 21. plot diamonds. Last, we fit a line for the original data and for the log-transformed data. The linear relation is nearly perfect for the log-transformed data if we ignore the few points at the two ends of the distribution.
      ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth()
      ggplot(diamonds, aes(x = log(carat), y = log(price))) + geo   # truncated in the transcript
  • 22. amazing NYTimes sample: http://timelyportfolio.github.io/rCharts_512paths/ Source: Timely Portfolio and NYTimes
  • 23. why do we use R. Dirk's example of the elegance and efficiency of R. Source: Dirk Eddelbuettel
  • 24. why do we use R. Dirk's example of the elegance and efficiency of R (continued). Source: Dirk Eddelbuettel
  • 25. hack time
      - download an open dataset using a filter
      - read it into RStudio
      - check the dimensions of the dataset
      - decide which columns you will use
      - plot it!
  • 26. Resources
      - R in a Nutshell - Joseph Adler
      - The Art of R Programming - Norman Matloff
      - ggplot2 - Elegant Graphics for Data Analysis - Hadley Wickham