R in Practice

Presentation from the first Banalytics meetup (28 May 2014) on connecting the statistical programming language R to Twitter to collect data, followed by a short analysis of that data.

Published in: Data & Analytics, Technology

R in Practice

  1. 28 May 2014 @rgavuliak roman.gavuliak@gmail.com
  2. R in practice: a freeware tool for analysis
  3. R2 • Free + open source • Statistics + graphs • "Programming language" • Unlimited packages • Vector operations
  4. R is ugly...
  5. R is ugly...
  6. R is ugly...
  7. ...but popular! Source: r4stats.com
  8. Data types • the classics (number, character, string) • vector • table (data.frame, data.table) • list
  9. What goes into R? • A <- 1 • B <- c("biela", "modra", "cervena") • C <- read.csv("platby.csv") • D <- dbGetQuery(dbConnect(MySQL()), "select * from xy") • Big data technologies (Hadoop, Cassandra...) • Social networks • NA
  10. What to do with it • > A + 1 [1] 2 • summary(C) • model <- lm(pageviews ~ navstevnost, data = stranka) • plot(stranka$navstevnost)
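The data types and basic operations from slides 8 to 10 can be tried without any external file or database. A minimal sketch, assuming a made-up `stranka` data frame: only the column names come from the slide, the numbers are invented for illustration.

```r
# Scalar, character vector, and a list, as on slide 8
A <- 1
B <- c("biela", "modra", "cervena")
L <- list(number = A, colours = B)

# Vector operations work element by element, no loop needed
visits <- c(10, 20, 30, 40)
doubled <- visits * 2

# Made-up data frame standing in for the slide's `stranka` table
stranka <- data.frame(
  navstevnost = visits,
  pageviews   = c(31, 59, 92, 118)
)

# Missing values (NA) propagate unless explicitly dropped
with_gap <- c(1, NA, 3)
mean_naive <- mean(with_gap)                 # NA
mean_clean <- mean(with_gap, na.rm = TRUE)   # 2

# Linear model of page views on visits, as on slide 10
model <- lm(pageviews ~ navstevnost, data = stranka)
print(A + 1)
print(summary(stranka))
print(coef(model))
```

R is case sensitive, so `A` and `a` are different objects, and `c` on its own refers to the built-in function `c()` rather than to a variable named `C`.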
  11. How to: R and Twitter #volilisme
  12. The journey starts at https://dev.twitter.com/
  13. Connecting to Twitter
  14. Creating a dummy application
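  15. Connecting R to Twitter
      install.packages("RCurl")
      library(RCurl)
      # Point RCurl at its bundled CA certificates so HTTPS requests can be verified
      options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
      install.packages("twitteR")
      library(twitteR)
      reqURL <- "https://api.twitter.com/oauth/request_token"
      accessURL <- "https://api.twitter.com/oauth/access_token"
      authURL <- "https://api.twitter.com/oauth/authorize"
      apiKey <- "xxxxxxxxxxxxxxxxxxxxxx"
      apiSecret <- "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      # The slide text never creates twitCred; assumed here to be an OAuthFactory object from the ROAuth package
      twitCred <- OAuthFactory$new(consumerKey = apiKey, consumerSecret = apiSecret,
                                   requestURL = reqURL, accessURL = accessURL, authURL = authURL)
      twitCred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
      registerTwitterOAuth(twitCred)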
  16. Data
      volbyEU <- searchTwitter("#volilisme", n = 10000, since = "2014-05-23")
      tweety <- twListToDF(volbyEU)
      head(tweety)
  17. Data
      people <- lookupUsers(tweety$screenName)
      users <- twListToDF(people)
      head(users, 3)
  18. What are we interested in? 1. Tweet popularity 2. User popularity 3. Tweet location
  19. Tweet popularity • Number of retweets • Number of favorites
  20. Tweet popularity
      summary(tweety$retweetCount)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
            0       0       0   4.008       3      20
      summary(tweety$favoriteCount)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
            0       0       0  0.3525       0       6
  21. Tweet popularity
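The retweet summary on slide 20 has a median of 0 but a mean of about 4, which is typical of count data where a few tweets collect most of the attention. A small sketch with invented counts (not the #volilisme data) shows the same pattern and one way to quantify it:

```r
# Invented retweet counts: most tweets get nothing, a few get many
retweets <- c(0, 0, 0, 0, 0, 0, 1, 2, 5, 20)

print(summary(retweets))

# Share of tweets that were never retweeted
share_zero <- mean(retweets == 0)

# Share of all retweets collected by the single most popular tweet
share_top <- max(retweets) / sum(retweets)

# A frequency table is often easier to read than quantiles for small counts
counts <- table(retweets)
print(counts)
```

With data this skewed, the median and the share of zeros usually describe a "typical" tweet better than the mean does.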
  22. User popularity • Number of followers • Number of favorites
  23. User popularity
      summary(users$followersCount)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
            4      71   186.5    1286   620.5   31151
      summary(users$favoritesCount)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
            0     9.5      63     315     316  223700
  24. User popularity
  25. User popularity
      plot(users$favoritesCount ~ users$followersCount, main = "Followers vs favorites",
           xlab = "Total followers", ylab = "Total favorites", col = "blue")
  26. User popularity
      plot(log(users$favoritesCount + 1) ~ log(users$followersCount + 1),
           xlab = "log Total followers", ylab = "log Total favorites",
           col = "orange", main = "Followers vs favorites", cex = 1.5)
  27. User popularity
      Linearny_model <- lm(log(users$favoritesCount + 1) ~ log(users$followersCount + 1))
      summary(Linearny_model)
  28. User popularity
      abline(Linearny_model, col = "red")
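The log-log model from slides 26 to 28 can be reproduced end to end on simulated data. This is a sketch under stated assumptions: the `users` data frame below is randomly generated, not the one returned by `lookupUsers()`, the helper columns `logFollowers` and `logFavorites` are introduced only for this example, and `log1p(x)` is used as a shorthand for `log(x + 1)`.

```r
set.seed(42)

# Simulated users: followers are heavy-tailed, favorites grow with followers
n <- 200
followersCount <- round(rlnorm(n, meanlog = 5, sdlog = 1.5))
favoritesCount <- round(followersCount^0.8 * rlnorm(n, meanlog = 0, sdlog = 0.3))
users <- data.frame(followersCount, favoritesCount)

# log1p(x) equals log(x + 1), so zero counts stay finite
users$logFollowers <- log1p(users$followersCount)
users$logFavorites <- log1p(users$favoritesCount)

# Same model as on slide 27, fitted on the helper columns
Linearny_model <- lm(logFavorites ~ logFollowers, data = users)
slope <- coef(Linearny_model)[["logFollowers"]]
r2 <- summary(Linearny_model)$r.squared

# Scatter plot with the fitted line drawn on top, as on slide 28
plot(logFavorites ~ logFollowers, data = users, col = "orange",
     xlab = "log Total followers", ylab = "log Total favorites",
     main = "Followers vs favorites")
abline(Linearny_model, col = "red")
```

On a log-log scale the slope can be read roughly as an elasticity: a slope near 0.8 would mean that 1% more followers goes with about 0.8% more favorites. The `+ 1` shift makes this only approximate for small counts.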
  29. Tweet location
      nrow(tweety[!is.na(tweety$longitude), ])
      4 tweets have a location
  30. Tweet location
      table(tweety$longitude)
      class(tweety$longitude)
      # Convert the coordinate columns to numeric
      tweety$longitude <- as.numeric(tweety$longitude)
      tweety$latitude <- as.numeric(tweety$latitude)
      install.packages("ggmap")
      library(ggplot2)
      library(ggmap)
      # Centre the map on the mean position of the geotagged tweets
      center <- paste(mean(tweety$latitude, na.rm = TRUE), mean(tweety$longitude, na.rm = TRUE), sep = " ")
      map <- get_map(location = center, zoom = 9, maptype = "terrain", source = "google")
      vysledna_mapa <- ggmap(map)
      # Label each geotagged tweet with its author's handle
      vysledna_mapa <- vysledna_mapa +
        geom_text(data = tweety, aes(x = longitude, y = latitude, label = paste("@", screenName, sep = "")),
                  colour = "purple", size = 5, hjust = 0, vjust = 0) +
        theme(legend.position = "none")
      vysledna_mapa <- vysledna_mapa +
        geom_point(data = tweety, aes(x = longitude, y = latitude), colour = "purple", size = 2, na.rm = TRUE)
      vysledna_mapa
  31. Tweet location
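Only a handful of tweets carry coordinates, so it can help to isolate them before mapping. A sketch with a made-up `tweety` data frame (screen names and coordinates are invented, and `geo` is a name introduced for this example); it stops short of `get_map()`, which needs network access:

```r
# Made-up tweets: coordinates stored as character strings, mostly missing
tweety <- data.frame(
  screenName = c("user_a", "user_b", "user_c", "user_d"),
  longitude  = c("17.11", NA, "21.26", NA),
  latitude   = c("48.15", NA, "48.72", NA),
  stringsAsFactors = FALSE
)

tweety$longitude <- as.numeric(tweety$longitude)
tweety$latitude  <- as.numeric(tweety$latitude)

# Keep only rows where both coordinates are present
geo <- tweety[!is.na(tweety$longitude) & !is.na(tweety$latitude), ]
n_geo <- nrow(geo)

# Map centre as a numeric longitude/latitude pair
center <- c(lon = mean(geo$longitude), lat = mean(geo$latitude))

# Labels for the map, without a space after the @ sign
geo$label <- paste0("@", geo$screenName)
print(geo)
print(center)
```

Plotting from the filtered `geo` table rather than the full `tweety` table avoids the missing-value warnings that ggplot2 raises for rows without coordinates. A numeric pair in longitude/latitude order is one of the forms `get_map()` accepts for `location`.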
  32. Conclusions • Failure? • Choosing the target audience • Call to action
  33. Sources
      http://www.r-bloggers.com/r-text-mining-on-twitter-prayformh370-malaysia-airlines/
      http://cran.r-project.org/web/packages/twitteR/twitteR.pdf
      https://dev.twitter.com/docs/platform-objects/tweets
      https://dev.twitter.com/docs/platform-objects/users
      https://stackoverflow.com/questions/14095495/plotting-coordinates-of-multiple-points-at-google-map-in-r
  34. How to learn R? • R programming (https://www.coursera.org/course/rprog) • https://www.datacamp.com/courses/introduction-to-r • http://tryr.codeschool.com/
  35. Thank you for your attention! Questions? Comments? @rgavuliak roman.gavuliak@gmail.com
