Carma internet research module: Future data collection


Published in: Education, Technology, Business

  1. Future Data Collection. CARMA Internet Research Module, Jeff Stanton
  2. Several Promising Environments and Techniques
     • Visual Surveys
     • Audio and video interviewing
     • Virtual Worlds
     • Web scraping
     • Network extraction and mapping
     • Polls Everywhere
     • Mobility
  3. Visual Surveys. Visual DNA: try:
  4. Visual Surveys
     Provide an engaging alternative to text-based surveys; more fun for respondents.
     Require considerable set-up time: each screen is like an item, each picture is like an item response, and every item and response must be keyed against one or more criteria.
     Example: The previous page, “How do you approach stress,” could be keyed against other subjective or objective measures of stress, coping, general health, immune response, etc.
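The keying step described on this slide can be sketched in R. This is only an illustration: the picture names, the criterion columns (`coping`, `activity`), and the weights are invented and do not come from Visual DNA or from the slides.

```r
# Hypothetical scoring key: one row per picture (item response) on a screen (item).
# Criterion names and weights are invented for illustration only.
key <- data.frame(
  item     = c("stress", "stress", "stress"),
  picture  = c("yoga", "running", "tv"),
  coping   = c(2, 2, 0),
  activity = c(1, 2, 0),
  stringsAsFactors = FALSE
)

# Hypothetical responses: the picture each respondent picked for the item.
responses <- data.frame(
  respondent = c("r1", "r2", "r3"),
  item       = c("stress", "stress", "stress"),
  picture    = c("tv", "running", "yoga"),
  stringsAsFactors = FALSE
)

# Join each choice to its key row, then sum the weights per respondent.
scored <- merge(responses, key, by = c("item", "picture"))
scores <- aggregate(cbind(coping, activity) ~ respondent, data = scored, FUN = sum)
print(scores)
```

With more screens, the same merge-and-sum step would accumulate weights across items, which is why every picture needs a row in the key before data collection starts.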
  5. Audio and Video Interviewing
     Methods: Structured, semi-structured, and unstructured interviewing; focus groups
     Products: Skype, WebEx, Adobe Connect, Cisco Telepresence
     Advantages: Reduced travel costs, speed
     Disadvantages: High bandwidth, user technology requirements, unreliable connections
  6. Virtual Worlds. VastPark:
  7. Virtual Worlds
     Combine text and audio chat with social networking and 3D model building
     Methods: Structured, semi-structured, and unstructured interviewing; focus groups; unobtrusive observation; participant observation; possibly some experimental perceptual, cooperative, or navigational tasks
     Products: VastPark, OpenSim, EduSim, Teleplace
     Advantages: Speed, low cost
     Disadvantages: Steep learning curve; high bandwidth; user technology requirements; unreliable connections
  8. Web Scraping
  9. Web Scraping
     Retrieval and processing of text or images, e.g., from blogs; processing may include semantic analysis of people, events, emotions
     Methods: Archival document analysis
     Products: Hundreds of commercial products, mainly focused on brand, reputation, and marketing; open source product: WebHarvest
     Advantages: Data are plentiful and cover a wide range of topics
     Disadvantages: Technology hard to master; even after considerable automated processing, analysis has an intensive, qualitative flavor
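A minimal sketch of the "processing" half of web scraping, using only base R. The `page` string is a hypothetical stand-in for a retrieved blog page; a real project would first fetch the page and would need far more robust HTML handling than a single regular expression.

```r
# Hypothetical blog snippet standing in for a retrieved page.
page <- "<html><body><h1>Solar news</h1><p>Solar panels are cheap. Solar is growing!</p></body></html>"

# Remove markup, lower-case the text, and split it into word tokens.
text  <- gsub("<[^>]+>", " ", page)
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[nchar(words) > 0]

# Count word frequencies, most frequent first.
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 5))
```

Counts like these are only a starting point; as the slide notes, interpreting them still takes qualitative judgment.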
  10. Make a Wordcloud with Twitter and R
     • Download R, the open source statistical platform; for more fun, also download R-Studio; both are available for Windows, Mac, and Linux
     • You will need four packages to make a word cloud: twitteR, stringr, tm, and wordcloud
       – Use the install.packages() and library() commands to prepare packages for use in R
     • Code appears on the following page; explanation is in my free eBook, Introduction to Data Science, on the iTunes Bookstore
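The package preparation step might look like the sketch below. The variable names `needed` and `notInstalled` are illustrative, and the install and load lines are commented out so the snippet has no side effects until you choose to run them.

```r
# Packages named on the slide.
needed <- c("twitteR", "stringr", "tm", "wordcloud")

# Find which of them are not yet installed on this machine.
isInstalled  <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
notInstalled <- needed[!isInstalled]
print(notInstalled)

# Install only what is missing (uncomment to run; needs a network connection).
# install.packages(notInstalled)

# Load all four packages once they are installed.
# for (pkg in needed) library(pkg, character.only = TRUE)
```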
  11. Wordcloud code
     library(twitteR); library(stringr); library(tm); library(wordcloud)

     # TweetFrame() - Return a dataframe based on a search of Twitter
     TweetFrame <- function(searchTerm, maxTweets) {
       tweetList <- searchTwitter(searchTerm, n = maxTweets)
       # Turn each tweet into a one-row data frame, then stack the rows
       tweetDF <- do.call("rbind", lapply(tweetList, as.data.frame))
       # This last step sorts the tweets in arrival order
       return(tweetDF[order(as.integer(tweetDF$created)), ])
     }

     # CleanTweets() - Takes the junk out of a vector of tweet texts
     CleanTweets <- function(tweets) {
       # Collapse double spaces
       tweets <- str_replace_all(tweets, "  ", " ")
       # Remove shortened links; the URL prefix was lost from the slide text,
       # so the "http://t.co/" prefix here is an assumption
       tweets <- str_replace_all(tweets, "http://t.co/[a-z,A-Z,0-9]{8}", "")
       # Remove the retweet header (at most one per tweet)
       tweets <- str_replace(tweets, "RT @[a-z,A-Z]*: ", "")
       # Remove hashtags, then references to screen names
       tweets <- str_replace_all(tweets, "#[a-z,A-Z]*", "")
       tweets <- str_replace_all(tweets, "@[a-z,A-Z]*", "")
       return(tweets)
     }

     # Command line code
     tweetDF <- TweetFrame("#yourhashtag", 100)
     cleanText <- CleanTweets(tweetDF$text)
     tweetCorpus <- Corpus(VectorSource(cleanText))
     tweetTDM <- TermDocumentMatrix(tweetCorpus)
     tdMatrix <- as.matrix(tweetTDM)
     sortedMatrix <- sort(rowSums(tdMatrix), decreasing = TRUE)
     cloudFrame <- data.frame(word = names(sortedMatrix), freq = sortedMatrix)
     wordcloud(cloudFrame$word, cloudFrame$freq)
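To try the cleaning and counting logic without a Twitter connection, the sketch below reproduces it in base R on a few invented tweets. `CleanTweetsBase()` and `sampleTweets` are hypothetical names introduced here, not part of the slides, and the token count is only a rough stand-in for the term-document matrix that tm builds (tm applies its own tokenizing rules).

```r
# Hypothetical tweets standing in for tweetDF$text.
sampleTweets <- c(
  "RT @someone: Solar  power is booming #solar",
  "@friend solar panels on every roof #energy",
  "Power prices fall as solar grows"
)

# Base-R counterpart of CleanTweets(): gsub() replaces every match, sub() only the first.
CleanTweetsBase <- function(tweets) {
  tweets <- gsub("  ", " ", tweets)             # collapse double spaces
  tweets <- sub("RT @[a-zA-Z]*: ", "", tweets)  # retweet header
  tweets <- gsub("#[a-zA-Z]*", "", tweets)      # hashtags
  tweets <- gsub("@[a-zA-Z]*", "", tweets)      # screen names
  tweets
}

cleanText <- CleanTweetsBase(sampleTweets)

# Tokenize and count words across all tweets, most frequent first.
tokens <- unlist(strsplit(tolower(cleanText), "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0]
sortedCounts <- sort(table(tokens), decreasing = TRUE)
cloudFrame <- data.frame(word = names(sortedCounts),
                         freq = as.integer(sortedCounts),
                         stringsAsFactors = FALSE)
print(head(cloudFrame))
```

A data frame shaped like `cloudFrame` is what the `wordcloud()` call on the slide takes as its word and frequency arguments.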
  12. Example Wordcloud: Hashtag “#solar”
  13. Network Mapping
  14. Mapping Social Networks
     Nicholas Christakis of the Framingham Heart Study has shown the power of social networks to influence a variety of health outcomes
     Methods: Traditional self-report and objective measures; topographical measures such as network centrality; “neighbor” measures
     Products: Depends on data types; TouchGraph is a network web search engine; InFlow; UCInet; See: ware
     Advantages: Meaningful improvement in predictive capability
     Disadvantages: Intensive technique requires careful planning and setup; data collection difficult and time consuming
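Two of the measures named on this slide, network centrality and a "neighbor" measure, can be illustrated in base R. The people, ties, and health scores below are invented, and degree centrality is just one of several centrality definitions; dedicated network packages offer many more.

```r
# Hypothetical undirected friendship ties, one row per tie.
edges <- data.frame(
  from = c("Ann", "Ann", "Bob", "Cat", "Cat"),
  to   = c("Bob", "Cat", "Cat", "Dan", "Eve"),
  stringsAsFactors = FALSE
)

# Degree centrality: the number of ties each person has.
degree <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)
print(degree)

# Hypothetical self-reported health score for each person.
health <- c(Ann = 7, Bob = 5, Cat = 8, Dan = 4, Eve = 6)

# Everyone directly tied to a given person, in either direction.
neighbors <- function(person) {
  unique(c(edges$to[edges$from == person], edges$from[edges$to == person]))
}

# A simple "neighbor" measure: mean health score among each person's neighbors.
neighborMean <- sapply(names(health), function(p) mean(health[neighbors(p)]))
print(neighborMean)
```

A neighbor measure like this could then be entered as a predictor alongside a person's own self-report data.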
  15. Facebook Polls
  16. Embedded Polls
     Collection of short-format survey data from social networking and membership sites
     Methods: Primarily standard, closed-ended self-report; single item scales
     Products: Example: Vizu provides a “widget” that allows embedding of polls on Facebook pages
     Advantages: Quick, cheap, possible to get a large sample in a short time
     Disadvantages: Difficult to control access, short format limits use of multi-item scales
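Because an embedded poll usually yields a single closed-ended item, the analysis can be as small as a tally. A sketch with invented responses (the `votes` vector is hypothetical, not output from Vizu or Facebook):

```r
# Hypothetical responses to one closed-ended poll item.
votes <- c("Yes", "No", "Yes", "Unsure", "Yes", "No", "Yes", "Yes")

# Counts and percentage shares for each response option.
counts <- table(votes)
shares <- round(100 * prop.table(counts), 1)
print(counts)
print(shares)
```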
  17. Mobility
  18. Data Collection from Mobile Devices
     Using smartphones and other mobile devices as a basis for interacting with participants
     Methods: Primarily self-report but can include location and movement data
     Products: Example: Survey On The Spot allows location-aware surveys to be delivered to smartphones; TrailGuru collects route data from hikers and joggers
     Advantages: Platform is becoming ubiquitous; location data provide new options for understanding behavior
     Disadvantages: Privacy issues, small screen, complex programming interfaces
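Location and movement data typically arrive as a sequence of latitude and longitude fixes. As a sketch of one thing that can be done with them, the code below computes the length of a route using the haversine great-circle formula. The coordinates are invented, `haversineKm()` is a helper defined here for illustration, and 6371 km is the usual mean Earth radius approximation.

```r
# Great-circle distance in kilometres between points given in decimal degrees.
haversineKm <- function(lat1, lon1, lat2, lon2, radiusKm = 6371) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 + cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  2 * radiusKm * asin(pmin(1, sqrt(a)))
}

# Hypothetical route fixes recorded by a participant's phone.
route <- data.frame(lat = c(43.0380, 43.0400, 43.0425),
                    lon = c(-76.1340, -76.1310, -76.1300))

# Distance of each leg (fix i to fix i + 1), then the total route length.
n <- nrow(route)
legKm <- haversineKm(route$lat[-n], route$lon[-n], route$lat[-1], route$lon[-1])
totalKm <- sum(legKm)
print(round(totalKm, 3))
```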
  19. iPhone Fun: Reaching Mobile Participants
     • Micropayment system built into the platform
     • Feasible for short instruments
     • Can be tied to particular experiences, e.g., museum visits
     • Responses can be geotagged to support mapping