Training in Analytics, R and Social Media Analytics

A basic training on analyzing data using R and social media analytics.

Published in: Data & Analytics

  1. Basics of Analysis, Analytics and R. Ajay Ohri
  2. Why analysis ● Humans can count only so far ● We understand summarized information ● We understand graphs faster ● We need to take decisions ● Wrong decisions lead to huge costs
  3. Central Tendency ● What is the difference between mean and median? ● When to use which? ● What is expected value? ● When can the mean be misleading? Exercise: What is the average height of this class?
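
A small sketch of the mean versus median question, using made-up heights (the numbers are an assumption for illustration, not data from the class). One unusually tall value pulls the mean up while the median barely moves. The die example is one simple way to read "expected value".

```r
# Hypothetical heights in feet; the last value is an outlier.
heights <- c(5.1, 5.4, 5.5, 5.6, 5.8, 7.9)

mean_height   <- mean(heights)    # pulled upward by the outlier
median_height <- median(heights)  # middle of the sorted values

print(mean_height)
print(median_height)

# Expected value of a fair six-sided die: sum of value * probability.
die_faces     <- 1:6
expected_roll <- sum(die_faces * (1 / 6))
print(expected_roll)
```

When the data contain extreme values, the median is usually the safer summary to report.
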
  4. Grouped Means Exercise: What is the height of the class? What is the height of the class by gender? What is the height of the class by team? What is the height of the class by dark or light colored clothing? Cross tabs: exercise on mtcars
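
One possible way to work the mtcars exercise in base R. The choice of mpg as the measure and cyl, am and gear as grouping columns is an assumption; the slide only names the dataset.

```r
data(mtcars)

# Overall mean, the analogue of "height of the class".
overall_mpg <- mean(mtcars$mpg)

# Mean by one group, the analogue of "height by gender".
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Mean by two groups at once, returned as a data frame.
mpg_by_cyl_am <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)

# Cross tab: how many cars fall in each cylinder/gear combination.
cyl_by_gear <- table(cyl = mtcars$cyl, gear = mtcars$gear)

print(overall_mpg)
print(mpg_by_cyl)
print(mpg_by_cyl_am)
print(cyl_by_gear)
```
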
  5. Variance What is the range (max - min)? What is a quartile (4 quarters)? What is a decile (10 deciles)? Standard deviation is rarely used in the business world
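
A short sketch of range, quartiles and deciles in base R. Using the mpg column of mtcars is an assumption made only to have some numbers to summarize.

```r
data(mtcars)
mpg <- mtcars$mpg

# Range as a single number: max - min.
mpg_range <- max(mpg) - min(mpg)

# Quartile cut points (0%, 25%, 50%, 75%, 100%).
quartiles <- quantile(mpg, probs = seq(0, 1, by = 0.25))

# Decile cut points (10%, 20%, ..., 90%).
deciles <- quantile(mpg, probs = seq(0.1, 0.9, by = 0.1))

print(mpg_range)
print(quartiles)
print(deciles)
```
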
  6. Frequency Analysis and contingency tables

     Height range          Number of students   Cumulative number
     less than 5.0 feet            25                   25
     5.0–5.5 feet                  35                   60
     5.5–6.0 feet                  20                   80
     6.0–6.5 feet                  20                  100

              Dance   Sports   TV   Total
     Men          2       10    8      20
     Women       16        6    8      30
     Total       18       16   16      50

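
The two tables on this slide can be rebuilt in R roughly as follows. The numbers come from the slide; the object names are assumptions chosen for this sketch.

```r
# Frequency table: students per height range.
counts <- c("< 5.0 ft" = 25, "5.0-5.5 ft" = 35,
            "5.5-6.0 ft" = 20, "6.0-6.5 ft" = 20)

# Running total gives the cumulative column.
cumulative <- cumsum(counts)
print(cumulative)

# Contingency table: gender by preferred activity.
activity <- as.table(matrix(
  c(2, 10, 8,
    16, 6, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Men", "Women"),
                  Activity = c("Dance", "Sports", "TV"))
))

# addmargins() appends the row and column totals (labelled "Sum").
with_totals <- addmargins(activity)
print(with_totals)
```
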
  7. Histogram
  8. What is a distribution
  9. EDA (Exploratory Data Analysis): Box Plot
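
A minimal sketch of the histogram and box plot from the last three slides, drawn with base graphics. The mtcars columns and the number of breaks are assumptions for illustration.

```r
data(mtcars)

# Histogram: how the mpg values are distributed across bins.
mpg_hist <- hist(mtcars$mpg, breaks = 5,
                 main = "Histogram of mpg", xlab = "Miles per gallon")

# Box plot: median, quartiles and outliers of mpg for each cylinder count.
mpg_box <- boxplot(mpg ~ cyl, data = mtcars,
                   main = "mpg by number of cylinders",
                   xlab = "Cylinders", ylab = "Miles per gallon")

# Both functions also return the numbers behind the picture.
print(mpg_hist$counts)  # cars per bin
print(mpg_box$stats)    # five-number summary per group
```
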
  10. Analytics • What is analytics? • Where is it used? • How is it used? • What are some good practices?
  11. Analytics • What is analytics? – Study of data for helping with decision making using software • Where is it used? • How is it used? • What are some good practices?
  12. Analytics • What is analytics? • Where is it used? – Industries (like Pharma, BFSI, Telecom, Retail) • How is it used? – Use statistics and software • What are some good practices?
  13. Analytics • What is analytics? • Where is it used? • How is it used? • What are some good practices? – Learn one new thing extra from your competition every day. This is a fast-moving field. – Etc.
  14. What is Data Science
  15. Other Analytics Software • SAS (Base) et al • JMP • SPSS • Python • Octave • Clojure • Julia(?)
  16. Social Media Analytics Some examples: http://decisionstats.com/2013/12/04/top-fourteen-interfaces-in-social-media-and-web-analytics-on-the-internet/ Some use cases: http://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/ http://decisionstats.com/2013/09/11/using-twitter-data-with-r/
  17. What is R? http://www.r-project.org/ • Language – Object oriented – Open Source – Free – Widely used. Note on "object oriented": the concept of "objects" that have data fields (attributes that describe the object) and associated procedures known as methods. Objects, which are usually instances of classes, are used to interact with one another to design applications and computer programs
  18. Prerequisites • Installation of R http://cran.rstudio.com/bin/windows/base/ • RStudio • R Packages
  19. Prerequisites • Installation of R – Rtools • RStudio http://www.rstudio.com/products/rstudio/download/ • R Packages: install.packages(), update.packages(), library()
  20. Interfaces to R • Console: Default, Customization • IDE • GUI
  21. Demo: Basic Objects on R Console • + • - • log • exp • * • / • () Functions: ls() lists the objects in the workspace; rm("foo") removes the object named foo. Assignment: = or <- assigns a value to an object name. Hint: the Up arrow recalls the last typed command
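
The console demo can be tried with lines like these. The object names foo and bar are placeholders, as on the slide.

```r
# Arithmetic operators and grouping with parentheses.
total      <- 2 + 3
difference <- 7 - 4
product    <- 6 * 7
quotient   <- 10 / 4
grouped    <- (2 + 3) * 4

# log() is the natural logarithm; exp() is its inverse.
log_e <- log(exp(1))

# Assignment works with <- or =.
foo <- 42
bar = "hello"

# ls() lists objects in the workspace; rm() removes one by name.
print(ls())
rm("foo")
print(exists("foo"))  # foo is gone after rm()
```
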
  22. Functions and Loops • Loops: for (number in 1:5) { print(number) }
  23. Functions and Loops • Function: functionajay = function(a) (a^2 + 2*a + 1) Hint: Always match brackets. Each ( deserves a ), each { deserves a }, each [ deserves a ]
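
The function and loop from these two slides, laid out over several lines so the matching brackets are easy to see. Calling the function inside the loop is an addition for illustration.

```r
# Same function as on the slide: a^2 + 2a + 1, which equals (a + 1)^2.
functionajay <- function(a) {
  a^2 + 2 * a + 1
}

print(functionajay(3))  # 16

# R functions are vectorized, so a whole vector can be passed at once.
squares <- functionajay(1:5)
print(squares)

# The loop from the previous slide, now applying the function each time.
for (number in 1:5) {
  print(functionajay(number))
}
```
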
  24. Demo: Basic Objects on R Console • + • - • log • exp • * This is made clearer in the next slide. Functions: class() gives the class, dim() gives the dimensions, nrow() gives the number of rows, ncol() gives the number of columns, length() gives the length, str() gives the structure. Hint: the Up arrow recalls the last typed command
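
A quick way to try the inspection functions listed here. Using mtcars as the object to inspect is an assumption; any data frame would do.

```r
data(mtcars)

obj_class  <- class(mtcars)       # type of object
obj_dim    <- dim(mtcars)         # rows and columns together
n_rows     <- nrow(mtcars)
n_cols     <- ncol(mtcars)
n_elements <- length(mtcars)      # for a data frame: number of columns
n_values   <- length(mtcars$mpg)  # for a vector: number of values

print(obj_class)
print(obj_dim)

# str() prints a compact summary of every column.
str(mtcars)
```

Note that length() of a data frame counts columns, not rows, which often surprises beginners.
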
  25. Demo: Datasets on R Console • Hint: use data() to list all available datasets
  26. Demo: Datasets on R Console • Hint: use data() to list all available datasets. library(FOO) loads package "FOO"
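
A small sketch of the dataset hints. The datasets package ships with R, so it stands in for the placeholder package FOO here.

```r
# library() attaches a package; datasets is attached by default,
# so this call is only here to show the syntax.
library(datasets)

# data(package = ...) returns the catalogue of datasets in a package.
available <- data(package = "datasets")$results[, "Item"]
print(head(available))

# data(name) loads one dataset into the workspace.
data(mtcars)
first_rows <- head(mtcars, 3)
print(first_rows)
```
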
  27. Packages in R • CRAN • CRAN Views • R Documentation
  28. Documentation in R • Help: ? and ?? • CRAN Views • Package Help • Tips for Googling – Stack Overflow – Email Lists – Twitter – R Bloggers
  29. Graphical Interfaces to R • R Commander • Rattle • Deducer
  30. Overview of R Commander
  31. Demo R Commander – 3D Graphs
  32. Overview of Rattle
  33. Demo Rattle
  34. Overview of Deducer (with JGR)
  35. Demo Deducer • data() • data(mtcars)
  36. read.table()
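
A self-contained sketch of read.table(). The file contents and column names are made up; the temporary file exists only so the example runs without any external data.

```r
# Write a small comma-separated file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,height",
             "Asha,5.4",
             "Ben,5.9",
             "Chen,6.1"), csv_path)

# header = TRUE uses the first line as column names;
# sep = "," tells read.table() the fields are comma separated.
students <- read.table(csv_path, header = TRUE, sep = ",",
                       stringsAsFactors = FALSE)

print(students)
str(students)

# Clean up the temporary file.
unlink(csv_path)
```
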
  37. From Databases The RODBC package provides access to databases through an ODBC interface. The primary functions are • odbcConnect(dsn, uid="", pwd=""): open a connection to an ODBC database • sqlFetch(channel, sqltable): read a table from an ODBC database into a data frame. Hint: a good site to learn R is http://www.statmethods.net
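
A hedged sketch of how the two RODBC functions might be combined. The wrapper read_odbc_table(), the DSN "my_dsn" and the table "sales" are hypothetical; running the call needs the RODBC package and a configured ODBC data source, so the example only defines the function.

```r
# Hypothetical helper: open a connection, fetch one table, always close.
read_odbc_table <- function(dsn, table_name, uid = "", pwd = "") {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("Package 'RODBC' is not installed")
  }
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  # odbcConnect() returns -1 rather than an RODBC object on failure.
  if (!inherits(channel, "RODBC")) {
    stop("Could not open an ODBC connection to DSN: ", dsn)
  }
  # Close the connection even if sqlFetch() fails.
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlFetch(channel, table_name)
}

# Example call (not run; "my_dsn" and "sales" are placeholders):
# sales <- read_odbc_table("my_dsn", "sales")
```
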
  38. A Detour to SQL
  39. From Web (aka Web Scraping) • readLines Hint: R is case sensitive, so readlines is not the same as readLines. Hint: Use head() and tail() to inspect objects. Other packages are XML and RCurl. Case Study: http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
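
A minimal sketch of the readLines() workflow. To keep it runnable offline, it reads a small made-up HTML file from a temporary path; readLines() also accepts a URL string in place of the file path.

```r
# Stand-in for a downloaded page (contents are invented for illustration).
page_path <- tempfile(fileext = ".html")
writeLines(c("<html>",
             "<title>Scores</title>",
             "<p>Runs: 250</p>",
             "</html>"), page_path)

# readLines() returns one character string per line of the source.
page_lines <- readLines(page_path)

# head() and tail() give a quick look at either end of the object.
print(head(page_lines, 2))
print(tail(page_lines, 1))

# Pull out the line of interest and keep only its digits.
runs_line <- grep("Runs", page_lines, value = TRUE)
runs      <- as.numeric(gsub("[^0-9]", "", runs_line))
print(runs)

unlink(page_path)
```
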
  40. Inspecting Data Quality: Demo
  41. Inspecting Data Quality: Demo
  42. Data Selection: Demo Questions: How do I use multiple conditions (AND, OR)? Can I do away with the subset function? How do I select a random sample? Useful Link: http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/
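
One way to answer the three questions on this slide in base R. The mtcars columns and thresholds are assumptions for illustration.

```r
data(mtcars)

# Multiple conditions: & is AND, | is OR.
and_rows <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]
or_rows  <- mtcars[mtcars$cyl == 4 | mtcars$mpg > 30, ]

# subset() gives the same rows; bracket indexing above does away with it.
same_with_subset <- subset(mtcars, cyl == 4 & mpg > 30)

# Random sample of 5 rows; set.seed() makes the draw repeatable.
set.seed(42)
sample_rows <- mtcars[sample(nrow(mtcars), 5), ]

print(and_rows)
print(nrow(or_rows))
print(sample_rows)
```
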
  43. Data Exploration • Missing values are represented by NA in R • Demo – is.na – na.omit – na.rm
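
The three items in the demo can be tried like this. The scores vector and the small data frame are made-up examples.

```r
scores <- c(10, NA, 30, 20, NA)

# is.na() flags each missing value; summing the flags counts them.
missing_flags <- is.na(scores)
n_missing     <- sum(missing_flags)

# Most summaries return NA if any value is missing...
mean_with_na <- mean(scores)
# ...unless na.rm = TRUE tells them to drop NA first.
mean_no_na <- mean(scores, na.rm = TRUE)

# na.omit() removes missing values (or whole rows of a data frame).
clean_scores <- na.omit(scores)
df            <- data.frame(id = 1:3, score = c(10, NA, 30))
complete_rows <- na.omit(df)

print(n_missing)
print(mean_no_na)
print(complete_rows)
```
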
  44. Data Visualization Notes: Explaining Basic Types of Graphs, Customizing Graphs, Graph Output, Advanced Graphs, Facets, Grammar of Graphics, Data Visualization Rules
  45. Data Manipulation Demo Notes: 1. gsub 2. gsub with escape 3. as operator 4. is operator
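
A sketch of the four notes on the last slide, assuming "as operator" and "is operator" refer to the as.*() conversion and is.*() test functions. The price strings are invented for illustration.

```r
prices <- c("$1,200", "$350", "$4,075")

# 1. gsub: replace every comma with nothing.
no_commas <- gsub(",", "", prices)

# 2. gsub with escape: $ is a special regex character, so it needs \\.
no_dollar <- gsub("\\$", "", no_commas)

# 4. is.*: the cleaned values are still text at this point.
still_text <- is.character(no_dollar)

# 3. as.*: convert the text to numbers.
amounts    <- as.numeric(no_dollar)
now_number <- is.numeric(amounts)

print(amounts)
print(sum(amounts))
```
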
