Introduction to Data Science and R language

This presentation gives a basic introduction to Data Science and the R language, and shows how to visualize data using R.

  1. Introduction to Data Science and R language. 13 August 2013. Anju Gahlawat
  2. Index – Introduction to Data Science – Hidden Skills of Data Scientists – Failure of Current Statistical Tools like SAS and Excel – Introduction to R Language – R Basic Commands – Running SQL Server with R – Visualizing Data with R – Introduction to Shiny – Future of R
  3. Data Science: Data Science is all about telling a STORY from the data.
  4. Data Science deals with…
  5. 5 Hidden Skills for Data Scientists – Be Clear: Is Your Problem Really a Big Data Problem? – Communicating About Your Data – Invest in Interactive Analytics, not Reporting – Understand the Role and Quality of Human Evaluations of Data – Spend Time on the Plumbing
  6. Difference between Data Science and Big Data: Big data is concerned more with the engineering aspects of data, answering questions such as: – How do you store it? – How do you manipulate it? – How do you run parallelized computations on it? – How do you access it? – How do you mine it? Data science is more than that: – It looks at the algorithmic and mathematical aspects of extracting knowledge from data. – It applies advanced analytical tools and algorithms to generate predictive insights and new product innovations that are a direct result of the data.
  7. Shortcomings of current visualization and statistical tools – The most commonly used statistical software tools either fail completely or are too slow to be useful on huge data sets – Limited scalability – Limited flexibility to adopt new, fast, scalable algorithms – Problems printing charts in Excel: missing legend data, or sometimes a missing x or y axis – If there is a value in the upper-left corner of the data set (A1 in this case), Excel fails to chart the data correctly.
  8. Introduction to R – R is a computer language and run-time environment used for data manipulation, statistics, and graphics. – R comes with a wide range of standard statistical and graphical analyses built in, and it can be extended with user-developed packages. – R is an expression-based language. – Procedures written in C, C++, or Fortran can be interfaced for efficiency, and additional primitives can be written. Reference: “R, And the Rise of the Best Software Money Can’t Buy”
  9. R users rely on functions that statistical researchers have developed for them, but they can also write their own functions or modify existing ones to suit their needs.
  10. Why R?
  11. Contd…
  12. Getting started with R ▪ Latest version (as of August 2013): 3.0.1 for Windows ▪ Download the R setup from http://cran.r-project.org/bin/windows/base/ ▪ 51.5 MB setup file ▪ GUI for R: RStudio, latest version 0.97.551 ▪ Download RStudio from http://www.rstudio.com/ide/download/desktop ▪ 32.5 MB .exe file
  13. RStudio
  14. Sample R code – Read a data set into R (from a local file or a network URL): • bse <- read.csv("bse_table.csv", header = TRUE, sep = ",") – Examine the basic structure of the data
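  15. Running SQL Server with R – Install the RODBC package: install.packages("RODBC") – Load it and create an ODBC connection (replace [ODBC Name] with your data source name): library(RODBC); channel <- odbcConnect("[ODBC Name]") – Query a table: Tab1 <- sqlQuery(channel, "SELECT * FROM TabName") – Close the connection when finished: odbcClose(channel)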
  16. R code: plotting a graph • bse$Date <- as.Date(bse$Date, format = "%Y-%m-%d") • plot(bse$Date, bse$Open, type = "l", main = "BSE Data", col = "blue", xlab = "Periods", ylab = "Index", lwd = 2)
  17. Stock Analysis: Sample Graph
  18. Packages in R
  19. Some graphs made using R
  20. Introduction to Shiny: R Web UI • The R package Shiny from RStudio provides: – interactive web applications / dynamic HTML pages written in plain R – custom GUIs for your own needs – a website with R acting as the server
  21. What makes Shiny so special? – Very simple: ready-to-use components – Shiny is slick, producing interactive and pleasant-looking web UIs – Event-driven (reactive programming): outputs update automatically when inputs change, without reloading the browser page – Shiny user interfaces can be built entirely in R, or written directly in HTML, CSS, and JavaScript for more flexibility – A highly customizable slider widget with built-in support for animation – Pre-built output widgets for displaying plots, tables, and printed output of R objects – Fast bidirectional communication between the web browser and R using the websocket package
  22. Stock Analysis Using Shiny
  23. Current Market Trend of Statistical Languages
  24. Stats Related to R: Google Hits
  25. R is the most powerful and flexible statistical programming language in the world.
  26. Job trends in the statistical software market (jobs on Indeed.com, March 2012 vs. March 2013)
      Software     2012    2013   Difference   Ratio
      SAS         13234   12272         -961    0.93
      SPSS         3299    3289          -10    1
      R            1196    1693          497    1.42
      Minitab      1769    1615         -154    0.91
      Stata         842     898           56    1.07
      JMP           644     619          -25    0.96
      Statistica     61      71           10    1.17
      Systat         14      15            1    1.07
      BMDP            6      10            3    1.53
      [Chart: Trend of Jobs on Indeed.com in March 2012 and 2013]
  27. Final Words of Warning • “Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.” – Francois Pinard
  28. Visualization is only one slice of the R cake. R deals with: • Machine Learning • Social Media Analytics • Sentiment Analysis • Predictive Modeling • Network Analysis • Visualization • Time Series Analysis • Simulation • And a lot more. To be continued…
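Slide 8 describes R as an expression-based language. The following minimal sketch shows what that means in practice: constructs such as `if` and a braced `{ }` block are expressions that produce a value, so their result can be assigned directly. The variable names are illustrative only.

```r
# `if` is an expression in R: its value can be assigned directly.
x <- -3
sign_label <- if (x > 0) "positive" else "non-positive"

# A braced block evaluates to the value of its last expression.
total <- {
  a <- 2
  b <- 5
  a * b
}

print(sign_label)  # prints [1] "non-positive"
print(total)       # prints [1] 10
```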
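Slide 9 notes that users can write their own functions or modify existing ones. Here is a small sketch of both. `coef_var` and `mean_no_na` are hypothetical helper names chosen for illustration. Wrapping a base function with different defaults is just one simple way to adapt it.

```r
# Hypothetical user-defined function: coefficient of variation,
# i.e. the sample standard deviation divided by the mean.
coef_var <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Adapting an existing function: base mean() returns NA if any value is
# missing, so this hypothetical wrapper always drops missing values.
mean_no_na <- function(x) mean(x, na.rm = TRUE)

values <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
mean(values)        # NA because of the missing value
mean_no_na(values)  # 5
coef_var(values)    # sd / mean of the non-missing values
```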
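Slide 14 reads `bse_table.csv` and then examines the structure of the data. The actual file is not included here, so this sketch assumes it has `Date` and `Open` columns (the names used in slide 16's plotting code). It substitutes a few made-up placeholder rows, passed through the `text` argument of `read.csv()`. The calls `str()`, `head()`, `summary()` and `dim()` are standard base R ways to inspect a data frame.

```r
# Placeholder data standing in for bse_table.csv. The column names come from
# slide 16's code; the values are made up for illustration.
csv_text <- "Date,Open
2013-08-01,100.0
2013-08-02,101.5
2013-08-05,99.8
2013-08-06,102.3"

# With a real file or URL, use read.csv("bse_table.csv") instead of text = .
bse <- read.csv(text = csv_text, header = TRUE, sep = ",")

# Examine the basic structure of the data.
str(bse)       # column names, types and first values
head(bse)      # first rows
summary(bse)   # per-column summaries
dim(bse)       # number of rows and columns
```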
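Slide 16's two lines convert the `Date` column and draw a line chart of the opening index. Below is a self-contained version of those steps, again using made-up placeholder values instead of real BSE data. It writes the chart to a PDF in the session's temp directory (an assumption made so the sketch also runs without a screen). Interactively, you can call `plot()` directly.

```r
# Placeholder data shaped like the bse data frame (values are made up).
bse <- data.frame(
  Date = c("2013-08-01", "2013-08-02", "2013-08-05", "2013-08-06"),
  Open = c(100.0, 101.5, 99.8, 102.3)
)

# Convert the text dates to Date objects so the x axis becomes a time axis.
bse$Date <- as.Date(bse$Date, format = "%Y-%m-%d")

# Draw the line chart into a PDF file, then close the device.
out_file <- file.path(tempdir(), "bse_open.pdf")
pdf(out_file, width = 7, height = 4)
plot(bse$Date, bse$Open, type = "l", main = "BSE Data", col = "blue",
     xlab = "Periods", ylab = "Index", lwd = 2)
invisible(dev.off())
```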
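Slides 20 and 21 describe Shiny's reactive model and its animated slider widget. What follows is a minimal sketch of a Shiny app, assuming the `shiny` package is installed. The input and output IDs (`n`, `scatter`) and the random-data plot are illustrative choices, not taken from the slides. `renderPlot()` re-runs whenever `input$n` changes, and the new plot appears in the browser without a page reload.

```r
library(shiny)

# User interface: a slider with the built-in animation option, plus a plot.
ui <- fluidPage(
  titlePanel("Shiny sketch"),
  sliderInput("n", "Number of points:", min = 10, max = 200, value = 50,
              animate = TRUE),
  plotOutput("scatter")
)

# Server logic: this reactive plot is recomputed whenever input$n changes.
server <- function(input, output) {
  output$scatter <- renderPlot({
    set.seed(1)
    x <- rnorm(input$n)
    plot(x, type = "l", main = paste(input$n, "random values"))
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, launch it with: runApp(app)
```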
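In slide 26's table, the Difference and Ratio columns appear to compare 2013 with 2012. This sketch recomputes them from the counts as listed, using a base R data frame. Most rows reproduce the slide. A few differ slightly, for example SAS (-962 vs. -961) and the small counts for Statistica and BMDP. The slide presumably worked from unrounded figures, so treat the recomputed values as approximate.

```r
# 2012 and 2013 job counts as listed on slide 26 (Indeed.com, March).
jobs <- data.frame(
  Software = c("SAS", "SPSS", "R", "Minitab", "Stata", "JMP",
               "Statistica", "Systat", "BMDP"),
  y2012 = c(13234, 3299, 1196, 1769, 842, 644, 61, 14, 6),
  y2013 = c(12272, 3289, 1693, 1615, 898, 619, 71, 15, 10)
)

# Difference = 2013 minus 2012; Ratio = 2013 / 2012, rounded to 2 decimals.
jobs$Difference <- jobs$y2013 - jobs$y2012
jobs$Ratio <- round(jobs$y2013 / jobs$y2012, 2)

print(jobs)
```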
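Slide 28 lists Predictive Modeling among the things R handles. Here is a minimal sketch using base R's `lm()` and `predict()` on the built-in `mtcars` dataset. The dataset, the single-predictor model, and the 3000 lb example car are illustrative assumptions, not taken from the slides.

```r
# Built-in mtcars data: fuel economy (mpg) and weight (wt, in 1000 lb).
data(mtcars)

# Fit a simple linear regression of mpg on weight.
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)

# Predict fuel economy for a hypothetical car weighing 3000 lb (wt = 3).
new_car <- data.frame(wt = 3)
predict(fit, newdata = new_car)
```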
