Introduction to data science and R language
 


This presentation gives a basic introduction to data science and the R language, and shows how to visualize data with R.

    Presentation Transcript

    • Introduction to Data Science and R language, 13 August 2013, Anju Gahlawat
    • Index
      – Introduction to Data Science
      – Hidden skills of Data Scientist
      – Failure of Current Statistical tools like SAS and Excel
      – Introduction to R language
      – R Basic Commands
      – Running SQL server with R
      – Visualizing Data with R
      – Introduction to Shiny
      – Future of R
    • Data Science
      – Data science is all about telling a STORY from the data.
    • Data Science deals with…
    • 5 Hidden Skills for Data Scientists
      – Be Clear: Is Your Problem Really A Big Data Problem?
      – Communicating About Your Data
      – Invest in Interactive Analytics, not Reporting
      – Understand the Role and Quality of Human Evaluations of Data
      – Spend Time on the Plumbing
    • Difference between Data Science and Big Data
      – Big data is more concerned with the engineering components of data and with answering the following questions:
        ▪ How do you store it?
        ▪ How do you manipulate it?
        ▪ How do you do parallelized computations on it?
        ▪ How do you access it?
        ▪ How do you mine it?
      – But data science is more than that:
        ▪ It deals with the algorithmic and mathematical aspects of extracting knowledge from data.
        ▪ It applies advanced analytical tools and algorithms to generate predictive insights and new product innovations that are a direct result of the data.
    • Shortcomings of current visualization and statistical tools
      – The most commonly used statistical software tools either fail completely or are too slow to be useful on huge data sets
      – Limited scalability
      – Little flexibility to adopt new, fast, scalable algorithms
      – Problems printing charts in Excel: legend data, or sometimes the x or y axis, goes missing
      – If there is a value in the upper-left corner of the data set (A1 in this case), Excel fails to chart the data correctly
    • Introduction to R
      – R is a computer language and run-time environment used for data manipulation, statistics, and graphics.
      – Base R comes with a wide range of standard statistical and graphical analyses built in, and user-developed extension packages add more.
      – R is an expression-based language.
      – Procedures written in C, C++, or Fortran can be interfaced for efficiency, and additional primitives can be written.
      – “R, And the Rise of the Best Software Money Can’t Buy”
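
To illustrate what "expression-based" means in practice, here is a minimal sketch in base R. The variable names are illustrative only and do not come from the slides.

```r
# In R, `if` is an expression: it yields a value that can be assigned.
n <- 7
parity <- if (n %% 2 == 0) "even" else "odd"

# A braced block evaluates to the value of its last expression.
total <- {
  a <- 2
  b <- 3
  a * b
}

# A function returns the value of its last expression;
# an explicit return() is not required.
square <- function(x) x^2

print(parity)
print(total)
print(square(4))
```

Because almost everything yields a value, small computations can be composed directly without temporary bookkeeping variables.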
    • R users rely on functions that have been developed for them by statistical researchers, but they can also create their own or modify existing ones to suit their needs.
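
As a rough sketch of what creating or adapting a function can look like, the example below defines one new helper and one thin wrapper around the built-in `mean()`. Both function names (`simple_returns`, `mean_no_na`) and the price values are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: period-over-period returns from a price vector.
simple_returns <- function(prices) {
  diff(prices) / head(prices, -1)
}

# Adapting an existing function: a wrapper around mean() that
# drops missing values by default.
mean_no_na <- function(x, ...) {
  mean(x, na.rm = TRUE, ...)
}

prices <- c(100, 110, 99)  # made-up values
print(simple_returns(prices))
print(mean_no_na(c(1, NA, 3)))
```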
    • Why R?
    • Why R? (continued)
    • Getting started with R
      ▪ Latest version: 3.0.1 for Windows
      ▪ Link to download the R setup: http://cran.r-project.org/bin/windows/base/
      ▪ 51.5 MB setup file
      ▪ GUI for R: RStudio, latest version 0.97.551
      ▪ Link to download RStudio: http://www.rstudio.com/ide/download/desktop
      ▪ 32.5 MB exe file
    • RStudio
    • Sample R code
      – Read a data set into R (from a local file or network URL).
        • bse <- read.csv("bse_table.csv", header = TRUE, sep = ",")
      – Examine the basic structure of the data
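
The slide does not show how the structure is examined, so the following is only a sketch using standard base R functions. To stay self-contained it writes a tiny made-up CSV to a temporary file instead of reading the real `bse_table.csv`; the column names `Date`, `Open`, and `Close` are assumptions.

```r
# Write a small made-up CSV so the example runs on its own.
# Column names are assumed, not taken from the real bse_table.csv.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Open,Close",
  "2013-08-01,19400.5,19317.2",
  "2013-08-02,19350.0,19164.0",
  "2013-08-05,19200.3,19182.3"
), csv_path)

bse <- read.csv(csv_path, header = TRUE, sep = ",")

str(bse)             # column names, types, and first values
print(head(bse, 2))  # first two rows
print(summary(bse))  # per-column summaries
print(dim(bse))      # number of rows and columns
```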
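    • Running SQL Server with R
      – Install the RODBC package, then load it
        • library(RODBC)
      – Create an ODBC connection (replace [ODBC Name] with the name of your data source), run a query, and close the connection
        • channel <- odbcConnect("[ODBC Name]")
        • Tab1 <- sqlQuery(channel, "SELECT * FROM TabName")
        • odbcClose(channel)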
    • R code - Plotting graph
      • > bse$Date <- as.Date(bse$Date, format = "%Y-%m-%d")
      • > plot(x = bse$Date, y = bse$Open, type = "l", main = "BSE Data", col = "blue", xlab = "Periods", ylab = "Index", lwd = 2)
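
For readers without the BSE data file, the plotting step can be tried on a few made-up rows. This is a sketch, not the presenter's script: the values are invented, and the plot is written to a temporary PDF (an assumption made so the code also runs without a display).

```r
# Made-up stand-in for the BSE data; the real file is not reproduced here.
bse <- data.frame(
  Date = c("2013-08-01", "2013-08-02", "2013-08-05", "2013-08-06"),
  Open = c(19400.5, 19350.0, 19200.3, 19150.8),
  stringsAsFactors = FALSE
)

# Convert the text dates to Date objects so the x axis is a time axis.
bse$Date <- as.Date(bse$Date, format = "%Y-%m-%d")

# Draw a line chart into a temporary PDF file.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(x = bse$Date, y = bse$Open, type = "l",
     main = "BSE Data (made-up values)", col = "blue",
     xlab = "Periods", ylab = "Index", lwd = 2)
invisible(dev.off())
```

In an interactive session the `pdf()` and `dev.off()` calls can be dropped so the chart appears in the plot window.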
    • Stock Analysis - Sample graph
    • Packages in R…
    • Some graphs made using R:
    • Introduction to Shiny – R web UI
      – The R package Shiny from RStudio supplies:
        ▪ Interactive web applications / dynamic HTML pages with plain R
        ▪ A GUI for your own needs
        ▪ Website as server
    • What makes Shiny so special?
      – Very simple: ready-to-use components
      – Shiny is very slick, achieving interactive and pleasant-looking web UIs.
      – Event-driven (reactive programming): input <-> output (without requiring a reload of the browser)
      – Shiny user interfaces can be built entirely using R, or can be written directly in HTML, CSS, and JavaScript for more flexibility.
      – A highly customizable slider widget with built-in support for animation.
      – Pre-built output widgets for displaying plots, tables, and printed output of R objects.
      – Fast bidirectional communication between the web browser and R using the websocket package.
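
A minimal sketch of the reactive input <-> output idea, assuming the `shiny` package is installed. The app itself (a slider driving a random-walk plot) and the ids `n` and `walk` are invented for illustration and are not the stock-analysis app from the slides.

```r
library(shiny)

# UI: one slider input and one plot output, built entirely in R.
ui <- fluidPage(
  titlePanel("Random walk (illustrative)"),
  sliderInput("n", "Number of points:", min = 10, max = 200, value = 50),
  plotOutput("walk")
)

# Server: the plot re-renders whenever input$n changes,
# with no page reload.
server <- function(input, output) {
  output$walk <- renderPlot({
    plot(cumsum(rnorm(input$n)), type = "l",
         xlab = "Step", ylab = "Value")
  })
}

app <- shinyApp(ui = ui, server = server)
# To launch it from an interactive session: runApp(app)
```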
    • Stock Analysis - Using Shiny
    • Current Market trend of Statistical languages
    • Stats related to R - Google hits
    • R is the most powerful and flexible statistical programming language in the world…
    • Job trends in Statistical Market: trend of jobs on Indeed.com in March 2012 and 2013
      Software      2012    2013   Difference   Ratio
      SAS          13234   12272         -961    0.93
      SPSS          3299    3289          -10    1
      R             1196    1693          497    1.42
      Minitab       1769    1615         -154    0.91
      Stata          842     898           56    1.07
      JMP            644     619          -25    0.96
      Statistica      61      71           10    1.17
      Systat          14      15            1    1.07
      BMDP             6      10            3    1.53
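
The Difference and Ratio columns can be recomputed from the two yearly counts, as sketched below. The sketch assumes Ratio means the 2013 count divided by the 2012 count, which matches most rows. A few recomputed values (for example the SAS difference and the BMDP ratio) do not match the slide exactly, presumably because the slide's counts were rounded after those columns were calculated.

```r
# Job counts as listed on the slide (Indeed.com, March 2012 and March 2013).
jobs <- data.frame(
  software = c("SAS", "SPSS", "R", "Minitab", "Stata",
               "JMP", "Statistica", "Systat", "BMDP"),
  y2012 = c(13234, 3299, 1196, 1769, 842, 644, 61, 14, 6),
  y2013 = c(12272, 3289, 1693, 1615, 898, 619, 71, 15, 10)
)

# Assumed definitions: difference = 2013 - 2012, ratio = 2013 / 2012.
jobs$difference <- jobs$y2013 - jobs$y2012
jobs$ratio <- round(jobs$y2013 / jobs$y2012, 2)

# Sort by ratio, highest growth first.
print(jobs[order(-jobs$ratio), ])
```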
    • Final Words of Warning
      • “Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.” --Francois Pinard
    • Visualization is only one slice of the R cake…
      R deals with
      • Machine Learning
      • Social Media Analytics
      • Sentiment Analysis
      • Predictive Modeling
      • Network Analysis
      • Visualization
      • Time Series Analysis
      • Simulation
      • And a lot more
      To be continued…