Basic Data Ingestion in R

Presentation Transcript

  • Basic Data Ingestion in R
    Denver RUG 11/16/10
    @jrideout
    Software Engineer & Data Monkey
    @ReturnPath
  • Where is the data?
    Flat-file (text/binary)
    Relational Database
    Where is … (from google suggestions)
    chuck norris
    the love
    my mind
    the love lyrics (apparently a song by Black Eyed Peas)
  • read.*
    read.table
    read.csv(2)
    csv2 for , as the decimal point and ; as the delimiter
    read.delim(2)
    Tab-delimited by default
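The differences between these readers come down to default separators and decimal marks. A minimal sketch, using made-up inline data via the `text=` argument so that no files are needed:

```r
# The same three rows in three flat-file dialects (made-up data).
csv_text   <- "id,price\n1,2.5\n2,3.75\n3,10"
csv2_text  <- "id;price\n1;2,5\n2;3,75\n3;10"
delim_text <- "id\tprice\n1\t2.5\n2\t3.75\n3\t10"

a <- read.csv(text = csv_text)      # sep = ",",  dec = "."
b <- read.csv2(text = csv2_text)    # sep = ";",  dec = ","
d <- read.delim(text = delim_text)  # sep = "\t", dec = "."

# read.table is the general form; the same settings are spelled out by hand.
e <- read.table(text = csv_text, header = TRUE, sep = ",")

str(a)
```

All four calls should yield the same data frame; only the on-disk dialect differs.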
  • read.*
    library(foreign) provides read.* for:
    systat, xport, ssd, octave, spss, mtp, epiinfo, dta, dbf
    Many Others:
    Search http://crantastic.org/
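As a rough illustration of the foreign readers, the sketch below writes a small made-up data frame to a Stata file and reads it back with read.dta. It assumes the foreign package is installed (it ships with most R distributions); the data and the temporary path are purely illustrative.

```r
library(foreign)  # recommended package, normally bundled with R

df <- data.frame(id = 1:3, score = c(2.5, 3.75, 10))  # made-up data

path <- tempfile(fileext = ".dta")
write.dta(df, path)      # create a Stata file so there is something to read
back <- read.dta(path)   # read it back as a data frame

str(back)
unlink(path)             # clean up the temporary file
```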
  • scan
    Better for numeric matrices
    M1 <- matrix(scan("test.data"), nrow=x, ncol=y, byrow=TRUE)
    Read 10000000 items
    user system elapsed
    28.565 18.513 50.882
    M2 <- as.matrix(read.table("test.data"))
    > 40 minutes on my laptop
    (Actually, read.table and its wrappers just use scan anyway)
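A small-scale sketch of the two approaches on a made-up 2 x 3 numeric file. It makes no attempt to reproduce the timings above; the third call, which passes colClasses to read.table, is an added suggestion rather than something from the slides.

```r
# Made-up numeric matrix written to a temporary file, one matrix row per line.
m <- matrix(c(1.5, 2.5, 3.5,
              4.5, 5.5, 6.5), nrow = 2, byrow = TRUE)
path <- tempfile()
write(t(m), file = path, ncolumns = ncol(m))

# scan returns one long numeric vector; matrix() reshapes it row by row.
M1 <- matrix(scan(path, quiet = TRUE), nrow = 2, ncol = 3, byrow = TRUE)

# read.table builds a data frame first, which is then converted.
M2 <- as.matrix(read.table(path))

# Declaring the column type up front spares read.table from guessing it.
M3 <- as.matrix(read.table(path, colClasses = "numeric"))

unlink(path)
```

The read.table results carry column names (V1, V2, V3); otherwise all three hold the same values.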
  • Others
    readLines
    sqldf
    MapReduce
    bigmemory
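Of these, readLines is the only one in base R, and it helps when a file is too irregular for read.table. A minimal sketch on a made-up file whose lines have varying numbers of fields; the file contents and variable names are illustrative only.

```r
path <- tempfile()
writeLines(c("# made-up log", "alpha 1 2 3", "beta 4", "gamma 5 6"), path)

lines  <- readLines(path)                    # one string per line
lines  <- lines[!grepl("^#", lines)]         # drop comment lines by hand
fields <- strsplit(lines, " ", fixed = TRUE) # split each line on spaces

keys   <- vapply(fields, `[`, character(1), 1)             # first token
values <- lapply(fields, function(f) as.numeric(f[-1]))    # the rest, as numbers
names(values) <- keys

str(values)
unlink(path)
```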
  • Some tricks
    comment.char=""
    Use colClasses or as.is for read.table
    stringsAsFactors
    colnames(data) <- c('newName', 'other')
    na.strings="."
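The tricks above can be combined in a single call. A sketch on made-up inline data; read.table is used because its default comment.char is "#", which would otherwise cut the second data row short.

```r
raw <- "id,name,score\n1,ann,3.5\n2,#bob,.\n3,cy,7"  # made-up data

df <- read.table(text = raw, header = TRUE, sep = ",",
                 comment.char = "",        # keep "#" as ordinary data
                 colClasses = c("integer", "character", "numeric"),
                 na.strings = ".",         # "." marks a missing value
                 stringsAsFactors = FALSE) # never build factors from strings

colnames(df) <- c("id", "person", "score")  # rename after reading
str(df)
```

With colClasses given for every column, stringsAsFactors is redundant here; it is shown only because the slide lists it.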
  • Working with the DF
    attach(df); fieldname
    df[[index]]
    df$fieldname
    plyr/reshape
    name abbreviation
    as.*, matrix, data.matrix
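A short sketch of these access patterns on a made-up two-column data frame. Reading "name abbreviation" as partial matching with `$` is an assumption about what the slide means.

```r
df <- data.frame(fieldname = c(1.5, 2.5), other = c("a", "b"),
                 stringsAsFactors = FALSE)  # made-up data

by_index  <- df[[1]]        # column by position
by_name   <- df$fieldname   # column by name
by_abbrev <- df$field       # `$` partially matches an unambiguous prefix

attach(df)                  # puts the columns on the search path
via_attach <- fieldname
detach(df)                  # undo it again to avoid surprises later

chr_mat <- as.matrix(df)    # mixed columns give a character matrix
num_mat <- data.matrix(df)  # every column is forced to numeric
```

data.matrix turns the character column into integer codes, so it is only meaningful when those codes are.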
  • Type coercion
    Check types with str(), typeof()
    attributes()
    logical < integer < double < complex
    It’s better to get the read.* methods right than coerce later.
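The ordering can be checked by combining values with c(), which promotes everything to the highest type present. A minimal sketch; the character case at the end lies beyond the hierarchy listed on the slide and is included as an additional, well-known rule.

```r
x1 <- c(TRUE, 2L)        # logical + integer
x2 <- c(TRUE, 2L, 3.5)   # ... + double
x3 <- c(2L, 3.5, 1i)     # ... + complex
x4 <- c(1, "a")          # anything + character

types <- c(typeof(x1), typeof(x2), typeof(x3), typeof(x4))
print(types)

# Coercing after the fact can silently lose information.
n_ok  <- as.integer("12")
n_bad <- suppressWarnings(as.numeric("abc"))  # NA, normally with a warning
str(x2)
```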
  • ?