© Raastech, Inc. 2015 | All rights reserved. @Raastech

Introduction to Oracle R for Big Data Analysis

More and more companies are implementing big data solutions, many with the prospect of someday doing something with the data being collected. In this hands-on session, we'll get down to brass tacks. We'll begin by setting up the core tools, run through the R language, and get familiar with Oracle's R offerings. We'll then start manipulating data in R, derive useful statistics from sample data, go over where such data can be sourced, and show how results can be displayed in useful graphs. It's a great way for R beginners to get a jump start.


Introduction to R for Big Data Analysis
Sunday, October 25, 2015, 2:30 p.m. to 3:15 p.m.
Moscone South, 303
Raastech, Inc.
2201 Cooperative Way, Suite 600
Herndon, VA 20171
+1-703-884-2223
info@raastech.com

SIG Meetings at Oracle OpenWorld
All meetings will be held in the User Group Pavilion, Meeting Room, Moscone South, unless noted otherwise.
Monday, October 26
• IOUG Cloud Computing SIG Meet-Up: 10:00 a.m. to 11:00 a.m.
• IOUG Oracle Enterprise Manager SIG: 5:00 p.m. to 6:00 p.m. (Location: OTN Lounge, Moscone South)
Tuesday, October 27
• IOUG IoT SIG: 10:30 a.m. to 11:30 a.m.
• IOUG BIWA SIG: 11:30 a.m. to 12:30 p.m.
Wednesday, October 28
• IOUG Exadata SIG: 10:30 a.m. to 11:30 a.m.
• IOUG RAC SIG: 1:00 p.m. to 2:00 p.m.
• IOUG Oracle 12c SIG: 2:00 p.m. to 3:00 p.m.

About Me
• Harold Dost III (@hdost)
• 7+ years of Oracle Middleware experience
• OCE (SOA Foundation Practitioner)
• Oracle ACE Associate
• From Michigan
• blog.raastech.com

About Raastech
• Small systems integrator founded in 2009
• Headquartered in the Washington, DC area
• Specializes in Oracle Fusion Middleware
• Oracle Platinum Partner: 1 in 3,000 worldwide
• Oracle SOA Specialized: 1 in 1,500 worldwide
• Oracle ACE: 2 of 500 worldwide
• 100% of consultants are Oracle certified
• 100% of consultants present at major Oracle conferences
• 100% of consultants have published books, whitepapers, or articles

Outline
1. Getting Started
   • Installing R
   • Installing Tools
   • Getting Data
2. Understanding R
   • Data Types
   • Functions
   • Data Import Mechanisms

Outline (Cont.)
3. Manipulating Data (Large Data Sets)
   • Deriving Simple Statistics
   • Graphing
4. Demo
5. Incorporating into an Enterprise
   • Using Enterprise Data Sources
   • Running R in Your Environment
   • Getting Familiar with Oracle's R Offerings

Why use R?
• Strong community, including its many contributed packages
• Designed for statistical analysis
• Often requires less code for statistical work

Know CRAN
• CRAN: the Comprehensive R Archive Network, where R itself and its packages are distributed

Installing R
• Windows: https://cran.r-project.org/bin/windows/
• Mac: https://cran.r-project.org/bin/macosx/
• Linux: https://cran.r-project.org/bin/linux/

Development Tools
• RStudio: http://www.rstudio.com/products/rstudio/
   • Open Source Edition
   • Commercial License: $995
• Eclipse
• Sublime, TextPad, and other simple text editors

Installing Packages
• Anything from CRAN, from within R:
    install.packages(c("first", "second"))
• A package archive from anywhere, from the shell:
    $ sudo R CMD INSTALL package-version.tar.gz

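Both installation routes can also be driven from inside R. The following is a minimal sketch, assuming a CRAN mirror at https://cloud.r-project.org; the helper name `ensure_installed()` is hypothetical, while `requireNamespace()` and `install.packages()` are standard R functions.

```r
# Hypothetical helper: install only the packages that are not already available.
ensure_installed <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (without attaching anything) if a package is missing
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!have]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)
}

# Packages that ship with R are always present, so nothing is installed here.
ensure_installed(c("stats", "utils"))

# A source archive from anywhere can also be installed from within R:
# install.packages("package-version.tar.gz", repos = NULL, type = "source")
```

Checking with `requireNamespace()` first avoids reinstalling packages every time a script is rerun.
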
Data Types
• Vectors
• Matrices
• Arrays
• Data Frames
• Lists
• Factors

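A quick way to get a feel for these types is to build one of each. The object names are arbitrary and the values are made up for illustration.

```r
v  <- c(1.5, 2.5, 3.5)                       # vector
m  <- matrix(1:6, nrow = 2)                  # matrix: 2 rows x 3 columns
a  <- array(1:24, dim = c(2, 3, 4))          # array: 3 dimensions
df <- data.frame(name = c("a", "b"), score = c(90, 85),
                 stringsAsFactors = FALSE)   # data frame
l  <- list(id = 1L, tags = c("x", "y"))      # list: elements may differ in type
f  <- factor(c("low", "high", "low"),
             levels = c("low", "high"))      # factor: categorical data

levels(f)       # "low" "high"
as.integer(f)   # underlying integer codes: 1 2 1
```
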
Everything is a Vector
• Scalars are vectors of length 1
• Basic types (atomic vectors):
   • Logical
   • Integer
   • Real
   • Complex
   • String/Character
   • Raw

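The sketch below checks the "everything is a vector" idea and uses `typeof()` to show each atomic type. Note that R reports real numbers as type `"double"`; the values themselves are arbitrary.

```r
# A "scalar" is just a vector of length 1
x <- 42
length(x)            # 1
is.vector(x)         # TRUE

# typeof() reports the underlying atomic type
typeof(TRUE)         # "logical"
typeof(7L)           # "integer"   (the L suffix makes an integer)
typeof(3.14)         # "double"    (R's real numbers)
typeof(1 + 9i)       # "complex"
typeof("hello")      # "character"
typeof(as.raw(255))  # "raw"
```
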
Matrices and Arrays
• All elements must be of the same type
• Matrix functions:
   • det(), t(), solve(), diag()

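A small worked example of these matrix functions on an arbitrary 2 x 2 matrix. With one argument `solve()` returns the inverse; with a second argument it solves a linear system.

```r
# matrix() fills column by column, so this is [4 7; 2 6]
m <- matrix(c(4, 2, 7, 6), nrow = 2)

t(m)                  # transpose: [4 2; 7 6]
det(m)                # determinant: 4*6 - 7*2 = 10
diag(m)               # main diagonal: 4 6
m_inv <- solve(m)     # inverse, since no right-hand side is given
m_inv %*% m           # identity matrix (up to rounding)
solve(m, c(11, 8))    # solves m %*% x = b; here x = c(1, 1)

# Mixing types coerces every element to a common type
typeof(matrix(c(1, "a"), nrow = 1))   # "character"
```
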
Data Frames
• A special kind of list
• Columns can have different types

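A minimal sketch showing that a data frame is a list whose columns can hold different types. The column names and values are made up.

```r
# Each column is a vector; different columns may have different types
people <- data.frame(
  name   = c("Ann", "Bob", "Cy"),
  age    = c(34L, 28L, 45L),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text as character (the default since R 4.0.0)
)

is.list(people)                   # TRUE: a data frame is a special list
sapply(people, class)             # "character" "integer" "logical"
people$age                        # columns are accessed like list elements
people[people$age > 30, "name"]   # rows and columns can be indexed like a matrix
```
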
Special Values
• Infinity, positive and negative: Inf and -Inf
• Not a Number: NaN
• Not Available: NA
• Complex numbers, e.g. 1+9i

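The following sketch shows how each special value typically arises. The inputs are arbitrary; the results follow R's floating-point arithmetic.

```r
1 / 0                 # Inf
-1 / 0                # -Inf
0 / 0                 # NaN: the result is undefined
Inf - Inf             # NaN

x <- c(3, NA, 5)      # NA marks a missing value
sum(x)                # NA: a missing input gives a missing result
sum(x, na.rm = TRUE)  # 8

# Complex numbers use the i suffix
z <- 1 + 9i
Re(z); Im(z)          # 1 and 9
Mod(3 + 4i)           # 5
sqrt(as.complex(-4))  # 0+2i (plain sqrt(-4) gives NaN with a warning)
```
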
Use Case for Infinities
• Finding maximums and minimums
• Placeholder values when others won't work

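As an illustration of the first use case, a running minimum and maximum can start at `Inf` and `-Inf` so that the first real value always replaces them. The function `find_extremes()` is a hypothetical example; in practice `min()` and `max()` (or `range()`) already do this.

```r
# Hypothetical helper: track the running minimum and maximum.
# Every real number is smaller than Inf and larger than -Inf,
# so the first element always replaces the starting values.
find_extremes <- function(x) {
  lo <- Inf
  hi <- -Inf
  for (v in x) {
    if (v < lo) lo <- v
    if (v > hi) hi <- v
  }
  c(min = lo, max = hi)
}

find_extremes(c(3, -2, 7))   # min -2, max 7
find_extremes(numeric(0))    # min Inf, max -Inf: nothing was seen

# Base R uses the same convention (with a warning) for empty input
suppressWarnings(max(numeric(0)))   # -Inf
```
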
Not a Number (NaN)
• It means something went wrong somewhere, for example:
   • A missing argument
   • An invalid number or operation, such as 0/0
• Check for it with is.nan(x) to prevent it from leaking into later results
• Don't use "==" to find NaN; the comparison only gives NA

Assigning NaN
    > a = NaN
    > a
    [1] NaN

Adding NaN
    > b = 1
    > c = a + b
    > c
    [1] NaN
When adding a number to NaN ("Not a Number"), you get NaN.

Comparing NaN to a Regular Number
    > d = b == c
    > d
    [1] NA
When comparing a number to NaN ("Not a Number"), you get NA.

Comparing NaN to NaN
    > e = c == a
    > e
    [1] NA
When comparing NaN ("Not a Number") to NaN, you get NA.

Detecting NaN
    > a
    [1] NaN
    > is.nan(a)
    [1] TRUE
    > is.na(a)
    [1] TRUE
Since NaN values aren't proper numbers, special functions must be used to detect them. They are the result of math gone wrong. Note that is.na() is also TRUE for NaN.

Detecting NA
    > e = c == a
    > e
    [1] NA
    > is.nan(e)
    [1] FALSE
    > is.na(e)
    [1] TRUE
Just as with NaN, special functions must be used, but NA generally indicates that information is missing.

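On real data these checks are usually applied to whole vectors. A minimal sketch with a made-up vector containing both `NA` and `NaN`:

```r
x <- c(1, NA, NaN, 4)

is.na(x)                # FALSE  TRUE  TRUE FALSE  (NaN also counts as NA)
is.nan(x)               # FALSE FALSE  TRUE FALSE
is.na(x) & !is.nan(x)   # FALSE  TRUE FALSE FALSE  (truly missing values only)
x == NaN                # NA NA NA NA: == cannot find NaN

# na.rm = TRUE drops both NA and NaN before computing
mean(x, na.rm = TRUE)   # 2.5
x[!is.na(x)]            # 1 4
```
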
Operators
• Assignment (<-, ->)
• Addition (+)
• Subtraction (-)
• Division (/)
• Multiplication (*)
• Exponent (^)
• Parentheses ( )

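A short sketch of these operators, including precedence and the fact that `=` also assigns at the top level (as on the NaN slides). The values are arbitrary.

```r
x <- 2 + 3 * 4      # 14: * binds tighter than +
y <- (2 + 3) * 4    # 20: parentheses change the order
2 ^ 3               # 8
10 / 4              # 2.5: / always returns a double
7 - 10              # -3
20 -> z             # right assignment: z is now 20
w = 5               # = also assigns at the top level
c(1, 2, 3) * 2      # 2 4 6: operators work element-wise on vectors
```
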
Math Functions
• max()
• min()
• log()
• sqrt()

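These functions in action on an arbitrary vector. Note that `log()` is the natural logarithm unless a `base` is given.

```r
v <- c(4, 16, 9, 1)

max(v)              # 16
min(v)              # 1
sqrt(v)             # 2 4 3 1: applied to every element
log(exp(2))         # 2: natural logarithm by default
log(8, base = 2)    # 3
log10(1000)         # 3
max(3, v, 25)       # 25: max() and min() accept several arguments
```
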
Deriving Simple Statistics
• Minimum
• Maximum
• Median
• Arithmetic mean
• Function estimation:
   • Linear
   • Log
   • Exponential
   • R-values
• Standard deviation

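A sketch of these statistics in base R. The data are made up. The three fits use `lm()` on transformed variables, which is one common way (not the only one) to estimate linear, logarithmic, and exponential relationships; "R-values" are shown here as the correlation coefficient and R-squared.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

min(x)      # 2
max(x)      # 9
median(x)   # 4.5
mean(x)     # 5
sd(x)       # about 2.138; sd() uses the sample (n - 1) formula

# Function estimation with lm() on made-up data
day   <- 1:5
y_lin <- c(2.1, 3.9, 6.2, 7.8, 10.1)      # roughly linear in day
y_exp <- c(2.7, 7.4, 20.1, 54.6, 148.4)   # roughly exp(day)

fit_lin <- lm(y_lin ~ day)         # y = a + b * day
fit_log <- lm(y_lin ~ log(day))    # y = a + b * log(day)
fit_exp <- lm(log(y_exp) ~ day)    # log(y) = a + b * day, i.e. y = e^a * e^(b * day)

coef(fit_lin)   # slope close to 2
coef(fit_exp)   # slope close to 1

# R-values: correlation coefficient r and R-squared of the linear fit
cor(day, y_lin)
summary(fit_lin)$r.squared
```
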
How to define your own functions
    firstfunction <- function(arg1, arg2, ...) {
      # statements: the function body goes here; this placeholder adds the first two arguments
      someoutput <- arg1 + arg2
      return(someoutput)
    }

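A slightly fuller example of this template, using a default argument and `...` to forward extra arguments. The function `describe()` is hypothetical and only meant to illustrate the syntax.

```r
# Hypothetical example: summarize a numeric vector.
# - digits has a default value, so callers may omit it
# - ... forwards extra arguments (here na.rm) to mean() and sd()
describe <- function(x, digits = 2, ...) {
  stats <- c(mean = mean(x, ...), sd = sd(x, ...), n = sum(!is.na(x)))
  round(stats, digits)   # the last expression is returned even without return()
}

describe(c(1, 2, 3, NA), na.rm = TRUE)   # mean 2, sd 1, n 3
describe(c(1.234, 5.678), digits = 1)
```
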
Twitter Example
• First, install the package:
    install.packages("twitteR")

Twitter Example
• Authenticate:
    library(twitteR)
    consumer = "CONSUMER KEY"
    secret = "SECRET KEY"
    setup_twitter_oauth(consumer, secret)

Twitter Example
• Get trend locations, then choose a WOEID (Where On Earth ID) from the result:
    availableTrendLocations()

Twitter Example
• Get trends:
    trends = getTrends(SOMEWOEID)

Twitter Example
• Retrieve tweets:
    tweets <- searchTwitter(trends[XX,XX], n=1500)
    tweetdf <- do.call("rbind", lapply(tweets, as.data.frame))

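The `do.call("rbind", lapply(...))` line turns a list of per-tweet results into one data frame. Since `searchTwitter()` needs live credentials, the sketch below assumes a hypothetical list of records in place of real tweets so the pattern can run offline. (The twitteR package also provides `twListToDF()` for this conversion.)

```r
# Hypothetical stand-ins for searchTwitter() results: each element becomes
# a one-row data frame, as as.data.frame() does for a single tweet.
records <- list(
  list(screenName = "user_a", text = "first"),
  list(screenName = "user_b", text = "second"),
  list(screenName = "user_c", text = "third")
)
rows <- lapply(records, as.data.frame, stringsAsFactors = FALSE)

# do.call() passes each list element as a separate argument, so this is
# equivalent to rbind(rows[[1]], rows[[2]], rows[[3]])
tweetdf <- do.call("rbind", rows)
nrow(tweetdf)   # 3
```
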
Twitter Example
• Filter:
   • complete.cases() flags values that are NA or NaN, so the first line drops those rows; the second line drops rows whose value is 0
    tweetdf <- tweetdf[complete.cases(tweetdf[,15]),]
    tweetdf <- tweetdf[tweetdf[,15] != 0,]

Twitter Example
• Simplify the data frame:
    simpledf <- tweetdf[c("screenName","longitude","latitude")]

Twitter Example
• Create a matrix from the data frame:
    tweetMatrix <- data.matrix(simpledf[2:3], rownames.force = FALSE)

Twitter Example
• Plot the latitude and longitude:
    plot(tweetMatrix)

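The filter, simplify, convert, and plot steps can be tried without Twitter access on a small made-up data frame. This sketch rests on assumptions: it refers to columns by name instead of the slide's position 15 (a guess at which column that index selects), and it assumes the coordinates arrive as text, so it converts them with `as.numeric()` first, because `data.matrix()` turns character columns into factor codes rather than numbers.

```r
# Hypothetical stand-in for tweetdf; coordinates arrive as text here
tweetdf <- data.frame(
  screenName = c("user_a", "user_b", "user_c", "user_d"),
  longitude  = c("-122.40", NA, "0", "-73.99"),
  latitude   = c("37.78", NA, "0", "40.73"),
  stringsAsFactors = FALSE
)

# Convert text to numbers before building the matrix
tweetdf$longitude <- as.numeric(tweetdf$longitude)
tweetdf$latitude  <- as.numeric(tweetdf$latitude)

# Drop missing (NA/NaN) values first; an NA inside a logical row index
# would otherwise produce an all-NA row in the next step
tweetdf <- tweetdf[complete.cases(tweetdf$longitude), ]
tweetdf <- tweetdf[tweetdf$longitude != 0, ]   # drop placeholder coordinates of 0

# Simplify, convert to a numeric matrix, and plot longitude (x) against latitude (y)
simpledf <- tweetdf[c("screenName", "longitude", "latitude")]
tweetMatrix <- data.matrix(simpledf[2:3], rownames.force = FALSE)
plot(tweetMatrix)
```
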
Graphing
• Image
• Contour
• Box chart

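A minimal sketch of the three chart types using data sets that ship with R (`volcano` and `mtcars`). Writing the plots to a temporary PDF is an assumption made so the example also runs in a non-interactive session.

```r
# Send the plots to a temporary PDF file instead of a screen
out <- tempfile(fileext = ".pdf")
pdf(out)

# volcano is a matrix of elevations; mtcars is a table of car data
image(volcano, main = "Image: elevation as color")
contour(volcano, main = "Contour: lines of equal elevation")
boxplot(mpg ~ cyl, data = mtcars, main = "Box chart: mpg by cylinder count",
        xlab = "Cylinders", ylab = "Miles per gallon")

dev.off()
file.exists(out)   # TRUE
```
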
K-Means
• Essentially a search algorithm: it looks for k cluster centers that best fit the data
• Divides a dataset into k clusters

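A sketch using R's built-in `kmeans()` (stats package) on made-up 2-D data. The classic (Lloyd) form of k-means alternates between assigning each point to its nearest center and moving each center to the mean of its points; `kmeans()` uses the related Hartigan-Wong method by default. Because starting centers are random, `nstart` keeps the best of several runs.

```r
set.seed(42)   # starting centers are random; fix the seed to repeat the run

# Made-up data: two well-separated groups of 25 points each
pts <- rbind(
  matrix(rnorm(50, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(50, mean = 5, sd = 0.3), ncol = 2)
)

# Ask for k = 2 clusters, keeping the best of 10 random starts
fit <- kmeans(pts, centers = 2, nstart = 10)

fit$size      # points per cluster
fit$centers   # one row per cluster, near (0, 0) and (5, 5)
table(fit$cluster, rep(1:2, each = 25))   # found clusters vs. the true groups
```
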
Time Series
• Stock quotes
• Infection incidents
• Gas prices
• Audio
• Etc.
Image source: http://www.loc.gov/pictures/resource/hec.23488/

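In R, a time series is usually stored as a `ts` object, which attaches a start time and a frequency to a plain vector. The monthly gas prices below are made up for illustration.

```r
# Made-up monthly gas prices for one year
prices <- c(2.10, 2.05, 2.30, 2.45, 2.60, 2.70, 2.65, 2.55, 2.40, 2.30, 2.20, 2.15)

# First observation is January 2015; 12 observations per year
gas <- ts(prices, start = c(2015, 1), frequency = 12)

frequency(gas)                                      # 12
start(gas)                                          # 2015 1
window(gas, start = c(2015, 6), end = c(2015, 8))   # June through August
```
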
Data Sources
• NASA: https://data.nasa.gov
• Quandl: https://www.quandl.com

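Once a data set has been exported as CSV from a source such as these, it can be read with `read.csv()`. In the sketch below a small temporary file stands in for a real download, since no specific data set is named here; the column names are assumptions.

```r
# A small CSV file standing in for a downloaded export (hypothetical columns)
path <- tempfile(fileext = ".csv")
writeLines(c("date,value",
             "2015-01-01,10.5",
             "2015-02-01,11.0",
             "2015-03-01,12.25"), path)

# For a real export, download.file(url, destfile = path) could fetch it first;
# read.csv() also accepts a URL directly.
obs <- read.csv(path, stringsAsFactors = FALSE)
obs$date <- as.Date(obs$date)   # text to Date for time-based work

str(obs)
```
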
Time Series Analysis
• Regression
• Forecasting
• Time frequency (FFTs)
Image source: http://groups.csail.mit.edu/netmit/sFFT/algorithm.html

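A sketch of the three analyses with base R: a linear trend with `lm()` and a 12-month forecast with `HoltWinters()` (one forecasting method among several) on the built-in `AirPassengers` series (monthly airline passengers, 1949 to 1960), plus an FFT of a synthetic sine wave to show how `fft()` reveals frequency.

```r
y <- AirPassengers

# Regression: fit a straight-line trend against time
trend <- lm(y ~ time(y))
coef(trend)[2]           # average change in passengers per year

# Forecasting: Holt-Winters exponential smoothing, 12 months ahead
hw <- HoltWinters(y)
fc <- predict(hw, n.ahead = 12)

# Time frequency: the FFT of a pure sine wave peaks at its frequency
n <- 64
sig <- sin(2 * pi * 4 * (0:(n - 1)) / n)   # 4 cycles over 64 samples
spec <- Mod(fft(sig))
which.max(spec[1:(n / 2)]) - 1             # 4 (the first bin is frequency 0)
```
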
Using Enterprise Data Sources
• Databases
• Streams
• Files
• Etc.

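For files and streams too large to load at once, R connections allow reading in chunks. The sketch below assumes a simple two-column CSV (created in a temporary file as a stand-in) and sums its second column a few lines at a time; `url()`, `pipe()`, and `socketConnection()` create other kinds of connections that can be read the same way.

```r
# Stand-in "large" file with hypothetical id,value rows
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", paste(1:10, (1:10) * 1.5, sep = ",")), path)

# Process the file a few lines at a time so only one chunk is in memory
con <- file(path, open = "r")
header <- readLines(con, n = 1)        # consume the "id,value" header line
total <- 0
repeat {
  chunk <- readLines(con, n = 3)       # read up to 3 lines per pass
  if (length(chunk) == 0) break        # end of file
  values <- as.numeric(sub("^[^,]*,", "", chunk))   # keep text after the first comma
  total <- total + sum(values)
}
close(con)

total   # 82.5
```
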
Oracle R Distribution
• Available on Oracle Public Yum
• Enhanced dynamic library loading
• Enterprise support available:
   • Oracle Advanced Analytics
   • Oracle Linux
   • Oracle Big Data Appliance
• http://www.oracle.com/technetwork/database/database-technologies/r/r-distribution/overview/index.html

ROracle
• Open source package
• Maintained by Oracle
• Uses OCI (Oracle Call Interface) to interact with databases
• http://www.oracle.com/technetwork/database/database-technologies/r/r-technologies/overview/index.html

Oracle R Advanced Analytics for Hadoop
• Component of the Oracle Big Data Software Connectors Suite, an option for the BDA (Big Data Appliance)
• Provides an R abstraction over HiveQL, just as Oracle R Enterprise does for SQL
• http://www.oracle.com/technetwork/database/database-technologies/bdc/r-advanalytics-for-hadoop/overview/index.html

Oracle R Enterprise
• Component of the Oracle Advanced Analytics Option for Oracle Database Enterprise Edition (EE)
• Allows R to be used in the database without writing SQL
• Saves R objects in the database
• Integrates easily with OBIEE
• http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/overview/index.html

Contact Information
• Harold Dost III
• Principal Consultant
• @hdost
• harold.dost@raastech.com

Resources
• https://en.wikibooks.org/wiki/Statistical_Analysis:_an_Introduction_using_R/R_basics
• http://www.r-project.org/
• https://docs.oracle.com/cd/E57012_01/doc.141/e56973/toc.htm
• http://cran.r-project.org/web/packages/akmeans/index.html
• http://cran.r-project.org/web/packages/twitteR/index.html
• http://en.wikipedia.org/wiki/K-means_clustering
• http://www.rdatamining.com/examples/kmeans-clustering
• http://blog.revolutionanalytics.com/2009/02/how-to-choose-a-random-number-in-r.html
• https://www.packtpub.com/books/content/text-mining-r-part-2
• http://www.eia.gov/totalenergy/data/monthly/index.cfm#consumption
