The document discusses Oracle's Advanced Analytics Option which extends the Oracle Database into a comprehensive advanced analytics platform. It includes Oracle Data Mining for in-database predictive analytics and data mining, and Oracle R Enterprise which integrates the open-source R statistical programming language with the database. The option aims to bring algorithms to the data within the database to eliminate data movement and reduce total cost of ownership compared to traditional statistical environments.
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021 (Sandesh Rao)
The document discusses Oracle Machine Learning (OML) services on Oracle Autonomous Database. It provides an overview of the OML services REST API, which allows storing and deploying machine learning models. It enables scoring of models using REST endpoints for application integration. The API supports classification/regression of ONNX models from libraries like Scikit-learn and TensorFlow. It also provides cognitive text capabilities like topic discovery, keywords, sentiment analysis and text summarization.
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle) (Rittman Analytics)
Oracle Data Integration Platform is a cornerstone for big data solutions that provides five core capabilities: business continuity, data movement, data transformation, data governance, and streaming data handling. It includes eight core products that can operate in the cloud or on-premise, and is considered the most innovative in areas like real-time/streaming integration and extract-load-transform capabilities with big data technologies. The platform offers a comprehensive architecture covering key areas like data ingestion, preparation, streaming integration, parallel connectivity, and governance.
A practical introduction to Oracle NoSQL Database - OOW2014 (Anuj Sahni)
Not familiar with Oracle NoSQL Database yet? This great product introduction session discusses the primary functionality included with the product as well as integration with other Oracle products. It includes a live demo that illustrates installation and configuration as well as data modeling and sample NoSQL application development.
The document is a presentation on Oracle NoSQL Database that discusses its use cases, Oracle's NoSQL and big data strategy, technical features of Oracle NoSQL Database, and customer references. The presentation covers how Oracle NoSQL Database can be used for real-time event processing, sensor data acquisition, fraud detection, recommendations, and globally distributed databases. It also discusses Oracle's approach to integrating NoSQL, Hadoop, and relational databases. Customer references are provided for Airbus's use of Oracle NoSQL Database for flight test sensor data storage and analysis.
The Art of Intelligence – Introduction Machine Learning for Oracle profession... (Lucas Jellema)
Our technology has gotten smart and fast enough to make predictions and come up with recommendations in near real time. Machine Learning is the art of deriving models from our Big Data collections – harvesting historic patterns and trends – and applying those models to new data in order to rapidly and adequately respond to that data. This presentation will explain and demonstrate in simple, straightforward terms and using easy to understand practical examples what Machine Learning really is and how it can be useful in our world of applications, integrations and databases. Hadoop and Spark, real time and streaming analytics, Watson and Cloud Datalab, Jupyter Notebooks and Citizen Data Scientists will all make their appearance, as will SQL.
Application development with Oracle NoSQL Database 3.0 (Anuj Sahni)
The document introduces table-based data modeling features for Oracle NoSQL Database. It discusses using tables to simplify application data modeling with familiar concepts like tables and data types. Examples show how to model user and email data using tables, including defining the schema using DDL, querying the data using DML, and indexing the tables. The document also provides an example of modeling user and email data from an email client application to illustrate how to approach data modeling.
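The user/email model described above can be sketched in miniature. This is a hypothetical illustration, not the presentation's actual schema: the field names, the two tables, and the `subject` index are assumptions, and plain Python dicts stand in for the key-value store and its secondary index.

```python
# Two NoSQL-style tables simulated as dicts keyed by their primary keys.
# Hypothetical schema: users(id PK, name); emails(user_id + msg_id PK, subject).
users = {}    # id -> row
emails = {}   # (user_id, msg_id) -> row

def put_user(uid, name):
    users[uid] = {"id": uid, "name": name}

def put_email(uid, mid, subject):
    emails[(uid, mid)] = {"user_id": uid, "msg_id": mid, "subject": subject}

def emails_by_subject(subject):
    # Stands in for a secondary index declared in the table DDL.
    return [row for row in emails.values() if row["subject"] == subject]

put_user(1, "alice")
put_email(1, 100, "invoice")
put_email(1, 101, "newsletter")
print(emails_by_subject("invoice"))  # one matching row, msg_id 100
```

The point of the table model is exactly this split: primary-key access is a direct lookup, while other access paths need a declared index rather than a scan.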
Notes on Data Governance in Hadoop. These are self-study notes compiled from the Hortonworks and Cloudera manuals. Please refer to cloudera.com for details.
Jethro data meetup: index-based SQL on Hadoop - oct-2014 (Eli Singer)
JethroData is an index-based SQL-on-Hadoop engine.
An architecture comparison of MPP full-scan SQL engines such as Impala and Hive with index-based access engines such as Jethro.
SQL and NoSQL NYC meetup Oct 20 2014
Boaz Raufman
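The full-scan versus index-based distinction above can be shown with a toy sketch (the data and selectivity are made up for illustration): a full-scan engine reads and tests every row, while an index-based engine precomputes a posting list per value and touches only matching rows.

```python
# Toy contrast between full-scan and index-based query access.
rows = [{"id": i, "country": "US" if i % 100 == 0 else "DE"}
        for i in range(10_000)]

def full_scan(pred):
    # MPP/full-scan style: every row is read and tested against the predicate.
    return [r for r in rows if pred(r)]

# Index-based style: a posting list per value, built once, consulted at query time.
index = {}
for r in rows:
    index.setdefault(r["country"], []).append(r["id"])

# Both strategies find the same 100 US rows; the index touches only those 100.
assert len(full_scan(lambda r: r["country"] == "US")) == len(index["US"])
```

For a selective predicate like this one (1% of rows), the index reads two orders of magnitude fewer rows, which is the trade-off the comparison above is about.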
Expand a Data Warehouse with Hadoop and Big Data (jdijcks)
After investing years in the data warehouse, are you now supposed to start over? Nope. This session discusses how to leverage Hadoop and big data technologies to augment the data warehouse with new data, new capabilities and new business models.
SQL on Hadoop
Looking for the correct tool for your SQL-on-Hadoop use case?
There is a long list of alternatives to choose from; how do you select the right one?
Tool selection should always be driven by the requirements of the use case.
Read more on alternatives and our recommendations.
The document discusses Teradata's portfolio for Hadoop, including the Teradata Aster Big Analytics Appliance, the Teradata Appliance for Hadoop, a commodity offering with Dell, and support for the Hortonworks Data Platform. It provides consulting, training, support, and managed services for Hadoop. Teradata SQL-H gives business users standard SQL access to data stored in Hadoop through Teradata, allowing queries to run quickly on Teradata while accessing data from Hadoop efficiently through HCatalog.
This document provides an overview of Oracle GoldenGate 12c, a heterogeneous replication tool. It describes GoldenGate's key features like real-time data integration and query offloading. The document outlines GoldenGate's topologies, architecture, supported databases, and data types. It compares GoldenGate to Oracle Streams and details new features in 12c like optimized capture methods and improved high availability. Basic concepts are explained, such as classic and integrated capture, downstream and bi-directional replication. Restrictions on data types and database features are also noted.
A presentation on the forthcoming 18c database, which incorporates the best of Oracle's technologies, shaping up an autonomous database.
5th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
PGQL: A Query Language for Graphs
Learn how to query graphs using PGQL, an expressive and intuitive graph query language that's a lot like SQL. With PGQL, it's easy to get going writing graph analysis queries to the database in a very short time. Albert and Oskar show what you can do with PGQL, and how to write and execute PGQL code.
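To make concrete what a PGQL pattern match computes, here is a sketch over an in-memory edge list. The data is invented, and the PGQL in the comment is paraphrased from memory rather than quoted from the session, so treat its exact grammar as an assumption.

```python
# Roughly what a PGQL-style query such as
#   SELECT b.name  MATCH (a) -[:knows]-> (b)  WHERE a.name = 'Alice'
# computes: follow 'knows' edges from a matching source vertex.
vertices = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}}
edges = [(1, "knows", 2), (2, "knows", 3)]  # (source, label, destination)

def match_knows(source_name):
    return [vertices[dst]["name"]
            for src, label, dst in edges
            if label == "knows" and vertices[src]["name"] == source_name]

print(match_knows("Alice"))  # ['Bob']
```

The SQL-like appeal of PGQL is that the `(a) -[:knows]-> (b)` pattern replaces the explicit edge-list join spelled out by hand above.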
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse (DataWorks Summit)
Yahoo Mail has 200+ million users a month and generates hundreds of terabytes of data per day, which continues to grow steadily. The nature of email messages has also evolved: for example, today the majority of them are generated by machines, consisting of newsletters, social media notifications, purchase invoices, travel bookings, and the like, which drove innovations in product development to help users organize their inboxes.
Since 2014, the Yahoo Mail Data Engineering team took on the task of revamping the Mail data warehouse and analytics infrastructure in order to drive the continued growth and evolution of Yahoo Mail. Along the way we have built a 50 PB Hadoop warehouse, and surrounding analytics and machine learning programs that have transformed the way data plays in Yahoo Mail.
In this session we will share our experience from this three-year journey, covering the system architecture, the analytics systems we built, and the lessons learned from development and from driving adoption.
How Oracle has managed to separate the SQL engine of its flagship database, which processes the queries, from the access drivers that read data both from files on the Hadoop Distributed File System and from the data warehousing tool Hive.
Streaming Solutions for Real-time Problems (Abhishek Gupta)
The document is a presentation on streaming solutions for real-time problems using Apache Kafka, Kafka Streams, and Redis. It begins with an introduction and overview of the technologies. It then presents a sample monitoring application using metrics from multiple machines as a use case. The presentation demonstrates how to implement this application using Kafka as the event store, Kafka Streams for processing, and Redis as the state store. It also shows how to deploy the application components on Oracle Cloud.
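The core of the monitoring pipeline described above can be simulated without any brokers: the sketch below is a plain-Python stand-in, with a list of records playing the Kafka topic and a dict playing the Redis state store, and the machine IDs and metrics invented for illustration.

```python
# Per-machine metric aggregation, as the Kafka Streams topology would do it,
# simulated in memory. No Kafka or Redis involved.
events = [  # (machine_id, cpu_percent) records as they might arrive on a topic
    ("m1", 40), ("m2", 90), ("m1", 60), ("m2", 70),
]

state_store = {}  # machine_id -> (count, running_sum); Redis stand-in

def process(machine, cpu):
    # Read-modify-write of per-key state, the basic stateful-processing step.
    count, total = state_store.get(machine, (0, 0))
    state_store[machine] = (count + 1, total + cpu)

for machine, cpu in events:
    process(machine, cpu)

averages = {m: total / count for m, (count, total) in state_store.items()}
print(averages)  # {'m1': 50.0, 'm2': 80.0}
```

In the real deployment the same read-modify-write happens per record, but the topic, partitioning, and the externalized state store are what make it fault-tolerant and scalable.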
3rd in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
See the magic of graphs in this session. Graph analysis can answer questions like detecting patterns of fraud or identifying influential customers - and do it quickly and efficiently. We’ll show you the APIs for accessing graphs and running analytics such as finding influencers, communities, anomalies, and how to use them from various languages including Groovy, Python, and JavaScript, with Jupyter and Zeppelin notebooks.
Albert Godfrind (EMEA Solutions Architect), Zhe Wu (Architect), and Jean Ihm (Product Manager) walk you through, and take your questions.
Format Wars: from VHS and Beta to Avro and Parquet (DataWorks Summit)
The document discusses different data storage formats such as text, Avro, Parquet, and their suitability for writing and reading data. It provides examples of how to choose a format based on factors like query needs, data types, and whether schemas need to evolve. The document also demonstrates how Avro can handle schema evolution by adding or changing fields while still reading existing data.
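The schema-evolution behavior mentioned above can be simulated in a few lines. This is a plain-Python sketch of the idea, not real Avro serialization: the field names and the default value are invented, and the dict merge stands in for Avro's reader/writer schema resolution.

```python
# Avro-style schema evolution in miniature: a reader schema adds a field with
# a default, so records written before the field existed still deserialize.
old_record = {"name": "alice", "age": 30}    # written under the v1 schema
reader_defaults = {"country": "unknown"}     # new v2 field, with its default

def read_with_schema(record, defaults):
    out = dict(defaults)  # start from defaults for fields the writer lacked
    out.update(record)    # fields the writer did supply take precedence
    return out

print(read_with_schema(old_record, reader_defaults))
```

This is why the evolving field must carry a default: without one, there is nothing for the reader to fill in when it meets data written under the older schema.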
This document provides an agenda and overview for a presentation on SQL on Hadoop. The presentation will cover various SQL on Hadoop technologies including Hive, HAWQ, Impala, SparkSQL, HBase with Phoenix, and Drill. It will also include an introduction, surveys to collect information from attendees, and discussions on networking and food. The hosts will provide background on their experience with big data and Hadoop.
Oracle Big Data Appliance and Big Data SQL for advanced analytics (jdijcks)
Overview presentation showing Oracle Big Data Appliance and Oracle Big Data SQL in combination, and why this really matters. Big Data SQL brings you the unique ability to analyze data across the entire spectrum of systems: NoSQL, Hadoop, and Oracle Database.
Introduction to Property Graph Features (AskTOM Office Hours part 1) (Jean Ihm)
1st in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
Xavier Lopez (PM Senior Director) and Zhe Wu (Graph Architect) will share a brief intro to what property graphs can do for you, and take your questions - on property graphs or any other aspect of Oracle Database Spatial and Graph features. With property graphs, you can analyze relationships in Big Data like social networks, financial transactions, or IoT sensor networks; identify influencers; discover patterns of fraudulent behavior; recommend products, and much more -- right inside Oracle Database.
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop (Eric Sun)
Teradata Connectors for Hadoop enable high-volume data movement between Teradata and Hadoop platforms. LinkedIn conducted a proof-of-concept using the connectors for use cases like copying clickstream data from Hadoop to Teradata for analytics and publishing dimension tables from Teradata to Hadoop for machine learning. The connectors help address challenges of scalability and tight processing windows for these large-scale data transfers.
This document provides an overview of Apache Atlas and how it addresses big data governance issues for enterprises. It discusses how Atlas provides a centralized metadata repository that allows users to understand data across Hadoop components. It also describes how Atlas integrates with Apache Ranger to enable dynamic security policies based on metadata tags. Finally, it outlines new capabilities in upcoming Atlas releases, including cross-component data lineage tracking and a business taxonomy/catalog.
R is a popular open-source statistical programming language and software environment for predictive analytics. It has a large community and ecosystem of packages that allow data scientists to solve various problems. Microsoft R Server is a scalable platform that allows R to handle large datasets beyond memory capacity by distributing computations across nodes in a cluster and storing data on disk in efficient column-based formats. It provides high performance through parallelization and rewriting algorithms in C++.
This document discusses the rise of open source analytics tools and languages. It notes that SAS and SPSS previously dominated the market but were very expensive. R, Python, and Hadoop have provided lower-cost open source alternatives for data storage, querying, visualization, and statistical analysis. The document reviews popular open source tools like R, Python, RapidMiner, and Hadoop ecosystems. It also discusses commercial offerings that build on open source like Revolution Analytics. Overall, open source has helped reduce the costs of analytics software and enabled more organizations to benefit from data-driven insights.
Slide deck: Enterprise-grade Data Analysis with Oracle R Enterprise - DOAG2014 (Nadine Schoene)
Slide deck for conference talk at DOAG2014 conference. In German only, translation available on request. Please have a look at the corresponding abstract.
High Performance Predictive Analytics in R and Hadoop (DataWorks Summit)
Hadoop is rapidly being adopted as a major platform for storing and managing massive amounts of data, and for computing descriptive and query types of analytics on that data. However, it has a reputation for not being a suitable environment for high performance complex iterative algorithms such as logistic regression, generalized linear models, and decision trees. At Revolution Analytics we think that reputation is unjustified, and in this talk I discuss the approach we have taken to porting our suite of High Performance Analytics algorithms to run natively and efficiently in Hadoop. Our algorithms are written in C++ and R, and are based on a platform that automatically and efficiently parallelizes a broad class of algorithms called Parallel External Memory Algorithms (PEMA’s). This platform abstracts both the inter-process communication layer and the data source layer, so that the algorithms can work in almost any environment in which messages can be passed among processes and with almost any data source. MPI and RPC are two traditional ways to send messages, but messages can also be passed using files, as in Hadoop. I describe how we use the file-based communication choreographed by MapReduce and how we efficiently access data stored in HDFS.
Presentation given by US Chief Scientist, Mario Inchiosa, at the June 2013 Hadoop Summit in San Jose, CA.
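The pattern behind Parallel External Memory Algorithms can be sketched with a running mean. This is a schematic illustration under assumed data, not Revolution Analytics' code: each chunk yields a small partial result, and partials merge associatively, so the same logic works whether chunks come from RAM, disk files, or MapReduce over HDFS splits.

```python
# External-memory pattern in miniature: per-chunk sufficient statistics that
# combine associatively, independent of where the chunks came from.
def partial(chunk):
    return (len(chunk), sum(chunk))        # (count, sum) for one chunk

def combine(a, b):
    return (a[0] + b[0], a[1] + b[1])      # order-independent merge

chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]  # stand-ins for HDFS blocks
n, total = 0, 0.0
for c in chunks:
    n, total = combine((n, total), partial(c))

print(total / n)  # 3.5, the mean of all six values
```

Because `combine` is associative and commutative, the partials can be produced by mappers and merged by a reducer, which is exactly the file-based MapReduce choreography the abstract describes.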
Scaling up with Hadoop and banyan at ITRIX-2015, College of Engineering, Guindy (Rohit Kulkarni)
The document discusses LatentView Analytics and provides an overview of data processing frameworks and MapReduce. It introduces LatentView Analytics, describing its services, partners, and experience. It then discusses distributed and parallel processing frameworks, with examples like Hadoop, Spark, and Storm. It also provides a brief history of Hadoop, describing its key developments from 1999 to the present day in addressing challenges of indexing, crawling, and distributed processing. Finally, it explains the MapReduce process and provides a simple example to illustrate the mapping and reducing functions.
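The simple MapReduce example mentioned above is traditionally word count; a minimal sketch of the three phases (with a made-up corpus) looks like this:

```python
# Word count: the classic illustration of the map, shuffle, and reduce phases.
from collections import defaultdict

docs = ["big data", "big hadoop", "data data"]

# Map: emit a (word, 1) pair for every word occurrence.
pairs = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the emitted values by key.
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Reduce: aggregate the grouped values per key.
counts = {word: sum(vals) for word, vals in grouped.items()}
print(counts)  # {'big': 2, 'data': 3, 'hadoop': 1}
```

In a real Hadoop job the shuffle is performed by the framework between the map and reduce tasks; here it is spelled out explicitly to show all three steps.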
The document discusses using semantic technology to build an enterprise information web (EIW) through the use of ontologies. It describes how domain ontologies, relational mapping ontologies, and other ontologies can be used to semantically describe and link information from across different business units and data sources. This semantic layer would allow for advanced querying, analysis, and federation of enterprise information through standards like SPARQL. The goal is to solve the "information federation problem" and overcome existing data silos by making all enterprise information easily accessible and understandable through semantic descriptions.
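The federation idea above can be made concrete with a toy triple store: facts from different silos become subject-predicate-object triples, and one pattern query joins across them. The entities and predicate names below are invented for illustration, and the query function is a plain-Python stand-in for SPARQL pattern matching.

```python
# A toy triple store joining facts that originate in different systems.
triples = [
    ("emp:42", "worksIn", "dept:sales"),      # e.g. from the HR database
    ("dept:sales", "locatedIn", "city:nyc"),  # e.g. from the facilities system
]

def query(subject=None, predicate=None, obj=None):
    # None acts as a SPARQL-style variable: match anything in that position.
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Two-hop join: where does employee 42 work, and where is that department?
dept = query(subject="emp:42", predicate="worksIn")[0][2]
city = query(subject=dept, predicate="locatedIn")[0][2]
print(city)  # city:nyc
```

The ontology layer's job, in these terms, is to guarantee that `emp:42` and `dept:sales` mean the same thing regardless of which source system contributed the triple.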
Microsoft R Server for Distributed Computing (BAINIDA)
The document introduces Microsoft R Server and Microsoft R Open. It discusses that R is a popular open source programming language and platform for statistics, analytics, and data science. Microsoft R Server allows for distributed computing on big data using R and brings enterprise-grade support and capabilities to the open source R platform. It can perform analytics both in-database using SQL Server and in Hadoop environments without moving data.
This document discusses scaling R to enterprise data using Oracle's Big Data Analytics solutions. It describes Oracle R Enterprise for performing advanced analytics on large datasets within the database using R. It also describes the Oracle R Connector for Hadoop for accessing and manipulating data stored in Hadoop from R. The document provides examples of loading, preparing, analyzing and modeling data on both relational and HDFS data using Oracle R. It highlights the performance advantages of in-database analytics and discusses deploying R models and scripts to production.
This presentation discusses the following topics:
Basic features of R
Exploring R GUI
Data Frames & Lists
Handling Data in R Workspace
Reading Data Sets & Exporting Data from R
Manipulating & Processing Data in R
This document discusses R programming and compares it to Python. R is an open-source programming language commonly used for statistical analysis and visualization. It has many libraries that enable data analysis and machine learning. The document compares key aspects of R and Python, such as their creators, release years, software environments, usability, and pros and cons. It concludes that R is easy to learn and offers powerful graphics and statistical techniques through libraries, making it well-suited for data analysis applications.
This document discusses data management trends and Oracle's unified data management solution. It provides a high-level comparison of HDFS, NoSQL, and RDBMS databases. It then describes Oracle's Big Data SQL which allows SQL queries to be run across data stored in Hadoop. Oracle Big Data SQL aims to provide easy access to data across sources using SQL, unified security, and fast performance through smart scans.
For the past several decades the rising tide of technology -- especially the increasing speed of single processors -- has allowed the same data analysis code to run faster and on bigger data sets. That happy era is ending. The size of data sets is increasing much more rapidly than the speed of single cores, of I/O, and of RAM. To deal with this, we need software that can use multiple cores, multiple hard drives, and multiple computers.
That is, we need scalable data analysis software. It needs to scale from small data sets to huge ones, from using one core and one hard drive on one computer to using many cores and many hard drives on many computers, and from using local hardware to using remote clouds.
R is the ideal platform for scalable data analysis software. It is easy to add new functionality in the R environment, and easy to integrate it into existing functionality. R is also powerful, flexible and forgiving.
I will discuss the approach to scalability we have taken at Revolution Analytics with our package RevoScaleR. A key part of this approach is to efficiently operate on "chunks" of data -- sets of rows of data for selected columns. I will discuss this approach from the point of view of:
- Storing data on disk
- Importing data from other sources
- Reading and writing of chunks of data
- Handling data in memory
- Using multiple cores on single computers
- Using multiple computers
- Automatically parallelizing "external memory" algorithms
This document discusses big data and use cases. It begins by reviewing the history and evolution of big data and advanced analytics. It then explains how technologies like Hadoop, stream processing, and in-memory computing support big data solutions. The document presents two use cases - analyzing credit risk by examining customer transaction data to improve credit offers, and detecting fraud by analyzing financial transactions for unusual patterns that could indicate suspicious activity. It describes how these use cases leverage technologies like Oracle R Connector for Hadoop to run analytics and machine learning algorithms on large datasets.
TDWI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ... (Debraj GuhaThakurta)
Event: TDWI Accelerate Seattle, October 16, 2017
Topic: Distributed and In-Database Analytics with R
Presenter: Debraj GuhaThakurta
Description: How to develop scalable and in-DB analytics using R in Spark and SQL-Server
R is a popular statistical programming language used for data analysis and machine learning. It has over 3 million users and is taught widely in universities. While powerful, R has some scaling limitations for big data. Several Apache Spark integrations with R like SparkR and sparklyr enable distributed, parallel processing of large datasets using R on Spark clusters. Other options for scaling R include H2O for in-memory analytics, Microsoft ML Server for on-premises scaling, and ScaleR for portable parallel processing across platforms. These solutions allow R programs and models to be trained on large datasets and deployed for operational use on big data in various cloud and on-premises environments.
27 Aug 2013 webinar: High Performance Predictive Analytics in Hadoop and R, presented by Mario E. Inchiosa, PhD, US Data Scientist, and Kathleen Rohrecker, Director of Product Marketing
Mankind has stored more than 295 billion gigabytes (295 exabytes) of data since 1986, according to a report by the University of Southern California. Storing and monitoring this data 24/7 in widely distributed environments is a huge task for global service organizations. These datasets require high processing power that traditional databases cannot offer, as the data is stored in an unstructured format. Although the MapReduce paradigm can address this problem using Java-based Hadoop, it does not provide maximum functionality. Its drawbacks can be overcome with Hadoop streaming techniques, which allow users to define non-Java executables for processing these datasets. This paper proposes a THESAURUS model, which allows a faster and easier form of business analysis.
Programming data access is probably one of the most common tasks when building enterprise solutions. One way or another, we have to store state and data, and relational databases are probably the most common form of storage. The drawback is that object-oriented programming does not fit relational databases particularly well.
This talk goes through the problems that arise when designing the data layer, and how best to bridge the gap between classes in a program and tables in a database.
The Cloud topic is everywhere, not only for big software companies, but also for our customers and of course for all service providers.
How do you move from traditional IT to a full Cloud environment, and how do you manage the transition phase?
We show you the Trivadis Cloud transition approach, standardized and proven, which leads you into a safe and optimized usage of cloud services in your daily business.
It’s all about Data - a Trivadis core competence for decades - no matter which deployment model we choose.
In this presentation we shed light on various Cloud strategies and concrete technological aspects.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably; they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Depending on the size and quantity of such events, this can quickly reach Big Data scale. How can we efficiently collect and transmit these events? How can we make sure that we can always report over historical events? How can these new events be integrated into a traditional infrastructure and application landscape?
Starting with a product and technology neutral reference architecture, we will then present different solutions using Open Source frameworks and the Oracle Stack both for on premises as well as the cloud.
In this session we will present the different ways of using SQL Server in a Cloud infrastructure (Microsoft Azure). We will cover hybrid scenarios, migration, backup, and hosting of SQL Server databases in IaaS or PaaS mode.
During this presentation, we will introduce basic concepts of data science and discuss a project carried out at one of our customers.
We will see how data science projects can easily be carried out using the statistical programming language R, as well as its integration into the new Microsoft SQL Server 2016 suite.
This session shows you how you can use Microsoft Azure to build a highly scalable solution for event processing. You can use this approach for classical IoT scenarios or, for example, to capture telemetry data of a widely distributed application. Each application instance will then send data to Azure's Event Hub. In this session you will not only get some insights into the Event Hub, but also into Stream Analytics. Stream Analytics is used to aggregate the millions of events coming from the Event Hub by using a SQL-like syntax. From Stream Analytics the data can be pushed into a database or, for example, into a live dashboard in Microsoft's Power BI.
The goal is to share with the audience proven knowledge and experience in the design, implementation, and operation of DBaaS platforms. The presentation includes examples and explanations of consolidated database environments delivering uncompromised performance, scalability, and flexibility in connection with time-to-market and cost-effectiveness.
Today, companies are using various channels to communicate with their customers. As a consequence, a lot of data is created, more and more also outside of the traditional IT infrastructure of an enterprise. This data often does not have a common format and is continuously created with ever increasing volume. With the Internet of Things (IoT) and its sensors, the volume as well as the velocity of data just gets more extreme.
To achieve a complete and consistent view of a customer, all this customer-related information has to be included in a 360 degree view in a real-time or near-real-time fashion. By that, the Customer Hub will become the Customer Event Hub. It constantly shows the actual view of a customer over all his interaction channels and provides an enterprise the basis for a substantial and effective customer relation.
This presentation shows the value of such a platform and how it can be implemented.
This session is a report on the experience of migrating 400 databases to Oracle 12c. So far, 300 databases have been migrated, with both good and bad surprises! This session presents the situations we encountered during these migrations. The following points are covered:
- The strategy put in place for the version upgrade
- The problems encountered during the migration
- Bugs and wrong results
- Problems with the new features of the Oracle Optimizer
- The most appreciated new features
Attendees will get an overview of an upgrade project to Oracle 12c, applicable not only to large projects but to all types of Oracle 12c migration projects.
An introduction to Apache Cassandra compared with traditional RDBMSs: the similarities and the differences, as well as some of the tools available in the Cassandra ecosystem. A quick overview of the NoSQL ecosystem opens the presentation.
Reports on data tell only part of the story. To make correct decisions, additional information is needed, but most of it, especially documents and information outside databases, is not picked up by BI reports. With the portal we visualize the IoT data with Power BI and add value by showing reports, documents, and additional information in one place. Users get a true "single point of information" for the topic. An example with a demo will be shown.
While everyone has heard of smart grids, the concept of the microgrid is less well known. A microgrid is a small network powered by new renewable energies (NRE). The intermittent production of these energies requires rethinking how the electrical grid is managed. Data mining serves as a lever to better control and exploit the multitude of data brought by the smart-grid era. These advanced data-mining skills make it possible, in particular, to establish prediction methods that prove crucial for optimizing the use of NRE production through storage. System integrators collect the information from smart meters and pass it to data-mining processes in order to forecast, to the quarter-hour, the consumption and production of a building. A presentation of concrete techniques and projects in the service of the energy transition.
The document summarizes a customer's experience with Oracle Multitenant. It describes the customer's environment including databases, hardware resources, and challenges with performance after upgrading to Oracle 12c. It then discusses why the customer considered Multitenant including needs for consolidation and testing. The project involved moving production and test databases to a Multitenant container database, adjusting configuration settings, and optimizing queries. The results were improved performance and ability to scale resources. New features in Oracle 12.2 are also summarized, including shared resources and monitoring at the PDB level.
The emergence of SMART systems such as smart cities, home automation, and other connected objects represents a substantial advance in the efficiency of the information world. We are moving from an era of static information, where decisions must be made by the user, to a dynamic era where the machine can make certain decisions itself. The potential of this "small" paradigm shift is simply enormous. Its limit lies in our ability to formalize and transmit our intelligence to this new type of system. Only perfect mastery of the data, and of the mechanisms that generate it, will allow the full potential of this new era to be realized. That mastery is governance.
Big Data and Fast Data combined: is it possible? An introduction to Big Data architectures. Ulises Fasoli, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
With biGenius® on Azure, forget the technology and focus your efforts on the business. Patricia Düggeli, Principal Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
An introduction to data governance. Philippe Bourgeois, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
The Swiss Data Cloud, as seen by the operator UPC Cablecom Business. Laurent Fine, Large Account Manager, UPC Cablecom. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
IoT: lessons learned from customer projects in the IoT domain. Michael Epprecht, Technical Specialist in the Global Black Belt IoT Team at Microsoft. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... (Kaxil Naik)
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens" (sameer shah)
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data (Kiwi Creative)
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Open Source Contributions to Postgres: The Basics, POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... (Social Samosa)
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
End-to-end pipeline agility - Berlin Buzzwords 2024 (Lars Albertsson)
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
1. BÂLE BERNE BRUGG DUSSELDORF FRANCFORT S.M. FRIBOURG E.BR. GENÈVE
HAMBOURG COPENHAGUE LAUSANNE MUNICH STUTTGART VIENNE ZURICH
A Gentle Introduction to
Oracle R Enterprise
Lausanne, 24 November 2015
Christian Antognini
Senior Principal Consultant
2. @ChrisAntognini
Senior principal consultant, trainer and partner at Trivadis
– christian.antognini@trivadis.com
– http://antognini.ch
Focus: get the most out of Oracle Database
– Logical and physical database design
– Query optimizer
– Application performance management
Author of Troubleshooting Oracle Performance (Apress, 2008/14)
OakTable Network, Oracle ACE Director
3. What Is R?
R is a language and environment for statistical computing and graphics.
It is a GNU project.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
Source: https://www.r-project.org/about.html
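For readers new to R, a minimal illustrative sketch of the statistical and graphical techniques mentioned above, using only the built-in cars data set:

```r
fit <- lm(dist ~ speed, data = cars)  # simple linear model on a built-in data set
summary(fit)                          # classical statistical output

plot(cars)                            # basic graphics ...
abline(fit)                           # ... with the fitted regression line
```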
6. R Technologies from Oracle
Oracle has adopted R as a language and environment for performing statistical data
analysis and advanced analytics, as well as generating sophisticated graphics
Oracle provides R integration through four key technologies:
– Oracle R Distribution
– ROracle
– Oracle R Enterprise (ORE)
– Oracle R Advanced Analytics for Hadoop (ORAAH)
7. Oracle R Distribution
Oracle's distribution of open source R
Free download
Support provided to customers of the Oracle Advanced Analytics option, Oracle Linux, and the Oracle Big Data Appliance
8. ROracle
Open source R package providing a DBI-compliant driver for Oracle Database
Based on the OCI library
It’s publicly available on CRAN and is maintained by Oracle
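As a sketch of how ROracle is used, the following follows the standard DBI pattern; the credentials, connect string, and queried table are placeholders, not part of the original slides:

```r
library(ROracle)

# Hypothetical connection: user, password, and connect string are placeholders
drv <- dbDriver("Oracle")
con <- dbConnect(drv, username = "scott", password = "tiger",
                 dbname = "//db-host:1521/ORCLPDB")

# Run a query through the DBI-compliant driver; results come back as a data.frame
emp <- dbGetQuery(con, "SELECT empno, ename, sal FROM emp WHERE sal > :1",
                  data = data.frame(sal = 2000))

dbDisconnect(con)
dbUnloadDriver(drv)
```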
9. Oracle R Enterprise (ORE)
It’s a component, along with Data Mining, of the Oracle Advanced Analytics option
It’s a set of R packages and Oracle Database features
– Run R commands and scripts for analyses on data stored in Oracle Database
– Translate R operations into SQL
– One or more R engines run on the database server
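A minimal ORE session might look like the following sketch; the connection details and the EMP table are assumptions for illustration. Standard R operations on the proxy object are translated into SQL and executed in the database:

```r
library(ORE)

# Connect to the database (credentials and host are placeholders)
ore.connect(user = "rquser", sid = "orcl", host = "db-host",
            password = "secret", port = 1521)

ore.sync()    # refresh the list of visible database tables
ore.attach()  # expose them as proxy objects by name

# EMP is an ore.frame proxy; these R calls are translated into SQL
head(EMP)
aggregate(EMP$SAL, by = list(EMP$DEPTNO), FUN = mean)

ore.disconnect()
```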
10. Oracle R Advanced Analytics for Hadoop (ORAAH)
It’s one of the components in the Oracle Big Data Software Connectors Suite, an option to the Big Data Appliance (BDA)
It provides an R interface to access HDFS and the MapReduce programming framework
– Data manipulation
– Writing mapper and reducer functions
– Invocation of Hadoop jobs
12. Architecture
[Diagram] On the client side, an R engine loaded with the ORE packages sends SQL to Oracle Database and receives results; on the database server, one or more spawned R engines (each with the ORE packages) execute R code and return their results.
13. Advantages of Oracle R Enterprise (According to Oracle)
Operate on database-resident data without using SQL
Eliminate data movement
Keep data secure
Use the power of the database
Use current data
Prepare data in the database
Save R objects in the database
Build models in the database
Score data in the database
Execute R scripts in the database
Integrate with the Oracle technology stack
14. ore.frame Class
An ore.frame object represents a relational query for an Oracle Database instance
Typically, you get ore.frame objects that are proxies for database tables
An ore.frame object can be ordered or unordered
– This is an important difference compared to an R data.frame, which always has an explicit order
– Relational data must be explicitly ordered
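To make the proxy behaviour concrete, a hedged sketch (the EMP table is an assumed example, and an ORE connection is assumed to be open) showing that an ore.frame stays in the database until explicitly pulled:

```r
# EMP is an ore.frame proxy for a database table (placeholder name)
class(EMP)                  # "ore.frame": a proxy backed by a relational query

# Filtering produces another ore.frame; no data is moved to the client
well_paid <- EMP[EMP$SAL > 2000, c("ENAME", "SAL")]

local_df <- ore.pull(well_paid)  # only now is data fetched locally
class(local_df)                  # "data.frame"
```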
15. Persisted R Objects
R objects (incl. ORE proxy objects) exist for the duration of the current R session
The standard R functions for saving and restoring R objects, save and load, can’t be used with ORE proxy objects
– The database objects associated with them aren’t persisted
To persist them, ORE provides datastores that store data in the database
– The ore.save and ore.load functions are available
– Plain R objects can be persisted as well
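The datastore API can be sketched as follows; the datastore name is illustrative and an active ORE connection is assumed:

```r
# An ordinary R object, built client-side from a built-in data set
mod <- lm(Sepal.Length ~ Sepal.Width, data = iris)

ore.save(mod, name = "my_datastore")  # persist it in a database datastore
rm(mod)                               # gone from the session ...

ore.datastore()                       # ... but listed among the datastores
ore.load("my_datastore")              # restore 'mod' into the R session
```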
16. Preparing and Exploring Data in the Database
Selecting Data
Indexing Data
Combining Data
Summarizing Data
Transforming Data
Sampling Data
Partitioning Data
Preparing Time Series Data
Correlating Data
Cross-Tabulating Data
Analyzing the Frequency of Cross-Tabulations
Building Exponential Smoothing Models on Time Series Data
Ranking Data
Sorting Data
Analyzing Distribution of Numeric Variables
17. Building Models and Predictions
Two categories of models are provided:
– Oracle R Enterprise models (OREmodels package: linear regression, generalized linear model, neural network)
– Oracle Data Mining models (OREdm package: association rules, decision trees, Naïve Bayes, k-means, …)
The ore.predict function is able to score data in ore.frame objects
– Degree of parallelism can be manually set
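As a hedged sketch of in-database modelling and scoring (the proxy names are illustrative, and an ORE connection is assumed):

```r
# Push the built-in iris data into a temporary in-database table
IRIS <- ore.push(iris)

# An OREmodels regression, built and stored in the database
fit <- ore.lm(Sepal.Length ~ Sepal.Width, data = IRIS)
summary(fit)

# ore.predict can also score a plain R model against an ore.frame,
# keeping the predictions in the database
local_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
scored <- ore.predict(local_fit, newdata = IRIS)
head(scored)
```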
18. ORE Embedded R Execution
It enables storing and invoking R scripts in the Oracle Database server
– Both an R and a SQL API exist
When invoked, a script executes in one or more R engines that run on the database server
– Degree of parallelism can be manually set
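Through the R API, embedded execution can be sketched like this; the script name and function body are illustrative, and an ORE connection is assumed:

```r
# Store a script in the database script repository
ore.scriptCreate("random_summary", function(n = 100) {
  summary(rnorm(n))   # runs inside an R engine spawned on the database server
})

# Invoke it server-side by name and fetch the result to the client
res <- ore.doEval(FUN.NAME = "random_summary", n = 1000)
ore.pull(res)
```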
19. Core Messages
Easy to install
Simple to use
Expensive
A more in-depth analysis is required to judge performance and stability