Innovate Analytics with Oracle Data Mining & R
Christian Screen, Capgemini
Oracle Analytics Practice (formerly BI Consulting Group)

Tuesday December 11, 2012




                                                  Better intelligence, smarter decisions
Christian Screen
o Solutions Engineer at Capgemini
o 15 Years in Technology
o Co-Author of Oracle BIEE 11g – A Hands-On Tutorial
o Podcast & Blog at ArtofBi.com
o Oracle ACE
o Oracle Deputy CTO
o Oracle Hyperion Certified Consultant
o BI Evangelist





                                                             © 2012 Capgemini. All rights reserved.
Agenda
o What is Data Mining?
o What is R?
o Oracle BI 11g + Oracle Data Mining!
o Use Cases
o Getting your Organization Started
o Oracle Endeca & Data Mining
o With Oracle BI 11g
o Oracle Predictive Analytics stack - Tie it all together
o Q&A

Getting the Message Today
What I want to leave you with today…

o An understanding of Data Mining and R
o How Predictive Analytics is what your organization needs now
o Motivation for starting a Predictive Analytics project
o How Oracle BI 11g works with Data Mining & R




What is Data Mining?
“The process that attempts to discover patterns and hidden knowledge in large data sets in order to aid the decision making process.”

Key terms: Discovery, Patterns, Clustering, Affinities, Large Data Sets, Algorithms, Regression, Unbiased, Classification, Fraud Detection, Anomaly Detection, Association Analysis, Probabilities, Predictive Analytics, Summarization, Statistical Confidence

Data Mining Use Cases
Example #1 – Market Basket Analysis
o Determine the combination of items purchased together that generates the highest or lowest margin
o Predict new item sales when sold together with a product sold at a discount price
o Calculate, based on previous sales of individual products, which store would benefit most from joined product sales
o Example: computer + monitor, printer + paper

Example #2 – Relationship Patterns
o Think LinkedIn and Facebook
o Based on similar existing data points, locate other accounts and rank them based on affinity
o Predict how many friends/people may accept an offer based on previous behavior and the behavior of similar users
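
The market basket example (Example #1) above can be sketched in a few lines of Python. This is only an illustration of the usual support, confidence, and lift measures for item pairs; it is not how Oracle Data Mining implements association analysis. The transactions and the function name `pair_metrics` are invented for this sketch.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
TRANSACTIONS = [
    {"computer", "monitor"},
    {"computer", "monitor", "printer"},
    {"printer", "paper"},
    {"printer", "paper", "computer"},
    {"monitor"},
]


def pair_metrics(transactions):
    """Return {(a, b): (support, confidence of a -> b, lift)} for item pairs bought together."""
    n = len(transactions)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in transactions for item in basket)
    # How many baskets contain each pair (pairs are stored in alphabetical order).
    pair_counts = Counter(
        pair for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    metrics = {}
    for (a, b), together in pair_counts.items():
        support = together / n                   # share of baskets with both items
        confidence = together / item_counts[a]   # share of a-baskets that also hold b
        lift = confidence / (item_counts[b] / n)  # > 1 means bought together more than chance
        metrics[(a, b)] = (support, confidence, lift)
    return metrics


if __name__ == "__main__":
    ranked = sorted(pair_metrics(TRANSACTIONS).items(), key=lambda kv: -kv[1][2])
    for pair, (support, confidence, lift) in ranked:
        print(pair, f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Ranking pairs by lift is one simple way to surface combinations such as printer + paper; a margin-based ranking, as the slide suggests, would need price and cost data that this sketch does not model.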


Oracle Data Mining
Oracle’s DM Solution for Solving Business Problems

o Oracle Data Mining is installed automatically when you install Oracle Database Enterprise Edition.
o GUI is included with SQL Developer
o A mature DM Platform




What is R?

o An Open Source programming language and software environment for statistical computing and graphics
o Standard developed and maintained by the R Foundation
o Part of the Free Software Foundation GNU project
    o Watch Movie: Revolution OS (2002)
o Client or Client Server
o Many images available on Amazon EC2
o RStudio seems to be a leading Open Source IDE for R




What is R?
          What Else?
o Really good at statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others
o Text Mining and Lexical scoping
o Extends via user-created Libraries/Packages
o Automated report and document generation
o Think associations, algorithms, statistics and surprisingly good graphics




Data Mining vs. R
R
o Does not require a database
o Complex algorithms can require large amounts of memory and processor
o Visual output built in or by libraries
o User-built libraries make for expansive capabilities and logic to be developed
o Skill-up required for R Script language
o Data does not need to be clean, though it helps
o Uses connectors for data sources

Data Mining
o Develop Statistical Models
o Resides in the Database
o No visual output
o SQL Developer plug-in available
o Leverage some existing DB skills
o Likes cleansed data


Oracle BI & Data Mining / R
Getting the Organization Started is Simple

o Start today! Think Competitive Advantage.
o Data Scientists are heroes
o Experimentation is key
o Leverage existing investments
o It's part of Business Intelligence
o Used by a few, developed by fewer
o Get Endeca Information Discovery
o Get R & RStudio
o Explore Predictive Analytics

Oracle BI & Data Mining / R
Bringing it All Under Business Intelligence makes sense…

o Traditional BI is necessary but can become stagnant
o Predictive Analytics complements BI
o Existing BI investments distribute Predictive Analytics
o All areas of an organization can benefit




Oracle Predictive Analytics Now
BI, Discovery, Data Mining, & R
Traditional BI / Analytics
o Tells us how we’re doing today
o Distributes information
o Tracks Performance Metrics
o Makes sense of ERP data

Predictive Analytics
o Includes traditional BI concepts
o Expands the BI Team
o Expands knowledge of information
o Tactically targets specific scenarios of the business
o Provides non-linear perspective on the business

Oracle Predictive Analytics
Endeca Information Discovery (EID) vs. Oracle Data Mining / R

o Endeca is mainly a three-part tool (ETL, Server, Client)
  • Data Mining typically depends on a cleansed source, e.g. a data warehouse (DW)
o Endeca includes powerful features, visualizations, and guided search navigation, perfect for self-service discovery
  • Data Mining is database driven, so it has no inherent graphics
  • R has visuals and client tools for development but is not distributed
o Endeca projects perform well in an iterative design-develop-feedback loop
  • Data Mining and R are built from models and algorithms, and their results are typically consumed by other systems

Oracle Predictive Analytics Stack
Engineered to work together…

o Oracle Advanced Analytics in Oracle 11g R2 RDBMS
  • Data Mining & Oracle R Enterprise
      o Aims to eliminate need for SAS, SPSS, Matlab, etc.
o Oracle R Enterprise
  • Oracle’s flavor of R
o Exadata Machine
o Exalytics (as consumer)
  • Endeca
  • OBIEE
  • Essbase

Start planning now for running your business in the future of Analytics.

Upcoming Endeca Webinar
Endeca Information Discovery in the Wild

o Predictive Advertising Analytics in Media
o Scott Schlesinger Paper on Predictive Analytics in the Movies (http://www.digitalmarketingsuite.com/url/60534)




Data Mining with Oracle BI 11g
Oracle R Enterprise in Oracle BI Dashboards




Data Mining with Oracle BI 11g




Oracle R with Oracle BI 11g
ORE in Oracle BI 11g dashboard – Flight Delays




Data Mining & R with Oracle BI 11g
Embedded R Script using Parameterization from Oracle BI
    select *
      from table(rqTableEval(
        -- Input cursor: flights in the year range supplied by Oracle BI session variables
        cursor(select ARRDELAY, DISTANCE, DEPDELAY, YEAR, MONTH, DAYOFMONTH,
                      DEPTIME, ARRTIME, UNIQUECARRIER, FLIGHTNUM, ORIGIN, DEST,
                      ORIGIN||'-'||DEST ROUTE
                 from ontime_s
                where year >= valueof(NQ_SESSION.OR_ARG1)
                  and year <= valueof(NQ_SESSION.OR_ARG2)
                  and DEPDELAY is not NULL),
        -- Parameter cursor: rows from ontime_lm passed to the script as argument 'mod'
        cursor(select 1 max1, 1 pos1, 'mod' name1,
                      to_number(null) max2, to_number(null) pos2,
                      to_char(null) name2, total, chunk, value
                 from ontime_lm),
        -- Output description: a query giving the column names and types of the result
        'select ARRDELAY, DISTANCE, DEPDELAY, YEAR, MONTH, DAYOFMONTH,
                DEPTIME, ARRTIME, UNIQUECARRIER, FLIGHTNUM, ORIGIN, DEST,
                ORIGIN||''-''||DEST ROUTE, 1 PRED from ontime_s',
        -- Name of the embedded R script to run
        'PredictDelays-score'))
     order by 1, 2, 3
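
The slide does not show the R script behind `PredictDelays-score`, so the following is only a guess at the general pattern the query implies: a previously fitted model is applied to each input row and a `PRED` column is appended. The sketch assumes a one-variable linear model of arrival delay on departure delay, written in plain Python with made-up numbers; the function names `fit_line` and `score` are hypothetical and are not part of Oracle R Enterprise.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def score(rows, intercept, slope):
    """Return copies of the rows with a PRED column appended (assumed scoring step)."""
    return [dict(row, PRED=intercept + slope * row["DEPDELAY"]) for row in rows]


# Made-up historical flights, in minutes of delay; not taken from the ontime_s table.
history = [
    {"DEPDELAY": 0, "ARRDELAY": 2},
    {"DEPDELAY": 10, "ARRDELAY": 12},
    {"DEPDELAY": 20, "ARRDELAY": 22},
]

# Fit once, then reuse the fitted coefficients to score new rows.
intercept, slope = fit_line(
    [row["DEPDELAY"] for row in history],
    [row["ARRDELAY"] for row in history],
)
scored = score([{"DEPDELAY": 30}], intercept, slope)
print(scored)
```

Separating the fit from the scoring mirrors the query's shape, where the model arrives through one cursor and the rows to score arrive through another.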




Questions?




Contact Christian Screen
christian.screen@capgemini.com

Access Oracle BI Training at
Capgemini’s BICG University




                                                              www.capgemini.com/bim
                                 The information contained in this presentation is proprietary. ©2012 Capgemini. All rights reserved


Editor's Notes

  • #6 Oracle Data Mining is installed automatically when you install Oracle Database Enterprise Edition.
  • #9 http://www.rstudio.com/ and http://en.wikipedia.org/wiki/R_%28programming_language%29
  • #16 http://ovum.com/2012/03/14/oracle-advances-its-advanced-analytics-strategy/ Oracle Advanced Analytics contains two components: first, Oracle’s existing data mining technology with a set of graphical tools for building predictive models; second, a new set of advanced analytics capabilities built around Oracle R Enterprise, which involves integrating R’s statistical processing capabilities within the Oracle database and conversely taking advantage of the scalability of Oracle’s database platform.