Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Exadata Database Machine (R Language / Data Mining)
Radosław Kut
Big Data at Work
In association with
Oracle Exadata Database Machine (R Language / Data Mining)

Published in: Technology, News & Politics
  • Topic per the agenda: Big Data Appliance. The Big Data Appliance content is closely associated with Exadata (R Language / Data Mining / NoSQL).
  • Capturing the broadest set of information and data available, both structured and unstructured, including data not in your data warehouse: weblogs, Twitter feeds, Facebook feeds. Organizing this information using highly scalable platforms. Analyzing data within the context of all your enterprise data using advanced analytics. Deciding on…
  • New analysis package being announced at OOW: an in-database implementation of the popular open-source solution called R. Oracle R is 100% compatible with standard R, so you can reuse your scripts and other open-source packages. It runs in-database (not on your laptop), so it is much faster, supports bigger models, and is more secure.
  • R is an open-source language and environment for statistical analysis and graphing. It provides linear and nonlinear modeling, standard statistical methods, time-series analysis, classification, clustering, and graphical data displays. Thousands of open-source packages are available in the Comprehensive R Archive Network (CRAN) for a spectrum of applications, such as bioinformatics, spatial statistics, and financial and marketing analysis. The popularity of R has increased as its functionality matured to rival that of costly proprietary statistical packages.
  • In-database implementation of the popular open-source solution called R. With Oracle R you can reuse your scripts and other open-source packages and run R with Oracle Database (not on your laptop). It is extremely fast and more secure, and it supports more robust models.
Transcript of "Oracle Exadata Database Machine (R Language / Data Mining)"

    2. Oracle Exadata Database Machine (R Language / Data Mining)
       Radosław Kut
       Big Data at Work
       In association with
    3. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.
    7. Oracle Exadata Database Machine
       Best Platform for…
       • Data Warehousing
       • OLTP
       • Database Cloud
       Oracle’s strategic platform for ALL Database workloads
    8. Hardware Generational Advances
                       V1      V2      X2      X3      X4      Gain
                       2008    2009    2010    2012    2013    (V1 to X4)
       Storage (TB)    168     336     504     504     672     4X
       Flash (TB)      0       5.3     5.3     22.4    44.8    8X
       CPU (Cores)     64      64      96      128     192     3X
       Memory (GB)     256     576     1152    2048    4096    16X
    9. Exadata Architecture
       Complete | Optimized | Standardized | Hardened Database Platform
       • Standard Database Servers
         – 8x 2-socket servers → 192 cores, 2 TB DRAM, or
         – 2x 8-socket servers → 160 cores, 4 TB DRAM
       • Unified Ultra-Fast Network
         – 40 Gb InfiniBand internal connectivity → all ports active
         – 10 Gb or 1 Gb Ethernet data center connectivity
       • Scale-out Intelligent Storage Servers
         – 14x 2-socket servers → 168 cores in storage
         – 168 SAS disk drives → 672 TB HC or 200 TB HP
         – 56 Flash PCI cards → 44 TB Flash + compression
       Fully Redundant
    10. Highly Engineered and Standardized
        • Hundreds of engineer years spent optimizing and hardening the system end-to-end
          – Frees IT talent to focus on business needs
        • Standard platform improves support experience
        • Runs all existing Oracle Database workloads
        Less Risk, Better Results
    11. Key Exadata Innovations
        Extreme Performance at Lowest Cost
        • Hybrid Columnar Compression
          – 10x compression for warehouses
          – 15x compression for archives
          – Data remains compressed for scans and in Flash
          – Space savings cascade to copies: primary DB, standby, test, dev, backup
        • Smart Scale-Out Storage
          – InfiniBand connected servers
          – Smart Scan query offload
        • Smart PCI Flash Cache
          – Transparent cache in front of disk
          – Accelerates random I/O up to 30x
          – Quadruples data scan rate
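The "space savings cascade to copies" point on the Key Exadata Innovations slide can be made concrete with simple arithmetic. The sketch below is only an illustration: the 100 TB database size is an assumption, while the 10x ratio and the list of copies come from the slide. Real compression ratios depend on the data.

```python
# Hypothetical sizing sketch. The 100 TB figure is an assumption for illustration;
# the 10x ratio and the five copies are taken from the slide.
COPIES = ["primary", "standby", "test", "dev", "backup"]

def total_storage_tb(uncompressed_tb: float, ratio: float, copies: int) -> float:
    """Total footprint when every copy stores the data at the given compression ratio."""
    return uncompressed_tb / ratio * copies

uncompressed = total_storage_tb(100.0, 1.0, len(COPIES))   # no compression
compressed = total_storage_tb(100.0, 10.0, len(COPIES))    # 10x warehouse compression
saved = uncompressed - compressed

print(f"uncompressed: {uncompressed:.0f} TB, compressed: {compressed:.0f} TB, saved: {saved:.0f} TB")
```

Because each copy inherits the compressed format, the saving grows with the number of copies rather than applying to the primary database alone.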
    12. Exadata Intelligent Storage
        • Exadata storage servers also run more complex operations in storage
          – Join filtering
          – Incremental backup filtering
          – I/O prioritization
          – Storage Indexing
          – Database level security
          – Offloaded scans on encrypted data
          – Data Mining Model Scoring
        • 10x reduction in data sent to DB servers is common
        Exadata Intelligent Storage Grid
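The idea behind the Exadata Intelligent Storage slide, filtering in storage so that less data travels to the database servers, can be sketched with a toy model. The names `storage_scan` and `ROWS` are hypothetical and the data is invented. This is not how Exadata is implemented, only an illustration of why evaluating predicates and projections near the data reduces traffic.

```python
# Toy model of query offload. All names and data here are hypothetical.
ROWS = [
    {"cust_id": i, "region": "US" if i % 10 == 0 else "EU", "notes": "x" * 50}
    for i in range(1000)
]

def storage_scan(rows, predicate, columns):
    """Apply the filter and column projection at the storage tier, returning only matches."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

# Without offload: every row, with every column, travels to the database server.
shipped_plain = len(ROWS)

# With offload: the storage tier evaluates region = 'US' and returns cust_id only.
result = storage_scan(ROWS, lambda r: r["region"] == "US", ["cust_id"])
shipped_offload = len(result)

print(shipped_plain, shipped_offload)
```

In this invented data set one row in ten matches, so the offloaded scan ships a tenth of the rows, and each shipped row carries one column instead of three.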
    13. Oracle Advanced Analytics Option
        Transforming the Database into a Comprehensive Advanced Analytics Platform
        • Oracle Advanced Analytics Option enables companies to "bring the algorithms to the data" instead of extracting the data to specialized and expensive dedicated statistical and data mining servers
        • Oracle Advanced Analytics Option includes:
          – Oracle R Enterprise: integrates the open-source statistical environment R with the Oracle Database
          – Oracle Data Mining: SQL and PL/SQL focused in-database data mining and predictive analytics
        • Data movement is eliminated or dramatically reduced while analytical and compute-intensive operations are performed inside the database, where the data resides. This increases performance, reduces the cycle times required to extract information from data, and reduces total cost of ownership compared with traditional statistical and data mining environments
    16. What Are R’s Challenges?
        1. R is memory constrained
           – R processing is single threaded and does not exploit available compute infrastructure
           – R lacks industrial strength for enterprise use cases
        2. R has lacked mindshare in the enterprise market
           – R is still met with caution by the long-established SAS and IBM/SPSS statistical community
        • However, statistics courses at major universities (e.g. Yale) are now taught in R
        • The FDA has recently shown indications for approval of new drugs for which the submission’s data analysis was performed using R
    18. What is R Enterprise?
        • Oracle R Enterprise brings R’s statistical functionality closer to the Oracle Database
        1. Eliminates R’s memory constraint by enabling R to work directly and transparently on database objects
           – Allows R to run on very large data sets
        2. Architected for enterprise production infrastructure
           – Automatically exploits database parallelism without requiring parallel R programming
           – Build and immediately deploy
        3. Oracle R leverages the latest R algorithms and packages
           – R is an embedded component of the DBMS server
    19. Architecture and Performance
        • Transparently function-ships R constructs to the database via R → SQL translation
          – Data structures
          – Functions
            • Data manipulation functions (select, project, join)
            • Basic statistical functions (avg, sum, summary)
            • Advanced statistical functions (gamma, beta)
        • Performs data-heavy computations in the database
          – R for summary analysis and graphics
        • Transparent implementation enables using a wide range of R "packages" from the open-source community
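The function-shipping idea on the Architecture and Performance slide, where client-side operations are translated to SQL and run in the database, can be sketched in a few lines. The example below uses Python and SQLite as stand-ins for R and Oracle Database. `TableProxy` and the `sales` table are hypothetical names invented for this illustration, and none of this is the Oracle R Enterprise API.

```python
import sqlite3

class TableProxy:
    """Hypothetical stand-in for a database-backed data frame: it records
    operations and emits SQL instead of pulling rows into client memory."""

    def __init__(self, conn, table, where=None):
        self.conn, self.table, self.where = conn, table, where

    def filter(self, condition):
        # Returns a new proxy; nothing is executed yet.
        clause = condition if self.where is None else f"({self.where}) AND ({condition})"
        return TableProxy(self.conn, self.table, clause)

    def to_sql(self, expr):
        sql = f"SELECT {expr} FROM {self.table}"
        return sql if self.where is None else f"{sql} WHERE {self.where}"

    def mean(self, column):
        # The aggregate runs inside the database; only one number comes back.
        return self.conn.execute(self.to_sql(f"AVG({column})")).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("US", 10.0), ("US", 30.0), ("EU", 50.0)])

us_sales = TableProxy(conn, "sales").filter("region = 'US'")
print(us_sales.to_sql("AVG(amount)"))
print(us_sales.mean("amount"))
```

The sketch builds SQL by string interpolation to keep it short, which is unsafe for untrusted input. The point is only that the client holds a description of the computation while the rows stay in the database.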
    20. Oracle Data Mining
        Building Predictive Analytics Applications
        • Oracle Data Mining provides 12 powerful in-database data mining algorithms for big data analytics as a native feature of the database
          – Designed for big data problems that involve discovering patterns and relationships in large amounts of data, and often making predictions based on those patterns, Oracle Data Mining allows data analysts and data miners to mine star schemas, transactional data and unstructured data stored inside the database, build predictive models and apply them to data inside the database, all without moving data.
        • Oracle Data Miner helps users mine their data and define, save and share advanced analytical methodologies
          – Users who prefer a graphical user interface can use the Oracle Data Miner extension to SQL Developer to develop, build, evaluate, share and automate analytical workflows to solve important data-driven business problems.
        • Developers can use the SQL APIs and PL/SQL to build applications to automate knowledge discovery
          – The Oracle Data Miner GUI generates SQL code that application developers can use to develop and deploy SQL and PL/SQL based automated predictive analytics applications that run natively inside the Oracle Database.
    21. What is Data Mining?
        • Automatically sifts through data to find hidden patterns, discover new insights, and make predictions
        • Data Mining can provide valuable results:
          • Predict customer behavior (Classification)
          • Predict or estimate a value (Regression)
          • Segment a population (Clustering)
          • Identify factors more associated with a business problem (Attribute Importance)
          • Find profiles of targeted people or items (Decision Trees)
          • Determine important relationships and "market baskets" within the population (Associations)
          • Find fraudulent or "rare events" (Anomaly Detection)
    22. In-Database Data Mining
        • Traditional analytics (hours, days or weeks)
          – Steps: data extraction, data prep & transformation, data mining model building, data mining model "scoring", data preparation and transformation, data import
          – Flow: Source Data → Datasets / Work Area → Analytical Processing → Process Output → Target
        • Oracle Data Mining (secs, mins or hours)
          – Steps: data preparation (embedded data prep), model building, model "scoring"
          – Data remains in the database
        • Results
          – Faster time from "Data" to "Insights"
          – Lower TCO: eliminates data movement and data duplication
          – Maintains security
        • SQL: most powerful language for data preparation and transformation
        • Embedded data preparation
        • Cutting-edge machine learning algorithms inside the SQL kernel of the database
        • Model "scoring": data remains in the database
    23. Exadata + Data Mining
        "DM Scoring" Pushed to Storage!
        • From 11g Release 2, SQL predicates and Oracle Data Mining models are pushed to the storage level for execution
        For example, find the US customers likely to churn:
            select cust_id
            from customers
            where region = 'US'
            and prediction_probability(churnmod, 'Y' using *) > 0.8;
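The churn query on the Exadata + Data Mining slide scores each row inside the database instead of exporting rows to a separate tool. A rough, runnable analogue can be built with SQLite by registering a scoring function and calling it from SQL. Everything here is an assumption made for illustration: the `churn_probability` function, its made-up coefficients, and the table columns. It stands in for a trained model and is not Oracle's `prediction_probability`.

```python
import math
import sqlite3

def churn_probability(calls_to_support, months_active):
    """Hypothetical logistic model with made-up coefficients, standing in for a trained model."""
    z = 0.9 * calls_to_support - 0.1 * months_active - 1.0
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
# Register the scoring function so SQL statements can call it row by row.
conn.create_function("churn_probability", 2, churn_probability)
conn.execute("CREATE TABLE customers (cust_id INTEGER, region TEXT, "
             "calls_to_support INTEGER, months_active INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, "US", 6, 3),    # many support calls, new customer
    (2, "US", 0, 48),   # no calls, long tenure
    (3, "EU", 7, 2),    # high score, but outside the US
])

likely_churners = [row[0] for row in conn.execute(
    "SELECT cust_id FROM customers "
    "WHERE region = 'US' AND churn_probability(calls_to_support, months_active) > 0.8 "
    "ORDER BY cust_id")]
print(likely_churners)
```

As in the slide's query, the region filter and the model score sit in the same WHERE clause, so only the identifiers of matching customers leave the database.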
    24. Oracle Data Mining Algorithms
        Problem                Algorithm                              Applicability
        Classification         Logistic Regression (GLM)              Classical statistical technique
                               Decision Trees                         Popular / Rules / transparency
                               Naïve Bayes                            Embedded app
                               Support Vector Machine                 Wide / narrow data / text
        Regression             Multiple Regression (GLM)              Classical statistical technique
                               Support Vector Machine                 Wide / narrow data / text
        Anomaly Detection      One Class SVM                          Lack examples of target field
        Attribute Importance   Minimum Description Length (MDL)       Attribute reduction; identify useful data; reduce data noise
        Association Rules      Apriori                                Market basket analysis; link analysis
        Clustering             Hierarchical K-Means                   Product grouping; text mining;
                               Hierarchical O-Cluster                 gene and protein analysis
        Feature Extraction     Nonnegative Matrix Factorization       Text analysis; feature reduction
    25. SQL Developer 3.0 / Oracle Data Miner 11g Release 2 GUI
        • Graphical user interface for the data analyst
        • SQL Developer extension (OTN download)
        • Explore data and discover new insights
        • Build and evaluate data mining models
        • Apply predictive models
        • Share analytical workflows
        • Deploy SQL Apply code/scripts
    26. Oracle Communications Industry Data Model Example
        Better Information for OBIEE Dashboards
        • ODM’s predictions and probabilities are available in the database for reporting using Oracle BI EE and other tools
    27. Exadata with Analytics and Business Intelligence
        Better Together
        • In-database data mining builds predictive models that predict customer behavior
        • OBIEE’s integrated spatial mapping shows where customers will "most likely" be HIGH and VERY HIGH value customers in the future
    28. Exadata with Analytics and Business Intelligence
        Better Together
        • Exadata power
        • OBIEE ease-of-use
        • Drill-through for details about the top factors that define HIGH and VERY HIGH value customers
    29. Oracle: Hardware and Software Engineered to Work Together
        • Oracle is the world's most complete, open, and integrated business software and hardware systems company
        • Data Warehousing, VLDB and ILM
        • Oracle Data Mining
          – 12 in-DB data mining algorithms
          – Mine star schemas, text, transactional data
          – In-DB model build and apply, with Exadata scoring
          – 50+ in-DB statistical functions
        • Oracle R Enterprise
          – Run R in-DB; function push-down to SQL
          – Wide library of supported in-DB statistical functions
          – Embedded R supports all R packages
        Oracle has taught the RDBMS how to perform data mining, statistical analysis, advanced analytics, etc.