Oracle Exadata Database Machine (R Language / Data Mining)

  • Topic per the agenda: Big Data Appliance. The Big Data Appliance content is closely associated with Exadata (R Language / Data Mining / NoSQL).
  • Capturing the broadest set of information and data available, both structured and unstructured, including data not in your data warehouse (weblogs, Twitter feeds, Facebook feeds). Organizing this information using highly scalable platforms. Analyzing data within the context of all your enterprise data using advanced analytics. Deciding on…
  • New analysis package being announced at OOW: an in-database implementation of the popular open-source solution called R. Oracle R is 100% compatible with standard R, so you can reuse your scripts and other open-source packages. It runs in-database (not on your laptop), so it is much faster, supports bigger models, and is more secure.
  • R is an open-source language and environment for statistical analysis and graphing. It provides linear and nonlinear modeling, standard statistical methods, time-series analysis, classification, clustering, and graphical data displays. Thousands of open-source packages are available in the Comprehensive R Archive Network (CRAN) for a spectrum of applications, such as bioinformatics, spatial statistics, and financial and marketing analysis. The popularity of R has increased as its functionality matured to rival that of costly proprietary statistical packages.
  • In-database implementation of the popular open-source solution called R. With Oracle R you can reuse your scripts and other open-source packages, and run R with Oracle Database (not on your laptop). Extremely fast. More secure. More robust models.

Oracle Exadata Database Machine (R Language / Data Mining) Presentation Transcript

  • 2 Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Oracle Copyright 2011
  • Oracle Exadata Database Machine (R Language / Data Mining). Radosław Kut. Big Data at Work. In association with
  • The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.
  • Oracle Exadata Database Machine: Best Platform for…
    • Data Warehousing
    • OLTP
    • Database Cloud
    • Oracle’s strategic platform for ALL database workloads
  • Hardware Generational Advances
                      V1      V2      X2      X3      X4      Growth
    Year              2008    2009    2010    2012    2013
    Storage (TB)      168     336     504     504     672     4X
    Flash (TB)        0       5.3     5.3     22.4    44.8    8X
    CPU (cores)       64      64      96      128     192     3X
    Memory (GB)       256     576     1152    2048    4096    16X
  • Exadata Architecture: Complete | Optimized | Standardized | Hardened Database Platform
    • Standard Database Servers
      – 8x 2-socket servers: 192 cores, 2 TB DRAM, or
      – 2x 8-socket servers: 160 cores, 4 TB DRAM
    • Unified Ultra-Fast Network
      – 40 Gb InfiniBand internal connectivity: all ports active
      – 10 Gb or 1 Gb Ethernet data center connectivity
    • Scale-out Intelligent Storage Servers
      – 14x 2-socket servers: 168 cores in storage
      – 168 SAS disk drives: 672 TB HC or 200 TB HP
      – 56 Flash PCI cards: 44 TB Flash + compression
    • Fully Redundant
  • Highly Engineered and Standardized: Less Risk, Better Results
    • Hundreds of engineer years spent optimizing and hardening the system end-to-end
      – Frees IT talent to focus on business needs
    • Standard platform improves support experience
    • Runs all existing Oracle Database workloads
  • Key Exadata Innovations: Extreme Performance at Lowest Cost
    • Hybrid Columnar Compression
      – 10x compression for warehouses
      – 15x compression for archives
      – Data remains compressed for scans and in Flash
      – Space savings cascade to copies: a compressed primary DB means compressed standby, test, dev, and backup copies, compared with the uncompressed case
    • Smart Scale-Out Storage
      – InfiniBand connected servers
      – Smart Scan query offload
    • Smart PCI Flash Cache
      – Transparent cache in front of disk
      – Accelerates random I/O up to 30x
      – Quadruples data scan rate
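The "space savings cascade to copies" point comes down to simple arithmetic, sketched below. The 100 TB database size and the assumption of one full copy per environment are hypothetical, chosen only for illustration; the 10x ratio is the warehouse figure quoted on the slide, and real ratios depend on the data.

```python
# Hypothetical sizing sketch: the 100 TB figure and the one-full-copy-per-environment
# assumption are illustrative and do not come from the slides.
COPIES = ["primary", "standby", "test", "dev", "backup"]

def total_storage_tb(primary_tb: float, compression_ratio: float = 1.0) -> float:
    """Total space for all copies when every copy is stored at the given ratio."""
    per_copy = primary_tb / compression_ratio
    return per_copy * len(COPIES)

uncompressed = total_storage_tb(100.0)        # 5 copies of 100 TB each
compressed = total_storage_tb(100.0, 10.0)    # 5 copies of 10 TB each (10x ratio)
saved = uncompressed - compressed

print(f"uncompressed: {uncompressed:.0f} TB, "
      f"compressed: {compressed:.0f} TB, saved: {saved:.0f} TB")
```

Because each downstream copy inherits the compressed size, the saving grows with the number of copies rather than applying to the primary database alone.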
  • Exadata Intelligent Storage (Exadata Intelligent Storage Grid)
    • Exadata storage servers also run more complex operations in storage
      – Join filtering
      – Incremental backup filtering
      – I/O prioritization
      – Storage Indexing
      – Database level security
      – Offloaded scans on encrypted data
      – Data Mining Model Scoring
    • 10x reduction in data sent to DB servers is common
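The idea behind filtering in storage can be sketched with a toy model. This is not Exadata's implementation: the `Region` class, the `smart_scan` function, and the sample values are all hypothetical. It assumes each storage region keeps a min/max summary of a column, so regions that cannot contain matching rows are skipped and only matching rows are shipped to the database server.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Region:
    """A chunk of rows held by a storage server, summarized by min/max of one column."""
    values: List[int]

    @property
    def lo(self) -> int:
        return min(self.values)

    @property
    def hi(self) -> int:
        return max(self.values)

def smart_scan(regions: List[Region], threshold: int) -> Tuple[List[int], int]:
    """Return rows with value > threshold, skipping regions whose max rules them out."""
    rows: List[int] = []
    regions_read = 0
    for region in regions:
        if region.hi <= threshold:
            # The summary proves no row in this region can match, so skip the I/O.
            continue
        regions_read += 1
        # Apply the predicate on the storage side; only matching rows are returned.
        rows.extend(v for v in region.values if v > threshold)
    return rows, regions_read

regions = [Region([1, 5, 9]), Region([20, 35, 40]), Region([90, 95, 99])]
rows, regions_read = smart_scan(regions, threshold=50)
total_rows = sum(len(r.values) for r in regions)
print(f"read {regions_read} of {len(regions)} regions, "
      f"shipped {len(rows)} of {total_rows} rows")
```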
  • Oracle Advanced Analytics Option: Transforming the Database into a Comprehensive Advanced Analytics Platform
    • Enables companies to "bring the algorithms to the data" rather than extracting the data to specialized, expensive, dedicated statistical and data mining servers
    • Includes:
      – Oracle R Enterprise: integrates the open-source statistical environment R with the Oracle Database
      – Oracle Data Mining: SQL- and PL/SQL-focused in-database data mining and predictive analytics
    • Data movement is eliminated or dramatically reduced because analytical and compute-intensive operations are performed inside the database, where the data resides. This increases performance, shortens the cycle time needed to extract information from data, and reduces total cost of ownership compared with traditional statistical and data mining environments
  • What Are R’s Challenges?
    1. R is memory constrained
      – R processing is single threaded and does not exploit available compute infrastructure
      – R lacks industrial strength for enterprise use cases
    2. R has lacked mindshare in the enterprise market
      – R is still met with caution by the long-established SAS and IBM/SPSS statistical community
    • However, statistics courses at major universities (e.g. Yale) are now taught in R
    • The FDA has recently indicated that it may approve new drugs for which the submission’s data analysis was performed using R
  • What Is R Enterprise?
    • Oracle R Enterprise brings R’s statistical functionality closer to the Oracle Database
    1. Eliminates R’s memory constraint by enabling R to work directly and transparently on database objects
      – Allows R to run on very large data sets
    2. Architected for enterprise production infrastructure
      – Automatically exploits database parallelism without requiring parallel R programming
      – Build and immediately deploy
    3. Oracle R leverages the latest R algorithms and packages
      – R is an embedded component of the DBMS server
  • Architecture and Performance
    • Transparently function-ships R constructs to the database via R-to-SQL translation
      – Data structures
      – Functions: data manipulation functions (select, project, join), basic statistical functions (avg, sum, summary), advanced statistical functions (gamma, beta)
    • Performs data-heavy computations in the database
      – R for summary analysis and graphics
    • Transparent implementation enables using a wide range of R "packages" from the open source community
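The function-shipping idea on this slide can be sketched as a client-side proxy that records operations and translates them to SQL, so the aggregate runs in the database and only the result comes back. This is an illustration only: `TableProxy` and its methods are hypothetical names, not the Oracle R Enterprise API, and SQLite stands in for the database purely so the sketch is runnable.

```python
import sqlite3

class TableProxy:
    """Hypothetical client-side object that maps to a database table."""

    def __init__(self, conn, table, where=None):
        self.conn, self.table, self.where = conn, table, where

    def filter(self, condition: str) -> "TableProxy":
        # No rows are fetched here; only the pending SQL predicate is recorded.
        # Sketch only: conditions are trusted strings, not sanitized user input.
        combined = condition if self.where is None else f"({self.where}) AND ({condition})"
        return TableProxy(self.conn, self.table, combined)

    def to_sql(self, select: str) -> str:
        sql = f"SELECT {select} FROM {self.table}"
        return sql if self.where is None else f"{sql} WHERE {self.where}"

    def mean(self, column: str) -> float:
        # The aggregate runs in the database; a single number is returned.
        return self.conn.execute(self.to_sql(f"AVG({column})")).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("US", 10.0), ("US", 30.0), ("EU", 50.0)])

us_sales = TableProxy(conn, "sales").filter("region = 'US'")
generated = us_sales.to_sql("AVG(amount)")
us_mean = us_sales.mean("amount")
print(generated)
print(us_mean)
```

The client never materializes the table; it holds a description of the query, which is what lets the same script scale to tables that would not fit in client memory.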
  • Oracle Data Mining: Building Predictive Analytics Applications
    • Oracle Data Mining provides 12 powerful in-database data mining algorithms for big data analytics as a native feature of the database
      – Designed for big data problems that involve discovering patterns and relationships in large amounts of data, and often making predictions based on those patterns. Oracle Data Mining allows data analysts and data miners to mine star schemas, transactional data, and unstructured data stored inside the database, build predictive models, and apply them to data inside the database, all without moving data.
    • Oracle Data Miner helps users mine their data and define, save, and share advanced analytical methodologies
      – Users who prefer a graphical user interface can use the Oracle Data Miner extension to SQL Developer to develop, build, evaluate, share, and automate analytical workflows that solve important data-driven business problems.
    • Developers can use the SQL APIs and PL/SQL to build applications that automate knowledge discovery
      – The Oracle Data Miner GUI generates SQL code that application developers can use to develop and deploy SQL- and PL/SQL-based automated predictive analytics applications that run natively inside the Oracle Database.
  • What Is Data Mining?
    • Automatically sifts through data to find hidden patterns, discover new insights, and make predictions
    • Data mining can provide valuable results:
      – Predict customer behavior (Classification)
      – Predict or estimate a value (Regression)
      – Segment a population (Clustering)
      – Identify factors more associated with a business problem (Attribute Importance)
      – Find profiles of targeted people or items (Decision Trees)
      – Determine important relationships and "market baskets" within the population (Associations)
      – Find fraudulent or "rare events" (Anomaly Detection)
  • In-Database Data Mining
    • Traditional analytics (hours, days, or weeks)
      – Stages: source data, datasets / work area, analytical processing, process output, target
      – Steps: data extraction; data prep and transformation; data mining model building; data mining model "scoring"; data preparation and transformation; data import
    • Oracle Data Mining (seconds, minutes, or hours)
      – Steps: data preparation (embedded data prep); model building; model "scoring"
      – Data remains in the database
      – SQL: the most powerful language for data preparation and transformation
      – Cutting-edge machine learning algorithms inside the SQL kernel of the database
    • Results
      – Faster time from "data" to "insights"
      – Lower TCO: eliminates data movement and data duplication
      – Maintains security
  • Exadata + Data Mining: "DM Scoring" Pushed to Storage!
    • From 11g Release 2, SQL predicates and Oracle Data Mining models are pushed to the storage level for execution
    • For example, find the US customers likely to churn:
        select cust_id
        from customers
        where region = 'US'
        and prediction_probability(churnmod, 'Y' using *) > 0.8;
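The pattern in the query above, a model score used as an ordinary SQL predicate next to a regular filter, can be tried out in a small runnable sketch. Everything here is a stand-in: SQLite replaces Oracle Database, the registered `churn_probability` function replaces `prediction_probability`, and the logistic coefficients and sample rows are invented. It does not model the push-down to storage, only the shape of the query.

```python
import math
import sqlite3

def churn_probability(months_inactive: float, complaints: float) -> float:
    """Made-up logistic model standing in for a trained churn model."""
    z = -3.0 + 0.8 * months_inactive + 0.5 * complaints
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
# Register the scoring function so it can be called from SQL.
conn.create_function("churn_probability", 2, churn_probability)
conn.execute("CREATE TABLE customers (cust_id INTEGER, region TEXT, "
             "months_inactive REAL, complaints REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, "US", 6.0, 2.0),   # high score, right region
    (2, "US", 1.0, 0.0),   # low score
    (3, "EU", 6.0, 2.0),   # high score, wrong region
])

# The score is evaluated row by row inside the query, alongside the region filter.
likely_churners = [row[0] for row in conn.execute(
    "SELECT cust_id FROM customers "
    "WHERE region = 'US' "
    "AND churn_probability(months_inactive, complaints) > 0.8 "
    "ORDER BY cust_id")]
print(likely_churners)
```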
  • Oracle Data Mining Algorithms (problem, algorithm, applicability)
    • Classification
      – Logistic Regression (GLM): classical statistical technique
      – Decision Trees: popular / rules / transparency
      – Naïve Bayes: embedded app
      – Support Vector Machine: wide / narrow data / text
    • Regression
      – Multiple Regression (GLM): classical statistical technique
      – Support Vector Machine: wide / narrow data / text
    • Anomaly Detection
      – One Class SVM: lack examples of target field
    • Attribute Importance
      – Minimum Description Length (MDL): attribute reduction, identify useful data, reduce data noise
    • Association Rules
      – Apriori: market basket analysis, link analysis
    • Clustering
      – Hierarchical K-Means, Hierarchical O-Cluster: product grouping, text mining, gene and protein analysis
    • Feature Extraction
      – Nonnegative Matrix Factorization: text analysis, feature reduction
  • SQL Developer 3.0 / Oracle Data Miner 11g Release 2 GUI
    • Graphical user interface for data analysts
    • SQL Developer extension (OTN download)
    • Explore data to discover new insights
    • Build and evaluate data mining models
    • Apply predictive models
    • Share analytical workflows
    • Deploy SQL Apply code/scripts
  • Oracle Communications Industry Data Model Example: Better Information for OBIEE Dashboards
    • ODM’s predictions and probabilities are available in the database for reporting using Oracle BI EE and other tools
  • Exadata with Analytics and Business Intelligence: Better Together
    • In-database data mining builds predictive models that predict customer behavior
    • OBIEE’s integrated spatial mapping shows where customers are "most likely" to be HIGH and VERY HIGH value customers in the future
  • Exadata with Analytics and Business Intelligence: Better Together
    • Exadata power
    • OBIEE ease-of-use
    • Drill-through for details about the top factors that define HIGH and VERY HIGH value customers
  • Oracle: Hardware and Software Engineered to Work Together
    • Oracle is the world's most complete, open, and integrated business software and hardware systems company
    • Data Warehousing, VLDB and ILM
    • Oracle Data Mining
    • 12 in-DB data mining algorithms
    • Mine star schemas, text, transactional data
    • In-DB model build and apply, with Exadata scoring
    • 50+ in-DB statistical functions
    • Oracle R Enterprise
    • Run R in-DB; function push down to SQL
    • Wide library of supported in-DB statistical functions
    • Embedded R supports all R packages
    • Oracle has taught the RDBMS how to perform data mining, statistical analysis, advanced analytics, etc.