Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL and Big Data

The latest trends in data management across SQL, NoSQL and Big Data.

  1. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Confidential – Oracle Internal/Restricted/Highly Restricted
      Data Management Trends. J. Andrés Araújo, Cloud Platform Solution Architect; David Mauri, Cloud Platform Solution Architect. Oracle Ibérica, March 15, 2018.
  2. Data Management Evolution: Transactional, Data Warehouse, SQL, Social and Web, Data Lake, IoT, Fast Data
  3. High-level Comparison
      Area    | Property                 | HDFS                  | NoSQL                      | RDBMS
      Ingest  | Data type                | Chunk                 | Record                     | Transaction
      Ingest  | Write type               | Synchronous           | Eventually consistent      | ACID compliant
      Ingest  | Data preparation         | No parsing            | No parsing                 | Parsing and validation
      DR      | DR type                  | Second cluster        | Node replica               | Second RDBMS
      DR      | DR unit                  | File                  | Record                     | Transaction
      DR      | DR timing                | Batch                 | Record                     | Transaction
      Access  | Complex analytics?       | Yes                   | No                         | Yes
      Access  | Query speed              | Slow                  | Fast for simple questions  | Fast
      Access  | Data access methods      | One (full table scan) | One (index lookup)         | Many (optimized)
      Summary |                          | Affordable scale      | Low, predictable latency   | Flexible performance
  4. Unified Data Management: data of any type, from any data source, with analysis of any type (SQL, Graph, Spark, Spatial, Machine Learning) and SQL access from any language (node.js, Java, REST, Python, Scala, R)
  5. Big Data SQL: The Best of Both Worlds
  6. Oracle Unified Data Management Solution: the conventional view of data management vs. the emerging view of data management, with Oracle Big Data SQL
  7. Big Data SQL: Another Hadoop Processing Engine
      • Storage layer: filesystem (HDFS); NoSQL databases (Oracle NoSQL DB, HBase)
      • Resource management: YARN, cgroups
      • Processing layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
      • Metadata store
  8. Big Data SQL Architecture
      • Big Data-enabled Oracle tables in the Oracle SQL engine leverage the Hive metadata
      • Big Data SQL cells run on the Hadoop data nodes (DN) and process the data locally, next to where it is stored
      • Applications query through SQL and from Python, Graph, R, node.js, Java and REST
  9. Anatomy of a Big Data SQL Cell: I/O (stream) → data transfer → conversion to Oracle "block" format → Smart Scan and other optimizations
  10. Big Data SQL Goals
      • Easily access any data across big data stores
      • Provide a unified security model across the sources
      • Analyze all data using Oracle's rich SQL dialect
      • Fast performance using Big Data SQL Smart Scan
  11. Big Data SQL Key Features
      • I/O elimination: storage index; Hive partition pruning; predicate and column pushdown for Parquet and ORC
      • Data movement elimination: Smart Scan performs a final filtering pass to ensure that only the requested elements are sent to Oracle Database
      • Security: apply Oracle Database security policies to non-Oracle data stores
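
The I/O-elimination and data-movement-elimination ideas above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not how Big Data SQL cells are implemented. `Block`, `make_block` and `smart_scan` are hypothetical names. Each block keeps a per-column (min, max) summary that acts as its "storage index", and filtering and projection run next to the data so that only matching rows and requested columns leave the block. For example, a range query on `amount` between 400 and 800 skips a block whose maximum `amount` is 20 without reading its rows.

```python
from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]


@dataclass
class Block:
    """Hypothetical storage block: its rows plus per-column (min, max) metadata."""
    rows: list[Row]
    stats: dict[str, tuple[Any, Any]]


def make_block(rows: list[Row]) -> Block:
    # Build the min/max "storage index" entry for every column of a non-empty block.
    stats = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in rows[0]}
    return Block(rows, stats)


def smart_scan(blocks: list[Block], col: str, lo: Any, hi: Any,
               project: list[str]) -> tuple[list[Row], int]:
    """Return projected rows with lo <= row[col] <= hi, plus the number of blocks read."""
    out: list[Row] = []
    blocks_read = 0
    for block in blocks:
        bmin, bmax = block.stats[col]
        if bmax < lo or bmin > hi:
            continue  # storage index: no row here can match, so skip this block's I/O
        blocks_read += 1
        for row in block.rows:
            if lo <= row[col] <= hi:  # predicate pushdown: filter where the data lives
                out.append({c: row[c] for c in project})  # column pushdown: ship only requested columns
    return out, blocks_read
```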
  12. Big Data SQL Security Features: Hadoop security (ACLs, Sentry, HDFS encryption, encryption in motion)
  13. Big Data SQL Security Features
      • The same security models apply to a wider range of data stores
      • Advanced features such as data redaction can now be applied, enabling joins between disparate sources
      • Oracle security layers on top of existing Hadoop functionality: ACLs, Sentry, HDFS encryption, encryption in motion
  14. Data Lifecycle Management & Query Offload: more data online and available at a lower cost
      • Big Data rolling window: the most recent 13 months stay in Exadata, while months 14 to n are moved, partition by partition, to the Big Data Appliance (BDA) and queried through Oracle Big Data SQL
      • Process: copy the older partition to BDA, update the views, drop the older Exadata partition
      • Offloaded data can be accessed via both Oracle and Hadoop
      • No application changes required
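
The rolling-window process above (copy, update views, drop) can be modeled in a minimal Python sketch. The sketch assumes monthly partitions keyed by the first day of each month. It uses in-memory dictionaries as hypothetical stand-ins for the Exadata and BDA tiers. `roll_window` and `read_all` are illustrative names, not an Oracle API. In the sketch, `read_all` plays the role of the view, so the "application" sees the same rows before and after the offload.

```python
from datetime import date


def month_index(d: date) -> int:
    """Months since year 0, so month arithmetic becomes integer arithmetic."""
    return d.year * 12 + (d.month - 1)


def roll_window(db: dict[date, list], archive: dict[date, list],
                today: date, keep_months: int = 13) -> list[date]:
    """Move partitions older than the rolling window from `db` to `archive`.

    Both tiers map a partition key (first day of a month) to its rows.
    Returns the keys of the offloaded partitions, oldest first.
    """
    oldest_kept = month_index(today) - keep_months + 1
    offload = sorted(p for p in db if month_index(p) < oldest_kept)
    for p in offload:
        archive[p] = list(db[p])  # 1. copy the older partition to the Hadoop tier
    # 2. "update views": in this sketch read_all() below stands in for the view
    for p in offload:
        del db[p]  # 3. drop the older partition from the database tier
    return offload


def read_all(db: dict[date, list], archive: dict[date, list]) -> list:
    """Union over both tiers, so callers still see one logical table."""
    merged = {**archive, **db}
    return [row for p in sorted(merged) for row in merged[p]]
```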
  15. Database Data in HDFS: Hybrid Partitioned Tables. The monthly partitions of the Orders table (JAN 2014, FEB 2014, MAR 2014, …, OCT 2016, NOV 2016, DEC 2016) go through three stages:
      • Stage 1: all partitions are stored internally, in the database
      • Stage 2: some partitions are moved externally, to HDFS
      • Stage 3: mixed storage for partitions, with no top-level changes to Orders
  16. Archive Data: Big Data SQL Implementation Options
      • Option 1, table storage split across tiers: a single database table whose data lives partly in the database and partly in HDFS
      • Option 2, a view combines data sources: a database view over a database table and HDFS data
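
Option 2 amounts to a union over two sources. In Oracle, such a view would typically be a SQL view that does a UNION ALL over a regular table and an external table on HDFS. The Python sketch below models only the query behavior, under assumptions: orders dated before a hypothetical `BOUNDARY` date live in HDFS and later ones in the database, and `orders_view` is an illustrative name. Because each tier covers a known date range, a query whose range cannot touch a tier skips that tier entirely.

```python
from datetime import date

# Assumption for this sketch: orders before BOUNDARY live in HDFS, later ones in the database.
BOUNDARY = date(2017, 1, 1)


def orders_view(db_rows, hdfs_rows, start=None, end=None):
    """Emulate a UNION ALL view over both tiers, filtered to start <= order_date <= end.

    Returns (rows, tiers_scanned). A tier is skipped when the date range cannot touch it.
    """
    tiers = []
    if start is None or start < BOUNDARY:   # the archive only holds order_date < BOUNDARY
        tiers.append(("hdfs", hdfs_rows))
    if end is None or end >= BOUNDARY:      # the database only holds order_date >= BOUNDARY
        tiers.append(("database", db_rows))
    rows = [r for _, source in tiers for r in source
            if (start is None or r["order_date"] >= start)
            and (end is None or r["order_date"] <= end)]
    return rows, [name for name, _ in tiers]
```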
  17. Oracle Big Data SQL Demo
  18. Traditional vs. Oracle Machine Learning/Predictive Analytics: the traditional approach says "Move the data"; Oracle's approach says "Don't move the data!"
  19. Traditional vs. Oracle Machine Learning/Predictive Analytics: the traditional approach says "Move the data"; Oracle's approach says "Move the algorithms". The result is a simpler, smarter data management plus analytics/machine learning architecture.
  20. Oracle Machine Learning Tools for Data Scientists
      • Oracle Advanced Analytics (on-premises Database option; included in Cloud EE Database): Oracle R Enterprise, Oracle Data Mining (ODM)
      • Hadoop-based (option for BDA; included with BDCS; planned for BDC): Oracle R Advanced Analytics for Hadoop (AA4H), Oracle Big Data Spatial & Graph (BDSG)
      • Interfaces: RStudio, notebooks (Zeppelin, Jupyter), Oracle Data Mining interface
      • Platforms: Oracle Database and Hadoop
  21. R Environment: R is a widely popular statistics language, similar to Base SAS or SPSS Statistics (R + Enterprise)
      • Strengths: powerful and extensible; graphics and extensive statistics; free and open source (CRAN, 9000+ packages); the standard for data scientists
      • Challenges: memory constrained; single threaded; outer loops slow down processing; not enterprise oriented
  22. Oracle Advanced Analytics: Oracle R Enterprise compute engines (open-source R plus Oracle R Enterprise)
      • (1) User R engine on the desktop, with other R packages and the Oracle R Enterprise packages
        – The R-SQL transparency framework overloads R functions for scalable in-database execution
        – Function overloading covers data transforms, statistical functions and advanced analytics
        – Interactive display of graphical results and flow control, as in standard R
        – User-defined R functions can be submitted for execution on the database server, under the control of Oracle Database
      • (2) Oracle Database as the compute engine, working on user tables through SQL
        – Scale to large datasets
        – Access tables, views and external tables, as well as data through database links (DB links)
        – Leverage database SQL parallelism
        – Leverage new and existing in-database statistical and data mining capabilities
      • (3) R engine(s) spawned by Oracle Database, with other R packages and the Oracle R Enterprise packages
        – The database can spawn multiple R engines for database-managed parallelism
        – Efficient data transfer to the spawned R engines
        – Emulate map-reduce style algorithms and applications
        – Enables production deployment and automated execution of R scripts
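
Item (3) describes database-managed parallelism across spawned R engines. This follows a partition-then-apply pattern, and the Python sketch below shows that pattern under stated assumptions. Threads stand in for the separate R engine processes. `group_apply` is a hypothetical helper whose shape only loosely resembles Oracle R Enterprise's `ore.groupApply`; it is not that function. For example, grouping sales rows by region and applying an average gives one result per region, each computed independently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def group_apply(rows: list[dict[str, Any]], key: str,
                func: Callable[[list[dict[str, Any]]], Any],
                engines: int = 4) -> dict[Any, Any]:
    """Partition rows by `key` and run `func` on each partition in a worker pool."""
    partitions: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)  # route every row to its group's partition
    # Threads stand in for the R engines the database would spawn (assumption of this sketch).
    with ThreadPoolExecutor(max_workers=engines) as pool:
        futures = {k: pool.submit(func, part) for k, part in partitions.items()}
        return {k: f.result() for k, f in futures.items()}  # gather one result per group
```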
  23. Oracle R Advanced Analytics for Hadoop
      • ORAAH (Oracle R Advanced Analytics for Hadoop) is part of the Big Data Software Connectors suite, an Oracle Big Data Appliance option
      • The ORAAH transparency layer lets certain overloaded R functions operate on Hive tables using R syntax and behavior, transparently translating R to HiveQL
      • An R interface for manipulating HDFS data and writing mapper and reducer functions in R (which can leverage open-source CRAN packages), and for invoking those Hadoop jobs from R
      • A range of predictive algorithms that execute on the Hadoop cluster, in a parallel, distributed manner over data in HDFS
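
ORAAH lets users write mapper and reducer functions in R and run them as Hadoop jobs. The data flow of such a job (map, shuffle by key, reduce) is sketched below in plain Python so that it runs locally, using the classic word-count example. `run_mapreduce`, `words` and `total` are hypothetical names, not the ORAAH API. A real job would distribute each phase across the cluster instead of running it in one process.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, Iterator


def run_mapreduce(records: Iterable[Any],
                  mapper: Callable[[Any], Iterable[tuple[Hashable, Any]]],
                  reducer: Callable[[Hashable, list[Any]], Any]) -> dict[Hashable, Any]:
    """Single-process model of a MapReduce job: map, shuffle (group by key), reduce."""
    shuffled: dict[Hashable, list[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map: each record emits (key, value) pairs
            shuffled[key].append(value)     # shuffle: collect all values for a key
    # reduce: one call per key, in sorted key order for deterministic output
    return {key: reducer(key, values) for key, values in sorted(shuffled.items())}


def words(line: str) -> Iterator[tuple[str, int]]:
    """Word-count mapper: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def total(_word: str, counts: list[int]) -> int:
    """Word-count reducer: add up the 1s emitted for a word."""
    return sum(counts)
```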
  24. Hadoop Cluster with Oracle R Advanced Analytics for Hadoop: using Hadoop and Hive, plus the R engine and open-source R packages
      • R client (R analytics) → Oracle R Advanced Analytics for Hadoop
        – ORAAH Spark algorithms: Deep Neural Networks, GLM, LM
        – Spark MLlib algorithms: LM, GLM, LASSO, Ridge Regression, Decision Trees, Random Forests, SVM, k-Means, PCA
        – Open-source R packages distributed via map-reduce functions in R
        – HQL: basic statistics, data prep, joins and view creation
        – HQL + HDFS: access, store, load, data prep and transform
      • SQL client (SQL Developer, other SQL apps) → Oracle Database Server with the Advanced Analytics option → Big Data SQL
  25. Oracle Machine Learning Algorithms
      • Classification: Logistic Regression, Decision Tree, Random Forest, Neural Network, Support Vector Machine, Naïve Bayes, Explicit Semantic Analysis, Gaussian Mixture Models
      • Clustering: Hierarchical K-Means, Hierarchical O-Cluster, Expectation Maximization
      • Anomaly detection: One-Class Support Vector Machine
      • Regression: Generalized Linear Model, Support Vector Machine, Random Forest, Linear Model, Stepwise Linear Regression, LASSO
      • Association rules: Apriori
      • Attribute importance: Minimum Description Length, Principal Component Analysis, Unsupervised Pairwise KL Divergence
      • SQL predictive queries
      • Text support: algorithms support the text type; tokenization and theme extraction; document similarity
      • Feature extraction: Principal Component Analysis, Non-negative Matrix Factorization, Singular Value Decomposition
      • Time series: Single Exponential Smoothing, Double Exponential Smoothing
      • Open-source ML algorithms: CRAN R algorithm packages through Embedded R Execution; Spark MLlib algorithm integration
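
The time-series entry lists Single Exponential Smoothing. The textbook recurrence uses a smoothing factor α in (0, 1]. It seeds s₀ = x₀ and then sets s_t = α·x_t + (1 − α)·s_{t−1}, so that recent observations weigh more than older ones. The sketch below implements only that recurrence, with hypothetical function names. Oracle's in-database version may expose additional settings that are not modeled here. With α = 0.5, the series 10, 20, 30 smooths to 10, 15, 22.5.

```python
def simple_exponential_smoothing(series: list[float], alpha: float) -> list[float]:
    """Return s_t = alpha * x_t + (1 - alpha) * s_{t-1}, seeded with s_0 = x_0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]  # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def forecast_next(series: list[float], alpha: float) -> float:
    """One-step-ahead forecast of a non-empty series: the last smoothed value."""
    return simple_exponential_smoothing(series, alpha)[-1]
```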
  26. Oracle's Spatial and Graph Strategy: enable spatial and graph use cases on every platform
      • Big Data and NoSQL (single-model data store): Oracle Big Data Spatial and Graph
      • Database 12c (polyglot, multi-model data store): Oracle Database Spatial and Graph
      • Spatial and Graph in cloud offerings: Oracle Big Data Cloud Service; Oracle Database Cloud Service (Enterprise Edition High Performance, Enterprise Edition Extreme Performance)
  27. Graph Product Options
      • Oracle Big Data Spatial and Graph
        – Available for the Big Data platform/BDCS: Hadoop, HBase, Oracle NoSQL
        – Supported both on BDA and on commodity hardware: CDH and Hortonworks
        – Database connectivity through Big Data Connectors or Big Data SQL
        – Included in Big Data Cloud Service
      • Oracle Spatial and Graph (Database option)
        – Available with Oracle 12.2 / DBCS
        – Uses tables for graph persistence
        – Graph views on relational data
        – In-database graph analytics: sparsification, shortest path, PageRank, triangle counting, WCC, subgraphs
        – SQL queries possible
        – Included in Database Cloud Service
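
Triangle counting is one of the in-database graph analytics listed above. A minimal sequential sketch follows. It assumes an undirected graph given as a list of vertex pairs with comparable vertex IDs. `count_triangles` is an illustrative name, and this is not the product's parallel implementation. Each triangle is found once from each of its three corners, which is why the final count is divided by 3.

```python
from itertools import combinations
from typing import Hashable


def count_triangles(edges: list[tuple[Hashable, Hashable]]) -> int:
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    adj: dict[Hashable, set] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops; they cannot form triangles
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = 0
    for nbrs in adj.values():
        # Two neighbors of the same vertex that are also adjacent close a triangle.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                found += 1
    return found // 3  # each triangle was counted once from each of its 3 corners
```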
  28. Overview of Graph
      • What is a graph? A set of vertices and edges (with optional properties); a graph is simply linked data
      • Why do we care?
        – Graphs are everywhere: road networks, power grids, biological networks; social networks and the social web (Facebook, LinkedIn, Twitter, Baidu, Google+, …); knowledge graphs (RDF, OWL)
        – Graphs are intuitive and flexible: easy to navigate, easy to form a path, natural to visualize, and they do not require a predefined schema
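
The definition above (vertices and edges with optional properties, and no predefined schema) maps onto a small data structure. This sketch is an illustration only. `PropertyGraph` and its methods are hypothetical and unrelated to the Oracle graph APIs. Vertices and edges carry free-form property dictionaries, and endpoints are created on demand so that linked data can be loaded edge by edge.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PropertyGraph:
    """Minimal property graph: vertices and directed edges, each with free-form properties."""
    vertices: dict[str, dict[str, Any]] = field(default_factory=dict)
    edges: list[tuple[str, str, dict[str, Any]]] = field(default_factory=list)

    def add_vertex(self, vid: str, **props: Any) -> None:
        # No schema: each vertex may carry any set of properties.
        self.vertices.setdefault(vid, {}).update(props)

    def add_edge(self, src: str, dst: str, **props: Any) -> None:
        # Endpoints are created on demand, so linked data can be loaded edge by edge.
        self.vertices.setdefault(src, {})
        self.vertices.setdefault(dst, {})
        self.edges.append((src, dst, props))

    def out_neighbors(self, vid: str) -> list[str]:
        """Targets of the edges leaving `vid`, in insertion order."""
        return [dst for src, dst, _ in self.edges if src == vid]
```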
  29. Graph Analysis Examples
      • Reachability: quickly identify multi-hop relations between (a set of) vertices and how they are connected under various constraints; e.g. security breach trace
      • Anomaly detection: analyze the link relationships between data entities to detect subsets of data that differ from the others; e.g. fraud detection
      • Centrality analysis: analyze the topology of the network, in addition to data values, to identify data entities that are more important than others; e.g. influencer identification
      • Link prediction: inspect similarities between data entities under the overall network structure and predict potential future links; e.g. product recommendation
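
Reachability under a constraint can be illustrated with a breadth-first search that stops expanding at a hop limit. The sketch below is minimal and rests on assumptions. The graph is an adjacency-list dictionary, the only constraint is a maximum number of hops, and `reachable_within` is a hypothetical name. BFS returns a path with the fewest hops, which makes it a natural fit for "how are these two entities connected?" questions. In the chain A → B → C → D, for example, D is reachable from A within 3 hops but not within 2.

```python
from collections import deque
from typing import Hashable, Optional


def reachable_within(adj: dict[Hashable, list], src: Hashable, dst: Hashable,
                     max_hops: int) -> Optional[list]:
    """Return a fewest-hops path src -> dst using at most max_hops edges, or None."""
    parent = {src: None}              # doubles as the visited set
    frontier = deque([(src, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if v == dst:
            path = []
            while v is not None:      # walk parent links back to the source
                path.append(v)
                v = parent[v]
            return path[::-1]
        if depth == max_hops:
            continue                  # hop limit reached: do not expand further
        for w in adj.get(v, ()):
            if w not in parent:
                parent[w] = v
                frontier.append((w, depth + 1))
    return None
```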
  30. Rich Set of Built-in Parallel Graph Algorithms and Parallel Graph Mutation Operations
      • Computational analytics, built-in package: 40+ built-in algorithms, highly parallelized and highly performant
        – Detecting components and communities: Tarjan's, Kosaraju's, Weakly Connected Components, Label Propagation (with variants), Soman and Narang's Sparsification
        – Ranking and walking: PageRank, Personalized PageRank, Betweenness Centrality (with variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, random walking and sampling (with variants)
        – Evaluating community structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting), Adamic-Adar
        – Path-finding: Hop-Distance (BFS), Dijkstra's, Bi-directional Dijkstra's, Bellman-Ford's
        – Link prediction: SALSA (Twitter's Who-to-follow)
        – Other classics: Vertex Cover, Minimum Spanning Tree (Prim's)
      • Graph mutation operations (illustrated on a sample graph with vertices a to i): create undirected graph, simplify graph, create bipartite graph (left set "a,b,e"), sort-by-degree (renumbering), filtered subgraph
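
PageRank, the first entry under ranking and walking, can be computed by power iteration. The sketch below uses the standard formulation with a damping factor d, where 0.85 is a conventional default and an assumption here. Each vertex receives (1 − d)/N plus d times the rank flowing in from its in-neighbors, and the rank of vertices with no out-edges is spread evenly over all vertices, so the ranks always sum to 1. The code is sequential and illustrative, and `pagerank` is a hypothetical name. The built-in package described on the slide is parallelized.

```python
from typing import Hashable


def pagerank(adj: dict[Hashable, list], damping: float = 0.85,
             iters: int = 200, tol: float = 1e-12) -> dict[Hashable, float]:
    """Power-iteration PageRank over a directed adjacency-list graph."""
    nodes = set(adj) | {w for outs in adj.values() for w in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by dangling vertices (no out-edges) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not adj.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share   # v passes an equal share of its rank to each target
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```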
  31. Information Management Reference Architecture
      • Components: streaming engine, data lake, enterprise data & reporting, discovery lab
      • Inputs: input events; outputs: actionable events, actionable metrics, actionable data sets, output data, structured enterprise data
      • Concerns: execution, innovation, discovery
  32. Big Data SQL Simplifies Analyses: the same reference architecture (input events, streaming engine, data lake, enterprise data & reporting, discovery lab; execution, innovation, discovery; output data and structured enterprise data), adding Notebooks/Analytic Services, Big Data SQL, Object Store, Hadoop/HDFS and Your Application