
Big Data Management System: Smart SQL Processing Across Hadoop and your Data Warehouse


  1. Smart SQL Processing for Databases, Hadoop, and Beyond. Dan McClary, Ph.D., Big Data Product Management, Oracle. June 2014. Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
  2. Safe Harbor Statement. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  3. Databases, Hadoop, and Beyond: (1) how and why companies are using big data; (2) making Hadoop a first-class citizen; (3) smarter SQL processing.
  4. Big Data Customer Snapshot. Big data analytic services: R&D, cross-property analytics, massive ingestion; a consolidated data science platform. Business transformation: a leading Spanish bank with more than 13M customers; collect and unify all relevant information. Innovative network defense: Hadoop and NoSQL DB for data of different speeds; detect 0-days, uncover intrusions. (Each customer runs BDA and Exadata.)
  5. Exploit the Strengths of Both Systems. (Radar chart, scale 0-5, comparing Hadoop and RDBMS on: tooling maturity, stringent functionals, ACID transactions, security, variety of data formats, release pace, ETL simplicity, cost-effective data storage, ingestion rate, business interoperability.) Hadoop is good at some things; databases are good at others. Don’t reinvent wheels.
  6. BDMS: Big Data Management System. Relational: run the business (integrate existing systems, support mission-critical tasks, protect existing expenditures, ensure skills relevance). Hadoop: change the business (disrupt competitors, disintermediate supply chains, leverage new paradigms, exploit new analyses). NoSQL: scale the business (serve faster, meet mobile challenges, scale out economically).
  7. Remarkable Innovation: the Hadoop ecosystem.
  8. Innovation Breeds Challenge. The Hadoop ecosystem brings challenges in operations, languages, custom assembly, HW/SW optimization, security, redundancy, integration, support, complexity, APIs in flux, constant upgrades, and skill sets.
  9. Building for Database Operations at Scale. Engineered system for Oracle Database: intelligent storage, Smart Scan, storage indexing, advanced compression, optimized network protocols, easy upgrades, easy consolidation.
  10. Building for Hadoop Operations at Scale. Engineered system for Hadoop and NoSQL: integrated enterprise management, out-of-the-box authentication, auditing, role-based access control, encryption, high availability, easy upgrades, rapid provisioning.
  11. Real Barriers to Adopting Big Data: the platform is not the problem. Skills: Hadoop requires new expertise; let experts be experts; ensure experts can work together. Integration: prevent Hadoop from becoming a silo. Security: need clear routes to governance or enforcement.
  12. How do we make Hadoop a first-class citizen?
  13. SQL.
  14. Why?
  15. 40 Years of SQL. A query written in 1974 still works in 2014, only faster and in more places: SELECT dept, sum(salary) FROM emp, dept WHERE dept.empid = emp.empid GROUP BY dept.
  16. SQL on Hadoop is Obvious. (Slide shows SQL-on-Hadoop initiatives, including Stinger.)
  17. Data Lives in Many Places. Profit and loss in relational systems, application logs in Hadoop, customer profiles in NoSQL, and SQL needs to reach all of them.
  18. The Challenge is On. Create a system that: gives you the full power of SQL; requires no changes to application code; gives you a single view of all data stored in RDBMS and in Hadoop (and more); requires no changes to Hadoop or to my data; and delivers the best possible performance on my Hadoop data.
  19. Smart SQL Processing on Hadoop (and more) data.
  20. 100% of you are wondering how we do this!
  21. BDMS Requirements: full power of SQL and advanced analytics; no changes to application code; single view of all data; fastest performance; no changes to Hadoop. Plus: unified metadata across RDBMS and Hadoop; SQL access to NoSQL.
  22. How did we do this? (1) Give database queries the ability to be a Hadoop client; (2) expand the database metadata to understand Hadoop objects; (3) add services to Hadoop to execute and optimize data requests.
  23. Teaching Oracle About Hadoop.
  24. How does MapReduce process data? Scan and row creation need to work on “any” data format, so user-defined Java classes are used to scan the data and create the rows: the InputFormat defines parallelism, and the RecordReader scans the data (keys and values). (Diagram: data node disk, scan, create rows, consumer.)
  25. How does Hive help? Definitions are represented as tables in the Hive Metastore, and Hive leverages a SerDe (a Java class) to define columns on the rows that are generated: the InputFormat defines parallelism, the RecordReader scans the data (keys and values), and the SerDe creates the columns. (Diagram: data node disk, scan, create rows and columns, consumer.) A minimal Hive example of these pieces follows.
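  For context, the sketch below is a minimal Hive external-table definition of the kind slides 24-25 describe. It is an illustration, not something taken from the deck: the table, columns, HDFS location, and the specific SerDe and InputFormat classes (standard Hive/Hadoop classes) are assumptions chosen to show where each piece plugs in.

      -- Illustrative Hive DDL (assumed names and classes, not from the slides).
      -- ROW FORMAT SERDE      -> the SerDe class that creates the columns
      -- STORED AS INPUTFORMAT -> the InputFormat that defines parallelism (splits)
      --                          and supplies the RecordReader that scans keys/values
      CREATE EXTERNAL TABLE customer_address (
        ca_customer_id   BIGINT,
        ca_street_number STRING,
        ca_state         STRING,
        ca_zip           STRING
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
      STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
      LOCATION '/user/hive/warehouse/customer_address';

  The RecordReader is not named in the DDL; it is supplied by the InputFormat at scan time, which is why the slides list it alongside the other two classes.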
  26. Publish Hadoop Metadata to Oracle Catalog. An external table is created in the Oracle catalog (Exadata + Oracle Database) that points at the Hive metadata and HDFS data nodes on the Big Data Appliance (+ Hadoop):
      create table customer_address
      ( ca_customer_id    number(10,0)
      , ca_street_number  char(10)
      , ca_state          char(2)
      , ca_zip            char(10)
      )
      organization external
      ( TYPE ORACLE_HIVE
        DEFAULT DIRECTORY DEFAULT_DIR
        ACCESS PARAMETERS (com.oracle.bigdata.cluster hadoop_cl_1)
        LOCATION ('hive://customer_address')
      )
  27. Publish Hadoop Metadata to Oracle Catalog (continued). The same external-table definition, with the Hadoop access classes it relies on called out: SerDe, RecordReader, InputFormat, and StorageHandlers. A sample query against the external table follows.
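  Once the external table is in the Oracle catalog, existing SQL tools can query the Hive-backed data like any other table. The statement below is a hedged sketch using the column names from the DDL above; it does not appear in the deck.

      -- Illustrative query against the ORACLE_HIVE external table defined above
      SELECT ca_state, COUNT(*) AS address_count
      FROM   customer_address
      GROUP BY ca_state
      ORDER BY address_count DESC;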
  28. Executing Queries on Hadoop.
      Select c_customer_id
      ,      c_customer_last_name
      ,      ca_county
      From   customers
      ,      customer_address
      where  c_customer_id = ca_customer_id
      and    ca_state = 'CA'
      Using the external-table and Hive metadata, the database determines data locations, data structure, and parallelism, then sends the data request and its context to specific HDFS data nodes. There's a bottleneck here: every scanned row still has to travel back to the database before it can be filtered and joined.
  29. Making SQL Processing Smarter.
  30. What Can Big Data Learn from Exadata? Minimized data movement means performance: Smart Scan filters data as it streams from disk; storage indexing ensures only relevant data is read; caching means frequently accessed data takes less time to read.
  31. Executing Queries on Hadoop, with Smart Scan. For the same query, the “tables” on the HDFS data nodes do the I/O and the Smart Scan: they filter rows and project columns, so only the relevant rows and columns move to the database, where the join with database data is applied. Note: this also works without Hive definitions, as the underlying HDFS access concepts apply.
  32. Storage Indexes: Optimizing Scans on Hadoop. Automatically collect and store the minimum and maximum value within a storage unit; before scanning a storage unit, verify whether the data the query requires falls within that min-max range; if not, skip scanning the block and reduce scan time. (Diagram: HDFS data nodes with per-block min/max values.) Note: this also works without Hive definitions; simply leverage the SerDe. A small illustration follows.
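  As a rough illustration (reusing the table and column names from the earlier example; not from the slides), a selective predicate is exactly where storage indexes pay off: any block whose recorded min-max range for the column cannot contain the requested value is skipped without being read.

      -- Illustrative only: blocks whose [min, max] range for ca_zip excludes '94065'
      -- are skipped entirely, reducing scan time on this selective predicate.
      SELECT COUNT(*)
      FROM   customer_address
      WHERE  ca_zip = '94065';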
  33. What Does This Mean for Me?
  34. What if You Could Query All Data? Store JSON data unconverted in Hadoop (Oracle Big Data Appliance), store business-critical data in Oracle Database 12c, and analyze it all via SQL:
      select customers_document.address.state, revenue
      from   customers, sales
      where  customers_document.id = sales.custID
      group by customers_document.address.state;
      Pushed down to Hadoop: JSON parsing, column projection, and a Bloom filter for the faster join.
  35. What if You Could Govern All Data? The same architecture (JSON unconverted in Hadoop, business-critical data in Oracle, all analyzed via SQL) lets you apply advanced database security to Hadoop data: masking/redaction, Virtual Private Database, and fine-grained access control. For example, a redaction policy on the external table:
      DBMS_REDACT.ADD_POLICY(
        object_schema => 'txadp_hive_01',
        object_name   => 'customer_address_ext',
        column_name   => 'ca_street_name',
        policy_name   => 'customer_address_redaction',
        function_type => DBMS_REDACT.RANDOM,
        expression    => 'SYS_CONTEXT(''SYS_SESSION_ROLES'', ''REDACTION_TESTER'')=''TRUE'''
      );
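  To see the policy's effect, a hedged check like the one below (the query itself is an assumption; only the schema, table, and column names come from the policy above) can be run from two sessions: one where the policy expression is true (here, a session holding the REDACTION_TESTER role) sees randomized values in ca_street_name, while other sessions see the real data.

      -- Illustrative only: ca_street_name is returned redacted (randomized)
      -- whenever the policy's expression evaluates to true for the session.
      SELECT ca_street_name
      FROM   txadp_hive_01.customer_address_ext
      WHERE  ROWNUM <= 5;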
  36. Oracle’s Big Data Management System: one fast SQL query, on all your data. Oracle SQL on Hadoop and beyond, with a Smart Scan service as in Exadata, with native SQL operators, and with the security and certainty of Oracle Database. Happy 40th birthday, SQL.
  37. http://www.oracle.com/bigdatabreakthrough @dan_mcclary
