
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle OpenWorld IOUG Forum


Modern big data solutions often incorporate Hadoop as one of the components and require integrating Hadoop with other components, including Oracle Database. This presentation explains how Hadoop integrates with Oracle products, focusing specifically on Oracle Database. Explore the various methods and tools available to move data between Oracle Database and Hadoop, learn how to transparently access data in Hadoop from Oracle Database, and review how other products, such as Oracle Business Intelligence Enterprise Edition and Oracle Data Integrator, integrate with Hadoop.


  1. Bridging Oracle Database and Hadoop Alex Gorbachev October 2015
  2. Alex Gorbachev • Chief Technology Officer at Pythian • Blogger • Cloudera Champion of Big Data • OakTable Network member • Oracle ACE Director • Founder of BattleAgainstAnyGuess.com • Founder of Sydney Oracle Meetup • EVP, IOUG
  3. What is Big Data? And why Big Data today?
  4. Why the Big Data boom now? • Advances in communication – it’s now feasible for anyone to transfer large amounts of data economically from virtually anywhere • Commodity hardware – high performance and high capacity are available at low price • Commodity software – the open-source phenomenon has made advanced software products affordable to anyone • New data sources – mobile, sensors, social media • What was only possible at very high cost in the past can now be done by any small or large business
  5. Big Data = Affordable at Scale
  6. Not everyone is Facebook, Google, Yahoo, etc. These guys had to push the envelope because traditional technology didn’t scale
  7. Not everyone is Facebook, Google, Yahoo, etc. These guys had to push the envelope because traditional technology didn’t scale. Mere mortals’ challenge is cost and agility
  8. System capability per $ – Big Data technology may be expensive at low scale due to high engineering effort. Traditional technology becomes too complex and expensive to scale. (Chart: system capabilities vs. investment in $, traditional vs. Big Data technology)
  9. What is Hadoop?
  10. Hadoop Design Principle #1 Scalable Affordable Reliable Data Store HDFS – Hadoop Distributed Filesystem
  11. Hadoop Design Principle #2 Bring Code to Data
  12. Why is Hadoop so affordable? • Cheap hardware • Resiliency through software • Horizontal scalability • Open-source software
  13. How much does it cost? Oracle Big Data Appliance X5-2 rack - $525K list price • 18 data nodes • 648 CPU cores • 2.3 TB RAM • 216 x 4TB disks • 864TB of raw disk capacity • 288TB usable (triple mirror) • 40G InfiniBand + 10GbE networking • Cloudera Enterprise
  14. Hadoop is very flexible • Rich ecosystem of tools • Can handle any data format – Relational – Text – Audio, video – Streaming data – Logs – Non-relational structured data (JSON, XML, binary formats) – Graph data • Not limited to relational data processing
  15. Challenges with Hadoop for those of us used to Oracle • New data access tools – Relational and non-relational data • Non-Oracle (and non-ANSI) Hive SQL – Java-based UDFs and UDAFs • Security features are not there out of the box • May be slow for “small data”
  16. Tables in Hadoop: using Hadoop with relational data abstractions
  17. Apache Hive • Apache Hive provides a SQL layer over Hadoop – data in HDFS (structured or unstructured via SerDe) – using one of the distributed processing frameworks – MapReduce, Spark, Tez • Presents data from HDFS as tables and columns – Hive metastore (aka data dictionary) • SQL language access (HiveQL) – Parses SQL and creates execution plans in MR, Spark or Tez • JDBC and ODBC drivers – Access from ETL and BI tools – Custom apps – Development tools
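  A minimal HiveQL sketch of what slide 17 describes (the table name, columns and HDFS path are hypothetical, not from the presentation): an external table is defined over delimited files already in HDFS and then queried like an ordinary table, with Hive compiling the query into MapReduce, Spark or Tez jobs.

    -- Hypothetical table over tab-delimited click logs sitting in HDFS
    CREATE EXTERNAL TABLE web_clicks (
      click_time  STRING,
      user_id     BIGINT,
      url         STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/web_clicks';

    -- Queried like any table; Hive turns this into MR/Spark/Tez work on the cluster
    SELECT user_id, COUNT(*) AS clicks
    FROM   web_clicks
    GROUP  BY user_id
    ORDER  BY clicks DESC
    LIMIT  10;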
  18. Native Hadoop tools • Demo • HUE – HDFS files – Hive – Impala
  19. Access Hive using SQL Developer • Demo • Use Cloudera JDBC drivers • Query data & browse metadata • Run DDL from SQL tab • Create Hive table definitions inside Oracle DB
  20. Hadoop and OBIEE 11g • OBIEE 11.1.1.7 can query Hive/Hadoop as a data source – Hive ODBC drivers – Apache Hive Physical Layer database type • Limited features – OBIEE 11.1.1.7 ships HiveServer1 ODBC drivers – HiveQL is only a subset of ANSI SQL • Hive query response time is too slow for speed-of-thought analytics
  21. ODI 12c • ODI – data transformation tool – ELT approach pushes transformations down to Hadoop, leveraging the power of the cluster – Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation • Upcoming support for Pig and Spark • Workflow orchestration • Metadata and model-driven • GUI workflow design • Transformation audit & data quality
  22. Moving Data to Hadoop using ODI • Interface with Apache Sqoop using the IKM SQL to Hive-HBase-File knowledge module – Hadoop ecosystem tool – Able to run in parallel – Optimized Sqoop JDBC driver integration for Oracle – Bi-directional, in and out of Hadoop to RDBMS – Data is moved directly between the Hadoop cluster and the database • Export RDBMS data to file and load using IKM File to Hive
  23. Integrating Hadoop with Oracle Database
  24. Oracle Big Data Connectors • Oracle Loader for Hadoop – Offloads some pre-processing to Hadoop MR jobs (data type conversion, partitioning, sorting) – Direct load into the database (online method) – Data Pump binary files in HDFS (offline method) • These can then be accessed as external tables on HDFS • Oracle Direct Connector for HDFS – Create external tables over files in HDFS – Text files or Data Pump binary files – WARNING: lots of data movement! Great for archiving infrequently accessed data in HDFS
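  A schematic sketch of the external-table approach from slide 24, assuming an ODCH/OSCH-style setup; the directory objects, columns and location file names are hypothetical, and this is not the exact DDL the connector generates. In practice the connector's command-line tooling generates both the table and the location files that point at the HDFS paths, and a preprocessor script streams the HDFS content into the ORACLE_LOADER access driver.

    -- Schematic only: directory objects and location files would be set up by the connector
    CREATE TABLE sales_hdfs_ext (
      sale_id   NUMBER,
      sale_ts   VARCHAR2(30),
      amount    NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY sales_ext_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        PREPROCESSOR osch_bin_dir:'hdfs_stream'
        FIELDS TERMINATED BY ','
      )
      LOCATION ('osch-location-1', 'osch-location-2')
    )
    REJECT LIMIT UNLIMITED;

  Once such a table exists, plain SQL can read or copy the HDFS-resident data (for example, INSERT /*+ APPEND */ INTO sales SELECT * FROM sales_hdfs_ext), which is also why the slide warns about the volume of data movement.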
  25. Oracle Big Data SQL (Source: http://www.slideshare.net/gwenshap/data-wrangling-and-oracle-connectors-for-hadoop)
  26. Oracle Big Data SQL • Transparent access from Oracle DB to Hadoop – Oracle SQL dialect – Oracle DB security model – Join data from Hadoop and Oracle • SmartScan - pushing code to data – Same software base as on Exadata Storage Cells – Minimize data transfer from Hadoop to Oracle • Requires BDA and Exadata • Licensed per Hadoop disk spindle
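  To illustrate the transparent access described on slide 26, here is a hypothetical Big Data SQL external table over an existing Hive table. The table and column names are made up; the ORACLE_HIVE access driver and the com.oracle.bigdata.tablename parameter are the documented mechanism, but the exact DDL in a real system depends on the Hive schema and cluster configuration.

    -- Hypothetical: expose the Hive table default.web_clicks inside Oracle Database
    CREATE TABLE web_clicks_bds (
      click_time  VARCHAR2(30),
      user_id     NUMBER,
      url         VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_HIVE
      DEFAULT DIRECTORY DEFAULT_DIR
      ACCESS PARAMETERS (com.oracle.bigdata.tablename: default.web_clicks)
    )
    REJECT LIMIT UNLIMITED;

    -- Ordinary Oracle SQL can now join Hadoop-resident data with Oracle tables,
    -- with SmartScan pushing filtering down to the Hadoop side where possible
    SELECT c.customer_name, COUNT(*) AS clicks
    FROM   web_clicks_bds w
    JOIN   customers c ON c.customer_id = w.user_id
    WHERE  w.click_time >= '2015-01-01'
    GROUP  BY c.customer_name;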
  27. Big Data SQL Demo
  28. Big Data SQL in Oracle tools • Transparent to any app • SQL Developer • ODI • OBIEE
  29. Hadoop as Data Warehouse
  30. Traditional Needs of Data Warehouses • Speed of thought end user analytics experience – BI tools coupled with DW databases • Scalable data platform – DW database • Versatile and scalable data transformation engine – ETL tools sometimes coupled with DW databases • Data quality control and audit – ETL tools
  31. What drives Hadoop adoption for Data Warehousing?
  32. What drives Hadoop adoption for Data Warehousing? 1. Cost efficiency
  33. What drives Hadoop adoption for Data Warehousing? 1. Cost efficiency 2. Agility needs
  34. Why is Hadoop Cost Efficient? Hadoop leverages two main trends in the IT industry • Commodity hardware – high performance and high capacity are available at low price • Commodity software – the open-source phenomenon has made advanced software products affordable to anyone
  35. How Does Hadoop Enable Agility? • Load first, structure later – No need to spend months changing the DW to add new types of data without knowing for sure it will be valuable for end users – Quick and easy to verify a hypothesis – perfect data exploration platform • All data in one place is very powerful – Much easier to test new theories • Natural fit for “unstructured” data
  36. Traditional needs of DW & Hadoop • Speed of thought end user analytics experience? – Very recent features – Impala, Presto, Drill, Hadapt, etc. – BI tools embracing Hadoop as DW – Totally new products become available • Scalable data platform? – Yes • Versatile and scalable data transformation engine? – Yes, but needs a lot of DIY – ETL vendors embraced Hadoop • Data quality control and audit? – Hadoop makes it more difficult because of the flexibility it brings – A lot of DIY, but ETL vendors are getting better at supporting Hadoop + new products appear
  37. Unique Hadoop Challenges • Still a “young” technology – requires a lot of high quality engineering talent • Security doesn’t come out of the box – Capabilities are there but very tedious to implement and somewhat fragile • Challenge of selecting the right tool for the job – Hadoop ecosystem is huge • Hadoop breaks IT silos • Requires commoditization of IT operations – Large footprint with agile deployments
  38. Typical Hadoop adoption in modern Enterprise IT (diagram: Hadoop, Data Warehouse, BI tools)
  39. Bring the world into your data center
  40. Rare historical report
  41. Find a needle in a haystack
  42. Will Hadoop displace traditional DW platforms? (diagram: Hadoop, BI tools)
  43. Example pure Hadoop DW stack (diagram: HDFS, Hive/Pig, Flume, Sqoop, Impala, Kerberos, Oozie, plus DIY for data sources)
  44. Do you have a Big Data problem?
  45. Your Data is NOT as BIG as you think
  46. Using 8-year-old hardware… is NOT a Big Data problem
  47. Misconfigured infrastructure… is NOT a Big Data problem
  48. Lack of a purging policy… is NOT a Big Data problem
  49. Bad data model design… is NOT a Big Data problem
  50. Bad SQL… is NOT a Big Data problem
  51. Your Data is NOT as BIG as you think Controversy…
  52. Thanks and Q&A • Contact info: gorbachev@pythian.com, +1-877-PYTHIAN • To follow us: pythian.com/blog, @alexgorbachev, @pythian, linkedin.com/company/pythian
