Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Hadoop Meets Exadata- Kerry Osborne


Published on

Published in: Technology
  • Be the first to comment

Hadoop Meets Exadata- Kerry Osborne

  1. 1. HiHadoop Meets Exadata Presented by: Kerry OsborneOracle Open World – October, 2012
  2. 2. whoami –Never Worked for OracleWorked with Oracle Since 1982 (V2)Working with Exadata since early 2010Work for Enkitec ( owns a Half Rack – V2/X2)(Enkitec owns a Big Data Appliance)Many Exadata customers and POCsExadata Book (recently translated to Chinese)Hadoop Aficionado Blog: Twitter: @KerryOracleGuy 2
  3. 3. Top Secret Feature of BDA 3
  4. 4. What’s the Point?Data Volumes are Increasing RapidlyCost of Processing / Storing is HighSomething’s Gotta Give!Besides – managing large quantities of data is whatwe do! 4
  5. 5. Hadoop Is A Virus* Stolen from Orbitz 5
  6. 6. Google Trends 6
  7. 7. Google Trends 7
  8. 8. Google Trends 8
  9. 9. Digression #1 - Big Data Not My Favorite Term 3 or 4 V’s Value Density Not the Right Tool for Every Job 9
  10. 10. Disjointed Presentation Architecture Comparison Integration Discussion Case Study ? 10
  11. 11. Traditional RDBMS Architecture RACwo Cache (SGA)r workersk dbwr lgwr etc… Block Mapper (ASM) Storage 11
  12. 12. HDFS/Hadoop Architecture HA ?wo Job Tracker Name Noderk datanode tasktracker datanode tasktracker workers workers Storage Storage 12
  13. 13. HDFS/Hadoop Architecture HA ?wo Job Trackerrk Block Mapper (namenode) datanode tasktracker datanode tasktracker workers workers Storage Storage 13
  14. 14. Exadata Architecture RACw workerso Cacherk Block Mapper (ASM) Storage Node Storage Node workers workers Storage Storage 14
  15. 15. HDFS/Hadoop Architecture HA ?wo Job Trackerrk Block Mapper (namenode) datanode tasktracker datanode tasktracker workers workers Storage Storage 15
  16. 16. Oracle + Hadoop Integration 16
  17. 17. Obligatory Marketing Slide 17
  18. 18. Integration OptionsMany Ways to Skin the Cat • Fuse • Sqoop • Oracle Big Data Connectors 18
  19. 19. Fuse – External Tables 19
  20. 20. Sqoop (SQL-to-Hadoop) • Graduated from Incubator Status in March 2012 • Slower (no direct path?) • Quest has a plug-in (oraoop) • Bi-Directional 20
  21. 21. Oracle Big Data Connectors Oracle Loader for Hadoop - OLH Oracle Direct Connector for HDFS  - ODCH Oracle R Connector for Hadoop – ORHC Oracle Data Integrator Application Adapter for HadoopNote:All Connectors are One WayAll sold together for $2K per core list 21
  22. 22. Oracle Data IntegratorApplication Adapter for Hadoop ODIAAH ? 22
  23. 23. Oracle R Connector for Hadoop (ORHC) • Provides ability to pull data from Oracle RDBMS • Provides ability to pull data from HDFS • Provides access to local file system • Not really a loader tool • Most useful for analysts 23
  24. 24. Oracle Loader for Hadoop (OLH)• Implemented as a MapReduce job (oraloader.jar)• Saves CPU on DB Server• Can convert to Oracle datatypes• Can partition data and optionally sort it• Online – direct into Oracle tables • Can load into Oracle via JDBC or OCI Direct Path• Offline – generate preprocessed files in HDFS (DP format) 24
  25. 25. Oracle Direct Connector for HDFS  (ODCH) • My Favorite • Uses External Tables • Fastest • 12T per hour • Can load DP files preprocessed by OLH • Allows Oracle SQL to query HDFS data • Doesn’t require loading into Oracle • Pretty Cool! • Downside – uses DB CPU’s 25
  26. 26. Exadoop* Mad Scientist Project 26
  27. 27. Exadoop Unusual Situation! Half Rack with 4 Spare Storage Servers Exadata Cells Very Similar to BDA Servers slower CPU’s less memory but same drives (12X3T) and IB and Flash 4 Cells ≈ Mini BDA! (happy face) 27
  28. 28. Digression #2 - BDA Stuff 28
  29. 29. Digression #2 - BDA Stuff 29
  30. 30. Digression #2 - BDA Stuff 30
  31. 31. ExadoopSituation• Pilot Underway – but wanted more power• 4 Exadata Storage Servers were sitting idle• Suggestion was to Install Hadoop Cluster on them• 1st Concern was being able to Reclaim for Exadata• Removing Data Node from HDFS Not a Problem• Adding Storage to ASM Not a Problem• So the Decision Was Made to Move Forward 31
  32. 32. ExadoopSet Up• Removed the Internal USB’s• Installed OEL 6.2• Installed CDH3• Loaded Some Data• Set Up ODCH with External Tables 32
  33. 33. ExadoopTesting• Selecting Data Using External Tables was Not Very Fast• Quickly Determined we had Used Default 1G Network• Reconfigured with IB• Helped But Not as Much as Expected• Using Little CPU on Data Nodes• But a Single Process was Pegging a CPU on the DB• Added Parallelism• No Good, Only One Slave Active• Added Multiple Files to External Table Def. – Bingo! 33
  34. 34. ExadoopTesting - Continued• Added Fuse Client• Created External Tables with Fuse• PX seems to work even on single files• Puts additional CPU load on DB server (2T/hr) 34
  35. 35. Wrap Up Right Tool For The Job? Maybe All the Cool Kids Are Doing It! 35
  36. 36. Questions?Contact Information : Kerry Osborne 36