Hadoop Meets Exadata - Kerry Osborne


  • Many companies that are using Hadoop in a big way still have Oracle databases sitting right next to them. Nokia – I had a meeting with a guy from Nokia a couple of weeks ago. We discussed how they were using Hadoop, and he described basically an ETL kind of setup: the HDFS cluster ingests data, which is then processed by MR jobs, and the aggregated data is then fed into a relational DB so the analysts can have their way with it. People have preferences for certain tools (BI tools, for example). Also, RDBMSs can be very fast for this type of access if the data is of reasonable size. Not using Flume, ??? Using it for many things, but positional data from phones was one of the main cases we discussed. Canadian NSA (CSE) – they have an Exadata and a Hadoop cluster – rows of racks of both.
  • Use Firefox
  • With all the new options available, it will take some serious thought about which architecture makes the most sense for any given problem. I had a conversation two weeks ago with the Canadian NSA (CSE) – a completely static data set, never updated. Good for Hadoop or for HCC. HCC provides about 10x compression on their data set, so a single Exadata rack, which has a raw storage capacity of about half a petabyte, can store over 2 petabytes with normal redundancy. On the other hand, I had a conversation with Nokia about how they are using Hadoop. They have been investing heavily in the technology for a couple of years. A large part of what they do involves ingesting data produced by mobile phones. The data is typically mined by MR jobs, and aggregated data sets are then loaded into RDBMSs where analysts can use standard BI tools to do what they do. So they described it as an ETL-type process.

    1. Hadoop Meets Exadata – Presented by: Kerry Osborne, Oracle Open World, October 2012
    2. whoami – Never Worked for Oracle. Worked with Oracle Since 1982 (V2). Working with Exadata since early 2010. Work for Enkitec (www.enkitec.com). (Enkitec owns a Half Rack – V2/X2.) (Enkitec owns a Big Data Appliance.) Many Exadata customers and POCs. Exadata Book (recently translated to Chinese). Hadoop Aficionado. Blog: kerryosborne.oracle-guy.com Twitter: @KerryOracleGuy
    3. Top Secret Feature of BDA
    4. What's the Point? Data Volumes are Increasing Rapidly. Cost of Processing / Storing is High. Something's Gotta Give! Besides – managing large quantities of data is what we do!
    5. Hadoop Is A Virus (* Stolen from Orbitz)
    6. Google Trends
    7. Google Trends
    8. Google Trends
    9. Digression #1 – Big Data. Not My Favorite Term. 3 or 4 V's. Value Density. Not the Right Tool for Every Job.
    10. Disjointed Presentation: Architecture Comparison, Integration Discussion, Case Study ?
    11. Traditional RDBMS Architecture – [diagram: RAC workers, Cache (SGA), dbwr / lgwr etc., Block Mapper (ASM), Storage]
    12. HDFS/Hadoop Architecture (HA?) – [diagram: Job Tracker, Name Node; datanode + tasktracker workers over Storage]
    13. HDFS/Hadoop Architecture (HA?) – [diagram: Job Tracker, Block Mapper (namenode); datanode + tasktracker workers over Storage]
    14. Exadata Architecture – [diagram: RAC workers, Cache, Block Mapper (ASM); Storage Node workers over Storage]
    15. HDFS/Hadoop Architecture (HA?) – [diagram: Job Tracker, Block Mapper (namenode); datanode + tasktracker workers over Storage]
    16. Oracle + Hadoop Integration
    17. Obligatory Marketing Slide
    18. Integration Options – Many Ways to Skin the Cat: • Fuse • Sqoop • Oracle Big Data Connectors
    19. Fuse – External Tables
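The Fuse approach mounts HDFS as a local filesystem so a standard Oracle external table can read the files. A minimal sketch, assuming a CDH3-era `hadoop-fuse-dfs` binary and an HDFS namenode on port 8020; all hostnames, paths, and the table layout are hypothetical:

```shell
# Mount HDFS under /mnt/hdfs (requires the hadoop-fuse-dfs package)
hadoop-fuse-dfs dfs://namenode-host:8020 /mnt/hdfs

# Create an external table over a CSV file in the mounted HDFS directory
sqlplus / as sysdba <<'EOF'
CREATE DIRECTORY hdfs_dir AS '/mnt/hdfs/user/etl/output';

CREATE TABLE clicks_ext (
  click_time  DATE,
  user_id     NUMBER,
  url         VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY hdfs_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('part-00000')
);
EOF
```

From there, plain `SELECT` statements against `clicks_ext` read the HDFS data without ever loading it into Oracle.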
    20. Sqoop (SQL-to-Hadoop) • Graduated from Incubator Status in March 2012 • Slower (no direct path?) • Quest has a plug-in (oraoop) • Bi-Directional
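Since the slide highlights that Sqoop is bi-directional, here is a hedged sketch of both directions. The JDBC connect string, schema, table names, and directories are all made up; `-P` prompts for the password interactively:

```shell
# HDFS -> Oracle: export aggregated results into a relational table
sqoop export \
  --connect jdbc:oracle:thin:@dbhost:1521:orcl \
  --username SCOTT -P \
  --table CLICK_SUMMARY \
  --export-dir /user/etl/summary

# Oracle -> HDFS: import a table as delimited files, 4 parallel mappers
# (the Quest/oraoop plug-in adds a faster direct-path style import)
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521:orcl \
  --username SCOTT -P \
  --table ORDERS \
  --target-dir /user/etl/orders \
  --num-mappers 4
```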
    21. Oracle Big Data Connectors: Oracle Loader for Hadoop (OLH); Oracle Direct Connector for HDFS (ODCH); Oracle R Connector for Hadoop (ORHC); Oracle Data Integrator Application Adapter for Hadoop. Note: All Connectors are One Way. All sold together for $2K per core list.
    22. Oracle Data Integrator Application Adapter for Hadoop – ODIAAH ?
    23. Oracle R Connector for Hadoop (ORHC) • Provides ability to pull data from Oracle RDBMS • Provides ability to pull data from HDFS • Provides access to local file system • Not really a loader tool • Most useful for analysts
    24. Oracle Loader for Hadoop (OLH) • Implemented as a MapReduce job (oraloader.jar) • Saves CPU on DB Server • Can convert to Oracle datatypes • Can partition data and optionally sort it • Online – direct into Oracle tables, via JDBC or OCI Direct Path • Offline – generate preprocessed files in HDFS (DP format)
    25. Oracle Direct Connector for HDFS (ODCH) • My Favorite • Uses External Tables • Fastest – 12T per hour • Can load DP files preprocessed by OLH • Allows Oracle SQL to query HDFS data • Doesn't require loading into Oracle • Pretty Cool! • Downside – uses DB CPUs
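ODCH's external-table trick works through the `PREPROCESSOR` clause: a small script streams file contents out of HDFS at query time, which is why the data never has to be loaded and why the CPU cost lands on the DB server. A sketch of the pattern; the install paths, directory objects, and location file name are assumptions:

```shell
sqlplus / as sysdba <<'EOF'
CREATE DIRECTORY hdfs_bin_path AS '/opt/odch/bin';  -- holds the hdfs_stream script
CREATE DIRECTORY ext_loc_dir   AS '/u01/odch/loc';  -- holds ODCH location files

CREATE TABLE hdfs_clicks_ext (
  click_time DATE,
  user_id    NUMBER,
  url        VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_loc_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR hdfs_bin_path:'hdfs_stream'
    FIELDS TERMINATED BY ','
  )
  LOCATION ('clicks.loc')  -- location file pointing at files in HDFS
);

-- Query HDFS data in place, no load step
SELECT COUNT(*) FROM hdfs_clicks_ext;
EOF
```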
    26. Exadoop* – Mad Scientist Project
    27. Exadoop – Unusual Situation! Half Rack with 4 Spare Storage Servers. Exadata Cells Very Similar to BDA Servers: slower CPUs, less memory, but same drives (12x3T), IB, and Flash. 4 Cells ≈ Mini BDA! (happy face)
    28. Digression #2 – BDA Stuff
    29. Digression #2 – BDA Stuff
    30. Digression #2 – BDA Stuff
    31. Exadoop – Situation • Pilot Underway – but wanted more power • 4 Exadata Storage Servers were sitting idle • Suggestion was to Install Hadoop Cluster on them • 1st Concern was being able to Reclaim for Exadata • Removing Data Node from HDFS Not a Problem • Adding Storage to ASM Not a Problem • So the Decision Was Made to Move Forward
    32. Exadoop – Set Up • Removed the Internal USBs • Installed OEL 6.2 • Installed CDH3 • Loaded Some Data • Set Up ODCH with External Tables
    33. Exadoop – Testing • Selecting Data Using External Tables was Not Very Fast • Quickly Determined we had Used Default 1G Network • Reconfigured with IB • Helped But Not as Much as Expected • Using Little CPU on Data Nodes • But a Single Process was Pegging a CPU on the DB • Added Parallelism • No Good, Only One Slave Active • Added Multiple Files to External Table Def. – Bingo!
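The "multiple files" fix on the testing slide reflects how parallel query carves up external tables: with a single LOCATION file there is only one granule, so only one PX slave does work. Listing several files gives each slave its own granule. A sketch, assuming a hypothetical external table `clicks_ext` over HDFS-backed files; file names and degree are made up:

```shell
sqlplus / as sysdba <<'EOF'
-- One file per slave: point the external table at several HDFS part files
ALTER TABLE clicks_ext
  LOCATION ('part-00000', 'part-00001', 'part-00002', 'part-00003');

ALTER TABLE clicks_ext PARALLEL 4;

-- Now a parallel scan can keep all four slaves busy
SELECT /*+ PARALLEL(c, 4) */ COUNT(*)
FROM clicks_ext c;
EOF
```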
    34. Exadoop – Testing, Continued • Added Fuse Client • Created External Tables with Fuse • PX seems to work even on single files • Puts additional CPU load on DB server (2T/hr)
    35. Wrap Up – Right Tool For The Job? Maybe. All the Cool Kids Are Doing It!
    36. Questions? Contact Information: Kerry Osborne, kerry.osborne@enkitec.com, kerryosborne.oracle-guy.com, www.enkitec.com