Hadoop, Oracle and the big data revolution collaborate 2013

Presentation given at Collaborate 2013

Published in: Technology

  1. 1. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Dell Software Group
  2. 2. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Executive Director, R&D, Information Management Group
  3. 3. Introductions: www.guyharrison.net, guy_harrison@dell.com, http://twitter.com/guyharrison
  4. 4. Dell, Quest and Toad
  11. 11. Star Trek shirt fatality analysis [bar chart: Pct, 0 to 80, for Red, Yellow and Blue shirts]
  14. 14. Quest Software is now part of Dell
  15. 15. “Big” Data?
  16. 16. Three or Four “V”s
      • Value: competitive or collective advantage
      • Volume: terabytes, petabytes, exabytes, zettabytes
      • Variety: structured, unstructured, human generated, machine generated
      • Velocity: user populations x transaction rates x machine data
  17. 17. Data volumes have always been increasing… (2006 perspective)
  18. 18. Though the absolute volumes are boggling… [bar chart, log scale from 1.E+09 (gigabyte) to 1.E+21 (zettabyte)]
      • Digital information created 2011: 2.13E+21
      • Total digital capacity: 1.18E+21
      • Digital information 2008: 4.87E+18
      • Living human genomes: 5.48E+18
      • Google: 1.10E+17
      • Human brain: 2.81E+15
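The magnitudes on the slide above are easier to compare once converted to named units. A minimal sketch, assuming the chart's values are bytes and that the axis labels use decimal (SI) prefixes; `humanize` is a hypothetical helper, not something from the deck.

```python
# Decimal (SI) byte units, matching the chart's axis labels.
UNITS = [
    (1e21, "zettabytes"),
    (1e18, "exabytes"),
    (1e15, "petabytes"),
    (1e12, "terabytes"),
    (1e9, "gigabytes"),
]

def humanize(num_bytes):
    """Return the value expressed in the largest unit that fits."""
    for factor, name in UNITS:
        if num_bytes >= factor:
            return f"{num_bytes / factor:.2f} {name}"
    return f"{num_bytes:.0f} bytes"

# Figures as they appear on the slide (assumed to be bytes).
slide_values = {
    "Digital information created 2011": 2.13e21,
    "Total digital capacity": 1.18e21,
    "Google": 1.10e17,
    "Human brain": 2.81e15,
}

for label, value in slide_values.items():
    print(f"{label}: {humanize(value)}")
```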
  19. 19. Velocity
  21. 21. Fail whales
  22. 22. Variety, or the Industrial Revolution of Data
  30. 30. Data: now and then
      • 1993: generated internally; key to operational efficiency
      • 2013: generated externally; key to competitiveness; source of product innovation; changing our world
  31. 31. “Big” data driven by the smallest devices
  32. 32. Smartphone hardware
      • Quad-core 1.4 GHz CPU
      • 1GB RAM
      • 64GB Storage
      • 1080p display
      • GSM/Bluetooth/WiFi Network
      • 8MP Camera
      • GPS & Compass
  33. 33. Smartphone software
  38. 38. Name: Willy Bowman. Nationality: German. DON'T MENTION THE WAR
  39. 39. Data Input
  41. 41. Siri
      • “Siri call me an ambulance”: “From now on, I'll call you ‘An Ambulance’. OK?”
      • “I want to jump off a bridge”: “I found 14 bridges nearby:”
  42. 42. Sixth-Sense
  45. 45. Brain Control
  51. 51. The instrumented human
      • Compass
      • Camera
      • Mike/earphones
      • Heads up display
      • Bluetooth Personal Area Network
      • 3G/WiFi Wide Area Network
      • Pulse, temp monitor
      • GPS
      • Storage
      • Silent alarms
      • Pedometer, sleep monitoring
  52. 52. All this requires and generates huge data sets. But what else are they good for?
  53. 53. The data “exhaust” itself generates new opportunities. Companies want to generate competitive advantage through “Big Data analytics”
  54. 54. Big Data Analytics
      • Machine Learning: programs that evolve with “experience”
      • Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
      • Predictive Analytics: programs that extrapolate from existing data into the future
  66. 66. Collective Intelligence and Predictive Analytics: search optimization; advertising (targeting, tailoring); recommendation systems; security (vulnerability, penetration detection); game optimization; medical (diagnosis, prognosis); risk analysis; fraud detection; churn; defaults
  67. 67. Collective Intelligence beats Artificial Intelligence?
  73. 73. For the last 40 years AI has been consistently disappointing
  76. 76. In 2011 AI made a comeback
  84. 84. Google: Pioneers of Big Data
  89. 89. Google Software Architecture: Google Applications; Map Reduce; Chubby; BigTable; Google File System (GFS)
  90. 90. Map Reduce [diagram: START, a grid of MAP tasks, REDUCE]
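The Map Reduce pattern in the diagram above can be illustrated with a word count. This is a minimal in-memory sketch, not Hadoop code: the function names are hypothetical, and the loop over lines stands in for map tasks that would run in parallel on separate nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def map_reduce(lines):
    # Each line could be mapped on a different node; here it is a loop.
    emitted = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = map_reduce(["big data", "Big Data revolution", "data"])
print(counts)
```

The map step needs no knowledge of other lines, which is what lets a framework spread it across many machines before the reduce step combines the results.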
  91. 91. Multi-stage Map-Reduce [diagram: CLIENT, HDFS, rows of MAPPER tasks and REDUCE, with stages labelled SCAN, AGGREGATE and SORT]
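A multi-stage job chains map-reduce passes so that the output of one stage becomes the input of the next. The sketch below is an in-memory illustration under assumptions: `run_stage`, the log format and the two stages (aggregate, then sort) are hypothetical, and a real Hadoop job would normally rely on the framework's shuffle for ordering.

```python
from collections import defaultdict

def run_stage(records, mapper, reducer):
    """Run one map-reduce stage in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in groups.items()]

# Stage 1 (scan + aggregate): count hits per site.
def scan_mapper(log_line):
    user, site = log_line.split()
    yield site, 1

def count_reducer(site, ones):
    return site, sum(ones)

# Stage 2 (sort): route every row to one key so a single reducer
# can order the aggregated rows.
def sort_mapper(row):
    yield "all", row

def sort_reducer(_key, rows):
    return sorted(rows, key=lambda row: row[1], reverse=True)

logs = ["dick ebay", "jane google", "dick google", "jane google"]
stage1 = run_stage(logs, scan_mapper, count_reducer)
stage2 = run_stage(stage1, sort_mapper, sort_reducer)
ranked = stage2[0]
print(ranked)
```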
  92. 92. Schema on Read vs Schema on Write
  93. 93. Schema on Read vs Schema on Write
      • Schema on Write: Data; Extract, Transform (Cleanse, Aggregate, Normalize), Load; Data Warehouse; Code, Analyse; Utilize
      • Schema on Read: Data; Load; Hadoop; Code, Analyse, Cleanse; Utilize
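With schema on read, raw data is loaded as-is and structure is imposed only by the code that reads it. A minimal sketch of that idea, assuming a hypothetical comma-separated log format; the column names and the cleansing rule are illustrative, not taken from the deck.

```python
import csv
import io

# Raw data is loaded untouched, as it would be into HDFS.
RAW = "dick,ebay,507018\njane,google,n/a\njane,facebook,643261\n"

def read_with_schema(raw_text):
    """Apply column names, types and cleansing only when reading."""
    for name, site, counter in csv.reader(io.StringIO(raw_text)):
        if not counter.isdigit():
            continue  # cleanse at read time: skip malformed rows
        yield {"name": name, "site": site, "counter": int(counter)}

rows = list(read_with_schema(RAW))
total = sum(row["counter"] for row in rows)
print(rows, total)
```

A different analysis could read the same raw text with a different schema, which is the flexibility the schema-on-write pipeline gives up in exchange for cleaner stored data.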
  94. 94. Hadoop: Open Source Map-Reduce Stack
  95. 95. Hadoop at Yahoo. Yahoo! Hadoop cluster: 4,000 nodes; 16 PB disk; 64 TB of RAM; 32,000 cores
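Dividing the cluster totals by the node count gives a feel for the size of each machine. A rough calculation, assuming decimal units and an even spread of resources across nodes (the slide states neither):

```python
nodes = 4000
disk_pb = 16
ram_tb = 64
cores = 32_000

# Decimal units assumed: 1 PB = 1,000 TB and 1 TB = 1,000 GB.
disk_per_node_tb = disk_pb * 1000 / nodes
ram_per_node_gb = ram_tb * 1000 / nodes
cores_per_node = cores / nodes

print(disk_per_node_tb, ram_per_node_gb, cores_per_node)
```

Under those assumptions each node is a modest commodity server: about 4 TB of disk, 16 GB of RAM and 8 cores.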
  97. 97. Hadoop 1.0 Architecture
      • Hadoop client (Java, Pig, Hive)
      • Map Reduce (distributed processing): Job Tracker, Task Trackers
      • HDFS (distributed storage): Name Node, Secondary Name Node, Data Nodes
  98. 98. Oozie (Workflow manager); Hive (Query); Pig (Scripting); SQOOP (RDBMS loader); Flume (Log loader); ZooKeeper (Locking); HBase (Database); Hadoop Map Reduce; Hadoop File System (HDFS)
  99. 99. HBase: A real-time database built on Hadoop [diagram labels: Log Buffer, Buffer Cache, Tables, Datafiles, Redo Log, ASM, Disks; MemStore, Tables, HFiles, WA Log, HDFS, Disks]
  100. 100. HBase Data Model
       Flat data:
         Name  Site             Counter
         Dick  Ebay             507,018
         Dick  Google           690,414
         Jane  Google           716,426
         Dick  Facebook         723,649
         Jane  Facebook         643,261
         Jane  ILoveLarry.com   856,767
         Dick  MadBillFans.com  675,230
       Normalized tables:
         NameId  Name
         1       Dick
         2       Jane

         SiteId  SiteName
         1       Ebay
         2       Google
         3       Facebook
         4       ILoveLarry.com
         5       MadBillFans.com

         NameId  SiteId  Counter
         1       1       507,018
         1       2       690,414
         2       2       716,426
         1       3       723,649
         2       3       643,261
         2       4       856,767
         1       5       675,230
       Wide rows, one column per site:
         Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
         1   Dick  507,018  690,414  723,649   . . .            675,230

         Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
         2   Jane  716,426  643,261   . . .            856,767
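The wide-row layout on the slide above can be mimicked with nested dictionaries: one entry per row key, holding only the columns that row actually has. This is an illustrative sketch using the slide's data, not the HBase API; `to_wide_rows` is a hypothetical helper.

```python
from collections import defaultdict

# Relational-style rows: one (name, site, counter) tuple per fact.
facts = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot facts into one sparse row per name, keyed by site."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)

wide = to_wide_rows(facts)
# Each row carries only the columns it actually has values for.
print(sorted(wide["Jane"]))
```

Because absent columns are simply not stored, rows can differ in shape without any schema change, which is the point of the two differently shaped rows on the slide.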
  101. 101. Hive
  103. 103. SQL, JAVA, RESULTS
  104. 104. Other SQL-like Hadoop Interfaces
       • Cloudera Impala
       • MapR Drill
       • Aster
       • Greenplum (Pivotal HD)
       • Paraccel
       • Hadapt
       • Oracle SQL Connector for Hadoop (External Table interface to HDFS)
  105. 105. Pig
  106. 106. Pig Latin; SQL or Hive QL
  107. 107. Meanwhile, back at the Death Star…
  110. 110. Oracle Exadata
       • Database servers: 64 cores, 576 GB RAM
       • Storage servers: 112 cores, 100 TB SAS or 336 TB SATA plus 5 TB SSD
  111. 111. Oracle Big Data Appliance
       • 18 Sun X4270 M2 servers
         − 48GB RAM per node (864GB total)
         − 2x6 Core CPU per node (216 total)
         − 12x2TB HDD per node (216 spindles, 864 TB)
         − 40Gb/s Infiniband between nodes
         − 10Gb/s Ethernet to datacentre
       • Competitive Pricing
       www.oracle.com/us/bigdata/index.html
  112. 112. Big Data Appliance Software
       • Cloudera Enterprise
       • Oracle Enterprise R
       • Oracle NoSQL
       • Oracle Big Data Connectors
  113. 113. [chart: Latency against Storage Costs] Oracle Exalogic, Oracle Exalytics, Oracle Big Data Appliance, Oracle WebLogic, Oracle NoSQL, Oracle Essbase, Oracle Exadata, Oracle Loader for Hadoop, Apache Hadoop, Oracle RDBMS, Oracle TimesTen
  114. 114. The following week at the Borg collective…
  115. 115. © 2012 Quest Software Inc. All rights reserved.
  117. 117. Integrating Hadoop and RDBMS
  118. 118. Scenario #1: Reference data in RDBMS [diagram labels: PRODUCTS, CUSTOMERS, WEBLOGS, HDFS, RDBMS]
  119. 119. Scenario #2: Hadoop for off-line analytics [diagram labels: PRODUCTS, CUSTOMERS, SALES HISTORY, HDFS, RDBMS]
  120. 120. Scenario #3: MapReduce output to RDBMS [diagram labels: DB QUERY TOOL, WEBLOGS SUMMARY, WEBLOGS, HDFS, RDBMS]
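Scenario #3 can be sketched end to end: summarise bulky weblogs, then load the small summary into a relational table that any query tool can read. The example below is illustrative only. A `Counter` stands in for the MapReduce job, SQLite stands in for the RDBMS, and the log format and the `weblogs_summary` table layout are assumptions.

```python
import sqlite3
from collections import Counter

# Stand-in for raw weblogs stored in HDFS (hypothetical format).
weblogs = ["dick ebay", "jane google", "dick google", "jane google"]

# Stand-in for the MapReduce job: aggregate hits per site.
summary = Counter(line.split()[1] for line in weblogs)

# Load the small summary into a relational table (SQLite for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weblogs_summary (site TEXT PRIMARY KEY, hits INTEGER)"
)
conn.executemany("INSERT INTO weblogs_summary VALUES (?, ?)", summary.items())
conn.commit()

# Any SQL query tool can now read the summary.
top = conn.execute(
    "SELECT site, hits FROM weblogs_summary ORDER BY hits DESC"
).fetchall()
print(top)
```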
  121. 121. Scenario #4: Hadoop as RDBMS “active archive” [diagram labels: QUERY TOOL, SALES 2011, SALES 2010, SALES 2009, SALES 2009, SALES 2008, SALES 2008, HDFS, RDBMS]
  122. 122. The Big Data Stack
  123. 123. [stack diagram labels: DATA SCIENTIST; CASCADING, R (ET AL), JAVA API, PIG, MAHOUT, JAVA API, HIVE; MAP-REDUCE, HBASE; HDFS]
  125. 125. [stack diagram labels: DATA SCIENTIST; BIG DATA ANALYTICS SOFTWARE; CASCADING, R (ET AL), JAVA API, PIG, MAHOUT, JAVA API, HIVE; MAP-REDUCE, HBASE; HDFS]
  126. 126. Big Data Analytics: indexing and search; sentiment analysis; visualization; basket analysis; recommenders; collective intelligence; clustering; predictive analytics; classification; expert systems (like Watson); machine learning; optimization
  127. 127. In Summary…
  128. 128. Hadoop is…
  129. 129. Proven at Scale
  130. 130. A platform for Advanced analytics
  131. 131. ETL Free
       • Schema on Write: Data; Extract, Transform (Cleanse, Aggregate, Normalize), Load; Data Warehouse; Code, Analyse; Utilize
       • Schema on Read: Data; Load; Hadoop; Code, Analyse, Cleanse; Utilize
  132. 132. The most concrete technology enabling the Big Data revolution
  133. 133. Hadoop is not…
  134. 134. A replacement for RDBMS. But future Enterprise Data Architectures will likely incorporate Hadoop side by side with RDBMS
  135. 135. Suitable for OLTP. Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra
  136. 136. A complete solution. Hadoop alone only solves the storage challenge of Big Data
  137. 137. Shameless plugs
  138. 138. Toad for Cloud Databases
  139. 139. Toad BI Suite: Business Intelligence solutions with first-class support for Hadoop, Oracle and many other platforms
  140. 140. Kitenga Analytics Suite
  141. 141. SharePlex® for Hadoop [diagram labels: JMS Queue; Hadoop Poster; HBase; Real Time replication; Change Data Capture; Batched; HDFS File Copy; Audit / Change Data; Redo-logs]
  142. 142. Toad for Hadoop: Hive Query IDE; Oracle <-> Hadoop data management; basic Hadoop administration; beta June
  144. 144. THANK YOU. Guy_harrison@dell.com, @guyharrison, guyharrison.net
  145. 145. THANK YOU. Guy_harrison@dell.com, @guyharrison, guyharrison.net
