Hadoop, Oracle and the industrial revolution of data

Presentation given at Oracle Open World 2012


  1. 1. Hadoop, Oracle and the industrial revolution of data Guy Harrison VP R&D, Database Management © 2012 Quest Software Inc. All rights reserved.
  2. 2. Hadoop, Oracle and the industrial revolution of data Guy Harrison, Executive Director, R&D Business Intelligence Software
  3. 3. Introductions www.guyharrison.net guy.harrison@quest.com http://twitter.com/guyharrison
  4. 4. Quest
  10. 10. Star Trek shirt fatality analysis (bar chart of fatality percentage by shirt color: Red, Yellow, Blue)
  13. 13. What is Big Data?
  14. 14. The 3-4 V’s • Volume: Terabytes, Petabytes, Exabytes, Zettabytes • Variety: Structured, Unstructured, Human Generated, Machine Generated • Velocity: User populations x Transaction rates x Machine data • Value: Competitive or Community advantage
  15. 15. Volume • Data volumes have always been increasing (2006 Perspective)
  16. 16. But the vastness is becoming mind-boggling (bar chart on a logarithmic byte scale marked Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte) • Digital information created 2011: 2.13E+21 • Total digital capacity: 1.18E+21 • Digital information 2008: 4.87E+18 • Living human genomes: 5.48E+18 • Google: 1.10E+17 • Human brain: 2.81E+15
  17. 17. Velocity
  18. 18. Fail whales
  19. 19. Variety: The Industrial Revolution of Data
  22. 22. Big Data is driven by the smallest devices
  23. 23. Samsung Galaxy S III specifications • Quad-core 1.4 GHz CPU • 1GB RAM • 64GB Storage • 1080p display • GSM/Bluetooth/WiFi Network • 8MP Camera • GPS & Compass
  34. 34. Name: Willy Bowman • Nationality: German • DON’T MENTION THE WAR
  35. 35. Data Input
  37. 37. Siri • “Siri call me an ambulance”: From now on, I’ll call you ‘An Ambulance’. OK? • “I want to jump off a bridge”: I found 14 bridges nearby:
  40. 40. Brain Control
  46. 46. All of this requires and generates big datasets. But what are they good for?
  47. 47. Value? Achieve competitive advantage from Big Data using Collective Intelligence, Machine Learning and Predictive Analytics
  48. 48. Big Data Analytics: How do we derive value from the data? • Machine Learning: Programs that evolve with “experience” • Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent • Predictive Analytics: Programs that extrapolate from existing data into the future
  60. 60. Applications • Search Optimization • Advertising: Targeting, Tailoring • Recommendation Systems • Security: Vulnerability, Penetration Detection • Game optimization • Collective Intelligence • Medical: Diagnosis, Prognosis • Risk analysis • Fraud Detection • Predictive Analytics: Churn, Defaults
  61. 61. Collective Intelligence beats Artificial Intelligence?
  67. 67. For the past 40 years, AI has been consistently disappointing
  77. 77. Google: pioneers of big data
  82. 82. Google Software Architecture • Google Applications • Map Reduce • Chubby • BigTable • Google File System (GFS)
  83. 83. Map Reduce (diagram: START fans out to many parallel MAP tasks, whose output feeds REDUCE)
  84. 84. Multi-stage Map-Reduce (diagram labels: CLIENT, HDFS, MAPPER tasks, REDUCE, with stages SCAN, SORT and AGGREGATE)
  85. 85. Hadoop: Open Source Map-Reduce Stack
  86. 86. Hadoop at Yahoo! • Yahoo! Hadoop cluster: − 4000 nodes − 16 PB disk − 64 TB of RAM − 32,000 Cores
  88. 88. Hadoop Architecture (1.0) • HADOOP CLIENT (JAVA, PIG, HIVE) • MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER • HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE • Twelve nodes shown, each pairing a DATA NODE with a TASK TRACKER
  89. 89. Schema on Read vs Schema on Write
  90. 90. Schema on Write (diagram labels: Code, Analyse Data, Extract, Transform, Load, Utilize, Cleanse, Aggregate, Normalize, Data Warehouse) • Schema on Read (diagram labels: Code, Analyse Data, Load, Utilize, Cleanse, Hadoop)
  91. 91. Hadoop Ecosystem • Oozie (Workflow manager) • Hive (Query) • Pig (Scripting) • SQOOP (RDBMS loader) • Flume (Log Loader) • ZooKeeper (Locking) • HBase (Database) • Hadoop Map Reduce • Hadoop File System (HDFS)
  92. 92. HBase
  93. 93. HBase: HBase is a real-time database built on Hadoop (diagram labels, Oracle side: Log Buffer, Buffer Cache, Table, Datafiles, Redo Log, ASM, Disks; HBase side: MemStore, Table, HFile, WA Log, HDFS, Disks)
  94. 94. HBase Data Model
      Source data:
        Name  Site             Counter
        Dick  Ebay             507,018
        Dick  Google           690,414
        Jane  Google           716,426
        Dick  Facebook         723,649
        Jane  Facebook         643,261
        Jane  ILoveLarry.com   856,767
        Dick  MadBillFans.com  675,230
      Normalized tables:
        NameId  Name
        1       Dick
        2       Jane
        SiteId  SiteName
        1       Ebay
        2       Google
        3       Facebook
        4       ILoveLarry.com
        5       MadBillFans.com
        NameId  SiteId  Counter
        1       1       507,018
        1       2       690,414
        2       2       716,426
        1       3       723,649
        2       3       643,261
        2       4       856,767
        1       5       675,230
      Wide rows, one per name with a column per site:
        Id 1, Name Dick: Ebay 507,018; Google 690,414; Facebook 723,649; (other columns) ...; MadBillFans.com 675,230
        Id 2, Name Jane: Google 716,426; Facebook 643,261; (other columns) ...; ILoveLarry.com 856,767
  95. 95. Hive
  97. 97. SQL • JAVA • Results
  98. 98. Pig
  99. 99. Pig Latin • SQL or Hive QL
  100. 100. Meanwhile, back at the Death Star….
  102. 102. Oracle Exadata • Database servers: 64 cores, 576 GB RAM • Storage servers: 112 cores, 100 TB SAS or 336 TB SATA plus 5 TB SSD
  106. 106. Oracle Big Data Appliance • 18 Sun X4270 M2 servers − 48GB RAM per node (864GB total) − 2x6 Core CPU per node (216 total) − 12x2TB HDD per node (216 spindles, 432 TB) − 40Gb/s Infiniband between nodes − 10Gb/s Ethernet to datacentre • Competitive Pricing • www.oracle.com/us/bigdata/index.html
  107. 107. Big Data Appliance Software • Cloudera Enterprise • Oracle Enterprise R • Oracle NoSQL • Oracle Big Data Connectors
  108. 108. Oracle’s Storage Hierarchy (diagram placing products against Latency and Storage Costs: ORACLE BIG DATA APPLIANCE, ORACLE EXALOGIC, ORACLE EXALYTICS, ORACLE EXADATA, ORACLE WEBLOGIC, ORACLE NOSQL, ORACLE ESSBASE, ORACLE LOADER FOR HADOOP, APACHE HADOOP, ORACLE RDBMS, ORACLE TIMES TEN)
  111. 111. Hadoop and RDBMS integration
  112. 112. Scenario #1: Reference data in RDBMS (diagram labels: PRODUCTS, CUSTOMERS, WEBLOGS, HDFS, RDBMS)
  113. 113. Scenario #2: Hadoop for off-line analytics (diagram labels: PRODUCTS, CUSTOMERS, SALES HISTORY, HDFS, RDBMS)
  114. 114. Scenario #3: MapReduce output to RDBMS (diagram labels: DB QUERY TOOL, WEBLOGS SUMMARY, WEBLOGS, HDFS, RDBMS)
  115. 115. Scenario #4: Hadoop as RDBMS “active archive” (diagram labels: QUERY TOOL, SALES 2011, SALES 2010, SALES 2009 and SALES 2008 each shown twice, HDFS, RDBMS)
  116. 116. The Big Data Stack
  117. 117. The Big Data Stack (diagram labels: DATA SCIENTIST, CASCADING, R (ET AL), PIG, MAHOUT, JAVA API, JAVA API, HIVE, MAP-REDUCE, HBASE, HDFS)
  118. 118. The Big Data Stack (diagram labels: BIG DATA ANALYTIC PLATFORM, DATA SCIENTIST, CASCADING, R (ET AL), PIG, MAHOUT, JAVA API, JAVA API, HIVE, MAP-REDUCE, HBASE, HDFS)
  119. 119. Big Data Analytics Platform • INDEXING AND SEARCH • SENTIMENT ANALYSIS • VISUALIZATION • BASKET ANALYSIS • RECOMMENDERS • BIG DATA ANALYTICS • ADVERTISING OPTIMIZATION • CLUSTERING • CLASSIFICATION • EXPERT SYSTEMS (LIKE WATSON)
  120. 120. In Summary
  121. 121. Hadoop is….
  123. 123. Scalable • 4000 nodes at Yahoo! • >100 PB at Facebook • 10,000 node design goal for Hadoop 2.0
  124. 124. A platform for AI, CI & analytics
  125. 125. ETL “Free” • Schema on Write (diagram labels: Code, Analyse Data, Extract, Transform, Load, Utilize, Cleanse, Aggregate, Normalize, Data Warehouse) • Schema on Read (diagram labels: Code, Analyse Data, Load, Utilize, Cleanse, Hadoop)
  126. 126. The most concrete technology enabling the Big Data revolution
  127. 127. Hadoop is not….
  128. 128. A replacement for RDBMS • But future Enterprise Data Architectures will likely incorporate Hadoop side by side with RDBMS
  129. 129. Suitable for OLTP • Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra
  130. 130. A complete solution • Hadoop alone only solves the storage challenge of Big Data
  131. 131. Shameless plugs
  132. 132. Toad for Cloud Databases • Work with Hive, HBase, Oracle, SQL Server, Cassandra, MySQL, MongoDB, BI servers and other NoSQL and SQL datastores
  133. 133. Toad for Cloud Databases • Federated SQL queries across Hive, HBase, NoSQL, RDBMS
  135. 135. Toad BI Suite • Business Intelligence solutions with first class support for Hadoop, Oracle and many other platforms
  136. 136. SharePlex® for Hadoop (diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Audit / Change Data, Batched File Copy, Real Time replication, HDFS, HBase)
  137. 137. Toad for Hadoop • Hive Query IDE • Oracle <-> Hadoop data management • Basic Hadoop administration • ETA beta H1 2013
  139. 139. Summary: The future belongs to those of us prepared to wear funny hats and glasses • The connected and mobile internet requires and produces “big data” that is qualitatively different from the data we’ve had before − Requiring different types of datastores • Enterprise can leverage big data for competitive advantage − Requiring different types of analytical engines
  140. 140. Thank You guy.harrison@quest.com www.guyharrison.net @guyharrison
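
Slides 83 and 84 show Map-Reduce as many parallel map tasks whose output is sorted and aggregated by a reduce step. A minimal single-process sketch of that flow in Python follows, as an illustration only: it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample web-log records are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit one (key, value) pair for each input record."""
    _visitor, site = record
    yield (site, 1)


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: aggregate every value seen for a single key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially over an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


# Hypothetical web-log records: (visitor, site)
weblogs = [
    ("Dick", "Ebay"),
    ("Dick", "Google"),
    ("Jane", "Google"),
    ("Jane", "Facebook"),
    ("Dick", "Facebook"),
]
hits_per_site = run_job(weblogs)
```

On a real cluster the framework runs the mappers in parallel across nodes and performs the shuffle itself; here `hits_per_site` simply holds the hit count computed for each site.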

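
Slide 94 contrasts normalized relational tables with the HBase model, in which each name becomes one wide row with a column per site. The sketch below pivots the slide's counter rows into that shape using plain Python dictionaries. It does not use any HBase client library, the function name `to_wide_rows` is hypothetical, and the site IDs follow the slide's site lookup table (Google is SiteId 2).

```python
# Lookup tables and counter rows taken from slide 94.
names = {1: "Dick", 2: "Jane"}
sites = {
    1: "Ebay",
    2: "Google",
    3: "Facebook",
    4: "ILoveLarry.com",
    5: "MadBillFans.com",
}
counters = [
    (1, 1, 507018),
    (1, 2, 690414),
    (2, 2, 716426),
    (1, 3, 723649),
    (2, 3, 643261),
    (2, 4, 856767),
    (1, 5, 675230),
]


def to_wide_rows(names, sites, counters):
    """Pivot (NameId, SiteId, Counter) rows into one sparse wide row per name."""
    rows = {name_id: {"Name": name} for name_id, name in names.items()}
    for name_id, site_id, counter in counters:
        # Each site becomes a column; a row only stores sites that name visited.
        rows[name_id][sites[site_id]] = counter
    return rows


wide_rows = to_wide_rows(names, sites, counters)
```

Each row carries only the columns it needs, so Jane's row has no `Ebay` entry at all, which mirrors the sparse rows drawn on the slide.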