Hadoop, Oracle and the big data revolution collaborate 2013

Presentation given at Collaborate 2013

Published in: Technology

  • Our engineers are frantically working to replace the Quest T-shirt with a Dell T-shirt

    1. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Dell Software Group
    2. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Executive Director, R&D, Information Management Group
    3. Introductions: www.guyharrison.net, guy_harrison@dell.com, http://twitter.com/guyharrison
    4. Dell, Quest and Toad
    11. Star Trek shirt fatality analysis (bar chart: fatality Pct for Red, Yellow and Blue shirts, axis from 0 to 80)
    14. Quest Software is now part of Dell
    15. “Big” Data?
    16. Three or Four “V”s
        • Value: competitive or collective advantage
        • Volume: Terabytes, Petabytes, Exabytes, Zettabytes
        • Variety: Structured, Unstructured, Human Generated, Machine Generated
        • Velocity: User populations x Transaction rates x Machine data
    17. Data volumes have always been increasing… (2006 perspective)
    18. Though the absolute volumes are boggling… (chart on a log scale from 1.E+09 to 1.E+21, labelled Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte)
        • Digital information created 2011: 2.13E+21
        • Total digital capacity: 1.18E+21
        • Digital information 2008: 4.87E+18
        • Living human genomes: 5.48E+18
        • Google: 1.10E+17
        • Human brain: 2.81E+15
    19. Velocity
    21. Fail whales
    22. Variety, or the Industrial Revolution of Data
    30. Data: now and then
        • 1993: Generated internally. Key to operational efficiency.
        • 2013: Generated externally. Key to competitiveness. Source of product innovation. Changing our world.
    31. “Big” data driven by the smallest devices
    32. Smartphone hardware
        • Quad-core 1.4 GHz CPU
        • 1GB RAM
        • 64GB Storage
        • 1080p display
        • GSM/Bluetooth/WiFi Network
        • 8MP Camera
        • GPS & Compass
    33. Smartphone software
    38. Name: Willy Bowman. Nationality: German. DON'T MENTION THE WAR
    39. Data Input
    41. Siri
        • “Siri call me an ambulance”: From now on, I'll call you 'An Ambulance'. OK?
        • “I want to jump off a bridge”: I found 14 bridges nearby:
    42. Sixth-Sense
    45. Brain Control
    51. The instrumented human
        • Compass
        • Camera
        • Mike/earphones
        • Heads up display
        • Bluetooth Personal Area Network
        • 3G/WiFi Wide Area Network
        • Pulse, temp monitor
        • GPS
        • Storage
        • Silent alarms
        • Pedometer, sleep monitoring
    52. All this requires and generates huge data sets. But what else are they good for?
    53. The data “exhaust” itself generates new opportunities. Companies want to generate competitive advantage through “Big Data analytics”.
    54. Big Data Analytics
        • Machine Learning: programs that evolve with “experience”
        • Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
        • Predictive Analytics: programs that extrapolate from existing data into the future
    66. Application areas (diagram)
        • Search Optimization
        • Advertising: Targeting, Tailoring
        • Recommendation Systems
        • Security: Vulnerability, Penetration Detection, Risk analysis
        • Game optimization
        • Medical: Diagnosis, Prognosis
        • Fraud Detection
        • Collective Intelligence
        • Predictive Analytics: Churn, Defaults
    67. Collective Intelligence beats Artificial Intelligence?
    73. For the last 40 years AI has been consistently disappointing
    76. In 2011 AI made a comeback
    84. Google: Pioneers of Big Data
    89. Google Software Architecture (layers, top to bottom)
        • Google Applications
        • Map Reduce, Chubby, BigTable
        • Google File System (GFS)
    90. Map Reduce (diagram: START fans out to many parallel MAP tasks, whose output feeds REDUCE)
    91. Multi-stage Map-Reduce (diagram: a CLIENT submits work over HDFS; banks of MAPPER tasks feeding REDUCE implement the SCAN, SORT and AGGREGATE stages)
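
The Map Reduce slides above are diagrams only, so here is a minimal in-process sketch of the idea in Python. It is not Hadoop code: the function names (`map_words`, `shuffle`, `reduce_counts`, `top_words`) and the word-count task are assumptions chosen for illustration, and the "shuffle" step stands in for the grouping a real framework performs between the MAP and REDUCE phases. The second function chain hints at the multi-stage case, where one job's output becomes the next job's input.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """MAP: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between MAP and REDUCE."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """REDUCE: collapse all values seen for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Stage 1: each line could be mapped on a different node in parallel."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


def top_words(counts, n):
    """Stage 2: sort the first stage's output by count, then by word."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    lines = ["hadoop oracle hadoop", "oracle hadoop hive"]  # illustrative input
    print(top_words(word_count(lines), 2))
```

Because each call to `map_words` depends only on its own line, the MAP work can be spread across as many machines as there are input splits; only the grouped values for a key need to meet at a reducer.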
    92. Schema on Read vs Schema on Write
    93. Schema on Read vs Schema on Write
        • Schema on Write: Data -> Extract -> Transform (Cleanse, Normalize, Aggregate) -> Load -> Data Warehouse -> Utilize (Code, Analyse)
        • Schema on Read: Data -> Load -> Hadoop -> Utilize (Code, Analyse, Cleanse)
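
A rough way to see the difference between the two pipelines is the Python sketch below. It is an illustration under stated assumptions, not anything from the deck: the sample records, the JSON format and the function names are all hypothetical. In the schema-on-write path, cleansing and typing happen before the load and non-conforming rows are rejected; in the schema-on-read path the load is a plain copy, and structure is imposed by whichever program later reads the data.

```python
import json

# Hypothetical raw input: one well-formed record, one missing a field,
# and one line that is not JSON at all.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": "3"}',
    '{"user": "jane", "site": "Google"}',
    "not json at all",
]


def load_schema_on_write(raw_lines):
    """Cleanse and transform before loading; rows that don't fit are rejected."""
    table, rejected = [], []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            table.append((rec["user"], rec["site"], int(rec["hits"])))
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return table, rejected


def load_schema_on_read(raw_lines):
    """The load step is a plain copy: nothing is parsed and nothing is rejected."""
    return list(raw_lines)


def read_hits_by_user(store):
    """Structure is imposed only here, by the code that reads the data."""
    totals = {}
    for line in store:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # cleansing happens at read time
        if not isinstance(rec, dict) or "user" not in rec:
            continue
        totals[rec["user"]] = totals.get(rec["user"], 0) + int(rec.get("hits", 0))
    return totals


if __name__ == "__main__":
    print(load_schema_on_write(RAW_EVENTS))
    print(read_hits_by_user(load_schema_on_read(RAW_EVENTS)))
```

The trade-off shown here is that the schema-on-read store keeps the record with the missing field, so a later reader with different rules can still use it, at the cost of repeating the cleansing logic in every reader.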
    94. Hadoop: Open Source Map-Reduce Stack
    95. Hadoop at Yahoo. Yahoo! Hadoop cluster:
        • 4000 nodes
        • 16 PB disk
        • 64 TB of RAM
        • 32,000 cores
    97. Hadoop 1.0 Architecture
        • Hadoop client (Java, Pig, Hive)
        • Map Reduce (distributed processing): Job Tracker, plus a Task Tracker on each worker node
        • HDFS (distributed storage): Name Node, Secondary Name Node, plus a Data Node on each worker node
    98. Hadoop stack (layers, top to bottom)
        • Oozie (Workflow manager)
        • Hive (Query), Pig (Scripting), SQOOP (RDBMS loader), Flume (Log Loader)
        • ZooKeeper (Locking), HBase (Database), Hadoop Map Reduce
        • Hadoop File System (HDFS)
    99. HBase: a real-time database built on Hadoop (side-by-side diagram)
        • One side: Log Buffer, Buffer Cache, Tables, Datafiles, Redo Log, ASM, Disks
        • HBase side: MemStore, Tables, HFiles, WA Log, HDFS, Disks
    100. HBase Data Model
         Denormalized (Name | Site | Counter):
         • Dick | Ebay | 507,018
         • Dick | Google | 690,414
         • Jane | Google | 716,426
         • Dick | Facebook | 723,649
         • Jane | Facebook | 643,261
         • Jane | ILoveLarry.com | 856,767
         • Dick | MadBillFans.com | 675,230
         Normalized names (NameId | Name):
         • 1 | Dick
         • 2 | Jane
         Normalized sites (SiteId | SiteName):
         • 1 | Ebay
         • 2 | Google
         • 3 | Facebook
         • 4 | ILoveLarry.com
         • 5 | MadBillFans.com
         Normalized counters (NameId | SiteId | Counter):
         • 1 | 1 | 507,018
         • 1 | 2 | 690,414
         • 2 | 2 | 716,426
         • 1 | 3 | 723,649
         • 2 | 3 | 643,261
         • 2 | 4 | 856,767
         • 1 | 5 | 675,230
         HBase rows (one sparse row per Id, one column per site):
         • Id 1, Name Dick: Ebay 507,018; Google 690,414; Facebook 723,649; (other columns) …; MadBillFans.com 675,230
         • Id 2, Name Jane: Google 716,426; Facebook 643,261; (other columns) …; ILoveLarry.com 856,767
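
The pivot from the normalized tables to the wide, sparse rows on the HBase Data Model slide can be sketched in a few lines of Python. This is plain dictionaries, not the HBase API; the names `NAMES`, `SITES`, `COUNTERS` and `to_wide_rows` are assumptions for illustration, and the (NameId, SiteId) pairs are keyed so that they agree with the Name/Site/Counter listing on the slide.

```python
# Normalized reference tables, keyed by the ids shown on the slide.
NAMES = {1: "Dick", 2: "Jane"}
SITES = {
    1: "Ebay",
    2: "Google",
    3: "Facebook",
    4: "ILoveLarry.com",
    5: "MadBillFans.com",
}

# (NameId, SiteId, Counter) rows matching the slide's Name/Site/Counter listing.
COUNTERS = [
    (1, 1, 507018),
    (1, 2, 690414),
    (2, 2, 716426),
    (1, 3, 723649),
    (2, 3, 643261),
    (2, 4, 856767),
    (1, 5, 675230),
]


def to_wide_rows(names, sites, counters):
    """Build one sparse row per person, with one column per visited site.

    A column exists in a row only if there is a value for it, which is
    the property the wide-row layout on the slide relies on.
    """
    rows = {name_id: {"Name": name} for name_id, name in names.items()}
    for name_id, site_id, counter in counters:
        rows[name_id][sites[site_id]] = counter
    return rows


if __name__ == "__main__":
    for row_id, columns in to_wide_rows(NAMES, SITES, COUNTERS).items():
        print(row_id, columns)
```

Note that Jane's row simply has no Ebay or MadBillFans.com entry; nothing is stored for a column that a row does not use.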
    101. Hive
    103. SQL -> JAVA -> RESULTS
    104. Other SQL-like Hadoop Interfaces
         • Cloudera Impala
         • MapR Drill
         • Aster
         • Greenplum (Pivotal HD)
         • ParAccel
         • Hadapt
         • Oracle SQL Connector for Hadoop (External Table interface to HDFS)
    105. Pig
    106. Pig Latin compared with SQL or Hive QL
    107. Meanwhile, back at the Deathstar…
    110. Oracle Exadata
         • Database servers: 64 cores, 576 GB RAM
         • Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
    111. Oracle Big Data Appliance
         • 18 Sun X4270 M2 servers
         • 48GB RAM per node (864GB total)
         • 2x6 Core CPU per node (216 total)
         • 12x2TB HDD per node (216 spindles, 864 TB)
         • 40Gb/s Infiniband between nodes
         • 10Gb/s Ethernet to datacentre
         • Competitive Pricing
         • www.oracle.com/us/bigdata/index.html
    112. Big Data Appliance Software
         • Cloudera Enterprise
         • Oracle Enterprise R
         • Oracle NoSQL
         • Oracle Big Data Connectors
    113. Latency vs Storage Costs (chart positioning Oracle Big Data Appliance, Oracle Exalogic, Oracle Exalytics, Oracle WebLogic, Oracle NoSQL, Oracle Essbase, Oracle Exadata, Oracle Loader for Hadoop, Apache Hadoop, Oracle RDBMS and Oracle TimesTen)
    114. The following week at the Borg collective…
    115. © 2012 Quest Software Inc. All rights reserved.
    117. Integrating Hadoop and RDBMS
    118. Scenario #1: Reference data in RDBMS (diagram: PRODUCTS and CUSTOMERS in the RDBMS, WEBLOGS in HDFS)
    119. Scenario #2: Hadoop for off-line analytics (diagram labels: PRODUCTS, CUSTOMERS, SALES HISTORY, HDFS, RDBMS)
    120. Scenario #3: MapReduce output to RDBMS (diagram: WEBLOGS in HDFS, WEBLOGS SUMMARY in the RDBMS, read by a DB QUERY TOOL)
    121. Scenario #4: Hadoop as RDBMS “active archive” (diagram labels: QUERY TOOL, SALES 2011, SALES 2010, SALES 2009, SALES 2008, HDFS, RDBMS)
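
Scenarios #1 and #3 above can be combined in a small Python sketch: raw weblogs are summarized outside the database, the summary is written to a relational table, and it is then joined to reference data that lives in the RDBMS. Everything concrete here is an assumption for illustration: an in-memory SQLite database stands in for the RDBMS, a Python list stands in for a file in HDFS, and the log format, table names and sample rows are hypothetical.

```python
import sqlite3
from collections import Counter

# Stand-in for a weblog file in HDFS.
# Hypothetical line format: "<customer_id> <product_id>".
WEBLOG_LINES = ["c1 p1", "c2 p1", "c1 p2"]


def summarize_weblogs(lines):
    """The MapReduce-style step: count hits per product id."""
    return Counter(line.split()[1] for line in lines)


def store_summary(conn, summary):
    """Scenario #3: write the (small) summary into the RDBMS."""
    conn.execute(
        "CREATE TABLE weblogs_summary (product_id TEXT PRIMARY KEY, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO weblogs_summary VALUES (?, ?)", sorted(summary.items())
    )


def hits_by_product_name(conn):
    """Scenario #1: join the summary to reference data held in the RDBMS."""
    return conn.execute(
        "SELECT p.name, s.hits "
        "FROM products p JOIN weblogs_summary s ON s.product_id = p.product_id "
        "ORDER BY p.name"
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product_id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("p1", "Toad"), ("p2", "SharePlex")],  # illustrative reference rows
    )
    store_summary(conn, summarize_weblogs(WEBLOG_LINES))
    return hits_by_product_name(conn)


if __name__ == "__main__":
    print(demo())
```

The design point is that only the aggregated result crosses from the Hadoop side to the database, so an ordinary query tool can work with it without touching the raw logs.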
    122. The Big Data Stack
    123. The stack (layers, top to bottom)
         • DATA SCIENTIST
         • CASCADING, R (ET AL), JAVA API, PIG, MAHOUT, JAVA API, HIVE
         • MAP-REDUCE, HBASE
         • HDFS
    125. The stack with analytics software (layers, top to bottom)
         • DATA SCIENTIST
         • BIG DATA ANALYTICS SOFTWARE
         • CASCADING, R (ET AL), JAVA API, PIG, MAHOUT, JAVA API, HIVE
         • MAP-REDUCE, HBASE
         • HDFS
    126. BIG DATA ANALYTICS (diagram of related techniques)
         • INDEXING AND SEARCH
         • SENTIMENT ANALYSIS
         • VISUALIZATION
         • BASKET ANALYSIS
         • RECOMMENDERS
         • COLLECTIVE INTELLIGENCE
         • CLUSTERING
         • PREDICTIVE ANALYTICS
         • CLASSIFICATION
         • EXPERT SYSTEMS (LIKE WATSON)
         • MACHINE LEARNING
         • OPTIMIZATION
    127. In Summary…
    128. Hadoop is…
    129. Proven at Scale
    130. A platform for Advanced analytics
    131. ETL Free
         • Schema on Write: Data -> Extract -> Transform (Cleanse, Normalize, Aggregate) -> Load -> Data Warehouse -> Utilize (Code, Analyse)
         • Schema on Read: Data -> Load -> Hadoop -> Utilize (Code, Analyse, Cleanse)
    132. The most concrete technology enabling the Big Data revolution
    133. Hadoop is not…
    134. A replacement for RDBMS. But future Enterprise Data Architectures will likely incorporate Hadoop side by side with RDBMS.
    135. Suitable for OLTP. Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra.
    136. A complete solution. Hadoop alone only solves the storage challenge of Big Data.
    137. Shameless plugs
    138. Toad for Cloud Databases
    139. Toad BI Suite: Business Intelligence solutions with first class support for Hadoop, Oracle and many other platforms
    140. Kitenga Analytics Suite
    141. SharePlex® for Hadoop (diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, HBase with Real Time replication, HDFS with Batched File Copy, Audit / Change Data)
    142. Toad for Hadoop
         • Hive Query IDE
         • Oracle <-> Hadoop data management
         • Basic Hadoop administration
         • Beta June
    144. THANK YOU. Guy_harrison@dell.com, @guyharrison, guyharrison.net
