Hadoop, Oracle and the Big Data Revolution (Collaborate 2013)

Presentation given at Collaborate 2013

Usage rights: CC Attribution License

  • Our engineers are frantically working to replace the Quest T-shirt with a Dell T-shirt.

Presentation Transcript

  • 1. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Dell Software Group
  • 2. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Executive Director, R&D, Information Management Group
  • 3. Introductions: www.guyharrison.net, guy_harrison@dell.com, http://twitter.com/guyharrison
  • 4. Dell, Quest and Toad
  • 11. Star Trek shirt fatality analysis (bar chart of fatalities in percent, 0 to 80, by shirt colour: red, yellow, blue)
  • 14. Quest Software is now part of Dell
  • 15. “Big” Data?
  • 16. Three or four “V”s: Value (competitive or collective advantage); Volume (terabytes, petabytes, exabytes, zettabytes); Variety (structured, unstructured, human-generated, machine-generated); Velocity (user populations × transaction rates × machine data)
  • 17. Data volumes have always been increasing… (2006 perspective)
  • 18. Though the absolute volumes are boggling… (log-scale chart in bytes, from gigabyte to zettabyte)
      Digital information created 2011: 2.13E+21
      Total digital capacity: 1.18E+21
      Digital information 2008: 4.87E+18
      Living human genomes: 5.48E+18
      Google: 1.10E+17
      Human brain: 2.81E+15
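The chart's values are raw byte counts on a log scale. To read them against the axis labels, here is a minimal Python sketch; the helper `human_size` is hypothetical and assumes decimal SI units (1 ZB = 10^21 bytes).

```python
# Decimal (SI) byte units, each step a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_size(n_bytes: float) -> str:
    """Express a byte count in the largest SI unit that keeps it under 1,000."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000,
        # or at zettabytes if the number is larger still.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000
    raise AssertionError("unreachable")


# Figures from the slide, in bytes.
figures = {
    "Digital information created 2011": 2.13e21,
    "Total digital capacity": 1.18e21,
    "Digital information 2008": 4.87e18,
    "Living human genomes": 5.48e18,
    "Google": 1.10e17,
    "Human brain": 2.81e15,
}

for label, size in figures.items():
    print(f"{label}: {human_size(size)}")
```

For example, `human_size(2.13e21)` gives `"2.13 ZB"` and `human_size(1.10e17)` gives `"110.00 PB"`.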
  • 19. Velocity
  • 21. Fail whales
  • 22. Variety, OR: the Industrial Revolution of Data
  • 30. Data: now and then
      1993: generated internally; key to operational efficiency
      2013: generated externally; key to competitiveness; source of product innovation; changing our world
  • 31. “Big” data driven by the smallest devices
  • 32. Smartphone hardware: quad-core 1.4 GHz CPU; 1 GB RAM; 64 GB storage; 1080p display; GSM/Bluetooth/Wi-Fi networking; 8 MP camera; GPS and compass
  • 33. Smartphone software
  • 38. Name: Willy Bowman. Nationality: German. DON'T MENTION THE WAR
  • 39. Data Input
  • 41. Siri
      “Siri, call me an ambulance.” Siri: “From now on, I'll call you ‘An Ambulance’. OK?”
      “I want to jump off a bridge.” Siri: “I found 14 bridges nearby:”
  • 42. Sixth-Sense
  • 45. Brain Control
  • 51. The instrumented human: compass; camera; microphone/earphones; heads-up display; Bluetooth personal area network; 3G/Wi-Fi wide area network; pulse and temperature monitor; GPS; storage; silent alarms; pedometer and sleep monitoring
  • 52. All this requires and generates huge data sets. But what else are they good for?
  • 53. The data “exhaust” itself generates new opportunities. Companies want to generate competitive advantage through “Big Data analytics”.
  • 54. Big Data Analytics: machine learning (programs that evolve with “experience”); collective intelligence (programs that use inputs from “crowds” to seem intelligent); predictive analytics (programs that extrapolate from existing data into the future)
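Predictive analytics in its simplest form fits a model to historical data and projects it forward. A minimal sketch, assuming an ordinary least-squares straight-line trend (only one of many possible models) and hypothetical monthly figures; `fit_line` and `extrapolate` are illustrative names, not part of any product mentioned in this talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("x values must not all be equal")
    slope = cov / var
    return mean_y - slope * mean_x, slope


def extrapolate(xs, ys, future_x):
    """Project the fitted trend line to a point outside the observed data."""
    a, b = fit_line(xs, ys)
    return a + b * future_x


# Hypothetical order counts for months 1 to 5.
months = [1, 2, 3, 4, 5]
orders = [100, 120, 140, 160, 180]
print(extrapolate(months, orders, 6))
# 200.0
```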
  • 66. Applications, grouped around collective intelligence and predictive analytics: search optimization; advertising (targeting, tailoring); recommendation systems; security (vulnerability, penetration detection); game optimization; medical (risk analysis, diagnosis, prognosis); fraud detection; churn; defaults
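Recommendation and basket analysis are classic examples of collective intelligence: the behaviour of many customers drives the suggestions made to one. A minimal sketch of an item co-occurrence ("bought together") recommender, using hypothetical baskets and hypothetical function names; production recommenders usually normalise these raw counts, which this sketch does not.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(item, baskets, top_n=2):
    """Items most often bought together with `item`, most frequent first."""
    counts = co_occurrence(baskets)
    scores = Counter({other: n for (first, other), n in counts.items() if first == item})
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical shopping baskets from many customers.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "jam"],
]
print(recommend("bread", baskets))
# ['butter', 'jam']
```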
  • 67. Collective intelligence beats artificial intelligence?
  • 73. For the last 40 years AI has been consistently disappointing
  • 76. In 2011 AI made a comeback
  • 84. Google: Pioneers of Big Data
  • 89. Google software architecture: Google applications on top of MapReduce, Chubby and BigTable, all built on the Google File System (GFS)
  • 90. Map Reduce (diagram: a job starts, fans out to many parallel MAP tasks, and their output is combined by a REDUCE step)
  • 91. Multi-stage Map-Reduce (diagram: a CLIENT, data in HDFS, two tiers of MAPPER tasks and a REDUCE step, with stages labelled SCAN, SORT and AGGREGATE)
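The map, shuffle/sort and reduce steps in these diagrams can be illustrated with the classic word-count example. This is a single-process Python sketch of the programming model only, not the Hadoop Java API; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle and sort: group every emitted value under its key, ordered by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: aggregate all values that share one key."""
    return key, sum(values)


def map_reduce(records: list[str]) -> dict[str, int]:
    # On a cluster each record (or input split) is mapped on a different node;
    # here the phases simply run one after another in a single process.
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(map_reduce(["big data big", "data revolution"]))
# {'big': 2, 'data': 2, 'revolution': 1}
```

Because mappers never share state and reducers only see values grouped by key, each phase can be spread across as many machines as the data requires.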
  • 92. Schema on Read vs Schema on Write
  • 93. Schema on Read vs Schema on Write
      Schema on write: Data → Extract → Transform (cleanse, normalize, aggregate) → Load → Data Warehouse → Code → Analyse → Utilize
      Schema on read: Data → Load → Hadoop → Code (cleanse) → Analyse → Utilize
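The difference between the two pipelines comes down to when structure is imposed. A minimal Python sketch, assuming a hypothetical four-field CSV web log and hypothetical function names: schema on write rejects malformed rows at load time, while schema on read keeps the raw text and parses it only when a query runs.

```python
import csv
import io

# Hypothetical raw web log: date, user, path, HTTP status.
RAW_LOG = """2013-04-08,alice,/index.html,200
2013-04-08,bob,/missing,404
not,a,valid
2013-04-09,alice,/buy,200
"""

SCHEMA = [("date", str), ("user", str), ("path", str), ("status", int)]


def parse(row):
    """Apply the schema to one raw row; return None if the row does not fit."""
    if len(row) != len(SCHEMA):
        return None
    try:
        return {name: cast(value) for (name, cast), value in zip(SCHEMA, row)}
    except ValueError:
        return None


def load_schema_on_write(raw):
    """Schema on write: validate and transform everything before it is stored."""
    rows = [parse(r) for r in csv.reader(io.StringIO(raw))]
    return [r for r in rows if r is not None]  # rejected rows never reach storage


def query_schema_on_read(stored_raw, predicate):
    """Schema on read: the raw text is stored as-is; structure is applied per query."""
    for row in csv.reader(io.StringIO(stored_raw)):
        record = parse(row)
        if record is not None and predicate(record):
            yield record


warehouse = load_schema_on_write(RAW_LOG)
errors = list(query_schema_on_read(RAW_LOG, lambda r: r["status"] >= 400))
```

Because the raw text is kept, a later query could apply a different schema to the same data; the schema-on-write copy has already discarded the malformed row.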
  • 94. Hadoop: Open-Source MapReduce Stack
  • 95. Hadoop at Yahoo!: a 4,000-node cluster with 16 PB of disk, 64 TB of RAM and 32,000 cores
  • 97. Hadoop 1.0 architecture
      Hadoop client (Java, Pig, Hive)
      MapReduce (distributed processing): a Job Tracker, plus a Task Tracker on every worker node
      HDFS (distributed storage): a Name Node and a Secondary Name Node, plus a Data Node on every worker node (twelve worker nodes shown)
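The division of labour on this slide (the Name Node tracking which Data Nodes hold each block, the Job Tracker sending work to Task Trackers near the data) can be illustrated with a toy model. A minimal sketch, assuming the Hadoop 1.x defaults of 64 MB blocks and three replicas; the round-robin placement is a simplification (real HDFS placement is rack-aware), and the node names and helper functions are hypothetical. It also assumes each worker runs its Data Node and Task Tracker under the same host name, as in the diagram.

```python
import math

BLOCK_SIZE_MB = 64  # Hadoop 1.x default block size (dfs.block.size)
REPLICATION = 3     # HDFS default replication factor


def place_blocks(file_size_mb, data_nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and give each block `replication`
    copies on distinct data nodes, rotating through the node list.
    A toy policy for illustration; real HDFS placement is rack-aware."""
    if len(data_nodes) < replication:
        raise ValueError("need at least as many data nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [data_nodes[(block + r) % len(data_nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def schedule_map_tasks(placement, free_task_trackers):
    """Assign one map task per block, preferring a task tracker on a node that
    already holds a replica (data locality); otherwise use any free tracker."""
    plan = {}
    for block, replicas in placement.items():
        local = [node for node in replicas if node in free_task_trackers]
        plan[block] = local[0] if local else free_task_trackers[0]
    return plan


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_blocks(200, nodes)  # 200 MB file -> 4 blocks of up to 64 MB
plan = schedule_map_tasks(placement, ["dn2", "dn4"])
print(plan)
# {0: 'dn2', 1: 'dn2', 2: 'dn4', 3: 'dn4'}
```

Moving the computation to the data, rather than the data to the computation, is what lets the cluster scale by adding more nodes that each store and process a share of the blocks.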
  • 98. Oozie (workflow manager) over Hive (query), Pig (scripting), Sqoop (RDBMS loader) and Flume (log loader); ZooKeeper (locking), HBase (database) and Hadoop MapReduce; all on the Hadoop File System (HDFS)
  • 99. HBase: a real-time database built on Hadoop (diagram comparing HBase with Oracle: MemStore vs buffer cache, HFiles vs datafiles, write-ahead log vs redo log, HDFS vs ASM, both over disks)
  • 100. HBase Data Model
      Denormalized relational table:
        Name | Site            | Counter
        Dick | Ebay            | 507,018
        Dick | Google          | 690,414
        Jane | Google          | 716,426
        Dick | Facebook        | 723,649
        Jane | Facebook        | 643,261
        Jane | ILoveLarry.com  | 856,767
        Dick | MadBillFans.com | 675,230
      Normalized relational tables:
        NameId | Name
        1      | Dick
        2      | Jane
        SiteId | SiteName
        1      | Ebay
        2      | Google
        3      | Facebook
        4      | ILoveLarry.com
        5      | MadBillFans.com
        NameId | SiteId | Counter
        1      | 1      | 507,018
        1      | 2      | 690,414
        2      | 2      | 716,426
        1      | 3      | 723,649
        2      | 3      | 643,261
        2      | 4      | 856,767
        1      | 5      | 675,230
      HBase rows (one row per name, one column per site; each row has its own set of columns):
        Id | Name | Ebay    | Google  | Facebook | (other columns) | MadBillFans.com
        1  | Dick | 507,018 | 690,414 | 723,649  | ...             | 675,230
        Id | Name | Google  | Facebook | (other columns) | ILoveLarry.com
        2  | Jane | 716,426 | 643,261  | ...             | 856,767
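The pivot from the relational tables to HBase's wide rows can be sketched with plain Python dictionaries standing in for an HBase table; no HBase client API is used. In real HBase each column also belongs to a column family and values are stored as byte arrays, both of which this sketch ignores. `to_wide_rows` is a hypothetical helper.

```python
from collections import defaultdict

# (name, site, counter) rows from the denormalized table on the slide.
VISITS = [
    ("Dick", "Ebay", 507_018),
    ("Dick", "Google", 690_414),
    ("Jane", "Google", 716_426),
    ("Dick", "Facebook", 723_649),
    ("Jane", "Facebook", 643_261),
    ("Jane", "ILoveLarry.com", 856_767),
    ("Dick", "MadBillFans.com", 675_230),
]


def to_wide_rows(triples):
    """Pivot (row key, column, value) triples into one sparse row per key,
    the way an HBase table stores a varying set of columns in each row."""
    table = defaultdict(dict)
    for row_key, column, value in triples:
        table[row_key][column] = value
    return dict(table)


rows = to_wide_rows(VISITS)
print(sorted(rows["Jane"]))
# ['Facebook', 'Google', 'ILoveLarry.com']
```

A single row lookup then returns all of one person's counters without a join, and a site nobody in that row visited simply has no column.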
  • 101. Hive
  • 103. SQL → Java → Results (Hive compiles SQL queries into Java MapReduce jobs)
  • 104. Other SQL-like Hadoop interfaces: Cloudera Impala; MapR Drill; Aster; Greenplum (Pivotal HD); ParAccel; Hadapt; Oracle SQL Connector for Hadoop (external-table interface to HDFS)
  • 105. Pig
  • 106. Pig Latin vs SQL or HiveQL
  • 107. Meanwhile, back at the Death Star…
  • 110. Oracle Exadata: database servers with 64 cores and 576 GB RAM; storage servers with 112 cores and 100 TB SAS or 336 TB SATA, plus 5 TB SSD
  • 111. Oracle Big Data Appliance: 18 Sun X4270 M2 servers; 48 GB RAM per node (864 GB total); 2 × 6-core CPUs per node (216 cores total); 12 × 2 TB HDD per node (216 spindles, 864 TB); 40 Gb/s InfiniBand between nodes; 10 Gb/s Ethernet to the datacentre; competitive pricing (www.oracle.com/us/bigdata/index.html)
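The per-node specifications multiply out to the cluster totals on the slide. A small Python sketch (the `NodeSpec` and `cluster_totals` names are hypothetical) reproduces the arithmetic: the RAM, core and spindle totals match the slide, but 18 nodes × 12 disks × 2 TB comes to 432 TB of raw disk rather than the 864 TB listed, so either the disk size or the disk total on the slide appears to be inconsistent.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    ram_gb: int
    cpus: int
    cores_per_cpu: int
    disks: int
    disk_tb: int


def cluster_totals(nodes: int, spec: NodeSpec) -> dict:
    """Multiply per-node figures out to whole-cluster totals."""
    return {
        "ram_gb": nodes * spec.ram_gb,
        "cores": nodes * spec.cpus * spec.cores_per_cpu,
        "spindles": nodes * spec.disks,
        "raw_disk_tb": nodes * spec.disks * spec.disk_tb,
    }


bda = cluster_totals(18, NodeSpec(ram_gb=48, cpus=2, cores_per_cpu=6, disks=12, disk_tb=2))
print(bda)
# {'ram_gb': 864, 'cores': 216, 'spindles': 216, 'raw_disk_tb': 432}
```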
  • 112. Big Data Appliance software: Cloudera Enterprise; Oracle R Enterprise; Oracle NoSQL; Oracle Big Data Connectors
  • 113. Oracle products positioned on a chart with axes labelled Latency and Storage Costs: Oracle Exalogic, Oracle Exalytics, Oracle Big Data Appliance, Oracle WebLogic, Oracle NoSQL, Oracle Essbase, Oracle Exadata, Oracle Loader for Hadoop, Apache Hadoop, Oracle RDBMS, Oracle TimesTen
  • 114. The following week at the Borg collective…
  • 115. © 2012 Quest Software Inc. All rights reserved.
  • 117. Integrating Hadoop and RDBMS
  • 118. Scenario #1: Reference data in RDBMS (PRODUCTS and CUSTOMERS in the RDBMS; WEBLOGS in HDFS)
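One common way to implement this scenario is a map-side (broadcast) join: the small reference table from the RDBMS is made available to every mapper, and each web-log record in HDFS is enriched as it streams past. A minimal single-process Python sketch, assuming hypothetical CUSTOMERS data and a hypothetical two-field log format; how the reference data reaches the mappers (for example a Sqoop import) is left out.

```python
# Hypothetical reference data exported from the RDBMS CUSTOMERS table.
CUSTOMERS = {101: "Acme Corp", 102: "Globex"}

# Hypothetical raw web-log lines stored in HDFS: customer_id, url.
WEBLOG_LINES = [
    "101 /products/toad",
    "102 /support",
    "999 /index.html",
]


def enrich(lines, customers):
    """Map-side join: the small reference table sits in memory in every mapper,
    and each large-log record is joined to it without a shuffle or reduce step."""
    for line in lines:
        customer_id, url = line.split(maxsplit=1)
        name = customers.get(int(customer_id), "UNKNOWN")  # unmatched ids kept, flagged
        yield name, url


joined = list(enrich(WEBLOG_LINES, CUSTOMERS))
# [('Acme Corp', '/products/toad'), ('Globex', '/support'), ('UNKNOWN', '/index.html')]
```

This only works while the reference table fits in each mapper's memory; larger dimension tables would need a reduce-side join instead.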
  • 119. Scenario #2: Hadoop for off-line analytics (diagram: PRODUCTS, CUSTOMERS and SALES HISTORY tables; HDFS alongside the RDBMS)
  • 120. Scenario #3: MapReduce output to RDBMS (WEBLOGS in HDFS summarized into a WEBLOGS SUMMARY table in the RDBMS, read by a DB query tool)
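This scenario pairs a large Hadoop aggregation with a small relational result. A minimal Python sketch, with the standard-library `sqlite3` module standing in for the production RDBMS and hypothetical hit data; the `weblogs_summary` table name and the helper functions are illustrative only.

```python
import sqlite3
from collections import Counter

# Hypothetical raw web-log records (one URL per hit) held in HDFS.
HITS = ["/index.html", "/support", "/index.html", "/buy", "/index.html"]


def summarize(hits):
    """The MapReduce step: reduce raw hits to one (url, hit_count) row per URL."""
    return sorted(Counter(hits).items())


def load_summary(conn, rows):
    """Load the small summary into an RDBMS table that ordinary query tools can read."""
    conn.execute("CREATE TABLE weblogs_summary (url TEXT PRIMARY KEY, hits INTEGER)")
    conn.executemany("INSERT INTO weblogs_summary VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the production RDBMS
load_summary(conn, summarize(HITS))
top = conn.execute("SELECT url, hits FROM weblogs_summary ORDER BY hits DESC LIMIT 1").fetchone()
# ('/index.html', 3)
```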
  • 121. Scenario #4: Hadoop as RDBMS “active archive” (diagram: a QUERY TOOL over SALES tables for 2008 to 2011, split between the RDBMS and HDFS)
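This scenario can be modelled as routing rows by age: recent years stay in the RDBMS, older years move to HDFS, and the query tool sees both. A minimal Python sketch with a hypothetical cutoff year and hypothetical sales figures; in practice the cross-tier query would go through something like the external-table or SQL-on-Hadoop interfaces listed earlier, which this sketch does not model.

```python
ARCHIVE_CUTOFF_YEAR = 2010  # hypothetical: years before this live in HDFS


def route(sales):
    """Split sales rows into the 'hot' RDBMS tier and the HDFS archive by year."""
    rdbms, hdfs = [], []
    for row in sales:
        (rdbms if row["year"] >= ARCHIVE_CUTOFF_YEAR else hdfs).append(row)
    return rdbms, hdfs


def query_total(rdbms, hdfs, year_from, year_to):
    """One query across both tiers, as the query tool in the scenario would see it."""
    return sum(r["amount"] for r in rdbms + hdfs if year_from <= r["year"] <= year_to)


sales = [{"year": y, "amount": a} for y, a in [(2008, 10), (2009, 20), (2010, 30), (2011, 40)]]
rdbms, hdfs = route(sales)
print(query_total(rdbms, hdfs, 2009, 2011))
# 90
```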
  • 122. The Big Data Stack
  • 123. Data scientist; Cascading, R (et al.), Java API, Pig, Mahout, Hive; MapReduce, HBase; HDFS
  • 125. Data scientist; big data analytics software; Cascading, R (et al.), Java API, Pig, Mahout, Hive; MapReduce, HBase; HDFS
  • 126. Big data analytics (hub diagram): indexing and search; sentiment analysis; visualization; basket analysis; recommenders; collective intelligence; clustering; predictive analytics; classification; expert systems (like Watson); machine learning; optimization
  • 127. In summary…
  • 128. Hadoop is…
  • 129. Proven at scale
  • 130. A platform for advanced analytics
  • 131. ETL-free
      Schema on write: Data → Extract → Transform (cleanse, normalize, aggregate) → Load → Data Warehouse → Code → Analyse → Utilize
      Schema on read: Data → Load → Hadoop → Code (cleanse) → Analyse → Utilize
  • 132. The most concrete technology enabling the Big Data revolution
  • 133. Hadoop is not…
  • 134. A replacement for RDBMS. But future enterprise data architectures will likely incorporate Hadoop side by side with RDBMS.
  • 135. Suitable for OLTP. Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra.
  • 136. A complete solution. Hadoop alone only solves the storage challenge of Big Data.
  • 137. Shameless plugs
  • 138. Toad for Cloud Databases
  • 139. Toad BI Suite: business intelligence solutions with first-class support for Hadoop, Oracle and many other platforms
  • 140. Kitenga Analytics Suite
  • 141. SharePlex® for Hadoop (diagram: change data capture from redo logs feeding a Hadoop poster; real-time replication into HBase; batched file copy of audit/change data into HDFS; JMS queue)
  • 142. Toad for Hadoop: Hive query IDE; Oracle ↔ Hadoop data management; basic Hadoop administration; beta in June
  • 144. THANK YOU. guy_harrison@dell.com, @guyharrison, guyharrison.net