Thriving and surviving the Big Data revolution

Presentation on Big Data given at Collaborate 2014 #c14lv

  • And this of course doesn’t count the people who don’t go into the online store in the first place.
  • Dell’s solutions are winning in the market, but to be end to end, they are missing a key ingredient…
  • End to end solution is completed with addition of analytics platform.
  • Thriving and surviving the Big Data revolution

    1. Global Marketing. Confidential. REMINDER: Check in on the COLLABORATE mobile app. 207: Surviving and thriving in the big data revolution. Guy Harrison, Executive Director, R&D, Information Management Group, Dell Software
    2. 207: Surviving and thriving in the big data revolution. Guy Harrison, Executive Director, R&D, Information Management Group
    3. Software Group. Introductions. Web: guyharrison.net; Email: guy.harrison@software.dell.com; Twitter: @guyharrison; Google Plus: https://www.google.com/+GuyHarrison1
    8. Dell and Quest: a brief history
    9. But seriously
    10. What is Big Data?
    11. Three or four “V”s. Volume: terabytes, petabytes, exabytes, zettabytes. Variety: structured, unstructured, human generated, machine generated. Velocity: user populations x transaction rates x machine data. Value: competitive or collective advantage
    12. Instead: the Industrial Revolution of data
    21. Data means more. 1993: generated internally; key to operational efficiency. 2013: generated externally; key to competitive advantage; source of product innovation; changing our lives
    22. Big Data is the culmination of cloud, social and mobile
    23. Not all upside
    24. Will Big Data kill retail?
    25. Prevalence of showrooming (chart; axis: Pct, 0 to 70; categories: Consumer Electronics, Home Improvement). Source: Gartner Research G00249458, “Survey Analysis: Focus on Customer Basics to Challenge Amazon, as 'Showrooming' Is Universal but Not Unbeatable”, published 12 February 2013
    30. Some novel defences
    31. Web analytics for retail
    32. Connected Store • Shelf assortment optimization • In store offers • Customer entertainment • Checkout anywhere • Relationship management • Customer analytics
    34. Why showrooming? Selection; stock; faster; cheaper; dynamic pricing; predictive ordering; assortment optimization; predictive recommendations; personalization. Defences?
    35. It’s not enough to lay out products on tables • Online has significant advantages • Retailers can only survive by embracing online and emulating online practices – Dynamic pricing – Shelf optimization – Personalized service and selection • Only big data analytics can provide these advantages
    36. There’s a similar story in every industry: Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising
    37. The Revolution is not over yet
    42. Willy Bowman. Nationality: German. Don’t mention the WAR!
    43. Oracle Performance Survival Guide. Buying choices: Amazon softcover: $45.99; Amazon Kindle: $39.99. Say “screw you bookseller” to buy Kindle version
    45. Data Input
    47. Siri. “Siri call me an ambulance”: From now on, I’ll call you ‘An Ambulance’. OK? “I want to jump off a bridge”: I found 14 bridges nearby:
    50. Brain Control
    53. Muze
    56. The instrumented human • Bluetooth Personal Area Network • 3G/WiFi Wide Area Network • GPS • Storage • Pulse, temp monitor • Silent alarms • Pedometer, sleep monitoring • Compass • Camera • Mike/earphones • Heads up display • Emotion/Attention monitor
    57. The instrumented world
    58. All of which accelerates what we call Big Data
    59. Big Database technologies
    60. Pioneers of Big Data
    66. Google Software Architecture: Google File System (GFS), Map Reduce, BigTable, Google Applications
    67. Diagram: Start; many parallel Map tasks; Reduce
    68. Multi-stage Map-Reduce (diagram labels): Client, HDFS, MAPPER (repeated), SCAN, SORT, AGGREGATE, REDUCE
    69. Schema on Read vs Schema on Write (diagram labels). Schema on Write: Data, Analyse, Aggregate, Normalize, Cleanse, Code, Extract, Load, Transform, Data Warehouse, Utilize. Schema on Read: Data, Load, Hadoop, Analyse, Cleanse, Code, Utilize
    70. Hadoop: Open Source Map-Reduce Stack
    71. Hadoop at Yahoo. Yahoo! Hadoop cluster: • 4000 nodes • 16 PB disk • 64 TB of RAM • 32,000 Cores
    74. Hadoop ecosystem: Hadoop File System (HDFS); Map Reduce/YARN; HBase (Database); ZooKeeper (Locking); SQOOP (RDBMS loader); Hive (Query); Pig (Scripting); Flume (Log Loader); Oozie (Workflow manager)
    75. Hadoop 1.0 Architecture. HADOOP CLIENT (JAVA, PIG, HIVE). MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER, plus a TASK TRACKER on each worker node. HDFS (DISTRIBUTED STORAGE): NAME NODE and SECONDARY NAME NODE, plus a DATA NODE on each worker node
    76. Hadoop 2.0 YARN (Yet Another Resource Negotiator). HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; NODE MANAGER with CONTAINER (repeated per node)
    77. Tez (Hindi for “fast”). Diagram labels: HDFS; MAP and REDUCE stages; Job 1, Job 2, Job 3; HDFS; Job 1
    78. HBase: a real time database built on Hadoop. Diagram compares two stacks. First: ASM, Datafiles, Buffer Cache, Table, Table, Redo, Disks, Log Buffer. Second: HDFS, HFile, MemStore, Table, Table, WA Log, Disks, HFile
    79. HBase data model. Denormalized table (Name | Site | Counter): Dick | Ebay | 507,018; Dick | Google | 690,414; Jane | Google | 716,426; Dick | Facebook | 723,649; Jane | Facebook | 643,261; Jane | ILoveLarry.com | 856,767; Dick | MadBillFans.com | 675,230. Normalized tables: (NameId | Name): 1 | Dick; 2 | Jane. (SiteId | SiteName): 1 | Ebay; 2 | Google; 3 | Facebook; 4 | ILoveLarry.com; 5 | MadBillFans.com. (NameId | SiteId | Counter): 1 | 1 | 507,018; 1 | 2 | 690,414; 2 | 2 | 716,426; 1 | 3 | 723,649; 2 | 3 | 643,261; 2 | 4 | 856,767; 1 | 5 | 675,230. HBase rows (one wide row per name, one column per site): Id 1, Name Dick: Ebay 507,018; Google 690,414; Facebook 723,649; (other columns) ...; MadBillFans.com 675,230. Id 2, Name Jane: Google 716,426; Facebook 643,261; (other columns) ...; ILoveLarry.com 856,767
    80. Hive
    82. SQL, JAVA, RESULTS
    83. Other SQL-like Hadoop interfaces: Cloudera Impala; MapR Drill; Aster; Greenplum (Pivotal HD); ParAccel; Hadapt; Oracle SQL Connector for Hadoop (External Table interface to HDFS)
    84. Pig. Pig Latin; SQL or Hive QL
    85. Flume and SQOOP. FLUME: WebLogs into HDFS. SQOOP: RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS
    86. Berkeley Data Analytic Stack (BDAS). Cluster layer: Yarn, EC2, Mesos (heterogeneous cluster manager). Tachyon: in memory file system. Spark: memory optimized distributed execution. Spark Streaming. MLbase, MLlib: machine learning. Map Reduce. Shark (SQL). Hive (SQL). BlinkDB
    87. Meanwhile, back at the Death Star
    89. Oracle Exadata (X-2). Database servers: 64 cores, 576 GB RAM. Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
    90. Oracle Big Data Appliance • 18 Sun X4270 M2 servers – 48GB RAM per node (864GB total) – 2x6 Core CPU per node (216 cores total) – 12x2TB HDD per node (216 spindles, 432 TB) – 40Gb/s Infiniband between nodes – 10Gb/s Ethernet to datacentre • Competitive Pricing www.oracle.com/us/bigdata/index.html
    91. Big Data Appliance Software • Cloudera Enterprise • Oracle Enterprise R • Oracle NoSQL • Oracle Big Data Connectors
    92. Generating competitive advantage through “Big Data analytics” (AKA Data Science). Machine Learning: programs that evolve with “experience”. Collective Intelligence: programs that use inputs from “crowds” to seem intelligent. Predictive Analytics: programs that extrapolate from existing data into the future
    93. Collective Intelligence
    102. Google Flu Trends
    104. Collective Intelligence outsmarts Artificial Intelligence?
    109. Artificial Intelligence strikes back
    114. Watson is big data AI
    115. Classification • Create a model that identifies/classifies new data • Spam detection, churn risk, customer value
    116. Clustering • Group data without a pre-existing classification scheme • For instance, basket analysis
    117. Supervised Machine Learning (flow diagram): Raw Data; Clean; Training Set and Validation Set; Candidate Model; Validate Model; Production Model; New Data (New Business, Existing Business); Prediction
    118. Unsupervised learning: inmaps.linkedin.com
    120. Big Data Analytics / Data Science: Search Optimization; Recommendation Systems; Security (Vulnerability, Penetration Detection); Fraud Detection; CRM (Churn, Defaults); Medical (Risk analysis, Diagnosis, Prognosis); Game optimization; Advertising (Targeting, Tailoring)
    121. Data Science is hard • Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD • Small-medium businesses need help to compete • Data scientists to the rescue?
    122. Data Scientists to the rescue?
    123. Kitenga Analytics Suite
    124. Toad for Hadoop http://www.toadworld.com/products/toad-for-hadoop/default.aspx
    125. SharePlex® for Hadoop. Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit / Change Data; HBase Real Time replication
    126. Toad BI Suite
    128. Key components to build end-to-end BI/Analytics solutions. Dell’s offering was not complete… Components: Data Integration; Database Management; Advanced Analytics; Business Intelligence; Server and Storage. Products: Server and Storage; TOAD & Shareplex; TOAD BI; Boomi; Kitenga. In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
    129. Dell acquires StatSoft. Key components to build end-to-end BI/Analytics solutions. Components: Data Integration; Database Management; Advanced Analytics; Business Intelligence; Server and Storage. Products: STATISTICA; Server and Storage; TOAD & Shareplex; TOAD BI; Boomi; Kitenga. Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
    131. Data Visualization
    132. Live scoring: integration into operational systems
    133. Industry and cross-industry packaged solutions
    134. For your business • How could data and algorithms transform your business? • What are the technologies that will be most important? – Mobility – Cloud – Hadoop – Big Data Analytics • Where is the data? – Start collecting now!
    135. For your career • Hadoop and NoSQL create strong career opportunities for DBAs and developers – Demand will exceed supply for the foreseeable future • Lots of opportunities for those with Math & Statistics – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R?) • Easy to get started with Hadoop – SQOOP – Hive – Pig
    136. Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
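
The Map and Reduce stages named on slides 67 and 68 can be illustrated with a minimal, single-process sketch. This is plain Python, not the Hadoop API. The word-count task, the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce`, and the sample input are all assumptions made for illustration; a real cluster would run many mappers in parallel over HDFS blocks.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: aggregate all values seen for one key."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two "records" standing in for file blocks.
counts = map_reduce(["big data", "Big Data revolution"])
print(counts)
```

Because each call to `map_phase` depends only on its own line, the map stage can be spread over many machines, which is the property the many-mappers diagram relies on.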

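The HBase data model on slide 79 stores one wide row per name, with a column for each site that name has visited. A rough sketch of that pivot follows, assuming a plain Python dictionary as a stand-in for an HBase table; the function name `to_wide_rows` is hypothetical and no HBase client API is used. The counter values are the ones shown on the slide.

```python
# Denormalized (name, site, counter) records, as listed on slide 79.
records = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot records into {row key: {column name: value}}.

    Each row only holds the columns it actually has, so rows can be
    sparse and can differ in their set of columns.
    """
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide


wide = to_wide_rows(records)
print(wide["Jane"])
```

Note that the row for Jane has no Ebay column at all, rather than a null value, which is the main contrast with the normalized relational layout on the same slide.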
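
The supervised machine learning flow on slide 117 (training set, candidate model, validation set, production model, prediction) can be sketched in a few lines. Everything here is an assumption for illustration: the churn data is made up, the "model" is a single threshold rather than a real learning algorithm, and the function names are hypothetical.

```python
import random


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle the cleaned data and split it into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Classify a new observation with the trained threshold."""
    return x > threshold


def accuracy(threshold, validation_set):
    """Validate the candidate model on data it was not trained on."""
    hits = sum(predict(threshold, x) == label for x, label in validation_set)
    return hits / len(validation_set)


# Hypothetical cleaned data: (support calls last month, customer churned?)
records = [(0, False), (1, False), (1, False), (2, False),
           (6, True), (7, True), (8, True), (9, True)]

training_set, validation_set = split(records)
model = train(training_set)
score = accuracy(model, validation_set)
print(score, predict(model, 10))
```

Only a model that scores well on the validation set would be promoted to production and applied to new data; in practice a library such as R, Weka or Mahout (named on slide 121) would replace the toy threshold.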