Becoming a data driven organization

Published in: Technology, Business

  1. © Copyright 2014 EMC Corporation. All rights reserved.
  2. New IT Agenda
     • Provide Access To All Applications & Data Through Mobile Devices.
     • Use Adaptive, Data-Driven Security To Rapidly Respond To Emerging Threats.
     • Build A Data Lake To Deliver Insights And Applications On All Data.
     • Move To A Software-Defined Data Center Infrastructure And Expand It To A Hybrid Cloud.
     • Use Agile Development To Build New Customer-Centric Applications.
     Today’s Business Challenges, More Than Ever: Balance Risk; Cut Operational Costs & Legacy; React Faster To Find New Growth.
  3. Our Strategy: Build A Differentiated Stack. Best Of Breed. Architected Horizontally, Not Vertically. Choice.
     Stack: BIG DATA SOLUTIONS; PLATFORM AS A SERVICE; AGILE APPLICATION DEVELOPMENT; INFORMATION INFRASTRUCTURE; CONVERGED INFRASTRUCTURE; SOFTWARE-DEFINED DATA CENTER; HYBRID-CLOUD, MOBILITY; ADVANCED SECURITY
  4. Many Industries Face Structural Change
  5. Volvo Cars – Big Data app & service. Deliveries to your car – Roam Delivery
  6. The EMC Exabyte Journey
  7. 85 PETABYTES SOLD INTO ONE WEB-SCALE PROVIDER IN ONE ORDER: 20,772 HARD DRIVES, 300 PALLETS, 8 TRUCKS
  8. Service levels, from Capacity to Performance (Low Service Level to High Service Level)
     • Performance “Good Enough”; Capacity Optimized ($/GB); Data Loss Not A Disaster
     • Consistently Good Performance; Eventual Consistency Of Data; Data Loss Not A Disaster
     • Performance “Good Enough”; Capacity Optimized ($/GB); Data Loss A Disaster
     • Consistently Good Performance; Consistent Data; Data Loss A Disaster
     • Great Performance; Consistent Data; Data Loss A Disaster
  9. (No text on this slide.)
  10. (No text on this slide.)
  11. The Core is Super-Scaling
  12. The Edge is Hyper-Extending
  13. THE 3RD PLATFORM OF IT
  14. TODAY’S DATA CENTER (TRADITIONAL APPLICATIONS) and SOFTWARE-DEFINED DATA CENTER (NEXT GEN CLOUD APPLICATIONS)
      • TRADITIONAL APPLICATION GROWTH: 91M (2013) to 120M (2016)
      • NEXT GEN CLOUD APPLICATION GROWTH: 11M (2013) to 34M (2016)
  15. THE 3RD PLATFORM REDEFINES EVERYTHING
  16. BUILT FOR THE SPEED OF BUSINESS
  17. Data Driven Application Development (Pivotal Confidential–Internal Use Only)
  18. Pivotal At-a-Glance
      • New Independent Venture: Spun out & jointly owned by EMC & VMware
      • Deep Execution Talent: 1700 employees
      • Proven Leadership: Paul Maritz, CEO
      • Global Customer Validation: +1000 Tier-1 Enterprise Customers
      • Strategic Backing: $100M investment by GE
      • Bold Vision: New platform for a new era, focused on the intersection of apps, big data and analytics
  19. Need for Speed
      • Enterprises are being driven to compete, innovate & execute faster than ever before:
        – Global reach and emerging markets
        – Ever-increasing customer expectations
        – Legacy environment & cost pressures
      • At a time when we’re witnessing the most disruptive platform shifts and advances in technology in over 30 years
      • Every business is quickly becoming a software business
      • Software is how companies engage with customers, powered by new data insights and new Social, Cloud, Big Data, Mobile
  20. What Matters: Apps. Data. Analytics.
      • Apps power businesses, and those apps generate data
      • Analytic insights from that data drive new app functionality, which in turn drives new data
      • The faster you can move around that cycle, the faster you learn, innovate & pull away from the competition
  21. “Software is Eating the World”
  22. Software is Changing Industries
      • Financial Services: $3.5B valuation
      • Travel & Hospitality: $3.5B valuation
      • Transportation: $3.5B valuation
      • Home Automation: $3.2B acquisition by Google
      • Entertainment: $20B valuation
      • Agriculture (Monsanto): $1.1B acquisition
  23. Enterprises Must Become Great at Software
      • We need to innovate or we die. Pivotal and Cloud Foundry is our big bet to leapfrog the competition
      • With Cloud Foundry, we built what looks like a software company… Moving from silos to a single platform.
      • Cloud Foundry's potential to transform business is vast… companies leveraging software will outperform their peers
  24. Francisco Gonzalez, CEO at BBVA: “Banks need to take on Amazon and Google or die. The shift to digital requires a complete overhaul of banks’ technology… it is a matter of survival.”
  25. Rapid Execution Requires a New Approach
      • Agile teams and rapid iteration
      • Continuous delivery without downtime
      • Horizontal scalability (data and app)
      • Standardized service binding and discovery
      • First-class mobile support
      • Deep user analytics
      Cycle: Development, Delivery, Operation, Iteration
  26. Jonathan Rosenberg, CTO & VP, Collaboration: “PaaS is the operating system for the cloud. As the set of APIs and services for PaaS's grow, the choice of PaaS becomes more crucial as the costs of porting go up. This is one of the benefits of open source PaaS offerings like Cloud Foundry.”
  27. Is Your Enterprise Ready?
  28. Fail, Learn, Adapt, Repeat. 6 Months to 6 Weeks. $1.1M Saving per App.
  29. Incredible Cloud Foundry Ecosystem
  30. The Cloud Foundry Foundation: more to come… “This is a significant announcement for PaaS in general and for Cloud Foundry in particular. It potentially signals a consolidation that is going to become apparent…predicts that Red Hat will shutter OpenShift and throw its hat in with Cloud Foundry within the year.” - Forbes, Ben Kepes, 2/24/14
  31. Pivotal Approach: Your Platform for Building Great Software
      Diagram labels: Pivotal One; Elastic Runtime (Java, Spring, Ruby, Node.JS); Built-in “Middleware” Services; Operation Manager (Installation, Management, Monitoring, Upgrades/Updates, etc.); Co-Innovation (Agile Software Development, Data Lake Solutions); Pivotal One Services (Pivotal MySQL)
  32. Pivotal Approach: An Application Centric World
      • Infrastructure Specific: App1 and App2, each with App Server Configurations, JVM, VM
      • IaaS Agnostic: Common Access Tier (App1, App2); Built-in Middleware Services; Container 1 and Container 2 (App Server, JVM, etc.); Pre-Provisioned Pool of VMs
  33. GE Capital Builds Foundation For Value Add Insights
      • “Critical Insights and data is deleted because it’s too expensive to store.”
      • “We need the ability to blend data fabric, build analytics, and create applications on top of this.”
      • “Access any internal or external data of interest through a familiar interface.”
      • “Now we analyze Social Media to predict trends, and help dealers make decisions.”
  34. Use-Case: Data and PaaS Drives Business Agility
      • Real-time change to customer-facing application based on data analysis
      • Diagram labels: Pivotal CF; Operation Manager; Deploy/Update; Any Infrastructure (Private/Public); Big/Fast Data
  35. Spring becomes the enabler
      • Big, Fast, Flexible Data (Spring Data): JPA/JDBC, MongoDB, Redis, Neo4j, GemFire, Data REST
      • Data Processing, Integration: Spring Hadoop, Spring Integration, Spring Batch
      • Deploy to Cloud or on premise: CloudFoundry, vCloud Suite, Google App Engine, Amazon Elastic Beanstalk, CloudBees
  36. Data Driven: Harder Than it Sounds
      Each tier repeats the same cycle (Ingest, Distill, Process, Interface, Operationalize) across Analytical and Transactional systems.
      • Real Time: predictive call routing, fraud prediction, dynamic pricing, re-marketing, stream analytics
      • Near Real Time: analytic model designs, transaction analysis, trend analysis
      • Batch: ETL, archive, trending, monthly and weekly jobs
  37. Data Driven: Impossible in Silos
      • Silos: Finance, Manufacturing, Marketing, IT
      • Data Growth Over 60% Floods These Silos
  38. One Platform, Multiple Use Cases
      • External data sources: LinkedIn, Twitter, Facebook, Weather, …
      • Internal data sources: CRM, EDW, Customer Portals, …
      • Mobile Network Infrastructure: flexibility to expand the ingestion of network data as the underlying infrastructure changes, with easy expansion into 4G LTE.
      • Intelligently set triggers at the edge of the network that look for 'interesting' events that require instant action.
      • Build New Apps; Integrate Existing Apps: agile approach to enable apps to access in real time All History, Current Status and Predicted Intelligence.
      • Single Unified Platform: capture everything; real-time data processing; single version of the truth; end-to-end visibility across different data sources; scalable and cost effective.
  39. RTI In Telco Ecosystem
      • Inputs: OSS, BSS, Network Elements
      • RTI-T Integration: Sanity Check, Filter, Enrich, Transform, Apply Logic
      • Platform functions: Ingest, Distribute, Process, Route, Persist
      • Application Hosting Environment for Real-time Use-case Realization: Network Optimization, Customer Experience Management, ....
      • Long Term Storage; Operational Datastore; Analytics
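
The processing steps named on this slide (Sanity Check, Filter, Enrich, Transform, Apply Logic) can be read as a chain of small functions applied to each incoming event. The following is only a minimal sketch of that idea in plain Python. The event fields (`cell_id`, `dropped_calls`), the region lookup and the alert threshold are hypothetical and do not come from the slides, and nothing here reflects how RTI itself is implemented.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Event = dict
# A stage returns the (possibly modified) event, or None to drop it.
Stage = Callable[[Event], Optional[Event]]


def sanity_check(event: Event) -> Optional[Event]:
    # Drop events that lack the fields later stages rely on (hypothetical schema).
    return event if event.keys() >= {"cell_id", "dropped_calls"} else None


def filter_stage(event: Event) -> Optional[Event]:
    # Keep only events that report at least one dropped call.
    return event if event["dropped_calls"] > 0 else None


def make_enrich(cell_regions: dict) -> Stage:
    def enrich(event: Event) -> Optional[Event]:
        # Add reference data: a region looked up by cell id.
        return {**event, "region": cell_regions.get(event["cell_id"], "unknown")}
    return enrich


def transform(event: Event) -> Optional[Event]:
    # Normalise into the shape downstream consumers expect.
    return {"region": event["region"], "cell": event["cell_id"], "drops": event["dropped_calls"]}


def apply_logic(event: Event) -> Optional[Event]:
    # Flag "interesting" events; the threshold of 5 is an arbitrary assumption.
    return {**event, "alert": event["drops"] >= 5}


def run_pipeline(events: Iterable[Event], stages: List[Stage]) -> Iterator[Event]:
    for event in events:
        current: Optional[Event] = event
        for stage in stages:
            current = stage(current)
            if current is None:
                break  # a stage dropped the event
        if current is not None:
            yield current


stages = [sanity_check, filter_stage, make_enrich({"c1": "north"}), transform, apply_logic]
raw_events = [
    {"cell_id": "c1", "dropped_calls": 7},
    {"cell_id": "c2", "dropped_calls": 0},  # removed by the filter
    {"dropped_calls": 3},                   # removed by the sanity check
    {"cell_id": "c3", "dropped_calls": 2},
]
results = list(run_pipeline(raw_events, stages))
```

Keeping each step as a separate function makes it easy to reorder or replace stages, which is one plausible reading of the "flexibility" the previous slide emphasises.
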
  40. Solution Architecture
      • Inputs: OSS, BSS, Network Elements
      • RTI-T Integration: Sanity Check, Filter, Enrich, Transform, Apply Logic
      • Platform functions: Ingest, Distribute, Process, Route, Persist
      • Cloud Foundry: Network Optimization, Customer Experience Management, ....
      • Pivotal Data Lake; GemFire XD; ADS/HAWQ
  41. World’s Leading Experts: Pivotal Labs – Pivotal Data Labs
      • Pivotal One; Pivotal Application Suite
      • BATCH: Pivotal HD
      • NEAR TIME: HAWQ, Greenplum DB
      • REAL TIME: GemFire XD, GemFire
  42. Pivotal’s Opportunity
      • Uniquely positioned to help enterprises modernize each facet of this cycle today
      • Comprehensive portfolio of products spanning Apps, Data & Analytics
      • Converging these technologies into a coherent, next-gen Enterprise PaaS platform
  43. BUILT FOR THE SPEED OF BUSINESS
  44. BE PREPARED FOR THE SPEED OF BUSINESS. “It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.” -Charles Darwin. © Copyright 2014 Pivotal. All rights reserved.
  45. BUILT FOR THE SPEED OF BUSINESS
  46. BE PREPARED FOR THE SPEED OF BUSINESS
      • What is the Pivotal platform?
      • Why is it so cool?
      • What amazing things have we done with it?
  47. Evolving Enterprise Data Architecture
      • Analytic Data Marts: MPP Database
      • Operational Intelligence: In-Memory DB
      • Run-Time Applications: In-Memory Object
      • Enterprise Data Warehouse: RDBMS
      • Data Staging Platform
      • Traditional BI/Reporting
      • Data Visualization
      • Data Ingestion System: Stream/CEP
  48. Pivotal Data Product Portfolio: Analytic Data Marts; Operational Intelligence; Run-Time Applications; Enterprise Data Warehouse; Data Staging Platform; Traditional BI/Reporting; Data Visualization; Data Ingestion System
  49. BE PREPARED FOR THE SPEED OF BUSINESS
      • What is the Pivotal platform?
      • Why is it so cool?
      • What amazing things have we done with it?
  50. Why is the Pivotal platform so awesome? Infrastructure Independent; Fast & Scalable; Schema Free; Easy to Use; Real Time
  51. Big Data & Data Science: Decision = Data + Rules (labels: “Big Data”, Data Science)
  52. “Big Data”: Social Media; Commercial & Public Data; Dark Data; Operational Data
  53. Combining data sources: Example
      • IPSQ (Quality). Owner: TS Production team. Test flags from production line. 1 year, ~300GB.
      • APDM. Owner: TS Production team. Full vehicle history including IPST (technical), IPSL (logistics), IPSQ test flags and all test results. 30 years, ~TBs.
      • FASTA. Owner: Aftersales. Dealership electronic tests; identifies early issues with cars. >25TB.
      • IQS: Initial Quality Survey from JD Power. Owner: R&D. Survey responses from new owners after 90 days for approx 1700 vehicles. Few thousand lines, ~MB.
      • Social Data. Owner: R&D. Pulling 500MB per day from Twitter.
      • TQP. Owner: Supplier management. PDFs of parts spec sheets. ~500GB.
  54. Generating value from data: Car configurator example
      • Data sources: Sales, Configurations, Customers
      • Engines built on them: Basic Recommendation Engine; Advanced Recommendation Engine; Yield Optimization Engine
      • All car elements: attribute frequencies (colors etc), attribute combination frequencies
      • For instance: browsing history, usage patterns, demographic insights
      • Ideally: volumes, pricing, by market, linkable to configurations
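
The "attribute frequencies" and "attribute combination frequencies" mentioned on this slide are simple counts over stored configurations. As a rough illustration, the sketch below counts both and uses the pair counts to suggest the option that most often accompanies one the customer has already picked. The configurations and the `recommend` helper are invented for this example; the slides do not describe how the actual recommendation engine works.

```python
from collections import Counter
from itertools import combinations

# Hypothetical saved configurations: attribute name -> chosen value.
configurations = [
    {"color": "red", "engine": "diesel", "wheels": "18in"},
    {"color": "red", "engine": "petrol", "wheels": "18in"},
    {"color": "blue", "engine": "diesel", "wheels": "17in"},
    {"color": "red", "engine": "diesel", "wheels": "17in"},
]

attribute_counts = Counter()  # how often each (attribute, value) occurs
pair_counts = Counter()       # how often two (attribute, value) pairs occur together
for config in configurations:
    items = sorted(config.items())
    attribute_counts.update(items)
    pair_counts.update(combinations(items, 2))


def recommend(chosen, attribute, pair_counts):
    """Return the value of `attribute` that most often co-occurs with `chosen`.

    `chosen` is an (attribute, value) tuple; returns None if nothing co-occurs.
    """
    scores = Counter()
    for (first, second), count in pair_counts.items():
        for known, candidate in ((first, second), (second, first)):
            if known == chosen and candidate[0] == attribute:
                scores[candidate[1]] += count
    return scores.most_common(1)[0][0] if scores else None


suggested_engine = recommend(("color", "red"), "engine", pair_counts)
```

A frequency-based suggestion like this corresponds at most to the "basic" engine on the slide; the advanced and yield-optimization engines would need the customer and sales data listed above.
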
  55. The value of data over time
      • Axes: Value of Data ($) against Time (µs, ms, s, hour, day, month, year, yr+)
      • Regions: “Fast Data”, “Big Data”, Traditional Systems
      • Pivotal Data Science Labs
  56. But don’t just take our word for it…
      • “Pivotal is aiming for the enterprise market that's realizing that software is the biggest differentiator in any industry.” - Larry Dignan, ZDNet
      • “The number of companies that have bought into the initiative, the amount of code being contributed, the customer wins that ecosystem members are enjoying suggest that Cloud Foundry is preeminent among all the open source PaaS initiatives.” - Ben Kepes, Forbes
      • “If you're in the business of building enterprise software, scrambling to figure out what your company is doing around big data and analytics, mobile and the cloud, then there's a fair chance you'll want to pay attention to Pivotal.” - Arik Hesseldahl, WSJD
  57. BE PREPARED FOR THE SPEED OF BUSINESS
      • What is the Pivotal platform?
      • Why is it so cool?
      • What amazing things have we done with it?
  58. What does traffic data look like?
  59. …like this?
  60. …or this? (Note: This is the least offensive topic cluster in our Twitter data!)
  61. Velocity by Time of Day
  62. Velocity Distribution (figure: density of velocity in km/h, 0 to 200, for link 1000064869)
  63. Gaussian Mixture Model (figure: four panels showing the velocity density in km/h for link 1000064869, with the combined density and components 1 to 4)
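
The figure on this slide decomposes the velocity distribution of a single road link into Gaussian components. To show what such a fit involves, here is a minimal one-dimensional expectation-maximisation (EM) sketch in pure Python. It is not the presenters' implementation: the data is synthetic (two speed regimes rather than the four components in the figure), and the quantile-based initialisation and fixed iteration count are simplifying assumptions.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k, iterations=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns a list of (mean, std, weight) tuples sorted by mean.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Initialise the means at evenly spaced quantiles, with a shared spread.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(ordered) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in ordered) / n)
    stds = [overall_std] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([v / total for v in p])
        # M-step: re-estimate weight, mean and std from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    return sorted(zip(means, stds, weights))


# Synthetic speeds in km/h: a congested regime and a free-flow regime.
random.seed(0)
speeds = [random.gauss(30, 5) for _ in range(300)] + [random.gauss(90, 10) for _ in range(300)]
components = fit_gmm_1d(speeds, k=2)
```

Each returned tuple describes one regime; on real link data the number of components would have to be chosen, for example by comparing fits with different values of `k`.
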
  64. Decision Trees Example (figure: velocity density with the combined density and components 1 to 3, plus a decision tree that splits on hour, weekday and nextlink)
  65. Sneak Peek at our TfL Data Demo
      • Used the freely accessible TfL data for a demo
      • Shows the number of active disruptions over different days in London
        – Rush hour effects visible
        – Nights are quieter, but there are more disruptions on weekend nights
  66. Kaiser Permanente Hackathon: Insight Patient Application
  67. Kaiser Permanente Hackathon
  68. Text Analytics for Churn Prediction
      • Customer: A major telecom company
      • Business Problem: Reducing churn through more accurate models
      • Challenges:
        – Existing models only used structured features
        – Call center memos had poor structure and lots of typos
      • Solution:
        – Built sentiment analysis models to predict churn and topic models to understand topics of conversation in call center memos
        – Achieved 16% improvement in ROC curve for Churn Prediction
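
The slide reports a 16% improvement "in ROC curve" without naming the exact metric. Assuming it refers to the area under the ROC curve (AUC), the sketch below shows how that number can be computed from churn labels and model scores, and how a relative improvement between two models would be derived. The labels and scores are made up for illustration and are unrelated to the customer's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    `labels` holds 1 for churners and 0 for non-churners; tied scores share
    their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Items i..j-1 are tied and occupy 1-based ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum += avg_rank * sum(1 for _, y in pairs[i:j] if y == 1)
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def relative_improvement(old, new):
    return (new - old) / old


# Hypothetical scores from a structured-only model and one that adds text features.
labels = [1, 0, 1, 0, 0, 1]
structured_only = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]
with_text = [0.7, 0.4, 0.6, 0.2, 0.5, 0.9]
gain = relative_improvement(roc_auc(labels, structured_only), roc_auc(labels, with_text))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a relative gain is only meaningful when the baseline is stated alongside it.
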
  69. Predicting Commodity Futures through Twitter
      • Customer: A major agri-business cooperative
      • Business Problem: Predict price of commodity futures through Twitter
      • Challenges:
        – Language on Twitter does not adhere to rules of grammar and has poor structure
        – No domain-specific labeled corpus of tweet sentiment, so the problem is semi-supervised
      • Solution:
        – Built Sentiment Analysis and Text Regression algorithms to predict commodity futures from Tweets
        – Established the foundation for blending the structured data (market fundamentals) with unstructured data (tweets)
  70. Network Intrusion Detection
      • Customer: One of the world's largest health care providers
      • Business Problem: Detect advanced cyber threats in a large heterogeneous environment and reduce malware ‘free-time’
      • Challenges:
        – Covert threats employ advanced techniques to bypass traditional security appliances.
        – Last year the median number of days before detection on a compromised network was 416. (Source: Mandiant)
      • Solution:
        – Built a new behavioral intrusion detection framework based on machine learning, graph theory and security research
        – Designed operational components of their next generation SIEM.
        – Engineered a full featured custom social graph based intrusion model
        – Identified breaches that not a single security product they owned was able to detect
  71. Website Classification
      • Customer: An internet domain name service provider
      • Business Problem: Create a multilevel website classification that groups websites by function rather than topic
      • Challenges:
        – Complex unstructured data format required several transformations
        – Model needed to be language independent, so classic language features could not be used
      • Solution:
        – New hierarchical model reduced the number of previously ‘unclassified’ websites by ~75%
        – Created an in-database analytics framework for unsupervised learning models and enabled real-time validation of current production model
      • Figure: Map of Domains
  72. Pivotal’s Platform
      • Uniquely positioned to help enterprises modernize each facet of this cycle today
      • Comprehensive portfolio of products spanning Apps, Data & Analytics
      • Converging these technologies into a coherent, next-gen Enterprise PaaS platform
  73. BUILT FOR THE SPEED OF BUSINESS
  74. Infrastructure for Data Driven Workloads
  75. A Unique Federation Of Companies Delivering The Software-Defined Enterprise. Solutions & Choice. Partners.
      • PLATFORM AS A SERVICE; VIRTUAL WORKSPACE; BUSINESS DATA LAKE; SECURITY ANALYTICS; SOFTWARE DEFINED DATA CENTER
      • SERVICE PROVIDER; ENTERPRISE DATA CENTER
      • BIG DATA SOLUTIONS; PLATFORM AS A SERVICE; AGILE APPLICATION DEVELOPMENT; SOFTWARE-DEFINED DATA CENTER; HYBRID-CLOUD, MOBILITY; INFORMATION INFRASTRUCTURE; CONVERGED INFRASTRUCTURE; ADVANCED SECURITY
  76. 5 Solutions Enabling The Software-Defined Enterprise
      • PLATFORM AS A SERVICE; SOFTWARE-DEFINED DATA CENTER; VIRTUAL WORKSPACE; BUSINESS DATA LAKE; SECURITY ANALYTICS
      • Other diagram labels: Converged Infrastructures; Partners; vCloud Hybrid Service; Hybrid Cloud Managed As One Cloud; Federation Solutions; Next Gen Cloud Apps
  77. Hadoop Overview
      • Hadoop is an open-source framework from Apache that allows for parallel batch processing of very large data sets.
      • MapReduce is the Hadoop processing model that divides the workload so multiple machines can process it in parallel.
      • HDFS is the file system for the data. It provides data protection and data locality by storing multiple copies of the data (usually 3).
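
To make "divides the workload" concrete, the sketch below imitates the map, shuffle and reduce phases for the classic word-count job in a single Python process. It illustrates the programming model only and does not use any Hadoop API; in a real cluster each chunk would be an HDFS block processed by a separate task, ideally on a node that holds that block.

```python
from collections import defaultdict


def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently, which is what allows parallel execution.
    mapped = [map_phase(chunk) for chunk in chunks]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data big", "Data lake"])
```

Because the mappers never share state, adding machines lets more chunks be processed at once, which is why the model suits very large batch jobs.
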
  78. Isilon Scale-Out NAS Architecture
      • Client/Application Layer
      • Ethernet Layer: Gig-E, 10 Gig-E
      • Network Protocols: CIFS, NFS, FTP, HTTP, HDFS for Hadoop, REST for Object
      • OneFS Operating Environment: Single FS/Volume
      • Intra-cluster Communication Layer
  79. Traditional “Share-Nothing” Hadoop
      • Diagram: Existing Virtualized Data Center with Existing Primary Storage holding the Unstructured Data (copy 1), and a share-nothing Hadoop infrastructure holding copies 2, 3 and 4
      • Hadoop replication count (R=3) means 4 data copies
      • Data has to be copied to the Hadoop cluster before analysis can begin (Time to Results)
      • How will you maintain data consistency when a file changes on your primary storage?
  80. Isilon “Share-Everything” Hadoop
      • Diagram: Existing Virtualized Data Center with Existing Primary Storage holding the Unstructured Data (copy 1); New Hadoop Compute Nodes use the native HDFS protocol
      • Start using Hadoop NOW with unused processing and RAM available in your VMware environment
      • No replication required (use your existing data)
      • Access to same data via NAS and HDFS protocols
      • Time to results extremely fast using already existing data with NO COPIES or wasted $
      • Analysis can begin with the 1st VM
  81. Hadoop Architecture - Traditional (diagram labels: R (RHIPE), Mahout, Hive, HBase, PIG; Job Tracker, Task Tracker; NameNode, 2nd NameNode, DataNode; Ethernet; six servers each acting as Data Node + Compute Node)
  82. Hadoop Architecture with Isilon (diagram labels: R (RHIPE), PIG, Mahout, Hive, HBase; Job Tracker, Task Tracker; six Compute Nodes; Ethernet; NameNode and DataNode provided by the Isilon nodes, labelled name node and datanode)
  83. Support for Multiple Hadoop Distributions (diagram labels: HDFS; SMB, NFS, HTTP, FTP, HDFS; NameNode; Data; Node reply; name node and datanode on the Isilon nodes; NFS and SMB clients; multiple MAP Reduce tasks)
  84. Dependent Scaling
      • Traditional Hadoop HDFS:
        – Storage to Compute ratio is fixed
        – Scaling compute means scaling capacity
        – Difficult to provide QoS
        – Compute upgrade is a forklift
      • Isilon HDFS:
        – Scale compute independent of storage
        – Achieve optimal performance balance even as workloads evolve
        – No data migrations, ever!
        – Add new performance as hardware evolves
      • Chart: Compute and Storage; Required performance/capacity against Required Hadoop Cluster Nodes
  85. Independent Scaling
      • Traditional Hadoop HDFS:
        – Storage to Compute ratio is fixed
        – Scaling compute means scaling capacity
        – Difficult to provide QoS
        – Compute upgrade is a forklift
      • Isilon HDFS:
        – Scale compute independent of storage
        – Achieve optimal performance balance even as workloads evolve
        – No data migrations, ever!
        – Add new performance as hardware evolves
      • Chart: Compute and Storage; Required performance/capacity against Required Hadoop Cluster Nodes
  86. Snapshot & Version Control
      • Before:
        – Traditional HDFS does not have replication
        – No Snapshotting of data
        – Loss of version control
        – Not designed for Mission Critical
      • After:
        – Full SnapshotIQ integration identifies changes
        – Multi-threaded, Multi-Node Scale-Out replication
        – Improved RPO/RTO for business continuity
        – Geo-replicated Hadoop!
  87. Time-to-Results
      • Data Copy Analysis (“Hadoop on a Stick”): data crosses the Data Center Network from Existing Primary Storage to the Hadoop system. Have you ever copied 100TB from Primary Storage to a Hadoop system? How long does it take to copy 100TB from one place to another over a 10Gb link? >24 Hours
      • In-Place Analysis: Hadoop Compute Nodes read only the data relevant to the analysis from Existing Primary Storage over the Data Center Network
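
The ">24 Hours" figure can be checked with simple arithmetic. The helper below is a back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes, 10Gb = 10^10 bits per second) and an `efficiency` factor for protocol overhead that is an assumption of this example rather than a number from the slide.

```python
def copy_time_hours(data_terabytes, link_gigabits_per_s, efficiency=1.0):
    """Hours needed to move data over a network link.

    Uses decimal units (1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits/s).
    `efficiency` is the fraction of line rate actually achieved (assumed).
    """
    bits = data_terabytes * 1e12 * 8
    usable_bits_per_s = link_gigabits_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600.0


at_line_rate = copy_time_hours(100, 10)             # theoretical best case
at_90_percent = copy_time_hours(100, 10, efficiency=0.9)
```

At full line rate the copy takes a little over 22 hours, so the slide's ">24 Hours" corresponds to sustaining roughly 90% of line rate or less, before counting any time for the Hadoop cluster to write its replicas.
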
  88. Cost Comparison: A real world example
      • Customer requirements: 640 TB raw capacity; 64 Compute (1 per 10TB)
      • DAS Option: 14.8% usable capacity/DataNode; 38 racks of servers
      • Isilon Option: 10 Racks (including Compute); 65% less expensive than DAS
      • Hadoop on Isilon is often significantly less costly!
      • Chart: cost from $0 to $6M, broken down into Network, Hadoop Licensing, Management, Config, Installation, Energy, Isilon, Servers
  89. The Isilon Advantage for Hadoop: Efficiency and flexibility
      • No data ingest necessary
      • Eliminate 3x mirroring
      • Over 80% storage utilization
      • SmartDedupe to further reduce storage needs by up to 30%
      • Scale compute and data independently
      • Multi-protocol access
      • Simultaneous multi-distribution support
      • Ability to leverage VMware vSphere Big Data Extensions to reduce datacenter footprint, power, space, and cooling
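
The difference between "3x mirroring" and "over 80% storage utilization" is easiest to see as raw capacity required for a given amount of data. The sketch below works that out for an illustrative 100 TB data set. The data set size is an assumption, 0.8 is used as the lower bound of the "over 80%" claim, and file system overhead, dedupe and temporary space are ignored.

```python
def mirrored_utilization(copies):
    """Fraction of raw capacity that holds unique data when every block is stored `copies` times."""
    return 1.0 / copies


def raw_capacity_needed(usable_tb, utilization):
    """Raw terabytes required to store `usable_tb` of data at a given utilization."""
    return usable_tb / utilization


data_tb = 100  # illustrative data set size, not taken from the slides
raw_with_3x_mirroring = raw_capacity_needed(data_tb, mirrored_utilization(3))
raw_at_80_percent = raw_capacity_needed(data_tb, 0.8)
savings = 1 - raw_at_80_percent / raw_with_3x_mirroring
```

Under these assumptions the mirrored layout needs 300 TB of raw disk against 125 TB at 80% utilization, which is one contributor to the rack-count difference claimed in the cost comparison.
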
  90. The Isilon Advantage for Hadoop: Data protection and security
      • Highly resilient architecture
      • Robust data protection options (DR, snapshots, etc.)
      • Eliminate NameNode single point of failure
      • SEC 17a-4 compliant WORM
      • Kerberos authentication
      • Hadoop multi-tenancy
  91. How Do I Start Using Hadoop? EMC Hadoop Starter Kit (HSK)
      • Visit https://community.emc.com/docs/DOC-26892
      • Watch the demo video
      • Follow the instructions to deploy Hadoop to your existing Isilon and VMware infrastructure in about an hour
      • There are customized HSKs for Apache, Pivotal, Cloudera, and Hortonworks
  92. (No text on this slide.)
