Modern infrastructure for business data lake

Published in: Business

  1. © Copyright 2015 EMC Corporation. All rights reserved. © Copyright 2014 EMC Corporation. All rights reserved.
  2. Scale-out Converged Solutions for Analytics. Julianna DeLua, VCE; Dan Beres, EMC Isilon
  3. AGENDA
     • History of Analytic Infrastructure
     • Why Scale-Out, Converged Solutions
     • Analytic Workflow
     • vHadoop
     • Test Results
     • Customer Use Cases and Feedback
     • Conclusion / Next Steps
  4. A Brief History of Analytic Infrastructure
  5. VCE Confidential. © 2015 VCE Company, LLC. All rights reserved.
     • 2013: "Shared infrastructure? Let me know when you know 'for sure' it works." In the meantime, a few industry pioneers and early adopters start POCs with EMC / VCE.
     • 2014: Extend converged-system benefits with Isilon scale-out; augment enterprise apps and data with Hadoop, Splunk, and NoSQL. Great performance!
     • 2015: Internet of Things initiatives accelerate. Rapid technological advancements with architectural flexibility: Vscale.
  6. ANATOMY OF A MODERN DIGITAL BUSINESS / A MAJOR PRESSING CHALLENGE
     BUSINESS DRIVERS
     • New systems of engagement
     • New business models
     • Internet of Things
     CAPABILITIES NEEDED
     • Platform, Data, Algorithms (Code)
     • "Catch people or things in the act and affect the outcome" = $$$
     • Compelling, Unique User Experience/Model
     • Existing Systems
     ENABLED BY
     • The Private/Public Cloud: "Infinite, inexpensive compute and storage"
     • Agile Product Development Culture
     Analytics/BI
     • How do we architect for agile data-driven business?
     • Can we manage big, fast data?
     • Value driven
     CIO
     • Meet future business needs while simplifying and taking cost out of legacy?
     • Avoid lock-in again?
     • People and organization
     CEO/CMO
     • How do we become an agile, digital business?
     • Anticipate and delight customers?
     • Partner collaboration
     • Where/how do we start?
  7. Typical Customer Pains and Technical Problems (column headings: CUSTOMER PAINS, TECHNICAL PROBLEMS)
     • Sub-optimal environment: data locked in high volume, variety, or velocity.
     • Lack of service enablement: difficulties in optimizing a virtualized, multi-tenant service approach.
     • Compliance/security exposure: lack of encryption, exposure, and data loss.
     • Limited standardization: not using data center standards.
     • Downtime/SLA issues: not readily configurable to handle mixed workloads.
     • System utilization: inefficient islands of storage and systems, inability to reuse data for multiple solutions.

     • Long cycles for accessing and sharing information locked in unstructured data.
     • Cannot rapidly create value via technology-enabled XaaS.
     • Explicitly demonstrate security, compliance, and governance.
     • Inability to plan a system progression that combines structured and unstructured data, exacerbating silos of appliances and hardware.
     • Insufficient posture against outages and peak periods of IT use.
     • Escalating deployment, management, and maintenance costs for growing data.
  8. CONVERSATIONS LEAD TO PLATFORM EVOLUTION
     Conversations
     Downtime and response time issues missing business SLA
     • Increased flash use
     • Continuous need for migration
     • Network scale points
     • Data mobility
     • Hadoop, Splunk, PaaS, Cassandra, MongoDB, Legacy DB
     • Aggregate/disaggregate pool of resources
     • Control required for application proliferation
     Faster time to drive value from innovation; multitude of applications
     • Mobile and social offers
     • Turn 360 degree insight to customer acquisitions
     • Fulfillment, inventory and customer management
     AWS is costing too much but business wants faster go live and flexibility
  9. VCE VSCALE™ ARCHITECTURE: FLEXIBLE SCALE-OUT THROUGH EXPANDED MULTI-SYSTEM ARCHITECTURE
     Workloads shown: MPP DB; Hadoop PROD & DR; In memory DB; BI / DW; Enterprise App - SAP; Microsoft Email, collaboration; Hadoop POC; Pivotal Cloud Foundry; Video Surveillance
  10. Edge & Central Analytics Workflow (diagram)
      Platform: Isilon OneFS, replicated with SyncIQ across External WAN and Internal WAN
      Protocols and clients: Swift, HTTP, RAN | DAV, FTP, HDFS, NFS, SMB, Glance, Oracle NFS, Mediation, App Server, Log
      Callouts: Easy to Grow; Manage & Administer; Additional Clients to More Content; Multiprotocol Access to Same Data
  11. vHadoop+Isilon Install & Deployment Guide
  12. "Fix These Problems… Prove it Out!"
      • Expensive and Won't Scale
        – Hundreds of servers to support less than 2PB usable storage (1:7 ratio)
        – "We have a guy with shopping carts walking down the rows replacing parts"
        – Additional staging area for data before ingesting into Hadoop
        – Can't scale storage without compute
        – Locked and not elastic
      • Lacks Enterprise Features
        – No cost-effective data redundancy
        – Limited file-system security, only Simple authentication
        – Multiple points of failure
        – Maintaining Hadoop "PODs" involves significant downtime
      • Time To Results
        – Requires significant time to ingest and copy data
        – Building production Hadoop "PODs" can take months
        – Network infrastructure saturation and expense
  13. EMC Isilon Enabled Workflows (diagram; protocols shown: NFS, SMB, SWIFT, HDFS, RAN, FTP)
  14. EMC Isilon Enabled Hadoop (diagram)
      Access protocols: SMB, NFS, HTTP, FTP, HDFS
      Diagram labels: name node, datanode, MapReduce, node info, node reply, file, Original Data, Isilon OneFS, Compute, Data, 1X
  15. Hadoop Test Findings
      • Created and tuned Hadoop VMs to maximize throughput
        – >90% utilization of CPUs for compute
        – Memory footprint reduced (memory page sharing across VMs)
        – Hadoop 2.0 with YARN does not need flash for HDFS
      • Incremental testing to validate scalability
        – Validated 2:1 ratio of compute nodes to Isilon nodes (can also support 3:1)
        – 2 VMs per compute node for optimal performance on dual socket
        – Linear scalability in performance by incrementally adding more compute
      • Validated enterprise/production ready
        – Security: greater with AD authorization and access; no need to anonymize data; whitepaper created
        – Deployment and upgrade of hardware and software in hours, not days/weeks
        – Validated reduced data-center footprint and environmentals with UCS Blade Servers, vHadoop & Isilon
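The ratios reported in the Hadoop test findings can be multiplied out into a rough sizing estimate. The sketch below is illustrative only: the function name `plan_compute` and its parameters are hypothetical, and it does nothing more than apply the validated 2:1 (optionally 3:1) compute-to-Isilon node ratio and the 2 VMs per compute node stated on the slide. It is not a VCE or EMC sizing tool.

```python
from typing import Tuple


def plan_compute(isilon_nodes: int,
                 compute_ratio: int = 2,
                 vms_per_node: int = 2) -> Tuple[int, int]:
    """Return (compute_nodes, hadoop_vms) for a given number of Isilon nodes.

    compute_ratio: compute nodes per Isilon node (slide: 2 validated, 3 supported).
    vms_per_node:  Hadoop VMs per dual-socket compute node (slide: 2).
    """
    if min(isilon_nodes, compute_ratio, vms_per_node) < 1:
        raise ValueError("all arguments must be positive integers")
    compute_nodes = isilon_nodes * compute_ratio
    return compute_nodes, compute_nodes * vms_per_node


# Hypothetical example: a 4-node Isilon cluster at the validated 2:1 ratio.
nodes, vms = plan_compute(4)
print(f"{nodes} compute nodes, {vms} Hadoop VMs")
```

Because compute and storage scale independently in this design, the same function can be rerun with a different ratio without changing the storage side.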
  16. 1TB Hadoop Job Cycle Comparison: Isilon Significantly Reduces Time To Results
      Timeline figure:
      • Traditional Hadoop+DAS: 17:32, 30:18, 20:50, 20:50 (TTR: 89.3 minutes!)
      • Isilon Enabled vHadoop: 18:51

      Terasort Test on 1TB    DAS      Isilon   Benefit
      MB/s Per Node           55.00    85.00    55%
      Compute Min             30.18    18.51    -39%
      TTR Min                 89.30    18.51    -79%

      Isilon Advantages
      • Eliminates all data movement
      • Allows for virtualized compute
      • Significantly less cost
      • 79% faster TTR!
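The figures on the 1TB job cycle slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the four Traditional Hadoop+DAS stage times are 17:32, 30:18, 20:50, and 20:50 (mm:ss) and that the Benefit column is the percentage change from DAS to Isilon. The helper names are hypothetical. The table prints times with a decimal point (30.18, 18.51, 89.30), and the code uses those values exactly as printed, which is what reproduces the slide's rounded percentages.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def pct_change(baseline: float, new: float) -> float:
    """Percentage change from baseline to new (negative means a reduction)."""
    return (new - baseline) / baseline * 100.0


# Stage times for Traditional Hadoop+DAS as read from the slide (assumed split).
das_stages = ["17:32", "30:18", "20:50", "20:50"]
das_total = sum(to_seconds(s) for s in das_stages)
das_total_mmss = f"{das_total // 60}:{das_total % 60:02d}"

# (DAS, Isilon) pairs exactly as printed in the slide's table.
table = {
    "MB/s Per Node": (55.00, 85.00),
    "Compute Min": (30.18, 18.51),
    "TTR Min": (89.30, 18.51),
}
benefits = {name: round(pct_change(das, isilon))
            for name, (das, isilon) in table.items()}

print("DAS time to results:", das_total_mmss)
for name, value in benefits.items():
    print(f"{name}: {value:+d}%")
```

The stage times sum to the 89:30 time to results quoted for the DAS configuration, so the TTR row is consistent with the timeline figure.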
  17. EMC Isilon – Only Security Compliant Datastore for Hadoop
      • Highly resilient architecture
        – Robust data protection options (DR, Snapshots, SyncIQ)
        – Clustered Multi-Point Name Node with Kerberos
        – SEC 17a-4 compliant WORM
        – Hadoop multi-tenancy with dedicated network and access zones
      • Hadoop on Isilon provides full ACLs for NFS, SMB, and HDFS
        – Each file/directory has an Access Control List (ACL) consisting of one or more Access Control Entries (ACEs).
        – Each ACE assigns a set of permissions (read, write, delete) to a specific security identifier (user or group).
        – Deny ACEs remove permissions and override any "Allow" ACEs.
      • Standard Hadoop only provides basic Unix-type "Simple" permissions
        – Effective permissions are determined based on the file owner (single user, single group, other/world).
        – Read and/or write permissions can be assigned to the owner, the group, and "everyone else".
        – What do you do when you need to assign read access to multiple groups (A, B & C)?
        – What do you do when you need to assign read access to group A and read+write access to group B?
        – How do you maintain permissions when files are copied from Windows NTFS shares?
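The ACL model described on the security slide (ACEs that grant permissions to a user or group, with Deny ACEs overriding Allow ACEs) can be illustrated with a small sketch. This is a deliberately simplified model, not the OneFS or HDFS implementation: the `ACE` class, the `effective_permissions` function, and the group names are all hypothetical, and real ACL evaluation involves details such as entry ordering and inheritance that are ignored here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class ACE:
    """One access control entry: a principal, a permission set, allow or deny."""
    principal: str                # user or group identifier
    permissions: FrozenSet[str]   # e.g. {"read", "write", "delete"}
    allow: bool = True            # False marks a Deny ACE


def effective_permissions(acl: Iterable[ACE], identities: Set[str]) -> Set[str]:
    """Union the Allow ACEs matching the caller, then subtract every Deny ACE."""
    allowed: Set[str] = set()
    denied: Set[str] = set()
    for ace in acl:
        if ace.principal in identities:
            (allowed if ace.allow else denied).update(ace.permissions)
    return allowed - denied       # deny overrides allow


# The slide's question: read for group A, read+write for group B, on one file.
acl = [
    ACE("group_a", frozenset({"read"})),
    ACE("group_b", frozenset({"read", "write"})),
    ACE("contractors", frozenset({"write"}), allow=False),
]
print(effective_permissions(acl, {"alice", "group_a"}))
print(effective_permissions(acl, {"carol", "group_b", "contractors"}))
```

A single owner/group/other permission triple cannot express the two-group case above, which is the gap the slide's closing questions point at.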
  18. Supporting Documentation
  19. HCFS Certification: Process Detail (table columns: Certification Step, Duration)
      • Partner Prep
        – Partner defines HDP test matrix (platforms, HDP components, HDFS APIs, HDP version and partner product version)
        – Partner provides sample product to Hortonworks so Engineering and Field teams are familiar with partner technology
      • Testing
        – HDFS Test Suite training, at Hortonworks HQ and online
        – Partner deploys, runs, analyzes, and reports HDFS Test Suite with technical support from Hortonworks
        – HDP Core Test Suite (Map/Reduce, YARN, Tez and HBase, Hive and Pig) training, at Hortonworks HQ and online
        – Partner deploys, runs, analyzes, and reports HDP Core Test Suite with technical support from Hortonworks
        – Partner deploys, runs, analyzes, and reports on remaining HDP Component Test Suites with technical support from Hortonworks
        – Testing time allocation
      • Documentation
        – Joint review of test suite execution results
        – Hortonworks creates functional gap analysis document; needs partner sign-off
        – Documentation time allocation
      • Validation
        – Hortonworks validates test suite execution results and certifies HCFS for specified HDP version and partner product version
      • Total certification time allocation: 90-180 days
  20. Scale-out Isilon for Scale-out Hadoop Compute Nodes
      • Isilon is a scale-out system; Hadoop HDFS is partially similar
      • HDFS on Isilon functions as a parallel file system
      • Each compute node performs I/O on every Isilon node in the rack
      • I/O bandwidth and storage capacity can be increased linearly simply by adding Isilon nodes
      • Compute can be increased or decreased on the fly and can easily be virtualized
      • With a mesh network that is faster than the disks, data locality is irrelevant
      (Diagram label: Isilon Nodes)
  21. Hadoop Architecture – Traditional DAS
      Dozens of Hadoop racks require significant investment in network infrastructure.
      Diagram labels: Core Ethernet Switch; Rack Ethernet Switch; Compute; Shuffle+HDFS; SATA; 10 Gbps; 10+ Gbps
      • The ratio of compute and disk space/performance is fixed.
      • Non-local HDFS I/O (30-90% of HDFS I/O) will go through Ethernet.
      • Local disk usage is shared between shuffle I/O (60% of all I/O during terasort) and HDFS I/O.
      • Core network switches are an additional cost for Hadoop+DAS (more network traffic required).
  22. Hadoop Architecture – Isilon for HDFS
      Diagram labels: Core Ethernet Switch; Rack Ethernet Switch; Isilon InfiniBand Switch; IB; Compute; Shuffle; SATA; Isilon HDFS; 10 Gbps; 10+ Gbps (used only for MR copy phase)
      • Reduced traffic across the Core Ethernet switch: HDFS traffic will only travel within a rack and across IB.
      • The number of compute and Isilon nodes can be adjusted independently to achieve the optimal ratio of compute and I/O bandwidth.
      • HDFS I/O ALWAYS comes through a rack-local Isilon node, which collects data blocks from all other Isilon nodes across the InfiniBand fabric.
      • Shuffle I/O (65% of all I/O during terasort) remains on local storage.
  23. Traditional Hadoop - Layers
  24. Isilon+Hadoop – NO Layers
  25. ESG LAB REVIEW – VBLOCK SYSTEMS WITH VCE TECHNOLOGY EXTENSION FOR EMC ISILON
      • Objectives
        – Underscore business challenges and opportunities for progressing to enterprise Hadoop
        – Establish requirements to be ready for production: Extensibility, Governance, Security, Availability, Performance and Multi-Use
        – Perform benchmarks on Vblock System 340 with EMC Isilon using the TeraGen suite
      "By leveraging an industry-proven integrated computing platform (ICP) in VCE Vblock Systems and combining it with EMC Isilon and VMware vSphere Big Data Extensions, organizations get a fully integrated platform that meets and grows with their big data and analytics requirements." (Tony Palmer, Senior Lab Analyst, ESG)
  26. ESG LAB OBSERVATION ON TERAGEN BENCHMARKS
      Chart: Comparing Performance of Traditional Hadoop to VCE Vblock System with EMC Isilon (TeraSort Suite)
      Y axis: Job Duration (seconds), 0 to 1,400. Categories: TeraGen, TeraSort, TeraValidate.
      Series: 16 Traditional Hadoop Nodes (combined Compute and DAS); 16 VCE Compute Nodes and EMC Isilon Storage
  27. VCE CUSTOMER BENEFITS
  28. VCE LOWERS OPERATIONAL COSTS
      Chart: Before Vblock System Deployment vs. After Vblock System Deployment
      Categories: IT Staff Cost, Facilities, Infrastructure. Values shown: 41%, 13%, 38%
      Source: IDC Research Study of VCE Customers, September 2013
  29. GAS AND UTILITY LEADER: Drive to clean energy transformation while managing cost and risk
      • Situation
        – Largest provider of gas and electric energy in the US. Innovate to drive a clean, sustainable future. Better management of costs and risks using predictive models. Operational improvement and compliance management. Expected data growth and application complexity with smart meter data management.
      • Solution
        – Vblock System 340 to be used for private and public cloud in the hybrid cloud model, keeping custom applications and sensitive data in-house while pushing others to public. Initiated with Pivotal to become a software-led company with Pivotal CF.
      • Anticipated Business Benefits
        – Increase agility for application deployment using Platform as a Service (PaaS) and big data solution
        – Support 600+ new applications planned annually, faster and at lower cost
        – Improve disaster recovery readiness and data protection
        – Lower costs and detect issues by enabling field personnel
        – Increased customer satisfaction, including cost savings via meter data
      • Differentiators: Suited to hybrid cloud model and future expansion (upgrades and scaling). Extending the VCE-Pivotal-EMC relationship while being open to tap the eco-system.
  30. FOOD AND BEVERAGE GIANT: Financial reporting and marketing analysis; back to private cloud
      • Situation
        – Global food and beverage conglomerate aiming to accelerate financial reporting and reflect customer behaviors. Seeking a better alternative to a third-party cloud-based model. Operational improvement and customer intimacy with leading brand recognition throughout the world. Data loading, processing and end-user impact are crucial.
      • Solution
        – Use Vblock System for a shuffle and extend with VCE technology extension for EMC Isilon to run Pivotal Hadoop and HAWQ. For Pivotal Greenplum, use VCE technology extension for compute (Cisco C240). Bring some of the core applications to corporate IT.
      • Anticipated Business Benefits
        – Streamline the financial reporting process for goods coming from multiple geographies while keeping data up to date and supporting a broadening range of user queries
        – Exploit mobile applications for customer preferences and inventory management
        – Support product launches and marketing campaigns based on consumption logs, brand preferences and social media
        – Improve disaster recovery readiness and data protection
        – Start with one project, gain momentum while ensuring readiness for the future
      • Differentiators: Ability to match architecture to workloads. Reuse existing environment. Extensible for future growth.
  31. WHY VCE AND EMC FOR SCALE-OUT CONVERGED ANALYTIC SOLUTION?
      • Adaptable, modular, and mission critical
      • Incremental scaling with your demand from the broad VCE and EMC portfolio
      • Pre-tested, validated and certified by EMC and VCE
      • Exploit end-to-end analytics on the SAME VCE and EMC platform
      • Take advantage of broadening EMC partner eco-system
      • Contact your EMC or VCE representatives
      • Contact: EMC: dan.beres@isilon.com; VCE: julianna.delua@vce.com