EMC Big Data Solutions Overview

Overview of emerging EMC Big Data solutions using Hadoop and Splunk

Published in: Technology, Business
  • Enterprise Data Warehouse Offload: Enterprises hold large amounts of data in expensive data warehouses (Teradata, Netezza and others), and customers pay over $30k per TB for EDW capacity today. RainStor has co-engineered data movement solutions for Teradata offload. RainStor’s compression combined with Isilon’s utilization rate yields over 90% cost savings for the customer. Analytical Archive (Hadoop): Customers are looking to create a Big Data analytical platform, mostly with Hadoop. RainStor’s compression plus Isilon’s scale-out attributes and value propositions (including security and SQL access) fit well in a centralized data archiving architecture that scales with ease, with or without Hadoop. Compete effectively against Hadoop-on-DAS environments with RainStor + Isilon scale-out NAS. Compliance Archive: Customers must adhere to a number of regulations (SEC 17a-4, Basel, Dodd-Frank, HIPAA), depending on vertical and country. Customers, financial services in particular, are fined heavily for not keeping historical data long enough and accessible at all times. RainStor + Isilon is the ONLY solution in the market with built-in compliance and audit functionality. Tape Avoidance/Replacement: Most enterprise customers have data stored on tape. With ever-decreasing IT budgets, tape is a convenient and cost-effective archival strategy, but data on tape is inaccessible and inefficient. Move PBs of tape data onto RainStor + Isilon at a reasonable cost: tape is $0.04/GB, RainStor + Isilon is $0.06-0.08/GB.
  • At Splunk, our mission is to make machine data accessible, usable and valuable to everyone. This overarching mission is what drives our company and product priorities.
  • Splunk now has more than 850 employees worldwide, with headquarters in San Francisco and 14 offices around the world. Since first shipping its software in 2006, Splunk has gained over 6,000 customers in 90+ countries. These organizations use Splunk software to improve service levels, reduce operations costs, mitigate security risks, enable compliance, enhance DevOps collaboration and create new product and service offerings. Please always refer to the latest company data found here: http://www.splunk.com/company.
  • Splunk is the leading platform for machine data analytics, with over 6,000 organizations using Splunk (as of 9/1/13) for data volumes ranging from tens of GBs to tens of TBs to over 100 TBs of data per day. Splunk software reliably collects and indexes all the streaming data from IT systems, technology devices and the Internet of Things in real time: tens of thousands of sources in unpredictable formats and types. Splunk software is optimized for real-time, low latency and interactivity. Organizations use Splunk software and their data in the following ways: 1. Find and fix problems dramatically faster. 2. Automatically monitor to identify issues, problems and attacks. 3. Gain end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions. 4. Gain real-time insight from operational data to make better-informed business decisions. This is described as Operational Intelligence: visibility, insights and intelligence from operational data.
  • Isilon has many different uses in healthcare. Beyond healthcare-specific applications such as PACS and VNA, many other scenarios benefit from the scale-out capabilities of Isilon, including more horizontal applications such as file shares and video surveillance. EMC Syncplicity and Isilon can work together to combine the flexibility and ease of use of EMC Syncplicity’s file sync and sharing technology with a secure, on-premise storage infrastructure. Data remains on-premise on Isilon and subject to all IT data governance and protection policies. Files are not replicated in the cloud and remain under IT control. This allows the customer to remain HIPAA compliant while giving employees access to data across their devices anytime, anywhere. With Pivotal HD, combined with EMC Isilon's native integration of the Hadoop Distributed File System (HDFS) protocol, customers have an enterprise-proven Hadoop solution on a scale-out NAS architecture. This combination reduces the complexities traditionally associated with Hadoop deployments and allows enterprises to easily extract business value from unstructured data. An emerging area in healthcare is the use of Clinical Next Generation Sequencing (NGS). NGS has been around in a research setting for decades. However, the cost is reaching a point where it is becoming viable in a clinical setting. Leading institutions, children’s hospitals and specialty hospitals are starting to deploy NGS. For this presentation, we will focus on the top 2 sections.
  • Naively converting slave nodes to VMs. Place NodeManager and DataNode JVMs in different VMs. Each can scale independently. Common storage layer.
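The per-GB price points quoted in the Tape Avoidance/Replacement note above can be turned into a rough archive-level comparison. The following is a minimal sketch only: the function names are hypothetical, the prices are the ones quoted in the note, and it assumes decimal units (1 PB = 1,000,000 GB) and a flat per-GB price with no compression, media refresh or operational costs factored in.

```python
# Per-GB price points as quoted in the speaker notes.
TAPE_USD_PER_GB = 0.04
RAINSTOR_ISILON_USD_PER_GB = (0.06, 0.08)  # low and high end of the quoted range

# Assumption: decimal units, 1 PB = 10^6 GB.
GB_PER_PB = 1_000_000


def archive_cost_usd(petabytes: float, usd_per_gb: float) -> float:
    """Cost in USD of holding `petabytes` of data at a flat per-GB price."""
    return petabytes * GB_PER_PB * usd_per_gb


def compare(petabytes: float) -> dict:
    """Return tape cost and the low/high RainStor + Isilon cost for an archive."""
    low, high = RAINSTOR_ISILON_USD_PER_GB
    return {
        "tape": archive_cost_usd(petabytes, TAPE_USD_PER_GB),
        "rainstor_isilon_low": archive_cost_usd(petabytes, low),
        "rainstor_isilon_high": archive_cost_usd(petabytes, high),
    }


if __name__ == "__main__":
    for name, cost in compare(1.0).items():
        print(f"{name}: ${cost:,.0f}")
```

For a 1 PB archive this works out to roughly $40,000 on tape against $60,000 to $80,000 on RainStor + Isilon, which is the premium the note weighs against keeping the data online and queryable.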
  • Transcript

    • 1. EMC Big Data Solutions Overview. © Copyright 2014 EMC Corporation. All rights reserved.
    • 2. Big Data: Why do I care? The digital universe is expanding rapidly: 44x to 50x data expansion this decade; 40 ZB (40 trillion GB) by 2020; 1.7 MB of new information will be created for every human being on the planet, every second of every day. 41% growth of IoT and M2M data: the share of data generated about us is exploding; the share of data tagged and analyzed is exploding. Emerging markets: +62% of data, 22% from China alone. IT challenges: servers will increase 10x; information directly managed by enterprises will grow 14%; data under security governance will grow 40%; the number of IT professionals is expected to grow by only a factor of 1.5x by 2020.
    • 3. Big Data Challenges for IT. Complexity: multiple Hadoop distributions (Apache, Cloudera, Hortonworks, Pivotal). Costs: acquisition and operations. Security & Governance: finance (SEC 17a-4), HIPAA, ISO, audit. Big Data is more than Hadoop: use familiar analytics tools.
    • 4. EMC Hadoop Starter Kit
    • 5. EMC Starter Kit for Hadoop: Simple, Easy, Cost Effective. Create a simplified process to get started with Hadoop: 4-8 node cluster; automated, repeatable deployment; leverage existing infrastructure investment. Success criteria: low or no new cost; 2-hour customer deployment; make it easy to leverage familiar, robust enterprise infrastructure.
    • 6. EMC Hadoop Starter Kit. EMC-VMware deployment guide: enable HDFS on Isilon cluster; deploy Cloudera compute cluster; deploy Hortonworks compute cluster; deploy PivotalHD compute cluster; deploy Apache compute cluster; test data set: Ulysses with MapReduce process; collateral available through ECN, blogs, and Twitter. Running deployment in OIL for demos and pilots. EMC vLab created: PivotalHD with VMware and EMC Isilon.
    • 7. EMC Hadoop Starter Kit. How do I get free access to the Hadoop Starter Kit? Type “EMC hadoop Starter kit” into Google, or visit: https://community.emc.com/community/connect/everything_big_data ; https://community.emc.com/docs/DOC-26892 ; http://theruddyduck.typepad.com/ ; https://www.youtube.com/watch?feature=player_embedded&v=MtBRbTeJbZM ; https://www.youtube.com/watch?feature=player_embedded&v=1Lch5e3wGtA. Key Data Sets: close to 4,300 views; HSK downloads: Pivotal 410, Cloudera 261, Hortonworks 275, Apache 310; over 150 Isilon HDFS licenses deployed worldwide.
    • 8. EMC ViPR with HDFS
    • 9. VCE Vblock™ Turnkey Solution for Big Data and Analytics: VMware vSphere including Big Data Extensions (BDE); Cisco Unified Computing System (UCS) servers; Cisco Data Center and Cloud Networking (DCN) portfolio; EMC Symmetrix VMAX, VNX and Isilon; EMC Avamar, Data Domain, VPLEX, RecoverPoint.
    • 10. VCE Vblock™ Converged Platform for Big Data and Analytics
    • 11. Big Data Challenges for IT. Complexity: multiple Hadoop distributions (Apache, Cloudera, Hortonworks, Pivotal). Costs: acquisition and operations. Security & Governance: finance (SEC 17a-4), HIPAA, ISO, audit. Big Data is more than Hadoop: use familiar analytics tools.
    • 12. Jyothi Swaroop, Director, Product Marketing & Alliances
    • 13. RainStor & EMC Isilon: Solution & Use Cases. Enterprise data use cases: Analytical Archive; Enterprise Data Warehouse Offload; Compliance Archive; Tape Avoidance/Replacement. First SQL-compatible, enterprise-grade database to run on Isilon scale-out NAS (with Hadoop or not).
    • 14. RainStor Architecture
    • 15. Hadoop Data Security. Authentication: RBAC. Authorization: ACLs by user. Encryption: data at rest. Audit trail: logs data access by user for audit. Immutability: data can never be changed.
    • 16. Big Data Challenges for IT. Complexity: multiple Hadoop distributions (Apache, Cloudera, Hortonworks, Pivotal). Costs: acquisition and operations. Security & Governance: finance (SEC 17a-4), HIPAA, ISO, audit. Big Data is more than Hadoop: use familiar analytics tools.
    • 17. Big Data with Splunk
    • 18. Splunk Company Highlights. Company (SPLK: >100% IPO): founded 2004; first software in 2006; HQ: San Francisco, CA; AP HQ: Hong Kong; EMEA HQ: London; 850+ employees; 8+ offices worldwide. Products/Business Model: on premise, SaaS or in the cloud, licensed by daily index volume; free 500 MB trial download (same bits); scales from 500 MB to 100s of TBs/day. Business Highlights: 6,000+ customers; 60+ of the Fortune 100; 90+ countries.
    • 19. Industry Leading Platform for Machine Data. Operational Intelligence: Search and Investigation; Proactive Monitoring; Operational Visibility; Real-time Business Insights. Any machine data: Online Services, Web Services, Security, Servers, GPS Location, Networks, Storage, Packaged Applications, Desktops, Messaging, Online Shopping Cart, Telecoms, RFID, Energy Meters, Databases, Web Clickstreams, Custom Applications, Call Detail Records, Smartphones and Devices. Infrastructure: EMC Storage; Commodity Servers.
    • 20. Industry Leading Platform for Machine Data. Operational Intelligence: Search and Investigation; Proactive Monitoring; Operational Visibility; Real-time Business Insights. Any machine data: Online Services, Web Services, Security, Servers, GPS Location, Networks, Storage, Packaged Applications, Desktops, Messaging, Online Shopping Cart, Telecoms, RFID, Energy Meters, Databases, Web Clickstreams, Custom Applications, Call Detail Records, Smartphones and Devices. Platform attributes: schema on-the-fly; universal forwarding; no back-end RDBMS; no need to filter data; any amount, any location, any source. Infrastructure: HA Indexes and Storage; Commodity Servers.
    • 21. EMC Starter Kit for Splunk. Splunk is easy to set up and deploy; infrastructure for Splunk should be easy and inexpensive; use familiar, robust IT infrastructure; leverage existing IT investment; provide a reliable, repeatable, tested solution. How do I get free access to the EMC-Splunk Starter Kit? Type “EMC reference architecture for splunk” into Google, or see https://community.emc.com/docs/DOC-27406. Over 1,000 views.
    • 22. Splunk Performance with Shared Storage & Compute. [Figure: four bar charts comparing Isilon, DAS and EC2. Panels: Time to 1st event (s), single search (bar labels 2.499, 2.48, 3.02); Time to search (s), single search (bar labels 18.07, 26.50, 20.18); Average KBPS (1000s), single index (bar labels 22,400, 10,944, 10,649); Average EPS (1000s), single index (bar labels 79,057, 38,730, 37,574). Note on slide: RAID 10, 6x15k RPM.]
    • 23. EMC Solutions for Hadoop Partners. Big Data on Vblock; many joint Pivotal on EMC customers; formal collaboration established; jointly architected Vblock for Hadoop with VMware, Cisco, EMC; several customer pilots; officially support Isilon; co-branded HSK for Cloudera; many joint customers; several key wins; co-branded HSK for Splunk; Hadoop wins; enabling service providers (HDaaS); many installed wins with all of the major distributions; two new case studies; many joint customers; joint support.
    • 24. Why Use Shared Infrastructure for Hadoop?
    • 25. Hadoop Deployment Models. Hadoop in VM (slave node VMs with combined storage/compute): VM lifecycle determined by DataNode; limited elasticity; limited to Hadoop multi-tenancy. Separate Storage (compute VMs separated from storage): separate compute from data; elastic compute; enable shared workloads; raise utilization. Separate Compute Tenant (tenants T1, T2 over shared storage): separate virtual clusters per tenant; stronger VM-grade security and resource isolation; enable deployment of multiple Hadoop runtime versions.
    • 26. Why HDFS on EMC (Isilon) shared storage: no ingest necessary; eliminate NameNode SPOF; eliminate 3x mirroring; enterprise feature set; multi-protocol access; simultaneous multi-distribution support; better cost; SmartDedupe for Hadoop; SEC 17a-4 compliant WORM; Kerberos authentication; Hadoop multi-tenancy; simultaneous distribution version support; great performance.
    • 27. Why Virtualize Hadoop? Operational Simplicity with Performance, and Maximize Resource Utilization on New or Existing Hardware: rapid deployment; true multi-tenancy; self-service tools; elastic scaling; automated resource rebalancing; avoid dedicated hardware; performance. Architect Scalable and Flexible Big Data Platform: choice of distributions and storage; VM-based isolation; maintain management flexibility at scale; increase resource utilization; leverage vSphere features.
    • 28. Performance: Native vs. Virtual, 32 hosts, 16 disks/host. Source: http://www.vmware.com/resources/techresources/10360
    • 29. Pivotal-Isilon Alliance: Federation Plan & Field Momentum, Q4 2013. Copyright 2013 Pivotal. All rights reserved.
    • 30. Pivotal Overview: developer-friendly; industry-leading application framework and runtimes; complete and disruptive set of data products; services that accelerate productivity; multi-cloud deployment; commitment to open source and open standards; data science team.
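The Hadoop Starter Kit slide above lists "Ulysses with MapReduce process" as its test data set, but the deck does not say which job is run. The sketch below assumes the conventional smoke test, a word count, and shows the map, shuffle and reduce phases in plain Python. It runs in a single process and does not use any Hadoop API; the function names and the sample lines are hypothetical and only illustrate the shape of the computation.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read the text file from HDFS.
    sample = ["the quick brown fox", "the lazy dog and the fox"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the mapper and reducer would run as separate tasks across nodes and the shuffle would be handled by the framework; the value of a job like this as a deployment test is that it exercises reading from HDFS, task scheduling and writing results back.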
