EMC Hadoop Starter Kit - ViPR Edition

Are you deploying Hadoop and want enterprise infrastructure manageability, reliability, and availability? The new EMC Hadoop Starter Kit shows you how to do this without building HDFS data silos.

  • We are experiencing a perfect storm of technology and analytic innovation. In the past, analysis started with a hypothesis and a corresponding set of data with specific elements that needed to be collected. The collected data was scrubbed and stored in neat columns and rows, and analysis depended on precise data collection. Today, with the falling cost of storing and processing data and the growing amount of data we can collect, analysis is based on discovering correlations.
  • Today data is being collected and stored, and that data is available for analysis. Analytics processing no longer depends on neat data, because the size of the data sets minimizes the impact of anomalies. New analytic systems such as Hadoop have been created and optimized for this type of analysis. For an IT provider, what are the challenges associated with deploying Hadoop?
  • The Hadoop Distributed File System (HDFS) is becoming increasingly popular as a file system layer for distributed applications beyond Hadoop. Scenarios: high aggregate throughput access to data (e.g. MapReduce) and, in some cases, low-latency access. Concerns: scale, durability, cost, management. HDFS is becoming a de facto file system for distributed applications, but it has challenges and limitations that have slowed adoption within enterprises. Many enterprises don’t have Big Data analytics expertise. Also, enterprises have experimented in the lab, which means they have a dedicated Hadoop cluster and need a way to move or copy data into the cluster for analytics. Data that has value can be hard to identify or to move from where it primarily resides. ViPR offers a great platform for HDFS. By delivering HDFS as a data service rather than as a file system on dedicated infrastructure, it brings the capabilities of HDFS to the data where it resides. Much like the object data service, it enables hybrid data types such as HDFS-on-object or HDFS-on-file. The HDFS data service is built in software, so it allows for colocation.
  • In addition to physical segregation, buckets provide logical segregation within the object store. Just like in S3, a user can create buckets that logically segregate applications or sets of data. These buckets can grow and shrink on demand. The actual data objects are distributed and intermingled across the physical devices that comprise the virtual storage array.
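
Slide 4 lists "3 copies for protection" as an HDFS inefficiency, and the notes name cost as a concern. The arithmetic behind that can be sketched as follows. This is a minimal illustration that assumes plain N-way replication and ignores compression, temporary space, and other overheads. The function names are hypothetical and not part of any Hadoop or ViPR API.

```python
def raw_capacity_tb(usable_tb: float, copies: int = 3) -> float:
    """Raw disk (TB) needed to hold `usable_tb` of data when every block
    is stored `copies` times. Assumes plain replication only."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return usable_tb * copies


def redundancy_fraction(copies: int = 3) -> float:
    """Fraction of raw capacity consumed by the redundant copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return (copies - 1) / copies


if __name__ == "__main__":
    # With three copies, 100 TB of data occupies 300 TB of raw disk,
    # and two thirds of that raw capacity holds redundant copies.
    print(raw_capacity_tb(100))
    print(round(redundancy_fraction(3), 3))
```

Under these assumptions, every terabyte of analyzed data costs three terabytes of raw disk, which is the overhead the slide refers to.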

    1. © Copyright 2014 EMC Corporation. All rights reserved. EMC Hadoop Starter Kit, ViPR Edition. EMC Open Innovation Lab.
    2. The Digital Universe. Less than 1% of the world’s data is analyzed. By 2020, the Internet will connect 7.6B people and 200B things (sensors, machines, cars, appliances…). Data volumes: 2000: 2 exabytes a year; 2011: 2 exabytes a day.
    3. Location & Types Of Big (& Fast!) Data. Diagram labels: Structured Data; Unstructured Data; Enterprise; Forecast Data; Location Data; Credit Data; Shipping Data; Social, Video Data; Partner; Public; Telemetry Data.
    4. Hadoop Challenges. Depends on HDFS for data repository: must make legacy data accessible through HDFS. Hadoop HDFS inefficiencies: 3 copies for protection; no advanced data efficiency (de-duplication, thin provisioning); security. Integration with robust traditional data center products: compute virtualization, enterprise storage.
    5. Hadoop Storage Options. Hadoop HDFS: leverage Hadoop distro HDFS data services; compute and data converged on a cluster of servers. Storage Array: name node and data node services from the storage array (e.g. EMC Isilon). Storage OS: name node and data node services from the storage OS (e.g. EMC ViPR).
    6. ViPR HDFS. HDFS is becoming the de facto file system for distributed applications. ViPR is a great platform for HDFS: addresses limitations of off-the-shelf HDFS; brings HDFS to existing storage hardware; enables HDFS/object/file scenarios; flexible software model allows colocation.
    7. Support Mixed Workloads. Object, File and HDFS operations on the same data. Virtual array: Isilon, 3rd Party, VNX 5500. ViPR Data Services offer three bucket options: Object, HDFS, ObjectandHDFS. ObjectandHDFS provides the user with access to either S3 or HDFS. Full compatibility with existing object-based APIs: Amazon S3, OpenStack Swift, Atmos. Diagram labels: Object; HDFS; Object & HDFS.
    8. Simple, Easy, Cost Effective. EMC Starter Kit for Hadoop, ViPR Edition. Deployment guides for major Hadoop distributions: Pivotal, Cloudera, and Hortonworks. Four-step deployment: (1) deploy preferred Hadoop distribution; (2) deploy EMC ViPR with Object and HDFS data services; (3) configure Hadoop distribution to use ViPR HDFS target; (4) validation process: load data file via S3 interface, then test a MapReduce job.
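
Slide 7 names three bucket options and states that an ObjectandHDFS bucket can be reached through either S3 or HDFS. The sketch below models that rule as a small lookup table. It is only an illustration: the class, table, and function names are hypothetical and are not ViPR APIs. It also assumes that an Object bucket is reachable only through object interfaces and an HDFS bucket only through HDFS, which the slide implies but does not state.

```python
from enum import Enum


class BucketType(Enum):
    # The three bucket options named on slide 7.
    OBJECT = "Object"
    HDFS = "HDFS"
    OBJECT_AND_HDFS = "ObjectandHDFS"


# Assumed access rules: single-purpose buckets expose one interface family,
# while ObjectandHDFS exposes both over the same data.
ALLOWED_INTERFACES = {
    BucketType.OBJECT: frozenset({"object"}),
    BucketType.HDFS: frozenset({"hdfs"}),
    BucketType.OBJECT_AND_HDFS: frozenset({"object", "hdfs"}),
}


def can_access(bucket_type: BucketType, interface: str) -> bool:
    """Return True if `interface` ("object" or "hdfs") may reach the bucket."""
    return interface in ALLOWED_INTERFACES[bucket_type]


if __name__ == "__main__":
    # The validation step on slide 8 (load via S3, then run MapReduce)
    # needs a bucket that is reachable through both interfaces.
    for bucket_type in BucketType:
        usable = can_access(bucket_type, "object") and can_access(bucket_type, "hdfs")
        print(bucket_type.value, usable)
```

Under these assumptions, only an ObjectandHDFS bucket supports the validation flow on slide 8, where a file is loaded through the S3 interface and then read by a MapReduce job.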
