Hadoop Overview
 

(EMC World 2012) Apache Hadoop is now enterprise ready. This session reviews the features and roadmap of Hadoop, including some of the key capabilities of GPHD 1.x and our plans for 2012.

    Hadoop Overview Presentation Transcript

    • HADOOP OVERVIEW
      • Milind Bhandarkar, Chief Architect, CTO Office, Greenplum
      • Will Davis, Senior Manager, Product Marketing, Greenplum
      • © Copyright 2012 EMC Corporation. All rights reserved.
    • Agenda Hadoop – what’s the big deal? Evolution of Hadoop from Web 2.0 to Enterprise adoption Deployment considerations for Enterprises – Enterprise storage – Integration into architecture and analytics workflow – Training/support resources How Greenplum HD is Hadoop built for the enterprise© Copyright 2012 EMC Corporation. All rights reserved. 2
    • Background of Hadoop
    • What is Hadoop Framework that allows for distributed processing of large data sets across clusters of commodity servers – Store large amount of data – Process the large amount of data stored Inspired by Google’s MapReduce and Google File System (GFS) papers Apache Open Source Project – Initial work done at Yahoo! – Very active open source community© Copyright 2012 EMC Corporation. All rights reserved. 4
    • The Hadoop Opportunity
      • Internet age + exploding data growth
      • Enterprises increasingly interested in leveraging new data sources quickly
        • Spot emerging trends
        • Identify new opportunities, etc.
      • Traditional database tools are not able to cope
        • Weren’t built for big data use cases
        • Lack scale, are not cost-effective, and impose rigid data structures
      • Need for a new approach: Hadoop
    • Why Hadoop is Important? Handles large amounts of data Stores data in native format Delivers linear scalability at low cost Resilient in case of infrastructure failures Transparent application scalability© Copyright 2012 EMC Corporation. All rights reserved. 6
    • Why Hadoop is Important? Handles large amounts of data Stores data in native format Delivers linear scalability at low cost Resilient in case of infrastructure failures Transparent application scalability Enterprises can gain a competitive advantage through the adoption of big data analytics© Copyright 2012 EMC Corporation. All rights reserved. 7
    • What is Hadoop? Two Core Components
      • HDFS: scalable storage in the Hadoop Distributed File System
      • MapReduce: compute via the MapReduce distributed processing platform
      • Storage and compute in one framework
      • Open source project of the Apache Software Foundation
      • Java-intensive programming required
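The MapReduce model named on this slide can be illustrated with a small word count. What follows is a minimal single-process sketch in plain Python, not the Hadoop Java API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the in-memory grouping step only stands in for the sort-and-shuffle work that the framework would perform across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key (stand-in for the framework's shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data nodes"]))
```

Because each map call sees only one record and each reduce call sees only one key, both steps can in principle run in parallel on different machines, which is the property the slide's "storage and compute in one framework" point relies on.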
    • Hadoop Architecture
      1. Data is ingested into the Hadoop File System (HDFS)
      2. Computation occurs inside Hadoop (MapReduce)
      3. Results are exported from HDFS for use
      (Diagram: Hadoop data nodes connected over Ethernet)
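The three steps on this slide can be sketched as a toy data flow. This is an illustration under stated assumptions, not how HDFS places data: round-robin assignment stands in for block placement, a Python list stands in for each data node, and the names `ingest`, `compute`, and `export` are hypothetical.

```python
def ingest(records, num_nodes):
    """Step 1: spread records across nodes (round-robin as a placeholder policy)."""
    nodes = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        nodes[i % num_nodes].append(record)
    return nodes


def compute(nodes):
    """Step 2: each node works only on its local data and returns a partial sum."""
    return [sum(local_records) for local_records in nodes]


def export(partials):
    """Step 3: combine the per-node partial results into the final answer."""
    return sum(partials)


if __name__ == "__main__":
    nodes = ingest([1, 2, 3, 4, 5, 6, 7], num_nodes=3)
    print(nodes)
    print(export(compute(nodes)))
```

The point of the sketch is that step 2 never moves raw records between nodes; only the small partial results travel to the export step.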
    • Hadoop Components
      • Spring Hadoop: integrates the Spring and Hadoop frameworks
      • Mahout: scalable machine learning libraries
      • HBase: database for random, real-time read/write access
      • Hive: system for SQL-like queries on data stored in HDFS
      • Pig: procedural language that abstracts MapReduce
      • Zookeeper: highly reliable distributed coordination
      • MapReduce: framework for writing scalable data applications
      • HDFS: Hadoop Distributed File System
    • Hadoop Use Case Examples
      • Scale-out content management & data repository
      • Batch processing of heterogeneous data
      • ETL (Extract/Transform/Load)
      • Pre-processing and integration with data warehouse
      • Personalization and asset management analysis
      • Trade analytics
      • Credit scoring
      • Customer retention
      • Sentiment analysis (opinion mining)
    • Evolution of Hadoop: From Web 2.0 to Enterprise
    • Web 2.0 Organizations are “Data-Driven”
      • “The future is here, it’s just not evenly distributed yet.” (William Gibson)
    • Technology Adoption Lifecycle
      • Innovators/Early Adopters
      • Early Majority
      • Late Majority
      • Laggards
    • Evolution of the Hadoop Market
      • Lifecycle stages: Innovators/Early Adopters, Early Majority, Late Majority, Laggards
      • Stages highlighted for Hadoop: Hadoop Early Adopters, Hadoop Early Majority
    • Evolution of the Hadoop Market
      • HADOOP PROFILE (TODAY): Hadoop Early Adopters
        • Pioneers and academics
        • Application Architect
        • Visionary
        • Open source / community driven
        • Build-your-own server, application & storage infrastructure
        • Commodity components
        • Web 2.0, Universities, Life Sciences
      • Next stage: Hadoop Early Majority
    • Evolution of the Hadoop Market
      • HADOOP PROFILE (TODAY): Hadoop Early Adopters
        • Pioneers and academics
        • Application Architect
        • Visionary
        • Open source / community driven
        • Build-your-own server, application & storage infrastructure
        • Commodity components
        • Web 2.0, Universities, Life Sciences
      • HADOOP PROFILE (FUTURE): Hadoop Early Majority
        • IT Manager & CIO
        • Data Scientist
        • Line-of-business
        • Commercial distribution
        • Turnkey solution
        • End-to-end data protection
        • Fortune 1000, Financial Services, Retail
    • 2012: Hadoop Beyond Web 2.0
    • Greenplum HD: Hadoop for the Enterprise
    • Hadoop Challenges in the Enterprise
      • Hadoop is hard right now!
        • Setup and configuration are resource-intensive
        • Lack of skills to make Hadoop work
        • Poor integration with existing technologies
        • Management at scale is nonexistent
        • Backup and disaster recovery are missing
    • Greenplum HD: Enterprise-Ready Hadoop
      • Simple, efficient and scalable
      • Proven at scale with worldwide EMC support
      • Purpose-built Hadoop infrastructure
      • Services to address the talent gap
      • Parallel analytics access with Greenplum Database
    • Greenplum HD Architecture (layers, top to bottom)
      • Greenplum Chorus
      • Greenplum Command Center
      • Hadoop Tools (Pig, Hive, HBase, Zookeeper, Mahout, etc.)
      • MapReduce Layer
      • Pluggable Storage Layer (HDFS API)
      • Apache HDFS, Isilon OneFS
    • Enterprise Storage for Hadoop
      • Integrated big data storage and analytic solution based on Greenplum HD and Isilon scale-out NAS
      • Isilon is the first and only enterprise scale-out NAS storage platform that natively integrates the Hadoop Distributed File System (HDFS) protocol
      • Seamless analytics access with Greenplum: Hadoop insights plug directly into Greenplum Database to augment analytics
      (Diagram labels: Compute, Storage)
    • Flexible and Efficient
      • Independently scale compute and storage
        • Add Greenplum HD or Isilon nodes for performance or capacity
      • Eliminate 3x copies of data in HDFS
        • Isilon enables 80% utilization for greater storage efficiency
      • Seamless analytics access with Greenplum Database
        • Hadoop fused with GPDB for big data analytics
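The storage-efficiency claim on this slide comes down to simple arithmetic, sketched below. Two assumptions are built in: "3x copies" is read as every byte being stored three times (roughly 33% utilization), and the 80% figure is taken from the slide as stated. The 100 TB data set and both function names are hypothetical.

```python
def raw_tb_with_replication(usable_tb, replication_factor=3):
    """Raw capacity needed when every byte is stored replication_factor times."""
    return usable_tb * replication_factor


def raw_tb_at_utilization(usable_tb, utilization_pct):
    """Raw capacity needed when utilization_pct percent of raw space holds data."""
    return usable_tb * 100 / utilization_pct


if __name__ == "__main__":
    usable = 100  # hypothetical data set size in TB
    print(raw_tb_with_replication(usable))    # 3x replicated layout
    print(raw_tb_at_utilization(usable, 80))  # layout at the slide's 80% figure
```

Under these assumptions the same 100 TB of data needs 300 TB of raw capacity with triple replication and 125 TB at 80% utilization.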
    • Simplified Deployment
      • Remove the need for data staging
        • Isilon enables data access over standard protocols (NFS, CIFS, FTP, HTTP, HDFS)
      • No single point of failure
        • Isilon distributes the NameNode to provide high availability and load balancing
      • Enterprise data services for Hadoop
        • Advanced backup and disaster recovery capabilities
    • Advanced Management
      • Greenplum Command Center
        • Complete platform management and control
      • Greenplum Package Manager
        • Automates install, uninstall, update, and query for analytics extensions
        • Supports package migration during upgrade, segment recovery, expansion, and standby initialization
    • Proven at Scale with Worldwide Support
      • Industry’s largest Hadoop support team
        • Industry’s most accomplished Hadoop talent (from Yahoo!, LinkedIn, Talend, etc.)
      • Tested at scale on the Greenplum Analytics Workbench
        • 1,000-node, 24-petabyte cluster
        • Multi-million dollar investment by EMC and partners
        • Reduced risk for EMC customers
        • Certification of partner products
      • Bringing rapid innovation to Hadoop
    • Get Started With Hadoop Today
      • Hadoop Architecture Services
        • POC planning and deployment
        • Installation and best practices
        • Educate the team
      • Greenplum Analytics Labs
        • Leverage the expertise of Greenplum’s Data Scientists
        • Packaged solutions that produce business value and actionable results
        • Accelerate Hadoop capabilities on your data with your analysts
      • Establish a strategic vision
        • Roadmap for Hadoop and unified analytics
    • Provide Feedback & Win!
      • 125 attendees will receive $100 iTunes gift cards. To enter the raffle, simply complete:
        • 5 session surveys
        • The conference survey
      • Download the EMC World Conference App to learn more: emcworld.com/app
    • Thank You