5. pivotal hd 2013


VMware Big Data Forum

Published in: Technology

  1. A NEW PLATFORM FOR A NEW ERA. © Copyright 2013 Pivotal. All rights reserved.
  2. Pivotal HD (Pivotal Confidential–Internal Use Only)
  3. Pivotal HD Architecture (diagram)
     • Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (Yarn, Zookeeper)
     • Pivotal HD Added Value, Pivotal HD Enterprise: Command Center (Configure, Deploy, Monitor, Manage); Hadoop Virtualization (HVE); Data Loader
     • Pivotal HD Added Value, HAWQ – Advanced Database Services: Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics
  4. Pivotal HD Components
     • HDFS – The Hadoop Distributed File System acts as the storage layer for Hadoop
     • MapReduce – Parallel processing framework used for data computation in Hadoop
     • Hive – Structured data warehouse implementation for data in HDFS that provides a SQL-like interface to Hadoop
     • Pig – High-level procedural language for data pipeline/data flow processing in Hadoop
     • HBase – NoSQL key-value data store on top of HDFS
     • Mahout – Library of scalable machine-learning algorithms
     • Spring Hadoop – Integrates the Spring framework into Hadoop
  5. Pivotal HD Value-Added Components
     GPHD Includes…
     • Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.
     • GP Command Center – visual interface for cluster health, system metrics, and job monitoring.
     • Hadoop Virtualization Extension (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.
     • GP Data Loader – parallel loading infrastructure that supports “line speed” data loading into HDFS.
     • Isilon Integration – extensively tested at scale with guidelines for compute-heavy, storage-heavy, and balanced configurations.
     Pivotal HD Adds the Following to GPHD…
     • Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.
     • Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).
     • Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale.
  6. Pivotal Core Components & Versions
     Component        GPHD 1.2 Core Distribution    Pivotal HD Enterprise
     Hadoop           1.0.3                         2.0.2
     HBase            0.92.1                        0.94.2
     Hive             0.8.1                         0.9.1
     Mahout           0.6                           0.8.0
     Pig              0.9.2                         0.10.0
     Zookeeper        3.3.5                         3.4.5
     Flume            1.2.0                         1.3.1
     Sqoop            1.4.1                         1.4.2
     Spring Hadoop    (version not given)           1.0.0
  7. DataLoader (diagram)
     • Streams: Push, Pull, Connectors, Flume
     • Files: HDFS, NFS, HTTP, FTP, Local
     • DataLoader: Data Source Registration, Data Destination Registration, Copy Strategy Optimization, Data Copy Job Management, Data Processing, Web GUI and CLI, REST APIs
     • Target: HDFS
  8. Command Center: Simple and complete cluster management
     • Install and configure Hadoop components and services
     • Centralized interface for Pivotal HD cluster monitoring, diagnostics, and management
     • Live and historical Hadoop system metrics analysis
     Configure, Deploy, Monitor, Manage, Analyze
  9. Command Center – Monitor, Manage, and Analyze
     • Host-, application-, and job-level performance monitoring across the entire Pivotal HD cluster
     • Visualize and analyze live and historical Hadoop cluster information through the Command Center Dashboard
     • Quick diagnostics of functional or performance issues
  10. Hadoop Virtualization Extensions (HVE)
     • HVE enables Hadoop to support more effective virtual deployments
     • This creates the opportunity to provision and scale the compute and storage processes independently, resulting in:
       • Much better resource utilization
       • Improved resource allocation and consumption
       • Support for multi-tenancy
  11. HAWQ Delivers
     • SQL compliant
     • World-class query optimizer
     • Interactive query
     • Horizontal scalability
     • Robust data management
     • Common Hadoop formats
     • Deep analytics
  12. Xtension Framework
     • An advanced version of GPDB external tables
     • Enables combining HAWQ data and Hadoop data in a single query
     • Supports connectors for HDFS, HBase, and Hive
     • Provides an extensible framework API to enable custom connector development for other data sources
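
The Xtension Framework slide describes pluggable connectors and queries that combine HAWQ data with Hadoop data. The Python sketch below only illustrates that idea and is not the framework's actual API: `Connector`, `InMemoryConnector`, `ConnectorRegistry`, `join_on`, the `hbase://` URI form, and the sample rows are all hypothetical names and data invented for this example. An in-memory dictionary stands in for a real HDFS, HBase, or Hive source.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


class Connector(ABC):
    """Hypothetical connector interface (not the real Xtension Framework API)."""

    @abstractmethod
    def read_rows(self, location: str) -> Iterable[Row]:
        """Return the rows stored at `location` in the external data source."""


class InMemoryConnector(Connector):
    """Stand-in for an HDFS, HBase, or Hive connector, backed by a dict."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def read_rows(self, location: str) -> Iterable[Row]:
        return iter(self._tables.get(location, []))


class ConnectorRegistry:
    """Maps a URI scheme such as "hbase" to the connector that serves it."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Connector] = {}

    def register(self, scheme: str, connector: Connector) -> None:
        self._connectors[scheme] = connector

    def read(self, uri: str) -> Iterable[Row]:
        scheme, _, location = uri.partition("://")
        if scheme not in self._connectors:
            raise KeyError(f"no connector registered for scheme {scheme!r}")
        return self._connectors[scheme].read_rows(location)


def join_on(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Hash join: index the right-hand rows by `key`, then probe with the left."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**match, **row}


# Register a custom connector, then combine "internal" rows with external ones.
registry = ConnectorRegistry()
registry.register(
    "hbase",
    InMemoryConnector(
        {"clicks": [{"user_id": 1, "clicks": 42}, {"user_id": 3, "clicks": 7}]}
    ),
)
internal_users = [{"user_id": 1, "name": "ada"}, {"user_id": 2, "name": "bob"}]
joined = list(join_on(internal_users, registry.read("hbase://clicks"), "user_id"))
print(joined)
```

Keeping the connector behind a small abstract interface is what would let a new data source be added without touching the join logic, which is the design point the slide's "custom connector development" bullet suggests.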