Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform

Published in: Technology, Business

  1. Deep Dive On Pivotal HD - World Class HDFS Platform. Michael Goddard. © Copyright 2014 EMC Corporation. All rights reserved.
  2. Agenda
     • Pivotal
     • Pivotal Business Data Lake
     • Introducing Pivotal HD 2.0
     • Pivotal HD 2.0 and Isilon Update
     • Customer Success
     • Q&A
  3. What Matters: Apps. Data. Analytics.
     • Apps power businesses, and those apps generate data.
     • Analytic insights from that data drive new app functionality, which in turn drives new data.
     • The faster you can move around that cycle, the faster you learn, innovate & pull away from the competition.
  4. How Pivotal Gets You There
     • Uniquely positioned to help enterprises modernize each facet of this cycle today
     • Comprehensive portfolio of products & services spanning Big Data, PaaS & Agile
     • Converging these technologies into a coherent, next-gen Enterprise PaaS platform
     • Diagram labels: Pivotal Labs Agile Development; Pivotal Data Fabric; Pivotal One PaaS
  5. Pivotal’s Big Bets for the Future
     1. HDFS becomes the data substrate for the next generation of data infrastructures
     2. A set of integrated, consumer-grade services must evolve on top of HDFS: stream ingestion, analytical processing, and transactional serving
     3. Provisioning flexibility and elasticity become critical capabilities for this data infrastructure
  6. Pivotal Business Data Lake
     • Govern where it matters
       – Focus on MDM and RDM
       – Enforce only when sharing
       – Treat corporate as aggregation of local
     • Encourage local requirements
       – Let the business decide what they need
       – Build from the bottom
       – Enable traceability to source
       – Disposable data views
     • Distill on demand
       – Select only what you want
       – Business friendly tooling
       – Re-usable information maps
       – Rapid change cycle
     • Store everything
       – Store everything ‘as is’
       – Include structured and unstructured data
       – Store it cheaply
  7. Pivotal Business Data Lake Architecture (diagram labels)
     • Centralized Management: System monitoring, System management
     • Unified Data Management Tier: Data mgmt. services, MDM, RDM, Audit and policy mgmt.
     • Processing Tier: Workflow Management, In-memory, MPP database
     • Unified Sources: Real-time ingestion, Micro batch ingestion, Batch ingestion
     • Flexible Actions: Real-time insights, Interactive insights, Batch insights
     • HDFS
     • Existing Sources, New Data Sources
  8. Pivotal Business Data Lake Architecture (product mapping, diagram labels)
     • Centralized Management
     • Unified Data Management Tier: Data Dispatch, MDM, RDM
     • Processing Tier: Spring XD, Pivotal GemFire XD, HAWQ
     • Unified Sources, Flexible Actions
     • Data feeds: Clickstream, Sensor Data, Weblogs, Network Data, CRM Data, ERP Data
     • Other products shown: Pivotal GemFire, Pivotal RabbitMQ, Redis, Pivotal CF, Pivotal HD, Command Center
     • Existing Sources, New Data Sources
  9. How is a Business Data Lake Different? (comparison by criteria: Business Data Lake vs. EDW)
     • Common data model
       – Business Data Lake: base class = standard data; derived classes = local data
       – EDW: single class = single view across the enterprise
     • Data quality
       – Business Data Lake: full spectrum
       – [slide graphic: binary digits]
     • Data integration
     • Multiple interfaces
       – Business Data Lake: SQL, SAS, R, MapReduce, NoSQL
       – EDW: SQL access; integration with SAS, R and other analytical interfaces
     • Mixed workload with varying QoS
       – Business Data Lake: support low latency, interactive and batch
       – EDW: limited QoS separation required
  10. Introducing Pivotal HD 2.0
      • Foundation for Business Data Lake
      • World’s Most Advanced Real-Time Analytics Platform
      • Most Extensive Set of Advanced Analytical Toolsets
  11. Pivotal HD Architecture (diagram labels)
      • Apache: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume, YARN, ZooKeeper, Oozie (Resource Management & Workflow)
      • Pivotal HD Enterprise: Pivotal Command Center (Configure, Deploy, Monitor, Manage), Spring XD, Spring Xtension Framework, Virtual Extensions
      • HAWQ – Advanced Database Services: Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
      • Pivotal GemFire XD – Real-Time Database Services: Distributed In-memory Store, Query Transactions, Ingestion Processing, Hadoop Driver – Parallel with Compaction, ANSI SQL + In-Memory
      • MADlib Algorithms
      • GraphLab, Open MPI
  12. New Apache Hadoop Features in Pivotal HD 2.0
      • Apache Hadoop 2.2 enables enterprise operationalization features such as NFS and Snapshots
      • Hive 0.12 is faster, has better scalability, and broader SQL data type support
      • Pig 0.12 (incl. PiggyBank) increases productivity and appeal for a broader set of users
      • HBase 0.96 improves mean time to recovery, and its modularization allows easier upgrades and reduced dependencies
  13. Hadoop at the Center: Enabling the Data-Driven Enterprise (diagram labels)
      • Hadoop as a Service
      • Big Data On-Demand
      • GemFire XD In-Memory Real-time Analytics
      • Spring XD Building Big Data Apps
      • Open Source Algorithm Libraries
      • Chorus Big Data Collaboration
      • Fastest SQL Query Engine
  14. Real-Time Analytics
      • Adds fast data ingest, real-time event processing and query performance, enabling SQL users to rapidly analyze and react to high volumes of events on HDFS
      • Enables the creation of low-latency, scale-out OLTP applications integrated out of the box with a big data store
      • Creates a single platform for Analytics and OLTP, removing the need for an ETL process
      • Supports changes to database tables while still complying with the immutability of HDFS
  15. Real-Time Data Services on Pivotal HD (diagram labels)
      • Components: Pivotal GemFire XD, HAWQ, Pivotal Extension Framework, MapReduce I/P & O/P Formatter, Native Persistence, Command Center
      • Platform: Pivotal HD Enterprise, HDFS, Shared Data
      • Endpoints: Online Apps, Analytic Apps, Sensor Data / Feeds
      • Flows: Model Refresh, Re-evaluate Model
  16. Pivotal GemFire XD 1.0 Major Features
      Enterprise real-time data processing platform for SLA critical applications; enables users to rapidly and reliably analyze and react to high volumes of events while leveraging 10s of TBs of in-memory reference data.
      • Cloud Scale Real-Time Platform
        – Very low & predictable latencies at high and variable loads
        – 10s of TBs in-memory (MemScale)
        – Multi-tiered caching
        – Real-time event processing
        – Rolling upgrade support
      • Optimized for Real-Time Analytics
        – SQL-based queries
        – Support structured data
        – Java stored procedures
        – Deep Spring Data integration
      • Seamless Pivotal HD Integration
        – Scale to HDFS with policy driven in-memory data retention
        – Online and offline querying of HDFS data
        – ETL-less bi-directional integration with other Pivotal HD services
        – Pivotal Extension Framework Integration
        – ICM Integration
      • Enterprise-Class Reliability
        – Distributed transactions (JTA)
        – HA through in-memory redundancy
        – Active-active deployments across WAN
        – JMX based scalable management
        – Visual monitoring through Pulse
  17. Deep Scalable Analytics
      • User Defined Functions: PL/R, PL/Java, PL/Python enable writing UDFs in additional languages that execute inside the database, improving performance
      • Parquet columnar open storage format delivers significant performance and scalability improvements
      • Richer set of open source machine learning algorithms helps conduct rapid data science experiments on relational data
  18. Deep Scalable Analytics
      Provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
  19. Deep Scalable Analytics (HAWQ 1.2)
      • Linear Regression
      • Logistic Regression
      • Multinomial Logistic Regression
      • K-Means
      • Association Rules
      • Latent Dirichlet Allocation
      • Naïve Bayes
      • Elastic Net Regression
      • Decision Trees / Random Forest
      • Support Vector Machines
      • Cox Proportional Hazards Regression
      • Descriptive Statistics
      • ARIMA
  20. PivotalR vs. PL/R
      • PivotalR
        – Interface is R client
        – Execution is in database
        – Parallelism handled by PivotalR
        – Supports a portion of R
      • PL/R
        – Interface is SQL client
        – Execution is in R
        – Parallelism via SQL function invocation
        – Supports all of R
  21. HAWQ: SQL on Hadoop, Format Agnostic
      Diagram labels: Pivotal HD: HDFS Data Lake; Future formats …; ANSI SQL
  22. HAWQ Continues to Soar
      • NameNode High Availability (HA) support improves availability of query processing with full Hadoop fault tolerance
      • Error Table helps to debug data errors
      • Parquet file format: columnar data storage for HDFS
      • HAWQ expansion increases performance (concurrency/throughput) by expanding query processing to newly added data nodes
  23. NameNode HA Support
      • Feature:
        – Automatic failover to the standby NameNode when the active NameNode fails
      • Benefits:
        – Fully fault tolerant to NameNode failures
        – Improved availability of query processing
        – Integrated into Hadoop availability model
  24. Error Table
      • Feature:
        – System table for storing non-conforming data
      • Benefits:
        – Eliminates erroneous data load
        – Reduces retries during load
        – Helps to debug errors in data structures
  25. Parquet
      • Features:
        – Open storage format
        – Hybrid row/column open storage format
        – Configurable Parquet or AO/CO format support
        – Compression Type: Snappy and Gzip
        – Additional data type support
        – Parquet Input Format Reader API
      • Benefits:
        – Delivers significant performance and scalability improvements
        – Industry standard compression: Saves storage
        – Usable in MapReduce/Hive workloads
  26. HAWQ Expansion
      • Features:
        – Expand HAWQ nodes to additional DataNodes
        – Expand # of segments per HAWQ segment host
      • Benefits:
        – Expand query processing
        – Increase performance by utilizing maximum CPU/resources
        – Increased concurrency/throughput
  27. Big Computing and Graph Analytics
      • Open MPI, one of the most mature parallel computing frameworks, is now available within HDFS, eliminating costly data movement and shortening data science cycles
      • GraphLab is a graph-based library of machine learning algorithms, allowing Data Scientists and Analysts to leverage popular algorithms such as PageRank, collaborative filtering and computer vision in HDFS
  28. Background
      • Hadoop MapReduce is not a good fit for iterative applications (like graph computing, machine learning, etc.)
      • User needs to build separate system/clusters to support those applications
      • MPI is (one of) the most mature/used parallel computing frameworks
        – MPI = Big Computing, Hadoop = Big Data
        – MPI + Hadoop = Big Computing + Big Data
  29. MPI Background
      • What is MPI?
        – “a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on wide variety of parallel computers” (Wikipedia)
      • What is Open MPI?
        – One of the most popular implementations of MPI, community supported
      • What is Hamster?
        – “Hadoop And Mpi on the same cluSTER”
  30. GraphLab
      • Topic Modeling contains applications like LDA, which can be used to cluster documents and extract topical representations.
      • Graph Analytics contains applications like PageRank and triangle counting, which can be applied to general graphs to estimate community structure.
      • Clustering contains standard data clustering tools such as k-means.
      • Collaborative Filtering contains a collection of applications used to make predictions about users' interests and factorize large matrices.
      • Graphical Models contains tools for reasoning about structured noisy data.
      • Computer Vision contains a collection of tools for reasoning about images.
  31. Pivotal HD: Built for Data Science (diagram labels)
      • Data Science on Pivotal HD: Relational Advanced Analytics, Graph Advanced Analytics
      • Languages: SQL, R, Python, Java
      • Custom Analytic Functions - UDFs
  32. Diagram labels
      • World’s Leading Experts: Pivotal Labs – Pivotal Data Labs
      • On Demand Services: Pivotal Data Dispatch
      • BATCH: Unlimited Pivotal HD
      • INTERACTIVE: HAWQ, Greenplum DB
      • REAL-TIME: GemFire XD, GemFire | SQLFire
  33. Pivotal Enables Hadoop Market Adoption
      • Data Lakes: Unify Unstructured and Structured Data Access
      • Big Data Apps: Build analytic and transaction-led applications impacting top line revenue
      • Data-Driven Enterprise: App Dev and Operational Management on HDFS Data Architecture
  34. Pivotal HD 2.0 and Isilon Update
      • Isilon aligns with our Enterprise Grade Message
      • Pivotal Command Center 2.2 (part of Pivotal HD 2.0)
        – Works with Pivotal HD 1.1.1
        – ‘Down’ status of HDFS is removed when Isilon is configured
      • Isilon has accelerated their integration from Q4 to Q3 for HDFS 2.2
  35. Large Mid-Market Financier Builds Foundation to Store All Data of Interest, Convert Insights to Value-added Services
      • Challenge:
        – Mid-market financier seeks to maintain high margins through value-added services
        – Realized that critical insights could come from many sources, but much was deleted due to storage cost
        – Frustrated by the inability to blend a data fabric, build analytics on top of it, and create applications on top of that
      • Solution:
        – Data Lake provides accessibility of any information of interest through a familiar SQL-like interface
        – Provides a foundation for creating analytics and applications as value-added services: forecasting demand based on social media sentiment, analytics on fleet vehicle usage
  36. Major TV Network Replaces Teradata with Pivotal, Builds Infrastructure to Capture $40 Million in Untapped Revenue
      • Challenge:
        – Ad inventory is an inherently perishable product, and is subject to an inefficient, “traditional” selling process
        – Upward trend in volume and traffic due to higher ad quality, mobile devices
        – Inability to react: 7-hour lag time in communication between ad fulfillment and sales teams, exacerbated by major broadcast events
      • Solution:
        – Reduced 7-hour lag time to under 1 hour, enabling network sales to communicate delivered impressions, forecast spend inventory and sell more effectively
        – Maximized profit by selling across brands/channels, allowing the network to better leverage non-premium inventory
  37. Home Appliance Maker Lays Foundation for “Smart” Connected Devices, Big Data-based Decisions
      • Challenge:
        – Prepare for next generation appliances: “smart” connected devices, controlled by mobile phone
        – Siloed environment including Teradata, SAS, HP made it difficult to derive true insights across disparate data
      • Solution:
        – Enable innovation and improve service performance through appliances that provide feedback based on output, environmental factors
        – Improve marketing efficiency with targeted campaigns based on market demographics, buying indicators
        – Better understand requirements for parts inventory based on current appliance lifecycles
  38. National Healthcare Organization Replaces Aging IBM Platform, Seeds Data Lake as Hadoop Beachhead
      • Challenge:
        – Aging IBM infrastructure could not support new SAS Access and Visual Analytics technology
        – Interest in enabling infrastructure to support a for-profit healthcare analytics as a service business
        – Sought to provide refined data sets to other insurance companies for their own research, needed a way to cleanse data
      • Solution:
        – Stepwise evolution of platform onto GPDB, one of two certified platform partners for running visual analytics
        – Established data lake as platform for upload, cleansing and conversion of private data into publicly consumable datasets
  39. Aviation: Predictive Maintenance
      • Challenge:
        – An airplane’s comprehensive “gate to gate” flight data didn’t exist in a single place for reporting
        – Each individual flight can generate approximately 1 TB of data, which is economically infeasible in a traditional EDW
        – To maintain profitability of GE Aviation's Contract Service Agreements, new analytical methods and approaches were required
      • Solution:
        – Ingest all data to a data lake for data discovery and model development to increase wing time and aircraft uptime, and improve customer satisfaction and airline profitability
        – Improved capacity for preventative maintenance rather than remediation, reducing expense and liability
      • Pivotal Solution includes: GPDB, PHD, Alpine, Chorus
  40. Brazilian Telco Provider Establishes Foundation for Data-Driven Culture
      • Challenge:
        – Poor call quality caused massive loss of customers. No insight into root cause of issues.
        – Increased scrutiny from regulators, but infrastructure did not support the requests for information needed
        – Difficulties with scale: Call Data Records generate 2 billion new records per day, no info on dropped calls due to capacity
      • Solution:
        – New Data Warehouse infrastructure contains both dropped and completed calls for analysis, 3 month capacity
        – Hadoop infrastructure with familiar SQL interface stores 5x volume at half the cost of Teradata
        – Reports which took 2 months to obtain now take 1 day
      • Pivotal Solution includes: PHD, HAWQ
  41. Pivotal HD 2.0 Summary
      • The Foundation for Business Data Lake
      • The World’s Most Advanced Hadoop Stack
        – Pivotal HD now based on Apache Hadoop 2.2
        – Real-time SQL, in-memory over Pivotal HD and integrated into Spring: Pivotal GemFire XD
        – Enhanced Interactive SQL over Pivotal HD: HAWQ
      • World’s Most Advanced Big Data Analytic Platform
        – Most extensive set of machine learning libraries: MADlib, R and GraphLab
  42. Pivotal HD 2.0 demo: Pivotal Booth
  43. Thank You
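
Slide 24 describes the error table as a place to store non-conforming data so that a load neither ingests bad rows nor has to be retried. The general idea can be sketched in plain Python. This is only an illustration of the pattern, not HAWQ's implementation: the two-column schema, `parse_row`, `load` and the error-record fields are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)   # rows that matched the schema
    errors: list = field(default_factory=list)   # stands in for the error table


def parse_row(line):
    """Parse one line under a hypothetical schema: id (int), amount (float)."""
    parts = line.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 2 columns, got {len(parts)}")
    return int(parts[0]), float(parts[1])


def load(lines):
    """Load every conforming row; divert the rest instead of aborting the load."""
    result = LoadResult()
    for lineno, line in enumerate(lines, start=1):
        try:
            result.loaded.append(parse_row(line))
        except ValueError as exc:
            # Keep the raw text and the reason so the bad row can be debugged later.
            result.errors.append({"line": lineno, "raw": line, "reason": str(exc)})
    return result


result = load(["1,9.50", "two,3.00", "3,4.25,extra", "4,1.00"])
```

Here the load completes in one pass: two rows land in `result.loaded`, and the two malformed rows are kept in `result.errors` with their line numbers for later inspection.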

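Slide 25 calls Parquet a hybrid row/column storage format. A rough way to picture that layout is to cut the table into groups of rows and store each group column by column, so a query that needs one column reads only that column's values. The sketch below shows the layout idea in memory only; it is not the Parquet file format or any Parquet API, and `to_row_groups`, `scan_column` and the sample columns are assumptions made for illustration.

```python
def to_row_groups(rows, columns, group_size):
    """Split rows into groups, then store each group as per-column value lists."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        groups.append(
            {col: [row[i] for row in chunk] for i, col in enumerate(columns)}
        )
    return groups


def scan_column(groups, column):
    """Read a single column across all row groups without touching the others."""
    for group in groups:
        yield from group[column]


rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]
groups = to_row_groups(rows, ["id", "name", "amount"], group_size=2)
total = sum(scan_column(groups, "amount"))
```

Summing `amount` touches only the `amount` lists in each group, which is the access pattern that makes columnar layouts attractive for analytic queries.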
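
Slides 28 and 30 note that iterative algorithms such as PageRank fit MapReduce poorly, because every iteration re-reads the result of the previous one. A minimal single-machine PageRank makes that loop structure visible. This is a textbook power-iteration sketch, not GraphLab's or Open MPI's API; the function name, damping factor, iteration count and sample graph are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of out-links."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node with no out-links spreads its rank evenly over all nodes.
            targets = links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank  # the next iteration depends on this full result
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Each pass needs the complete `rank` from the pass before it, which is why frameworks that keep state in memory between iterations suit this workload better than a chain of independent batch jobs.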