Greenplum Database Overview

As the core SQL processing engine of the Greenplum Unified Analytics Platform, the Greenplum Database delivers industry-leading performance for Big Data analytics while scaling linearly on massively parallel processing clusters of standard x86 servers. This session reviews the product's underlying architecture, identifies key areas of differentiation, goes deep into the new features introduced in Greenplum Database Release 4.2, and discusses our plans for 2012.

Published in: Technology, Business
Greenplum Database Overview

  1. Greenplum Database Overview. Michael Crutcher, Greenplum Product Management. © Copyright 2012 EMC Corporation. All rights reserved.
  5. Greenplum Unified Analytics Platform
  6. GREENPLUM DATABASE: Industry-Leading Database with Massively Parallel Performance to Empower Your Analytics
  7. Extreme Performance for Analytics
     • Optimized for BI and analytics
       – Deep integration with statistical packages
       – High performance parallel implementations
     • Simple and automatic
       – Just load and query like any database
       – Tables are automatically distributed across nodes
     • Extremely scalable
       – MPP shared-nothing architecture
       – All nodes can scan and process in parallel
       – Linear scalability by adding nodes
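
Slide 7 says tables are automatically distributed across nodes. A minimal sketch of one way this could work, assuming rows are placed by hashing a distribution key. The names `segment_for` and `distribute`, the segment count, and the use of CRC32 are illustrative assumptions, not Greenplum's actual hashing scheme.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Group rows by the segment that would store them."""
    segments = defaultdict(list)
    for row in rows:
        target = segment_for(str(row[key_column]), num_segments)
        segments[target].append(row)
    return segments


# Hypothetical table of 1,000 rows keyed by customer_id.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1000)]
placement = distribute(rows, "customer_id")

for seg in sorted(placement):
    print(f"segment {seg}: {len(placement[seg])} rows")
```

Because every row lands on exactly one segment, each segment can scan its own share independently, which is the property the shared-nothing bullets rely on.
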
  8. Performance Through Parallelism
     [Architecture diagram]
     • Master Servers: query planning & dispatch
     • Network Interconnect
     • Segment Servers: query processing & data storage
     • External Sources: loading, streaming, etc.
  9. Greenplum Delivers Choice & Flexibility
     • Greenplum Data Computing Appliance
       – Choose Greenplum Database and/or Hadoop modules in ¼ rack increments
       – Scale up by adding your choice of additional modules
       – Minimal time to value
     • Greenplum Software Solutions
       – Greenplum Database, Hadoop, & Chorus on your x86 hardware
       – Flexibility for any workload or environment
       – Perpetual or subscription licenses
  10. Core Functionality
  11. Component Overview
      • CLIENT ACCESS & TOOLS
        – CLIENT ACCESS: ODBC, JDBC, OLEDB, MapReduce, etc.
        – 3rd PARTY TOOLS: BI Tools, ETL Tools, Data Mining, etc.
        – ADMIN TOOLS: Greenplum Command Center, Greenplum Package Manager
      • PRODUCT FEATURES
        – LOADING & EXT. ACCESS: Petabyte-Scale Loading, Trickle Micro-Batching, Anywhere Data Access
        – STORAGE & DATA ACCESS: Hybrid Storage & Execution (Row- & Column-Oriented), In-Database Compression, Multi-Level Partitioning, Indexes (Btree, Bitmap, etc.), External Table Support
        – LANGUAGE SUPPORT: Comprehensive SQL, Native MapReduce, SQL 2003 OLAP Extensions, Programmable Analytics, Analytics Extensions (GeoSpatial, PL/R, PL/Java, PL/Python, PL/Perl)
      • GREENPLUM DATABASE ADAPTIVE SERVICES
        – Multi-Level Fault Tolerance (RAID, Mirroring, DR with Data Domain Boost)
        – Online System Expansion
        – Workload Management
      • CORE MPP ARCHITECTURE
        – Shared-Nothing MPP
        – Parallel Dataflow Engine
        – Parallel Query Optimizer
        – gNet™ Software Interconnect
        – Polymorphic Data Storage™
        – Scatter/Gather Streaming™ Data Loading
  12. Most Powerful Data Loading Capabilities
      [Chart: SINGLE RACK COMPARISON of Greenplum, Oracle Exadata, Netezza, and Teradata]
      • Industry-leading performance at 10+ TB per hour per rack
      • Scatter-Gather Streaming™ provides true linear scaling
      • Support for both large-batch and continuous real-time loading strategies
      • Enable complex data transformations "in-flight"
      • Transparent interfaces to loading via support files, application, and services
      Greenplum load rates scale linearly with the number of racks; others do not. For example, two racks = >20 TB/h.
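
Slide 12 quotes 10+ TB per hour per rack and says load rates scale linearly with the number of racks. The arithmetic can be sketched as follows. The function names are hypothetical, and the 10 TB/hour figure is treated as a lower bound taken from the slide, not a measured result.

```python
PER_RACK_TB_PER_HOUR = 10.0  # lower bound quoted on the slide ("10+ TB")


def min_load_rate_tb_per_hour(racks: int) -> float:
    """Lower-bound load rate, assuming perfectly linear scaling per rack."""
    return racks * PER_RACK_TB_PER_HOUR


def max_hours_to_load(data_tb: float, racks: int) -> float:
    """Upper bound on load time at the lower-bound rate."""
    return data_tb / min_load_rate_tb_per_hour(racks)


print(min_load_rate_tb_per_hour(2))  # matches the slide's two-rack example
print(max_hours_to_load(100, 4))     # hypothetical 100 TB load on 4 racks
```
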
  13. Polymorphic Table Storage™
      [Diagram: TABLE 'CUSTOMER' spanning Mar '11 through Nov '11; column-oriented for COLD DATA, row-oriented for HOT DATA]
      • Storage types can be mixed within a table or database
        – Four table types: heap, row-oriented AO, column-oriented AO, external
      • Rich compression functionality, definable column by column
        – Block compression: Gzip (levels 1-9), QuickLZ
        – Stream compression: RLE (levels 1-4)
      • Flexible indexing, partitioning, and more
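
Slide 13 lists RLE as a stream compression option for column-oriented storage. The sketch below is a generic run-length encoder over one column's values, included only to show why storing a column's values contiguously makes long runs likely. It is not Greenplum's on-disk format, and the sample column is made up.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical 'state' column from a table stored column by column.
column = ["CA", "CA", "CA", "NY", "NY", "TX"]
encoded = rle_encode(column)

print(encoded)
print(rle_decode(encoded) == column)
```
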
  14. gNet Software Interconnect
      • A supercomputing-based "soft-switch" responsible for:
        – Efficiently pumping streams of data between motion nodes during query-plan execution
        – Delivering messages, moving data, collecting results, and coordinating work among the segments in the system
  15. Parallel Query Optimizer
      • Cost-based optimization looks for the most efficient plan
      • Physical plan contains scans, joins, sorts, aggregations, etc.
      • Global planning avoids sub-optimal 'SQL pushing' to segments
      • Directly inserts 'motion' nodes for inter-segment communication
      [Diagram: PHYSICAL EXECUTION PLAN FROM SQL OR MAPREDUCE. Plan tree with the operators Gather Motion 4:1 (Slice 3), Sort, HashAggregate, HashJoin, Redistribute Motion 4:4 (Slice 1), Broadcast Motion 4:4 (Slice 2), Hash, and Seq Scan on lineitem, orders, customer, and motion]
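
The plan on slide 15 uses three kinds of motion nodes: redistribute, broadcast, and gather. A toy sketch of what each one does to rows held on four segments, followed by a join that uses them. All function names, table contents, and the use of Python's built-in `hash` are assumptions made for illustration and do not reflect Greenplum internals.

```python
def redistribute(segments, key):
    """Redistribute Motion: re-hash rows on `key` so equal keys share a segment."""
    out = [[] for _ in segments]
    for seg in segments:
        for row in seg:
            out[hash(row[key]) % len(segments)].append(row)
    return out


def broadcast(segments):
    """Broadcast Motion: give every segment a full copy of all rows."""
    all_rows = [row for seg in segments for row in seg]
    return [list(all_rows) for _ in segments]


def gather(segments):
    """Gather Motion: collect every segment's rows into one result."""
    return [row for seg in segments for row in seg]


def local_hash_join(left, right, key):
    """Hash join executed independently on one segment."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


N = 4  # hypothetical number of segments
customers = [[] for _ in range(N)]  # stored hashed on cust_id
for c in range(8):
    customers[hash(c) % N].append({"cust_id": c, "name": f"c{c}"})
orders = [[] for _ in range(N)]     # stored hashed on order_id
for o in range(20):
    orders[hash(o) % N].append({"order_id": o, "cust_id": o % 8})

# Option 1: redistribute orders on the join key, then join locally.
orders_by_cust = redistribute(orders, "cust_id")
joined = gather([local_hash_join(orders_by_cust[i], customers[i], "cust_id")
                 for i in range(N)])

# Option 2: broadcast the smaller table and leave orders where they are.
customers_all = broadcast(customers)
joined_bcast = gather([local_hash_join(orders[i], customers_all[i], "cust_id")
                       for i in range(N)])

print(len(joined), len(joined_bcast))
```

Both options return the same rows. Which one is cheaper depends on table sizes, which is the kind of trade-off a cost-based optimizer weighs.
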
  16. Analytics Overview
  17. Analytical Capabilities Overview
      [Layer diagram]
      • Data Access & Query Layer: ODBC, JDBC
      • SQL, Stored Procedures, SQL 2003 OLAP, In-Database Analytics, MapReduce
      • GREENPLUM HD
      • Polymorphic Storage
      • GREENPLUM DATABASE
      • Greenplum gNet
  18. In-Database Analytics: Categories
      [Diagram: Data Access & Query Layer ODBC JDBC SQL In-Database Analytics Embedded SAS Scoring Accelerator Partner GPDB User-Written Open Source Embedded Analytical Extensions Analytics SAS/HPA Algorithms Open-Source High Performance Analytics User-written GREENPLUM DATABASE]
  19. Analytics Highlight: MADlib
      • Scalable in-database analytics
      • Data-parallel
        – Mathematical Algorithms
        – Statistical Algorithms
        – Machine learning Algorithms
        – Supports structured and unstructured data
      • Open-source software
        – Source Accessibility
        – Converge business, academic, and open-source communities
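
Slide 19 describes MADlib's algorithms as data-parallel. The general pattern can be sketched as per-segment partial aggregation followed by a merge, shown here for mean and population variance. This is a generic illustration of the pattern, not MADlib's API; the function names and sample data are hypothetical.

```python
from functools import reduce


def partial_state(values):
    """Per-segment step: summarize local values as (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    """Combine two partial states; the order of merging does not matter."""
    return tuple(x + y for x, y in zip(a, b))


def finalize(state):
    """Turn the merged state into (mean, population variance)."""
    count, total, total_sq = state
    mean = total / count
    return mean, total_sq / count - mean * mean


# Hypothetical data spread over three segments.
segments = [[1, 2, 3], [4, 5], [6]]
total = reduce(merge, (partial_state(seg) for seg in segments))
mean, variance = finalize(total)
print(mean, variance)
```

Only the small partial states cross the network, so the data itself never has to leave the segment that stores it.
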
  20. Manageability, Extensions
  21. Easy Manageability for Big Data
      • Single console for both Database and Hadoop Administration
        – Start, Stop Database
        – Recover, Rebalance Segments
      • Interactive view of System Metrics
        – Real-time
        – Historic (Configurable by time period)
      • In-depth view for System Health
        – Hardware health
        – Software (Database, Hadoop)
      • Query Monitoring
        – Search, Prioritize, Cancel Queries
        – View Query's Execution Plan
      • Workload Management
        – Configure Resource Queues
        – Prioritize Users
  22. Easy Extension Installation: Greenplum Package Manager
      • Greenplum supports easy deployment of numerous extensions like MADlib, PL/Perl, PL/Java, PostGIS, etc.
      [Diagram: Master Servers, Segment Servers]
  23. High Performance gNet for Hadoop: Parallel Query Access
      • Connect any data set in Hadoop to GP DB's SQL Engine
      • Process Hadoop data in place
      • Parallelize data import/export from/to Hadoop thanks to GP DB's market-leading data sharing performance
      • Supported formats:
        – Text (compressed and uncompressed)
        – Binary
        – Proprietary/user-defined
      • GP HD 1.x, GP MR 1.x, CDH3u2
      [Diagram: gNet for Hadoop; Text, Binary, User-Defined]
  24. High Availability, Backup, Support
  25. High Availability
      • GPDB cluster
        – 2 Master servers
        – Multiple Segment servers
      • Segment servers support multiple database instances
        – Primary instances that actively process queries
        – Standby mirror instances
      • Block-level mirroring
        – Low resource consumption
        – Differential resync capable for fast recovery
      [Diagram: Set of Active Segment Instances]
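
Slide 25 mentions block-level mirroring with differential resync. A toy model of the idea, assuming the primary records which blocks changed while its mirror was down and copies only those on recovery. The class `SegmentPair` and its methods are hypothetical and are not how Greenplum implements mirroring.

```python
class SegmentPair:
    """Toy primary/mirror pair that tracks blocks changed during an outage."""

    def __init__(self, num_blocks: int):
        self.primary = [b""] * num_blocks
        self.mirror = [b""] * num_blocks
        self.mirror_up = True
        self.dirty = set()  # block numbers written while the mirror was down

    def write(self, block_no: int, data: bytes) -> None:
        """Write to the primary; mirror the block or remember it as dirty."""
        self.primary[block_no] = data
        if self.mirror_up:
            self.mirror[block_no] = data
        else:
            self.dirty.add(block_no)

    def fail_mirror(self) -> None:
        """Simulate the mirror going offline."""
        self.mirror_up = False

    def resync(self) -> int:
        """Differential resync: copy only dirty blocks, return how many."""
        copied = len(self.dirty)
        for block_no in self.dirty:
            self.mirror[block_no] = self.primary[block_no]
        self.dirty.clear()
        self.mirror_up = True
        return copied


pair = SegmentPair(num_blocks=8)
pair.write(0, b"a")   # mirrored immediately
pair.fail_mirror()
pair.write(1, b"b")   # tracked as dirty
pair.write(2, b"c")   # tracked as dirty
print(pair.resync())  # number of blocks copied during recovery
```

Recovery cost in this model depends on how many blocks changed during the outage rather than on the total size of the segment, which is the reason a differential resync can be fast.
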
  26. Backup/Restore with EMC Data Domain
      • Integration options
        – NFS: Data Domain device mounted as NFS storage
        – DD Boost: Native, client-side deduplication. Supported in GPDB 4.2 and higher
      • Drastic reduction in backup storage requirement
      • Back up all segment servers in parallel directly to Data Domain
      • Data Domain integrates seamlessly into standard Greenplum full backup data export and data restore procedures
      [Diagram: Full Appliance + Data Domain, connected via Boost or NFS over 2 x 10 Gbit IP]
  27. Backup/Restore with EMC Data Domain: Backup and restore between remote and primary sites
      [Diagram: Greenplum DCA and Data Domain at each site, linked by Data Domain Replication over LAN/WAN]
      • Ideal for configurations with RPO and RTO requirements that can be specified in hours
      • Supports:
        – Collection Replication for DD Boost backup
        – Directory-level replication for NFS backup
        – Encryption over the WAN
  28. Customer Support Services
      • Remote Technical Support
        – 24x7 technical support and remote troubleshooting
        – Customer-managed case severity level
        – Four-hour response objective
      • Onsite Support (DCA Only)
        – Installation of replacement parts
        – Replacement parts shipped for next business day arrival
        – GP SW upgrade included
      • Proactive Service
        – Secure remote monitoring for hardware (DCA)
        – Notification of engineering technical advisories
        – Built-in tools maximize stability and performance
      • Secure Self-Help
        – 24x7 access to eService support tools including knowledgebase, forums, and appropriately licensed software updates
  29. Other Relevant Greenplum Sessions
      Session; Presenter; Times
      • Unified Analytics Platform Introduction; Brian Wilson; Tues 10:00-11:00, Thurs 1:00-2:00
      • Greenplum Hadoop Overview; Susheel Kaushik; Mon 10:00-11:00, Wed 4:15-5:15
      • Greenplum DCA Overview; Hanxi Chen; Mon 4:00-5:00, Thurs 10:00-11:00
      • Greenplum Analytics Workbench; Apurva Desai; Wed 8:30-9:30, Thurs 10:00-11:00
      • Analytics on Hadoop; Don Miner; Tues 11:30-12:30, Thurs 8:30-9:30
      • Big Data Driven Businesses in Action: Creating Real Business Value Using Greenplum UAP (Panel w/4 Customers); Mike Maxey; Wed 4:15-5:15, Thurs 11:30-12:30
      • Analytics for Business Value: Collaboration; Josh Klahr; Mon 10:00-11:00, Wed 2:45-3:45
      • Disruptive Data Science: How Data Science and Big Data are Transforming Business, IT and People; Annika Jimenez, David Dietrich; Tues 4:15-5:15, Thurs 11:30-12:30
  30. Thank You