Greenplum feature

Published in: Technology, Business

  1. Greenplum Database Overview. Michael Crutcher, Greenplum Product Management. © Copyright 2012 EMC Corporation. All rights reserved.
  5. Greenplum Unified Analytic Platform
  6. GREENPLUM DATABASE: Industry-Leading Database with Massively Parallel Performance to Empower Your Analytics
  7. GREENPLUM DATABASE: Extreme Performance for Analytics
       • Optimized for BI and analytics
           – Deep integration with statistical packages
           – High-performance parallel implementations
       • Simple and automatic
           – Just load and query like any database
           – Tables are automatically distributed across nodes
       • Extremely scalable
           – MPP shared-nothing architecture
           – All nodes can scan and process in parallel
           – Linear scalability by adding nodes
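The idea of automatic distribution plus parallel scanning on slide 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CRC32 hash, the `customer_id` key, and the four-segment layout are hypothetical stand-ins and not Greenplum's actual hash function or catalog layout.

```python
import zlib
from collections import defaultdict


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a segment index using a stable hash (assumed: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments):
    """Place every row on exactly one segment, chosen by hashing its key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_field]), num_segments)].append(row)
    return segments


def parallel_sum(segments, field):
    """Each segment aggregates only its own rows; the partial results are then combined."""
    partials = [sum(row[field] for row in rows) for rows in segments.values()]
    return sum(partials)


# Hypothetical data: 1,000 rows spread over 4 segments.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
segments = distribute(rows, "customer_id", 4)
total = parallel_sum(segments, "amount")
```

Because no row lives on more than one segment, the per-segment partial sums can be computed independently, which is the property a shared-nothing design relies on.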
  8. GREENPLUM DATABASE: Performance Through Parallelism
       • Master Servers: query planning & dispatch
       • Network Interconnect
       • Segment Servers: query processing & data storage
       • External Sources: loading, streaming, etc.
  9. GREENPLUM DATABASE: Greenplum Delivers Choice & Flexibility
       • Greenplum Data Computing Appliance
           – Choose Greenplum Database and/or Hadoop modules in ¼ rack increments
           – Scale up by adding your choice of additional modules
           – Minimal time to value
       • Greenplum Software Solutions
           – Greenplum Database, Hadoop, & Chorus on your x86 hardware
           – Flexibility for any workload or environment
           – Perpetual or subscription licenses
  10. GREENPLUM DATABASE: Core Functionality
  11. GREENPLUM DATABASE: Component Overview
       • CLIENT ACCESS & TOOLS
           – Client access: ODBC, JDBC, OLEDB, MapReduce, etc.
           – 3rd party tools: BI Tools, ETL Tools, Data Mining, etc.
           – Admin tools: Greenplum Command Center, Greenplum Package Manager
       • PRODUCT FEATURES (Loading & Ext. Access, Storage & Data Access, Language Support)
           – Petabyte-Scale Loading
           – Trickle Micro-Batching
           – Anywhere Data Access
           – External Table Support
           – Hybrid Storage & Execution (Row- & Column-Oriented)
           – In-Database Compression
           – Multi-Level Partitioning
           – Indexes: Btree, Bitmap, etc.
           – Comprehensive SQL
           – Native MapReduce
           – SQL 2003 OLAP Extensions
           – Programmable Analytics
       • GREENPLUM DATABASE ADAPTIVE SERVICES
           – Multi-Level Fault Tolerance (RAID, Mirroring, DR with Data Domain Boost)
           – Analytics Extensions (GeoSpatial, PL/R, PL/Java, PL/Python, PL/Perl)
           – Online System Expansion
           – Workload Management
       • CORE MPP ARCHITECTURE
           – Shared-Nothing MPP
           – Parallel Dataflow Engine
           – Parallel Query Optimizer
           – gNet™ Software Interconnect
           – Polymorphic Data Storage™
           – Scatter/Gather Streaming™ Data Loading
  12. GREENPLUM DATABASE: Most Powerful Data Loading Capabilities
       • Industry-leading performance at 10+ TB per hour per rack
       • Scatter-Gather Streaming™ provides true linear scaling
       • Support for both large-batch and continuous real-time loading strategies
       • Enable complex data transformations "in-flight"
       • Transparent interfaces to loading via support files, application, and services
       • Chart (single rack comparison): Greenplum, Oracle Exadata, Netezza, Teradata
       • Greenplum load rates scale linearly with the number of racks; others do not. For example, two racks = >20 TB/h
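The linear-scaling claim on slide 12 amounts to simple arithmetic, sketched below. The only figure taken from the slide is the 10 TB per hour per rack floor; the function names and the 100 TB example are hypothetical, and real load rates depend on hardware, data format, and transformations.

```python
def load_rate_tb_per_hour(racks: int, per_rack_tb_per_hour: float = 10.0) -> float:
    """Aggregate load rate if throughput scales linearly with rack count."""
    return racks * per_rack_tb_per_hour


def hours_to_load(data_tb: float, racks: int, per_rack_tb_per_hour: float = 10.0) -> float:
    """Time to load a data set at the aggregate rate (an estimate, not a benchmark)."""
    return data_tb / load_rate_tb_per_hour(racks, per_rack_tb_per_hour)


two_rack_rate = load_rate_tb_per_hour(2)      # matches the slide's two-rack example
hours_for_100_tb = hours_to_load(100.0, 2)    # hypothetical 100 TB data set
```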
  13. GREENPLUM DATABASE: Polymorphic Table Storage™
       • Figure: TABLE 'CUSTOMER' with monthly partitions (Mar '11 to Nov '11); column-oriented for COLD DATA, row-oriented for HOT DATA
       • Storage types can be mixed within a table or database
           – Four table types: heap, row-oriented AO, column-oriented AO, external
       • Rich compression functionality, definable column by column
           – Block compression: Gzip (levels 1-9), QuickLZ
           – Stream compression: RLE (levels 1-4)
       • Flexible indexing, partitioning, and more
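To see why run-length encoding (the stream compression named on slide 13) pairs well with column-oriented storage, consider the generic sketch below. It is a textbook RLE over an in-memory list, not Greenplum's on-disk format, and the `state` column values are made up for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# A hypothetical 'state' column: values of one column stored together tend to repeat.
column = ["CA"] * 4 + ["NY"] * 3 + ["TX"] * 2
runs = rle_encode(column)
restored = rle_decode(runs)
```

When values from a single column are stored contiguously, long runs of identical values are common, so nine entries shrink to three pairs here. Row-oriented storage interleaves different columns and rarely produces such runs.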
  14. GREENPLUM DATABASE: gNet Software Interconnect
       • A supercomputing-based "soft-switch" responsible for:
           – Efficiently pumping streams of data between motion nodes during query-plan execution
           – Delivering messages, moving data, collecting results, and coordinating work among the segments in the system
  15. GREENPLUM DATABASE: Parallel Query Optimizer
       • Cost-based optimization looks for the most efficient plan
       • Physical plan contains scans, joins, sorts, aggregations, etc.
       • Global planning avoids sub-optimal ‘SQL pushing’ to segments
       • Directly inserts ‘motion’ nodes for inter-segment communication
       • Figure (physical execution plan from SQL or MapReduce), node labels: Gather Motion 4:1 (Slice 3), Sort, HashAggregate, HashJoin, Redistribute Motion 4:4 (Slice 1), Hash, HashJoin, HashJoin, Seq Scan on lineitem, Hash, Seq Scan on orders, Seq Scan on customer, Hash, Broadcast Motion 4:4 (Slice 2), Seq Scan on motion
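The three motion types named in the plan on slide 15 (redistribute, broadcast, gather) can be illustrated with in-memory lists standing in for segments. This is a conceptual sketch only: the function names, the `orders`/`customers` rows, and the use of Python's built-in `hash` on integer keys are assumptions, and nothing here reflects the optimizer's real cost model or the interconnect.

```python
def redistribute(segments, key):
    """Redistribute motion: re-hash every row so equal keys land on the same segment."""
    out = [[] for _ in segments]
    for segment in segments:
        for row in segment:
            out[hash(row[key]) % len(segments)].append(row)
    return out


def broadcast(segments):
    """Broadcast motion: every segment receives a full copy of all rows."""
    all_rows = [row for segment in segments for row in segment]
    return [list(all_rows) for _ in segments]


def gather(segments):
    """Gather motion: collect rows from all segments into one result stream."""
    return [row for segment in segments for row in segment]


def local_hash_join(left, right, key):
    """Hash join executed inside one segment, over that segment's rows only."""
    table = {}
    for row in right:
        table.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


# Hypothetical tables, initially spread round-robin over 4 segments.
orders = [{"order_id": i, "cust_id": i % 5} for i in range(20)]
customers = [{"cust_id": c, "name": f"c{c}"} for c in range(5)]
order_segs = [orders[i::4] for i in range(4)]
cust_segs = [customers[i::4] for i in range(4)]

# Strategy 1: redistribute both inputs on the join key, then join locally.
o, c = redistribute(order_segs, "cust_id"), redistribute(cust_segs, "cust_id")
joined = gather([local_hash_join(o[i], c[i], "cust_id") for i in range(4)])

# Strategy 2: leave orders in place and broadcast the small table.
bc = broadcast(cust_segs)
joined_bc = gather([local_hash_join(order_segs[i], bc[i], "cust_id") for i in range(4)])
```

Both strategies produce the same rows. A cost-based planner would choose between them based on table sizes, since broadcasting copies the whole inner table to every segment.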
  16. GREENPLUM DATABASE: Analytics Overview
  17. GREENPLUM DATABASE: Analytical Capabilities Overview
       • Diagram labels: Data Access & Query Layer; ODBC; JDBC; SQL; Stored Procedures; SQL 2003 OLAP; MapReduce; In-Database Analytics; Polymorphic Storage; GREENPLUM HD; GREENPLUM DATABASE; Greenplum gNet
  18. GREENPLUM DATABASE: In-Database Analytics: Categories
       • Data Access & Query Layer: ODBC, JDBC, SQL
       • In-Database Analytics
           – Embedded: GPDB Embedded Analytics
           – Partner: SAS Scoring Accelerator, SAS/HPA High Performance Analytics
           – Open-Source: Open Source Extensions
           – User-written: User-Written Analytical Algorithms
  19. GREENPLUM DATABASE: Analytics Highlight: MADlib
       • Scalable in-database analytics
       • Data-parallel
           – Mathematical algorithms
           – Statistical algorithms
           – Machine learning algorithms
           – Supports structured and unstructured data
       • Open-source software
           – Source accessibility
           – Converge business, academic, and open-source communities
  20. GREENPLUM DATABASE: Manageability, Extensions
  21. GREENPLUM DATABASE: Easy Manageability for Big Data
       • Single console for both Database and Hadoop
       • Administration
           – Start, Stop Database
           – Recover, Rebalance Segments
       • Interactive view of System Metrics
           – Real-time
           – Historic (configurable by time period)
       • In-depth view for System Health
           – Hardware health
           – Software (Database, Hadoop)
       • Query Monitoring
           – Search, Prioritize, Cancel Queries
           – View Query's Execution Plan
       • Workload Management
           – Configure Resource Queues
           – Prioritize Users
  22. GREENPLUM DATABASE: Easy Extension Installation
       • Greenplum Package Manager: Greenplum supports easy deployment of numerous extensions such as MADlib, PL/Perl, PL/Java, PostGIS, etc.
       • Diagram labels: Greenplum Package Manager; Master Servers; Segment Servers
  23. GREENPLUM DATABASE: High Performance gNet for Hadoop
       • Parallel Query Access
           – Connect any data set in Hadoop to GP DB's SQL Engine
           – Process Hadoop data in place
           – Parallelize import/export data from/to Hadoop thanks to GP DB's market-leading data sharing performance
       • Supported formats
           – Text (compressed and uncompressed)
           – Binary
           – Proprietary/user-defined
       • GP HD 1.x, GP MR 1.x, CDH3u2
       • Diagram labels: gNet for Hadoop; Text; Binary; User-Defined
  24. GREENPLUM DATABASE: High Availability, Backup, Support
  25. GREENPLUM DATABASE: High Availability
       • GPDB cluster
           – 2 Master servers
           – Multiple Segment servers
       • Segment servers support multiple database instances
           – Primary instances that actively process queries
           – Standby mirror instances
       • Block-level mirroring
           – Low resource consumption
           – Differential resync capable for fast recovery
       • Diagram label: Set of Active Segment Instances
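The primary/mirror arrangement on slide 25 can be modeled as choosing one live instance per data slice, preferring the primary. The sketch below is a hypothetical model: the `SegmentInstance` class, the `content_id` field, and the host names `sdw1`/`sdw2` are illustrative and do not describe Greenplum's actual catalog tables or failover protocol.

```python
from dataclasses import dataclass


@dataclass
class SegmentInstance:
    content_id: int   # which slice of the data this instance holds
    host: str         # hypothetical host name
    role: str         # "primary" or "mirror"
    up: bool = True


def active_instances(instances):
    """Pick one live instance per content id, preferring the primary when it is up."""
    active = {}
    # Sorting on a boolean puts primaries (False) ahead of mirrors (True).
    for inst in sorted(instances, key=lambda i: i.role != "primary"):
        if inst.up and inst.content_id not in active:
            active[inst.content_id] = inst
    return active


# Two data slices: primaries on sdw1, mirrors on a different host (sdw2).
cluster = [
    SegmentInstance(0, "sdw1", "primary"),
    SegmentInstance(1, "sdw1", "primary"),
    SegmentInstance(0, "sdw2", "mirror"),
    SegmentInstance(1, "sdw2", "mirror"),
]
before = {cid: inst.role for cid, inst in active_instances(cluster).items()}

# Simulate losing host sdw1: the mirrors become the set of active instances.
for inst in cluster:
    if inst.host == "sdw1":
        inst.up = False
after = {cid: inst.role for cid, inst in active_instances(cluster).items()}
```

Placing each mirror on a different host from its primary is what lets every data slice stay available when a single host fails.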
  26. GREENPLUM DATABASE: Backup/Restore with EMC Data Domain
       • Integration options
           – NFS: Data Domain device mounted as NFS storage
           – DD Boost: native, client-side deduplication. Supported in GPDB 4.2 and higher
       • Drastic reduction in backup storage requirement
       • Back up all segment servers in parallel directly to Data Domain
       • Data Domain integrates seamlessly into standard Greenplum full backup, data export, and data restore procedures
       • Diagram labels: Full Appliance + Data Domain; Boost or NFS; 2 x 10 GBit IP
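The storage reduction that slide 26 attributes to client-side deduplication can be illustrated with a generic content-hash scheme. This is not the DD Boost algorithm: the fixed 4096-byte chunks, the SHA-256 digests, and the in-memory `store` dictionary are all assumptions chosen to keep the sketch short.

```python
import hashlib


def backup(data: bytes, store: dict, chunk_size: int = 4096):
    """Send only chunks the store has not seen; return the recipe and bytes sent."""
    recipe, bytes_sent = [], 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # new content: transfer and keep it
            store[digest] = chunk
            bytes_sent += len(chunk)
        recipe.append(digest)            # known content: only the reference is kept
    return recipe, bytes_sent


def restore(recipe, store) -> bytes:
    """Rebuild the original data from the ordered list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


store = {}
day1 = b"A" * 8192 + b"B" * 4096   # hypothetical first full backup
day2 = b"A" * 8192 + b"C" * 4096   # next backup: mostly unchanged data
recipe1, sent1 = backup(day1, store)
recipe2, sent2 = backup(day2, store)
```

In this toy run the second backup transfers only the one chunk that changed. Repeated full backups of mostly unchanged data are the case where deduplication reduces storage the most.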
  27. GREENPLUM DATABASE: Backup/Restore with EMC Data Domain
       • Backup and restore between remote and primary sites
       • Diagram labels: Greenplum DCA and Data Domain at each site; LAN/WAN; Data Domain Replication
       • Ideal for configurations with RPO and RTO requirements that can be specified in hours
       • Supports:
           – Collection replication for DD Boost backup
           – Directory-level replication for NFS backup
           – Encryption over the WAN
  28. GREENPLUM DATABASE: Customer Support Services
       • Remote Technical Support
           – 24x7 technical support and remote troubleshooting
           – Customer-managed case severity level
           – Four-hour response objective
       • Onsite Support (DCA only)
           – Installation of replacement parts
           – Replacement parts shipped for next business day arrival
           – GP SW upgrade included
       • Proactive Service
           – Secure remote monitoring for hardware (DCA)
           – Notification of engineering technical advisories
           – Built-in tools maximize stability and performance
       • Secure Self-Help
           – 24x7 access to eService support tools including knowledgebase, forums, and appropriately licensed software updates
  29. GREENPLUM DATABASE: Other Relevant Greenplum Sessions (session, presenter, times)
       • Unified Analytics Platform Introduction (Brian Wilson): Tues 10:00-11:00, Thurs 1:00-2:00
       • Greenplum Hadoop Overview (Susheel Kaushik): Mon 10:00-11:00, Wed 4:15-5:15
       • Greenplum DCA Overview (Hanxi Chen): Mon 4:00-5:00, Thurs 10:00-11:00
       • Greenplum Analytics Workbench (Apurva Desai): Wed 8:30-9:30, Thurs 10:00-11:00
       • Analytics on Hadoop (Don Miner): Tues 11:30-12:30, Thurs 8:30-9:30
       • Big Data Driven Businesses in Action: Creating Real Business Value Using Greenplum UAP, panel w/4 customers (Mike Maxey): Wed 4:15-5:15, Thurs 11:30-12:30
       • Analytics for Business Value: Collaboration (Josh Klahr): Mon 10:00-11:00, Wed 2:45-3:45
       • Disruptive Data Science: How Data Science and Big Data are Transforming Business, IT and People (Annika Jimenez, David Dietrich): Tues 4:15-5:15, Thurs 11:30-12:30
  30. Thank You
