Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Greenplum Summit 2019

Greenplum Summit 2019
Keaton Adams
Advisory Data Engineer

Published in: Software

  1. Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI. Keaton Adams, Advisory Data Engineer. © Copyright 2019 Pivotal Software, Inc. All rights reserved.
  2. Parallel Load / Unload Features
  3. Pivotal Greenplum
     • Launched in 2005 (14 years of proven technology!)
     • Acquired by EMC in 2010
     • Acquired by Pivotal in 2013
     • Massively Parallel Processing (MPP) RDBMS
     • Open Source Core Based on PostgreSQL
     • Built w/ Pivotal Labs Practices
     • Over 1000 Person Years of R&D Invested
     • Hundreds of Global Customers in 34 countries
  4. Pivotal Open Source Strategy
     GOALS
     • Reduce Long Term Cost Structure
     • World Wide Technical Collaboration
     • Reduce Bespoke Technologies
     • Avoid Proprietary Pockets
     • Consistent Customer Interfaces
     • Combined Engineering Workforce
     • 300+ Engineers on Staff
     Diagram labels: Operational (OLTP), Analytical (MPP)
  5. A Modern Data Platform Must Be Built for Diverse Analytics
  6. (No text on slide)
  7. Deploy Workloads on any Infrastructure
     Public Cloud, Private Cloud, Bare-Metal
     Greenplum for Kubernetes (New!)
     • Google Container Engine
     • Other Kubernetes (on VMs or not)
     Greenplum Building Blocks (New!)
     • Pivotal blueprint + Dell reference hardware configs
     • Superior price/performance; no expensive proprietary hardware
     • The most performant way to run Greenplum on premises
     • Certified and supported by Pivotal
     The same Greenplum in all environments, including hybrid deployments via Kubernetes
  8. All Major Public Clouds: Fully Integrated Deployment. Bring Your Own License (BYOL) and Hourly
  9. Greenplum Building Blocks: It's All Just Blocks! Simple yet elegant.
     • Pivotal's Greenplum-Optimized Engineered System to deliver unrivaled Price/Performance for Next-Generation Analytics and AI!
     • Leverages state-of-the-art Dell Servers, Storage and Networking technologies.
     • Simple AND Flexible Sizing and Scaling to fit enterprise scale workloads from small to huge.
     • Cloud Inspired, On-Premise Experienced.
  10. Greenplum Integrated In-Database Analytics
      Diagram labels: Graphs; Analytical SQL, Aggregations, Windowing, Short Queries with Indices
      Enables Iterative Exploration!
  11. Greenplum Procedural Language Support
      Current Computing Interfaces
      • User Defined Types
      • User Defined Functions
      • User Defined Aggregates
      Containerized Execution
      • Foundational work for containerized Python and R compute environments
  12. Text Analytics: Indexing and Search with GPText
      SQL Warehousing + Text Analytics
      • Text Search
      • Integrate Text Functions with Structured Data Analytics
      Internal or External Indexing
      • Text Search
      • MADlib integration for machine learning on text data
      • PL/Python and PL/Java integration for Natural Language Processing
      Natural Language & AI Integration
      • Apache MADlib
      • PL/Python and PL/Java
      • Open NLP & MADlib for machine learning
  13. MPP Shared Nothing Architecture: Performance Through Parallelism
      • Segment Host with one or more Segment Instances
      • Segment Instances process queries in parallel
      • High speed interconnect for continuous pipelining of data processing
      • Master Host and Standby Master Host
      • Master coordinates work with Segment Hosts
      • Segment Hosts have their own CPU, disk and memory (shared nothing)
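
The division of labor on slide 13 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the segment count, the `crc32`-based placement, and every function name are hypothetical and are not taken from Greenplum. It shows rows placed on exactly one segment by a distribution key, each segment aggregating only its local rows, and the master merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key: str) -> int:
    """Pick a segment from the distribution key (illustrative hash only)."""
    return crc32(key.encode("utf-8")) % NUM_SEGMENTS


def distribute(rows):
    """Place each (customer, amount) row on exactly one segment."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for customer, amount in rows:
        segments[segment_for(customer)].append((customer, amount))
    return segments


def segment_sum(local_rows):
    """Work done by one segment: aggregate only its own local rows."""
    totals = {}
    for customer, amount in local_rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def master_query(segments):
    """Master: dispatch to all segments in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        partials = list(pool.map(segment_sum, segments))
    merged = {}
    for partial in partials:
        # Keys never overlap because each key lives on exactly one segment.
        merged.update(partial)
    return merged


rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
totals = master_query(distribute(rows))
```

Because the grouping key is also the distribution key in this sketch, no rows need to move between segments before the final merge.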
  14. Vertical Partitioning: Dividing Data By Access Patterns
      • Physical separation of data to enable faster processing with WHERE predicates
      • Partitions not required by a query are not processed
      • Facilitates age-based data retention policies
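
The pruning and retention points on slide 14 can be illustrated with a small Python model. It is a sketch under assumptions, not Greenplum's implementation: the `PartitionedTable` class, its monthly partition key, and its method names are all hypothetical.

```python
from datetime import date


class PartitionedTable:
    """Toy table with one physical bucket per (year, month)."""

    def __init__(self):
        self.partitions = {}  # (year, month) -> list of (day, value) rows

    def insert(self, day: date, value: int) -> None:
        self.partitions.setdefault((day.year, day.month), []).append((day, value))

    def select_between(self, start: date, end: date):
        """Return matching rows and the number of partitions actually scanned."""
        lo, hi = (start.year, start.month), (end.year, end.month)
        scanned, rows = 0, []
        for key, part in self.partitions.items():
            if key < lo or key > hi:
                continue  # pruned: this partition cannot satisfy the predicate
            scanned += 1
            rows.extend(r for r in part if start <= r[0] <= end)
        return rows, scanned

    def drop_older_than(self, cutoff: date) -> int:
        """Retention: drop whole partitions that fall before the cutoff month."""
        old = [k for k in self.partitions if k < (cutoff.year, cutoff.month)]
        for key in old:
            del self.partitions[key]
        return len(old)


table = PartitionedTable()
for month in range(1, 13):
    table.insert(date(2019, month, 15), month * 100)

rows, scanned = table.select_between(date(2019, 3, 1), date(2019, 4, 30))
dropped = table.drop_older_than(date(2019, 7, 1))
```

Here the date-range query touches 2 of the 12 partitions, and the retention step removes six whole partitions without inspecting individual rows.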
  15. Columnar Store. Row Store. External Data Sources. Logical table with partitioned physical storage
      Row-oriented
      • Row oriented is faster when returning the majority of columns
      • HEAP for many updates and deletes
      • Use Indexes for drill-through queries
      Column-oriented
      • Columnar storage compresses better
      • Optimized for retrieving a subset of the columns in a wide table
      • Compression by column: gzip (1-9), quicklz, Delta, RLE
      External (HDFS, RDBMS, S3)
      • Platform Extension Framework (PXF)
      • Kafka and Spark integration
      • Text, CSV, Avro, Parquet, etc.
      • Hadoop, S3 storage support
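
To make the row-versus-column trade-off on slide 15 concrete, the following Python sketch pivots a few rows into columns, reads a single column, and run-length encodes another. The sample data and helper names are hypothetical, and the encoder is a generic RLE, not Greenplum's on-disk format.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its columns together.
rows = [
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "EU", "amount": 20},
    {"id": 3, "region": "EU", "amount": 30},
    {"id": 4, "region": "US", "amount": 40},
]


def to_columns(records):
    """Column-oriented layout: one list of values per column."""
    return {name: [r[name] for r in records] for name in records[0]}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)
total = sum(columns["amount"])                  # reads one column only
encoded_region = rle_encode(columns["region"])  # repeated values collapse
```

A query that needs only `amount` never touches `id` or `region` in the columnar layout, and a column with long runs of repeated values shrinks to a handful of pairs.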
  16. GPORCA Optimizer
      GOALS
      • Unbreakable DW SQL Optimizer
      • Optimize complex SQL to produce superior runtimes
      2018 Accomplishments
      • Incremental Analyze via HyperLogLog, Rapid Distinct Value Aggregation
      • Improved Optimization Time, caching and early space pruning
      • Large Table Join, Join Order Optimization using Greedy Algorithm
      • Improved cost tuning to pick index joins when appropriate
      • Support Geospatial Workloads with GiST indexes
      • Improved cardinality estimation: Left joins and predicates on text columns
      • Complex Nested Subqueries: optimizing for co-location (without deadlocks)
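
Slide 16 mentions incremental analyze via HyperLogLog. The slides do not show how Greenplum implements it, so the following is only a generic textbook HyperLogLog sketch in Python, with a hypothetical register count and hash choice. It illustrates why the structure suits incremental statistics: sketches built separately (for example, one per partition) can be merged without rescanning the data.

```python
import hashlib
import math

P = 12        # 2**12 = 4096 registers (illustrative choice)
M = 1 << P


def _hash64(value) -> int:
    """Deterministic 64-bit hash of a value (SHA-1 is an arbitrary pick here)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def new_sketch():
    return [0] * M


def add(sketch, value) -> None:
    h = _hash64(value)
    index = h >> (64 - P)                     # first P bits select a register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
    sketch[index] = max(sketch[index], rank)


def merge(a, b):
    """Union of two sketches: register-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(sketch) -> float:
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)        # small-range correction
    return raw


# Two overlapping "partitions" sketched separately, then merged.
old_part, new_part, full = new_sketch(), new_sketch(), new_sketch()
for v in range(6000):
    add(old_part, v)
for v in range(4000, 10000):
    add(new_part, v)
for v in range(10000):
    add(full, v)
merged = merge(old_part, new_part)
approx_distinct = estimate(merged)
```

The merged sketch is identical to one built over all the data at once, so adding a new partition only requires sketching that partition.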
  17. Analytics across a wide time range of data with PXF
      Data is stored in different systems based on operational requirements
      • Can I work with data created 5 seconds ago?
      • Can I run a report on data from 5 months ago?
      • Can I inspect the data archived 5 years ago?
      Data is available for analytics with Greenplum no matter where it resides!
      Tiers: HOT (in-memory data grid), WARM (RDBMS data), COLD (data lake)
  18. Greenplum-Kafka Connector
      Features:
      • Continual data loading
      • Fast parallel loading via GP Data Segments
      • Resume on error, once only loading
      Benefits:
      • Lower complexity of data load
      • Lower latency from event to query
      • Easier to manage unexpected events
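
The "resume on error, once only loading" feature on slide 18 is commonly achieved by tracking offsets. The Python sketch below shows that general idea only; the `OnceOnlyLoader` class and its fields are hypothetical, no Kafka client is involved, and the slides do not say how the connector implements it. It assumes each batch arrives sorted by offset.

```python
class OnceOnlyLoader:
    """Toy loader that records the next offset to load for each partition."""

    def __init__(self):
        self.table = []       # stands in for the target database table
        self.committed = {}   # partition -> next offset that still needs loading

    def load_batch(self, partition: int, messages) -> int:
        """Load (offset, payload) pairs; return how many rows were new."""
        start = self.committed.get(partition, 0)
        fresh = [(offset, payload) for offset, payload in messages if offset >= start]
        if not fresh:
            return 0
        # Assumption: a real loader would commit rows and offset in one transaction.
        self.table.extend(payload for _, payload in fresh)
        self.committed[partition] = fresh[-1][0] + 1
        return len(fresh)


loader = OnceOnlyLoader()
first = loader.load_batch(0, [(0, "a"), (1, "b")])
# After an error the source replays from offset 1; the duplicate is skipped.
replayed = loader.load_batch(0, [(1, "b"), (2, "c")])
```

Replaying an already loaded message is harmless in this model because the stored offset filters it out before it reaches the table.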
  19. Modern Enterprise: Heterogeneous Data Formats
      Diagram labels: semi-structured data, unstructured data, raw data, structured data
  20. Greenplum Command Center
      • Database Health Indicators
      • Real Time Query Metrics
      • Locking and Blocking Views
      • Visual Explain
      • System Resource Monitoring
      • Workload Management
  21. GPCC - Workload Management
      • Greenplum Command Center provides additional workload management facilities built on Resource Groups
      • Provides simplified management
      • Assign queries to workloads based on query tags or GPDB roles
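
The assignment rule on slide 21 (route queries by query tags or roles) can be pictured as a first-match rule list. The Python sketch below is hypothetical throughout: the `Query` and `Rule` classes, the group names, and the first-match policy are illustrative assumptions, not the Command Center's actual rule engine or configuration format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Query:
    role: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    resource_group: str
    role: Optional[str] = None                           # None matches any role
    tags: Dict[str, str] = field(default_factory=dict)   # all listed tags must match

    def matches(self, query: Query) -> bool:
        if self.role is not None and query.role != self.role:
            return False
        return all(query.tags.get(k) == v for k, v in self.tags.items())


def assign(query: Query, rules: List[Rule], default: str = "default_group") -> str:
    """Return the resource group of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(query):
            return rule.resource_group
    return default


rules = [
    Rule("etl_group", tags={"app": "nightly_etl"}),
    Rule("reporting_group", role="analyst"),
]

tagged = assign(Query("analyst", {"app": "nightly_etl"}), rules)
by_role = assign(Query("analyst"), rules)
fallback = assign(Query("dba"), rules)
```

In this model rule order matters: the tagged query from an analyst lands in the ETL group because the tag rule is listed first.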
  22. (No text on slide)
  23. Real-time query progress monitoring
  24. Query Execution insights
  25. (No text on slide)
  26. Greenplum for Kubernetes
      Capabilities
      • Private and Public Clouds
      • Flexible Efficient Scaling
      • Automation, Self-Healing
      Deployment Experience
      • Quick
      • Consistently Repeatable
      • Pre-hardened, pre-networked
      • Service Discovery
      Software Appliance Benefits
      • Docker image maintained by Pivotal
      • OS Support From Pivotal, Full Stack 1 Throat to Choke
      • Consistent logging and Monitoring Environments
      • Consistent Greenplum operational environments across public, private clouds
      Diagram text: Alana; "Give me a Greenplum Cluster"; Cluster Alana; gpdb-alana:5432
  27. #ScaleMatters
