Greenplum and Kafka: Real-time Streaming to Greenplum - Greenplum Summit 2019

Greenplum Summit 2019
Sharath Punreddy
Niranjan Sarvi

Published in: Software

  1. Greenplum and Kafka: Real-time Streaming to Greenplum. Sharath Punreddy and Niranjan Sarvi, Pivotal Data. © Copyright 2019 Pivotal Software, Inc. All Rights Reserved.
  2. “We want answers NOW”
  3. Definitions of Real-time, Near Real-time, and Batch: https://ell.stackexchange.com/questions/8762/what-is-the-opposite-of-real-time
  4. Continuous Surveillance Monitoring
      ● Failure-to-Rescue: decreased from 3.4 to 1.2
      ● ICU Transfers: decreased from 5.6 to 2.9
      Source: http://anesthesiology.pubs.asahq.org/article.aspx?articleid=1932627
  5. Credit Card Fraud Monitoring: https://www.statista.com/statistics/419628/payment-card-fraud-losses-usa-by-type/
  6. Other Examples of Real-time Analytics
      ● Recommendation Engines
      ● Driver Assist
      ● Supply Chain
      ● Trading
      ● Machine Failure Detection
      ● Customer Experience
  7. Real-time Analytics Pipeline [Diagram labels: Data Generators, Data Processor, Action/Alerts, Data Persistence, Data Visualization; grouped under Real-time Analytics and Over-time Analytics]
  8. Kafka: Highly Concurrent, Highly Available, High Throughput, Reliable Delivery, Real-time Processing, Horizontal Scalability, Distributed
  9. Greenplum: Multi-Cloud Deployment, Open-Source Innovation, Massively Parallel, Integrated Analytics, Next-Generation Data Platform
  10. Real-time Process Flow: Parallel Load; Data Consolidation, Aggregation, and Modeling; Massively Parallel Processing
  11. Greenplum-Kafka Connector: Parallel Transfer, Cloud Native, Scale Up, Reliable, High Speed, Extensible
  12. Greenplum-Kafka Connector: Features
      ● Flexible column mapping and transformation
      ● Strong consistency guaranteed
      ● High-speed transfer
      ● Mini-batch loading and history tracking
      ● Certified by Confluent
      ● Supports various data formats: CSV, custom delimiter, and binary; custom formatter
  13. Greenplum-Kafka Connector: Architecture [Diagram labels: Topic A with Partition 1, Partition 2, ... Partition n; GP-Kafka with Reader 1, Reader 2, ... Reader n; Greenplum data segments Seg 1, Seg 2, ... Seg m]
  14. Configurations
      YAML file:
          DATABASE: tradedb
          USER: tradeuser
          PASSWORD: tradepwd
          HOST: mdw
          PORT: 5432
          KAFKA:
             INPUT:
                SOURCE:
                   BROKERS: kafka.c.pde-nsarvi.internal:9092
                   TOPIC: trade-input
                COLUMNS:
                   - NAME: id
                     TYPE: varchar
                   - NAME: __IGNORED__
                     TYPE: int
                   - NAME: units
                     TYPE: decimal(9,2)
                FORMAT: csv
                ERROR_LIMIT: 125
                LOCAL_HOSTNAME: pivotal-greenplum-byol-1-mdw.c.pde-nsarvi.internal
             OUTPUT:
                TABLE: trade
                MAPPING:
                   - NAME: id
                     EXPRESSION: id
                   - NAME: acountId
                     EXPRESSION: acountId
                   - NAME: tax-due
                     EXPRESSION: expenses * 0.035
                   - NAME: units
      GP Table:
          CREATE TABLE trade (
              id        VARCHAR(100),
              acountid  INT8,
              tradeid   INT8,
              cusip     VARCHAR(10),
              units     INT4,
              tradedate DATE,
              action    VARCHAR(10),
              amount    INT8,
              "tax-due" FLOAT8  -- quoted: a hyphen is not valid in an unquoted identifier
          ) DISTRIBUTED BY (id);
  15. Demo
  16. Demo - Architecture [Diagram labels: Trade Generators, GP-Kafka, Trade table, Google Cloud]
  17. References
      ● Demo scripts: https://github.com/Pivotal-Data-Engineering/realtime-streaming-gpdb
      ● Greenplum Kafka Connector: https://gpdb.docs.pivotal.io/5170/greenplum-kafka/intro.html
      ● Continuous Surveillance Monitoring: http://anesthesiology.pubs.asahq.org/article.aspx?articleid=1932627
      ● Credit Card Fraud Monitoring: https://www.statista.com/statistics/419628/payment-card-fraud-losses-usa-by-type
  18. Thank You. Q & A
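
Slide 12 lists "mini batch loading and history tracking" among the connector features. The slides do not show how the connector implements this, so the following is only a minimal Python sketch of the general idea: messages are grouped into fixed-size batches, and one history entry is recorded per loaded batch. The names `BatchRecord`, `mini_batches`, and `load_in_batches`, and the `max_rows` parameter, are hypothetical and not part of the connector.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Message = Tuple[int, str]  # (offset, payload); a simplified stand-in for a Kafka message


@dataclass
class BatchRecord:
    """History entry for one loaded mini-batch (hypothetical structure)."""
    first_offset: int
    last_offset: int
    row_count: int


def mini_batches(messages: Iterable[Message], max_rows: int) -> Iterator[List[Message]]:
    """Group messages into batches of at most max_rows, preserving order."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    batch: List[Message] = []
    for message in messages:
        batch.append(message)
        if len(batch) == max_rows:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def load_in_batches(messages: Iterable[Message], max_rows: int) -> List[BatchRecord]:
    """Record one history entry per batch; a real loader would also insert the rows."""
    history: List[BatchRecord] = []
    for batch in mini_batches(messages, max_rows):
        history.append(BatchRecord(batch[0][0], batch[-1][0], len(batch)))
    return history
```

Recording the first and last offset of each batch is what would let a loader resume after a failure without skipping or repeating a batch, which is one plausible reading of "history tracking".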
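
The architecture diagram on slide 13 pairs each Kafka partition with its own reader. As a rough illustration of that one-reader-per-partition idea, the Python sketch below drains several in-memory "partitions" concurrently. It assumes nothing about the connector's internals: `read_partition` and `read_topic_in_parallel` are hypothetical names, the partitions are plain lists rather than Kafka partitions, and the hand-off from readers to Greenplum segments is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def read_partition(partition_id: int, rows: List[str]) -> List[str]:
    """Stand-in for a reader that drains one partition and tags each row with its source."""
    return [f"p{partition_id}:{row}" for row in rows]


def read_topic_in_parallel(partitions: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Start one reader per partition and collect the results keyed by partition id."""
    workers = max(1, len(partitions))  # ThreadPoolExecutor requires at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pid: pool.submit(read_partition, pid, rows)
            for pid, rows in partitions.items()
        }
        return {pid: future.result() for pid, future in futures.items()}
```

Because each reader handles exactly one partition, ordering within a partition is preserved while partitions progress independently.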
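
The configuration on slide 14 declares the input columns of each CSV message (with `__IGNORED__` marking a field to skip) and maps them to output table columns through expressions such as `expenses * 0.035`. The Python sketch below mimics that mapping step for a single message, purely for illustration. It is not how the connector evaluates mappings: the lambdas stand in for the configured expressions, `map_message` is a hypothetical helper, and the `expenses` input column is an assumption, since the mapping on the slide references it but it is not among the input columns visible there.

```python
import csv
from decimal import Decimal
from io import StringIO

# Input columns in message order. "expenses" is assumed for this sketch only.
INPUT_COLUMNS = ["id", "__IGNORED__", "units", "expenses"]

# Output column -> expression over the parsed input row.
MAPPING = {
    "id": lambda row: row["id"],
    "units": lambda row: Decimal(row["units"]),
    "tax-due": lambda row: Decimal(row["expenses"]) * Decimal("0.035"),
}


def map_message(line: str) -> dict:
    """Parse one CSV message and apply the column mapping."""
    values = next(csv.reader(StringIO(line)))
    if len(values) != len(INPUT_COLUMNS):
        raise ValueError(f"expected {len(INPUT_COLUMNS)} fields, got {len(values)}")
    # Drop the fields marked as ignored before evaluating the expressions.
    row = {
        name: value
        for name, value in zip(INPUT_COLUMNS, values)
        if name != "__IGNORED__"
    }
    return {target: expression(row) for target, expression in MAPPING.items()}
```

For example, `map_message("T-1001,7,12.50,200")` keeps `id` and `units`, skips the second field, and derives `tax-due` from the assumed `expenses` value.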
