© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Christian Tzolov (@christzolov)
March 2019
Real-time Analysis of Data Processing Pipelines
Spring Cloud Data Flow & Micrometer
About @christzolov
Pivotal engineer, Spring Cloud Data Flow team
Apache Committer/PMC member
Distributed, Data-intensive Systems and Toolkits
Agenda
■ Distributed, Data-Intensive Systems
■ Spring Cloud Data Flow toolkit
■ Operational metrics and monitoring
■ Micrometer, Time Series and Dimensions
■ Architectural Patterns and Practices
■ Q+A
“We call an application data-intensive if data is its
primary challenge—the quantity of data, the
complexity of data, or the speed at which it is
changing.”
What is Spring Cloud Data Flow
A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
● Data pipelines consist of Spring Boot apps, using Spring Cloud Stream for event-streaming or Spring Cloud Task for batch processes.
● Ready for Data Integration with >60 out-of-the-box streaming and batch Apps.
● DSL, GUI, and REST-APIs to build and orchestrate data pipelines onto platforms like Kubernetes and Cloud Foundry.
● Continuous delivery for streaming data pipelines using Spring Cloud Skipper.
● Cron-job scheduler for batch data pipelines using Spring Cloud Scheduler.
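To make the DSL idea concrete, here is a minimal Python sketch that reads a stream definition as a chain of apps. It assumes a Unix-pipe-style syntax such as `http | transform | log`. The helper `parse_stream`, the app names, and the rule "first app is the source, last is the sink" are illustrative assumptions for this sketch, not code from Data Flow itself.

```python
from typing import List, Tuple


def parse_stream(definition: str) -> List[Tuple[str, str]]:
    """Split a pipe-delimited stream definition into (app_name, app_type) pairs.

    Hypothetical helper: the first app is treated as the source, the last as
    the sink, and anything in between as a processor.
    """
    apps = [part.strip() for part in definition.split("|")]
    if len(apps) < 2 or any(not app for app in apps):
        raise ValueError("a stream needs at least a source and a sink")
    typed = []
    for index, app in enumerate(apps):
        if index == 0:
            app_type = "source"
        elif index == len(apps) - 1:
            app_type = "sink"
        else:
            app_type = "processor"
        typed.append((app, app_type))
    return typed


# Example definition (app names are placeholders).
pipeline = parse_stream("http | transform | log")
for name, app_type in pipeline:
    print(f"{app_type:<9} {name}")
```

The sketch ignores app options and only shows how a textual definition maps onto typed pipeline stages.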
Runtime and Message Broker Abstraction
Runtimes: Kubernetes, Cloud Foundry, Local / Dev
Message brokers: RabbitMQ, Apache Kafka, Google Pub/Sub, Amazon Kinesis, Solace
Opportunities: Same code; Same tests; Works with a variety of Message Brokers
Common Denominator = Spring Boot => Consolidate On
Development Practices
Test Infrastructure
CI/CD Tooling and Automation
Operational Metrics and Monitoring
Data Processing Pipeline
Data Flow Lifecycle
• Data Flow accepts the data pipeline definition and delegates to Skipper for lifecycle management.
• Operational metrics and monitoring.
Legacy
Spring Cloud Metrics Collector
What we want
Micrometer
Time Series (Influx, Prometheus)
Grafana
Time Series & Dimensional Data Model
● A time series is a sequence of data points ordered in time.
● It consists of timestamped values belonging to the same metric and the same set of labeled dimensions.
● Every time series is uniquely identified by its metric name and a set of key-value pairs, known as labels, tags, or dimensions.
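The identity rule above can be sketched in a few lines of Python. This is an illustrative in-memory model, not how Micrometer or any TSDB stores data. The names `series_key`, `record`, and `store` are hypothetical, and the sample values are invented.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(metric: str, tags: Dict[str, str]) -> SeriesKey:
    """Identity of a time series: metric name plus an unordered set of tags."""
    return (metric, frozenset(tags.items()))


# Each series holds (timestamp, value) points, kept in time order.
store: Dict[SeriesKey, List[Tuple[float, float]]] = defaultdict(list)


def record(metric: str, tags: Dict[str, str], timestamp: float, value: float) -> None:
    """Append a point to the series identified by metric name and tags."""
    points = store[series_key(metric, tags)]
    points.append((timestamp, value))
    points.sort(key=lambda point: point[0])


# Same metric, same tags (given in a different order): one series, two points.
record("process_cpu_usage", {"stream_name": "s3", "application_name": "log"}, 2.0, 0.25)
record("process_cpu_usage", {"application_name": "log", "stream_name": "s3"}, 1.0, 0.5)
# Same metric, different tag value: a second, separate series.
record("process_cpu_usage", {"stream_name": "s3", "application_name": "http"}, 1.0, 0.75)

print(len(store))
```

Changing any tag value creates a new series, which is why high-cardinality tags need care.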
Data Flow 2.0 Monitoring Architecture
Support for Prometheus and InfluxDB
Data Pipeline Monitoring
Supported Metrics
● Spring Boot 2.0 - JVM, CPU, File descriptor…
● Spring Integration - Channel, Source, Handler
● Data Flow App Starters – Tags: stream name, app name, type, CF
process_cpu_usage {
application_guid="20036",
application_name="log",
application_type="sink",
instance_index="0",
stream_name="s3",
instance="10.40.1.170:20036",
job="scdf",
}
spring_integration_send_seconds_count {
application_guid="20036",
application_name="log",
application_type="sink",
stream_name="s3",
instance_index="0",
name="input",
result="success",
type="channel",
exception="none",
instance="10.40.1.170:20036",
job="scdf",
}
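Tags such as `stream_name` and `instance_index` are what make it possible to slice one metric along different dimensions. The Python sketch below groups samples of one metric by a chosen tag and sums them. The helper `sum_by`, the second stream `s4`, and all numeric values are invented for illustration. A real setup would run this kind of aggregation as a query in the time series database.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (metric name, tags, value); the numbers are invented for the example.
samples: List[Tuple[str, Dict[str, str], float]] = [
    ("spring_integration_send_seconds_count",
     {"stream_name": "s3", "application_name": "log", "instance_index": "0"}, 120.0),
    ("spring_integration_send_seconds_count",
     {"stream_name": "s3", "application_name": "log", "instance_index": "1"}, 80.0),
    ("spring_integration_send_seconds_count",
     {"stream_name": "s4", "application_name": "log", "instance_index": "0"}, 30.0),
]


def sum_by(metric: str, tag: str) -> Dict[str, float]:
    """Sum the values of one metric, grouped by the value of a single tag."""
    totals: Dict[str, float] = defaultdict(float)
    for name, tags, value in samples:
        if name == metric and tag in tags:
            totals[tags[tag]] += value
    return dict(totals)


per_stream = sum_by("spring_integration_send_seconds_count", "stream_name")
print(per_stream)
```

Grouping by `instance_index` instead would show per-instance totals from the same raw samples.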
Pull-based TSDB (Prometheus)
Implications:
• Service Discovery
• Security
• Short-Lived Tasks (Pushgateway)
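The short-lived task problem can be seen in a toy simulation. A pull-based collector only sees targets it knows about and that are still alive when polled, so a task that exits early pushes its metrics to an intermediary that is scraped later. The classes `PushGateway` and `Scraper` below are hypothetical stand-ins written for this sketch. They do not use the Prometheus client or its API.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]


class PushGateway:
    """Holds the last metrics pushed by jobs that may already have exited."""

    def __init__(self) -> None:
        self._pushed: Dict[str, Metrics] = {}

    def push(self, job: str, metrics: Metrics) -> None:
        self._pushed[job] = dict(metrics)

    def scrape(self) -> Metrics:
        # Flatten per-job metrics into one map, prefixing each name with the job.
        return {
            f"{job}.{name}": value
            for job, metrics in self._pushed.items()
            for name, value in metrics.items()
        }


class Scraper:
    """Pull-based collector: it must know every target it should poll."""

    def __init__(self) -> None:
        self._targets: Dict[str, Callable[[], Metrics]] = {}

    def register(self, name: str, endpoint: Callable[[], Metrics]) -> None:
        # Stands in for service discovery.
        self._targets[name] = endpoint

    def scrape_all(self) -> Dict[str, Metrics]:
        return {name: endpoint() for name, endpoint in self._targets.items()}


gateway = PushGateway()
scraper = Scraper()

# Long-lived streaming app: exposes current metrics whenever it is polled.
scraper.register("log-sink", lambda: {"messages_total": 42.0})
# The gateway is scraped like any other target.
scraper.register("pushgateway", gateway.scrape)

# Short-lived task: pushes once and exits before any scrape happens.
gateway.push("nightly-batch", {"rows_processed": 1000.0})

collected = scraper.scrape_all()
print(collected)
```

The task's metrics survive its exit because the gateway, not the task, is the scrape target.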
Data Flow in Kubernetes
Prometheus
Data Flow in PCF
InfluxDB
PromRegator
SCDF Monitoring with hosted Grafana and InfluxDB
Data Flow Analytics with Micrometer
Counter Processor & Sink
Twitter Analytics
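One plausible reading of counter-based analytics is that each incoming message increments a counter whose tags are taken from the message itself, for example the language of a tweet. The Python sketch below assumes that reading. The `lang` field, the metric name `tweets`, and the helper `count_by_field` are hypothetical and are not the configuration of the actual counter apps.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_field(messages: Iterable[Dict[str, str]], metric: str, field: str) -> Counter:
    """Increment one counter series per distinct value of `field`."""
    counters: Counter = Counter()
    for message in messages:
        # Messages without the field are counted under a fallback tag value.
        value = message.get(field, "unknown")
        counters[(metric, field, value)] += 1
    return counters


# Invented sample messages standing in for a tweet stream.
tweets = [
    {"lang": "en", "text": "hello"},
    {"lang": "de", "text": "hallo"},
    {"lang": "en", "text": "metrics"},
    {"text": "no language set"},
]

language_counts = count_by_field(tweets, "tweets", "lang")
for (metric, tag, value), count in sorted(language_counts.items()):
    print(f'{metric}{{{tag}="{value}"}} {count}')
```

Each distinct tag value becomes its own time series, which a dashboard can then chart or rank.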
Next Steps
● Spring Cloud Stream Task support
● Monitor Data Flow and Skipper
● PromRegator and PCF improvements
● ?!
References
● Spring Cloud Data Flow – Stream Monitoring: http://docs.spring.io/spring-cloud-dataflow/docs/2.1.0.BUILD-SNAPSHOT/reference/htmlsingle/#streams-monitoring
● Twitter Analytics Sample: https://docs.spring.io/spring-cloud-dataflow-samples/docs/current/reference/htmlsingle/#spring-cloud-data-flow-samples-twitter-analytics-overview
● Micrometer: https://micrometer.io
● Spring Boot Micrometer: https://docs.spring.io/spring-boot/docs/2.1.3.RELEASE/reference/htmlsingle/#production-ready-metrics
● Spring Integration – Micrometer: https://docs.spring.io/spring-integration/docs/5.1.2.RELEASE/reference/html/system-management-chapter.html#micrometer-integration
● Grafana: https://grafana.com
● Prometheus: https://prometheus.io
Q&A
SCDF STREAM MONITORING


Editor's Notes

  1. Compute an answer for any challenging question by looking through all the available data
  2. A Time Series Database (TSDB) is a database optimized for time-stamped or time series data. Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data. A Time Series Database is built specifically for handling metrics and events or measurements that are time-stamped. A TSDB is optimized for measuring change over time. Properties that make time series data very different than other data workloads are data lifecycle management, summarization, and large range scans of many records.
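Downsampling, mentioned in note 2, can be illustrated with a short Python sketch that averages raw points into fixed-width time buckets. The function `downsample`, the 60-second bucket width, and the sample points are assumptions chosen for the example. Real time series databases offer this as a built-in query or retention feature.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def downsample(points: List[Tuple[int, float]], bucket_seconds: int) -> List[Tuple[int, float]]:
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, sum(values) / len(values)) for start, values in sorted(buckets.items())]


# Invented raw samples: (seconds since start, measured value).
raw = [(0, 1.0), (10, 3.0), (59, 2.0), (60, 4.0), (119, 6.0), (120, 5.0)]
per_minute = downsample(raw, 60)
print(per_minute)
```

Six raw points collapse into three per-minute averages, trading resolution for storage and faster range scans.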