Building a Data Exchange
with Spring Cloud Data Flow
October 7–10, 2019
Austin Convention Center
Unless otherwise indicated, these slides are © 2013-2019 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Agenda
Introduction
What is Spring Cloud Data Flow, and what is a Data Exchange?
Transformation Case Study
• Decision factors
• Assessing current state
• Applying patterns
• Lessons learned
• Ways forward
Introduction – Hi! Welcome!
Technical Director
Charles Schwab & Co., Inc.
Software development, design and architecture for over 25 years
Using Java since 1999, Spring since 2008
Using SCDF on Pivotal Platform since 2016
Member of the Pivotal Vanguards
Believe software development is more of an art than a science
Avid jogger and climber (until recently)
Gamer ordinaire
What is Spring Cloud Data Flow?
Orchestration service on Cloud Foundry or Kubernetes
Evolution of Spring XD
• Spring Integration
• Spring Batch
• Spring Cloud Stream
• Spring Cloud Task
Streaming and batch
dataflow.spring.io
Two Ways to Deploy SCDF
Provision the Data Flow Server tile
https://docs.pivotal.io/scdf/1-6
Install the Spring Boot Application
• Provision a Rabbit Service instance
• Provision a PostgreSQL Service instance
• Download SCDF server application jar
• Download SCDF shell jar
• Download Skipper jar
• Configure and run the SCDF application
https://dataflow.spring.io/docs/installation/cloudfoundry/cf-cli
What is a Data Exchange?
System or group of applications that gets data from “point a” to “point b”
Validates and transforms data
Consumes data in multiple formats
Disseminates to 1 or more consumers
Similar in concept to ETL
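The validate, transform, disseminate flow described on this slide can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `DataExchangeSketch`, the `exchange` method, and the in-memory "reporting" and "archive" destinations are hypothetical and are not taken from the talk or from Spring Cloud Data Flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

public class DataExchangeSketch {

    /**
     * Validates one record, transforms it, and hands the result to every consumer.
     * Returns false (and delivers nothing) when validation fails.
     */
    public static boolean exchange(String record,
                                   Predicate<String> validator,
                                   Function<String, String> transformer,
                                   List<Consumer<String>> consumers) {
        if (!validator.test(record)) {
            return false;
        }
        String transformed = transformer.apply(record);
        for (Consumer<String> consumer : consumers) {
            consumer.accept(transformed);
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory destinations standing in for real downstream systems.
        List<String> reporting = new ArrayList<>();
        List<String> archive = new ArrayList<>();
        Consumer<String> toReporting = reporting::add;
        Consumer<String> toArchive = archive::add;
        List<Consumer<String>> consumers = Arrays.asList(toReporting, toArchive);

        // Hypothetical rules: reject blank records, normalize the rest.
        Predicate<String> notBlank = r -> !r.trim().isEmpty();
        Function<String, String> normalize = r -> r.trim().toUpperCase();

        System.out.println(exchange("  acct-42,buy,100 ", notBlank, normalize, consumers));
        System.out.println(exchange("   ", notBlank, normalize, consumers));
        System.out.println(reporting);
        System.out.println(archive);
    }
}
```

In a real exchange each of these roles would be a separate application, which is what makes the "1 or more consumers" point cheap to extend: adding a destination means adding a consumer, not changing the validation or transformation step.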
Data Exchange vs. ETL
Data Exchange
• May not know much about source system(s)
• Consumes data in many different formats
• Disseminates to disparate systems
• Disseminates in different formats
• Sources and destinations are not necessarily databases
• May or may not own the data being exchanged
ETL
• Consumes from disparate systems, format is generally known
• Destination system is typically singular
• Data is owned by the “loader”
• Typically refers to databases
Case Study:
Deciding on a Data Exchange
Why did we use Spring Cloud Data Flow?
Decision Factors
What might we be dealing with?
• Monolithic legacy application more than 10 years old
• Ugly technology stack (Perl, .sh, PL/SQL, Java, etc.)
• Band-aided, face-lifted over the years
• Strategic misalignment
• Exclusively batch, inherently limited
• Exposure to risks – safe the way it is, but don’t want to touch it
What do we want to build?
• Microservice architecture cuz … “microservices”
• Strategically aligned
• “Futureproof” – not really a thing
• Promotes patterns
• Industry standard integrations
• Addresses data protection concerns
• Facilitates migration toward real-time
• Facilitates fast time-to-market
Decision Factors
How does Pivotal Platform Address Concerns?
Isolated segments can be used to separate systems with different levels of sensitivity or exposure
Hyper-converged infrastructure secures each application in its own subnet
Communication internally on the platform is secured via mutual TLS
Application security groups open ports to outside the platform only as needed
PCI DSS compliance can be achieved
Assessing Current State
Distilling Requirements
Applying Patterns
Laying Out a Plan
Why Patterns?
Duh… (“Gang of Four” book link)
Represent repeatable implementations
Configured rather than built
Facilitates speed-to-market
Facilitates governance of data lineage
Focuses application-specific efforts
What Does it Look Like?
STREAM_1 = file > :funnel-in
STREAM_2 = jms > :funnel-in
STREAM_3 = jdbc > :funnel-in
STREAM_4 = s3 > :funnel-in
STREAM_5 = rabbit > :funnel-in
STREAM_6 = mongodb > :funnel-in
STREAM_7 = sftp > :funnel-in
STREAM_8 = :funnel-in > transform > :fan-out
STREAM_9 = :fan-out > file
STREAM_10 = :fan-out > sftp
STREAM_11 = :fan-out > rabbit
STREAM_12 = :fan-out > jdbc
STREAM_13 = :fan-out > s3
STREAM_14 = :fan-out > mongodb
STREAM_15 = :fan-out > hdfs
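The fifteen definitions above follow one rule: every source writes to the named destination `:funnel-in`, a single stream transforms `:funnel-in` into `:fan-out`, and every sink reads from `:fan-out`. As a rough sketch of the "configured rather than built" idea, the Java below generates the same definition strings from two lists. The class and method names are hypothetical, and the code only builds strings; it does not call any Spring Cloud Data Flow API or register streams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FunnelFanOutDefinitions {

    /** Builds one definition per source, one transform stream, and one definition per sink. */
    public static List<String> build(List<String> sources, String processor, List<String> sinks) {
        List<String> definitions = new ArrayList<>();
        for (String source : sources) {
            definitions.add(source + " > :funnel-in");
        }
        definitions.add(":funnel-in > " + processor + " > :fan-out");
        for (String sink : sinks) {
            definitions.add(":fan-out > " + sink);
        }
        return definitions;
    }

    public static void main(String[] args) {
        // Source and sink names copied from the slide.
        List<String> sources = Arrays.asList("file", "jms", "jdbc", "s3", "rabbit", "mongodb", "sftp");
        List<String> sinks = Arrays.asList("file", "sftp", "rabbit", "jdbc", "s3", "mongodb", "hdfs");

        List<String> definitions = build(sources, "transform", sinks);
        for (int i = 0; i < definitions.size(); i++) {
            System.out.println("STREAM_" + (i + 1) + " = " + definitions.get(i));
        }
    }
}
```

Under this layout, supporting a new source or destination is one more list entry, and the transform stream is untouched.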
One Pattern in Detail – The File Pattern
Lessons Learned
What have we learned that has informed our strategy?
Lessons Learned
Underlying Transport
RabbitMQ and Kafka binders are supported out-of-box in Spring Cloud Stream
No experience with Kafka
Managing the transport and the reliability of message delivery becomes something to look into, as does implementing a backing service for message persistence (probably not needed with Kafka)
Lessons Learned
Total Cost of Ownership
Billing model based on number of app instances (AIs) is punitive
Streams with lots of transformations can incur high cost
Multiple foundations in a topology can mean multiplying AIs
So learn to be careful about which work is really a ‘batch’ process appropriate for a Task
Build some streams that could provide shared capabilities
Design streams in which every app is essential
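To make the cost point concrete, here is a small worked estimate in Java. It assumes each app in a stream definition is deployed as its own application and billed per running instance; the instance and foundation counts are hypothetical and were not given in the talk.

```java
public class AppInstanceEstimate {

    /** Total app instances = apps in the stream x instances per app x foundations. */
    public static int appInstances(int appsInStream, int instancesPerApp, int foundations) {
        return appsInStream * instancesPerApp * foundations;
    }

    public static void main(String[] args) {
        int instancesPerApp = 2; // hypothetical scale for each app
        int foundations = 2;     // hypothetical two-foundation topology

        // source | transform | transform | transform | sink -> 5 apps
        System.out.println(appInstances(5, instancesPerApp, foundations)); // prints 20

        // source | transform | sink -> 3 apps
        System.out.println(appInstances(3, instancesPerApp, foundations)); // prints 12
    }
}
```

Under these assumptions, folding three transformations into one removes eight billed instances from a single stream, which is why streams where every app is essential matter.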
Lessons Learned
Currency
SCDF and Pivotal Platform have evolved a lot since 2016
Staying current with tooling and platform is very desirable
Spring Boot evolved (1.4.x → 2.1.x)
SCDF evolved (1.0 → 2.2)
Pivotal Platform evolved (PCF → Pivotal Platform) (1.4.x → 2.6)
Lessons Learned
Retirement of Legacy Platform
Cannot happen all at once
Deprecate all work on the old platform from a point in time
Effective to build new alongside old, and then retire old
Planning for actual retirement is essential
Check out these related sessions:
• High-Performance Data Processing with Spring Cloud Data Flow and Geode
• Real-Time Performance Analysis of Data-Processing Pipelines with Spring Cloud Data Flow, Micrometer
• Streaming with Spring Cloud Stream and Apache Kafka
#springone @s1p
