Batch Processing With Spring Cloud
Data Flow Server in Cloud Foundry
By Bruce Thelen
@brucethelen
1
Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a
Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Introduction
2
RiskMeter Order Flow
3
Problem Statement
4
Solution Design
5
Choose Products -> Upload Input -> Verify Input -> Process Batch -> Download Output
RiskMeter Batch Order Flow
6
Solution Overview
7
Process To Implement Solution
8
http://www.appcontinuum.io/
Process To Implement Solution
9
Single Application with Namespaces
• Monolith
Single Application and Components
• Monolith
  • Web
  • Batch
  • Shared
Multiple Applications and Components
• Web
  • Web
  • Shared
• Data Flow Server
• Batch
  • Batch
  • Shared
Solution Diagram
10
Lessons Learned
Register Tasks With SCDF
12
Deploy New Web -> Smoke Test New Web -> Reroute to New Web -> Stop Old Web
Deploy New Web -> Register New Batch -> Smoke Test New Web -> Smoke Test New Batch -> Reroute to New Web -> Stop Old Web -> Cleanup Any Completed Batch
Blue Green Deployment Process
13
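The speaker notes for this slide say that the only workable way to blue-green deploy the batch was to register each version under a different name. A minimal sketch of such a naming helper follows. The `TaskNames` class and the `<app>-<build id>-task` pattern are assumptions, inferred only from the example task name `riskmeter-batch-app-527b9c8-task` that appears on a later slide; the real pipeline may build names differently.

```java
// Hypothetical naming helper: one task definition name per build, so a new
// release can be registered while batches from the old release still run.
final class TaskNames {

    // Assumed pattern: <application name>-<build identifier>-task
    static String forBuild(String appName, String buildId) {
        if (appName.isEmpty() || buildId.isEmpty()) {
            throw new IllegalArgumentException("appName and buildId are required");
        }
        return appName + "-" + buildId + "-task";
    }

    public static void main(String[] args) {
        System.out.println(forBuild("riskmeter-batch-app", "527b9c8"));
    }
}
```

Because every build gets its own name, the "Cleanup Any Completed Batch" step can remove old registrations by name without touching the version that is currently live.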
Cleaning Up CF Tasks
14
Passing Data Between Web And Batch
15
@Service
public class BatchRunner {
    …
    public JobExecution runBatch() {
        …
        return jobLauncher.run(batchJob, batchJobParameters);
    }
}
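The hand-off described in the speaker notes for this slide (store the input file and the analytics to run in a database table, then pass only the row id to the batch) can be sketched in plain Java. This is an illustration under stated assumptions, not the RiskMeter code: `BatchRequest`, `BatchRequestStore`, and the in-memory map standing in for the database table are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical record of what the web app saves before starting a batch.
final class BatchRequest {
    final String inputFileLocation;   // where the uploaded file was stored
    final List<String> analytics;     // which analytics to run

    BatchRequest(String inputFileLocation, List<String> analytics) {
        this.inputFileLocation = inputFileLocation;
        this.analytics = List.copyOf(analytics);
    }
}

// In-memory stand-in for the database table shared by web and batch.
final class BatchRequestStore {
    private final Map<Long, BatchRequest> rows = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Web side: persist the request and return the id to hand to the batch.
    long save(BatchRequest request) {
        long id = nextId.getAndIncrement();
        rows.put(id, request);
        return id;
    }

    // Batch side: load everything needed to run, given only the id.
    Optional<BatchRequest> findById(long id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```

The point of the design is that the only value crossing the process boundary is a small id, which fits easily into a job parameter or a command-line argument.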
Passing Data Between Web And Batch
16
public void execute(String batchJobId) {
    …
    MultiValueMap<String, String> params = new …
    params.add("name", taskName);
    params.add("arguments", "--batchJobId=" + batchJobId);
    …
    HttpEntity<MultiValueMap<String, String>> requestEntity =
            new HttpEntity<>(params, null);
    …
    asyncRestTemplate.postForLocation(uri, requestEntity);
}
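On the receiving side, the launched task sees `--batchJobId=<id>` among its command-line arguments. A minimal plain-Java sketch of reading that value is shown here. `BatchJobIdArgument` is a hypothetical helper written for illustration; a Spring Boot application would more likely rely on Boot's own argument handling (for example `ApplicationArguments.getOptionValues`) than on hand-written parsing.

```java
import java.util.Optional;

// Hypothetical helper: extracts the value of --batchJobId=<id> from the
// command-line arguments that the launched task receives.
final class BatchJobIdArgument {
    private static final String PREFIX = "--batchJobId=";

    static Optional<String> parse(String[] args) {
        for (String arg : args) {
            // Accept only arguments that carry a non-empty value.
            if (arg.startsWith(PREFIX) && arg.length() > PREFIX.length()) {
                return Optional.of(arg.substring(PREFIX.length()));
            }
        }
        // No id supplied, e.g. a launch that deliberately omits it.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse(args).orElse("no batchJobId supplied"));
    }
}
```

Returning an `Optional` keeps the "no id" case explicit, which matters because the speaker notes mention a smoke test that starts the task without a job id.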
Passing Data Between Web And Batch
17
$ curl -s -X POST -u scdfuser:password \
    http://localhost:9393/tasks/executions \
    -d 'name=riskmeter-batch-app-527b9c8-task&arguments=--batchJobId%3D1'
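The `%3D` in the curl body is the form-encoded `=` inside the `arguments` value. The sketch below reproduces that body with the JDK's `URLEncoder`, to show where the encoding comes from. `TaskLaunchBody` is a hypothetical helper, and the `Charset` overload of `URLEncoder.encode` assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the application/x-www-form-urlencoded body posted to /tasks/executions.
final class TaskLaunchBody {

    static String build(String taskName, String batchJobId) {
        // The whole "--batchJobId=<id>" string is a single form value, so its
        // "=" must be percent-encoded to avoid being read as a field separator.
        String arguments = "--batchJobId=" + batchJobId;
        return "name=" + URLEncoder.encode(taskName, StandardCharsets.UTF_8)
                + "&arguments=" + URLEncoder.encode(arguments, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("riskmeter-batch-app-527b9c8-task", "1"));
    }
}
```

Encoding each value separately, rather than the assembled string, keeps the `&` and the top-level `=` characters intact as field delimiters.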
Injecting Configuration Into SCDF and TASK
18
spring:
  cloud:
    deployer:
      cloudfoundry:
        domain: clgxlabs.com
        org: Corelogic
        password: "{cipher}ThisIZAFakePazzW0Rd"
        space: rmeter-prod
        url: "https://api.sys.clgxlabs.io"
        username: ciprodadmin
        task:
          buildpack: java_buildpack_offline
          services: "config-server,riskmeter-rabbit,splunk-syslog"
          memory: 2048
Demo
Conclusion
Learn More. Stay Connected.
Orchestrating Data Microservices with Spring Cloud Data Flow
Tomorrow 10:30am Room 2005
Latency and Event Tracing with Spring Cloud Data Flow
Tomorrow 11:50am Room 2005
21
#springone@s1p

Case Study of Batch Processing With Spring Cloud Data Flow Server in Cloud Foundry


Editor's Notes

  • #2 Welcome to A Case Study of Batch Processing With Spring Cloud Data Flow Server in Cloud Foundry. I will be presenting lessons learned by my team while replatforming an existing product in my company's portfolio. We will cover some introductory information leading up to the problem we faced, then the process, the lessons learned, and lastly a demo.
  • #3 Bruce Thelen: Principal Software Architect with CoreLogic in the Innovation Labs department, based in Austin, TX.
    CoreLogic: The vision is to deliver unique property-level insights that power the global real estate economy. CoreLogic is currently in a cloud native transformation led by the Innovation Labs department. The effort started nearly 4 years ago at Pivotal SF. It covers new product development and porting, refreshing, or redoing certain product lines, using cloud native patterns on Pivotal Cloud Foundry.
    RiskMeter: A product for insurance underwriters to assess natural hazard risk to insured properties. It delivers a variety of CoreLogic and third-party analytics such as Flood, Wildfire, and Earthquake Risk, as well as hosted customer data such as rating territories, blackout zones, etc. The legacy version of RiskMeter serves thousands of customers daily via a web application and API, which are hosted on Windows 2000 Server and written in VB6 and VB.Net with classic ASP. It is a location-based application using a combination of the Google Maps API and custom map layers developed by CoreLogic.
    The new RiskMeter is a web application and API with an Angular frontend and a mostly monolithic Spring Boot backend deployed on Cloud Foundry. The backend is mostly stateless and is deployed with zero downtime using a blue-green deploy technique.
  • #5 While executing this replatforming, the team discovered the need to recreate the legacy application's batch processing features. Not only was this feature in the legacy RiskMeter application, but logs showed it was used and user interviews showed its value: as an insurance portfolio review tool (for re-insurance, for example) and for writing multiple-location policies. Typical batch sizes are 5-10K. The legacy application capped batches at 5K, and users remarked that they sometimes had to break up their batches into multiple pieces.
    Design principles for our batch implementation: fit our cloud native paradigm, including the 12-factor application (https://12factor.net/); running batches should be resilient and independent of software releases, etc.
    At this point, Innovation Labs had done mostly web applications and stateless APIs, so this was new territory for us.
  • #6 This feature is a basic upload file -> process file -> download output style workflow. Designers came up with a workflow like this.
  • #7 Choose Products
  • #8 We needed to pick technology to provide the capabilities we needed. Cloud Foundry and Spring were givens because they are our platform of choice and the framework we already use for our app backend. Given that we are heavily invested in Spring, we looked at Spring Batch, which was a good fit: it provides a mechanism for composing steps to make repeatable bulk processing jobs. But adding Spring Batch to our mostly monolithic backend did not quite meet our design principles. Our mostly stateless backend is deployed with zero downtime, so long-running batches would be interrupted by our web application updates, either breaking resiliency or restricting our ability to deploy at will. We realized we needed to run our Spring Batch jobs outside of our web application, but still within the cloud. Here is my code… Our research at this point led us to Spring Cloud Data Flow Server.
    Spring Cloud Data Flow Server (http://cloud.spring.io/spring-cloud-dataflow/): Deploys pipelines onto modern runtimes such as Cloud Foundry, Kubernetes, Apache Mesos, or Apache YARN. Keeps the repository of the task definitions, starts tasks, reports the status of tasks, etc.
    Spring Cloud Deployer SPI for Cloud Foundry (https://github.com/spring-cloud/spring-cloud-deployer): Performs the deployment as tasks or streams.
    Spring Cloud Task.
    Pivotal Cloud Foundry Tasks (https://docs.cloudfoundry.org/devguide/using-tasks.html): Version 1.9+ with tasks enabled via feature flag. Tasks are short-running processes within Cloud Foundry. Tasks run in their own containers. After a task runs, Cloud Foundry destroys the container running the task.
  • #9 The application at this point was a web application with an Angular frontend and a mostly monolithic Spring Boot backend. We chose to implement the processing in the monolith, decompose it into separate jars in the same monolith, and lastly repackage those jars into separate apps.
    Stages: Single Application; Single Application with Namespaces; Single Application and Components; Multiple Applications and Components; Multiple Applications, Components and Services.
  • #10 Steps:
    Implement batch processing in place. We feature flagged the batch features so we could work on master. Using Spring Batch in the backend monolith, we implemented the entire feature in the monolith, frontend and backend: add the Spring Boot starter for Spring Batch and @EnableBatchProcessing; create a BatchConfiguration to define the steps to take (read input, process, write output); create a service to start the batch (JobLauncher.run).
    Progress bar: use STOMP over WebSocket to power the progress bar. Add the AMQP starter, add the WebSocket starter, bind to RabbitMQ, register STOMP endpoints, and include ng2-stompjs in the frontend (https://stomp-js.github.io/ng2-stompjs/).
    Segregate the parts of the code that need to run as a batch into a separate code module and jar to be used in the web application monolith.
    Move to Multiple Applications and Components of http://www.appcontinuum.io/: create a new Spring Boot command-line app that gets @EnableBatchProcessing and @EnableTask and includes the same batch jar; stand up Spring Cloud Data Flow Server via a blue/green Jenkins pipeline; create the application definition in SCDF (type is task); call SCDF from the web application with instructions to start the batch.
    Automate deployment at each step of the process.
  • #11 Dell EMC ECS S3 appliance
  • #12 Deployment (adding an app registration step). Passing data between the systems. Injecting configuration.
  • #13 SCDF is not responsible for storing the task executable. It only holds a pointer to the code to be run, which is why the shell calls this "app register". It knows three schemes: Maven URL, HTTP URL, and local files (not really useful for production deployments). You can register through the GUI, the command shell, the API, or the Java client. We found the API easiest to script.
  • #14 Prior to this split, here is how our deployment pipeline went. Configuration was tested via the Spring context. The smoke test was provided by the PM. Smoke testing is crucial due to the need for testing configuration. At first we skipped smoke testing the batch and left that to the PM; this showed that it was desirable to test, because the failures were due to configuration. We implemented an automated smoke test that executes a null batch.
    Blue-green deployment: the only logical way we found to do this was to use different names for different versions.
  • #15 Cleanup of tasks in Cloud Foundry
  • #16 In order to break the app apart, we needed to determine the best way to pass data between the web and batch components. The challenge was getting enough data to run the job through JobLauncher.run, which takes a Job and JobParameters. We elected to store the input file and the analytics to run in a db table, then pass the id to the batch. This is not always the best pattern, but it hasn't hurt us yet.
  • #17 Here is the code that makes the REST call from the web app to SCDF to instruct it to start the batch. The arguments parameter gets passed on to the Spring Cloud Task, which is a Spring CommandLineRunner. Note that we decided on the async REST template here so the call returns right away. This startup could take several seconds if SCDF has to pull the file down from the Maven repo.
  • #18 As a curl command it looks like this. Note that this is what we do in our smoke test as part of deployment, but we leave off the job id.
  • #19 You now have 3 times the number of applications to configure. In a PCF-deployed Spring Boot app, you are configuring two things: the container (typically via the manifest) and the application (via something you have chosen from Spring's 17 layers of externalized configuration). The RM team originally used manifests for both. We had applied YAGNI to Spring Cloud Config Server. About a month prior to this effort, we moved to Config Server for better secrets protection than our env var based scheme. We were fortunate to have done this, because now we had 3 times the number of systems that need configuration: Web, SCDF, and Batch. Our scheme would have broken down anyway.
    For the tasks that get started by SCDF, you have less control because the Spring Cloud Deployer is starting them. Manifests can't be used by Spring Cloud Deployer. You cannot set environment variables for your container here, for example something used by a buildpack-provided jar (APPD, for instance) or LOCALDOMAIN for Oracle RAC SCAN (Single Client Access Name).
    Bottom line: Spring Cloud Config Server helped us securely provide a composable configuration for our 3 apps. We would likely have moved at this point had we not just finished that effort.
  • #21 Previous CoreLogic projects that implemented homegrown batch processing relied heavily on complex infrastructure and deployment processes (e.g. allocating dedicated VM instances, configuring app servers) and on long development cycles. In one case, it took approximately a year to implement a similar system. Using Cloud Foundry, Spring Cloud Data Flow Server, Spring Cloud Task, and Spring Batch, we were able to decompose our app and deploy the batch solution in about two months without having to write custom and complex batch management tooling.