Spring Batch Performance Tuning

In this presentation we examine several scalability options for improving the robustness and performance of your Spring Batch applications. We start with a single-threaded Spring Batch application and refactor it to demonstrate how to run it using:

* Concurrent Steps
* Remote Chunking
* AsyncItemProcessor and AsyncItemWriter
* Remote Partitioning

Additionally, we show how you can deploy Spring Batch applications to Spring XD, which provides high availability and failover capabilities. Spring XD also lets you integrate Spring Batch applications with other Big Data processing needs.

Published in: Technology

  1. Spring Batch Performance Tuning By Chris Schaefer & Gunnar Hillert © 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.
  2. Agenda • Spring Batch • Spring Integration • Spring Batch Integration • Scaling Spring Batch • Spring XD
  3. Spring Batch http://projects.spring.io/spring-batch/
  4. “Batch processing ... is defined as the processing of data without interaction or interruption.” (Michael T. Minella, Pro Spring Batch)
  5. Batch Jobs • Generally long-running • Non-interactive • Often include logic for handling errors and restartability options • Process large volumes of data • More than what may fit in memory or a single transaction
  6. Batch and offline processing • Close of business processing • Order processing, Business reporting, Account reconciliation, Payroll • Import / export handling • a.k.a. ETL jobs (Extract-Transform-Load) • Data warehouse synchronization • Large-scale output jobs • Loyalty program emails, Bank statements • Hadoop job orchestration
  7. Features • Transaction management • Chunk based processing • Schema and Java Config support • Annotations for callback type scenarios such as Listeners • Start/Restart/Skip capabilities • Based on the Spring framework • JSR 352: Batch Applications for the Java Platform
  8. Concepts • Job • Step • Chunk • Item • Repeat | Retry | Skip | Restart
  9. Chunk-Oriented Processing • Read data, optionally process it, and write out the “chunk” within a transaction boundary.
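
The chunk loop from the Chunk-Oriented Processing slide can be sketched in plain Java. This is an illustration under stated assumptions, not the Spring Batch API: `ChunkLoopSketch` and its `run` method are hypothetical names, and an `Iterator`, `Function`, and `Consumer` stand in for ItemReader, ItemProcessor, and ItemWriter. The transaction boundary is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of a chunk-oriented step (hypothetical, NOT the Spring Batch API).
final class ChunkLoopSketch {

    // Reads up to chunkSize items, processes them, and writes them as one chunk.
    // Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          int chunkSize, Consumer<List<O>> writer) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        int chunks = 0;
        while (reader.hasNext()) {
            // A real framework would begin a transaction here.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            List<O> outputs = new ArrayList<>();
            for (I item : items) {
                outputs.add(processor.apply(item));
            }
            writer.accept(outputs); // the whole chunk is written at once
            // ...and commit here, so a failure rolls back only this chunk.
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Function<Integer, String> processor = i -> "item-" + i;
        Consumer<List<String>> writer = chunk -> System.out.println("write " + chunk);
        int chunks = run(List.of(1, 2, 3, 4, 5).iterator(), processor, 2, writer);
        System.out.println(chunks + " chunks");
    }
}
```

With five items and a chunk size of two, the writer is called three times, and the last chunk holds the single remaining item.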
  10. JobLauncher
  11. ItemReaders and ItemWriters • Flat File • XML (StAX) • Multi-File Input • Database • JDBC, JPA/Hibernate, Stored Procedures, Spring Data • JMS • AMQP • Email • Implement your own...
  12. Simple File Load Job
  13. Job Repository
  14. Spring Integration http://projects.spring.io/spring-integration/
  15. Integration Styles • File Transfer • Shared Database • Remoting • Messaging
  16. Integration Styles • Business to Business Integration (B2B) • Inter Application Integration (EAI) • Intra Application Integration [Diagram labels: JVM, JVM, EAI, Core Messaging, B2B, External Business Partner]
  17. Common Patterns: Retrieve, Parse, Transform, Transmit
  18. Enterprise Integration Patterns • By Gregor Hohpe & Bobby Woolf • Published 2003 • Collection of well-known patterns • Icon library provided http://www.eaipatterns.com/eaipatterns.html
  19. “Spring Integration provides an extension of the Spring programming model to support the well-known enterprise integration patterns.” (Spring Integration Website)
  20. Adapters: AMQP/RabbitMQ, AWS, File/Resource, FTP/FTPS/SFTP, GemFire, HTTP (REST), JDBC, JMS, JMX, JPA, MongoDB, POP3/IMAP/SMTP, Print, Redis, RMI, RSS/Atom, SMB, Splunk, Spring Application Events, Stored Procedures, TCP/UDP, Twitter, Web Services, XMPP, XPath, XQuery, Custom Adapters
  21. Samples • https://github.com/spring-projects/spring-integration-samples • Contains 50 Samples and Applications • Several Categories: Basic, Intermediate, Advanced, Applications
  22. Spring Batch Integration
  23. Launching batch jobs through messages • Event-driven execution of the JobLauncher • Spring Integration retrieves the data (e.g. file system, FTP, ...) • Easy to support separate input sources simultaneously [Diagram labels: FTP, Inbound Channel Adapter, File, Transformer, JobLaunchRequest, JobLauncher]
  24. JobLaunchRequest
        public class FileMessageToJobRequest {
            private Job job;
            private String fileParameterName;
            ...
            @Transformer
            public JobLaunchRequest toRequest(Message<File> message) {
                JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
                jobParametersBuilder.addString(fileParameterName,
                        message.getPayload().getAbsolutePath());
                return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
            }
        }
  25. JobLaunchRequest
        <batch-int:job-launching-gateway request-channel="requestChannel"
                                         reply-channel="replyChannel"
                                         job-launcher="jobLauncher"/>
  26. Get feedback with informational messages • Spring Batch provides support for listeners: • StepExecutionListener • ChunkListener • JobExecutionListener
  27. Get feedback with informational messages
        <batch:job id="importPayments">
            ...
            <batch:listeners>
                <batch:listener ref="notificationExecutionsListener"/>
            </batch:listeners>
        </batch:job>

        <int:gateway id="notificationExecutionsListener"
                     service-interface="org.springframework.batch.core.JobExecutionListener"
                     default-request-channel="jobExecutions"/>
  28. Launching and information messages demo in next section
  29. Scaling Spring Batch
  30. Scaling and externalizing batch process execution • Utilization of Spring Integration for multi process communication • Distribute complex processing • Single process: multi-threaded steps, parallel steps, local partitioning • Multi process: remote chunking, remote partitioning • Asynchronous item processing support: AsyncItemProcessor, AsyncItemWriter
  31. Single Thread [Diagram labels: Reader, Processor, Writer, Gateway, Input, Output, Item, Result]
  32. Single Thread - Demo
  33. Multi-threaded • Simply add a TaskExecutor to your Tasklet configuration [Diagram labels: Reader, Processor, Writer, Gateway, Input, Output, Item, Result]
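
What adding a TaskExecutor amounts to can be modelled in plain Java: several threads each pull a chunk from a shared reader, process it, and write it. The sketch below is an assumption-laden illustration, not Spring Batch code. `MultiThreadedStepSketch` is a hypothetical name, a fixed thread pool stands in for the TaskExecutor, and error handling inside workers is omitted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Plain-Java model of a multi-threaded step (hypothetical, NOT the Spring Batch API).
final class MultiThreadedStepSketch {

    static <I, O> List<O> run(Iterator<I> reader, Function<I, O> processor,
                              int chunkSize, int threads) {
        Queue<O> sink = new ConcurrentLinkedQueue<>(); // thread-safe "writer" target
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (true) {
                    List<I> chunk = new ArrayList<>();
                    // The reader is shared, so reads must be synchronized.
                    synchronized (reader) {
                        while (chunk.size() < chunkSize && reader.hasNext()) {
                            chunk.add(reader.next());
                        }
                    }
                    if (chunk.isEmpty()) {
                        return; // input exhausted
                    }
                    for (I item : chunk) {
                        sink.add(processor.apply(item));
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(sink); // order is NOT guaranteed
    }
}
```

Two consequences are visible even in this small model: the reader has to be thread-safe, and the output order no longer matches the input order.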
  34. Multi-Threaded - Demo
  35. Asynchronous Processors • AsyncItemProcessor: dispatches ItemProcessor logic on a new thread, returning a Future to the AsyncItemWriter • AsyncItemWriter: writes the processed items after processing is complete
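
The Future hand-off between the two components can be sketched with `java.util.concurrent` alone. This is a minimal model, assuming a fixed thread pool; `AsyncProcessingSketch`, `processAsync`, `unwrap`, and `runChunk` are hypothetical names and do not mirror the signatures of the real AsyncItemProcessor or AsyncItemWriter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java model of asynchronous item processing (hypothetical, NOT the Spring Batch API).
final class AsyncProcessingSketch {

    // Processor side: hand the delegate's work to the pool and return a Future at once.
    static <I, O> Future<O> processAsync(ExecutorService pool, Function<I, O> delegate, I item) {
        Callable<O> task = () -> delegate.apply(item);
        return pool.submit(task);
    }

    // Writer side: wait for every Future, then pass on the unwrapped results in input order.
    static <O> List<O> unwrap(List<Future<O>> futures) {
        List<O> results = new ArrayList<>();
        for (Future<O> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }

    // One chunk: all items are processed concurrently, then written together.
    static <I, O> List<O> runChunk(List<I> chunk, Function<I, O> delegate, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I item : chunk) {
                futures.add(processAsync(pool, delegate, item));
            }
            return unwrap(futures);
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike the multi-threaded model, results come back in input order here, because the writer walks the list of Futures in the order the items were submitted.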
  36. Asynchronous Processors - Demo
  37. Remote Chunking [Diagram: Step 1 (ItemReader, ItemProcessor, ItemWriter); Step 2 (ItemReader, ItemWriter) with Steps 2a, 2b, 2c (each ItemReader, ItemProcessor, ItemWriter); Step 3 (ItemReader, ItemProcessor, ItemWriter)]
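
A rough in-process model of remote chunking follows: the master reads items and sends whole chunks over a channel, while workers process and write them and reply with a count. It is a sketch under loud assumptions. `RemoteChunkingSketch` is a hypothetical name, `BlockingQueue`s stand in for the messaging middleware that would connect separate JVMs, and worker failures are not handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// In-process model of remote chunking (hypothetical, NOT the Spring Batch API).
final class RemoteChunkingSketch {

    static <I, O> List<O> run(List<I> items, Function<I, O> processor,
                              int chunkSize, int workers) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        BlockingQueue<List<I>> requests = new LinkedBlockingQueue<>(); // master -> workers
        BlockingQueue<Integer> replies = new LinkedBlockingQueue<>();  // workers -> master
        List<O> store = Collections.synchronizedList(new ArrayList<>());
        List<I> stop = new ArrayList<>(); // sentinel, compared by identity

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        List<I> chunk = requests.take();
                        if (chunk == stop) {
                            return;
                        }
                        for (I item : chunk) {
                            store.add(processor.apply(item)); // process and write remotely
                        }
                        replies.put(chunk.size());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            int sent = 0;
            // Master: only reading and dispatching happen here.
            for (int from = 0; from < items.size(); from += chunkSize) {
                int to = Math.min(from + chunkSize, items.size());
                requests.put(new ArrayList<>(items.subList(from, to)));
                sent++;
            }
            int written = 0;
            for (int i = 0; i < sent; i++) {
                written += replies.take(); // wait for every chunk to be acknowledged
            }
            if (written != items.size()) {
                throw new IllegalStateException("some items were not written");
            }
            for (int w = 0; w < workers; w++) {
                requests.put(stop);
            }
            return new ArrayList<>(store);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because the items themselves travel over the channel, this style tends to pay off when processing is expensive relative to reading.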
  38. Remote Chunking - Demo
  39. Remote Partitioning [Diagram: Step 1 (ItemReader, ItemProcessor, ItemWriter); Master with Partitioner; Slave 1, Slave 2, Slave 3 (each ItemReader, ItemProcessor, ItemWriter); Step 3 (ItemReader, ItemProcessor, ItemWriter)]
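
In contrast to remote chunking, partitioning sends only a description of the work (for example an id range) to each slave, which then reads, processes, and writes its own slice. The sketch below assumes a simple contiguous range split. `PartitioningSketch`, `partition`, and `sumPartitioned` are hypothetical names, threads stand in for remote slaves, and summing ids stands in for the read/process/write work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// In-process model of partitioning (hypothetical, NOT the Spring Batch API).
final class PartitioningSketch {

    // Splits the inclusive id range [min, max] into at most gridSize contiguous ranges.
    static List<long[]> partition(long min, long max, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        List<long[]> ranges = new ArrayList<>();
        if (max < min) {
            return ranges; // nothing to do
        }
        long total = max - min + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (long start = min; start <= max; start += size) {
            ranges.add(new long[] {start, Math.min(start + size - 1, max)});
        }
        return ranges;
    }

    // Master: hands one range to each worker and aggregates the results.
    static long sumPartitioned(long min, long max, int gridSize) {
        List<long[]> ranges = partition(min, max, gridSize);
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] range : ranges) {
                // Worker: "reads" its own ids; only the range crossed the wire.
                Callable<Long> worker = () -> {
                    long sum = 0;
                    for (long id = range[0]; id <= range[1]; id++) {
                        sum += id;
                    }
                    return sum;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For ids 1 to 10 and a grid size of 3, `partition` yields the ranges 1-4, 5-8, and 9-10. Only those six numbers would need to reach the slaves, which is why partitioning suits cases where moving the data itself is the bottleneck.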
  40. Remote Partitioning - Demo
  41. Demo - Launching via messages & informational messages • Does not provide scaling, but demonstrates how to launch a job via messages and send informational messages to integration points
  42. Spring XD http://projects.spring.io/spring-xd/
  43. Tackling Big Data Complexity • Data Ingestion • Real-time Analytics • Workflow Orchestration • Data Export
  44. Tackling Big Data Complexity cont. • Built on existing Spring assets • Spring Integration • Spring Batch • Spring Data • Spring Boot • Spring for Apache Hadoop • Spring Shell • Redis, GemFire, Hadoop
  45. Data Ingestion Streams • DSL based on Unix pipes and filters syntax • Modules are parameterizable • Simple logic can be added via expressions or scripts
        http | file
        twittersearch --query=spring | file --dir=/spring
        http | filter --expression=payload=='Spring' | hdfs
  46. Hadoop workflow managed by Spring Batch • Reuse Batch infrastructure and features to manage Hadoop workflows • Job state management, launching, monitoring, restart/retry policies, etc. • Step can be any Hadoop job type or HDFS script • Can mix and match with other Batch readers/writers, e.g. JDBC for import/export use-cases
  47. Manage Batch Jobs with Spring XD
  48. Spring XD - Demo
  49. Books
  50. Learn More. Stay Connected. Demo code and slides: https://github.com/SpringOne2GX-2014/spring-batch-performance-tuning THANK YOU!