Spring Batch Workshop (advanced)
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Spring Batch Workshop (advanced)

on

  • 3,133 views

Arnaud vous propose de découvrir le framework Spring Batch: du Hello World! jusqu'à l'exécution multi-threadée de batch, en passant par la lecture de fichiers CSV et la reprise sur erreur. Les ...

Arnaud vous propose de découvrir le framework Spring Batch: du Hello World! jusqu'à l'exécution multi-threadée de batch, en passant par la lecture de fichiers CSV et la reprise sur erreur. Les techniques qu'utilise le framework pour lire et écrire efficacement de grands volumes de données e vous seront pas non plus épargnées ! La présentation se base sur une approche problème/solution, avec de nombreux exemples de code et des démos. A la suite de cette présentation, vous saurez si Spring Batch convient à vos problématiques et aurez toutes les cartes en mains pour l'intégrer à vos applications batch.

Statistics

Views

Total Views
3,133
Views on SlideShare
2,704
Embed Views
429

Actions

Likes
4
Downloads
147
Comments
0

1 Embed 429

http://mj89sp3sau2k7lj1eg3k40hkeppguj6j-a-sites-opensocial.googleusercontent.com 429

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Spring Batch Workshop (advanced) Presentation Transcript

  • 1. Spring Batch Workshop (advanced) Spring Batch Workshop (advanced) Arnaud Cogolu`gnes e Consultant with Zenika, Co-author Spring Batch in Action March 20, 2012
  • 2. Spring Batch Workshop (advanced)Outline Overview XML file reading Item enrichment File reading partitioning File dropping launching Database reading partitioning Complex flow Atomic processing
  • 3. Spring Batch Workshop (advanced) OverviewOverview This workshop highlights advanced Spring Batch features Problem/solution approach A few slides to cover the feature A project to start from, just follow the TODOs Prerequisites Basics about Java and Java EE Spring: dependency injection, enterprise support Spring Batch: what the first workshop covers https://github.com/acogoluegnes/Spring-Batch-Workshop
  • 4. Spring Batch Workshop (advanced) OverviewLicense This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
  • 5. Spring Batch Workshop (advanced) XML file reading Problem: reading items from a XML file and sending them to another source (e.g. database) Solution: using the StaxEventItemReader
  • 6. Spring Batch Workshop (advanced) XML file readingSpring Batch’s support for XML file reading Spring Batch has built-in support for XML files Through the StaxEventItemReader for reading The StaxEventItemReader handles I/O for efficient XML processing 2 main steps: Configuring the StaxEventItemReader Configuring a Spring OXM’s Unmarshaller
  • 7. Spring Batch Workshop (advanced) XML file readingThe usual suspects <?xml version="1.0" encoding="UTF-8"?> <contacts> <contact> <firstname>De-Anna</firstname> <lastname>Raghunath</lastname> <birth>2010-03-04</birth> </contact> <contact> <firstname>Susy</firstname> <lastname>Hauerstock</lastname> <birth>2010-03-04</birth> </contact> (...) </contacts> ? public class Contact { private Long id; private String firstname,lastname; private Date birth; (...) }
  • 8. Spring Batch Workshop (advanced) XML file readingThe StaxEventItemReader configuration <bean id="reader" class="org.springframework.batch.item.xml.StaxEventItemReader"> <property name="fragmentRootElementName" value="contact" /> <property name="unmarshaller"> <bean class="org.springframework.oxm.xstream.XStreamMarshaller"> <property name="aliases"> <map> <entry key="contact" value="com.zenika.workshop.springbatch.Contact" /> </map> </property> <property name="converters"> <bean class="com.thoughtworks.xstream.converters.basic.DateConverter"> <constructor-arg value="yyyy-MM-dd" /> <constructor-arg><array /></constructor-arg> <constructor-arg value="true" /> </bean> </property> </bean> </property> <property name="resource" value="classpath:contacts.xml" /> </bean> NB: Spring OXM supports XStream, JAXB2, etc.
  • 9. Spring Batch Workshop (advanced) XML file readingGoing further... StaxEventItemWriter to write XML files Spring OXM’s support for other marshallers Skipping badly formatted lines
  • 10. Spring Batch Workshop (advanced) Item enrichment Problem: I want to enrich read items with a Web Service before they get written Solution: implement an ItemProcessor to make the Web Service call
  • 11. Spring Batch Workshop (advanced) Item enrichmentUse case Reading contacts from a flat file Enriching the contact with their social security number Writing the whole contact in the database
  • 12. Spring Batch Workshop (advanced) Item enrichment 1,De-Anna,Raghunath,2010-03-04 2,Susy,Hauerstock,2010-03-04 3,Kiam,Whitehurst,2010-03-04 4,Alecia,Van Holst,2010-03-04 5,Hing,Senecal,2010-03-04 NB: no SSN in the CSV file! public class Contact { private Long id; private String firstname,lastname; private Date birth; private String ssn; (...) }
  • 13. Spring Batch Workshop (advanced) Item enrichmentThe Web Service It can be any kind of Web Service (SOAP, REST) Our Web Service URL: http://host/service?firstname=John&lastname=Doe It returns <contact> <firstname>John</firstname> <lastname>Doe</lastname> <ssn>987-65-4329</ssn> </contact>
  • 14. Spring Batch Workshop (advanced) Item enrichmentThe ItemProcessor implementation package com.zenika.workshop.springbatch; import javax.xml.transform.dom.DOMSource; import org.springframework.batch.item.ItemProcessor; import org.springframework.web.client.RestTemplate; import org.w3c.dom.NodeList; public class SsnWebServiceItemProcessor implements ItemProcessor<Contact, Contact> { private RestTemplate restTemplate = new RestTemplate(); private String url; @Override public Contact process(Contact item) throws Exception { DOMSource source = restTemplate.getForObject(url,DOMSource.class, item.getFirstname(),item.getLastname()); String ssn = extractSsnFromXml(item, source); item.setSsn(ssn); return item; } private String extractSsnFromXml(Contact item, DOMSource source) { // some DOM code } (...) }
  • 15. Spring Batch Workshop (advanced) Item enrichmentConfiguring the SsnWebServiceItemProcessor <batch:job id="itemEnrichmentJob"> <batch:step id="itemEnrichmentStep"> <batch:tasklet> <batch:chunk reader="reader" processor="processor" writer="writer" commit-interval="3"/> </batch:tasklet> </batch:step> </batch:job> <bean id="processor" class="com.zenika.workshop.springbatch.SsnWebServiceItemProcessor"> <property name="url" value="http://localhost:8085/?firstname={firstname}&amp;lastname={lastname}" /> </bean>
  • 16. Spring Batch Workshop (advanced) Item enrichmentBut my Web Service has a lot of latency! The Web Service call can benefit from multi-threading Why not spawning several processing at the same time? We could wait for the completion in the ItemWriter Let’s use some asynchronous ItemProcessor and ItemWriter Provided in the Spring Batch Integration project
  • 17. Spring Batch Workshop (advanced) Item enrichmentUsing async ItemProcessor and ItemWriter This is only about wrapping <bean id="processor" class="o.s.b.integration.async.AsyncItemProcessor"> <property name="delegate" ref="processor" /> <property name="taskExecutor" ref="taskExecutor" /> </bean> <bean id="writer" class="o.s.b.integration.async.AsyncItemWriter"> <property name="delegate" ref="writer" /> </bean> <task:executor id="taskExecutor" pool-size="5" />
  • 18. Spring Batch Workshop (advanced) Item enrichmentGoing further... Business delegation with an ItemProcessor Available ItemProcessor implementations Adapter, validator The ItemProcessor can filter items
  • 19. Spring Batch Workshop (advanced) File reading partitioning Problem: I have multiple input files and I want to process them in parallel Solution: use partitioning to parallelize the processing on multiple threads
  • 20. Spring Batch Workshop (advanced) File reading partitioningSerial processing
  • 21. Spring Batch Workshop (advanced) File reading partitioningPartitioning in Spring Batch Partition the input data e.g. one input file = one partition partition processed in a dedicated step execution Partitioning is easy to set up but need some knowledge about the data Partition handler implementation Multi-threaded Spring Integration
  • 22. Spring Batch Workshop (advanced) File reading partitioningMulti-threaded partitioning
  • 23. Spring Batch Workshop (advanced) File reading partitioningPartitioner for input files <bean id="partitioner" class="o.s.b.core.partition.support.MultiResourcePartitioner"> <property name="resources" value="file:./src/main/resources/input/*.txt" /> </bean>
  • 24. Spring Batch Workshop (advanced) File reading partitioningThe partitioner sets a context for the components <bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step"> (...) <property name="resource" value="#{stepExecutionContext[’fileName’]}" /> </bean>
  • 25. Spring Batch Workshop (advanced) File reading partitioningUsing the multi-threaded partition handler <batch:job id="fileReadingPartitioningJob"> <batch:step id="partitionedStep" > <batch:partition step="readWriteContactsPartitionedStep" partitioner="partitioner"> <batch:handler task-executor="taskExecutor" /> </batch:partition> </batch:step> </batch:job> <batch:step id="readWriteContactsPartitionedStep"> <batch:tasklet> <batch:chunk reader="reader" writer="writer" commit-interval="10" /> </batch:tasklet> </batch:step>
  • 26. Spring Batch Workshop (advanced) File reading partitioningGoing further... Spring Integration partition handler implementation Other scaling approaches parallel steps, remote chunking, multi-threaded step)
  • 27. Spring Batch Workshop (advanced) File dropping launching Problem: downloading files from a FTP server and processing them with Spring Batch Solution: use Spring Integration to poll the FTP server and trigger Spring Batch accordingly
  • 28. Spring Batch Workshop (advanced) File dropping launchingUsing Spring Integration for transfer and triggering
  • 29. Spring Batch Workshop (advanced) File dropping launchingThe launching code public class FileContactJobLauncher { public void launch(File file) throws Exception { JobExecution exec = jobLauncher.run( job, new JobParametersBuilder() .addString("input.file", "file:"+file.getAbsolutePath()) .toJobParameters() ); } } The File is the local copy
  • 30. Spring Batch Workshop (advanced) File dropping launchingListening to the FTP server <int:channel id="fileIn" /> <int-ftp:inbound-channel-adapter local-directory="file:./input" channel="fileIn" session-factory="ftpClientFactory" remote-directory="/" auto-create-local-directory="true"> <int:poller fixed-rate="1000" /> </int-ftp:inbound-channel-adapter> <bean id="ftpClientFactory" class="o.s.i.ftp.session.DefaultFtpSessionFactory"> <property name="host" value="localhost"/> <property name="port" value="2222"/> <property name="username" value="admin"/> <property name="password" value="admin"/> </bean>
  • 31. Spring Batch Workshop (advanced) File dropping launchingCalling the launcher on an inbound message <int:channel id="fileIn" /> <int:service-activator input-channel="fileIn"> <bean class="c.z.w.springbatch.integration.FileContactJobLauncher"> <property name="job" ref="fileDroppingLaunchingJob" /> <property name="jobLauncher" ref="jobLauncher" /> </bean> </int:service-activator>
  • 32. Spring Batch Workshop (advanced) File dropping launchingGoing further... Checking Spring Integration connectors Local file system, FTPS, SFTP, HTTP, JMS, etc. Checking operations on messages Filtering, transforming, routing, etc.
  • 33. Spring Batch Workshop (advanced) Database reading partitioning Problem: I want to export items from different categories from a database to files Solution: provide a partition strategy and use partitioning
  • 34. Spring Batch Workshop (advanced) Database reading partitioningThe use case ID name category 1 Book 01 books 2 DVD 01 dvds 3 DVD 02 dvds 4 Book 02 books ? ? products books.txt products dvds.txt 1,Book 01,books 2,DVD 01,dvds 4,Book 02,books 3,DVD 02,dvds
  • 35. Spring Batch Workshop (advanced) Database reading partitioningPartitioning based on categories 2 partitions in this case ID name category 1 Book 01 books 2 DVD 01 dvds 3 DVD 02 dvds 4 Book 02 books
  • 36. Spring Batch Workshop (advanced) Database reading partitioningPartitioning logic with the Partitioner interface public class ProductCategoryPartitioner implements Partitioner { (...) @Override public Map<String, ExecutionContext> partition(int gridSize) { List<String> categories = jdbcTemplate.queryForList( "select distinct(category) from product", String.class ); Map<String, ExecutionContext> results = new LinkedHashMap<String, ExecutionContext>(); for(String category : categories) { ExecutionContext context = new ExecutionContext(); context.put("category", category); results.put("partition."+category, context); } return results; } }
  • 37. Spring Batch Workshop (advanced) Database reading partitioningOutput of the Partitioner Excerpt: for(String category : categories) { ExecutionContext context = new ExecutionContext(); context.put("category", category); results.put("partition."+category, context); } Results: partition.books = { category => ’books’ } partition.dvds = { category => ’dvds’ }
  • 38. Spring Batch Workshop (advanced) Database reading partitioningComponents can refer to partition parameters They need to use the step scope <bean id="reader" class="org.springframework.batch.item.database.JdbcCursorItemReader" scope="step"> <property name="sql" value="select id,name,category from product where category = ?" /> <property name="preparedStatementSetter"> <bean class="org.springframework.jdbc.core.ArgPreparedStatementSetter"> <constructor-arg value="#{stepExecutionContext[’category’]}" /> </bean> </property> </bean> <bean id="writer" class="org.springframework.batch.item.file.FlatFileItemWriter" scope="step"> <property name="resource" value="file:./target/products_#{stepExecutionContext[’category’]}.txt}" /> (...) </bean>
  • 39. Spring Batch Workshop (advanced) Database reading partitioningConfigure the partitioned step The default implementation is multi-threaded <batch:job id="databaseReadingPartitioningJob"> <batch:step id="partitionedStep" > <batch:partition step="readWriteProductsPartitionedStep" partitioner="partitioner"> <batch:handler task-executor="taskExecutor" /> </batch:partition> </batch:step> </batch:job> <batch:step id="readWriteProductsPartitionedStep"> <batch:tasklet> <batch:chunk reader="reader" writer="writer" commit-interval="10" /> </batch:tasklet> </batch:step>
  • 40. Spring Batch Workshop (advanced) Database reading partitioningGoing further... Check existing partitioner implementations Check other partition handler implementations Check other scaling strategies
  • 41. Spring Batch Workshop (advanced) Complex flow Problem: my job has a complex flow of steps, how can Spring Batch deal with it? Solution: use the step flow attributes and tags, as well as StepExecutionListeners.
  • 42. Spring Batch Workshop (advanced) Complex flowThe next attribute for a linear flow <batch:job id="complexFlowJob"> <batch:step id="digestStep" next="flatFileReadingStep"> <batch:tasklet ref="digestTasklet" /> </batch:step> <batch:step id="flatFileReadingStep"> <batch:tasklet> <batch:chunk reader="reader" writer="writer" commit-interval="3" /> </batch:tasklet> </batch:step> </batch:job>
  • 43. Spring Batch Workshop (advanced) Complex flowThere’s also a next tag non-linear flows <batch:job id="complexFlowJob"> <batch:step id="digestStep"> <batch:tasklet ref="digestTasklet" /> <batch:next on="COMPLETED" to="flatFileReadingStep"/> <batch:next on="FAILED" to="trackIncorrectFileStep"/> </batch:step> <batch:step id="flatFileReadingStep"> <batch:tasklet> <batch:chunk reader="reader" writer="writer" commit-interval="3" /> </batch:tasklet> </batch:step> <batch:step id="trackIncorrectFileStep"> <batch:tasklet ref="trackIncorrectFileTasklet" /> </batch:step> </batch:job> NB: there are also the end, fail, and stop tags
  • 44. Spring Batch Workshop (advanced) Complex flowBut if I have more complex flows? The next attribute decision is based on the exit code You can use your own exit codes A StepExecutionListener can modify the exit code of a step execution
  • 45. Spring Batch Workshop (advanced) Complex flowModifying the exit code of a step execution package com.zenika.workshop.springbatch; import org.springframework.batch.core.ExitStatus; import org.springframework.batch.core.StepExecution; import org.springframework.batch.core.StepExecutionListener; public class SkipsListener implements StepExecutionListener { @Override public void beforeStep(StepExecution stepExecution) { } @Override public ExitStatus afterStep(StepExecution stepExecution) { String exitCode = stepExecution.getExitStatus().getExitCode(); if (!exitCode.equals(ExitStatus.FAILED.getExitCode()) && stepExecution.getSkipCount() > 0) { return new ExitStatus("COMPLETED WITH SKIPS"); } else { return null; } } }
  • 46. Spring Batch Workshop (advanced) Complex flowPlugging the SkipsListener <step id="flatFileReadingStep"> <tasklet> <chunk reader="reader" writer="writer" commit-interval="3" skip-limit="10"> <skippable-exception-classes> <include class="o.s.b.item.file.FlatFileParseException"/> </skippable-exception-classes> </chunk> </tasklet> <end on="COMPLETED"/> <next on="COMPLETED WITH SKIPS" to="trackSkipsStep"/> <listeners> <listener> <beans:bean class="c.z.workshop.springbatch.SkipsListener" /> </listener> </listeners> </step>
  • 47. Spring Batch Workshop (advanced) Complex flowGoing further... The JobExecutionDecider and the decision tag The stop, fail, and end tags
  • 48. Spring Batch Workshop (advanced) Atomic processing Problem: I want to process an input file in an all-or-nothing manner. Solution: Don’t do it. Atomic processing isn’t batch processing! Anyway, if you really need an atomic processing, use a custom CompletionPolicy.
  • 49. Spring Batch Workshop (advanced) Atomic processingWhy atomic processing is a bad idea? You loose the benefits of chunk-oriented processing speed, small memory footprint, etc. On a rollback, you loose everything (that’s perhaps the point!) The rollback can take a long time (several hours) It all depends on the amount of data and on the processing
  • 50. Spring Batch Workshop (advanced) Atomic processingI really need an atomic processing Rollback yourself, with compensating transactions Use a transaction rollback It’s only a never ending chunk!
  • 51. Spring Batch Workshop (advanced) Atomic processingQuick and dirty, large commit interval Set the commit interval to a very large value You should never have more items! <chunk reader="reader" writer="writer" commit-interval="1000000"/>
  • 52. Spring Batch Workshop (advanced) Atomic processingUse a never-ending CompletionPolicy Spring Batch uses a CompletionPolicy to know if a chunk is complete package org.springframework.batch.repeat; public interface CompletionPolicy { boolean isComplete(RepeatContext context, RepeatStatus result); boolean isComplete(RepeatContext context); RepeatContext start(RepeatContext parent); void update(RepeatContext context); }
  • 53. Spring Batch Workshop (advanced) Atomic processingPlugging in the CompletionPolicy <batch:job id="atomicProcessingJob"> <batch:step id="atomicProcessingStep"> <batch:tasklet> <batch:chunk reader="reader" writer="writer" chunk-completion-policy="atomicCompletionPolicy" /> </batch:tasklet> </batch:step> </batch:job> NB: remove the commit-interval attribute when using a CompletionStrategy
  • 54. Spring Batch Workshop (advanced) Atomic processingWhich CompletionPolicy for my atomic processing? <bean id="atomicCompletionPolicy" class="o.s.b.repeat.policy.DefaultResultCompletionPolicy" />
  • 55. Spring Batch Workshop (advanced) Atomic processingGoing further... Flow in a job SkipPolicy, RetryPolicy