SBJUG - Building Beautiful Batch Jobs

No software engineer is exempt from writing batch jobs; it's just part of every software application stack. The lack of standards and of a reusable batch architecture on the Java platform has resulted in the proliferation of many one-off, in-house batch solutions.

In this talk I presented the Spring Batch framework and how it can help an organization standardize its batching needs. I also walked through a real-world use case at Dealer.com: how we used Spring Batch along with Spring Integration to solve our job concurrency, data flow control, job resiliency, and other requirements, enabling us to build beautiful batch jobs.

Transcript

  • 1. Building Beautiful Batch Jobs ! Who says batch jobs can’t be beautiful code? SouthBay JVM User Group (SBJUG) Meetup - November 2013
  • 2. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 3. About me • Software Engineer • Worked on complex integration projects – CSIS, LAPD, UCLA • Worked on one high traffic system – Napster • Currently at Dealer.com • Fascinated by all things Engineering
  • 4. Dealer.com • Leader in Automotive Marketing • 10K+ clients, 12K+ Websites • CRM is our new product offering • It’s definitely a great place to work. I’d recommend it to a friend.
  • 5. Believe it or not – these are actually Dealer.com’s Core Values
  • 6. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 7. Background • Lack of frameworks for Java-based batch processing • Proliferation of many one-off, in-house solutions • SpringSource and Accenture changed this • June 2008 – production version of Spring Batch • Spring Batch is the only open source framework that provides a robust, enterprise-scale solution • Batch Applications for the Java Platform (JSR 352) is coming soon
  • 8. Usage Scenario A typical batch program reads a large number of records from a database, file, or queue, processes the data in some fashion, and then writes back data in a modified form • Commit batch process periodically • Sequential processing of dependent steps • Partial processing: skip records • Concurrent batch processing • Massively parallel batch processing • Manual or scheduled restart after failure
  • 9. Domain Language of a Batch • Job - has one to many steps • Step - has item reader, processor or writer • Item Reader - an abstraction that represents the retrieval of input for a Step, one item at a time • Item Processor - an abstraction that represents the business processing of an item • Item Writer - an abstraction that represents the output of a Step, a chunk of items at a time • Job Launcher - launches jobs • Job Repository - stores metadata about currently running jobs • Job Instance - an instance of a job with its unique parameters • Job Execution - an execution attempt of a job instance
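The slide images are not part of this transcript, so here is a minimal, self-contained sketch of how those pieces fit together using Spring Batch's Java builders (the talk itself used XML configuration; all job, step, and bean names here are illustrative only):

```java
import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing   // registers the JobRepository, JobLauncher, etc.
public class DomainLanguageSketch {

    @Bean
    public Job sampleJob(JobBuilderFactory jobs, Step sampleStep) {
        // a Job has one to many Steps
        return jobs.get("sampleJob").start(sampleStep).build();
    }

    @Bean
    public Step sampleStep(StepBuilderFactory steps) {
        // the ItemReader hands out one item at a time; the ItemWriter
        // receives a whole chunk of items per transaction
        ItemProcessor<String, String> toUpper = item -> item.toUpperCase();
        ItemWriter<String> print = items -> System.out.println(items);

        return steps.get("sampleStep")
                .<String, String>chunk(10)   // commit interval: 10 items per chunk
                .reader(new ListItemReader<>(Arrays.asList("a", "b", "c")))
                .processor(toUpper)
                .writer(print)
                .build();
    }
}
```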
  • 10. Batch Components
  • 11. Job, Job Instance, Job Execution
  • 12. Job Parameters
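The Job Parameters slide is an image; the sketch below shows the usual API for supplying them, assuming a jobLauncher and an importJob bean exist in the context (both names are hypothetical):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class LaunchWithParameters {

    // jobLauncher and importJob would normally be injected from the Spring context
    public JobExecution launch(JobLauncher jobLauncher, Job importJob) throws Exception {
        // The set of identifying parameters defines the JobInstance; running
        // the same job with the same parameters is the same instance.
        JobParameters params = new JobParametersBuilder()
                .addString("inputFile", "person-data.csv")
                .addLong("run.id", System.currentTimeMillis())
                .toJobParameters();
        return jobLauncher.run(importJob, params);
    }
}
```

Because the identifying parameters define the JobInstance, adding a changing parameter such as run.id is a common way to force a fresh instance of an otherwise identical job.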
  • 13. Job – Tasklet
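The Tasklet slide's code isn't captured in the transcript; a minimal sketch of a Tasklet-based step body (the class name and the archiving behavior are made up for illustration):

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// A single-shot unit of work: a Step can wrap a Tasklet instead of a
// reader/processor/writer chunk. Archiving a processed file is a typical use.
public class ArchiveFileTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // do one piece of work here, e.g. move processed files to an archive folder
        System.out.println("archiving processed files...");
        return RepeatStatus.FINISHED;   // tell the Step this tasklet is done
    }
}
```

Such a tasklet is attached to a step with something like stepBuilderFactory.get("archiveStep").tasklet(new ArchiveFileTasklet()).build(), or the equivalent tasklet element in XML.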
  • 14. Job – Sequential Flow
  • 15. Job – Conditional Flow
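The sequential and conditional flow slides are diagrams; roughly, the flows they describe look like this in the Java builder style (the Step beans stepA/stepB/stepC/errorStep are placeholders; the XML equivalents use step next="..." and next on="FAILED" to="..." attributes):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FlowSketchConfig {

    // Sequential flow: stepA, then stepB, then stepC
    @Bean
    public Job sequentialJob(JobBuilderFactory jobs, Step stepA, Step stepB, Step stepC) {
        return jobs.get("sequentialJob")
                .start(stepA)
                .next(stepB)
                .next(stepC)
                .build();
    }

    // Conditional flow: branch on the exit status of stepA
    @Bean
    public Job conditionalJob(JobBuilderFactory jobs, Step stepA, Step errorStep, Step stepB) {
        return jobs.get("conditionalJob")
                .start(stepA)
                    .on("FAILED").to(errorStep)   // run errorStep when stepA fails
                .from(stepA)
                    .on("*").to(stepB)            // otherwise continue with stepB
                .end()
                .build();
    }
}
```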
  • 16. Job – Chunk Oriented Processing
  • 17. Item Readers and Writers - Out of the box • Item Readers: AmqpItemReader, FlatFileItemReader, HibernateCursorItemReader, HibernatePagingItemReader, IbatisPagingItemReader, ItemReaderAdapter, JdbcCursorItemReader, JdbcPagingItemReader, JmsItemReader, JpaPagingItemReader, ListItemReader, MongoItemReader, Neo4jItemReader, RepositoryItemReader, StoredProcedureItemReader, StaxEventItemReader • Item Writers: AmqpItemWriter, CompositeItemWriter, FlatFileItemWriter, GemfireItemWriter, HibernateItemWriter, IbatisBatchItemWriter, ItemWriterAdapter, JdbcBatchItemWriter, JmsItemWriter, JpaItemWriter, MimeMessageItemWriter, MongoItemWriter, Neo4jItemWriter, RepositoryItemWriter, PropertyExtractingDelegatingItemWriter, StaxEventItemWriter
  • 18. Job Repository Data Model
  • 19. Let’s look at a couple of examples of building simple Spring Batch Jobs Example 1 – Load Flat file contents into database Example 2 – Load XML file contents into database
  • 20. Configure DataSource and Spring Batch Core Beans spring-batch-context.xml :
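The actual spring-batch-context.xml isn't reproduced in this transcript. A roughly equivalent Java-config sketch of the core infrastructure, assuming an embedded HSQL database for the batch metadata tables:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

@Configuration
@EnableBatchProcessing   // registers jobRepository, jobLauncher, transactionManager, etc.
public class BatchInfrastructureConfig {

    @Bean
    public DataSource dataSource() {
        // In-memory database; the first script creates the Spring Batch metadata
        // tables, the second (hypothetical) one the sample PERSON / AD_PERFORMANCE tables.
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.HSQL)
                .addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
                .addScript("classpath:sample-schema.sql")   // hypothetical sample schema
                .build();
    }
}
```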
  • 21. Example1: Load Flat file contents into database – person-data.csv (Jill,Doe / Joe,Doe / Justin,Doe / Jane,Doe / John,Doe) → Transform Data to Upper Case → PERSON Table (PERSON_ID, FIRST_NAME, LAST_NAME): 1 JILL DOE, 2 JOE DOE, 3 JUSTIN DOE, 4 JANE DOE, 5 JOHN DOE
  • 22. Example1: Job Config flat-file-reader-job.xml Chunk Processing: • Reader – retrieves input for a Step one item at a time • Processor – processes an item • Writer – writes the output, one item or chunk of items at a time
  • 23. Example1: Reader, Processor and Writer flat-file-reader-job.xml (cont’d)
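The XML from flat-file-reader-job.xml isn't included in the transcript; the sketch below shows typical reader and writer beans for this example in Java config (Person is the simple firstName/lastName POJO shown with the processor sketch below; file and column names follow slide 21):

```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class FlatFileJobBeans {

    @Bean
    public FlatFileItemReader<Person> personReader() {
        // Reads person-data.csv line by line and maps "Jill,Doe" style
        // records onto Person(firstName, lastName)
        FlatFileItemReader<Person> reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource("person-data.csv"));

        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[] { "firstName", "lastName" });

        BeanWrapperFieldSetMapper<Person> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Person.class);

        DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public JdbcBatchItemWriter<Person> personWriter(DataSource dataSource) {
        // Writes each chunk to the PERSON table, binding named parameters
        // from the Person getters
        JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        writer.setSql("INSERT INTO PERSON (FIRST_NAME, LAST_NAME) VALUES (:firstName, :lastName)");
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Person>());
        return writer;
    }
}
```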
  • 24. Example1: Person Item Processor
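Based on slide 21's "Transform Data to Upper Case" step, the processor presumably looks something like the following; the exact classes from the talk are not in the transcript, so both files here are illustrative:

```java
// Person.java – the assumed domain POJO used throughout Example 1
public class Person {
    private String firstName;
    private String lastName;

    public Person() { }

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}

// PersonItemProcessor.java – upper-cases each Person before it is written
import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) {
        return new Person(person.getFirstName().toUpperCase(),
                          person.getLastName().toUpperCase());
    }
}
```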
  • 25. Example1: Test Case to Execute Flat File Reader Job
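A typical spring-batch-test based test for such a job might look like the following; it assumes a JobLauncherTestUtils bean is declared in the test context and wired to the flat file job (the classpath locations reuse the file names from the slides):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:spring-batch-context.xml",
                                    "classpath:flat-file-reader-job.xml" })
public class FlatFileReaderJobTest {

    // helper from spring-batch-test; must be declared as a bean in the test context
    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    public void jobCompletesSuccessfully() throws Exception {
        JobExecution execution = jobLauncherTestUtils.launchJob();
        assertEquals(BatchStatus.COMPLETED, execution.getStatus());
    }
}
```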
  • 26. Example2: Load XML file contents into database – record-data.xml → AD_PERFORMANCE Table (ID, DATE, IMPRESSION, CLICKS, EARNING): 1 06/01/2013 139237 57 220.90, 2 06/02/2013 339100 57 320.88, 3 06/03/2013 431436 57 27.80
  • 27. Example2: Job Config xml-file-reader-job.xml
  • 28. Example2: Reader, JAXB Unmarshaller, Processor and Writer xml-file-reader-job.xml (cont’d)
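Again the XML itself isn't captured; a sketch of the XML-reading side using StaxEventItemReader with a JAXB unmarshaller. The Record class and the fragment element name are assumptions, and the writer would mirror Example 1's JdbcBatchItemWriter against the AD_PERFORMANCE table:

```java
// Record.java – assumed JAXB-annotated mapping for each <record> fragment
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "record")
@XmlAccessorType(XmlAccessType.FIELD)
public class Record {
    private long id;
    private String date;
    private long impression;
    private long clicks;
    private double earning;
    // getters/setters omitted for brevity
}

// XmlFileJobBeans.java – streaming reader that unmarshals each fragment
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Configuration
public class XmlFileJobBeans {

    @Bean
    public StaxEventItemReader<Record> recordReader() {
        // Streams record-data.xml and unmarshals each <record> fragment into a Record
        StaxEventItemReader<Record> reader = new StaxEventItemReader<>();
        reader.setResource(new ClassPathResource("record-data.xml"));
        reader.setFragmentRootElementName("record");

        Jaxb2Marshaller unmarshaller = new Jaxb2Marshaller();
        unmarshaller.setClassesToBeBound(Record.class);
        reader.setUnmarshaller(unmarshaller);
        return reader;
    }
}
```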
  • 29. Example2: Record Item Processor
  • 30. Example2: Ad Performance Writer
  • 31. Example2: Test Case to Execute XML File Reader Job
  • 32. Spring Batch Admin Webapp
  • 33. Jobs
  • 34. Job Executions
  • 35. Job Execution Details
  • 36. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 37. Business Problem • CRM entering Dealer's day-to-day Operations • We need to Pull data from Dealer’s DMS systems into CRM • DMS Systems can be ADP or Reynolds or DealerTrack etc
  • 38. Here’s a Small Big Picture – Dealer’s DMS Systems (ADP, Reynolds, DealerTrack) → Extract → Dealer.com’s DMS → Load → CRM
  • 39. Typical Batch Job • Download data from DMS Provider for a dealership • Load the data in CRM • Generate report on how the data was processed
  • 40. ADP Vehicle Sales ETL Job Configuration
  • 41. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 42. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 43. Pull Frequently • We have 100s of Dealerships, so each batch Job has to be run for a Dealer’s ADP Account • We schedule Jobs for each dealership to pull every 4 hours • The Job Scheduling is managed via a centralized DDC Scheduling Server – Clients issue scheduling requests via a command queue to the server – The server will then fire scheduled events back onto a queue for clients to consume – Clients and the DDC Scheduling Server communicate through a single RabbitMQ exchange; each client chooses a unique application key and binds to this exchange to receive messages about its scheduled events – Named ClockTower: it’s worth a separate talk in itself
  • 44. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 45. Job Concurrency • 100s of scheduled or manually initiated jobs can all go off at the same time • We want to control how many jobs should run in our Cluster concurrently • We used basic queuing to solve this – all job commands go into a queue – they get processed one at a time – we can control how many consumers we want to allow across the cluster • We use Spring Integration AMQP OutBound & InBound Adapters
  • 46. Running Jobs Concurrently – Competing Consumer Pattern – Scheduled and manually initiated Job Commands come through the same DMS Pull Job Queue and are consumed by the DMS Service nodes • Each Node is configured with multiple concurrent Consumers (3 as of now) • As we take on more Tenants we could scale horizontally by adding more Nodes
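Dealer.com's actual Spring Integration configuration isn't shown in the transcript. The concurrency-control idea reduces to a competing-consumer listener container per node, roughly like this plain Spring AMQP sketch (the queue name and the listener are hypothetical):

```java
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public class DmsPullJobQueueConsumer {

    // One competing-consumer container per node: each node pulls job command
    // messages off the shared queue and runs at most 3 jobs at a time.
    public SimpleMessageListenerContainer container(ConnectionFactory connectionFactory,
                                                    MessageListener jobCommandListener) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setQueueNames("dms.pull.job.queue");     // hypothetical queue name
        container.setConcurrentConsumers(3);               // "3 as of now" per node
        container.setMessageListener(jobCommandListener);  // turns each command into a job launch
        return container;
    }
}
```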
  • 47. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 48. Data Flow Control (ADP → Extract → DMS → Load → CRM) • We need to control the load we put on the CRM system • We don't want to EVER load too much data at the same time • We debated two ways to solve this – Synchronous – Asynchronous (via Queues)
  • 49. Sync vs Async Loading Data into CRM – SYNC: DMS Service → CRM Batch Service Load Balancer → CRM Batch nodes; ASYNC: DMS Service → DMS Data Load Queue → CRM Batch nodes
  • 50. Synchronous • HAProxy load balancer - cannot be scaled dynamically • Remote call needs to be made via REST or the Spring Remoting API - tightly coupled • Client has to fail the batch job or retry the request on failure - not fault tolerant • Nodes need to throttle the number of incoming requests (via Tomcat threads) – have to administer Tomcat threads, nodes cannot be repurposed. Asynchronous • AMQP Rabbit Queue - can be scaled dynamically • Only contract is the 'message' being passed – somewhat loosely coupled • If a node fails, the message will be unacknowledged and another node will execute the same request - fault tolerant • Each node can control the number of concurrent queue consumers – application configuration, nodes can be repurposed • It does incur some extra cost: message persistence & dynamic reply queues. We settled on loading via a queue using Spring Integration AMQP Gateways (which are bi-directional); the call waits for the response to come back via the reply queue
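The gateway configuration itself isn't in the transcript; the request/reply behavior described above is roughly equivalent to this plain Spring AMQP sketch (the queue name is hypothetical, and the real setup used Spring Integration AMQP gateways rather than a bare RabbitTemplate):

```java
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class CrmLoadDispatcher {

    private final RabbitTemplate rabbitTemplate;

    public CrmLoadDispatcher(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
        // The job step blocks until CRM replies or the timeout expires,
        // so the queue-based load still behaves like a call/response.
        this.rabbitTemplate.setReplyTimeout(5 * 60 * 1000); // 5 minutes, per slide 59
    }

    public Object loadChunkIntoCrm(Object dmsDataChunk) {
        // Publish the chunk to the load queue and wait for the reply message
        return rabbitTemplate.convertSendAndReceive("dms.data.load.queue", dmsDataChunk);
    }
}
```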
  • 51. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 52. We send out an awesome looking email notification to an internal mailing list
  • 53. The CSV Report has detailed information on how each row was processed
  • 54. We are working towards a UI that’ll look like this
  • 55. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 56. Job Resiliency (ADP → Extract → DMS → Load → CRM) – 100s of Jobs could go off at the same time and jobs need to be resilient to unexpected failures • While a big job is running, CRM could crash or get restarted for a deployment • While a big job is running, DMS could crash or get restarted for a deployment • In such cases, we want to rerun the job after a short while from where it left off • We use Spring Batch’s Job Restartability feature to achieve this
  • 57. What could go wrong? Any of the nodes (DMS Service 01/02, CRM Batch 01/02) could crash or be restarted due to a deployment while a big job is running. Our goal is to be able to rerun the job and resume from where things left off.
  • 58. Spring Batch – Restartability • Spring Batch maintains Job state in the database – which Step is completed, being processed or failed – which item is being processed during chunk processing • Jobs can be restarted using the JobExecutionId • Spring Batch will skip over the completed steps and run the job from where it left off before • If the job had failed during chunk processing, it’ll skip the items that were already processed and start from where it left off before
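A minimal sketch of triggering a restart from the JobExecutionId, which is essentially what the job command message enables (JobOperator and JobExplorer are standard Spring Batch beans; the wrapper class is illustrative):

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;

public class JobRestarter {

    private final JobOperator jobOperator;
    private final JobExplorer jobExplorer;

    public JobRestarter(JobOperator jobOperator, JobExplorer jobExplorer) {
        this.jobOperator = jobOperator;
        this.jobExplorer = jobExplorer;
    }

    // Restart a previously failed execution; Spring Batch consults the
    // metadata tables and resumes from the last incomplete step/chunk.
    public long restart(long failedJobExecutionId) throws Exception {
        JobExecution failed = jobExplorer.getJobExecution(failedJobExecutionId);
        System.out.println("Restarting " + failed.getJobInstance().getJobName()
                + ", previous status: " + failed.getStatus());
        return jobOperator.restart(failedJobExecutionId);   // returns the new JobExecution id
    }
}
```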
  • 59. When CRM goes down • We have a timeout of 5 minutes for the reply from CRM • When the CRM Batch nodes are down, we’ll get a timeout exception, which results in a new Job Command message being sent to the DMS Pull Job Queue • The message includes the JobExecutionId • Whichever node picks up the message will resume the job from where it left off
  • 60. When a DMS Service Node goes down • When the DMS node executing the Job goes down, the message will be unacknowledged and will be picked up by any other node connected to the DMS Pull Job Queue • The node that picks up the message will check whether this job was already running and stopped abruptly, and if so it’ll try to resume it from where it left off • (This is not in production yet; it’s under development)
  • 61. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 62. So, what makes it beautiful? • Simple – We just used the basic features of Spring Batch • Easy to understand – Quick look at spring configurations is all you need • Less code – We focused on the business logic • Low maintenance – Anybody can maintain it
  • 63. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 64. On Spring Batch • Really easy to set up and use • Highly configurable • Chunk Processing is the bomb! • Beware of the commit count • The bean ‘step’ scope comes in handy • ExecutionContext is limited to 4 data types
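Two of these lessons are easy to illustrate. Step scope allows late binding of job parameters into beans, and the typed accessors on ExecutionContext (putString, putLong, putInt, putDouble) are presumably the four data types the slide refers to. A small step-scope sketch (the inputFile parameter name is hypothetical):

```java
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class StepScopeSketch {

    // Step scope creates the bean per step execution, so a job parameter
    // (e.g. which file to pull for which dealership) can be injected late,
    // when the job actually runs.
    @Bean
    @StepScope
    public FlatFileItemReader<String> dealerFileReader(
            @Value("#{jobParameters['inputFile']}") String inputFile) {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource(inputFile));
        reader.setLineMapper((line, lineNumber) -> line);   // pass each raw line through
        return reader;
    }
}
```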
  • 65. On 3rd Party Integration • Plan for Dev & Live accounts and environments • Configure anything and everything possible • Download large files via streaming • Handle exceptions properly • Embrace data translation errors • Build jobs that are repeat runnable
  • 66. Sources • Spring Batch Reference Documentation – http://docs.spring.io/spring-batch/reference/html-single/index.html • Ad Performance Sample XML taken from – http://www.mkyong.com/spring-batch/spring-batch-example-xml-file-to-database/
  • 67. Questions?
  • 68. Shameless Plug Currently we have a few openings in the Manhattan Beach office • Java Developers • UI Developers • Web Developers If interested please apply at http://careers.dealer.com/