SBJUG - Building Beautiful Batch Jobs

No software engineer is exempt from writing batch jobs; it's just part of every software app stack. The lack of standards and of a reusable batch architecture in the Java platform has resulted in the proliferation of many one-off, in-house batch solutions.

In this talk I presented the Spring Batch framework and how it could help an organization standardize its batch processing needs. I also talked about a real-world use case at Dealer.com: how we used Spring Batch along with Spring Integration to solve our job concurrency, data flow control, job resiliency and additional requirements, thus enabling us to build beautiful batch jobs.


  1. Building Beautiful Batch Jobs!
     Who says batch jobs can’t be beautiful code?
     SouthBay JVM User Group (SBJUG) Meetup - November 2013
  2. In this tech talk we’ll cover
     • Spring Batch Introduction
     • Dealer.com Real World Usage
     • Some Lessons Learned
  3. About me
     • Software Engineer
     • Worked on complex integration projects
       – CSIS, LAPD, UCLA
     • Worked on one high traffic system
       – Napster
     • Currently at Dealer.com
     • Fascinated by all things Engineering
  4. Dealer.com
     • Leader in Automotive Marketing
     • 10K+ clients, 12K+ Websites
     • CRM is our new product offering
     • It’s definitely a great place to work. I’d recommend it to a friend.
  5. Believe it or not – these are actually Dealer.com’s Core Values
  6. In this tech talk we’ll cover
     • Spring Batch Introduction
     • Dealer.com Real World Usage
     • Some Lessons Learned
  7. Background
     • Lack of frameworks for Java-based batch processing
     • Proliferation of many one-off, in-house solutions
     • SpringSource and Accenture changed this
     • June 2008 – production version of Spring Batch
     • Spring Batch is the only open source framework that provides a robust, enterprise-scale solution
     • Batch Applications for the Java Platform (JSR 352) is coming soon
  8. Usage Scenario
     A typical batch program reads a large number of records from a database, file, or queue, processes the data in some fashion, and then writes back data in a modified form
     • Commit batch process periodically
     • Sequential processing of dependent steps
     • Partial processing: skip records
     • Concurrent batch processing
     • Massively parallel batch processing
     • Manual or scheduled restart after failure
  9. Domain Language of a Batch
     • Job - has one to many steps
     • Step - has an item reader, processor or writer
     • Item Reader - an abstraction that represents the retrieval of input for a Step, one item at a time
     • Item Processor - an abstraction that represents the business processing of an item
     • Item Writer - an abstraction that represents the output of a Step, one chunk of items at a time
     • Job Launcher - launches jobs
     • Job Repository - stores metadata about currently running jobs
     • Job Instance - an instance of a job with its unique parameters
     • Job Execution - an execution attempt of a job instance
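The difference between a Job Instance and a Job Execution is easy to mix up, so here is a minimal plain-Java sketch of the idea. It is only an illustration: `JobModelSketch`, `instanceKey` and `launch` are hypothetical names, not Spring Batch classes, and an in-memory map stands in for the Job Repository.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration; these are NOT Spring Batch classes. */
class JobModelSketch {

    /** Every launch adds one execution attempt to the matching instance. */
    private final Map<String, List<String>> executionsByInstance = new HashMap<>();

    /** A job instance is identified by the job name plus its parameters. */
    static String instanceKey(String jobName, Map<String, String> params) {
        // TreeMap sorts the parameters so that their order does not matter.
        return jobName + new TreeMap<>(params);
    }

    /** Records one execution attempt and returns the attempt count for that instance. */
    int launch(String jobName, Map<String, String> params, String outcome) {
        List<String> attempts = executionsByInstance
                .computeIfAbsent(instanceKey(jobName, params), k -> new ArrayList<>());
        attempts.add(outcome);
        return attempts.size();
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    public static void main(String[] args) {
        JobModelSketch repo = new JobModelSketch();
        Map<String, String> june1 = Map.of("date", "2013-06-01");
        Map<String, String> june2 = Map.of("date", "2013-06-02");
        repo.launch("loadPersons", june1, "FAILED");     // first execution of instance A
        repo.launch("loadPersons", june1, "COMPLETED");  // second execution of the same instance
        repo.launch("loadPersons", june2, "COMPLETED");  // different parameters: a new instance
        System.out.println("instances: " + repo.instanceCount());
    }
}
```

Rerunning a job with the same parameters adds an execution to the existing instance, while changing a parameter creates a new instance.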
  10. Batch Components
  11. Job, Job Instance, Job Execution
  12. Job Parameters
  13. Job – Tasklet
  14. Job – Sequential Flow
  15. Job – Conditional Flow
  16. Job – Chunk Oriented Processing
  17. Item Readers and Writers - Out of the box

      Item Readers                 Item Writers
      AmqpItemReader               AmqpItemWriter
      FlatFileItemReader           CompositeItemWriter
      HibernateCursorItemReader    FlatFileItemWriter
      HibernatePagingItemReader    GemfireItemWriter
      IbatisPagingItemReader       HibernateItemWriter
      ItemReaderAdapter            IbatisBatchItemWriter
      JdbcCursorItemReader         ItemWriterAdapter
      JdbcPagingItemReader         JdbcBatchItemWriter
      JmsItemReader                JmsItemWriter
      JpaPagingItemReader          JpaItemWriter
      ListItemReader               MimeMessageItemWriter
      MongoItemReader              MongoItemWriter
      Neo4jItemReader              Neo4jItemWriter
      RepositoryItemReader         RepositoryItemWriter
      StoredProcedureItemReader    PropertyExtractingDelegatingItemWriter
      StaxEventItemReader          StaxEventItemWriter

  18. Job Repository Data Model
  19. Let’s look at a couple of examples of building simple Spring Batch Jobs
      Example 1 – Load Flat file contents into database
      Example 2 – Load XML file contents into database
  20. Configure DataSource and Spring Batch Core Beans
      spring-batch-context.xml:
  21. Example1: Load Flat file contents into database

      person-data.csv
      Jill,Doe
      Joe,Doe
      Justin,Doe
      Jane,Doe
      John,Doe

      Transform Data to Upper Case

      PERSON Table
      PERSON_ID   FIRST_NAME   LAST_NAME
      1           JILL         DOE
      2           JOE          DOE
      3           JUSTIN       DOE
      4           JANE         DOE
      5           JOHN         DOE

  22. Example1: Job Config
      flat-file-reader-job.xml
      Chunk Processing:
      • Reader – retrieves input for a Step one item at a time
      • Processor – processes an item
      • Writer – writes the output, one item or chunk of items at a time
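The reader, processor and writer roles listed above can be sketched as a simple loop. This is a plain-Java illustration of the chunk-oriented pattern, not the Spring Batch implementation: `ChunkLoopSketch`, `runStep` and the `commitInterval` parameter name are assumptions made for the example, and a real step would also wrap each chunk in a transaction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Plain-Java illustration of chunk-oriented processing; not Spring Batch code. */
class ChunkLoopSketch {

    /**
     * Reads items one at a time, processes each, and hands them to the writer
     * in chunks of commitInterval items. Returns the number of chunks written.
     */
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int commitInterval) {
        int chunks = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));  // read and process one item
            if (chunk.size() == commitInterval) {
                writer.accept(new ArrayList<>(chunk));  // write the whole chunk at once
                chunk.clear();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {                         // flush the final, partial chunk
            writer.accept(new ArrayList<>(chunk));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Jill,Doe", "Joe,Doe", "Justin,Doe", "Jane,Doe", "John,Doe");
        List<List<String>> written = new ArrayList<>();
        Function<String, String> upper = s -> s.toUpperCase();
        Consumer<List<String>> writer = written::add;
        int chunks = runStep(input.iterator(), upper, writer, 2);
        System.out.println(chunks + " chunks: " + written);
    }
}
```

With five input rows and a chunk size of 2, the writer is called three times: two full chunks and one partial chunk.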
  23. Example1: Reader, Processor and Writer
      flat-file-reader-job.xml (cont’d)
  24. Example1: Person Item Processor
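The code on this slide did not survive in the transcript. As a rough stand-in, the sketch below shows the transformation Example 1 describes (CSV names converted to upper case) in plain Java. The `Person` class, `parse` and `process` are hypothetical names, and the sketch deliberately does not implement Spring Batch's `ItemProcessor` interface so that it stays self-contained.

```java
import java.util.Locale;

/** Plain-Java stand-in for the Example 1 processor; names are hypothetical. */
class PersonProcessorSketch {

    /** Hypothetical domain object for one row of person-data.csv. */
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public String toString() {
            return firstName + " " + lastName;
        }
    }

    /** Maps one CSV line such as "Jill,Doe" to a Person (the reader's job in the example). */
    static Person parse(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected 2 fields: " + line);
        }
        return new Person(fields[0].trim(), fields[1].trim());
    }

    /** The processing step: upper-case both names. */
    static Person process(Person in) {
        return new Person(in.firstName.toUpperCase(Locale.ROOT),
                          in.lastName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(process(parse("Jill,Doe")));
    }
}
```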
  25. Example1: Test Case to Execute Flat File Reader Job
  26. Example2: Load XML file contents into database
      record-data.xml

      AD_PERFORMANCE Table
      ID   DATE         IMPRESSION   CLICKS   EARNING
      1    06/01/2013   139237       57       220.90
      2    06/02/2013   339100       57       320.88
      3    06/03/2013   431436       57       27.80

  27. Example2: Job Config
      xml-file-reader-job.xml
  28. Example2: Reader, JAXB Unmarshaller, Processor and Writer
      xml-file-reader-job.xml (cont’d)
  29. Example2: Record Item Processor
  30. Example2: Ad Performance Writer
  31. Example2: Test Case to Execute XML File Reader Job
  32. Spring Batch Admin Webapp
  33. Jobs
  34. Job Executions
  35. Job Execution Details
  36. In this tech talk we’ll cover
      • Spring Batch Introduction
      • Dealer.com Real World Usage
      • Some Lessons Learned
  37. Business Problem
      • CRM entering Dealer's day-to-day Operations
      • We need to Pull data from Dealer’s DMS systems into CRM
      • DMS Systems can be ADP or Reynolds or DealerTrack etc
  38. Here’s a Small Big Picture
      [Diagram: Dealer’s DMS Systems (ADP, Reynolds, DealerTrack) → Extract → DMS → Load → CRM, where DMS and CRM are Dealer.com’s DMS & CRM Systems]
  39. Typical Batch Job
      • Download data from DMS Provider for a dealership
      • Load the data in CRM
      • Generate report on how the data was processed
  40. ADP Vehicle Sales ETL Job Configuration
  41. Some requirements
      • We need to pull frequently for a lot of dealers
      • We need job concurrency control
      • We need data flow control for loading into CRM
      • We need to know how the data was processed
      • We need job resiliency
      • We need beautiful batch jobs
  42. Some requirements
      • We need to pull frequently for a lot of dealers
      • We need job concurrency control
      • We need data flow control for loading into CRM
      • We need to know how the data was processed
      • We need job resiliency
      • We need beautiful batch jobs
  43. Pull Frequently
      • We have 100s of Dealerships, so each batch Job has to be run for a Dealer’s ADP Account
      • We schedule Jobs for each dealership to pull every 4 hours
      • The Job Scheduling is managed via a centralized DDC Scheduling Server
        – Clients issue scheduling requests via a command queue to the server
        – The server will then fire scheduled events back onto a queue for clients to consume
        – Clients and DDC Scheduling Server communicate through a single RabbitMQ exchange. Each client chooses a unique application key and binds to this exchange to receive messages about its scheduled events
        – Named ClockTower: it’s worth a separate talk in itself
  44. Some requirements
      • We need to pull frequently for a lot of dealers
      • We need job concurrency control
      • We need data flow control for loading into CRM
      • We need to know how the data was processed
      • We need job resiliency
      • We need beautiful batch jobs
  45. Job Concurrency
      • 100s of scheduled or manually initiated jobs can all go off at the same time
      • We want to control how many jobs should run in our Cluster concurrently
      • We used basic queuing to solve this
        – all job commands go into a queue
        – they get processed one at a time
        – we can control how many consumers we want to allow across the cluster
      • We use Spring Integration AMQP Outbound & Inbound Adapters
  46. Running Jobs Concurrently – Competing Consumer Pattern
      [Diagram: job commands (Job3, Job4, Job5) waiting on the DMS Pull Job Queue; DMS Service 01 runs Job1 and DMS Service 02 runs Job2]
      Scheduled and Manually Initiated Job Commands come through the same Queue
      • Each Node is configured with multiple concurrent Consumers (3 as of now)
      • As we take more Tenants we could scale horizontally by adding more Nodes
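The competing consumer pattern can be sketched with nothing but the JDK. The example below is an in-process analogy only: a `LinkedBlockingQueue` stands in for the AMQP queue and a fixed thread pool stands in for the configured consumers. `CompetingConsumersSketch` and `drain` are hypothetical names, and none of this is the Spring Integration configuration used in the talk.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** In-process analogy of competing consumers; the queue stands in for a message broker. */
class CompetingConsumersSketch {

    /** Drains the queue with a fixed number of consumers; returns the peak number of jobs running at once. */
    static int drain(List<String> jobCommands, int consumers, Set<String> completed)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(jobCommands);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.submit(() -> {
                String job;
                while ((job = queue.poll()) != null) {   // each command goes to exactly one consumer
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20);                // stand-in for the real batch job
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    } finally {
                        running.decrementAndGet();
                    }
                    completed.add(job);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> completed = ConcurrentHashMap.newKeySet();
        int peak = drain(List.of("Job1", "Job2", "Job3", "Job4", "Job5"), 3, completed);
        System.out.println("completed=" + completed.size() + ", peak concurrency=" + peak);
    }
}
```

However many commands are queued, the number of jobs running at once never exceeds the consumer count, which is the knob the slide describes.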
  47. Some requirements
      • We need to pull frequently for a lot of dealers
      • We need job concurrency control
      • We need data flow control for loading into CRM
      • We need to know how the data was processed
      • We need job resiliency
      • We need beautiful batch jobs
  48. Data Flow Control
      [Diagram: ADP → Extract → DMS → Load → CRM]
      • We need to control the load we put on the CRM system
      • We don't want to EVER load too much data at the same time
      • We debated two ways to solve this
        – Synchronous
        – Asynchronous (via Queues)
  49. Sync vs Async Loading Data into CRM
      [Diagram (SYNC): DMS Service 01 (Job1) → CRM Batch Service Load Balancer → CRM Batch 01, CRM Batch 02]
      [Diagram (ASYNC): DMS Service 01 (Job1) → DMS Data Load Queue → CRM Batch 01, CRM Batch 02]
  50. Synchronous
      • HAProxy load balancer - cannot be scaled dynamically
      • Remote call needs to be made via REST or Spring Remoting API - tightly coupled
      • Client has to fail the batch job or retry the request on failure - not fault tolerant
      • Nodes need to throttle the number of incoming requests (via Tomcat threads) – have to administer Tomcat threads, nodes cannot be repurposed
      Asynchronous
      • AMQP Rabbit Queue - can be scaled dynamically
      • Only contract is the 'message' being passed – somewhat loosely coupled
      • If a node fails, the message will be unacknowledged and another node will execute the same request - fault tolerant
      • Each node can control the number of concurrent queue consumers – application configuration, nodes can be repurposed
      • It does incur some extra cost: message persistence & dynamic reply queues
      We settled on loading via Queue using Spring Integration AMQP Gateways (which are Bi-Directional); the call waits for the response to come back via a reply queue
  51. Some requirements
      • We need to pull frequently for a lot of dealers
      • We need job concurrency control
      • We need data flow control for loading into CRM
      • We need to know how the data was processed
      • We need job resiliency
      • We need beautiful batch jobs
  52. We send out an awesome looking email notification to an internal mailing list
  53. The CSV Report has detailed information on how each row was processed
  54. We are working towards a UI that’ll look like this
  55. Some requirements
      • We need to pull frequently for a lot of dealers
      • We need job concurrency control
      • We need data flow control for loading into CRM
      • We need to know how the data was processed
      • We need job resiliency
      • We need beautiful batch jobs
  56. Job Resiliency
      [Diagram: ADP → Extract → DMS → Load → CRM]
      100s of Jobs could go off at the same time and jobs need to be resilient to unexpected failures
      • While a big job is running, CRM could crash or get restarted for deployment
      • While a big job is running, DMS could crash or get restarted for deployment
      In such cases, we want to rerun the job after a short while from where it left off.
      • We use Spring Batch’s Job Restartability feature to achieve this
  57. What could go wrong?
      [Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2), the DMS Pull Job Queue, the DMS Data Load Queue, and CRM Batch 01 and CRM Batch 02, with possible failure points marked X]
      X marks nodes that could crash or be restarted for a deployment while a big job is running. Our goal is to be able to rerun the job and resume from where things left off.
  58. Spring Batch – Restartability
      • Spring Batch maintains Job State in the database
        – which Step is completed, being processed or failed
        – which item is being processed during Chunk processing
      • Jobs can be restarted using the JobExecutionId
      • Spring Batch will skip over the completed steps and run the job from where it left off before
      • If the job had failed during Chunk processing it’ll skip the items that were already processed and start from where it left off before
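The restart behaviour described above boils down to persisting progress together with each committed chunk. The sketch below illustrates that idea in plain Java under simplifying assumptions: `Checkpoint` is a hypothetical in-memory stand-in for the job state that Spring Batch keeps in the database, and `failAtItem` only exists to simulate a crash.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of resuming chunk processing from a checkpoint; not Spring Batch code. */
class RestartSketch {

    /** Stand-in for the persisted job state; a real job keeps this in a database. */
    static class Checkpoint {
        int itemsCommitted = 0;
    }

    /**
     * Processes items in chunks, starting after the last committed item.
     * failAtItem simulates a crash just before that item index is handled (-1 means never).
     */
    static void run(List<String> items, int chunkSize, Checkpoint checkpoint,
                    List<String> sink, int failAtItem) {
        int i = checkpoint.itemsCommitted;               // resume point
        while (i < items.size()) {
            int end = Math.min(i + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int j = i; j < end; j++) {
                if (j == failAtItem) {
                    throw new IllegalStateException("simulated crash at item " + j);
                }
                chunk.add(items.get(j));
            }
            sink.addAll(chunk);                          // write the chunk...
            checkpoint.itemsCommitted = end;             // ...and record progress along with it
            i = end;
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e");
        List<String> sink = new ArrayList<>();
        Checkpoint checkpoint = new Checkpoint();
        try {
            run(items, 2, checkpoint, sink, 3);          // crashes inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", committed so far: " + checkpoint.itemsCommitted);
        }
        run(items, 2, checkpoint, sink, -1);             // restart: skips the committed chunk
        System.out.println(sink);
    }
}
```

The uncommitted chunk is reprocessed as a whole after the restart, while the committed chunk is never written twice.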
  59. When CRM goes down
      [Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2), the DMS Pull Job Queue, the DMS Data Load Queue, and CRM Batch 01 and CRM Batch 02, with failed nodes marked X]
      • We have a timeout of 5 minutes for the reply from CRM
      • When CRM Batch Nodes are down, we’ll get a timeout Exception, which results in a new Job Command Message to the DMS Pull Job Queue
      • The message includes the JobExecutionId
      • Whichever node picks up the message will resume the job from where it left off
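The timeout-then-requeue flow can be approximated in plain Java as follows. This is a sketch under stated assumptions, not the production code: a `CompletableFuture` stands in for the reply expected from the reply queue, an in-memory `Queue` stands in for the DMS Pull Job Queue, and `JobCommand` and `awaitReply` are hypothetical names. The transcript only says that the new message includes the JobExecutionId.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Plain-Java illustration of "reply timed out, so queue a restart command". */
class ReplyTimeoutSketch {

    /** Hypothetical job command carrying the execution to resume. */
    static final class JobCommand {
        final long jobExecutionId;

        JobCommand(long jobExecutionId) {
            this.jobExecutionId = jobExecutionId;
        }
    }

    /**
     * Waits for the load reply. On timeout, puts a restart command for the same
     * execution back on the pull queue and returns false.
     */
    static boolean awaitReply(CompletableFuture<String> reply, long timeoutMillis,
                              long jobExecutionId, Queue<JobCommand> pullJobQueue)
            throws InterruptedException, ExecutionException {
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pullJobQueue.add(new JobCommand(jobExecutionId));
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Queue<JobCommand> pullJobQueue = new ArrayDeque<>();
        CompletableFuture<String> neverAnswers = new CompletableFuture<>();  // CRM is "down"
        boolean loaded = awaitReply(neverAnswers, 50, 42L, pullJobQueue);
        System.out.println("loaded=" + loaded + ", requeued=" + pullJobQueue.size());
    }
}
```

The example uses a 50 ms timeout so it finishes quickly; the slide mentions 5 minutes for the real system.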
  60. When a DMS Service Node goes down
      [Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2), the DMS Pull Job Queue, the DMS Data Load Queue, and CRM Batch 01 and CRM Batch 02, with the failed node marked X]
      • When a DMS Node executing the Job goes down, the message will be unacknowledged, and will be picked up by any other node connected to the DMS Pull Job Queue
      • The node that picks up the message will check whether this job was already running and stopped abruptly, and if so it’ll try to resume it from where it left off
      • (This is not in production yet; it's under development)
  61. Some requirements
      • We need to pull frequently for a lot of dealers
      • We need job concurrency control
      • We need data flow control for loading into CRM
      • We need to know how the data was processed
      • We need job resiliency
      • We need beautiful batch jobs
  62. So, what makes it beautiful?
      • Simple
        – We just used the basic features of Spring Batch
      • Easy to understand
        – Quick look at spring configurations is all you need
      • Less code
        – We focused on the business logic
      • Low maintenance
        – Anybody can maintain it
  63. In this tech talk we’ll cover
      • Spring Batch Introduction
      • Dealer.com Real World Usage
      • Some Lessons Learned
  64. On Spring Batch
      • Really easy to set up and use
      • Highly configurable
      • Chunk Processing is the bomb!
      • Beware of the commit count
      • The bean ‘step’ scope comes in handy
      • ExecutionContext is limited to 4 data types
  65. On 3rd Party Integration
      • Plan for Dev & Live accounts and environments
      • Configure anything and everything possible
      • Download large files via streaming
      • Handle exceptions properly
      • Embrace data translation errors
      • Build jobs that are repeat runnable
  66. Sources
      • Spring Batch Reference Documentation
        – http://docs.spring.io/spring-batch/reference/html-single/index.html
      • Ad Performance Sample XML taken from
        – http://www.mkyong.com/spring-batch/spring-batch-example-xml-file-to-database/
  67. Questions?
  68. Shameless Plug
      Currently we have a few openings in the Manhattan Beach office
      • Java Developers
      • UI Developers
      • Web Developers
      If interested please apply at http://careers.dealer.com/