Batching and Java EE (jdk.io)

JDK.IO 2016 (http://jdk.io)
Java EE 7 introduced a new batch processing API, and this session goes over how to use it. The API makes it easy to implement long-running, data- or compute-intensive jobs that need to be scheduled or initiated on demand. Basics of the API will be demonstrated via code samples. The API will also be compared to Spring Batch and Hadoop to provide context and guidance on when each technology is appropriate.


  1. 1. BATCHING AND JAVA EE Ryan Cuprak
  2. 2. What is Batch Processing? Batch jobs are typically: • Bulk-oriented • Non-interactive • Potentially compute-intensive • May require parallel execution • May be invoked ad hoc, on a schedule, or on demand
  3. 3. Batching Examples • Monthly reports/statements • Daily data cleanup • One-time data migrations • Data synchronization • Data analysis • Portfolio rebalancing
  4. 4. Introducing Java EE Batching • Introduced in Java EE 7 • JSR 352 - https://jcp.org/en/jsr/detail?id=352 • Reference implementation: https://github.com/WASdev/standards.jsr352.jbatch • Batch Framework: • Batch container for execution of jobs • XML Job Specification Language • Batch annotations and interfaces • Supporting classes and interfaces for interacting with the container • Depends on CDI
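  5. 5. Java EE Batching Overview (diagram) JobOperator, Job, Step, JobRepository, ItemReader, ItemProcessor, ItemWriter • A Job (1) has many (*) Steps • A chunk Step has one ItemReader, one ItemProcessor, and one ItemWriter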
  6. 6. Java EE Batching Overview (diagram) Job (1) to JobInstance (*) to JobExecution (*) • Job: EndOfDayJob • JobInstance: EndOfDayJob for 9/1/2016 • JobExecution: first attempt at EndOfDay job for 9/1/2016
  7. 7. Java EE Batching Features • Fault-tolerant – checkpoints and job persistence • Transactions - chunks execute within a JTA transaction • Listeners – notification of job status/completion/etc. • Resource management – limits concurrent jobs • Starting/stopping/restarting – job control API
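The checkpoint feature on slide 7 can be sketched in plain Java. The class below is a hypothetical stand-in, not part of the slides: its three methods only mirror the names of `open(Serializable)`, `readItem()`, and `checkpointInfo()` on `javax.batch.api.chunk.ItemReader`, so it runs without a batch container. Storing the list position as the checkpoint is an assumption made for illustration.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical reader that can resume from a saved position after a restart.
class CheckpointedReader {
    private final List<String> source;
    private int position;

    CheckpointedReader(List<String> source) {
        this.source = source;
    }

    // A null checkpoint means a fresh start; otherwise resume where we left off.
    void open(Serializable checkpoint) {
        position = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    // Returns the next item, or null when the source is exhausted.
    String readItem() {
        return position < source.size() ? source.get(position++) : null;
    }

    // The value a runtime would persist when it takes a checkpoint.
    Serializable checkpointInfo() {
        return position;
    }
}
```

A restarted execution would call `open` with the last persisted value, so items read before the checkpoint are not read again.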
  8. 8. Java EE Batching Deployment • Deploy batch jobs in: WAR, EAR, JAR • Manage jobs: split application into modules • Example (diagram): Server A: frontend.war • Server B: app.war (End of Day Job, Cleanup Job) • Server C: app2.war (Analytics Job)
  9. 9. Batchlet
  10. 10. Exit Codes (code: description) • STARTING: Job has been submitted to the runtime. • STARTED: Batch job has started executing. • STOPPING: Job termination has been requested. • STOPPED: Job has been stopped. • FAILED: Job has thrown an error or a failure was triggered by <fail>. • COMPLETED: Job has completed normally. • ABANDONED: Job cannot be restarted.
  11. 11. Basic Layout CDI Configuration Job Configuration Batchlet
  12. 12. Job Configuration META-INF/batch-jobs/<job-name>.xml
  13. 13. Batch Runtime
  14. 14. Batchlet with Termination Batchlets should implement stop() and terminate when requested!
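The advice on slide 14 can be sketched as cooperative cancellation. This is a minimal sketch under stated assumptions: `StoppableTask` is a hypothetical stand-in that mirrors the `process()` and `stop()` methods of `javax.batch.api.Batchlet` so the example runs on a plain JDK, and the class name, unit-of-work counter, and returned status strings are illustrative only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in mirroring the two methods of javax.batch.api.Batchlet.
interface StoppableTask {
    String process() throws Exception;
    void stop() throws Exception;
}

class CleanupTask implements StoppableTask {
    // Set by stop(), which may be called from a different thread than process().
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final int totalUnits;
    private int unitsDone = 0;

    CleanupTask(int totalUnits) {
        this.totalUnits = totalUnits;
    }

    @Override
    public String process() {
        while (unitsDone < totalUnits) {
            // Check the flag between units of work so a stop request is honored promptly.
            if (stopRequested.get()) {
                return "STOPPED";
            }
            unitsDone++; // placeholder for one unit of real work
        }
        return "COMPLETED";
    }

    @Override
    public void stop() {
        stopRequested.set(true);
    }

    int getUnitsDone() {
        return unitsDone;
    }
}
```

The key design choice is that `stop()` only records the request; the long-running loop decides when it is safe to exit.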
  15. 15. Batching & Resources
  16. 16. Concurrent Resources
  17. 17. IDs and Names instanceId • ID represents an instance of a job. • Created when the JobOperator start method is invoked. executionId • ID that represents the next attempt to run a particular job instance. • Created when a job is started/restarted. • Only one executionId for a job can be started at a time stepExecutionId • ID for an attempt to execute a particular step in a job jobName • name of the job from XML (actually id) <job id=""> jobXMLName • name of the config file in META-INF/batch-jobs
  18. 18. JobInstance vs. JobExecution (one JobInstance has many JobExecutions) • JobInstance: instanceId, jobName • JobExecution: batchStatus, createTime, endTime, executionId, exitStatus, jobName, jobParameters, lastUpdatedTime, startTime
  19. 19. Managing Jobs • JobOperator – interface for operating on batch jobs. • BatchRuntime.getJobOperator() • JobOperator: • Provides information on current and completed jobs • Used to start/stop/restart/abandon jobs • Security is implementation dependent • JobOperator interacts with JobRepository • JobRepository • Implementation is outside the scope of the JSR • No API for deleting old jobs • Reference implementation provides no API for cleanup!
  20. 20. JobOperator Methods (return type and method) • void abandon(long executionId) • JobExecution getJobExecution(long executionId) • List<JobExecution> getJobExecutions(JobInstance instance) • JobInstance getJobInstance(long executionId) • int getJobInstanceCount(String jobName) • List<JobInstance> getJobInstances(String jobName, int start, int count) • Set<String> getJobNames() • Properties getParameters(long executionId) • List<Long> getRunningExecutions(String jobName) • List<StepExecution> getStepExecutions(long jobExecutionId) • long restart(long executionId, Properties restartParams) • long start(String jobXMLName, Properties jobParams) • void stop(long executionId)
  21. 21. Listing Batch Jobs
  22. 22. Chunking • Chunking is the primary pattern for batch processing in JSR-352. • Encapsulates the ETL pattern: • Pieces: Reader/Processor/Writer • Reader/Processor invoked until an entire chunk of data is processed. • Output is written atomically • Implementation: • Interfaces: ItemReader/ItemWriter/ItemProcessor • Classes: AbstractItemReader/AbstractItemWriter (ItemProcessor has no abstract base class) • Flow: Reader, then Processor, then Writer
  23. 23. Chunking
  24. 24. Chunk Configuration (parameter: description) • checkpoint-policy: possible values are item or custom • item-count: number of items to be processed per chunk; default is 10 • time-limit: time in seconds before taking a checkpoint; default is 0 (no time limit) • skip-limit: number of exceptions a step will skip if skippable exceptions are configured • retry-limit: number of times a step will be retried if it throws a configured retryable exception
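The skip-limit parameter on slide 24 can be illustrated with a plain-Java sketch. It does not use the JSR-352 API: the class and method names are hypothetical, `NumberFormatException` stands in for a configured skippable exception, and failing with `IllegalStateException` once the limit is exceeded is an assumption about how a runtime might surface the failure.

```java
import java.util.ArrayList;
import java.util.List;

class SkipLimitSketch {
    // Parses every item, skipping bad ones until skipLimit skips have been used up.
    static List<Integer> parseAll(List<String> rawItems, int skipLimit) {
        List<Integer> parsed = new ArrayList<>();
        int skipped = 0;
        for (String raw : rawItems) {
            try {
                parsed.add(Integer.parseInt(raw.trim()));
            } catch (NumberFormatException e) {
                // The "skippable" exception: tolerate it only while under the limit.
                if (skipped >= skipLimit) {
                    throw new IllegalStateException(
                            "skip limit of " + skipLimit + " exceeded", e);
                }
                skipped++;
            }
        }
        return parsed;
    }
}
```

With a limit of 1, one malformed record is dropped and the rest are kept; a second malformed record fails the whole run.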
  25. 25. Skippable Exceptions
  26. 26. Chunking Step ItemReader ItemProcessor ItemWriter read() item process(item) item read() item process(item) item write(items) execute() ExitStatus
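The read/process/write sequence on slide 26 can be sketched as a loop in plain Java. This is a simplified model, not the container's implementation: `Iterator`, `Function`, and `Consumer` are stand-ins for ItemReader, ItemProcessor, and ItemWriter, the class name is hypothetical, and transactions and checkpoints are left out. Treating a null processor result as "filter this item" follows JSR-352 behavior; counting read items toward the chunk size is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class ChunkLoopSketch {
    // Reads and processes items one at a time, then writes each chunk in one call.
    // Returns the number of chunks handed to the writer.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int itemCount) {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> buffer = new ArrayList<>();
            int itemsRead = 0;
            while (itemsRead < itemCount) {
                if (!reader.hasNext()) {
                    exhausted = true;
                    break;
                }
                I item = reader.next();
                itemsRead++;
                O result = processor.apply(item);
                if (result != null) { // null means the processor filtered the item
                    buffer.add(result);
                }
            }
            if (!buffer.isEmpty()) {
                writer.accept(buffer); // one write per chunk
                chunks++;
            }
        }
        return chunks;
    }
}
```

Seven items with a chunk size of three produce three writer calls: two full chunks and one partial chunk at the end.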
  27. 27. Chunking: ItemReader
  28. 28. Chunking: ItemProcessor
  29. 29. Chunking: ItemWriter
  30. 30. Demo
  31. 31. Runtime Parameters Set Property Retrieve Property
  32. 32. Pre-Defined Properties Set Property Property Injected
  33. 33. Step Exceptions • Parallel running instances (partition) complete before the job completes. • Batch status transitions to FAILED
  34. 34. Job Listener Configuration Listener Config
  35. 35. Job Listener Implementation
  36. 36. Step Listener Configuration Listener Config
  37. 37. Step Listener Implementation
  38. 38. Partition Configuration
  39. 39. Partition Implementation
  40. 40. Decision Configuration Decision What next?
  41. 41. Decision Implementation Dependency Injection!
  42. 42. Flow & Splits JCL • <flow> element is used to implement process workflows. • <split> element is used to run flows in parallel • Diagram labels: Split, updateExisting, processNewStorms, retrieveTracking, processDecider, stormReader, stormProcessor, stormWriter, updateExistingStorms
  43. 43. Flows & Splits
  44. 44. Checkpoint Algorithm Configuration
  45. 45. Checkpoint Algorithm Implementation
  46. 46. Hadoop Overview • Massively scalable storage and batch data processing system • Written in Java • Huge ecosystem • Meant for massive data processing jobs • Horizontally scalable • Uses MapReduce programming model • Handles processing of petabytes of data • Started at Yahoo! in 2005.
  47. 47. Hadoop MapReduce (Distributed Computation) HDFS (Distributed Storage) YARN Framework Common Utilities
  48. 48. Hadoop Typically Hadoop is used when: • Analysis is performed on unstructured datasets • Data is stored across multiple servers (HDFS) • Non-Java processes are fed data and managed Ex. https://svi.nl/HuygensSoftware
  49. 49. Spring vs. Java EE Batching • Spring Batch 3.0 implements JSR-352! • Batch artifacts developed against JSR-352 won’t work within a traditional Spring Batch Job • Same two processing models as Spring Batch: • Item – aka chunking • Task - aka Batchlet
  50. 50. Terminology Comparison JSR-352 Spring Batch Job Job Step Step Chunk Chunk Item Item ItemReader ItemReader/ItemStream ItemProcessor ItemProcessor ItemWriter ItemWriter/ItemStream JobInstance JobInstance JobExecution JobExecution StepExecution StepExecution JobListener JobExecutionListener StepListener StepExecutionListener
  51. 51. Scaling Batch Jobs • Traditional Spring Batch Scaling: • Split – running multiple steps in parallel • Multiple threads – executing a single step via multiple threads • Partitioning – dividing data up for parallel processing • Remote Chunking – executing the processor logic remotely • JSR-352 Job Scaling • Split – running multiple steps in parallel • Partitioning – dividing data up – implementation slightly different.
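The partitioning idea on slide 51, dividing data up for parallel processing, can be sketched with a plain JDK thread pool. This is not the JSR-352 partition API: the class and method names are hypothetical, summing an array stands in for real step logic, and inside a Java EE container the batch runtime would manage the threads rather than application code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionSketch {
    // Splits [0, totalItems) into contiguous {startInclusive, endExclusive} ranges.
    static List<int[]> ranges(int totalItems, int partitions) {
        List<int[]> result = new ArrayList<>();
        int base = totalItems / partitions;
        int remainder = totalItems % partitions;
        int start = 0;
        for (int p = 0; p < partitions; p++) {
            int size = base + (p < remainder ? 1 : 0); // spread the remainder evenly
            if (size == 0) {
                continue; // more partitions than items: skip empty ranges
            }
            result.add(new int[] {start, start + size});
            start += size;
        }
        return result;
    }

    // Processes each range on its own thread and combines the partial results.
    static long sumInParallel(int[] data, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int[] range : ranges(data.length, partitions)) {
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = range[0]; i < range[1]; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each partition works on its own slice of the data, so no partition needs to coordinate with another until the partial results are combined.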
  52. 52. JSR-352/Spring/Hadoop Hadoop • Massively parallel / large jobs • Processing petabytes of data (BIG DATA) JSR-352/Spring • Traditional batch processing jobs • Structured data/business processes JSR-352 vs. Spring • Java EE versus Spring containers • Spring has better job scaling capabilities
  53. 53. JSR-352 Implementations • JBeret (WildFly/JBoss) • http://tinyurl.com/z4qx3wo • jbatch (reference; WebSphere/WebLogic/Payara) • http://tinyurl.com/jk6vcb8 • Spring Batch • http://tinyurl.com/mt8v3k7
  54. 54. Best Practices • Package/deploy batch jobs separately • Implement logic to clean up old jobs • Implement logic for auto-restart • Test restart and checkpoint logic • Configure database to store jobs • Configure thread pool for batch jobs • Only invoke batch jobs from logic that is secured (@Role etc.)
  55. 55. Resources • JSR-352 https://jcp.org/en/jsr/detail?id=352 • Java EE Support http://javaee.support/contributors/ • Spring Batch http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html • Spring JSR-352 Support http://docs.spring.io/spring-batch/reference/html/jsr-352.html
  56. 56. Resources • Java EE 7 Batch Processing and World of Warcraft http://tinyurl.com/gp8yls8 • Three Key Concepts for Understanding JSR-352 http://tinyurl.com/oxe2dhu • Java EE Tutorial https://docs.oracle.com/javaee/7/tutorial/batch-processing.htm
  57. 57. Q&A Email: rcuprak@gmail.com Twitter: @ctjava
