Java Batch for Cost Optimized Efficiency


Slides for the JavaOne 2012 session "Java Batch for Cost Optimized Efficiency". The session covers the importance of Java batch in enterprise computing and provides a reference architecture and an overview of JSR 352 and the WebSphere Batch solutions.

  1. 1. Java Batch for Cost Optimized Efficiency
     Sridhar Sudarsan
     Watson Solutions
     Chief Architect, Batch processing Strategy
     October 03, 2012
  2. 2. What we’ll cover today
     • Why Batch?
     • Batch platform in a solution
     • Java & Batch
     • Java Batch JSR 352
     • Java Batch offerings
     • Best practices and some customer scenarios
  3. 3. Achieve business efficiency through a balanced blend of Batch and Online processing
     A continuous interleaving of bulk and real-time processing maximizes the balance between operational efficiency and market responsiveness on a 24x7 global basis. It enables a cost-effective, always-on business environment.
     [Diagram: Online and Batch workloads interleaved on timelines running from 8 am to 8 pm to 8 am]
     Optimize cycle time for business processes: Get improved information availability and quality with continuous batch processing
     Reduce costs: Consolidate and standardize IT systems, services and people skills between batch and OLTP
     Adopt elastic batch processing: Expand or contract batch windows dynamically based on business decisions and IT resource usage
     Develop reusable and composable bulk services through business process analysis, design analysis, programming models, tools and runtimes for bulk-processing services.
     Build and manage workloads smarter to address evolving business needs and gain a competitive edge
  4. 4. Common batch patterns
     • Integrate batch applications in a Service Oriented architecture
       – Reuse business logic between Batch and OLTP applications
       – Leverage rules, processes and events in batch or bulk
       – Compose business services using batch and real-time activities
     • Run managed Java batch applications on the mainframe
       – Modernize legacy batch applications
       – Better integration of infrastructure and operations
     • Use Java batch applications to handle shrinking batch windows
       – More data to process in shorter windows
       – Business process changes to handle 24x7 batch processing
     • Leverage batch towards achieving agility
       – Faster turnaround to implement new(er) or modified business processes
       – Be better equipped to manage regulatory compliance quickly
     Identify the right strategy for you to help reduce the cost of running batch efficiently
  5. 5. Batch processing reference model
     [Layered diagram, top to bottom]
     Invocation services: Ad hoc; Planned; Batch Scheduler services; Partner services; Business process and event services
     Invocation & Scheduling optimization: Resource brokering, Split & Parallelize, Pace, Throttle
     Batch application development: Environment for creating and migrating bulk applications
     Bulk application container
     System management and operations: Manage, monitor and secure bulk processes
     Data access management services: “File” Data access; Queue based data access; In-memory data access; Custom data access
     Information storage
     Infrastructure services: All layers above this interact with or use the services from the core OS
     Analytics & Autonomics for scheduling, check-pointing, resource management
  6. 6. Batch Applications Need Batch Middleware
     Application Support
       Batch Application - Business and custom data access logic
       Batch Framework - Library of common data access and utilities
       Job Control Language - Declarative job definition (XML based)
     Middleware
       Batch Container - Runtime engine for batch applications
       Job Scheduler - Job dispatcher, operational control point
       Logging/archival - Manager for job history and output
       Batch PJM, WLM, HA - Parallelization, WLM, and availability
       Resource Mgmt - Rule-based CPU and file limits
       Security - Security for jobs and job operations
  7. 7. JSR 352 for Batch
     • The Expert Group membership includes: IBM, Red Hat, Oracle, VMware, Credit Suisse, Clarkson University, and an independent Swiss consultant.
     • Following the new JCP 2.8 process, the proceedings of the expert group are transparent, and the community is able to participate through a public mailing list.
     • Standards based Programming Model (PM) & batch container
     • Target – Java EE and Java SE
     • PM closer to the Record Processor model (or Spring’s Item Reader/Processor/Writer model)
     • IBM providing Reference Implementation and Technology Compatibility Kit
     • Spec complete by 4Q 2012
     • View JSR details here:
     • Subscribe to JSR 352 public mailing list here:
  8. 8. JSR 352 - Concepts
     Job: Encapsulates an entire batch process
     Step: Encapsulates an independent, sequential phase of a batch job
     ItemReader: Represents retrieval of input for a step, one item at a time
     ItemProcessor: Represents business processing of an item
     ItemWriter: Represents the output of a step, one batch or chunk of items at a time
     JobOperator: Interface to manage all aspects of job processing
     [Diagram: relationships between JobOperator, Job, Step, ItemReader, ItemProcessor, ItemWriter and JobRepository]
  9. 9. JSR 352 – Concepts …
     JobInstance: Logical job run
     JobExecution: Single attempt to run a job
     StepExecution: Single attempt to run a step
     Chunk: A type of Step that implements the reader-processor-writer pattern, with configurable transaction management and checkpointing
     Batchlet: Second type of Step, which specifies a task-oriented batch step
     [Diagram: Job, JobInstance, JobExecution; Step, StepExecution; a Step is either a Chunk or a Batchlet]
  10. 10. JSR 352 – Chunk-oriented Processing
      Chunk-oriented processing – primary processing style
      • Read one item at a time
      • Process individual item
      • Collect the processed items in the writer and write out in chunks
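As a rough illustration of the read/process/write cycle on slide 10, the following plain-Java sketch buffers processed items and writes them out one chunk at a time. `ChunkSketch`, `runChunkStep` and the `chunkSize` parameter are hypothetical names invented for this example; the code deliberately does not use the JSR 352 API, and a real batch container would also manage transactions and checkpoints at each chunk boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for a chunk step; this is NOT the JSR 352 API.
final class ChunkSketch {

    /**
     * Reads items one at a time, processes each item, and hands the processed
     * items to the writer in lists of at most chunkSize elements.
     * Returns the number of chunks written.
     */
    static <I, O> int runChunkStep(Iterator<I> reader,
                                   Function<I, O> processor,
                                   Consumer<List<O>> writer,
                                   int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<O> buffer = new ArrayList<>(chunkSize);
        int chunks = 0;
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == chunkSize) {
                // Chunk boundary: a real container would commit and checkpoint here.
                writer.accept(new ArrayList<>(buffer));
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            // Final, possibly shorter, chunk.
            writer.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = n -> "item-" + n;
        Consumer<List<String>> writer = written::add;
        int chunks = runChunkStep(Arrays.asList(1, 2, 3, 4, 5).iterator(), processor, writer, 2);
        System.out.println(chunks + " chunks written: " + written);
    }
}
```

The writer receives a copy of the buffer so that clearing the buffer for the next chunk cannot alter items that were already handed over.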
  11. 11. JSR 352 - Job Specification Language
      • Specifies a job, steps and directs their execution
      • Implemented in XML
      • Supports inheritance of job, step, flow and split
  12. 12. JSR 352 – Job Parallelization
      Job steps can be run in parallel
      Parallelization Models
      • Partitioned
        – Step/flow run as multiple instances across multiple threads
        – Each thread runs the same step/flow
      • Concurrent
        – Flows/steps defined by a split run concurrently across multiple threads
        – One flow/step of the split per thread
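The partitioned model on slide 12 (the same step logic run on several threads, each over its own slice of the input) can be sketched with the standard `java.util.concurrent` executor classes. `PartitionSketch`, `sumRange` and `runPartitioned` are hypothetical names for this example, and summing an array stands in for real step logic; this is not how a JSR 352 container is implemented, only a minimal picture of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a partitioned step; NOT the JSR 352 API.
final class PartitionSketch {

    /** Step logic applied to one partition: sums values[from, to). */
    static long sumRange(int[] values, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += values[i];
        }
        return sum;
    }

    /**
     * Splits the input into contiguous partitions, runs the same step logic
     * on each partition in its own thread, and combines the partial results.
     */
    static long runPartitioned(int[] values, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int from = (int) ((long) values.length * p / partitions);
                final int to = (int) ((long) values.length * (p + 1) / partitions);
                Callable<Long> task = () -> sumRange(values, from, to);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that partition to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println("total = " + runPartitioned(values, 3));
    }
}
```

The concurrent (split) model differs only in that each thread would run a different flow or step rather than the same logic over a different slice.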
  13. 13. The Batch Programming Model
      Functions and class libraries supplied with the Feature Pack and Compute Grid
      Batch Container
        Batch Controller Bean: Part of the Batch Container code supplied by IBM
      Job Control: xJCL -- very much like traditional JCL, except it is coded in XML. Equivalents to JOB cards, DD statements, STEPs, etc.
      Job Step Control: Invoking and coordinating processing between steps
      Batch App (POJO): Step 1, Step 2, ... Step n
      Batch Data Streams: Provides data input and output services for the job steps
      Development Libraries: RAD or Eclipse
      Checkpoint Algorithms: Service to programmatically determine and handle checkpointing
      Results and Return Codes: Services to determine, manipulate and act upon return codes, both at the application and system level
      WAS Runtime Interfaces: JDBC, JCA, Security, Transaction, Logging, Deployment, etc.
  14. 14. WebSphere Batch – An overview
      It is a batch execution framework within the WebSphere Application Server runtime platform: WebSphere Batch adds function to an existing WebSphere Application Server runtime environment on all platforms.
      [Diagram, top to bottom]
      WebSphere Application Server AppServer: Web Container (Web Modules), EJB Container (EJB Modules), Batch Container (Batch Modules)
      WebSphere Application Server Foundation Services: Security, Transaction, Data Access, Logging ...
      Runtime Platform and Operating System
  15. 15. Let's look at the basic WebSphere Batch runtime
      [Diagram: dispatcher interfaces (command window, EJB/Web Services call, JMC, wsgrid) submit xJCL to the Job dispatcher on a WAS server, which uses the Job Repository data store and dispatches jobs to Batch Containers running Batch Apps on other WAS servers]
      Job Scheduler/Dispatcher (JS)
      • The job entry point to WebSphere Batch
      • Job life-cycle management (Submit, Stop, Cancel, etc.) and monitoring
      • Dispatches workload to either the PJM or GEE
      • Hosts the Job Management Console (JMC)
      Grid Endpoints (GEE)
      • Executes the actual business logic of the batch job
      • Hosts the programming model
      xJCL
      • XML descriptor for the job
      • Allows variable substitution
  16. 16. WebSphere Batch runtime components with PJM
      [Diagram: the Job Scheduler, backed by the Job Repository, passes xJCL to the Parallel Job Manager, which runs SubJob # 1 through SubJob # N as Batch Apps in Batch Containers on WAS Server 1 through WAS Server N within a logical transaction scope. Labeled extension points: Parameterizer SPI (sub job name), Logical TX Synchronization SPI, SubJob Collector SPI, SubJob Analyzer SPI]
  17. 17. Checkpoint & Restart with Batch Data Streams
      WebSphere Batch makes it easy for developers to encapsulate input/output data streams using POJOs that optionally support checkpoint/restart semantics.
      Job Start (calls made by the Batch Container)
        1 open()
        2 positionAtInitialCheckpoint()
        3 externalizeCheckpoint()
        4 close()
      Job Restart (calls made by the Batch Container)
        1 open()
        2 internalizeCheckpoint()
        3 positionAtCurrentCheckpoint()
        4 externalizeCheckpoint()
        5 close()
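The start and restart call sequences on slide 17 can be made concrete with a small in-memory stream. The method names below follow the labels on the slide, but `CheckpointSketch` and `ListStream` are hypothetical classes written for this example (they do not implement the IBM batch data stream interface), and the checkpoint token is assumed to be nothing more than a record index encoded as a string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; does NOT implement the WebSphere interface.
final class CheckpointSketch {

    /** In-memory input stream whose checkpoint is the index of the next record. */
    static final class ListStream {
        private final List<String> records;
        private int position;        // index of the next record to read
        private int restartPosition; // position restored from a checkpoint token

        ListStream(List<String> records) {
            this.records = records;
        }

        /** First run: start from the beginning of the data. */
        void positionAtInitialCheckpoint() {
            position = 0;
        }

        /** Restart: load the token saved by an earlier run. */
        void internalizeCheckpoint(String token) {
            restartPosition = Integer.parseInt(token);
        }

        /** Restart: move to the position described by the loaded token. */
        void positionAtCurrentCheckpoint() {
            position = restartPosition;
        }

        /** Called at each checkpoint: describe where a restart should resume. */
        String externalizeCheckpoint() {
            return Integer.toString(position);
        }

        boolean hasNext() {
            return position < records.size();
        }

        String read() {
            return records.get(position++);
        }
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d");

        // Job start: read two records, then take a checkpoint.
        ListStream first = new ListStream(data);
        first.positionAtInitialCheckpoint();
        first.read();
        first.read();
        String token = first.externalizeCheckpoint();

        // Job restart: a fresh stream resumes from the saved token.
        ListStream restarted = new ListStream(data);
        restarted.internalizeCheckpoint(token);
        restarted.positionAtCurrentCheckpoint();
        System.out.println("resumed at record: " + restarted.read());
    }
}
```

In a real job the container would persist the token in its job repository between the failed run and the restart; here it is simply passed from one object to the other.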
  18. 18. WebSphere Batch: OSGi Batch Applications
      • Enables use of OSGi for batch applications development
      • Full batch programming model available to OSGi framework
      • Supports standard and blueprint bundles
      • Enterprise Bundle Archive (.eba) deployment
      [Diagram: bundles packaged in an .eba and deployed to WAS, where they run as a Batch Job]
  19. 19. JD: Multi-threading
      xJCL:
        <job name=… >
          <run instances="multiple" jvm="single" />
          <step name=… > … </step>
        </job>
      • Option to run parallel job on multiple threads.
      • Parallel Job Manager local optimization.
      • Alternative to running parallel job across multiple JVMs.
      • Optimizes shorter running subjobs.
      Runtime: [Diagram: one top job with three subjobs, each on its own thread]
  20. 20. JD: Heterogeneous Steps
      • Ability to mix various step types in the same job (1)
      xJCL:
        <job name=… >
          <step name="TxBatch"> … </step>          (Transactional Batch)
          <step name="MultiProcess">               (Parallel)
            <run instances="multiple" jvm="multiple" /> …
          </step>
          <step name="CI"> … </step>               (Compute Intensive)
          <step name="ShellCmd"> … </step>         (Native Execution)
        </job>
      (1) Overcomes v6.1.1 limitations
  21. 21. OP: Memory Overload Protection
      • Protection against over-scheduling jobs to an application server
      • Batch Container monitors job memory demand against available JVM heap space
      • Prevents Java OutOfMemoryError
      • Automatic real time job memory estimation with declarative xJCL override
      xJCL: <job name=… [memory=N] … >
      [Diagram: incoming jobs queue in front of a JVM heap that holds the running jobs and some free space; the container asks "Room for next job?"]
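One way to picture the "Room for next job?" check on slide 21 is an admission gate that tracks memory reserved by running jobs against a fixed heap budget. This is only a sketch under stated assumptions: `MemoryAdmissionSketch`, `tryAdmit` and `release` are hypothetical names, each job's memory demand is assumed to be supplied by the caller (as with a declared `memory=N` value), and nothing here reflects how the WebSphere Batch Container actually estimates or enforces memory.

```java
// Hypothetical admission gate; NOT the WebSphere Batch implementation.
final class MemoryAdmissionSketch {

    private final long heapBudgetBytes; // heap the container is willing to give to jobs
    private long reservedBytes;         // sum of estimates for currently running jobs

    MemoryAdmissionSketch(long heapBudgetBytes) {
        if (heapBudgetBytes < 0) {
            throw new IllegalArgumentException("heapBudgetBytes must not be negative");
        }
        this.heapBudgetBytes = heapBudgetBytes;
    }

    /**
     * Admits the job and reserves its estimated memory if it fits in the
     * remaining budget; otherwise leaves the reservation unchanged and
     * returns false so the caller can keep the job queued.
     */
    synchronized boolean tryAdmit(long estimatedBytes) {
        if (estimatedBytes < 0) {
            throw new IllegalArgumentException("estimatedBytes must not be negative");
        }
        if (reservedBytes + estimatedBytes > heapBudgetBytes) {
            return false;
        }
        reservedBytes += estimatedBytes;
        return true;
    }

    /** Returns a finished job's reservation to the free pool. */
    synchronized void release(long estimatedBytes) {
        reservedBytes = Math.max(0, reservedBytes - estimatedBytes);
    }

    synchronized long freeBytes() {
        return heapBudgetBytes - reservedBytes;
    }

    public static void main(String[] args) {
        // Assumption for the demo: half of the maximum heap is available to jobs.
        long budget = Runtime.getRuntime().maxMemory() / 2;
        MemoryAdmissionSketch gate = new MemoryAdmissionSketch(budget);
        System.out.println("first job admitted: " + gate.tryAdmit(budget / 2));
        System.out.println("oversized job admitted: " + gate.tryAdmit(budget));
    }
}
```

Rejecting a job before it starts keeps the failure at the scheduling level, which is cheaper to recover from than an `OutOfMemoryError` that affects every job sharing the JVM.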
  22. 22. Batch Tooling (Rational Application Developer)
      Integrated (F1) Help
      Full Online Documentation: c/topics/batch/c_config_develop.html
  23. 23. Best practices: Batch application design
      • Reuse existing application services, where applicable
      • Adopt a phased, incremental development approach
        – Build the infrastructure to deploy simple batch applications first
      • Build reusable components to form part of your custom application framework based on the WebSphere Java Batch programming model
        – Externalize your components to increase opportunity to reuse
      • Identify applications that exploit capabilities of WebSphere Batch and validate the infrastructure - specifically for migrations
      • Use tooling available in Rational Application Developer/Rational Software Architect/Rational Developer for z to achieve higher developer productivity
  24. 24. Best practices: Tuning Java batch applications
      • Tune each unit of work first
        – Unit of work is invoked in a loop; any inefficiencies get multiplied by the amount of data being processed
        – Tune the application using standard tuning best practices
      • Parallelize workload
        – Use parallelism to gain maximum advantage
        – Don’t over-parallelize
      • Apply outside-in optimization techniques
        – Apply WAS JVM tuning parameters
        – Watch out for bottlenecks in downstream components
        – Scale horizontally and/or vertically
      • Achieve elastic 24x7 batch processing
        – Manage checkpoint frequencies to drive 24x7 processing
        – Pace and throttle jobs as needed
      • Set up job classes to distinguish between job characteristics
  25. 25. A sampling of Customer scenarios
      Customer: Largest re-insurance company
        Scenario: Batch modernization - COBOL to Java conversion
        Platform: zOS
        Business results: Operational simplicity with out-of-box connector to Enterprise Scheduler (TWS). Process 20 million records a week, with database of 35 TB with 100 billion rows and 40’000 batch jobs.
      Customer: Investment & Trading co
        Scenario: Mainframe batch modernization/optimize MIPS usage
        Platform: zOS
        Business results: Optimize batch processes, run 24x7 to help with the strategy to reduce batch development, operating and runtime costs.
      Customer: Wall Street Bank
        Scenario: Extreme batch payments transaction processing
        Platform: Distributed
        Business results: High-performance, highly parallel batch jobs with WebSphere Compute Grid and eXtreme Scale on distributed platforms at about 400K transactions/hour; need to get to 1 million transactions/hour
      Customer: Bank/Credit card company
        Scenario: Ab Initio and WCG comparison
        Platform: Linux
        Business results: Realized difference between ETL and business batch; and did some initial tests to validate performance with and without data lookups.
      Customer: Insurance Company
        Scenario: Replace home grown batch framework with WCG
        Platform: Linux
        Business results: Deployed a horizontally scalable Java batch environment. Developer and operational productivity with reuse of code between web and batch applications, reuse of administrative scripts from WAS environment. Stability through isolating resource intensive applications to their own clusters. Operational simplicity through reuse of applications by pushing the input and output descriptors into the xJCL
      Customer: Bank
        Scenario: Business Intelligence Reporting with Dataquant / WCG
        Platform: zOS
        Business results: Comparing the WCG+Dataquant solution with Accentuate
      Customer: German insurance company
        Platform: AIX/zOS
        Business results: Dynamically adjust IT resources to meet changing business needs
          • Reduce load on the backend data store to manageable levels
          • Improve transaction throughput and response times
          • Improve developer productivity
          • Scale easily as business transaction volumes grow
  26. 26. An architecture overview for file processing
      [Diagram: a file arrives on a shared store, or an online request comes in. WebSphere Compute Grid runs: Init (stream input from file); Validate, check entitlements; ETL – Augment, Validate, Transform; Summarize Results. Work is done per unit-of-work chunk through a persistence layer backed by WebSphere eXtreme Scale and the Database]
  27. 27. Additional slides for reference
  28. 28. JSR 352 – Programming Model - Chunk
      Package: javax.batch.annotation
      @ItemReader
        @Named[("<id>")]
        @Open void <method-name>(Externalizable checkpoint) throws Exception
        @Close void <method-name>() throws Exception
        @ReadItem <item-type> <method-name>() throws Exception
        @CheckpointInfo Externalizable <method-name>() throws Exception
      @ItemProcessor
        @Named[("<id>")]
        @ProcessItem <output-item-type> <method-name>(<item-type> item) throws Exception
      @ItemWriter
        @Named[("<id>")]
        @Open void <method-name>(Externalizable checkpoint) throws Exception
        @Close void <method-name>() throws Exception
        @WriteItems void <method-name>(List<item-type> items) throws Exception
        @CheckpointInfo Externalizable <method-name>() throws Exception
      @CheckpointAlgorithm
        @Named[("<id>")]
        @CheckpointTimeout int <method-name>(int timeout) throws Exception
        @BeginCheckpoint void <method-name>() throws Exception
        @IsReadyToCheckpoint boolean <method-name>() throws Exception
        @EndCheckpoint void <method-name>() throws Exception
  29. 29. JSR 352 – Programming Model - Batchlet
      Package: javax.batch.annotation
      @Batchlet
        @Named[("<id>")]
        @Process String <method-name>() throws Exception
        @Stop void <method-name>() throws Exception
  30. 30. JSR 352 – Programming Model - Listeners
      @JobListener
        @Named[("<id>")]
        Package: javax.batch.annotation.joblistener
        @BeforeJob void <method-name>() throws Exception
        @AfterJob void <method-name>() throws Exception
      @StepListener
        @Named[("<id>")]
        Package: javax.batch.annotation.steplistener
        @BeforeStep void <method-name>() throws Exception
        @AfterStep void <method-name>() throws Exception
      @CheckpointListener
        @Named[("<id>")]
        Package: javax.batch.annotation.checkpointlistener
        @BeforeCheckpoint void <method-name>() throws Exception
        @AfterCheckpoint void <method-name>() throws Exception
      Other listeners: ItemReaderListener, ItemProcessorListener, ItemWriterListener, SkipListener, RetryListener
  31. 31. JSR 352 – Programming Model - Parallelization
      @PartitionMapper
        @Named[("<id>")]
        @CalculatePartitions PartitionPlan <method-name>() throws Exception
      @PartitionReducer
        @Named[("<id>")]
        Package: javax.batch.annotation.partitionreducer
        @Begin void <method-name>() throws Exception
        @BeforeCompletion void <method-name>() throws Exception
        @Rollback void <method-name>() throws Exception
        @AfterCompletion void <method-name>(String status) throws Exception
      @PartitionCollector
        @Named[("<id>")]
        Package: javax.batch.annotation.partitioncollector
        @CollectPartitionData Externalizable <method-name>() throws Exception
      @PartitionAnalyzer
        @Named[("<id>")]
        Package: javax.batch.annotation.partitionanalyzer
        @AnalyzeCollectorData void <method-name>(Externalizable data) throws Exception
        @AnalyzeExitStatus void <method-name>(String exitStatus) throws Exception
  32. 32. Thank You (English), Gracias (Spanish), Dziękuję (Polish), Obrigado (Brazilian Portuguese), Danke (German), Grazie (Italian), Merci (French); the slide also labels Hindi, Thai, Traditional Chinese, Russian, Arabic, Simplified Chinese, Japanese, Tamil and Korean.