Java Batch for Cost Optimized Efficiency




Sridhar Sudarsan
Watson Solutions
Chief Architect, Batch processing Strategy, IBM
sudarsa@us.ibm.com




                                                  October 03, 2012
What we’ll cover today
• Why Batch?
• Batch platform in a solution
• Java & Batch
• Java Batch JSR 352
• Java Batch offerings
• Best practices and some Customer scenarios
Achieve business efficiency through a balanced blend of Batch and
 Online processing

   A continuous interleaving of bulk and real-time processing maximizes the balance between
   operational efficiency and market responsiveness on a 24x7 global basis. It enables a cost-effective,
   always-on business environment.

   [Figure: two timelines. Top, 8 am to 8 pm to 8 am: Online, then Batch. Bottom, 8 am to 8 am: multiple interleaved Batch windows.]

   • Optimize cycle time for business processes: Get improved information availability and quality with continuous batch processing
   • Reduce costs: Consolidate and standardize IT systems, services and people skills between batch and OLTP
   • Adopt elastic batch processing: Expand or contract batch windows dynamically based on business decisions and IT resource usage




Develop re-usable and composable bulk services through business process analysis, design analysis,
programming models, tools and runtimes for bulk-processing services.
Build and manage workloads smarter to address evolving business needs and gain a competitive edge
Common batch patterns
 •   Integrate batch applications in a Service Oriented architecture
      –   Reuse business logic between Batch and OLTP applications
      –   Leverage rules, processes and events in batch or bulk
      –   Compose business services using batch and real-time activities

 •   Run managed Java batch applications on the mainframe
      –   Modernize legacy batch applications
      –   Better integration of infrastructure and operations

 •   Use Java batch applications to handle shrinking batch windows
      –   More data to process in shorter windows
      –   Business process changes to handle 24x7 batch processing

 •   Leverage batch towards achieving agility
      –   Faster turnaround to implement new(er) or modified business processes
      –   Be better equipped to manage regulatory compliance quickly



Identify the right strategy for you to help reduce the cost of running batch efficiently
Batch processing reference model
[Figure: layered reference model, top to bottom]

 •   Invocation services, ranging from ad hoc to planned
      –   Batch partner services
      –   Business process and event services
      –   Scheduler services
 •   Invocation & scheduling optimization
      –   Resource brokering, split & parallelize, pace, throttle
 •   Bulk application container
 •   Data access management services
      –   “File” data access
      –   Queue-based data access
      –   In-memory data access
      –   Custom data access
 •   Information storage
 •   Infrastructure services
      –   All layers above this interact with or use the services from the core OS
 •   Analytics & autonomics for scheduling, check-pointing, resource management

Side columns:
 •   Batch application development: environment for creating and migrating bulk applications
 •   System management and operations: manage, monitor and secure bulk processes
Batch Applications Need Batch Middleware
[Figure: layered stack, top to bottom; side labels "Application Support" and "Batch Middleware"]

              Batch Application      - Business and custom data access logic
              Batch Framework        - Library of common data access and utilities
              Job Control Language   - Declarative job definition (XML based)
              Batch Container        - Runtime engine for batch applications
              Job Scheduler          - Job dispatcher, operational control point
              Logging/archival       - Manager for job history and output
              PJM, WLM, HA           - Parallelization, WLM, and availability
              Resource Mgmt          - Rule-based CPU and file limits
              Security               - Security for jobs and job operations
JSR 352 for Batch
•    The Expert Group membership includes: IBM, Red Hat, Oracle, VMware, Credit Suisse,
     Clarkson University, and an independent Swiss consultant.
•    Following the new JCP 2.8 process, the proceedings of the expert group are
     transparent, and the community is able to participate through a public mailing list.
•    Standards based Programming Model (PM) & batch container
•    Target – Java EE and Java SE
•    PM closer to the Record Processor model (or Spring’s Item Reader/Processor/Writer
     model)
•    IBM providing Reference Implementation and Technology Compatibility Kit

•    Spec complete by 4Q 2012

•    View JSR details here: http://jcp.org/en/jsr/detail?id=352

•    Subscribe to JSR 352 public mailing list here: http://java.net/projects/jbatch
JSR 352 - Concepts
Job: Encapsulates an entire batch process


Step: Encapsulates an independent, sequential phase of a batch job


ItemReader: Represents retrieval of input for a step, one item at a time


ItemProcessor: Represents business processing of an item


ItemWriter: Represents the output of a step, one batch or chunk of items at a time



JobOperator: Interface to manage all aspects of job processing

[Figure: JobOperator manages a Job; one Job has many Steps; each Step has one ItemReader, one ItemProcessor and one ItemWriter; a JobRepository is shown below them]
JSR 352 – Concepts …

JobInstance: Logical job run

JobExecution: Single attempt to run a job

StepExecution: Single attempt to run a step

Chunk: A type of Step that implements the reader-processor-writer pattern, with configurable transaction management and checkpointing

Batchlet: Second type of Step, which specifies a task-oriented batch step

[Figure: a Job has JobInstances, each with JobExecutions; a Step has StepExecutions; a Step is either a Chunk or a Batchlet]
JSR 352 – Chunk-oriented Processing

 Chunk-oriented processing – primary processing
 style
   Read one item at a time
   Process individual item
   Collect the processed items in the writer and write out in
   chunks
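
The read-process-write cycle above can be sketched in plain Java. This is a minimal illustration, not the JSR 352 API: the class name, the `runChunks` helper and the use of `Iterator` and `Function` as stand-ins for reader and processor are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Illustrative only: hypothetical stand-ins, not the JSR 352 interfaces.
class ChunkLoopSketch {

    /** Reads items one at a time, processes each, and writes them out in chunks. */
    static <I, O> List<List<O>> runChunks(Iterator<I> reader,
                                          Function<I, O> processor,
                                          int chunkSize) {
        List<List<O>> written = new ArrayList<>();   // stands in for the writer's output
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            I item = reader.next();                  // read one item
            buffer.add(processor.apply(item));       // process the individual item
            if (buffer.size() == chunkSize) {        // chunk is full: write it out
                written.add(new ArrayList<>(buffer));
                buffer.clear();                      // a container would typically checkpoint here
            }
        }
        if (!buffer.isEmpty()) {                     // flush the final partial chunk
            written.add(new ArrayList<>(buffer));
        }
        return written;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);
        List<List<Integer>> chunks = runChunks(input.iterator(), n -> n * 10, 2);
        System.out.println(chunks);                  // prints [[10, 20], [30, 40], [50]]
    }
}
```

Writing per chunk rather than per item is what lets the chunk boundary double as a natural commit and checkpoint point.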
JSR 352 - Job Specification Language

Specifies a job, steps and directs their execution
Implemented in XML
Supports inheritance of job, step, flow and split
JSR 352 – Job Parallelization

Job steps can be run in parallel
Parallelization Models
  Partitioned
     Step/flow run as multiple instances across multiple
      threads
     Each thread runs the same step/flow
  Concurrent
     Flows/steps defined by a split run concurrently across
       multiple threads
     One flow/step of the split per thread
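
As a rough illustration of the partitioned model, the sketch below runs the same piece of work once per partition, one thread each, and combines the results. It uses only `java.util.concurrent`, not the JSR 352 partition API; the class name, the `sumInPartitions` helper and the range-summing "step" are assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: plain java.util.concurrent, not the JSR 352 partition API.
class PartitionedStepSketch {

    /** Sums the integers in [from, to) by running the same step on each partition. */
    static long sumInPartitions(int from, int to, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            int span = (to - from + partitions - 1) / partitions;  // ceiling division
            List<Future<Long>> results = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int start = from + p * span;
                final int end = Math.min(start + span, to);
                Callable<Long> step = () -> {                      // same step, different range
                    long sum = 0;
                    for (int i = start; i < end; i++) {
                        sum += i;
                    }
                    return sum;
                };
                results.add(pool.submit(step));
            }
            long total = 0;
            for (Future<Long> f : results) {                       // combine partition results
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInPartitions(0, 100, 4));            // prints 4950
    }
}
```

The concurrent (split) model differs only in that each thread would run a different flow or step instead of the same one over a different slice of data.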
The Batch Programming Model
 Functions and class libraries supplied with the Feature Pack and Compute Grid
[Figure: components of the Batch Container around a batch application]

 •   Batch Controller Bean: part of the Batch Container code supplied by IBM
 •   Job Step Control: invoking and coordinating processing between steps
 •   Batch Data Streams: provide data input and output services for the job steps
 •   Checkpoint Algorithms: service to programmatically determine and handle checkpointing
 •   Results and Return Codes: services to determine, manipulate and act upon return codes, both at the application and system level
 •   Batch App: POJO with Step 1, Step 2, ... Step n
 •   WAS Runtime Interfaces: JDBC, JCA, Security, Transaction, Logging, Deployment, etc.
 •   Job Control: xJCL, very much like traditional JCL, except that it is coded in XML, with equivalents to JOB cards, DD statements, STEPs, etc.
 •   Development Libraries: RAD or Eclipse
WebSphere Batch – An overview
It is a batch execution framework within the WebSphere Application Server
runtime platform. WebSphere Batch adds function to an existing WebSphere
Application Server runtime environment on all platforms.

[Figure: a WebSphere Application Server AppServer hosting a Web Container (Web modules), an EJB Container (EJB modules) and a Batch Container (Batch modules), on top of WebSphere Application Server Foundation Services (Security, Transaction, Data Access, Logging ...) and the runtime platform and operating system]
Let's look at the basic WebSphere Batch runtime

[Figure: dispatcher interfaces (command window, EJB/Web Services call, JMC, wsgrid) submit xJCL to the Job dispatcher on one WAS server, which uses the Job Repository and dispatches Job # 1 to a Batch App in the Batch Container on WAS Server 1; the Batch App accesses the App Data store]

       Job Scheduler/Dispatcher (JS)
       • The job entry point to WebSphere Batch
       • Job life-cycle management (Submit, Stop, Cancel, etc.) and monitoring
       • Dispatches workload to either the PJM or GEE
       • Hosts the Job Management Console (JMC)

       Grid Endpoints (GEE)
       • Executes the actual business logic of the batch job
       • Hosts the programming model

       xJCL
       • XML descriptor for the job
       • Allows variable substitution
WebSphere Batch runtime components with PJM
[Figure: xJCL is submitted to the Job Scheduler (with Job Repository) on one WAS server. A second WAS server's Batch Container hosts the Parallel Job Manager with the Parameterizer SPI, Logical TX Synchronization SPI and SubJob Analyzer SPI. SubJob # 1 through SubJob # N run in Batch Apps in the Batch Containers of WAS Server 1 through WAS Server N, each with a SubJob Collector SPI, inside a logical transaction scope. Also labeled: Sub Job Name.]
Checkpoint & Restart with Batch Data Streams


  WebSphere Batch makes it easy for developers to encapsulate input/output data
   streams using POJOs that optionally support checkpoint/restart semantics.



Calls made by the Batch Container at job start:
  1. open()
  2. positionAtInitialCheckpoint()
  3. externalizeCheckpoint()
  4. close()

Calls made by the Batch Container at job restart:
  1. open()
  2. internalizeCheckpoint()
  3. positionAtCurrentCheckpoint()
  4. externalizeCheckpoint()
  5. close()
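
The start and restart call sequences can be illustrated with a small in-memory stream. This is a sketch under stated assumptions: the class below is hypothetical and simply borrows the method names shown on this slide; it is not the actual WebSphere Batch data stream interface, and the string token format is invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical stream that reuses the slide's method names.
class CheckpointStreamSketch {
    private final List<String> records;
    private int position;          // index of the next record to read
    private int restartPosition;   // position restored from a checkpoint token
    private boolean open;

    CheckpointStreamSketch(List<String> records) {
        this.records = records;
    }

    void open() { open = true; }
    void close() { open = false; }

    /** First run: start from the beginning of the data. */
    void positionAtInitialCheckpoint() { position = 0; }

    /** Hands the container a token describing the current position. */
    String externalizeCheckpoint() { return Integer.toString(position); }

    /** Restart: accept the token saved by the last successful checkpoint. */
    void internalizeCheckpoint(String token) { restartPosition = Integer.parseInt(token); }

    /** Restart: resume reading from the restored position. */
    void positionAtCurrentCheckpoint() { position = restartPosition; }

    boolean hasNext() { return open && position < records.size(); }
    String read() { return records.get(position++); }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d");

        // Job start: read two records, checkpoint, then stop as if the job failed.
        CheckpointStreamSketch first = new CheckpointStreamSketch(data);
        first.open();
        first.positionAtInitialCheckpoint();
        first.read();
        first.read();
        String token = first.externalizeCheckpoint();
        first.close();

        // Job restart: restore the token and continue with the unread records.
        CheckpointStreamSketch second = new CheckpointStreamSketch(data);
        second.open();
        second.internalizeCheckpoint(token);
        second.positionAtCurrentCheckpoint();
        while (second.hasNext()) {
            System.out.println(second.read());   // prints c, then d
        }
        second.close();
    }
}
```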
WebSphere Batch: OSGi Batch Applications
• Enables use of OSGi for batch application development
• Full batch programming model available to OSGi framework
• Supports standard and blueprint bundles
• Enterprise Bundle Archive deployment

[Figure: bundles packaged into an .eba, deployed to WAS and run as a WAS Batch job]
JD: Multi-threading
• Option to run a parallel job on multiple threads.
• Parallel Job Manager local optimization.
• Alternative to running a parallel job across multiple JVMs.
• Optimizes shorter-running subjobs.

xJCL:
<job name=… >
     <run instances="multiple"
          jvm="single" />
     <step name=… >
     …
     </step>
</job>

[Figure: at runtime, the top job drives three threads, each running one subjob]
JD: Heterogeneous Steps
• Ability to mix various step types in the same job (overcomes v6.1.1 limitations)

xJCL:
<job name=… >
     <!-- Transactional Batch -->
     <step name="TxBatch" >
     …
     </step>
     <!-- Parallel -->
     <step name="MultiProcess">
          <run instances="multiple"
           jvm="multiple" />
     …
     </step>
     <!-- Compute Intensive -->
     <step name="CI" >
     …
     </step>
     <!-- Native Execution -->
     <step name="ShellCmd">
     …
     </step>
</job>
OP: Memory Overload Protection
• Protection against over-scheduling jobs to an application server
• Batch Container monitors job memory demand against available JVM heap space
• Prevents Java OutOfMemoryError
• Automatic real-time job memory estimation with declarative xJCL override

[Figure: incoming jobs queue above a JVM heap that holds running jobs and free space; the container asks "Room for next job?"]

xJCL: <job name=… [memory=N] … >
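
The "room for next job?" check can be pictured as a simple admission gate. The sketch below is an assumption-laden illustration, not WebSphere Batch code: the class, its methods and the idea of reserving a fixed estimate per job are hypothetical, and the real container's estimation logic is not described in this deck.

```java
// Illustrative only: a hypothetical admission gate, not WebSphere Batch code.
class MemoryAdmissionSketch {
    private final long heapBudgetBytes;  // memory the gate may hand out to jobs
    private long reservedBytes;          // sum of estimates for running jobs

    MemoryAdmissionSketch(long heapBudgetBytes) {
        this.heapBudgetBytes = heapBudgetBytes;
    }

    /** Admits the job only if its estimated demand fits in the remaining budget. */
    synchronized boolean tryAdmit(long estimatedBytes) {
        if (estimatedBytes > heapBudgetBytes - reservedBytes) {
            return false;                // no room for the next job: keep it queued
        }
        reservedBytes += estimatedBytes;
        return true;
    }

    /** Returns a finished job's reservation to the budget. */
    synchronized void release(long estimatedBytes) {
        reservedBytes = Math.max(0, reservedBytes - estimatedBytes);
    }

    public static void main(String[] args) {
        // Assumption: budget the whole maximum heap; a real container would leave headroom.
        MemoryAdmissionSketch gate =
                new MemoryAdmissionSketch(Runtime.getRuntime().maxMemory());
        // Try to admit a job whose memory demand is estimated at 1 KB.
        System.out.println(gate.tryAdmit(1024));
    }
}
```

In this picture, a declared `memory=N` value would simply replace the automatic estimate passed to `tryAdmit`.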
Batch Tooling (Rational Application Developer)
     • Integrated (F1) Help
     • Full Online Documentation:
       http://publib.boulder.ibm.com/infocenter/radhelp/v8/topic/com.ibm.servertools.doc/topics/batch/c_config_develop.html

Best practices: Batch application design
• Reuse existing application services, where applicable
• Adopt a phased, incremental development approach
    – Build the infrastructure to deploy simple batch applications first

• Build reusable components to form part of your custom application
  framework based on the WebSphere Java Batch programming model
    – Externalize your components to increase opportunity to reuse
• Identify applications that exploit capabilities of WebSphere Batch and
  validate the infrastructure - specifically for migrations
    •   Use tooling available in Rational Application Developer/Rational Software
        Architect/Rational Developer for z to achieve higher developer productivity
Best practices: Tuning Java batch applications
• Tune each unit of work first
   – Unit of work is invoked in a loop; any inefficiencies get multiplied by the amount of data
     being processed
   – Tune the application using standard tuning best practices
• Parallelize workload
   – Use parallelism to gain maximum advantage
   – Don’t over-parallelize
• Apply outside-in optimization techniques
   – Apply WAS JVM tuning parameters
   – Watch out for bottlenecks in downstream components
   – Scale horizontally and/or vertically
• Achieve elastic 24x7 batch processing
   – Manage checkpoint frequencies to drive 24x7 processing
   – Pace and throttle jobs as needed
• Set up job classes to distinguish between job characteristics
A sampling of Customer scenarios
Customer: Largest reinsurance company
Scenario: Batch modernization - COBOL to Java conversion
Platform: z/OS
Business results: Operational simplicity with out-of-box connector to Enterprise Scheduler (TWS). Processes 20 million records a week, with a database of 35 TB with 100 billion rows and 40,000 batch jobs.

Customer: Investment & trading company
Scenario: Mainframe batch modernization/optimize MIPS usage
Platform: z/OS
Business results: Optimize batch processes, run 24x7 to help with the strategy to reduce batch development, operating and runtime costs.

Customer: Wall Street bank
Scenario: Extreme batch payments transaction processing
Platform: Distributed
Business results: High-performance, highly parallel batch jobs with WebSphere Compute Grid and eXtreme Scale on distributed platforms at about 400K transactions/hour; the target is 1 million transactions/hour.

Customer: Bank/credit card company
Scenario: Ab Initio and WCG comparison
Platform: Linux
Business results: Realized the difference between ETL and business batch, and ran some initial tests to validate performance with and without data lookups.

Customer: Insurance company
Scenario: Replace home-grown batch framework with WCG
Platform: Linux
Business results:
• Deployed a horizontally scalable Java batch environment
• Developer and operational productivity with reuse of code between web and batch applications, reuse of administrative scripts from WAS environment
• Stability through isolating resource-intensive applications to their own clusters
• Operational simplicity through reuse of applications by pushing the input and output descriptors into the xJCL

Customer: Bank
Scenario: Business Intelligence Reporting with Dataquant / WCG
Platform: z/OS
Business results: Comparing the WCG+Dataquant solution with Accentuate

Customer: German insurance company
Platform: AIX/z/OS
Business results:
• Dynamically adjust IT resources to meet changing business needs
• Reduce load on the backend data store to manageable levels
• Improve transaction throughput and response times
• Improve developer productivity
• Scale easily as business transaction volumes grow
An architecture overview for file processing

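[Figure: a file arrives and passes through ETL (augment, validate, transform, chunk) to a file on a shared store. WebSphere Compute Grid, which also receives online requests, runs Init (stream input from file), then the unit of work, then Summarize Results. The unit of work calls "Validate, check entitlements" and a persistence layer backed by WebSphere eXtreme Scale and the database.]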
Additional slides for reference
JSR 352 – Programming Model - Chunk
Chunk

Package: javax.batch.annotation

@ItemReader @Named[("<id>")]

        @Open void <method-name>(Externalizable checkpoint) throws Exception

        @Close void <method-name>() throws Exception

        @ReadItem <item-type> <method-name> () throws Exception

        @CheckpointInfo Externalizable <method-name> () throws Exception

@ItemProcessor @Named[("<id>")]

        @ProcessItem <output-item-type> <method-name>(<item-type> item) throws Exception

@ItemWriter @Named[("<id>")]

        @Open void <method-name>(Externalizable checkpoint) throws Exception

        @Close void <method-name>() throws Exception

        @WriteItems void <method-name> (List<item-type> items) throws Exception

        @CheckpointInfo Externalizable <method-name> () throws Exception

@CheckpointAlgorithm @Named[("<id>")]

        @CheckpointTimeout int <method-name> (int timeout) throws Exception

        @BeginCheckpoint void <method-name> () throws Exception

        @IsReadyToCheckpoint boolean <method-name> () throws Exception

        @EndCheckpoint void <method-name> () throws Exception
JSR 352 – Programming Model - Batchlet

Package: javax.batch.annotation


Batchlet


@Batchlet @Named[("<id>")]
   @Process String <method-name> () throws Exception
   @Stop void <method-name> () throws Exception
JSR 352 – Programming Model - Listeners
Listeners

@JobListener @Named[("<id>")]
       Package: javax.batch.annotation.joblistener

        @BeforeJob void <method-name> () throws Exception
        @AfterJob void <method-name> () throws Exception

@StepListener @Named[("<id>")]
       Package: javax.batch.annotation.steplistener

        @BeforeStep void <method-name> () throws Exception
        @AfterStep void <method-name> () throws Exception

@CheckpointListener @Named[("<id>")]
       Package: javax.batch.annotation.checkpointlistener

        @BeforeCheckpoint void <method-name> () throws Exception
        @AfterCheckpoint void <method-name> () throws Exception

Other listeners
        ItemReaderListener
        ItemProcessorListener
        ItemWriterListener
        SkipListener
        RetryListener
JSR 352 – Programming Model - Parallelization
@PartitionMapper @Named[("<id>")]

      @CalculatePartitions PartitionPlan <method-name>( ) throws Exception


@PartitionReducer @Named[("<id>")]

      package: javax.batch.annotation.partitionreducer
      @Begin void <method-name>() throws Exception
      @BeforeCompletion void <method-name>() throws Exception
      @Rollback void <method-name>() throws Exception
      @AfterCompletion void <method-name>(String status) throws Exception


@PartitionCollector @Named[("<id>")]

      package: javax.batch.annotation.partitioncollector
      @CollectPartitionData Externalizable <method-name>() throws Exception


@PartitionAnalyzer @Named[("<id>")]

      package: javax.batch.annotation.partitionanalyzer
      @AnalyzeCollectorData void <method-name>(Externalizable data) throws Exception
      @AnalyzeExitStatus void <method-name>(String exitStatus) throws Exception
Thank You

Java Batch for Cost Optimized Efficiency

  • 1.
    Java Batch forCost Optimized Efficiency Sridhar Sudarsan Watson Solutions Chief Architect, Batch processing Strategy, IBM sudarsa@us.ibm.com October 03, 2012
  • 2.
    What we’ll covertoday • Why Batch ? • Batch platform in a solution • Java & Batch • Java Batch JSR 352 • Java Batch offerings • Best practices and some Customer scenarios
  • 3.
    Achieve business efficiencythrough a balanced blend of Batch and Online processing A continuous interleaving of bulk and real-time processing maximizes the balance between operational efficiency and market responsiveness on a 24x7 global basis. It enables an cost effective, always-on business environment 8 am 8 pm 8 am Optimize cycle time for business processes: Get improved information Online availability and quality with continuous Batch batch processing Reduce costs: Consolidate and standardize IT systems, services and 8 am 8 am people skills between batch and OLTP Adopt elastic batch processing: Expand or contract batch windows Batch Batch dynamically based on business Batch Batch decisions and IT resource usage Batch Develop re-usable and composable bulk services through business process analysis, design analysis, programming models, tools and runtimes for bulk-processing services. Build and manage workloads smarter to address evolving business needs and gain a competitive edge
  • 4.
    Common batch patterns • Integrate batch applications in a Service Oriented architecture – Reuse business logic between Batch and OLTP applications – Leverage rules, processes and events in batch or bulk – Compose business services using batch and real-time activities • Run managed java batch applications on the mainframe – Modernize legacy batch applications – Better integration of infrastructure and operations • Use Java batch applications to handle shrinking batch windows – More data to process in shorter windows – Business process changes to handle 24x7 batch processing • Leverage batch towards achieving agility – Faster turnaround to implement new(er) or modified business processes – Be better equipped to manage regulatory compliance quickly Identify the right strategy for you to help reduce the cost of running batch efficiently
  • 5.
    Batch processing reference model
    [Layered diagram, top to bottom:]
    • Invocation services: ad hoc, planned, batch partner services, business process and event services, scheduler services
    • Invocation & scheduling optimization: resource brokering, split & parallelize, pace, throttle
    • Batch/bulk application container, flanked by:
      – Bulk application development: environment for creating and migrating bulk applications
      – System management and operations: manage, monitor and secure bulk processes
    • Data access management services: “file” data access, queue-based data access, in-memory data access, custom data access
    • Information storage
    • Infrastructure services
    All layers above interact with or use the services of the core OS
    Analytics & autonomics for scheduling, checkpointing, resource management
  • 6.
    Batch Applications Need Batch Middleware
    Application
    • Batch Application - Business and custom data access logic
    Application Support
    • Batch Framework - Library of common data access and utilities
    • Job Control Language - Declarative job definition (XML based)
    Batch Middleware
    • Batch Container - Runtime engine for batch applications
    • Job Scheduler - Job dispatcher, operational control point
    • Logging/archival - Manager for job history and output
    • PJM, WLM, HA - Parallelization, WLM, and availability
    • Resource Mgmt - Rule-based CPU and file limits
    • Security - Security for jobs and job operations
  • 7.
    JSR 352 for Batch
    • The Expert Group membership includes: IBM, Red Hat, Oracle, VMware, Credit Suisse, Clarkson University, and an independent Swiss consultant.
    • Following the new JCP 2.8 process, the proceedings of the expert group are transparent, and the community is able to participate through a public mailing list.
    • Standards-based Programming Model (PM) & batch container
    • Target – Java EE and Java SE
    • PM closer to the Record Processor model (or Spring’s Item Reader/Processor/Writer model)
    • IBM providing Reference Implementation and Technology Compatibility Kit
    • Spec complete by 4Q 2012
    • View JSR details here: http://jcp.org/en/jsr/detail?id=352
    • Subscribe to JSR 352 public mailing list here: http://java.net/projects/jbatch
  • 8.
    JSR 352 - Concepts
    • Job: Encapsulates an entire batch process
    • Step: Encapsulates an independent, sequential phase of a batch job
    • ItemReader: Represents retrieval of input for a step, one item at a time
    • ItemProcessor: Represents business processing of an item
    • ItemWriter: Represents the output of a step, one batch or chunk of items at a time
    • JobOperator: Interface to manage all aspects of job processing
    [Diagram: relationships among JobOperator, Job, Step (one Job to many Steps), ItemReader, ItemProcessor, ItemWriter and JobRepository]
  • 9.
    JSR 352 – Concepts …
    • JobInstance: Logical job run
    • JobExecution: Single attempt to run a job
    • StepExecution: Single attempt to run a step
    • Chunk: A type of Step which implements the reader-processor-writer pattern. Configurable Tx management and checkpoint
    • Batchlet: Second type of Step which specifies a task-oriented batch step
    [Diagram: Job maps to JobInstance and JobExecution; Step maps to StepExecution; a Step is either a Chunk or a Batchlet]
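    The distinction between a logical job run and the attempts made to complete it can be illustrated with a small data-model sketch. The class names and status strings below are hypothetical and are not taken from the JSR 352 API; the sketch only assumes that a restart adds a new execution to the same instance.

```java
import java.util.ArrayList;
import java.util.List;

/** One attempt to run a job. The status strings are illustrative assumptions. */
final class JobExecutionSketch {
    final int attempt;
    String status = "STARTED";

    JobExecutionSketch(int attempt) { this.attempt = attempt; }
}

/** A logical job run that keeps every attempt made on its behalf. */
final class JobInstanceSketch {
    final String jobName;
    private final List<JobExecutionSketch> executions = new ArrayList<>();

    JobInstanceSketch(String jobName) { this.jobName = jobName; }

    /** Starting or restarting the instance adds an execution; earlier ones remain as history. */
    JobExecutionSketch startExecution() {
        JobExecutionSketch execution = new JobExecutionSketch(executions.size() + 1);
        executions.add(execution);
        return execution;
    }

    int executionCount() { return executions.size(); }

    /** Status of the most recent attempt, or NOT_STARTED if none was made. */
    String lastStatus() {
        return executions.isEmpty()
                ? "NOT_STARTED"
                : executions.get(executions.size() - 1).status;
    }
}
```

    A failed first attempt followed by a successful restart would leave one instance holding two executions, with the second one carrying the final status.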
  • 10.
    JSR 352 – Chunk-oriented Processing
    • Chunk-oriented processing – primary processing style
    • Read one item at a time
    • Process individual item
    • Collect the processed items in the writer and write out in chunks
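    The read, process, and write-in-chunks cycle can be sketched in plain Java. The three interfaces below are hypothetical stand-ins for the reader, processor and writer roles, not the JSR 352 API, and the sketch assumes that a null return from the reader signals end of input and that the chunk size is a simple item count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the three roles; not the JSR 352 API.
interface SketchReader<I> { I readItem(); }                 // null means "no more input"
interface SketchProcessor<I, O> { O processItem(I item); }
interface SketchWriter<O> { void writeItems(List<O> items); }

final class ChunkLoopSketch {
    /**
     * Reads and processes one item at a time, buffers the results, and hands
     * the buffer to the writer every chunkSize items. Returns the chunk count.
     */
    static <I, O> int run(SketchReader<I> reader, SketchProcessor<I, O> processor,
                          SketchWriter<O> writer, int chunkSize) {
        List<O> buffer = new ArrayList<>();
        int chunks = 0;
        I item;
        while ((item = reader.readItem()) != null) {
            buffer.add(processor.processItem(item));
            if (buffer.size() == chunkSize) {
                writer.writeItems(new ArrayList<>(buffer)); // a real container would checkpoint here
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {                            // flush the final partial chunk
            writer.writeItems(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<String> input = Arrays.asList("a", "b", "c", "d", "e").iterator();
        List<List<String>> written = new ArrayList<>();
        int chunks = ChunkLoopSketch.<String, String>run(
                () -> input.hasNext() ? input.next() : null,
                s -> s.toUpperCase(),
                written::add,
                2);
        System.out.println(chunks + " chunks: " + written); // 3 chunks: [[A, B], [C, D], [E]]
    }
}
```

    Writing per chunk rather than per item is what lets a container align transaction commits and checkpoints with chunk boundaries.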
  • 11.
    JSR 352 - Job Specification Language
    • Specifies a job, steps and directs their execution
    • Implemented in XML
    • Supports inheritance of job, step, flow and split
  • 12.
    JSR 352 – Job Parallelization
    • Job steps can be run in parallel
    • Parallelization models
      – Partitioned
        Step/flow run as multiple instances across multiple threads
        Each thread runs the same step/flow
      – Concurrent
        Flows/steps defined by a split run concurrently across multiple threads
        One flow/step of the split per thread
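    The partitioned model, where every thread runs the same step on its own slice of the data, can be sketched with a standard thread pool. This is an illustration using java.util.concurrent, not the JSR 352 partitioning API; the class and method names are hypothetical, and summing integers stands in for real step logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PartitionedStepSketch {
    /** The "step": every partition runs this same logic on its own slice [from, to). */
    static long sumSlice(int[] data, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** Splits data into contiguous slices, runs one slice per thread, and combines the results. */
    static long runPartitioned(int[] data, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            int sliceSize = (data.length + partitions - 1) / partitions; // ceiling division
            List<Future<Long>> results = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int from = Math.min(p * sliceSize, data.length);
                final int to = Math.min(from + sliceSize, data.length);
                Callable<Long> task = () -> sumSlice(data, from, to);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();                                  // wait for each partition
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(runPartitioned(data, 4)); // 5050
    }
}
```

    The concurrent model differs only in that each thread would run a different flow or step of the split instead of the same logic on a different slice.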
  • 13.
    The Batch Programming Model
    Functions and class libraries supplied with the Feature Pack and Compute Grid
    [Diagram: the Batch Container hosts a Batch App (POJO) made up of Step 1, Step 2 … Step n, surrounded by the following services:]
    • Batch Controller Bean: Part of the Batch Container code supplied by IBM
    • Job Control: xJCL, very much like traditional JCL, except it is coded in XML. Equivalents to JOB cards, DD statements, STEPs, etc.
    • Job Step Control: Invoking and coordinating processing between steps
    • Batch Data Streams: Provides data input and output services for the job steps
    • Checkpoint Algorithms: Service to programmatically determine and handle checkpointing
    • Results and Return Codes: Services to determine, manipulate and act upon return codes, both at the application and system level
    • Development Libraries: RAD or Eclipse
    • WAS Runtime Interfaces: JDBC, JCA, Security, Transaction, Logging, Deployment, etc.
  • 14.
    WebSphere Batch – An overview
    • It is a batch execution framework within the WebSphere Application Server runtime platform
    • WebSphere Batch adds function to an existing WebSphere Application Server runtime environment on all platforms
    [Diagram: a WebSphere Application Server AppServer with a Web Container (Web Modules), an EJB Container (EJB Modules) and a Batch Container (Batch Modules), layered on WebSphere Application Server Foundation Services (Security, Transaction, Data Access, Logging ...) and the Runtime Platform and Operating System]
  • 15.
    Let’s look at the basic WebSphere Batch runtime
    [Diagram: one WAS Server hosts the Job Dispatcher with its interfaces (command window, EJB/Web Services, JMC, wsgrid) and the Job Repository data store; other WAS Servers host Batch Containers, each running a Batch App job. xJCL is submitted to the dispatcher, which dispatches jobs to the containers.]
    Job Scheduler/Dispatcher (JS)
    • The job entry point to WebSphere Batch
    • Job life-cycle management (Submit, Stop, Cancel, etc.) and monitoring
    • Dispatches workload to either the PJM or GEE
    • Hosts the Job Management Console (JMC)
    Grid Endpoints (GEE)
    • Executes the actual business logic of the batch job
    • Hosts the programming model
    xJCL
    • XML descriptor for the job
    • Allows variable substitution
  • 16.
    WebSphere Batch runtime components with PJM
    [Diagram: the Job Scheduler, backed by the Job Repository, passes xJCL to the Parallel Job Manager. The Parallel Job Manager exposes the Parameterizer SPI, the Logical TX Synchronization SPI and the SubJob Analyzer SPI, and dispatches SubJob # 1 … SubJob # N (identified by SubJob Name) to Batch Containers on WAS Server 1 … WAS Server N. Each Batch Container runs the Batch App with a SubJob Collector SPI; all subjobs run within one logical transaction scope.]
  • 17.
    Checkpoint & Restart with Batch Data Streams
    WebSphere Batch makes it easy for developers to encapsulate input/output data streams using POJOs that optionally support checkpoint/restart semantics.
    Job Start (calls made by the Batch Container)
      1. open()
      2. positionAtInitialCheckpoint()
      3. externalizeCheckpoint()
      4. close()
    Job Restart (calls made by the Batch Container)
      1. open()
      2. internalizeCheckpoint()
      3. positionAtCurrentCheckpoint()
      4. externalizeCheckpoint()
      5. close()
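    The start and restart call sequences can be traced with a small sketch. The class below is a hypothetical in-memory stream whose method names follow the slide; it is not the WebSphere Batch interface, and it assumes the checkpoint token is simply the current record index stored as a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory input stream; method names follow the slide, not a published interface. */
final class ListInputStreamSketch {
    private final List<String> records;
    private int position;
    private int restoredPosition;

    ListInputStreamSketch(List<String> records) { this.records = records; }

    void open() { /* a real stream would acquire its file or cursor here */ }
    void close() { /* ...and release it here */ }

    void positionAtInitialCheckpoint() { position = 0; }
    void internalizeCheckpoint(String token) { restoredPosition = Integer.parseInt(token); }
    void positionAtCurrentCheckpoint() { position = restoredPosition; }
    String externalizeCheckpoint() { return Integer.toString(position); }

    boolean hasNext() { return position < records.size(); }
    String next() { return records.get(position++); }
}

final class CheckpointRestartSketch {
    /** Job start: processes up to limit records, then returns the token a container would persist. */
    static String startJob(ListInputStreamSketch stream, int limit, List<String> processed) {
        stream.open();
        stream.positionAtInitialCheckpoint();
        for (int i = 0; i < limit && stream.hasNext(); i++) {
            processed.add(stream.next());
        }
        String token = stream.externalizeCheckpoint();
        stream.close();
        return token;
    }

    /** Job restart: restores the saved token and continues from that position to the end. */
    static String restartJob(ListInputStreamSketch stream, String token, List<String> processed) {
        stream.open();
        stream.internalizeCheckpoint(token);
        stream.positionAtCurrentCheckpoint();
        while (stream.hasNext()) {
            processed.add(stream.next());
        }
        String finalToken = stream.externalizeCheckpoint();
        stream.close();
        return finalToken;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        List<String> processed = new ArrayList<>();
        // First attempt stops after two records, as if the job had failed there.
        String token = startJob(new ListInputStreamSketch(records), 2, processed);
        // The restart uses a fresh stream object and only the saved token.
        restartJob(new ListInputStreamSketch(records), token, processed);
        System.out.println(token + " " + processed); // 2 [r1, r2, r3, r4, r5]
    }
}
```

    Because the restart relies only on the externalized token, no record is processed twice and none is skipped, which is the property checkpoint/restart is meant to provide.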
  • 18.
    WebSphere Batch: OSGi Batch Applications
    • Enables use of OSGi for batch applications development
    • Full batch programming model available to OSGi framework
    • Supports standard and blueprint .eba bundles
    • Enterprise Bundle Archive deployment
    [Diagram: bundle … bundle deployed to WAS and run as a WAS Batch Job]
  • 19.
    JD: Multi-threading
    • Option to run parallel job on multiple threads.
    • Parallel Job Manager local optimization.
    • Alternative to running parallel job across multiple JVMs.
    • Optimizes shorter running subjobs.
    xJCL:
      <job name=… >
        <run instances="multiple" jvm="single" />
        <step name=… >
          …
        </step>
      </job>
    [Diagram: at runtime the top job fans out to three threads, each running one subjob]
  • 20.
    JD: Heterogeneous Steps
    • Ability to mix various step types in the same job (1)
    xJCL:
      <job name=… >
        <step name="TxBatch" > … </step>
        <step name="MultiProcess">
          <run instances="multiple" jvm="multiple" />
          …
        </step>
        <step name="CI" > … </step>
        <step name="ShellCmd"> … </step>
      </job>
    Step types shown: TxBatch is Transactional Batch, MultiProcess is Parallel, CI is Compute Intensive, ShellCmd is Native Execution
    (1) Overcomes v6.1.1 limitations
  • 21.
    OP: Memory Overload Protection
    • Protection against over-scheduling jobs to an application server
    • Batch Container monitors job memory demand against available JVM heap space
    • Prevents Java OutOfMemoryError
    • Automatic real-time job memory estimation with declarative xJCL override
    xJCL:
      <job name=… [memory=N] … >
    [Diagram: incoming jobs queue in front of the JVM heap, which holds the running jobs plus free space; the container asks “Room for next job?” before admitting each one]
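    The “room for next job?” check can be sketched as a simple admission gate. This is an illustration only, not WebSphere Batch code: the class name is hypothetical, the heap budget is passed in as a fixed number (a container could derive it from Runtime.getRuntime().maxMemory()), and each job's memory estimate is assumed to be known up front.

```java
/** Hypothetical admission gate: a job is admitted only if its estimate fits the remaining budget. */
final class HeapAdmissionSketch {
    private final long heapBudgetBytes;
    private long reservedBytes;

    HeapAdmissionSketch(long heapBudgetBytes) {
        if (heapBudgetBytes < 0) {
            throw new IllegalArgumentException("budget must not be negative");
        }
        this.heapBudgetBytes = heapBudgetBytes;
    }

    /** Reserves memory for a job and returns true, or returns false if the job must wait. */
    synchronized boolean tryAdmit(long estimatedBytes) {
        if (estimatedBytes < 0) {
            throw new IllegalArgumentException("estimate must not be negative");
        }
        // Compare against the remaining budget so the arithmetic cannot overflow.
        if (estimatedBytes > heapBudgetBytes - reservedBytes) {
            return false;
        }
        reservedBytes += estimatedBytes;
        return true;
    }

    /** Returns a finished job's reservation to the pool. */
    synchronized void release(long estimatedBytes) {
        reservedBytes = Math.max(0, reservedBytes - estimatedBytes);
    }

    synchronized long freeBytes() {
        return heapBudgetBytes - reservedBytes;
    }
}
```

    Rejecting a job at admission time, rather than letting it start and fail, is what keeps the jobs already running from being taken down by an OutOfMemoryError.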
  • 22.
    Batch Tooling (Rational Application Developer)
    • Integrated (F1) Help
    • Full Online Documentation: http://publib.boulder.ibm.com/infocenter/radhelp/v8/topic/com.ibm.servertools.doc/topics/batch/c_config_develop.html
  • 23.
    Best practices: Batch application design
    • Reuse existing application services, where applicable
    • Adopt a phased, incremental development approach
      – Build the infrastructure to deploy simple batch applications first
    • Build reusable components to form part of your custom application framework based on the WebSphere Java Batch programming model
      – Externalize your components to increase opportunity to reuse
    • Identify applications that exploit capabilities of WebSphere Batch and validate the infrastructure, specifically for migrations
    • Use tooling available in Rational Application Developer/Rational Software Architect/Rational Developer for z to achieve higher developer productivity
  • 24.
    Best practices: Tuning Java batch applications
    • Tune each unit of work first
      – Unit of work is invoked in a loop; any inefficiencies get multiplied by the amount of data being processed
      – Tune the application using standard tuning best practices
    • Parallelize workload
      – Use parallelism to gain maximum advantage
      – Don’t over-parallelize
    • Apply outside-in optimization techniques
      – Apply WAS JVM tuning parameters
      – Watch out for bottlenecks in downstream components
      – Scale horizontally and/or vertically
    • Achieve elastic 24x7 batch processing
      – Manage checkpoint frequencies to drive 24x7 processing
      – Pace and throttle jobs as needed
    • Set up job classes to distinguish between job characteristics
  • 25.
    A sampling of customer scenarios
    • Largest re-insurance company
      – Scenario: Batch modernization - COBOL to Java conversion
      – Platform: zOS
      – Business results: Operational simplicity with out-of-box connector to Enterprise Scheduler (TWS). Process 20 million records a week, with a database of 35 TB with 100 billion rows and 40’000 batch jobs.
    • Investment & trading co
      – Scenario: Mainframe batch modernization/optimize MIPS usage
      – Platform: zOS
      – Business results: Optimize batch processes, run 24x7 to help with the strategy to reduce batch development, operating and runtime costs.
    • Wall Street bank
      – Scenario: Extreme batch payments transaction processing
      – Platform: Distributed
      – Business results: High-performance, highly parallel batch jobs with WebSphere Compute Grid and eXtreme Scale on distributed platforms at about 400K transactions/hour; needs to reach 1 million transactions/hour.
    • Bank/credit card company
      – Scenario: Ab Initio and WCG comparison
      – Platform: Linux
      – Business results: Realized difference between ETL and business batch; did some initial tests to validate performance with and without data lookups.
    • Insurance company
      – Scenario: Replace home-grown batch framework with WCG
      – Platform: Linux
      – Business results: Deployed a horizontally scalable Java batch environment. Developer and operational productivity with reuse of code between web and batch applications, and reuse of administrative scripts from the WAS environment. Stability through isolating resource-intensive applications to their own clusters. Operational simplicity through reuse of applications by pushing the input and output descriptors into the xJCL.
    • Bank
      – Scenario: Business Intelligence Reporting with Dataquant / WCG
      – Platform: zOS
      – Business results: Comparing the WCG+Dataquant solution with Accentuate
    • German insurance company
      – Platform: AIX/zOS
      – Business results: Dynamically adjust IT resources to meet changing business needs. Reduce load on the backend data store to manageable levels. Improve transaction throughput and response times. Improve developer productivity. Scale easily as business transaction volumes grow.
  • 26.
    An architecture overview for file processing
    [Diagram: a file arrives on a shared store and is picked up by WebSphere Compute Grid, which runs Init (stream input from file), then per unit-of-work chunk: validate and check entitlements, ETL (augment, validate, transform), and finally Summarize Results. A persistence layer backed by WebSphere eXtreme Scale and a database serves both the batch flow and online requests.]
  • 27.
  • 28.
    JSR 352 – Programming Model - Chunk
    Package: javax.batch.annotation
    @ItemReader @Named[("<id>")]
      @Open void <method-name>(Externalizable checkpoint) throws Exception
      @Close void <method-name>() throws Exception
      @ReadItem <item-type> <method-name>() throws Exception
      @CheckpointInfo Externalizable <method-name>() throws Exception
    @ItemProcessor @Named[("<id>")]
      @ProcessItem <output-item-type> <method-name>(<item-type> item) throws Exception
    @ItemWriter @Named[("<id>")]
      @Open void <method-name>(Externalizable checkpoint) throws Exception
      @Close void <method-name>() throws Exception
      @WriteItems void <method-name>(List<item-type> items) throws Exception
      @CheckpointInfo Externalizable <method-name>() throws Exception
    @CheckpointAlgorithm @Named[("<id>")]
      @CheckpointTimeout int <method-name>(int timeout) throws Exception
      @BeginCheckpoint void <method-name>() throws Exception
      @IsReadyToCheckpoint boolean <method-name>() throws Exception
      @EndCheckpoint void <method-name>() throws Exception
  • 29.
    JSR 352 – Programming Model - Batchlet
    Package: javax.batch.annotation
    @Batchlet @Named[("<id>")]
      @Process String <method-name>() throws Exception
      @Stop void <method-name>() throws Exception
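    The process/stop contract of a task-oriented step can be sketched without the draft annotations. The interface and class below are hypothetical and mirror only the two method shapes listed on the slide; the exit status strings and the counter that stands in for real work are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical task-step contract mirroring the slide's @Process and @Stop methods. */
interface SketchBatchlet {
    String process() throws Exception;   // returns an exit status
    void stop() throws Exception;        // asks a running process() to end early
}

/** A task that works through a fixed number of units and honors stop requests between units. */
final class PurgeBatchletSketch implements SketchBatchlet {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final int totalUnits;
    private int unitsDone;               // written only by the thread running process()

    PurgeBatchletSketch(int totalUnits) { this.totalUnits = totalUnits; }

    @Override
    public String process() {
        while (unitsDone < totalUnits) {
            if (stopRequested.get()) {
                return "STOPPED";        // illustrative status string
            }
            unitsDone++;                 // stand-in for one unit of real work
        }
        return "COMPLETED";              // illustrative status string
    }

    @Override
    public void stop() {
        stopRequested.set(true);         // may be called from another thread
    }

    int unitsDone() { return unitsDone; }
}
```

    Checking the stop flag between units of work, rather than interrupting the thread, lets the task end at a point where its own state is consistent.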
  • 30.
    JSR 352 – Programming Model - Listeners
    @JobListener @Named[("<id>")]
      Package: javax.batch.annotation.joblistener
      @BeforeJob void <method-name>() throws Exception
      @AfterJob void <method-name>() throws Exception
    @StepListener @Named[("<id>")]
      Package: javax.batch.annotation.steplistener
      @BeforeStep void <method-name>() throws Exception
      @AfterStep void <method-name>() throws Exception
    @CheckpointListener @Named[("<id>")]
      Package: javax.batch.annotation.checkpointlistener
      @BeforeCheckpoint void <method-name>() throws Exception
      @AfterCheckpoint void <method-name>() throws Exception
    Other listeners
    • ItemReaderListener
    • ItemProcessorListener
    • ItemWriterListener
    • SkipListener
    • RetryListener
  • 31.
    JSR 352 – Programming Model - Parallelization
    @PartitionMapper @Named[("<id>")]
      @CalculatePartitions PartitionPlan <method-name>() throws Exception
    @PartitionReducer @Named[("<id>")]
      Package: javax.batch.annotation.partitionreducer
      @Begin void <method-name>() throws Exception
      @BeforeCompletion void <method-name>() throws Exception
      @Rollback void <method-name>() throws Exception
      @AfterCompletion void <method-name>(String status) throws Exception
    @PartitionCollector @Named[("<id>")]
      Package: javax.batch.annotation.partitioncollector
      @CollectPartitionData Externalizable <method-name>() throws Exception
    @PartitionAnalyzer @Named[("<id>")]
      Package: javax.batch.annotation.partitionanalyzer
      @AnalyzeCollectorData void <method-name>(Externalizable data) throws Exception
      @AnalyzeExitStatus void <method-name>(String exitStatus) throws Exception
  • 32.
    Thank You (English) · Gracias (Spanish) · Dziękuję (Polish) · Obrigado (Brazilian Portuguese) · Danke (German) · Grazie (Italian) · Merci (French)
    Also shown in: Hindi, Thai, Traditional Chinese, Russian, Arabic, Simplified Chinese, Japanese, Tamil, Korean