Batch processing with WebSphere


  • The new Parallel Job Manager allows you to partition a job into multiple subjobs and execute them in parallel. There are various SPIs that allow business logic to be applied to the partitioning process. A parallel job is submitted to the Job Scheduler as a transactional batch job that names the Parallel Job Manager as the target application. It is then dispatched to an instance of the Parallel Job Manager in a server or cluster member. The PJM calls the parameterize method of the Parameterizer SPI, which determines how many subjobs will be dispatched and creates an array of Properties objects, one for each of the indicated subjobs. These properties describe the partitioning boundaries for each subjob; the properties, and the way they are varied per subjob, are determined by the design of the target application. The subjob count and the Properties array are returned in a Parameters object. With the subjob count and properties provided by the Parameterizer, and the subjob name property provided by the xJCL of the top-level job, the PJM submits N jobs to the Job Scheduler with the indicated properties. The xJCL for each subjob is retrieved from the Job Scheduler's job repository using the provided subjob name property. The Job Scheduler then dispatches the various subjobs to the servers or cluster members hosting the batch application indicated in each subjob's xJCL; depending on the size of the cluster and the number of subjobs, they may run concurrently across several members. The Logical TX Synchronization SPI receives callbacks that demarcate the lifecycle of a parallel job's logical transaction. A logical transaction provides a logical unit-of-work scope that spans all subjobs belonging to a parallel job: the Parallel Job Manager starts a logical transaction before the subjobs are submitted and completes it after all subjobs have completed. The SubJob Collector/Analyzer SPIs provide for the collection of application-specific information by the collectors in the subjob servers, and the marshalling and subsequent aggregation of that information by the analyzer residing in the PJM's server. The SubJobAnalyzer also gathers the return codes of the subjobs and determines the final return code of the entire logical parallel job.
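The Parameterizer's role can be sketched in plain Java. This is a minimal, self-contained stand-in, assuming a key-range partitioning scheme and simplified Parameters/parameterize signatures; it is not the actual Compute Grid SPI interface.

```java
import java.util.Properties;

// Simplified stand-ins for the Parameterizer SPI described above. The
// names mirror the notes (parameterize, Parameters), but the exact
// Compute Grid interfaces are not reproduced here; the key-range
// partitioning scheme is an illustrative assumption.
class Parameters {
    int subJobCount;
    Properties[] subJobProperties; // one per subjob, describing its boundaries
}

public class RangeParameterizer {
    // Split the key range [0, totalRecords) into roughly equal partitions,
    // one Properties object per subjob.
    public Parameters parameterize(long totalRecords, int subJobs) {
        Parameters result = new Parameters();
        result.subJobCount = subJobs;
        result.subJobProperties = new Properties[subJobs];
        long chunk = (totalRecords + subJobs - 1) / subJobs; // ceiling division
        for (int i = 0; i < subJobs; i++) {
            long start = (long) i * chunk;
            long end = Math.min(start + chunk, totalRecords); // exclusive
            Properties props = new Properties();
            props.setProperty("startKey", Long.toString(start));
            props.setProperty("endKey", Long.toString(end));
            result.subJobProperties[i] = props;
        }
        return result;
    }

    public static void main(String[] args) {
        Parameters p = new RangeParameterizer().parameterize(1000, 4);
        for (int i = 0; i < p.subJobCount; i++) {
            System.out.println("subjob " + i + ": " + p.subJobProperties[i]);
        }
    }
}
```

The PJM would submit one subjob per Properties object, with the xJCL's variable substitution turning startKey/endKey into step properties.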
  • Scheduling a job is similar to submitting a job by its xJCL or a job from the repository for delayed submission. You can schedule a job to be submitted once at a given date and time from the “Submit a job” panel or you can schedule the job to run periodically. From the “View schedules” panel, you can view the details of the job schedule or cancel the schedule.

    1. Batch processing with WebSphere. Sridhar Sudarsan, Chief Architect, Batch processing strategy, [email_address], 4th March 2011
    2. Outline and objectives
       • WebSphere batch solutions overview
         • Architecture
         • Components
         • Topology
       • WebSphere Batch offerings
         • WebSphere Feature Pack for Modern Batch
         • WebSphere Compute Grid
       • Summary
    3. WebSphere Extended Deployment (XD) is now 3 separate products
       Infrastructure optimization: software to virtualize, control, and turbo-charge your application infrastructure.
       • Intelligent workload management, virtualization, and automatic sense & respond management (WebSphere Virtual Enterprise)
       • Data fabrics & caching (WebSphere eXtreme Scale)
       • Innovative application patterns, like Java batch, beyond OLTP (WebSphere Compute Grid)
    4. What is Compute Grid (CG)? A J(2)EE view
       • A set of binaries that get deployed to WebSphere Application Server Network Deployment (WAS ND) nodes within a cell
       • Those nodes are then CG-enabled and become the potential "Grid Execution Environment" (GEE), also called the "Long Running Execution Environment" (LREE)
       • Java developers use the CG framework to code the batch application and deploy it as a typical .ear file
       • ND admins manage the .ear like any other WAS ND application (same console, same skills)
       • Some additional components, and a Job Management Console, of which more later
    5. Batch applications need batch middleware
       • Job Control Language (xJCL): declarative, XML-based job definition
       • Batch Container: runtime engine for batch applications
       • Job Scheduler: job dispatcher and operational control point
       • Logging/archival: manager for job history and output HA
       • Parallel Job Manager (PJM) and WLM: parallelization, workload management, and availability
       • Batch Framework: library of common data access and utilities
       • Application Support: business and custom data access logic
       • Resource Mgmt: rule-based CPU and file limits
       • Security: security for jobs and job operations
    6. Let's look at the basic WebSphere Batch runtime
       • Job Scheduler/Dispatcher (JS)
         • The job entry point to Compute Grid
         • Job life-cycle management (submit, stop, cancel, etc.) and monitoring
         • Dispatches workload to either the PJM or the GEE
         • Hosts the Job Management Console (JMC)
       • Grid Endpoints (GEE)
         • Execute the actual business logic of the batch job
         • Host the programming model
       • xJCL
         • XML descriptor for the job
         • Allows variable substitution
       Dispatcher interfaces: command window, EJB call, JMC. Jobs are read from the job repository and run in batch containers hosted in WAS servers, against the application data store.
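A minimal xJCL sketch for a one-step job follows. Only the job and job-step elements appear elsewhere in this deck (on the WSGrid slide); the other element names and the ${...} substitution syntax shown here are illustrative assumptions rather than the full xJCL schema.

```xml
<!-- Minimal xJCL sketch (config fragment). Element names other than
     <job> and <job-step> are assumptions for illustration. -->
<job name="AccountBatchJob">
  <job-step name="Step1">
    <classname>com.example.batch.AccountStep</classname>
    <props>
      <!-- Variable substitution: values supplied at submit time -->
      <prop name="startKey" value="${startKey}"/>
      <prop name="endKey"   value="${endKey}"/>
    </props>
  </job-step>
</job>
```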
    7. WebSphere Compute Grid: job step
       WebSphere Compute Grid enables Java as a language for batch workloads on the mainframe or in distributed environments, creating an infrastructure for batch and OLTP processing that can share business logic to lower costs, eliminate the batch window, and deliver high availability.
       A simplified batch job step reads input, maps data to objects, transforms the objects, maps objects back to data, and writes output. Input sources: fixed block dataset, variable block dataset, JDBC, file, iBATIS, with more to come. Output targets: fixed block dataset, variable block dataset, JDBC, JDBC with batching, file, iBATIS, with more to come.
    8. Batch programming model
       The anatomy of a transactional batch application: the Batch Container drives the batch job step through its lifecycle:
       1. setProperties(Properties p)
       2. createJobStep()
       3. processJobStep()
       4. destroyJobStep()
       Compute Grid makes it easy for developers to create transactional batch applications by allowing them to use a streamlined POJO model and to focus on business logic, not on the batch infrastructure.
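A minimal sketch of such a POJO job step, with the container's driving loop simulated in main. The four lifecycle method names come from the slide; the int return-code convention (continue vs. complete) is an assumption of this sketch, not the documented interface.

```java
import java.util.Properties;

// Simplified stand-in for a transactional batch job step. Method names
// follow the lifecycle on the slide; return-code semantics are assumed.
public class EchoJobStep {
    static final int STEP_CONTINUE = 0; // process another chunk (assumed)
    static final int STEP_COMPLETE = 1; // step finished (assumed)

    private Properties props;
    private int processed;

    public void setProperties(Properties p) { this.props = p; }  // 1
    public void createJobStep() { processed = 0; }               // 2
    public int processJobStep() {                                // 3 (called repeatedly)
        int limit = Integer.parseInt(props.getProperty("records", "3"));
        processed++;
        return processed < limit ? STEP_CONTINUE : STEP_COMPLETE;
    }
    public int destroyJobStep() { return 0; }                    // 4 (step return code)

    public static void main(String[] args) {
        EchoJobStep step = new EchoJobStep();
        Properties p = new Properties();
        p.setProperty("records", "3");
        step.setProperties(p);
        step.createJobStep();
        // the container drives processJobStep until it signals completion
        while (step.processJobStep() == STEP_CONTINUE) { }
        System.out.println("step RC=" + step.destroyJobStep());
    }
}
```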
    9. Checkpoint & restart with Batch Data Streams
       WebSphere Batch makes it easy for developers to encapsulate input/output data streams using POJOs that optionally support checkpoint/restart semantics. The Batch Container drives the stream callbacks:
       • Job start: 1. open(), 2. positionAtInitialCheckpoint(), 3. externalizeCheckpoint(), 4. close()
       • Job restart: 1. open(), 2. internalizeCheckpoint(), 3. positionAtCurrentCheckpoint(), 4. externalizeCheckpoint(), 5. close()
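The two sequences can be sketched as a self-contained POJO; the callback names come from the slide, while the String checkpoint token and the in-memory record list are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Simplified batch data stream POJO following the checkpoint callbacks
// on the slide.
public class ListDataStream {
    private final List<String> records;
    private int cursor;

    public ListDataStream(List<String> records) { this.records = records; }

    public void open() { /* acquire underlying resources */ }
    public void positionAtInitialCheckpoint() { cursor = 0; }
    public String externalizeCheckpoint() { return Integer.toString(cursor); }
    public void internalizeCheckpoint(String token) { cursor = Integer.parseInt(token); }
    public void positionAtCurrentCheckpoint() { /* cursor restored above */ }
    public void close() { /* release resources */ }

    public String read() { return cursor < records.size() ? records.get(cursor++) : null; }

    public static void main(String[] args) {
        // Job start: open, position at the initial checkpoint, process two
        // records, then the container externalizes the checkpoint.
        ListDataStream bds = new ListDataStream(Arrays.asList("a", "b", "c", "d"));
        bds.open();
        bds.positionAtInitialCheckpoint();
        bds.read();
        bds.read();
        String checkpoint = bds.externalizeCheckpoint();
        bds.close(); // job interrupted here

        // Job restart: the container hands back the persisted checkpoint.
        ListDataStream restarted = new ListDataStream(Arrays.asList("a", "b", "c", "d"));
        restarted.open();
        restarted.internalizeCheckpoint(checkpoint);
        restarted.positionAtCurrentCheckpoint();
        System.out.println("resumes at: " + restarted.read()); // "c"
    }
}
```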
    10. Parallel Job Manager (PJM)
       • The PJM breaks large batch jobs into smaller partitions for parallel execution
         • Installed as a system application
         • Can be installed to a single server or a cluster
         • Provides out-of-the-box SPIs, plus custom SPIs to implement
       • The PJM is the target application of a parallel job
         • The PJM does not process batch data streams
         • It submits or restarts subjobs under the control of step properties which identify the subjob in the job repository and the count of subjobs to process
         • A parallel job is submitted using the xJCL for its 'top-level' job, which specifies these details
       • The PJM is a subjob manager, where a subjob is
         • An instance of a regular batch job that can be bounded by substitution properties specified in its xJCL
         • Submitted to the job scheduler by the PJM
         • Aggregated by the PJM into one logical top-level job for status, result code, and life-cycle management
    11. Compute Grid runtime components with the PJM
       (diagram) The Job Scheduler dispatches the top-level job to the Parallel Job Manager, which runs in its own WAS server alongside the Parameterizer, Logical TX Synchronization, and SubJob Analyzer SPIs. Subjobs #1..#N, located in the job repository by subjob name, run in batch containers on WAS servers 1..N within a logical transaction scope; a SubJob Collector SPI runs with each subjob.
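The collector/analyzer pairing can be sketched as follows. The collected record count and the worst-return-code aggregation policy are assumptions of this sketch, not the actual SPI contract, which lets the application decide what to collect and how to aggregate.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the SubJob Collector/Analyzer SPIs. The real
// SPIs marshal arbitrary application data from the subjob servers to the
// PJM's server; here the "collected" data is just a record count.
class SubJobResult {
    final int returnCode;
    final long recordsProcessed; // collected in the subjob's server
    SubJobResult(int returnCode, long recordsProcessed) {
        this.returnCode = returnCode;
        this.recordsProcessed = recordsProcessed;
    }
}

public class SimpleSubJobAnalyzer {
    // Aggregate the collected data and determine the final return code of
    // the whole logical parallel job from its subjobs' return codes.
    public int analyze(List<SubJobResult> results) {
        long totalRecords = 0;
        int finalRc = 0;
        for (SubJobResult r : results) {
            totalRecords += r.recordsProcessed;
            finalRc = Math.max(finalRc, r.returnCode); // worst RC wins (assumed policy)
        }
        System.out.println("records processed across subjobs: " + totalRecords);
        return finalRc;
    }

    public static void main(String[] args) {
        List<SubJobResult> results = Arrays.asList(
            new SubJobResult(0, 250), new SubJobResult(4, 250), new SubJobResult(0, 250));
        int rc = new SimpleSubJobAnalyzer().analyze(results);
        System.out.println("final RC=" + rc);
    }
}
```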
    12. Logical deployment
       (diagram) Jobs reach the Job Scheduler from a console, online applications, APIs, and a workload connector to an enterprise workload scheduler (e.g. TWS). The scheduler dispatches jobs to batch containers organized per line of business; the Parallel Job Manager submits subjobs back through the scheduler, e.g.:
       public void submit(Job j) { _sched.submit(j); }
    13. Physical deployment: distributed
       (diagram) A WAS ND cell hosts the Job Scheduler, the PJM, and batch containers per line of business; jobs arrive from online applications and from an enterprise scheduler such as TWS.
    14. Admin & configuration with the WAS admin console
    15. Integrated operational control
       • Provides an operational infrastructure for the job life cycle
       • Integrates with existing enterprise schedulers such as Tivoli Workload Scheduler
       • Provides log management and integrates with archiving and auditing
       • Provides resource usage monitoring
       • Integrates with existing security and disaster recovery procedures
       • Configures as a highly available component
       (diagram) The WCG architecture comprises: a bulk application container (the WCG Batch Container) with data access management services ("file", queue-based, in-memory, and custom data access) and infrastructure services (the WCG Batch Framework); a bulk application development environment for creating and migrating bulk applications (WCG Eclipse plugin); system management and operations for managing, monitoring, and securing bulk processes, with analytics for scheduling, check-pointing, and resource management (WCG Scheduler); invocation and scheduling services, both ad hoc and planned, covering invocation and scheduling optimization, resource brokering, split & parallelize, pace, and throttle; and gateway, bulk partner, and business process and event services.
    16. Job Management Console: view jobs
    17. Job Management Console: job schedules
       • Save a job definition
         • xJCL
         • Schedule
           • Date and time
           • Repeating
       • Manage schedules
         • View details
         • Cancel
    18. Benefits of running WebSphere Compute Grid on z/OS
    19. Essential story: exploitation of lower-level benefits
       (diagram) WebSphere Compute Grid function that is common across all platforms runs on WAS z/OS, which exploits function specific to z/OS: System z inherent reliability, zAAPs, z/OS and Parallel Sysplex, WLM, RRS, SMF, shared data, SAF, and co-location awareness.
    20. zAAPs: providing a Java cost advantage on z/OS
       • Java workload is offloaded to zAAP processors
       • Completely transparent to Java applications, including batch
       • Benefits:
         • MIPS related to Java on zAAPs are not counted towards other software monthly license charges
         • Frees GPs to do traditional z/OS work, such as CICS, DB2 and IMS
    21. RRS: Sysplex-wide global transaction syncpoint coordinator
       Very fast and reliable; excels at transaction rollback when needed.
    22. WLM classification: prioritize work
       An example of WAS z/OS exploiting WLM: prioritize Compute Grid relative to other tasks within the z/OS system, and prioritize batch jobs relative to other batch jobs within Compute Grid. Higher-priority jobs receive relatively more system resources; lower-priority jobs receive relatively less.
    23. SMF: accounting information
       Very efficient, very fast. Compute Grid writes SMF type 120, subtype 20 records through the z/OS SMF interface (which also serves WAS z/OS, RMF, DB2, CICS, MQ, and other z/OS subsystems and facilities), via memory buffers to SMF data sets, where data analysis tools support chargeback, performance and tuning, and capacity planning. Recorded fields include: job identifier, job submitter, final job state, server, node, accounting information, job start time, last update time, and CPU consumed.
    24. Parallel Sysplex: availability and scalability
       • Proven scalability: near linear up to 32 nodes in a Sysplex
       • Centralized shared data structures with integrated data locking and update, backed by local data caches in each z/OS instance (WAS z/OS + Compute Grid, with DB2, CICS, IMS, MQ)
       • Availability: this provides the foundation for a highly available architecture
       • Parallel jobs: an excellent platform on which to use Compute Grid's Parallel Job Manager
       Direct value to Compute Grid and your batch processes.
    25. SAF: centralized security
       • Centralized SAF security repository
       • Userids and groups
       • EJBROLE role enforcement
       • Digital certificates and keyrings
       • Much more related to WAS z/OS security
       • Extensive auditing
       Proven secure; centralization enables tighter control.
    26. External scheduler integration on z/OS
       WebSphere Batch can be controlled by an external workload scheduler (e.g. Tivoli Workload Scheduler, Control-M). The scheduler submits and monitors a JCL job through JES; a WSGRID step relays the embedded xJCL job to the WebSphere Batch job scheduler over MQ messages, and that scheduler in turn submits and monitors the WAS batch application.
       //JOB1 JOB '…'
       //STEP1 EXEC PGM=IDCAMS
       //STEP2 EXEC PGM=WSGRID,
       //WGJOB DD *
       <job name="JOB1" … >
         <job-step name="STEP2"> …
       </job>
       • JCL/xJCL jobs have a synchronized lifecycle
       • The xJCL job is restartable from the JCL job
       • The xJCL job log is piped to the JCL job and written to a SYSOUT dataset
       • The xJCL job's RC becomes the step RC in the JCL job
    27. WSGrid JCL Example
    28. WSGrid JCL Job Output (SYSPRINT DD – Top of File)
    29. WSGrid JCL Job Output (SYSPRINT DD – Bottom of File)
    30. Feature-set options
       Start with the Feature Pack; grow into Compute Grid!
       • WebSphere Batch Feature Pack (on WebSphere App Server): Job Scheduler, Batch Container, Batch Toolkit
       • WebSphere Compute Grid product: Job Scheduler, Batch Container, Batch Toolkit, plus the Parallel Job Manager, Enterprise Connectors, and the Advanced Operations Pack
    31. Deployment options
       Features and QoS guidance to choose the optimal deployment option for batch workloads. The three options are WAS Base with the Feature Pack for Modern Batch, WAS on z/OS / WAS ND with the Feature Pack for Modern Batch, and WebSphere Compute Grid. All options share a common batch container, development tools to develop batch applications, and operational commands to manage the batch job life cycle.
       • In all three options: container-managed checkpoint/restart capabilities; job management console; application execution platform; basic scheduler/job dispatcher; system-managed job logs; high availability and clustering of the batch job scheduler/job dispatcher
       • In WAS on z/OS / WAS ND with the Feature Pack, and in Compute Grid: integration with WLM on z/OS; multi-site disaster recovery for the batch platform; non-disruptive batch application update/endpoint quiesce
       • In Compute Grid only: disaster recovery with operational state transfer; interoperability between Java and COBOL on z/OS; job usage accounting, including SMF integration on z/OS; job classes and workload classification; integrated "Parallel Job Manager" for job parallelization across multiple JVMs; enterprise scheduler connectors; enterprise monitoring capabilities; integration with VE for goal-oriented job placement
    32. Summary
       • WebSphere Batch solutions create a separation of concerns between business/application logic and the batch infrastructure
       • WebSphere Batch solutions provide an environment and infrastructure for running mixed workloads in Java efficiently
       • WebSphere Batch solutions are strategically important and a fundamental component of IBM's batch infrastructure leadership
         • WebSphere Compute Grid provides market-leading development capabilities that accelerate time-to-value for clients
         • WebSphere Compute Grid is production-ready, with many customers running mission-critical batch workloads
    33. Back Office Operation Center: new assets overview and insights
    34. Comparison of JZOS and WebSphere Compute Grid (WCG): where they are the same
       • Java batch execution: both JZOS and WCG provide an environment to execute Java batch programs
       • JES/JCL jobs: both JZOS and WCG workload can be described, submitted, and run through JES/JCL
       • Enterprise scheduling: both JZOS and WCG workload can be directly scheduled and controlled by a scheduler such as TWS or Control-M
       • Managed job restart: both JZOS and WCG workload can be restarted through TWS
       • SMF usage recording: both JZOS and WCG workload can be measured with SMF records
    35. Comparison of JZOS and WCG: where they differ
       • Java services: JZOS supports J2SE and JZOS; WCG supports J2SE, JZOS, J2EE, and WS*
       • Environment: JZOS runs in a JES-managed batch initiator; WCG adds WebSphere Application Server to the JES-managed batch initiator
       • JVM life cycle: JZOS uses a disposable JVM; WCG uses a reusable JVM (operational efficiency)
       • Parallelization: JZOS is ad hoc or roll-your-own; WCG is system-managed (operational control)
       • Checkpoints: JZOS is application-managed; WCG is system-managed (operational optimization)
       • Inter-language: JZOS offers Java/COBOL interoperability but no connection sharing; WCG offers Java/COBOL interoperability with DB2 connection sharing
       • Service integration: JZOS supports remote calls only; WCG supports remote calls plus local, optimized calls (co-location)
       • Transactionality: JZOS supports local transaction mode only; WCG supports local transaction mode (1PC), RRS transaction mode (2PC), and XA transaction mode (2PC)