Batch Processing
Batch Processing in the Corporate World
Rodrigo Cândido da Silva
@rcandidosilva
About Me
• JUG Leader of GUJavaSC
• http://gujavasc.org
• Twitter
• @rcandidosilva
• Contact
• http://rodrigocandido.me
Agenda
• Concepts
• Batch Domain Language
• Chunk vs. Batchlet
• Partitioned Step
• Flow, Split, and Decision
• Listeners and Exceptions
• Execution
• Integration
• Demo
Why Batch?
• Very common in applications
• Many "custom" solutions
• Products began to emerge
• Spring Batch
• WebSphere Compute Grid
• Ideal for ETL systems
Batch API
• Chunk / Batchlet
• Implementation of a step
• Contexts
• Job and step at runtime
• Metadata persistence
• Listeners
• Callback lifecycle events
• Partitioning
• Parallel processing
Batch Domain Language
• Batch job XML definition
• Describes the steps as a grouping of batch artifacts
Batch Domain Language
<job id="adressJob" version="1.0">
    <listeners>
        <listener ref="MyJobListener"/>
    </listeners>
    <step id="buildingData" next="adressStep">
        <batchlet ref="GenerateDataBatchlet" />
    </step>
    <step id="adressStep">
        <listeners>
            <listener ref="MyStepListener"/>
        </listeners>
        <chunk item-count="10">
            <reader ref="adressItemReader" />
            <processor ref="adressItemProcessor" />
            <writer ref="adressItemWriter" />
        </chunk>
    </step>
</job>
Chunk vs. Batchlet
• Both implement a step within a job
• Chunk
• Encapsulates the ETL pattern
• A single reader, processor, and writer
• Runs over pieces of data (chunks)
• The output of a chunk is written as a unit
• Batchlet
• Runs a single, simple task
• Runs to completion and produces a return code (exit status)
Chunk vs. Batchlet
[Figure with two panels: Chunk and Batchlet]
Batchlet
@Named
public class MyBatchlet implements Batchlet {
    // Runs the task to completion and returns the exit status
    @Override
    public String process() throws Exception {..}
    // Called when the running batchlet is asked to stop
    @Override
    public void stop() throws Exception {..}
}
<step id="step1">
    <batchlet ref="MyBatchlet"/>
</step>
Chunk
<step id="sendStatements">
    <chunk item-count="10">
        <reader ref="accountReader"/>
        <processor ref="accountProcessor"/>
        <writer ref="emailWriter"/>
    </chunk>
</step>
@Named("accountReader")
...implements ItemReader... {

    public Account readItem() {

        // read account using JPA
@Named("accountProcessor")
...implements ItemProcessor... {
    public Statement processItem(Object account) {

        // receive an Account, return a Statement
@Named("emailWriter")
...implements ItemWriter... {
    public void writeItems(List<Object> statements) {

        // use JavaMail to send email
Chunk
public interface ItemReader {
    void open(Serializable checkpoint) throws Exception;
    Object readItem() throws Exception;
    Serializable checkpointInfo() throws Exception;
    void close() throws Exception;
}
public interface ItemWriter {
    void open(Serializable checkpoint) throws Exception;
    void writeItems(List<Object> items) throws Exception;
    Serializable checkpointInfo() throws Exception;
    void close() throws Exception;
}
public interface ItemProcessor {
    Object processItem(Object item) throws Exception;
}
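The reader, processor, and writer contract can be pictured as a simple loop. The following is a minimal plain-Java sketch of that loop, not the Batch runtime itself: `ChunkLoopSketch` and `runChunks` are hypothetical names, the reader is modeled as an `Iterator`, the processor as a `Function`, and the writer as a list that collects one entry per chunk. Transactions, checkpoints, and item filtering are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing; not the JSR 352 runtime.
class ChunkLoopSketch {

    // Reads items one at a time, processes each one, and hands the "writer"
    // one list per chunk of at most itemCount items.
    static <I, O> List<List<O>> runChunks(Iterator<I> reader,
                                          Function<I, O> processor,
                                          int itemCount) {
        List<List<O>> written = new ArrayList<>(); // stands in for the writer's target
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == itemCount) {      // chunk is full: write it as a unit
                written.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {                   // final, possibly shorter chunk
            written.add(new ArrayList<>(buffer));
        }
        return written;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Seven items with an item count of 3 yield chunks of 3, 3, and 1 items.
        System.out.println(runChunks(input.iterator(), n -> "item-" + n, 3));
    }
}
```

The point of the sketch is that the writer never sees single items: it receives the whole buffered chunk, which is what lets a real runtime commit once per chunk.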
Checkpoint
• For intensive, long-running tasks
• Checkpoint/restart is widely used
• Basically...
• Stores the state of the ItemReader and ItemWriter
• Methods called
• reader.checkpointInfo()
• writer.checkpointInfo()
public interface ItemReader {
    void open(Serializable checkpoint) throws Exception;
    Serializable checkpointInfo() throws Exception;
}
public interface ItemWriter {
    void open(Serializable checkpoint) throws Exception;
    Serializable checkpointInfo() throws Exception;
}
<chunk checkpoint-policy="item" item-count="10">
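How a reader's saved state supports a restart can be sketched in plain Java. This is an illustration under stated assumptions, not the Batch runtime: `CheckpointSketch`, `ListReader`, and `readFrom` are hypothetical names, the input is an in-memory list, and the checkpoint is simply the index of the next item, held as a `java.io.Serializable` value as in the final JSR 352 API.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of checkpoint/restart; not the JSR 352 runtime.
class CheckpointSketch {

    // Hypothetical reader over an in-memory list; its checkpoint is the next index to read.
    static class ListReader {
        private final List<String> data;
        private int next;

        ListReader(List<String> data) {
            this.data = data;
        }

        // A null checkpoint means a fresh start; otherwise resume from the saved index.
        void open(Serializable checkpoint) {
            next = (checkpoint == null) ? 0 : (Integer) checkpoint;
        }

        // Returns null once the input is exhausted.
        String readItem() {
            return next < data.size() ? data.get(next++) : null;
        }

        // The state that would be persisted at each checkpoint.
        Serializable checkpointInfo() {
            return next;
        }
    }

    // Opens the reader at the given checkpoint and reads at most maxItems items.
    static List<String> readFrom(ListReader reader, Serializable checkpoint, int maxItems) {
        reader.open(checkpoint);
        List<String> items = new ArrayList<>();
        String item;
        while (items.size() < maxItems && (item = reader.readItem()) != null) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e");
        ListReader first = new ListReader(data);
        System.out.println(readFrom(first, null, 2));       // first run stops after two items
        Serializable saved = first.checkpointInfo();         // state saved at the checkpoint
        ListReader restarted = new ListReader(data);
        System.out.println(readFrom(restarted, saved, 10)); // restart resumes after the checkpoint
    }
}
```

A restarted execution gets a fresh reader instance, so everything needed to resume has to travel through the checkpoint object rather than through fields of the old instance.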
Partitioned Step
• A step can run partitioned
• [N] instances of the same step on [N] threads
• One partition per thread
<step id="step1">
    <chunk>
        ...
    </chunk>
    <partition>
        <plan partitions="10" threads="2"/>
        <reducer />
    </partition>
</step>
Partitioned Step
• Partition Mapper
• Dynamically decides the number of partitions
• Partition Plan
• Partition Reducer
• Demarcates the logical unit of work
• Partition Collector
• Sends processing results from the partitions
• Partition Analyzer
• Control point that analyzes the results sent
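The division of labor between a partition plan and the collection of partial results can be sketched with a plain thread pool. This is only an analogy in standard Java, not the Batch partitioning API: `PartitionSketch` and `sumPartitioned` are hypothetical names, the "plan" is the pair of `partitions` and `threads` arguments, and the final loop plays the role of the analyzer combining what each partition reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java analogy of a partitioned step; not the JSR 352 partitioning API.
class PartitionSketch {

    // Splits the range [0, total) into `partitions` slices, runs them on a pool
    // of `threads` workers, and combines the partial sums.
    static long sumPartitioned(int total, int partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                // Slice boundaries; the last slice always ends exactly at `total`.
                final int from = (int) ((long) total * p / partitions);
                final int to = (int) ((long) total * (p + 1) / partitions);
                Callable<Long> partition = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += i;
                    }
                    return sum;
                };
                partials.add(pool.submit(partition));
            }
            long result = 0;
            for (Future<Long> partial : partials) {
                result += partial.get(); // combine partition results
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Ten partitions executed by two threads, as in the plan shown earlier.
        System.out.println(sumPartitioned(1000, 10, 2));
    }
}
```

With more partitions than threads, partitions queue up and each one still runs on a single thread from start to finish.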
Flow, Split, and Decision
[Diagram: a flow running from Start to End through Step I (Task), Step II (Chunk), Step III (Chunk), a Decision, and Step IV (Chunk); each chunk step contains an ItemReader, an ItemProcessor, and an ItemWriter]
Flow
• Defines a list of steps that run as a unit
<flow id="flow-1" next="{flow, step, decision}-id">
    <step id="flow_1_step_1">
    </step>
    <step id="flow_1_step_2">
    </step>
</flow>
Split
• Defines a list of flows that run in parallel
• Collectors and analyzers for monitoring
<split >
    <flow /> <!-- each flow runs on a separate thread -->
    <flow />
</split>
Decision
• Makes it possible to implement workflows
Decision
@Named
public class Decider implements javax.batch.api.Decider {
    @Override
    public String decide(StepExecution[] executions)
            throws Exception {
        // exit status of the step that ran before the decision
        String exit = executions[0].getExitStatus();
        if ("SUCCESS".equals(exit)) {
            return "SKIP";
        }
        return exit;
    }
}
<step id="step1" next="decision1"/>
<decision id="decision1" ref="Decider">
    <next on="SKIP" to="step3"/>
    <next on="*" to="step2"/>
</decision>
<step id="step2" next="step3"/>
<step id="step3"/>
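The routing that the `<next on="..." to="..."/>` elements express can be sketched as an ordered rule lookup. This is a simplified plain-Java illustration, not the runtime's matching logic: `DecisionSketch`, `decide`, and `nextStep` are hypothetical names, and only exact matches and a lone `*` wildcard are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of decision routing; not the JSR 352 runtime.
class DecisionSketch {

    // Same rule as the decider shown earlier: a "SUCCESS" exit status becomes "SKIP".
    static String decide(String exitStatus) {
        return "SUCCESS".equals(exitStatus) ? "SKIP" : exitStatus;
    }

    // Walks the rules in declaration order and returns the target of the first match.
    static String nextStep(Map<String, String> rules, String status) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (rule.getKey().equals("*") || rule.getKey().equals(status)) {
                return rule.getValue();
            }
        }
        return null; // no rule matched
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>(); // keeps insertion order
        rules.put("SKIP", "step3");
        rules.put("*", "step2");
        System.out.println(nextStep(rules, decide("SUCCESS"))); // routed by the SKIP rule
        System.out.println(nextStep(rules, decide("FAILED")));  // falls through to the wildcard
    }
}
```

Order matters in this sketch: if the wildcard rule came first, it would shadow every rule after it.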
Lifecycle
[State diagram: batch statuses STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, and ABANDONED, connected by the transitions start(), stop(), restart(), and abandon()]
Listeners
@Named
public class StepListener implements javax.batch.api.listener.StepListener {
    @Inject
    StepContext context;
    @Override
    public void beforeStep() throws Exception {..}
    @Override
    public void afterStep() throws Exception {..}
}
<step id="step1">
    <listeners>
        <listener ref="StepListener"/>
    </listeners>
</step>
• Step
• StepListener, ItemReadListener, ItemProcessListener, ItemWriteListener,
ChunkListener, RetryReadListener, RetryProcessListener, RetryWriteListener,
SkipReadListener, SkipProcessListener, SkipWriteListener
• Job
• JobListener
Exceptions
<job id="...">
    <step id="...">
        <chunk skip-limit="5" retry-limit="5">
            <skippable-exception-classes>
                <include class="java.lang.Exception"/>
                <exclude class="java.io.FileNotFoundException"/>
            </skippable-exception-classes>
            <retryable-exception-classes>
            </retryable-exception-classes>
            <no-rollback-exception-classes>
                ...
            </no-rollback-exception-classes>
        </chunk>
    </step>
</job>
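The effect of a skip limit combined with include and exclude rules can be sketched in plain Java. This is an illustration of the idea, not the runtime's implementation: `SkipSketch`, `isSkippable`, `process`, and `processAll` are hypothetical names, and the failing inputs are invented for the example.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of skippable exceptions; not the JSR 352 runtime.
class SkipSketch {

    // Mirrors the include/exclude rules: any Exception is skippable
    // except FileNotFoundException.
    static boolean isSkippable(Exception e) {
        return !(e instanceof FileNotFoundException);
    }

    // Hypothetical processor: fails on two kinds of bad input, otherwise upper-cases.
    static String process(String item) throws Exception {
        if (item.isEmpty()) {
            throw new IllegalArgumentException("blank item");
        }
        if (item.equals("missing-file")) {
            throw new FileNotFoundException(item);
        }
        return item.toUpperCase();
    }

    // Processes every item, skipping failures until skipLimit is exceeded
    // or a non-skippable exception occurs.
    static List<String> processAll(List<String> items, int skipLimit) {
        List<String> out = new ArrayList<>();
        int skipped = 0;
        for (String item : items) {
            try {
                out.add(process(item));
            } catch (Exception e) {
                if (!isSkippable(e) || ++skipped > skipLimit) {
                    throw new IllegalStateException("step failed on item '" + item + "'", e);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The blank item is skipped; the other two are processed.
        System.out.println(processAll(Arrays.asList("a", "", "b"), 5));
    }
}
```

Retry and no-rollback rules follow the same include/exclude shape, but they change what happens to the current chunk rather than whether the item is dropped.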
JobOperator and Repository
• JobOperator
• Runtime interface for managing jobs
• start, stop, restart
• JobRepository interface commands
• JobRepository
• Holds information about jobs
• Completed and running
Execution
• JobInstance
• Logical representation of a job run
• JobExecution
• A single attempt to run a job
• StepExecution
• A single attempt to run a step of a job
Integration
• Java SE support
• Application Server Runtime
• Support for clustering, security, and resource management
• Dependency injection with CDI
• XML descriptors
• META-INF/batch-jobs/myJob.xml
• Packaging
• JAR, WAR, EJB
Demo
• Java EE 7 Samples
• Various examples of using the Batch API
• https://github.com/javaee-samples/javaee7-samples/tree/master/batch
Questions?
References
• https://jcp.org/en/jsr/detail?id=352
• https://java.net/projects/jbatch
• http://projects.spring.io/spring-batch/
• http://docs.oracle.com/javaee/7/tutorial/doc/batch-processing.htm
• http://www.oracle.com/technetwork/articles/java/batch-1965499.html
• https://github.com/javaee-samples/javaee7-samples/
• http://blog.arungupta.me/2014/07/schedule-javaee7-batch-jobs-techtip36/
• http://www.planetjones.co.uk/blog/25-05-2013/introducing-jsr-352-java-batch-ee-7.html
Thank you very much!
@rcandidosilva
rodrigocandido.me

JavaOne LATAM 2015 - Batch Processing: Batch Processing in the Corporate World