Hadoop vs Java Batch Processing JSR 352
Hadoop has become synonymous with Big Data. Oracle has released the latest standard in the Java EE stack: Batch Processing JSR 352. Batch processing has been around for decades, and many Java frameworks are already available, such as Spring Batch. This talk provides a perspective on Hadoop and JSR 352: knowing when to use one or the other, or both together.

Published in: Technology
Transcript

  • 1. AGENDA
      • Introduction
      • What is batch processing?
      • Batch processing using Hadoop
      • Batch processing using Java Batch Processing JSR 352
      • When to use Hadoop or JSR 352?
      • Conclusion
    Armel Nene, ETAPIX Global Ltd, www.etapix.com
  • 2. INTRODUCTION
    The motivations for this presentation are:
      • Petabytes of data available in the wild (Internet, cars, fridges…)
      • Need for a competitive edge
      • Processing large datasets
      • Analysing large, complex data (ETL)
      • Generating reports
  • 3. WHAT IS BATCH PROCESSING?
    Batch processing is the execution of a series of programs ("jobs") on a computer without manual intervention. Batch processing has these benefits:
      • It can shift the time of job processing to when the computing resources are less busy.
      • It avoids idling the computing resources with minute-by-minute manual intervention and supervision.
      • By keeping a high overall rate of utilization, it amortizes the computer, especially an expensive one.
      • It allows the system to use different priorities for batch and interactive work.
    Source: Wikipedia
  • 4. BATCH PROCESSING USING HADOOP
    Hadoop is a massively scalable storage and batch data processing system. It provides an integrated storage and processing fabric that scales horizontally with commodity hardware and provides fault tolerance through software. Rather than replacing existing systems, Hadoop augments them by offloading the particularly difficult problem of simultaneously ingesting, processing and delivering/exporting large volumes of data, so existing systems can focus on what they were designed to do, whether that is serving real-time transactional data or providing interactive business intelligence.
  • 5. BATCH PROCESSING WITH HADOOP CONT…
      • Hadoop uses the MapReduce programming model
      • Parallel job processing: no need to worry about synchronization, concurrency, hardware failure, etc.
      • Databases: data can be dumped with the RDBMS's built-in tools or extracted with Hadoop's native JDBC tools
      • Unstructured data such as log files can be processed using Hadoop
      • Hardware and data agnostic
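
To make the MapReduce programming model mentioned on this slide concrete, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop API: the class `WordCountSketch` and its methods are hypothetical names chosen for illustration, and everything runs sequentially in one JVM. It only shows the shape of the model (map, group by key, reduce) on a few made-up log lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce model. This is NOT the Hadoop API;
// all names here are hypothetical and the phases run sequentially.
class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    // Each call depends only on its own line, which is what lets a
    // framework run many map calls in parallel on different machines.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values for one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires the three phases together for a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up log lines, standing in for the unstructured data on the slide.
        System.out.println(run(List.of("error disk full", "warn disk slow", "error net down")));
    }
}
```

In a real Hadoop job the developer supplies only the map and reduce logic; splitting the input, shuffling, scheduling and recovery from failed nodes are handled by the framework, which is the point of the second bullet above.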
  • 6. BATCH PROCESSING USING JAVA BATCH PROCESSING JSR 352
    Batch processing refers to running batch jobs on a computer system. Java EE includes a batch processing framework that provides the batch execution infrastructure common to all batch applications, enabling developers to concentrate on the business logic of their batch applications. The batch framework consists of a job specification language based on XML, a set of batch annotations and interfaces for application classes that implement the business logic, a batch container that manages the execution of batch jobs, and supporting classes and interfaces to interact with the batch container.
  • 7. BATCH PROCESSING USING JAVA BATCH PROCESSING JSR 352 CONT…
    Java EE includes a batch processing framework that consists of the following elements:
      • A batch runtime that manages the execution of jobs.
      • A job specification language based on XML.
      • A Java API to interact with the batch runtime.
      • A Java API to implement steps, decision elements, and other batch artefacts.
    JSR 352 integrates easily with SOA architectures, JMX for monitoring, Java Message Service (JMS) and the full Java EE stack. The learning curve for a Java EE developer is substantially reduced.
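
One kind of step that JSR 352 defines is the chunk-oriented step, where items are read, processed and then written in groups. The sketch below imitates that loop in plain Java so it can run without a Java EE container. The interfaces `SketchReader`, `SketchProcessor` and `SketchWriter`, the `chunkSize` parameter and the sample sales data are all hypothetical stand-ins for illustration; they are not the `javax.batch` types, and the real batch runtime additionally handles checkpoints, transactions and restarts, which are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration (not the javax.batch API).
interface SketchReader<I> { I readItem(); }                  // returns null when input is exhausted
interface SketchProcessor<I, O> { O processItem(I item); }   // returns null to skip an item
interface SketchWriter<O> { void writeItems(List<O> items); }

class ChunkStepSketch {

    // Runs a read -> process -> write loop, handing the writer a group of
    // outputs each time chunkSize outputs have accumulated, then flushing
    // any remainder. Returns the number of groups written.
    static <I, O> int runStep(SketchReader<I> reader,
                              SketchProcessor<I, O> processor,
                              SketchWriter<O> writer,
                              int chunkSize) {
        List<O> buffer = new ArrayList<>();
        int chunksWritten = 0;
        I item;
        while ((item = reader.readItem()) != null) {
            O output = processor.processItem(item);
            if (output != null) {
                buffer.add(output);
            }
            if (buffer.size() == chunkSize) {
                writer.writeItems(new ArrayList<>(buffer));
                buffer.clear();
                chunksWritten++;
            }
        }
        if (!buffer.isEmpty()) {
            // Final, partially filled chunk.
            writer.writeItems(new ArrayList<>(buffer));
            chunksWritten++;
        }
        return chunksWritten;
    }

    // Example: build report lines from made-up "region,amount" records,
    // skipping regions with no sales, two report lines per chunk.
    static List<List<String>> demo() {
        Iterator<String> sales = Arrays.asList("north,120", "south,0", "east,75", "west,40").iterator();
        List<List<String>> written = new ArrayList<>();
        ChunkStepSketch.<String, String>runStep(
            () -> sales.hasNext() ? sales.next() : null,
            line -> {
                String[] fields = line.split(",");
                return Integer.parseInt(fields[1]) > 0 ? fields[0] + "=" + fields[1] : null;
            },
            written::add,
            2);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The application code is limited to the three small callbacks; the loop around them is the kind of common infrastructure that the batch runtime is described as providing, so developers can concentrate on business logic.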
  • 8. WHEN TO USE HADOOP OR JSR 352?
    Java EE Batch Processing is not a competing technology to Apache Hadoop. They were built for different use cases. Here are some examples of use cases where I believe each works best:
      • Hadoop: financial risk modelling; Internet threat analysis
      • JBatch (JSR 352): creating reports from a database; system housekeeping
  • 9. WHEN TO USE HADOOP OR JSR 352? CONT…
    When deciding which technology to implement, you may want to consider the following:
      • Source of the data
      • Size of the data
      • Processing/business logic
      • Whether the batch process integrates with your existing architecture
      • What to do with the processed data
  • 10. CONCLUSION
      • JSR 352 is not a replacement for Hadoop
      • You can use them both together, perhaps with JSR 352 as a trigger for Hadoop jobs
      • JSR 352 is better suited to small batch jobs such as generating sales reports
      • Hadoop should be used when large datasets (>1 TB) need to be analysed
      • JSR 352 can be easily integrated into your Enterprise Service Bus architecture
  • 11. END.
    Armel Nene is a software architect and developer. He is also the founder of ETAPIX Global Limited, The Big Data Company (www.etapix.com). Armel Nene Recruitment (www.armelnene.com) is an IT specialist recruiter based in London, UK. @armelnene http://uk.linkedin.com/in/armelnene/
