
The three generations of Big Data processing

by on Nov 11, 2013

Big Data is often characterized by the 3 “Vs”: variety, volume and velocity. While variety refers to the nature of the information (multiple sources, schema-less data, etc), both volume and velocity refer to processing issues that have to be addressed by different processing paradigms.

When data volumes exceed what conventional relational database infrastructures can cope with, the usual processing solution is massively parallel processing (batch processing). Batch processing is an efficient way of handling high volumes of data: a group of transactions is collected over a period of time, and the data is then entered, processed, and turned into batch results.

In contrast with batch processing, several applications require real-time processing of data streams from heterogeneous sources. Real-time processing involves continual input, processing and output of data, and the data must be processed within a short time period (in real time or near real time). Application domains include smart cities, entertainment and disaster management. Low latency is the main goal of this processing paradigm.
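The contrast between the two paradigms can be illustrated with a minimal sketch. It is not taken from the presentation: the function names `batch_mean` and `streaming_mean` and the sample readings are hypothetical, and a running mean stands in for whatever computation a real system would perform.

```python
from typing import Iterable, Iterator, List


def batch_mean(readings: List[float]) -> float:
    """Batch style: wait for the whole collection, then process it once."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated result after every input item."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        # A result is available immediately, based only on data seen so far.
        yield total / count


# Hypothetical sample data, used only for illustration.
readings = [2.0, 4.0, 6.0, 8.0]
batch_result = batch_mean(readings)
stream_results = list(streaming_mean(readings))
```

The batch function yields a single answer once all data has been collected, while the streaming function produces an answer after each item, at the cost of seeing only part of the data at any given moment.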

Batch processing provides strong results since it can use more data and, for example, perform better training of predictive models. But it is not feasible for domains where a low response time is critical. Real-time processing solves this issue, but the amount of information analyzed is limited in order to achieve low latency. Many domains require the benefits of both batch and real-time processing, so a new processing paradigm is needed: the hybrid model. To obtain a complete result, both the batch and the real-time results must be queried and then merged together. Synchronization, result composition and other non-trivial issues have to be addressed at this stage, which could be considered a key element of the hybrid model.
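The merge step of the hybrid model might look roughly like the sketch below. This is an illustrative assumption rather than the design described in the presentation: the `Event` tuple, the `cutoff` timestamp, and the functions `build_batch_view`, `build_realtime_view` and `query` are all hypothetical names. The cutoff is one simple way to handle synchronization, so that no event is counted in both views.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, key); hypothetical event shape


def build_batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Count events up to and including the cutoff (the batch layer)."""
    return Counter(key for ts, key in events if ts <= cutoff)


def build_realtime_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Count only events newer than the cutoff (the real-time layer)."""
    return Counter(key for ts, key in events if ts > cutoff)


def query(key: str, batch_view: Counter, realtime_view: Counter) -> int:
    """Merge both views; a Counter returns 0 for keys it has not seen."""
    return batch_view[key] + realtime_view[key]


# Hypothetical sample data, used only for illustration.
events = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "b")]
cutoff = 3  # the last timestamp covered by the most recent batch run

batch_view = build_batch_view(events, cutoff)
realtime_view = build_realtime_view(events, cutoff)
total_a = query("a", batch_view, realtime_view)
total_b = query("b", batch_view, realtime_view)
```

Counts merge by simple addition, which keeps the composition step trivial here; results that are not additive (for example medians or trained models) would need a more careful merge strategy.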

This talk gives an overview of how big data processing techniques have evolved over time, identifies the main milestones (both technologies and scientific publications) and introduces the key technologies needed to understand the complex Big Data processing domain.

Usage Rights

© All Rights Reserved

The three generations of Big Data processing: Presentation Transcript