Big Data is often characterized by the 3 “Vs”: variety, volume and velocity. While variety refers to the nature of the information (multiple sources, schema-less data, etc), both volume and velocity refer to processing issues that have to be addressed by different processing paradigms.
Assuming that the volumes of data are larger than those conventional relational database infrastructures can cope with, the processing solution break down broadly into massively parallel processing (batch processing). Batch processing is an efficient way of processing high volumes of data is where a group of transactions is collected over a period of time. Data is collected, entered, processed and then the batch results are produced.
Several applications require real-time processing of data streams from heterogeneous sources, in contrast with the approach of batch processing. Real time processing involves a continual input, process and output of data. Data must be processed in a small time period (or near real time). Domains of application include smart cities, entertainment of disaster management. The low latency is the main goal of this processing paradigm.
Batch processing provides strong results since it can use more data and, for example, perform better training of predictive models. But it is not feasible for domains where a low response time is a critical issue. Real time processing solves this issue, but the analyzed information is limited in order to achieve low latency. Many domains require the benefit of both batch and real time processing approaches so a new processing paradigm is needed: the hybrid model. To obtain a complete result, the batch and real-time results must be queried and the results merged together. Synchronization, results composition and other non-trivial issues have to be addressed at this stage in which could be considered a key element of the hybrid modell.
This walk will overview the time-evolution of the big data processing techniques, identify main hits (both technologies and scientific publications) and give and introduction of the key technologies to understand the complex Big Data processing domain.