Big Data and Hadoop are increasingly important tools for managing large and complex data sets. Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers. It uses MapReduce as its programming model and allows for the storage and retrieval of data sets using the Hadoop Distributed File System (HDFS). Hadoop has become popular for applications involving large, complex datasets and is scalable for applications involving petabytes of data.
13. DWH MPP – Map Reduce
Feature MPP Map Reduce
Hardware High End Commodity
License $$$ Open Source (30%)
Data Structure Relational Unstructured: Files
(SQL engine on nodes) Structured: Columnar
Query Language Declerative (SQL) Imperative
(Any programming
language)
Skills Similar to DB Use Current skills
Similar to Programming
17. BIG DATA
INTERGATION
Read & Deliver data
Load data Transform Data
Cloud
Data Base Enterprise
Weblogs, Mobile data Social Application
Data warehouse Application
data