The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
ย
big data analytics and hadoop comparions
1. 1
Outlines
1. Introduction
2. Types of data under Big Data
3. Five Vโs in Big Data
4. Big data and Hadoop
5. Characteristics of Hadoop
6. Hadoop Architecture
2. Introduction
Big data is a collection of large datasets that cannot be processed using traditional computing techniques.
It is not a single technique or a tool, rather it has become a complete subject, which involves various
tools, techniques and frameworks.
3. Types of data under Big Data
โข Structured - Data having a defined data model, format , structure
E.g. Relational data.
โข Semi Structured - Textual data files with an apparent pattern, enabling analysis
E.g. Spreadsheets and XML files
โข Unstructured - Data that has no inherent structure and is usually stored as different types of files.
E.g. Text documents , PDFs, images and video .
4. Five Vโs in Big Data
๏ฑ Volume - Huge amount of data.
๏ฑ Variety - Different formats of data from various sources.
๏ฑ Velocity - High speed of receiving and transferring of data.
๏ฑ Veracity - Inconsistencies and uncertainty in data.
๏ฑ Value - Extract useful data.
5. Big data and Hadoop
Hadoop is a framework that allows distributed processing of large datasets
across clusters of commodity computers using simple programming models.
6. Characteristics of Hadoop
๏ฑ Scalable - Can follow both horizontal and vertical scaling
๏ฑ Flexible - Can store huge data and decide to use it later
๏ฑ Economical - Can use ordinary computers for data processing
๏ฑ Reliable - Stores copies of the data on different machines and is resistant to hardware failure
7. Hadoop Architecture
๏ฑ At its core, Hadoop has two major layers
namely โ
Processing/Computation layer (Map Reduce)
Storage layer (Hadoop Distributed File System).
๏ฑ Apart from the above-mentioned two core
components, Hadoop framework also includes
the following two modules โ
Hadoop Common โ These are Java libraries
and utilities required by other Hadoop modules.
Hadoop YARN โ This is a framework for job
scheduling and cluster resource management.