Big Data Hadoop
For projects and theses
What is Big Data?
Introduction
Big Data refers to large volumes of data, structured or unstructured, that
require new technologies and techniques to handle. Organised data is known as
structured data, while unorganised data is known as unstructured data. The data
sets in big data are so large and complex that we cannot handle them using
traditional application software. Frameworks like Hadoop are designed
specifically for processing big data. These technologies are also used to
extract useful insights from data through predictive analytics and user
behaviour analysis.
3 Vs of Big Data
● Volume – It refers to the amount of data that is generated. The data can be low-density, high-volume,
structured or unstructured, or data of unknown value. This unknown data is converted into useful
information using technologies like Hadoop. The data can range from terabytes to petabytes.
● Velocity – It refers to the rate at which data is generated. The data is received at an unprecedented
speed and must be acted upon in a timely manner. It also requires real-time evaluation and action in
the case of Internet of Things (IoT) applications.
● Variety – Variety refers to the different formats of data. It may be structured, unstructured or
semi-structured. The data can be audio, video, text or email. Additional processing is required to
derive meaning from such data and to support its metadata.
Hadoop
Hadoop is an open-source framework for processing and storing big data. It uses
simple programming models to process big data in a distributed environment
across clusters of computers. Hadoop provides storage for large volumes of data
along with advanced processing power, and it can handle multiple tasks and jobs
concurrently.
Hadoop Architecture
HDFS is the main component of the Hadoop architecture. It stands for Hadoop
Distributed File System. It is used to store large amounts of data across
multiple machines. MapReduce is another component of the architecture; here the
data is processed in a distributed manner across multiple machines. The YARN
component manages processing resources such as CPU and memory. Resource Manager
and Node Manager are the elements of YARN, and they work as master and slave:
the Resource Manager is the master and assigns resources to the slave, i.e. the
Node Manager. The Node Manager signals the master when it is about to start
work. Choosing Big Data Hadoop for your thesis will be a plus point for you.
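The MapReduce flow described above can be illustrated with a minimal
single-machine sketch of the classic word-count job. This is only a simulation
of the map, shuffle and reduce phases, not real Hadoop code: in an actual
cluster, the mappers and reducers run in parallel on different nodes, and YARN
schedules them onto the available resources.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

# Two "input splits" standing in for blocks stored in HDFS.
lines = ["big data hadoop", "hadoop stores big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 2, 'data': 2, 'hadoop': 2, 'stores': 1}
```

The same mapper and reducer logic, written as small scripts reading standard
input, is what Hadoop Streaming runs across a real cluster.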
Importance of Hadoop in Big Data
Hadoop is important in Big Data due to:
● Processing of huge chunks of data – With Hadoop, we can process and store huge amounts of data, mainly
data from social media and Internet of Things (IoT) applications.
● Computation power – The computation power of Hadoop is high, as it can process big data quickly by
distributing the work across many machines.
● Fault tolerance – Hadoop replicates data across nodes, which protects against hardware failure. If a
node in the cluster goes down, the other nodes continue to function and no data is lost.
● Flexibility – As much data as you require can be stored using Hadoop, and there is no requirement to
preprocess the data before storing it.
● Low cost – Hadoop is an open-source framework and free to use, and it runs on low-cost commodity
hardware to store large quantities of data.
● Scalability – The system can be grown easily just by adding nodes according to requirements. Minimal
administration is required.
Applications of Big Data
Government
Big Data is used within government services to improve cost efficiency, productivity and innovation. A
common example is the 2014 Indian general election, in which the BJP used big data analytics in its campaign.
Finance
Big Data is used in finance for market prediction. It is also used for compliance and regulatory reporting,
risk analysis, fraud detection, high-speed trading and analytics.
Healthcare
Big Data is used in healthcare services for clinical data analysis, disease pattern analysis, medical
device and medicine supply, drug discovery and various other analytics.
Media
Media uses Big Data for mechanisms like ad targeting, forecasting,
clickstream analytics, campaign management and loyalty programs. It is mainly
focused on the following three points:
Targeting consumers
Capturing of data
Data journalism
Information Technology
Big Data has helped professionals in Information Technology to work more
efficiently and has supported the widespread adoption of information technology.
Challenges of Big Data
The main challenges of Big Data are:
Data storage and quality of data – Data is
growing at a fast pace as the number of companies and
organizations grows. Storing this data properly
has become a challenge.
Lack of big data analysts – There is a huge demand for
data scientists and analysts who can understand and
analyze this data.
Quality analysis – The data must also be accurate, as
inaccurate data can lead to wrong decisions that
affect the company's business.
Security and privacy of data – Security and privacy
are among the biggest risks in big data.
Thanks!
Techsparks, 2nd floor, D-185,
Phase 8B, Industrial Area,
Sahibzada Ajit Singh Nagar,
Mohali, Punjab 160055
+91-9465330425
http://www.techsparks.co.in/
techsparks2013@gmail.com
Contact Us