Big_data_ppt

Presented By:
SADHANA SINGH
M.TECH 1ST YEAR (COMPUTER SCIENCE)

 Introduction
 Evolution of big data
 Characteristics
 Examples of big data generation
 Big data v/s RDBMS
 Hadoop
 HDFS
 MapReduce
 References

 Big data is a term for datasets that are so
large or complex that traditional data
processing applications are inadequate.
 Big data is also the capability to manage a huge
volume of disparate data, at the right speed
and within the right time frame, to allow
real-time analysis.

Wave 1: Creating manageable data structures
Wave 2: Web and content management
Wave 3: Managing big data

Source: Understanding Big Data: Analytics for Enterprise Class Hadoop and
Streaming Data

V's of big data
VOLUME
VELOCITY
VARIETY
VERACITY
VARIABILITY

Black box data
Social media data
Stock exchange data
Power grid data
Transport data
Search engine data

 Huge competition in the market:
Retail: customer analytics and predictive analytics
Travel: travel patterns of customers
Websites: understanding users' navigation patterns,
interests, conversion, etc.
 Sensors, satellite and geospatial data
 Military and intelligence

 Big data includes huge volume, high
velocity, and an extensible variety of data.
The data falls into three types:
• Structured data: relational data
• Semi-structured data: XML data
• Unstructured data: Word, PDF, text, media logs
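
The three data types can be illustrated with one small record held in three forms. This is only a sketch: the customer record, its field names, and the free-text note are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Structured: fixed, named columns, as in a row of a relational table.
row = {"id": 1, "name": "Asha", "city": "Delhi"}

# Semi-structured: self-describing tags, but no fixed schema is enforced.
xml_text = "<customer id='1'><name>Asha</name><city>Delhi</city></customer>"
customer = ET.fromstring(xml_text)
xml_city = customer.findtext("city")

# Unstructured: free text; any structure has to be extracted,
# here with a simple (and fragile) pattern match.
note = "Customer Asha called from Delhi about a late delivery."
match = re.search(r"from (\w+)", note)
text_city = match.group(1) if match else None
```

The same fact (the customer's city) is a direct lookup in the structured form, a tag lookup in the semi-structured form, and a guess based on wording in the unstructured form.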

Relational database vs. big data
Relational database:
 Single-computer platform that scales with better CPUs; centralized processing.
 Relational database (SQL); centralized storage.
 Batched, descriptive, centralized analytics.
Big data:
 Cluster platforms that scale to thousands of nodes; distributed processing.
 Non-relational databases that manage varied data types and formats (NoSQL); distributed storage.
 Real-time, predictive and prescriptive, distributed analytics.

 An open-source framework from the Apache
Software Foundation.
 It allows distributed processing of large
datasets across clusters of computers using
simple programming models.
 Hadoop runs applications using the
MapReduce algorithm, in which the data is
split and processed in parallel on different nodes.
 It uses the concept of data locality: computation
is moved to the nodes where the data is stored.
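
Data locality can be sketched as a scheduling preference: run each task on a node that already stores its input block, and fall back to another node only when no such node is free. The sketch below is a simplified illustration, not Hadoop's actual scheduler; the function name, node names, and block map are hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring nodes that store the block.

    block_locations: dict block_id -> list of nodes holding a replica (hypothetical)
    free_slots: dict node -> number of tasks the node can still accept
    Returns dict block_id -> (node, is_data_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, holders in block_locations.items():
        # Prefer a node that already stores the block (data-local task).
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node = local[0]
        else:
            # Otherwise use any free node; the block must then cross the network.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            node = candidates[0]
        slots[node] -= 1
        assignment[block] = (node, node in holders)
    return assignment


blocks = {"b1": ["node1"], "b2": ["node1", "node2"], "b3": ["node3"]}
result = assign_tasks(blocks, {"node1": 1, "node2": 1, "node3": 0, "node4": 1})
```

In this example b1 and b2 run where their data lives, while b3 runs on node4 because node3 has no free slot.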

 Processing/computation layer (MapReduce), and
 Storage layer (Hadoop Distributed File System).
Source: Hadoop tutorial on www.tutorialspoint.com

 The Hadoop framework allows the user to
quickly write and test distributed systems. It
is efficient: it automatically distributes the
data and work across the machines and, in
turn, utilizes the underlying parallelism of
the CPU cores.
 Hadoop does not rely on hardware to
provide fault tolerance and high availability
(FTHA); rather, the Hadoop library itself has been
designed to detect and handle failures at the
application layer.
 Hadoop is designed to be self-healing.

 HDFS is a file system designed for storing
very large files with streaming data access
patterns, running on clusters of commodity
hardware.
 It can be defined as "a reliable,
high-bandwidth, low-cost data-storage cluster
that facilitates the management of related
files across machines."

Basic architecture of HDFS
source: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2.

Replica placement
source: Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
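
The default replica placement strategy described in Hadoop: The Definitive Guide puts the first replica on the writer's node, the second on a node in a different rack, and the third on a different node in the same rack as the second. The sketch below imitates that rule under simplifying assumptions: a 128 MB block size, a replication factor of 3, and a hypothetical rack map. It is not the HDFS implementation, and the function names are made up.

```python
import math
import random

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB


def num_blocks(file_size):
    """Number of blocks needed to store a file of file_size bytes."""
    return math.ceil(file_size / BLOCK_SIZE)


def place_replicas(writer, topology, rng=random):
    """Pick three nodes for one block.

    writer: node that writes the block (hypothetical name)
    topology: dict rack -> list of nodes in that rack (hypothetical)
    """
    writer_rack = next(r for r, nodes in topology.items() if writer in nodes)
    # Racks other than the writer's that can hold two distinct replicas.
    remote_racks = [
        r for r, nodes in topology.items() if r != writer_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need another rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    # Second and third replicas: two different nodes on the same remote rack.
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas("n1", topology)
blocks_needed = num_blocks(300 * 1024 * 1024)
```

With this layout a single node failure or a single rack failure still leaves at least one replica reachable, while only one copy has to cross between racks.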

 Hadoop MapReduce is an implementation of the
MapReduce algorithm.
 MapReduce is a batch query processor; the
ability to run an ad hoc query against the
whole dataset and get the results in a
reasonable time is transformative.

source: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2.

 Example: air temperature analysis.
 Problems:
 Dividing the work into equal-size pieces is not
easy.
 Combining the results from independent processes
may require further processing.
 The processing capacity of a single machine is
limited.
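
The air temperature example can be sketched as a MapReduce job that finds the maximum temperature per year. The code below simulates the map, shuffle, and reduce phases in a single Python process; it does not use the Hadoop API. The "year,temperature" record format and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def map_record(line):
    """Map: parse one 'year,temperature' line and emit a (year, temperature) pair."""
    year, temp = line.split(",")
    yield year, int(temp)


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_max(year, temps):
    """Reduce: keep only the highest temperature seen for a year."""
    return year, max(temps)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of input lines."""
    mapped = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(mapped)
    return dict(reduce_max(year, temps) for year, temps in sorted(grouped.items()))


records = ["1949,111", "1950,0", "1950,22", "1949,78", "1950,-11"]
max_temps = run_job(records)
```

Because each map call looks at one record and each reduce call at one key, a framework can split the input into pieces and combine the partial results itself, which addresses the first two problems listed above.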

i. J. Hurwitz, et al., "Big Data for Dummies," Wiley, 2013, ISBN: 978-1-118-50422-2.
ii. http://www.cse.wustl.edu/~jain/cse570-13/
iii. Hadoop: The Definitive Guide, by Tom White, 2015, ISBN: 978-1-491-90163-2
iv. Hadoop tutorials on www.tutorialspoint.com
