Big data analytics.

What is Big data?
• ‘Big Data’ is similar to ‘small data’, but bigger in size.
• Big data is a term for data sets that are so large or complex
that traditional data processing applications are inadequate
to deal with them.

Evolution of Technology: Conventional Systems to Smart Systems
• Telephone → Mobile
• Desktop → Cloud
• Car → Smart Car

Social Media
• 204,000,000 emails
• 1,736,111 pics
• 4,166,667 likes and 200,000 pics
• 300 hours of video uploaded
• 347,222 tweets

Three Vs of Big Data
• Velocity: data speed
• Volume: data quantity
• Variety: data types

Volume
[Chart: data volume by year, 2010 to 2020]
The 4.4 zettabytes of data that exist today will grow to 44 zettabytes,
or 44 trillion gigabytes, by 2020.
A large amount of data is generated every second.
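
The unit conversion on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where 1 zettabyte is 10^21 bytes and 1 gigabyte is 10^9 bytes; the function name is made up for illustration.

```python
# Decimal (SI) unit sizes in bytes. This is an assumption of the sketch;
# binary units (zebibytes, gibibytes) would give different numbers.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

def zettabytes_to_gigabytes(zb):
    """Convert a size in zettabytes to gigabytes using decimal units."""
    return zb * (BYTES_PER_ZB // BYTES_PER_GB)

gb_2020 = zettabytes_to_gigabytes(44)
print(f"44 ZB = {gb_2020:,} GB")

# Growth factor between the two figures quoted on the slide.
growth_factor = 44 / 4.4
print(f"Growth factor: about {growth_factor:.0f}x")
```

Under these assumptions one zettabyte is one trillion gigabytes, so 44 zettabytes is 44 trillion gigabytes, roughly ten times the 4.4 zettabyte starting point.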

Variety
Different kinds of data are being generated from various sources.
• Structured
• Semi-structured
• Unstructured
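
To make the three categories concrete, here is a small Python sketch that reads one sample of each kind. The sample records are invented for illustration and do not come from the slides.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured = "id,name,age\n1,Asha,31\n2,Ravi,28\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing records whose fields can vary (hypothetical sample).
semi_structured = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "x@example.com"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no predefined fields (hypothetical sample).
unstructured = "Great phone, battery lasts two days!"
words = unstructured.split()

print(rows[0]["name"])              # column lookup works on every row
print(sorted(records[1].keys()))    # each record may carry different keys
print(len(words))                   # only generic operations such as word counts apply
```

The structured sample can be queried by column name, the semi-structured sample has to be inspected record by record, and the unstructured sample offers no fields at all until some processing extracts them.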

Velocity
Mainframe → Client/Server → Internet → Mobile, Social Media, Cloud …
Data is being generated at an alarming rate.
Every 60 seconds:
• 100,000+ tweets
• 650,000+ status updates
• 11,000,000+ instant chat messages
• 698,445+ Google searches
• 168,000,000+ emails
• 217+ new users
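
The per-minute figures above translate into per-second rates by simple division. The sketch below uses the slide's lower-bound counts and reads the instant-chat figure as 11,000,000; the dictionary keys are just labels chosen for this example.

```python
# Lower-bound counts per 60 seconds, taken from the slide.
per_minute = {
    "tweets": 100_000,
    "status updates": 650_000,
    "instant chat messages": 11_000_000,  # slide figure read as 11,000,000
    "Google searches": 698_445,
    "emails": 168_000_000,
    "new users": 217,
}

# Divide each count by 60 to get an approximate per-second rate.
per_second = {name: count / 60 for name, count in per_minute.items()}

for name, rate in per_second.items():
    print(f"{name}: about {rate:,.0f} per second")
```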

Problems with Big Data
Problem 1: Storing huge, exponentially growing data sets
Solution: A distributed file system
Problem 2: Processing data with a complex structure
Solution: Storage that does not impose any particular schema on the
data.
Problem 3: Processing data faster

Hadoop is the solution
Hadoop is a framework that allows us to store and process large data sets of
different types in a parallel and distributed fashion.
• HDFS (Storage): allows storing various data formats across a data cluster
• MapReduce (Processing): allows parallel processing of data stored in HDFS
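
The MapReduce processing model can be illustrated with a word count written in plain Python. This is a single-process sketch of the idea, not the Hadoop API: the function names and the input chunks are assumptions made for this example, and in a real cluster the map calls would run in parallel on the nodes that hold each chunk.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

# Hypothetical input, already split into chunks as if stored on separate nodes.
chunks = ["big data is big", "data is everywhere"]

# Map every chunk independently, then group by word and reduce each group.
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what makes the model suitable for data stored across a cluster.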

History of Hadoop
• Hadoop was created by computer scientists Doug Cutting and
Mike Cafarella in 2005.
• It was inspired by Google's MapReduce, a software framework
in which an application is broken down into numerous small
parts.
• Doug Cutting named it after his son’s toy elephant.

Hadoop Distributed File System
HDFS creates a level of abstraction over the underlying resources, so that
the whole of HDFS can be viewed as a single unit.
HDFS has two core components: the NameNode and the DataNode.
The NameNode is the main node; it holds metadata about the data
stored.
The data itself is stored on the DataNodes, which are commodity hardware in
the distributed environment.
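
The division of labour between the two components can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the class and method names are not the Hadoop API, block placement is simple round-robin, there is no replication, and data passes through the NameNode object only to keep the sketch short (in HDFS the NameNode handles metadata and clients exchange data with DataNodes directly).

```python
class DataNode:
    """Holds the actual block contents."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""
    def __init__(self, data_nodes):
        self.data_nodes = data_nodes
        self.block_map = {}  # filename -> list of (block_id, node name)

    def write(self, filename, data, block_size):
        placements = []
        for offset in range(0, len(data), block_size):
            index = offset // block_size
            block_id = f"{filename}#{index}"
            # Round-robin placement, chosen for this sketch only.
            node = self.data_nodes[index % len(self.data_nodes)]
            node.blocks[block_id] = data[offset:offset + block_size]
            placements.append((block_id, node.name))
        self.block_map[filename] = placements

    def read(self, filename):
        nodes_by_name = {node.name: node for node in self.data_nodes}
        # Reassemble the file by following the metadata, block by block.
        return b"".join(
            nodes_by_name[node_name].blocks[block_id]
            for block_id, node_name in self.block_map[filename]
        )

nodes = [DataNode("dn1"), DataNode("dn2")]
name_node = NameNode(nodes)
name_node.write("log.txt", b"abcdefghij", block_size=4)
print(name_node.block_map["log.txt"])
print(name_node.read("log.txt"))
```

The caller only ever names the file; the metadata decides which node holds which block, which is the "single unit" view described above.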

Storing Data (Solution is HDFS)
HDFS is the storage unit of Hadoop.
It is a distributed file system.
It divides files (input data) into smaller chunks and stores them across the cluster.
It is scalable as per requirement.
Example: a 512 MB file is distributed as four 128 MB blocks.
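
The 512 MB example can be reproduced with a short calculation. This sketch takes the 128 MB block size from the slide's example; the function name is hypothetical, and the second call shows what happens when the file size is not an exact multiple of the block size.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes in MB of the blocks a file would be divided into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover data
    return sizes

print(split_into_blocks(512))  # [128, 128, 128, 128]
print(split_into_blocks(300))  # [128, 128, 44]
```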
