Big data analytics.
2. What is Big Data?
‘Big Data’ is similar to ‘small data’, but bigger in size.
Big data is a term for data sets that are so large or complex
that traditional data processing applications are inadequate
to deal with them.
5. Three Vs of Big Data
Velocity • Data speed
Volume • Data quantity
Variety • Data types
6. Volume
[Chart: projected data growth, 2010–2020]
The 4.4 zettabytes of data today will grow to 44 zettabytes (44 trillion gigabytes) by 2020.
A large amount of data is generated every second.
7. Variety
Different kinds of data are being generated from various sources:
Structured, Semi-Structured, and Unstructured.
9. Problems with Big Data
Problem 1: Storing exponentially growing huge data sets
Solution: A distributed file system
Problem 2: Processing data having complex structure
Solution: A storage system that does not impose a fixed schema on the data.
Problem 3: Processing data faster
10. Hadoop is the solution
Hadoop is a framework that allows us to store and process large data sets of
different types in a parallel and distributed fashion.
HDFS (Storage): stores various data formats across a cluster.
MapReduce (Processing): processes the data stored in HDFS in parallel.
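The MapReduce model named above can be sketched in plain Python. This is a toy, single-machine word count illustrating the map and reduce phases, not Hadoop's actual API; the function names are hypothetical:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["Hadoop stores data", "Hadoop processes data"]
word_counts = reduce_phase(map_phase(docs))
```

In real Hadoop, the map tasks run in parallel on the nodes holding the data blocks, and the framework shuffles the intermediate pairs to the reducers; the sequential pipeline here only shows the logical data flow.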
11. History of Hadoop
Hadoop was created by computer scientists Doug Cutting and
Mike Cafarella in 2005.
It was inspired by Google's MapReduce, a software framework
in which an application is broken down into numerous small
parts.
Doug named it after his son’s toy elephant.
12. Hadoop Distributed File System
HDFS creates a level of abstraction over the resources, from where we can see
the whole HDFS as a single unit.
HDFS has two core components: the Name Node and the Data Nodes.
The Name Node is the master node; it holds the metadata about the stored data.
The data itself is stored on the Data Nodes, which are commodity hardware in the
distributed environment.
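The Name Node / Data Node split can be sketched as a toy in-memory model. This is an illustration of the metadata-vs-data separation only; the class and method names are hypothetical and unrelated to Hadoop's real interfaces:

```python
# Toy model: the NameNode keeps only metadata (which DataNode holds
# which block); the DataNodes hold the actual bytes.
class DataNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}          # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

class NameNode:
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}       # filename -> [(block_id, node_id), ...]

    def write(self, filename, blocks):
        """Place blocks round-robin across DataNodes; record only metadata."""
        locations = []
        for i, block in enumerate(blocks):
            node = self.datanodes[i % len(self.datanodes)]
            block_id = f"{filename}_blk{i}"
            node.store(block_id, block)
            locations.append((block_id, node.node_id))
        self.block_map[filename] = locations

    def read(self, filename):
        """Reassemble a file by fetching each block from its DataNode."""
        data = b""
        for block_id, node_id in self.block_map[filename]:
            node = next(d for d in self.datanodes if d.node_id == node_id)
            data += node.blocks[block_id]
        return data
```

Note that the `NameNode` never touches file contents, only block locations; real HDFS also replicates each block across several Data Nodes, which this sketch omits.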
13. Storing Data (Solution: HDFS)
The storage unit of Hadoop.
It is a distributed file system.
It divides files (input data) into smaller blocks and stores them across the cluster.
It is scalable as per requirement.
Example: a 512 MB file is distributed as four 128 MB blocks.
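The 512 MB example above is simple block-size arithmetic. A minimal sketch, assuming the 128 MB block size used in the example (HDFS's block size is configurable):

```python
import math

BLOCK_SIZE_MB = 128  # block size assumed from the slide's example

def num_blocks(file_size_mb):
    """Number of blocks needed to store a file; the last block may be partial."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

# A 512 MB file splits into exactly four 128 MB blocks,
# while a 200 MB file needs two blocks (128 MB + a 72 MB remainder).
```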