2. BIG DATA
Big data is a term for data sets that are so large or complex that traditional data-processing application software is inadequate to deal with them.
Challenges include capture, storage, analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
5. Need for Big Data
Over 2.5 exabytes (2.5 billion gigabytes) of data are generated every day.
A typical large stock exchange captures more than 1 TB of data every day.
There are around 5 billion mobile phones (including 1.75 billion smartphones) in the world.
6. The 4 V's by IBM
Volume: the size of the data.
Velocity: the rate at which data is generated and analyzed.
Variety: the types of data, e.g. .jpg, .mp4, .txt, .xml.
Veracity: how accurate and trustworthy the data is, i.e. the degree of uncertainty in it.
8. Types of Big data
Structured data (e.g. relational database tables)
Unstructured data (e.g. free text, images, video)
Semi-structured data (e.g. XML, JSON)
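The three types can be illustrated with a small Python sketch (the sample records below are made up for illustration only):

```python
import csv
import io
import json

# Structured data: fixed schema, like rows in a relational table or CSV file.
structured = "id,name,age\n1,Alice,30\n2,Bob,25\n"
rows = list(csv.DictReader(io.StringIO(structured)))
print(rows[0]["name"])  # every row has the same named fields

# Semi-structured data: self-describing but flexible schema, e.g. JSON or XML.
semi = '{"id": 1, "name": "Alice", "tags": ["admin", "beta"]}'
record = json.loads(semi)
print(record["tags"])  # fields and nesting can vary from record to record

# Unstructured data: no predefined model, e.g. free text, images, video.
unstructured = "Big data is growing day by day..."
print(len(unstructured.split()))  # only crude processing without extra parsing
```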
10. Hadoop
Hadoop is an open source, Java-based programming framework that supports
the processing and storage of extremely large data sets in a distributed
computing environment. It is part of the Apache project sponsored by the
Apache Software Foundation.
The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part based on the MapReduce programming model.
13. HIVE
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop (used for structured and semi-structured data analysis and processing).
It resides on top of Hadoop to summarize Big Data, and makes querying and
analyzing easy.
Hive was initially developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by many companies: for example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not a relational database.
15. MapReduce
MapReduce is a programming model for processing huge amounts of data. Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++. MapReduce programs are parallel in nature and are therefore very useful for performing large-scale data analysis across multiple machines in a cluster.
MapReduce works in two phases:
1. Map phase
2. Reduce phase
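The two phases can be sketched in plain Python as a word-count example (the function names and sample input are illustrative; a real Hadoop job would implement Mapper and Reducer classes in Java, or use Hadoop Streaming for other languages):

```python
from collections import defaultdict

def map_phase(line):
    # Map phase: emit a (key, value) pair for every word in an input line.
    for word in line.lower().split():
        yield (word, 1)

def reduce_phase(word, counts):
    # Reduce phase: combine all values emitted for a single key.
    return (word, sum(counts))

lines = ["Hadoop runs MapReduce jobs", "MapReduce jobs scale out"]

# Shuffle/sort step: group intermediate pairs by key
# (the Hadoop framework does this between the two phases).
grouped = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        grouped[word].append(count)

result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)  # word -> total count across all input lines
```

On a cluster, many map tasks run in parallel over different blocks of the input, and many reduce tasks each handle a disjoint subset of the keys, which is what makes the model scale.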
17. Conclusion
Data is growing day by day, and the only way to manage such huge amounts of data is big data technology.
Big data software includes:
Apache Hadoop
Hive
MapReduce
Scala
Spark, etc.