1. Presentation on Big Data
Presented by:
Takrim Ul Islam Laskar(120103006)
Anurag Prasad(120103024)
2. CONTENTS
• 1> What is Big Data?
• 2> Why Big Data?
• 3> Who Is Generating Big Data?
• 4> Characteristics of Big Data.
• 5> What Technology Do We Have For Big Data?
3. Introduction
• What is big data?
Big data is an all-encompassing term for any collection of data
sets so large and complex that it becomes difficult to process
using on-hand data management tools or traditional data
processing applications.
4. Big data is defined as any kind of data source that has at least three
shared characteristics:
✓ Extremely large Volumes of data
✓ Extremely high Velocity of data
✓ Extremely wide Variety of data.
6. When we are dealing with so much information in so many
different forms, it is impossible to think about data management
in traditional ways. That is where the opportunities and challenges
of BIG DATA arise.
8. Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
9. Life cycle of BIG DATA Management
capture
organize
integrate
analyze
act
11. 1. Scale (Volume)
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
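The "44x" figure follows directly from the two volumes quoted on this slide; a quick sanity check (the 2009 and 2020 figures are the slide's estimates, not independently verified here):

```python
# Growth figures quoted on the slide: 0.8 ZB in 2009, an
# estimated 35 ZB in 2020.
start_zb = 0.8   # zettabytes in 2009
end_zb = 35.0    # projected zettabytes in 2020

growth = end_zb / start_zb
print(f"Growth factor: {growth:.1f}x")  # ~43.8x, i.e. roughly "44x"
```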
12. From the beginning of recorded time until 2003,
we created 5 billion gigabytes (5 exabytes) of data.
In 2011, the same amount was created every two days.
By 2013, the same amount of data was created every 10 minutes.
13. 2. Variety
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media
data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of
data
14. 3. Velocity
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions mean missed opportunities
16. HDFS ( Hadoop Distributed File System)
The Hadoop Distributed File System (HDFS) is the primary storage system used
by Hadoop applications.
Hadoop is an open-source software framework for storage and large-scale
processing of data-sets on clusters of commodity hardware.
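HDFS stores each file as a sequence of large, fixed-size blocks replicated across the machines of the cluster (modern HDFS defaults to 128 MB per block). A minimal, single-process sketch of the splitting idea, with a toy block size for illustration:

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a byte string into fixed-size blocks, HDFS-style.

    The last block may be shorter than block_size, just as the
    final block of a file in HDFS usually is.
    """
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

file_bytes = b"x" * 1000          # stand-in for a large file
blocks = split_into_blocks(file_bytes, block_size=300)
print([len(b) for b in blocks])   # [300, 300, 300, 100]
```

In a real cluster each of these blocks would be written to several different DataNodes, which is what lets Hadoop survive the failure of commodity hardware.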
17. Map/Reduce Program
MapReduce was designed by Google. It is a framework for writing and executing
distributed, fault-tolerant algorithms. A map function divides a large problem
into smaller sub-problems and applies the same operation to each of them; a
reduce function then combines the partial results.
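The map/reduce pattern described above can be sketched in a few lines with the classic word-count example. This is a toy, single-process illustration only; a real MapReduce job runs the map and reduce phases distributed across a cluster, with a shuffle step in between:

```python
from collections import defaultdict

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: combine the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data is big", "data moves fast"]
# The "shuffle" step of real MapReduce groups mapper output by key
# before reducing; here we simply collect all pairs in one list.
all_pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(all_pairs))
# {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```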
18. Sqoop (SQL-to-HADOOP)
Sqoop is a command-line interface application for transferring data between
relational databases and Hadoop.
Hive & Pig
Hive was created by Facebook and is SQL-like, while Pig was created by Yahoo and
is more procedural; both compile their queries into MapReduce jobs. Because
writing raw MapReduce programs is complex, HiveQL was created to combine the
familiarity of SQL with the scalability of MapReduce.
19. TOPIC FOR NEXT SEMINAR
1. Technology Used In Big Data
2. Big Data Architecture
3. Big Data Management
20. References:
1. YouTube lecture videos on the channel ‘Training on Big Data and
Hadoop’ by user ‘Edureka’.
2. ‘White Book of Big Data’ by Fujitsu.
3. ‘Big Data For Dummies’, published by Wiley.
4. Research paper by Kalapriya Kannan, IBM Research Labs.