Presentation on Big Data
Presented by:-
Takrim Ul Islam Laskar(120103006)
Anurag Prasad(120103024)
CONTENTS
• 1> What is Big Data?
• 2> Why Big Data?
• 3> Who are Generating Big Data?
• 4> Characteristics of Big Data.
• 5> What Technology Do We Have For Big Data?
Introduction
• What is big data?
Big data is an all-encompassing term for any collection of data
sets so large and complex that it becomes difficult to process
using on-hand data management tools or traditional data
processing applications.
• Big data is defined as any kind of data source that has at least three
shared characteristics:
✓ Extremely large Volumes of data
✓ Extremely high Velocity of data
✓ Extremely wide Variety of data.
Why Big Data?
When we are dealing with so much information in so many
different forms, it is impossible to think about data management
in traditional ways. That is when the opportunities and challenges
of big data arise.
Who’s Generating Big Data?
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
Life cycle of Big Data Management
capture
organize
integrate
analyze
act
Characteristics of Big Data
1. Scale (Volume)
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
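As a quick check of the figures above, the growth factor and the implied average yearly growth can be worked out directly. This is a minimal sketch: the function names are ours, and the constant yearly rate is a simplifying assumption, not something the source states.

```python
def growth_factor(start_zb: float, end_zb: float) -> float:
    """Return how many times larger the end volume is than the start volume."""
    return end_zb / start_zb


def yearly_growth_rate(start_zb: float, end_zb: float, years: int) -> float:
    """Return the constant yearly rate that turns start_zb into end_zb.

    Assumes smooth compound growth, which is a simplification.
    """
    return (end_zb / start_zb) ** (1 / years) - 1


if __name__ == "__main__":
    factor = growth_factor(0.8, 35.0)                   # roughly 44x
    rate = yearly_growth_rate(0.8, 35.0, 2020 - 2009)   # roughly 41% per year
    print(f"Growth factor: {factor:.2f}x")
    print(f"Implied yearly growth: {rate:.1%}")
```

Under that assumption, a 44x increase over 11 years corresponds to data volume growing by roughly 41% every year.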
From the beginning of recorded time until 2003,
we created 5 billion gigabytes (5 exabytes) of data.
In 2011, the same amount was created every two days.
In 2013, the same amount was created every 10 minutes.
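To compare these figures, it helps to convert them to a common unit. The sketch below takes the 5 billion gigabytes quoted above (5 exabytes) and expresses each period as exabytes per day. The names are illustrative and only the slide's own numbers are used.

```python
VOLUME_EB = 5.0            # 5 billion gigabytes = 5 exabytes (figure from the slide)
MINUTES_PER_DAY = 24 * 60


def exabytes_per_day(volume_eb: float, period_minutes: float) -> float:
    """Convert 'volume_eb created every period_minutes' into exabytes per day."""
    return volume_eb * MINUTES_PER_DAY / period_minutes


rate_2011 = exabytes_per_day(VOLUME_EB, 2 * MINUTES_PER_DAY)  # every two days
rate_2013 = exabytes_per_day(VOLUME_EB, 10)                   # every 10 minutes
speedup = rate_2013 / rate_2011

print(rate_2011)  # 2.5
print(rate_2013)  # 720.0
print(speedup)    # 288.0
```

Taken at face value, the quoted figures mean the rate of data creation grew 288-fold between 2011 and 2013.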
2. Variety
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media
data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of
data
3. Velocity
• Data is being generated fast and needs to be processed
fast
• Online Data Analytics
• Late decisions lead to missed opportunities
What Technology Do We Have For Big Data?
• HDFS (Hadoop Distributed File System)
The Hadoop Distributed File System (HDFS) is the primary storage system used
by Hadoop applications.
Hadoop is an open-source software framework for storage and large-scale
processing of data-sets on clusters of commodity hardware.
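The idea of storing one large file across many commodity machines can be pictured with a toy model. HDFS splits files into blocks and keeps several copies of each block on different nodes. The sketch below is not the HDFS API: the tiny block size, the node names, and the round-robin placement are all illustrative assumptions (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3)
    for block_id, holders in placement.items():
        print(block_id, blocks[block_id], holders)
```

Because every block lives on more than one node, losing a single machine does not lose the data, which is what makes cheap hardware usable.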
• Map/Reduce Program
MapReduce was designed by Google. It is a framework for writing and executing
distributed, fault-tolerant algorithms. It is built on two functions: map, which
divides a large problem into smaller problems and performs the same function on
each of them, and reduce, which then combines the results.
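The map and reduce steps can be illustrated with the classic word-count example. This is a single-process sketch in plain Python, not Hadoop code. The function names and the explicit shuffle step (grouping values by key between map and reduce) are our own illustration of the pattern.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one line of text into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all counts for one word into a total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
    # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real cluster each line (or file block) would be mapped on a different machine, and the framework would handle the grouping and any failed tasks.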
• Sqoop (SQL-to-Hadoop)
Sqoop is a command-line interface application for transferring data between
relational databases and Hadoop.
• Hive & Pig
Hive was created by Facebook and is SQL-like, while Pig was created by Yahoo and
is more procedural; both target MapReduce jobs. Because writing MapReduce jobs
directly is complex, HiveQL was created to combine the best features of SQL with
MapReduce.
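The appeal of a SQL-like layer is that an aggregation becomes a single declarative query instead of hand-written map and reduce functions. The sketch below uses Python's built-in sqlite3 purely as a stand-in to show that style. It is not Hive and does not run MapReduce, and the table and column names are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def count_words_sql(words: Iterable[str]) -> List[Tuple[str, int]]:
    """Count occurrences of each word with one GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany(
            "INSERT INTO words (word) VALUES (?)", [(w,) for w in words]
        )
        rows = conn.execute(
            "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY word"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    print(count_words_sql(["big", "data", "big"]))
    # [('big', 2), ('data', 1)]
```

A query of this shape is the kind of statement a SQL-like tool would translate into map and reduce steps on the user's behalf.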
TOPIC FOR NEXT SEMINAR
1. Technology Used In Big Data
2. Big Data Architecture
3. Big Data Management
References:
1. YouTube lecture video on the channel ‘Training on Big Data and
Hadoop’ by user ‘Edureka’.
2. ‘White Book of Big Data’ by Fujitsu.
3. ‘Big Data For Dummies’ (A Wiley Brand).
4. Research paper by Kalapriya Kannan, IBM Research Labs.
THANK YOU.
We appreciate your patience.
