What is Big Data?
Big Data is a collection of data so large and complex that
traditional database processing applications find it
difficult to store and process.
At what point does data become big data?
A common assumption is that data becomes big data once it
passes a certain size, and that anything smaller is small
data. This is not the case.
Example: Company A asks two clients to process its data.
Company A (1 TB of data): "Hey, I want to process my 1 TB of data."
Client C (handles up to 2 TB of data): "I will process your data, as my system’s processing capability is up to 2 TB (terabytes)."
Client B (handles up to 500 GB of data): "Sorry, I cannot process your data, because my system’s data processing capability is up to 500 GB only."
Big Data can start anywhere. It depends on the capability
of the organization.
For Client B, which cannot handle a processing request for
more than 500 GB of data, Company A’s 1 TB of data is Big
Data. For Client C, which can process up to 2 TB, the same
data is not.
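The Company A scenario can be restated as a small check. This is only a sketch: the function name `is_big_data` and the choice of 1 TB = 1000 GB are assumptions made for illustration, not part of the original slides.

```python
# Sizes are in gigabytes; 1 TB is taken as 1000 GB for simplicity (an assumption).
TB = 1000


def is_big_data(data_size_gb: float, capacity_gb: float) -> bool:
    """Data is 'big' for an organization when it exceeds what that organization can process."""
    return data_size_gb > capacity_gb


company_a_data = 1 * TB  # Company A wants 1 TB processed
capacities = {"Client B": 500, "Client C": 2 * TB}  # processing limits from the slides

# The same dataset is judged against each client's own capacity.
verdicts = {name: is_big_data(company_a_data, cap) for name, cap in capacities.items()}

for name, big in verdicts.items():
    print(f"{name}: {'big data' if big else 'manageable'}")
```

The same 1 TB dataset gets a different verdict from each client, which is the point of the example: the threshold belongs to the organization, not to the data.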
Classification of Big Data
Big Data is characterized by the 5 V’s, which help
determine which kinds of data will be difficult to process
and which will not.
The 5 V’s are:
• Volume
• Variety
• Velocity
• Veracity
• Value
Let us look at them one by one.
Volume
Volume refers to the amount of data.
Consider a simple scenario. Suppose a social media platform,
say Facebook, has 5 million users. These users exchange
pictures, share videos, and send or post messages,
generating terabytes or petabytes of data.
Over time the number of users is expected to grow, and the
amount of data generated grows with it.
Large amounts of data also result in large files.
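A rough back-of-the-envelope calculation shows how quickly such a scenario reaches terabytes and petabytes. Only the 5 million users come from the scenario above; the photos per user and the photo size are assumed values chosen purely for illustration.

```python
USERS = 5_000_000             # from the scenario above
PHOTOS_PER_USER_PER_DAY = 2   # assumption, for illustration only
PHOTO_SIZE_MB = 3             # assumption, for illustration only


def daily_volume_tb(users: int, items_per_user: int, item_size_mb: float) -> float:
    """Total data generated per day, in decimal terabytes (1 TB = 1,000,000 MB)."""
    total_mb = users * items_per_user * item_size_mb
    return total_mb / 1_000_000


daily_tb = daily_volume_tb(USERS, PHOTOS_PER_USER_PER_DAY, PHOTO_SIZE_MB)
yearly_pb = daily_tb * 365 / 1000  # 1 PB = 1000 TB

print(f"per day:  {daily_tb:.1f} TB")
print(f"per year: {yearly_pb:.2f} PB")
```

Under these assumptions, photos alone come to about 30 TB per day, before videos and messages are counted.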
Variety
Variety refers to the different types of data generated by
various sources.
Data can be:
• Structured data is organized into records with a fixed
format, so it is easy to query and analyse, e.g. tables.
• Semi-structured data is not stored in a repository such as
an RDBMS. Instead, it carries descriptive information (such
as tags) along with the data itself, e.g. XML documents,
log files.
• Unstructured data is not organized into any predefined
format. It is easy to store and open but hard to query or
analyse, e.g. photos, videos.
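The three kinds of data can be contrasted with a short sketch using only Python's standard library. All sample values (names, tags, bytes) are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, so any field can be looked up by name.
table = io.StringIO("user_id,name,country\n1,Asha,IN\n2,Ben,UK\n")
rows = list(csv.DictReader(table))
countries = [row["country"] for row in rows]

# Semi-structured: no fixed schema, but tags describe each value.
doc = ET.fromstring("<post><author>Asha</author><likes>12</likes></post>")
likes = int(doc.findtext("likes"))

# Unstructured: raw bytes with no fields to query; without further
# processing, little more than the size is known.
photo = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # placeholder bytes standing in for an image
photo_size = len(photo)

print(countries, likes, photo_size)
```

The structured and semi-structured samples can be queried by field or tag, while the unstructured sample offers nothing to query until it is analysed by some other means.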
Velocity
Velocity refers to the speed at which data is generated and
must be processed.
One way to gauge it is the number of users active per unit
of time.
More users generate more data in the same period, which in
turn affects how quickly that data can be processed.
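The effect of velocity can be sketched as a comparison of two rates: how fast data arrives and how fast it can be processed. The function name and the rates below are assumptions for illustration.

```python
def backlog_after(seconds: int, arrival_rate: int, processing_rate: int) -> int:
    """Unprocessed events left after `seconds`, given rates in events per second."""
    surplus_per_second = arrival_rate - processing_rate
    # A system that keeps up never accumulates a negative backlog.
    return max(0, surplus_per_second * seconds)


# Data arrives faster than it is processed: the backlog grows every second.
backlog = backlog_after(60, arrival_rate=12_000, processing_rate=10_000)

# Processing keeps up with arrivals: nothing is left waiting.
caught_up = backlog_after(60, arrival_rate=8_000, processing_rate=10_000)

print(backlog, caught_up)
```

When the arrival rate exceeds the processing rate, the backlog grows without bound, which is why velocity becomes a problem as the number of users rises.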
Veracity
Veracity refers to the trustworthiness of data. Data
collected from various sources will contain many
inconsistencies and uncertainties. When useful information
is extracted from such a large amount of data and the rest
is discarded, some pieces of data are bound to be lost in
the process.
We then have to fill in the gaps, and mine and process the
data again to achieve the desired goals.
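One simple way to "fill in the gaps" is to replace missing values with the average of the known ones. The slides do not name a method, so mean imputation here is an assumption, as are the sample readings.

```python
from statistics import mean

# Hypothetical sensor readings; None marks a value lost during collection.
readings = [20, None, 22, 24, None, 22]


def fill_gaps(values):
    """Replace missing entries with the mean of the known entries."""
    known = [v for v in values if v is not None]
    fallback = mean(known)
    return [fallback if v is None else v for v in values]


cleaned = fill_gaps(readings)
print(cleaned)
```

After the gaps are filled, the cleaned data can be mined and processed again. Other strategies, such as dropping incomplete records, may suit some datasets better.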
Value
Value is the meaningful information that data yields.
As the amount of data increases over time, a bigger problem
arises: how to extract useful data from it.
First, meaningful data has to be extracted from the
collection; then analytics has to be performed on the
extracted data.
The result of the analysis should be of some value.
Extracting value from a large amount of data is itself a
challenge.
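The two steps described above, extracting the meaningful data and then analysing it, can be sketched as follows. The event records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical raw event log.
events = [
    {"user": "u1", "action": "view", "item": "shoes"},
    {"user": "u2", "action": "purchase", "item": "shoes"},
    {"user": "u1", "action": "view", "item": "hat"},
    {"user": "u3", "action": "purchase", "item": "hat"},
    {"user": "u1", "action": "purchase", "item": "shoes"},
]

# Step 1: extract the meaningful subset (here, only purchases).
purchases = [e for e in events if e["action"] == "purchase"]

# Step 2: run analytics on the extracted data (units sold per item).
sales_per_item = Counter(e["item"] for e in purchases)
best_seller, units = sales_per_item.most_common(1)[0]

print(best_seller, units)
```

The result (which item sells best) is the kind of value the analysis is meant to produce. What counts as meaningful depends on the question being asked.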
Sources of Big Data
Some of the sources of Big Data are:
• Users
• Systems
• Applications and sensors
• Social media
• Small-scale, mid-scale and large-scale industries, and
so on
These sources generate ever larger amounts of data at
varying speeds and in varying formats. All these factors
create challenges for traditional database systems, which
gave rise to the term ‘Big Data’.
Problems with Big Data
• Storing exponentially growing datasets
• Processing data with complex structures, i.e. data that
can be structured, semi-structured or unstructured
• Speed of data processing
In other words, the big data problem arises from three
prime factors: Volume, Variety and Velocity.
Solution?
Apache Hadoop
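Hadoop's processing model, MapReduce, splits work into a map step and a reduce step so that it can be spread across many machines. The sketch below imitates that idea in plain Python on a single machine. It does not use any Hadoop API, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data is big", "data is everywhere"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle run in parallel on different machines, which is how this model copes with volume.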

Video 1 big data
