20CS5904
Fundamental of Big Data
COURSE OUTCOMES
Students will be able to
CO1- Identify the characteristics and challenges of big data
analytics
CO2- Implement the Hadoop and MapReduce framework for
processing massive volumes of data
CO3- Analyze structured and semi-structured data using
Hive and Pig
CO4- Implement CRUD operations effectively using MongoDB
and generate reports using Jaspersoft Studio
CO5- Explore the usage of Hadoop and its integration tools to
manage Big Data and use visualization techniques
UNIT-1
Classification of Digital Data - Introduction to Big
Data: Characteristics – Evolution – Definition -
Challenges with Big Data - Other Characteristics of
Data - Why Big Data - Traditional Business Intelligence
versus Big Data - Data Warehouse and Hadoop
Environment.
Total Periods: 9
UNIT-1
Introduction to Big Data
Today we are surrounded by data, and large amounts of it are
generated from many sources. Consider the following:
• Facebook hosts more than 240 billion photos and
695,000 status updates, with its data growing at a rate of 7
petabytes per month
• On Twitter, 98,000+ tweets are posted every second
• A single jet engine can generate 10+ terabytes of
data in 30 minutes of flight time. With many
thousands of flights per day, the data generated reaches
many petabytes
• The New York Stock Exchange generates
about 4-5 terabytes of new trade data per day
• Walmart handles more than 1 million customer
transactions every hour
• Every day we create 2.5 quintillion bytes of
data
• 90% of the data in the world today has been
created in the last two years alone
Introduction to Big Data
What is Data?
• The quantities, characters, or symbols on which operations are performed
by a computer, which may be stored and transmitted in the form of
electrical signals and recorded on magnetic, optical, or mechanical
recording media.
What is Big Data?
• Big Data is also data, but of a huge size.
• Big Data is a term used to describe a collection of data that is huge in
volume and yet growing exponentially with time.
• In short, such data is so large and complex that none of the traditional data
management tools can store or process it efficiently.
Tabular Representation of Various Memory Sizes
Name       Equal To          Size (in bytes)
Bit        1 bit             1/8
Nibble     4 bits            1/2 (rare)
Byte       8 bits            1
Kilobyte   1,024 bytes       1,024
Megabyte   1,024 kilobytes   1,048,576
Gigabyte   1,024 megabytes   1,073,741,824
Terabyte   1,024 gigabytes   1,099,511,627,776
Petabyte   1,024 terabytes   1,125,899,906,842,624
Exabyte    1,024 petabytes   1,152,921,504,606,846,976
Zettabyte  1,024 exabytes    1,180,591,620,717,411,303,424
Yottabyte  1,024 zettabytes  1,208,925,819,614,629,174,706,176
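Each row of the table from Kilobyte upward is the previous row multiplied by 1,024, so the sizes can be reproduced with a few lines of Python. This is a minimal sketch; the names `UNITS`, `unit_sizes` and `to_unit` are illustrative and not part of the course material.

```python
# Binary (1024-based) unit names, in increasing order of size.
UNITS = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]


def unit_sizes():
    """Return a dict mapping each unit name to its size in bytes."""
    return {name: 1024 ** power for power, name in enumerate(UNITS, start=1)}


def to_unit(num_bytes, unit):
    """Convert a byte count to the given unit (hypothetical helper)."""
    return num_bytes / unit_sizes()[unit]


if __name__ == "__main__":
    # Print every unit with thousands separators, as in the table.
    for name, size in unit_sizes().items():
        print(f"{name:<10} {size:,}")
```

Python integers have arbitrary precision, so even the yottabyte value is computed exactly.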
Characteristics Of Big Data
1. Volume
2. Velocity
3. Variety
4. Veracity
1. Volume:
• Volume means “how much data is generated”. Nowadays,
organizations, people and systems generate or receive
vast amounts of data, from terabytes (TB) to petabytes (PB) to
exabytes (EB) and more.
VOLUME = Very large amount of data
2. Velocity:
• Velocity means “how fast data is produced”. Nowadays, organizations,
people and systems generate huge amounts of data at a
very fast rate.
VELOCITY = Data produced at a very fast rate
3. Variety:
• Variety means “different forms of data”. Nowadays, organizations,
people and systems generate huge amounts of data at a
very fast rate and in different formats. The different formats of data
are discussed under Types of Digital Data.
VARIETY = Data produced in different formats
4. Veracity:
• Veracity means “the quality, correctness or accuracy of captured
data”.
• Of the 4 Vs, it is the most important V for any Big Data solution.
• Without correct data, there is no use in storing large
amounts of data at a fast rate and in different formats. The data should
deliver correct business value.
Types of Digital Data
1. Structured
2. Unstructured
3. Semi-structured
Structured
• Any data that can be stored, accessed and processed in a
fixed format is termed 'structured' data.
• Issues arise when the size of such data grows to a huge
extent; typical sizes are in the range of multiple zettabytes.
• Examples of Structured Data
An 'Employee' table in a database is an example of structured data:
Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
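Because every row of the Employee table follows the same fixed schema, it can be loaded into a relational database and queried with SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The choice of SQLite, the column types, and the function names `build_db` and `average_salary_by_department` are assumptions made for illustration.

```python
import sqlite3

# Rows copied from the 'Employee' table above.
EMPLOYEES = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]


def build_db():
    """Create an in-memory database holding the Employee table."""
    conn = sqlite3.connect(":memory:")
    # The fixed schema is what makes this data 'structured'.
    conn.execute(
        """CREATE TABLE Employee (
               Employee_ID    INTEGER PRIMARY KEY,
               Employee_Name  TEXT,
               Gender         TEXT,
               Department     TEXT,
               Salary_In_lacs INTEGER
           )"""
    )
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
    return conn


def average_salary_by_department(conn):
    """Return {department: average salary} using a plain SQL aggregate."""
    rows = conn.execute(
        "SELECT Department, AVG(Salary_In_lacs) FROM Employee "
        "GROUP BY Department"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(average_salary_by_department(build_db()))
```

The query works without any parsing or cleaning step because the column names and types are known in advance.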
Unstructured
• Any data whose form or structure is unknown is classified as unstructured data.
• In addition to being huge in size, unstructured data poses multiple challenges in terms
of processing it to derive value.
• A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc.
• Nowadays organizations have a wealth of data available to them but, unfortunately, they
don't know how to derive value from it, since this data is in its raw, unstructured
form.
• Examples of Unstructured Data
The output returned by 'Google Search'
Semi-structured
• Semi-structured data can contain both forms of data.
• Semi-structured data appears structured in form, but it is not actually
defined by, e.g., a table definition in a relational DBMS.
• An example of semi-structured data is data represented in an XML file.
• Examples of Semi-structured Data
Personal data stored in an XML file:
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
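The tags in these records describe the data, even though no table definition exists. The sketch below reads the records with Python's standard `xml.etree.ElementTree` module. A well-formed XML document needs a single root element, so the records are wrapped in a hypothetical `<people>` element that is not part of the original example; the function name `parse_records` is also illustrative.

```python
import xml.etree.ElementTree as ET

# The five <rec> elements from the example, wrapped in an assumed
# <people> root element so that the text is a well-formed document.
XML_DATA = """<people>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
</people>"""


def parse_records(xml_text):
    """Turn each <rec> element into a dict, converting age to int."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("rec"):
        records.append({
            "name": rec.findtext("name"),
            "sex": rec.findtext("sex"),
            "age": int(rec.findtext("age")),
        })
    return records


if __name__ == "__main__":
    for person in parse_records(XML_DATA):
        print(person)
```

Unlike the relational case, nothing stops one record from adding or omitting a tag, which is why the structure has to be discovered while reading the data.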
Challenges in Big Data
Evolution of Big Data
Business Intelligence vs Big Data
• Traditional BI methodology is based on the principle of
grouping all business data into a central server.
• A Big Data solution differs from BI in many respects.
These are the main differences between
Big Data and Business Intelligence:
• In a Big Data environment, information is stored on a
distributed file system, rather than on a central server.
• A distributed file system is a much safer and more flexible space.
• Big Data solutions carry the processing functions to the data,
rather than the data to the functions.
• Big Data can analyze data in different formats, both structured
and unstructured.
• Big Data solutions allow a global analysis of
various sources of information.
• Data processed by Big Data solutions can be historical or come
from real-time sources.
• Big Data technology uses massively parallel processing (MPP)
concepts, which improve the speed of analysis.
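The idea of carrying the processing function to the data can be illustrated with a word count in the map-and-merge style. The sketch below is a single-process simulation, not a distributed program. Each inner list in `PARTITIONS` stands for the records held on one node, and all names and sample records are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each inner list stands for the records held
# on one node of a distributed file system.
PARTITIONS = [
    ["big data", "data warehouse"],
    ["hadoop environment", "big data analytics"],
    ["data"],
]


def count_words(partition):
    """Work done locally on one node: count the words in its records."""
    counts = Counter()
    for record in partition:
        counts.update(record.split())
    return counts


def merge_counts(left, right):
    """Combine two partial results into one."""
    return left + right


def word_count(partitions):
    """Apply count_words to every partition, then merge the partial counts."""
    partial_results = [count_words(p) for p in partitions]
    return reduce(merge_counts, partial_results, Counter())


if __name__ == "__main__":
    print(word_count(PARTITIONS))
```

On a real cluster, `count_words` would run on each node in parallel and only the small partial counts would travel over the network, which is the point of moving the function rather than the data.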
Hadoop Environment
