Big Data Analytics
Government Arts and Science College
Tittagudi-606106
Department of Computer Science
DECS 43A – Big Data Analysis
II Year IV Semester
Dr. S. P. Ponnusamy
Assistant Professor and Head
Unit -1
Introduction to Big Data
• Data
• Characteristics of data
• Types of digital data: unstructured, semi-structured, and structured
• Sources of data
• Working with unstructured data
• Evolution and definition of big data
• Characteristics and need of big data
• Challenges of big data
• Data environment versus big data environment
Data
• Data are the quantities, characters, or symbols on which a computer performs operations; they may be stored and transmitted as electrical signals and recorded on magnetic, optical, or mechanical recording media.
Big Data
• Big data is a collection of data that is huge in volume and still growing exponentially with time.
• Its size and complexity are so great that traditional data management tools cannot store or process it efficiently.
• Working with it involves data mining, data storage, data analysis, data sharing, and data visualization.
Data vs Big Data
Data Growth
• 1,024 bytes = 1 kilobyte (KB)
• 1,024 KB = 1 megabyte (MB)
• 1,024 MB = 1 gigabyte (GB)
• 1,024 GB = 1 terabyte (TB)
• 1,024 TB = 1 petabyte (PB)
• 1,024 PB = 1 exabyte (EB)
• 1,024 EB = 1 zettabyte (ZB)
• 1,024 ZB = 1 yottabyte (YB)
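The ladder of units above can be checked with a few lines of Python. This is a minimal sketch: the function name `human_readable` is made up for illustration, and it follows the slide's convention of 1,024 per step (the binary convention, whose units are also written KiB, MiB, and so on).

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits in this unit, or when no larger unit exists.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024  # step up one unit (1,024 of the smaller unit)


print(human_readable(1024))           # 1.00 KB
print(human_readable(1024 ** 5))      # 1.00 PB
print(human_readable(5 * 1024 ** 3))  # 5.00 GB
```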
Types of Data
Structured Data
• Structured data is in an organized form (e.g., rows and columns) and can be used easily by a computer program.
• Relationships exist between entities of data, such as classes and their objects.
• Data stored in databases is an example of structured data.
• Structured data is also called relational data.
• It is split into multiple tables to enhance the integrity of the data, with a single record depicting each entity.
• Structured Query Language (SQL) is used to bring the data together.
• Structured data is easy to enter, query, and analyze.
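To make the points about multiple tables and SQL concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, and rows are invented for illustration; the point is only that each entity gets one record and a JOIN brings the tables back together.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Computer Science'), (2, 'Mathematics');
    INSERT INTO student VALUES (101, 'Asha', 1), (102, 'Ravi', 1), (103, 'Meena', 2);
""")

# Each department is stored once; the JOIN reunites students with departments.
rows = conn.execute("""
    SELECT s.name, d.name
    FROM student AS s
    JOIN department AS d ON d.dept_id = s.dept_id
    ORDER BY s.student_id
""").fetchall()

for student_name, dept_name in rows:
    print(student_name, "-", dept_name)
```

Storing the department name once, rather than repeating it on every student row, is what the slide means by splitting data into tables to protect integrity.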
Structured Data - Sources
Ease with Structured Data
Semi-Structured Data
• Semi-structured data does not conform to a rigid data model but still has some structure.
• However, it is not in a form that a computer program can use as easily as structured data.
• Examples: emails, XML, markup languages such as HTML, JSON documents, etc.
• Metadata for this data is available but is not sufficient.
• It is sometimes loosely called NoSQL data, since it is often stored in NoSQL databases.
Semi-Structured Data - Sources
Semi-Structured Data – XML Example
<ProgrammerDetails>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <CodingPlatforms>
    <CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform>
    <CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform>
    <CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform>
  </CodingPlatforms>
</ProgrammerDetails>
<!-- The 2ndFav and 3rdFav coding platforms are imaginary because GeeksforGeeks is the best! -->
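The tags in the XML example carry enough structure for a program to pull values out, even though no fixed schema is declared. A minimal sketch using Python's standard `xml.etree.ElementTree` module (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

xml_text = """
<ProgrammerDetails>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <CodingPlatforms>
    <CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform>
    <CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform>
    <CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform>
  </CodingPlatforms>
</ProgrammerDetails>
""".strip()

root = ET.fromstring(xml_text)

# Element text is reached by tag name; attributes act as embedded metadata.
full_name = f"{root.findtext('FirstName')} {root.findtext('LastName')}"
platforms = {p.get("Type"): p.text for p in root.iter("CodingPlatform")}

print(full_name)         # Jane Doe
print(platforms["Fav"])  # GeeksforGeeks
```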
Semi-Structured Data – JSON Example
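As a hypothetical illustration (not the slide's original figure), the programmer record from the XML example might be written in JSON and read with Python's standard `json` module as follows. The key names are assumptions chosen for this sketch.

```python
import json

# Hypothetical JSON rendering of the record used in the XML example.
json_text = """
{
  "firstName": "Jane",
  "lastName": "Doe",
  "codingPlatforms": [
    {"type": "Fav", "name": "GeeksforGeeks"},
    {"type": "2ndFav", "name": "Code4Eva!"},
    {"type": "3rdFav", "name": "CodeisLife"}
  ]
}
"""

record = json.loads(json_text)

# Keys describe the values, so the structure travels with the data itself.
favourite = next(
    p["name"] for p in record["codingPlatforms"] if p["type"] == "Fav"
)
print(record["firstName"], record["lastName"], "prefers", favourite)
```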
Characteristics of Semi-Structured Data
Unstructured Data
• Unstructured data does not conform to a data model and is not in a form that a computer program can use easily.
• It cannot be stored in rows and columns as in a database.
• It does not follow any semantics or rules.
• It lacks any particular format or sequence.
• It has no easily identifiable structure.
• Because it lacks identifiable structure, computer programs cannot use it easily.
• An often-cited estimate is that about 80–90% of an organization's data is in this format.
• Examples: memos, chat room transcripts, PowerPoint presentations, images, videos, letters, research reports, white papers, the body of an email, etc.
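Because unstructured text carries no schema, a program has to impose structure itself. The following is a minimal sketch with a made-up email body and simple regular expressions; the patterns are illustrative assumptions, not a robust parser.

```python
import re

# Hypothetical email body: free text with no fields, rows, or columns.
email_body = (
    "Hi team, the review meeting moved to 2024-03-15. "
    "Please send the sales report to priya@example.com and "
    "copy arun@example.com. The report is due before the meeting."
)

# Pull out whatever structure can be recognised by pattern.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", email_body)
addresses = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", email_body)

print(dates)      # ['2024-03-15']
print(addresses)  # ['priya@example.com', 'arun@example.com']
```

Anything the patterns do not anticipate is silently missed, which is one reason unstructured data is harder to work with than rows and columns.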
Unstructured Data – Example
Unstructured Data – Sources
Unstructured Data – issues
Dealing with Unstructured Data
Definition of Big Data
Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Source: Gartner IT Glossary
• High-volume
• High-velocity
• High-variety
• Cost-effective, innovative forms of information processing
• Enhanced insight and decision making
Characteristics of Data
1. Composition: The composition of data deals with the structure of data, that is, the sources of data, the granularity, the types, and the nature of data as to whether it is static or real-time streaming.
2. Condition: The condition of data deals with the state of data, that is, "Can one use this data as is for analysis?" or "Does it require cleansing for further enhancement and enrichment?"
3. Context: The context of data deals with "Where has this data been generated?", "Why was this data generated?", "How sensitive is this data?", "What are the events associated with this data?", and so on.
Evolution of Big Data
Why Big Data?
More data leads to more accurate analysis, which gives more confidence in decision making, which in turn brings greater operational efficiencies, cost reduction, time reduction, new product development, optimized offerings, etc.
Need of Big Data
Characteristics of Big Data/What is Big Data?
• Volume: the size and amount of data that companies manage and analyse.
• Variety: the diversity and range of different data types, including unstructured, semi-structured, and structured data.
• Velocity: the speed at which companies receive, store, and manage data, e.g., the number of social media posts or search queries received within a day, hour, or other unit of time.
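Velocity is a rate, so it can be estimated by counting events per unit of time. A small sketch with invented timestamps standing in for the arrival times of social media posts:

```python
from collections import Counter
from datetime import datetime

# Hypothetical arrival times of social media posts.
timestamps = [
    "2024-01-01 09:05:10",
    "2024-01-01 09:17:42",
    "2024-01-01 09:58:03",
    "2024-01-01 10:02:30",
    "2024-01-01 10:45:00",
]

# Truncate each timestamp to its hour, then count the posts in each hour.
posts_per_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:00")
    for ts in timestamps
)

for hour, count in sorted(posts_per_hour.items()):
    print(hour, count)
# 2024-01-01 09:00 3
# 2024-01-01 10:00 2
```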
Characteristics of Big Data – other V’s
• Value: the value that big data can provide; it relates directly to what organizations can do with the collected data.
• Veracity: the “truth” or accuracy of data and information assets, which often determines executive-level confidence.
Challenges of Big Data
Challenges with big data:
• Capture
• Storage
• Curation
• Search
• Transfer
• Visualization
• Privacy violations
• Analysis
Sources of Big Data
Traditional Business Intelligence (BI) versus Big Data
• In a traditional BI environment, data resides in a central server, whereas in a big data environment, data resides in a distributed file system.
• Traditional BI → move data to code
• Big data environment → move code to data
• In a traditional BI environment, data is analyzed in offline mode, whereas in a big data environment, data is analyzed in both real-time and offline modes.
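The contrast between moving data to code and moving code to data can be sketched in plain Python. The "nodes" here are just lists standing in for data blocks held on different machines, and all names are hypothetical; a real big data system would ship the function over the network to each node.

```python
# Each inner list stands in for the data held on one node.
nodes = [
    [12, 7, 30],
    [5, 18],
    [22, 9, 14, 3],
]


def move_data_to_code(nodes):
    """Traditional style: copy every record to one place, then compute."""
    central_copy = [record for node in nodes for record in node]
    records_transferred = len(central_copy)
    return sum(central_copy), records_transferred


def move_code_to_data(nodes):
    """Big data style: run the function where the data lives, ship only results."""
    partial_sums = [sum(node) for node in nodes]  # computed "locally" per node
    values_transferred = len(partial_sums)
    return sum(partial_sums), values_transferred


print(move_data_to_code(nodes))  # (120, 9)
print(move_code_to_data(nodes))  # (120, 3)
```

Both approaches give the same total, but the second transfers one value per node instead of every record, which matters more as the data on each node grows.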
A Typical Data Warehouse Environment
Figure labels: sources (ERP, CRM, legacy, 3rd party apps); data warehouse; uses (reporting/dashboarding, OLAP, ad hoc querying, modeling).
• In a typical DW environment, data is collected from multiple disparate sources, integrated, cleansed, and transformed before it is loaded into a data warehouse.
• A host of market-leading BI tools can then be used on top of the data warehouse for reporting/dashboarding, ad hoc querying, and modeling.
A Typical Hadoop Environment
Hadoop takes care of storage and processing using the following:
a) HDFS (Hadoop Distributed File System): distributed storage
b) MapReduce: distributed processing
Figure labels: sources (web logs, images and videos, social media such as Twitter and Facebook, docs and PDFs); Hadoop (HDFS, MapReduce); operational systems, data warehouse, data marts, ODS (operational data store).
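MapReduce processing is often introduced with a word count. The following is a single-process imitation of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop's API, and the function names and input lines are made up for illustration; in Hadoop, the input would sit in HDFS and the phases would run in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs big storage", "data drives decisions"]

pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())
print(counts)
```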
Co-existence of Big Data and Data Warehouse
Figure labels: sources (web logs, images and videos, social media such as Twitter and Facebook, docs and PDFs); Hadoop (HDFS, MapReduce); operational systems, data warehouse, data marts, ODS; data warehouse.
End
UNIT_1-BD.pptx

  • 1. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science DECS 43A – Big Data Analysis II Year IV Semester Dr. S. P. Ponnusamy Assistant Professor and Head 1 Unit -1 Introduction to Big Data
  • 2. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science 2 Unit -1 Introduction to Big Data  Data  Characteristics of data  Types of digital data: Unstructured, Semi-structured and Structured,  Sources of data  Working with unstructured data  Evolution and Definition of big data  Characteristics and Need of big data  Challenges of big data  Data environment versus big data environment
  • 3. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Data 3 • The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media • Big Data is a collection of data that is huge in volume, yet growing exponentially with time. • It is a data with so large size and complexity that none of traditional data management tools can store it or process it efficiently. • It includes data mining, data storage, data analysis, data sharing, and data visualization. Big Data
  • 4. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Data vs Big Data 4
  • 5. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Data Growth 5 • 1,024 bytes = 1 kilobyte (KB). • 1,024 kilobytes (KB) = 1 MB. • 1,024 MB = 1 GB. • 1,024 GB = 1 TB • 1,024 TB = 1 petabyte (PB). • 1,024 PB = an exabyte (EB). • 1,024 EB = a zettabyte (ZB) • 1,024 ZB = 1 YB (Yottabyte).
  • 6. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Data Growth 6
  • 7. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Types of Data 7
  • 8. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Structured Data 8 • This is the data which is in an organized form (e.g., in rows and columns) and can be easily used by a computer program. • Relationships exist between entities of data, such as classes and their objects. • Data stored in databases is an example of structured data. • Structured data is also called relational data. • It is split into multiple tables to enhance the integrity of the data by creating a single record to depict an entity. • A Structured Query Language (SQL) is needed to bring the data together. • Structured data is easy to enter, query, and analyze.
  • 9. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Structured Data 9
  • 10. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Structured Data - Sources 10
  • 11. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Ease with Structured Data 11
  • 12. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Semi-Structured Data 12 • This is the data which does not conform to a data model but has some structure. • However, it is not in a form which can be used easily by a computer program. • Example, emails, XML, markup languages like HTML, JSON document, etc. • Metadata for this data is available but is not sufficient. • It is commonly called NoSQL data
  • 13. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Semi-Structured Data - Sources 13
  • 14. Big Data Analytics Government Arts and Science College Tittagudi-606106 Department of Computer Science Semi-Structured Data –XML Example 14 <ProgrammerDetails> <FirstName>Jane</FirstName> <LastName>Doe</LastName> <CodingPlatforms> <CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform> <CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform> <CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform> </CodingPlatforms> </ProgrammerDetails> <!--The 2ndFav and 3rdFav Coding Platforms are imaginative because Geeksforgeeks is the best!-->
  • 15. Semi-Structured Data – JSON Example
  • 16. Characteristics of Semi-Structured Data
  • 17. Unstructured Data
      • Unstructured data does not conform to a data model or is not in a form that can be used easily by a computer program.
      • The data cannot be stored in rows and columns as in a database.
      • The data does not follow any semantics or rules.
      • The data lacks any particular format or sequence.
      • The data has no easily identifiable structure.
      • Because it lacks an identifiable structure, it cannot be used easily by computer programs.
      • About 80–90% of an organization's data is in this format.
      • Examples: memos, chat rooms, PowerPoint presentations, images, videos, letters, research reports, white papers, the body of an email, etc.
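One common way to work with unstructured text is to pull out the few pieces that do follow a pattern. The sketch below is only an illustration: the memo text, the addresses, and the two deliberately simple regular expressions are assumptions, not a general-purpose extractor.

```python
import re

# A hypothetical free-text memo with no rows, columns, or tags.
memo = (
    "Meeting moved to 2024-03-15. Please send the report to "
    "head.cs@example.org and copy office@example.org before 2024-03-12."
)

# Simple patterns for illustration; real-world email and date formats vary more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Turn the free text into a small structured record.
record = {"emails": EMAIL.findall(memo), "dates": DATE.findall(memo)}
print(record)
```

The rest of the memo (who moved the meeting, and why) stays out of reach of such patterns, which is the difficulty the bullets above describe.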
  • 18. Unstructured Data – Example
  • 19. Unstructured Data – Sources
  • 20. Unstructured Data – Issues
  • 21. Dealing with Unstructured Data
  • 22. Definition of Big Data
      "Big Data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (Source: Gartner IT Glossary)
      [Diagram labels: High-volume, High-velocity, High-variety; Cost-effective, innovative forms of information processing; Enhanced insight & decision making]
  • 23. Characteristics of Data
      1. Composition: The composition of data deals with the structure of data, that is, the sources of data, the granularity, the types, and the nature of data as to whether it is static or real-time streaming.
      2. Condition: The condition of data deals with the state of data, that is, "Can one use this data as is for analysis?" or "Does it require cleansing for further enhancement and enrichment?"
      3. Context: The context of data deals with "Where has this data been generated?", "Why was this data generated?", "How sensitive is this data?", "What are the events associated with this data?", and so on.
  • 24. Evolution of Big Data
  • 25. Evolution of Big Data
  • 26. Evolution of Big Data
  • 27. Evolution of Big Data
  • 28. Why Big Data?
      More data → more accurate analysis → more confidence in decision making → greater operational efficiencies, cost reduction, time reduction, new product development, optimized offerings, etc.
  • 29. Need of Big Data
  • 30. Characteristics of Big Data / What is Big Data?
      • Volume: the size and amount of big data that companies manage and analyse.
      • Variety: the diversity and range of different data types, including unstructured, semi-structured, and structured data.
      • Velocity: the speed at which companies receive, store, and manage data, e.g., the number of social media posts or search queries received within a day, hour, or other unit of time.
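Velocity, as described above, is a rate: items received per unit of time. A minimal sketch of measuring it is shown below, assuming a hypothetical list of post timestamps in ISO 8601 format and bucketing them by hour.

```python
from collections import Counter
from datetime import datetime

# Hypothetical arrival times of social media posts (illustrative data only).
timestamps = [
    "2024-01-01T09:05:00",
    "2024-01-01T09:40:00",
    "2024-01-01T10:15:00",
    "2024-01-01T10:20:00",
    "2024-01-01T10:59:59",
]

# Bucket each post into the hour it arrived in, then count per bucket.
per_hour = Counter(
    datetime.fromisoformat(t).strftime("%Y-%m-%d %H:00") for t in timestamps
)

for hour, count in sorted(per_hour.items()):
    print(hour, count)
```

Changing the format string changes the unit of time, for example `"%Y-%m-%d"` for posts per day.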
  • 31. Characteristics of Big Data
  • 32. Characteristics of Big Data
  • 33. Characteristics of Big Data – Other V's
      • Value: the value that big data can provide, which relates directly to what organizations can do with the collected data.
      • Veracity: the "truth" or accuracy of data and information assets, which often determines executive-level confidence.
  • 34. Challenges of Big Data
      Challenges with Big Data: Capture, Storage, Curation, Search, Transfer, Visualization, Privacy Violations, Analysis
  • 35. Sources of Big Data
  • 36. Traditional Business Intelligence (BI) versus Big Data
      • In a traditional BI environment, data resides in a central server, whereas in a big data environment, data resides in a distributed file system.
      • Traditional BI → move data to code
      • Big data environment → move code to data
      • In a traditional BI environment, data is analyzed in offline mode, whereas in a big data environment, data is analyzed in both real-time and offline modes.
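The contrast between "move data to code" and "move code to data" can be sketched in plain Python. Everything here is a simplification under stated assumptions: the `partitions` dictionary stands in for data spread across hypothetical nodes, and the two functions only count how many items would cross the network in each style.

```python
# Hypothetical cluster: each "node" holds one partition of sales amounts.
partitions = {
    "node-1": [120, 80, 45],
    "node-2": [300, 15],
    "node-3": [60, 60, 60, 20],
}

def move_data_to_code(parts):
    """Traditional style: copy every record to one place, then compute there."""
    central = [x for records in parts.values() for x in records]
    return sum(central), len(central)  # (result, number of items moved)

def move_code_to_data(parts, func):
    """Big data style: run func where each partition lives, move only results."""
    partials = [func(records) for records in parts.values()]
    return sum(partials), len(partials)  # (result, number of items moved)

total_a, moved_a = move_data_to_code(partitions)
total_b, moved_b = move_code_to_data(partitions, sum)

print(total_a, moved_a)
print(total_b, moved_b)
```

Both styles produce the same total, but the second ships one partial result per node instead of every record, and the gap widens as partitions grow.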
  • 37. A Typical Data Warehouse Environment
      [Diagram: ERP, CRM, Legacy, and 3rd-party apps feed the Data Warehouse, which supports Reporting/Dashboarding, OLAP, Ad hoc querying, and Modeling]
      • In a typical DW environment, data is collected from multiple disparate sources, then integrated, cleansed, and transformed before it is loaded into a data warehouse.
      • A host of market-leading BI tools can then be used on top of the data warehouse for reporting/dashboarding, ad hoc querying, and modeling.
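The collect, integrate, cleanse, transform, and load steps can be illustrated with a toy pipeline. This is a sketch under assumptions: the `erp_rows` and `crm_rows` extracts, the `cleanse` helper, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent formatting.
erp_rows = [{"cust": " asha ", "amount": "1200"}, {"cust": "Ravi", "amount": None}]
crm_rows = [{"cust": "RAVI", "amount": "300"}, {"cust": "Asha", "amount": "50"}]

def cleanse(row):
    """Return a tidy (customer, amount) tuple, or None if the row is unusable."""
    if row["amount"] is None:
        return None  # drop records with a missing amount
    return row["cust"].strip().title(), int(row["amount"])

# Integrate the sources, then cleanse and transform each row.
clean = [r for r in map(cleanse, erp_rows + crm_rows) if r is not None]

# Load the cleaned rows into the stand-in warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales (customer TEXT, amount INTEGER)")
dw.executemany("INSERT INTO sales VALUES (?, ?)", clean)

# An ad hoc query on top of the warehouse.
report = dw.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(report)
```

Because names are normalized before loading, " asha " and "Asha" end up as the same customer in the report.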
  • 38. A Typical Hadoop Environment
      Hadoop takes care of storage and processing using the following:
      a) HDFS (Hadoop Distributed File System): distributed storage
      b) MapReduce: distributed processing
      ODS: Operational Data Store
      [Diagram labels: Web Logs; Images and Videos; Social Media (Twitter, Facebook, etc.); Docs & PDFs; HDFS; Hadoop MapReduce; Operational Systems; Data Warehouse; Data Marts; ODS]
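To give a feel for the MapReduce processing model, here is a word count written in plain Python. It imitates the map, shuffle, and reduce phases in a single process and does not use the Hadoop API; the function names and the two input lines are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

# Two input lines stand in for file blocks stored on different nodes.
lines = ["big data is big", "data is everywhere"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the map calls would run on the nodes holding each block in HDFS, and the framework would handle the shuffle across the network.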
  • 39. Co-existence of Big Data and Data Warehouse
      [Diagram labels: Web Logs; Images and Videos; Social Media (Twitter, Facebook, etc.); Docs & PDFs; HDFS; Hadoop MapReduce; Operational Systems; Data Warehouse; Data Marts; ODS]
  • 40. End