A Lecture Series on Data Analytics
Dr. Chitra A. Dhawale
P.R. Pote College of Engg. and Mgmt.
Data Analytics (MCA19304)
COURSE OUTCOMES
AT THE END OF THE COURSE THE STUDENT SHOULD BE ABLE TO:
1. DEVELOP AND MAINTAIN RELIABLE, SCALABLE SYSTEMS USING APACHE HADOOP
2. WRITE MAPREDUCE-BASED APPLICATIONS
3. DIFFERENTIATE BETWEEN CONVENTIONAL SQL AND NOSQL
4. ANALYZE AND DEVELOP BIG DATA SOLUTIONS USING HIVE AND PIG
Data Analytics (MCA19304)
UNIT I
• DISTRIBUTED FILE SYSTEM AND ITS ISSUES
• INTRODUCTION TO BIG DATA
• BIG DATA CHARACTERISTICS
• TYPES OF BIG DATA
• TRADITIONAL VS. BIG DATA APPROACH
• BIG DATA APPLICATIONS
Distributed file system and its issues
• A single machine has 4 hard disks (I/O channels) with a speed of 100 Mbps and holds 1 TB of data. Processing this data takes about 45 minutes.
• For faster processing:
• Divide the data and store it on 5 machines with the same configuration as above
– Assuming all machines process their part of the data in parallel, processing takes 45/5 = 9 minutes.
• Processing is then 5 times faster than on a single machine.
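The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the ideal case: it assumes the data is split evenly and ignores coordination and network overhead. The function name parallel_minutes is made up for illustration.

```python
def parallel_minutes(single_machine_minutes: float, machines: int) -> float:
    """Ideal processing time when the data is split evenly across machines."""
    if machines < 1:
        raise ValueError("need at least one machine")
    return single_machine_minutes / machines


single = 45    # minutes on one machine (from the slide)
machines = 5   # machines working in parallel (from the slide)

parallel = parallel_minutes(single, machines)
speedup = single / parallel
print(f"{parallel:g} minutes, {speedup:g}x faster")  # 9 minutes, 5x faster
```

In practice the speedup is usually somewhat lower than the number of machines, because splitting the data and collecting the results also takes time.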
Distributed file system and its issues
Each machine has its own local (physical) file system, where you store data, i.e. create folders, subfolders and so on.
A distributed file system is not physical; it is a virtual (logical) file system.
Hadoop uses a DFS.
The DFS libraries are installed on every machine and run as a separate process on each machine.
These processes create a virtual layer over the physical file systems beneath them.
This virtual layer is called the distributed file system.
Distributed File System
Distributed file system and its issues
• A virtual file system is software, i.e. a set of programs that provides a set of commands.
• Ex. dfs -copy <source file> <destination file>
• dfs -copy file1 file2
• The command reads file1, which is distributed across 5 machines, say (A, B, C, D, E). The user has no idea where each part of the file is.
• The path is a virtual path; it does not exist physically anywhere.
• A DFS such as HDFS follows a master-slave architecture.
DFS
Master Machine
Slave Machines
• The upper machine is the master machine and the lower 5 are slave machines.
• Data is split and stored on the slave machines.
• The master does not store any data. It only stores metadata.
• Only the master machine knows how a file is divided into blocks (file-to-block mapping) and how the blocks are distributed over the slave machines (block-to-slave mapping).
• Data can only be accessed via the master, as only the master knows the actual location of the data on each slave.
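The two mappings held by the master can be sketched as two dictionaries. This is a toy model, not how HDFS is implemented: the class name Master, the block naming scheme and the round-robin placement are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Holds only metadata; the block contents live on the slaves."""
    file_to_blocks: dict[str, list[str]] = field(default_factory=dict)
    block_to_slave: dict[str, str] = field(default_factory=dict)

    def add_file(self, path: str, num_blocks: int, slaves: list[str]) -> None:
        # File-to-block mapping: the file is divided into numbered blocks.
        blocks = [f"{path}#blk{i}" for i in range(num_blocks)]
        self.file_to_blocks[path] = blocks
        for i, block in enumerate(blocks):
            # Block-to-slave mapping; round-robin is an illustrative choice only.
            self.block_to_slave[block] = slaves[i % len(slaves)]

    def locate(self, path: str) -> list[tuple[str, str]]:
        """Answer a client's question: which slave holds each block of this file?"""
        return [(b, self.block_to_slave[b]) for b in self.file_to_blocks[path]]


master = Master()
master.add_file("/data/file1", num_blocks=5, slaves=["A", "B", "C", "D", "E"])
for block, slave in master.locate("/data/file1"):
    print(block, "->", slave)
```

The client only ever uses the virtual path /data/file1; the lookup through the master is what turns that path into physical locations.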
HDFS
• While reading data, if any node fails, the client may get only partial data.
• To overcome this, a replication factor is set when HDFS is configured. For example, a replication factor of 2 means that every block is stored in two places, i.e. 2 copies are maintained for each block.
• If one node fails, the block can be accessed from another node. The data is transmitted to the machine (server) where the program is running.
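A small sketch of how a replication factor of 2 lets a read survive one node failure. The helper names place_replicas and read_block and the round-robin placement are assumptions for illustration; a real HDFS cluster chooses replica locations with its own placement policy.

```python
REPLICATION_FACTOR = 2


def place_replicas(blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(blocks)
    }


def read_block(block, placement, failed_nodes):
    """Return the first live node holding the block, or None if all copies are lost."""
    for node in placement[block]:
        if node not in failed_nodes:
            return node
    return None


placement = place_replicas(["blk0", "blk1", "blk2"], ["A", "B", "C"])
# blk0 -> [A, B], blk1 -> [B, C], blk2 -> [C, A]
print(read_block("blk0", placement, failed_nodes={"A"}))  # B
```

With both A and B failed, read_block returns None for blk0, which corresponds to the partial-data case described on the slide.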
Features of DFS
Transparency :
• Structure transparency –
The client does not need to know the number or locations of the file servers and storage devices.
• Access transparency –
Both local and remote files should be accessible in the same manner.
• Naming transparency –
Once a name is given to a file, it should not change when the file is transferred from one node to another.
Features of DFS
• Replication transparency –
If a file is copied to multiple nodes, the existence of the copies and their locations should be hidden from the clients.
• User mobility :
The user’s home directory is automatically brought to the node where the user logs in.
• Performance :
Performance is measured by the average amount of time needed to satisfy client requests.
• This time covers CPU time + time taken to access secondary storage + network access time.
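The performance measure above can be written out as a short calculation. The timings below are hypothetical numbers chosen only to show the sum and the average; they are not measurements from any system.

```python
from statistics import mean

# Hypothetical per-request timings in milliseconds: (cpu, storage, network)
requests = [(2.0, 8.0, 5.0), (1.0, 12.0, 7.0), (3.0, 10.0, 5.0)]


def request_time(cpu, storage, network):
    """Time to satisfy one request = CPU + secondary storage + network."""
    return cpu + storage + network


average = mean(request_time(*r) for r in requests)
print(f"average time per request: {average:.1f} ms")
```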
Features of DFS
• Simplicity and ease of use :
The user interface of a file system should be simple, and the number of commands in the file system should be small.
• High availability :
A distributed file system should be able to continue operating in case of partial failures such as a link failure, a node failure, or a storage drive crash.
A highly reliable and adaptable distributed file system should have different and independent file servers for controlling different and independent storage devices.
• Scalability :
Since growing the network by adding new machines or joining two networks together is routine, the distributed system will inevitably grow over time. As a result, a good distributed file system should be built to scale quickly as the number of nodes and users in the system grows. Service should not be substantially disrupted as the number of nodes and users grows.
Features of DFS
• High reliability :
A file system should create backup copies of key files that can be used if the originals are lost.
Many file systems employ stable storage as a high-reliability strategy.
• Data integrity :
• Multiple users frequently share a file system.
• The integrity of data saved in a shared file must be guaranteed by the file system.
• That is, concurrent access requests from many users who are competing for access to the same file must be correctly synchronized using a concurrency control method.
• Atomic transactions are a high-level concurrency management mechanism for data integrity that is frequently offered to users by a file system.
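One of the simplest concurrency control methods is a mutual-exclusion lock. The sketch below uses an in-memory list as a stand-in for a shared file; the class SharedFile and the user names are made up for illustration, and a lock is a much lower-level tool than the atomic transactions mentioned above.

```python
import threading


class SharedFile:
    """In-memory stand-in for a shared file guarded by a lock."""

    def __init__(self):
        self._lines = []
        self._lock = threading.Lock()

    def append(self, line):
        with self._lock:
            # The read-modify-write below cannot interleave with another writer.
            current = list(self._lines)
            current.append(line)
            self._lines = current

    def lines(self):
        with self._lock:
            return list(self._lines)


shared = SharedFile()


def writer(user, count):
    for i in range(count):
        shared.append(f"{user}:{i}")


threads = [threading.Thread(target=writer, args=(f"user{n}", 100)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared.lines()))  # 400: no update from any user was lost
```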
Features of DFS
• Security :
• To safeguard the information contained in the file system from unwanted and unauthorized access, security mechanisms must be implemented.
• A distributed file system should be secure so that its users may trust that their data will be kept private.
• Heterogeneity :
Users of heterogeneous distributed systems have the option of using multiple computer platforms for different purposes.
Issues with DFS
• In a distributed file system, nodes and connections need to be secured, so security is at stake.
• Messages and data may be lost in the network while moving from one node to another.
• Database connection in a distributed file system is complicated.
• Handling a database is also harder in a distributed file system than in a single-user system.
• Overloading may take place if all nodes try to send data at once.
Factors - Big Data Generation
• Evolution of technology
• IoT
• Social media
• Others
What is Big Data?
Characteristics – Big Data
FIVE V’S OF BIG DATA:
1. VOLUME
2. VARIETY
3. VELOCITY
4. VALUE
5. VERACITY
Characteristics of Big Data at a glance
Types of Big Data
• Structured
Structured data includes all data that can be stored in tabular form, i.e. in rows and columns.
Data held in relational databases is an example of structured data.
It is easy to make sense of relational databases.
Most modern computers are able to make sense of structured data.
Types of Big Data
Unstructured
• Unstructured data refers to data that lacks any specific form or structure.
• Unstructured data cannot be stored in a spreadsheet or fit into tabular databases.
• Examples of unstructured data include audio, video, and other sorts of data that make up a big chunk of big data today. Email is another example of unstructured data.
Types of Big Data
Semi-structured
• Semi-structured data has characteristics of both structured and unstructured data.
• Data sets of this type have some structure, but it still might not be possible to sort or process the data due to some constraints.
• This type of data includes XML data, JSON files, and others.
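A short example of what semi-structured JSON data can look like. The records are invented for illustration; the point is that each record describes its own fields, but the fields differ from record to record.

```python
import json

raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Meena", "address": {"city": "Amravati"}}
]
"""
records = json.loads(raw)

# Every record is self-describing, but the set of fields varies per record,
# so the data does not fit one fixed table schema without extra work.
all_fields = sorted({key for record in records for key in record})
print(all_fields)
for record in records:
    missing = [f for f in all_fields if f not in record]
    print(record["id"], "missing:", missing)
```

Loading such records into a relational table would mean deciding what to do with the missing fields and with nested values such as the list of phones.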
Traditional Vs. Big Data
• 1. Traditional data
• Traditional data is structured data that is maintained by all types of businesses, from very small to big organizations.
• In a traditional database system, a centralized database architecture is used to store and maintain the data in a fixed format or fields in a file.
• Structured Query Language (SQL) is used for managing and accessing the data.
• 2. Big data
• Big data deals with data sets that are too large or complex to manage in traditional data-processing application software.
• It deals with large volumes of structured, semi-structured and unstructured data, and is characterized by volume, velocity, variety, veracity and value.
• Big data refers not only to a large amount of data but also to extracting meaningful information by analyzing huge, complex data sets.
S.No. | TRADITIONAL DATA | BIG DATA
01. | Traditional data is generated at the enterprise level. | Big data is generated both outside and at the enterprise level.
02. | Its volume ranges from gigabytes to terabytes. | Its volume ranges from petabytes to exabytes or zettabytes.
03. | A traditional database system deals with structured data. | A big data system deals with structured, semi-structured and unstructured data.
04. | Traditional data is generated per hour, per day or less often. | Big data is generated much more frequently, mainly per second.
05. | The traditional data source is centralized and is managed in centralized form. | The big data source is distributed and is managed in distributed form.
06. | Data integration is very easy. | Data integration is very difficult.
07. | A normal system configuration is capable of processing traditional data. | A high system configuration is required to process big data.
08. | The size of the data is very small. | The size is larger than the traditional data size.
09. | Traditional database tools are sufficient to perform any database operation. | Special kinds of database tools are required to perform any database operation.
10. | Normal functions can manipulate the data. | Special kinds of functions are needed to manipulate the data.
11. | Its data model is strict-schema based and static. | Its data model is flat-schema based and dynamic.
12. | Traditional data is stable, with known inter-relationships. | Big data is not stable, with unknown relationships.
13. | Traditional data is in a manageable volume. | Big data is in a huge volume, which becomes unmanageable.
14. | It is easy to manage and manipulate the data. | It is difficult to manage and manipulate the data.
15. | Its data sources include ERP transaction data, CRM transaction data, financial data, organizational data, web transaction data etc. | Its data sources include social media, device data, sensor data, video, images, audio etc.
Applications of Big Data
•Big data in retail
•Big data in healthcare
•Big data in education
•Big data in e-commerce
•Big data in media and entertainment
•Big data in finance
•Big data in travel industry
•Big data in telecom
•Big data in automobile
Data Analytics: HDFS with Big Data : Issues and Application

  • 1. Data Analytics A Lecture Series on Dr.Chitra A.Dhawale P.R.Pote College of Engg.and Mgmt.
  • 2.
  • 3. Data Analytics (MCA19304) COURSE OUTCOMES AT THE END OF COURSE THE STUDENT SHOULD BE ABLE TO : 1. DEVELOP AND MAINTAIN RELIABLE, SCALABLE SYSTEMS USING APACHE, HADOOP 2. WRITE MAP REDUCE BASED APPLICATION 3. DIFFERENTIATE BETWEEN CONVENTIONAL SQL AND NOSQL 4. ANALYZE AND DEVELOP BIG DATA SOLUTIONS USING HIVE AND PIG
  • 4. Data Analytics (MCA19304) UNIT I • DISTRIBUTED FILE SYSTEM AND ITS ISSUES • INTRODUCTION TO BIG DATA, • BIG DATA CHARACTERISTICS • TYPES OF BIG DATA • TRADITIONAL VS. BIG DATA APPROACH • BIG DATA APPLICATIONS
  • 5. Distributed file system and its issues
  • 6. Distributed file system and its issues • A single machine with 4 Hard disks with 1 tb of data (I/O Channel), with 100 Mbps speed. For Processing needs 45 mins. • For Faster Processing : • Divide data and store it on multiple machines with same configuration as above – Assume all machines are processing data in parallel manner then , It will take 45/5 = 9 mins for processing. • Processing will be 5 times faster than a single machine.
  • 7. Distributed file system and its issues
  • 8. Distributed file system and its issues Each machine have its own local file system (physical file system ) where you store data i.e create folders and subfolders and so on. Distributed file system is not physical, it is virtual or logical file system. Hadoop used DFS. Install libraries on every machine running as a separate process in different machines. These are creating virtual layer over the physical file system under it. This virtual layer is called distributed file system Distributed File System
  • 9. Distributed file system and its issues • Virtual File System is a software i.e set of programs—obviously….Set of commands • Ex. Dfs -copy source file destination file • Dfs -copy file1 file 2 • It read file1 which is distributed on 5 machines say ( A,B,C,D,E ), user having no idea about it….Where each part of file is ? • ( path is virtual path) nowhere it is existing. • Any dfs follows master slave architecture
  • 10. DFS Master Machine Slave Machines  Upper Machine is Master Machine and Lower 5 are Slave ones.  Data is splitted and stored on slave machines.  Master does not store any data. It only stores metadata.  Master Machine only know (as File is divided into blocks (File to Block Mapping and blocks are distributed on slave machines i.e Block to Slave mapping)  Data can only be accessed via Master as Only Master know the actual location of data on each slave.
  • 11. HDFS • While reading data, if any node fails, the client may get only partial data. • To overcome this, a replication factor is set when HDFS is configured. With replication factor = 2, every block is stored in two places, i.e. 2 copies are maintained for each block. • If one node fails, the block can be read from another node. • Data is transmitted to the machine (server) where the program is running.
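How a replication factor protects a read can be sketched as follows. The placement rule here (consecutive nodes, round-robin) is an assumption chosen for simplicity and is not how HDFS actually picks replica locations; the function names and block ids are hypothetical.

```python
def place_replicas(block_ids, nodes, replication_factor=2):
    """Assign each block to replication_factor distinct nodes.

    Assumes replication_factor <= len(nodes), otherwise copies would repeat.
    """
    placement = {}
    for i, block_id in enumerate(block_ids):
        placement[block_id] = [
            nodes[(i + r) % len(nodes)] for r in range(replication_factor)
        ]
    return placement


def locate_block(block_id, placement, failed_nodes):
    """Return the first live node holding the block, or None if all copies are lost."""
    for node in placement[block_id]:
        if node not in failed_nodes:
            return node
    return None


nodes = ["A", "B", "C", "D", "E"]
placement = place_replicas(["blk0", "blk1", "blk2"], nodes, replication_factor=2)

survivor = locate_block("blk1", placement, failed_nodes={"B"})
lost = locate_block("blk1", placement, failed_nodes={"B", "C"})
print(placement)
print(survivor, lost)
```

With a replication factor of 2, a single node failure still leaves one readable copy, while losing both nodes that hold a block makes that block unavailable.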
  • 12. Features of DFS Transparency : • Structure transparency: The client does not need to know the number or locations of the file servers and storage devices. • Access transparency: Local and remote files should be accessible in the same manner. • Naming transparency: Once a name is given to a file, it should not change when the file is transferred from one node to another.
  • 13. Features of DFS • Replication transparency: If a file is copied to multiple nodes, both the existence of the copies and their locations should be hidden from the clients. • User mobility : The system automatically brings the user's home directory to the node where the user logs in. • Performance : Performance is measured by the average time needed to satisfy client requests. • This time covers CPU time + time taken to access secondary storage + network access time.
  • 14. Features of DFS • Simplicity and ease of use : The user interface of a file system should be simple and the number of commands in the file system should be small. • High availability : A distributed file system should be able to continue operating after partial failures such as a link failure, a node failure, or a storage drive crash. A highly available and adaptable distributed file system should have different and independent file servers controlling different and independent storage devices. • Scalability : Since growing the network by adding new machines or joining two networks together is routine, the distributed system will inevitably grow over time. As a result, a good distributed file system should be built to scale quickly as the number of nodes and users in the system grows. Service should not be substantially disrupted as the number of nodes and users grows.
  • 15. Features of DFS • High reliability : A file system should create backup copies of key files that can be used if the originals are lost. Many file systems employ stable storage as a high-reliability strategy. • Data integrity : • Multiple users frequently share a file system. • The integrity of data saved in a shared file must be guaranteed by the file system. • That is, concurrent access requests from many users who are competing for access to the same file must be correctly synchronized using a concurrency control method. • Atomic transactions are a high-level concurrency management mechanism for data integrity that is frequently offered to users by a file system.
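The need to synchronize concurrent access to a shared file can be illustrated with a small sketch. It uses a plain mutual-exclusion lock, which is a simpler mechanism than the atomic transactions mentioned on the slide, and an in-memory list stands in for the shared file; the `SharedFile` class and the user names are hypothetical.

```python
import threading


class SharedFile:
    """In-memory stand-in for a file that several users append to."""

    def __init__(self):
        self._lines = []
        self._lock = threading.Lock()  # concurrency control: one writer at a time

    def append_records(self, writer, count):
        for _ in range(count):
            # Read-modify-write must happen as one unit, so hold the lock
            # while reading the current length and appending the new line.
            with self._lock:
                sequence = len(self._lines)
                self._lines.append(f"{sequence}:{writer}")

    def lines(self):
        with self._lock:
            return list(self._lines)


shared = SharedFile()
threads = [
    threading.Thread(target=shared.append_records, args=(f"user{n}", 1000))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = shared.lines()
sequence_numbers = [int(line.split(":")[0]) for line in lines]
print(len(lines), sequence_numbers == list(range(len(lines))))
```

Because every append happens under the lock, each record gets a unique, gap-free sequence number even though four writers compete for the same file.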
  • 16. Features of DFS • Heterogeneity : Users of heterogeneous distributed systems have the option of using multiple computer platforms for different purposes, so the file system should work across them. • Security : • To safeguard the information contained in the file system from unwanted and unauthorized access, security mechanisms must be implemented. • A distributed file system should be secure so that its users may trust that their data will be kept private.
  • 17. Issues with DFS • In a distributed file system both the nodes and the connections need to be secured, so security is at stake. • Messages and data may be lost in the network while moving from one node to another. • Database connection in a distributed file system is complicated. • Handling the database is also harder in a distributed file system than in a single-user system. • Overloading may occur if all nodes try to send data at once.
  • 18. Factors- Big Data Generation Evolution of Technology
  • 19. Factors- Big Data Generation IOT
  • 20. Factors- Big Data Generation Social Media
  • 21. Factors- Big Data Generation Others
  • 22. What is Big Data?
  • 23. Characteristics – Big Data FIVE V’S OF BIG DATA : 1 . VOLUME
  • 24. Characteristics – Big Data FIVE V’S OF BIG DATA : 2. VARIETY
  • 25. Characteristics – Big Data FIVE V’S OF BIG DATA : 3 . VELOCITY
  • 26. Characteristics – Big Data FIVE V’S OF BIG DATA : 4. VALUE
  • 27. Characteristics – Big Data FIVE V’S OF BIG DATA : 5. VERACITY
  • 28. Characteristics of Big Data at a glance
  • 29. Types of Big Data
  • 30. Types of Big Data • Structured • Structured data includes all data that can be stored in tabular form, i.e. in rows and columns. • The data in relational databases is an example of structured data. • It is easy to make sense of relational databases, and most modern computer systems can process structured data directly.
  • 31. Types of Big Data Unstructured • Unstructured data is data that lacks any specific form or structure. • It cannot be stored in a spreadsheet or fitted into tabular databases. • Examples of unstructured data include audio, video, and other kinds of data that make up a large share of big data today. The body text of an email is another example.
  • 32. Types of Big Data Semi-structured • Semi-structured data combines features of structured and unstructured data. • Such data sets have a recognizable structure, but it may still not be possible to sort or process them like tabular data because of some constraints. • Examples include XML data, JSON files, and others.
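The three types can be contrasted with a short sketch that uses only the Python standard library. All sample records below are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = "id,name,city\n1,Asha,Pune\n2,Ravi,Amravati\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but nesting and optional fields
# mean it does not map directly onto one flat table.
semi_structured = (
    '{"id": 3, "name": "Meera", "tags": ["student", "mca"], '
    '"address": {"city": "Nagpur"}}'
)
record = json.loads(semi_structured)

# Unstructured: free text with no fields at all; only generic operations
# such as counting words apply without further analysis.
unstructured = "Meeting moved to Friday. Please bring the project report."
word_count = len(unstructured.split())

print(rows[1]["city"])
print(record["address"]["city"], record["tags"])
print(word_count)
```

The structured rows can be queried by column name straight away, the JSON record needs its nesting navigated, and the free text offers nothing to query until some analysis extracts meaning from it.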
  • 33. Traditional Vs. Big Data • 1. Traditional data : Traditional data is the structured data maintained by businesses of all kinds, from very small firms to large organizations. • A traditional database system uses a centralized database architecture to store and maintain the data in a fixed format or in fixed fields in a file. • Structured Query Language (SQL) is used for managing and accessing the data. • 2. Big data : Big data deals with data sets that are too large or complex to manage with traditional data-processing application software. • It covers large volumes of structured, semi-structured and unstructured data, and is characterized by volume, velocity, variety, veracity and value. • Big data refers not only to a large amount of data but also to extracting meaningful information by analyzing huge, complex data sets.
  • 34. S.No. | TRADITIONAL DATA | BIG DATA
    01. | Traditional data is generated at the enterprise level. | Big data is generated both outside and at the enterprise level.
    02. | Its volume ranges from gigabytes to terabytes. | Its volume ranges from petabytes to exabytes or zettabytes.
    03. | A traditional database system deals with structured data. | A big data system deals with structured, semi-structured and unstructured data.
    04. | Traditional data is generated per hour, per day or less often. | Big data is generated far more frequently, mainly every second.
    05. | The traditional data source is centralized and is managed in centralized form. | The big data source is distributed and is managed in distributed form.
    06. | Data integration is very easy. | Data integration is very difficult.
    07. | A normal system configuration can process traditional data. | A high-end system configuration is required to process big data.
  • 35. S.No. | TRADITIONAL DATA | BIG DATA (continued)
    08. | The size of the data is very small. | The size is more than the traditional data size.
    09. | Traditional database tools are sufficient for any database operation. | Special kinds of database tools are required for any database operation.
    10. | Normal functions can manipulate the data. | Special kinds of functions are needed to manipulate the data.
    11. | Its data model is strict schema based and static. | Its data model is flat schema based and dynamic.
    12. | Traditional data is stable, with known interrelationships. | Big data is not stable, and its relationships are unknown.
    13. | Traditional data is of manageable volume. | Big data is of huge volume, which becomes unmanageable.
    14. | It is easy to manage and manipulate the data. | It is difficult to manage and manipulate the data.
    15. | Its data sources include ERP transaction data, CRM transaction data, financial data, organizational data, web transaction data etc. | Its data sources include social media, device data, sensor data, video, images, audio etc.
  • 36. Applications of Big Data • Big data in retail • Big data in healthcare • Big data in education • Big data in e-commerce • Big data in media and entertainment • Big data in finance • Big data in travel industry • Big data in telecom • Big data in automobile