SlideShare a Scribd company logo
Web Mining
Prepared by
Dr. Swarnalatha K.S
Professor, Dept. of ISE
NMIT
Table of contents
• Introduction to web mining
• Types of web mining
• Challenges in web mining
• Applications of web mining
• Over the last few years, the World Wide Web has become a
significant source of information and simultaneously a
popular platform for business.
• Web mining can define as the method of utilizing data
mining techniques and algorithms to extract useful
information directly from the web, such as Web documents
and services, hyperlinks, Web content, and server logs.
• The World Wide Web contains a large amount of data that
provides a rich source to data mining. The objective of Web
mining is to look for patterns in Web data by collecting and
examining data in order to gain insights.
What is Web Mining?
• Web mining can widely be seen as the application of
adapted data mining techniques to the web, whereas data
mining is defined as the application of the algorithm to
discover patterns on mostly structured data embedded into
a knowledge discovery process.
• Web mining has a distinctive property to provide a set of
various data types. The web has multiple aspects that yield
different approaches for the mining process, such as web
pages consist of text, web pages are linked via hyperlinks,
and user activity can be monitored via web server logs.
These three features lead to the differentiation between
the three areas are web content mining, web structure
mining, web usage mining.
Types of web mining
1. Web Content Mining:
• Web content mining can be used to extract useful data,
information, knowledge from the web page content.
• In web content mining, each web page is considered as an
individual document.
• The individual can take advantage of the semi-structured nature of
web pages, as HTML provides information that concerns not only
the layout but also logical structure.
• The primary task of content mining is data extraction, where
structured data is extracted from unstructured websites. The
objective is to facilitate data aggregation over various web sites by
using the extracted structured data.
• Web content mining can be utilized to distinguish topics on the
web. For Example, if any user searches for a specific task on the
search engine, then the user will get a list of suggestions.
2. Web Structured Mining:
• The web structure mining can be used to find the link structure of
hyperlink. It is used to identify that data either link the web pages or
direct link network.
• In Web Structure Mining, an individual considers the web as a directed
graph, with the web pages being the vertices that are associated with
hyperlinks.
• The most important application in this regard is the Google search engine,
which estimates the ranking of its outcomes primarily with the PageRank
algorithm.
• It characterizes a page to be exceptionally relevant when frequently
connected by other highly related pages. Structure and content mining
methodologies are usually combined.
• For example, web structured mining can be beneficial to organizations to
regulate the network between two commercial sites.
Web Usage Mining:
• Web usage mining is used to extract useful data, information,
knowledge from the weblog records, and assists in recognizing the
user access patterns for web pages.
• In Mining, the usage of web resources, the individual is thinking
about records of requests of visitors of a website, that are often
collected as web server logs.
• While the content and structure of the collection of web pages
follow the intentions of the authors of the pages, the individual
requests demonstrate how the consumers see these pages.
• Web usage mining may disclose relationships that were not
proposed by the creator of the pages.
• Some of the methods to identify and analyze the web usage
patterns are given below:
I. Session and visitor analysis:
• The analysis of preprocessed data can be
accomplished in session analysis, which
incorporates the guest records, days, time,
sessions, etc. This data can be utilized to
analyze the visitor's behavior.
• The document is created after this analysis,
which contains the details of repeatedly
visited web pages, common entry, and exit.
II. OLAP (Online Analytical Processing):
• OLAP accomplishes a multidimensional
analysis of advanced data.
• OLAP can be accomplished on various parts of
log related data in a specific period.
• OLAP tools can be used to infer important
business intelligence metrics
Challenges in Web Mining:
the web pretends incredible challenges for resources, and knowledge
discovery based on the following observations:
The complexity of web pages:
• The site pages don't have a unifying structure. They are extremely
complicated as compared to traditional text documents. There are
enormous amounts of documents in the digital library of the web.
These libraries are not organized according to a specific order.
The web is a dynamic data source:
• The data on the internet is quickly updated. For example, news,
climate, shopping, financial news, sports, and so on.
Diversity of client networks:
• The client network on the web is quickly expanding. These clients
have different interests, backgrounds, and usage purposes. There
are over a hundred million workstations that are associated with
the internet and still increasing tremendously.
Relevancy of data:
• It is considered that a specific person is generally
concerned about a small portion of the web,
while the rest of the segment of the web contains
the data that is not familiar to the user and may
lead to unwanted results.
The web is too broad:
• The size of the web is tremendous and rapidly
increasing. It appears that the web is too huge for
data warehousing and data mining.
Application of Web Mining:
Web mining has an extensive application because of
various uses of the web. The list of some
applications of web mining is given below.
• Marketing and conversion tool
• Data analysis on website and application
accomplishment.
• Audience behavior analysis
• Advertising and campaign accomplishment
analysis.
• Testing and analysis of a site.

More Related Content

What's hot

Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseBui Ha
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedAnant Kumar
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digitalsambiswal
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntSteven Moy
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...Avinash Ramineni
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 

What's hot (20)

Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
 
Disaster Recovery Site Implementation with MySQL
Disaster Recovery Site Implementation with MySQLDisaster Recovery Site Implementation with MySQL
Disaster Recovery Site Implementation with MySQL
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme based
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure Hunt
 
SQL Server Disaster Recovery Implementation
SQL Server Disaster Recovery ImplementationSQL Server Disaster Recovery Implementation
SQL Server Disaster Recovery Implementation
 
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...Practical guide to architecting data lakes -  Avinash Ramineni - Phoenix Data...
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 

Similar to Web mining

Personal web usage mining
Personal web usage miningPersonal web usage mining
Personal web usage miningDaminda Herath
 
Personal Web Usage Mining
Personal Web Usage MiningPersonal Web Usage Mining
Personal Web Usage MiningDaminda Herath
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING ijcax
 
WEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMWEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMSai Kumar Ale
 
WEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdfWEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdfSowmyaJyothi3
 
Introduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data MiningIntroduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data MiningAarshDhokai
 

Similar to Web mining (20)

Web mining
Web miningWeb mining
Web mining
 
Gaurav web mining
Gaurav web miningGaurav web mining
Gaurav web mining
 
Web mining
Web miningWeb mining
Web mining
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Web mining
Web miningWeb mining
Web mining
 
WEB MINING.
WEB MINING.WEB MINING.
WEB MINING.
 
Personal web usage mining
Personal web usage miningPersonal web usage mining
Personal web usage mining
 
Personal Web Usage Mining
Personal Web Usage MiningPersonal Web Usage Mining
Personal Web Usage Mining
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
WEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMWEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEM
 
WEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdfWEBMINING_SOWMYAJYOTHI.pdf
WEBMINING_SOWMYAJYOTHI.pdf
 
Introduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data MiningIntroduction to Web Mining and Spatial Data Mining
Introduction to Web Mining and Spatial Data Mining
 
Aa03401490154
Aa03401490154Aa03401490154
Aa03401490154
 
Bb31269380
Bb31269380Bb31269380
Bb31269380
 

Recently uploaded

Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdfKamal Acharya
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientistgettygaming1
 
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES  INTRODUCTION UNIT-IENERGY STORAGE DEVICES  INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES INTRODUCTION UNIT-IVigneshvaranMech
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdfKamal Acharya
 
fundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projectionfundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projectionjeevanprasad8
 
Toll tax management system project report..pdf
Toll tax management system project report..pdfToll tax management system project report..pdf
Toll tax management system project report..pdfKamal Acharya
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdfKamal Acharya
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationRobbie Edward Sayers
 
Danfoss NeoCharge Technology -A Revolution in 2024.pdf
Danfoss NeoCharge Technology -A Revolution in 2024.pdfDanfoss NeoCharge Technology -A Revolution in 2024.pdf
Danfoss NeoCharge Technology -A Revolution in 2024.pdfNurvisNavarroSanchez
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectRased Khan
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfKamal Acharya
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdfPratik Pawar
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Krakówbim.edu.pl
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdfKamal Acharya
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industriesMuhammadTufail242431
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234AafreenAbuthahir2
 
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdfA CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdfKamal Acharya
 
RS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical SolutionsRS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical SolutionsAtif Razi
 

Recently uploaded (20)

Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientist
 
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES  INTRODUCTION UNIT-IENERGY STORAGE DEVICES  INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
fundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projectionfundamentals of drawing and isometric and orthographic projection
fundamentals of drawing and isometric and orthographic projection
 
Toll tax management system project report..pdf
Toll tax management system project report..pdfToll tax management system project report..pdf
Toll tax management system project report..pdf
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Danfoss NeoCharge Technology -A Revolution in 2024.pdf
Danfoss NeoCharge Technology -A Revolution in 2024.pdfDanfoss NeoCharge Technology -A Revolution in 2024.pdf
Danfoss NeoCharge Technology -A Revolution in 2024.pdf
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Kraków
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdfA CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
 
RS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical SolutionsRS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
 

Web mining

  • 1. Web Mining Prepared by Dr. Swarnalatha K.S Professor, Dept. of ISE NMIT
  • 2. Table of contents • Introduction to web mining • Types of web mining • Challenges in web mining • Applications of web mining
  • 3. • Over the last few years, the World Wide Web has become a significant source of information and simultaneously a popular platform for business. • Web mining can define as the method of utilizing data mining techniques and algorithms to extract useful information directly from the web, such as Web documents and services, hyperlinks, Web content, and server logs. • The World Wide Web contains a large amount of data that provides a rich source to data mining. The objective of Web mining is to look for patterns in Web data by collecting and examining data in order to gain insights.
  • 4. What is Web Mining? • Web mining can widely be seen as the application of adapted data mining techniques to the web, whereas data mining is defined as the application of the algorithm to discover patterns on mostly structured data embedded into a knowledge discovery process. • Web mining has a distinctive property to provide a set of various data types. The web has multiple aspects that yield different approaches for the mining process, such as web pages consist of text, web pages are linked via hyperlinks, and user activity can be monitored via web server logs. These three features lead to the differentiation between the three areas are web content mining, web structure mining, web usage mining.
  • 5. Types of web mining 1. Web Content Mining: • Web content mining can be used to extract useful data, information, knowledge from the web page content. • In web content mining, each web page is considered as an individual document. • The individual can take advantage of the semi-structured nature of web pages, as HTML provides information that concerns not only the layout but also logical structure. • The primary task of content mining is data extraction, where structured data is extracted from unstructured websites. The objective is to facilitate data aggregation over various web sites by using the extracted structured data. • Web content mining can be utilized to distinguish topics on the web. For Example, if any user searches for a specific task on the search engine, then the user will get a list of suggestions.
  • 6. 2. Web Structured Mining: • The web structure mining can be used to find the link structure of hyperlink. It is used to identify that data either link the web pages or direct link network. • In Web Structure Mining, an individual considers the web as a directed graph, with the web pages being the vertices that are associated with hyperlinks. • The most important application in this regard is the Google search engine, which estimates the ranking of its outcomes primarily with the PageRank algorithm. • It characterizes a page to be exceptionally relevant when frequently connected by other highly related pages. Structure and content mining methodologies are usually combined. • For example, web structured mining can be beneficial to organizations to regulate the network between two commercial sites.
  • 7. Web Usage Mining: • Web usage mining is used to extract useful data, information, knowledge from the weblog records, and assists in recognizing the user access patterns for web pages. • In Mining, the usage of web resources, the individual is thinking about records of requests of visitors of a website, that are often collected as web server logs. • While the content and structure of the collection of web pages follow the intentions of the authors of the pages, the individual requests demonstrate how the consumers see these pages. • Web usage mining may disclose relationships that were not proposed by the creator of the pages. • Some of the methods to identify and analyze the web usage patterns are given below:
  • 8. I. Session and visitor analysis: • The analysis of preprocessed data can be accomplished in session analysis, which incorporates the guest records, days, time, sessions, etc. This data can be utilized to analyze the visitor's behavior. • The document is created after this analysis, which contains the details of repeatedly visited web pages, common entry, and exit.
  • 9. II. OLAP (Online Analytical Processing): • OLAP accomplishes a multidimensional analysis of advanced data. • OLAP can be accomplished on various parts of log related data in a specific period. • OLAP tools can be used to infer important business intelligence metrics
  • 10. Challenges in Web Mining: the web pretends incredible challenges for resources, and knowledge discovery based on the following observations: The complexity of web pages: • The site pages don't have a unifying structure. They are extremely complicated as compared to traditional text documents. There are enormous amounts of documents in the digital library of the web. These libraries are not organized according to a specific order. The web is a dynamic data source: • The data on the internet is quickly updated. For example, news, climate, shopping, financial news, sports, and so on. Diversity of client networks: • The client network on the web is quickly expanding. These clients have different interests, backgrounds, and usage purposes. There are over a hundred million workstations that are associated with the internet and still increasing tremendously.
  • 11. Relevancy of data: • It is considered that a specific person is generally concerned about a small portion of the web, while the rest of the segment of the web contains the data that is not familiar to the user and may lead to unwanted results. The web is too broad: • The size of the web is tremendous and rapidly increasing. It appears that the web is too huge for data warehousing and data mining.
  • 12. Application of Web Mining: Web mining has an extensive application because of various uses of the web. The list of some applications of web mining is given below. • Marketing and conversion tool • Data analysis on website and application accomplishment. • Audience behavior analysis • Advertising and campaign accomplishment analysis. • Testing and analysis of a site.