SlideShare a Scribd company logo
Web Data Mining
7/2/2019 Compiled by: Kamal Acharya 1
7/2/2019 Compiled by: Kamal Acharya 2
Introduction
• Web: A huge, widely-distributed, highly heterogeneous, semi-
structured, interconnected information repository
• Web is a huge collection of documents plus
– Hyper-link information
– Access and usage information
7/2/2019 Compiled by: Kamal Acharya 3
Contd..
• What is Web Mining?
– Web mining is the application of data mining techniques to
find interesting and potentially useful knowledge from web
data.
– Web data:
• Web content data : Text, image, records, etc.
• Web structure data: Hyperlinks
• Web usages data: server logs
7/2/2019 Compiled by: Kamal Acharya 4
Contd..
– Web mining is usually divided into the following three categories.
• Web content mining
• Web usage mining and
• Web structure mining
Fig: Types of web mining
7/2/2019 Compiled by: Kamal Acharya 5
Web usages mining
• Automatic discovery of patterns in clickstreams(usages) and associated
data collected or generated as a result of user interactions with one or
more Web sites.
• Goal: analyze the behavioral patterns and profiles of users interacting
with a Web site.
• The discovered patterns are usually represented as collections of pages,
objects, or resources that are frequently accessed by groups of users
with common interests.
7/2/2019 Compiled by: Kamal Acharya 6
Contd..
• Application: Analyzing click stream data can help :
– determine the life-time value of clients,
– design cross-marketing strategies across products and services,
– evaluate the effectiveness of promotional campaigns,
– optimize the functionality of Web-based applications,
– provide more personalized content to visitors, and find the most effective
logical structure for their Web space.
7/2/2019 Compiled by: Kamal Acharya 7
Contd..
• Phase of Web Usage Mining:
– There are generally three distinctive phases in web usage mining:
• Data collection and preprocessing,
• Knowledge discovery,
• and pattern analysis
7/2/2019 Compiled by: Kamal Acharya 8
Contd..
• Data Collection and Pre-processing Phase:
– It deals with generating and cleaning of web data and
transforming it to a set of user transactions representing
activities of each user during his/her website visit.
– This step will influence the quality and result of the pattern
discovery and analysis. Therefore, it needs to be done very
carefully.
7/2/2019 Compiled by: Kamal Acharya 9
Contd..
• Pattern Discovery Phase
– Knowledge or pattern discovery is the key component of the Web mining,
which uses the algorithms and techniques from data mining.
– At present the usually used data mining methods mainly have clustering,
classifying and association rule mining.
– Each method has its own excellence and shortcomings, but the quite
effective method mainly is classifying and clustering at the present.
7/2/2019 Compiled by: Kamal Acharya 10
Contd..
• Pattern Analysis Phase:
– Pattern Analysis is the final stage of the Web usage mining.
– Challenges of Pattern Analysis are to filter uninteresting
information and to visualize and interpret the interesting
patterns to the user.
7/2/2019 Compiled by: Kamal Acharya 11
Web Content mining
• Web Content Mining is the process of extracting useful
information from the contents of Web documents.
• Content data corresponds to the collection of facts a Web page
was designed to convey to the users.
• It may consist of text, images, audio, video, or structured records
such as lists and tables as shown in Figure below.
7/2/2019 Compiled by: Kamal Acharya 12
Contd..
7/2/2019 Compiled by: Kamal Acharya 13
Web structure mining
• Web structure mining, one of three categories of web mining for
data, is a tool used to identify the relationship between Web
pages linked by information or direct link connection.
• It is used to study the topology of hyperlinks with or without
the description of the links.
7/2/2019 Compiled by: Kamal Acharya 14
Contd..
• The main purpose for structure mining is to extract previously
unknown relationships between Web pages.
• This structure data mining provides use for a business to link the
information of its own Web site to enable navigation and cluster
information into site maps.
• This allows its users the ability to access the desired information
through keyword association and content mining.
7/2/2019 Compiled by: Kamal Acharya 15
Contd..
• According to the type of web structural data, web structure
mining can be divided into two kinds: Hyperlinks
and Document Structure as shown in Figure below:
7/2/2019 Compiled by: Kamal Acharya 16
Issues and Challenges in Web Mining
• There are various issues and challenges with the web. Some
challenges include:
– The Web pages are dynamic that is the information is changes constantly.
Copping the changes and monitoring them is an important issue for many
applications.
– Noise elimination on the web is another issue. A user feels noisy
environment during searching the content, if the information comes from
different sources. Typical Web page involves many pieces of information
for instance the navigation links, main content of the page, copyright
notices, advertisements, and privacy policies. Only part of the information
is useful for a particular application but the rest is considered noise.
7/2/2019 Compiled by: Kamal Acharya 17
Contd..
• The diversity of the information on the multiple pages show
similar information in different words or formats, based on the
diverse authorship of Web pages that make the integration of
information from multiple pages as a challenging problem.
• Handing Big Data on the web is most important challenge, which
is scalable in term of volume, variety, variability, and complexity.
7/2/2019 Compiled by: Kamal Acharya 18
Contd..
• To maintain security and privacy of web data is not an easy task.
Advanced cryptographic algorithm is required for optimal service
on the web.
• Discovery of advance hyperlink topology and its management is
the other mining issue on the web.
7/2/2019 Compiled by: Kamal Acharya 19
Web Mining Application Areas
• Web mining is an important tool to gather knowledge of the
behavior of Websites visitors and thereby to allow for appropriate
adjustments and decisions with respect to Websites‘ actual users
and traffic patterns.
• Along with a description of the processes involved in Web
mining states that Website Design, Web Traffic Handling, e-
Business and Web Personalization are four major application
areas for Web mining. These are briefly described in the
following sections.
7/2/2019 Compiled by: Kamal Acharya 20
Contd..
• Website Design:
– The content and structure of the Website is important to the user
experience/impression of the site and the site‘s usability. The problem is
that different types of users have different preferences, background,
knowledge etc. making it difficult (if not impossible) to find a design that
is optimal for all users.
– Web usage mining can then be used to detect which types of users are
accessing the website, and their behavior, knowledge which can then be
used to manually design/re-design the website, or to automatically change
the structure and content based on the profile of the user visiting it.
7/2/2019 Compiled by: Kamal Acharya 21
Contd..
• Web Traffic Handling:
– The performance and service of Websites can be improved using
knowledge of the Web traffic in order to predict the navigation path of the
current user. This may be used for cashing, load balancing or data
distribution to improve the performance. The path prediction can also be
used to detect fraud, break-ins, intrusion etc.
7/2/2019 Compiled by: Kamal Acharya 22
Contd..
• Web Personalization:
– Based on Web Mining Techniques, websites are designed to have the look-
and-feel and contents are personalized to the needs of an individual end-
user.
– Web Personalization or customization is an attractive application area for
Web based companies, allowing for recommendations, marketing
campaigns etc. to be specifically customized for different categories of
users, and more importantly to do this in real-time, automatically, as the
user accesses the Website.
7/2/2019 Compiled by: Kamal Acharya 23
Contd..
• e-Business:
– For Web based companies, Web mining is a powerful tool to collect
business intelligence by using electronic business to get competitive
advantages.
– Patterns of the customer’s activities on the Website can be used as
important knowledge in the decision-making process, e.g. predicting
customer’s future behavior; recruiting new customers and developing new
products are beneficial choices.
7/2/2019 Compiled by: Kamal Acharya 24
Contd..
• E-Learning and Digital Library:
– Web mining can be used for improving the performance of electronic
learning. Applications of web mining towards e-learning are usually web
usage based. Machine learning and web usage mining improve web based
learning.
7/2/2019 Compiled by: Kamal Acharya 25
Contd..
• Security and Crime Investigation:
– Along with the rapid popularity of the Internet, crime
information on the web is becoming increasingly rampant,
and the majority of them are in the form of text.
– Because a lot of crime information in documents is described
through events, event-based semantic technology can be used
to study the patterns and trends of web-oriented crimes
7/2/2019 Compiled by: Kamal Acharya 26
Time series data mining
• Sequential data (or time series) refers to data that appear in a specific order.
– The order defines a time axis, that differentiates this data from other cases
we have seen so far
• Examples
– The price of a stock (or of many stocks) over time
– Environmental data (pressure, temperature, precipitation etc) over time
– The sequence of queries in a search engine, or the frequency of a query
over time
– The words in a document as they appear in order, and etc.
7/2/2019 Compiled by: Kamal Acharya 27
Contd..
• Why deal with sequential data?
– Because all data is sequential
• All data items arrive in the data store in some order
– In some (many) cases the order does not matter
• E.g., we can assume a bag of words model for a document
– In many cases the order is of interest
• E.g., stock prices do not make sense without the time
information.
7/2/2019 Compiled by: Kamal Acharya 28
Contd..
Fig: General time series data mining framework
7/2/2019 Compiled by: Kamal Acharya 29
Homework
• What is web data mining? In what situations can web data
mining techniques can be useful?
• What are the aims of web data mining?
• Explain the difference between the three types of web data
mining.
• What data mining techniques can be used for log data analysis?
• What are time series data? Explain about time series data mining.
Thank You !
Compiled by: Kamal Acharya 307/2/2019

More Related Content

What's hot

Data mining
Data mining Data mining
Web mining (1)
Web mining (1)Web mining (1)
Web mining (1)
ajaybabu1314
 
Web mining slides
Web mining slidesWeb mining slides
Web mining slides
mahavir_a
 
5.5 graph mining
5.5 graph mining5.5 graph mining
5.5 graph mining
Krish_ver2
 
cyber security and forensic tools
cyber security and forensic toolscyber security and forensic tools
cyber security and forensic toolsSonu Sunaliya
 
Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
VijayasankariS
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
Krish_ver2
 
Web Scraping and Data Extraction Service
Web Scraping and Data Extraction ServiceWeb Scraping and Data Extraction Service
Web Scraping and Data Extraction Service
PromptCloud
 
Web usage mining
Web usage miningWeb usage mining
Web usage mining
Monu Chaudhary
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
Krish_ver2
 
Simple overview of machine learning
Simple overview of machine learningSimple overview of machine learning
Simple overview of machine learning
priyadharshini R
 
Firewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
Firewall, Trusted Systems,IP Security ,ESP Encryption and AuthenticationFirewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
Firewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
Gopal Sakarkar
 
Web mining
Web miningWeb mining
Web mining
Innovative Pencils
 
Email security
Email securityEmail security
Email security
Baliram Yadav
 
Text mining
Text miningText mining
Text mining
Koshy Geoji
 
Data mining
Data mining Data mining
Data mining
sayalipatil528
 
Link-Based Ranking
Link-Based RankingLink-Based Ranking
Link-Based Ranking
Carlos Castillo (ChaTo)
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series data
Krish_ver2
 
Secure electronic transaction ppt
Secure electronic transaction pptSecure electronic transaction ppt
Secure electronic transaction ppt
Subhash Gupta
 
Data Mining
Data MiningData Mining
Data Mining
SHIKHA GAUTAM
 

What's hot (20)

Data mining
Data mining Data mining
Data mining
 
Web mining (1)
Web mining (1)Web mining (1)
Web mining (1)
 
Web mining slides
Web mining slidesWeb mining slides
Web mining slides
 
5.5 graph mining
5.5 graph mining5.5 graph mining
5.5 graph mining
 
cyber security and forensic tools
cyber security and forensic toolscyber security and forensic tools
cyber security and forensic tools
 
Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
 
Web Scraping and Data Extraction Service
Web Scraping and Data Extraction ServiceWeb Scraping and Data Extraction Service
Web Scraping and Data Extraction Service
 
Web usage mining
Web usage miningWeb usage mining
Web usage mining
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
Simple overview of machine learning
Simple overview of machine learningSimple overview of machine learning
Simple overview of machine learning
 
Firewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
Firewall, Trusted Systems,IP Security ,ESP Encryption and AuthenticationFirewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
Firewall, Trusted Systems,IP Security ,ESP Encryption and Authentication
 
Web mining
Web miningWeb mining
Web mining
 
Email security
Email securityEmail security
Email security
 
Text mining
Text miningText mining
Text mining
 
Data mining
Data mining Data mining
Data mining
 
Link-Based Ranking
Link-Based RankingLink-Based Ranking
Link-Based Ranking
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series data
 
Secure electronic transaction ppt
Secure electronic transaction pptSecure electronic transaction ppt
Secure electronic transaction ppt
 
Data Mining
Data MiningData Mining
Data Mining
 

Similar to Web Mining

Search Engines
Search EnginesSearch Engines
Search Engines
Kamal Acharya
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
ijcax
 
Web Page Recommendation Using Web Mining
Web Page Recommendation Using Web MiningWeb Page Recommendation Using Web Mining
Web Page Recommendation Using Web Mining
IJERA Editor
 
WEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSWEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESS
acijjournal
 
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
IRJET Journal
 
A detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniquesA detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniques
ijctet
 
Pxc3893553
Pxc3893553Pxc3893553
Pxc3893553
Ouzza Brahim
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
anchalsinghdm
 
Web usage mining
Web usage miningWeb usage mining
Web usage mining
shabnamfsayyad
 
Implementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server MonitoringImplementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server Monitoring
iosrjce
 
C017231726
C017231726C017231726
C017231726
IOSR Journals
 
Web mining
Web miningWeb mining
Web mining
SwarnaLatha177
 

Similar to Web Mining (20)

Search Engines
Search EnginesSearch Engines
Search Engines
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING RESEARCH ISSUES IN WEB MINING
RESEARCH ISSUES IN WEB MINING
 
Web Page Recommendation Using Web Mining
Web Page Recommendation Using Web MiningWeb Page Recommendation Using Web Mining
Web Page Recommendation Using Web Mining
 
WEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESSWEB MINING – A CATALYST FOR E-BUSINESS
WEB MINING – A CATALYST FOR E-BUSINESS
 
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
Enactment of Firefly Algorithm and Fuzzy C-Means Clustering For Consumer Requ...
 
A detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniquesA detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniques
 
Ab03401550159
Ab03401550159Ab03401550159
Ab03401550159
 
Pxc3893553
Pxc3893553Pxc3893553
Pxc3893553
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
 
01635156
0163515601635156
01635156
 
Web usage mining
Web usage miningWeb usage mining
Web usage mining
 
Implementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server MonitoringImplementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server Monitoring
 
C017231726
C017231726C017231726
C017231726
 
Web mining
Web miningWeb mining
Web mining
 

More from Kamal Acharya

Programming the basic computer
Programming the basic computerProgramming the basic computer
Programming the basic computer
Kamal Acharya
 
Computer Arithmetic
Computer ArithmeticComputer Arithmetic
Computer Arithmetic
Kamal Acharya
 
Introduction to Computer Security
Introduction to Computer SecurityIntroduction to Computer Security
Introduction to Computer Security
Kamal Acharya
 
Session and Cookies
Session and CookiesSession and Cookies
Session and Cookies
Kamal Acharya
 
Functions in php
Functions in phpFunctions in php
Functions in php
Kamal Acharya
 
Web forms in php
Web forms in phpWeb forms in php
Web forms in php
Kamal Acharya
 
Making decision and repeating in PHP
Making decision and repeating  in PHPMaking decision and repeating  in PHP
Making decision and repeating in PHP
Kamal Acharya
 
Working with arrays in php
Working with arrays in phpWorking with arrays in php
Working with arrays in php
Kamal Acharya
 
Text and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHPText and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHP
Kamal Acharya
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
Kamal Acharya
 
Capacity Planning of Data Warehousing
Capacity Planning of Data WarehousingCapacity Planning of Data Warehousing
Capacity Planning of Data Warehousing
Kamal Acharya
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
Kamal Acharya
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data Mining
Kamal Acharya
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
Kamal Acharya
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
Kamal Acharya
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
Kamal Acharya
 
Introduction to Data Mining and Data Warehousing
Introduction to Data Mining and Data WarehousingIntroduction to Data Mining and Data Warehousing
Introduction to Data Mining and Data Warehousing
Kamal Acharya
 
Functions in Python
Functions in PythonFunctions in Python
Functions in Python
Kamal Acharya
 
Python Flow Control
Python Flow ControlPython Flow Control
Python Flow Control
Kamal Acharya
 

More from Kamal Acharya (20)

Programming the basic computer
Programming the basic computerProgramming the basic computer
Programming the basic computer
 
Computer Arithmetic
Computer ArithmeticComputer Arithmetic
Computer Arithmetic
 
Introduction to Computer Security
Introduction to Computer SecurityIntroduction to Computer Security
Introduction to Computer Security
 
Session and Cookies
Session and CookiesSession and Cookies
Session and Cookies
 
Functions in php
Functions in phpFunctions in php
Functions in php
 
Web forms in php
Web forms in phpWeb forms in php
Web forms in php
 
Making decision and repeating in PHP
Making decision and repeating  in PHPMaking decision and repeating  in PHP
Making decision and repeating in PHP
 
Working with arrays in php
Working with arrays in phpWorking with arrays in php
Working with arrays in php
 
Text and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHPText and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHP
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
 
Capacity Planning of Data Warehousing
Capacity Planning of Data WarehousingCapacity Planning of Data Warehousing
Capacity Planning of Data Warehousing
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data Mining
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Introduction to Data Mining and Data Warehousing
Introduction to Data Mining and Data WarehousingIntroduction to Data Mining and Data Warehousing
Introduction to Data Mining and Data Warehousing
 
Functions in Python
Functions in PythonFunctions in Python
Functions in Python
 
Python Flow Control
Python Flow ControlPython Flow Control
Python Flow Control
 

Recently uploaded

The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
ShivajiThube2
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
The Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptxThe Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptx
DhatriParmar
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 

Recently uploaded (20)

The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
The Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptxThe Diamond Necklace by Guy De Maupassant.pptx
The Diamond Necklace by Guy De Maupassant.pptx
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 

Web Mining

  • 1. Web Data Mining 7/2/2019 Compiled by: Kamal Acharya 1
  • 2. 7/2/2019 Compiled by: Kamal Acharya 2 Introduction • Web: A huge, widely-distributed, highly heterogeneous, semi- structured, interconnected information repository • Web is a huge collection of documents plus – Hyper-link information – Access and usage information
  • 3. 7/2/2019 Compiled by: Kamal Acharya 3 Contd.. • What is Web Mining? – Web mining is the application of data mining techniques to find interesting and potentially useful knowledge from web data. – Web data: • Web content data : Text, image, records, etc. • Web structure data: Hyperlinks • Web usages data: server logs
  • 4. 7/2/2019 Compiled by: Kamal Acharya 4 Contd.. – Web mining is usually divided into the following three categories. • Web content mining • Web usage mining and • Web structure mining Fig: Types of web mining
  • 5. 7/2/2019 Compiled by: Kamal Acharya 5 Web usages mining • Automatic discovery of patterns in clickstreams(usages) and associated data collected or generated as a result of user interactions with one or more Web sites. • Goal: analyze the behavioral patterns and profiles of users interacting with a Web site. • The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common interests.
  • 6. 7/2/2019 Compiled by: Kamal Acharya 6 Contd.. • Application: Analyzing click stream data can help : – determine the life-time value of clients, – design cross-marketing strategies across products and services, – evaluate the effectiveness of promotional campaigns, – optimize the functionality of Web-based applications, – provide more personalized content to visitors, and find the most effective logical structure for their Web space.
  • 7. 7/2/2019 Compiled by: Kamal Acharya 7 Contd.. • Phase of Web Usage Mining: – There are generally three distinctive phases in web usage mining: • Data collection and preprocessing, • Knowledge discovery, • and pattern analysis
  • 8. 7/2/2019 Compiled by: Kamal Acharya 8 Contd.. • Data Collection and Pre-processing Phase: – It deals with generating and cleaning of web data and transforming it to a set of user transactions representing activities of each user during his/her website visit. – This step will influence the quality and result of the pattern discovery and analysis. Therefore, it needs to be done very carefully.
  • 9. 7/2/2019 Compiled by: Kamal Acharya 9 Contd.. • Pattern Discovery Phase – Knowledge or pattern discovery is the key component of the Web mining, which uses the algorithms and techniques from data mining. – At present the usually used data mining methods mainly have clustering, classifying and association rule mining. – Each method has its own excellence and shortcomings, but the quite effective method mainly is classifying and clustering at the present.
  • 10. 7/2/2019 Compiled by: Kamal Acharya 10 Contd.. • Pattern Analysis Phase: – Pattern Analysis is the final stage of the Web usage mining. – Challenges of Pattern Analysis are to filter uninteresting information and to visualize and interpret the interesting patterns to the user.
  • 11. 7/2/2019 Compiled by: Kamal Acharya 11 Web Content mining • Web Content Mining is the process of extracting useful information from the contents of Web documents. • Content data corresponds to the collection of facts a Web page was designed to convey to the users. • It may consist of text, images, audio, video, or structured records such as lists and tables as shown in Figure below.
  • 12. 7/2/2019 Compiled by: Kamal Acharya 12 Contd..
  • 13. 7/2/2019 Compiled by: Kamal Acharya 13 Web structure mining • Web structure mining, one of three categories of web mining for data, is a tool used to identify the relationship between Web pages linked by information or direct link connection. • It is used to study the topology of hyperlinks with or without the description of the links.
  • 14. 7/2/2019 Compiled by: Kamal Acharya 14 Contd.. • The main purpose for structure mining is to extract previously unknown relationships between Web pages. • This structure data mining provides use for a business to link the information of its own Web site to enable navigation and cluster information into site maps. • This allows its users the ability to access the desired information through keyword association and content mining.
  • 15. 7/2/2019 Compiled by: Kamal Acharya 15 Contd.. • According to the type of web structural data, web structure mining can be divided into two kinds: Hyperlinks and Document Structure as shown in Figure below:
  • 16. 7/2/2019 Compiled by: Kamal Acharya 16 Issues and Challenges in Web Mining • There are various issues and challenges with the web. Some challenges include: – The Web pages are dynamic that is the information is changes constantly. Copping the changes and monitoring them is an important issue for many applications. – Noise elimination on the web is another issue. A user feels noisy environment during searching the content, if the information comes from different sources. Typical Web page involves many pieces of information for instance the navigation links, main content of the page, copyright notices, advertisements, and privacy policies. Only part of the information is useful for a particular application but the rest is considered noise.
  • 17. 7/2/2019 Compiled by: Kamal Acharya 17 Contd.. • The diversity of the information on the multiple pages show similar information in different words or formats, based on the diverse authorship of Web pages that make the integration of information from multiple pages as a challenging problem. • Handing Big Data on the web is most important challenge, which is scalable in term of volume, variety, variability, and complexity.
  • 18. 7/2/2019 Compiled by: Kamal Acharya 18 Contd.. • To maintain security and privacy of web data is not an easy task. Advanced cryptographic algorithm is required for optimal service on the web. • Discovery of advance hyperlink topology and its management is the other mining issue on the web.
  • 19. 7/2/2019 Compiled by: Kamal Acharya 19 Web Mining Application Areas • Web mining is an important tool to gather knowledge of the behavior of Websites visitors and thereby to allow for appropriate adjustments and decisions with respect to Websites‘ actual users and traffic patterns. • Along with a description of the processes involved in Web mining states that Website Design, Web Traffic Handling, e- Business and Web Personalization are four major application areas for Web mining. These are briefly described in the following sections.
  • 20. 7/2/2019 Compiled by: Kamal Acharya 20 Contd.. • Website Design: – The content and structure of the Website is important to the user experience/impression of the site and the site‘s usability. The problem is that different types of users have different preferences, background, knowledge etc. making it difficult (if not impossible) to find a design that is optimal for all users. – Web usage mining can then be used to detect which types of users are accessing the website, and their behavior, knowledge which can then be used to manually design/re-design the website, or to automatically change the structure and content based on the profile of the user visiting it.
  • 21. 7/2/2019 Compiled by: Kamal Acharya 21 Contd.. • Web Traffic Handling: – The performance and service of Websites can be improved using knowledge of the Web traffic in order to predict the navigation path of the current user. This may be used for cashing, load balancing or data distribution to improve the performance. The path prediction can also be used to detect fraud, break-ins, intrusion etc.
  • 22. 7/2/2019 Compiled by: Kamal Acharya 22 Contd.. • Web Personalization: – Based on Web Mining Techniques, websites are designed to have the look- and-feel and contents are personalized to the needs of an individual end- user. – Web Personalization or customization is an attractive application area for Web based companies, allowing for recommendations, marketing campaigns etc. to be specifically customized for different categories of users, and more importantly to do this in real-time, automatically, as the user accesses the Website.
  • 23. 7/2/2019 Compiled by: Kamal Acharya 23 Contd.. • e-Business: – For Web based companies, Web mining is a powerful tool to collect business intelligence by using electronic business to get competitive advantages. – Patterns of the customer’s activities on the Website can be used as important knowledge in the decision-making process, e.g. predicting customer’s future behavior; recruiting new customers and developing new products are beneficial choices.
  • 24. 7/2/2019 Compiled by: Kamal Acharya 24 Contd.. • E-Learning and Digital Library: – Web mining can be used for improving the performance of electronic learning. Applications of web mining towards e-learning are usually web usage based. Machine learning and web usage mining improve web based learning.
  • 25. 7/2/2019 Compiled by: Kamal Acharya 25 Contd.. • Security and Crime Investigation: – Along with the rapid popularity of the Internet, crime information on the web is becoming increasingly rampant, and the majority of them are in the form of text. – Because a lot of crime information in documents is described through events, event-based semantic technology can be used to study the patterns and trends of web-oriented crimes
  • 26. 7/2/2019 Compiled by: Kamal Acharya 26 Time series data mining • Sequential data (or time series) refers to data that appear in a specific order. – The order defines a time axis, that differentiates this data from other cases we have seen so far • Examples – The price of a stock (or of many stocks) over time – Environmental data (pressure, temperature, precipitation etc) over time – The sequence of queries in a search engine, or the frequency of a query over time – The words in a document as they appear in order, and etc.
  • 27. 7/2/2019 Compiled by: Kamal Acharya 27 Contd.. • Why deal with sequential data? – Because all data is sequential • All data items arrive in the data store in some order – In some (many) cases the order does not matter • E.g., we can assume a bag of words model for a document – In many cases the order is of interest • E.g., stock prices do not make sense without the time information.
  • 28. 7/2/2019 Compiled by: Kamal Acharya 28 Contd.. Fig: General time series data mining framework
  • 29. 7/2/2019 Compiled by: Kamal Acharya 29 Homework • What is web data mining? In what situations can web data mining techniques can be useful? • What are the aims of web data mining? • Explain the difference between the three types of web data mining. • What data mining techniques can be used for log data analysis? • What are time series data? Explain about time series data mining.
  • 30. Thank You ! Compiled by: Kamal Acharya 307/2/2019