This presentation introduces Md. Anik Hasan and their topic on web page classification features and algorithms. The objective is to review background on web classification, describe commonly used features and algorithms, and discuss related issues. While the research provides a clear overview, it does not cover sentiment classification, genre classification, or search engine spam classification.
1. Welcome to my presentation
Name: Md. Anik Hasan
ID: 201-35-572
Section: PC-A
Department of Software Engineering
Instructor
Name: MD. MARUF HASSAN
Department of Software Engineering
2. Topic: Web Page Classification: Features and Algorithms
Problem statement: the general problem of web page classification
3. Objective:
To assess the background of web classification and related work
To describe the features and algorithms used in classification
To discuss several related issues in web classification
To point out some interesting directions for web classification algorithms
4. Contribution:
A very clear review of useful web-specific features for classification.
An enumeration of the major applications of web classification.
A clear view of future research directions.
Research gap:
It does not deal with sentiment classification, genre classification,
search engine spam classification, and so on. This research focuses only
on subject and functional classification. It lacks an analysis of
features specific to the web.
5.
6. Summary of the results:
• There is also research that utilizes both structural and content information.
• In their algorithms, a web site can be represented by a single virtual page consisting
of all pages in the site, by a vector of topic frequencies, or by a tree of its pages with
topics.
• Research on blog classification can be broken into three types: blog identification (to
determine whether a web document is a blog), mood classification, and genre
classification.
• It has been shown that there is a close correlation between a web site's link structure
and its functionality.
• The second category of research includes identification of the mood or sentiment of blogs.
• The third category focuses on the genre of blogs.
• So far, it seems research from both the second and the third category suffers from the
lack of a well-defined taxonomy.
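The site representations mentioned above (a single virtual page, or a vector of topic frequencies) can be illustrated with a minimal sketch. The toy pages, the topic labels, and the function names `virtual_page` and `topic_frequency_vector` are assumptions made for illustration only; they are not taken from the surveyed algorithms.

```python
from collections import Counter

# Hypothetical toy site: each page has some text and an assumed topic label.
site_pages = [
    {"url": "/", "text": "welcome to our online store", "topic": "shopping"},
    {"url": "/shoes", "text": "running shoes and boots", "topic": "shopping"},
    {"url": "/blog", "text": "training tips for runners", "topic": "sports"},
]

def virtual_page(pages):
    """Represent the site as one virtual page: the joined text of all its pages."""
    return " ".join(page["text"] for page in pages)

def topic_frequency_vector(pages, topics):
    """Represent the site as the fraction of its pages assigned to each topic."""
    if not pages:
        return [0.0 for _ in topics]
    counts = Counter(page["topic"] for page in pages)
    return [counts[topic] / len(pages) for topic in topics]

topics = ["shopping", "sports", "news"]
site_text = virtual_page(site_pages)
site_vector = topic_frequency_vector(site_pages, topics)
```

Either representation could then be passed to an ordinary text or vector classifier. The tree-of-pages representation is not sketched here.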
7. Cycle body
Parameter of result:
Constructing, maintaining or expanding web directories.
Improving quality of search results.
Building efficient focused crawlers or vertical (domain-specific) search engines
Visual analysis
Utilizing artificial links
Significance of research: We have surveyed the space of published
approaches to web page classification from various viewpoints, and summarized
their findings and contributions. We found that the appropriate use of textual and
visual features that reside directly on the page can improve classification
performance. Feature selection and the combination of multiple techniques can
bring further improvement.
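As a rough illustration of feature selection on textual features, the sketch below keeps only terms that occur in at least `min_df` documents. Document-frequency thresholding is just one simple selection technique chosen for this example; the function name, the threshold, and the sample documents are assumptions, not something the survey prescribes.

```python
from collections import Counter

def select_terms_by_document_frequency(documents, min_df=2):
    """Keep terms that appear in at least min_df documents (illustrative only)."""
    document_frequency = Counter()
    for text in documents:
        # Count each term once per document, however often it repeats.
        document_frequency.update(set(text.lower().split()))
    return sorted(term for term, n in document_frequency.items() if n >= min_df)

# Hypothetical page texts.
docs = ["cheap flights to paris", "paris hotel deals", "cheap hotel rooms"]
selected = select_terms_by_document_frequency(docs)
```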
8. Limitation:
The lack of a standardized dataset, especially one with the spatial locality
representative of the web, is a significant disadvantage in web classification
research. Search engine spam is a significant concern in web information retrieval.
Conclusion: Web page classification aims to categorize web pages into predefined
categories. Classification tasks include assigning documents on the basis of subject,
function, sentiment, genre, and more. Unlike more general text classification, web
page classification methods can take advantage of the semi-structured content and
connections to other pages within the Web. How much do text and link similarity
measures reflect the semantic similarity between documents? How might neighbors
(or portions of neighbors) be weighted or selected to best match the likely value
of the evidence provided? Hyperlink information often encodes semantic
relationships along with voting for representative or important pages.
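One way to picture the question of weighting neighbors is the minimal sketch below, which adds down-weighted term counts from linked pages to a page's own term counts. The function names, the default `neighbor_weight` of 0.5, and the equal split among neighbors are all assumptions for illustration; the open question is precisely how such weights should be chosen.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated terms in a text."""
    return Counter(text.lower().split())

def combine_with_neighbors(page_text, neighbor_texts, neighbor_weight=0.5):
    """Merge a page's term counts with down-weighted counts from its neighbors.

    The total neighbor_weight is split equally among the neighbors
    (an assumed scheme, not one taken from the survey).
    """
    features = {}
    for term, count in bag_of_words(page_text).items():
        features[term] = float(count)
    if neighbor_texts:
        share = neighbor_weight / len(neighbor_texts)
        for text in neighbor_texts:
            for term, count in bag_of_words(text).items():
                features[term] = features.get(term, 0.0) + share * count
    return features

# Hypothetical page and two linked neighbor pages.
features = combine_with_neighbors("football scores", ["football news", "league table"])
```

The resulting feature dictionary could be fed to any classifier that accepts weighted term counts.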