Web mining is the use of data mining techniques to automatically discover and extract information from web documents and web usage data. There are three types of web mining: web content mining, web structure mining, and web usage mining. Web content mining analyzes the contents of web pages such as text and images. Web structure mining analyzes the hyperlink structure of the web to discover communities and page rankings. Web usage mining analyzes user interactions with websites through web logs to understand user behavior. Popular algorithms for web mining include PageRank for ranking pages and HITS for identifying hubs and authorities on a topic. Web mining has applications in areas like e-commerce, security, and prediction.
2. What is Web mining?
Web mining is the use of data mining techniques to automatically discover and
extract information from Web documents and services.
3. What is web mining?
• Mining of data related to the WWW
• Data present in Web pages or data related to Web activity
• Web data is classified as:
• Content of web pages
• Intra-page structure, which includes code and actual linkage
• Usage data: how the pages are used by visitors
• User profiles
5. Web Content Mining
• Extension of basic search engines
• Search engines are keyword-based
• Traditional search engines use
• crawlers to search the Web and gather information
• indexing techniques to store the information
• query processing to provide fast and accurate information to users
6. Taxonomy of Web content mining
Web content mining follows two approaches:
• Agent-based approach: uses software systems to perform the content mining,
e.g. search engines
• Database approach: views Web data as belonging to a database; the Web is a
multilevel database and query languages are used for querying the data
Content mining is a type of text mining
9. How do crawlers work?
• A robot, spider, or crawler is a program that traverses the hypertext structure of
the web
• The page where the crawler starts is referred to as the seed URL
• All links from that page are recorded and saved in a queue
• The new pages are in turn searched and their links are saved
• The crawlers collect information about each page, extract keywords, and store
indices for users
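The queue-based traversal described on this slide can be sketched as a breadth-first crawl. This is a minimal illustration, not a production crawler: the `TOY_WEB` dictionary, the `crawl` function, and the `fetch_links` callback are hypothetical names, and the "web" is an in-memory link table so that no network access is needed.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
}


def crawl(seed_url, fetch_links, max_pages=100):
    """Breadth-first crawl starting at the seed URL.

    fetch_links(url) must return the list of URLs linked from that page.
    Returns the pages in the order they were visited.
    """
    queue = deque([seed_url])   # links waiting to be visited
    seen = {seed_url}           # avoids queueing the same page twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("a.example/", lambda url: TOY_WEB.get(url, []))
print(order)
```

A real crawler would replace the lambda with a function that downloads the page and extracts its links, and would also extract keywords for the index.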
10. Crawling the web
A Web crawler is an Internet bot which systematically browses the World Wide
Web, typically for the purpose of Web indexing (web spidering). Web search
engines and some other sites use Web crawling or spidering software to
update their web content or indices of other sites' web content.
Including a robots.txt file can request bots to index only parts of a website, or
nothing at all.
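As a small illustration of how a well-behaved bot might honour robots.txt, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `site.example` URLs, and the `ExampleBot` user agent are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without fetching anything


def allowed(url, user_agent="ExampleBot"):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(allowed("https://site.example/index.html"))
print(allowed("https://site.example/private/data.html"))
```

Note that robots.txt is only a request: the check has to be made by the crawler itself before it fetches a page.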
11. Crawling
When Google visits your website for the purpose of tracking, it does so with
the help of a program known as a web crawler, spider, Googlebot, internet
bot, or automatic indexer.
The process of crawling: Google uses a huge set of computers to fetch (or "crawl")
millions of web pages on the web. Googlebot discovers new and updated pages
to be added to the Google index with the help of sitemaps. The crawler obtains
information and adds it to the Google index; this is how crawling works.
12. Indexing
Once the web crawler has finished crawling, the results are stored in the
Google index. The Google index is similar to a library index, which lists
information about all the books in the library. If you want more pages included
in the Google index, you can create and submit a
Sitemap through Webmaster Tools.
The index is basically a big list of words and the web pages
that feature them; the location of each term within a particular
web page is stored as well.
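The "big list of words and the web pages that feature them" is commonly called an inverted index. The sketch below is a toy version under simplifying assumptions (whitespace tokenization, lowercase matching); `build_index` and the `PAGES` data are hypothetical and do not describe how Google's index is actually built.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the pages containing it and the word positions there."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word][url].append(position)
    return index


# Hypothetical crawled pages (URL -> extracted text).
PAGES = {
    "page1": "web mining uses data mining",
    "page2": "web crawlers index the web",
}

index = build_index(PAGES)
print(dict(index["web"]))     # pages and positions for the term "web"
print(dict(index["mining"]))  # pages and positions for the term "mining"
```

Answering a keyword query then becomes a lookup in this structure instead of a scan over every page.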
13. Ranking
“Ranking – Determining what each page is about, and
how it should rank for relevant queries”
Search engines have two major functions: crawling and
building an index, and providing search users with a
ranked list of the websites they've determined as the most
relevant.
14. Types of crawlers
• Periodic crawlers: activated periodically; every time it is activated it replaces
the existing index
• Incremental crawler: updates the index incrementally instead of replacing it
• Focused crawler: visits pages related to topics of interest
16. Web Harvesting
• Web harvesting, also known as Web scraping, is an increasingly popular
method used by websites to channel customers' searches to their website
• Web harvesting software automatically extracts information from the Web
and picks up where search engines leave off, doing the work the search
engine can't.
• FMiner
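To make the idea of automatic extraction concrete, here is a minimal scraping sketch built on Python's standard `html.parser` module. It is not how FMiner works; the `LinkExtractor` class, the `extract_links` helper, and the sample HTML are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from the anchor tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    extractor = LinkExtractor()
    extractor.feed(html)
    return extractor.links


# Hypothetical product listing page.
SAMPLE = (
    '<ul><li><a href="/item/1">Red kettle</a> 19.99</li>'
    '<li><a href="/item/2">Blue mug</a> 4.50</li></ul>'
)
print(extract_links(SAMPLE))
```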
17. Virtual Web View
• Large amounts of unstructured data can be handled using a multiple layered
database (MLDB) on top of the web data
• Every layer of this database is more generalized than the preceding layer
• The upper layers are structured and can be accessed using SQL
• A view of the MLDB is called a Virtual Web View (VWV)
18. WebML
• Query language which supports data mining operations on the MLDB
• The four primitive operations in WebML are
• COVERS
• COVERED BY
• LIKE
• CLOSE TO
• Example query:
SELECT *
FROM document in "www.engr.smu.edu"
WHERE ONE OF keywords COVERS "cat"
19. Personalization
• Contents of a web page are modified to fit the desires of the user
• Advertisements are sent to a potential customer based on specific knowledge about that customer
• Personalization is performed on the target web page
• Targeting is different from personalization
• In targeting, businesses display advertisements at other sites visited by their users
• In personalization, when a person visits a Web site, the advertising can be designed
specifically for that person
20. Personalization (contd.)
• Personalization is a combination of clustering, classification and prediction
• Types of personalization are
• Manual techniques – user registration details
• Collaborative filtering
• Content-based filtering
• Eg. MyYahoo
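Collaborative filtering, one of the personalization techniques listed above, recommends items that similar users liked. The following is a minimal user-based sketch under assumed data: the `RATINGS` table, the cosine similarity measure, and the scoring rule are illustrative choices, not a description of any particular site's method.

```python
from math import sqrt

# Hypothetical ratings: user -> {content category: rating from 1 to 5}.
RATINGS = {
    "ann": {"news": 5, "sports": 1, "tech": 4},
    "bob": {"news": 4, "sports": 1, "tech": 5, "travel": 4},
    "cat": {"news": 1, "sports": 5, "cooking": 4},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann", RATINGS))
```

Content-based filtering would instead compare the attributes of the items themselves with the user's own history, without looking at other users.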
21. Web Usage Mining
It deals with understanding user behavior in interacting
with the web or with a website.
Aim
To obtain information that may help web sites reorganize
or adapt to better suit the user.
23. Web Usage Mining Applications
• Personalization
• Improve structure of a site’s Web pages
• Aid in caching and prediction of future page references
• Improve design of individual pages
• Improve effectiveness of e-commerce (sales and advertising)
24. Web Usage Mining Activities
• Preprocessing Web log
• Cleanse
• Remove extraneous information
• Sessionize
Session: sequence of pages referenced by one user at a sitting.
• Pattern Discovery
• Count patterns that occur in sessions
• A pattern is a sequence of page references in a session.
• Similar to association rules
• Transaction: session
• Itemset: pattern (or subset)
• Order is important
• Pattern Analysis
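The sessionize and pattern discovery steps can be sketched as follows. The log format, the 30-minute inactivity timeout, and the restriction to contiguous page sequences are assumptions made for this example; the slides do not fix any of them.

```python
from collections import Counter

# Hypothetical cleansed log records: (user_id, timestamp_in_seconds, page).
LOG = [
    ("u1", 0, "A"), ("u1", 60, "B"), ("u1", 120, "C"),
    ("u2", 30, "A"), ("u2", 90, "B"),
    ("u1", 5000, "A"), ("u1", 5060, "B"),
]


def sessionize(log, timeout=1800):
    """Split each user's requests into sessions separated by long gaps."""
    sessions = []
    last_user, last_time = None, None
    for user, timestamp, page in sorted(log):  # ordered by user, then time
        if user != last_user or timestamp - last_time > timeout:
            sessions.append([])                # start a new session
        sessions[-1].append(page)
        last_user, last_time = user, timestamp
    return sessions


def count_patterns(sessions, length=2):
    """Count in how many sessions each ordered page sequence occurs."""
    counts = Counter()
    for session in sessions:
        patterns = {
            tuple(session[i:i + length])
            for i in range(len(session) - length + 1)
        }
        counts.update(patterns)                # each session counts once
    return counts


sessions = sessionize(LOG)
patterns = count_patterns(sessions)
print(sessions)
print(patterns.most_common())
```

Because order matters, the pattern (A, B) is counted separately from (B, A), which is the main difference from ordinary association-rule itemsets.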
25. Web Usage Mining Issues
• Exact identification of the user is not possible.
• The exact sequence of pages referenced by a user cannot be determined due to caching.
• A session is not well defined
• Security, privacy, and legal issues
26. Web Log Cleansing
• Replace source IP address with unique but non-identifying ID.
• Replace exact URL of pages referenced with unique but non-identifying ID.
• Delete error records and records containing non-page data (such as figures
and code)
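The three cleansing steps above might look like this in code. The record layout `(ip, url, status)`, the list of non-page file suffixes, and the `U1`/`P1` identifier scheme are all assumptions for illustration.

```python
# Hypothetical raw log records: (source IP, requested URL, HTTP status).
RAW_LOG = [
    ("10.0.0.1", "/index.html", 200),
    ("10.0.0.1", "/logo.png", 200),
    ("10.0.0.2", "/index.html", 200),
    ("10.0.0.2", "/missing.html", 404),
    ("10.0.0.1", "/products.html", 200),
]

# Assumed suffixes for figures and code, which are not page views.
NON_PAGE_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")


def cleanse(records):
    """Drop error and non-page records; replace IPs and URLs with opaque IDs."""
    user_ids, page_ids = {}, {}
    cleaned = []
    for ip, url, status in records:
        if status >= 400:                            # error record
            continue
        if url.lower().endswith(NON_PAGE_SUFFIXES):  # figure or code
            continue
        user = user_ids.setdefault(ip, f"U{len(user_ids) + 1}")
        page = page_ids.setdefault(url, f"P{len(page_ids) + 1}")
        cleaned.append((user, page))
    return cleaned


cleaned = cleanse(RAW_LOG)
print(cleaned)
```

The same IP or URL always maps to the same ID, so usage patterns survive while the original identifiers do not appear in the output.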
27. Web Structure Mining
• Creating a model of the web organization
• Used to classify Web pages or to create similarity measures between
documents
• Mine structure (links, graph) of the Web
• Techniques
• PageRank
• HITS
28. Page Rank
• Designed to increase the effectiveness of search engines and improve their
efficiency
• Used to
• Measure the importance of a page
• Prioritize the pages returned from a traditional search engine using keyword searching
• The Page Rank of a page is calculated based on the number of pages that point to it
29. Page Rank
Search engine that uses link structure to calculate a quality ranking (PageRank) for each page
Intuition: PageRank can be seen as the probability that a “random surfer” visits a page
A page is important if important pages link to it
30. PageRank
Page Rank: A page is important if many important pages link to it.
(PageRank) + (Website Content) = Overall Rank in Results
Link i → j:
• i considers j important.
• The more important i is, the more important j becomes.
• If i has many out-links, each link is less important.
31. Let OutDegree(i) = number of out-links of page i
Adjust p_j:
$$\mathrm{PageRank}(j) = (1 - d) + d \sum_{i \to j} \frac{\mathrm{PageRank}(i)}{\mathrm{OutDegree}(i)}$$
The sum is the weighted sum of the importance of the pages
referring to p_j
d: damping factor
Parameter d is the probability that the random surfer follows a link on
the current page
(1 - d) is the probability that the surfer gets bored and starts on
a new random page
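The update rule on this slide, PageRank(j) = (1 - d) + d times the sum of PageRank(i) / OutDegree(i) over pages i linking to j, can be applied iteratively. The sketch below assumes a tiny three-page graph, d = 0.85, and that every page has at least one out-link; `pagerank` and `LINKS` are hypothetical names.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PageRank(j) = (1 - d) + d * sum(PageRank(i) / OutDegree(i)).

    links maps each page to the list of pages it links to. This sketch
    assumes every page has at least one out-link.
    """
    pages = list(links)
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for j in pages:
            # Each page i linking to j passes on an equal share of its rank.
            incoming = sum(
                rank[i] / len(links[i]) for i in pages if j in links[i]
            )
            new_rank[j] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this graph C ends up ranked highest because it receives links from both A and B, while B only receives half of A's vote.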
33. Hyperlink-induced topic search (HITS)
• Finds hubs and authoritative pages
• HITS has two components
• Based on a given set of keywords relevant pages are found
• Hubs and authority measures are associated with these pages. Pages with highest
values are returned
34. Authorities and hubs
• The algorithm produces two types of pages:
- Authority: pages that provide important, trustworthy information on a
given topic (highly-referenced pages on a topic)
- Hub: pages that "point" to authorities
• A better hub points to many good authorities. A better authority is pointed
to by many good hubs
35. Definitions
• Authority: pages that provide important, trustworthy information on a
given topic
• Hubs: pages that contain links to authorities
• Indegree: number of incoming links to a given node, used here to measure
authoritativeness
• Outdegree: number of outgoing links from a given node, used here to
measure hubness
36. HITS Algorithm
• Hubs point to lots of authorities.
• Authorities are pointed to by lots of hubs.
• Together they form a bipartite graph, with hubs on one side and authorities on the other
37. HITS
Hubs: pages that link to a collection of authoritative pages on a broad topic
39. HITS
Steps for discovering hubs and authorities on a specific topic:
Collect a seed set of pages S (returned by a search engine)
Expand the seed set to contain pages that point to or are pointed
to by pages in the seed set (links inside a site are removed)
Iteratively update the hub weight h(p) and authority weight a(p)
for each page:
$$a(p) = \sum_{q \to p} h(q), \qquad h(p) = \sum_{p \to q} a(q)$$
After a fixed number of iterations, the pages with the highest
hub/authority weights form the core of the community
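The iterative update of hub and authority weights can be sketched as below. The link graph is hypothetical, and the normalization after each round is an added assumption (commonly used so the weights stay bounded); it is not stated on the slide.

```python
from math import sqrt


def hits(links, iterations=20):
    """Iteratively update hub weights h(p) and authority weights a(p).

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # a(p): sum of hub weights of the pages q that point to p
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        # h(p): sum of authority weights of the pages q that p points to
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Assumed normalization step so the weights do not grow without bound
        auth_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical expanded set: three hub-like pages pointing at three others.
LINKS = {"h1": ["a1", "a2"], "h2": ["a1", "a2", "a3"], "h3": ["a1"]}
hub, auth = hits(LINKS)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

Here h2 becomes the strongest hub because it points to every authority, and a1 the strongest authority because every hub points to it.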
40. Strengths and weaknesses of HITS
Strength: its ability to rank pages according to the
query topic, which may be able to provide more relevant
authority and hub pages.
Weaknesses:
It is easily spammed. It is in fact quite easy to influence HITS
since adding out-links in one’s own page is so easy.
Topic drift. Many pages in the expanded set may not be on topic.
Inefficiency at query time: The query time evaluation is slow.
42. Application areas of web mining
1. E-commerce: personalized marketing;
2. Fight against terrorism: classify threats;
3. Prediction;
4. And others :)
43. Future research directions
1. Multimedia data mining: a picture is worth a thousand words;
2. Multilingual knowledge extraction: web page translations;
3. Semantic web mining.