WEB CLUSTERING ENGINES
Deepak Sharma
MCA
1409114016
Search Engine?
• Search engines are an invaluable tool for
retrieving information from the Web.
In response to a user query, they return a
list of results ranked in order of relevance
to the query.
• E.g., Google, Yahoo, Credo, Grokker, etc.
Flat Ranked vs. Clustered
• Google (Flat Ranked Search Engine)
• Yippy (Web Clustering Engine)
Why Web Clustering Engines?
• Conventional engines are not very
efficient at handling ‘ambiguous’ queries.
• The results returned by a conventional
search engine for such a query are mixed
together in one list, so irrelevant items
appear among the relevant ones.
• This is where clustering of search results
comes into the picture.
• Search engine
• Clustering is the act of grouping similar
objects into sets.
• The distance between objects in the
same cluster (intra-cluster variation)
should be minimal.
• The distance between objects in different
clusters (inter-cluster variation) should be
maximal.
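The two distance criteria above can be checked numerically. The following is a minimal sketch, assuming Euclidean distance and a handful of made-up 2-D points that are already grouped; the function names and the data are illustrative and do not come from the slides.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, available from Python 3.8


def mean_intra_cluster_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_cluster_distance(clusters):
    """Average distance between pairs of points taken from different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    ]
    return sum(pairs) / len(pairs)


# Hypothetical 2-D points, already grouped into two clusters.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)

# A good grouping keeps the first value small relative to the second.
print(intra < inter)
```

For search results the points would be feature vectors rather than 2-D coordinates, and a different distance measure may be more suitable.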
Web Clustering Engines?
• These systems group the results returned by
a search engine into a hierarchy of labeled
clusters (also called categories).
Web clustering engines:
1. Northern Light - predefined set of clusters
2. Vivísimo - cluster labels were dynamically generated
3. Clusty
4. Grokker
5. KartOO
6. Lingo3G
7. CREDO, etc.
Issues in Implementation of Clusters
• Short input data description.
• Meaningful labels.
• Selection of similarity measure.
• Grouping of objects into clusters.
• Computational efficiency.
• Unknown number of clusters.
Architecture & Techniques
Search Results Acquisition
• Provides input for the rest of the system.
• Based on the query, the acquisition
component must deliver 50 to 500 results,
each of which should contain a title, a
contextual snippet, and the URL.
• The source of search results can be any
public search engine, such as Google or
Yahoo.
• Results are fetched from these search
engines through their APIs.
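A minimal sketch of the record the acquisition component hands to the rest of the system is given here. It assumes the raw results have already been parsed into dictionaries with the keys `title`, `snippet` and `url`; those key names, the `SearchResult` class and the `acquire` function are hypothetical, and no real search engine API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    """One search result: the three fields the slide asks for."""
    title: str
    snippet: str
    url: str


# Bounds taken from the slide: between 50 and 500 results per query.
MIN_RESULTS, MAX_RESULTS = 50, 500


def acquire(raw_results, limit=MAX_RESULTS):
    """Keep only complete records, stopping once `limit` have been collected."""
    results = []
    for item in raw_results:
        title = item.get("title")
        snippet = item.get("snippet")
        url = item.get("url")
        if title and snippet and url:  # drop records with a missing field
            results.append(SearchResult(title, snippet, url))
        if len(results) == limit:
            break
    return results


# Hypothetical raw records, as they might look after parsing an API response.
raw = [
    {"title": "Jaguar cars", "snippet": "Luxury vehicles ...", "url": "https://example.com/cars"},
    {"title": "Jaguar (animal)", "url": "https://example.com/cat"},  # no snippet
]
print(acquire(raw))
```

A caller could compare `len(acquire(raw))` with `MIN_RESULTS` and request another page of results when too few complete records come back.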
Preprocessing of Search Results
• The primary aim is to convert the search
results into ‘features’.
Steps:
i. Language identification
ii. Tokenization
iii. Stemming
iv. Feature selection
ii. Tokenization:
The text of each search result is split into a
sequence of basic independent units called
tokens, each representing a word, number, or
symbol.
This is more complex for languages that do not
separate words with white space (such as Chinese)
or whose text switches direction (such as Arabic).
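As a rough illustration of tokenization for space-separated text, the sketch below uses a single regular expression. The pattern and the function name are assumptions made for this example, not the tokenizer of any particular engine.

```python
import re

# A run of word characters (letters, digits, underscore) is one token;
# any other non-space character is a one-character symbol token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word, number, and symbol tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Clusty returned 42 results!")
print(tokens)  # ['Clusty', 'returned', '42', 'results', '!']
```

This approach relies on white space and punctuation to find boundaries, so an unsegmented Chinese sentence would come back as one long token, which is why such languages need a dedicated word segmenter.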
iii. Stemming:
Remove the inflectional prefixes and suffixes
of each word to reduce the different grammatical
forms of the word to a common base form
called a ‘stem’.
E.g.:
connected, connecting, interconnection
↓ ↓ ↓
‘connect’
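The example above can be reproduced with a deliberately tiny affix stripper. This is a toy sketch, not a real stemming algorithm such as Porter's: the prefix and suffix lists are assumptions chosen only to cover the three words on the slide.

```python
# Toy affix lists, chosen only to cover the slide's example words.
PREFIXES = ("inter",)
SUFFIXES = ("ing", "ion", "ed")


def toy_stem(word, min_stem_len=3):
    """Strip at most one known prefix and one known suffix from a word."""
    stem = word.lower()
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= min_stem_len:
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem_len:
            stem = stem[:-len(suffix)]
            break
    return stem


for word in ("connected", "connecting", "interconnection"):
    print(word, "->", toy_stem(word))
```

Blind affix stripping over-stems many ordinary words (for instance, it would cut "interest" down to "est"), so a practical system would use an established stemmer with proper rules.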
iv. Feature selection:
• Extract features for each search result
present in the input.
• Features are atomic entities by which we
can describe an object and represent its
most important characteristics to an
algorithm.
• Features vary from single words to tuples of
words.
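One simple way to obtain features that range from single words to tuples of words is to take word n-grams over the token sequence. The sketch below assumes that reading; the function name and the `max_n` parameter are illustrative.

```python
def word_ngrams(tokens, max_n=2):
    """Return all word n-grams of length 1..max_n as tuples, shortest first."""
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(tuple(tokens[i:i + n]))
    return features


features = word_ngrams(["web", "clustering", "engine"])
print(features)
# [('web',), ('clustering',), ('engine',),
#  ('web', 'clustering'), ('clustering', 'engine')]
```

Longer tuples capture phrases that can later serve as readable cluster labels, at the cost of a larger and sparser feature set.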
How can a feature/text be represented?
• Vector Space Model (VSM)
• A document d is represented in the VSM as a
vector [w_t0, w_t1, ..., w_tn],
where t0, t1, ..., tn is the set of words/features
and w_ti is the weight/importance of feature ti.
E.g.:
d → “Polly had a dog and the dog had Polly”
[Figure: VSM representation of d]
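The example sentence can be turned into a vector as follows. This sketch assumes the simplest possible weight, the raw count of each word, and an alphabetically sorted vocabulary; the weighting used in the original figure is not recoverable from the text, so the numbers below illustrate the model rather than reproduce the slide.

```python
from collections import Counter


def term_frequency_vector(text):
    """Return (vocabulary, weights), where each weight is a raw word count."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    vocabulary = sorted(counts)  # fixed feature order t0, t1, ..., tn
    weights = [counts[term] for term in vocabulary]
    return vocabulary, weights


vocabulary, weights = term_frequency_vector("Polly had a dog and the dog had Polly")
print(vocabulary)  # ['a', 'and', 'dog', 'had', 'polly', 'the']
print(weights)     # [1, 1, 2, 2, 2, 1]
```

Other weighting schemes, such as TF-IDF, keep the same vector layout and change only how each weight is computed.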
THANK YOU
