Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

http://nlp.uned.es/~alpgarcia/pub_index.php

Usage rights: CC Attribution-ShareAlike License

    Presentation Transcript

    • Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps. Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez. NLP & IR Group, UNED. December 12, 2008.
    • Table of Contents
      1. Objectives
      2. Our Approach: Extended Fuzzy Combination of Criteria (EFCC)
      3. Experiment Description
      4. Results
      5. Conclusion
    • Objectives
      • Group HTML documents by content similarity.
      • Use Self-Organizing Maps (SOM) to organize, visualize, and navigate through the collection.
      • Define a term weighting function that takes advantage of HTML tags by combining, through fuzzy logic, heuristic criteria based on the inherent semantics of some HTML tags and on word positions in the document.
      • Hypothesis: an improvement in document representation will lead to an increase in map quality.
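The Objectives slide says the weighting function combines criteria drawn from HTML tags and word positions, but does not list them. As a minimal sketch using Python's standard `html.parser`, the code below collects four per-term counts that such a function could consume: overall frequency, occurrences inside `<title>`, occurrences inside emphasis tags, and occurrences near the start or end of the text. The tag set in `EMPHASIS_TAGS`, the quarter-based position rule, and the names `CriteriaParser` and `extract_criteria` are hypothetical choices for illustration, not the EFCC definitions.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical emphasis tags; the slides do not list the tags EFCC actually uses.
EMPHASIS_TAGS = {"b", "strong", "em", "i", "u", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}  # non-visible text is ignored
TOKEN_RE = re.compile(r"[a-z0-9]+")


class CriteriaParser(HTMLParser):
    """Tokenizes visible text, remembering whether each token is in <title> or emphasized."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.tokens = []  # (term, in_title, in_emphasis) in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.stack and self.stack[-1] in SKIP_TAGS:
            return
        in_title = "title" in self.stack
        in_emph = any(t in EMPHASIS_TAGS for t in self.stack)
        for term in TOKEN_RE.findall(data.lower()):
            self.tokens.append((term, in_title, in_emph))


def extract_criteria(html):
    """Return {term: Counter of freq/title/emph/edge counts} for one page."""
    parser = CriteriaParser()
    parser.feed(html)
    parser.close()
    n = len(parser.tokens)
    stats = defaultdict(Counter)
    for idx, (term, in_title, in_emph) in enumerate(parser.tokens):
        counts = stats[term]
        counts["freq"] += 1
        counts["title"] += int(in_title)
        counts["emph"] += int(in_emph)
        # Hypothetical position criterion: token lies in the first or last quarter.
        if idx < n / 4 or idx >= 3 * n / 4:
            counts["edge"] += 1
    return stats
```

The raw counts would still need normalizing (for example, dividing by the page maximum) before being fed to a fuzzy system as inputs in [0, 1].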
    • Our Approach: Extended Fuzzy Combination of Criteria (EFCC)
      1. Fuzzy Logic
      2. EFCC
      3. Linguistic Variables
      4. Knowledge Base
    • Fuzzy Logic
      • Captures human expert knowledge.
      • Is close to natural language.
      • Knowledge base: defined by a set of IF-THEN rules.
      • Linguistic variables: defined using natural-language words and fuzzy sets. These sets describe the degree of membership of an object in a particular class.
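To make the terms on the Fuzzy Logic slide concrete, here is a small Mamdani-style inference sketch in plain Python: two linguistic variables (`frequency` and `title`, both assumed normalized to [0, 1]) with triangular fuzzy sets `low`, `medium`, and `high`; a knowledge base of IF-THEN rules evaluated with min (AND) and aggregated with max; and centroid defuzzification into an importance score. The variables, set boundaries, and rules are hypothetical; the actual EFCC linguistic variables and knowledge base appear on slides whose content was not preserved in this transcript.

```python
# Hypothetical fuzzy sets on [0, 1] as (left foot, peak, right foot) triangles.
# The outer feet extend past [0, 1] so "low" peaks at 0 and "high" peaks at 1.
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}

# Hypothetical knowledge base: IF frequency is A AND title is B THEN importance is C.
RULES = {
    ("low", "low"): "low",
    ("low", "medium"): "low",
    ("low", "high"): "medium",
    ("medium", "low"): "low",
    ("medium", "medium"): "medium",
    ("medium", "high"): "high",
    ("high", "low"): "medium",
    ("high", "medium"): "high",
    ("high", "high"): "high",
}


def tri(x, a, b, c):
    """Triangular membership degree of x: 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def infer(frequency, title, steps=101):
    """Mamdani inference: min for AND, max to aggregate, centroid to defuzzify."""
    strength = {name: 0.0 for name in SETS}
    for (freq_set, title_set), out_set in RULES.items():
        w = min(tri(frequency, *SETS[freq_set]), tri(title, *SETS[title_set]))
        strength[out_set] = max(strength[out_set], w)
    # Centroid of the aggregated output (each output set clipped at its rule strength).
    num = den = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        mu = max(min(strength[s], tri(y, *SETS[s])) for s in SETS)
        num += y * mu
        den += mu
    return num / den if den else 0.0
```

Because the three sets overlap and cover [0, 1], every input pair in that range fires at least one rule, so the centroid is always defined.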
    • Extended Fuzzy Combination of Criteria
    • Linguistic Variables
    • Knowledge Base
    • Experiment Description
      1. Dimensionality Reduction
      2. Document Map
      3. Evaluation Methods
    • Dimensionality Reduction
      • Input vector dimensions range from 100 to 5,000.
      • Stopwords, punctuation marks, and suffixes were removed, as were words occurring fewer than 50 times in the whole corpus.
      • Two well-known methods: document frequency reduction and the random projection method.
      • Three proposed rank-based methods: Most Valued Terms, the fixed reduction method, and More Frequent Terms until n level (MFTn).
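The preprocessing threshold and the two well-known reduction methods on the Dimensionality Reduction slide can be sketched as follows. `drop_rare` applies the 50-occurrence corpus threshold, `df_reduce` keeps the n terms with the highest document frequency, and `random_projection` maps d-dimensional vectors to k dimensions through a random matrix. The function names, the tie-breaking rule, and the Gaussian variant of random projection are assumptions (the slides do not say which variant was used); the three proposed rank-based methods are not reproduced because the slides do not define them.

```python
import math
import random
from collections import Counter


def drop_rare(docs, min_count=50):
    """Remove terms that occur fewer than min_count times in the whole corpus."""
    counts = Counter(term for doc in docs for term in doc)
    keep = {term for term, c in counts.items() if c >= min_count}
    return [[term for term in doc if term in keep] for doc in docs]


def df_reduce(docs, n):
    """Document frequency reduction: keep the n terms found in the most documents."""
    df = Counter(term for doc in docs for term in set(doc))
    # Highest DF first; ties broken alphabetically (an arbitrary, deterministic choice).
    return sorted(df, key=lambda term: (-df[term], term))[:n]


def random_projection(vectors, k, seed=0):
    """Map d-dimensional vectors to k dimensions with a Gaussian random matrix (assumed variant)."""
    d = len(vectors[0])
    rng = random.Random(seed)
    # Entries drawn from N(0, 1/k) so squared lengths are preserved in expectation.
    proj = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    return [[sum(v[i] * proj[i][j] for i in range(d)) for j in range(k)] for v in vectors]
```

Document frequency reduction keeps interpretable terms as features, while random projection produces dense, unlabeled dimensions; that difference matters if the map is later labeled with its most important terms.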
    • Document Map Construction
      • Benchmark dataset for clustering: Banksearch [1], with 10,000 documents in 10 classes.
      • The SOM size was set equal to the number of classes in the input documents, i.e. 5x2 units, so that the clustering results could be compared.
      [1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management and Applications, 2002.
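A from-scratch sketch of training a 5x2 map like the one on the Document Map Construction slide, in plain Python. It uses the standard online SOM update with a Gaussian neighborhood on the map grid and linearly decaying learning rate and radius. Those schedules, the initialization from randomly chosen input vectors, the default parameter values, and the names `train_som` and `best_matching_unit` are assumptions, since the slides do not give the training settings.

```python
import math
import random


def best_matching_unit(units, x):
    """Grid position of the unit whose weight vector is closest (Euclidean) to x."""
    return min(units, key=lambda pos: sum((w - v) ** 2 for w, v in zip(units[pos], x)))


def train_som(data, rows=5, cols=2, epochs=20, lr0=0.5, seed=0):
    """Online SOM training on a rows x cols grid; returns {(row, col): weight vector}.

    Schedules and defaults below are assumptions, not the authors' settings.
    """
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialize every unit with a copy of a randomly chosen input vector.
    units = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly toward 0
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks to about 0.5
            br, bc = best_matching_unit(units, x)
            for (r, c), w in units.items():
                # Gaussian neighborhood measured on the map grid, not in input space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return units
```

After training, each document is assigned to its best-matching unit; with one unit per class, every unit acts as one cluster.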
    • Evaluation Methods
      • Weighted average of the F-measure for each class.
      • After the collection is mapped onto the trained map, each neuron is labeled with the class that has the greatest number of documents mapped onto it.
      • All document vectors in a neuron whose class differs from the neuron label are counted as errors.
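The labeling and error-counting rules on the Evaluation Methods slide translate directly into code. For the F-measure, the sketch assumes that each document's predicted class is the label of the unit it is mapped to, and weights each class's F1 score by that class's share of the collection; the slides do not spell out this detail, so treat it as one plausible reading. `assignments` holds each document's best-matching unit and `classes` its true class; both names are hypothetical.

```python
from collections import Counter, defaultdict


def label_units(assignments, classes):
    """Label each unit with the class that has the most documents mapped onto it."""
    per_unit = defaultdict(Counter)
    for unit, cls in zip(assignments, classes):
        per_unit[unit][cls] += 1
    # most_common breaks ties by first occurrence; the slides do not specify a tie rule.
    return {unit: counts.most_common(1)[0][0] for unit, counts in per_unit.items()}


def evaluate(assignments, classes):
    """Return (error count, class-size-weighted F-measure) for a trained map."""
    labels = label_units(assignments, classes)
    # Assumed reading: a document's predicted class is its unit's label.
    predicted = [labels[unit] for unit in assignments]
    # A document is an error when its class differs from its unit's label.
    errors = sum(p != c for p, c in zip(predicted, classes))
    support = Counter(classes)
    predicted_counts = Counter(predicted)
    hits = Counter(c for p, c in zip(predicted, classes) if p == c)
    weighted_f = 0.0
    for cls, n in support.items():
        precision = hits[cls] / predicted_counts[cls] if predicted_counts[cls] else 0.0
        recall = hits[cls] / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weighted_f += (n / len(classes)) * f1
    return errors, weighted_f
```

A class that labels no unit gets precision and F1 of zero, so merging two classes onto the same units is penalized even when the error count alone looks modest.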
    • Results
    • Best reduction for each term weighting function
    • MFTn reduction provides stability
    • EFCC+MFTn obtains its best results with the smallest number of features
    • Conclusion
      • An unsupervised document representation method, based on fuzzy logic, for clustering HTML documents by means of self-organizing maps.
      • MFTn is the most stable reduction in all cases.
      • The EFCC representation achieves better results using a smaller vocabulary.
      • Fewer features are needed to represent the input documents and the SOM unit vectors, which lowers the computational cost.
    • Thank You!
    • Related Work
      Work                                                   | VSM | Topic information | Document type | Weighting function | Modifies SOM
      Self Organization of a Massive Document Collection [2] | Yes | Yes               | Text          | Shannon's entropy  | No
      Document Clustering using Phrases [3]                  | Yes | No                | Text          | Binary, TF, TF-IDF | No
      Document Clustering using WordNet [4]                  | Yes | Yes               | Text          | ESVM, HSVM, HyM    | No
      Conceptional SOM [5]                                   | Yes | No                | Text          | TF                 | Yes
      [2] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.
      [3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
      [4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J. Hybrid Intell. Syst., 2004.
      [5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. Neurocomputing, 2008.
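The Weighting function column of the Related Work table lists classic baseline schemes. A minimal sketch of three of them (binary, TF, and TF-IDF with idf = log(N/df)) over tokenized documents follows; these are common textbook forms, the function name `weight_vectors` is hypothetical, and the cited works may use normalized or smoothed variants.

```python
import math
from collections import Counter


def weight_vectors(docs, scheme="tfidf"):
    """Sparse {term: weight} vectors for tokenized docs under binary, tf or tf-idf.

    Textbook forms only; the cited works may normalize or smooth these weights.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "binary":
            vec = {t: 1.0 for t in tf}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in tf.items()}
        elif scheme == "tfidf":
            # idf = log(N / df): terms present in every document get weight 0.
            vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        else:
            raise ValueError(f"unknown weighting scheme: {scheme!r}")
        vectors.append(vec)
    return vectors
```

None of these schemes looks at HTML structure, which is the gap a tag-aware weighting function is meant to address.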