Fuzzy Combinations of Criteria: An Application to Web Page Representation for Clustering - Cicling12

Slides for CICLing 2012, New Delhi, India.

http://nlp.uned.es/~alpgarcia/pub_index.php

  1. Fuzzy Combinations of Criteria: An Application to Web Page Representation for Clustering
     Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez
     NLP & IR Group, Distance Learning University (UNED)
     CICLing 2012, New Delhi, India, March 15, 2012
  2. Table of Contents
     1 Motivation: Web Page Representation; Linear Combination of Criteria; Nonlinear Combination of Criteria
     2 Understanding the system: Experimental Settings; Dimension Reduction Analysis; Study of Individual Criteria
     3 Improving the Combination
     4 Summary
  3. Motivation
     Main goal: to understand how to represent web pages for clustering.
     Question: how to combine different page features to represent web pages?
  6. Web Page Representation
     Hypothesis: a good document representation should be based on how humans read documents.
  7. Different Criteria for Web Page Representation
     Criteria: Title, Emphasis
     Word positions: Preferential, Standard
  14. Linear Combination of Criteria
      For example: Analytical Combination of Criteria (acc) [1].
      Importance of a term in a document:
        $I_k = t_k i_t + e_k i_e + f_k i_f + p_k i_p$  (1)
        $I_k = 1 \cdot 0.4 + 0.6 \cdot 0.3 + 0 \cdot 0.2 + 0 \cdot 0.1 = 0.58$  (2)
      Drawback: the importance of a term in a component is calculated regardless of the rest of the components.
      [1] V. Fresno and A. Ribeiro. An analytical approach to concept extraction in HTML environments. J. Intell. Inf. Syst., 22(3):215–235, 2004.
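The linear combination in Eq. (1) can be sketched in a few lines of Python. The slide does not define its symbols, so this sketch assumes that t, e, f and p are a term's title, emphasis, frequency and position scores and that the i coefficients are fixed weights; the numbers are the ones used in Eq. (2). All function and variable names are hypothetical.

```python
# Assumed reading of Eq. (1): one fixed weight per criterion.
# Weight values are those appearing in the worked example of Eq. (2).
WEIGHTS = {"title": 0.4, "emphasis": 0.3, "frequency": 0.2, "position": 0.1}


def acc_importance(scores, weights=WEIGHTS):
    """Linear combination: every criterion contributes independently."""
    return sum(scores[name] * weights[name] for name in weights)


# Term scores from the worked example (assumed to lie in [0, 1]).
term = {"title": 1.0, "emphasis": 0.6, "frequency": 0.0, "position": 0.0}
importance = acc_importance(term)
print(importance)
```

Because each product is computed on its own, a term that scores only in the title still receives the full title weight, which is the drawback the slide points out.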
  16. Example: acc
      Call to Arms
  17. Example: acc
      Example of a rhetorical title: "Call to arms" is the title of a page containing an article about the new trades made by the New York Yankees baseball team and how these trades affect the Boston Red Sox, their main rival in Major League Baseball.
      Drawback: title terms are not related to the document topic.
  19. Nonlinear Combination of Criteria
      Fuzzy Combination of Criteria (fcc) [2] allows nonlinear combinations of criteria.
      It is possible to define related conditions.
      It produces vectors within the vector space model (VSM).
      [2] A. Ribeiro, V. Fresno, M. C. Garcia-Alegre, and D. Guinea. A fuzzy system for the web page representation. 2003.
  22. Example: fcc
      Example of a rhetorical title: now we can express that a term should appear in the title and be emphasized to be considered important.
      Nonlinearity: title terms can be considered not important because they do not appear in the rest of the text.
  24. A quick glance at fcc
      Close to natural language.
      Knowledge base: defined by a set of IF-THEN rules.
      Rules are based on how humans read documents.
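To make the idea of an IF-THEN rule with a conjunction concrete, here is a minimal sketch of how one fuzzy rule might be evaluated. It is not the fcc knowledge base: the `high` membership function, the use of `min` as the fuzzy AND, and the assumption that criterion scores lie in [0, 1] are all illustrative choices that the slides do not specify.

```python
def high(x):
    """Assumed membership function for 'High': 0 up to 0.5, rising to 1 at 1.0."""
    return max(0.0, 2.0 * x - 1.0)


def rule_title_and_emphasis(title, emphasis):
    """Firing strength of: IF title is High AND emphasis is High THEN important.

    The fuzzy AND is taken as the minimum of the two memberships
    (a common choice, assumed here).
    """
    return min(high(title), high(emphasis))


# A term that appears only in a rhetorical title does not fire the rule...
rhetorical = rule_title_and_emphasis(title=1.0, emphasis=0.0)
# ...while a term that is both in the title and emphasized fires it strongly.
topical = rule_title_and_emphasis(title=1.0, emphasis=0.9)
print(rhetorical, topical)
```

Unlike a weighted sum, the conjunction lets one criterion veto another, which is the nonlinearity the rhetorical-title example relies on.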
  29. Basic Clustering Settings
      We remove stopwords, punctuation and suffixes (Porter's algorithm).
      Clustering: Cluto-rbr with default parameters.
      Web page representations: tf-idf and fcc.
      Dimension reduction techniques (100, 500, 1000, 2000 and 5000 features): mft and lsi.
      Datasets: Banksearch and Webkb.
      F-measure to evaluate clustering quality.
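As a rough illustration of the baseline representation, the sketch below computes plain tf-idf weights and then keeps only the top-ranked terms. The exact tf-idf variant used in the experiments is not given on the slides, and the reduction step is only an assumed reading of a term-selection method such as mft (rank terms, keep the best k); all names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Uses raw term frequency times log(N / document frequency), one common
    tf-idf variant (assumed; the slides do not say which one was used).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def keep_top_terms(vectors, k):
    """Keep only the k terms with the largest total weight over the collection."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # adds the weights term by term
    top = {term for term, _ in totals.most_common(k)}
    return [{t: w for t, w in vec.items() if t in top} for vec in vectors]


docs = [["a", "b"], ["a", "c"], ["a", "c", "c"]]
vectors = tf_idf(docs)
reduced = keep_top_terms(vectors, k=1)
print(reduced)
```

A term that occurs in every document (here `a`) gets weight zero, so a selection step of this kind drops it before any rarer term.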
  36. Dimension Reduction Analysis
      Hypothesis: if lsi improves on mft, then the weighting function is not able to find the most representative terms.
  37. Dataset     Rep.        Avg.   S.D.
      Banksearch  tf-idf mft  0.748  0.028
      Banksearch  tf-idf lsi  0.756  0.005
      Banksearch  fcc mft     0.756  0.019
      Banksearch  fcc lsi     0.769  0.011
      Webkb       tf-idf mft  0.460  0.051
      Webkb       tf-idf lsi  0.507  0.006
      Webkb       fcc mft     0.469  0.009
      Webkb       fcc lsi     0.466  0.011
      Conclusions:
      The weighting function is not working as well as it could.
      Results for fcc in the Webkb dataset are surprisingly bad.
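The experimental settings name F-measure as the clustering quality metric. The slides do not say which formulation was used; the sketch below assumes a commonly used one, in which each reference class is matched with its best-scoring cluster and the per-class scores are averaged by class size. The function name is hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels, clusters):
    """Class-size-weighted best-match F-measure (an assumed formulation).

    labels:   reference class of each document
    clusters: cluster id assigned to each document (same order)
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


perfect = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1])
merged = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 0])
print(perfect, merged)
```

A clustering that reproduces the classes exactly scores 1, while lumping everything into one cluster is penalised through precision.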
  40. Results for Criteria Analysis (Banksearch)
      Rep. \ Dim.  100    500    1000   2000   5000
      fcc mft      0.723  0.757  0.768  0.765  0.768
      title        0.626  0.646  0.632  0.634  0.639
      emphasis     0.586  0.671  0.674  0.685  0.693
      frequency    0.689  0.715  0.720  0.724  0.731
      position     0.310  0.525  0.538  0.599  0.608
      For Banksearch, fcc always obtains higher values than the individual criteria, so the combination works better in all cases.
      Frequency seems to be the best among the individual criteria.
  41. Results for Criteria Analysis (Webkb)
      Rep. \ Dim.  100    500    1000   2000   5000
      fcc mft      0.453  0.472  0.475  0.468  0.475
      title        0.432  0.433  0.404  0.488  0.479
      emphasis     0.415  0.431  0.433  0.465  0.489
      frequency    0.441  0.460  0.460  0.468  0.446
      position     0.301  0.283  0.317  0.281  0.286
      For Webkb, fcc does not always outperform the others.
      Frequency is not always the best among the individual criteria.
      When title and emphasis could lead to a better clustering, the combination gets worse.
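The Webkb observations can be checked directly against the figures on the slide. The short script below, with hypothetical names, lists for each number of features the individual criteria that score strictly above the fcc combination.

```python
DIMS = [100, 500, 1000, 2000, 5000]

# Webkb values copied from the slide (decimal commas written as points).
WEBKB = {
    "fcc mft":   [0.453, 0.472, 0.475, 0.468, 0.475],
    "title":     [0.432, 0.433, 0.404, 0.488, 0.479],
    "emphasis":  [0.415, 0.431, 0.433, 0.465, 0.489],
    "frequency": [0.441, 0.460, 0.460, 0.468, 0.446],
    "position":  [0.301, 0.283, 0.317, 0.281, 0.286],
}


def criteria_beating(table, baseline="fcc mft"):
    """Return {dimension: [criteria scoring strictly above the baseline]}."""
    result = {}
    for i, dim in enumerate(DIMS):
        winners = [name for name, row in table.items()
                   if name != baseline and row[i] > table[baseline][i]]
        if winners:
            result[dim] = winners
    return result


beaten = criteria_beating(WEBKB)
print(beaten)
```

On these figures, title alone beats fcc at 2000 features, and both title and emphasis beat it at 5000, which matches the remark that the combination suffers exactly where those two criteria would help.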
  43. Improving the Combination
      Frequency should influence the decision more than position.
      IF Title  AND Frequency  AND Emphasis  AND Position      THEN Importance
         Low        Medium         Low           Preferential  ⇒ Low
         Low        Medium         Low           Standard      ⇒ No
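One simple way to hold such a rule base in code is a lookup table keyed by the linguistic labels of the four criteria. The sketch below encodes only the two rules shown on this slide and performs a crisp lookup; a real fuzzy system would fire every rule to a degree and aggregate the outcomes, which is not attempted here. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Antecedent(NamedTuple):
    title: str
    frequency: str
    emphasis: str
    position: str


# The two rules shown on the slide: only the position label differs.
RULES = {
    Antecedent("Low", "Medium", "Low", "Preferential"): "Low",
    Antecedent("Low", "Medium", "Low", "Standard"): "No",
}


def importance(title, frequency, emphasis, position) -> Optional[str]:
    """Crisp lookup of the consequent; returns None when no rule matches."""
    return RULES.get(Antecedent(title, frequency, emphasis, position))


print(importance("Low", "Medium", "Low", "Preferential"))
print(importance("Low", "Medium", "Low", "Standard"))
```

Written this way, it is easy to see that in these two rules the outcome is decided by position alone while frequency stays fixed, the behaviour the slide argues should change.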
  44. Extended Fuzzy Combination of Criteria (efcc)
      IF Title AND Frequency AND Emphasis AND Position THEN Importance
      High, High ⇒ Very High
      High, Medium, Preferential ⇒ High
      High, Medium, Standard ⇒ Medium
      High, Low, Preferential ⇒ Medium
      High, Low, Standard ⇒ Low
      Low, High, Preferential ⇒ High
      Low, High, Standard ⇒ Medium
      Low, Medium, Preferential ⇒ Medium
      Low, Medium, Standard ⇒ Low
      Low, Low, Preferential ⇒ Low
      Low, Low, Standard ⇒ No
      High ⇒ Very High
      Medium ⇒ Medium
      Low ⇒ No
  45. System Comparison
      With efcc, both reduction methods get similar results.
      efcc solves the problems of fcc in Webkb.
      Dataset     Rep.        Avg.   S.D.
      Banksearch  tf-idf lsi  0.756  0.005
      Banksearch  fcc lsi     0.769  0.011
      Banksearch  efcc mft    0.760  0.014
      Banksearch  efcc lsi    0.758  0.013
      Webkb       tf-idf lsi  0.507  0.006
      Webkb       fcc mft     0.469  0.009
      Webkb       efcc mft    0.532  0.032
      Webkb       efcc lsi    0.483  0.000
  48. Summary
      We present a term weighting function based on how humans read documents.
      The representation is not oriented to concrete sets of web pages.
      Nonlinear systems help express relations among criteria.
      With a good term weighting function it is possible to use lightweight dimension reduction techniques.
      Our system tries to ease the communication between technical and linguistic experts.
      Anchor texts were also studied as a way of adding contextual information.
  54. Thank You!
