
Weblog Extraction With Fuzzy Classification Methods

Presentation for the Second International Conference on the Applications of Digital Information and Web Technologies

Published in: Education, Technology

  1. Weblog Extraction with Fuzzy Classification Methods
     Edy Portmann, University of Fribourg, Switzerland
  2. Content
     Introduction: Weblog extraction, Folksonomies, Fuzzy logic, Fuzzy data clustering
     Fuzzy weblog extraction: Building blocks, Interface, Query engine, Meta search engine, Aggregated documents
     Example
     Concluding Remarks
     Questions and Answers
  3. Weblog extraction
     A weblog is a website with regular (reverse-chronological) entries of comments, descriptions of events, or other material
     Weblogs provide instant news on a particular subject, and readers can leave comments
     Data extraction is the act or process of retrieving data out of unstructured data sources
  4. Folksonomies
     A practice and technique for collaboratively creating and manipulating tags to annotate and categorize content
     Freely chosen keywords instead of a controlled vocabulary
     User-generated taxonomy
       • To harvest social knowledge from tags
       • To generate an ontology
  5. Fuzzy logic
     [Figure: membership level (0 to 1) for the fuzzy sets Infant, Teenager and Adult; ranges marked on the axis: 0 to 7.5, 7.5 to 10, 10 to 20, 20 to 22.5, and 22.5 upwards]
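The membership levels in the fuzzy logic figure can be illustrated with a minimal sketch. It assumes the axis values are ages, that the sets overlap on linear ramps between the breakpoints 7.5, 10, 20 and 22.5, and that each set has full membership elsewhere in its range. The ramp shape and the function names are assumptions, not taken from the slides.

```python
def ramp_up(x, a, b):
    """Rise linearly from 0 at a to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def infant(age):
    # Full member up to 7.5, fading out by 10 (assumed linear ramp).
    return 1.0 - ramp_up(age, 7.5, 10.0)


def teenager(age):
    # Fades in between 7.5 and 10, fades out between 20 and 22.5.
    return min(ramp_up(age, 7.5, 10.0), 1.0 - ramp_up(age, 20.0, 22.5))


def adult(age):
    # Fades in between 20 and 22.5, full member afterwards.
    return ramp_up(age, 20.0, 22.5)


for age in (5, 9, 15, 21, 30):
    print(age, infant(age), teenager(age), adult(age))
```

With these assumed shapes, an age inside an overlap region (for example 9 or 21) has a non-zero membership level in two sets at once, which is the point of the figure.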
  6. Hard vs. fuzzy clustering
     In hard clustering, data is divided into distinct clusters, and each data element belongs to exactly one cluster
     In fuzzy clustering, data elements can belong to more than one cluster, and each element has an associated set of membership levels
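The difference between hard and fuzzy clustering can be shown on a single data element. The membership values below are hypothetical and chosen only for illustration.

```python
# Hypothetical membership levels of one term in three clusters.
fuzzy_memberships = {"green": 0.5, "red": 0.3, "blue": 0.2}

# Hard clustering keeps only the single best-matching cluster.
hard_cluster = max(fuzzy_memberships, key=fuzzy_memberships.get)

# Fuzzy clustering keeps every cluster with a non-zero membership level.
fuzzy_clusters = {c: u for c, u in fuzzy_memberships.items() if u > 0}

print(hard_cluster)
print(fuzzy_clusters)
```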
  8. Building blocks
     [Figure: diagram of four numbered building blocks, 1 to 4]
  9. Interface
     [Figure: mock-up of the Blogretrievr™ search page (www.blogretrievr.com/) with the query "Yo-yo" entered]
     Caption
       1. Search box
       2. Fuzzyness Factor
       3. Go!
  10. Query engine: Grassroots Tagging
      [Figure: three tag sets: "Yo-yo, Triangle, Green", "Yo-yo, Triangle, Red" and "Yo-yo, Triangle, Blue"]
      According to these tags, yo-yo, triangle and the colours green, red and blue must be related in some way
      But in which way?
  11. Query engine: Jaccard coefficient
      Jaccard coefficient: J(A, B) = |A ∩ B| / |A ∪ B|
      [Figure: sets A and B shown for three cases, labelled "Not at all similar", "Somewhat similar" and "Quite similar"]
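The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. A minimal sketch on tag sets like those of the previous slide follows. The function name and the convention for two empty sets are assumptions made here, and the "hand puppet" set is only a stand-in for an unrelated source.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two tag sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


green = {"yo-yo", "triangle", "green"}
red = {"yo-yo", "triangle", "red"}
other = {"hand puppet"}

print(jaccard(green, red))    # 2 shared tags out of 4 distinct tags
print(jaccard(green, other))  # no shared tags
print(jaccard(green, green))  # identical sets
```

A value of 0 corresponds to "not at all similar", and values closer to 1 to increasingly similar tag sets.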
  12. Query engine: fuzzy c-means (FCM)
      FCM is a method of clustering which allows one piece of data to belong to two or more clusters
      [Figure: diagram with several elements labelled d]
  13. Query engine: fuzzy c-means (FCM)
      For each term, the algorithm determines the degree to which it belongs to each cluster
      A term may belong to more than one cluster
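The following is a minimal sketch of the standard fuzzy c-means update rules, not the presenter's implementation. It works on one-dimensional numeric points with a fuzzifier m = 2. How terms are turned into numeric features is not stated in the slides, so the sample data, the initial centers and the function names are all assumptions.

```python
def memberships_for(points, centers, m):
    """Membership degree of every point in every cluster (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: share full membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [u / total for u in row]
        else:
            row = [1.0 / sum((d_j / d_k) ** power for d_k in dists)
                   for d_j in dists]
        rows.append(row)
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        for j in range(len(centers)):
            # New center: mean of the points weighted by membership ** m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total > 0.0:
                centers[j] = sum(w * x for w, x in zip(weights, points)) / total
    return centers, memberships_for(points, centers, m)


points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 3.0]
centers, u = fuzzy_c_means(points, centers=[0.0, 6.0])
for x, row in zip(points, u):
    print(x, [round(v, 3) for v in row])
```

In this symmetric toy data set, the point 3.0 lies midway between the two groups and ends up with roughly equal membership in both clusters, while the other points belong almost entirely to one cluster.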
  14. Query engine: iterative FCM
      Identical terms that belong to different clusters are linked together
      The clusters and the membership degrees remain unchanged
      [Figure: membership levels for the clusters Green, Red and Blue]
  15. Query engine: iterative FCM (ontology)
      Each term is linked with other terms
      Each of those terms is in turn linked with further terms
      Every newly tagged source (on the Internet) creates new term links
      [Figure: labels A, Membership Cluster, Green, Red and Blue]
  16. Query engine: dendrogram
      [Figure: dendrogram with numbered nodes and an axis labelled d; membership levels shown for Red, Blue and Green]
  17. Meta search engine
      [Figure: flow diagram with numbered steps 1 to 6 linking the fuzzy set search query, the meta search engine, the blog search engines (Technorati, Blogdigger, etc.) and the blogosphere]
      Action
      2. The meta search engine sends the fuzzy set search query to other blog search engines
      3. Each blog search engine sends the query to the blogosphere…
      4. …and gathers the results
      5. The meta search engine collects all results…
      6. …and aggregates them
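The collect-and-aggregate steps of the meta search engine could be sketched as follows. The slides do not say how results are merged, so merging by URL and ranking by the number of engines that returned a result are assumptions. The engines here are stand-in functions with canned results; no real search engine API is used.

```python
def meta_search(query_terms, engines):
    """Send the query to every engine and merge the results by URL.

    `engines` maps an engine name to a callable that takes the list of
    query terms and returns a list of (url, title) pairs.
    """
    merged = {}
    for name, search in engines.items():
        for url, title in search(query_terms):
            entry = merged.setdefault(url, {"title": title, "engines": []})
            entry["engines"].append(name)
    # Results reported by more engines come first; ties keep arrival order.
    return sorted(merged.items(), key=lambda item: -len(item[1]["engines"]))


# Stand-in engines returning canned results (hypothetical data).
def fake_engine_a(terms):
    return [("http://example.org/post-1", "OLED news"),
            ("http://example.org/post-2", "LED update")]


def fake_engine_b(terms):
    return [("http://example.org/post-2", "LED update"),
            ("http://example.org/post-3", "Display roundup")]


results = meta_search(["OLED", "LED"], {"engine_a": fake_engine_a,
                                        "engine_b": fake_engine_b})
for url, info in results:
    print(url, info["engines"])
```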
  18. Aggregated documents
      [Figure: mock-up of the Blogretrievr™ results page (www.blogretrievr.com/) with the terms "Yo-yo" and "Hand puppet" and the Fuzzyness Factor control]
      Caption
        1. Search Map
        2. Search Results
        3. Map Rotation
        4. Zoom in/out
        5. New search
  20. Example: problem specifications
      What is coming around the edge?
      Samsung is screening its competitors for new killer applications
      In the blogosphere, new technologies are discussed earlier than in other media
      [Figure: the terms OLED, LCD, LED and OEL]
  21. Example: Pre-search
      Terms related to OLED, with their membership degrees:
        OLED: 1, interval [1]
        LED: 0.9, interval [0.9,1]
        OEL: 0.6, interval [0.6,1]
  22. Example: The search
      Search for a weblog with new OLED technology
      The membership degree is [0.8,1]
      This includes OLED [1] and LED [0.9,1]
      But not OEL [0.6,1]
      [Figure: Fuzzyness Factor set to [0.8..1], shown against OLED [1], LED [0.9,1] and OEL [0.6,1]]
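The selection in this example amounts to keeping every related term whose membership degree reaches the lower bound of the chosen interval. A minimal sketch, using the degrees from the pre-search slide (OLED 1, LED 0.9, OEL 0.6); the function name is an assumption:

```python
related = {"OLED": 1.0, "LED": 0.9, "OEL": 0.6}


def expand_query(memberships, threshold):
    """Return the terms whose membership degree is at least `threshold`."""
    return sorted(term for term, degree in memberships.items()
                  if degree >= threshold)


print(expand_query(related, 0.8))  # ['LED', 'OLED']
print(expand_query(related, 0.5))  # ['LED', 'OEL', 'OLED']
```

Lowering the threshold widens the search to more loosely related terms, which is the role the Fuzzyness Factor appears to play in the interface.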
  23. Example: Results
      [Figure: the terms OLED, LCD, LED and OEL, shown once for Boolean Search and once for Fuzzy Search [0.8..1]]
      Legend
        • Not found with Boolean Search
        • Not found with Fuzzy Search [0.8..1]
        • Found with Boolean Search
        • Found with Fuzzy Search [0.8..1]
  26. Concluding remarks
      In fuzzy set theory, the boundaries of a set are not sharply defined
        • The idea is a membership function over the elements of the set
        • This function takes values in the interval [0,1]
      Membership in a fuzzy set is intrinsically gradual rather than abrupt
      As a result, it is possible to find more relevant documents
  28. Concluding remarks
      Aggregated documents aim to organize the search results into several meaningful categories (clusters)
      A cluster is a group of similar topics that are related to the original query
      The user benefits include:
        • Get an overview of the available themes or topics
        • View similar results together in folders rather than scattered throughout a list
  31. Questions and Answers
