The document proposes a framework for mining product reputations from online opinions. It extracts opinion sentences from web pages and labels each as positive or negative along with an opinion likelihood. Reputation is then analyzed using rule analysis to extract characteristic words, co-occurrence analysis, typical sentence analysis, and correspondence analysis to map relationships. Experiments analyzing opinions about cell phones, PDAs, and ISPs showed the framework in action. The framework combines opinion extraction with text mining to automatically gather and analyze large volumes of online opinions.
Real Time Competitive Marketing Intelligence (feiwin)
The document describes a system for real-time competitive market intelligence that analyzes unstructured text from news articles about companies. It crawls the web in real-time to collect articles about competitors. Text analysis techniques convert the documents to numerical format for machine learning methods to determine word patterns that distinguish companies. The system applies a lightweight rule induction method to generate rules with meaningful word conjunctions and disjunctions that characterize each company. An example output highlights distinguishing words between news articles about IBM and Microsoft.
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization (Blerina Spahiu)
An increasing number of research and industrial initiatives have focused on publishing Linked Open Data, but little attention has been paid to helping consumers better understand existing data sets. In this paper we discuss how an ontology-driven data abstraction model supports the extraction and representation of summaries of linked data sets. The proposed summarization model is the backbone of the ABSTAT framework, which aims to help users understand big and complex linked data sets. Our framework is evaluated by showing that it is capable of unveiling information that is not explicitly represented in underspecified ontologies and that is valuable to users, e.g., helping them in the formulation of SPARQL queries.
Data Mining and the Web: Past, Present and Future (feiwin)
The document discusses the past, present and future of data mining and the web. It outlines four problems with current web search tools: an abundance of irrelevant results, limited coverage, limited queries, and a lack of customization. It then describes several data mining techniques, such as association rules, classification, and clustering, and how they can be applied to web mining problems such as analyzing link structure, improving search customization, and extracting information from web documents. Future research directions include better mining of web structure and content.
Mining from Open Answers in Questionnaire Data (feiwin)
The document summarizes a system called Survey Analyzer (SA) that analyzes open-ended answers from questionnaire data. SA uses rule analysis through classification and association rules as well as correspondence analysis to automatically summarize open answers and mine useful information. It employs statistical learning methods like stochastic complexity to acquire rules from categorized text and classify new text. SA views each analysis target and its associated open answers to learn rules and analyze relationships between targets and words.
Translating Ontologies in Real-World Settings (Mauro Dragoni)
To enable knowledge access across languages, ontologies, which are often represented only in English, need to be translated into different languages. The main challenge in translating ontologies is to find the right term with respect to the domain modeled by the ontology itself. Machine translation services may help in this task; however, a crucial requirement is to have translations validated by experts before the ontologies are deployed. Real-world applications must implement a support system for this task to relieve experts of the work of validating all translations. In this paper, we present ESSOT, an Expert Supporting System for Ontology Translation. The peculiarity of this system is that it exploits semantic information from the concept's context to improve the quality of label translations. The system has been tested both within the Organic.Lingua project, by translating the modeled ontology into three languages, and on other multilingual ontologies, in order to evaluate its effectiveness in other contexts. The results have been compared with the translations provided by the Microsoft Translator API, and the improvements demonstrate the viability of the proposed approach.
The document discusses different information retrieval models, including the Boolean, vector, and probabilistic models. The Boolean model uses set theory and Boolean algebra to represent documents and queries. The vector model assigns weights to terms and ranks documents based on similarity to the query. The probabilistic model calculates the probability of a document being relevant given a query. It also covers structured models for different tasks like filtering and browsing.
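The vector model's similarity ranking described above is commonly implemented as cosine similarity between term-weight vectors. A minimal sketch, using raw term counts as weights (TF-IDF weighting is the usual refinement, omitted here for brevity):

```python
import math
from collections import Counter

def cosine_similarity(doc: str, query: str) -> float:
    """Cosine of the angle between the term-count vectors of doc and query."""
    d, q = Counter(doc.lower().split()), Counter(query.lower().split())
    dot = sum(d[t] * q[t] for t in q)
    norm = math.sqrt(sum(v * v for v in d.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Rank two toy documents against a query by descending similarity.
docs = ["the boolean model uses set theory",
        "the vector model assigns weights to terms"]
ranked = sorted(docs, key=lambda d: cosine_similarity(d, "vector model weights"),
                reverse=True)
```

The document sharing more query terms ranks first; documents sharing no terms score 0.0, which is exactly the "similarity to the query" ordering the vector model prescribes.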
Language Models for Information Retrieval (Dustin Smith)
The document provides background information on Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, authors of the book "Introduction to Information Retrieval". It then outlines the presentation, which covers the book's treatment of language models for information retrieval, including query likelihood models, estimating query generation probabilities, and experiments comparing language modeling approaches to other IR techniques.
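The query likelihood models mentioned in that outline score a document by the probability that a language model estimated from it generates the query. A toy sketch with Jelinek-Mercer smoothing against a collection model (the corpus strings and the λ = 0.5 mixing weight are illustrative choices, not values from the presentation):

```python
from collections import Counter

def query_likelihood(query: str, doc: str, collection: str, lam: float = 0.5) -> float:
    """P(query | doc): product over query terms of the smoothed term probability,
    interpolating the document model with the collection model."""
    d, c = Counter(doc.split()), Counter(collection.split())
    dlen, clen = sum(d.values()), sum(c.values())
    p = 1.0
    for term in query.split():
        p *= lam * d[term] / dlen + (1 - lam) * c[term] / clen
    return p

collection = "language models for information retrieval rank documents by probability"
score = query_likelihood("information retrieval", "models for information retrieval",
                         collection)
```

Smoothing is what keeps a single unseen query term from zeroing out an otherwise good document; only terms absent from both the document and the collection drive the score to zero.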
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval (Mauro Dragoni)
The presentation provides an overview of what an ontology is and how it can be used for representing information and retrieving data, with a particular focus on the linguistic resources available to support this kind of task. It also surveys semantic-based retrieval approaches, highlighting the pros and cons of semantic approaches with respect to classic ones. Use cases are presented and discussed.
IRJET - Review on Information Retrieval for Desktop Search Engine (IRJET Journal)
This document summarizes techniques for desktop search engines, including feature extraction using entity recognition, query understanding using part-of-speech tagging and segmentation, and similarity measures for scoring and ranking documents. It discusses using ontologies, concept graphs, semantic networks, and vector space models to represent knowledge in documents. Feature extraction identifies entities that can be mapped to knowledge bases to infer meanings. Query understanding aims to determine intent regardless of technique used. Similarity is measured using approaches like comparing maximum common subgraphs between a document and query graphs.
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label list that helps the user relate the labels to the search results. Cluster labelling is crucial because meaningless or confusing labels may mislead users into checking the wrong clusters for the query and wasting extra time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced. Emphasis was placed on producing comprehensible and accurate cluster labels in addition to discovering document clusters. We also present a new metric to assess the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. We perform the experiments using the publicly available datasets Ambient and ODP-239.
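The paper's own labelling method is not reproduced here; a naive baseline of the kind such methods improve upon, labelling each cluster by its most frequent non-stopword terms, can be sketched as follows (the stopword list and example snippets are illustrative):

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "with"}

def label_cluster(docs: list[str], n_terms: int = 2) -> list[str]:
    """Label a cluster of result snippets with its most frequent content words."""
    counts = Counter(w for d in docs for w in d.lower().split()
                     if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(n_terms)]

cluster = ["suffix tree clustering of search results",
           "clustering search results using suffix trees"]
label = label_cluster(cluster)
```

Frequency-based labels like these are exactly where the "meaningless or confusing labels" problem arises, since the top terms need not form a readable phrase; that gap motivates dedicated labelling methods and metrics.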
The document compares and contrasts concept-based search using SySearch versus traditional keyword search. SySearch uses concepts extracted from documents and queries to understand information needs better than keyword matching alone. It ranks results by estimating the probability of relevance using a Bayesian approach rather than binary keyword matching. This allows it to better support natural language queries and retrieve more relevant results.
The document presents an overview of probabilistic models for information retrieval. It discusses how probability theory can be applied to model the uncertain nature of retrieval, where queries only vaguely represent user needs and relevance is uncertain. The document outlines different probabilistic IR models including the classical probabilistic retrieval model, probability ranking principle, binary independence model, Bayesian networks, and language modeling approaches. It also describes datasets used to evaluate these models, including collections from TREC, Cranfield, and others. Basic probability theory concepts are reviewed, including joint probability, conditional probability, and rules relating probabilities.
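The basic rules reviewed there (joint and conditional probability) combine via Bayes' rule into the quantity probabilistic IR ranks by, P(relevant | evidence). A worked toy example with made-up corpus statistics:

```python
# Made-up statistics for illustration: of 100 documents, 20 are relevant
# to a query; the term occurs in 15 relevant and 10 non-relevant documents.
p_rel = 20 / 100                  # P(R)
p_term_given_rel = 15 / 20        # P(t | R)
p_term_given_nonrel = 10 / 80     # P(t | not R)

# Law of total probability: P(t) = P(t|R)P(R) + P(t|not R)P(not R)
p_term = p_term_given_rel * p_rel + p_term_given_nonrel * (1 - p_rel)

# Bayes' rule: P(R | t) = P(t | R) P(R) / P(t)
p_rel_given_term = p_term_given_rel * p_rel / p_term
```

Here P(t) = 0.75·0.2 + 0.125·0.8 = 0.25, so P(R | t) = 0.15 / 0.25 = 0.6: observing the term triples the prior probability of relevance, which is the intuition the probability ranking principle builds on.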
Detecting Ontological Conflicts in Protocols between Semantic Web Services (dannyijwest)
The task of verifying the compatibility between interacting web services has traditionally been limited to checking the compatibility of the interaction protocol in terms of message sequences and the type of data being exchanged. Since web services are developed largely in an uncoordinated way, different services often use independently developed ontologies for the same domain instead of adhering to a single standard ontology. In this work we investigate the approaches a server can take to verify whether a state with semantically inconsistent results can be reached during the execution of a protocol with a client, provided the client ontology is published. Often a database is used to store the actual data alongside the ontologies, instead of storing the data as part of the ontology description. It is important to observe that, given the current state of the database, the semantic conflict state may not be reachable even if the server's verification indicates the possibility of reaching a conflict state. A relational algebra based decision procedure is also developed to incorporate the current state of the client and server databases in the overall verification procedure.
The document is a presentation on information retrieval by Richard Chbeir. It discusses key concepts in information retrieval including definitions of information retrieval, the information retrieval process, query and document processing techniques like stop word removal and stemming, representation models like the Boolean and vector space models, and inverted indexes. Specific topics covered include query representation, document indexing and processing, weighting schemes for terms, and measuring similarity between queries and documents.
The document describes an evaluation of existing relational keyword search systems. It notes discrepancies in how prior studies evaluated systems using different datasets, query workloads, and experimental designs. The evaluation aims to conduct an independent assessment that uses larger, more representative datasets and queries to better understand systems' real-world performance and tradeoffs between effectiveness and efficiency. It outlines schema-based and graph-based search approaches included in the new evaluation.
Tutorial - Introduction to Rule Technologies and Systems (Adrian Paschke)
Tutorial at Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2014), 9-11 Dec., Berlin, Germany
http://www.swat4ls.org/workshops/berlin2014/
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a... (University of Bologna)
The volume, variety, and high availability of data backing decision support systems have impacted business intelligence, the discipline providing strategies to transform raw data into decision-making insights. Such transformation is usually abstracted in the “knowledge pyramid,” where data collected from the real world are processed into meaningful patterns. In this context, volume, variety, and data availability have opened up challenges in augmenting the knowledge pyramid. On the one hand, the volume and variety of unconventional data (i.e., unstructured non-relational data generated by heterogeneous sources such as sensor networks) demand novel and type-specific data management, integration, and analysis techniques. On the other hand, the high availability of unconventional data is increasingly attracting data scientists with high competence in the business domain but low competence in computer science and data engineering; enabling their effective participation requires the investigation of new paradigms to drive and ease knowledge extraction. The goal of this thesis is to augment the knowledge pyramid from two points of view, namely, by including unconventional data and by providing advanced analytics. As to unconventional data, we focus on mobility data and on the privacy issues related to them by providing (de-)anonymization models. As to analytics, we introduce a higher abstraction level than writing formal queries. Specifically, we design advanced techniques that allow data scientists to explore data either by expressing intentions or by interacting with smart assistants in hands-free scenarios.
Information retrieval systems aim to find documents relevant to a user's information need. Search engines are a common example, allowing users to enter queries and receiving a list of relevant web pages. Effective systems represent documents and queries statistically based on word frequencies and use scoring functions to rank documents by estimated relevance to the query. Evaluation involves measuring a system's precision, the proportion of returned documents that are relevant, and recall, the proportion of all relevant documents that are returned.
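The two evaluation measures defined above follow directly from the returned and relevant document sets; a minimal sketch with illustrative document identifiers:

```python
def precision_recall(returned: list[str], relevant: list[str]) -> tuple[float, float]:
    """Precision: fraction of returned documents that are relevant.
    Recall: fraction of all relevant documents that are returned."""
    returned_set, relevant_set = set(returned), set(relevant)
    hits = len(returned_set & relevant_set)
    precision = hits / len(returned_set) if returned_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# 4 documents returned, 3 relevant overall, 2 of them retrieved.
p, r = precision_recall(returned=["d1", "d2", "d3", "d4"],
                        relevant=["d2", "d4", "d9"])
```

Here precision is 2/4 = 0.5 and recall is 2/3, which illustrates the usual tension: returning more documents can raise recall while lowering precision.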
The document discusses the process of qualitative data analysis using the software tool Ethnograph V5.0. It describes the stages of qualitative data collection and analysis including coding, grouping, establishing relationships and theory generation. It then explains the various steps that can be taken using Ethnograph V5.0 to facilitate the analytic process, such as creating projects, entering data, coding data, conducting searches, creating memos and search filters using face sheets and identifier sheets. Pros and cons of using the software are also mentioned.
The document summarizes experiences building a semantic web application to detect conflicts of interest using FOAF and DBLP data. It involved multiple steps: obtaining and preparing data; representing entities and relationships in an ontology; querying the data using semantic associations to determine COI levels; visualizing results; and evaluating based on a conference review dataset. The system was able to detect indirect COI relationships that syntactic matching would miss.
The document introduces Ethnograph v5.0, a qualitative data analysis software that helps researchers organize, code, search, and analyze large amounts of qualitative data. It discusses the software's main functions, including creating projects, entering and coding interview transcripts, creating memos and search filters using codes, face sheets and identifier sheets, and conducting searches. While the software facilitates data organization, coding, and initial analysis, researchers must still use their own intelligence to develop relationships, establish theories, and complete the full analytic process.
Presentation of the main IR models
Presentation of our submission to TREC KBA 2014 (Entity oriented information retrieval), in partnership with Kware company (V. Bouvier, M. Benoit)
The document discusses web scraping and provides a demonstration. It defines web scraping as using software to automatically extract and organize useful data from websites. Originally screen scraping was used before the wide adoption of the World Wide Web, but as the Web grew, web scraping techniques were developed to automate the process. The demonstration shows how to use the Beautiful Soup library in Python to send requests to URLs, parse the HTML code, and access specific elements and content in tags. Students are assigned activities to practice scraping COVID-19 data from a website and analyzing a self-selected webpage.
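The demonstration's pattern (fetch a page, parse the HTML, access specific tags) can be sketched with Beautiful Soup. The HTML string below is a hardcoded stand-in for the response a request to the target URL would return, and the table layout is an assumption for illustration:

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML a request to the COVID-19 statistics page would return.
html = """
<table id="covid-stats">
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>A</td><td>100</td></tr>
  <tr><td>B</td><td>250</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.find_all("tr"):
    cells = [td.get_text() for td in tr.find_all("td")]
    if cells:  # the header row contains only <th> cells, so it is skipped
        rows.append((cells[0], int(cells[1])))
```

In the assigned activity, the `html` string would instead come from an HTTP request to the chosen webpage; the parsing and tag-access steps stay the same.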
This document proposes a relationship-based framework for retrieving top-k concepts from ontologies in response to a keyword query. The framework uses a dual walk ranking (DWRank) model that considers both the semantic similarity and centrality/authoritativeness of concepts. It retrieves concepts with the highest DWRank scores and further filters results by intended type to improve precision. An evaluation on sample queries demonstrates the effectiveness of the DWRank model and additional gains from the intended type filter.
The document discusses abstract data types (ADTs) and the standard template library (STL) in C++. It covers:
- ADTs allow programmers to use data types without knowing implementation details.
- The STL includes containers, iterators, and algorithms to simplify storing and processing data. It is part of the C++ standard library.
- The STL contains various containers like vectors, lists, queues, and maps that make code reuse easier. Algorithms like sort can be directly applied to containers.
A Topic map-based ontology IR system versus Clustering-based IR System: A Com... (tmra)
1. The study compared a topic map-based ontology information retrieval system to a clustering-based information retrieval system in the security domain.
2. Twenty information technology students participated in searches using each system and their search performance was measured.
3. The results showed that the topic map-based system had higher recall, shorter search times, and fewer search steps compared to the clustering-based system, especially for complex association and cross-reference search tasks.
What to read next? Challenges and Preliminary Results in Selecting Represen... (MOVING Project)
1. The document presents an approach for selecting representative documents from a set of search results to provide users with an overview of the content and subtopics. It compares different document representations, clustering algorithms, and selection methods on two datasets.
2. The evaluation measures of coverage and redundancy were found to be insufficient for accurately evaluating representativeness, as the scores increased with the number of selected documents and were sometimes independent of the actual selection method.
3. The research questions explored how document representation, clustering algorithm, and selection method influence coverage and redundancy, finding that the choice of clustering algorithm had the largest impact. Coverage and redundancy were found to be inflated and not to directly reflect representativeness.
Thesis presentation: how to get free publicity (guest091dfa3a)
Presentation about my thesis: how does your advertising campaign fit into the agenda of the press? In other words: how do you get free publicity?
This presentation is my work in a nutshell; for the full work, feel free to contact me!
This document discusses applying fuzzy logic to a patient classification system. It introduces fuzzy logic and fuzzification to transform crisp values like body temperature into fuzzy categories like normal, hypothermia, etc. It then describes using fuzzy logic concepts and a fuzzy inference system to group patients according to their clinical stability, responsiveness, self-sufficiency and environment, in order to determine a complexity level and minimum nurse requirements. MATLAB is used to implement the fuzzy inference systems, and an example of rule activation and patient classification in the final fuzzy inference system is provided.
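The fuzzification step described above maps a crisp reading to degrees of membership in overlapping categories. A sketch using triangular membership functions for body temperature; the breakpoint values are made up for illustration, not clinical thresholds:

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function: 0 outside [a, c], peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify_temperature(t: float) -> dict[str, float]:
    """Degrees of membership of a temperature reading (degrees C) in each category."""
    return {
        "hypothermia": triangular(t, 30.0, 33.0, 36.0),
        "normal":      triangular(t, 35.0, 36.8, 38.0),
        "fever":       triangular(t, 37.5, 39.5, 42.0),
    }

memberships = fuzzify_temperature(37.8)
```

A reading of 37.8 belongs partially to both "normal" and "fever", which is the point of fuzzification: downstream inference rules can fire to a degree instead of making a single crisp cut.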
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalMauro Dragoni
The presentation provides an overview of what an ontology is and how it can be used for representing information and for retrieving data with a particular focus on the linguistic resources available for supporting this kind of task. Overview of semantic-based retrieval approaches by highlighting the pro and cons of using semantic approaches with respect to classic ones. Use cases are presented and discussed
IRJET- Review on Information Retrieval for Desktop Search EngineIRJET Journal
This document summarizes techniques for desktop search engines, including feature extraction using entity recognition, query understanding using part-of-speech tagging and segmentation, and similarity measures for scoring and ranking documents. It discusses using ontologies, concept graphs, semantic networks, and vector space models to represent knowledge in documents. Feature extraction identifies entities that can be mapped to knowledge bases to infer meanings. Query understanding aims to determine intent regardless of technique used. Similarity is measured using approaches like comparing maximum common subgraphs between a document and query graphs.
Clustering the results of a search helps the user to overview the information returned. In this paper, we
look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label
list that can help the user to realize the labels and search results. Labelling Cluster is crucial because
meaningless or confusing labels may mislead users to check wrong clusters for the query and lose extra
time. Additionally, labels should reflect the contents of documents within the cluster accurately. To be able
to label clusters effectively, a new cluster labelling method is introduced. More emphasis was given to
/produce comprehensible and accurate cluster labels in addition to the discovery of document clusters. We
also present a new metric that employs to assess the success of cluster labelling. We adopt a comparative
evaluation strategy to derive the relative performance of the proposed method with respect to the two
prominent search result clustering methods: Suffix Tree Clustering and Lingo.
we perform the experiments using the publicly available Datasets Ambient and ODP-239
The document compares and contrasts concept-based search using SySearch versus traditional keyword search. SySearch uses concepts extracted from documents and queries to understand information needs better than keyword matching alone. It ranks results by estimating the probability of relevance using a Bayesian approach rather than binary keyword matching. This allows it to better support natural language queries and retrieve more relevant results.
The document presents an overview of probabilistic models for information retrieval. It discusses how probability theory can be applied to model the uncertain nature of retrieval, where queries only vaguely represent user needs and relevance is uncertain. The document outlines different probabilistic IR models including the classical probabilistic retrieval model, probability ranking principle, binary independence model, Bayesian networks, and language modeling approaches. It also describes datasets used to evaluate these models, including collections from TREC, Cranfield, and others. Basic probability theory concepts are reviewed, including joint probability, conditional probability, and rules relating probabilities.
Detecting Ontological Conflicts in Protocols between Semantic Web Servicesdannyijwest
The task of verifying the compatibility between interacting web services has tra-
ditionally been limited to checking the compatibility of the interaction protocol in terms of
message sequences and the type of data being exchanged. Since web services are developed
largely in an uncoordinated way, different services often use independently developed ontolo-
gies for the same domain instead of adhering to a single ontology as standard. In this work we
investigate the approaches that can be taken by the server to verify the possibility to reach a
state with semantically inconsistent results during the execution of a protocol with a client,
if the client ontology is published. Often database is used to store the actual data along with
the ontologies instead of storing the actual data as a part of the ontology description. It is
important to observe that at the current state of the database the semantic conflict state
may not be reached even if the verification done by the server indicates the possibility of
reaching a conflict state. A relational algebra based decision procedure is also developed to
incorporate the current state of the client and the server databases in the overall verification
procedure
The document is a presentation on information retrieval by Richard Chbeir. It discusses key concepts in information retrieval including definitions of information retrieval, the information retrieval process, query and document processing techniques like stop word removal and stemming, representation models like the Boolean and vector space models, and inverted indexes. Specific topics covered include query representation, document indexing and processing, weighting schemes for terms, and measuring similarity between queries and documents.
The document describes an evaluation of existing relational keyword search systems. It notes discrepancies in how prior studies evaluated systems using different datasets, query workloads, and experimental designs. The evaluation aims to conduct an independent assessment that uses larger, more representative datasets and queries to better understand systems' real-world performance and tradeoffs between effectiveness and efficiency. It outlines schema-based and graph-based search approaches included in the new evaluation.
Tutorial - Introduction to Rule Technologies and SystemsAdrian Paschke
Tutorial at Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2014), 9-11 Dec., Berlin, Germany
http://www.swat4ls.org/workshops/berlin2014/
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...University of Bologna
The volume, variety, and high availability of data backing decision support systems have impacted on business intelligence, the discipline providing strategies to transform raw data into decision-making insights. Such transformation is usually abstracted in the “knowledge pyramid,” where data collected from the real world are processed into meaningful patterns. In this context, volume, variety, and data availability have opened for challenges in augmenting the knowledge pyramid. On the one hand, the volume and variety of unconventional data (i.e., unstructured non-relational data generated by heterogeneous sources such as sensor networks) demand novel and type-specific data management, integration, and analysis techniques. On the other hand, the high availability of unconventional data is increasingly attracting data scientists with high competence in the business domain but low competence in computer science and data engineering; enabling effective participation requires the investigation of new paradigms to drive and ease knowledge extraction. The goal of this thesis is to augment the knowledge pyramid from two points of view, namely, by including unconventional data and by providing advanced analytics. As to unconventional data, we focus on mobility data and on the privacy issues related to them by providing (de-)anonymization models. As to analytics, we introduce a higher abstraction level than writing formal queries. Specifically, we design advanced techniques that allow data scientists to explore data either by expressing intentions or by interacting with smart assistants in hand-free scenarios.
Information retrieval systems aim to find documents relevant to a user's information need. Search engines are a common example, allowing users to enter queries and receiving a list of relevant web pages. Effective systems represent documents and queries statistically based on word frequencies and use scoring functions to rank documents by estimated relevance to the query. Evaluation involves measuring a system's precision, the proportion of returned documents that are relevant, and recall, the proportion of all relevant documents that are returned.
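As a concrete illustration of the two evaluation measures just described, here is a minimal sketch (plain Python; the document IDs are invented) computing precision and recall for a single query:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of returned documents that are relevant.
    Recall: fraction of all relevant documents that were returned."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents returned, 3 of them relevant; 5 documents are relevant overall
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d3", "d7", "d9"])
```

Here precision is 3/4 and recall is 3/5; a system that returns everything scores perfect recall but poor precision, which is why both measures are reported together.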
The document discusses the process of qualitative data analysis using the software tool Ethnograph V5.0. It describes the stages of qualitative data collection and analysis including coding, grouping, establishing relationships and theory generation. It then explains the various steps that can be taken using Ethnograph V5.0 to facilitate the analytic process, such as creating projects, entering data, coding data, conducting searches, creating memos and search filters using face sheets and identifier sheets. Pros and cons of using the software are also mentioned.
The document summarizes experiences building a semantic web application to detect conflicts of interest using FOAF and DBLP data. It involved multiple steps: obtaining and preparing data; representing entities and relationships in an ontology; querying the data using semantic associations to determine COI levels; visualizing results; and evaluating based on a conference review dataset. The system was able to detect indirect COI relationships that syntactic matching would miss.
The document introduces Ethnograph v5.0, a qualitative data analysis software that helps researchers organize, code, search, and analyze large amounts of qualitative data. It discusses the software's main functions, including creating projects, entering and coding interview transcripts, creating memos and search filters using codes, face sheets and identifier sheets, and conducting searches. While the software facilitates data organization, coding, and initial analysis, researchers must still use their own intelligence to develop relationships, establish theories, and complete the full analytic process.
Presentation of the main IR models
Presentation of our submission to TREC KBA 2014 (Entity oriented information retrieval), in partnership with Kware company (V. Bouvier, M. Benoit)
The document discusses web scraping and provides a demonstration. It defines web scraping as using software to automatically extract and organize useful data from websites. Originally screen scraping was used before the wide adoption of the World Wide Web, but as the Web grew, web scraping techniques were developed to automate the process. The demonstration shows how to use the Beautiful Soup library in Python to send requests to URLs, parse the HTML code, and access specific elements and content in tags. Students are assigned activities to practice scraping COVID-19 data from a website and analyzing a self-selected webpage.
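The demonstration in that lecture uses the Beautiful Soup library; as a self-contained sketch of the same idea, the snippet below uses only Python's standard-library `html.parser` to pull heading text out of a page (the HTML string and the choice of `h2` tags are illustrative assumptions, not the lecture's actual exercise):

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collect the text inside every <h2> tag, the way one would
    pull headlines off a scraped page."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles = []
    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
    def handle_data(self, data):
        if self.in_h2:
            self.titles.append(data.strip())

page = "<html><body><h2>Cases today</h2><p>1234</p><h2>Recovered</h2></body></html>"
parser = TitleGrabber()
parser.feed(page)
```

In a real scraper the `page` string would come from an HTTP request, and Beautiful Soup's `find_all` would replace the hand-written handler callbacks.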
This document proposes a relationship-based framework for retrieving top-k concepts from ontologies in response to a keyword query. The framework uses a dual walk ranking (DWRank) model that considers both the semantic similarity and centrality/authoritativeness of concepts. It retrieves concepts with the highest DWRank scores and further filters results by intended type to improve precision. An evaluation on sample queries demonstrates the effectiveness of the DWRank model and additional gains from the intended type filter.
The document discusses abstract data types (ADTs) and the standard template library (STL) in C++. It covers:
- ADTs allow programmers to use data types without knowing implementation details.
- The STL includes containers, iterators, and algorithms to simplify storing and processing data. It is part of the C++ standard library.
- The STL contains various containers like vectors, lists, queues, and maps that make code reuse easier. Algorithms like sort can be directly applied to containers.
A Topic map-based ontology IR system versus Clustering-based IR System: A Com... - tmra
1. The study compared a topic map-based ontology information retrieval system to a clustering-based information retrieval system in the security domain.
2. Twenty information technology students participated in searches using each system and their search performance was measured.
3. The results showed that the topic map-based system had higher recall, shorter search times, and fewer search steps compared to the clustering-based system, especially for complex association and cross-reference search tasks.
What to read next? Challenges and Preliminary Results in Selecting Represen... - MOVING Project
1. The document presents an approach for selecting representative documents from a set of search results to provide users with an overview of the content and subtopics. It compares different document representations, clustering algorithms, and selection methods on two datasets.
2. The evaluation measures of coverage and redundancy were found to be insufficient for accurately evaluating representativeness, as the scores increased with the number of selected documents and were sometimes independent of the actual selection method.
3. The research questions explored how document representation, clustering algorithm, and selection method influence coverage and redundancy, finding the choice of clustering had the largest impact. Coverage and redundancy were found to be inflated and not directly reflect representativeness.
Final project presentation: how to get free publicity - guest091dfa3a
Presentation about my final project: how does your advertising campaign fit into the agenda of the press? In other words: how do you get free publicity?
This presentation is my work in a nutshell; for the complete work, feel free to contact me!
This document discusses applying fuzzy logic to a patient classification system. It introduces fuzzy logic and fuzzification to transform crisp values like body temperature into fuzzy categories like normal, hypothermia, etc. It then describes using fuzzy logic concepts and a fuzzy inference system to group patients according to their clinical stability, responsiveness, self-sufficiency and environment to determine a complexity level and minimum nurse requirements. MatLab is used to implement fuzzy inference systems and an example is provided of rule activation and patient classification in the final fuzzy inference system.
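The fuzzification step described above can be sketched with triangular membership functions; the temperature categories and boundary values below are illustrative assumptions, not the ones used in that system:

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set that rises
    from a, peaks at b, and falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzify a crisp body temperature into overlapping categories
temp = 37.5
degrees = {
    "hypothermia": triangular(temp, 30.0, 33.0, 36.0),
    "normal":      triangular(temp, 35.5, 36.8, 38.0),
    "fever":       triangular(temp, 37.0, 39.0, 41.0),
}
```

Note that 37.5 °C is partly "normal" and partly "fever" at the same time; that graded overlap is exactly what distinguishes fuzzification from crisp thresholding.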
This document describes a study that developed a medical decision support system for malaria diagnosis using the Analytic Hierarchy Process (AHP). Key points:
- Researchers worked with doctors to identify important malaria symptoms, group them hierarchically, and compare their relative importance using AHP.
- The system calculates an Aggregate Diagnostic Factor Index (ADFI) based on patients' symptoms to determine malaria intensity as low, moderate, high or very high.
- The system was tested on sample patient data and mostly diagnosed moderate or high intensity malaria correctly.
- Future work could involve factor analysis of symptoms and clinical evaluation of the system to develop a full decision support tool.
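A common way to turn AHP pairwise comparisons into priority weights is the row geometric mean, a standard approximation of the principal eigenvector. The sketch below uses a hypothetical three-symptom comparison matrix, not the study's actual data:

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights from a pairwise-comparison
    matrix using the row geometric mean, then normalize to sum to 1."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

# Hypothetical comparison: fever judged 3x as important as headache
# and 5x as important as chills; headache 2x as important as chills.
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights = ahp_weights(pairwise)
```

The resulting weights preserve the judged ordering (fever > headache > chills) and could feed an aggregate index like the ADFI described above.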
Fuzzy logic and its application in environmental engineering - Drashti Kapadia
This document provides an introduction to fuzzy logic and its applications in environmental engineering. It discusses key concepts such as fuzzy sets and crisp sets, operations on fuzzy systems, fuzzy multi-criteria decision making, and common applications in areas like water engineering, wastewater engineering, and air pollution assessment. An overview of relevant research papers is also presented, along with the advantages of using fuzzy logic like its interpretability and the drawbacks around establishing correct rules.
This document proposes a fuzzy logic-based framework to assess student learning levels. It consists of six components: (1) fuzzification using membership functions to evaluate test scores, duration, and results; (2) a centroid defuzzification method; (3) 25 fuzzy rule sets; (4) a questionnaire to design rules; (5) triangular membership functions for inputs and outputs; and (6) a prototype evaluation of 26 students' test scores and time that showed similar results to traditional scoring. However, the document notes that using time alone may not fully capture a student's knowledge level.
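The centroid defuzzification method mentioned above reduces an aggregated fuzzy output set to a single crisp value. A minimal sketch, with made-up sample points over a 0-100 score axis:

```python
def centroid(xs, mus):
    """Centroid (center-of-gravity) defuzzification of a sampled fuzzy
    output set: sum(x * mu(x)) / sum(mu(x))."""
    den = sum(mus)
    if den == 0:
        return 0.0
    return sum(x * mu for x, mu in zip(xs, mus)) / den

# Made-up aggregated output memberships over a 0-100 score axis
xs = [0, 25, 50, 75, 100]
mus = [0.0, 0.2, 0.8, 0.5, 0.1]
score = centroid(xs, mus)
```

The crisp score lands near the region of strongest membership (around 50-60 here), which is why the centroid is the most common defuzzifier in grading-style applications.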
A fuzzy expert system is simply an expert system applied in the field of medicine.
By applying the fuzzy determination mechanism, the diagnosis of diabetes becomes simpler for medical practitioners.
1) The document describes a fuzzy knowledge-based system called TROPFEV for clinical diagnosis of tropical fevers like malaria and typhoid fever.
2) TROPFEV uses fuzzy logic to represent clinical features and symptoms and develop diagnostic rules to distinguish between uncomplicated and complicated cases of malaria and typhoid.
3) The system was developed using MATLAB, tested on 20 patient cases, and shown to match the diagnoses of medical experts, demonstrating its effectiveness in aiding diagnosis of tropical fevers.
Malaria is an ancient disease that affects more than 40% of the world's population across 100 countries, causing more than 2 million deaths a year. Effective treatments with artemisinin-based drug combinations are fundamental to controlling malaria. Malaria also carries a great economic cost for poor countries, through lost income and the financial burden on health systems, so it is important to implement prevention measures to educate the population and reduce the rate
Fuzzy logic was introduced by Lotfi Zadeh in 1965 to address problems with classical logic being too precise. Fuzzy logic allows for truth values between 0 and 1 rather than binary true/false. It involves fuzzy sets, membership functions, linguistic variables, and fuzzy rules. Fuzzy logic can be applied to knowledge representation and inference using concepts like fuzzy predicates, relations, modifiers and quantifiers. It has various applications including household appliances, animation, industrial automation, and more.
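The standard min/max definitions of the fuzzy set operations mentioned above can be shown in a few lines; the elements and membership values are illustrative:

```python
# Membership degrees of the same elements in two fuzzy sets A and B
A = {"mon": 0.2, "tue": 0.7, "wed": 1.0}
B = {"mon": 0.5, "tue": 0.3, "wed": 0.9}

fuzzy_union        = {k: max(A[k], B[k]) for k in A}   # fuzzy OR
fuzzy_intersection = {k: min(A[k], B[k]) for k in A}   # fuzzy AND
complement_A       = {k: 1.0 - A[k] for k in A}        # fuzzy NOT
```

With memberships restricted to {0, 1} these definitions collapse to ordinary Boolean union, intersection, and complement, which is why fuzzy logic is described as a generalization of classical logic.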
Fuzzy image processing uses fuzzy logic techniques to process digital images. It can handle vagueness and ambiguity in images. The main steps are image fuzzification, modifying membership values, and image defuzzification. Fuzzy image processing has applications in noise removal, edge detection, segmentation, and contrast enhancement. It provides advantages over traditional techniques by allowing for graded membership in sets rather than binary membership.
This document provides an overview of fuzzy logic and its applications. It begins with motivations for fuzzy logic by discussing limitations of crisp sets and fuzzy sets as an alternative approach. It then defines fuzzy sets and fuzzy logic operations. It describes how fuzzy logic systems work by combining fuzzy sets and logic operations. Several example applications are mentioned, including industrial control systems and modeling human decision making. The document concludes by noting fuzzy logic has been applied in many domains and there are ongoing developments in fuzzy logic approaches.
How can you deal with fuzzy logic? Fuzzy logic is a form of many-valued logic; it deals with reasoning that is approximate rather than fixed and exact. In contrast with traditional logic theory, where binary sets have two-valued logic (true or false), fuzzy logic variables may have a truth value that ranges in degree between 0 and 1.
Fuzzy logic is a form of logic that accounts for partial truth and intermediate values between true and false. It is used in control systems to mimic how humans apply fuzzy concepts like "cold" or "hot" temperature. Some key applications of fuzzy logic include temperature controllers, washing machines, air conditioners, and anti-lock braking systems. Fuzzy logic controllers use if-then rules to determine outputs based on fuzzy inputs and degrees of membership rather than binary logic.
- Fuzzy logic was developed by Lotfi Zadeh to address applications involving subjective or vague data like "attractive person" that cannot be easily analyzed using binary logic. It allows for partial truth values between completely true and completely false.
- Fuzzy logic controllers mimic human decision making and involve fuzzifying inputs, applying fuzzy rules, and defuzzifying outputs. This allows systems to be specified in human terms and automated.
- Fuzzy logic has many applications from industrial process control to consumer products like washing machines and microwaves. It offers an intuitive way to model real-world ambiguities compared to mathematical or logic-based approaches.
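The fuzzify, apply rules, defuzzify loop described in these bullets can be sketched as a toy controller. The membership functions, rule consequents, and fallback value below are all illustrative assumptions, not any particular product's tuning:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fan_speed(temp):
    """Toy controller: fuzzify the temperature, fire two if-then
    rules, and defuzzify with a weighted average of the rule
    consequents. All numbers here are illustrative."""
    cold = tri(temp, 0.0, 10.0, 25.0)   # degree to which temp is "cold"
    hot = tri(temp, 15.0, 30.0, 45.0)   # degree to which temp is "hot"
    # Rule 1: IF temp is cold THEN speed = 20
    # Rule 2: IF temp is hot  THEN speed = 90
    num = cold * 20.0 + hot * 90.0
    den = cold + hot
    return num / den if den else 50.0   # neutral fallback
```

A temperature that is partly cold and partly hot fires both rules to a degree, and the output blends smoothly between them instead of switching abruptly, which is the practical appeal over binary thresholds.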
The document discusses the benefits of exercise for both physical and mental health. It notes that regular exercise can reduce the risk of diseases like heart disease and diabetes, improve mood, and reduce feelings of stress and anxiety. The document recommends that adults get at least 150 minutes of moderate exercise or 75 minutes of vigorous exercise per week to gain these benefits.
3 Things Every Sales Team Needs to Be Thinking About in 2017 - Drift
Thinking about your sales team's goals for 2017? Drift's VP of Sales shares 3 things you can do to improve conversion rates and drive more revenue.
Read the full story on the Drift blog here: http://blog.drift.com/sales-team-tips
Professional fuzzy type-ahead rummage around in XML type-ahead search techni... - Kumar Goud
Abstract – This is a research venture on the new information-access paradigm called type-ahead search, in which systems compute answers to a keyword query on the fly as users type in the query. In this paper we study how to support fuzzy type-ahead search in XML. Supporting fuzzy search is important when users have limited knowledge about the exact representation of the entities they are looking for, such as people records in an online directory. We have developed and deployed several such systems, some of which are used by many people on a daily basis. The systems received overwhelmingly positive feedback from users thanks to their friendly interfaces with the fuzzy-search feature. We describe the design and implementation of the systems and demonstrate several of them. We show that our efficient techniques can indeed allow this search paradigm to scale to large amounts of data.
Index Terms - type-ahead, large data set, server side, online directory, search technique.
IEEE 2014 DOTNET CLOUD COMPUTING PROJECTS A scientometric analysis of cloud c... - IEEEMEMTECHSTUDENTPROJECTS
This document discusses a proposed system for improving the process of clustering and displaying search results from literature on cloud computing. The existing system has several problems: it displays results only from registered candidates, presents data poorly, and lacks security. The proposed system aims to display the highest-ranking search keywords based on user and publisher rankings to make the process more secure. It uses clustering to automatically organize documents by topic to improve information retrieval. The system would have administrative, publisher, search, and user modules and use ASP.Net and SQL Server software.
What is the current status quo of the Semantic Web as first mentioned by Tim Berners-Lee in 2001?
Ten blue links are no longer the only way to drive traffic: Google has added many so-called Knowledge cards and panels to answer the specific informational needs of its users. Sounds complicated, but it isn't. If you ask for information, Google will try to answer it within the result pages.
I'll share my research from a theoretical point of view, through exploring patents and papers, and through actual testing cases in the live indices of Google. Getting your site listed as the source of an Answer Card can result in a CTR increase of as much as 16%. How to get listed? Come join my session and I'll shine some light on the factors that come into play when optimizing for Google's Knowledge Graph.
Federated searching provides a more powerful search tool than Google in some ways, allowing users to search multiple databases simultaneously using a single query. However, federated searching systems currently face several problems, including a lack of standards, different data formats and protocols across databases, and difficulties with search definitions and connectors. Future challenges include improving authorization, response time, integration of new resources, and de-duplication of results. While federated searching works well in many libraries, more work is needed to match features like Google's ranking, relevancy, ease of use, and interface design to attract more patrons.
The document proposes a novelty detection approach for web crawlers to minimize redundant documents retrieved. It summarizes the generic crawler methodology and introduces the proposed crawler methodology which uses semantic text summarization and similarity calculation based on n-gram fingerprinting to identify novel pages not already in the database. The implementation and results show that the proposed approach significantly reduces redundancy and memory requirements compared to a generic crawler.
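A lightweight stand-in for the n-gram fingerprinting and similarity comparison described above (a sketch of the general idea, not the paper's exact method) might look like this:

```python
def ngram_fingerprint(text, n=3):
    """Set of character n-grams of the whitespace-normalized,
    lowercased text; a lightweight stand-in for page fingerprints."""
    t = " ".join(text.lower().split())
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def novelty_similarity(a, b, n=3):
    """Jaccard overlap of two fingerprints, in [0, 1]. A crawled page
    whose similarity to every stored page stays low counts as novel."""
    fa, fb = ngram_fingerprint(a, n), ngram_fingerprint(b, n)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

A crawler would compare each candidate page's fingerprint against the database and discard pages whose best similarity exceeds a threshold, storing only the novel ones.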
This document provides an overview of a SQL Server 2008 for Business Intelligence short course. It discusses the course instructor's background and specialties. The course will cover creating a data warehouse, OLAP cubes, and reports. It will also discuss data mining concepts like why it's used, common algorithms, and include a hands-on lab. Data mining algorithms that will be covered include classification, clustering, decision trees, and neural networks.
This document provides a survey of web clustering engines. It discusses how web clustering engines organize search results by topic to complement conventional search engines, which return a flat list of ranked results. The document outlines the key stages in developing a web clustering engine, including acquiring search results, preprocessing, clustering, and visualization. It also reviews several existing commercial and open source web clustering systems and discusses evaluating the retrieval performance of these systems.
Recommendation system using unsupervised machine learning algorithm & associ - jerd
This document discusses using a combination of unsupervised machine learning algorithms, including Farthest First clustering and the Apriori association rule algorithm, for a course recommendation system. It presents an approach that clusters student data from a learning management system (LMS) like Moodle without needing to preprocess the data. Then, association rules are generated to find the best combinations of courses based on the student clusters. The combined approach is tested on sample LMS data to demonstrate its ability to recommend courses without requiring data preparation steps compared to using only the Apriori algorithm.
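The association-rule step can be sketched as a frequent-pair miner with confidence filtering, in the spirit of Apriori; the course names and thresholds below are made up for illustration:

```python
from itertools import combinations
from collections import Counter

def course_rules(enrollments, min_support=2, min_conf=0.6):
    """Apriori-style sketch: count frequent course pairs, then emit
    rules A -> B whose confidence passes the threshold."""
    item_counts = Counter(c for basket in enrollments for c in basket)
    pair_counts = Counter(
        pair for basket in enrollments
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = n / item_counts[x]            # P(y | x)
            if conf >= min_conf:
                rules.append((x, y, conf))
    return rules

data = [["python", "ml"], ["python", "ml", "stats"], ["python", "db"]]
rules = course_rules(data)
```

For this toy data the pair (ml, python) is frequent, yielding the rules "ml implies python" with confidence 1.0 and "python implies ml" with confidence 2/3; a recommender would surface the consequent course to students matching the antecedent.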
Linked Data: Opportunities for Entrepreneurs - 3 Round Stones
Multidisciplinary engineer and entrepreneur David Wood discusses the reasons, approaches and success stories for structured data on the World Wide Web. Linked Data is placed in context with the rest of the Web and that context is used to suggest some areas ripe for entrepreneurial innovation.
IRJET- Determining Document Relevance using Keyword Extraction - IRJET Journal
This document describes a system that aims to search for and retrieve relevant documents from a large collection based on a user's query. It does this through three main components: keyword extraction, document searching, and a question answering bot. Keyword extraction is done using the TF-IDF algorithm to identify important words in documents. These keywords are stored in a database along with their TF-IDF weights. When a user submits a query, the system searches for documents containing keywords from the query and returns relevant results. It also includes a feedback mechanism for users to improve search accuracy over time. The goal is to deliver accurate search results quickly from large document collections.
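A bare-bones TF-IDF weighting in the spirit of the keyword-extraction component above (the documents and whitespace tokenization are illustrative simplifications) can be written as:

```python
import math

def tfidf(docs):
    """TF-IDF weights per document: term frequency times the log of
    the inverse document frequency."""
    n_docs = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = {}                                  # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] = df.get(term, 0) + 1
    weights = []
    for toks in tokenized:
        scores = {}
        for term in set(toks):
            tf = toks.count(term) / len(toks)
            scores[term] = tf * math.log(n_docs / df[term])
        weights.append(scores)
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran fast"]
weights = tfidf(docs)
```

Words appearing in every document (like "the") score zero, while rarer words score higher, which is what makes TF-IDF useful for picking out a document's characteristic keywords.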
This document presents a system for detecting semantically similar questions in online forums like Quora to reduce duplicate content. It proposes using natural language processing techniques like tagging questions with keywords, vectorizing text with Google News vectors, and calculating similarity with Word Mover's Distance. The system cleans and preprocesses questions before generating tags and calculating similarity between questions to identify duplicates. An evaluation of the system achieved accurate detection of matching and non-matching question pairs.
Utilizing the Natural Language Toolkit for keyword research - Erudite
This document discusses using the Natural Language Toolkit (NLTK) for keyword research and analysis. It provides instructions on installing NLTK and other Python libraries, preparing keyword data, and running scripts to classify and cluster keywords to identify trends and topics. The document demonstrates how to automate aspects of keyword research using NLTK to help analyze large datasets.
The document discusses replicability and reproducibility in ACL conferences. It argues that empirical papers should include software and data so results can be reproduced. An analysis found that most papers from ACL 2011 did not include software or data. Generally descriptions were incomplete and few papers allowed true reproducibility. The author calls for higher standards, weighting replicability more in reviews, and removing blind submissions to improve transparency.
Semantic search: what it is, how it works, and how it differs from plain search - Vitebsk Miniq
The presentation is based on Filipp Eryomenko's talk at Vitebsk Miniq #26, held on 25 June 2020:
https://community-z.com/events/miniq-qa .
About the talk:
Many of us have dealt (or not) with search engines such as Solr, Elasticsearch, or AWS/Google solutions at various levels. It often happens that standard search falls short of the desired quality no matter what you do. Why can't you make it work like Google's, or even better? What do they have that we don't? The answer is semantic search. What it is, how it differs from the standard approach of any search engine, how it is done, and how we do it: that is what my talk is about.
AlgoAnalytics is an analytics consultancy that uses advanced mathematical techniques and machine learning to solve business problems for clients across various industries. It has over 30 data scientists with expertise in mathematics, engineering, and cutting-edge methodologies like deep learning. AlgoAnalytics works closely with domain experts to effectively model problems and develop predictive analytics solutions using structured, text, image, sound, and other types of data. Some of its service offerings include contracts management, document decomposition, sentiment analysis, and predictive maintenance. The company is led by CEO and founder Aniruddha Pant, who has over 20 years of experience applying machine learning and analytics to academic and enterprise challenges.
Answering questions automatically with the web - Ahmed Hammami
This document summarizes an automatic question answering system that goes beyond answering simple factual questions. The system is trained on a corpus of 1 million question/answer pairs collected from frequently asked question pages on the web. It uses statistical models like a question chunker, answer/question translation model, and answer language model. The evaluation shows the system achieves reasonable performance on a variety of complex, non-factual questions by leveraging large web collections to find answers rather than assuming answers are short facts.
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-... - IRJET Journal
The document proposes a new framework for efficient semantic search in large datasets. It aims to improve understanding of short texts by enriching them with concepts and related terms from a probabilistic knowledge base. A deep learning model using stacked autoencoders is designed to learn features from the enriched short texts and encode them into binary codes, allowing similarity searches. Experiments show the new approach captures semantics better than existing methods and enables applications like short text retrieval and classification.
This document discusses using graph databases and graph modeling for supply chain management. It begins by explaining how supply chains are naturally connected networks that can be represented as graphs. It then outlines four key steps for innovating with connected data: data capture, data modeling and storage, processing and analytics, and applications and insights. Several examples are provided of how graph queries, algorithms and analytics could be applied to problems in supply chain management. The document promotes modeling the entities and relationships in a supply chain as a graph to allow for more sophisticated analysis that accounts for network effects and connections between entities. It positions graph databases as enabling more effective supply chain optimization and risk mitigation in the global economy.
Data Science as a Service: Intersection of Cloud Computing and Data Science - Pouria Amirian
Dr. Pouria Amirian explains data science and the steps in a data science workflow, and shows some experiments in AzureML. He also discusses big data issues in data science projects and solutions to them.
Similar to Weblog Extraction With Fuzzy Classification Methods (20)
Weblog Extraction With Fuzzy Classification Methods
1. Weblog Extraction with Fuzzy Classification Methods Edy Portmann - University of Fribourg - Switzerland
2. Content: Introduction (Weblog extraction – Folksonomies – Fuzzy logic – Fuzzy data clustering); Fuzzy weblog extraction (Building blocks – Interface – Query engine – Meta search engine – Aggregated documents); Example; Concluding Remarks; Questions and Answers
3. Weblog extraction. A weblog is a website with regular (reverse-chronological) entries of comments, descriptions of events, or other material. Weblogs provide instant news on a particular subject, and readers can leave comments. Data extraction is the act or process of retrieving data out of unstructured data sources.
6. Hard vs. fuzzy clustering In hard clustering, data is divided into distinct clusters, where each data element belongs to exactly one cluster In fuzzy clustering, data elements can belong to more than one cluster, and associated with each element is a set of membership levels
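The contrast between hard and fuzzy assignment can be sketched in a few lines of Python; the two cluster centres and the sample point below are purely illustrative, not taken from the talk:

```python
# Two fixed 1-D cluster centres, purely illustrative.
centres = [0.0, 10.0]

def hard_assign(x):
    """Hard clustering: the point belongs to exactly one (the nearest) cluster."""
    return min(range(len(centres)), key=lambda i: abs(x - centres[i]))

def fuzzy_memberships(x, m=2.0):
    """Fuzzy clustering: a membership level in [0, 1] for every cluster."""
    d = [max(abs(x - c), 1e-12) for c in centres]
    w = [dk ** (-2.0 / (m - 1.0)) for dk in d]
    s = sum(w)
    return [wk / s for wk in w]

print(hard_assign(4.0))        # 0: wholly in the first cluster
print(fuzzy_memberships(4.0))  # about [0.69, 0.31]: partly in both
```

The membership formula is the standard fuzzy c-means weighting: closer centres receive higher membership, and the levels always sum to 1.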
10. Query engine: Grassroots Tagging. Three sources carry the tag sets {Yo-yo, Triangle, Green}, {Yo-yo, Triangle, Red}, and {Yo-yo, Triangle, Blue}. According to these tags, yo-yo, triangle and the colours green, red and blue must be related in some way – but in which way?
11. Query engine: Jaccard coefficient. The Jaccard coefficient measures the similarity of two tag sets A and B as |A ∩ B| / |A ∪ B|. Disjoint sets are not at all similar (coefficient 0), partially overlapping sets are somewhat similar, and largely overlapping sets are quite similar (coefficient close to 1).
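Applied to the three tag sets from the grassroots-tagging slide, the coefficient is a one-liner over Python sets:

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two tag sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)

# The three tag sets from the grassroots-tagging example:
s1 = {"yo-yo", "triangle", "green"}
s2 = {"yo-yo", "triangle", "red"}
s3 = {"yo-yo", "triangle", "blue"}

print(jaccard(s1, s2))  # 2 shared tags out of 4 total -> 0.5
print(jaccard(s1, s1))  # identical sets -> 1.0
```

Any pair of the three sets shares two of four distinct tags, so each pairwise similarity is 0.5 – "somewhat similar" in the slide's terms.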
12. Query engine: fuzzy c-means (FCM). FCM is a method of clustering which allows one piece of data to belong to two or more clusters; the membership levels are derived from the distances d between a data point and the cluster centres.
13. Query engine: fuzzy c-means (FCM). The algorithm assigns each term a degree of membership in every cluster, so a term may belong to more than one cluster.
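A minimal 1-D fuzzy c-means sketch in pure Python shows the two alternating updates (memberships from distances, centres from membership-weighted means). The data values are made up for illustration, and the deterministic initialisation is a simplification of the usual random start:

```python
def fcm(data, c=2, m=2.0, iters=50):
    """Minimal 1-D fuzzy c-means; returns (centres, membership matrix)."""
    lo, hi = min(data), max(data)
    # Deterministic start: centres spread evenly over the data range.
    centres = [lo + (hi - lo) * k / (c - 1) for k in range(c)]
    u = [[0.0] * c for _ in data]
    for _ in range(iters):
        # Membership update: closer centres get higher membership.
        for j, x in enumerate(data):
            d = [max(abs(x - ck), 1e-12) for ck in centres]
            w = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            s = sum(w)
            u[j] = [wk / s for wk in w]
        # Centre update: membership-weighted mean of the data.
        for k in range(c):
            den = sum(u[j][k] ** m for j in range(len(data)))
            centres[k] = sum((u[j][k] ** m) * x
                             for j, x in enumerate(data)) / den
    return centres, u

data = [0.1, 0.2, 0.3, 9.8, 9.9, 10.0, 5.0]
centres, u = fcm(data)
# The middle point 5.0 keeps a sizeable membership in both clusters,
# which hard clustering cannot express.
```

In a tag-clustering setting, the 1-D values would be replaced by term vectors and the absolute distance by a vector distance, but the update scheme is the same.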
14. Query engine: iterative FCM. Terms that belong to several clusters link those clusters together; the clusters and the membership degrees themselves remain unchanged. (Figure: membership levels of a term in the clusters Green, Red and Blue.)
15. Query engine: iterative FCM (ontology). Each term is linked with other terms, and every one of those terms is again linked with further terms; every newly tagged source on the Internet creates new term links. (Figure: membership of a term A in the clusters Green, Red and Blue.)
17. Meta search engine
1. A fuzzy set search query is formed.
2. The meta search engine sends the fuzzy set search query to other blog search engines (e.g. Technorati, Blogdigger).
3. Each blog search engine sends the query to the blogosphere…
4. …and gathers the results.
5. The meta search engine collects all results…
6. …and aggregates them.
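The fan-out-and-aggregate pattern can be sketched as below; the engine functions, URLs and scores are hypothetical stand-ins, not real Technorati or Blogdigger APIs:

```python
# Canned stand-in engines; a real system would call each blog
# search engine's HTTP API instead.
def search_technorati(query):
    return [("blog-a/oled-review", 0.9), ("blog-b/displays", 0.7)]

def search_blogdigger(query):
    return [("blog-b/displays", 0.8), ("blog-c/oled-news", 0.6)]

def meta_search(query, engines):
    """Fan the query out to every engine and aggregate the results,
    keeping the best score for URLs reported by several engines."""
    best = {}
    for engine in engines:
        for url, score in engine(query):
            best[url] = max(best.get(url, 0.0), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

results = meta_search("OLED", [search_technorati, search_blogdigger])
# blog-b/displays appears in both engines and is kept once,
# with its best score (0.8).
```

Taking the maximum score for duplicate URLs is one simple aggregation choice; rank fusion or score averaging would slot into the same loop.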
18. Aggregated documents: Blogretrievr (www.blogretrievr.com/). Screenshot of a Blogretrievr™ search for "Yo-yo" and "Hand puppet" with a fuzziness-factor control. Caption: 1. Search Map, 2. Search Results, 3. Map Rotation, 4. Zoom in/out, 5. New search.
20. Example: problem specifications. What is coming around the edge? Samsung is screening its competitors for new killer applications; in the blogosphere, new technologies (OLED, LCD, LED, OEL) are discussed earlier than in other media.
22. Example: the search. Search for a weblog about the new OLED technology with the fuzziness factor set to a membership degree of [0.8..1]. This includes OLED (membership 1) and LED (membership [0.9, 1]), but not OEL (membership [0.6, 1]).
24. Comparison: the Boolean search finds only the exact term OLED, whereas the fuzzy search with threshold [0.8..1] also finds LED; LCD and OEL are found with neither.
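The comparison amounts to a threshold filter over membership degrees. In this sketch the degrees for OLED, LED and OEL follow the slides (taking the lower bound of each interval); the LCD value is an illustrative guess:

```python
# Membership degrees of the candidate terms with respect to "OLED";
# the LCD value is invented for illustration.
membership = {"OLED": 1.0, "LED": 0.9, "LCD": 0.3, "OEL": 0.6}

def boolean_search(term):
    """Boolean search: only the exact term matches."""
    return [t for t in membership if t == term]

def fuzzy_search(threshold):
    """Fuzzy search: every term whose membership degree reaches the threshold."""
    return [t for t, mu in membership.items() if mu >= threshold]

print(boolean_search("OLED"))  # ['OLED']
print(fuzzy_search(0.8))       # ['OLED', 'LED'] -- OEL (0.6) stays excluded
```

Lowering the fuzziness factor widens the result set gradually, which is exactly the behaviour the Boolean search cannot offer.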
27. The membership function takes values in the interval [0,1]. Membership in a fuzzy set is intrinsically gradual instead of abrupt; as a result it is possible to find more relevant documents.
29. Concluding remarks: view similar results together in folders rather than scattered throughout a list.