Learning Social Networks From Web Documents Using Support


    Learning Social Networks From Web Documents Using Support - Presentation Transcript

    1. Learning Social Networks from Web Documents Using Support Vector Classifiers. IEEE WI’06. Masoud Makrehchi & Mohamed S. Kamel. Presenter: Teng-Kai Fan. Date: 2008-11-18
    2. Abstract
      • Learning a social network from incomplete relationship data.
      • Translating social network extraction into a text classification problem.
      • SVM (Support Vector Machine)
      • FOAF (Friend Of A Friend) dataset & F-measure.
    3. Outline
      • Introduction
      • Related Work
      • Problem Statement
      • Proposed approach: Learning Social Network from Incomplete Network.
      • Experiment
      • Conclusion
    4. Introduction
      • A social network is defined as a map of relationships (ties) between individuals (actors).
      • Applications:
        • Marketing, Advertising.
        • Finding friends.
    5. Introduction cont.
      • In this study, the authors propose an approach to generate a social network from a collection of web documents.
      • Actor-term matrix: every person is represented by her corresponding documents.
      • Learning social relations from the actor-term database.
        • Assumption: the social network is partially explored (training dataset).
        • A support vector classifier is employed to extract the missing relations and complete the social network.
    6. Related Work
      • Social network models can be constructed either directly or indirectly.
      • Direct (descriptive): acquaintanceship can be extracted explicitly from the available information.
        • E-mail, cited papers, relational databases, web page links, etc.
      • Indirect (predictive): acquaintanceship is translated into the similarity of two actors.
        • Papers, opinions, news, etc.
    7. Problem Statement
      • The goal is to predict and learn the network while knowing only a small number of relations between individual persons.
      • Social networks are represented either by graphs or by adjacency matrices.
      [Figure: incomplete matrix (training examples) and complete matrix (learned matrix)]
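The incomplete adjacency matrix can be pictured with a small sketch. The four actors, the tie values, and the use of `None` for an unexplored relation are all illustrative assumptions; the slides do not specify an encoding.

```python
# Hypothetical toy network: 4 actors with symmetric ties.
# 1 = known tie, 0 = known non-tie, None = relation not yet explored.
actors = ["a1", "a2", "a3", "a4"]
incomplete = [
    [None, 1,    None, 0],
    [1,    None, None, None],
    [None, None, None, 1],
    [0,    None, 1,    None],
]

def split_pairs(matrix):
    """Split the upper triangle into labeled (training) and unlabeled pairs."""
    labeled, unlabeled = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            value = matrix[i][j]
            if value is None:
                unlabeled.append((i, j))
            else:
                labeled.append((i, j, value))
    return labeled, unlabeled

labeled, unlabeled = split_pairs(incomplete)
print("training pairs:", labeled)
print("pairs to predict:", unlabeled)
```

The labeled pairs play the role of training examples; the unlabeled pairs are the entries a classifier would have to fill in to complete the matrix.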
    8. Learning Social Network from Incomplete Network
      • Two assumptions:
        • A subset of the relations is known and represented by an adjacency matrix.
        • Textual data associated with the actors is available.
      • Three steps:
        • Modeling the actors in the social network.
        • Modeling the relations between the actors.
        • Training a classifier to learn the social network.
    9. Actor Modeling
      • Each actor is represented by her web documents including home page, blog, CV and so on.
      • All documents associated with an individual are merged to build a single document vector. Each document is associated with one actor.
      where the weighting schema is
    10. Actor Modeling cont.
      • Consequently, the corpus is modeled by a matrix called the Actor-Term Matrix.
      • Dimension reduction:
        • Stemming and stop-word list.
        • DF (document frequency): terms with DF below 5 or above 100 were removed.
      [Figure: Actor-Term Matrix with tf*idf entries]
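A minimal sketch of building such an actor-term matrix, assuming each actor's documents are already merged, tokenized, and stemmed. The function name `actor_term_matrix`, the toy documents, and the exact weighting (raw term frequency times log(N/df)) are assumptions; the transcript only names tf*idf and does not preserve the formula used in the paper.

```python
import math
from collections import Counter

def actor_term_matrix(actor_docs, min_df=1, max_df=None):
    """Build a tf*idf actor-term matrix from one merged token list per actor.

    actor_docs: dict mapping actor name -> list of (already stemmed) tokens.
    min_df / max_df: keep only terms whose document frequency is in this range.
    """
    n = len(actor_docs)
    # Document frequency: number of actors whose merged document contains the term.
    df = Counter(term for tokens in actor_docs.values() for term in set(tokens))
    upper = n if max_df is None else max_df
    vocab = sorted(t for t, c in df.items() if min_df <= c <= upper)
    matrix = {}
    for actor, tokens in actor_docs.items():
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df).
        matrix[actor] = [tf[t] * math.log(n / df[t]) for t in vocab]
    return vocab, matrix

# Hypothetical toy corpus; "web" occurs for every actor and is filtered out.
docs = {
    "alice": ["svm", "kernel", "svm", "web"],
    "bob": ["web", "blog", "foaf"],
    "carol": ["foaf", "rdf", "web"],
}
vocab, matrix = actor_term_matrix(docs, min_df=1, max_df=2)
print(vocab)
print(matrix["alice"])
```

With the thresholds quoted on the slide, the call would use `min_df=5, max_df=100`; the toy corpus is too small for those values, so looser ones are used here.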
    11. Relationship Modeling
      • One simple approach to modeling the relation between two actors is to estimate the similarity of their document vectors.
        • Similarity measures (e.g., cosine, Jaccard, and correlation) give very poor results because they model each relation with only one variable.
      • A better approach is to aggregate the document vectors of the actors on both sides of the relation into a new aggregated document vector.
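To make the "only one variable" point concrete, here is a small sketch of cosine similarity on two hypothetical document vectors. Whatever the dimensionality of the vectors, the pair is reduced to a single number, which is all a classifier would see.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical tf*idf vectors for two actors.
d_i = [1.0, 0.0, 2.0, 0.0]
d_j = [1.0, 3.0, 0.0, 0.0]
similarity = cosine(d_i, d_j)  # one scalar summarizes the whole pair
print(similarity)
```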
    12. Relationship Modeling cont.
      • Let d_i and d_j be the document vectors associated with the actors a_i and a_j.
      • The relation between two actors is modeled by aggregating their vectors with an operator such as MIN, MAX, or Product.
      • The aggregated document vector (relation vector) is obtained as follows:
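The slide's own formula is not preserved in this transcript. As a hedged illustration only, the sketch below assumes the operators are applied term by term, so the relation vector has the same length as the document vectors. The names `relation_vector` and `AGGREGATORS` are hypothetical.

```python
from operator import mul

# Assumed element-wise operators corresponding to MIN, MAX, and Product.
AGGREGATORS = {"min": min, "max": max, "product": mul}

def relation_vector(d_i, d_j, operator="min"):
    """Combine two actors' document vectors term by term into one relation vector."""
    if len(d_i) != len(d_j):
        raise ValueError("document vectors must have the same length")
    combine = AGGREGATORS[operator]
    return [combine(a, b) for a, b in zip(d_i, d_j)]

# Hypothetical document vectors for actors a_i and a_j.
d_i = [1.0, 0.0, 2.0, 0.5]
d_j = [1.0, 3.0, 0.0, 2.0]
for name in AGGREGATORS:
    print(name, relation_vector(d_i, d_j, name))
```

Unlike a single similarity score, each relation keeps one feature per term, which is what lets a text classifier be trained on relations.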
    13. Classifier Design for Imbalanced Social Network Data
      • Imbalanced social network data
        • The social network is sparse:
      • A common approach to dealing with class imbalance is to artificially re-balance the training data.
        • Up-sampling the minority class.
        • Down-sampling the majority class.
      n: number of actors; r: number of relations
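A minimal sketch of down-sampling the majority class, assuming each training example is a `(feature_vector, label)` pair with label 1 for a tie and 0 for no tie. The function name, the fixed seed, and the 1:1 target ratio are illustrative assumptions, not details taken from the slides.

```python
import random

def down_sample(examples, seed=0):
    """Balance a binary data set by randomly discarding majority-class examples.

    examples: list of (feature_vector, label) pairs with label 1 (tie) or 0 (no tie).
    """
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = rng.sample(majority, len(minority))
    return minority + kept

# Hypothetical data: 3 ties against 30 non-ties.
data = [([float(k)], 1) for k in range(3)] + [([float(k)], 0) for k in range(30)]
balanced = down_sample(data)
print(len(balanced), sum(label for _, label in balanced))
```

Up-sampling would instead repeat minority examples until the classes match; down-sampling keeps the training set small at the cost of discarding negative examples.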
    14. Classifier Design for Imbalanced Social Network Data cont.
      • An SVM classifier with a linear kernel is used for learning the social network.
        • Learning the social network is a binary classification problem with two classes: positive (connected) and negative (broken).
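To show what a linear-kernel SVM learns, the sketch below fits a weight vector and bias by batch sub-gradient descent on the regularized hinge loss. This is a hand-rolled stand-in for illustration, not the solver used in the paper; the hyperparameters (`lam`, `epochs`, `lr`) and the two-dimensional toy relation vectors are assumptions. In practice an established SVM library would normally be used instead.

```python
def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.05):
    """Fit w, b by batch sub-gradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}.
    """
    dim = len(X[0])
    n = len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:  # example violates the margin, so the hinge is active
                for k in range(dim):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (connected) or -1 (broken) from the sign of the decision value."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

# Hypothetical, clearly separable relation vectors.
X = [[2.0, 2.0], [1.5, 2.5], [3.0, 1.0], [0.0, 0.5], [0.5, 0.0], [0.2, 0.3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```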
    15. Experiments
      • Evaluation measures: Precision, Recall and F-measure.
      • Two-fold cross validation.
      • Dataset: a real FOAF database containing 210,611 RDF triples.
        • Relations between the individuals: a set of true social networks.
        • Web resource addresses (URLs) related to the individuals.
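The evaluation measures named on this slide can be computed as follows. The sketch assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the slides do not state the weighting. The labels and predictions are made up for illustration.

```python
def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall, and balanced F-measure for the positive (tie) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical ground truth and predictions for eight actor pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f_measure = precision_recall_f(y_true, y_pred)
print(precision, recall, f_measure)
```

On sparse networks these measures are more informative than accuracy, because predicting "no tie" for every pair would already be right almost all of the time.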
    16. Dataset cont.
      • Full social network:
        • Actors: 34,275
        • Real ties: 33,419
        • Possible relationships: 587,370,675
        • Ratio: 1:17575
      • Down-sampling: social networks with fewer than 20 or more than 70 members were removed.
      • After breaking the database into small sub-graphs:
        • Actors: 254
        • Real ties: 246
        • Possible relationships: 32,131
        • Ratio: 1:130
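The figures above are consistent with counting every unordered pair of actors as a possible relationship, n(n-1)/2. That formula is an inference from the numbers rather than something stated on the slides; the sketch below checks it against both rows.

```python
def possible_ties(n):
    """Number of unordered actor pairs among n actors (assumed undirected ties)."""
    return n * (n - 1) // 2

def imbalance(n, r):
    """Return (possible pairs, possible pairs per real tie) for n actors and r ties."""
    pairs = possible_ties(n)
    return pairs, pairs / r

for label, n, r in [("full network", 34275, 33419), ("sub-graphs", 254, 246)]:
    pairs, per_tie = imbalance(n, r)
    print(f"{label}: {pairs:,} possible relationships, about 1:{int(per_tie)}")
```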
    17. Results
    18. Results cont.
    19.  
    20. Conclusion
      • A text classification formulation that approximately predicts social relations from web documents was proposed.
        • A document vector aggregation model is proposed instead of document similarity.
      • Down-sampling is used to deal with highly imbalanced data.

