Semantic Recommendation Systems for Research 2.0


  1. SEMANTIC RECOMMENDATION SYSTEMS FOR RESEARCH 2.0, or: A Conceptual Prototype for a Twitter-based Recommender System for Research 2.0, by Patrick Thonhauser. Thursday, October 11, 12
  2. OUTLINE • Motivation • Basics (Semantic Web, Recommender Systems, Natural Language Processing) • Conceptual Prototype • Test Results and Discussion • Questions
  3. MOTIVATION • Is Twitter useful for discovering new connections between researchers in similar subject areas (and why Twitter)? • How much information can we extract from 140-character strings? • Is it possible to separate useful information from noise? • Are there any appropriate classifiers and metrics to measure the significance of Twitter users and Tweets?
  4. SEMANTIC WEB • An additional layer of information • Linked Data (use URIs as names, use HTTP URIs, use standards to provide information, include links to other URIs) • RDF (based on triples -> subject, predicate, object) is like HTML for the classic web • Nearly all Semantic Web standards are based on RDF (like FOAF - the Friend of a Friend project)
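The subject-predicate-object triples mentioned above can be sketched directly in plain Python. This is a minimal illustration using the FOAF vocabulary from the slide; the example resources under `example.org` are invented, and a real system would use an RDF library and serialize to RDF/XML or Turtle.

```python
# RDF-style triples: (subject, predicate, object).
# FOAF is the real vocabulary namespace; the example.org resources are made up.
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob",   FOAF + "name", "Bob"),
]

def objects(triples, subject, predicate):
    """Return all objects for a given (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Who does Alice know?
print(objects(triples, "http://example.org/alice", FOAF + "knows"))
# -> ['http://example.org/bob']
```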
  5. RECOMMENDER SYSTEMS • Collaborative Filtering (user-based/item-based) • Content-Based Recommendation • Knowledge-Based Recommendation • Hybrid Recommendations
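As a minimal sketch of the first approach listed above, user-based collaborative filtering can be reduced to: find the most similar other user (here via cosine similarity over rating vectors) and recommend what they rated. The user names and ratings below are invented for illustration.

```python
import math

# Toy rating data (invented): user -> {item: rating}.
ratings = {
    "ann": {"paper_a": 5, "paper_b": 3, "paper_c": 4},
    "ben": {"paper_a": 4, "paper_b": 3, "paper_c": 5, "paper_d": 4},
    "cat": {"paper_b": 1, "paper_d": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def recommend(user):
    """Suggest items the most similar other user rated but `user` has not."""
    others = [(cosine(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    _, nearest = max(others)
    return sorted(set(ratings[nearest]) - set(ratings[user]))

print(recommend("ann"))  # -> ['paper_d']
```

A content-based variant would compare item feature vectors instead of user rating vectors, with the same similarity machinery.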
  6. NATURAL LANGUAGE PROCESSING (NLP) • Classification of microtext artefacts ("This presentation is killer!") • Applying NLP pipelines • End-of-sentence detection • Tokenization • POS tagging • Chunking • Extraction
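The pipeline stages above can be sketched end to end in a few lines. This is a deliberately naive stand-in: the tiny stopword list and the suffix-based "tagger" are placeholders for a real toolkit (a production pipeline would use a statistical POS tagger, e.g. from NLTK).

```python
import re

# Toy stopword list; a real pipeline would neglect the 200 most-used words.
STOPWORDS = {"the", "is", "a", "on", "of"}

def sentences(text):
    """End-of-sentence detection (naive split on punctuation)."""
    return [s for s in re.split(r"[.!?]\s*", text) if s]

def tokenize(sentence):
    """Tokenization (keeps @mentions and hashtags as single tokens)."""
    return re.findall(r"[A-Za-z@#']+", sentence.lower())

def tag(token):
    """Crude POS guess: '-ed' words as past-tense verbs, the rest as nouns."""
    if token.endswith("ed"):
        return (token, "VBD")
    return (token, "NN")

def extract_nouns(text):
    """Extraction: keep only tokens tagged as nouns, minus stopwords."""
    tokens = [t for s in sentences(text) for t in tokenize(s)
              if t not in STOPWORDS]
    return [t for t, pos in map(tag, tokens) if pos == "NN"]

print(extract_nouns("The grand jury commented on a number of cases."))
# -> ['grand', 'jury', 'number', 'cases']
```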
  7. THE CONCEPT OF THOUGHT BUBBLES Let's imagine every Twitter user belongs to several different topic-related Bubbles
  8. LET'S SUMMARIZE • A user is part of topic-related bubbles • Twitter users within topic-related bubbles don't necessarily know each other • Connections of already existing connections of the service user lead to new information • Non-bidirectional connections are preferred So how can we find such potentially interesting users?
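The friends-of-friends idea above can be sketched as a simple set computation over a directed follow graph. The graph below is invented for illustration; since Twitter's "friend" relation is directed, these connections need not be bidirectional.

```python
# Invented directed follow graph: user -> set of accounts they follow.
follows = {
    "service_user": {"alice", "bob"},
    "alice": {"bob", "carol"},
    "bob": {"dave", "service_user"},
}

def candidates(user):
    """Friends of friends who are not yet followed by `user`."""
    friends = follows.get(user, set())
    fof = set()
    for f in friends:
        fof |= follows.get(f, set())
    # Exclude the user themselves and people they already follow.
    return sorted(fof - friends - {user})

print(candidates("service_user"))  # -> ['carol', 'dave']
```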
  9. PROOF OF CONCEPT SYSTEM (1) Pre-selection of the user set that will be analyzed in depth (2) Apply the NLP pipeline for measuring user similarity (3) Categorize the top-n best-scoring users according to the idea of Thought Bubbles (4) Recommend the top-n best-scoring users of a category to the user (5) Analyze the acceptance of recommendations [Figure: a service user's Thought Bubbles (e.g. sports, iOS dev, social media) feeding a server pipeline of pre-filtering, NLP, clustering, categorisation, and recommendations, backed by a DB]
  10. (1) PRE-SELECTION/FILTERING Friends of friends -> People filter (accounts that are already connected to you) -> Twitter filter (accounts where follower_count < 300 or status_count < 1000) -> NLP filter (non-English-speaking accounts, identified by using a simple NLP pipeline) -> set of Twitter accounts for further processing • The set of friends-of-friends Twitter accounts changes from iteration to iteration • Filters are added after analyzing the acceptance of recommendations
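A minimal sketch of these filter stages, using the thresholds named on the slide (follower_count < 300, status_count < 1000). The account records, the `lang` field standing in for the NLP language check, and the connected-accounts set are all invented for illustration.

```python
# Invented account records; "lang" stands in for the slide's NLP language filter.
accounts = [
    {"name": "a", "followers": 500,  "statuses": 2000, "lang": "en"},
    {"name": "b", "followers": 120,  "statuses": 5000, "lang": "en"},
    {"name": "c", "followers": 900,  "statuses": 300,  "lang": "en"},
    {"name": "d", "followers": 1500, "statuses": 4000, "lang": "de"},
    {"name": "e", "followers": 400,  "statuses": 1200, "lang": "en"},
]
already_connected = {"a"}  # would come from the service user's friend list

def prefilter(accounts):
    """Keep only accounts that pass every pre-selection filter."""
    return [acc["name"] for acc in accounts
            if acc["followers"] >= 300      # drop follower_count < 300
            and acc["statuses"] >= 1000     # drop status_count < 1000
            and acc["lang"] == "en"         # drop non-English accounts
            and acc["name"] not in already_connected]

print(prefilter(accounts))  # -> ['e']
```

Filters like these are cheap set/threshold checks, which is why they run before the expensive NLP stage.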
  11. (2) NLP PIPELINE Raw Tweets ("@testuser The grand jury commented on a number of…") -> strip @mentions and URLs -> tokenization, neglecting the 200 most-used English words -> POS tagging ([(The, AT), (grand, JJ), (jury, NN), (commented, VBD), (on, IN), (a, AT), (number, NN), …, (., .)]) -> chunking -> set of mined nouns and phrases ([(jury, NN), (number, NN), (social daily, NP), …]) -> frequency distribution ([(jury, 34), (social, 23), (test case, 16), …]) -> filter top-n words -> DB. The 400 most recent Tweets of a potential recommendation are used for calculating the similarity measure
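The frequency-distribution step at the end of this pipeline is a straightforward count over the mined terms; the top-n most frequent ones form a user's term profile. A minimal sketch with an invented token list:

```python
from collections import Counter

# Invented list of nouns mined from a user's tweets.
mined = ["jury", "social", "jury", "test", "social", "jury", "case"]

def top_terms(tokens, n):
    """Frequency distribution: the n most common terms with their counts."""
    return Counter(tokens).most_common(n)

print(top_terms(mined, 2))  # -> [('jury', 3), ('social', 2)]
```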
  12. • Calculate the top-n users by applying Single-Linkage Clustering • Categorize whether a user belongs to user-specific bubbles • Present recommendation lists to users • Analyze the acceptance of recommendations (connect user accounts with FOAF) and add a new filter predicate if necessary.
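Single-linkage clustering, named in the first bullet, repeatedly merges the two clusters whose closest members are most similar. A minimal sketch over an invented pairwise similarity matrix (a real run would use the tweet-based similarity measure from the NLP pipeline):

```python
# Invented pairwise similarity scores between candidate users.
sim = {
    ("u1", "u2"): 0.9, ("u1", "u3"): 0.2, ("u1", "u4"): 0.1,
    ("u2", "u3"): 0.3, ("u2", "u4"): 0.15, ("u3", "u4"): 0.8,
}

def similarity(a, b):
    return sim.get((a, b), sim.get((b, a), 0.0))

def single_linkage(users, n_clusters):
    """Agglomerative clustering with single (nearest-neighbour) linkage."""
    clusters = [{u} for u in users]
    while len(clusters) > n_clusters:
        # Single linkage: cluster similarity = best similarity between members.
        i, j = max(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: max(similarity(a, b)
                               for a in clusters[ij[0]]
                               for b in clusters[ij[1]]))
        clusters[i] |= clusters.pop(j)  # merge the closest pair
    return [sorted(c) for c in clusters]

print(single_linkage(["u1", "u2", "u3", "u4"], 2))
# -> [['u1', 'u2'], ['u3', 'u4']]
```

Single linkage is sensitive to "chaining" through single strong links, which fits the goal here: one strong topical overlap is enough to pull a candidate into a bubble.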
  13. SUPERVISED TEST RUN [Figure: bar chart of similarity scores (0 to 0.300) for the candidate Twitter accounts, e.g. @gargamit100, @timbuckteeth, @SebastianThrun, @cliveshepherd, @BarackObama, @ladygaga; recommendations are framed, and accepted accounts are marked with *]
  14. UNSUPERVISED TEST RESULTS The probability that a recommended item is relevant is 64.4%. Standard deviation: 31.5%
  15. DISCUSSION Twitter IS useful for discovering new information in the sense of Research 2.0, but: • Recommendations reflect the Twitter behavior of the user • Automated tweets harm recommendation results (one sentence gets an enormous weight because it occurs very often) • Twitter's request limitation is a show stopper • Comparison to similar systems (content-based and collaborative filtering)
  16. THANK YOU! ANY QUESTIONS?