Semantic Recommendation Systems for Research 2.0
Presentation Transcript

  • SEMANTIC RECOMMENDATION SYSTEMS FOR RESEARCH 2.0, or: A Conceptual Prototype for a Twitter-based Recommender System for Research 2.0, by Patrick Thonhauser (Thursday, October 11, 12)
  • OUTLINE • Motivation • Basics (Semantic Web, Recommender Systems, Natural Language Processing) • Conceptual Prototype • Test Results and Discussion • Questions
  • MOTIVATION • Is Twitter useful for discovering new connections between researchers in similar subject areas (and why Twitter)? • How much information can we extract from 140-character strings? • Is it possible to separate useful information from noise? • Are there any appropriate classifiers and metrics to measure the significance of Twitter users and tweets?
  • SEMANTIC WEB • An additional layer of information • Linked Data (use URIs as names, use HTTP URIs, use standards to provide information, include links to other URIs) • RDF (based on triples: subject, predicate, object) is to the Semantic Web what HTML is to the classic web • Nearly all Semantic Web standards are based on RDF (like FOAF, the Friend of a Friend project)
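The subject–predicate–object idea can be sketched in a few lines; this is a minimal stand-in for a real RDF store, with made-up `example.org` URIs and only the FOAF namespace taken from the slide:

```python
# Minimal sketch: Linked Data statements as RDF-style triples
# (subject, predicate, object), using the FOAF vocabulary.
# The example.org URIs are illustrative, not real profiles.

FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
]

def objects(graph, subject, predicate):
    """Return all objects matching a (subject, predicate, ?) pattern."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects(triples, "http://example.org/alice", FOAF + "knows"))
# prints ['http://example.org/bob']
```

A real system would use an RDF library and serialize to a standard syntax such as Turtle, but the triple pattern matching is the same idea.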
  • RECOMMENDER SYSTEMS • Collaborative Filtering (user-based / item-based) • Content-Based Recommendation • Knowledge-Based Recommendation • Hybrid Recommendations
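Of the approaches listed, user-based collaborative filtering is the easiest to sketch: find the most similar other user and suggest what they rated that you have not seen. The users, items and ratings below are invented for illustration:

```python
# Hedged sketch of user-based collaborative filtering with cosine
# similarity. All names and ratings are made up.
from math import sqrt

ratings = {
    "alice": {"paper_a": 5, "paper_b": 3, "paper_c": 4},
    "bob":   {"paper_a": 4, "paper_b": 3, "paper_d": 5},
    "carol": {"paper_b": 1, "paper_c": 2, "paper_d": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    return dot / (sqrt(sum(x * x for x in u.values())) *
                  sqrt(sum(x * x for x in v.values())))

def recommend(user):
    # Pick the most similar other user and suggest their unseen items.
    _, nearest = max((cosine(ratings[user], ratings[o]), o)
                     for o in ratings if o != user)
    return sorted(set(ratings[nearest]) - set(ratings[user]))

print(recommend("alice"))
# prints ['paper_d']
```

Item-based filtering transposes the same computation (similarity between item columns instead of user rows); the prototype in this talk instead scores similarity over tweet terms, as the later slides show.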
  • NATURAL LANGUAGE PROCESSING (NLP) • Classification of microtext artefacts ("This presentation is killer!") • Applying NLP pipelines • End-of-sentence detection • Tokenization • POS tagging • Chunking • Extraction
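The first two pipeline stages can be sketched with plain regular expressions; POS tagging and chunking need a trained tagger (e.g. from NLTK) and are out of scope here, so this only covers end-of-sentence detection and tokenization on the slide's example sentence:

```python
# Sketch of the first NLP pipeline stages: naive end-of-sentence
# detection, then tokenization. Later stages (POS tagging, chunking,
# extraction) would require a trained tagger and are omitted.
import re

def split_sentences(text):
    # Split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Keep runs of word characters and apostrophes as tokens.
    return re.findall(r"[\w']+", sentence)

text = "This presentation is killer! It covers NLP pipelines."
print([tokenize(s) for s in split_sentences(text)])
# prints [['This', 'presentation', 'is', 'killer'], ['It', 'covers', 'NLP', 'pipelines']]
```

Real end-of-sentence detection is harder than this regex (abbreviations, emoticons, ellipses), which is part of why microtext classification is non-trivial.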
  • THE CONCEPT OF THOUGHT BUBBLES Let's imagine every Twitter user belongs to several different topic-related bubbles
  • LET’S SUMMARIZE • A user is part of topic-related bubbles • Twitter users within topic-related bubbles don’t necessarily know each other • Connections of the service user's already existing connections lead to new information • Non-bidirectional connections are preferred So how can we find such potentially interesting users?
  • PROOF OF CONCEPT SYSTEM (1) Pre-selection of the user set that will be analyzed in depth (2) Apply the NLP pipeline for measuring user similarity (3) Categorize the top-n best-scoring users according to the idea of Thought Bubbles (4) Recommend the top-n best-scoring users of a category to the user (5) Analyze the acceptance of recommendations [Architecture diagram: a service user's Thought Bubbles (e.g. sports, iOS dev, social media) feed a server-side pipeline of pre-filtering, NLP, clustering and categorisation, backed by a DB and the Twitter API, producing the recommendations]
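The five steps can be sketched as a skeleton control flow; every function name and score below is an illustrative placeholder, not the prototype's actual code:

```python
# Skeleton of the five proof-of-concept steps. All functions, handles
# and scores are placeholders, not the prototype's real API.

def preselect(service_user):
    # (1) Pre-filter friends-of-friends down to a candidate set.
    return ["@candidate_a", "@candidate_b"]

def similarity(service_user, candidate):
    # (2) The NLP pipeline would score tweet-term overlap here.
    return {"@candidate_a": 0.3, "@candidate_b": 0.1}[candidate]

def run(service_user, top_n=1):
    candidates = preselect(service_user)
    scored = sorted(candidates, reverse=True,
                    key=lambda c: similarity(service_user, c))
    # (3)+(4) Categorise into Thought Bubbles, recommend the top n.
    # (5) Acceptance analysis would feed new filters back into (1).
    return scored[:top_n]

print(run("@service_user"))
# prints ['@candidate_a']
```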
  • (1) PRE-SELECTION/FILTERING Identify friends of the service user's friends via the Twitter API, then apply three filters: filter accounts that are already connected to you; filter accounts where follower_count < 300 or status_count < 1000; filter non-English-speaking accounts by using a simple NLP pipeline. The result is a set of Twitter accounts for further processing. • The set of friends-of-friends' Twitter accounts changes from iteration to iteration • Filters are added after analyzing the acceptance of recommendations
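The three filters amount to a conjunction of predicates over account records. The thresholds (300 followers, 1000 statuses) come from the slide; the account records and the language field standing in for the NLP language check are made up:

```python
# Sketch of the pre-selection filters. Thresholds are from the slide;
# accounts are invented, and the "lang" field is a stand-in for the
# simple NLP-pipeline language check.
MIN_FOLLOWERS = 300
MIN_STATUSES = 1000

def passes_filters(account, already_connected):
    return (account["screen_name"] not in already_connected
            and account["followers_count"] >= MIN_FOLLOWERS
            and account["statuses_count"] >= MIN_STATUSES
            and account["lang"] == "en")

accounts = [
    {"screen_name": "a", "followers_count": 500, "statuses_count": 2000, "lang": "en"},
    {"screen_name": "b", "followers_count": 100, "statuses_count": 2000, "lang": "en"},
    {"screen_name": "c", "followers_count": 500, "statuses_count": 2000, "lang": "de"},
]

print([a["screen_name"] for a in accounts if passes_filters(a, {"d"})])
# prints ['a']
```

Keeping each filter a separate conjunct makes it easy to add new predicates after analyzing recommendation acceptance, as the slide describes.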
  • (2) NLP PIPELINE Raw tweets (e.g. "@testuser The grand jury commented on a number of…") are tokenized and stripped of @mentions and URLs, and the 200 most-used English words are neglected. POS tagging yields tagged tweets such as [(The, AT), (grand, JJ), (jury, NN), (commented, VBD), (on, IN), (a, AT), (number, NN), ..., (., .)]; chunking then mines nouns and noun phrases (e.g. [(jury, NN), (number, NN), (test case, NP), ...]), whose frequency distribution (e.g. [(jury, 34), (social, 23), ...]) is filtered to the top-n words and stored in the DB. The 400 most recent tweets of a potential recommendation are used for calculating the similarity measure.
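The tweet-side preprocessing can be sketched end to end without a tagger: strip @mentions and URLs, drop the most frequent English words, then build a frequency distribution of the remaining terms. The tiny stopword set below stands in for the 200-word list the slide mentions, and the tweets are invented:

```python
# Sketch of tweet preprocessing: strip @mentions and URLs, neglect
# frequent English words, then count remaining terms. STOPWORDS is a
# tiny stand-in for the 200-word list from the slide.
import re
from collections import Counter

STOPWORDS = {"the", "a", "on", "of", "and", "to", "in"}

def clean(tweet):
    tweet = re.sub(r"@\w+", "", tweet)          # strip @mentions
    tweet = re.sub(r"https?://\S+", "", tweet)  # strip URLs
    return tweet

def term_frequencies(tweets):
    counts = Counter()
    for t in tweets:
        words = re.findall(r"[a-z']+", clean(t).lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

tweets = [
    "@testuser The grand jury commented on a number of cases",
    "The jury met again http://example.com",
]
print(term_frequencies(tweets).most_common(1))
# prints [('jury', 2)]
```

In the prototype this distribution would be built from the 400 most recent tweets of each candidate and truncated to the top-n terms before comparing users.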
  • Calculate the top-n users by applying Single-Linkage Clustering • Categorize whether a user belongs to user-specific bubbles • Present recommendation lists to users • Analyze acceptance of recommendations (connect user accounts with FOAF) and add a new filter predicate if necessary.
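Single-linkage clustering merges, at each step, the two clusters whose closest members are most similar. A minimal sketch over an invented symmetric user-similarity matrix (the handles, scores and threshold are all illustrative):

```python
# Minimal single-linkage agglomerative clustering over a made-up
# pairwise user-similarity matrix. Handles, similarities and the
# threshold are illustrative.

sim = {
    ("u1", "u2"): 0.9, ("u1", "u3"): 0.2, ("u1", "u4"): 0.1,
    ("u2", "u3"): 0.3, ("u2", "u4"): 0.1, ("u3", "u4"): 0.8,
}

def s(a, b):
    return sim.get((a, b), sim.get((b, a), 0.0))

def single_linkage(users, threshold):
    clusters = [{u} for u in users]
    while len(clusters) > 1:
        # Single linkage: cluster similarity = best pair across clusters.
        score, i, j = max(
            (max(s(a, b) for a in c1 for b in c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters) if i < j)
        if score < threshold:
            break  # no pair of clusters is similar enough to merge
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

print(single_linkage(["u1", "u2", "u3", "u4"], threshold=0.5))
# prints [{'u1', 'u2'}, {'u3', 'u4'}]
```

Production code would use a library implementation (e.g. hierarchical clustering in SciPy), but the merge-until-threshold behavior is the same.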
  • SUPERVISED TEST RUN [Bar chart: similarity scores from 0 to 0.300 for roughly fifty candidate Twitter accounts, including @gargamit100, @selvers, @UpsideLearning, @timbuckteeth, @cliveshepherd, @SebastianThrun, @elearning, @BarackObama, @charliesheen and @ladygaga; the accounts that were actually recommended are framed]
  • UNSUPERVISED TEST RESULTS The probability that a recommended item is relevant is 64.4%. Standard deviation: 31.5%.
  • DISCUSSION Twitter IS useful for discovering new information in the sense of Research 2.0, but: • Recommendations reflect the Twitter behavior of the user • Automated tweets harm recommendation results (one sentence gets an enormous weight because it occurs very often) • Twitter’s request rate limit is a show stopper • Comparison to similar systems (content-based and collaborative filtering)
  • THANK YOU! ANY QUESTIONS?