Institute for Web Science & Technologies – WeST




Challenging Retrieval Scenarios:
Social Media and Linked Open Data
            Dr. Thomas Gottron
          gottron@uni-koblenz.de
Outline

 The ROBUST project
    Background
    Use cases

 Retrieval on Microblogs
   Particularities of Twitter
   Interestingness
   LiveTweet

 Search on the LOD cloud
    Querying LOD as IR task
    Schema extraction
    SchemEX
Challenging Retrieval Scenarios   Thomas Gottron   Lugano, 23.4.2012   2
Online Communities




Business Communities

 Information ecosystems
    Employees
    Business Partners, Customers
    General Public
    Valuable asset
    Risks and opportunities
High Level Objectives



 Risk Management
    Risk modelling
    Detection
    Automatic reaction

 Community Analysis
    Contents
    Single users
    Entire communities

 Community Forecasting
    Policies
    Prediction
    Decision support

 Large Scale Processing
    Big Data
    Realtime
    Parallel processing
Scenario 1


                   Social Media - Microblogs




IBM Connections




Twitter

[Figure: @janedoe posts a tweet that reaches her followers: "My dear @johndoe had troubles to wake up this #morning"]
Retrieval on Twitter: First Steps

 10 million tweets
 Retrieval engine
 Query: beer
       Rank    User               Tweet
        1      LoriAG             beer
        2      Crushdwinebar      beer!!
        3      Skippertaylor      BEER
        4      BigMacScola        Beer
        5      VANiamore          beer.......
        6      CindyMcManis       To beer or not to beer on Beer Summit ?
        7      silverlakewine     beer beer beer beer beer beer beer. Simple 3pm
        8      eldoradobar        http://ping.fm/p/Bnra7 - In!!! BEER, BEER, BEER,
                                  BEER, BEER, BEER, BEER, BEER, BEER, BEER,
        9      tonx               Lompoc. beer beer beer beer beer beer beer beer beer
                                  beer. http://twitpic.com/l68ld
       10      punkeyfunky        Beer beer beer beer beer beer beer beer beer beer beer
                                  beer beer. Er, guess what I'm looking forward to?
Particularities of Twitter




Twitter is different

 Maximum length: 140 characters

[Figure: histogram of tweet lengths. x-axis: characters (1 to 141); y-axis: # tweets (0 to 500,000)]
Twitter is different

    140 characters = few words
    85% of tweets contain each word only once
    Term frequency is practically a binary value!

[Figure: number of tweets (log scale, 1 to 10,000,000) by maximum term frequency in a tweet (0 to 47)]
Length normalisation

 Why are some documents longer? (classic explanation)

 Verbosity hypothesis:
    Long documents repeat themselves
    Short documents preferred as they are more concise

 Scope hypothesis:
    Long documents address more topics
    Short documents preferred as they are more focused

 Intuition:
    Not valid for Twitter
Verbosity hypothesis and Twitter?




Scope hypothesis and Twitter?

 Are long tweets broader in scope?

 LDA:
    100 topics

 Observations
   8.5% of tweets have no strong topic
   Remaining tweets:
           • 77.1% are dominated by one topic
           • 99.6% are dominated by at most two topics
Length normalisation on tweets

 Not necessary! … Negative impact?

 YES:
    Short tweets are preferred!

                                      Beer!



      Long tweets are considered of too wide scope.

    Pubs brewing their own beer: a list for Düsseldorf http://bit.ly/w2GZrV
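
A minimal sketch of the effect described above, assuming a BM25-style term score (BM25 is used here only for illustration; the slides do not state which scoring function was used). The parameter b controls length normalisation, and the average length of 10 tokens is a hypothetical value.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 contribution of one query term (IDF left out for simplicity).

    b = 0.75 applies length normalisation, b = 0.0 switches it off.
    """
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


def tokens(text):
    # Lowercase, split on whitespace, strip simple punctuation per token.
    return [t.strip("!:.,?") for t in text.lower().split()]


SHORT = tokens("Beer!")
LONG = tokens("Pubs brewing their own beer: a list for Düsseldorf http://bit.ly/w2GZrV")
AVG_LEN = 10.0  # hypothetical average tweet length in tokens


def score(doc, term, b):
    return bm25_term_score(doc.count(term), len(doc), AVG_LEN, b=b)


if __name__ == "__main__":
    for b in (0.75, 0.0):
        print(f"b={b}: short={score(SHORT, 'beer', b):.3f} long={score(LONG, 'beer', b):.3f}")
```

With normalisation switched on, the one-word tweet outranks the informative one although both mention the query term exactly once; with b = 0 they tie.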




Interestingness




Interesting Content

 Concept of "relevance" in IR:
   Document is about a topic

 Additionally for Twitter:
    Timeliness
    Current trend
    Informative

 Interestingness
    Tweet is about a topic AND is interesting!

 Question: How to determine what is interesting?
Retweets
   @janedoe: My dear @johndoe had troubles to wake up this #morning
   Follower: RT @janedoe: My dear @johndoe had troubles to wake up this #morning
Retweets

 Retweet indicates quality
    "of interest for others"

 Depends on
    Content
    Context (time, follower)

 Idea:
    Learn to predict retweets!
    Likelihood of retweet as metric for interestingness
Retweets: Prediction model




                         Dataset        Users         Tweets       Retweets
              Choudhury                  118,506       9,998,756        7.89%
              Choudhury (extended)       277,666      29,000,000        8.64%
              Petrovic                 4,050,944      21,477,484        8.46%

Logistic Regression: Weights

               Feature            Dimension         Weight
               Constant           (intercept)        -5.45
               Message feature    Direct message   -147.89
               Message feature    Username          146.82
               Message feature    Hashtag            42.27
               Message feature    URL               249.09
               Sentiment          Valence           -26.88
               Sentiment          Arousal            33.97
               Sentiment          Dominance          19.56
               Emoticons          Positive          -21.8
               Emoticons          Negative            9.94
               Exclamation        Positive           13.66
               Exclamation        Negative            8.72
               Punctuation        !                 -16.85
               Punctuation        ?                  23.67
               Terms              Odds               19.79
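
A minimal sketch of how such weights turn into a retweet likelihood, assuming the standard logistic model p = 1 / (1 + exp(-(intercept + sum of weight * feature value))). Only the message-feature weights from the table are used. The scaling of the feature values is not given on the slides, so the value 0.02 below is purely hypothetical.

```python
import math

INTERCEPT = -5.45
# Message-feature weights copied from the table above.
WEIGHTS = {
    "direct_message": -147.89,
    "username": 146.82,
    "hashtag": 42.27,
    "url": 249.09,
}


def retweet_probability(features):
    """Logistic model: map feature values to a probability in (0, 1)."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Feature values are hypothetical; their real scaling is not stated.
    print(retweet_probability({}))                        # intercept only
    print(retweet_probability({"url": 0.02}))             # raises the likelihood
    print(retweet_probability({"direct_message": 0.02}))  # lowers the likelihood
```

The resulting probability can serve directly as the interestingness score of a tweet.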



Logistic Regression: Topic Weights



                                        Topic                                 Weight
           social media market post site web tool traffic network                 27.54
           follow thank twitter welcome hello check nice cool people              16.08
           credit money market business rate economy home                         15.25
           christmas shop tree xmas present today wrap finish                       2.87
           home work hour long wait airport week flight head                      -14.43
           twitter update facebook account page set squidoo check                 -14.43
           cold snow warm today degree weather winter morning                     -26.56
           night sleep work morning time bed feel tired home                      -75.19




Re-Ranking using Interestingness

 Top-k relevant tweets
 Re-rank based on interestingness
 Rank Username   Tweet
  1 BeeracrossTX UK beer mag declares "the end of beer writing." @StanHieronymus says not so in the US.
                 http://bit.ly/424HRQ #beer
   2    narmmusic      beer summit @bspward @jhinderaker no one had billy beer? heehee #narm - beer summit
                       @bspward @jhinde http://tinyurl.com/n29oxj
   3    beeriety       Go green and turn those empty beer bottles into recycled beer glasses! | http://bit.ly/2src7F
                       #beer #recycle (via: @td333)
   4    hblackmon      Great Divide beer dinner @ Porter Beer Bar on 8/19 - $45 for 3 courses + beer pairings.
                       http://trunc.it/172wt
   5    nycraftbeer    Interesting Concept-Beer Petitions.com launches&hopes 2help craft beer drinkers enjoy beer
                       they want @their fave pubs. http://bit.ly/11gJQN
   6    carichardson   Beer Cheddar Soup: Dish number two in my famed beer dinner series is Beer Cheddar
                       Soup. I hadn’t had too.. http://bit.ly/1diDdF
   7    BeerBrewing    New York City Beer Events - Beer Tasting - New York Beer Festivals - New York Craft Beer
                       http://is.gd/39kXj #beer
   8    delphiforums   Love beer? Our member is trying to build up a new beer drinker's forum. Grab a #beer and
                       join us: http://tr.im/pD1n
   9    Jamie_Mason #Baltimore Beer Week continues w/ a beer brkfst, beer pioneers luncheon, drink & donate
                    event, beer tastings & more. http://ping.fm/VyTwg
   10   carichardson   Seattle and Beer: I went to Seattle last weekend. It was my friend’s stag - he likes
                       beer - we drank beer.. http://tinyurl.com/cpb4n9
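
The two steps (take the top-k relevant tweets, then re-rank them by interestingness) can be sketched as follows. The scoring function toy_interestingness is a hypothetical stand-in for the learned retweet likelihood, not the actual model.

```python
def rerank_by_interestingness(ranked_tweets, interestingness, k=10):
    """Keep the top-k tweets of a relevance ranking, then reorder them
    by descending interestingness (ties keep their relevance order)."""
    top_k = ranked_tweets[:k]
    return sorted(top_k, key=interestingness, reverse=True)


def toy_interestingness(tweet):
    # Hypothetical stand-in for the learned retweet likelihood:
    # reward URLs and hashtags.
    return (1.0 if "http" in tweet else 0.0) + (0.5 if "#" in tweet else 0.0)


if __name__ == "__main__":
    relevance_ranking = [
        "beer",
        "beer!!",
        "UK beer mag declares the end of beer writing http://bit.ly/424HRQ #beer",
        "Go green and recycle beer bottles http://bit.ly/2src7F",
    ]
    for tweet in rerank_by_interestingness(relevance_ranking, toy_interestingness, k=3):
        print(tweet)
```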

Application




LiveTweet

 Data:
   Twitter streaming API: sample
   1% of all tweets

 Architecture:
    Time slices over tweets
    Analytical component with
     REST API
    Web Frontend for end user




LiveTweet




                      http://livetweet.west.uni-koblenz.de/


LiveTweet: What comes next?

 Retrieval
   Combine with other retrieval metrics
   Include Interestingness in a learning to rank approach
   Social graph

 System extension
    Personalisation
    Public API
    Work with IBM data




Scenario 2


                                  Linked Open Data




Information needs requiring semantic structure

 Examples
    Male persons who have a public profile document
    Computer science papers authored by social scientists
    American actors who are also politicians and are married to a model

 Maybe specific databases available:
   Person search engines
   Bibliographic databases
   Movie database




Linked Data

          Semantic Web Technology to
          1. Provide structured data on the web
          2. Link data across data sources



[Figure: things in the data sources A, B, C, D, E, connected across sources by typed links]
Entities are identified via URIs

 One statement = one triple (subject, predicate, object)
    pd:cygri   rdf:type          foaf:Person
    pd:cygri   foaf:name         "Richard Cyganiak"
    pd:cygri   foaf:based_near   dbpedia:Berlin

 pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
 dbpedia:Berlin = http://dbpedia.org/resource/Berlin

 The foaf:based_near triple describes a link between two data sources
Resolving URIs



    pd:cygri         rdf:type          foaf:Person
    pd:cygri         foaf:name         "Richard Cyganiak"
    pd:cygri         foaf:based_near   dbpedia:Berlin
    dbpedia:Berlin   dp:population     3,405,259
    dbpedia:Berlin   skos:subject      dp:Cities_in_Germany
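
A minimal sketch of what following a link means, using the triples from this example in a small in-memory list. A real client would dereference the URI over HTTP and parse the returned RDF; that step is left out here, and the helper names are hypothetical.

```python
# The example triples as (subject, predicate, object) tuples.
TRIPLES = [
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", "Richard Cyganiak"),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ("dbpedia:Berlin", "dp:population", "3405259"),
    ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects of the triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(subject, triples=TRIPLES):
    """All (predicate, object) pairs known about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


if __name__ == "__main__":
    # Follow the link from the person to the city, then read the city's data.
    for city in objects("pd:cygri", "foaf:based_near"):
        print(city, describe(city))
```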




The LOD Cloud




Querying linked data


SELECT ?x
WHERE {
   ?x rdf:type foaf:Person .
   ?x rdf:type pim:Male .
   ?x foaf:maker ?y .
   ?y rdf:type
      foaf:PersonalProfileDocument .
}




Querying linked data – an IR task?



 Classic IR:    information need -> keyword query -> documents    -> information   (here happens IR magic)
 Linked data:   information need -> SPARQL query  -> data sources -> entities      (here we need magic)
Querying linked data – using an index


SELECT ?x
WHERE {
   ?x rdf:type foaf:Person .
   ?x rdf:type pim:Male .
   ?x foaf:maker ?y .
   ?y rdf:type
      foaf:PersonalProfileDocument .
}




A Schema for LOD




Idea

 Schema Index:
    Define families of graph patterns
    Assign entities to graph patterns
    Map graph patterns to context / source

 Construction:
   Stream-based for scalability
   Little loss of accuracy

 NOTE:
   Index defined over entities
   But: Index stores the contexts (sources)

Input Data

 N-Quads
              <subject> <predicate> <object> <context> .
 Example:
         <http://www.w3.org/People/Connolly/#me>
           <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
           <http://xmlns.com/foaf/0.1/Person>
           <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> .
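
A minimal sketch of reading one such line, assuming all four terms are IRIs in angle brackets as in the example. Literals and blank nodes, which full N-Quads also allows, are deliberately not handled, and the function name is hypothetical.

```python
import re

# Four <...> terms followed by a terminating dot.
_QUAD = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def parse_iri_quad(line):
    """Split a line into (subject, predicate, object, context).

    Raises ValueError for lines that are not four IRIs plus a dot.
    """
    match = _QUAD.match(line)
    if match is None:
        raise ValueError(f"not an IRI-only quad: {line!r}")
    return match.groups()


if __name__ == "__main__":
    line = (
        "<http://www.w3.org/People/Connolly/#me> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://xmlns.com/foaf/0.1/Person> "
        "<http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> ."
    )
    subject, predicate, obj, context = parse_iri_quad(line)
    print(subject, predicate, obj, context, sep="\n")
```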




[Figure: w3p:#me is typed as foaf:Person within the context http://dig.csail.mit.edu/2008/...]
Layer 1: RDF Classes

 All entities of a particular type

SELECT ?x
FROM …
WHERE {
   ?x rdf:type foaf:Person .
}

[Figure: class C1 points to the data sources DS 1, DS 2, DS 3. Example: foaf:Person points to http://dig.csail.mit.edu/2008/... and http://www.w3.org/People/Berners-Lee/card; timbl:card#i is typed as foaf:Person]
Layer 2: Type Clusters

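 All entities belonging to the same set of types

SELECT ?x
FROM …
WHERE {
   ?x rdf:type foaf:Person .
   ?x rdf:type pim:Male .
}

[Figure: classes C1 and C2 form type cluster TC1, which points to the data sources DS 1, DS 2, DS 3. Example: foaf:Person and pim:Male form type cluster tc4711, which points to http://www.w3.org/People/Berners-Lee/card; timbl:card#i is typed as foaf:Person and pim:Male]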
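
A minimal sketch of the type-cluster layer, assuming quads are given as (subject, predicate, object, context) tuples with prefixed names. Entities are grouped by their exact set of types, and each type cluster is mapped to the contexts (data sources) in which such entities occur. The function name is hypothetical, and this version holds all data in memory.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def type_clusters(quads):
    """Map each type cluster (frozenset of types) to its data sources."""
    types = defaultdict(set)    # entity -> set of its types
    sources = defaultdict(set)  # entity -> contexts stating those types
    for subject, predicate, obj, context in quads:
        if predicate == RDF_TYPE:
            types[subject].add(obj)
            sources[subject].add(context)

    clusters = defaultdict(set)
    for entity, entity_types in types.items():
        clusters[frozenset(entity_types)].update(sources[entity])
    return dict(clusters)


if __name__ == "__main__":
    card = "http://www.w3.org/People/Berners-Lee/card"
    quads = [
        ("timbl:card#i", RDF_TYPE, "foaf:Person", card),
        ("timbl:card#i", RDF_TYPE, "pim:Male", card),
    ]
    for cluster, contexts in type_clusters(quads).items():
        print(sorted(cluster), "->", sorted(contexts))
```

A query for entities that are both foaf:Person and pim:Male then only needs to look up the matching type cluster to find the relevant data sources.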



Layer 3: Equivalence Classes

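 Two entities are equivalent iff:
    They are in the same TC
    They have the same properties
    The property targets are in the same TC

[Figure: classes C1, C2, C3 form the type clusters TC1 and TC2; equivalence class EQC1 builds on the type clusters and points to the data sources DS 1, DS 2, DS 3]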
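
The three conditions can be sketched as a signature per entity: its own type cluster plus the set of (property, type cluster of the target) pairs. Entities with equal signatures fall into the same equivalence class. This is a minimal in-memory sketch over (subject, predicate, object) triples with hypothetical names; contexts are left out for brevity.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def equivalence_classes(triples):
    """Group entities by (own type cluster, {(property, target type cluster)})."""
    types = defaultdict(set)  # entity -> set of its types
    props = defaultdict(set)  # entity -> set of (property, target) pairs
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types[subject].add(obj)
        else:
            props[subject].add((predicate, obj))

    def type_cluster(entity):
        return frozenset(types.get(entity, ()))

    classes = defaultdict(set)
    for entity in set(types) | set(props):
        signature = (
            type_cluster(entity),
            frozenset((p, type_cluster(o)) for p, o in props.get(entity, ())),
        )
        classes[signature].add(entity)
    return dict(classes)


if __name__ == "__main__":
    triples = [
        ("a", RDF_TYPE, "foaf:Person"), ("a", "foaf:maker", "docA"),
        ("b", RDF_TYPE, "foaf:Person"), ("b", "foaf:maker", "docB"),
        ("docA", RDF_TYPE, "foaf:PersonalProfileDocument"),
        ("docB", RDF_TYPE, "foaf:PersonalProfileDocument"),
    ]
    for members in equivalence_classes(triples).values():
        print(sorted(members))
```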




Layer 3: Equivalence Classes
SELECT ?x
FROM …
WHERE {
   ?x rdf:type foaf:Person .
   ?x rdf:type pim:Male .
   ?x foaf:maker ?y .
   ?y rdf:type
       foaf:PersonalProfileDocument .
}

[Figure: foaf:Person and pim:Male form type cluster tc4711, foaf:PPD forms type cluster tc1234; equivalence class eqc0815 (tc4711 -maker- tc1234) points to http://www.w3.org/People/Berners-Lee/card. Example: timbl:card#i (foaf:Person, pim:Male) is linked via foaf:maker to timbl:card (foaf:PPD)]
Schema Index Overview

 3 Layers – 3 different graph patterns




Schema Computation




Building the Index from a Stream

 Stream of N-Quads (coming from a LD crawler)

      … Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1

[Figure: the stream feeds a FiFo instance cache; the cached instances (slots 1 to 4) are linked to the classes C1, C2, C3]
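
A minimal sketch of the stream-based idea, assuming a bounded FIFO cache of instances: type information is collected while an instance sits in the cache and written to the index when it is evicted. If statements about the same instance arrive too far apart, the instance is indexed with incomplete information, which is where the small loss of accuracy comes from. Names and the cache size are hypothetical.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "rdf:type"


def stream_type_clusters(quads, cache_size=2):
    """Build a type-cluster index from a quad stream with a FIFO instance cache."""
    cache = OrderedDict()      # subject -> (types, contexts), oldest first
    index = defaultdict(set)   # type cluster -> contexts

    def flush(entry):
        types, contexts = entry
        index[frozenset(types)].update(contexts)

    for subject, predicate, obj, context in quads:
        if predicate != RDF_TYPE:
            continue
        if subject not in cache:
            if len(cache) >= cache_size:
                # Evict the oldest instance and write what is known about it.
                _, oldest = cache.popitem(last=False)
                flush(oldest)
            cache[subject] = (set(), set())
        types, contexts = cache[subject]
        types.add(obj)
        contexts.add(context)

    for entry in cache.values():  # write out what is left at the end
        flush(entry)
    return dict(index)


if __name__ == "__main__":
    stream = [
        ("a", RDF_TYPE, "foaf:Person", "d1"),
        ("b", RDF_TYPE, "foaf:Person", "d2"),
        ("a", RDF_TYPE, "pim:Male", "d1"),
    ]
    print(stream_type_clusters(stream, cache_size=10))
    print(stream_type_clusters(stream, cache_size=1))
```

With a cache of size 1, instance "a" is evicted before its second type arrives and ends up in two incomplete type clusters instead of one.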




Does it work well?

  Comparison of the stream-based schema vs. a gold-standard schema on an 11 M triple data set
Does it scale?

 Semantic Web Challenge: Billion Triples Track
    Provision of large scale RDF dataset
    Crawled from LOD

 Task:
    Do something "useful"
    Do it in a (web-)scalable way
    Do it with at least 1 billion triples

 Presentation at ISWC




BTC results


                             1st billion     2nd billion     Full BTC
# triples                    1 billion       1 billion       2.17 billion
# instances                  187.7 M         222.6 M         450.0 M
# data sources               13.5 M          9.5 M           24.1 M
# type clusters              208.5 k         248.5 k         448.6 k
# equivalence classes        0.97 M          1.14 M          2.12 M
# triples in index           29.1 M          24.8 M          54.7 M
Compression ratio            2.91%           2.48%           2.52%
# triples/sec.               40.5 k          45.6 k          39.5 k
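
The compression ratio appears to be the number of index triples divided by the number of input triples. A short check of that assumption against the table values (the function name is hypothetical):

```python
def compression_ratio(index_triples, input_triples):
    """Size of the schema index relative to the indexed data."""
    return index_triples / input_triples


if __name__ == "__main__":
    rows = {
        "1st billion": (29.1e6, 1.0e9),
        "2nd billion": (24.8e6, 1.0e9),
        "Full BTC": (54.7e6, 2.17e9),
    }
    for name, (index_size, data_size) in rows.items():
        print(f"{name}: {compression_ratio(index_size, data_size):.2%}")
```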




SchemEX: What comes next?

 Hierarchy of semantic information:
   Type clusters
   Equivalence clusters
   Related types

 Optimization
   Smarter caching
   Performance – Hadoop
   Error correction




Conclusion




Take away message

 Web evolving in interesting directions
   Social networks, user generated content
   Semantic data

 Challenges for IR
   Different settings
   Different tasks
   Question basic assumptions




Thank you!




Contact:
     WeST – Institute for Web Science and Technologies
     Universität Koblenz-Landau
     gottron@uni-koblenz.de


Relevant Publications
1.   A. Che Alhadi, S. Staab, and T. Gottron. Exploring user purpose writing single tweets. In WebSci '11:
     Proceedings of the 3rd International Conference on Web Science, 2011.
2.   A. Che Alhadi, T. Gottron, J. Kunegis, and N. Naveed. Livetweet: Microblog retrieval based on
     interestingness. In TREC '11: Proceedings of the Text Retrieval Conference 2011, 2011.
3.   A. Che Alhadi, T. Gottron, J. Kunegis, and N. Naveed. Livetweet: Monitoring and predicting interesting
     microblog posts. In ECIR '12: Proceedings of the 34th European Conference on Information Retrieval,
     2012. In preparation.
4.   T. Gottron and N. Lipka. A comparison of language identification approaches on short, query-style texts.
     In ECIR '10: Proceedings of the 32nd European Conference on Information Retrieval, pp. 611–614, Mar.
     2010.
5.   M. Konrath, T. Gottron, and A. Scherp. Schemex – web-scale indexed schema extraction of linked open
     data. In Semantic Web Challenge, Submission to the Billion Triple Track, 2011.
6.   N. Naveed, T. Gottron, J. Kunegis, and A. Che Alhadi. Bad news travel fast: A content-based
     analysis of interestingness on twitter. In WebSci '11: Proceedings of the 3rd International Conference on
     Web Science, 2011.
7.   N. Naveed, T. Gottron, J. Kunegis, and A. Che Alhadi. Searching microblogs: Coping with sparsity and
     document quality. In CIKM '11: Proceedings of the 20th ACM Conference on Information and Knowledge
     Management, 2011.




Attic




Use Cases




SAP Community Network (SCN)       Lotus Connections             MeaningMine
Communities                       Communities                   Communities
• Customers                       • Employees                   • Social media
• Partners                        • Working groups              • News
• Suppliers                       • Interest Groups             • Web fora
• Developers                      • Projects                    • Public communities
Business value                    Business value                Business value
• Products support                • Task relevant information   • Topics
• Services                        • Collaboration               • Opinions
• Find business partners          • Innovation                  • Service for partners
Volume                            Volume                        Volume
• 6,000 posts/day                 • 4,000 posts/day             • 1,400,000 posts/day
• 1,700,000 subscribers           • 386,000 employees           • 708,000 web sources
• 16GB log/day                    • 1.5GB content/day           • 45GB content/day

   Business Partners                     Employees                   Public Domain
       Extranet                           Intranet                     Internet

Twitter is different

 Followers form a social graph
    PageRank applicable?!

 BUT:
    Following is not (only) motivated by content
    No statement about tweets!




Information seeking behaviour on Twitter




 Web                                        Twitter
      2-4 query terms                              1-2 query terms
      Broader terms                                Specific terms
      Intentions                                   Intentions
           • Navigation                               • Timely information
           • Information                              • Trends
           • Resources                                • People
      Get to know a topic                          Follow a topic

TREC

 Microblog Track 2011
   12,000,000 tweets
   2 weeks
   49 "topics" (queries)
   Task: filtering

 Constraints
   No external knowledge!
   English tweets only
   Temporal order of topic & tweets
   Official extension of "relevance" to "interestingness" (!!!)
WeST @ TREC Microblog Track

 Basics:
    Lucene
    No length normalisation
    Interestingness

 4 configurations:
    WESTfilter: retrieval via Lucene, filtering non-interesting tweets
    WESTfilext: like WESTfilter, but with sentiments
    WESTrelint: like WESTfilter, but re-ranking according to interestingness
    WESTrlext: like WESTrelint, but with sentiments
Results

 Filtering significantly better than re-ranking
 Sentiment features are a disadvantage (not significant)

[Figure: scores (0 to 0.4) per metric (P5, P10, P15, P20, P30, R-prec, bpref, MAP, nDCG) for WESTfilter, WESTfilext, WESTrelint, WESTrlext]
Results

 Effective especially for shorter queries

[Figure: MAP (0 to 0.3) by query length in words (1 to 7) for WESTfilext, WESTfilter, WESTrelint, WESTrlext]
Schema representation using VoiD




Challenging Retrieval Scenarios   Thomas Gottron   Lugano, 23.4.2012   71


Challenging Retrieval Scenarios: Social Media and Linked Open Data

  • 1. Institute for Web Science & Technologies – WeST Challenging Retrieval Scenarios: Social Media and Linked Open Data Dr. Thomas Gottron gottron@uni-koblenz.de
  • 2. Outline  The ROBUST project  Background  Use cases  Retrieval on Microblogs  Particularities of Twitter  Interestingness  LiveTweet  Search on the LOD cloud  Querying LOD as IR task  Schema extraction  SchemEX Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 2
  • 3. Online Communities Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 3
  • 4. Business Communities  Information ecosystems  Employees  Business Partners, Customers  General Public Valuable asset Risks Opportunities Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 4
  • 5. High Level Objectives Risk Community Management Analysis • Risk modelling • Contents • Detection • Single users • Automatic • Entire reaction communities Community Large Scale Forecasting Processing • Policies • Big Data • Prediction • Realtime • Decision • Parallel support Processing Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 5
  • 6. Scenario 1 Social Media - Microblogs Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 6
  • 7. IBM Connections Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 7
  • 8. Twitter Follower @janedoe My dear @johndoe had troubles to wake up this #morning Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 8
  • 9. Retrieval on Twitter: First Steps  10 Millionen Tweets  Retrieval Engine  Query: beer Rang User Tweet 1 LoriAG beer 2 Crushdwinebar beer!! 3 Skippertaylor BEER 4 BigMacScola Beer 5 VANiamore beer....... 6 CindyMcManis To beer or not to beer on Beer Summit ? 7 silverlakewine beer beer beer beer beer beer beer. Simple 3pm 8 eldoradobar http://ping.fm/p/Bnra7 - In!!! BEER, BEER, BEER, BEER, BEER, BEER, BEER, BEER, BEER, BEER, 9 tonx Lompoc. beer beer beer beer beer beer beer beer beer beer. http://twitpic.com/l68ld 10 punkeyfunky Beer beer beer beer beer beer beer beer beer beer beer beer beer. Er, guess what I'm looking forward to? Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 10
  • 10. Particularities of Twitter Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 11
  • 11. Twitter is different  Maximum length: 140 characters 500000 450000 400000 350000 300000 # Tweets 250000 200000 150000 100000 50000 0 101 105 109 113 121 125 129 133 137 141 117 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97 1 5 9 Zeichen Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 12
  • 12. Twitter is different  140 characters = few words 10000000 85% of tweets contain each word only once 10000000 1000000 100000 # Tweets 10000 1000 Binary value ! 100 10 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 39 41 42 43 44 46 47 Max TF in Tweet Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 13
  • 13. Length normalisation  Why are some documents longer (classic explanation)  Verbosity hypothesis:  Long documents repeat themself  Short documents prefered as they are more concise  Scope hypothesis:  Long documents address more topics  Short document prefered as they are more focussed  Intuition:  Not valid for Twitter Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 15
  • 14. Verbosity hypothesis and Twitter? Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 16
  • 15. Scope hypothesis and Twitter?  Are long tweets broader in scope?  LDA:  100 topics  Observations  8,5% of tweets have no strong topic  Remaining tweets: • 77,1% are dominated by one topic • 99,6% are dominated by two topics Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 17
  • 16. Length normalisation on tweets  Not necessary! … Negative impact?  YES:  Short tweets are preferred! Beer!  Long tweets are considered of too wide scope. Pubs brewing their own beer: a list for Düsseldorf http://bit.ly/w2GZrV Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 18
  • 17. Interestingness Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 19
  • 18. Interesting Content  Concept of „relevance“ in IR:  Document is about a topic  Additionally for Twitter:  Timeliness  Current trend  Informative  Interestingness  Tweet is about a topic AND is interesting!  Question: How to determine what is interesting??? Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 20
  • 19. Retweets RT @janedoe: My Follower dear @johndoe had troubles to wake up this @janedoe #morning My dear @johndoe had troubles to wake up this #morning Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 21
  • 20. Retweets  Retweet indicates quality  „of interest for others“  Depends on  Content   Context (time, follower)   Idea:  Learn to predict retweets! Likelihood of retweet as metric for Interestingness Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 22
  • 21. Retweets: Prediction model Dataset Users Tweets Retweets Choudhury 118,506 9,998,756 7.89% Choudhury (extended) 277,666 29,000,000 8.64% Petrovic 4,050,944 21,477,484 8.46% Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 24
  • 22. Logistic Regression: Weights Feature Dimensions Weight Constant (intercept) -5.45 Direct message -147.89 Username 146.82 Message feature Hashtag 42.27 URL 249.09 Valence -26.88 Sentiment Arousal 33.97 Dominance 19.56 Positive -21.8 Emoticons Negative 9.94 Positive 13.66 Exclamation Negative 8.72 ! -16.85 Punctuation ? 23.67 Terms Odds 19.79 Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 25
  • 23. Logistic Regression: Topic Weights Topic Weight social media market post site web tool traffic network 27.54 follow thank twitter welcome hello check nice cool people 16.08 credit money market business rate economy home 15.25 christmas shop tree xmas present today wrap finish 2.87 home work hour long wait airport week flight head -14.43 twitter update facebook account page set squidoo check -14.43 cold snow warm today degree weather winter morning -26.56 night sleep work morning time bed feel tired home -75.19 Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 26
  • 24. Re-Ranking using Interestingness  Top-k relevant tweets  Re-rank based on interestingness Rang Username Tweet 1 BeeracrossTX UK beer mag declares "the end of beer writing." @StanHieronymus says not so in the US. http://bit.ly/424HRQ #beer 2 narmmusic beer summit @bspward @jhinderaker no one had billy beer? heehee #narm - beer summit @bspward @jhinde http://tinyurl.com/n29oxj 3 beeriety Go green and turn those empty beer bottles into recycled beer glasses! | http://bit.ly/2src7F #beer #recycle (via: @td333) 4 hblackmon Great Divide beer dinner @ Porter Beer Bar on 8/19 - $45 for 3 courses + beer pairings. http://trunc.it/172wt 5 nycraftbeer Interesting Concept-Beer Petitions.com launches&hopes 2help craft beer drinkers enjoy beer they want @their fave pubs. http://bit.ly/11gJQN 6 carichardson Beer Cheddar Soup: Dish number two in my famed beer dinner series is Beer Cheddar Soup. I hadn&#8217;t had too.. http://bit.ly/1diDdF 7 BeerBrewing New York City Beer Events - Beer Tasting - New York Beer Festivals - New York Craft Beer http://is.gd/39kXj #beer 8 delphiforums Love beer? Our member is trying to build up a new beer drinker's forum. Grab a #beer and join us: http://tr.im/pD1n 9 Jamie_Mason #Baltimore Beer Week continues w/ a beer brkfst, beer pioneers luncheon, drink & donate event, beer tastings & more. http://ping.fm/VyTwg 10 carichardson Seattle and Beer: I went to Seattle last weekend. It was my friend&#8217;s stag - he likes beer - we drank beer.. http://tinyurl.com/cpb4n9 Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 27
  • 25. Application Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 28
  • 26. LiveTweet  Data:  Twitter streaming API: sample  1% of all tweets  Architecture:  Time slices over tweets  Analytical component with REST API  Web Frontend for end user Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 29
  • 27. LiveTweet http://livetweet.west.uni-koblenz.de/ Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 30
  • 28. LiveTweet: What comes next?  Retrieval  Incorporate with other retrieval metrics  Include Interestingness in a learning to rank approach  Social graph  System extension  Personalisation  Public API  Work with IBM data Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 32
  • 29. Scenario 2 Linked Open Data Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 33
  • 30. Information needs requiring semantic structure  Examples  Male persons who have a public profile document  Computing science papers authored by social scientists  American actors who are also politicians and are married to a model.  Maybe specific databases available:  Person search engines  Bibliographic databases  Movie database Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 34
  • 31. Linked Data Semantic Web Technology to 1. Provide structured data on the web 2. Link data across data sources Thing Thing Thing Thing Thing Thing Thing Thing Thing Thing typed typed typed typed links links links links A B C D E Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 35
  • 32. Entities are identified via URIs One statement = one triple rdf:type Subject Predicate Object pd:cygri foaf:Person foaf:name Richard Cyganiak foaf:based_near dbpedia:Berlin pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri dbpedia:Berlin = http://dbpedia.org/resource/Berlin Description of a link between two data sources Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 36
  • 33. Resolving URIs rdf:type pd:cygri foaf:Person foaf:name 3.405.259 Richard Cyganiak dp:population foaf:based_near dbpedia:Berlin skos:subject dp:Cities_in_Germany Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 37
  • 34. The LOD Cloud Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 39
  • 35. Querying linked data SELECT ?x WHERE { ?x rdfs:type foaf:Person . ?x rdfs:type pim:Male . ?x foaf:maker ?y . ?y rdfs:type foaf:PersonalProfileDocument . } Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 40
  • 36. Querying linked data – an IR task? Here happens IR magic Information need Keyword query Documents Information SPARQL query Data sources Entities Here we need magic Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 41
  • 37. Querying linked data – using an index SELECT ?x WHERE { ?x rdfs:type foaf:Person . ?x rdfs:type pim:Male . ?x foaf:maker ?y . ?y rdfs:type foaf:PersonalProfileDocument . } Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 42
  • 38. A Schema for LOD Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 43
  • 39. Idea  Schema Index:  Define families of graph patterns  Assign entities to graph patterns  Map graph patterns to context / source  Construction:  Streambased for scalability  Little loss of accuracy  NOTE:  Index defined over entities  But: Index stores the contexts (sources) Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 44
  • 40. Input Data  n-Quads <subject> <predicate> <object> <context> .  Example: <http://www.w3.org/People/Connolly/#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> . http://dig.csail.mit.edu/2008/... foaf: w3p: Person #me Challenging Retrieval Scenarios Thomas Gottron Lugano, 23.4.2012 45
  • 41. Layer 1: RDF Classes
      • All entities of a particular type
        SELECT ?x FROM … WHERE { ?x rdf:type foaf:Person . }
      • [Figure: class C1 (foaf:Person) linked to data sources DS 1, DS 2, DS 3, e.g. http://dig.csail.mit.edu/2008/... and http://www.w3.org/People/Berners-Lee/card (entity timbl:card#i)]
  • 42. Layer 2: Type Clusters
      • All entities belonging to the same set of types
        SELECT ?x FROM … WHERE { ?x rdf:type foaf:Person . ?x rdf:type pim:Male . }
      • [Figure: classes C1 and C2 (foaf:Person, pim:Male) form type cluster TC1 (tc4711), linked to data sources DS 1, DS 2, DS 3, e.g. http://www.w3.org/People/Berners-Lee/card (entity timbl:card#i)]
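
A type cluster can be sketched as a mapping from a set of types to the data sources that contain entities with exactly that set. The following is a minimal in-memory illustration, not the SchemEX implementation; prefixed names are treated as plain strings, and the entity `ex:alice` with its source is invented for the example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def type_clusters(quads):
    """Map each distinct set of types to the sources of the entities that have it."""
    types = defaultdict(set)    # entity -> set of its types
    sources = defaultdict(set)  # entity -> contexts in which it was typed
    for subject, predicate, obj, context in quads:
        if predicate == RDF_TYPE:
            types[subject].add(obj)
            sources[subject].add(context)

    clusters = defaultdict(set)  # frozenset of types -> set of sources
    for entity, entity_types in types.items():
        clusters[frozenset(entity_types)] |= sources[entity]
    return dict(clusters)


quads = [
    ("timbl:card#i", RDF_TYPE, "foaf:Person", "http://www.w3.org/People/Berners-Lee/card"),
    ("timbl:card#i", RDF_TYPE, "pim:Male", "http://www.w3.org/People/Berners-Lee/card"),
    # Hypothetical second entity with only one type.
    ("ex:alice", RDF_TYPE, "foaf:Person", "http://example.org/alice.rdf"),
]
clusters = type_clusters(quads)
```

Looking up `frozenset({"foaf:Person", "pim:Male"})` then plays the role of the two-pattern query on the slide: it returns the sources worth querying rather than the entities themselves.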
  • 43. Layer 3: Equivalence Classes
      • Two entities are equivalent iff:
          • They are in the same TC
          • They have the same properties
          • The property targets are in the same TC
      • [Figure: classes C1, C2, C3; type clusters TC1, TC2; equivalence class EQC1; data sources DS 1, DS 2, DS 3]
  • 44. Layer 3: Equivalence Classes
        SELECT ?x FROM … WHERE {
          ?x rdf:type foaf:Person .
          ?x rdf:type pim:Male .
          ?x foaf:maker ?y .
          ?y rdf:type foaf:PersonalProfileDocument .
        }
      • [Figure: type cluster tc4711 (foaf:Person, pim:Male) and type cluster tc1234 (foaf:PersonalProfileDocument, abbreviated foaf:PPD) are connected via foaf:maker in equivalence class eqc0815; example entities timbl:card#i and timbl:card from http://www.w3.org/People/Berners-Lee/card]
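
The three equivalence conditions can be read as a signature per entity: its own type cluster plus the set of (property, type cluster of the target) pairs. The sketch below groups entities by that signature. It is an illustration under simplifying assumptions (all data in memory, contexts omitted, untyped targets get an empty type cluster) and uses invented `ex:` entities; it follows the direction of `foaf:maker` as used in the slide's query.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def equivalence_classes(triples):
    """Group entities by (own type cluster, {(property, target type cluster)})."""
    types = defaultdict(set)  # entity -> set of types
    links = defaultdict(set)  # entity -> set of (property, target)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types[subject].add(obj)
        else:
            links[subject].add((predicate, obj))

    def type_cluster(entity):
        # Untyped entities (and literals) fall into the empty type cluster.
        return frozenset(types.get(entity, ()))

    classes = defaultdict(set)
    for entity in set(types) | set(links):
        signature = (
            type_cluster(entity),
            frozenset((p, type_cluster(o)) for p, o in links.get(entity, ())),
        )
        classes[signature].add(entity)
    return dict(classes)


triples = [
    ("ex:a", RDF_TYPE, "foaf:Person"),
    ("ex:a", "foaf:maker", "ex:docA"),
    ("ex:docA", RDF_TYPE, "foaf:PersonalProfileDocument"),
    ("ex:b", RDF_TYPE, "foaf:Person"),
    ("ex:b", "foaf:maker", "ex:docB"),
    ("ex:docB", RDF_TYPE, "foaf:PersonalProfileDocument"),
    ("ex:c", RDF_TYPE, "foaf:Person"),  # same type, but no foaf:maker link
]
classes = equivalence_classes(triples)
members = sorted(sorted(group) for group in classes.values())
```

Here `ex:a` and `ex:b` end up in one class, while `ex:c` shares their type cluster but not their properties and therefore forms its own class.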
  • 45. Schema Index Overview
      • 3 layers – 3 different graph patterns
  • 46. Schema Computation
  • 47. Building the Index from a Stream
      • Stream of n-quads (coming from a LD crawler): … Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1
      • [Figure: FIFO cache holding instances 1 to 6 and their classes C1, C2, C3]
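
The stream-based construction, and why it can lose a little accuracy, can be sketched with a bounded FIFO cache. This is a simplified illustration restricted to type clusters; the eviction policy (drop the oldest entity when the cache is full and write its schema information to the index) and the name `stream_type_clusters` are assumptions for this example, not a description of the actual SchemEX code.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "rdf:type"


def stream_type_clusters(quads, cache_size):
    """Build a type-cluster index from a quad stream using a FIFO entity cache."""
    cache = OrderedDict()     # entity -> (types, sources), oldest entry first
    index = defaultdict(set)  # frozenset of types -> set of sources

    def flush(types, sources):
        if types:
            index[frozenset(types)] |= sources

    for subject, predicate, obj, context in quads:
        if subject not in cache:
            if len(cache) >= cache_size:
                # Cache full: evict the oldest entity and index what is known so far.
                _, (old_types, old_sources) = cache.popitem(last=False)
                flush(old_types, old_sources)
            cache[subject] = (set(), set())
        types, sources = cache[subject]
        if predicate == RDF_TYPE:
            types.add(obj)
            sources.add(context)

    # End of stream: index everything still in the cache.
    for types, sources in cache.values():
        flush(types, sources)
    return dict(index)


quads = [
    ("ex:a", RDF_TYPE, "foaf:Person", "s1"),
    ("ex:b", RDF_TYPE, "foaf:Person", "s2"),
    ("ex:a", RDF_TYPE, "pim:Male", "s1"),  # arrives after ex:b
]
large = stream_type_clusters(quads, cache_size=2)
small = stream_type_clusters(quads, cache_size=1)
```

With `cache_size=2` the entity `ex:a` is seen completely and lands in the {foaf:Person, pim:Male} cluster. With `cache_size=1` it is evicted before its second type arrives, so its information is split over two smaller clusters; this is the kind of error the comparison against a gold standard measures.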
  • 48. Does it work well?
      • Comparison of stream-based schema vs. gold standard schema on an 11 M triple data set
  • 49. Does it scale?
      • Semantic Web Challenge: Billion Triples Track
          • Provision of a large-scale RDF dataset
          • Crawled from LOD
      • Task:
          • Do something "useful"
          • Do it (web-)scalable
          • Do it with at least 1 billion triples
      • Presentation at ISWC
  • 50. BTC results
                                1st billion   2nd billion   full BTC
      # triples                 1 billion     1 billion     2.17 billion
      # instances               187.7 M       222.6 M       450.0 M
      # data sources            13.5 M        9.5 M         24.1 M
      # type clusters           208.5 k       248.5 k       448.6 k
      # equivalence classes     0.97 M        1.14 M        2.12 M
      # triples index           29.1 M        24.8 M        54.7 M
      Compression ratio         2.91%         2.48%         2.52%
      # triples/sec.            40.5 k        45.6 k        39.5 k
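
The compression ratios in the table follow directly from the other rows: the number of triples in the index divided by the number of input triples. A short check, using only the figures from the table (the variable and function names are chosen for this example):

```python
# (index triples, input triples) per run, as reported in the table.
RESULTS = {
    "1st billion": (29.1e6, 1.0e9),
    "2nd billion": (24.8e6, 1.0e9),
    "full BTC": (54.7e6, 2.17e9),
}


def compression_ratio(index_triples, input_triples):
    """Size of the schema index relative to the input data, in percent."""
    return 100.0 * index_triples / input_triples


ratios = {
    name: round(compression_ratio(index_triples, input_triples), 2)
    for name, (index_triples, input_triples) in RESULTS.items()
}
```

Rounded to two decimals, the computed values agree with the 2.91%, 2.48% and 2.52% stated on the slide.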
  • 51. SchemEX: What comes next?
      • Hierarchy of semantic information:
          • Type clusters
          • Equivalence clusters
          • Related types
      • Optimization
          • Smarter caching
          • Performance – Hadoop
      • Error correction
  • 52. Conclusion
  • 53. Take-away message
      • Web evolving in interesting directions
          • Social networks, user-generated content
          • Semantic data
      • Challenges for IR
          • Different settings
          • Different tasks
          • Question basic assumptions
  • 54. Thank you! Contact: WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau, gottron@uni-koblenz.de
  • 55. Relevant Publications
      1. A. Che Alhadi, S. Staab, and T. Gottron. Exploring user purpose writing single tweets. In WebSci ’11: Proceedings of the 3rd International Conference on Web Science, 2011.
      2. A. Che Alhadi, T. Gottron, J. Kunegis, and N. Naveed. Livetweet: Microblog retrieval based on interestingness. In TREC ’11: Proceedings of the Text Retrieval Conference 2011, 2011.
      3. A. Che Alhadi, T. Gottron, J. Kunegis, and N. Naveed. Livetweet: Monitoring and predicting interesting microblog posts. In ECIR ’12: Proceedings of the 34th European Conference on Information Retrieval, 2012. In preparation.
      4. T. Gottron and N. Lipka. A comparison of language identification approaches on short, query-style texts. In ECIR ’10: Proceedings of the 32nd European Conference on Information Retrieval, pp. 611–614, Mar. 2010.
      5. M. Konrath, T. Gottron, and A. Scherp. Schemex – web-scale indexed schema extraction of linked open data. In Semantic Web Challenge, Submission to the Billion Triple Track, 2011.
      6. N. Naveed, T. Gottron, J. Kunegis, and A. Che Alhadi. Bad news travel fast: A content-based analysis of interestingness on twitter. In WebSci ’11: Proceedings of the 3rd International Conference on Web Science, 2011.
      7. N. Naveed, T. Gottron, J. Kunegis, and A. Che Alhadi. Searching microblogs: Coping with sparsity and document quality. In CIKM ’11: Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011.
  • 56. Attic
  • 57. Use Cases
      • SAP Community Network (SCN): business partners, extranet
          • Communities: customers, partners, suppliers, developers
          • Business value: products support, services, find business partners
          • Volume: 6,000 posts/day; 1,700,000 subscribers; 16GB log/day
      • Lotus Connections: employees, intranet
          • Communities: employees, working groups, interest groups, projects
          • Business value: task-relevant information, collaboration, innovation
          • Volume: 4,000 posts/day; 386,000 employees; 1.5GB content/day
      • MeaningMine: public domain, Internet
          • Communities: social media, news, web fora, public communities
          • Business value: topics, opinions, service for partners
          • Volume: 1,400,000 posts/day; 708,000 web sources; 45GB content/day
  • 58. Twitter is different
      • Followers form a social graph
          • PageRank applicable?!
      • BUT:
          • Following is not (only) motivated by content
          • No statement about tweets!
  • 59. Information seeking behaviour on Twitter
      • Web
          • 2-4 query terms
          • Broader terms
          • Intentions: navigation, information, resources
          • Get to know a topic
      • Twitter
          • 1-2 query terms
          • Specific terms
          • Intentions: timely information, trends, people
          • Follow a topic
  • 60. TREC
      • Microblog Track 2011
          • 12,000,000 tweets
          • 2 weeks
          • 49 "topics" (queries)
          • Task: filtering
      • Constraints
          • No external knowledge!
          • English tweets only
          • Temporal order of topic & tweets
      • Official extension of "relevance" to "interestingness" (!!!)
  • 61. WeST @ TREC Microblog Track
      • Basics:
          • Lucene
          • No length normalisation
          • Interestingness
      • 4 configurations:
          • WESTfilter: retrieval via Lucene, filtering non-interesting tweets
          • WESTfilext: like WESTfilter, but with sentiments
          • WESTrelint: like WESTfilter, but re-ranking according to interestingness
          • WESTrlext: like WESTrelint, but with sentiments
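
The difference between the filtering and the re-ranking configurations can be sketched as two post-processing steps on a ranked result list. Everything concrete here is hypothetical: the `Hit` type, the scores, and the threshold are invented for illustration, and the slides do not specify how interestingness was computed or thresholded.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    tweet_id: str
    relevance: float        # retrieval score, e.g. from Lucene
    interestingness: float  # hypothetical interestingness estimate in [0, 1]


def filter_hits(hits, threshold):
    """Filtering: keep the retrieval order, drop tweets judged not interesting."""
    return [hit for hit in hits if hit.interestingness >= threshold]


def rerank_hits(hits):
    """Re-ranking: keep all tweets, order them by interestingness instead."""
    return sorted(hits, key=lambda hit: hit.interestingness, reverse=True)


# Invented result list, already sorted by relevance.
hits = [
    Hit("t1", 0.9, 0.2),
    Hit("t2", 0.7, 0.8),
    Hit("t3", 0.5, 0.6),
]
filtered = [hit.tweet_id for hit in filter_hits(hits, threshold=0.5)]
reranked = [hit.tweet_id for hit in rerank_hits(hits)]
```

Filtering leaves the relevance order of the surviving tweets untouched, whereas re-ranking can move a weakly relevant tweet to the top.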
  • 62. Results
      • Filtering significantly better than re-ranking
      • Sentiments are a disadvantage (not significant)
      • [Figure: bar chart of score (0 to 0.4) per metric (P5, P10, P15, P20, P30, R-prec, bpref, MAP, nDCG) for WESTfilter, WESTfilext, WESTrelint, WESTrlext]
  • 63. Results
      • Effective especially for shorter queries
      • [Figure: MAP (0 to 0.3) by query length in words (1 to 7) for WESTfilext, WESTfilter, WESTrelint, WESTrlext]
  • 64. Schema representation using VoiD

Editor's Notes

  1. Online communities as mainly perceived in public: private users; social interaction; sharing of information, pictures, online resources.
  2. Business communities are slightly different: they are grouped around a business/enterprise. Aim: add value to the business (knowledge management, public relations, customer acquisition, support, etc.). Conclusion: communities have a value that needs to be taken care of. This value is endangered by risks (e.g. experts leaving) or might be increased by seizing opportunities (e.g. connecting people working on the same topic).
  3. Main objectives as listed in DoW