Using Hyperlinks to Enrich Message Board Content with Linked Data

Presentation from I-SEMANTICS 2010, Graz, Austria. Based on the paper "Using Hyperlinks to Enrich Message Board Content with Linked Data" by Sheila Kinsella, Alexandre Passant, and John G. Breslin.


    1. Using Hyperlinks to Enrich Message Board Content with Linked Data. Sheila Kinsella, Alexandre Passant, John G. Breslin
    2. Introduction
       • Hyperlinks are an important part of online conversation and often represent identifiable concepts
       • Increasingly, these hyperlinks have corresponding structured data sources
       • Our aims in this study:
           • Study the growth of structured, user-generated data and links in social media over 10 years
           • Investigate how we can use this data for enhanced analysis of online conversation
    3. Example post
       • imdb:tt0211915 foaf:topic dbpedia:Amélie .
       • dbpedia:Amélie dc:title "Amélie" .
       • dbpedia:Amélie dc:date "2001" .
       • dbpedia:Amélie dbpprop:starring dbpedia:Audrey_Tautou .
       • dbpedia:Amélie dbpprop:director dbpedia:Jean-Pierre_Jeunet .
       http://www.imdb.com/title/tt0211915/ = identifier we can use to query LinkedMDB/DBpedia/Freebase…
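The example on slide 3 treats the IMDb URL as an identifier. As a minimal sketch of that idea, the snippet below pulls the title ID out of an IMDb URL and prints the first triple from the slide. The function names and the regular expression are illustrative assumptions, not the authors' implementation, and the topic is supplied by hand rather than looked up in LinkedMDB or DBpedia.

```python
import re
from typing import Optional

# Assumption: IMDb title pages follow the pattern shown on the slide,
# http://www.imdb.com/title/tt<digits>/
IMDB_TITLE_RE = re.compile(r"^https?://(?:www\.)?imdb\.com/title/(tt\d+)")


def imdb_id(url: str) -> Optional[str]:
    """Return the IMDb title ID (e.g. 'tt0211915') or None if the URL does not match."""
    match = IMDB_TITLE_RE.match(url)
    return match.group(1) if match else None


def topic_triple(url: str, topic: str) -> Optional[str]:
    """Format a foaf:topic statement in the prefixed notation used on the slide.

    `topic` is a prefixed name such as 'dbpedia:Amélie'; it is passed in by the
    caller because this sketch does not query any external data source.
    """
    ident = imdb_id(url)
    if ident is None:
        return None
    return f"imdb:{ident} foaf:topic {topic} ."


if __name__ == "__main__":
    print(topic_triple("http://www.imdb.com/title/tt0211915/", "dbpedia:Amélie"))
```

In a real pipeline the topic would come from a lookup against a structured source keyed on the extracted ID; that step is omitted here.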
    4. Dataset enrichment
    5. Boards.ie SIOC Data Competition
       • 2008 competition to do something interesting with message board data
       • February 1998 – February 2008
       • SIOC, FOAF, DC
       • ~130k users
       • > 7m posts
    6. Change in type of websites linked to

       2002/2003
       Domain             Main content type
       bbc.co.uk          news media
       komplett.ie        shop
       ireland.com        news media
       eircom.net         Web hosting
       yahoo.com          news/discussion
       rte.ie             news media
       google.com         Web search
       geocities.com      Web hosting
       iol.ie             Web hosting
       microsoft.com      technical support

       2007/2008
       Domain             Main content type
       youtube.com        UGC: video-sharing
       wikipedia.org      UGC: encyclopedia
       komplett.ie        shop
       myspace.com        UGC: SNS/music
       flickr.com         UGC: photo-sharing
       bbc.co.uk          news media
       rte.ie             news media
       carzone.ie         shop
       photobucket.com    UGC: media hosting
       ebay.ie            shop
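A ranking like the one on slide 6 can be produced by counting the domains of the links posted in each period. The sketch below shows one simple way to do that with the Python standard library. The sample URLs are made up for illustration, and stripping a leading "www." is a naive assumption; the slides do not say how the authors normalised domains.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


def link_domain(url: str) -> Optional[str]:
    """Return the host of a URL with a leading 'www.' removed, or None if there is no host."""
    host = urlsplit(url).hostname  # hostname is lower-cased by urlsplit
    if host is None:
        return None
    return host[4:] if host.startswith("www.") else host


def top_domains(urls: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count links per domain and return the n most frequent."""
    counts = Counter(d for d in map(link_domain, urls) if d is not None)
    return counts.most_common(n)


# Hypothetical links standing in for those extracted from one period's posts.
sample_links = [
    "http://www.youtube.com/watch?v=abc",
    "http://youtube.com/watch?v=def",
    "http://en.wikipedia.org/wiki/Graz",
    "http://www.bbc.co.uk/news",
]

if __name__ == "__main__":
    for domain, count in top_domains(sample_links):
        print(domain, count)
```

Note that this keeps subdomains such as en.wikipedia.org separate from wikipedia.org; grouping by registered domain would need an extra normalisation step.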
    7. Identification of external data sources
       • youtube.com
       • wikipedia.org (dbpedia)
       • komplett.ie
       • bbc.co.uk
       • myspace.com (dbtunes)
       • rte.ie
       • carzone.ie
       • google.com
       • photobucket.com
       • flickr.com
       • microsoft.com
       • eircom.net
       • ebay.ie
       • imageshack.us
       • imdb.com (linkedmdb)
       • ebay.co.uk
       • yahoo.com
       • amazon.co.uk
       • google.ie
       • blogspot.com
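Slide 7 pairs wikipedia.org with DBpedia. DBpedia resource URIs reuse English Wikipedia article titles, so a posted Wikipedia link can be turned into a DBpedia identifier with a simple rewrite. The following is a minimal sketch of that rewrite; the function name is hypothetical, only en.wikipedia.org links are handled, and redirects and fragments are ignored.

```python
from typing import Optional
from urllib.parse import urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
WIKI_PATH_PREFIX = "/wiki/"


def wikipedia_to_dbpedia(url: str) -> Optional[str]:
    """Map an English Wikipedia article URL to the corresponding DBpedia resource URI.

    Returns None for anything that is not an en.wikipedia.org article link.
    """
    parts = urlsplit(url)
    if parts.hostname != "en.wikipedia.org":
        return None
    if not parts.path.startswith(WIKI_PATH_PREFIX):
        return None
    title = parts.path[len(WIKI_PATH_PREFIX):]
    return DBPEDIA_RESOURCE + title if title else None


if __name__ == "__main__":
    print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Audrey_Tautou"))
```

The resulting URI is only a candidate identifier; whether a description exists for it still has to be checked against the data source.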
    8. Data sources
    9. Structured Data
       • RDF Data
           • Rich descriptions of resources using common ontologies
           • Linked Data, RDFa, SPARQL endpoint, RDF dumps
           • E.g. <X> <foaf:topic> <dbpedia:Education> .
           • We store this data and merge equivalent URIs if required
       • API Data (fixed values)
           • Less rich, heterogeneous, lacking common semantics
           • Often available as JSON/XML, easily converted to RDF
           • E.g. <category term='Education'/>
           • We manually mapped these to URIs
       • API Data (tags)
           • Plain text annotations, meaning can be ambiguous
           • E.g. "education"
           • We performed a naïve mapping of tags to URIs
       [Slide axis label: EXPRESSIVENESS]
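Slide 9 mentions a naïve mapping of free-text tags to URIs without giving details. One possible reading, shown here purely as an assumption, is to normalise the tag text and append it to the DBpedia resource namespace. The function name and the normalisation rules (trim, collapse whitespace, capitalise the first letter, replace spaces with underscores) are illustrative choices, not the authors' documented procedure.

```python
from typing import Optional

# Assumption: tags are mapped into the DBpedia resource namespace.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def tag_to_uri(tag: str) -> Optional[str]:
    """Naively map a plain-text tag to a candidate DBpedia URI.

    No disambiguation is attempted, so an ambiguous tag simply maps to
    whatever resource happens to share its name.
    """
    text = " ".join(tag.split())  # trim and collapse internal whitespace
    if not text:
        return None
    text = text[0].upper() + text[1:]  # DBpedia titles start with a capital letter
    return DBPEDIA_RESOURCE + text.replace(" ", "_")


if __name__ == "__main__":
    print(tag_to_uri(" education "))
```

A mapping this simple will produce URIs that do not exist or that point at the wrong sense of a word, which is the ambiguity the slide notes for tag data.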
    10. Analysis of external links
       For 2007/2008, we could access structured data for over 9% of all posted links.
       [Chart: x-axis shows yearly periods from 98/99 to 07/08]
    11. The enriched dataset
       [Diagram: SIOC data connected via RDF to Linked Data/Web APIs (30k links, 21k unique) and to DBpedia concepts. Counts shown in the diagram: 24,000; 3,000; 500; 1,500; 8,000; 6,000; 2,000; 6,000; 23,000]
    12. Analysis example: Post content
       [Chart: % of posts containing name/title]
    13. Analysis example: Content sharing
       [Chart labels: % of content; age]
    14. Analysis example: User profiling
    15. Analysis example: User profiling
    16. Conclusions
       • Many links posted on social media sites correspond to a structured data source
           • In 2007/2008, already more than 9%
       • This data can enable us to carry out new analysis and gain new insight into online communities
       • There is also potential for new applications, e.g. content recommendation and enhanced cross-site browsing
       • Current work: using external structured data to improve topic identification in online communities
