SiocLog: Providing IRC Discussion Logs as Linked Data


Social Data on the Web Workshop at the International Semantic Web Conference / Washington, DC / 26th October 2009

  • Gosh I forgot the links!

    Deployed at http://irc.sioc-project.org/

    Code at http://github.com/tuukka/sioclog

    Paper at http://ceur-ws.org/Vol-520/paper12.pdf

Transcript

  • 1. SiocLog: Providing IRC Discussion Logs as Linked Data Tuukka Hastrup 1 , Uldis Bojars 2 and John G. Breslin 2, 3 1 University of Jyväskylä, Finland 2 DERI, NUI Galway, Ireland 3 School of Engineering and Informatics, NUI Galway, Ireland
  • 2. Motivation
    • IRC conversations are quite disconnected from the Web and even from other IRC channels and networks
    • Often there is valuable and needed information in an IRC chat that cannot be linked to people, topics or events, or in general referenced from elsewhere
    • This may be useful to people who do not use IRC, to those on other networks, or simply to people who leave and rejoin a channel
  • 3. Motivation (2)
    • SIOC provides a framework for linking social media contributions to other content and Linked Data resources, and IRC can become part of that framework
    • We also need mechanisms to link the IRC contributions to the people who made them, hence the use of Web ID
  • 4. Background
    • We will begin by introducing the various areas relevant to this system:
      • IRC
      • Linked Data
      • SIOC
      • Web ID
  • 5. Internet Relay Chat (IRC)
    • Instant messaging / internet chat is a major form of social interaction online
    • It is often disconnected from the Web:
      • Due to the different protocols involved
      • Due to its real-time nature / lack of persistent storage
    • IRC was one of the earliest chat systems
    • It has an important role amongst open-source communities, web communities, and even geeks!
      • Hundreds of thousands of users online at any time
  • 6. Linked Data
    • Building a “Web of Data” to enhance the current Web
    • Exposing, sharing and connecting data about things via dereferenceable URIs
    • Linking datasets together that were not previously connected, for example:
      • Music and people
      • Real-world things and places
    • The Linking Open Data (LOD) effort aims to link various open datasets together (DBpedia, GeoNames, etc.)
  • 7. Semantically-Interlinked Online Communities (SIOC)
    • An effort from DERI, NUI Galway to discover how we can create / establish ontologies on the Semantic Web
    • Goal of the SIOC ontology is to address interoperability issues on the (Social) Web
    • http://sioc-project.org/
    • SIOC has been adopted in a framework of 50 applications or modules deployed on over 400 sites
    • Various domains: Web 2.0, enterprise information integration, HCLS, e-government
  • 8.  
  • 9. Some of the SIOC core ontology classes and properties
  • 10. Some examples of where SIOC is already in use (about 50 implementations / applications)
  • 11. Web ID
    • A Web ID is a web address that identifies a person as a Linked Data item
    • A Web ID should also lead to a document with more information about that person (e.g. FOAF, other RDF)
    • For more information, see the definition in this paper:
      • Ching-Man Au Yeung, Ilaria Liccardi, Kanghao Lu, Oshani Seneviratne, Tim Berners-Lee, “Decentralization: The Future of Online Social Networking”, W3C Workshop on Future of Social Networking
  • 12. Design
  • 13. Mapping IRC identifiers to URIs on the Web
    • IRC Network:
      • IRC identifier: irc://freenode
      • Web URI: http://irc.sioc-project.org/#freenode
    • Channel:
      • IRC identifier: irc://freenode/%23channel
      • Web URI: http://irc.sioc-project.org/channel#channel
    • Message:
      • IRC identifier: none
      • Web URI: http://irc.sioc-project.org/channel/0000-00-00#00:00:00.00
    • Chat Persona:
      • IRC identifier: irc://freenode/persona,isuser
      • Web URI: http://irc.sioc-project.org/users/persona#user
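The mapping on slide 13 can be sketched as a few small Python functions. This is only an illustration of the URI patterns shown on the slide, not the SiocLog source code: the function names are hypothetical, and the reading that the channel name forms the path segment with `#channel` as a fixed fragment is an assumption drawn from the slide's placeholders.

```python
from datetime import datetime
from urllib.parse import quote

# Deployment root named in the slides.
BASE = "http://irc.sioc-project.org/"


def irc_channel_uri(network: str, channel: str) -> str:
    """IRC-side identifier: the leading '#' is percent-encoded as %23."""
    return f"irc://{network}/{quote(channel, safe='')}"


def network_uri(network: str) -> str:
    """Web URI for an IRC network, e.g. .../#freenode."""
    return f"{BASE}#{network}"


def channel_uri(channel: str) -> str:
    """Web URI for a channel (assumed pattern: <base><name>#channel)."""
    return f"{BASE}{quote(channel.lstrip('#'))}#channel"


def message_uri(channel: str, when: datetime) -> str:
    """Web URI for a message: the day's log page plus a time fragment."""
    hundredths = when.microsecond // 10000
    return (f"{BASE}{quote(channel.lstrip('#'))}/{when:%Y-%m-%d}"
            f"#{when:%H:%M:%S}.{hundredths:02d}")


def persona_uri(nick: str) -> str:
    """Web URI for a chat persona, e.g. .../users/melvster#user."""
    return f"{BASE}users/{quote(nick)}#user"


if __name__ == "__main__":
    print(irc_channel_uri("freenode", "#sioc"))
    print(channel_uri("#sioc"))
    print(message_uri("#sioc", datetime(2009, 10, 26, 14, 5, 9, 250000)))
    print(persona_uri("melvster"))
```

Messages have no identifier on the IRC side, so their Web URI is derived from the channel and the timestamp alone.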
  • 14. Some of the internal and external links
  • 15. Browsing the Linked Data
  • 16. Creating a link between a user account on IRC and a personal profile
    • Claiming a Web ID creates a link [black] between a user account (a sioc:User that created a sioc:Post in a sioct:ChatChannel) and a person (foaf:Person)
    • The person can manually verify this:
      • By pointing back to the sioc:User from their foaf:Person definition [grey]
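The two-way link on slide 16 can be illustrated with a small check over plain (subject, predicate, object) tuples. This is a sketch under assumptions: the slides do not name the property used for the claim link, so `sioc:account_of` is assumed here, with `foaf:holdsAccount` as the back-link (the property used in the SPARQL example on slide 20). The function name and the example URIs are hypothetical.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"


def claim_is_verified(user, person, log_triples, profile_triples):
    """Return True only if the link exists in both directions.

    log_triples: triples published by the log server (the claim).
    profile_triples: triples from the person's own profile document.
    """
    # Assumed claim link: user account -> person.
    claimed = (user, SIOC + "account_of", person) in log_triples
    # Back-link asserted by the person: person -> user account.
    confirmed = (person, FOAF + "holdsAccount", user) in profile_triples
    return claimed and confirmed


if __name__ == "__main__":
    # Hypothetical persona and Web ID, for illustration only.
    user = "http://irc.sioc-project.org/users/alice#user"
    person = "http://example.org/alice/foaf#me"

    log = {(user, SIOC + "account_of", person)}
    profile = {(person, FOAF + "holdsAccount", user)}

    print(claim_is_verified(user, person, log, profile))   # claim and back-link
    print(claim_is_verified(user, person, log, set()))     # claim only
```

A claim alone can be made by anyone using the nickname, which is why the back-link from the person's own profile is what makes the association trustworthy.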
  • 17. Web IDs in SiocLog
    • A Web ID can be claimed using mttlbot
    • A Web ID can also be claimed using standard IRC services:
      • /msg nickserv set property webid SomeWebID
  • 18. Implementation
    • 2000 lines of Python source code
    • 1000 lines of Zope/TAL HTML templates
    • Twisted, SimpleTAL and Redland libraries
    • Four major components:
      • IRC interface, data analysis, data integration, Web
  • 19. Implementation (2)
    • IRC interface:
      • Discussion logger / persona monitor on Twisted
    • Data analysis:
      • Process logs, a filters pipeline, sinks for stats / output
    • Data integration:
      • Queries for external Linked Data (personal profiles)
    • Web interface:
      • Requests via CGI, publishes as HTML and RDF
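The data analysis stage described above (process logs, a pipeline of filters, sinks for statistics) could look roughly like the generator chain below. This is a minimal sketch, not the SiocLog code: the log line format (a timestamp followed by a raw IRC protocol line) and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    time: str
    nick: str
    command: str
    target: str
    text: str


def parse(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turn assumed 'TIME :nick!user@host CMD target :text' lines into events."""
    for line in lines:
        parts = line.rstrip("\r\n").split(" ", 4)
        if len(parts) < 4 or not parts[1].startswith(":"):
            continue  # skip lines that do not fit the assumed format
        time, prefix, command, target = parts[:4]
        text = parts[4] if len(parts) == 5 else ""
        if text.startswith(":"):
            text = text[1:]
        nick = prefix[1:].split("!", 1)[0]
        yield Event(time, nick, command, target, text)


def only_messages(events: Iterable[Event]) -> Iterator[Event]:
    """Filter: keep chat messages, drop joins, parts and other commands."""
    return (e for e in events if e.command == "PRIVMSG")


def only_channel(events: Iterable[Event], channel: str) -> Iterator[Event]:
    """Filter: keep events addressed to one channel."""
    return (e for e in events if e.target == channel)


def message_counts(events: Iterable[Event]) -> Counter:
    """Sink: count messages per nickname."""
    return Counter(e.nick for e in events)


if __name__ == "__main__":
    log = [
        "2009-10-26T12:00:00Z :alice!a@example.org PRIVMSG #sioc :hello world",
        "2009-10-26T12:00:05Z :carol!c@example.org JOIN #sioc",
        "2009-10-26T12:00:09Z :bob!b@example.org PRIVMSG #sioc :hi alice",
        "2009-10-26T12:01:00Z :bob!b@example.org PRIVMSG #other :off-topic",
        "2009-10-26T12:02:00Z :alice!a@example.org PRIVMSG #sioc ::) welcome",
    ]
    stats = message_counts(only_channel(only_messages(parse(log)), "#sioc"))
    print(stats)
```

Because each stage is a generator, filters can be reordered or swapped without loading a whole log into memory, and a different sink (for example, one that emits RDF) can consume the same stream.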
  • 20. Finding the names of friends of an IRC persona with SPARQL
    • semwebquery -sparql "SELECT ?name WHERE {
    • ?person foaf:holdsAccount <http://irc.sioc-project.org/users/melvster#user> .
    • ?person foaf:knows ?friend .
    • ?friend foaf:name ?name . }"
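To make the graph pattern in this query concrete, the sketch below evaluates the same three triple patterns by hand over an in-memory list of triples. It is an illustration only: a real client would dereference the Linked Data URIs and use a SPARQL engine, and the persona, people and names in the sample data are invented.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def friend_names(triples, account):
    """Names of friends of whoever holds the given account.

    Mirrors the query's three patterns:
      ?person foaf:holdsAccount <account> .
      ?person foaf:knows ?friend .
      ?friend foaf:name ?name .
    """
    names = []
    for person, p1, held in triples:
        if p1 != FOAF + "holdsAccount" or held != account:
            continue
        for s2, p2, friend in triples:
            if s2 != person or p2 != FOAF + "knows":
                continue
            for s3, p3, name in triples:
                if s3 == friend and p3 == FOAF + "name":
                    names.append(name)
    return sorted(names)


if __name__ == "__main__":
    # Invented sample data; none of these resources are real.
    account = "http://irc.sioc-project.org/users/alice#user"
    alice = "http://example.org/alice#me"
    bob = "http://example.org/bob#me"
    carol = "http://example.org/carol#me"
    data = [
        (alice, FOAF + "holdsAccount", account),
        (alice, FOAF + "knows", bob),
        (alice, FOAF + "knows", carol),
        (bob, FOAF + "name", "Bob Example"),
        (carol, FOAF + "name", "Carol Example"),
        (carol, FOAF + "knows", bob),
    ]
    print(friend_names(data, account))
```

The query starts from the IRC persona's URI and only reaches people because the persona has been linked to a foaf:Person, which is the link that claiming a Web ID provides.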
  • 21. Validation
    • 291 chat personas on five channels
    • 22,418 chat messages
    • 51 chat personas have associated Web IDs claimed using mttlbot (2/3) or nickserv (1/3)
      • 44 of those have a valid associated RDF document
    • Scalable (projected 4 million triples in 10 years)
    • SiocLog data being consumed by the “Towards linked sensor data for Hackystat” project
    • SiocLog interfaces to FOAF Me for new profile creation
  • 22. Future work
    • Extend to instant messaging and private messaging
    • Study of IRC communities where users and content are distributed across channels and networks
  • 23. Acknowledgements
    • We would like to thank Science Foundation Ireland for their support under grant SFI/08/CE/I1380 (Líon 2)
    • Thanks also to Benja Fallenstein and Dan Brickley for their insights
  • 24. Summary
    • IRC conversations are quite disconnected from the Web and even from other IRC channels and networks
    • Often there is valuable and needed information in an IRC chat that cannot be linked to people, topics or events, or in general referenced from elsewhere
    • SIOC provides a framework for interlinking social media to other content and Linked Data, and IRC has been integrated as a part of that framework
    • We also used mechanisms to link IRC contributions to the people who made them via Web ID and FOAF