Exploiting Semantic Web Techniques For Representing And Utilising
This presentation focuses on understanding different Semantic Web technologies in order to represent folksonomies.


Published in: Technology

  • Web 2.0 is the second generation of the web, which evolved from Web 1.0. Web 1.0 was a read-only web with static content and lacked user involvement. Web 1.0 sites were “under construction”; Web 2.0 sites are in “beta”.
  • Taggers are foaf:Agents. Taggings reify the n-ary relationship between a tagger, a tag, a resource, and a date. Tags are members of a Tag class. Tags have names.
  • Notably, Gruber defines the source as the scope of namespaces or universe of quantification for objects. The object in this model represents the content being tagged; the tag is the label or word used to tag it; the tagger is whoever tagged the object; the source is the system where the tagging is stored; the polarity is a + or –, a “vote” asserting whether the tagging fact is true or not.
  • Limitations of tagging stem from its independence and free-form structure. Discrepancies: “java” is too specific for some users, but “programming language” is too generic for others.
  • Freebase provides freely accessible datasets built by communities, and offers tools that help developers access and control the content of these datasets. DBpedia extracts information from the online encyclopaedia Wikipedia and provides it in a semantic format that machines can process.
  • The nodes contain numbers that are connected to tags
  • Transcript

    • 1. Exploiting Semantic Web Techniques for Representing and Utilising Folksonomies
      Owen Sacco
    • 2. Presentation Map
    • 3. Presentation Map
      Aim & Goals
      The Semantic Web
      Meta Formats, Vocabularies & Query Language
      Web 2.0
      Web 2.0 Technologies & Applications
      Tags, Tagging, Representing Tags Semantically & Integrating Folksonomies with the Semantic Web
    • 4. Presentation Map
      Graph Mining Techniques
      Fast Unfolding of Communities in Large Networks
      State of the Art Tool
      Examining the Edge List
      The Community Structure Ontology
      Jena & Corese
      Creating & Querying RDF Statements
      Analysis & Results
      Enhancements & Future Work
    • 5. Introduction
    • 6. Introduction
      The research is about:
      Understanding various Semantic Web technologies for representing data semantically
      Understanding folksonomies and how to represent them semantically
      Semantically representing tags retrieved from Bibsonomy
      The tags have been hierarchically structured using the “fast unfolding of communities in large networks” algorithm
      Using Semantic Web technologies to create and exploit this representation of tags
    • 7. The Semantic Web
    • 8. The Semantic Web
      What is the Semantic Web?
      Not a separate Web
      An extension of the current Web
      Semantic = Meaning
      Semantic Web = Meaningful Data
      Meaning is data about data, i.e. Metadata
      Advantages of Semantic Web:
      Information is given well-defined meaning
      This better enables computers and people to work in cooperation
      Source: W3C Semantic Web
    • 9. The Semantic Web
      Resource Description Framework (RDF)
      A framework that describes resources on the WWW
      Suitable for merging data on the Web
      Resources are uniquely identified by URIs
      The RDF Model is made up of triple statements
      Triple Statements: Subject, Predicate & Object
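
As a rough illustration of the triple model (not taken from the slides), a graph can be treated as a set of (subject, predicate, object) tuples, so merging data from two sources is a set union. The `ex:` identifiers below are made-up placeholders, not real vocabulary terms.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Two small graphs from hypothetical sources; every ex: name is illustrative.
graph_a = {
    Triple("ex:page1", "ex:title", "Semantic Web intro"),
    Triple("ex:page1", "ex:creator", "ex:alice"),
}
graph_b = {
    Triple("ex:page1", "ex:creator", "ex:alice"),  # same statement as in graph_a
    Triple("ex:alice", "ex:name", "Alice"),
}

# Statements are plain triples, so merging is a set union
# and duplicate statements collapse automatically.
merged = graph_a | graph_b

for t in sorted(merged):
    print(t.subject, t.predicate, t.obj)
```

Because both graphs use the same identifier for the same resource, the merged graph links the page to its creator's name without any extra join logic.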
    • 10. The Semantic Web
      An RDF Model can be serialised in RDF/XML
      An example of an RDF document:
      <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="" xmlns:contact="">
        <contact:Person rdf:about="">
          <contact:fullName>Eric Miller</contact:fullName>
          <contact:mailbox rdf:resource=""/>
          <contact:personalTitle>Dr.</contact:personalTitle>
        </contact:Person>
      </rdf:RDF>
      Source: W3C RDF Primer
    • 11. The Semantic Web
      An ontology is “A formal explicit specification of a shared conceptualisation”
      In other words: parties sharing a common concept of the data agree on it and specify it as clearly as possible
      It is an enabling technology for information sharing and manipulation
      It provides a vocabulary for RDF documents
      Ontologies are based on RDF models and are expressed using the Web Ontology Language (OWL)
    • 12. The Semantic Web
      SPARQL – An RDF Query Language
      Query in the Semantic Web context means: “Technologies and protocols that programmatically retrieve information from the Web of Data”.
      Based on triple patterns similar to RDF triples
      A query returns resources for all RDF triples that match the query’s pattern
      Is used to return complex data for mash-ups or search engines containing semantic data
      Syntax is similar to SQL
      Source: W3C
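
To make the idea of triple patterns concrete, here is a minimal sketch of how a query made of patterns can be matched against triples. It is a toy matcher in plain Python, not SPARQL itself and not how any particular engine works; the `ex:` terms and the helper names `match_pattern` and `run_query` are invented for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def run_query(patterns: Sequence[Triple], triples: Sequence[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


triples = [
    ("ex:alice", "ex:tagged", "ex:page1"),
    ("ex:page1", "ex:hasTag", "semanticweb"),
    ("ex:bob", "ex:tagged", "ex:page2"),
    ("ex:page2", "ex:hasTag", "java"),
]

# Roughly: SELECT ?who ?res WHERE { ?who ex:tagged ?res . ?res ex:hasTag "semanticweb" }
results = run_query(
    [("?who", "ex:tagged", "?res"), ("?res", "ex:hasTag", "semanticweb")],
    triples,
)
print(results)
```

The shared variable `?res` is what joins the two patterns, much as a shared column joins two tables in SQL.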
    • 13. Web 2.0
    • 14. Web 2.0
      A “Read/Write” Web
      Web 2.0 has:
      Facilitated web design
      Provided attractive, rich, easy-to-use interfaces
      Assisted in reuse of data by merging information from various sources
      Created social networks of people
      According to Internet World Stats, between 2000 and 2003 users doubled thanks to Friendster (one of the first social network websites)
      Source: Internet World Stats - Internet Growth Statistics
    • 15. Web 2.0
      Web 2.0 is considered a Social Web
      People are more involved by collaborating & sharing data
      One of the major Web 2.0 technologies for web development is AJAX
      A combination of several technologies:
      HTML or XHTML
      Cascading Style Sheets (CSS)
      JavaScript
    • 16. Web 2.0
      Web 2.0 created new application concepts:
      Blogs (Blogger, WordPress)
      Wikis (Wikipedia)
      Really Simple Syndication, RSS
      Mashups (MusicMesh, BBC Music)
      Social Networks (Facebook, LinkedIn, MySpace)
      Social Bookmarking (delicious, Bibsonomy)
      Photo Sharing (Flickr)
      Video Sharing (YouTube, Vimeo)
      In most of these concepts you find Tagging!
    • 17. Folksonomies
    • 18. Folksonomies
      Tag: “A non-hierarchical keyword or term”
      Tagging: “Assign a tag to a piece of information or resources”
      Tagger: “The person that assigns the tag”
      Folksonomy: “The result of personal free tagging of information and objects for one’s own retrieval. The tagging is done in a social environment.” Thomas Vander Wal (2004)
    • 19. Folksonomies
      Tag Cloud
      a visualisation of popular tags
      popular tags stand out from the others by being set in a larger font or otherwise emphasised
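
One simple way to derive tag cloud font sizes is to scale each tag's usage count linearly between a minimum and a maximum size. The sketch below assumes that linear scaling and arbitrary pixel bounds; real tag clouds often use logarithmic scaling instead, and `tag_cloud_sizes` is a made-up helper name.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_cloud_sizes(tags: Iterable[str], min_px: int = 12, max_px: int = 36) -> Dict[str, int]:
    """Map each tag to a font size proportional to how often it was used."""
    counts = Counter(tags)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for tag, count in counts.items():
        # When every tag is equally popular, fall back to the minimum size.
        scale = (count - lowest) / span if span else 0.0
        sizes[tag] = round(min_px + scale * (max_px - min_px))
    return sizes


sizes = tag_cloud_sizes(["web", "web", "web", "rdf", "rdf", "java"])
print(sizes)
```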
    • 20. Folksonomies
      Where can we tag?
      Social Bookmarking websites
    • 21. Folksonomies
      Picture sharing websites
    • 22. Folksonomies
      Video sharing websites
    • 23. Folksonomies
      Why tagging?
      It’s Popular
      Nowadays, practically anyone who uses a computer or the Internet is exposed to tagging in some way.
      It’s Social
      Through the most popular tags, we can see a kind of rough consensus on the subject of the resource.
      It’s Flexible
      Ad-hoc, free-form and does not adhere to any strict classification scheme or vocabulary.
    • 24. Folksonomies
      Basic Model
      Taggers create the tags, and sometimes they add resources.
      If we can identify something, then it can be tagged.
      Tagging is open-ended, tags can be any kind of term.
      Source: Smith G. 2008. Tagging People-Powered Metadata for the Social Web
    • 25. Folksonomies
      How about:
      Collaborative sharing tags across multiple applications
      Collaborative filtering based on tagging
      Connecting people based on tagging
      All these can be achieved through Tag Ontologies
      Ontology is not a taxonomy
      Ontology makes semantic agreement
      Semantic agreement enables useful composition
    • 26. Folksonomies
      Richard Newman’s Tag Ontology
      Source: Haklae Kim et al., Review and Alignment of Tag Ontologies for Semantically-Linked Data in Collaborative Tagging Spaces
    • 27. Folksonomies
      Tom Gruber’s Conceptual Model
      Tagging(object, tag, tagger, source, + or -)
      Source: Gruber T., Ontology for Folksonomy: A Mash-Up of Apples and Oranges.
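
The five-place Tagging relation can be sketched as a small record type. This is only an illustration of the relation's shape: the field names, the encoding of polarity as +1 or -1, and the `net_votes` helper are assumptions, not part of Gruber's model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Tagging:
    obj: str           # the content being tagged
    tag: str           # the label used
    tagger: str        # who applied the tag
    source: str        # the system in which the tagging was made
    polarity: int = 1  # assumed encoding: +1 asserts the tagging, -1 disputes it


def net_votes(taggings: Iterable[Tagging], obj: str, tag: str) -> int:
    """Sum the polarities of all taggings of `obj` with `tag`."""
    return sum(t.polarity for t in taggings if t.obj == obj and t.tag == tag)


taggings = [
    Tagging("ex:page1", "apple", "alice", "bibsonomy"),
    Tagging("ex:page1", "apple", "bob", "bibsonomy"),
    Tagging("ex:page1", "apple", "carol", "bibsonomy", polarity=-1),
]
print(net_votes(taggings, "ex:page1", "apple"))
```

Keeping the source in every record is what lets taggings from different systems be combined without losing track of where each one came from.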
    • 28. Folksonomies
      Limitations of tagging:
      Ambiguity of tags (example: is apple a fruit or the computer company?)
      No handling of synonyms (example: lorry and truck are treated as unrelated tags)
      Discrepancies in granularity (example: java vs programming language)
      Flat Organisation of Folksonomy
      How do we overcome these?
      Use: CommonTag, MOAT, SCOT
    • 29. Folksonomies
      CommonTag: adds concepts to tags from databases such as Freebase and DBpedia
      Source: CommonTag
    • 30. Folksonomies
      Meaning Of A Tag (MOAT)
      An ontology to represent how different meanings (URIs of semantic Web resources) can be related to a tag
      Extends the Tag class from Richard Newman’s tag ontology
      Tagging (User, Resource, Tag, Meaning)
      Architecture of MOAT Framework:
      MOAT server stores different meanings that can be queried
      MOAT client interacts with the server to let users easily annotate their content
    • 31. Folksonomies
      Social Semantic Cloud of Tags (SCOT)
      An ontology that aims to represent a set of tags
      Built on top of Richard Newman’s Tag Ontology
      Source: SCOT: Let's Share Tags!
    • 32. Folksonomies
      Limitations of the previous ontologies:
      An extra step is being added to the tagging activity
      Isn’t it daunting for the user when presented with a list of meanings to choose from?
      Which meaning shall the user choose?
      Will tagging remain popular with this additional step?
      If an automatic process is used to select a meaning of a tag, how accurate can this process be?
      Can this process really understand the user at that instance?
    • 33. Folksonomies
      With this additional meaning, isn’t tagging becoming another “strict” classification scheme?
      Can relationships of tags really be built on meanings?
      How about using some form of algorithm that can unfold new relationships of tags?
    • 34. Fast Unfolding of Communities in Large Networks
    • 35. Fast Unfolding of Communities in Large Networks
      A recursive method to extract the community structure of large networks
      This method is based on modularity optimisation
      The modularity is a scalar value that measures the density of links inside communities as compared to links between communities
      It unfolds a complete hierarchical community structure for large networks in a short time
      Results have shown that on a network of 118 million nodes, the algorithm took 152 minutes
      Source: Blondel V.D. et al. 2008. Fast unfolding of communities in large networks
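
For reference, the usual definition of modularity for a weighted undirected graph is Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where A_ij is the edge weight, k_i the weighted degree of node i, m the total edge weight and c_i the community of node i. The sketch below computes it per community from an edge list; the function name and the input format (each edge listed once, no self-loops) are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def modularity(edges: Iterable[Tuple[str, str, float]], community: Dict[str, str]) -> float:
    """Modularity of a partition of an undirected weighted graph."""
    two_m = 0.0
    internal = defaultdict(float)  # sum of A_ij over node pairs inside each community
    total = defaultdict(float)     # sum of weighted degrees in each community
    for u, v, w in edges:
        two_m += 2 * w
        total[community[u]] += w
        total[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += 2 * w  # counted for (u, v) and (v, u)
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


# Two triangles joined by a single bridge edge (c, d).
edges = [
    ("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
    ("c", "d", 1.0),
    ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0),
]
two_groups = {"a": "left", "b": "left", "c": "left", "d": "right", "e": "right", "f": "right"}
one_group = {node: "all" for node in "abcdef"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph along the bridge scores higher than lumping everything together, which is the behaviour the measure is meant to reward.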
    • 36. Fast Unfolding of Communities in Large Networks
      The algorithm consists of two phases which are iterated until a maximum modularity is attained.
      First, each node is assigned to its own community.
      Then each node is compared with its neighbours. The node is placed in the neighbouring community which yields the maximum gain in modularity.
      This process is repeated for all nodes until no further move improves the modularity.
      The second phase consists of building a network whose nodes are now the communities found during the first phase.
    • 37. Fast Unfolding of Communities in Large Networks
      After the second phase, the process starts again with the first phase
      A “pass” denotes a combination of both phases
      The “passes” are iterated until there are no more changes and the maximum modularity is reached for the whole network
      The height of the hierarchy is determined by the number of passes
      At the end, a hierarchical structure is attained that consists of communities of communities.
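
The two phases and the repeated passes can be sketched as follows. This is a deliberately naive illustration, assuming an in-memory edge list with string node names: it rescores the whole partition for every candidate move, whereas the published method computes the modularity gain of a move locally, so it is not the authors' implementation and would not scale to large networks.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition; edges are (u, v, weight), each listed once."""
    two_m = 0.0
    internal = defaultdict(float)  # twice the edge weight inside each community
    total = defaultdict(float)     # summed weighted degree of each community
    for u, v, w in edges:
        two_m += 2 * w
        total[community[u]] += w
        total[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += 2 * w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def phase_one(nodes, edges):
    """Move nodes into neighbouring communities while modularity improves."""
    community = {n: n for n in nodes}  # every node starts in its own community
    neighbours = defaultdict(set)
    for u, v, _ in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    improved = True
    while improved:
        improved = False
        for n in nodes:
            original = community[n]
            best_c, best_q = original, modularity(edges, community)
            for c in sorted({community[x] for x in neighbours[n]} - {original}):
                community[n] = c  # tentatively move n, then score the partition
                q = modularity(edges, community)
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            community[n] = best_c
            improved = improved or best_c != original
    return community


def phase_two(edges, community):
    """Build the network whose nodes are the communities found in phase one."""
    aggregated = defaultdict(float)
    for u, v, w in edges:
        cu, cv = sorted((community[u], community[v]))
        aggregated[(cu, cv)] += w  # edges inside a community become a self-loop
    return [(cu, cv, w) for (cu, cv), w in sorted(aggregated.items())]


def unfold(nodes, edges):
    """Run passes until phase one moves nothing; return one mapping per pass."""
    levels = []
    while True:
        community = phase_one(nodes, edges)
        if len(set(community.values())) == len(nodes):
            return levels  # no node was merged, so the hierarchy is complete
        levels.append(community)
        edges = phase_two(edges, community)
        nodes = sorted(set(community.values()))


# Two triangles joined by a single bridge edge (c, d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0), ("c", "d", 1.0),
         ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0)]
levels = unfold(nodes, edges)
print(len(levels), levels[0])
```

Each entry of `levels` maps the nodes of one pass to their communities, so the length of the list corresponds to the height of the hierarchy. Community labels are simply the name of some member node and carry no meaning of their own.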
    • 38. Fast Unfolding of Communities in Large Networks
    • 39. State of the Art Tool
    • 40. State of the Art Tool
      The Data
      It is provided beforehand
      Consists of a hierarchical structure made up of communities of communities of related tags
      This hierarchical structure is constructed using the “Fast Unfolding of Communities in Large Networks” algorithm
      The tags are from the social bookmarking website Bibsonomy
      The aim for using the community structure algorithm is to unfold new relationships amongst tags
    • 41. State of the Art Tool
      A visualisation of the tagging graph that depicts the relationships amongst tags
    • 42. State of the Art Tool
      The Input to the system will consist of Edge Lists
      Each Edge List file corresponds to one level of the hierarchy: the lower level or a single pass
      4 Edge List files were used for this system:
      The first list is a plain list of related tags queried from Bibsonomy
      The other three lists denote communities or communities of communities computed from the community structure algorithm
      Each relation (line) in an Edge List file is structured as follows:
      The first edge list: <tag_i, tag_j, weight>
    • 43. State of the Art Tool
      The other three edge lists:
      <community_i, tag_j, weight> or
      <community_i, community_j, weight>
      The Edge List files contain:
      The first (lower level): 13126 nodes with 264718 edges
      The second (first pass): 529 nodes with 6337 edges
      The third (second pass): 65 nodes with 374 edges
      The fourth (third pass): 50 nodes with 207 edges
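
Reading such a file might look like the sketch below. The slides do not show the separator used in the edge list files, so whitespace-separated fields are assumed here, and the sample tags and weights are invented.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Edge(NamedTuple):
    source: str    # a tag or a community
    target: str    # a tag or a community
    weight: float


def read_edge_list(handle: TextIO) -> Iterator[Edge]:
    """Yield one Edge per non-empty line of the form: source target weight."""
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(parts)}")
        yield Edge(parts[0], parts[1], float(parts[2]))


# In practice the handle would come from open(path); a string stands in here.
sample = io.StringIO("semanticweb rdf 12\nsemanticweb owl 7\n\nrdf owl 3\n")
edges = list(read_edge_list(sample))
nodes = {e.source for e in edges} | {e.target for e in edges}
print(len(nodes), "nodes with", len(edges), "edges")
```

Counting distinct endpoints and lines in this way is how node and edge totals like the ones listed above can be checked against a file.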
    • 44. State of the Art Tool
      A sample from one of the edge lists (the lower level file)
    • 45. State of the Art Tool
      First Task: To semantically represent all edge lists that represent the hierarchical structure
      Since the lower level edge list is made up of a set of tags, the tags will be described using the SCOT ontology
      To represent the hierarchical structure of communities, however, a new ontology must be designed that is built on top of, and linked to, SCOT
    • 46. State of the Art Tool
      The Community Structure Ontology
    • 47. State of the Art Tool
      The ontology was designed with Protégé, a Java application for designing ontologies
      The ontology is built on OWL 2
      Classes: CommunityStructure, Community, CommunityAggregation, Linkage
      Object properties: associatedCommunity, communityOf, linkedIn, linkedTag, linkedWith, unfoldedCommunity, unfoldingActivity
      Data properties: communityName, linkWeight, modularity, pass
    • 48. State of the Art Tool
      Second Task: To create an application that will transform the edge lists to RDF/XML statements and store the documents on physical storage. Also, a query engine will be included into the application to query the RDF/XML statements.
      The application is developed using the Java programming language.
      To create RDF/XML statements and write them to physical storage, a widely used API, the Jena API, is embedded in the system
    • 49. State of the Art Tool
      Jena – A Semantic Web Framework
      Developed by HP
      An RDF API for reading and writing RDF models in RDF/XML
      An OWL API for reading and writing OWL ontologies
      In-memory and persistent storage for writing RDF/XML statements to memory or physical storage such as text files or even relational databases
      SPARQL query engine
    • 50. State of the Art Tool
      The Tool
    • 51. State of the Art Tool
      The tool provides the following features:
      Properties to set up:
      The Edge List Directory
      The Edge List File Structure
    • 52. State of the Art Tool
      Settings to set up the type of storage required
      RDF/XML documents
    • 53. State of the Art Tool
      Relational database persistent storage
      TDB storage, a fast persistent triple store that is part of Jena
    • 54. State of the Art Tool
      Properties to set up the Ontologies
    • 55. State of the Art Tool
      The Method to transform the edge list to RDF Statements:
      First, the edge lists are merged together and ordered according to their hierarchical structure
      Second, the RDF Model consisting of RDF statements is created according to the Community Structure and SCOT Ontologies
      Third, the RDF statements are written according to the configured settings.
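
The transformation step can be sketched in a few lines. The tool described here uses the Jena API from Java; the Python below only illustrates the idea by emitting N-Triples-style lines. The namespace URIs are placeholders, and modelling each edge as a `Linkage` with two `linkedWith` properties and a `linkWeight` is a guess based on the class and property names listed for the ontology, not the author's actual mapping.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote

# Placeholder namespaces: the real ontology and data URIs are not given in the slides.
CS = "http://example.org/community-structure#"
DATA = "http://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def edge_to_statements(index: int, source: str, target: str, weight: float) -> List[str]:
    """Describe one edge as a Linkage resource (an assumed modelling choice)."""
    link = f"<{DATA}linkage/{index}>"
    return [
        f"{link} <{RDF_TYPE}> <{CS}Linkage> .",
        f"{link} <{CS}linkedWith> <{DATA}{quote(source)}> .",
        f"{link} <{CS}linkedWith> <{DATA}{quote(target)}> .",
        f'{link} <{CS}linkWeight> "{weight}"^^<{XSD_DOUBLE}> .',
    ]


def edges_to_statements(edges: Iterable[Tuple[str, str, float]]) -> List[str]:
    """Turn an already merged and ordered edge list into RDF statements."""
    statements: List[str] = []
    for index, (source, target, weight) in enumerate(edges):
        statements.extend(edge_to_statements(index, source, target, weight))
    return statements


merged_edges = [("community_1", "semantic web", 12.0), ("community_1", "rdf", 7.0)]
statements = edges_to_statements(merged_edges)
print("\n".join(statements))
```

Names are percent-encoded before being placed in a URI because free-form tags may contain spaces or other characters that are not allowed there.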
    • 56. State of the Art Tool
      Writing of RDF Statements
      RDF Documents:
      For whole documents: the whole document is written after the whole model is created
      For split documents: documents are written after the model for each community is created.
      Two index lists are created: one ordered A-Z and another that indicates where each community document is located
    • 57. State of the Art Tool
      Writing of RDF Statements
      RDF Persistent Storage
      RDB Method: MySQL is used as the persistent relational database and RDF statements are written on-the-fly, i.e. each statement is written to the database as soon as it is created
      TDB Method: each statement is written on-the-fly as well
    • 58. State of the Art Tool
      Writing of RDF Statements (Results)
    • 59. State of the Art Tool
      Querying Statements
      For RDF documents, the Corese SPARQL engine was used
      The Corese SPARQL engine is developed by Edelweiss
      Built on top of Jena with some added enhancements such as Approximated Searches, Select Expressions
      It queries only RDF documents and cannot query relational databases directly
    • 60. State of the Art Tool
      Querying Statements
      For Persistent Storage, the Jena SPARQL Engine is used since Jena allows for direct querying
      Querying Methods
      RDF Documents (Split Documents):
      First query index lists
      Get community document
      Query community document and get linked communities
      Query index list and query contents for each community
    • 61. State of the Art Tool
      Querying Methods
      RDF Documents (Whole Documents)
      Query whole model and query for community
      Retrieve linked communities
      Query linked communities for their content
      Persistent Storage
      Query whole model and query for community
      Retrieve linked communities
      Query linked communities for their content
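
The three querying steps (find the community, retrieve its linked communities, then fetch their content) can be illustrated over a handful of in-memory triples. The tool itself issues SPARQL queries through Corese or Jena; this sketch only mirrors the order of the lookups, and the way `linkedWith` and `linkedTag` are used as predicates, like the community names, is assumed for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Index = Dict[Tuple[str, str], List[str]]


def build_index(triples: Iterable[Triple]) -> Index:
    """Index objects by (subject, predicate) so each lookup is a dictionary access."""
    index: Index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def linked_contents(index: Index, community: str) -> Dict[str, List[str]]:
    """Return the tags of every community linked with `community`."""
    linked = index.get((community, "linkedWith"), [])
    return {c: sorted(index.get((c, "linkedTag"), [])) for c in linked}


triples = [
    ("community_a", "linkedWith", "community_b"),
    ("community_a", "linkedWith", "community_c"),
    ("community_b", "linkedTag", "rdf"),
    ("community_b", "linkedTag", "owl"),
    ("community_c", "linkedTag", "java"),
]
index = build_index(triples)
contents = linked_contents(index, "community_a")
print(contents)
```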
    • 62. State of the Art Tool
      Querying Statements (Results)
      Results are based on a community called malian
      This community has 57 linked communities and 15 linked tags
    • 63. State of the Art Tool
      Other features
      RDF Document Viewer
    • 64. Conclusion
    • 65. Conclusion
      In this research we have seen the importance of the Semantic Web for describing Web data semantically
      We have seen the importance of using folksonomies for search and exploration
      Additionally, we have seen various ontologies that show how such folksonomies can be represented semantically
      Using community structure algorithms and graph mining techniques, new relationships amongst tags can be unfolded
    • 66. Conclusion
      An ontology was designed and developed for the fast unfolding of communities in large networks
      From this ontology, RDF/XML statements can be created and are linked to the SCOT ontology
      We have seen that triple stores, used as persistent storage for triple statements, are much faster to query
    • 67. Future Enhancements
    • 68. Future Enhancements
      To try this model on larger tag models from different websites
      To include the tagger and links to the actual resource
      To analyse how these links contribute to the linked data initiative
      To optimise writing and querying for larger models