Exploiting Semantic Web Techniques For Representing And Utilising Folksonomies


This presentation focuses on understanding different Semantic Web technologies in order to represent folksonomies.


Slide Notes
  • Web 2.0 is the second generation of the web, which evolved from Web 1.0. Web 1.0 was a read-only web with static content and little user involvement. Web 1.0 sites were "under construction"; Web 2.0 sites are in "beta".
  • Taggers are foaf:Agents. Taggings reify the n-ary relationship between a tagger, a tag, a resource, and a date. Tags are members of a Tag class. Tags have names.
  • Notably, Gruber defines the source as the scope of namespaces or universe of quantification for objects. The object in this model represents the content being tagged; the tag is the label or word used to tag it; the tagger represents who tagged the object; the source is the system where the actual tagging data is stored; the polarity is + or -, "a vote" on the tagging fact, asserting whether or not the tagging fact is true.
  • Limitations of tagging arise from its independence and free-form structure. Discrepancies: java is too specific for some users, but programming language is too generic for others.
  • Freebase provides freely accessible datasets built by communities, and offers tools that help developers access and control the content of these datasets. DBpedia extracts information from the online encyclopaedia Wikipedia and provides it in a semantic format that can be processed by machines.
  • The nodes contain numbers that are connected to tags

Transcript

  • 1. Exploiting Semantic Web Techniques for Representing and Utilising Folksonomies
    Owen Sacco
  • 2. Presentation Map
  • 3. Presentation Map
    Introduction
    Aim & Goals
    The Semantic Web
    Meta Formats, Vocabularies & Query Language
    Web 2.0
    Web 2.0 Technologies & Applications
    Folksonomies
    Tags, Tagging, Representing Tags Semantically & Integrating Folksonomies with the Semantic Web
  • 4. Presentation Map
    Graph Mining Techniques
    Fast Unfolding of Communities in Large Networks
    State of the Art Tool
    Examining the Edge List
    The Community Structure Ontology
    Jena & Corese
    Creating & Querying RDF Statements
    Analysis & Results
    Conclusion
    Enhancements & Future Work
  • 5. Introduction
  • 6. Introduction
    The research is about:
    Understanding various Semantic Web technologies for representing data semantically
    Understanding Folksonomies and how to semantically represent them
    Semantically representing tags retrieved from Bibsonomy (http://www.bibsonomy.org/)
    The tags have been hierarchically structured using the algorithm “fast unfolding of communities in large networks”
    Using Semantic Web technologies to create and exploit such a representation of tags
  • 7. The Semantic Web
  • 8. The Semantic Web
    What is the Semantic Web?
    Not a separate Web
    An extension of the current Web
    Semantic = Meaning
    Semantic Web = Meaningful Data
    Meaning is expressed as data about data, i.e. metadata
    Advantages of the Semantic Web:
    Information is given well-defined meaning, better enabling computers and people to work in cooperation
    Source: W3C Semantic Web
  • 9. The Semantic Web
    Resource Description Framework (RDF)
    A framework that describes resources on the WWW
    Suitable for merging data on the Web
    Resources are uniquely identified by URIs
    The RDF Model is made up of triple statements
    Triple Statements: Subject, Predicate & Object
    [Figure: a triple, in which the PREDICATE links the SUBJECT to the OBJECT]
  • 10. The Semantic Web
    An RDF Model can be serialised in RDF/XML
    An example of an RDF document:
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
      <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
        <contact:fullName>Eric Miller</contact:fullName>
        <contact:mailbox rdf:resource="mailto:em@w3.org"/>
        <contact:personalTitle>Dr.</contact:personalTitle>
      </contact:Person>
    </rdf:RDF>
    Source: W3C RDF Primer
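To make the subject/predicate/object structure of this RDF/XML example concrete, the sketch below writes its four statements as plain triples in N-Triples form. It uses only the Java standard library, not Jena. The rdf:type triple comes from the typed element <contact:Person>, and literal escaping is deliberately left out, so it assumes simple literal values.

```java
import java.util.List;

/** The statements of the W3C RDF/XML example, written one triple per line (N-Triples). */
final class ContactTriples {

    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#";
    static final String ME = "http://www.w3.org/People/EM/contact#me";

    /** One RDF statement; the object is either an IRI or a plain literal. */
    record Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
        String toNTriples() {
            // Assumes literals contain no quotes, backslashes or line breaks, so nothing is escaped.
            String obj = objectIsLiteral ? "\"" + object + "\"" : "<" + object + ">";
            return "<" + subject + "> <" + predicate + "> " + obj + " .";
        }
    }

    static List<Triple> statements() {
        return List.of(
                new Triple(ME, RDF + "type", CONTACT + "Person", false),        // typed node <contact:Person>
                new Triple(ME, CONTACT + "fullName", "Eric Miller", true),
                new Triple(ME, CONTACT + "mailbox", "mailto:em@w3.org", false), // rdf:resource gives an IRI object
                new Triple(ME, CONTACT + "personalTitle", "Dr.", true));
    }

    public static void main(String[] args) {
        statements().forEach(t -> System.out.println(t.toNTriples()));
    }
}
```

Each printed line is one triple, so the same model that RDF/XML nests inside a single element becomes four independent statements.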
  • 11. The Semantic Web
    Ontology
    “A formal, explicit specification of a shared conceptualisation”
    In other words: parties who share a common conceptualisation of some data agree on it and specify its concepts as clearly as possible
    It is an enabling technology for information sharing and manipulation
    A vocabulary for RDF documents
    Ontologies are based on RDF models and are expressed using the Web Ontology Language (OWL)
  • 12. The Semantic Web
    SPARQL – An RDF Query Language
    Query in the Semantic Web context means: “Technologies and protocols that programmatically retrieve information from the Web of Data”.
    Based on triple patterns similar to RDF triples
    A query returns resources for all RDF triples that match the query’s pattern
    It is used to return complex data for mash-ups or for search engines over semantic data
    Its syntax is similar to SQL
    Source: W3C
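A SPARQL query is built from triple patterns. As a rough illustration (not how Jena or Corese actually evaluate queries), the sketch below matches a single pattern against an in-memory list of triples, treating terms that begin with ? as variables, and returns one set of variable bindings per matching triple. Prefixed names such as contact:fullName are kept as plain strings for brevity, and the data is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Matches one triple pattern against a list of triples; not a real SPARQL engine. */
final class TriplePatternMatcher {

    record Triple(String s, String p, String o) {}

    /** Returns one variable binding per triple that matches the pattern (s, p, o). */
    static List<Map<String, String>> match(List<Triple> data, String s, String p, String o) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Triple t : data) {
            Map<String, String> binding = new HashMap<>();
            if (bind(s, t.s(), binding) && bind(p, t.p(), binding) && bind(o, t.o(), binding)) {
                results.add(binding);
            }
        }
        return results;
    }

    /** A constant must match exactly; a variable used twice must bind to the same value both times. */
    private static boolean bind(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String previous = binding.putIfAbsent(term, value);
        return previous == null || previous.equals(value);
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple("ex:em", "contact:fullName", "Eric Miller"),
                new Triple("ex:em", "contact:personalTitle", "Dr."),
                new Triple("ex:alice", "contact:fullName", "Alice"));
        // Roughly: SELECT ?person ?name WHERE { ?person contact:fullName ?name }
        System.out.println(match(data, "?person", "contact:fullName", "?name"));
    }
}
```

A full query joins the bindings of several such patterns on their shared variables; this sketch stops at a single pattern.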
  • 13. Web 2.0
  • 14. Web 2.0
    A “Read/Write” Web
    Web 2.0 has:
    Facilitated web design
    Provided attractive, rich, easy-to-use interfaces
    Assisted in reuse of data by merging information from various sources
    Created social networks of people
    According to Internet World Stats, the number of users doubled between 2000 and 2003 thanks to Friendster (one of the first social networking websites)
    Source: Internet World Stats - Internet Growth Statistics
  • 15. Web 2.0
    Web 2.0 is considered a Social Web
    People are more involved by collaborating & sharing data
    One of the major Web 2.0 technologies for web development is AJAX (Asynchronous JavaScript and XML)
    A combination of several technologies:
    HTML or XHTML
    Cascading Style Sheets (CSS)
    JavaScript
    XML
  • 16. Web 2.0
    Web 2.0 created new application concepts:
    Blogs (Blogger, WordPress)
    Wikis (Wikipedia)
    Really Simple Syndication (RSS)
    Mashups (MusicMesh, BBC Music)
    Social Networks (Facebook, LinkedIn, MySpace)
    Social Bookmarking (Delicious, Bibsonomy)
    Photo Sharing (Flickr)
    Video Sharing (YouTube, Vimeo)
    Most of these applications make use of tagging!
  • 17. Folksonomies
  • 18. Folksonomies
    Tag
    “A non-hierarchical keyword or term”
    Tagging
    “Assign a tag to a piece of information or resources”
    Tagger
    “The person that assigns the tag”
    Folksonomy
    “The result of personal free tagging of information and objects for one’s own retrieval. The tagging is done in a social environment.” Thomas Vander Wal (2004)
  • 19. Folksonomies
    Tag Cloud
    a visualisation of popular tags
    popular tags stand out from others by being shown in a larger or emphasised font
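One common way to size tags in a cloud, assumed here rather than taken from the slides, is to interpolate linearly between a minimum and a maximum font size according to each tag's frequency. The pixel sizes and tag counts below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Linear font-size scaling for a tag cloud (an assumed, commonly used scheme). */
final class TagCloud {

    /** Maps each tag's count linearly onto [minPx, maxPx]; the most frequent tag gets maxPx. */
    static Map<String, Integer> fontSizes(Map<String, Integer> counts, int minPx, int maxPx) {
        int lo = counts.values().stream().mapToInt(Integer::intValue).min().orElse(0);
        int hi = counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // When every tag has the same count, all tags get the minimum size.
            double t = (hi == lo) ? 0.0 : (double) (e.getValue() - lo) / (hi - lo);
            sizes.put(e.getKey(), (int) Math.round(minPx + t * (maxPx - minPx)));
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("java", 10);
        counts.put("rdf", 1);
        counts.put("tagging", 4);
        System.out.println(fontSizes(counts, 12, 30));
    }
}
```

With sizes between 12 and 30 pixels, java (10 uses) gets 30, rdf (1 use) gets 12 and tagging (4 uses) gets 18. Logarithmic scaling is a frequent alternative when a few tags dominate.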
  • 20. Folksonomies
    Where can we tag?
    Social Bookmarking websites
  • 21. Folksonomies
    Picture sharing websites
  • 22. Folksonomies
    Video sharing websites
  • 23. Folksonomies
    Why tagging?
    It’s Popular
    Nowadays, practically anyone who uses a computer or the Internet is exposed to tagging in some way.
    It’s Social
    Through the most popular tags, we can see a kind of rough consensus on the subject of the resource.
    It’s Flexible
    It is ad hoc and free-form, and does not adhere to any strict classification scheme or vocabulary.
  • 24. Folksonomies
    Basic Model
    Taggers create the tags, and sometimes they add resources.
    If we can identify something, then it can be tagged.
    Tagging is open-ended: tags can be any kind of term.
    Source: Smith G. 2008. Tagging: People-Powered Metadata for the Social Web
  • 25. Folksonomies
    How about:
    Collaboratively sharing tags across multiple applications
    Collaborative filtering based on tagging
    Connecting people based on tagging
    All these can be achieved through tag ontologies
    An ontology is not a taxonomy
    An ontology captures semantic agreement
    Semantic agreement enables useful composition
  • 26. Folksonomies
    Richard Newman’s Tag Ontology
    Source: Haklae Kim et al., Review and Alignment of Tag Ontologies for Semantically-Linked Data in Collaborative Tagging Spaces
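The slide notes describe Newman's ontology as reifying the n-ary relationship between a tagger, a tag, a resource and a date. The sketch below shows what such a reification looks like as triples: a single Tagging node carries all four links. The namespace and the property names tagger, tag, resource, date and name are hypothetical placeholders, not the actual terms of Newman's ontology; only foaf:Agent comes from the notes. Objects are plain strings, without distinguishing IRIs from literals.

```java
import java.time.LocalDate;
import java.util.List;

/** Reifies one tagging event as triples about a Tagging node (hypothetical vocabulary). */
final class TaggingReification {

    // Hypothetical namespace: placeholders, not the property IRIs of Newman's ontology.
    static final String EX = "http://example.org/tags#";
    static final String FOAF_AGENT = "http://xmlns.com/foaf/0.1/Agent";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    record Triple(String s, String p, String o) {}

    /** Turns the 4-ary relation (tagger, tag, resource, date) into binary triples. */
    static List<Triple> reify(String taggingId, String tagger, String tagName, String resource, LocalDate date) {
        String tagging = EX + "tagging-" + taggingId;
        String tag = EX + "tag-" + tagName;
        return List.of(
                new Triple(tagging, RDF_TYPE, EX + "Tagging"),
                new Triple(tagging, EX + "tagger", tagger),
                new Triple(tagging, EX + "tag", tag),
                new Triple(tagging, EX + "resource", resource),
                new Triple(tagging, EX + "date", date.toString()), // ISO-8601, e.g. 2009-05-01
                new Triple(tagger, RDF_TYPE, FOAF_AGENT),           // taggers are foaf:Agents
                new Triple(tag, RDF_TYPE, EX + "Tag"),              // tags are members of a Tag class
                new Triple(tag, EX + "name", tagName));             // tags have names
    }

    public static void main(String[] args) {
        reify("t1", "http://example.org/people/alice", "semanticweb",
                "http://example.org/paper/1", LocalDate.of(2009, 5, 1))
                .forEach(System.out::println);
    }
}
```

Reification is needed because RDF statements are binary; introducing the Tagging node lets one event relate four things at once.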
  • 27. Folksonomies
    Tom Gruber’s Conceptual Model
    Tagging(object, tag, tagger, source, + or -)
    Source: Gruber T., Ontology for Folksonomy: A Mash-Up of Apples and Oranges.
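Gruber's model adds a source and a polarity to each tagging. As an illustration only (Gruber defines the polarity, but this tallying is an assumption of the sketch), the code below models Tagging(object, tag, tagger, source, polarity) as a Java record with polarity +1 or -1 and sums the votes for each object/tag pair within one source. The source names and the "object | tag" key format are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Gruber-style taggings with a +/- polarity, and an assumed vote tally per source. */
final class GruberTagging {

    /** Tagging(object, tag, tagger, source, polarity), with polarity +1 or -1. */
    record Tagging(String object, String tag, String tagger, String source, int polarity) {
        Tagging {
            if (polarity != 1 && polarity != -1) {
                throw new IllegalArgumentException("polarity must be +1 or -1");
            }
        }
    }

    /** Sums the polarity votes for each (object, tag) pair recorded in the given source. */
    static Map<String, Integer> netVotes(List<Tagging> taggings, String source) {
        Map<String, Integer> votes = new HashMap<>();
        for (Tagging t : taggings) {
            if (t.source().equals(source)) {
                votes.merge(t.object() + " | " + t.tag(), t.polarity(), Integer::sum);
            }
        }
        return votes;
    }

    public static void main(String[] args) {
        List<Tagging> taggings = List.of(
                new Tagging("photo1", "apple", "alice", "flickr", 1),
                new Tagging("photo1", "apple", "bob", "flickr", 1),
                new Tagging("photo1", "apple", "carol", "flickr", -1),
                new Tagging("photo1", "apple", "dave", "delicious", 1));
        System.out.println(netVotes(taggings, "flickr"));
    }
}
```

Keeping the source in every tagging means votes from different systems are never mixed unless a caller chooses to combine them.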
  • 28. Folksonomies
    Limitations of tagging:
    Ambiguity of tags (example: is apple the fruit or the computer company?)
    Lack of synonym control (example: lorry or truck)
    Discrepancies in granularity (example: java vs programming language)
    Flat organisation of folksonomies
    How do we overcome these?
    Use: CommonTag, MOAT, SCOT
  • 29. Folksonomies
    CommonTag
    Adds concepts to tags from databases such as Freebase and DBpedia
    Source: CommonTag
  • 30. Folksonomies
    Meaning Of A Tag (MOAT)
    An ontology to represent how different meanings (URIs of semantic Web resources) can be related to a tag
    Extends the Tag class from Richard Newman’s tag ontology
    Tagging (User, Resource, Tag, Meaning)
    Architecture of MOAT Framework:
    MOAT server stores different meanings that can be queried
    MOAT client interacts with the server to let users easily annotate their content
  • 31. Folksonomies
    Social Semantic Cloud of Tags (SCOT)
    An ontology aimed at representing sets of tags
    Built on top of Richard Newman’s Tag Ontology
    Source: SCOT: Let's Share Tags!
  • 32. Folksonomies
    Limitations of the previous ontologies:
    An extra step is added to the tagging activity
    Isn’t it daunting for the user to be presented with a list of meanings to choose from?
    Which meaning should the user choose?
    Will tagging remain popular with this additional step?
    If an automatic process is used to select the meaning of a tag, how accurate can this process be?
    Can this process really understand the user at that moment?
  • 33. Folksonomies
    With this additional meaning, isn’t tagging becoming another “strict” classification scheme?
    Can relationships between tags really be built on meanings?
    How about using an algorithm that can unfold new relationships amongst tags?
  • 34. Fast Unfolding of Communities in Large Networks
  • 35. Fast Unfolding of Communities in Large Networks
    A recursive method to extract the community structure of large networks
    This method is based on modularity optimisation
    The modularity is a scalar value that measures the density of links inside communities as compared to links between communities
    It unfolds a complete hierarchical community structure for large networks in a short time
    Results have shown that on a network of 118 million nodes, the algorithm took 152 minutes
    Source: Blondel V.D. et al. 2008. Fast unfolding of communities in large networks
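Blondel et al. define the modularity of a partition of a weighted graph as Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j), where A_ij is the weight of the link between i and j, k_i is the sum of the weights attached to node i, m is the total link weight, and δ(c_i, c_j) is 1 when i and j are in the same community and 0 otherwise. The sketch below computes Q directly from a symmetric adjacency matrix; it is O(n²) and intended only for small illustrative graphs.

```java
import java.util.Arrays;

/** Modularity Q of a partition of an undirected weighted graph given as a symmetric adjacency matrix. */
final class Modularity {

    static double modularity(double[][] a, int[] community) {
        int n = a.length;
        double[] k = new double[n]; // k_i: sum of link weights attached to node i
        double twoM = 0.0;          // 2m: sum of all entries of the matrix
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                k[i] += a[i][j];
            }
            twoM += k[i];
        }
        double q = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (community[i] == community[j]) { // delta(c_i, c_j)
                    q += a[i][j] - k[i] * k[j] / twoM;
                }
            }
        }
        return q / twoM;
    }

    public static void main(String[] args) {
        // Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
        double[][] a = new double[6][6];
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {3, 4}, {3, 5}, {4, 5}, {2, 3}};
        for (int[] e : edges) {
            a[e[0]][e[1]] = 1.0;
            a[e[1]][e[0]] = 1.0;
        }
        int[] split = {0, 0, 0, 1, 1, 1};
        System.out.println(Arrays.toString(split) + " -> Q = " + modularity(a, split));
    }
}
```

For the two triangles, splitting the graph into the triangles gives Q = 5/14 ≈ 0.357, while putting every node into a single community gives Q = 0, which is why the algorithm prefers the split.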
  • 36. Fast Unfolding of Communities in Large Networks
    The algorithm consists of two phases which are iterated until a maximum modularity is attained.
    First, each node is assigned to its own community.
    Then, for each node, the gain in modularity from moving it into the community of each of its neighbours is evaluated. The node is placed in the community that yields the maximum gain, provided the gain is positive; otherwise it stays where it is.
    This process is repeated for all nodes until no further improvement can be achieved.
    The second phase consists of building a network whose nodes are now the communities found during the first phase.
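The first phase can be sketched as follows. Up to the constant factor 1/m, the modularity gain of moving an isolated node i into a community C is k_i,in - Σ_tot·k_i / 2m, where k_i,in is the weight of the links from i to nodes in C and Σ_tot is the total degree of C; this follows from the gain formula in Blondel et al. The dense adjacency matrix and the fixed node order are simplifications of this sketch, which keeps a node in its current community unless another neighbouring community is strictly better.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Phase one of the Louvain method (local moving) on a dense symmetric adjacency matrix. */
final class LouvainPhaseOne {

    /** Returns a community label per node; labels are ids of the singleton communities that survived. */
    static int[] localMoving(double[][] a) {
        int n = a.length;
        double[] k = new double[n];   // weighted degree k_i
        double twoM = 0.0;            // 2m = sum of all entries of a
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                k[i] += a[i][j];
            }
            twoM += k[i];
        }
        int[] comm = new int[n];
        double[] tot = new double[n]; // Sigma_tot: sum of degrees of the nodes in each community
        for (int i = 0; i < n; i++) {
            comm[i] = i;              // every node starts in its own community
            tot[i] = k[i];
        }
        boolean moved = true;
        while (moved) {               // repeat until a full sweep moves no node
            moved = false;
            for (int i = 0; i < n; i++) {
                // k_i,in: weight of the links from i to each neighbouring community (self-loops excluded)
                Map<Integer, Double> kIn = new TreeMap<>();
                for (int j = 0; j < n; j++) {
                    if (j != i && a[i][j] != 0.0) {
                        kIn.merge(comm[j], a[i][j], Double::sum);
                    }
                }
                int current = comm[i];
                tot[current] -= k[i]; // take i out of its community
                int best = current;
                double bestGain = kIn.getOrDefault(current, 0.0) - tot[current] * k[i] / twoM;
                for (Map.Entry<Integer, Double> e : kIn.entrySet()) {
                    double gain = e.getValue() - tot[e.getKey()] * k[i] / twoM;
                    if (gain > bestGain + 1e-12) { // strictly better only: a tie keeps i where it was
                        best = e.getKey();
                        bestGain = gain;
                    }
                }
                tot[best] += k[i];    // insert i into the chosen community
                comm[i] = best;
                if (best != current) {
                    moved = true;
                }
            }
        }
        return comm;
    }

    public static void main(String[] args) {
        double[][] a = new double[6][6];
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {3, 4}, {3, 5}, {4, 5}, {2, 3}};
        for (int[] e : edges) {
            a[e[0]][e[1]] = 1.0;
            a[e[1]][e[0]] = 1.0;
        }
        System.out.println(Arrays.toString(localMoving(a)));
    }
}
```

On two triangles joined by one edge, the sweep ends with nodes 0, 1, 2 sharing one label and nodes 3, 4, 5 sharing another.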
  • 37. Fast Unfolding of Communities in Large Networks
    After the second phase, the process starts again with the first phase
    A “pass” denotes one combination of both phases
    The passes are iterated until there are no more changes and the maximum modularity is reached for the whole network
    The height of the resulting hierarchy is the number of passes
    At the end, a hierarchical structure is attained that consists of communities of communities.
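The second phase can be sketched as building a smaller weighted graph: each community becomes a node, the weight between two new nodes is the sum of the link weights between their members, and links inside a community become a self-loop, as described by Blondel et al. Feeding the result back into the first phase starts the next pass. Renumbering the community labels to 0..c-1 first is a convenience of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Phase two of the Louvain method: collapse each community into a single node. */
final class LouvainPhaseTwo {

    /** Renumbers arbitrary community labels to 0..c-1 in order of first appearance. */
    static int[] relabel(int[] comm) {
        Map<Integer, Integer> ids = new HashMap<>();
        int[] out = new int[comm.length];
        for (int i = 0; i < comm.length; i++) {
            Integer id = ids.get(comm[i]);
            if (id == null) {
                id = ids.size();
                ids.put(comm[i], id);
            }
            out[i] = id;
        }
        return out;
    }

    /** Entry [c][d] of the result sums a[i][j] over all i in community c and j in community d. */
    static double[][] aggregate(double[][] a, int[] comm) {
        int[] c = relabel(comm);
        int size = Arrays.stream(c).max().orElse(-1) + 1;
        double[][] agg = new double[size][size];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a.length; j++) {
                agg[c[i]][c[j]] += a[i][j]; // links inside a community accumulate on the diagonal (self-loop)
            }
        }
        return agg;
    }

    public static void main(String[] args) {
        double[][] a = new double[6][6];
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {3, 4}, {3, 5}, {4, 5}, {2, 3}};
        for (int[] e : edges) {
            a[e[0]][e[1]] = 1.0;
            a[e[1]][e[0]] = 1.0;
        }
        System.out.println(Arrays.deepToString(aggregate(a, new int[]{1, 1, 1, 5, 5, 5})));
    }
}
```

For the two triangles split into {0,1,2} and {3,4,5}, the aggregated matrix is [[6, 1], [1, 6]]: each diagonal entry counts the internal links in both directions, so every new node keeps the total degree of its community and modularity is unchanged by the aggregation.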
  • 38. Fast Unfolding of Communities in Large Networks
  • 39. State of the Art Tool
  • 40. State of the Art Tool
    The Data
    It is provided beforehand
    Consists of a hierarchical structure made up of communities of communities of related tags
    This hierarchical structure is constructed using the “Fast Unfolding of Communities in Large Networks” algorithm
    The tags are from the Social Bookmarking Website Bibsonomy (http://www.bibsonomy.org/)
    The aim of using the community structure algorithm is to unfold new relationships amongst tags
  • 41. State of the Art Tool
    A visualisation of the tagging graph that depicts the relationships amongst tags
  • 42. State of the Art Tool
    The Input to the system will consist of Edge Lists
    Each edge list file corresponds to one level (pass) of the hierarchy
    Four edge list files were used for this system:
    The first list is a plain list of related tags queried from Bibsonomy
    The other three lists denote communities or communities of communities computed by the community structure algorithm
    Each relation (line) in the edge list files has the following form:
    The first edge list: <tag_i, tag_j, weight>
  • 43. State of the Art Tool
    The other three edge lists:
    <community_i, tag_j, weight> or
    <community_i, community_j, weight>
    The Edge List files contain:
    The first (lower level): 13126 nodes with 264718 edges
    The second (first pass): 529 nodes with 6337 edges
    The third (second pass): 65 nodes with 374 edges
    The fourth (third pass): 50 nodes with 207 edges
  • 44. State of the Art Tool
    A sample from one of the edge lists (the lower level file)
    caching,offlinebrowser,2.0
    caching,archiving,2.0
    institutions,activity,1.0
    malian,senegal,2.0
    malian,northern,2.0
    malian,guinea,2.0
    malian,drummers,2.0
    cdf,c,1.0
    cdf,library,1.0
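The edge list lines above are comma-separated <source, target, weight> rows. A minimal reader might look like the sketch below; it assumes tag and community names never contain commas (true for the sample) and uses only the standard library. Passing a file path as the first argument reads that file; otherwise it parses a few of the sample rows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Reads comma-separated edge list rows into weighted edges. */
final class EdgeListReader {

    /** <tag_i, tag_j, weight>, <community_i, tag_j, weight> or <community_i, community_j, weight>. */
    record Edge(String source, String target, double weight) {}

    static List<Edge> parse(List<String> lines) {
        List<Edge> edges = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = line.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected source,target,weight but got: " + raw);
            }
            edges.add(new Edge(parts[0].strip(), parts[1].strip(), Double.parseDouble(parts[2].strip())));
        }
        return edges;
    }

    /** Nodes linked to the given node, treating the edge list as undirected. */
    static Set<String> neighbours(List<Edge> edges, String node) {
        Set<String> result = new TreeSet<>();
        for (Edge e : edges) {
            if (e.source().equals(node)) {
                result.add(e.target());
            }
            if (e.target().equals(node)) {
                result.add(e.source());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = args.length > 0
                ? Files.readAllLines(Path.of(args[0]))
                : List.of("malian,senegal,2.0", "malian,guinea,2.0", "cdf,library,1.0");
        List<Edge> edges = parse(lines);
        System.out.println(edges.size() + " edges; neighbours of malian: " + neighbours(edges, "malian"));
    }
}
```

Applied to all nine sample rows, it yields nine edges, and the neighbours of malian are drummers, guinea, northern and senegal.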
  • 45. State of the Art Tool
    First Task: To semantically represent all edge lists that represent the hierarchical structure
    Since the lower level edge list is made up of a set of tags, the tags will be described using the SCOT ontology
    However, representing the hierarchical structure of communities requires a new ontology that is built on top of SCOT and linked to it
  • 46. State of the Art Tool
    The Community Structure Ontology
    [Figure: diagram of the Community Structure Ontology, with the labels CommunityStructure, UnfoldedCommunity, UnfoldingActivity, Community, CommunityAggregation, linkedIn, associatedCommunity, linkedWith, Linkage, name, sioc:Resource, modularity, pass, linkedTag, communityOf, linkWeight and scot:Tag]
  • 47. State of the Art Tool
    The ontology was designed with Protégé, a Java application for designing ontologies
    The ontology is built on OWL 2
    Classes: CommunityStructure, Community, CommunityAggregation, Linkage
    Object properties: associatedCommunity, communityOf, linkedIn, linkedTag, linkedWith, unfoldedCommunity, unfoldingActivity
    Data properties: communityName, linkWeight, modularity, pass
  • 48. State of the Art Tool
    Second Task: To create an application that transforms the edge lists into RDF/XML statements and stores the documents on physical storage. The application also includes a query engine for querying the RDF/XML statements.
    The application is developed using the Java programming language.
    To create RDF/XML statements and write them to physical storage, the system embeds a widely used API called the Jena API
  • 49. State of the Art Tool
    Jena – A Semantic Web Framework
    Developed by HP
    An RDF API for reading and writing RDF models in RDF/XML
    An OWL API for reading and writing OWL ontologies
    In-memory and persistent storage for writing RDF/XML statements to memory or physical storage such as text files or even relational databases
    SPARQL query engine
  • 50. State of the Art Tool
    The Tool
  • 51. State of the Art Tool
    The tool provides the following features:
    Properties to set up:
    The Edge List Directory
    The Edge List File Structure
  • 52. State of the Art Tool
    Settings to set up the required type of storage
    RDF/XML documents
  • 53. State of the Art Tool
    Relational database persistent storage
    TDB storage, a fast, custom persistent store
  • 54. State of the Art Tool
    Properties to set up the ontologies
  • 55. State of the Art Tool
    The Method to transform the edge list to RDF Statements:
    First, the edge lists are merged together and ordered according to their hierarchical structure
    Second, the RDF model consisting of RDF statements is created according to the Community Structure and SCOT ontologies
    Third, the RDF statements are written according to the configured settings.
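A standard-library sketch of the second and third steps, under heavy assumptions: the slides name the ontology's classes and properties (Community, Linkage, linkedWith, linkedTag, associatedCommunity, linkWeight, pass) but not its namespace or exactly how they connect, so the IRIs and the mapping of each edge list row to a Linkage node below are guesses for illustration, and community ids are assumed to be unique across passes. The actual tool builds a Jena model and serialises it as RDF/XML; this version emits N-Triples lines instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Turns one edge list row into N-Triples using a guessed Community Structure mapping. */
final class CommunityTriples {

    // Hypothetical IRIs: the slides do not give the ontology's namespace.
    static final String CS = "http://example.org/community-structure#";
    static final String BASE = "http://example.org/data/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    /** One row of a pass; targetIsTag distinguishes <community, tag> from <community, community>. */
    record Row(String from, String to, double weight, int pass, boolean targetIsTag) {}

    static String iri(String kind, String name) {
        return "<" + BASE + kind + "/" + URLEncoder.encode(name, StandardCharsets.UTF_8) + ">";
    }

    static String cs(String term) {
        return "<" + CS + term + ">";
    }

    /** Guessed mapping: each row becomes a Linkage node between a community and its target. */
    static List<String> toNTriples(Row r) {
        String community = iri("community", r.from());
        String linkage = iri("linkage", r.from() + "-" + r.to());
        String target = r.targetIsTag() ? iri("tag", r.to()) : iri("community", r.to());
        String targetProperty = r.targetIsTag() ? cs("linkedTag") : cs("associatedCommunity");
        return List.of(
                community + " <" + RDF_TYPE + "> " + cs("Community") + " .",
                community + " " + cs("pass") + " \"" + r.pass() + "\"^^<" + XSD + "int> .",
                linkage + " <" + RDF_TYPE + "> " + cs("Linkage") + " .",
                community + " " + cs("linkedWith") + " " + linkage + " .",
                linkage + " " + targetProperty + " " + target + " .",
                linkage + " " + cs("linkWeight") + " \"" + r.weight() + "\"^^<" + XSD + "double> .");
    }

    public static void main(String[] args) {
        toNTriples(new Row("12", "malian", 2.0, 1, true)).forEach(System.out::println);
    }
}
```

Names are URL-encoded before being placed in IRIs, since tag names from a folksonomy may contain characters that are not allowed there.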
  • 56. State of the Art Tool
    Writing of RDF Statements
    RDF Documents:
    For whole documents: the whole document is written after the whole model is created
    For split documents: documents are written after the model for each community is created.
    Two index lists are created: one A-Z, and another indicating where each community document is located
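The two index lists for split documents might be kept as sketched below: an A-Z index grouping community names by their first letter, and a location index from each community to the file that holds its document. The directory and the <name>.rdf naming scheme are hypothetical, and names are assumed to be safe to use in file names.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Builds an A-Z index and a location index for split community documents (hypothetical layout). */
final class CommunityIndex {

    /** A-Z index: upper-cased first letter -> community names starting with it, sorted. */
    static Map<Character, SortedSet<String>> alphabetical(Iterable<String> names) {
        Map<Character, SortedSet<String>> index = new TreeMap<>();
        for (String name : names) {
            if (name.isEmpty()) {
                continue;
            }
            char letter = Character.toUpperCase(name.charAt(0));
            index.computeIfAbsent(letter, key -> new TreeSet<>()).add(name);
        }
        return index;
    }

    /** Location index: community name -> path of its RDF/XML document. */
    static Map<String, String> locations(Iterable<String> names, String directory) {
        Map<String, String> index = new TreeMap<>();
        for (String name : names) {
            index.put(name, directory + "/" + name + ".rdf");
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> names = List.of("malian", "caching", "cdf");
        System.out.println(alphabetical(names));
        System.out.println(locations(names, "communities"));
    }
}
```

Keeping the location index separate means a query only opens the one document it needs instead of loading the whole model.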
  • 57. State of the Art Tool
    Writing of RDF Statements
    RDF Persistent Storage
    RDB Method: MySQL is used as the persistent relational database, and RDF statements are written on the fly, i.e. each statement is written to the database as soon as it is created
    TDB Method: each statement is written on the fly as well
  • 58. State of the Art Tool
    Writing of RDF Statements (Results)
  • 59. State of the Art Tool
    Querying Statements
    For RDF documents, the Corese SPARQL engine was used
    The Corese SPARQL engine is developed by the Edelweiss team
    Built on top of Jena, with some added enhancements such as approximate search and select expressions
    It queries only RDF documents and cannot query relational databases directly
  • 60. State of the Art Tool
    Querying Statements
    For persistent storage, the Jena SPARQL engine is used, since Jena can query the stores directly
    Querying Methods
    RDF Documents (Split Documents):
    First, query the index lists
    Get the community document
    Query the community document and get the linked communities
    For each linked community, query the index list and then query its contents
  • 61. State of the Art Tool
    Querying Methods
    RDF Documents (Whole Documents)
    Query the whole model for the community
    Retrieve the linked communities
    Query the linked communities for their content
    Persistent Storage
    Query the whole model for the community
    Retrieve the linked communities
    Query the linked communities for their content
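The split-document method amounts to an index lookup and a document query for the community itself and then for each linked community. The sketch below models that flow with in-memory maps: index maps a community name to its document location (as in the location index list), and documents stands in for the parsed community documents that the tool would query with Corese. All names and data here are hypothetical and do not reproduce the malian results.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Split-document query flow: index lookup, load the document, then repeat for linked communities. */
final class SplitDocumentQuery {

    /** Stand-in for what a query over one community document returns. */
    record CommunityDoc(List<String> linkedCommunities, List<String> linkedTags) {}

    /** For the given community, returns each linked community with the tags found in its own document. */
    static Map<String, List<String>> tagsOfLinkedCommunities(
            String community, Map<String, String> index, Map<String, CommunityDoc> documents) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        CommunityDoc doc = load(community, index, documents);
        for (String linked : doc.linkedCommunities()) {
            result.put(linked, load(linked, index, documents).linkedTags());
        }
        return result;
    }

    private static CommunityDoc load(String name, Map<String, String> index, Map<String, CommunityDoc> documents) {
        String location = index.get(name); // 1. query the index list
        if (location == null) {
            throw new IllegalArgumentException("No index entry for community " + name);
        }
        CommunityDoc doc = documents.get(location); // 2. get and query the community document
        if (doc == null) {
            throw new IllegalStateException("Missing document " + location);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> index = Map.of("malian", "a.rdf", "c1", "b.rdf", "c2", "c.rdf");
        Map<String, CommunityDoc> documents = Map.of(
                "a.rdf", new CommunityDoc(List.of("c1", "c2"), List.of("senegal")),
                "b.rdf", new CommunityDoc(List.of(), List.of("guinea")),
                "c.rdf", new CommunityDoc(List.of(), List.of("drummers", "northern")));
        System.out.println(tagsOfLinkedCommunities("malian", index, documents));
    }
}
```

The whole-document and persistent-storage methods follow the same steps, but every lookup runs against one model instead of opening a separate document per community.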
  • 62. State of the Art Tool
    Querying Statements (Results)
    Results are based on a community called malian
    This community has 57 linked communities and 15 linked tags
  • 63. State of the Art Tool
    Other features
    RDF Document Viewer
  • 64. Conclusion
  • 65. Conclusion
    In this research we have seen the importance of the Semantic Web and of describing Web data semantically
    We have seen the importance of using folksonomies for search and exploration
    We have also seen various ontologies for representing such folksonomies semantically
    Community structure algorithms and graph mining techniques can unfold new relationships amongst tags
  • 66. Conclusion
    An ontology was designed and developed to represent the community structure produced by the fast unfolding of communities in large networks algorithm
    From this ontology, RDF/XML statements can be created and linked to the SCOT ontology
    We have seen that querying is much faster when triple statements are kept in persistent triple stores
  • 67. Future Enhancements
  • 68. Future Enhancements
    To try this model on larger tag datasets from different websites
    To include the tagger and links to the actual resource
    To analyse these links, which contribute to the Linked Data initiative
    To optimise writing and querying for larger models