Exploiting Semantic Web Techniques For Representing And Utilising Folksonomies

This presentation focuses on understanding different Semantic Web technologies in order to represent folksonomies.

  1. Exploiting Semantic Web Techniques for Representing and Utilising Folksonomies<br />Owen Sacco<br />
  2. Presentation Map<br />
  3. Presentation Map<br />Introduction<br />Aim & Goals<br />The Semantic Web<br />Meta Formats, Vocabularies & Query Language<br />Web 2.0<br />Web 2.0 Technologies & Applications<br />Folksonomies<br />Tags, Tagging, Representing Tags Semantically & Integrating Folksonomies with the Semantic Web<br />
  4. Presentation Map<br />Graph Mining Techniques<br />Fast Unfolding of Communities in Large Networks<br />State of the Art Tool<br />Examining the Edge List<br />The Community Structure Ontology<br />Jena & Corese<br />Creating & Querying RDF Statements<br />Analysis & Results<br />Conclusion<br />Enhancements & Future Work<br />
  5. Introduction<br />
  6. Introduction<br />The research is about:<br />Understanding various Semantic Web technologies for representing data semantically<br />Understanding folksonomies and how to represent them semantically<br />Semantically representing tags retrieved from Bibsonomy (http://www.bibsonomy.org/)<br />The tags have been structured hierarchically using the “fast unfolding of communities in large networks” algorithm<br />Using Semantic Web technologies to create and exploit this representation of tags<br />
  7. The Semantic Web<br />
  8. The Semantic Web<br />What is the Semantic Web?<br />Not a separate Web<br />An extension of the current Web<br />Semantic = Meaning<br />Semantic Web = Meaningful Data<br />Meaning is expressed as data about data, i.e. metadata<br />Advantages of the Semantic Web:<br />Information is given well-defined meaning<br />Better enabling computers and people to work in cooperation<br />Source: W3C Semantic Web<br />
  9. The Semantic Web<br />Resource Description Framework (RDF)<br />A framework for describing resources on the WWW<br />Suitable for merging data on the Web<br />Resources are uniquely identified by URIs<br />The RDF model is made up of triple statements<br />Triple statements: Subject, Predicate & Object<br />[Diagram: SUBJECT --PREDICATE--> OBJECT]<br />
  10. The Semantic Web<br />An RDF model can be serialised in RDF/XML<br />An example of an RDF document:<br /><?xml version="1.0"?><br /><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"><br /><contact:Person rdf:about="http://www.w3.org/People/EM/contact#me"> <contact:fullName>Eric Miller</contact:fullName> <contact:mailbox rdf:resource="mailto:em@w3.org"/> <contact:personalTitle>Dr.</contact:personalTitle> </contact:Person><br /></rdf:RDF><br />Source: W3C RDF Primer<br />
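To make the triple model concrete, here is a minimal Python sketch that turns the W3C Primer example into (subject, predicate, object) triples using only the standard library. It is a deliberately tiny subset, assumed for illustration: it understands typed node elements carrying `rdf:about` and property elements holding either text or `rdf:resource`. That is enough for this example but is far from a conforming RDF/XML parser. The tool described in this presentation relied on Jena for RDF handling.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The RDF Primer example, with the attribute spacing restored.
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def expand(qname):
    """Convert ElementTree's '{namespace}local' notation into a full URI."""
    namespace, local = qname[1:].split("}")
    return namespace + local


def rdfxml_to_triples(xml_text):
    """Extract triples from a tiny subset of RDF/XML (typed nodes + simple properties)."""
    root = ET.fromstring(xml_text)
    triples = []
    for node in root:  # e.g. <contact:Person rdf:about="...">
        subject = node.get("{%s}about" % RDF_NS)
        # The element name of a typed node gives an rdf:type statement.
        triples.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:  # e.g. <contact:fullName>, <contact:mailbox>
            resource = prop.get("{%s}resource" % RDF_NS)
            obj = resource if resource is not None else prop.text
            triples.append((subject, expand(prop.tag), obj))
    return triples


for triple in rdfxml_to_triples(DOC):
    print(triple)
```

Each printed tuple is one RDF statement. Because every subject and predicate is a full URI, triples from different documents about the same resource can be merged by simple set union.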
  11. The Semantic Web<br />Ontology<br />“A formal explicit specification of a shared conceptualisation”<br />In other words: parties that share a common conceptualisation of some data agree on its concepts and specify them as clearly as possible<br />It is an enabling technology for information sharing and manipulation<br />A vocabulary for RDF documents<br />Ontologies are based on RDF models and are expressed using the Web Ontology Language (OWL)<br />
  12. The Semantic Web<br />SPARQL – An RDF Query Language<br />Query in the Semantic Web context means: “Technologies and protocols that programmatically retrieve information from the Web of Data”.<br />Based on triple patterns similar to RDF triples<br />A query returns resources for all RDF triples that match the query’s pattern<br />Used to return complex data for mash-ups or for search engines containing semantic data<br />Syntax is similar to SQL<br />Source: W3C<br />
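Triple-pattern matching can be illustrated without a real SPARQL engine. The sketch below evaluates a basic graph pattern over a Python set of triples. A basic graph pattern is a list of triple patterns in which `?`-prefixed terms are variables. The sketch extends the variable bindings one pattern at a time. The `ex:` names are placeholders, and the sketch covers only the core of a SPARQL `WHERE` clause, with no `OPTIONAL`, `FILTER` or other operators.

```python
def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if impossible."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None          # variable already bound to something else
            extended[p] = t
        elif p != t:
            return None              # constant term does not match
    return extended


def evaluate(patterns, graph):
    """Return every variable binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("ex:malian", "ex:relatedTo", "ex:senegal"),
    ("ex:malian", "ex:relatedTo", "ex:guinea"),
    ("ex:senegal", "ex:label", "senegal"),
    ("ex:guinea", "ex:label", "guinea"),
}
# Roughly: SELECT ?name WHERE { ex:malian ex:relatedTo ?t . ?t ex:label ?name }
rows = evaluate([("ex:malian", "ex:relatedTo", "?t"), ("?t", "ex:label", "?name")], graph)
print(sorted(row["?name"] for row in rows))  # ['guinea', 'senegal']
```

The second pattern reuses `?t`, so only triples consistent with the first pattern's bindings survive. This shared-variable join is what makes SPARQL queries resemble SQL joins.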
  13. Web 2.0<br />
  14. Web 2.0<br />A “Read/Write” Web<br />Web 2.0 has:<br />Facilitated web design<br />Provided attractive, rich, easy-to-use interfaces<br />Assisted in the reuse of data by merging information from various sources<br />Created social networks of people<br />According to Internet World Stats, between 2000 and 2003 the number of users doubled thanks to Friendster (one of the first social networking websites)<br />Source: Internet World Stats - Internet Growth Statistics<br />
  15. Web 2.0<br />Web 2.0 is considered a Social Web<br />People are more involved, collaborating & sharing data<br />One of the major Web 2.0 technologies for web development is AJAX (Asynchronous JavaScript and XML)<br />A combination of several technologies:<br />HTML or XHTML<br />Cascading Style Sheets (CSS)<br />JavaScript<br />XML<br />
  16. Web 2.0<br />Web 2.0 created new application concepts:<br />Blogs (Blogger, WordPress)<br />Wikis (Wikipedia)<br />Really Simple Syndication (RSS)<br />Mashups (MusicMesh, BBC Music)<br />Social Networks (Facebook, LinkedIn, MySpace)<br />Social Bookmarking (delicious, Bibsonomy)<br />Photo Sharing (Flickr)<br />Video Sharing (YouTube, Vimeo)<br />Most of these concepts involve tagging!<br />
  17. Folksonomies<br />
  18. Folksonomies<br />Tag<br />“A non-hierarchical keyword or term”<br />Tagging<br />“Assign a tag to a piece of information or resources”<br />Tagger<br />“The person that assigns the tag”<br />Folksonomy<br />“The result of personal free tagging of information and objects for one’s own retrieval. The tagging is done in a social environment.” Thomas Vander Wal (2004)<br />
  19. Folksonomies<br />Tag Cloud<br />A visualisation of popular tags<br />Popular tags stand out from others by being shown in a larger font or emphasised<br />
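A common way to make popular tags stand out is to scale each tag's font size with the logarithm of its frequency. This stops a handful of very frequent tags from dwarfing everything else. The sketch below implements that heuristic. It is not a formula taken from any particular site: the pixel range and the example counts are arbitrary assumptions.

```python
import math


def font_sizes(counts, min_px=10, max_px=32):
    """Map tag -> usage count (each count >= 1) to tag -> font size in pixels.

    Sizes are interpolated linearly between min_px and max_px on a log scale.
    """
    logs = {tag: math.log(n) for tag, n in counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # all counts equal: avoid dividing by zero
    return {
        tag: round(min_px + (value - lo) / span * (max_px - min_px))
        for tag, value in logs.items()
    }


print(font_sizes({"semanticweb": 100, "folksonomy": 10, "malian": 1}))
# {'semanticweb': 32, 'folksonomy': 21, 'malian': 10}
```

On a linear scale, "folksonomy" would be barely larger than "malian". The log scale places it halfway between the two extremes.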
  20. Folksonomies<br />Where can we tag?<br />Social Bookmarking websites<br />
  21. Folksonomies<br />Picture sharing websites<br />
  22. Folksonomies<br />Video sharing websites<br />
  23. Folksonomies<br />Why tagging?<br />It’s Popular<br />Nowadays, practically anyone who uses a computer or the Internet is exposed to tagging in some way.<br />It’s Social<br />Through the most popular tags, we can see a kind of rough consensus on the subject of the resource.<br />It’s Flexible<br />Ad-hoc, free-form and does not adhere to any strict classification scheme or vocabulary.<br />
  24. Folksonomies<br />Basic Model<br />Taggers create the tags, and sometimes they add resources.<br />If we can identify something, then it can be tagged.<br />Tagging is open-ended: tags can be any kind of term.<br />Source: Smith G. 2008. Tagging People-Powered Metadata for the Social Web<br />
  25. Folksonomies<br />How about:<br />Collaboratively sharing tags across multiple applications<br />Collaborative filtering based on tagging<br />Connecting people based on tagging<br />All these can be achieved through Tag Ontologies<br />An ontology is not a taxonomy<br />An ontology establishes semantic agreement<br />Semantic agreement enables useful composition<br />
  26. Folksonomies<br />Richard Newman’s Tag Ontology<br />Source: Haklae Kim et al., Review and Alignment of Tag Ontologies for Semantically-Linked Data in Collaborative Tagging Spaces<br />
  27. Folksonomies<br />Tom Gruber’s Conceptual Model<br />Tagging(object, tag, tagger, source, + or -)<br />Source: Gruber T., Ontology for Folksonomy: A Mash-Up of Apples and Oranges.<br />
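Gruber's five-place relation maps naturally onto a record type. The sketch below is one illustrative Python reading of it. It assumes that a `polarity` field encodes the `+ or -` argument as +1/-1. It then sums the polarities of each tag on one object, which is one simple way such records could feed collaborative filtering. The URL, user names and sources are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagging:
    """Tagging(object, tag, tagger, source, + or -) as a record."""
    obj: str       # the tagged object, e.g. a bookmarked URL
    tag: str
    tagger: str
    source: str    # the application where the tagging took place
    polarity: int  # +1 for "+", -1 for "-" (assumed encoding)


def net_tag_scores(taggings, obj):
    """Sum the polarities of every tag applied to `obj`, across taggers and sources."""
    scores = Counter()
    for t in taggings:
        if t.obj == obj:
            scores[t.tag] += t.polarity
    return scores


paper = "http://example.org/paper"
taggings = [
    Tagging(paper, "folksonomy", "alice", "bibsonomy", +1),
    Tagging(paper, "folksonomy", "bob", "delicious", +1),
    Tagging(paper, "spam", "carol", "bibsonomy", -1),
]
print(net_tag_scores(taggings, paper))  # Counter({'folksonomy': 2, 'spam': -1})
```

The `source` field is what lets tags be shared across applications. Here two different sites contribute to the same score.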
  28. Folksonomies<br />Limitations of tagging:<br />Ambiguity of tags (example: is apple the fruit or the computer company?)<br />Synonymy, with no synonym control (example: lorry or truck)<br />Discrepancies in granularity (example: java vs programming language)<br />Flat organisation of folksonomies<br />How do we overcome these?<br />Use: CommonTag, MOAT, SCOT<br />
  29. Folksonomies<br />CommonTag<br />Links tags to concepts from databases such as Freebase and DBpedia<br />Source: CommonTag<br />
  30. Folksonomies<br />Meaning Of A Tag (MOAT)<br />An ontology to represent how different meanings (URIs of Semantic Web resources) can be related to a tag<br />Extends the Tag class from Richard Newman’s tag ontology<br />Tagging (User, Resource, Tag, Meaning)<br />Architecture of MOAT Framework:<br />MOAT server stores different meanings that can be queried<br />MOAT client interacts with the server to let users easily annotate their content<br />
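A tagging that carries a meaning URI lets a tool tell apart the two senses of an ambiguous tag such as apple. The sketch below is a toy approximation of the MOAT idea, not the MOAT ontology or its server API. A hypothetical in-memory `MEANINGS` table plays the role of the MOAT server, and a tagging is accepted only if its meaning is one the table knows for that tag. The DBpedia URIs serve only as example meaning identifiers.

```python
from typing import NamedTuple

# Hypothetical stand-in for a MOAT server: each tag maps to the URIs of its known meanings.
MEANINGS = {
    "apple": [
        "http://dbpedia.org/resource/Apple",
        "http://dbpedia.org/resource/Apple_Inc.",
    ],
}


class Tagging(NamedTuple):
    """MOAT-style Tagging(User, Resource, Tag, Meaning)."""
    user: str
    resource: str
    tag: str
    meaning: str


def tag_with_meaning(user, resource, tag, meaning, meanings=MEANINGS):
    """Client-side step: accept only a meaning the server knows for this tag."""
    if meaning not in meanings.get(tag, []):
        raise ValueError(f"unknown meaning {meaning!r} for tag {tag!r}")
    return Tagging(user, resource, tag, meaning)


def resources_with_meaning(taggings, meaning):
    """Resources tagged with a given meaning, whatever the surface tag was."""
    return {t.resource for t in taggings if t.meaning == meaning}


a = tag_with_meaning("u1", "http://example.org/pie-recipe", "apple",
                     "http://dbpedia.org/resource/Apple")
b = tag_with_meaning("u2", "http://example.org/macbook-review", "apple",
                     "http://dbpedia.org/resource/Apple_Inc.")
print(resources_with_meaning([a, b], "http://dbpedia.org/resource/Apple_Inc."))
```

The surface tag is identical in both taggings. Only the meaning URI separates the recipe from the laptop review, and the user must choose that meaning explicitly, which is the extra tagging step questioned below.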
  31. Folksonomies<br />Social Semantic Cloud of Tags (SCOT)<br />An ontology for representing sets of tags<br />Built on top of Richard Newman’s Tag Ontology<br />Source: SCOT: Let's Share Tags!<br />
  32. Folksonomies<br />Limitations of the previous ontologies:<br />An extra step is being added to the tagging activity<br />Isn’t it daunting for the user when presented with a list of meanings to choose from?<br />Which meaning shall the user choose?<br />Will tagging remain popular with this additional step?<br />If an automatic process is used to select a meaning of a tag, how accurate can this process be?<br />Can this process really understand the user at that moment?<br />
  33. Folksonomies<br />With this additional meaning, isn’t tagging becoming another “strict” classification scheme?<br />Can relationships between tags really be built on meanings?<br />How about using some form of algorithm that can unfold new relationships between tags?<br />
  34. Fast Unfolding of Communities in Large Networks<br />
  35. Fast Unfolding of Communities in Large Networks<br />A recursive method for extracting the community structure of large networks<br />The method is based on modularity optimisation<br />Modularity is a scalar value between -1 and 1 that measures the density of links inside communities as compared to links between communities<br />It unfolds a complete hierarchical community structure for large networks in a short time<br />Results have shown that on a network of 118 million nodes, the algorithm took 152 minutes<br />Source: Blondel V.D. et al. 2008. Fast unfolding of communities in large networks<br />
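For reference, the modularity being optimised is usually written as below. Here A_ij is the weight of the link between nodes i and j, k_i is the weighted degree of i, c_i is the community of i, δ is the Kronecker delta and m is the total link weight. The second line is a commonly used simplified form of the gain from moving an isolated node i (without a self-loop) into a community C. In it, k_{i,in} is the weight of the links from i into C and Σ_tot is the total weight of the links incident to nodes in C. The paper states this gain in an expanded form.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\qquad k_i = \sum_j A_{ij}, \qquad m = \frac{1}{2} \sum_{i,j} A_{ij}

\Delta Q_{i \to C} = \frac{k_{i,\mathrm{in}}}{m} - \frac{\Sigma_{\mathrm{tot}} \, k_i}{2m^2}
```

The gain depends only on quantities local to i and C. This locality is why each move can be evaluated cheaply even on very large networks.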
  36. Fast Unfolding of Communities in Large Networks<br />The algorithm consists of two phases which are iterated until maximum modularity is attained.<br />First, each node is assigned to its own community.<br />Then each node is compared with its neighbours: the node is placed in the neighbouring community that yields the maximum gain in modularity, and stays where it is if no gain is positive.<br />This process is repeated for all nodes until no further move improves modularity.<br />The second phase consists of building a network whose nodes are the communities found during the first phase.<br />
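Phase one can be sketched in a few dozen lines of Python. This is a simplified illustration of the local-moving step, not the authors' implementation. It makes the following assumptions:
- the graph is undirected and has no self-loops;
- the graph is given as a dict from (u, v) pairs to weights, with each pair listed once;
- nodes are visited in a fixed order;
- the gain of inserting node i into a neighbouring community c is the simplified form k_i,in / m - Σ_tot(c) · k_i / (2m²).

```python
from collections import defaultdict


def local_moving(edges):
    """Phase one of the method: greedily move nodes between neighbouring communities.

    `edges` maps (u, v) pairs of an undirected, self-loop-free graph to weights, with
    each pair listed once. Returns a dict node -> community label.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    m = sum(edges.values())                                  # total link weight
    k = {n: sum(nbrs.values()) for n, nbrs in adj.items()}   # weighted degrees
    community = {n: n for n in adj}                          # every node starts alone
    tot = dict(k)                                            # Sigma_tot of each community

    moved = True
    while moved:                     # sweep until no node changes community
        moved = False
        for i in adj:
            old = community[i]
            tot[old] -= k[i]         # take i out of its current community
            links = defaultdict(float)   # k_i,in for each neighbouring community
            for j, w in adj[i].items():
                links[community[j]] += w

            def gain(c):
                # Simplified modularity gain of inserting i into community c.
                return links.get(c, 0.0) / m - tot[c] * k[i] / (2 * m * m)

            best = old               # staying put is the default choice
            for c in links:
                if gain(c) > gain(best):
                    best = c
            tot[best] += k[i]
            community[i] = best
            moved = moved or best != old
    return community


# Two triangles joined by a single bridge edge c-d.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
         ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0, ("c", "d"): 1.0}
partition = local_moving(edges)
groups = defaultdict(set)
for node, label in partition.items():
    groups[label].add(node)
print(sorted(sorted(g) for g in groups.values()))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

A node moves only on a strictly positive improvement, so each move increases modularity and the loop terminates. Production implementations add refinements such as a minimum-improvement threshold.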
  37. Fast Unfolding of Communities in Large Networks<br />After the second phase, the process starts again with the first phase<br />A “pass” denotes a combination of both phases<br />Passes are iterated until there are no more changes and maximum modularity is reached for the whole network<br />The height of the hierarchy is the number of passes<br />At the end, a hierarchical structure is attained that consists of communities of communities.<br />
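Phase two, building the network of communities, amounts to relabelling each link's endpoints with their communities and summing the weights. The sketch below assumes the input is a dict from (u, v) pairs to weights plus a node-to-community mapping. Following the paper, links inside a community become a self-loop on the new node, here weighted by the total weight of those internal links. Feeding this output back into phase one requires a local-moving step that accounts for self-loops, which is omitted here.

```python
from collections import defaultdict


def aggregate(edges, community):
    """Build the next-level network whose nodes are the communities of `community`.

    `edges` maps (u, v) pairs of an undirected graph to weights; `community` maps
    node -> community label. Weights between the same pair of communities are summed,
    and weights inside a community become a self-loop (label, label).
    """
    new_edges = defaultdict(float)
    for (u, v), weight in edges.items():
        cu, cv = sorted((community[u], community[v]))  # store each undirected pair once
        new_edges[(cu, cv)] += weight
    return dict(new_edges)


edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
         ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0, ("c", "d"): 1.0}
community = {"a": "C1", "b": "C1", "c": "C1", "d": "C2", "e": "C2", "f": "C2"}
print(aggregate(edges, community))
# {('C1', 'C1'): 3.0, ('C2', 'C2'): 3.0, ('C1', 'C2'): 1.0}
```

The total link weight is preserved by aggregation. Each pass therefore yields a smaller network with the same overall weight, and stacking passes produces the communities-of-communities hierarchy.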
  38. Fast Unfolding of Communities in Large Networks<br />
  39. State of the Art Tool<br />
  40. State of the Art Tool<br />The Data<br />It is provided beforehand<br />Consists of a hierarchical structure made up of communities of communities of related tags<br />This hierarchical structure is constructed using the “Fast Unfolding of Communities in Large Networks” algorithm<br />The tags are from the Social Bookmarking Website Bibsonomy (http://www.bibsonomy.org/)<br />The aim of using the community structure algorithm is to unfold new relationships amongst tags<br />
  41. State of the Art Tool<br />A visualisation of the tagging graph that depicts the relationships amongst tags<br />
  42. State of the Art Tool<br />The input to the system consists of edge lists<br />Each edge list file corresponds to one level (pass) of the hierarchy<br />4 edge list files were used for this system:<br />The first list is a plain list of related tags queried from Bibsonomy<br />The other three lists denote communities or communities of communities computed by the community structure algorithm<br />Each relation (line) in the edge list files has the following form:<br />The first edge list: <tagi, tagj, weight><br />
  43. State of the Art Tool<br />The other three edge lists:<br /><communityi, tagj, weight> or <br /><communityi, communityj, weight><br />The edge list files contain:<br />The first (lowest level): 13126 nodes with 264718 edges<br />The second (first pass): 529 nodes with 6337 edges<br />The third (second pass): 65 nodes with 374 edges<br />The fourth (third pass): 50 nodes with 207 edges<br />
  44. State of the Art Tool<br />A sample from one of the edge lists (the lower level file)<br />caching,offlinebrowser,2.0<br />caching,archiving,2.0<br />institutions,activity,1.0<br />malian,senegal,2.0<br />malian,northern,2.0<br />malian,guinea,2.0<br />malian,drummers,2.0<br />cdf,c,1.0<br />cdf,library,1.0<br />
  45. State of the Art Tool<br />First Task: To semantically represent all edge lists that make up the hierarchical structure<br />Since the lowest-level edge list is made up of a set of tags, the tags will be described using the SCOT ontology<br />But to represent the hierarchical structure of communities, a new ontology must be designed, built on top of SCOT and linked to it<br />
  46. State of the Art Tool<br />The Community Structure Ontology<br />[Diagram: the Community Structure Ontology, with the labels CommunityStructure, UnfoldedCommunity, UnfoldingActivity, Community, CommunityAggregation, Linkage, linkedIn, associatedCommunity, linkedWith, linkedTag, communityOf, name, modularity, pass and linkWeight, and the external classes sioc:Resource and scot:Tag]<br />
  47. State of the Art Tool<br />The ontology was designed with Protégé, a Java application for designing ontologies<br />The ontology is built on OWL 2<br />Classes: CommunityStructure, Community, CommunityAggregation, Linkage<br />Object properties: associatedCommunity, communityOf, linkedIn, linkedTag, linkedWith, unfoldedCommunity, unfoldingActivity<br />Data properties: communityName, linkWeight, modularity, pass<br />
  48. State of the Art Tool<br />Second Task: To create an application that transforms the edge lists into RDF/XML statements and stores the documents on physical storage. The application also includes a query engine for querying the RDF/XML statements.<br />The application is developed in the Java programming language.<br />To create RDF/XML statements and write them to physical storage, a widely used API is embedded in the system: the Jena API<br />
  49. State of the Art Tool<br />Jena – A Semantic Web Framework<br />Developed by HP Labs<br />An RDF API for reading and writing RDF models in RDF/XML<br />An OWL API for reading and writing OWL ontologies<br />In-memory and persistent storage, for holding RDF statements in memory or on physical storage such as text files or even relational databases<br />SPARQL query engine<br />
  50. State of the Art Tool<br />The Tool<br />
  51. State of the Art Tool<br />The tool provides the following features:<br />Properties to set up:<br />The edge list directory<br />The edge list file structure<br />
  52. State of the Art Tool<br />Settings to set up the type of storage required:<br />RDF/XML documents<br />
  53. State of the Art Tool<br />Relational database persistent storage<br />TDB storage, Jena's fast native persistent triple store<br />
  54. State of the Art Tool<br />Properties to set up the ontologies<br />
  55. State of the Art Tool<br />The method to transform the edge lists into RDF statements:<br />First, the edge lists are merged together and ordered according to their hierarchical structure<br />Second, the RDF model consisting of RDF statements is created according to the Community Structure and SCOT ontologies<br />Third, the RDF statements are written according to the chosen storage settings.<br />
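The slides do not spell out how each edge maps onto the Community Structure Ontology's properties. The sketch below therefore shows only the general shape of the transformation, using a generic reified-link pattern with hypothetical `http://example.org/` names for nodes, links and the properties `source`, `target` and `weight`. It turns weighted edges into N-Triples text with the Python standard library. The actual tool built its model with Jena, using the Community Structure and SCOT terms.

```python
from urllib.parse import quote

EX = "http://example.org/"  # hypothetical namespace for nodes, links and properties
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def node(name):
    """IRI for a tag or community name; percent-encode so any name is IRI-safe."""
    return f"<{EX}node/{quote(name, safe='')}>"


def edge_to_ntriples(index, source, target, weight):
    """Reify one weighted edge as a link resource with source, target and weight."""
    link = f"<{EX}link/{index}>"
    return [
        f"{link} <{RDF_TYPE}> <{EX}Link> .",
        f"{link} <{EX}source> {node(source)} .",
        f"{link} <{EX}target> {node(target)} .",
        f'{link} <{EX}weight> "{float(weight)}"^^<{XSD_DOUBLE}> .',
    ]


def edges_to_ntriples(edges):
    """Serialise an iterable of (source, target, weight) tuples as N-Triples text."""
    lines = []
    for index, (source, target, weight) in enumerate(edges):
        lines.extend(edge_to_ntriples(index, source, target, weight))
    return "\n".join(lines) + "\n"


print(edges_to_ntriples([("malian", "senegal", "2.0")]), end="")
```

Reifying each edge as its own resource is what allows the weight to be attached. A plain `source target` triple would have nowhere to carry it.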
  56. State of the Art Tool<br />Writing of RDF Statements<br />RDF Documents:<br />Whole documents: the document is written after the whole model has been created<br />Split documents: a document is written as soon as the model for each community has been created<br />Two index lists are created: one alphabetical (A-Z) and another indicating where each community document is located<br />
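The split-document strategy and its two index lists can be sketched as follows. The sketch writes one file per community and records both indexes as it goes. The `<name>.rdf` file-naming scheme is a hypothetical choice that assumes community names are safe file names. Each document body is just a string standing in for the serialised RDF/XML.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def write_split_documents(communities, out_dir):
    """Write one document per community and build the two index lists.

    `communities` maps a community name to its serialised RDF (any string here).
    Returns (az_index, location_index): first letter -> sorted names, and name -> path.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    az_index = defaultdict(list)
    location_index = {}
    for name in sorted(communities):
        path = out / f"{name}.rdf"  # hypothetical naming scheme
        path.write_text(communities[name], encoding="utf-8")
        az_index[name[0].upper()].append(name)
        location_index[name] = str(path)
    return dict(az_index), location_index


with tempfile.TemporaryDirectory() as tmp:
    az, loc = write_split_documents(
        {"malian": "<rdf:RDF/>", "cdf": "<rdf:RDF/>", "caching": "<rdf:RDF/>"}, tmp)
    print(az)  # {'C': ['caching', 'cdf'], 'M': ['malian']}
```

The alphabetical index supports browsing, and the location index lets a query jump straight to one community's file without loading the others.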
  57. State of the Art Tool<br />Writing of RDF Statements<br />RDF Persistent Storage<br />RDB Method: MySQL is used as a persistent relational database, and RDF statements are written on-the-fly, i.e. each statement is written to the database as soon as it is created<br />TDB Method: each statement is written on-the-fly as well<br />
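The on-the-fly strategy can be sketched with a single relational triple table. This is a stand-in built on assumptions. It uses Python's built-in `sqlite3` instead of MySQL, and a plain (subject, predicate, object) table rather than the layouts of Jena's own database back ends. It shows only the idea of committing each statement as soon as it is created. The `ex:` names are placeholders.

```python
import sqlite3


def open_store(path=":memory:"):
    """A minimal relational triple table; SQLite stands in for MySQL in this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL,"
        " PRIMARY KEY (subject, predicate, object))"
    )
    return conn


def add_triple(conn, s, p, o):
    """Write one statement as soon as it is created (the 'on-the-fly' strategy)."""
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))
    conn.commit()  # one commit per statement: simple, but slow for large models


def objects(conn, s, p):
    """All objects for a given subject and predicate, in sorted order."""
    rows = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY object",
        (s, p),
    )
    return [row[0] for row in rows]


conn = open_store()
add_triple(conn, "ex:malian", "ex:related", "ex:senegal")
add_triple(conn, "ex:malian", "ex:related", "ex:guinea")
add_triple(conn, "ex:malian", "ex:related", "ex:guinea")  # duplicate is ignored
print(objects(conn, "ex:malian", "ex:related"))  # ['ex:guinea', 'ex:senegal']
```

Batching many inserts per transaction is the usual way to speed up the writes. The trade-off is that statements only become visible when each batch is committed.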
  58. State of the Art Tool<br />Writing of RDF Statements (Results)<br />
  59. State of the Art Tool<br />Querying Statements<br />For RDF documents, the Corese SPARQL engine was used<br />The Corese SPARQL engine is developed by the Edelweiss team<br />Built on top of Jena with some added enhancements such as approximate search and select expressions<br />It queries only RDF documents and cannot query relational databases directly<br />
  60. State of the Art Tool<br />Querying Statements<br />For persistent storage, the Jena SPARQL engine is used, since Jena allows direct querying<br />Querying Methods<br />RDF Documents (Split Documents):<br />First, query the index lists<br />Get the community document<br />Query the community document and get the linked communities<br />Query the index list and the contents of each linked community<br />
  61. State of the Art Tool<br />Querying Methods<br />RDF Documents (Whole Documents)<br />Query the whole model for the community<br />Retrieve the linked communities<br />Query the linked communities for their content<br />Persistent Storage<br />Query the whole model for the community<br />Retrieve the linked communities<br />Query the linked communities for their content<br />
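The split-document query flow can be sketched as follows. Plain dicts stand in for the location index and the RDF documents, and all community names, paths and document structures are hypothetical. Each linked community costs one extra index lookup and one extra document load. The whole-model and persistent-storage flows avoid these lookups by querying a single model.

```python
def expand_community(name, location_index, load_document):
    """Split-document query flow for one community.

    1. look the community up in the location index,
    2. load its document and read its linked tags and linked communities,
    3. look each linked community up in the index and load its document too.
    `load_document` returns a dict with 'linked_tags' and 'linked_communities'
    (a hypothetical in-memory stand-in for parsing and querying an RDF document).
    """
    document = load_document(location_index[name])
    linked = {}
    for other in document["linked_communities"]:
        other_document = load_document(location_index[other])  # one load per link
        linked[other] = list(other_document["linked_tags"])
    return {"tags": list(document["linked_tags"]), "linked": linked}


# Hypothetical stored documents keyed by location.
documents = {
    "docs/c1.rdf": {"linked_tags": ["malian", "senegal"], "linked_communities": ["c2"]},
    "docs/c2.rdf": {"linked_tags": ["drummers"], "linked_communities": ["c1"]},
}
location_index = {"c1": "docs/c1.rdf", "c2": "docs/c2.rdf"}
print(expand_community("c1", location_index, documents.__getitem__))
# {'tags': ['malian', 'senegal'], 'linked': {'c2': ['drummers']}}
```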
  62. State of the Art Tool<br />Querying Statements (Results)<br />Results are based on a community called malian<br />This community has 57 linked communities and 15 linked tags<br />
  63. State of the Art Tool<br />Other features<br />RDF Document Viewer<br />
  64. Conclusion<br />
  65. Conclusion<br />In this research we have seen the importance of the Semantic Web and of describing Web data semantically<br />We have seen the importance of using folksonomies for search and exploration<br />Additionally, we have seen various ontologies for representing folksonomies semantically<br />Using community structure algorithms and graph mining techniques, new relationships amongst tags can be unfolded<br />
  66. Conclusion<br />An ontology was designed and developed for representing the community structure produced by the fast unfolding of communities in large networks<br />From this ontology, RDF/XML statements can be created and linked to the SCOT ontology<br />We have seen that querying is much faster when triple statements are held in persistent triple stores<br />
  67. Future Enhancements<br />
  68. Future Enhancements<br />To try this model on larger tag models from different websites<br />To include the tagger and links to the actual resource<br />To analyse how these links contribute to the Linked Data initiative<br />To optimise writing and querying for larger models<br />