Exploiting Semantic Web Techniques for Representing and Utilising Folksonomies
Owen Sacco
Presentation Map
page 1
Presentation Map
• Introduction
  • Aim & Goals
• The Semantic Web
  • Meta Formats, Vocabularies & Query Language
• Web 2.0
  • Web 2.0 Technologies & Applications
• Folksonomies
  • Tags, Tagging, Representing Tags Semantically & Integrating Folksonomies with the Semantic Web
page 2
Presentation Map
• Graph Mining Techniques
  • Fast Unfolding of Communities in Large Networks
• State of the Art Tool
  • Examining the Edge List
  • The Community Structure Ontology
  • Jena & Corese
  • Creating & Querying RDF Statements
  • Analysis & Results
• Conclusion
  • Enhancements & Future Work
page 3
Introduction
page 4
Introduction
The research is about:
• Understanding various Semantic Web technologies for representing data semantically
• Understanding folksonomies and how to represent them semantically
• Semantically representing tags retrieved from Bibsonomy (http://www.bibsonomy.org/); the tags have been hierarchically structured using the “fast unfolding of communities in large networks” algorithm
• Using Semantic Web technologies to create and exploit such a representation of tags
page 5
The Semantic Web
page 6
The Semantic Web
What is the Semantic Web?
• Not a separate Web, but an extension of the current Web
• Semantic = meaning
• Semantic Web = meaningful data
• Meaning is expressed through data about data, i.e. metadata
Advantages of the Semantic Web:
• Information is given well-defined meaning
• This better enables computers and people to work in cooperation
Source: W3C Semantic Web
page 7
The Semantic Web
Resource Description Framework (RDF)
• A framework that describes resources on the WWW
• Suitable for merging data on the Web
• Resources are uniquely identified by URIs
• The RDF model is made up of triple statements
• Triple statements: Subject, Predicate & Object
[Diagram: a PREDICATE arc linking a SUBJECT node to an OBJECT node]
page 8
The Semantic Web
An RDF model can be serialised in RDF/XML. An example of an RDF document:

    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
      <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
        <contact:fullName>Eric Miller</contact:fullName>
        <contact:mailbox rdf:resource="mailto:em@w3.org"/>
        <contact:personalTitle>Dr.</contact:personalTitle>
      </contact:Person>
    </rdf:RDF>

Source: W3C RDF Primer
page 9
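The following Python sketch uses only the standard library to extract the statements in the RDF/XML example as (subject, predicate, object) tuples. This shows the triple structure the RDF/XML encodes. It is a toy and not a conforming RDF/XML parser. It assumes a flat document like the one above, and it represents both literals and URIs as plain strings.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) triples from simple, flat RDF/XML.

    Handles only typed node elements carrying rdf:about whose properties are
    literals or rdf:resource references; blank nodes, nesting and
    rdf:parseType are deliberately ignored in this sketch.
    """
    triples = []
    for node in ET.fromstring(text):            # children of <rdf:RDF>
        subject = node.get("{%s}about" % RDF)
        # A typed node element (here contact:Person) stands for an rdf:type triple.
        triples.append((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            resource = prop.get("{%s}resource" % RDF)
            obj = resource if resource is not None else prop.text
            triples.append((subject, expand(prop.tag), obj))
    return triples


for triple in triples_from_rdfxml(DOC):
    print(triple)
```

The example yields four triples. One is the rdf:type statement implied by contact:Person, and there is one triple for each property element. For real data, an RDF library such as Jena, which is used later in this work, should do the parsing.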
The Semantic Web
Ontology
• “A formal explicit specification of a shared conceptualisation”
• In other words: parties sharing a common conceptualisation of data agree on it and specify it as clearly as possible
• An enabling technology for information sharing and manipulation
• A vocabulary for RDF documents
• Ontologies are based on RDF models and are expressed using the Web Ontology Language (OWL)
page 10
The Semantic Web
SPARQL: An RDF Query Language
• In the Semantic Web context, query means “Technologies and protocols that programmatically retrieve information from the Web of Data”.
• Based on triple patterns similar to RDF triples
• A query returns the resources from all RDF triples that match the query’s pattern
• Used to return complex data for mash-ups or for search engines containing semantic data
• The syntax is similar to SQL
Source: W3C
page 11
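Below is a minimal Python sketch of the triple-pattern matching described above. It assumes that triples are plain string tuples and that any pattern term starting with "?" is a variable. It evaluates a conjunction of patterns, which SPARQL calls a basic graph pattern, by joining variable bindings one pattern at a time. It ignores everything else SPARQL offers, such as FILTER, OPTIONAL and projection. The data is illustrative.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms starting with '?' are variables; every other term must match exactly.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None          # variable already bound to a different value
        elif term != value:
            return None              # constant term does not match
    return extended


def evaluate(triples, patterns):
    """Evaluate a basic graph pattern: every pattern must match, sharing variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Illustrative data; prefixed names are kept as plain strings for brevity.
data = [
    ("ex:em", "contact:fullName", "Eric Miller"),
    ("ex:em", "contact:personalTitle", "Dr."),
    ("ex:alice", "contact:fullName", "Alice"),
]
# Roughly: SELECT * WHERE { ?p contact:fullName ?name . ?p contact:personalTitle ?title }
print(evaluate(data, [("?p", "contact:fullName", "?name"),
                      ("?p", "contact:personalTitle", "?title")]))
```

Only ex:em satisfies both patterns, so there is exactly one binding: ?p = ex:em, ?name = "Eric Miller", ?title = "Dr.".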
Web 2.0
page 12
Web 2.0
A “Read/Write” Web
Web 2.0 has:
• Facilitated web design
• Provided attractive, rich, easy-to-use interfaces
• Assisted in the reuse of data by merging information from various sources
• Created social networks of people
According to Internet World Stats, the number of users doubled between 2000 and 2003 thanks to Friendster (one of the first social network websites).
Source: Internet World Stats - Internet Growth Statistics
page 13
Web 2.0
• Web 2.0 is considered a Social Web
• People are more involved, collaborating & sharing data
• One of the major Web 2.0 technologies for web development is AJAX, a combination of several technologies:
  • HTML or XHTML
  • Cascading Style Sheets (CSS)
  • JavaScript
  • XML
page 14
Web 2.0
Web 2.0 created new application concepts:
• Blogs (Blogger, WordPress)
• Wikis (Wikipedia)
• Really Simple Syndication (RSS)
• Mashups (MusicMesh, BBC Music)
• Social networks (Facebook, LinkedIn, MySpace)
• Social bookmarking (Delicious, Bibsonomy)
• Photo sharing (Flickr)
• Video sharing (YouTube, Vimeo)
In most of these concepts you find tagging!
page 15
Folksonomies
page 16
Folksonomies
• Tag: “A non-hierarchical keyword or term”
• Tagging: “Assign a tag to a piece of information or resources”
• Tagger: “The person that assigns the tag”
• Folksonomy: “The result of personal free tagging of information and objects for one’s own retrieval. The tagging is done in a social environment.” (Thomas Vander Wal, 2004)
page 17
Folksonomies
Tag Cloud
• A visualisation of popular tags
• Popular tags stand out from the others by being shown in a larger font or with emphasis
page 18
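Here is one simple way to derive tag-cloud font sizes, sketched in Python. It assumes linear scaling between a minimum and a maximum point size; logarithmic scaling is another common choice. The tag counts are made up.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_pt=10, max_pt=32):
    """Map each tag to a font size (in points) that grows linearly with its frequency."""
    counts = Counter(tags)
    lowest, highest = min(counts.values()), max(counts.values())
    spread = (highest - lowest) or 1     # avoid division by zero when all counts are equal
    return {tag: round(min_pt + (n - lowest) * (max_pt - min_pt) / spread)
            for tag, n in counts.items()}


print(tag_cloud_sizes(["java", "java", "java", "rdf", "rdf", "music"]))
# {'java': 32, 'rdf': 21, 'music': 10}
```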
Folksonomies
Where can we tag?
• Social bookmarking websites
page 19
Folksonomies
• Picture sharing websites
page 20
Folksonomies
• Video sharing websites
page 21
Folksonomies
Why tagging?
• It’s popular: nowadays, practically anyone who uses a computer or the Internet is exposed to tagging in some way.
• It’s social: the most popular tags show a rough consensus on the subject of a resource.
• It’s flexible: it is ad hoc and free-form, and does not adhere to any strict classification scheme or vocabulary.
page 22
Folksonomies
Basic Model
• Taggers create the tags, and sometimes they add resources.
• If we can identify something, then it can be tagged.
• Tagging is open-ended: tags can be any kind of term.
Source: Smith G. 2008. Tagging: People-Powered Metadata for the Social Web
page 23
Folksonomies
How about:
• Collaboratively sharing tags across multiple applications
• Collaborative filtering based on tagging
• Connecting people based on tagging
All these can be achieved through tag ontologies:
• An ontology is not a taxonomy
• An ontology establishes semantic agreement
• Semantic agreement enables useful composition
page 24
Folksonomies
Richard Newman’s Tag Ontology
Source: Haklae Kim et al., Review and Alignment of Tag Ontologies for Semantically-Linked Data in Collaborative Tagging Spaces
page 25
Folksonomies
Tom Gruber’s Conceptual Model
Tagging(object, tag, tagger, source, + or -)
Source: Gruber T., Ontology for Folksonomy: A Mash-Up of Apples and Oranges.
page 26
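Gruber's tuple maps naturally onto a record type. The Python sketch below encodes polarity as +1 or -1. Purely for illustration, it also sums the votes for each object, tag and source. The field names, the aggregation and the data are assumptions, not part of Gruber's model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagging:
    """One tagging fact in Gruber's model: Tagging(object, tag, tagger, source, +/-)."""
    obj: str       # the content being tagged
    tag: str       # the label used
    tagger: str    # who tagged it
    source: str    # the system in which the tagging is stored
    polarity: int  # +1 asserts the tagging fact, -1 disputes it


def net_votes(taggings):
    """Sum the polarity 'votes' for each (object, tag, source) combination."""
    totals = defaultdict(int)
    for t in taggings:
        totals[(t.obj, t.tag, t.source)] += t.polarity
    return dict(totals)


page = "http://example.org/page"
votes = net_votes([
    Tagging(page, "apple", "alice", "bibsonomy", +1),
    Tagging(page, "apple", "bob", "bibsonomy", +1),
    Tagging(page, "apple", "carol", "bibsonomy", -1),
])
print(votes)  # {('http://example.org/page', 'apple', 'bibsonomy'): 1}
```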
Folksonomies
Limitations of tagging:
• Ambiguity of tags (example: apple, is it a fruit or the computer company?)
• Lack of synonymy (example: lorry or truck)
• Discrepancies in granularity (example: java vs programming language)
• Flat organisation of folksonomies
How do we overcome these? Use CommonTag, MOAT or SCOT.
page 27
Folksonomies
CommonTag
• Adds concepts to tags from databases such as Freebase and DBpedia
Source: CommonTag
page 28
Folksonomies
Meaning Of A Tag (MOAT)
• An ontology to represent how different meanings (URIs of Semantic Web resources) can be related to a tag
• Extends the Tag class from Richard Newman’s tag ontology
• Tagging (User, Resource, Tag, Meaning)
Architecture of the MOAT framework:
• The MOAT server stores the different meanings, which can be queried
• The MOAT client interacts with the server to let users easily annotate their content
page 29
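The following Python sketch models the MOAT tagging tuple. A hypothetical in-memory dictionary stands in for the MOAT server's store of meanings. The two DBpedia URIs illustrate the "apple" ambiguity from the limitations slide. The function name and the data are illustrative, not MOAT's actual API.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a MOAT server: tag -> candidate meaning URIs.
MEANINGS = {
    "apple": ["http://dbpedia.org/resource/Apple",       # the fruit
              "http://dbpedia.org/resource/Apple_Inc."],  # the company
}


@dataclass(frozen=True)
class MoatTagging:
    """Tagging(User, Resource, Tag, Meaning) as listed on the slide."""
    user: str
    resource: str
    tag: str
    meaning: str


def tag_with_meaning(user, resource, tag, meaning):
    """Record a tagging only if `meaning` is one of the known meanings of `tag`."""
    if meaning not in MEANINGS.get(tag, []):
        raise ValueError(f"{meaning!r} is not a known meaning of {tag!r}")
    return MoatTagging(user, resource, tag, meaning)


t = tag_with_meaning("alice", "http://example.org/r1", "apple",
                     "http://dbpedia.org/resource/Apple_Inc.")
print(t.meaning)
```

The validation step mirrors the extra choice the user must make when tagging. The next slides question exactly this step.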
Folksonomies
Social Semantic Cloud of Tags (SCOT)
• An ontology aimed at representing sets of tags
• Built on top of Richard Newman’s Tag Ontology
Source: SCOT: Let's Share Tags!
page 30
Folksonomies
Limitations of the previous ontologies:
• An extra step is added to the tagging activity
• Isn’t it daunting for users to be presented with a list of meanings to choose from? Which meaning should the user choose?
• Will tagging remain popular with this additional step?
• If an automatic process is used to select the meaning of a tag, how accurate can this process be? Can it really understand the user at that instant?
page 31
Folksonomies
• With this additional meaning, isn’t tagging becoming another “strict” classification scheme?
• Can relationships between tags really be built on meanings?
• How about using an algorithm that can unfold new relationships between tags?
page 32
Fast Unfolding of Communities in Large Networks
page 33
Fast Unfolding of Communities in Large Networks
• A recursive method to extract the community structure of large networks
• The method is based on modularity optimisation
• Modularity is a scalar value that measures the density of links inside communities compared with links between communities
• It unfolds a complete hierarchical community structure for large networks in a short time
• On a network of 118 million nodes, the algorithm took 152 minutes
Source: Blondel V.D. et al. 2008. Fast unfolding of communities in large networks
page 34
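For reference, the modularity being optimised is usually defined as below. Here A_ij is the weight of the link between nodes i and j, k_i = Σ_j A_ij is the weighted degree of i, and m = ½ Σ_ij A_ij is the total link weight. The symbol c_i is the community of i, and δ(c_i, c_j) is 1 when i and j share a community and 0 otherwise. The second form sums over communities, where L_c is the total weight of links inside community c and d_c is the total degree of its nodes.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
  = \sum_{c} \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^{2} \right]
```

As a worked example, take two triangles joined by a single edge, so m = 7. Splitting them into the two triangles gives each community L_c = 3 and d_c = 7. The modularity is therefore Q = 2(3/7 - 1/4) = 5/14 ≈ 0.357.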
Fast Unfolding of Communities in Large Networks
The algorithm consists of two phases, which are iterated until maximum modularity is attained.
• Phase one: initially, every node is assigned to its own community. Each node is then compared with its neighbours and placed in the neighbouring community that yields the maximum gain in modularity. If no move gives a positive gain, the node stays where it is. This is repeated over all nodes until no further movement can be attained.
• Phase two: a new network is built whose nodes are the communities found during phase one. The weight of a link between two new nodes is the sum of the weights of the links between the corresponding communities, and links inside a community become a self-loop.
page 35
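Phase one can be sketched in Python as follows. This is an illustrative re-implementation, not Blondel et al.'s reference code. It assumes an undirected weighted graph stored as a symmetric dict of dicts with at least one edge, and it visits nodes in insertion order. It uses the standard simplification that the gain of moving an isolated node i into community C is k_i,in/m - Σ_tot·k_i/(2m²). Here k_i,in is the link weight from i into C and Σ_tot is the total degree of C.

```python
from collections import defaultdict


def one_level(adj):
    """Phase one of the Louvain method (a sketch).

    adj maps node -> {neighbour: weight} and must be symmetric; a self-loop
    adj[i][i] = w counts 2*w towards the degree of i. Returns node -> community.
    """
    degree = {i: sum(2 * w if j == i else w for j, w in nbrs.items())
              for i, nbrs in adj.items()}
    m = sum(degree.values()) / 2             # total edge weight
    community = {i: i for i in adj}          # every node starts in its own community
    sigma_tot = dict(degree)                 # total degree of each community

    moved = True
    while moved:                             # repeat until no node changes community
        moved = False
        for i in adj:
            current = community[i]
            k_in = defaultdict(float)        # weight from i into each neighbouring community
            for j, w in adj[i].items():
                if j != i:
                    k_in[community[j]] += w
            sigma_tot[current] -= degree[i]  # take i out of its community

            def gain(c):
                # Modularity gain of inserting the isolated node i into community c.
                return k_in[c] / m - sigma_tot[c] * degree[i] / (2 * m * m)

            best = max(k_in, key=gain, default=current)
            if gain(best) <= gain(current):
                best = current               # move only for a strictly larger gain
            sigma_tot[best] += degree[i]
            community[i] = best
            moved |= best != current
    return community


def graph(edges):
    """Build a symmetric dict-of-dicts graph from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


# Two triangles joined by the single edge c-d.
adj = graph([("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 1)])
print(one_level(adj))
```

On this graph the sketch groups {a, b, c} and {d, e, f} into separate communities. Each community's label is simply the id of whichever node its members ended up joining.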
Fast Unfolding of Communities in Large Networks
• After the second phase, the process starts again with the first phase
• A “pass” denotes one combination of both phases
• Passes are iterated until there are no more changes and maximum modularity is reached for the whole network
• The height of the resulting hierarchy equals the number of passes
• At the end, a hierarchical structure is attained that consists of communities of communities
page 36
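Phase two can be sketched in Python under the same assumptions as a typical phase-one implementation. The graph is a symmetric dict of dicts, and the partition comes from phase one. Feeding the aggregated graph back into phase one starts the next pass.

```python
def aggregate(adj, community):
    """Phase two of the Louvain method (sketch): one node per community.

    adj maps node -> {neighbour: weight} (symmetric; self-loop adj[i][i] = w);
    community maps node -> community label. Weights between communities are
    summed, and weights inside a community become a self-loop.
    """
    new_adj = {}
    for i, nbrs in adj.items():
        ci = community[i]
        row = new_adj.setdefault(ci, {})
        for j, w in nbrs.items():
            cj = community[j]
            if i != j and ci == cj:
                w = w / 2        # an internal edge is seen from both ends: count it once
            row[cj] = row.get(cj, 0) + w
    return new_adj


def graph(edges):
    """Build a symmetric dict-of-dicts graph from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


# Two triangles joined by the single edge c-d, with the partition phase one finds.
adj = graph([("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 1)])
partition = {"a": "L", "b": "L", "c": "L", "d": "R", "e": "R", "f": "R"}
print(aggregate(adj, partition))  # {'L': {'L': 3.0, 'R': 1}, 'R': {'R': 3.0, 'L': 1}}
```

Each triangle collapses into one node with a self-loop of weight 3, and the connecting edge becomes a link of weight 1. If the self-loop is counted twice, each new node keeps the total degree (7) of the community it replaces. This is why the modularity of the new network equals that of the partition it came from.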
Fast Unfolding of Communities in Large Networks
page 37
State of the Art Tool
page 38
State of the Art Tool
The Data
• It is provided beforehand
• It consists of a hierarchical structure made up of communities of communities of related tags
• This hierarchical structure is constructed using the “Fast Unfolding of Communities in Large Networks” algorithm
• The tags are from the social bookmarking website Bibsonomy (http://www.bibsonomy.org/)
• The aim of using the community structure algorithm is to unfold new relationships between tags
page 39
State of the Art Tool
A visualisation of the tagging graph, depicting the relationships between tags
page 40
State of the Art Tool
The input to the system consists of edge lists:
• Each edge list file corresponds to one level of the hierarchy
• Four edge list files were used for this system:
  • The first list is a plain list of related tags queried from Bibsonomy
  • The other three lists denote communities, or communities of communities, computed by the community structure algorithm
Each relation (line) in the edge list files is structured as follows:
• The first edge list: <tag_i, tag_j, weight>
page 41
State of the Art Tool
• The other three edge lists: <community_i, tag_j, weight> or <community_i, community_j, weight>
The edge list files contain:
• The first (lower level): 13126 nodes with 264718 edges
• The second (first pass): 529 nodes with 6337 edges
• The third (second pass): 65 nodes with 374 edges
• The fourth (third pass): 50 nodes with 207 edges
page 42
State of the Art Tool
A sample from one of the edge lists (the lower-level file):
    caching,offlinebrowser,2.0
    caching,archiving,2.0
    institutions,activity,1.0
    malian,senegal,2.0
    malian,northern,2.0
    malian,guinea,2.0
    malian,drummers,2.0
    cdf,c,1.0
    cdf,library,1.0
page 43
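The following Python sketch reads an edge list in the comma-separated form shown above and reports its size, as in the node and edge counts listed earlier. It assumes exactly three fields per line and no header row.

```python
import csv


def read_edge_list(lines):
    """Parse 'source,target,weight' lines into (source, target, weight) tuples."""
    edges = []
    for row in csv.reader(lines):
        if not row:                      # skip blank lines
            continue
        source, target, weight = row
        edges.append((source.strip(), target.strip(), float(weight)))
    return edges


def summarise(edges):
    """Return (number of distinct nodes, number of edges)."""
    nodes = {node for source, target, _ in edges for node in (source, target)}
    return len(nodes), len(edges)


sample = """caching,offlinebrowser,2.0
caching,archiving,2.0
institutions,activity,1.0
malian,senegal,2.0
malian,northern,2.0
malian,guinea,2.0
malian,drummers,2.0
cdf,c,1.0
cdf,library,1.0""".splitlines()

print(summarise(read_edge_list(sample)))  # (13, 9)
```

For a real file, open it with open(path, newline="") and pass the file object to read_edge_list. This works because csv.reader accepts any iterable of lines.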
State of the Art Tool
First task: to semantically represent all the edge lists that make up the hierarchical structure
• Since the lower-level edge list is made up of a set of tags, the tags are described using the SCOT ontology
• To represent the hierarchical structure of communities, however, a new ontology must be designed that is built on top of SCOT and linked to it
page 44
State of the Art Tool
The Community Structure Ontology
[Diagram of the ontology. Labels shown: CommunityStructure, UnfoldedCommunity, UnfoldingActivity, Community, CommunityAggregation, linkedIn, associatedCommunity, linkedWith, Linkage, name, sioc:Resource, modularity, pass, linkedTag, communityOf, linkWeight, scot:Tag]
page 45
State of the Art Tool
• The ontology was designed with Protégé, a Java application for designing ontologies
• The ontology is built on OWL 2
• Classes: CommunityStructure, Community, CommunityAggregation, Linkage
• Object properties: associatedCommunity, communityOf, linkedIn, linkedTag, linkedWith, unfoldedCommunity, unfoldingActivity
• Data properties: communityName, linkWeight, modularity, pass
page 46
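The following Python sketch shows how one upper-level edge-list row might be turned into triples using the class and property names above. Everything beyond those names is hypothetical. This includes the namespaces (the SCOT namespace URI is assumed), the URI scheme, and the use of one Linkage node per row. Which property connects which resources is also a guess read from the diagram, not the ontology's actual definitions.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCOT = "http://scot-project.org/scot/ns#"   # SCOT namespace (assumed)
CSO = "http://example.org/cso#"             # hypothetical namespace for the new ontology
BASE = "http://example.org/data/"           # hypothetical base for minted resource URIs


def row_to_triples(row_id, source, target, weight, target_is_tag):
    """Turn one row <community, tag|community, weight> into triples.

    How the properties connect (their domains and ranges) is a guess.
    """
    community = BASE + "community/" + source
    link = BASE + "linkage/" + str(row_id)     # one Linkage node per row
    triples = [
        (community, RDF_TYPE, CSO + "Community"),
        (community, CSO + "communityName", source),
        (community, CSO + "linkedIn", link),
        (link, RDF_TYPE, CSO + "Linkage"),
        (link, CSO + "linkWeight", weight),
    ]
    if target_is_tag:
        tag = BASE + "tag/" + target
        triples += [(tag, RDF_TYPE, SCOT + "Tag"), (link, CSO + "linkedTag", tag)]
    else:
        triples.append((link, CSO + "linkedWith", BASE + "community/" + target))
    return triples


for triple in row_to_triples(1, "17", "malian", 2.0, target_is_tag=True):
    print(triple)
```

Reifying each row as a Linkage node gives the link weight something to hang off. A plain triple cannot carry a weight.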
State of the Art Tool
Second task: to create an application that transforms the edge lists into RDF/XML statements and stores the documents on physical storage. A query engine is also included in the application to query the RDF/XML statements.
• The application is developed in the Java programming language.
• To create RDF/XML statements and write them to physical storage, the system embeds a widely used API: the Jena API.
page 47
State of the Art Tool
Jena: A Semantic Web Framework
• Developed by HP
• An RDF API for reading and writing RDF models in RDF/XML
• An OWL API for reading and writing OWL ontologies
• In-memory and persistent storage, for writing RDF statements to memory or to physical storage such as text files or even relational databases
• A SPARQL query engine
page 48
State of the Art Tool
The Tool
page 49
State of the Art Tool
The tool provides the following features:
Properties to set up:
• The edge list directory
• The edge list file structure
page 50
State of the Art Tool
Settings to set up the type of storage required:
• RDF/XML documents
page 51
State of the Art Tool
• Relational database persistent storage
• TDB storage, Jena's fast, native persistent triple store
page 52
State of the Art Tool
Properties to set up the ontologies
page 53
State of the Art Tool
The method to transform the edge lists into RDF statements:
• First, the edge lists are merged together and ordered according to their hierarchical structure.
• Second, the RDF model, consisting of RDF statements, is created according to the Community Structure and SCOT ontologies.
• Third, the RDF statements are written according to the configured settings.
page 54
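The first step, merging and ordering the edge lists, might look like this in Python. The top-down ordering and the grouping by source node are assumptions, since the slides do not say which direction the ordering runs. The rows reuse the sample edge-list format, and community id 17 is made up.

```python
from collections import defaultdict


def merge_edge_lists(edge_lists):
    """Merge per-level edge lists into one ordered structure.

    edge_lists[0] is the lower (tag) level and edge_lists[k] the k-th pass;
    each holds (source, target, weight) rows. Returns a dict keyed by
    (level, source) whose insertion order runs from the top pass down to
    the tag level, so each community is modelled before its members.
    """
    merged = {}
    for level in range(len(edge_lists) - 1, -1, -1):
        grouped = defaultdict(list)
        for source, target, weight in edge_lists[level]:
            grouped[source].append((target, weight))
        for source in sorted(grouped):
            merged[(level, source)] = grouped[source]
    return merged


levels = [
    [("malian", "senegal", 2.0), ("malian", "guinea", 2.0), ("cdf", "c", 1.0)],  # tag level
    [("17", "malian", 1.0), ("17", "cdf", 1.0)],                                 # first pass
]
merged = merge_edge_lists(levels)
print(list(merged))  # [(1, '17'), (0, 'cdf'), (0, 'malian')]
```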
State of the Art Tool
Writing RDF Statements: RDF Documents
• Whole documents: the whole document is written once the whole model has been created
• Split documents: a document is written once the model for each community has been created
  • Two index lists are created: an A-Z index, and another indicating where each community document is located
page 55
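The two index lists for split documents can be sketched in Python as follows. The one-file-per-community layout, the directory name and the .rdf extension are all hypothetical.

```python
import os


def build_indexes(community_names, out_dir="communities"):
    """Build the two index lists for split documents (file layout is hypothetical).

    Returns (a_to_z, locations): a_to_z maps an initial character to the sorted
    names starting with it; locations maps each name to its document's path.
    """
    a_to_z, locations = {}, {}
    for name in sorted(community_names):
        a_to_z.setdefault(name[0].upper(), []).append(name)
        locations[name] = os.path.join(out_dir, name + ".rdf")
    return a_to_z, locations


a_to_z, locations = build_indexes(["malian", "cdf", "caching"])
print(a_to_z)  # {'C': ['caching', 'cdf'], 'M': ['malian']}
```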
State of the Art Tool
Writing RDF Statements: RDF Persistent Storage
• RDB method: MySQL is used as the persistent relational database, and RDF statements are written on the fly, i.e. each statement is written to the database as soon as it is created
• TDB method: each statement is also written on the fly
page 56
State of the Art Tool
Writing RDF Statements (Results)
page 57
State of the Art Tool
Querying Statements
For RDF documents, the Corese SPARQL engine was used:
• Corese is developed by Edelweiss
• It is built on top of Jena, with some added enhancements such as approximate searches and select expressions
• It queries only RDF documents and cannot query relational databases directly
page 58
State of the Art Tool
Querying Statements
For persistent storage, the Jena SPARQL engine is used, since Jena can query the stores directly.
Querying methods
RDF documents (split documents):
• Query the index lists
• Get the community document
• Query the community document and get the linked communities
• Query the index list and query the contents of each linked community
page 59
State of the Art Tool
Querying Methods
RDF documents (whole documents):
• Query the whole model for the community
• Retrieve the linked communities
• Query the linked communities for their content
Persistent storage:
• Query the whole model for the community
• Retrieve the linked communities
• Query the linked communities for their content
page 60
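The split-document query path can be sketched in Python as follows. The load parameter is a hypothetical helper that would parse and query one community document; in this work, that querying is done with Corese. Here a dictionary stands in for the documents so that the example runs on its own. The data is illustrative.

```python
def query_split_documents(name, locations, load):
    """Follow the split-document query path for one community.

    locations: the location index, community name -> document path
    load:      hypothetical helper that parses one community document and returns
               {"linkedTags": [...], "linkedCommunities": [...]}
    """
    document = load(locations[name])                 # index lookup, then fetch
    linked = {}
    for other in document["linkedCommunities"]:      # communities linked to this one
        linked[other] = load(locations[other])["linkedTags"]  # index lookup per community
    return {"tags": document["linkedTags"], "linked": linked}


# In-memory stand-ins for the documents on disk (illustrative data only).
documents = {
    "docs/malian.rdf": {"linkedTags": ["senegal", "guinea"], "linkedCommunities": ["drummers"]},
    "docs/drummers.rdf": {"linkedTags": ["percussion"], "linkedCommunities": []},
}
index = {"malian": "docs/malian.rdf", "drummers": "docs/drummers.rdf"}
print(query_split_documents("malian", index, documents.__getitem__))
```

Every linked community costs another index lookup and document load. This overhead grows with the number of linked communities (57 for malian), which is one reason the whole-model and persistent-storage paths can answer such queries more directly.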
State of the Art Tool
Querying Statements (Results)
• Results are based on a community called malian
• This community has 57 linked communities and 15 linked tags
page 61
State of the Art Tool
Other features
• RDF document viewer
page 62
Conclusion
page 63
Conclusion
• In this research we have seen the importance of the Semantic Web and of describing Web data semantically
• We have seen the importance of using folksonomies for search and exploration
• We have also seen various ontologies with which folksonomies can be semantically represented
• Community structure algorithms and graph mining techniques can unfold new relationships between tags
page 64
Conclusion
• An ontology was designed and developed for representing the output of the fast unfolding of communities in large networks
• From this ontology, RDF/XML statements can be created that are linked to the SCOT ontology
• We have seen that using triple stores for persistent storage of triple statements makes querying much faster
page 65
Future Enhancements
page 66
Future Enhancements
• Try this model on larger tag datasets from different websites
• Include the tagger and links to the actual resource
• Analyse how these links contribute to the Linked Data initiative
• Optimise writing and querying for larger models
page 67
page 68


Editor's Notes

  • #13 Web 2.0 is the second generation of the Web, which evolved from Web 1.0. Web 1.0 was a read-only Web with static content and little user involvement. In short: Web 1.0 is “site under construction”, Web 2.0 is “beta”.
  • #26 Taggers are foaf:Agents. Taggings reify the n-ary relationship between a tagger, a tag, a resource and a date. Tags are members of a Tag class. Tags have names.
  • #27 Notably, Gruber defines the source as the scope of namespaces or universe of quantification for objects. The object in this model represents the content being tagged; the tag is the label or word used to tag it; the tagger represents who tagged the object; the source is the system where the actual tagging model is stored; and the polarity, + or –, is “a vote” on the tagging fact, asserting whether or not the tagging fact is true.
  • #28 Tagging has these limitations because of its independent, free-form structure. Discrepancies in granularity: “java” is too specific for some users, while “programming language” is too generic for others.
  • #29 Freebase provides freely accessible datasets built by communities, and offers tools that help developers access and control the content of these datasets. DBpedia extracts information from the online encyclopaedia Wikipedia and provides it in a semantic format that can be processed by machines.
  • #41 The nodes are labelled with numbers, each of which is associated with a tag.