analysis of a real online social network using semantic web frameworks

research paper at #ISWC2009:
http://www-sop.inria.fr/members/Fabien.Gandon/docs/ISWC2009_ereteo_et_al.pdf

  • Since its birth, the web has provided many ways for people to interact, revealing real social network structures. Social networks have been extracted from the hyperlink structure of home pages, from the co-occurrence of names, and from synchronous and asynchronous communications.
  • The social network effect of the web has been amplified by the deployment of a social media landscape where “expressing tools allow users to express themselves, discuss and aggregate their social life”, “sharing tools allow users to publish and share content”, and “networking tools allow users to search, connect and interact with each other”. Social platforms such as Facebook, Orkut and Hi5 are at the center of this landscape, as they enable us to host and aggregate these different social applications. You can publish and share your del.icio.us bookmarks, your RSS streams or your microblog posts via the Facebook news feed, thanks to dedicated Facebook applications. This integration of various means of publishing and socializing enables us to quickly share, recommend and propagate information to our social network, trigger reactions, and finally enrich it. Collaborative applications now capture more and more aspects of physical social networks and human interactions in a decentralized way. Such rich and diffuse data cannot be represented using only the raw graphs of classical SNA algorithms without some loss of knowledge.
  • Global metrics help in understanding the overall structure of the network. Density indicates the cohesion of the network. Community detection helps in understanding the distribution of actors and activities in the network by detecting groups of densely connected actors. The community structure influences the way information is shared and the way actors behave.
  • Centrality highlights the most important actors of the network; Freeman proposed three definitions. Degree centrality considers the nodes with the highest degrees (number of adjacent edges); it highlights local popularity in the network, i.e. actors that influence their neighbourhood. Closeness centrality is based on the average length of the paths (number of edges) linking a node to the others, and reveals how easily a node can be reached by, and can reach, other actors. Betweenness centrality focuses on the capacity of a node to be an intermediary between any two other nodes. A network is highly dependent on actors with high betweenness centrality because of their position as intermediaries and brokers in the information flow.
  • Several ontologies exist for representing online social networks. Social data can be seen as a twofold structure: data that describe people and the social network structure, and data that describe the content produced by network members. FOAF is used to describe people, their profiles, their relationships and their online accounts. The properties defined in the RELATIONSHIP ontology specialize the “knows” property of FOAF to type relationships in a social network more precisely (familial, friendship or professional relationships). The primitives of the SIOC ontology specialize “OnlineAccount” and “holdsAccount” from FOAF in order to model the interactions and resources manipulated by users of social web applications; SIOC defines concepts such as posts, replies or user groups. The SKOS ontology offers a way to organize the manipulated concepts with lightweight semantic properties (e.g. narrower, broader, related) and to link them to SIOC descriptions with the property "isSubjectOf".
  • RDF enables us to make assertions and to describe resources with triples. These triples form a directed typed graph that is well suited to representing social data produced on different sites. Distributed identities, activities and relationships are represented with a uniform graph structure in RDF. Moreover, both nodes and relationships can be richly typed with classes and properties of ontologies described in RDFS and OWL, adding a semantic dimension to the social graph. SPARQL is the standard language for querying these richly typed, directed graphs; consequently, it is a privileged tool for analyzing social data represented with semantic web languages.
  • Researchers have applied classical SNA methods to the acquaintance and interest networks formed respectively by the properties "foaf:knows" and "foaf:interest". In order to apply existing tools, they extract simple untyped graphs (each corresponding to one relationship, “knows” or “interest”) from the richer RDF descriptions of FOAF profiles. A lot of knowledge is lost in this transformation, knowledge that could be used to parameterize social network indicators, filter their sources and customize their results.
  • Global queries are mostly based on result aggregation and path computation, which are missing from the standard SPARQL definition. The Corese search engine provides such features: result grouping, aggregate functions like count(), sum() or avg(), and path retrieval.
  • The group by clause groups results that have the same values for the specified variables; an aggregate function such as count() can then be applied to each group of SPARQL results. These features will be added to SPARQL 2.0.
  • A syntactic convention in Corese enables path extraction. A regular expression is used instead of the property variable to specify that a path is searched for and to describe its characteristics. Sub-properties of the properties in the regular expression are taken into account unless specified otherwise. The regular expression operators are: / (sequence), | (or), * (0 or more), ? (optional), ! (not). The path can be bound to a variable specified after the regular expression. Path characteristics are defined by adding options before the regular expression: 'i' to allow inverse properties, 's' to retrieve one shortest path, 'sa' to retrieve all shortest paths. This example retrieves a path between two resources ?x and ?y starting with zero or more foaf:knows properties and ending with the rel:worksWith property; the path length must be less than or equal to 4. Time permitting, path retrieval is a candidate for being added to SPARQL 2.0.
  • The closeness centrality of a node is derived from the average length of the paths linking it to the other nodes: it is computed as the inverse of the sum of these lengths, so the shorter the paths, the more central the node.
  • SemSNA is an ontology of Social Network Analysis that enables us to annotate social data with strategic positions and structural indices. The main class, SNAConcept, is the super class of all SNA concepts. The property isDefinedForProperty indicates for which relationship, i.e. which subnetwork, an instance of an SNA concept is defined. An SNA concept is attached to a social resource with the property hasSNAConcept. The class SNAIndice describes valued concepts such as centrality, and the associated value is set with the property hasValue. The ontology models strategic positions, based on Freeman's definitions of centrality, and different definitions of groups with useful indices to characterize their properties.
  • In this social network, Guillaume has both family and professional relationships. The degree of Guillaume for the colleague relationship (a super-property of supervisor), considering a neighbourhood at distance 2, is 4.
  • Ipernity.com, the social network we analyzed, offers users several options for building their social network and sharing multimedia content. Every user can share media, create a blog and a personal profile page, and comment on others’ shared resources. To build the social network, users can specify the type of relationship they have with others: friend, family, or simple contact (like a favorite you follow). Relationships are not symmetric: Fabien can declare a relationship with Michel, while Michel can declare a different type of relationship with Fabien or not have him in his contact list at all.
  • Corese has an extension that enables us to nest SQL queries within SPARQL queries. This is done by means of the sql() function, which returns a sequence of results for each variable in the SQL select clause. In Corese, we can then combine a construct clause and a select clause to generate RDF data.
  • We extended FOAF, SIOC and SIOC Types in order to import social data from ipernity.com, in particular to model interactions such as messages or visits on resources. We introduced the class Interaction to differentiate declared relationships (like family or friendOf) from active relationships.
  • We tested our algorithms and queries on a bi-processor quad-core machine at 3.2 GHz with 32 GB of main memory. We analyzed the three types of relations separately (favorite, friend and family) and also used polymorphic queries to analyze them as a whole through their super-property, foaf:knows. We also analyzed the interactions produced by private messages exchanged between users, as well as those produced by someone commenting on someone else's documents. This table shows some performances when computing components, degree and shortest paths. Queries exploiting only the grouping and aggregating features (component, degree) are efficient and can be computed on large-scale data. Path computation is time- and space-consuming. When too many paths could be retrieved, we limit queries to a maximum number of graph projections or to a maximum path length. In some cases, such as betweenness centrality, approximations are sufficient to highlight strategic actors.
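The global metrics mentioned in the notes above (density, components) can be sketched on a plain untyped graph. The following Python is a minimal illustration, assuming an undirected graph stored as an adjacency dict; the function names and the toy graph are hypothetical and do not come from the paper, which computes these indices with Corese SPARQL queries.

```python
from collections import deque


def density(adj):
    """Density of an undirected simple graph: 2m / (n (n - 1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(neigh) for neigh in adj.values()) // 2  # each edge stored twice
    return 2 * m / (n * (n - 1))


def components(adj):
    """Connected components found by breadth-first search."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result.append(sorted(comp))
    return result


# Hypothetical toy graph: a triangle plus a separate pair.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e"}, "e": {"d"},
}
print(density(graph))     # 0.4
print(components(graph))  # [['a', 'b', 'c'], ['d', 'e']]
```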
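Freeman's degree and closeness centralities, as described in the notes above, can be sketched with breadth-first search. This is a minimal Python sketch on an undirected adjacency dict (the names are hypothetical); closeness is taken as the inverse of the sum of shortest-path lengths, the form used in slide 18, and only reachable nodes are counted, which is an assumption for disconnected graphs.

```python
from collections import deque


def degree_centrality(adj):
    """Degree: number of adjacent edges of each node (undirected graph)."""
    return {node: len(neigh) for node, neigh in adj.items()}


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness_centrality(adj):
    """Inverse of the sum of shortest-path lengths to the reachable nodes."""
    result = {}
    for node in adj:
        total = sum(bfs_distances(adj, node).values())
        result[node] = 1 / total if total > 0 else 0.0
    return result


# Hypothetical path graph a - b - c - d: the two inner nodes are the most central.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(degree_centrality(path))     # {'a': 1, 'b': 2, 'c': 2, 'd': 1}
print(closeness_centrality(path))  # b and c score 0.25; a and d score 1/6
```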
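Betweenness centrality counts how often a node lies on shortest paths between other nodes. The sketch below uses Brandes' algorithm, a standard technique swapped in here for illustration (the paper computes betweenness with Corese path queries instead); the optional `sources` parameter, a hypothetical addition, samples BFS roots and rescales the result, one simple way to obtain the kind of approximation the notes mention for large networks.

```python
import random
from collections import deque


def betweenness(adj, sources=None):
    """Brandes' betweenness for an undirected, unweighted adjacency dict.

    With `sources`, only those nodes are BFS roots and scores are scaled by
    n / len(sources): a simple sampling approximation.
    """
    nodes = list(adj)
    roots = nodes if sources is None else list(sources)
    score = dict.fromkeys(nodes, 0.0)
    for s in roots:
        # Breadth-first search counting shortest paths (sigma) and predecessors.
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    scale = len(nodes) / len(roots)
    # Each unordered pair is counted once from each endpoint, hence the / 2.
    return {v: score[v] * scale / 2 for v in nodes}


# Hypothetical star graph: every leaf-to-leaf shortest path goes through the hub.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(betweenness(star))  # {'hub': 3.0, 'x': 0.0, 'y': 0.0, 'z': 0.0}

# Approximation from two randomly sampled roots (result varies between runs).
print(betweenness(star, sources=random.sample(list(star), 2)))
```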
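The loss of knowledge caused by flattening typed RDF graphs, and the benefit of polymorphic queries over a super-property such as foaf:knows, can be illustrated with a qualified degree that follows a sub-property hierarchy. The hierarchy, neighbour names and triples below are hypothetical; they only mirror the counts of slides 10 and 11 (degree 5 overall, 3 for family), with one colleague edge typed as supervisor to show subsumption.

```python
# Hypothetical sub-property hierarchy (child -> parent); not the actual
# RELATIONSHIP ontology, only an illustration of rdfs:subPropertyOf chains.
SUB_PROPERTY_OF = {
    "family": "knows",
    "colleague": "knows",
    "parent": "family",
    "sibling": "family",
    "mother": "parent",
    "father": "parent",
    "sister": "sibling",
    "brother": "sibling",
    "supervisor": "colleague",
}


def is_subproperty(prop, ancestor):
    """True if prop is ancestor or reaches it through SUB_PROPERTY_OF."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def qualified_degree(triples, node, prop):
    """Count edges touching node, in either direction, typed prop or a sub-property."""
    return sum(
        1 for s, p, o in triples
        if node in (s, o) and is_subproperty(p, prop)
    )


# Hypothetical neighbours n1..n5 of one actor.
edges = [
    ("guillaume", "father", "n1"),
    ("guillaume", "sister", "n2"),
    ("guillaume", "mother", "n3"),
    ("guillaume", "colleague", "n4"),
    ("guillaume", "supervisor", "n5"),
]

print(qualified_degree(edges, "guillaume", "knows"))      # 5
print(qualified_degree(edges, "guillaume", "family"))     # 3
print(qualified_degree(edges, "guillaume", "colleague"))  # 2
```

An untyped extraction would report 5 for every relationship, which is exactly the information the typed query keeps apart.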
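Corese's sql() extension turns rows of a relational table into RDF triples inside a construct query. A rough Python analogue, using the standard sqlite3 module rather than Corese, is sketched below; the `relations` table, its `user1_id`/`user2_id`/`rel` columns and the `rel = 1` code for friends are taken from slide 24, while the `user:` prefix, the in-memory database and the sample rows are hypothetical.

```python
import sqlite3


def friend_triples(conn):
    """Map each friend row (rel = 1) of the relations table to a rel:friendOf triple."""
    rows = conn.execute(
        "SELECT user1_id, user2_id FROM relations "
        "WHERE rel = 1 ORDER BY user1_id, user2_id"
    )
    # Hypothetical URI scheme: user:<id>.
    return [(f"user:{u1}", "rel:friendOf", f"user:{u2}") for u1, u2 in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (user1_id INTEGER, user2_id INTEGER, rel INTEGER)")
conn.executemany(
    "INSERT INTO relations VALUES (?, ?, ?)",
    # Relationships are directed: 1 -> 2 and 2 -> 1 are separate rows.
    # rel = 2 is a hypothetical code for some other relation type.
    [(1, 2, 1), (2, 1, 1), (1, 3, 2)],
)

for triple in friend_triples(conn):
    print(*triple)
# user:1 rel:friendOf user:2
# user:2 rel:friendOf user:1
```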

    1. Guillaume Erétéo, Michel Buffa, Fabien Gandon, Olivier Corby
    2. computer-mediated networks as social networks [Wellman, 2001]
    3. social media landscape: the social web amplifies social network effects
    4. overwhelming flow of social data: monitoring, notifying, animating, consulting
    5. social network analysis: proposes graph algorithms to characterize the structure of a social network, strategic positions, and networking activities
    6. social network analysis: global metrics and structure; community detection: distribution of actors and activities; density and diameter: cohesion of the network
    7. social network analysis: strategic positions and actors; degree centrality: local attention
    8. social network analysis: strategic positions and actors; betweenness centrality reveals brokers, "A place for good ideas" [Burt, 2004]
    9. semantic social networks: http://sioc-project.org/node/158
    10. degree example (Guillaume's relations: father, sister, mother, colleague, colleague; actors: Gérard, Fabien, Mylène, Michel, Yvonne): d(Guillaume) = 5
    11. qualified degree example (same graph; property hierarchy: knows, family, parent, sibling, mother, father, brother, sister, colleague): d<family>(Guillaume) = 3
    12. but …: SPARQL is not expressive enough to meet SNA requirements for global metric querying of social networks (density, betweenness centrality, etc.) [San Martin & Gutierrez 2009]
    13. classic SNA on the semantic web: rich graph representations reduced to simple untyped graphs [Paolillo & Wright, 2006] (foaf:knows, foaf:interest)
    14. semantic SNA stack: exploit the semantics of social networks
    15. SPARQL extensions: CORESE, a semantic search engine implementing semantic web languages using graph-based representations
    16. grouping results: number of followers of a Twitter user
            select ?y count(?x) as ?indegree where {
              ?x twitter:follow ?y
            } group by ?y
    17. path extraction: people knowing, knowing, (...) colleagues of someone
            ?x sa (foaf:knows*/rel:worksWith)::$path ?y
            filter(pathLength($path) <= 4)
        • Regular expression operators are: / (sequence); | (or); * (0 or more); ? (optional); ! (not)
        • Path characteristics: i to allow inverse properties, s to retrieve only one shortest path, sa to retrieve all shortest paths.
    18. full example: closeness centrality through knows and worksWith
            select distinct ?y ?to
                   pathLength($path) as ?length
                   (1/sum(?length)) as ?centrality
            where {
              ?y s (foaf:knows*/rel:worksWith)::$path ?to
            } group by ?y
    19. SNA indices: qualified component, qualified in-degree, qualified degree, qualified diameter, closeness centrality, betweenness centrality (using the number of geodesics between from and to, and the number of those geodesics going through b)
    20. SemSNA, an ontology of SNA: http://ns.inria.fr/semsna/2009/06/21/voc
    21. add to the RDF graph: saving the computed degrees for incremental calculations
            CONSTRUCT {
              ?y semsna:hasSNAConcept _:b0 .
              _:b0 rdf:type semsna:Degree .
              _:b0 semsna:hasValue ?degree .
              _:b0 semsna:isDefinedForProperty rel:family
            }
            SELECT ?y count(?x) as ?degree where {
              { ?x rel:family ?y }
              UNION
              { ?y rel:family ?x }
            } group by ?y
    22. SemSNA annotation example: Guillaume hasSNAConcept a Degree with hasValue 4, isDefinedForProperty colleague and hasCentralityDistance 2 (actors: Guillaume, Gérard, Fabien, Mylène, Michel, Yvonne, Ivan, Peter, Philippe; relations: father, mother, sister, colleague, supervisor)
    23. Ipernity
    24. using real data: extracting a real dataset from a relational database
            construct { ?person1 rel:friendOf ?person2 }
            select sql(<server>, <driver>, <user>, <pwd>,
                       'select user1_id, user2_id from relations where rel = 1') as (?person1, ?person2)
            where {}
    25. importing data with SemSNI: http://ns.inria.fr/semsni/
    26. using real data: ipernity.com dataset extracted in RDF, 61 937 actors & 494 510 relationships
        • 18 771 family links between 8 047 actors
        • 136 311 friend links involving 17 441 actors
        • 339 428 favorite links for 61 425 actors
        • 2 874 170 comments from 7 627 actors
        • 795 949 messages exchanged by 22 500 actors
    27. performances & limits (query time and number of graph projections)
            relation   time      projections   time      projections
            Knows      0.71 s    494 510       20.59 s   989 020
            Favorite   0.64 s    339 428       18.73 s   678 856
            Friend     0.31 s    136 311        1.31 s   272 622
            Family     0.03 s     18 771        0.42 s    37 542
            Message    1.98 s    795 949       16.03 s   1 591 898
            Comment    9.67 s    2 874 170     28.98 s   5 748 340

            Shortest paths used to calculate …
            relation   path length   time            projections
            Knows      <= 2          14m 50.69s        100 000
            Knows      <= 2          2h 56m 34.13s   1 000 000
            Knows      <= 2          7h 19m 15.18s   2 000 000
            Favorite   <= 2          5h 33m 18.43s   2 000 000
            Friend     <= 2          1m 12.18s       1 000 000
            Friend     <= 2          2m 7.98s        2 000 000
            Family     <= 2          27.23s          1 000 000
            Family     <= 2          2m 9.73s        3 681 626
            Family     <= 3          1m 10.71s       1 000 000
            Family     <= 4          1m 9.06s        1 000 000
    28. some interpretations (validated with managers of ipernity.com)
        • friendOf, favorite, message, comment: small diameter, high density
        • family, as expected: large diameter, low density
        • favorite: highly centralized around the Ipernity animator
        • friendOf, family, message, comment: power law of degrees and betweenness centralities, different strategic actors
        • knows: analyze all relations using subsumption
    29. some interpretations
        • existence of a largest component in all subnetworks
        • "the effectiveness of the social network at doing its job" [Newman 2003]
    30. conclusion
        • the directed typed graph structure of RDF/S is well suited to represent social knowledge & socially produced metadata spanning both internet and intranet networks
        • the definition of SNA operators in SPARQL (using extensions and OWL Lite entailment) enables us to exploit the semantic structure of social data
        • SemSNA organizes and structures social data
    31. perspectives
        • semantic-based community detection algorithm
        • SemSNA ontology: extract complex SNA features reusing past results; support iterative or parallel approaches in the computations
        • a semantic SNA to foster a semantic intranet of people: structure overwhelming flows of corporate social data; foster and strengthen social interactions; efficient access to the social capital [Krebs, 2008] built through online collaboration
        http://twitter.com/isicil
    32. Guillaume Erétéo: twitter.com/ereteog, slideshare.net/ereteog (profile graph: name, holdsAccount, organization, mentorOf, manage, contribute, answers)
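The grouping query of slide 16 counts, for each followed account, how many accounts follow it. The same aggregation is sketched in Python with collections.Counter; the follow edges are hypothetical, and this only mimics what the Corese group by/count() query computes.

```python
from collections import Counter

# Hypothetical (follower, followed) pairs standing for ?x twitter:follow ?y.
follows = [
    ("ann", "bob"),
    ("carl", "bob"),
    ("dora", "bob"),
    ("bob", "ann"),
]

# Group by ?y (the followed account) and count the ?x values in each group.
indegree = Counter(followed for _follower, followed in follows)
print(dict(indegree))  # {'bob': 3, 'ann': 1}
```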
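The path query of slide 17, `?x sa (foaf:knows*/rel:worksWith)::$path ?y` with a length limit of 4, can be approximated by a breadth-first search over the pattern: any number of foaf:knows edges followed by one rel:worksWith edge. This Python sketch is hypothetical and simplified: it returns one shortest path per end node (closer to the `s` option than to `sa`), follows edges only in their stated direction, and ignores sub-properties.

```python
from collections import deque


def knows_then_works_with(triples, start, max_len=4):
    """One shortest path per end node matching foaf:knows*/rel:worksWith.

    Paths are lists of (subject, property, object) edges of length <= max_len.
    """
    out = {}
    for s, p, o in triples:
        out.setdefault(s, []).append((p, o))
    results = {}
    visited = {start}
    queue = deque([(start, [])])  # every queued path is still in the knows* part
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_len:
            continue  # adding an edge would exceed the length limit
        for prop, nxt in out.get(node, []):
            step = path + [(node, prop, nxt)]
            if prop == "rel:worksWith" and nxt not in results:
                results[nxt] = step  # BFS order makes the first hit a shortest one
            elif prop == "foaf:knows" and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, step))
    return results


# Hypothetical triples.
triples = [
    ("ann", "foaf:knows", "bob"),
    ("bob", "foaf:knows", "carl"),
    ("carl", "rel:worksWith", "dora"),
    ("bob", "rel:worksWith", "dora"),
    ("ann", "rel:worksWith", "eve"),
]
for end, path in knows_then_works_with(triples, "ann").items():
    print(end, len(path))
# eve 1
# dora 2
```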
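Slide 21 stores computed degrees back into the RDF graph as SemSNA annotations. The Python sketch below builds the same shape of triples (a blank node typed semsna:Degree, with semsna:hasValue and semsna:isDefinedForProperty) from a list of edges; the blank-node labels, the tuple representation of triples and the sample edges are hypothetical, and degrees count both edge directions like the UNION in the query.

```python
from collections import Counter
from itertools import count


def semsna_degree_triples(triples, prop):
    """Return SemSNA-style Degree annotations for every node touching prop edges."""
    degree = Counter()
    for s, p, o in triples:
        if p == prop:  # exact match only; sub-properties are not followed here
            degree[s] += 1  # edge seen from its subject ...
            degree[o] += 1  # ... and from its object, like the UNION
    labels = count()
    out = []
    for node, value in degree.items():
        b = f"_:b{next(labels)}"
        out += [
            (node, "semsna:hasSNAConcept", b),
            (b, "rdf:type", "semsna:Degree"),
            (b, "semsna:hasValue", value),
            (b, "semsna:isDefinedForProperty", prop),
        ]
    return out


# Hypothetical family edges.
family = [("a", "rel:family", "b"), ("a", "rel:family", "c")]
for triple in semsna_degree_triples(family, "rel:family"):
    print(triple)
```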
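For reference, the textbook forms of the two path-based indices listed on slide 19 are given below. The subscript R, for the relationship (sub-network) over which shortest paths are taken, is an assumed notation rather than the paper's; d_R(k, x) is the length of a shortest R-path from k to x, g^R_{x,y} is the number of geodesics between x and y, and g^R_{x,y}(b) is the number of those geodesics going through b. The closeness form matches the 1/sum(?length) of slide 18.

```latex
C^{R}_{C}(k) = \frac{1}{\sum_{x \neq k} d_R(k, x)}
\qquad
C^{R}_{B}(b) = \sum_{\substack{x \neq y \\ x \neq b,\; y \neq b}} \frac{g^{R}_{x,y}(b)}{g^{R}_{x,y}}
```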
