Integrating and Interpreting Social Data from Heterogeneous Sources

Published in: Technology, Education
  • Trend services: Trendistic (Twitter only); Blogpulse (blogosphere)
  • Web 2.0 platforms provide data in proprietary formats: XML according to bespoke schemas. Lift to RDF using consistent semantics.
  • Transcript of "Integrating and Interpreting Social Data from Heterogeneous Sources"

    1. 1. Integrating and Interpreting Social Data from Heterogeneous Sources
       Matthew Rowe
       Organisations, Information and Knowledge Group
       University of Sheffield
       Suvodeep Mazumdar
       Department of Information Studies
       University of Sheffield
    2. 2. Outline
       Information overload
       Increase in social data publication
       Interlinking social data
       Metadata Generation
       Integrating Social Data
       Application: Interpreting Social Data
       Cumbrian Floods Use Case
       Interacting with Social Data
       Conclusions
    3. 3. Information Overload
       Masses of social data are published every day
         E.g. 50 million tweets (600 per second)
           http://blog.twitter.com
         22 million Facebook users in the UK
           http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/
       Too much information to deal with!
       Social data is multi-faceted:
         Provenance
         Topic
         Geo
       Trend services (e.g. Trendistic, Blogpulse):
         Focus on majority consensus
         Need to listen in to a specific topic
         Concentrate on a single source/platform
         Do not consider geo facet
    4. 4.
    5. 5.
    6. 6. Interlinking Social Data
       Consider multi-faceted nature of social data:
         Allows fine-grained analysis
         Show geo-localised social data
         Relevant past social data
       Solution: Interlink social data from heterogeneous sources
         Use semantics!
         Consistent data interpretation
    7. 7. Metadata Generation
       Web 2.0 platforms return data using:
         Proprietary formats
         Heterogeneous data schemas
       Need to link data together from disparate sources
       A social data fragment = a single piece of social data
         E.g. a tweet, an image, a video
       Lift each social data fragment to RDF:
         Create an instance of sioc:Post and itr:LocalizedResource
           Assign it a URI
         Assign the content to the instance (topic)
           Use hashtags of the microblog
         Create an instance of gml:Geometry (geo)
           Capture geo facet
         Assign timestamp of fragment creation (provenance)
           Using dcterms:created
         Assign the fragment to its owner (provenance)
           Create foaf:Person instance
    8. 8. Metadata Generation
       Flickr XML response:
         <photo id="949406913" media="photo">
           <owner nsid="54948696@N00"/>
           <title>DSC00171.JPG</title>
           <description></description>
           <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
           <tags>
             <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
             <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
           </tags>
           <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
             <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
             <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
             <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
           </location>
         </photo>
       Twitter XML response:
         <status>
           <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
           <id>9774519667</id>
           <text>Writing up our Geovation work for #lupas2010.</text>
           <truncated>false</truncated>
           <in_reply_to_status_id></in_reply_to_status_id>
           <in_reply_to_user_id></in_reply_to_user_id>
           <favorited>false</favorited>
           <in_reply_to_screen_name></in_reply_to_screen_name>
           <geo xmlns:georss="http://www.georss.org/georss">
             <georss:point>53.3833,-1.4722</georss:point>
           </geo>
         </status>
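As a rough illustration of how the facets named on the slides (provenance, topic, geo) might be read out of a Flickr response like the one above, here is a minimal Python sketch using only the standard library. The function name `photo_facets` and the shape of the returned dictionary are assumptions made for this example, not part of the presented system, and the sample XML is a trimmed copy of the slide's response.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Flickr response shown on the slide.
PHOTO_XML = """<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561"/>
  <tags>
    <tag raw="arctic">arctic</tag>
    <tag raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA"/>
</photo>"""


def photo_facets(xml_text):
    """Hypothetical helper: pull provenance, topic and geo facets from a photo."""
    photo = ET.fromstring(xml_text)
    location = photo.find("location")
    pos = None
    if location is not None:
        # Same "lat,long" string form that the slides use for gml:pos.
        pos = f"{location.get('latitude')},{location.get('longitude')}"
    return {
        "id": photo.get("id"),
        "owner": photo.find("owner").get("nsid"),          # provenance
        "taken": photo.find("dates").get("taken"),         # provenance
        "tags": [t.get("raw") for t in photo.iter("tag")], # topic
        "pos": pos,                                        # geo
    }


facets = photo_facets(PHOTO_XML)
print(facets["tags"], facets["pos"])
```

The point of the sketch is only that each platform needs its own small extractor, after which the facets can be expressed with one shared vocabulary.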
    10. 10. Metadata Generation
        <http://twitter.com/mattroweshow/9774519667>
            rdf:type sioc:Post ;
            rdf:type itr:LocalizedResource .
    11. 11. Metadata Generation
        <http://twitter.com/mattroweshow/9774519667>
            rdf:type sioc:Post ;
            rdf:type itr:LocalizedResource ;
            sioc:content "Writing up our Geovation work for #lupas2010." ;
            dcterms:subject "lupas2010" .
    12. 12. Metadata Generation
        <http://twitter.com/mattroweshow/9774519667>
            rdf:type sioc:Post ;
            rdf:type itr:LocalizedResource ;
            sioc:content "Writing up our Geovation work for #lupas2010." ;
            dcterms:subject "lupas2010" ;
            itr:has_Localization _:a2 .
        _:a2
            rdf:type gml:Geometry ;
            gml:pos "53.3833,-1.4722" .
    13. 13. Metadata Generation
        <http://twitter.com/mattroweshow/9774519667>
            rdf:type sioc:Post ;
            rdf:type itr:LocalizedResource ;
            sioc:content "Writing up our Geovation work for #lupas2010." ;
            dcterms:subject "lupas2010" ;
            dcterms:created "2010-2-28 12:22:47.0" ;
            itr:has_Localization _:a2 .
        _:a2
            rdf:type gml:Geometry ;
            gml:pos "53.3833,-1.4722" .
    14. 14. Metadata Generation
        <http://twitter.com/mattroweshow>
            rdf:type foaf:Person ;
            rdf:type itr:LocalizedResource ;
            foaf:name "Matthew Rowe" ;
            foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .
        <http://twitter.com/mattroweshow/9774519667>
            rdf:type sioc:Post ;
            rdf:type itr:LocalizedResource ;
            sioc:content "Writing up our Geovation work for #lupas2010." ;
            dcterms:subject "lupas2010" ;
            dcterms:created "2010-2-28 12:22:47.0" ;
            sioc:hasCreator <http://twitter.com/mattroweshow> ;
            itr:has_Localization _:a2 .
        _:a2
            rdf:type gml:Geometry ;
            gml:pos "53.3833,-1.4722" .
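The lifting steps on the Metadata Generation slides could be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the function `lift_status` is hypothetical, triples are returned as plain tuples rather than serialised RDF, property names are copied from the slides as written, the blank-node label is invented, and the timestamp is emitted in ISO 8601 form (an assumption that differs from the literal shown on the slides).

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed copy of the Twitter response shown on the slides.
STATUS_XML = """<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>"""

GEORSS = "{http://www.georss.org/georss}"


def lift_status(xml_text, screen_name):
    """Hypothetical helper: turn one status into (subject, predicate, object) tuples."""
    status = ET.fromstring(xml_text)
    status_id = status.findtext("id")
    owner = f"<http://twitter.com/{screen_name}>"
    post = f"<http://twitter.com/{screen_name}/{status_id}>"
    text = status.findtext("text", default="")
    created = datetime.strptime(
        status.findtext("created_at"), "%a %b %d %H:%M:%S %z %Y"
    )
    triples = [
        (post, "rdf:type", "sioc:Post"),
        (post, "rdf:type", "itr:LocalizedResource"),
        (post, "sioc:content", text),                    # topic
        (post, "dcterms:created", created.isoformat()),  # provenance
        (post, "sioc:hasCreator", owner),                # provenance
        (owner, "rdf:type", "foaf:Person"),
    ]
    # Topic facet: one dcterms:subject per hashtag in the text.
    triples += [(post, "dcterms:subject", tag) for tag in re.findall(r"#(\w+)", text)]
    # Geo facet: only present when the status carries a georss:point.
    point = status.findtext(f"geo/{GEORSS}point")
    if point:
        geo = f"_:geo{status_id}"  # invented blank-node label
        triples += [
            (post, "itr:has_Localization", geo),
            (geo, "rdf:type", "gml:Geometry"),
            (geo, "gml:pos", point),
        ]
    return triples


triples = lift_status(STATUS_XML, "mattroweshow")
for triple in triples:
    print(triple)
```

The screen name is passed in separately because the trimmed status XML above does not include the user element.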
    15. 15. Integrated Social Data
        Triplify social data from multiple platforms
          Flickr XML response -> RDF
          Picasa XML response -> RDF
        Use common semantics
        Can perform SPARQL queries
          PREFIX dcterms:<http://purl.org/dc/terms/>
          SELECT ?item
          WHERE {
            ?item dcterms:subject "iranelections" .
            ?item dcterms:created ?date
          }
          ORDER BY DESC(?date)
          PREFIX dcterms:<http://purl.org/dc/terms/>
          PREFIX itr:<http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
          PREFIX gml:<http://www.opengis.net/gml/>
          SELECT DISTINCT ?post ?tag
          WHERE {
            ?post dcterms:subject ?tag .
            ?post itr:has_Localization ?geo .
            ?geo gml:pos "53.4813,-2.2392"
          }
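To make the logic of the two queries concrete without a triple store, the following Python sketch mimics them over an in-memory list of tuples. It is a stand-in for illustration only: the function names, the `ex:` identifiers and the sample triples are invented, and a real deployment would run the SPARQL queries against an RDF store instead.

```python
# Invented sample data in (subject, predicate, object) form.
TRIPLES = [
    ("ex:a", "dcterms:subject", "iranelections"),
    ("ex:a", "dcterms:created", "2009-06-13T10:00:00"),
    ("ex:b", "dcterms:subject", "iranelections"),
    ("ex:b", "dcterms:created", "2009-06-15T08:30:00"),
    ("ex:c", "dcterms:subject", "arctic"),
    ("ex:c", "itr:has_Localization", "_:g1"),
    ("_:g1", "gml:pos", "53.4813,-2.2392"),
]


def items_by_tag(triples, tag):
    """Mimics query 1: items with the given subject, newest first."""
    tagged = {s for s, p, o in triples if p == "dcterms:subject" and o == tag}
    dated = [(o, s) for s, p, o in triples if p == "dcterms:created" and s in tagged]
    return [s for _, s in sorted(dated, reverse=True)]


def tags_at(triples, pos):
    """Mimics query 2: distinct (post, tag) pairs localised at a position."""
    geos = {s for s, p, o in triples if p == "gml:pos" and o == pos}
    posts = {s for s, p, o in triples if p == "itr:has_Localization" and o in geos}
    return sorted(
        {(s, o) for s, p, o in triples if p == "dcterms:subject" and s in posts}
    )


print(items_by_tag(TRIPLES, "iranelections"))  # ['ex:b', 'ex:a']
print(tags_at(TRIPLES, "53.4813,-2.2392"))     # [('ex:c', 'arctic')]
```

Note that ordering by date as a plain string only works when every timestamp uses one zero-padded format such as ISO 8601; that is an assumption of this sketch.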
    16. 16. Interpreting Social Data
        Cumbrian Use Case
          UK region suffered worst floods in centuries
          Observe the effects in social data
            Rise in publication
            Fine-grained geocoded social data
        Dataset:
          Microblogs from 200 Cumbrian Twitter users
            Published during 2009
            3513 microblogs
            Produced 475,043 triples
          Images from Flickr taken in Cumbria
            6663 images
            Produced 182,304 triples
    17. 17. Interacting with Social Data
        Built a visualisation application to analyse social data fragments
          http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial
        Filter by date
          Lower slider
        Fine-grained focus
          Zoom in
        Tag cloud
          Shows fragment topics
          Window controls tag cloud topics
        Markers contain number of fragments
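The interaction described on this slide (a date filter plus a map window that controls the tag cloud) can be approximated by a small filter-and-count routine. The sketch below is an assumption about how such a view could be computed, not a description of the ViziSocial code; the function name, the fragment dictionaries and the coordinates are all invented sample data.

```python
from collections import Counter
from datetime import date

# Invented sample fragments; fields are assumptions for this sketch.
FRAGMENTS = [
    {"created": date(2009, 11, 20), "lat": 54.66, "lon": -3.36, "tags": ["floods", "cockermouth"]},
    {"created": date(2009, 11, 21), "lat": 54.64, "lon": -3.54, "tags": ["floods", "workington"]},
    {"created": date(2009, 6, 1), "lat": 54.66, "lon": -3.36, "tags": ["walking"]},
    {"created": date(2009, 11, 20), "lat": 53.48, "lon": -2.24, "tags": ["floods"]},
]


def tag_cloud(fragments, start, end, bbox):
    """Count tags of fragments inside the date range and the map window.

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    counts = Counter()
    for fragment in fragments:
        in_dates = start <= fragment["created"] <= end
        in_window = (
            south <= fragment["lat"] <= north and west <= fragment["lon"] <= east
        )
        if in_dates and in_window:
            counts.update(fragment["tags"])
    return counts


cloud = tag_cloud(
    FRAGMENTS, date(2009, 11, 1), date(2009, 11, 30), (54.0, -3.7, 55.2, -2.1)
)
print(cloud.most_common())
```

Narrowing the date range or the window and recomputing the counts corresponds to moving the slider or zooming in.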
    18. 18. Conclusions
        Consistent interpretation of social data
          Across heterogeneous sources
        Application
          Allows analyses of social data
            To fine-grained detail
          Utilises multiple facets of social data
          Requires metadata
            Issue of scalability
        Future Work
          Adapting to real time data acquisition
            Focussing on South Yorkshire region at present
          Assess scalability issue
    19. 19. Questions?
        Twitter: @mattroweshow
        Web: http://www.dcs.shef.ac.uk/~mrowe
        Email: m.rowe@dcs.shef.ac.uk