Integrating and Interpreting Social Data from Heterogeneous Sources
Published in: Technology, Education
  • Trend services: Trendistic (Twitter only); Blogpulse (blogosphere)
  • Web 2.0 platforms provide data in proprietary formats: XML according to bespoke schemas. Lift to RDF using consistent semantics.
  • Transcript

    • 1. Integrating and Interpreting Social Data from Heterogeneous Sources
      Matthew Rowe
      Organisations, Information and Knowledge Group
      University of Sheffield
      Suvodeep Mazumdar
      Department of Information Studies
      University of Sheffield
    • 2. Outline
      Information overload
      Increase in social data publication
      Interlinking social data
      Metadata Generation
      Integrating Social Data
      Application: Interpreting Social Data
      Cumbrian Floods Use Case
      Interacting with Social Data
      Conclusions
    • 3. Information Overload
      Masses of social data are published every day
        E.g. 50 million tweets a day (roughly 600 per second)
          http://blog.twitter.com
        22 million Facebook users in the UK
          http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/
      Too much information to deal with!
      Social data is multi-faceted:
        Provenance
        Topic
        Geo
      Trend services (e.g. Trendistic, Blogpulse):
        Focus on majority consensus
        Need to listen in to a specific topic
        Concentrate on a single source/platform
        Do not consider the geo facet
    • 4.
    • 5.
    • 6. Interlinking Social Data
      Consider the multi-faceted nature of social data:
        Allows fine-grained analysis
        Show geo-localised social data
        Relevant past social data
      Solution: interlink social data from heterogeneous sources
        Use semantics!
        Consistent data interpretation
    • 7. Metadata Generation
      Web 2.0 platforms return data using:
        Proprietary formats
        Heterogeneous data schemas
      Need to link data together from disparate sources
      A social data fragment = a single piece of social data
        E.g. a tweet, an image, a video
      Lift each social data fragment to RDF:
        Create an instance of sioc:Post and itr:LocalizedResource
          Assign it a URI
        Assign the content to the instance (topic)
          Use hashtags of the microblog
        Create an instance of gml:Geometry (geo)
          Capture geo facet
        Assign timestamp of fragment creation (provenance)
          Using dc:created
        Assign the fragment to its owner (provenance)
          Create foaf:Person instance
    • 8. Metadata Generation
      Flickr photo (XML response):
        <photo id="949406913" media="photo">
          <owner nsid="54948696@N00"/>
          <title>DSC00171.JPG</title>
          <description></description>
          <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
          <tags>
            <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
            <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
          </tags>
          <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
            <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
            <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
            <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
          </location>
        </photo>
      Twitter status (XML response):
        <status>
          <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
          <id>9774519667</id>
          <text>Writing up our Geovation work for #lupas2010.</text>
          <truncated>false</truncated>
          <in_reply_to_status_id></in_reply_to_status_id>
          <in_reply_to_user_id></in_reply_to_user_id>
          <favorited>false</favorited>
          <in_reply_to_screen_name></in_reply_to_screen_name>
          <geo xmlns:georss="http://www.georss.org/georss">
            <georss:point>53.3833,-1.4722</georss:point>
          </geo>
        </status>
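The Flickr response carries the same three facets (topic, geo, provenance) as a tweet, only under a different schema. The following is a minimal sketch of pulling those facets out with Python's standard `xml.etree.ElementTree`. The function name `extract_flickr_facets`, the dictionary keys, and the choice of the `taken` attribute as the creation timestamp are assumptions made for illustration, not something the slides specify.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Flickr response shown on the slide.
FLICKR_XML = """
<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561"/>
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
    <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
  </location>
</photo>
"""


def extract_flickr_facets(xml_text):
    """Collect topic, geo and provenance facets from one Flickr photo element."""
    photo = ET.fromstring(xml_text.strip())
    location = photo.find("location")
    return {
        "id": photo.get("id"),
        # Provenance: who published the fragment and when (assumes 'taken').
        "owner": photo.find("owner").get("nsid"),
        "created": photo.find("dates").get("taken"),
        # Topic: the tags play the role that hashtags play for a tweet.
        "topics": [tag.text for tag in photo.iter("tag")],
        # Geo: same "lat,lon" string layout as the gml:pos literal on the slides.
        "pos": f'{location.get("latitude")},{location.get("longitude")}',
    }


facets = extract_flickr_facets(FLICKR_XML)
print(facets)
```

Once every platform is reduced to the same small set of facets, the RDF lifting step no longer needs to know which platform a fragment came from.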
    • 10. Metadata Generation
      Create an instance of sioc:Post/itr:LocalizedResource and assign it a URI:
        <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource .
    • 11. Metadata Generation
      Assign the content to the instance (topic), using the hashtags of the microblog:
        <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" .
    • 12. Metadata Generation
      Create an instance of gml:Geometry (geo) to capture the geo facet:
        <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" ;
          itr:has_Localization _:a2 .
        _:a2
          rdf:type gml:Geometry ;
          gml:pos "53.3833,-1.4722" .
    • 13. Metadata Generation
      Assign the timestamp of fragment creation (provenance):
        <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" ;
          dcterms:created "2010-2-28 12:22:47.0" ;
          itr:has_Localization _:a2 .
        _:a2
          rdf:type gml:Geometry ;
          gml:pos "53.3833,-1.4722" .
    • 14. Metadata Generation
      Assign the fragment to its owner (provenance) by creating a foaf:Person instance:
        <http://twitter.com/mattroweshow>
          rdf:type foaf:Person ;
          rdf:type itr:LocalizedResource ;
          foaf:name "Matthew Rowe" ;
          foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .
        <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" ;
          dcterms:created "2010-2-28 12:22:47.0" ;
          sioc:hasCreator <http://twitter.com/mattroweshow> ;
          itr:has_Localization _:a2 .
        _:a2
          rdf:type gml:Geometry ;
          gml:pos "53.3833,-1.4722" .
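The lifting steps for a tweet can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the authors' implementation: the function `lift_status` is hypothetical, the screen name is passed in because the status XML on the slides does not contain it, prefix declarations are omitted, a geo element is assumed to be present, and the timestamp is written in ISO 8601 rather than the literal format shown on the slides. Property names are copied from the slides as-is.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

GEORSS = "http://www.georss.org/georss"

# Trimmed copy of the Twitter status shown on the slides.
STATUS_XML = """
<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>
"""


def lift_status(xml_text, screen_name):
    """Return a Turtle fragment (prefixes omitted) for one geocoded tweet."""
    status = ET.fromstring(xml_text.strip())
    text = status.findtext("text")
    created = datetime.strptime(
        status.findtext("created_at"), "%a %b %d %H:%M:%S %z %Y"
    )
    point = status.findtext(f"geo/{{{GEORSS}}}point")

    owner = f"http://twitter.com/{screen_name}"
    post = f"{owner}/{status.findtext('id')}"
    # Escape backslashes and quotes so the literal stays valid Turtle.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')

    lines = [
        f"<{post}>",
        "    rdf:type sioc:Post ;",
        "    rdf:type itr:LocalizedResource ;",
        f'    sioc:content "{escaped}" ;',
    ]
    # Topic facet: one dcterms:subject per hashtag.
    for tag in re.findall(r"#(\w+)", text):
        lines.append(f'    dcterms:subject "{tag}" ;')
    # Provenance facet: creation time and owner.
    lines.append(f'    dcterms:created "{created.isoformat()}" ;')
    lines.append(f"    sioc:hasCreator <{owner}> ;")
    # Geo facet: a blank node typed as gml:Geometry.
    lines.append("    itr:has_Localization _:geo .")
    lines += ["_:geo", "    rdf:type gml:Geometry ;", f'    gml:pos "{point}" .']
    return "\n".join(lines)


turtle = lift_status(STATUS_XML, "mattroweshow")
print(turtle)
```

A production lifter would more likely build the graph with an RDF library and serialise it, which handles escaping and datatypes; plain string assembly is used here only to keep the mapping from XML element to triple visible.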
    • 15. Integrated Social Data
      Triplify social data from multiple platforms
        Flickr XML response -> RDF
        Picasa XML response -> RDF
      Use common semantics
      Can perform SPARQL queries
      Items tagged "iranelections", newest first:
        PREFIX dcterms:<http://purl.org/dc/terms/>
        SELECT ?item
        WHERE {
          ?item dcterms:subject "iranelections" .
          ?item dcterms:created ?date
        }
        ORDER BY DESC(?date)
      Posts and their tags at a given location:
        PREFIX dcterms:<http://purl.org/dc/terms/>
        PREFIX itr:<http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
        PREFIX gml:<http://www.opengis.net/gml/>
        SELECT DISTINCT ?post ?tag
        WHERE {
          ?post dcterms:subject ?tag .
          ?post itr:has_Localization ?geo .
          ?geo gml:pos "53.4813,-2.2392"
        }
    • 16. Interpreting Social Data
      Cumbrian Use Case
        UK region suffered worst floods in centuries
        Observe the effects in social data
          Rise in publication
          Fine-grained geocoded social data
      Dataset:
        Microblogs from 200 Cumbrian Twitter users
          Published during 2009
          3513 microblogs
          Produced 475,043 triples
        Images from Flickr taken in Cumbria
          6663 images
          Produced 182,304 triples
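To make concrete what the location query computes, the sketch below emulates its three triple patterns as a join over an in-memory list of triples in plain Python. A real deployment would hand the query to a SPARQL engine instead. The identifier `flickr:949406913`, the blank node label `_:a1`, and the function name `posts_and_tags_at` are hypothetical and used only for illustration.

```python
# Each triple is (subject, predicate, object), with prefixed names kept as strings.
TRIPLES = [
    ("flickr:949406913", "dcterms:subject", "arctic"),
    ("flickr:949406913", "dcterms:subject", "monkeys"),
    ("flickr:949406913", "itr:has_Localization", "_:a1"),
    ("_:a1", "gml:pos", "53.4813,-2.2392"),
    ("http://twitter.com/mattroweshow/9774519667", "dcterms:subject", "lupas2010"),
    ("http://twitter.com/mattroweshow/9774519667", "itr:has_Localization", "_:a2"),
    ("_:a2", "gml:pos", "53.3833,-1.4722"),
]


def posts_and_tags_at(triples, pos):
    """Return distinct (post, tag) pairs for posts localized at `pos`."""
    # ?geo gml:pos "<pos>"
    geos = {s for s, p, o in triples if p == "gml:pos" and o == pos}
    # ?post itr:has_Localization ?geo
    located = {
        s for s, p, o in triples if p == "itr:has_Localization" and o in geos
    }
    # ?post dcterms:subject ?tag ; the set gives the DISTINCT behaviour.
    pairs = {
        (s, o) for s, p, o in triples if p == "dcterms:subject" and s in located
    }
    return sorted(pairs)


results = posts_and_tags_at(TRIPLES, "53.4813,-2.2392")
print(results)
```

Because the Flickr photo and the tweet share the same vocabulary, one pattern matches fragments from either platform; only the position literal decides which ones are returned.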
    • 17. Interacting with Social Data
      Built a visualisation application to analyse social data fragments
        http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial
      Filter by date
        Lower slider
      Fine-grained focus
        Zoom in
      Tag cloud
        Shows fragment topics
        Window controls tag cloud topics
      Markers contain number of fragments
    • 18. Conclusions
      Consistent interpretation of social data
        Across heterogeneous sources
      Application
        Allows analyses of social data
          To fine-grained detail
        Utilises multiple facets of social data
          Requires metadata
        Issue of scalability
      Future Work
        Adapting to real time data acquisition
          Focussing on South Yorkshire region at present
        Assess scalability issue
    • 19. Questions?
      Twitter: @mattroweshow
      Web: http://www.dcs.shef.ac.uk/~mrowe
      Email: m.rowe@dcs.shef.ac.uk
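The interaction described on this slide (a date slider plus a map window that together decide which topics appear in the tag cloud) amounts to filtering fragments on two facets and counting tags. Here is a rough sketch of that logic in Python. It says nothing about how ViziSocial itself is built: the `Fragment` class, the `tag_cloud` function, and the sample fragments are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Fragment:
    created: date   # provenance facet
    lat: float      # geo facet
    lon: float
    tags: tuple     # topic facet


def tag_cloud(fragments, start, end, south, west, north, east):
    """Count tags of fragments inside the date range and the map window."""
    counts = Counter()
    for f in fragments:
        in_dates = start <= f.created <= end
        in_window = south <= f.lat <= north and west <= f.lon <= east
        if in_dates and in_window:
            counts.update(f.tags)
    return counts


# Made-up sample fragments, not data from the Cumbrian dataset.
SAMPLE = [
    Fragment(date(2009, 11, 20), 54.66, -3.36, ("floods", "cockermouth")),
    Fragment(date(2009, 11, 21), 54.64, -3.54, ("floods", "workington")),
    Fragment(date(2009, 6, 1), 54.60, -3.13, ("keswick",)),
    Fragment(date(2009, 11, 20), 53.38, -1.47, ("sheffield",)),
]

cloud = tag_cloud(
    SAMPLE,
    start=date(2009, 11, 1), end=date(2009, 11, 30),
    south=54.0, west=-4.0, north=55.5, east=-2.0,
)
print(cloud.most_common())
```

The number shown on a map marker would correspond to the count of fragments passing the same two filters at that marker's position.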
