Integrating and Interpreting Social Data from Heterogeneous Sources
Published in: Technology, Education
Slide notes
  • Trend services: Trendistic covers only Twitter; Blogpulse covers the blogosphere.
  • Web 2.0 platforms provide data in proprietary formats: XML according to bespoke schemas. Lift to RDF using consistent semantics.
  • Transcript

    • 1. Integrating and Interpreting Social Data from Heterogeneous Sources
      Matthew Rowe
      Organisations, Information and Knowledge Group
      University of Sheffield
      Suvodeep Mazumdar
      Department of Information Studies
      University of Sheffield
    • 2. Outline
      Information overload
      Increase in social data publication
      Interlinking social data
      Metadata Generation
      Integrating Social Data
      Application: Interpreting Social Data
      Cumbrian Floods Use Case
      Interacting with Social Data
      Conclusions
    • 3. Information Overload
      Masses of social data are published every day
      E.g. 50 million tweets per day (about 600 per second)
      http://blog.twitter.com
      22 million Facebook users in the UK
      http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/
      Too much information to deal with!
      Social data is multi-faceted:
      Provenance
      Topic
      Geo
      Trend services (e.g. Trendistic, Blogpulse):
      Focus on majority consensus
      Need to listen in to a specific topic
      Concentrate on a single source/platform
      Do not consider geo facet
    • 4.
    • 5.
    • 6. Interlinking Social Data
      Consider multi-faceted nature of social data:
      Allows fine-grained analysis
      Show geo-localised social data
      Relevant past social data
      Solution: Interlink social data from heterogeneous sources
      Use semantics!
      Consistent data interpretation
    • 7. Metadata Generation
      Web 2.0 platforms return data using:
      Proprietary formats;
      Heterogeneous data schemas
      Need to link data together from disparate sources
      A social data fragment = a single piece of social data
      E.g. A tweet, an image, a video
      Lift each social data fragment to RDF:
      Create an instance of sioc:Post and itr:LocalizedResource
      Assign it a URI
      Assign the content to the instance (topic)
      Use hashtags of the microblog
      Create an instance of gml:Geometry (geo)
      Capture geo facet
      Assign timestamp of fragment creation (provenance)
      Using dcterms:created
      Assign the fragment to its owner (provenance)
      Create foaf:Person instance
    • 8. Metadata Generation
      <photo id="949406913" media="photo">
      <owner nsid="54948696@N00"/>
      <title>DSC00171.JPG</title>
      <description></description>
      <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
      <tags>
      <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
      <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
      </tags>
      <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
      <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
      <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
      <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
      </location>
      </photo>
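As a rough illustration of the first step in lifting a Flickr response, the sketch below pulls the topic, geo and provenance facets out of the `<photo>` element shown above. It uses only Python's standard library; the function name `parse_flickr_photo` and the shape of the returned dictionary are assumptions made for this example, not part of the authors' system.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Flickr response shown on the slide.
FLICKR_XML = """
<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561"/>
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
    <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
  </location>
</photo>
"""


def parse_flickr_photo(xml_text):
    """Extract topic (tags), geo (lat/long) and provenance (owner, date) facets."""
    photo = ET.fromstring(xml_text.strip())
    location = photo.find("location")
    return {
        "id": photo.get("id"),
        "owner": photo.find("owner").get("nsid"),          # provenance
        "taken": photo.find("dates").get("taken"),         # provenance
        "tags": [tag.text for tag in photo.iter("tag")],   # topic
        "pos": (float(location.get("latitude")),           # geo
                float(location.get("longitude"))),
    }


fragment = parse_flickr_photo(FLICKR_XML)
print(fragment["tags"], fragment["pos"])
```

A real response may omit the `<location>` element for photos without geodata, so production code would need to handle that case.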
      <status>
      <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
      <id>9774519667</id>
      <text>Writing up our Geovation work for #lupas2010.</text>
      <truncated>false</truncated>
      <in_reply_to_status_id></in_reply_to_status_id>
      <in_reply_to_user_id></in_reply_to_user_id>
      <favorited>false</favorited>
      <in_reply_to_screen_name></in_reply_to_screen_name>
      <geo xmlns:georss="http://www.georss.org/georss">
      <georss:point>53.3833,-1.4722</georss:point>
      </geo>
      </status>
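The same extraction can be sketched for the Twitter `<status>` element above. This is a minimal example under stated assumptions: the function name `parse_status`, the hashtag regular expression and the returned dictionary are illustrative choices, and the timestamp is converted to ISO 8601 here for convenience (the slides do not say how the original system normalised dates).

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Abridged copy of the Twitter response shown on the slide.
STATUS_XML = """<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>"""

GEORSS = "{http://www.georss.org/georss}"


def parse_status(xml_text):
    """Extract content, hashtags (topic), position (geo) and timestamp (provenance)."""
    status = ET.fromstring(xml_text)
    text = status.findtext("text")
    point = status.find(f"geo/{GEORSS}point")
    # %a and %b are locale-dependent; this assumes English day and month names.
    created = datetime.strptime(status.findtext("created_at"),
                                "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": status.findtext("id"),
        "content": text,
        "hashtags": re.findall(r"#(\w+)", text),
        "pos": point.text if point is not None else None,
        "created": created.isoformat(),
    }


status = parse_status(STATUS_XML)
print(status["hashtags"], status["pos"], status["created"])
```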
    • 10. Metadata Generation
      <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource .
    • 11. Metadata Generation
      <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" .
    • 12. Metadata Generation
      <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" ;
          itr:has_Localization _:a2 .
      _:a2
          rdf:type gml:Geometry ;
          gml:pos "53.3833,-1.4722" .
    • 13. Metadata Generation
      <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" ;
          dcterms:created "2010-2-28 12:22:47.0" ;
          itr:has_Localization _:a2 .
      _:a2
          rdf:type gml:Geometry ;
          gml:pos "53.3833,-1.4722" .
    • 14. Metadata Generation
      <http://twitter.com/mattroweshow>
          rdf:type foaf:Person ;
          rdf:type itr:LocalizedResource ;
          foaf:name "Matthew Rowe" ;
          foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .
      <http://twitter.com/mattroweshow/9774519667>
          rdf:type sioc:Post ;
          rdf:type itr:LocalizedResource ;
          sioc:content "Writing up our Geovation work for #lupas2010." ;
          dcterms:subject "lupas2010" ;
          dcterms:created "2010-2-28 12:22:47.0" ;
          sioc:hasCreator <http://twitter.com/mattroweshow> ;
          itr:has_Localization _:a2 .
      _:a2
          rdf:type gml:Geometry ;
          gml:pos "53.3833,-1.4722" .
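The lifting steps (assign a URI, attach content and hashtags, timestamp, creator and geometry) can be sketched as a small function that serialises one fragment to Turtle. This is a hedged illustration rather than the authors' implementation: `lift_to_turtle` and `turtle_literal` are hypothetical helpers, the property names simply mirror those on the slides, and prefix declarations are omitted as they are on the slides.

```python
def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def lift_to_turtle(uri, creator_uri, content, hashtags, created, pos):
    """Serialise one social data fragment using the vocabulary shown on the slides."""
    lines = [
        f"<{uri}>",
        "    rdf:type sioc:Post ;",
        "    rdf:type itr:LocalizedResource ;",
        f"    sioc:content {turtle_literal(content)} ;",
    ]
    # One dcterms:subject triple per hashtag (topic facet).
    lines += [f"    dcterms:subject {turtle_literal(tag)} ;" for tag in hashtags]
    lines += [
        f"    dcterms:created {turtle_literal(created)} ;",   # provenance facet
        f"    sioc:hasCreator <{creator_uri}> ;",             # provenance facet
        "    itr:has_Localization _:geo .",
        "_:geo",
        "    rdf:type gml:Geometry ;",
        f"    gml:pos {turtle_literal(pos)} .",               # geo facet
    ]
    return "\n".join(lines)


turtle = lift_to_turtle(
    uri="http://twitter.com/mattroweshow/9774519667",
    creator_uri="http://twitter.com/mattroweshow",
    content="Writing up our Geovation work for #lupas2010.",
    hashtags=["lupas2010"],
    created="2010-2-28 12:22:47.0",
    pos="53.3833,-1.4722",
)
print(turtle)
```

String templating keeps the sketch dependency-free; an RDF library would be the more robust choice when URIs or literals come from untrusted input.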
    • 15. Integrated Social Data
      Triplify social data from multiple platforms
      Flickr XML response -> RDF
      Picasa XML response -> RDF
      Use common semantics
      Can perform SPARQL queries
      Query 1: items tagged "iranelections", newest first
      PREFIX dcterms: <http://purl.org/dc/terms/>
      SELECT ?item
      WHERE {
        ?item dcterms:subject "iranelections" .
        ?item dcterms:created ?date
      }
      ORDER BY DESC(?date)
      Query 2: posts and their tags at a given position
      PREFIX dcterms: <http://purl.org/dc/terms/>
      PREFIX itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
      PREFIX gml: <http://www.opengis.net/gml/>
      SELECT DISTINCT ?post ?tag
      WHERE {
        ?post dcterms:subject ?tag .
        ?post itr:has_Localization ?geo .
        ?geo gml:pos "53.4813,-2.2392"
      }
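To make the semantics of the two queries concrete without a SPARQL engine, the sketch below emulates them over plain Python tuples. Everything here is an assumption for illustration: the `ex:` identifiers and the sample triples are invented, and the two functions stand in for what a triple store would do when evaluating the graph patterns.

```python
# Invented sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:post1", "dcterms:subject", "iranelections"),
    ("ex:post1", "dcterms:created", "2009-06-13T10:00:00"),
    ("ex:post2", "dcterms:subject", "iranelections"),
    ("ex:post2", "dcterms:created", "2009-06-15T08:30:00"),
    ("ex:photo1", "dcterms:subject", "arctic"),
    ("ex:photo1", "dcterms:subject", "monkeys"),
    ("ex:photo1", "itr:has_Localization", "_:g1"),
    ("_:g1", "gml:pos", "53.4813,-2.2392"),
]


def items_by_tag_newest_first(triples, tag):
    """Emulates: ?item dcterms:subject tag . ?item dcterms:created ?date ORDER BY DESC(?date)."""
    tagged = {s for s, p, o in triples if p == "dcterms:subject" and o == tag}
    dated = [(o, s) for s, p, o in triples
             if p == "dcterms:created" and s in tagged]
    return [s for _, s in sorted(dated, reverse=True)]


def tags_at_position(triples, pos):
    """Emulates: ?post dcterms:subject ?tag . ?post itr:has_Localization ?geo . ?geo gml:pos pos."""
    geos = {s for s, p, o in triples if p == "gml:pos" and o == pos}
    located = {s for s, p, o in triples
               if p == "itr:has_Localization" and o in geos}
    return sorted({(s, o) for s, p, o in triples
                   if p == "dcterms:subject" and s in located})


newest_first = items_by_tag_newest_first(TRIPLES, "iranelections")
tags_here = tags_at_position(TRIPLES, "53.4813,-2.2392")
print(newest_first)
print(tags_here)
```

Sorting the date strings matches chronological order only because the sample timestamps are zero-padded ISO 8601; plain string literals in other formats would need parsing or typed `xsd:dateTime` values before ordering.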
    • 16. Interpreting Social Data
      Cumbrian Use Case
      UK region suffered worst floods in centuries
      Observe the effects in social data
      Rise in publication
      Fine-grained geocoded social data
      Dataset:
      Microblogs from 200 Cumbrian Twitter users
      Published during 2009
      3513 microblogs
      Produced 475,043 triples
      Images from Flickr taken in Cumbria
      6663 images
      Produced 182,304 triples
    • 17. Interacting with Social Data
      Built a visualisation application to analyse social data fragments
      http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial
      Filter by date
      Lower slider
      Fine-grained focus
      Zoom in
      Tag cloud
      Shows fragment topics
      Window controls tag cloud topics
      Markers contain number of fragments
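The interaction described on this slide (a date slider, a map window that controls the tag cloud, and markers showing fragment counts) can be approximated by a filter followed by a tag count. The sketch below is not the ViziSocial code: the fragment records, the bounding-box tuple order `(south, west, north, east)` and the function names are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Invented sample fragments; positions are (latitude, longitude).
FRAGMENTS = [
    {"tags": ["floods", "cockermouth"], "created": date(2009, 11, 20), "pos": (54.66, -3.36)},
    {"tags": ["floods", "workington"], "created": date(2009, 11, 21), "pos": (54.64, -3.54)},
    {"tags": ["walking"], "created": date(2009, 7, 4), "pos": (54.45, -3.02)},
    {"tags": ["floods"], "created": date(2009, 11, 22), "pos": (53.38, -1.47)},
]


def in_window(pos, south, west, north, east):
    """True if a (lat, lon) position falls inside the visible map window."""
    lat, lon = pos
    return south <= lat <= north and west <= lon <= east


def tag_cloud(fragments, start, end, window):
    """Return (number of visible fragments, tag frequencies) for a date range and window."""
    visible = [f for f in fragments
               if start <= f["created"] <= end and in_window(f["pos"], *window)]
    counts = Counter(tag for f in visible for tag in f["tags"])
    return len(visible), counts


# Hypothetical window roughly covering Cumbria, filtered to November 2009.
marker_count, cloud = tag_cloud(
    FRAGMENTS, date(2009, 11, 1), date(2009, 11, 30), (54.0, -3.7, 55.0, -2.5)
)
print(marker_count, cloud.most_common())
```

Zooming in corresponds to passing a smaller window, which shrinks both the marker count and the tag cloud.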
    • 18. Conclusions
      Consistent interpretation of social data
      Across heterogeneous sources
      Application
      Allows analyses of social data
      To fine-grained detail
      Utilises multiple facets of social data
      Requires metadata
      Issue of scalability
      Future Work
      Adapting to real time data acquisition
      Focussing on South Yorkshire region at present
      Assess scalability issue
    • 19. Twitter: @mattroweshow
      Web: http://www.dcs.shef.ac.uk/~mrowe
      Email: m.rowe@dcs.shef.ac.uk
      Questions?
