Integrating and Interpreting Social Data from Heterogeneous Sources
  • Trend services: Trendistic (only Twitter), Blogpulse (blogosphere)
  • Web 2.0 platforms provide data in proprietary formats: XML according to bespoke schemas. Lift to RDF using consistent semantics.


  • Integrating and Interpreting Social Data from Heterogeneous Sources
    Matthew Rowe
    Organisations, Information and Knowledge Group
    University of Sheffield
    Suvodeep Mazumdar
    Department of Information Studies
    University of Sheffield
  • Outline
    Information overload
    Increase in social data publication
    Interlinking social data
    Metadata Generation
    Integrating Social Data
    Application: Interpreting Social Data
    Cumbrian Floods Use Case
    Interacting with Social Data
    Conclusions
  • Information Overload
    Masses of social data are published every day
    E.g. 50 million tweets per day (about 600 per second)
    http://blog.twitter.com
    22 million Facebook users in the UK
    http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/
    Too much information to deal with!
    Social data is multi-faceted:
    Provenance
    Topic
    Geo
    Trend services (e.g. Trendistic, Blogpulse):
    Focus on majority consensus
    Need to listen in to a specific topic
    Concentrate on a single source/platform
    Do not consider geo facet
  • Interlinking Social Data
    Consider multi-faceted nature of social data:
    Allows fine-grained analysis
    Show geo-localised social data
    Relevant past social data
    Solution: Interlink social data from heterogeneous sources
    Use semantics!
    Consistent data interpretation
  • Metadata Generation
    Web 2.0 platforms return data using:
    Proprietary formats;
    Heterogeneous data schemas
    Need to link data together from disparate sources
    A social data fragment = a single piece of social data
    E.g. A tweet, an image, a video
    Lift each social data fragment to RDF:
    Create an instance of sioc:Post and itr:LocalizedResource
    Assign it a URI
    Assign the content to the instance (topic)
    Use hashtags of the microblog
    Create an instance of gml:Geometry (geo)
    Capture geo facet
    Assign timestamp of fragment creation (provenance)
    Using dc:created
    Assign the fragment to its owner (provenance)
    Create foaf:Person instance
  • Metadata Generation
    <photo id="949406913" media="photo">
        <owner nsid="54948696@N00"/>
        <title>DSC00171.JPG</title>
        <description></description>
        <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
        <tags>
            <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
            <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
        </tags>
        <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
            <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
            <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
            <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
        </location>
    </photo>
    <status>
        <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
        <id>9774519667</id>
        <text>Writing up our Geovation work for #lupas2010.</text>
        <truncated>false</truncated>
        <in_reply_to_status_id></in_reply_to_status_id>
        <in_reply_to_user_id></in_reply_to_user_id>
        <favorited>false</favorited>
        <in_reply_to_screen_name></in_reply_to_screen_name>
        <geo xmlns:georss="http://www.georss.org/georss">
            <georss:point>53.3833,-1.4722</georss:point>
        </geo>
    </status>
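
The two sample responses use different schemas for the same facets (topic, geo, provenance). As a rough illustration, and not the authors' implementation, the Python sketch below reads both into one common record. The function names, the record keys, and the choice of Flickr's `taken` attribute as the creation time are assumptions made for this example; the embedded XML is abridged from the samples above.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

GEORSS = "{http://www.georss.org/georss}"

# Abridged copies of the sample responses shown on the slide.
FLICKR_XML = """<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561"/>
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA"/>
</photo>"""

TWITTER_XML = """<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>"""


def fragment_from_flickr(xml_text):
    """Map a Flickr <photo> element onto the common record (hypothetical keys)."""
    photo = ET.fromstring(xml_text)
    location = photo.find("location")
    # Assumption: the 'taken' attribute is used as the creation timestamp.
    taken = datetime.strptime(photo.find("dates").get("taken"), "%Y-%m-%d %H:%M:%S")
    return {
        "id": photo.get("id"),
        "owner": photo.find("owner").get("nsid"),
        "topics": [tag.text for tag in photo.findall("tags/tag")],
        "geo": (float(location.get("latitude")), float(location.get("longitude"))),
        "created": taken.isoformat(),
    }


def fragment_from_twitter(xml_text):
    """Map a Twitter <status> element onto the same record."""
    status = ET.fromstring(xml_text)
    text = status.findtext("text")
    lat, lon = status.findtext(f"geo/{GEORSS}point").split(",")
    created = datetime.strptime(status.findtext("created_at"), "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": status.findtext("id"),
        "owner": None,  # the sample <status> carries no user element
        "topics": re.findall(r"#(\w+)", text),  # hashtags stand in for topics
        "geo": (float(lat), float(lon)),
        "created": created.isoformat(),
    }


FLICKR_FRAGMENT = fragment_from_flickr(FLICKR_XML)
TWITTER_FRAGMENT = fragment_from_twitter(TWITTER_XML)
```

Once both platforms are reduced to the same keys, the later lifting step only has to deal with one shape of input.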
  • Metadata Generation
    <http://twitter.com/mattroweshow>
        rdf:type foaf:Person ;
        rdf:type itr:LocalizedResource ;
        foaf:name "Matthew Rowe" ;
        foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .
    <http://twitter.com/mattroweshow/9774519667>
        rdf:type sioc:Post ;
        rdf:type itr:LocalizedResource ;
        sioc:content "Writing up our Geovation work for #lupas2010." ;
        dcterms:subject "lupas2010" ;
        dcterms:created "2010-2-28 12:22:47.0" ;
        sioc:hasCreator <http://twitter.com/mattroweshow> ;
        itr:has_Localization _:a2 .
    _:a2
        rdf:type gml:Geometry ;
        gml:pos "53.3833,-1.4722" .
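
The lifting steps listed on the Metadata Generation slides can be sketched as a small Turtle serialiser. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the `itr:` and `gml:` namespaces are taken from the SPARQL queries in this deck, the other prefixes are the usual published namespaces, and the property names (including `sioc:hasCreator`) are copied from the slide as written.

```python
PREFIXES = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#> .
@prefix gml: <http://www.opengis.net/gml/> .
"""


def literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def fragment_to_turtle(uri, creator_uri, content, topics, created, lat, lon):
    """Emit one social data fragment as Turtle, following the steps on the slide."""
    pos = f"{lat},{lon}"
    lines = [
        f"<{uri}>",                                   # assign it a URI
        "    rdf:type sioc:Post ;",
        "    rdf:type itr:LocalizedResource ;",
        f"    sioc:content {literal(content)} ;",      # topic facet: content
    ]
    # Topic facet: one dcterms:subject per hashtag or tag.
    lines += [f"    dcterms:subject {literal(topic)} ;" for topic in topics]
    lines += [
        f"    dcterms:created {literal(created)} ;",   # provenance: timestamp
        f"    sioc:hasCreator <{creator_uri}> ;",      # provenance: owner
        "    itr:has_Localization _:geo .",
        # A fixed blank node label is fine for a single fragment per document;
        # several fragments in one file would need distinct labels.
        "_:geo",
        "    rdf:type gml:Geometry ;",
        f"    gml:pos {literal(pos)} .",               # geo facet
    ]
    return PREFIXES + "\n".join(lines) + "\n"


TURTLE = fragment_to_turtle(
    uri="http://twitter.com/mattroweshow/9774519667",
    creator_uri="http://twitter.com/mattroweshow",
    content="Writing up our Geovation work for #lupas2010.",
    topics=["lupas2010"],
    created="2010-2-28 12:22:47.0",
    lat=53.3833,
    lon=-1.4722,
)
print(TURTLE)
```

Building the string by hand keeps the example dependency-free; an RDF library would normally handle escaping and blank node identifiers.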
  • Integrated Social Data
    Triplify social data from multiple platforms
    Flickr XML response -> RDF
    Picasa XML response -> RDF
    Use common semantics
    Can perform SPARQL queries
    # Items tagged "iranelections", most recent first
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?item
    WHERE {
        ?item dcterms:subject "iranelections" .
        ?item dcterms:created ?date
    }
    ORDER BY DESC(?date)
    # Posts, with their tags, located at a given position
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
    PREFIX gml: <http://www.opengis.net/gml/>
    SELECT DISTINCT ?post ?tag
    WHERE {
        ?post dcterms:subject ?tag .
        ?post itr:has_Localization ?geo .
        ?geo gml:pos "53.4813,-2.2392"
    }
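
To make the matching behind these two queries concrete, the sketch below evaluates the same patterns over a small in-memory list of triples in plain Python. It is only an illustration of the query semantics, not a SPARQL engine: the `ex:` identifiers, the sample triples and the function names are invented, and the dates are assumed to be zero-padded ISO strings so that string ordering matches chronological ordering.

```python
DCTERMS = "http://purl.org/dc/terms/"
ITR = "http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#"
GML = "http://www.opengis.net/gml/"

# Hypothetical triples as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:post1", DCTERMS + "subject", "iranelections"),
    ("ex:post1", DCTERMS + "created", "2009-06-13T10:00:00"),
    ("ex:post2", DCTERMS + "subject", "iranelections"),
    ("ex:post2", DCTERMS + "created", "2009-06-15T09:30:00"),
    ("ex:post2", ITR + "has_Localization", "_:g2"),
    ("_:g2", GML + "pos", "53.3833,-1.4722"),
    ("ex:photo1", DCTERMS + "subject", "arctic"),
    ("ex:photo1", DCTERMS + "subject", "monkeys"),
    ("ex:photo1", ITR + "has_Localization", "_:g1"),
    ("_:g1", GML + "pos", "53.4813,-2.2392"),
]


def objects(triples, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def items_by_topic(triples, topic):
    """First query: items with the given subject, newest first."""
    rows = []
    for s, p, o in triples:
        if p == DCTERMS + "subject" and o == topic:
            for created in objects(triples, s, DCTERMS + "created"):
                rows.append((created, s))
    rows.sort(reverse=True)  # ORDER BY DESC(?date)
    return [s for _, s in rows]


def tags_at(triples, pos):
    """Second query: distinct (post, tag) pairs located at the given position."""
    geometries = {s for s, p, o in triples if p == GML + "pos" and o == pos}
    posts = {s for s, p, o in triples
             if p == ITR + "has_Localization" and o in geometries}
    return sorted({(s, o) for s, p, o in triples
                   if p == DCTERMS + "subject" and s in posts})


RECENT = items_by_topic(TRIPLES, "iranelections")
MANCHESTER = tags_at(TRIPLES, "53.4813,-2.2392")
```

Note that the second query matches the position as an exact string, so only fragments with identical coordinates are returned; a radius or bounding-box search would need numeric coordinates.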
  • Interpreting Social Data
    Cumbrian Use Case
    UK region suffered worst floods in centuries
    Observe the effects in social data
    Rise in publication
    Fine-grained geocoded social data
    Dataset:
    Microblogs from 200 Cumbrian Twitter users
    Published during 2009
    3513 microblogs
    Produced 475,043 triples
    Images from Flickr taken in Cumbria
    6663 images
    Produced 182,304 triples
  • Interacting with Social Data
    Built a visualisation application to analyse social data fragments
    http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial
    Filter by date
    Lower slider
    Fine-grained focus
    Zoom in
    Tag cloud
    Shows fragment topics
    Window controls tag cloud topics
    Markers contain number of fragments
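
The interactions listed above (date slider, map window, tag cloud, marker counts) amount to filtering fragments and counting what remains. The sketch below shows one way this could work; it is not the ViziSocial implementation. The fragments, tags, dates, coordinates and the rough Cumbria bounding box are all made up for illustration, and markers are grouped by exact coordinates rather than clustered by zoom level.

```python
from collections import Counter
from datetime import date

# Hypothetical fragments; tags, coordinates and dates are invented.
FRAGMENTS = [
    {"topics": ["floods", "cockermouth"], "geo": (54.66, -3.36), "created": date(2009, 11, 20)},
    {"topics": ["floods"], "geo": (54.60, -3.13), "created": date(2009, 11, 21)},
    {"topics": ["hiking"], "geo": (54.60, -3.13), "created": date(2009, 7, 4)},
    {"topics": ["lupas2010"], "geo": (53.38, -1.47), "created": date(2009, 11, 20)},
]

# Rough bounding box as (south, west, north, east); an assumption, not survey data.
CUMBRIA = (54.0, -3.7, 55.2, -2.1)


def visible(fragments, start, end, window):
    """Fragments inside both the date range (slider) and the map window (zoom)."""
    south, west, north, east = window
    return [
        f for f in fragments
        if start <= f["created"] <= end
        and south <= f["geo"][0] <= north
        and west <= f["geo"][1] <= east
    ]


def tag_cloud(fragments):
    """Topic frequencies for the fragments currently in view."""
    return Counter(tag for f in fragments for tag in f["topics"])


def marker_counts(fragments):
    """Number of fragments per map position."""
    return Counter(f["geo"] for f in fragments)


SHOWN = visible(FRAGMENTS, date(2009, 11, 1), date(2009, 11, 30), CUMBRIA)
CLOUD = tag_cloud(SHOWN)
MARKERS = marker_counts(SHOWN)
```

Because the tag cloud is computed from the visible fragments only, narrowing the date range or zooming the map changes the topics shown, which matches the "window controls tag cloud topics" behaviour described on the slide.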
  • Conclusions
    Consistent interpretation of social data
    Across heterogeneous sources
    Application
    Allows analyses of social data
    To fine-grained detail
    Utilises multiple facets of social data
    Requires metadata
    Issue of scalability
    Future Work
    Adapting to real time data acquisition
    Focussing on South Yorkshire region at present
    Assess scalability issue
  • Twitter: @mattroweshow
    Web: http://www.dcs.shef.ac.uk/~mrowe
    Email: m.rowe@dcs.shef.ac.uk
    Questions?