
BIG Data, Social Data: Targeted Harnessing of Transient Micro-Blogging Data


by Sreejata Chatterjee,
Social Media Lab, Dalhousie University, Halifax, Canada

Published in: Technology, Business

Sreejata Chatterjee
Faculty of Computer Science, Dalhousie University, Halifax, Canada

Introduction

Huge amounts of real-time social media data are being created every moment. For example, ~230 million tweets are posted daily by Twitter’s 200 million users [1]. If harnessed, this data can provide a wealth of insight into what people are thinking about and what they like or dislike. Twitter data has already proven useful in a number of different contexts, from monitoring elections [2] to predicting stock market trends [3] to conducting brand monitoring and PR campaigns [4].

However, social media data tend to be noisy and ephemeral. Furthermore, social media companies often limit the amount of data one can access automatically at any point in time, making this rich source of transient data difficult to collect.

Research Objectives

This work focuses on designing and developing automated methods and a web-based infrastructure that can help other researchers and developers to collect and process raw social media data by:

(1) Creating a Data Collector and Repository Tool for collecting and storing public Twitter data for a specified group of online users in an effective and efficient manner,
(2) Connecting open APIs via Web Services that add value and richness to the Twitter data in our database, such as geo-coding or assigning “influence” scores to Tweeters,
(3) Creating an NLP (Natural Language Processing) Module that can conduct sentiment analysis on social media data,
(4) Providing a robust API that other developers can use to create and test innovative web applications with the data collected.

[Figure: System Architecture for Handling Social Media Data]

Sample API Calls

  • getAllTweets - Returns all the tweets by all the users
  • getUserTweets - Returns tweets posted by a specified user
  • getTimedUserTweets - Returns tweets within a time interval
  • getUserProfilePicUrl - Returns user’s profile picture
  • getUserDetails - Returns detailed user information
  • getUserTimeLineInfo - Returns basic user information

API calls are made via HTTP requests (see below). The output is formatted in JSON (JavaScript Object Notation).

1) Gets all tweets that have been posted between Feb 14 - April 14, 2012, by all of the users who follow “asist2011” and “asist_org”:

http://URL_BASE/tweetApiCalls.php?call=getAllTweets&seedUserList=asist2011,asist_org&startTime=2012-02-14&endTime=2012-04-14

2) Returns details about dalprof’s profile, such as profile info, followers, friends, Klout score (influence score), and geocoded location (for easy and universal location identification):

http://URL_BASE/tweetApiCalls.php?call=getUserDetails&user=dalprof

Case Studies #1: AcademiaMap

The API developed as part of this project is currently being used in a few different applications for a system called AcademiaMap, an Online Influence Assessment App designed for scholars.

AcademiaMap-Dashboard App
AcademiaMap helps scholars to filter the “noise” from their Twitter streams using various "influence" metrics and provides them with an easy way to identify trending topics and interesting voices to follow on Twitter.
(Lead developer: Melissa Anez)

AcademiaMap-GeoVisualizer App
A Geo-based Visualization system that displays communication connections between scholarly users of Twitter from across the globe.
(Lead developer: Jamiur Rahman)

AcademiaMap - Twitter App
A Twitter app that automatically posts tweets about trending topics and re-posts tweets that are popular within a group of scholarly Twitter users.
(Lead developer: Sreejata Chatterjee)

Case Studies #2: Sentiment Analysis of >70K Tweets about #OccupyWallStreet

As a proof of concept, the new NLP Module, based on the Natural Language ToolKit (NLTK), has been added to an existing web tool called Netlytic, giving it the ability to provide sentiment analysis. Netlytic is a system for automated discovery, analysis and visualization of information about online communities, being developed by Dr. Gruzd at the Dalhousie University Social Media Lab.

Example 1: A Visual Representation of the Sentiment Analysis made possible by the new NLP Module now available in Netlytic

Example 2: Tag Cloud of Top 30 Topics derived from Positive (left) and Negative (right) Tweets about #OccupyWallStreet

Conclusion: Overall, tweets about the Occupy Wall Street movement were more positive than negative.

Footnotes

[1] Mashable Social Media
[2] Social Media Lab
[3]
[4] Radian6: Social Media Monitoring and Engagement, Social CRM

Acknowledgements

I would like to thank Dr. Anatoliy Gruzd, Director of the Social Media Lab, for supervising this research. Additionally, I would like to thank Philip Mai, Research Manager at the Social Media Lab, for his valuable feedback.

GRAND Projects:
  • DINS - Digital Infrastructures: Access and Use in the Network Society
  • NAVEL - Network Assessment and Validation for Effective Leadership
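
The two sample API calls shown in this poster could be issued from Python roughly as follows. This is a minimal sketch, not the project's own client: `URL_BASE` is left as the same placeholder the poster uses, the helper names `build_api_url` and `call_api` are hypothetical, and the shape of the JSON response is not documented in the poster, so the result is returned as whatever `json.loads` yields. UTF-8 encoding of the response body is also an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host, elided in the poster; substitute a real base URL before use.
URL_BASE = "URL_BASE"
ENDPOINT = "tweetApiCalls.php"


def build_api_url(call, **params):
    """Build the request URL for one API call (hypothetical helper)."""
    query = {"call": call}
    for key, value in params.items():
        # Lists such as seedUserList are sent comma-separated, as in the poster.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        query[key] = value
    # safe="," keeps the commas readable instead of percent-encoding them.
    return "http://{}/{}?{}".format(URL_BASE, ENDPOINT, urlencode(query, safe=","))


def call_api(call, **params):
    """Issue the HTTP request and decode the JSON body (assumed to be UTF-8)."""
    with urlopen(build_api_url(call, **params)) as response:
        return json.loads(response.read().decode("utf-8"))


# Sample call 1: tweets by followers of the two seed accounts in a date range.
all_tweets_url = build_api_url(
    "getAllTweets",
    seedUserList=["asist2011", "asist_org"],
    startTime="2012-02-14",
    endTime="2012-04-14",
)

# Sample call 2: profile details for a single user.
user_details_url = build_api_url("getUserDetails", user="dalprof")

print(all_tweets_url)
print(user_details_url)
```

Building the URL separately from sending the request makes it possible to inspect or log the exact query before any network traffic occurs; `call_api("getUserDetails", user="dalprof")` would then perform the request once `URL_BASE` points at a real host.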