Tweet classification framework

Published in: Technology

Fishing for the right Tweet in the Twitter Flood
Architectural Overview of a Tweet Classification Engine to identify users on a Holiday.

1.0 Introduction:
This document describes the system design of a complete framework that listens to the Twitter stream in real time, identifies tweets which indicate that the Twitter user has visited a holiday destination, and enables responding to those tweets. The system can be used as a real-time social marketing tool to engage Twitter users and get them to generate useful content for a travel portal.

2.0 Design Goals:
1. Listen to the entire Twitter stream in real time.
2. Identify tweets which indicate that the user has visited a holiday destination and/or stayed in a hotel recently.
3. Filter out advertisements and other useless tweets on travel/hotel deals.
4. Enable responding to the tweets.
5. Provide profile information about the Twitter user, both to help decide which tweets to respond to and to allow for a personalised response.
6. Enable “Destination Filters” for more targeted engagement.

3.0 System Overview:
The high-level overview of the system is presented below.

[Diagram: Public statuses from all Twitter users (~200 Million/day) -> Twitter Streaming API with keywords -> travel related tweets (~0.2 Million/day) -> Text Classification Engine -> tweets indicating recent travel & stay (~thousand/day) -> MySQL -> Web Portal with Destination Filters -> @ Response to the User]
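The volumes quoted in the overview can be checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, the daily volumes are the approximate figures from the overview, and the 0.5% figure is the empirical rate quoted in section 4.2.

```python
# Approximate daily volumes quoted in the system overview.
PUBLIC_TWEETS_PER_DAY = 200_000_000   # public statuses from all Twitter users
KEYWORD_MATCHES_PER_DAY = 200_000     # tweets passed by the keyword-filtered Streaming API
TARGET_FRACTION = 0.005               # 0.5% of keyword matches are worth a response (section 4.2)


def funnel(public, keyword_matches, target_fraction):
    """Return pass-through rates and the final daily volume of the pipeline."""
    targets = keyword_matches * target_fraction
    return {
        "keyword_pass_rate": keyword_matches / public,  # share kept by the keyword filter
        "targets_per_day": targets,                     # tweets kept by the classifier
        "overall_pass_rate": targets / public,          # share of the full stream kept
    }


stats = funnel(PUBLIC_TWEETS_PER_DAY, KEYWORD_MATCHES_PER_DAY, TARGET_FRACTION)
for name, value in stats.items():
    print(f"{name}: {value:g}")
```

Under these assumptions the keyword filter keeps about one tweet in a thousand, and the classifier output is on the order of a thousand tweets per day, which matches the figure shown in the diagram.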
4.0 Major System Components:

4.1 Twitter Streaming API:
The Twitter Streaming API will be used to fetch tweets in real time. A valid Twitter API key is required to access the service. Details about the Streaming API can be found at https://dev.twitter.com/docs/streaming-api

4.2 Twitter Text Classification Engine:
A custom-built Text Classification Engine is trained to detect tweets that indicate recent travel and stay. A cascade of Natural Language Processing algorithms is applied to the parsed tweet text to make the classification decision.
Empirically, it has been observed that only 0.5% of the tweets fetched by the keyword-filtered Streaming API are ones we would like to respond to. The classification engine is independent of the language used by the user and can handle evolving Twitter slang as well as non-English languages.
The tweets identified by the Text Classification Engine are pushed into a MySQL database.

4.3 Web Portal:
A web portal is built on top of the MySQL database. A human user can see the tweets identified by the system and respond to a tweet in real time using the Twitter API (details at https://dev.twitter.com).
Destination filters (place names, hotel names, etc.) can be set on the portal to allow for more targeted engagement and to keep the stream of identified tweets within manageable limits.
Profile information about the Twitter user will also be visible in the portal. We believe this will not only enable personalised responses for better conversion but also act as a filter when deciding whom to respond to. For example, if a celebrity (someone with a very high ratio of followers to followees) tweets about their holiday experience, we believe it would be pointless to target such a user.

5.0 Technologies used:
Programming Language: Python (http://python.org)
Web Framework: Django (https://www.djangoproject.com)
Libraries: Twitter API and various other utility libraries.
Database: MySQL (or anything else!)
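The portal-side filtering described in section 4.3 could look roughly like the sketch below. It is a minimal illustration, not the system's actual code: the `IdentifiedTweet` record, the function names, and the `max_ratio` threshold of 50 are all hypothetical, and the destination filter is assumed to be a plain case-insensitive substring match.

```python
from dataclasses import dataclass


@dataclass
class IdentifiedTweet:
    """Hypothetical record for a tweet stored by the classification engine."""
    text: str
    screen_name: str
    followers: int
    followees: int


def matches_destination(tweet, destinations):
    """True if any destination filter term occurs in the tweet text (case-insensitive)."""
    text = tweet.text.lower()
    return any(term.lower() in text for term in destinations)


def looks_like_celebrity(tweet, max_ratio=50.0):
    """True if the follower-to-followee ratio exceeds max_ratio (an assumed threshold)."""
    # max(..., 1) avoids division by zero for accounts that follow nobody.
    return tweet.followers / max(tweet.followees, 1) > max_ratio


def worth_responding(tweet, destinations, max_ratio=50.0):
    """Keep tweets that mention a filtered destination and are not from a celebrity account."""
    return matches_destination(tweet, destinations) and not looks_like_celebrity(tweet, max_ratio)


destinations = ["Goa", "Taj Palace"]
regular = IdentifiedTweet("Just back from Goa, loved the beach", "traveller", 300, 280)
celebrity = IdentifiedTweet("Goa was amazing", "star", 2_000_000, 40)
print(worth_responding(regular, destinations))    # True
print(worth_responding(celebrity, destinations))  # False
```

In practice the threshold would need tuning against real profile data, and a human operator on the portal would still make the final call on whom to respond to.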
