This document describes the architecture of a social media monitoring platform. It summarizes the key components: a front-end website built with PHP and Flash, and a Java back-end engine that manages user sessions and content. It also covers integration with the Twitter API, challenges around scalability and product evolution, and the use of the Twitter4J library and the Maya abstraction layer.
7. Main Architecture
● Front (Akimedia)
○ Public Website
○ Private Website
○ Flash Animations
○ Sessions configuration management
● Back (Tesial)
○ Sessions content management
8. Front - Key components (Akimedia)
● CMS Akiwi (PHP) & MySQL database (MyISAM)
○ Public Website
○ Private Website (users, eCommerce)
● Flash animations
○ Displaying messages and other features
● “Manager” in PHP
○ Communication between Animations/Front/Back
9. Back - Key Components (Tesial)
● Engine
○ Java web application
○ API used by Front for data management
■ sessions, tweets, moderation, Twitter users ("tweetos"),...
● Batch
○ Java web application
○ Formerly grabbing Tweets
○ Grabbing pictures (avatars & media)
10. Technical Infrastructure
● Front/Back Server
● DB Servers
○ Front / Back
○ Replication (fail over)
● Dev Server
○ dev (front / back)
○ pre-prod (front / back)
11. SVN
● KISS
● dev = trunk
● pre-prod = branch
● prod = tag on branch or trunk
● front = release by switch
● back = release by packaging
16. Challenges
● Scalability
○ SaaS and Agency are growing
● Twitter
○ New technical and business constraints
● Product
○ Must evolve by integrating other social networks
18. Twitter APIs
● APIs
○ Search API
○ REST API
○ Streaming API
○ https://dev.twitter.com/docs/history-rest-search-api
● Versions
○ API version 1 (deprecated)
○ API version 1.1
19. Rate limiting
● Rate limit window duration
○ 15 minutes
● Requests allotted
○ per user
○ via application-only auth
● Example
○ GET statuses/user_timeline
○ per user: 150 requests per 15 minutes
○ via app: 300 requests per 15 minutes
https://dev.twitter.com/docs/rate-limiting/1.1/limits
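The fixed 15-minute window can also be tracked on the client so that a batch job stops before it hits the limit. The following is a minimal sketch, not the platform's code: `WindowBudget` and its method names are hypothetical, and the limit passed in is whatever figure applies to the endpoint being called.

```python
import time


class WindowBudget:
    """Client-side counter for a fixed rate-limit window (hypothetical helper)."""

    def __init__(self, limit, window_seconds=15 * 60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be simulated
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True if one more request may be sent in the current window."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window has started: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False

    def seconds_until_reset(self):
        """Time left before the current window expires (never negative)."""
        return max(0.0, self.window_start + self.window_seconds - self.clock())
```

A caller would check `try_acquire()` before each request and sleep for `seconds_until_reset()` when it returns False. The server's own accounting remains authoritative; this counter is only a local estimate.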
20. Other technical limits
● Direct messages
○ 250 per day
● Tweets
○ 1,000 per day
○ Retweets are counted as Tweets.
● Changes to account email
○ Four per hour
● Following
○ 1,000 per day
● Following (account-based)
○ up to following 2,000 other users
21. Platform objects
● Tweets can be found alone, within user objects, but most often within timelines
● Users can be found tweeting, following, and favoriting on Twitter
● Entities are most often found within Tweets
● Places can be found throughout the natural universe, but typically only appear attached to Tweets on Twitter
22. Object types
● JSON!
● XML output will be dropped soon
○ Only JSON output with API 1.1
● Be careful with ids (64 bits)
○ use the string version (id_str), especially if using JavaScript
○ Twitpocalypse
■ https://dev.twitter.com/docs/twitter-ids-json-and-snowflake
○ Idpocalypse
■ https://dev.twitter.com/blog/64-bit-twitter-user-idpocalypse
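The reason for preferring the string form can be shown with a few lines. This is a small illustration with an invented id value (2^53 + 1, the first integer a double cannot represent); converting to `float` stands in for what a JavaScript JSON parser does with the numeric field.

```python
import json

# Illustrative payload: `id` and `id_str` carry the same (made-up) value.
payload = '{"id": 9007199254740993, "id_str": "9007199254740993"}'
tweet = json.loads(payload)

# Python integers are arbitrary precision, so the numeric field survives here.
exact_id = tweet["id"]

# A JavaScript Number is an IEEE 754 double; converting to float mimics
# the rounding a JavaScript client would apply to the numeric field.
as_double = float(exact_id)

precision_lost = int(as_double) != exact_id
string_id_intact = tweet["id_str"] == str(exact_id)
```

With this value `precision_lost` is true while `string_id_intact` is also true, which is why ids should be carried as strings across any layer that may treat numbers as doubles.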
28. Twitter4J
● Unofficial Java library for the Twitter API
● 100% Pure Java (5+)
● Android platform and Google App Engine ready
● Zero dependency: no additional jars required
● Built-in OAuth support
● Out-of-the-box gzip support
● 100% Twitter API 1.1 compatible
29. Tools
● Twurl
○ https://github.com/marcel/twurl
○ "curl" for Twitter API
○ Manage access tokens (authentication)
● Apigee Console
○ https://apigee.com/console
○ Free console to execute APIs
30. Search API
● Similar to http://search.twitter.com
● Criteria
○ terms
○ geocode
○ language ("Language detection is best-effort")
○ count, until, since_id, max_id
○ result type: mixed, recent, popular
○ include entities
● Paginated!
○ you have to manage paging yourself
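Managing pagination by hand usually means walking backwards with the max_id cursor: request a page, then ask for results strictly older than the oldest id seen. The sketch below shows only that loop. `fetch_page` is a hypothetical callable standing in for the real HTTP call, and `make_fake_search` is an in-memory substitute used for demonstration.

```python
def fetch_all(fetch_page, count=100, max_pages=10):
    """Walk backwards through results using the max_id cursor pattern.

    `fetch_page(max_id, count)` is a hypothetical callable returning a list of
    dicts with an integer "id", newest first, restricted to ids <= max_id
    (or the newest results when max_id is None).
    """
    collected = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id, count)
        if not page:
            break  # no older results left
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(t["id"] for t in page) - 1
    return collected


def make_fake_search(ids):
    """In-memory stand-in for the Search API, for demonstration only."""
    ordered = sorted(ids, reverse=True)

    def fetch_page(max_id, count):
        matching = [i for i in ordered if max_id is None or i <= max_id]
        return [{"id": i} for i in matching[:count]]

    return fetch_page
```

The `max_pages` cap is a design choice of this sketch: it bounds the number of requests so that one long search cannot consume the whole rate-limit window.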
32. Streaming API
● Persistent HTTP connection
○ until you decide to close it
● Could be reeeeaaaaaaally huge!
○ Think about your architecture (hardware and software)!
○ Separate storage and consumption
● Only one stream opened at a time
● Can be tricky to manage
○ Use a library
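Separating storage from consumption typically means the thread that reads the connection does nothing but hand messages to a buffer, while another thread persists them. This is a minimal sketch under that assumption. `run_pipeline`, `stream` and `store` are hypothetical names, and any iterable can play the role of the stream.

```python
import queue
import threading


def run_pipeline(stream, store, max_buffer=1000):
    """Read `stream` on one thread and persist on another, via a bounded queue."""
    buffer = queue.Queue(maxsize=max_buffer)
    done = object()  # sentinel marking the end of the stream

    def reader():
        for message in stream:
            buffer.put(message)  # blocks if the writer falls behind
        buffer.put(done)

    def writer():
        while True:
            message = buffer.get()
            if message is done:
                break
            store(message)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Blocking on a full buffer keeps the sketch simple. A production reader would more likely spill to disk or a message broker, since stalling the reader of a live connection risks being disconnected.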
33. Streaming API endpoint groups
● Public Streams
○ GET statuses/sample
○ POST statuses/filter
○ GET statuses/firehose (limited access)
● User Streams
○ Data and events for a specific user
● Site Streams (beta and limited access)
○ Real-time updates for a large number of users
34. About the challenges
● Scalability
● Twitter constraints
● Product evolution
=> Use of Twitter Streaming API
=> Other Social Networks integration
=> Maya
36. Maya - General architecture
[Architecture diagram; recoverable labels: Listener, Session]
37. Maya - concepts
● Post: content emitted on a (social) network by a user
● Feed: a collection of posts matching a set of filters (PostFilter)
● FeedOperator: the service to register, manage, stop and restart feeds
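The three concepts can be modelled in a few lines to show how they relate. This is an illustrative sketch only: Maya's real API is not shown in these slides, so every field and method name here is an assumption, and a PostFilter is reduced to a simple predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Post:
    """Content emitted on a network by a user (fields are assumptions)."""
    network: str
    author: str
    text: str


# A PostFilter is modelled here as a plain predicate over posts.
PostFilter = Callable[[Post], bool]


@dataclass
class Feed:
    """A collection of posts matching every filter in `filters`."""
    name: str
    filters: List[PostFilter]
    posts: List[Post] = field(default_factory=list)
    running: bool = True

    def accepts(self, post: Post) -> bool:
        return all(f(post) for f in self.filters)


class FeedOperator:
    """Registers, stops and restarts feeds, and routes posts to them."""

    def __init__(self) -> None:
        self.feeds: Dict[str, Feed] = {}

    def register(self, feed: Feed) -> None:
        self.feeds[feed.name] = feed

    def stop(self, name: str) -> None:
        self.feeds[name].running = False

    def restart(self, name: str) -> None:
        self.feeds[name].running = True

    def publish(self, post: Post) -> None:
        # Deliver the post to every running feed whose filters all match.
        for feed in self.feeds.values():
            if feed.running and feed.accepts(post):
                feed.posts.append(post)
```

Keeping the network name on the Post rather than on the Feed is one possible way to let the same feed later collect posts from several social networks.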
38. Maya twitter plugin
● Backed by Twitter4J
● Relies on the Twitter Streaming API
● Challenges
○ Adding new feeds on the fly without losing any history
○ Single channel for all feeds, so each tweet must be matched to the feed(s) it belongs to
● Demo!
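The single-channel challenge can be sketched as two steps: send the union of all feeds' keywords on the one stream, then match each incoming tweet back to its feeds. The code below is an assumption-laden illustration, not the plugin's implementation. The function names, the feed names and the keywords are invented, and the matching is a simplified whole-word test.

```python
def combined_track(feeds):
    """Union of all feeds' keywords, as would be sent on the single shared stream."""
    return sorted({kw.lower() for kws in feeds.values() for kw in kws})


def feeds_for(text, feeds):
    """Names of the feeds whose keywords appear in the tweet text.

    Simplification: case-insensitive whole-word match on whitespace-split
    tokens with surrounding punctuation stripped.
    """
    tokens = {tok.strip(".,!?:;\"'()").lower() for tok in text.split()}
    return sorted(
        name
        for name, keywords in feeds.items()
        if any(kw.lower() in tokens for kw in keywords)
    )
```

For example, with `feeds = {"conf": ["#devoxx", "java"], "shop": ["sale"]}`, the text "Java books on sale" belongs to both feeds. Adding a feed on the fly then amounts to recomputing `combined_track` and reconnecting, which is where the risk of losing history arises.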
39. Front Evolutions
● From Pull to Push
○ NodeJS
● Manager from PHP to JS
○ NodeJS
● HTML5 Animations
40. Next Improvements
● Queued processing with SI
○ Message processing
○ Media grabbing and/or processing
● Adding other social networks
● Better scalability - reliability - performance
○ load balancing
○ replication
○ nginx
○ ...