Siwow social rank_technical_whitepaper




  1. Architectural Overview of the SIWOW SocialRank Engagement System
     Stanley Du, Dec 2010
  2. Audience
     • This whitepaper is intended for technically minded people who wish to peek behind the curtain at SIWOW. We’ll cover a high-level architectural overview of SIWOW Data Services and describe what the back-end systems do and how. Please forward your questions to for more details.
  3. Introduction
     • SIWOW SocialRank began as an RSS feed filtering service to help fight information overload. RSS feed subscribers, inundated with dozens or hundreds of stories per day, would share their subscriptions with SIWOW and subscribe to a new version of their feeds. Each new proxied feed was filtered based on how the online community interacted with the individual posts. The online community would leave comments on posts, share posts on various social sites like Digg and Twitter, or bookmark posts on sites like Delicious. Each of these engagement events was an implicit vote for the authority of that content. Collecting all of the engagement activities surrounding a post yields a very useful quality signal: the SIWOW Engagement Score. Today, SIWOW provides blog and media publishers, corporate marketers and public relations agencies with critical tools and data to understand how readers are interacting with content.
  4. SIWOW SocialRank Architecture Overview
     • The SIWOW SocialRank architecture combines a searchable RSS content archive with real-time social media monitoring over an event-driven fabric. The system computes engagement of readers with content in real time. Several of the event subscription points are available as Data Services Real-time APIs. Our cloud-based deployment makes scaling out a regular part of our process.
     • The diagram on the following page provides an overall visual guide to the primary architectural areas.
  5. SIWOW SocialRank Core System
  6. Components
     • DATA MINING APIs: Include: Feed, Top Posts, Engagement, Topic, SocialRank. Offer customary request-response style APIs over HTTP to query SIWOW, engagement or content from the two-year-old archive of attention, engagement and content data.
     • REAL-TIME APIs: Include: Content, Engagement. Events pushed to API subscribers are content oriented or engagement oriented. Content-oriented events are new blog and mainstream media news content enhanced with engagement, sentiment and language metadata. Engagement-oriented events are notifications of significant changes in engagement levels for individual posts or entire news feeds.
  7. Components
     • ATTENTION MONITOR: A collection of API adapters tailored to specific social networking sites that store user interaction events with content. Currently SIWOW tracks interactions at most of the popular social sites. As new sites from around the world become popular, new adapters are deployed and new attention metrics are captured. Engagement alerts occur in real time. They are notifications of significant changes in engagement for an individual story or feed. Feed engagement is the aggregate amount of engagement from each of the individual posts in a feed.
     • ATTENTION ARCHIVE: A searchable repository of individual attention events mentioning any URL across all social hubs monitored by SIWOW.
     • ENGAGEMENT DATABASE: A database of engagement values generated at social sites over time, allowing time series analysis and reporting of post, feed or topic scores.
  8. Components
     • CONTENT SYSTEM: A system for checking RSS feeds, normalizing the source data and enhancing that data with language detection and semantic analysis. Content API subscribers can tap into the content pipeline at several points.
     • LANGUAGE & SEMANTIC ANALYZERS: Connect to the content stream and analyze post text for language and human emotional weight across several dimensions, including: anger, disgust, fear, happiness, sadness and surprise. Provides an overall positive, negative or neutral score.
     • FEEDS LIST: A database of over 1 million user-provided feed URLs with associated metadata.
  9. Components
     • TOPIC CURATOR: A human wiki-style curation system powered by for classifying individual feeds into topics. Feeds can exist in multiple topics simultaneously. Ranking and filtering of topics is dynamic and real-time, based on the collected engagement activities for each feed.
     • FEED UPDATE SYSTEM: A master feeds list is checked periodically, at a frequency based on the level of engagement and publishing volume of each feed. Specific integrations with PubSubHubbub, RSS Cloud, ping servers and other push protocols are also used to minimize the latency for gathering newly published content. New and updated post events are broadcast over the content stream.
     • CONTENT ARCHIVE: A searchable content repository of news and blog content with associated metadata, including full posts where available, author, published dates, language, tags, and URLs.
  10. Life of a Blog Post… According to SIWOW SocialRank
  11. Life of a Blog Post (steps)
     • 1 An author publishes a post.
     • 2 The author’s publishing system makes the new post available in its RSS feed and optionally notifies a ping or push service, eventually notifying SIWOW.
     • 3 The SIWOW Feed Update System checks the publisher’s feed for new posts since the last check. The new post is found.
     • 4 The new post is normalized, passed through language and semantic analyzers, enriched with additional SocialRank metadata about the feed (engagement score, tags, etc.) and enters the Content Stream to be delivered to Data Services Content API subscribers based on filter configuration.
     • 5 The new post is stored in the searchable SIWOW Content Archive.
     • 6 Readers visit the publisher’s site or consume the RSS feed in RSS readers.
     • 7 Readers interested in the post share, link or comment on various social sites. The link may optionally pass through redirecting proxies, URL shortening services, etc.
     • 8 The SIWOW Attention Monitor tracks all mentions of the story in real time via site-specific APIs and polls for comments on the publisher’s site. Events are sent to the Attention Stream and delivered to Data Services Engagement Real-time API subscribers based on filter configuration.
     • 9 The engagement events are stored in the Attention Archive, ready to be included in a SocialRank calculation.
  12. Engagement
     • SIWOW SocialRank is a measure of audience engagement with online content. Usually that content is referenced as items in an RSS feed, but today the measure can apply to almost any kind of content addressable by a URL.
     • In the late 1990s, links between static HTML pages were the critical insight that led to the development of Google’s PageRank algorithm. Today we interact with content using more modes than static links, often in real time. Tracking engagement events like comments, links, shares and votes is similar in spirit to what HTML page links were a decade ago: each of these social gestures is counted as a vote for that content by SocialRank. Google’s PageRank is used to drive search results while, in our case, you define the applications driven by engagement by integrating with the Data Services APIs.
     • What is engagement? It’s a number representing the weighted sum of attention events. We keep track of the number of times each type of event happens for each post and then calculate the weighted sum to produce an engagement value for a post.
  13. Engagement
     • Post Engagement = (n_links × w_a) + (n_tweets × w_b) + (n_diggs × w_c) + …
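The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the event type names and the weight values in `EXAMPLE_WEIGHTS` are made up for the example, since the whitepaper does not publish the actual weights.

```python
# Hypothetical weights per attention-event type; the real values are not
# published in the whitepaper and are described as subject to change.
EXAMPLE_WEIGHTS = {"link": 5.0, "tweet": 2.0, "digg": 1.0}


def post_engagement(event_counts, weights):
    """Return the weighted sum of attention-event counts for one post.

    Event types with no configured weight contribute nothing.
    """
    return sum(
        count * weights.get(event_type, 0.0)
        for event_type, count in event_counts.items()
    )


# Example: 3 links, 10 tweets and 4 diggs for a single post.
counts = {"link": 3, "tweet": 10, "digg": 4}
print(post_engagement(counts, EXAMPLE_WEIGHTS))  # 3*5.0 + 10*2.0 + 4*1.0
```

Keeping the weights in a plain mapping makes it easy to add a new attention source without touching the calculation itself.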
  14. Engagement
     • A post’s engagement value is an abstract number at a point in time, like 982. It’s most useful to benchmark against itself over time or in comparisons with other posts in the same feed or even across feeds. The value of the weights for each of the sources is influenced by the type of interaction at the source. Not all interactions are equal, so interactions that imply higher levels of engagement have higher weights.
     • Looking at all of the posts in a feed over a period of time, we can roll up the Post Engagement scores into a Feed Engagement score to compare feeds with one another or over time.
       Feed Engagement (this week) = Post #1 Engagement for this week + Post #2 Engagement for this week + Post #3 Engagement for this week + …
     • The Feed Engagement value includes all the engagement for any stories that were available on the analyzed site. SocialRank uses week-over-week feed engagement values in our Topic ranking features.
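As a rough sketch of the roll-up described above, the following Python sums per-post engagement into a weekly feed value and compares two weeks. The function names and the sample numbers are illustrative assumptions; the whitepaper does not say how the week-over-week comparison is actually computed, so a simple relative change is used here.

```python
def feed_engagement(post_engagements):
    """Roll up one week's post engagement values into a feed engagement value."""
    return sum(post_engagements)


def week_over_week_change(this_week, last_week):
    """Relative change between two weekly feed engagement values.

    Returns None when last week's value is zero, because the ratio is
    undefined in that case. (Assumed comparison, not SIWOW's own method.)
    """
    if last_week == 0:
        return None
    return (this_week - last_week) / last_week


this_week = feed_engagement([982, 120, 45])   # three posts this week
last_week = feed_engagement([800, 200])       # two posts last week
print(this_week, last_week)
print(week_over_week_change(this_week, last_week))
```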
  15. 5 C’s of Engagement
     • The value of the weights depends on the type of attention source. Different user interactions imply different levels of engagement. A pageview is the smallest level of engagement since all it says is that a page was rendered in a browser (maybe), and nothing more. Other interaction modes imply a higher expenditure of effort or emotional attachment. For example, leaving a comment implies having read the article (pageview), given it some thought, and crafted a response back to the author. Each of these modes is weighted higher than a pageview. The actual numerical values are subject to change and are not essential to understanding the core concept.
  16. 5 C’s of Engagement
     • CREATING: The strongest form of engagement is demonstrated by using an item as inspiration to create your own, for example, writing your own blog post that responds to or refutes someone else’s post. Creation requires the most thought and investment of time, actively generates conversation, and therefore indicates a high level of engagement.
     • CRITIQUING: Reading a blog post and then leaving a comment requires an investment of time, thought and effort (or sometimes just typing and name-calling...), and is a form of conversation. However, it requires less effort than writing a whole blog post. So while it is an important action, it does not indicate as much engagement as Creating.
     • CHATTING: Sharing and discussing information can often be started with one click, so it doesn’t require a major investment of effort. However, a desire to share is a strong indication of relevance and expends some social capital. The act of sharing and its ensuing discussion are acts of conversation. Social media applications like Twitter encourage both the sharing of information and the resulting conversations. As a result, social media “chatting” indicates a good level of engagement.
  17. 5 C’s of Engagement
     • COLLECTING: Bookmarking or submitting items to social sites also tends to be a “one-click” action. These are intentional acts of archiving something of value for future reference and often sharing, but they don’t require much time or effort. However, the sharing that occurs often sparks conversations, so Collecting does demonstrate some engagement.
     • CLICKING: Activities like clicks and pageviews indicate lower engagement because they’re passive interactions. Clicking a link to read a blog post doesn’t require much work, and you’re not giving anything back except your reading time. It is an intentional act, however, and thus indicates a mild level of interest and engagement, which may grow after the item is read.
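The ordering of the five C’s can be expressed as an ordered enumeration from which example weights are derived. The sketch below is purely illustrative: the mapping of event types to levels and the doubling rule in `example_weight` are assumptions, not SIWOW’s actual values, which the whitepaper says are subject to change.

```python
from enum import IntEnum


class EngagementLevel(IntEnum):
    """The five C's, ordered from weakest to strongest engagement."""
    CLICKING = 1
    COLLECTING = 2
    CHATTING = 3
    CRITIQUING = 4
    CREATING = 5


# Hypothetical mapping from attention-event types to a level.
EVENT_LEVEL = {
    "pageview": EngagementLevel.CLICKING,
    "bookmark": EngagementLevel.COLLECTING,
    "tweet": EngagementLevel.CHATTING,
    "comment": EngagementLevel.CRITIQUING,
    "response_post": EngagementLevel.CREATING,
}


def example_weight(event_type, base=2.0):
    """Assumed rule: each level weighs `base` times the level below it."""
    return base ** (EVENT_LEVEL[event_type] - 1)


# Print event types from weakest to strongest with their example weights.
for event_type in sorted(EVENT_LEVEL, key=EVENT_LEVEL.get):
    print(event_type, EVENT_LEVEL[event_type].name, example_weight(event_type))
```

Any strictly increasing rule would preserve the ordering; the geometric one is used here only because it is easy to read.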
  18. SIWOW SocialRank Scores and Engagement Normalization
     • When discussing ranking, context is enormously important. Five comments on a hobby blog may be a lot, while a popular mainstream media site may get hundreds of comments on an average post. SIWOW SocialRank values are a normalization of Post Engagement into a 1.0-10.0 score that is easy for humans to relate to.
  19. SIWOW SocialRank Scores and Engagement Normalization
     • THEMATIC SOCIALRANK: Thematic ranking is context-free, meaning several posts can be compared directly with no notion of what is normal for the feeds they come from. A collection of blog posts originating across several blogs can be compared and ranked in that set (for example, ten posts discussing the iPhone). In this example, all of the Post Engagement values are retrieved, the median engagement value is found and assigned a SocialRank value of 5.0. The higher post engagement values are then extended out to 10.0 and the lower ones down to 1.0. The only normalizing effect comes from the engagement of the posts included in the set.
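One way to picture the thematic normalization is the sketch below. It pins the median of the set to 5.0 and then assumes a linear stretch on each side, so the highest value maps to 10.0 and the lowest to 1.0. The whitepaper does not state how values are "extended out", so the linear interpolation, the rounding to one decimal place and the function name are all assumptions.

```python
from statistics import median


def thematic_socialrank(engagements):
    """Map a set of post engagement values onto a 1.0-10.0 scale.

    The median maps to 5.0; values above it are stretched linearly up to
    10.0 at the maximum, values below it down to 1.0 at the minimum.
    The linear stretch is an assumption made for this sketch.
    """
    if not engagements:
        return []
    mid = median(engagements)
    low, high = min(engagements), max(engagements)
    ranks = []
    for value in engagements:
        if value >= mid:
            span = high - mid
            rank = 5.0 if span == 0 else 5.0 + 5.0 * (value - mid) / span
        else:
            # value < mid implies low < mid, so this span is never zero.
            rank = 5.0 - 4.0 * (mid - value) / (mid - low)
        ranks.append(round(rank, 1))
    return ranks


# Five posts on the same subject drawn from different blogs.
print(thematic_socialrank([10, 50, 100, 400, 900]))
```

Because only the posts in the set are considered, adding or removing a post can change every score, which matches the note that the set itself is the only normalizing effect.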
  20. SIWOW SocialRank Scores and Engagement Normalization
     • TOPICS: SocialRank Topics are named collections of news feeds. User Topics are private topics managed by users, while Global Topics are public. Users manually curate feeds into topics by making a decision about the fit of a feed to a topic. Topic curation is done using a Wikipedia-style model where any registered user can create new topics and add any feed to topics. The topics are then available to everyone else. Since feeds can exist in any number of topics, the topic names themselves become a useful classification and provide tagging data for feeds. New topics are being created all the time to map to news sources down the long tail. In addition to the valuable classification metadata, users can consume aggregated content from an entire topic and have those posts filtered by SocialRank scores. Blogger discovery and topic coverage analysis are possible by looking at the week-over-week rankings (Engagement Database) of bloggers in a topic. Bloggers that generate engagement rise in the rankings while bloggers that don’t descend. This transparent and meritocratic ranking system allows an unbiased view of who is generating the most interest in a topic area and what they are writing about.
  21. SIWOW SocialRank Scores and Engagement Normalization
     • INFLUENCE SHARE: At any point in time SocialRank knows the total engagement generated by all stories across all feeds within a specific topic (total attention market share). Influence share is the fraction that an individual author or a single feed commands of the total attention market share in that topic. This influence share provides critical insights into the nature of a topic. Is most of a topic’s engagement dominated by a handful of bloggers? Or is the engagement highly fragmented across dozens?
     • RSS CONTENT: It’s often said: “The best thing about standards is that there are so many of them!” Nowhere is this more true than in the world of content syndication. There are plenty of different RSS versions on the web today, plus Atom, RDF and others. SIWOW has a world-class content archive and feed update system. All formats of syndicated content are consumed, and the payload is normalized and made available in a consistent format. The content items themselves are available via our Feed Server or via streaming API (AMQP, Webhooks, etc.) for more efficient delivery of high volumes of data.
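The influence share defined on this slide is a simple fraction, which the following sketch computes per feed. The feed names, the sample engagement values and the `top_n_share` helper (a rough way to ask whether a handful of feeds dominates a topic) are illustrative assumptions.

```python
def influence_shares(engagement_by_feed):
    """Return each feed's fraction of the topic's total engagement."""
    total = sum(engagement_by_feed.values())
    if total == 0:
        # No engagement in the topic: every share is zero.
        return {feed: 0.0 for feed in engagement_by_feed}
    return {feed: value / total for feed, value in engagement_by_feed.items()}


def top_n_share(shares, n):
    """Combined share of the n largest feeds, a rough concentration measure."""
    return sum(sorted(shares.values(), reverse=True)[:n])


# Hypothetical weekly feed engagement values within one topic.
topic = {"blog-a": 600, "blog-b": 300, "blog-c": 60, "blog-d": 40}
shares = influence_shares(topic)
print(shares)
print(top_n_share(shares, 2))
```

A combined share close to 1.0 for a small n suggests a topic dominated by a few feeds; a low value suggests a fragmented one.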
  22. SIWOW SocialRank Scores and Engagement Normalization
     • LANGUAGE & SEMANTIC ANALYSIS: SIWOW can enhance a news content stream with language detection and semantic analysis. Language detection uses samples of words and phrases from the post content to select the language most likely to be predominant in the post. This automated detection is based on the actual post data, not the possibly incorrect configuration of the blog platform. Semantic analysis computes the overall tone as positive, negative or neutral based on the content of the posts. SocialRank also computes detailed weights based on six emotional dimensions: anger, sadness, happiness, surprise, fear and disgust.
  23. Data Services Use Cases
     • SIWOW Data Services power several applications around the web. The following is a list of common use cases.
     • Engagement Analytics: Real-time or archival views of off-site interaction events with a site. Augment traditional pageview or click-stream oriented data with next-generation social interaction events, the new driver of web traffic.
     • Feed Filtering: Read what matters. Not all posts in a feed are equal; some are much more interesting and relevant than others. If you’re a big RSS consumer, understand where you should be spending your time. Find out where the conversation is today, and read what matters.
     • Top Posts: Find publishers in a topic and what they are best known for. Understanding a publisher’s most engaging posts provides unique insight into the publisher and their community. As a publisher, understanding what works and what doesn’t offers a reliable feedback loop on tone and topic.
  24. Data Services Use Cases
     • Quality Content Syndication and Aggregation: There are tens of millions of news articles and blog posts being generated every day. Syndicating this stream without a filter is a recipe for failure. How do you know the content will be any good or on topic? This leads aggregators to stick to general, commodity news sources that may be safe but aren’t specific to an area of interest. Aggregate reader engagement is an effective signal for reducing noise and finding articles of interest within any topic when looking through large quantities of otherwise ambiguous content.
     • News Discovery: What are the best news sources in a topic? If you know about one blog, can you find similar ones? How many blogs do you need to get good coverage of a topic area? Topics and feed engagement ranking help identify clusters of related blogs by topic and rank them according to their recent performance with their readership.
  25. Data Services Use Cases
     • Influence Tracking: Who are the thought leaders in a topic? If there are 87 blogs in a topic, how is the total engagement distributed among them? Is there a handful of influencers that own the space or is it a highly fragmented space? These topic attributes can have a heavy influence on how blogger outreach is performed or how content is consumed from them.
     • Sentiment Analysis: What is the emotional tone of content in a topic filtered by keywords? Is it generally positive, neutral or negative? Where are we today versus the normal baseline of sentiment? This is highly useful for brand monitoring.
  26. SIWOW Future! Revolution! We are on the way!
     Thank You!
     Stanley Du
     Mobile: 0086-15910916606
     E-mail: