LEVERAGE WHAT WE KNOW TO CREATE A FULLY PERSONALIZED EXPERIENCE

There are tens of billions of web pages out there, and more than two million terabytes of text, images and other content are created
every hour. So where in this deluge do we start looking for what’s interesting to the user?

In the next iteration of the news reading feature in UC Browser, the app needs to be "smarter" and more "social". It needs to be more
than just a plain old RSS reader; RSS is so passé. We want to scour the social web, monitor Twitter's trends, understand usage
habits and find what's interesting for our users, be it a photo or a viral video.

No one wants to spend a lot of time setting up an app, so we should leverage as much of what the iPad/iPhone and other websites
know about the users as possible (with their permission, of course). Could we use the user’s location, calendar, Facebook, Twitter
feed, Delicious bookmarks or contacts list to make our app's news reader feature smarter and more personalized?

USE CASE

Our app connects to the user's Google Reader, Twitter or Delicious account and then presents articles based on what it has learned
about their interests and reading habits. For example, the user logs in with their Delicious account (where they store all of their
bookmarks), and in five seconds our app creates a personalized magazine in which the user can explore new and interesting content.
And just like Pandora, the more they use it, the more relevant and personalized the content becomes.

A SMARTER, MORE SOCIAL WEB MAGAZINE

We want to observe what’s happening around the social web because the community, in aggregate, creates a strong signal for
what’s interesting and what's trending. User-generated content, sharing, commenting and bookmarking have overtaken email and
web pages in sheer volume of data created and total time spent online. Projections suggest that 115 million people in the U.S. will
be creating content by 2013. What’s important is that this content either happens on, or is reported through, social media. What’s
more, mining the social web makes it possible to personalize magazine content from the moment the user starts using UC Browser's
social magazine for the first time.

To take advantage of the social web to find and choose great content for our users, we need to beef up the "news
selection/filtering" process in our "backend algorithm", the magic sauce that determines "what's interesting" for our users. The
algorithm:

   1. Monitors URLs shared through the wide range of social streams that users choose to connect to UC Browser, such as
   Twitter, Facebook and Delicious. These shares begin to tell our app about the users' interests and focus.

   2. Throws out spam using "adaptive pattern matching heuristics" and other techniques.

   3. Associates each URL with the users who share it and calculates the credibility of each of those users, because a URL from
   someone who has a lot of followers or is often retweeted, for example, is usually more credible. We need an algorithm that
   feeds this credibility into our "filtering" process and selection metrics.

   4. Combines the credibility scores of all the users who share a particular URL to calculate an overall quality score for that URL.

   5. Carries forward URLs with scores above a certain threshold as potential content to show, depending on later calculations.

The result is millions of new and vetted URLs put into our backend news pipeline every day.
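
Steps 3 to 5 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the signal names (`followers`, `retweets`), the log-damped credibility formula, the summing rule and the threshold value are all hypothetical choices, not a settled design for the UC backend.

```python
import math
from collections import defaultdict

# Hypothetical per-user signals; the field names are assumptions for illustration.
SHARERS = {
    "alice": {"followers": 12000, "retweets": 340},
    "bob": {"followers": 150, "retweets": 2},
    "carol": {"followers": 900, "retweets": 25},
}

# (user, url) pairs observed in the connected social streams (made-up data).
SHARES = [
    ("alice", "http://example.com/a"),
    ("carol", "http://example.com/a"),
    ("bob", "http://example.com/b"),
]


def credibility(signals):
    """Step 3: map follower and retweet counts to a score (log damping is assumed)."""
    return math.log10(1 + signals["followers"]) + math.log10(1 + signals["retweets"])


def quality_scores(shares, sharers):
    """Step 4: sum the credibility of every user who shared each URL."""
    scores = defaultdict(float)
    for user, url in shares:
        scores[url] += credibility(sharers[user])
    return dict(scores)


def vetted_urls(scores, threshold):
    """Step 5: keep only URLs whose quality score clears the threshold."""
    return sorted(url for url, score in scores.items() if score > threshold)


scores = quality_scores(SHARES, SHARERS)
print(vetted_urls(scores, threshold=5.0))
```

With this made-up data, the URL shared by two well-followed users clears the threshold and the one shared by a single low-credibility user does not. Spam filtering (step 2) is left out of the sketch.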

MODELING CONTENT

Each vetted URL points to text, graphics or videos that we could potentially show our users, but it takes a lot more processing to find
out what’s worth their time. So our app:

   1. Strips out all the extraneous, non-readable content at a URL. This includes HTML formatting, file “includes,” scripting code,
   ads, banner graphics, etc. That’s all removed via syntactic analysis, leaving a document that a machine can analyze for its
   content and that the user can read (if we figure it’s worthwhile).

   2. Analyzes each document via text mining and term extraction techniques, inferring the terms that succinctly capture and
   summarize (see iOS app "Summify") what the content is about.

   3. Parses out the places, names and dates via entity extraction techniques.

   4. Characterizes the writing style, patterns of speech, and the length of sentences, phrases and words, all via semantic
   classifiers.

   5. Lastly, collects metadata such as the author’s name, modifiers from user-added tags and comments, Twitter hashtags, etc.

All these features—terms, entities, styles, metadata—define a model of what’s in a document, and they are carried forward with the
document itself.
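
As a rough illustration of steps 1 and 2, the sketch below strips markup and scripts from a page and then picks the most frequent non-trivial words as its terms. It uses only the Python standard library. The class and function names, the tiny stop-word list and the sample page are assumptions made for the example, and plain word counting stands in for real term extraction; entity extraction and style classification are not shown.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# A tiny stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def readable_text(html):
    """Step 1: return the page's text with tags and scripts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def top_terms(text, limit=2):
    """Step 2 (simplified): the most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [term for term, _ in counts.most_common(limit)]


PAGE = """<html><head><script>var ads = 1;</script></head>
<body><h1>Election results</h1>
<p>The election turnout was high. Election officials praised the turnout.</p>
</body></html>"""

text = readable_text(PAGE)
model = {"terms": top_terms(text), "text": text}
print(model["terms"])
```

The resulting dictionary is a very small version of the document model described above: the cleaned text travels together with the features extracted from it.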

MODELING COMMUNITY

The aggregated habits and interests of a community of users can provide valuable recommendations for its members. You’ve likely
experienced this via collaborative filtering from Amazon or Netflix. The heuristics correlate the habits of many users who are like
you in order to derive what you will find relevant. Using a similar technique, our app:

   1. Correlates relationships across millions of users and billions of documents, based on vetted data that our app has captured
   from the social web. This creates a huge matrix of document-user relationships, derived from both our users and external data.

   2. Condenses these relationships into a few hundred features that characterize each user and each document. Later on, these
   features become the basis for matching each incoming document to the user's individual interests.

   The process of condensing tends to “blur” the data a bit, and this is a good thing: it enables our app to show users documents
   that are a little outside their direct interests, adding an element of serendipity and helping them discover new things.
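
The condensing step and its "blur" can be illustrated with a low-rank approximation of the document-user matrix. The sketch below is a toy under stated assumptions: it keeps a single feature (found by power iteration) instead of a few hundred, the 3x3 matrix is made up, and nothing here claims to be the factorization method the backend would actually use.

```python
import math

# Rows are users, columns are documents; 1 means the user shared or read it.
# The data is made up for illustration.
R = [
    [1, 1, 0],  # user 0
    [1, 1, 1],  # user 1
    [0, 1, 1],  # user 2
]


def dominant_feature(matrix, iterations=50):
    """Power iteration for the leading right singular vector of the matrix."""
    n_users = len(matrix)
    n_docs = len(matrix[0])
    v = [1.0] * n_docs
    for _ in range(iterations):
        # u = R v (user side), then v = R^T u (document side), then normalize.
        u = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        v = [sum(matrix[i][j] * u[i] for i in range(n_users)) for j in range(n_docs)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(r * x for r, x in zip(row, v)) for row in matrix]
    return u, v


user_feature, doc_feature = dominant_feature(R)

# Rank-1 reconstruction: the predicted affinity of user i for document j.
approx = [[u * d for d in doc_feature] for u in user_feature]

# User 0 never touched document 2, yet the condensed model gives that pair a
# positive score because overlapping users did: this is the "blur".
print(round(approx[0][2], 2))
```

Keeping more features would make the reconstruction follow the original matrix more closely and blur it less, so the number of features controls how adventurous the recommendations are.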

MODELING USER

The more your friends and colleagues learn about you, the more enjoyable your conversations become. Our app works the same
way: the more the user interacts with it, the more it learns about their habits, preferences, usage patterns and interests, and the
better it gets at bringing the user “what’s interesting”. To do this, our app:

   1. Tracks the specific topics the user says they’re interested in and lets them create a "Section" in our app for each one.

   2. Quietly watches what the users read and don’t read, and uses machine learning to infer their degree of interest in each
   document.

   3. Asks for feedback in the form of thumbs-up/thumbs-down ratings as well as labeled click-boxes so the user can ask for more
   stories from specific sources, specific authors, or on specific topics. These could be popular sites or lesser-known blogs, news
   items or editorials, and so on.

So, let’s say the user gives a thumbs-up to multiple stories about upcoming political elections. Our app will show them more stories
on that topic. Or, if they repeatedly give a thumbs-down to certain stories on the same general topic, UC will develop a rule to stop
showing them similar ones. But how do we know what “similar” means? Why does the user like or dislike a particular story? Is it
because it’s about foreign policy, or written by a specific author, or about a fringe candidate? (The user might not even realize why
themselves.) Automatically figuring that out, without pestering the user to answer a lot of questions, isn’t easy. UC will use the
hundreds of features in its models of content, community, and the user to find the fine-grained patterns in their ratings that represent
their preferences. This way, what it shows them can reflect their interests without too much effort on their part. In short, UC gets
better every time the user uses it, just through use. And the more the user tells UC what they like and dislike, the more accurate its
choices become.
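
One very simple way to picture how ratings can be attributed to features is an additive weight per feature, as sketched below. This stands in for the machine learning mentioned above and is not a proposal for the real model; the `UserModel` class, the feature labels and the learning rate are all hypothetical.

```python
from collections import defaultdict


class UserModel:
    """Keeps one weight per feature, nudged by thumbs-up and thumbs-down ratings."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)

    def rate(self, features, liked):
        """Move the weight of each feature on the rated story up or down."""
        step = self.learning_rate if liked else -self.learning_rate
        for feature in features:
            self.weights[feature] += step

    def interest(self, features):
        """Score a story as the sum of the weights of its features."""
        return sum(self.weights.get(f, 0.0) for f in features)


model = UserModel()
model.rate({"topic:elections", "topic:foreign-policy", "author:smith"}, liked=True)
model.rate({"topic:elections", "topic:economy", "author:jones"}, liked=True)
model.rate({"topic:elections", "topic:fringe-candidate", "author:smith"}, liked=False)
model.rate({"topic:elections", "topic:fringe-candidate", "author:jones"}, liked=False)

liked_score = model.interest({"topic:elections", "topic:foreign-policy"})
disliked_score = model.interest({"topic:elections", "topic:fringe-candidate"})
print(liked_score, disliked_score)
```

In this made-up history the election topic and both authors appear in liked and disliked stories alike, so their weights cancel out, and the dislike ends up attached to the fringe-candidate feature without the user ever being asked why.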

MATCHING "WHAT'S INTERESTING" TO YOUR INTERESTS

The first time our app is started, the user simply needs to select the news sources that interest them from their Google Reader. UC
then has everything it needs to narrow down the daily deluge of content into focused, personalized, and up-to-date stories. To do
this, UC:

   1. Looks at the incoming stream of new documents since the user last opened UC, and keeps the ones that match their
   Sections, sorting them by the quality score.

   2. Makes a fine-grained comparison of the highest-scored documents to the user's interests, using the hundreds of features
   calculated for each document. This yields a content-matching score for how closely a story fits those interests.

   3. Factors the "age" of a story into its score. As a story gets older, it often becomes less interesting to the users, so UC lowers
   its score proportionally.

   4. Applies the user's blocked-source settings to eliminate sources they don’t want to see.

   5. Sorts the stories according to their scores with the most relevant first.

   6. Lastly, flows these stories onto the screen of the user's iPad or iPhone, populating each Section according to topic, and
   using the best of those to populate their Top Stories.
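
The matching steps above can be put together in a short sketch. Everything specific in it is an assumption for illustration: the `Story` fields, the way the quality and content-matching scores are multiplied together, and the half-life decay used to lower the score of older stories. The content-matching score itself is taken as an input rather than computed.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    source: str
    section: str
    quality: float    # quality score from the social-sharing stage
    match: float      # content-matching score against the user's interests
    age_hours: float


def rank_stories(stories, sections, blocked_sources, half_life_hours=24.0):
    """Filter, score and sort stories; the scoring formula is an assumed choice."""
    ranked = []
    for story in stories:
        if story.section not in sections:      # step 1: keep matching Sections
            continue
        if story.source in blocked_sources:    # step 4: drop blocked sources
            continue
        # Step 3: halve the score for every half-life the story has aged.
        decay = 0.5 ** (story.age_hours / half_life_hours)
        ranked.append((story.quality * story.match * decay, story))
    # Step 5: most relevant first.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [story for _, story in ranked]


STORIES = [
    Story("Old poll", "dailynews", "Politics", 0.9, 0.9, 48.0),
    Story("Fresh poll", "dailynews", "Politics", 0.8, 0.9, 0.0),
    Story("Blocked post", "blockedblog", "Politics", 0.9, 0.9, 1.0),
    Story("Recipe", "foodsite", "Cooking", 0.9, 0.9, 1.0),
]

result = rank_stories(STORIES, sections={"Politics"}, blocked_sources={"blockedblog"})
print([story.title for story in result])
```

Here the fresher story outranks the older one even though its quality score is lower, and the blocked source and the unrelated Section never reach the screen.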

Smarter, More Social Browser
