RecSys 2012 Industry Track - Sumanth Kolar, StumbleUpon
It's human nature to be curious, to learn new things, to want to find out more. Discovery is an innate human need, and with the rise of the Web, the urge to learn more has increased by leaps and bounds. According to David Hornik, investor at August Capital, "The massive scale of the Web not only creates huge challenges for search, it also cripples discovery. Gone are the good old days in which fortuity would lead to the unearthing of interesting new websites." Indeed, we live in the age of "infovores" and there is definitely a need for a service that provides serendipity.
Providing serendipitous discovery that can inform, entertain and enlighten our users is of utmost importance to StumbleUpon. This talk will focus on how StumbleUpon uses several machine learning techniques, such as collaborative filtering, active learning, decision trees, Bayesian models and more, to solve complex problems involving classification, user behavior analysis, modelling, anti-spam and recommendations. An average StumbleUpon user spends over 7 hours per month using the product, equating to hundreds of varied recommendations and ample feedback. The talk will also provide insights into some of StumbleUpon's rich data and how we can use scale to accomplish what would otherwise not be possible. We will look at innovative ways that StumbleUpon figures out the right metrics to evaluate recommender systems - a very complex problem. We will also discuss our research on StumbleUpon's mobile activity, which is growing 800% year over year and is the fastest growing part of our business, and how mobile recommendations are unique and important.
Bio: As Engineering Director at StumbleUpon, Sumanth Kolar leads the applied research team, overseeing recommendations, anti-spam, content analysis, user modeling, data sciences and infrastructure. Sumanth tackles very interesting and challenging research problems as StumbleUpon delivers more than 1 billion personalized recommendations a month to its more than 25 million users. Prior to joining the company in 2009, Sumanth engineered bidding and computer vision systems at Yahoo! and Adobe Research. Sumanth holds a master's degree in computer science from the University of California at Santa Cruz.
Recommendations and Discovery at StumbleUpon
2. StumbleUpon’s Mission
Help users find content they did not expect to find
Be the best way to discover new and interesting things from across the Web.
3. How StumbleUpon works
1. Register
2. Tell us your interests
3. Start Stumbling and rating web pages
We use your interests and behavior to recommend new content for you!
4. Discovery is very different from search
Discovery at StumbleUpon | Search
Serendipitous | Intent driven
One at a time | List of articles
Never repeats | Always repeats
Constantly adapting | Fixed results
Tailored for you | Impersonal
There is an ongoing shift from search to discovery
7. What are the key challenges to
good recommendations?
8. Pillars of good recommendations
Understand who the user is and what he is
interested in.
Separate good content from the bad.
Explore various techniques for matching users
to content.
Learn from your recommendations.
12. Continually Enhance a User’s Interest Graph
Analyze user’s StumbleUpon history to expand on interest preferences:
• Add/remove topics
• Follow/block particular domains
13. Continually Enhance a User’s Interest Graph
Leverage social network data:
• Find friends & people to follow
• Find content trending in your social circles
• Find additional interests
14. Continually Enhance a User’s Interest Graph
Mine internal StumbleUpon rating and sharing data to suggest other stumblers, topics.
15. Enhanced Interest Graph
[Figure: interest graph centered on the User, with nodes labeled Friends, News, Trending, Italian Food/Recipes, Cooking, Cars, Vintage Cars, nasa.gov and 1x.com.]
16. Pillars of good recommendations
Understand who the user is and what he is
interested in.
Separate good content from the bad.
Explore various techniques for matching users
to content.
Learn from your recommendations.
17. Sampling
On average hundreds of URLs are ingested into the StumbleUpon pipeline every minute.
• Sampling key goals:
1. Determine which URLs to sample and which to skip completely
2. Examine sampling results to identify good URLs
• URL features used when sampling:
• Known domain performance (ratings, time spent)
• Content-related features (#images, #ads, URL length, etc.)
• User features of the discoverer (spammer vs. trusted user)
18. Recommendations at StumbleUpon: Sampling
[Figure: a webpage goes through a Random Forest Vote (Yes/No); pages voted Yes are recommended to users, and the resulting ratings and time spent (e.g. Good/35 sec, Good/22 sec, Bad/15 sec, Good/45 sec, Good/14 sec, Good/28 sec) feed a classifier based on user feedback.]
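A toy illustration of the voting step named on this slide: each tree returns a Yes/No verdict and the majority decides whether the URL gets sampled. The hand-written rules below only stand in for trained trees; the feature names and thresholds are assumptions made up for this sketch, not StumbleUpon's actual model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UrlFeatures:
    # All fields are hypothetical stand-ins for the feature groups on slide 17.
    domain_thumb_up_share: float    # known domain performance: share of thumb-ups, 0..1
    domain_median_timespent: float  # known domain performance: seconds per stumble
    num_ads: int                    # content-related feature
    discoverer_trusted: bool        # user feature of the discoverer


# Each "tree" is a hand-written rule returning True (sample) or False (skip).
Tree = Callable[[UrlFeatures], bool]

TREES: List[Tree] = [
    lambda f: f.domain_thumb_up_share >= 0.6,
    lambda f: f.domain_median_timespent >= 20.0,
    lambda f: f.num_ads <= 5 and f.discoverer_trusted,
]


def forest_vote(features: UrlFeatures, trees: List[Tree] = TREES) -> bool:
    """Return True when a strict majority of trees vote to sample the URL."""
    yes_votes = sum(1 for tree in trees if tree(features))
    return yes_votes * 2 > len(trees)


if __name__ == "__main__":
    page = UrlFeatures(0.8, 30.0, 2, True)
    print("sample this URL:", forest_vote(page))
```

In a real system the trees would be learned from labeled sampling results rather than written by hand.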
19. Leveraging In-Network Experts
• Users who thumb-up good content and thumb-down bad content
• For example
– Joe DiMaggio – Baseball
– Julia Child – Food/Cooking
– Da Vinci – Art and Architecture
• Ratings from Experts are more trustworthy and earn more weight.
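One simple way to let expert ratings earn more weight is a weighted thumb-up share. This is a minimal sketch only: the slides do not say how the weighting is done, and the function name and the weight of 3.0 are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def weighted_quality(
    ratings: Iterable[Tuple[bool, bool]],
    expert_weight: float = 3.0,  # assumed value, not from the talk
) -> Optional[float]:
    """Weighted share of thumb-ups for one page.

    Each rating is (thumb_up, is_expert). Expert ratings count
    expert_weight times; all other ratings count once.
    Returns None when the page has no ratings.
    """
    up = 0.0
    total = 0.0
    for thumb_up, is_expert in ratings:
        weight = expert_weight if is_expert else 1.0
        total += weight
        if thumb_up:
            up += weight
    return up / total if total else None


if __name__ == "__main__":
    # One expert thumb-up against one non-expert thumb-down.
    print(weighted_quality([(True, True), (False, False)]))
```

With equal weights the example page would score 0.5; weighting the expert pulls the score toward the expert's opinion.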
20. Recommendations at StumbleUpon: Experts
[Figure: two plots of P(Thumb Up | Page Quality) against Page Quality, one for a non-expert and one for an expert.]
21. Pillars of good recommendations
Understand who the user is and what he is
interested in.
Separate good content from the bad.
Explore various techniques for matching users
to content.
Learn from your recommendations.
23. Like-Minded Users
• Find users who like content similar to the content you do
• Signals can be ratings, time spent, interests, etc.
• Use the content they’ve liked
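The idea of finding like-minded users can be sketched with a plain similarity measure over per-user signal vectors. Cosine similarity is an assumption chosen here for illustration (the talk itself uses a PLSI-based approach), and the vectors and user names are made up.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_like_minded(
    target: Sequence[float],
    others: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[str]:
    """Names of the k users whose vectors are most similar to the target."""
    ranked = sorted(others, key=lambda name: cosine(target, others[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical signal vectors, e.g. affinity for (astronomy, cooking, cars).
    me = [1.0, 0.0, 1.0]
    users = {"a": [1.0, 0.0, 1.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 0.0, 0.0]}
    print(most_like_minded(me, users))
```

Content liked by the returned users then becomes a candidate pool for the target user.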
24. PLSI based like-minded
[Figure: users grouped by latent topics; labels include Vintage Cars, Cars, Action movies, Classic Movies, Comedy Movies, Movies, Astronomy, Space Exploration, Space, Robotics, Physics, Neuroscience and Science.]
25. Like-Minded Users: Challenges Scaling
Total Pairwise Similarity Calculations
= 50K users * 5 million users * 1K features
= 250 Trillion
Probabilistic Latent Semantic Indexing (PLSI) based similarity over 500 trillion calculations
PLSI based similarity framework computes in less than an hour
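The pairwise figure on this slide can be checked with a few lines of arithmetic. The last step, comparing users in a reduced latent-topic space, uses a topic count of 50 that is purely an assumption for illustration; the slides do not give one.

```python
query_users = 50_000     # users to find neighbours for (slide: 50K)
all_users = 5_000_000    # candidate users (slide: 5 million)
features = 1_000         # features per user (slide: 1K)

pairs = query_users * all_users   # user pairs to compare
naive_ops = pairs * features      # one operation per feature per pair

# Hypothetical: compare users over latent topics instead of raw features.
latent_topics = 50                # assumed value, not from the slides
latent_ops = pairs * latent_topics

print(f"{pairs:,} user pairs")
print(f"{naive_ops:,} feature-level operations")
print(f"{latent_ops:,} operations with {latent_topics} latent topics")
```

The feature-level count matches the 250 trillion stated on the slide.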
27. Different methods perform differently for different users at different times
[Figure: stacked bar chart (0% to 100%) showing the mix of recommendation methods (Trending, Follow, Bias domains, Experts, News, Like-minded) for User 1 through User 5.]
29. Pillars of good recommendations
Understand who the user is and what he is
interested in.
Separate good content from the bad.
Explore various techniques for matching users
to content.
Learn from your recommendations.
30. Two Main Signals from Recommendation
Rating and Time Spent
Both present numerous challenges...
31. Ratings: volume decay
Users rate more during their initial experience
[Figure: number of ratings plotted against time.]
Why is this happening?
32. Time Spent
[Figure: a sequence of stumbles of different content types (images, video, text) with time spent T1 to T5 seconds; some stumbles are marked "?".]
• Ratings are sparse
• < 10% of recommendations have explicit ratings.
• Use time spent to decide whether the stumble was skipped
• Time spent on videos is longer than on images.
• Solution: Estimate p(Like | Timespent)
• Model based on user, content patterns
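A minimal sketch of estimating p(Like | Timespent) from the small share of stumbles that do carry explicit ratings. It buckets time spent separately per content type and smooths the counts. The bucket edges, the smoothing prior and the class name are all assumptions, and the talk's actual model (based on user and content patterns) is not described in the slides.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical upper bounds (seconds) for the time-spent buckets.
BUCKET_EDGES = (5, 15, 30, 60)


def bucket(seconds: float) -> int:
    """Index of the first bucket whose upper bound exceeds the time spent."""
    for index, edge in enumerate(BUCKET_EDGES):
        if seconds < edge:
            return index
    return len(BUCKET_EDGES)


class LikeGivenTimespent:
    """Smoothed like rate per (content type, time-spent bucket)."""

    def __init__(self, prior_likes: float = 1.0, prior_total: float = 2.0):
        # The prior acts like pseudo-observations: unseen cells default to 0.5.
        self.prior_likes = prior_likes
        self.prior_total = prior_total
        self.likes: Dict[Tuple[str, int], float] = defaultdict(float)
        self.total: Dict[Tuple[str, int], float] = defaultdict(float)

    def observe(self, content_type: str, seconds: float, liked: bool) -> None:
        """Record one explicitly rated stumble."""
        key = (content_type, bucket(seconds))
        self.total[key] += 1.0
        if liked:
            self.likes[key] += 1.0

    def p_like(self, content_type: str, seconds: float) -> float:
        """Estimated probability that an unrated stumble was liked."""
        key = (content_type, bucket(seconds))
        likes = self.likes.get(key, 0.0) + self.prior_likes
        total = self.total.get(key, 0.0) + self.prior_total
        return likes / total


if __name__ == "__main__":
    model = LikeGivenTimespent()
    for liked in (True, True, True, False):
        model.observe("video", 40, liked)
    print(model.p_like("video", 45))
    print(model.p_like("image", 3))
```

Keeping content type in the key reflects the slide's point that the same number of seconds means different things for video and images.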
33. Challenges: Time spent on different devices
[Figure: median and 5th percentile time spent per stumble, compared across Stumble Bar, Mobile / Tablets and Installed plugin.]
34. Pillars of good recommendations
Understand who the user is and what he is
interested in.
Separate good content from the bad.
Explore various techniques for matching users
to content.
Learn from your recommendations.
38. Many other interesting problems…
• Dupe detection
• Anti-spam
• News
• Topic classification
• Metrics, quality analysis
• Trending
• Search
• User biases, mood
• Many more…
We are HIRING!!!
Editor's Notes
By the end of this talk, you should have a good understanding of the problems with discovery, some solutions, and some data insights.
Our goal is to show content that you did not know you would like. To surprise you, enlighten you. Basically to enable exploration, discovery.
- During signup, we ask interesting questions to learn more about you and solve the cold start problem.
- Think of discovery as search without a term, and add the complexities, i.e., nothing repeats, etc. For example, if you want to learn about astronomy or genetic algorithms, it's hard to do on search or any other service; it is way more work.
When I started a couple of years back, we were 6M users and 15 employees. Growing rapidly, especially on mobile. Talk about time spent and how users are super hooked.
Users are good at choosing topics that they like. We have had repeated good success at increasing the topics they pick. But the problem is more about having them pick the right topics for them, e.g. Arts vs. AI. It's not simple to build a user experience that accounts for that and gets us that data. Huge area of research for StumbleUpon: how do we get as much as possible from the user without losing them or setting completely different expectations than what the product is?
Now we have a basic version of the interest graph: some topics you like.
StumbleSense: based on your likes/dislikes, we build a SENSE for other things you may like. Hence we suggest topics, domains, etc. that we think you will like as you stumble along. Makes interest elicitation a part of the core product. You are learning about the user and the user understands the product a lot; a dialogue, back and forth. Notice that we give the reason why it was recommended. Transparency is very important.
Leverage other networks you are part of to get data about what you like and jumpstart the interest graph.
Also, show suggested stumblers, interests, etc.
More dense interest graph. Affinity and confidence for each interest vary, and depending on that we can exploit or explore.
When new content is discovered/ingested, how do we determine if it's good or not? You will always have exceptions that need to be handled. For example: - Domains such as youtube, basically UGC, in which content is diverse. You need to build models that account for that. - User features of the raters/discoverers. Just because a spammer rated cnn.com you can’t ignore it. Look at multiple sources of information and decide whether the URL is worth sampling or not.
Now, one way of doing this is to use a random forest with content features.
We can also sample to experts. That’s one huge advantage SU has: the fact that we can decide which site to send and get data for that URL. But sometimes you could be recommending bad content to the expert. You get around this by telling the expert that we think he is an expert and we need to get more data from him about the URL. Again, transparency for the win. Transparency allows us to set the right expectations.
One way of defining experts is users who thumb up high-quality pages and thumb down low-quality pages. There are multiple ways you can find high-quality pages: - Have a seed of experts pick URLs and use them to find other experts. - Or look at your current quality scores and see which user ratings are more predictive of that; use them as experts. - Social endorsement: have users rate others as experts, or use external data sources, similar to what Klout is doing. Very hard problem.
How do you match the right content to the right user? User expectations are very different. When you say you like cars and I like cars, we are not talking about the same thing. Need to understand the interest graph more deeply.
One solution is to find other users that are similar to you. But just because you are similar to me in Physics does not mean I would like the Music you listen to.
One solution: figure out latent topics and then use them to cluster/find similar users.
Now we have an interest graph that is both explicit and implicit.
Different users have a varying method mix. We learn the mix and balance it. But this needs to account for mood; for example, we see that you like stumbling news in the morning and videos on the weekend. But there are always exceptions.
Context, i.e., showing why a recommendation was shown to a user, is very important. There should be a back and forth. Recommendations should be very transparent. Context can be that your friend on Facebook liked it, or it can be that this is trending in Politics.
The immediate conclusion is that the quality of recommendations is not good. But this is both thumb-ups and thumb-downs. Stumbling is cheap, and so clicking the stumble button is better than rating. One could argue that we are doing a really good job and the marginal utility of rating is not high. Solutions: Use other data such as time spent to figure out what you like. Make you rate more ;) Work very closely with product on what we can do to remind the user that their ratings matter.
Now, we know we need to use time spent. Last stumble, time spent. Great, we have a solution.