Slides from an introductory talk on machine learning and why mathematicians should take an interest in it.
This is a very basic introduction, for math undergraduates and other curious minds.
The Netflix Prize: yet another million dollar problem
David Bessis
École Normale Supérieure, 27/01/2010

The Problem
Strategies
Some Funny New Science
2. The Problem
Strategies
Some Funny New Science
7 + 1 Million Dollar Problems
Millenium Prize Problems:
David Bessis The Netflix Prize: yet another million dollar problem
3. The Problem
Strategies
Some Funny New Science
7 + 1 Million Dollar Problems
Millenium Prize Problems:
Funded in 2000 by the Clay Mathematical Institute.
David Bessis The Netflix Prize: yet another million dollar problem
4. The Problem
Strategies
Some Funny New Science
7 + 1 Million Dollar Problems
Millenium Prize Problems:
Funded in 2000 by the Clay Mathematical Institute.
Seven classical open problems in Mathematics.
David Bessis The Netflix Prize: yet another million dollar problem
5. The Problem
Strategies
Some Funny New Science
7 + 1 Million Dollar Problems
Millenium Prize Problems:
Funded in 2000 by the Clay Mathematical Institute.
Seven classical open problems in Mathematics.
Solutions must
David Bessis The Netflix Prize: yet another million dollar problem
6. The Problem
Strategies
Some Funny New Science
7 + 1 Million Dollar Problems
Millenium Prize Problems:
Funded in 2000 by the Clay Mathematical Institute.
Seven classical open problems in Mathematics.
Solutions must
”be published in a refereed mathematics publication of
worldwide repute”
David Bessis The Netflix Prize: yet another million dollar problem
7. The Problem
Strategies
Some Funny New Science
7 + 1 Million Dollar Problems
Millenium Prize Problems:
Funded in 2000 by the Clay Mathematical Institute.
Seven classical open problems in Mathematics.
Solutions must
”be published in a refereed mathematics publication of
worldwide repute”
”have general acceptance in the mathematics community two
years after”
Fuzzy rules.
The Poincaré conjecture was solved by Perelman in 2003.
No award yet.
Netflix Prize:
Funded in 2006 by the DVD rental company Netflix.
A problem in Applied Mathematics, or Computer Science, or Psychology (do we really care?)... or rather in Some Funny New Science.
Reasonably clear rules.
Prize awarded in September 2009.
Context
Netflix has an “all-you-can-eat” pricing model.
They need their users to watch a lot of movies.
Beyond a few obvious choices, people don’t know what they
want to watch.
Collaborative filtering: recommending products based on prior
evaluations by other users (just like Amazon does).
The Netflix Prize is a collaborative filtering competition:
Based on a huge dataset of actual ratings by Netflix users.
Open to almost everyone.
Endowed with a $1,000,000 prize.
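To make “recommending products based on prior evaluations by other users” concrete, here is a toy user-based collaborative filtering sketch. It is an illustration under stated assumptions, not Netflix’s Cinematch or Amazon’s method: the nested-dict layout, the user and movie names, and the choice of cosine similarity on co-rated movies are all hypothetical.

```python
from math import sqrt
from typing import Dict, Optional

# ratings[user][movie] = stars; a hypothetical toy layout, not the Netflix file format
Ratings = Dict[str, Dict[str, int]]


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity of two users, restricted to the movies both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, user: str, movie: str) -> Optional[float]:
    """Similarity-weighted average of the ratings other users gave to `movie`.

    Returns None when nobody else has rated the movie."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        s = cosine_similarity(ratings[user], their)
        num += s * their[movie]
        den += abs(s)
    return num / den if den > 0 else None


ratings: Ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 5, "B": 4, "C": 1, "D": 5},
    "carol": {"A": 1, "B": 2, "C": 5, "D": 1},
}
print(predict(ratings, "alice", "D"))  # closer to bob's 5 than to carol's 1
```

Raw cosine similarity on 1 to 5 star ratings is always positive, so even carol, whose tastes are opposite to alice’s, pulls the prediction down; mean-centering each user’s ratings first (a Pearson-style correlation) is a common refinement.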
The Dataset
The user space U consists of 480 189 users
(identified by a meaningless, non-sequential integer id).
The movie space M consists of 17 770 movies
(identified by the integers 1, ..., 17 770; the associated list of titles
and release years is provided, and this data is meaningful and minable).
The date space D spans the period Oct. 1998 to Dec. 2005
(extremely meaningful data; no time of day is provided).
The rating space R is {1, 2, 3, 4, 5} (“stars”).
The training dataset T contains 100 480 507 quadruples
(u, m, d, r) ∈ U × M × D × R.
The qualifying dataset Q contains 2 817 131 triples
(u, m, d) ∈ U × M × D.
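A quick way to get a feel for these sizes is to model a training quadruple as a small record type and do the arithmetic on the counts above. This is a minimal sketch: the `Rating` class, its field names and the sample record are hypothetical, and only the three counts come from the slide.

```python
from datetime import date
from typing import NamedTuple


class Rating(NamedTuple):
    """One training quadruple (u, m, d, r) in U × M × D × R."""
    user: int    # meaningless, non-sequential user id
    movie: int   # 1, ..., 17 770
    day: date    # rating date (no time of day)
    stars: int   # 1, ..., 5


N_USERS = 480_189
N_MOVIES = 17_770
N_TRAINING = 100_480_507

# Fraction of the |U| × |M| matrix that the training set fills in.
density = N_TRAINING / (N_USERS * N_MOVIES)
# Average number of training ratings per user.
per_user = N_TRAINING / N_USERS

example = Rating(user=42, movie=1, day=date(2005, 12, 31), stars=4)  # hypothetical record
print(f"density: {density:.2%}, ratings per user: {per_user:.0f}")
```

This prints a density of about 1.18% and about 209 ratings per user: roughly 98.8% of the user × movie matrix is unknown.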
The Challenge
Open to everyone except Netflix employees, their relatives, and
residents of Cuba, Iran, Syria, North Korea, Myanmar and Sudan.
Participants can join efforts in teams.
They can upload their predictions up to once a day.
Predictions are maps from the qualifying set Q to the interval
[1, 5].
The metric used to benchmark predictions is the RMSE (“root mean square error”):
$$\mathrm{RMSE} = \sqrt{\frac{1}{|Q|} \sum_{q \in Q} \left| \text{predicted rating for } q - \text{actual rating for } q \right|^2}$$
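The RMSE formula translates directly into code. In this minimal sketch, dictionaries keyed by hypothetical ids stand in for maps on Q, and raw model outputs are clipped into [1, 5] before scoring. The clipping step is an assumption of the sketch; since actual ratings lie in [1, 5], it can only shrink each individual error.

```python
from math import sqrt
from typing import Dict, Hashable


def clip(x: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Force a raw model output into the allowed interval [1, 5]."""
    return max(lo, min(hi, x))


def rmse(predicted: Dict[Hashable, float], actual: Dict[Hashable, float]) -> float:
    """Root mean square error over the keys of `actual`, which play the role of Q."""
    total = sum((clip(predicted[q]) - actual[q]) ** 2 for q in actual)
    return sqrt(total / len(actual))


actual = {"q1": 4, "q2": 5, "q3": 1}
predicted = {"q1": 4.0, "q2": 4.0, "q3": 2.0}
print(rmse(predicted, actual))  # sqrt(2/3), about 0.816
```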
Typical RMSEs
In theory, predicting 3 stars for every triple guarantees an RMSE of at most 2, since no error can then exceed 2.
Users tend to view and rate movies they like, so they typically
give 3, 4 or 5 stars rather than 1 or 2 (the above upper bound
is unrealistically pessimistic).
A basic prediction maps each triple (u, m, d) to the average rating
obtained by the movie m. It achieves an RMSE of 1.0540.
At the beginning of the Challenge, Netflix’s in-house
prediction system Cinematch achieved 0.9514
(roughly a 10% improvement over this baseline).
Netflix set the following target: obtain a further 10%
improvement over Cinematch.
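The movie-average baseline takes only a few lines. This sketch uses a tiny hypothetical training set with dates kept as plain strings, and it falls back to the global mean for a movie absent from training. That fallback is an assumption added for robustness; the slide does not say how such cases are handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Quadruple = Tuple[int, int, str, int]  # (u, m, d, r); dates kept as plain strings here
Triple = Tuple[int, int, str]          # (u, m, d)


def movie_averages(training: Iterable[Quadruple]) -> Dict[int, float]:
    """Average rating obtained by each movie m in the training set."""
    sums: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for _u, m, _d, r in training:
        sums[m] += r
        counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}


def predict(averages: Dict[int, float], fallback: float, triple: Triple) -> float:
    """Map (u, m, d) to the average of movie m, ignoring the user and the date."""
    _u, m, _d = triple
    return averages.get(m, fallback)


training = [
    (1, 10, "2005-01-01", 4),
    (2, 10, "2005-01-02", 2),
    (1, 20, "2005-01-03", 5),
]
averages = movie_averages(training)
global_mean = sum(r for _u, _m, _d, r in training) / len(training)
print(predict(averages, global_mean, (3, 10, "2005-02-01")))  # 3.0
```

Everything that distinguishes one viewer from another is thrown away here, which is why this baseline leaves so much room for improvement.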
Very Smart Rules 1: a Cryptographic Trick
Netflix has secretly partitioned the qualifying set
$Q = Q_1 \sqcup Q_2$
into two subsets of (approximately) equal size.
The RMSE achieved on Q1 is revealed to participants
(there is a public leaderboard).
The RMSE achieved on Q2 is used to determine the winner.
This prevented participants from “learning from the oracle”.
The goal was to achieve 0.8572.
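The secret split can be sketched as a seeded coin flip per qualifying triple. How Netflix actually drew its partition is not stated on the slide, so the uniform random assignment, the function name `secret_split` and the seed value below are all assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def secret_split(qualifying: Sequence[T], secret_seed: int) -> Tuple[List[T], List[T]]:
    """Partition Q into Q1 (public leaderboard) and Q2 (decides the winner).

    Each element goes to either half with probability 1/2, so the halves have
    approximately equal sizes; without the seed, nobody can tell which half a
    given triple belongs to."""
    rng = random.Random(secret_seed)
    q1: List[T] = []
    q2: List[T] = []
    for q in qualifying:
        (q1 if rng.random() < 0.5 else q2).append(q)
    return q1, q2


qualifying = list(range(10_000))  # stand-in for the (u, m, d) triples
q1, q2 = secret_split(qualifying, secret_seed=12345)  # hypothetical secret
print(len(q1), len(q2))
```

Since participants only ever see scores on Q1, tuning submissions against the leaderboard can at worst overfit Q1, not the half that decides the prize.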
Very Smart Rules 2: Crowd Psychology Tricks
The Challenge opened on October 2, 2006.
Annual $50,000 prizes were offered to the current leaders, provided
they made their current methodology public.
The Challenge was to last for 30 more days after the goal was
achieved.
The winner would be the team with the best RMSE after this
30-day period (no backstabbing arXiv-style “I posted first” effect).
Every detail was carefully anticipated (even the possibility of a
tie).
These smart rules, together with the $1,000,000 prize,
attracted thousands of participants.
Timeline
October 2006: Cinematch RMSE = 0.9514.
October 2007: team KorBell leads with 0.8712 (8.43%
improvement).
October 2008: team “BellKor in BigChaos” (two teams
merging efforts) leads with 0.8616 (9.44% improvement).
June 26, 2009: the goal is achieved.
July 26, 2009: Netflix stops gathering solutions.
The winner is announced on September 18, 2009.
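The percentages in the timeline are relative improvements over Cinematch’s 0.9514, and the end of the competition follows from the 30-day rule. A short check, where the function name `improvement` is a hypothetical helper:

```python
from datetime import date, timedelta

CINEMATCH = 0.9514  # Cinematch RMSE, October 2006


def improvement(rmse: float, baseline: float = CINEMATCH) -> float:
    """Relative improvement over the baseline, in percent."""
    return 100 * (baseline - rmse) / baseline


print(f"{improvement(0.8712):.2f}%")  # 8.43%, KorBell, October 2007
print(f"{improvement(0.8616):.2f}%")  # 9.44%, BellKor in BigChaos, October 2008

# The Challenge stays open for 30 more days after the goal is achieved.
goal_reached = date(2009, 6, 26)
print(goal_reached + timedelta(days=30))  # 2009-07-26
```

Applied to the winning 0.8567, the same formula gives about 9.95% rather than the quoted 10.06%; the official figure was presumably computed against Cinematch’s own score on the hidden half Q2, which the slides do not give.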
The winning team
Three teams combined their results to win the competition:
BellKor
Bob Bell (AT&T)
Yehuda Koren (Yahoo)
Chris Volinsky (AT&T)
BigChaos
Michael Jahrer (Commendo research and consulting)
Andreas Töscher (Commendo research and consulting)
Pragmatic Theory
Martin Chabbert (Pragmatic Theory)
Martin Piotte (Pragmatic Theory)
Their winning submission achieved an RMSE of 0.8567 (a 10.06%
improvement over Cinematch).
Another team, The Ensemble, achieved the same RMSE...
...and lost because their submission was posted 24 minutes later!
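The tie-break can be expressed as a lexicographic ranking: best RMSE on Q2 first, then earliest submission. In this sketch, comparing RMSEs rounded to four decimals is an assumption, and the team names and timestamps are hypothetical, not the actual submissions.

```python
from datetime import datetime
from typing import List, NamedTuple


class Submission(NamedTuple):
    team: str
    test_rmse: float      # RMSE on the hidden half Q2
    submitted: datetime   # time at which the submission was received


def winner(submissions: List[Submission], decimals: int = 4) -> Submission:
    """Lowest RMSE on Q2 wins; equal RMSEs (at the given precision) go to the earliest submission."""
    return min(submissions, key=lambda s: (round(s.test_rmse, decimals), s.submitted))


# Hypothetical teams and timestamps, for illustration only.
subs = [
    Submission("Team B", 0.8567, datetime(2000, 1, 1, 12, 24)),
    Submission("Team A", 0.8567, datetime(2000, 1, 1, 12, 0)),
]
print(winner(subs).team)  # Team A: same RMSE, posted 24 minutes earlier
```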
Strategies
Practical issues
Computer implementation
Memory requirements:
Movies can be encoded in 2 bytes (17 770 < 256²).
Viewers can be encoded in 3 bytes (480 189 < 256³).
Dates can be encoded in 2 bytes.
A triple (m, v, d) can be encoded in 7 bytes.
700 MB suffice to store the dataset.
It is possible (and necessary) to work in RAM.
Commodity hardware is sufficient.
(I have some Ruby code to play interactively with the dataset.)
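The 7-byte layout can be sketched with Python’s `int.to_bytes` and `int.from_bytes`. Several choices here are assumptions, not taken from the slides: dates are stored as day offsets from 1 October 1998, viewers are assumed re-indexed as 0, ..., 480 188 (any id below 256³ = 16 777 216 would also fit), and the rating is packed into the spare bits of the date field, since about 2 650 days fit in 12 bits. That last trick is one way to keep a full quadruple within 7 bytes.

```python
from datetime import date, timedelta
from typing import Tuple

EPOCH = date(1998, 10, 1)  # assumed origin: dates are stored as day offsets from it


def pack(movie: int, viewer: int, day: date, stars: int) -> bytes:
    """Encode a rating in 7 bytes: 2 for m, 3 for v, 2 for the date and the stars."""
    offset = (day - EPOCH).days
    if not (0 <= offset < 2 ** 13 and 1 <= stars <= 5):
        raise ValueError("date or rating out of range")
    # The day offset needs at most 13 bits here, leaving 3 bits for the rating.
    day_and_stars = (offset << 3) | stars
    return (movie.to_bytes(2, "little")
            + viewer.to_bytes(3, "little")
            + day_and_stars.to_bytes(2, "little"))


def unpack(blob: bytes) -> Tuple[int, int, date, int]:
    """Inverse of pack: recover (m, v, d, r) from 7 bytes."""
    movie = int.from_bytes(blob[0:2], "little")
    viewer = int.from_bytes(blob[2:5], "little")
    day_and_stars = int.from_bytes(blob[5:7], "little")
    return movie, viewer, EPOCH + timedelta(days=day_and_stars >> 3), day_and_stars & 0b111


record = pack(17_770, 480_188, date(2005, 12, 31), 5)
print(len(record))                      # 7
print(100_480_507 * 7 / 10 ** 6, "MB")  # about 703 MB
```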
Remarks
About 200 ratings per users.
This is likely caused by Cinematch’s data gathering procedure:
users sometime rate tens of movies on a single day.
This causes an insanely huge bias within the dataset (movies
are perceived differently when rated individually or within a
rating spree), not fully exploited by the winners.
Netflix, do you read me?
Some movies were rated by hundreds of thousands viewers,
some by just a few (long-tail distribution).
Similarly, a user rated all the movies, and many just a few.
Let F be the set of all final 9 ratings for all individual users.
David Bessis The Netflix Prize: yet another million dollar problem
87. Practical issues
The Problem
Regressions
Strategies
Latent factors
Some Funny New Science
Tuning and Blending
Remarks
About 200 ratings per users.
This is likely caused by Cinematch’s data gathering procedure:
users sometime rate tens of movies on a single day.
This causes an insanely huge bias within the dataset (movies
are perceived differently when rated individually or within a
rating spree), not fully exploited by the winners.
Netflix, do you read me?
Some movies were rated by hundreds of thousands viewers,
some by just a few (long-tail distribution).
Similarly, a user rated all the movies, and many just a few.
Let F be the set of all final 9 ratings for all individual users.
Then F = Q P, with P ⊂ T publicly tagged by Netflix.
David Bessis The Netflix Prize: yet another million dollar problem
88. Practical issues
The Problem
Regressions
Strategies
Latent factors
Some Funny New Science
Tuning and Blending
Remarks
About 200 ratings per users.
This is likely caused by Cinematch’s data gathering procedure:
users sometime rate tens of movies on a single day.
This causes an insanely huge bias within the dataset (movies
are perceived differently when rated individually or within a
rating spree), not fully exploited by the winners.
Netflix, do you read me?
Some movies were rated by hundreds of thousands viewers,
some by just a few (long-tail distribution).
Similarly, a user rated all the movies, and many just a few.
Let F be the set of all final 9 ratings for all individual users.
Then F = Q P, with P ⊂ T publicly tagged by Netflix.
Q is a random draw of 2/3 of F .
David Bessis The Netflix Prize: yet another million dollar problem
89. Practical issues
The Problem
Regressions
Strategies
Latent factors
Some Funny New Science
Tuning and Blending
Remarks
About 200 ratings per user on average.
This is likely caused by Cinematch’s data gathering procedure:
users sometimes rate tens of movies on a single day.
This causes an insanely huge bias within the dataset (movies
are perceived differently when rated individually or within a
rating spree), not fully exploited by the winners.
Netflix, do you read me?
Some movies were rated by hundreds of thousands of viewers,
some by just a few (long-tail distribution).
Similarly, one user rated all the movies, while many users rated just a few.
Let $F$ be the set of the last 9 ratings of each user.
Then $F = Q \sqcup P$ (disjoint union), with $P \subset T$ (the probe set) publicly tagged by Netflix.
$Q$ is a random draw of 2/3 of $F$.
$Q$ resembles $P$ but is very dissimilar from $T$.
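Below is a simplified sketch of how such a split can be produced. It assumes each user's ratings are available as (date, movie) pairs. The sketch takes each user's last 9 ratings to form F, draws 2/3 of F at random as Q, and keeps the rest as P. The function name `split_final_ratings` and its parameters are hypothetical. Netflix's actual selection rules may have involved details not reproduced here.

```python
import random


def split_final_ratings(history, k=9, q_fraction=2 / 3, seed=0):
    """Simplified construction of F, Q and P.

    history: dict mapping each user to a list of (date, movie) pairs.
    F keeps each user's last k ratings by date; Q is a random draw of
    q_fraction of F and P is the rest, so Q and P partition F.
    Returns (F, Q, P) as sets of (user, movie) pairs.
    """
    final = []
    for user, events in history.items():
        for _date, movie in sorted(events)[-k:]:  # last k ratings by date
            final.append((user, movie))
    rng = random.Random(seed)  # seeded so the draw is reproducible
    rng.shuffle(final)
    cut = round(q_fraction * len(final))
    return set(final), set(final[:cut]), set(final[cut:])


history = {
    "u1": [(day, f"m{day}") for day in range(1, 13)],  # 12 ratings
    "u2": [(day, f"n{day}") for day in range(1, 6)],   # 5 ratings
}
f, q, p = split_final_ratings(history)
print(len(f), len(q), len(p))  # 14 9 5
```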
Algorithms
The machine learning toolbox consists of many methods:
Clustering methods.
Regressions.
Latent factor methods (SVD).
Neural networks.
SVMs.
...
Beginner’s mistakes
Underestimate the volume effect.
Think conceptually and discretely rather than globally and
continuously.
Put users and movies into categories (clustering introduces
unwanted discontinuities).
Learn from the probe.
Dealing with 100 000 000 data points isn’t a logic puzzle.
It resembles thermodynamics.
Linear regression
Suppose all viewers in $X$ have rated all movies in $Y$; the rating
matrix is $(r_{x,y})_{(x,y)\in X\times Y}$.
Suppose you want to model the ratings given to a particular movie
$y_0$ based on the ratings given to the movies in $Y' = Y \setminus \{y_0\}$.
A linear regression is a natural way to do that.
Write $(r_{x,y}) = (C_y)_{y\in Y}$, where the $C_y \in \mathbb{R}^X$ are the column vectors.
Performing the linear regression consists of approximating $C_{y_0}$ by
its orthogonal projection $\hat{C}_{y_0}$ onto the subspace spanned by the
$(C_y)_{y\in Y'}$.
Clearly, this projection exists and is unique.
It minimizes the RMSE among all linear combinations of the $C_y$, $y \in Y'$.
Write the formula!
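Here is one standard way to write the formula the slide asks for. It uses the usual least-squares normal equations and is not necessarily the author's own derivation. Let $A$ be the $|X| \times |Y'|$ matrix whose columns are the $C_y$ for $y \in Y' = Y \setminus \{y_0\}$. Assume these columns are linearly independent, so that $A^\top A$ is invertible. The normal equations say that the residual $C_{y_0} - \hat{C}_{y_0}$ is orthogonal to every $C_y$, $y \in Y'$.

```latex
A^{\top}A\,\lambda = A^{\top}C_{y_0}
\quad\Longrightarrow\quad
\lambda = \left(A^{\top}A\right)^{-1}A^{\top}C_{y_0},
\qquad
\hat{C}_{y_0} = A\lambda = A\left(A^{\top}A\right)^{-1}A^{\top}C_{y_0},
\qquad
\hat{r}_{x,y_0} = \sum_{y\in Y'} \lambda_y\, r_{x,y}.
```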
Real life problems 1: missing data
Not all viewers have seen all movies.
Worse, there are virtually no complete rectangular blocks
within the dataset.
Regression by viewers or by movies?
It is better to do regression by movies.
Normalize ratings:
replace the rating $r_{v,m}$ by the meaningful signal, i.e., the difference
$\tilde r_{v,m} = r_{v,m} - \bar r_m$ between $r_{v,m}$ and the average rating $\bar r_m$ for $m$.
Then it becomes natural to set $\tilde r_{v,m}$ to 0 when $v$ hasn’t rated $m$.
Actually, whether or not $v$ has rated $m$ is meaningful information!
Add normalized bit columns to account for that.
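Below is a minimal sketch of this preprocessing. It assumes the observed ratings are stored in a dict keyed by (viewer, movie), and `normalize_with_bits` is a hypothetical name. The slide does not say how the bit columns are normalized. Centering each bit column to mean zero is our assumption.

```python
from collections import defaultdict


def normalize_with_bits(ratings, viewers, movies):
    """Build normalized rating columns and centered "has rated" bit columns.

    ratings: dict mapping (viewer, movie) -> rating, for observed ratings only.
    Returns two dicts keyed by movie; each value is a column listed over `viewers`.
    """
    # Average rating of each movie, over the viewers who actually rated it.
    sums, counts = defaultdict(float), defaultdict(int)
    for (_v, m), r in ratings.items():
        sums[m] += r
        counts[m] += 1
    avg = {m: sums[m] / counts[m] for m in counts}

    signal, bits = {}, {}
    for m in movies:
        # Normalized rating r_{v,m} - avg_m, and 0 when v has not rated m.
        signal[m] = [ratings[(v, m)] - avg[m] if (v, m) in ratings else 0.0 for v in viewers]
        # Bit column: 1 if v rated m, else 0, then centered to mean 0 (our assumption).
        raw = [1.0 if (v, m) in ratings else 0.0 for v in viewers]
        mean = sum(raw) / len(raw)
        bits[m] = [b - mean for b in raw]
    return signal, bits


ratings = {("alice", "m1"): 5, ("bob", "m1"): 3, ("alice", "m2"): 4}  # made-up data
signal, bits = normalize_with_bits(ratings, ["alice", "bob", "carol"], ["m1", "m2"])
print(signal["m1"])  # [1.0, -1.0, 0.0]
```

The signal columns no longer distinguish "rated at the movie's average" from "not rated". The bit columns are what keep that information available to the regression.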
Real life problems 2: the curse of dimensionality
We all know that Lagrange interpolation polynomials should not be used on
noisy data. Rather, one should look at best-fitting polynomials of a
given low degree.
Similarly, the curse of dimensionality asserts that:
with high-dimensional datasets, one will always find stupid
predictors that make perfect predictions on the dataset but
fail to generalize.
By looking at my audience today, what should I be able to infer?
That having long hair is a reasonably good gender predictor?
That wearing a grey sweater is a reasonably good gender
predictor?
Dilemma: overlearning vs underlearning.
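Below is a toy illustration of the Lagrange example, in pure Python with made-up data. It uses eight equally spaced points on the line $y = x$, perturbed by alternating $\pm 0.5$ noise. The degree-7 interpolating polynomial fits every training point exactly but is far off at a new point. The least-squares line (degree 1) has nonzero training error and generalizes much better. The helper names are hypothetical.

```python
def lagrange(xs, ys, t):
    """Evaluate at t the unique polynomial of degree < len(xs) through all (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (t - xj) / (xi - xj)
        total += yi * basis
    return total


def fit_line(xs, ys):
    """Least-squares line y = a*x + b, i.e. a best-fitting polynomial of degree 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


# Made-up data: the true signal is y = x, plus deterministic +-0.5 "noise".
xs = [float(i) for i in range(8)]
ys = [x + 0.5 * (-1) ** i for i, x in enumerate(xs)]

a, b = fit_line(xs, ys)
train_err_interp = max(abs(lagrange(xs, ys, x) - y) for x, y in zip(xs, ys))  # 0: perfect fit
train_err_line = max(abs(a * x + b - y) for x, y in zip(xs, ys))              # > 0

t = 0.5  # a new point between two training points; the true value is 0.5
print(abs(lagrange(xs, ys, t) - t))  # about 2.94
print(abs(a * t + b - t))            # about 0.14
```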
Ridge regression (aka Tikhonov regularization)
Linear regression: given vectors $x, y_1, \dots, y_n \in \mathbb{R}^m$, find $\lambda_1, \dots, \lambda_n$
that minimize
$\|x - \sum_{i=1}^{n} \lambda_i y_i\|^2$.
When $n$ is large (relative to $m$), the linear system is
underdetermined, and overfitting occurs.
A telltale sign of overfitting is the presence of $\lambda_i$’s with huge absolute values
compensating each other.
Ridge regression (Tikhonov regularization): find $\lambda_1, \dots, \lambda_n$ that
minimize
$\|x - \sum_{i=1}^{n} \lambda_i y_i\|^2 + \varepsilon \sum_{i=1}^{n} \lambda_i^2$,
where $\varepsilon > 0$ is a well-adjusted (small) penalty coefficient.
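Below is a minimal pure-Python sketch of ridge regression through its normal equations $(G + \varepsilon I)\lambda = b$, where $G_{ij} = \langle y_i, y_j \rangle$ and $b_i = \langle y_i, x \rangle$. This is the standard closed form and is not taken from the slides. The helper names `dot`, `solve` and `ridge` and the toy vectors are hypothetical. The vectors $y_1$ and $y_2$ are chosen to be nearly collinear, so that plain least squares ($\varepsilon = 0$) returns large coefficients of opposite sign, the telltale sign of overfitting described on this slide.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the square system a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = s / m[r][r]
    return sol


def ridge(x, ys, eps):
    """Coefficients minimizing ||x - sum_i lam_i y_i||^2 + eps * sum_i lam_i^2.

    Setting the gradient to zero gives (G + eps I) lam = b, with
    G[i][j] = <y_i, y_j> and b[i] = <y_i, x>; eps = 0 is plain least squares.
    """
    n = len(ys)
    g = [[dot(ys[i], ys[j]) + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [dot(y, x) for y in ys]
    return solve(g, b)


# Toy data (made up): y1 and y2 are nearly collinear.
x = [1.0, 0.01, 0.0]
ys = [[1.0, 0.0, 0.0], [1.0, 0.001, 0.0]]

plain = ridge(x, ys, eps=0.0)    # about [-9, 10]: huge, compensating coefficients
damped = ridge(x, ys, eps=0.01)  # about [0.5, 0.5]
print(plain, damped)
```

In practice $\varepsilon$ would be tuned on held-out data. The value 0.01 here is arbitrary.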
Assigning attributes to movies
Assume that movies differ by their amount of certain qualities:
Violence.
Sex.
Anything else?
Maybe not.
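Below is a minimal sketch of what this assumption could mean for prediction. It relies on a modelling choice of ours that the slide does not state. Each movie gets a vector of attribute amounts, as (violence, sex), and each viewer gets a vector of tastes. The predicted deviation from the movie's average rating is then their inner product. All names and numbers are hypothetical.

```python
# Hypothetical attribute amounts per movie, as (violence, sex); made-up numbers.
MOVIE_ATTRIBUTES = {
    "movie_a": (0.9, 0.1),
    "movie_b": (0.1, 0.8),
}

# Hypothetical viewer tastes: how strongly each attribute moves the viewer's rating.
VIEWER_TASTES = {
    "viewer_1": (1.0, -0.5),
}


def predicted_signal(viewer, movie):
    """Predicted deviation from the movie's average rating: <tastes, attributes>."""
    tastes = VIEWER_TASTES[viewer]
    attributes = MOVIE_ATTRIBUTES[movie]
    return sum(t * a for t, a in zip(tastes, attributes))


print(predicted_signal("viewer_1", "movie_a"))  # about 0.85
print(predicted_signal("viewer_1", "movie_b"))  # about -0.3
```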