In this conference session, we share how we are using Tableau “out of the box” and also describe how it fits into our overall data environment. In addition, we’ll describe how we expect to use the Data Catalog and Object Model, our explorations of large-scale data stores, and challenges we are working on, including governance and data lineage. Video of the session can be viewed here: https://youtu.be/Nr24tw3dmZQ
Presented at Strata San Jose 2018. Shares how Netflix enables business teams to perform cohort analysis on very large, high-dimensional data by using Big Data and web application technologies such as Spark, Druid, Node, React, and D3.
User Behavior Analytics at Netflix, presented at Predictive Analytics World in 2017. Slides include the data processing architecture, the analytic component that identifies abnormal patterns, a rules engine and the overall modular framework that fits all these pieces together to provide an end-to-end solution.
Presentation at the Netflix Expo session at RecSys 2020 virtual conference on 2020-09-24. It provides an overview of recommendation and personalization at Netflix and then highlights some of the things we’ve been working on as well as some important open research questions in the field of recommendations.
Talk with Yves Raimond at the GPU Tech Conference on March 28, 2018 in San Jose, CA.
Abstract:
In this talk, we will survey how Deep Learning methods can be applied to personalization and recommendations. We will cover why standard Deep Learning approaches don't perform better than typical collaborative filtering techniques. Then we will go over recently published research at the intersection of Deep Learning and recommender systems, looking at how they integrate new types of data, explore new models, or change the recommendation problem statement. We will also highlight some of the ways that neural networks are used at Netflix and how we can use GPUs to train recommender systems. Finally, we will highlight promising new directions in this space.
RecSys 2020: A Human Perspective on Algorithmic Similarity (Zachary Schendel, September 2020)
In the Netflix user interface (UI), when a row or UI element is named “Because you Watched...”, “More Like This”, or “Because you added to your list”, the overarching goal is to recommend a movie or TV show that a member might like based on the fact that they took a meaningful action on a source item. We have employed similar recommendations in many UI elements: on the homepage as a row of recommendations, after you click into a title, or as a piece of information about why a member should watch a title.
From an algorithmic perspective, there are many ways to define a “successful” similar recommendation. We sought to broaden that definition of success. To this end, the Consumer Insights team recently completed a suite of research projects to explore the intricacies of member perceptions of similar recommendations. The Netflix Consumer Insights team employs qualitative (e.g., in-depth interviews) and quantitative (e.g., surveys) research methods, interfacing directly with Netflix members to uncover pain points that can inspire new product innovation. The research concluded that, while the typical member believes movies are broadly similar when they share a common genre or theme, similarity is more complex, nuanced, and personal than we might have imagined. The vernacular we use in the UI implies that there should be at least some kind of relationship between the source item and the recommendations that follow. Many of our similar recommendations felt “out of place”, mostly because the relationship between the source item and the recommendation was unclear or absent. When similar recommendations tell a completely misleading, incorrect, or confusing story, member trust can be broken.
We will structure the presentation around three new insights that our research found to have an influence on the perception of similarity in the context of Netflix as well as the research methods used to uncover those insights. First, the reason a member loves a given movie will vary. For example, do you want to watch other baseball movies like Field of Dreams, or would you prefer other romances like Field of Dreams? Second, members are more or less flexible about how similar a recommendation actually needs to be depending on the properties of and their interactions with the canvas containing the recommendation. For example, a Because You Watched row on the homepage implies vaguer similarity while a More Like This gallery behind a click into the source item implies stricter similarity. Finally, even when we held the UI element constant, we found that similar recommendations are only valuable in some contexts. After finishing a movie, a member might prefer a similar recommendation one day and a change of pace the next. Research methods discussed will include Inverse Multi-Dimensional Scaling [1], survey experimentation, and ways to apply qualitative research to improve algorithmic recommendations.
Talk from QCon SF on 2018-11-05
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of each of our members at the right time. With a catalog spanning thousands of titles and a diverse member base spanning over a hundred million accounts, recommending the titles that are just right for each member is crucial. But the job of recommendation does not end there. Why should you care about any particular title we recommend? What can we say about a new and unfamiliar title that will pique your interest? How do we convince you that a title is worth watching? Answering these questions is critical in helping our members discover great content, especially for unfamiliar titles. One way to do this is to consider the artwork or imagery we use to visually portray each title. If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual “evidence” for why the title might be good for you. Selecting good artwork is important because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we show for each title on the Netflix homepage. We will look at how to frame this as a machine learning problem using contextual multi-armed bandits in a recommendation system setting. We will also describe the algorithmic and system challenges involved in getting this type of approach for artwork personalization to succeed at Netflix scale. Finally, we will discuss some of the future opportunities that we see to expand and improve upon this approach.
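The contextual multi-armed bandit framing above can be sketched with a toy epsilon-greedy learner. Everything here is an illustrative assumption, not Netflix's actual system: the arm names, the single "drama_fan" context, and the simulated click-through rates are all invented.

```python
import random

class EpsilonGreedyBandit:
    """Toy contextual bandit: per context, explore with probability epsilon,
    otherwise play the arm with the best observed click-through rate."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {}   # (context, arm) -> number of plays
        self.rewards = {}  # (context, arm) -> total reward

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.arms)  # explore
        def ctr(arm):
            n = self.counts.get((context, arm), 0)
            return self.rewards.get((context, arm), 0.0) / n if n else 0.0
        return max(self.arms, key=ctr)       # exploit

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] = self.counts.get(key, 0) + 1
        self.rewards[key] = self.rewards.get(key, 0.0) + reward

random.seed(0)
bandit = EpsilonGreedyBandit(arms=["car_chase", "lead_actor", "romance_scene"])
# Simulated world: members in the "drama_fan" context click the romance artwork more.
true_ctr = {"car_chase": 0.05, "lead_actor": 0.10, "romance_scene": 0.30}
for _ in range(5000):
    arm = bandit.select("drama_fan")
    bandit.update("drama_fan", arm, 1.0 if random.random() < true_ctr[arm] else 0.0)
```

After enough rounds, the bandit concentrates its plays on the artwork variant with the highest observed click-through rate for that context, while exploration keeps refreshing its estimates for the other variants.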
Netflix Data Engineering @ Uber Engineering Meetup (Blake Irvine)
People, Platform, Projects: these slides give an overview of how Netflix works with Big Data. I share how our teams are organized, the roles we typically have on the teams, an overview of our Big Data Platform, and two example projects.
At Netflix, we take the context of the member seriously.
In this keynote talk, we will see how modeling contextual factors such as time or device can help members find the right content at the right moment.
In the end, the goal is to maximize member satisfaction and retention.
These slides go through which contextual factors matter for the video service and why we chose to use them or not.
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... (Sudeep Das, Ph.D.)
In this talk, we will provide an overview of Deep Learning methods applied to personalization and search at Netflix. We will set the stage by describing the unique challenges faced at Netflix in the areas of recommendations and information retrieval. Then we will delve into how we leverage a blend of traditional algorithms and emergent deep learning methods and new types of embeddings, especially hyperbolic space embeddings, to address these challenges.
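The "hyperbolic space embeddings" mentioned above rely on a distance function unlike the Euclidean one; a common choice is the Poincaré ball model, where points near the boundary become exponentially far apart, which is what lets such embeddings capture tree-like, hierarchical structure. The sketch below computes that distance for illustrative 2-D points; it is not Netflix's implementation.

```python
import math

def poincare_distance(u, v):
    """Distance in the Poincare ball model of hyperbolic space:
    d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    Inputs must lie strictly inside the unit ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    arg = 1 + 2 * diff_sq / ((1 - norm_u_sq) * (1 - norm_v_sq))
    return math.acosh(arg)

origin = (0.0, 0.0)
near = (0.1, 0.0)        # close to the origin
boundary = (0.95, 0.0)   # close to the boundary of the ball
```

The same Euclidean gap costs far more hyperbolic distance near the boundary than near the origin, so "leaf" items can spread out along the rim while "root" concepts sit near the center.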
Slides from Michelle Ufford's talk, Data-Driven @ Netflix, given at PASS Summit 2016 in October 2016.
Netflix is the quintessential data-driven company. Its 83 million members stream more than 125 million hours in over 190 countries every day and generate more than 700 billion events in the process. In this session, we’ll share how data is used to make informed decisions across the entire business — from content acquisition to content delivery, and everything in between. We’ll look at how Netflix successfully employs a scalable cloud-based data platform to support a constant deluge of data and a small army of data analysts, engineers, and scientists. We’ll discuss the advanced analytical capabilities that are enabled through modern data technologies. Lastly, we’ll explore some of the architectural & operational principles that enable Netflix to so effectively make use of its data.
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
These are the slides from my talk presented at the AI Next Con conference in Seattle in January 2019. Here I talk in a bit more detail about the intuition behind collaborative filtering and go a bit deeper into the details of non-linear deep-learned models.
Netflix - Enabling a Culture of Analytics (Blake Irvine)
These are slides from a conference where I presented how we are enabling a culture of analytics at Netflix. I highlight aspects of our culture, our Data Science team organization, our BI tool evolution, and how we are making data accessible.
The Netflix experience is driven by a number of Machine Learning algorithms: personalized ranking, page generation, search, similarity, ratings, etc. On the 6th of January, we simultaneously launched Netflix in 130 new countries around the world, which brings the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this post, we highlight the four most interesting challenges we’ve encountered in making our algorithms operate globally and, most importantly, how this improved our ability to connect members worldwide with stories they'll love.
Recommendation systems today are widely used across many applications such as multimedia content platforms, social networks, and e-commerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t directly translate into improvements in the real world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to shine a light on challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
Artwork Personalization at Netflix, RecSys 2018 (Fernando Amat)
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of our members at the right time. But the job of recommendation does not end there. The homepage should be able to convey to the member enough evidence of why a title may be good for her, especially for shows that the member has never heard of. One way to address this challenge is to personalize the way we portray the titles on our service. An important aspect of how to portray titles is through the artwork or imagery we display to visually represent each title. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or show. It is important to select good artwork because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we use on the Netflix homepage. The system selects an image for each member and video to give better visual evidence for why the title might be appealing to that particular member.
(Presented at the Deep Learning Re-Work SF Summit on 01/25/2018)
In this talk, we go through the traditional recommendation systems set-up, and show that deep learning approaches in that set-up don't bring a lot of extra value. We then focus on different ways to leverage these techniques, most of which rely on breaking away from that traditional set-up: providing additional data to your recommendation algorithm, modeling different facets of user/item interactions, and most importantly re-framing the recommendation problem itself. In particular, we show a few results obtained by casting the problem as a contextual sequence prediction task, and using it to model time (a very important dimension in most recommendation systems).
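The "contextual sequence prediction" re-framing can be illustrated by how training examples are built: instead of (user, item, rating) triples, we slide over a member's chronological play history and ask a model to predict the next item given the preceding items plus context. The field names and context features below (hour-of-day, device) are assumptions for illustration, not the talk's actual feature set.

```python
from datetime import datetime

def make_training_examples(history, window=3):
    """history: list of (timestamp, device, item) tuples, oldest first.
    Yields one example per play: the preceding items (up to `window`),
    the context at play time, and the played item as the target."""
    examples = []
    for i in range(1, len(history)):
        ts, device, target = history[i]
        prev_items = [item for _, _, item in history[max(0, i - window):i]]
        context = {"hour": ts.hour, "device": device}
        examples.append({"sequence": prev_items, "context": context, "target": target})
    return examples

history = [
    (datetime(2018, 1, 25, 20), "tv", "show_a"),
    (datetime(2018, 1, 25, 21), "tv", "show_b"),
    (datetime(2018, 1, 26, 8), "phone", "show_c"),
]
examples = make_training_examples(history)
```

A sequence model (e.g. an RNN) trained on such examples learns both item-to-item transitions and how context shifts them; the morning-on-phone play here carries a different context than the evening-on-TV plays that precede it.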
At Netflix, we try to provide the best personalized video recommendations to our members. To do this, we need to adapt our recommendations for each contextual situation, which depends on information such as time or device. In this talk, I will describe how state of the art Contextual Recommendations are used at Netflix. A first example of contextual adaptation is the model that powers the Continue Watching row. It uses a feature-based approach with a carefully constructed training set to learn how to adapt to the context of the member. Next, I will dive into more modern approaches such as Tensor Factorization and LSTMs and share some results from deployments of these methods. I will highlight lessons learned and some common pitfalls of using these powerful methods in industrial scale systems. Finally, I will touch upon system reliability, choice of optimization metrics, hidden costs, risks and benefits of using highly adaptive systems.
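The tensor-factorization approach mentioned above can be sketched as a CP-style decomposition: a (member, item, context) triple is scored by summing, over latent dimensions, the product of a member factor, an item factor, and a context factor. The factor values below are made up for illustration and are not learned from real data.

```python
def score(user_vec, item_vec, context_vec):
    """CP-style tensor factorization score for a (member, item, context) triple:
    sum over latent dimensions of the elementwise product of the three factors."""
    return sum(u * i * c for u, i, c in zip(user_vec, item_vec, context_vec))

user = [0.9, 0.1]           # member's latent factors (illustrative)
item = [0.8, 0.3]           # title's latent factors (illustrative)
evening_tv = [1.0, 0.2]     # context factors for (evening, TV)
morning_phone = [0.2, 1.0]  # context factors for (morning, phone)
```

Because the context vector rescales each latent dimension, the same member-item pair can score very differently in different contexts, which is exactly the adaptation the talk describes for rows like Continue Watching.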
Déjà Vu: The Importance of Time and Causality in Recommender Systems (Justin Basilico)
Talk at RecSys 2017 in Como, Italy on 2017-08-29.
Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolve around proper treatment of causality in our systems.
Past, Present & Future of Recommender Systems: An Industry Perspective (Justin Basilico)
Slides from our talk at the RecSys 2016 conference in Boston, MA on 2016-09-18, on our perspective on important areas for future work in recommender systems.
Tutorial on Deep Learning in Recommender Systems, LARS Summer School 2019 (Anoop Deoras)
I had a fun time giving a tutorial on the topic of deep learning in recommender systems at the Latin America School on Recommender Systems (LARS) in Fortaleza, Brazil.
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta... (Spark Summit)
Netflix is the world’s largest streaming service, with 80 million members in over 190 countries. Netflix uses machine learning to inform nearly every aspect of the product, from the recommendations you get, to the boxart you see, to the decisions made about which TV shows and movies are created.
Given this scale, we utilized Apache Spark as the engine of our recommendation pipeline. Apache Spark enables Netflix to use a single, unified framework/API for ETL, feature generation, model training, and validation. With the Pipeline framework in Spark ML, each step within the Netflix recommendation pipeline (e.g., label generation, feature encoding, model training, model evaluation) is encapsulated as Transformers, Estimators, and Evaluators, enabling modularity, composability, and testability. Thus, we can build our own feature engineering logic as Transformers, learning algorithms as Estimators, and customized metrics as Evaluators; with these building blocks, we can more easily experiment with new pipelines and rapidly deploy them to production.
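The Transformer/Estimator composition described above can be sketched in plain Python (Spark ML's real API lives in pyspark.ml; this toy mirrors only the pattern). All class names, column names, and the 30-minute label threshold are illustrative assumptions.

```python
class LabelGenerator:
    """A 'Transformer': maps data to data with a label column added."""
    def transform(self, rows):
        return [dict(r, label=1 if r["minutes_watched"] > 30 else 0) for r in rows]

class MeanEncoder:
    """An 'Estimator': fit() learns from data and returns a fitted Transformer."""
    def __init__(self, column):
        self.column = column
    def fit(self, rows):
        column = self.column
        mean = sum(r[column] for r in rows) / len(rows)
        class Model:  # the fitted Transformer, closing over the learned mean
            def transform(model_self, rows):
                return [dict(r, **{column + "_centered": r[column] - mean}) for r in rows]
        return Model()

class Pipeline:
    """Chains stages: fits Estimators in order, applying each stage's output
    as the next stage's input, as Spark ML's Pipeline does."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            stage = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = stage.transform(rows)
            fitted.append(stage)
        return fitted, rows

rows = [{"minutes_watched": 45}, {"minutes_watched": 10}]
pipeline = Pipeline([LabelGenerator(), MeanEncoder("minutes_watched")])
_, out = pipeline.fit(rows)
```

The payoff of the pattern is exactly what the abstract claims: each stage is independently testable, and swapping a label rule or an encoder means swapping one stage rather than rewriting the pipeline.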
In this talk, we will discuss how Apache Spark is used as the distributed framework on top of which we build our own algorithms to generate personalized recommendations for each of our 80+ million subscribers, the specific techniques we use at Netflix to scale, and the various pitfalls we’ve found along the way.
Personalization at Netflix - Making Stories Travel (Sudeep Das, Ph.D.)
I give a high-level overview of how personalization at Netflix helps our members find titles that spark joy, as well as how it helps stories travel across the world.
Netflix was a trailblazing innovator in machine learning as applied to personalization and recommendation systems but there are many other applications of machine learning at Netflix, especially as we further evolve into a global entertainment company. This talk will give an overview of how machine learning is leveraged before content launches on Netflix and how machine learning can support the creative process and serve as a tool for decision makers in our content and marketing organization. The process of creating content is a high-touch, creative endeavor so we need to be similarly creative in the machine learning innovations we develop. From neural nets that predict audience size for content that doesn't exist yet, to NLP and deep learning techniques that mine scripts to highlight properties we need legal clearance for ... we are building unprecedented innovations. The talk will also broadly cover the challenges we face in this space, including data scarcity and making ML interpretable for non-technical stakeholders.
At Netflix we take context of the member seriously.
In this keynote talk we will see how modeling contextual factors such as time or device can help members to find the right content at the right moment
At the end, the goal is to maximize member satisfaction and retention
These slides will go through which contextual factors matters for the video service and why we choose to use them or not.
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
In this talk, we will provide an overview of Deep Learning methods applied to personalization and search at Netflix. We will set the stage by describing the unique challenges faced at Netflix in the areas of recommendations and information retrieval. Then we will delve into how we leverage a blend of traditional algorithms and emergent deep learning methods and new types of embeddings, especially hyperbolic space embeddings, to address these challenges.
Slides from Michelle Ufford's talk, Data-Driven @ Netflix. Talk given at PASS Summit 2016 in October 2016.
Netflix is the quintessential data-driven company. It’s 83 million members stream more than 125 million hours in over 190 countries every day and generate more than 700 billion events in the process. In this session, we’ll share how data is used to make informed decisions across the entire business — from content acquisition to content delivery, and everything in between. We’ll look at how Netflix successfully employs a scalable cloud-based data platform to support a constant deluge of data and a small army of data analysts, engineers, and scientists. We’ll discuss the advanced analytical capabilities that are enabled through modern data technologies. Lastly, we’ll explore some of the architectural & operational principals that enable Netflix to so effectively make use of its data.
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
These are the slides from my talk presented at AI Next Con conference in Seattle in Jan 2019. Here I talk in a bit more detail about the intuition behind collaborative filtering and go a bit deeper into the details of non linear deep learned models.
Netflix - Enabling a Culture of AnalyticsBlake Irvine
These are slides from a conference where I presented how we are enabling a culture of analytics at Netflix. I highlight aspects of our culture, our Data Science team organization, our BI tool evolution, and how we are making data accessible.
The Netflix experience is driven by a number of Machine Learning algorithms: personalized ranking, page generation, search, similarity, ratings, etc. On the 6th of January, we simultaneously launched Netflix in 130 new countries around the world, which brings the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this post, we highlight the four most interesting challenges we’ve encountered in making our algorithms operate globally and, most importantly, how this improved our ability to connect members worldwide with stories they'll love.
Recommendation systems today are widely used across many applications such as in multimedia content platforms, social networks, and ecommerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t directly translate into improvements in the real-world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to sunshine challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
Artwork Personalization at Netflix Fernando Amat RecSys2018 Fernando Amat
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of our members at the right time. But the job of recommendation does not end there. The homepage should be able to convey to the member enough evidence of why a title may be good for her, especially for shows that the member has never heard of. One way to address this challenge is to personalize the way we portray the titles on our service. An important aspect of how to portray titles is through the artwork or imagery we display to visually represent each title. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or show. It is important to select good artwork because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we use on the Netflix homepage. The system selects an image for each member and video to give better visual evidence for why the title might be appealing to that particular member.
(Presented at the Deep Learning Re-Work SF Summit on 01/25/2018)
In this talk, we go through the traditional recommendation systems set-up, and show that deep learning approaches in that set-up don't bring a lot of extra value. We then focus on different ways to leverage these techniques, most of which relying on breaking away from that traditional set-up; through providing additional data to your recommendation algorithm, modeling different facets of user/item interactions, and most importantly re-framing the recommendation problem itself. In particular we show a few results obtained by casting the problem as a contextual sequence prediction task, and using it to model time (a very important dimension in most recommendation systems).
At Netflix, we try to provide the best personalized video recommendations to our members. To do this, we need to adapt our recommendations for each contextual situation, which depends on information such as time or device. In this talk, I will describe how state of the art Contextual Recommendations are used at Netflix. A first example of contextual adaptation is the model that powers the Continue Watching row. It uses a feature-based approach with a carefully constructed training set to learn how to adapt to the context of the member. Next, I will dive into more modern approaches such as Tensor Factorization and LSTMs and share some results from deployments of these methods. I will highlight lessons learned and some common pitfalls of using these powerful methods in industrial scale systems. Finally, I will touch upon system reliability, choice of optimization metrics, hidden costs, risks and benefits of using highly adaptive systems.
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
Talk at RecSys 2017 in Como, Italy on 2017-08-29.
Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolves around proper treatment of causality in our systems.
Past, Present & Future of Recommender Systems: An Industry Perspective - Justin Basilico
Slides from our talk at the RecSys 2016 conference in Boston, MA on 2016-09-18, on what we see as important areas for future work in recommender systems.
Tutorial on Deep Learning in Recommender Systems, LARS Summer School 2019 - Anoop Deoras
I had a fun time giving a tutorial on deep learning in recommender systems at the Latin America School on Recommender Systems (LARS) in Fortaleza, Brazil.
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta... - Spark Summit
Netflix is the world’s largest streaming service, with 80 million members in over 190 countries. Netflix uses machine learning to inform nearly every aspect of the product, from the recommendations you get, to the boxart you see, to the decisions made about which TV shows and movies are created.
Given this scale, we utilize Apache Spark as the engine of our recommendation pipeline. Apache Spark gives Netflix a single, unified framework/API for ETL, feature generation, model training, and validation. With the pipeline framework in Spark ML, each step within the Netflix recommendation pipeline (e.g., label generation, feature encoding, model training, model evaluation) is encapsulated as Transformers, Estimators, and Evaluators, enabling modularity, composability, and testability. Netflix engineers can thus build their own feature engineering logic as Transformers, learning algorithms as Estimators, and customized metrics as Evaluators, and with these building blocks more easily experiment with new pipelines and rapidly deploy them to production.
In this talk, we will discuss how Apache Spark serves as the distributed framework on top of which we build our own algorithms to generate personalized recommendations for each of our 80+ million subscribers, the specific techniques we use at Netflix to scale, and the various pitfalls we’ve found along the way.
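The Transformer/Estimator composition described above can be sketched in miniature. This is plain Python that mimics the Spark ML pipeline pattern, not the Spark ML API itself; the stage names and the toy "minutes watched" feature are hypothetical:

```python
# Schematic of the pipeline pattern: Transformers rewrite a dataset,
# Estimators learn from it and produce fitted Transformers, and a
# Pipeline chains the stages so each can be swapped and tested alone.
# (Illustrative only -- not the actual Spark ML API.)

class Transformer:
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    def fit(self, rows):  # returns a fitted Transformer
        raise NotImplementedError

class FeatureEncoder(Transformer):
    """Stand-in for a feature-engineering stage."""
    def transform(self, rows):
        return [dict(r, minutes_watched_sq=r["minutes_watched"] ** 2)
                for r in rows]

class MeanModel(Transformer):
    """A fitted model produced by MeanEstimator."""
    def __init__(self, mean):
        self.mean = mean
    def transform(self, rows):
        # Label each row by whether it exceeds the learned mean.
        return [dict(r, above_mean=r["minutes_watched"] > self.mean)
                for r in rows]

class MeanEstimator(Estimator):
    """Stand-in for a learning algorithm: learns the mean watch time."""
    def fit(self, rows):
        vals = [r["minutes_watched"] for r in rows]
        return MeanModel(sum(vals) / len(vals))

class Pipeline:
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)   # fit, then use the model
            rows = stage.transform(rows)
            fitted.append(stage)
        return Pipeline(fitted)           # a fully fitted pipeline
    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

data = [{"minutes_watched": 10}, {"minutes_watched": 30}]
model = Pipeline([FeatureEncoder(), MeanEstimator()]).fit(data)
```

Because every step shares the same two-method contract, swapping in a different feature encoder or learning algorithm means changing one element of the stage list, which is the modularity the talk describes.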
Personalization at Netflix - Making Stories Travel - Sudeep Das, Ph.D.
I give a high-level overview of how personalization at Netflix helps our members find titles that spark joy, as well as helps stories travel across the world.
Netflix was a trailblazing innovator in machine learning as applied to personalization and recommendation systems but there are many other applications of machine learning at Netflix, especially as we further evolve into a global entertainment company. This talk will give an overview of how machine learning is leveraged before content launches on Netflix and how machine learning can support the creative process and serve as a tool for decision makers in our content and marketing organization. The process of creating content is a high-touch, creative endeavor so we need to be similarly creative in the machine learning innovations we develop. From neural nets that predict audience size for content that doesn't exist yet, to NLP and deep learning techniques that mine scripts to highlight properties we need legal clearance for ... we are building unprecedented innovations. The talk will also broadly cover the challenges we face in this space, including data scarcity and making ML interpretable for non-technical stakeholders.
Analytics is Taking over the World (Again) - UKOUG Tech'17 - Rittman Analytics
In this presentation we'll look at some of the new industries and new technologies that are only possible today with analytics, how employee empowerment and improving your fitness are spin-offs of the same technology used to track boxes around a warehouse and spot fraudulent bank transactions, and how Oracle are embedding these new analytics capabilities in their cloud-based HR.
Northern New England Tableau User Group (TUG) May 2024 - patrickdtherriault
Join us live in Portland or over the wire for networking and two fantastic presentations! Data viz freelancer Desireé Abbott will demonstrate how adding interactivity to your dashboards will delight and spark curiosity in your users. Then, Charlotte Taft & Laurie Rugemer will reprise their TC24 presentation on the keys to building a successful analytics team.
Delivered at Kristu Jayanti College, Feb 1, 2018, during the IEEE International Conference on Current Trends in Advanced Computing.
Github - https://github.com/raghu-icecraft/tech-talks/tree/master/Tableau_Feb%2018
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed... - Databricks
In this talk, we will present how we used Spark, Databricks, Airflow and MLflow to process big data and build a pipeline of both ML (XGBoost) and statistical models that maximizes our revenues in one of our core products, the “Offer Wall”. The Offer Wall is a mobile product, integrated with existing apps, that suggests tasks users can perform in exchange for in-app currency. The problem gets even more interesting when you consider that some of the tasks take 15 minutes and some may take up to two weeks, forcing us to make revenue-determining decisions in an uncertain space all of the time. The solution we developed utilizes Databricks and Spark’s strengths and diversity in machine learning and big data, along with MLflow and Airflow integrations, allowing us to deliver a production-grade solution with short development time between experiments.
SAP Process Mining in Action: Hear from Two Customers - Celonis
Hear about insights gained and other benefits of leveraging SAP Process Mining by Celonis at two of the largest global enterprises in their respective industries: SAP SE and Schlumberger.
Mark Saul, Head of Process Management at SAP SE, has been spearheading the planning, introduction and successful implementation of SAP Process Mining at SAP. He will outline the benefits and use cases of SAP Process Mining with SAP S/4HANA and SAP Data Hub that are relevant for Europe’s largest software company, and the positive outcomes for the company.
Jim Brady, Vice President Architecture & Governance at Schlumberger, will highlight the company’s SAP go-live, one of the largest launches in recent history, and in particular the use of SAP Process Mining during the vital hypercare period of that global SAP launch. The focus during that critical time is on adoption monitoring, conformance monitoring, de-bottlenecking, and in part design validation to ensure the SAP launch proves to be a big success.
Presenters:
Alex Marx, Global Partner Director, SAP
James P. Brady, Vice President IT Architecture & Governance, Schlumberger
Mark Saul, Head of Process Management, SAP
Case Study: Lessons from Newell Rubbermaid's SAP HANA Proof of Concept - SAPinsider Events
View this session from Reporting & Analytics 2014. Coming to Las Vegas in November! www.reporting2015.com
In this session, Newell Rubbermaid guides you through the key elements that comprised its SAP HANA business case and proof of concept, including an emphasis on process improvement. Learn firsthand how Newell Rubbermaid:
· Identified which business processes were most likely to realize significant improvement as a result of utilizing SAP HANA
· Established a “current state” baseline and demonstrated a “projected state” that could be realized through the use of SAP HANA
· Determined which SAP BI tools to use based on specific reporting scenarios and end user requirements
Delivered at PSG College of Technology, Mar 24, 2018
Github - https://github.com/raghu-icecraft/tech-talks/tree/master/Tableau/Mar_18
Basics of BI and data visualization; Tableau features and integration with R.
Discussed Tableau Public and Tableau Desktop.
Additions compared to the ICCTAC 2018 session:
More emphasis on topics related to data science.
Added slides on the 2018 Gartner Magic Quadrants for BI and data science.
A slide dedicated to the foremost principles of data visualization, with a note on Edward Tufte and the Gestalt laws.
The audience was MSc Data Science students along with other teaching staff.
The workshop took place at PSG College of Technology, Coimbatore (Department of MCA).
Get ready to boost the efficiency and effectiveness of your day with the Cartegraph Fall 2018 release! This free webinar will walk you through our brand new enhancements, including setting the system into Maintenance Mode, improving the accuracy of your scenarios, and three new Analytics Dashboard gadgets.
Save your webinar spot now and learn how to:
- Simplify system maintenance by seeing who's signed in, notifying users, and locking the system
- Fine-tune your Scenario Builder projections with budget categories and combined triggers
- Track progress toward a goal using the new key performance indicator (KPI) gadget
How to Use Data Effectively by Abra Sr. Business Analyst - Product School
Key Takeaways from this presentation include:
- How data is used to run day to day operations
- How data is used to influence product decisions and marketing strategies
- Which skills are necessary to become self-serving in data tasks regardless of core responsibilities
Building an immersive Data Function in Large Scale Organizations.
Data is hard, analytics is hard. Many challenges in both fields have been mastered, but many more lie ahead. One of them is how to establish the combination of both data and analytics as a company function in a large organization. In this talk, I shared insights from the ongoing journey to build a data function at Mercedes-Benz Cars Finance and to embed it into the company’s innermost workings.
Analytic Excellence - Saying Goodbye to Old Constraints - Inside Analysis
The Briefing Room with Dr. Robin Bloor and Actian
Live Webcast August 6, 2013
http://www.insideanalysis.com
With all the innovations in compute power these days, one of the hardest hurdles to overcome is the tendency to think in old ways. By and large, the processing constraints of yesterday no longer apply. The new constraints revolve around the strategic management of data, and the effective use of business analytics. How can your organization take the helm in this new era of analysis?
Register for this episode of The Briefing Room to find out! Veteran Analyst Wayne Eckerson of The BI Leadership Forum will explain how a handful of key innovations have significantly changed the game for data processing and analytics. He'll be briefed by John Santaferraro of Actian, who will tout his company's unique position in "scale-up and scale-out" for analyzing data.
#askSAP Analytics Innovations Community Call: Delivering Big Data Insights wi... - SAP Analytics
http://bit.ly/askSAP_BigData_Insight - Moderated by SAP Mentor Tammy Powlas, you’ll hear from SAP experts Ty Miller, Angela Harvey, and Tammy Powlas on our BI product strategy for both Enterprise BI with SAP BusinessObjects BI 4.2 and Trusted Data Discovery with SAP Lumira. We’ll also have guest speaker Justin Sears from Hortonworks, who will share how companies can derive new insights thanks to the powerful combination of Hortonworks Data Platform and SAP Lumira.
Other topics of discussion include:
- Our latest roadmap on dashboarding innovations, including the 1.6 release of SAP Design Studio.
- A deep-dive on how new data-wrangling capabilities in SAP Lumira bring the power of Big Data to Trusted Data Discovery, and a real-world demo of this new functionality.
- See how customers are leveraging SAP Lumira and Hortonworks for big data analytics
Learn more at SAPBI.com
Similar to Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
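For reference, the monolithic baseline that these optimizations improve on is plain power iteration. A minimal sketch (illustrative only, not the STICD implementation; the tiny example graph is made up):

```python
# Minimal power-iteration PageRank. `graph` maps each vertex to the
# list of its out-links.

def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(max_iter):
        # Dangling vertices (no out-links) spread their rank uniformly,
        # one common way to satisfy the "no dead ends" precondition.
        dangling = sum(ranks[v] for v, out in graph.items() if not out)
        contrib = {v: 0.0 for v in graph}
        for v, out in graph.items():
            for u in out:
                contrib[u] += ranks[v] / len(out)
        new = {v: (1 - damping) / n + damping * (contrib[v] + dangling / n)
               for v in graph}
        # Global convergence check; the per-vertex skip described above
        # would refine this so converged vertices stop being recomputed.
        if sum(abs(new[v] - ranks[v]) for v in graph) < tol:
            return new
        ranks = new
    return ranks

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(g)
```

Every vertex is touched in every iteration here, which is exactly the cost the component-ordered and convergence-skipping variants avoid.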
13. ● Data and Analytics are embraced across the company
○ Engineering, UX, Customer Service, Finance, & more
● A/B Testing of almost everything...
○ Product, Signup Methods, Payments, Messaging, & more
● Algorithms for...
○ Recommendations, Content, Marketing, & more
Data is Ubiquitous
BLAKE IRVINE | TABLEAU CONFERENCE 2018
14. Employees
5000 employees
300 in data teams
200+ in dedicated analytic teams
40. ● Vertical Teams
Organization (diagram): verticals for Content, Marketing, Growth, and Tech, each spanning Business Teams, Analytic Teams (Data Engineering, Science & Analytics), and Engineering Teams, with dedicated channels #content-analytics, #marketing-analytics, #growth-analytics, and #tech-analytics
41. ● Growing user base
● We’ve started up:
○ A Tableau User Group
○ Education tracks
● Early days... much more to do here!
○ Office Hours
○ Tableau Days
○ Data Doctor & more
Community
47. ● The vast majority of our data sources are Extracts
○ Very few live connections
● Why?
○ BIG DATA
○ Some direct connections to Presto or MPP
● Extracts provide an aggregation and caching layer
We Love Data Extracts!
49. 1 Use Big Data Portal to develop query
2 Commit query to ETL repository & deploy
3 Configure ETL workflow so data dependencies are met
4 Use ETL job to publish TDE to server
5 Connect to TDE, Develop Viz, Publish to server, Share
“Best Practice” Pattern
50. 1 Use Big Data Portal to develop query
2 Paste the query into Tableau
3 Develop Viz
4 Publish, and Schedule data refresh on Tableau Server
“Self-Serve” Pattern
51. ● “Best Practice” Pattern is:
○ More robust
○ But complex
● “Self-Serve” Pattern is:
○ Easy and convenient
○ Less scalable
○ Harder to manage
Dilemma...
57. We have REALLY big data
1 Trillion New Data Events Daily
150 Petabyte Warehouse
300 Terabytes Written Daily
5 Petabytes Read Daily
58. Constantly Balancing
● Data volume
● Level of Detail
● Speed of access
● Data prep
59. Development Choices

             Choice 1             Choice 2      Choice 3
Data Engine  MPP                  Cloud         TDE
Data Size    < 1B rows            < 10B rows    < 100M rows
Performance  Up to many minutes   Many minutes  Up to many seconds
60. ● For REALLY big data use cases
● For very fast interactivity
● For custom UI/UX/dataviz
● Custom Analytic Tools
○ Web app built with Javascript
○ Data stored in Druid
Choice 4...
61. ● Druid
○ An open source data system for analytic applications
○ Distributed, horizontally scalable architecture
○ VERY, VERY fast
○ Queries are in JSON format to REST endpoint
Druid white paper: http://static.druid.io/docs/druid.pdf
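The "queries in JSON format to a REST endpoint" point can be made concrete: a Druid native query is just a JSON document POSTed to a broker's /druid/v2 endpoint. A minimal sketch of the shape, where the datasource, field names, interval, and broker address are hypothetical:

```python
import json

# A Druid native "timeseries" query: hourly event counts over one day.
# The datasource and field names here are made up for illustration.
query = {
    "queryType": "timeseries",
    "dataSource": "playback_events",
    "granularity": "hour",
    "intervals": ["2018-10-01/2018-10-02"],
    "aggregations": [
        {"type": "longSum", "name": "events", "fieldName": "count"}
    ],
}
payload = json.dumps(query)

# The payload would be POSTed to a broker, e.g.:
# requests.post("http://broker:8082/druid/v2", data=payload,
#               headers={"Content-Type": "application/json"})
```

Because the query is plain JSON over HTTP, any web app (like the custom Druid-backed analytic tools mentioned above) can issue it directly from application code.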
62. ● Can we connect Tableau to Druid?
○ All the performance benefits of Druid...
○ Tableau or web apps use same data store…
● We are exploring this...
○ There is now a Druid SQL layer based on Apache Calcite
○ Have done some testing, finding limitations
Tableau ?
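The Druid SQL layer mentioned above lets the same question be phrased as SQL, wrapped in JSON and POSTed to the broker's /druid/v2/sql endpoint. A sketch, with hypothetical table and column names:

```python
import json

# Druid SQL equivalent of an hourly event count; __time is Druid's
# built-in time column, while the table and "count" column are made up.
sql = """
SELECT FLOOR(__time TO HOUR) AS hr, SUM("count") AS events
FROM playback_events
WHERE __time >= TIMESTAMP '2018-10-01 00:00:00'
GROUP BY 1
ORDER BY 1
"""
payload = json.dumps({"query": sql})

# POSTed to the broker's SQL endpoint, e.g.:
# requests.post("http://broker:8082/druid/v2/sql", data=payload,
#               headers={"Content-Type": "application/json"})
```

This SQL surface (built on Apache Calcite) is what makes a Tableau connection plausible at all, since Tableau speaks SQL rather than Druid's native JSON query language.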
63. ● TDE -> Hyper with 2018.2 upgrade
○ Happening now(ish)
○ Expectations: faster for small and medium data (<100M)
● Snowflake
○ Fast for “large” data stores (1B+)
● Data scale is always a challenge!
In the meantime...
65. Challenge 2: Data Lineage
● Where did this data come from?
● Can I trust this data?
● Tableau PRO: very easy to pull in data, analyze, and publish
● Tableau CON: very easy to pull in data, analyze, and publish
68. ● ...but not about Tableau
We have Data Lineage...
69. ● Can the upcoming Metadata APIs and Object Model help?
● Metadata APIs:
○ Inventory of workbooks, data sources, and metrics
○ Identify similar existing data and workbooks?
● Automate building of similar insights, and integrate with our existing data lineage system
Metadata APIs
76. ● Improved layout & pagination
● Export to different formats
● Distribution management: what, who, and when
What we’d like