Sharone Dayan, Machine Learning Engineer, and Daria Stefic, Data Scientist, both from Contentsquare, delve into evaluation strategies for dealing with partially labelled or unlabelled data.
2. About us
Sharone Dayan
Machine Learning Engineer
@ Contentsquare
sharone.dayan@contentsquare.com
Daria Stefic, PhD
Data Scientist
@ Contentsquare
daria.stefic@contentsquare.com
3. Agenda
1. Global Picture: evaluation methods
2. Case studies
   a. Purchase Intent Prediction
   b. Unsupervised Segment Discovery
   c. Anomaly Detection for Alerting
3. Key takeaways: academia vs. industry
5. Evaluation methods
Do we have labels?
- YES → Supervised
- NO or FEW → Semi-supervised/Unsupervised. Questions to ask:
  - Can I get a manually labelled dataset?
  - Can I use public datasets?
  - Can I generate artificial data?
  - Can I design a proxy?
Case studies placed on this decision chart: Unsupervised Segment Discovery, Purchase Intent Prediction, Anomaly Detection for Alerting.
8. Purchase Intent Prediction: main goal
"Who could have converted?"
Detect non-converting users who had converting intentions, based on their behaviour (e.g. interaction with product details, add to cart).
[Figure: buyers vs. non-buyers, showing the purchase intent segment and the missed purchase segment, i.e. non-buyers who showed purchase intent.]
Design choices:
● Focus on anonymous users
● Focus on retail clients
● No difference between single-item and multi-item purchases
● Offline prediction
9. If we had labels for purchase intent…
…we could:
1. directly train a classifier in a supervised way to recognise intent
2. evaluate our classifier with standard classification metrics (e.g. F1-score) directly on these labels
3. compare different solutions in an unbiased way
…but:
The only labels we have for purchase intent come from converting sessions.
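For reference, the sketch below shows the standard route that this slide rules out: an F1-score computed against purchase-intent labels. The labels are hypothetical, since real intent labels are exactly what is missing; in practice a library function such as scikit-learn's `sklearn.metrics.f1_score` computes the same quantity.

```python
def f1_score(y_true, y_pred):
    """Binary F1-score, with 1 meaning 'intended to purchase'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground-truth intent labels: in practice we do not have these.
intent_true = [1, 1, 1, 0, 0, 1]
intent_pred = [1, 1, 0, 0, 1, 1]
print(f1_score(intent_true, intent_pred))  # 0.75
```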
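10. Purchase Intent Evaluation
Actual conversion (rows) vs. predicted intent (columns):
- Converter, predicted positive: where every converter should land.
- Converter, predicted negative: we don't want any converters predicted as 'Not intended to purchase'.
- Non-converter, predicted positive: we want some non-converters predicted as 'Intended to purchase'.
- Non-converter, predicted negative: the remaining non-converters.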
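The evaluation above uses conversion as a partial label: converters are known positives, non-converters are unlabelled. A minimal sketch of the two resulting checks, assuming boolean per-session predictions; the names and the `max_missed_rate` threshold are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class IntentReport:
    converter_recall: float      # converters predicted 'Intended to purchase' (target: 1.0)
    missed_purchase_rate: float  # non-converters predicted 'Intended to purchase'


def evaluate_intent(converted, predicted_intent):
    """Score intent predictions using conversion as a partial label."""
    conv = [bool(p) for c, p in zip(converted, predicted_intent) if c]
    non_conv = [bool(p) for c, p in zip(converted, predicted_intent) if not c]
    nan = float("nan")
    return IntentReport(
        converter_recall=sum(conv) / len(conv) if conv else nan,
        missed_purchase_rate=sum(non_conv) / len(non_conv) if non_conv else nan,
    )


def is_acceptable(report, max_missed_rate=0.5):
    """No converter may be missed, and only part of the non-converters may be
    flagged; max_missed_rate is a hypothetical business threshold."""
    return report.converter_recall == 1.0 and 0.0 < report.missed_purchase_rate <= max_missed_rate


converted = [True, True, True, False, False, False, False]
predicted = [True, True, True, True, False, False, False]
report = evaluate_intent(converted, predicted)
print(report)  # IntentReport(converter_recall=1.0, missed_purchase_rate=0.25)
print(is_acceptable(report))  # True
```

The upper bound matters: a model that flags every visitor trivially reaches a converter recall of 1.0, so the share of flagged non-converters has to be capped or reviewed.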
15. Wait, what… Evaluating unsupervised? (Unsupervised Segment Discovery)
- How to include business constraints?
- How to benchmark different settings (features, distances, clustering algorithms, etc.)?
17. We need to validate the clustering "health"
Build toy scenarios where we have clear expectations about what the clustering result should be, and run functional tests around them.
4 types of session generators (artificial data):
- sessions focused on one specific page
- sessions with a given probability for each page group
- "cycling" sessions that always come back to the same sequence
- sessions containing a specific pattern
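A minimal sketch of the four generators and a functional test around them. Page names stand in for page groups; all names, probabilities, the purity metric and the 0.9 threshold are illustrative assumptions, not Contentsquare's implementation. The clustering itself is left out: `assigned` is whatever cluster id the method under test gives each session.

```python
import random
from collections import Counter

PAGES = ["home", "search", "product", "cart", "checkout", "help"]  # hypothetical pages


def focused_session(rng, page="product", p_focus=0.8, length=8):
    """Sessions focused on one specific page (with a little noise)."""
    others = [p for p in PAGES if p != page]
    return [page if rng.random() < p_focus else rng.choice(others) for _ in range(length)]


def weighted_session(rng, weights, length=8):
    """Each page is visited with a given probability."""
    pages, probs = zip(*weights.items())
    return rng.choices(pages, weights=probs, k=length)


def cycling_session(rng, cycle=("home", "search", "product"), repeats=3):
    """Always comes back to the same sequence of pages (rng unused: deterministic)."""
    return list(cycle) * repeats


def pattern_session(rng, pattern=("product", "cart", "checkout"), length=8):
    """Random pages with a specific pattern inserted at a random position."""
    noise = [rng.choice(PAGES) for _ in range(length - len(pattern))]
    pos = rng.randrange(len(noise) + 1)
    return noise[:pos] + list(pattern) + noise[pos:]


def purity(expected, assigned):
    """Share of sessions whose cluster's majority scenario is their own scenario."""
    clusters = {}
    for scenario, cluster in zip(expected, assigned):
        clusters.setdefault(cluster, Counter())[scenario] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(expected)


def health_check(expected, assigned, min_purity=0.9):
    """Pass if clusters are pure enough and there is one cluster per scenario."""
    return purity(expected, assigned) >= min_purity and len(set(assigned)) == len(set(expected))


rng = random.Random(0)
generators = {
    "focused": lambda: focused_session(rng),
    "weighted": lambda: weighted_session(rng, {"search": 0.6, "product": 0.3, "help": 0.1}),
    "cycling": lambda: cycling_session(rng),
    "pattern": lambda: pattern_session(rng),
}
sessions, expected = [], []
for name, make in generators.items():
    for _ in range(20):
        sessions.append(make())
        expected.append(name)

# `assigned` would come from the clustering under test; two extremes shown here.
print(health_check(expected, expected))             # True: perfect recovery
print(health_check(expected, [0] * len(expected)))  # False: everything in one cluster
```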
18. Health check results
[Figure: grid of health-check results. Columns are configurations (Features A, distance a, clustering i; Features A, distance b, clustering i; …; Features B, distance c, clustering j). Rows are health checks ordered by difficulty: basic health checks are mandatory to pass; complex health checks point to areas of improvement.]
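One way to read such a grid is to discard every configuration that fails a mandatory check and rank the rest by the complex checks they pass. A minimal sketch under that assumption; the configurations, check names, results and the split between basic and complex checks are all hypothetical.

```python
def select_configs(results, basic_checks, complex_checks):
    """Keep configurations that pass every basic (mandatory) health check,
    ranked by the number of complex health checks they pass."""
    eligible = {
        config: sum(checks[name] for name in complex_checks)
        for config, checks in results.items()
        if all(checks[name] for name in basic_checks)
    }
    return sorted(eligible.items(), key=lambda item: item[1], reverse=True)


# Hypothetical grid: (features, distance, clustering) -> {health check: passed?}
results = {
    ("A", "a", "i"): {"focused": True, "cycling": True, "weighted": True, "pattern": False},
    ("A", "b", "i"): {"focused": True, "cycling": False, "weighted": True, "pattern": True},
    ("B", "c", "j"): {"focused": True, "cycling": True, "weighted": True, "pattern": True},
}
ranking = select_configs(results, basic_checks=["focused", "cycling"],
                         complex_checks=["weighted", "pattern"])
print(ranking)  # [(('B', 'c', 'j'), 2), (('A', 'a', 'i'), 1)]
```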
20. Anomaly Detection for Alerting: main goal
Alert clients (in real time) if they have issues on their platform.
[Figure: anomaly detection in a time series; example metric: number of users with API errors, with an "Alert!" annotation.]
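The slide's "seasonality + bounds" model can be sketched as a seasonal naive forecast with a band around it. The forecast rule (the value one season earlier), the band width (k times the standard deviation of past seasonal differences) and k = 3 are assumptions chosen for illustration; the talk does not specify how its bounds are computed.

```python
from statistics import pstdev


def seasonal_bounds(history, season, k=3.0):
    """Forecast the next point as the value one season earlier, with bounds of
    +/- k standard deviations of past seasonal differences."""
    if len(history) <= season:
        raise ValueError("need more than one season of history")
    residuals = [history[t] - history[t - season] for t in range(season, len(history))]
    sigma = pstdev(residuals)
    forecast = history[-season]  # same position in the previous season
    return forecast - k * sigma, forecast + k * sigma


def should_alert(history, new_value, season, k=3.0):
    """Alert when the new metric value falls outside the bounds."""
    low, high = seasonal_bounds(history, season, k)
    return not low <= new_value <= high


# Toy metric with a season of 4 points (e.g. 4 time slots per day).
history = [10, 20, 30, 20, 11, 19, 31, 21, 10, 21, 29, 20]
print(seasonal_bounds(history, season=4))   # roughly (6.03, 13.97)
print(should_alert(history, 12, season=4))  # False
print(should_alert(history, 40, season=4))  # True
```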
23. The model = seasonality + bounds
When metric values are beyond the bounds → alert.
But how do we know if our model is:
● Raising true alerts?
● Not raising false alerts?
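Evaluation #1 compares models by their forecasting error. A minimal sketch comparing a last-value model with a seasonal model on a toy series, using mean absolute error as the (assumed) error measure:

```python
def mae(actual, forecast):
    """Mean absolute forecasting error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def backtest(series, season):
    """One-step-ahead MAE of a last-value model vs. a seasonal model
    (value one season earlier), scored on every point after the first season."""
    actual = series[season:]
    last_value = series[season - 1:-1]
    seasonal = series[:-season]
    return mae(actual, last_value), mae(actual, seasonal)


series = [10, 20, 30, 20, 11, 19, 31, 21, 10, 21, 29, 20]
print(backtest(series, season=4))  # (9.75, 1.25)
```

On this toy series the last-value model's error is far larger because it treats each regular intraday swing as a surprise; with bounds scaled to that error, it would either alarm on normal swings or need bounds too wide to catch real incidents.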
27. Evaluation #1: forecasting error
[Figure: raised alerts of the last-value model vs. the seasonal model; one alert is annotated "False alarm!".]
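Evaluation #2 scores alerts point by point against annotated anomalous points. A minimal sketch with hypothetical annotations, where 1 marks a point we would want to alert on:

```python
def point_metrics(labels, alerts):
    """Point-wise precision and recall of alerts against annotated points."""
    pairs = list(zip(labels, alerts))
    tp = sum(1 for y, a in pairs if y and a)
    fp = sum(1 for y, a in pairs if not y and a)
    fn = sum(1 for y, a in pairs if y and not a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labels = [0, 0, 1, 1, 1, 0, 0]  # hypothetical annotation: 1 = "we want to alert here"
alerts = [0, 1, 1, 0, 0, 0, 0]  # model alerts
print(point_metrics(labels, alerts))  # (0.5, 0.3333333333333333)
```

Point-wise scoring has a drawback: the model above did alert during the anomaly, yet two of the three anomalous points count as misses.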
28. Evaluation #2: anomalous points classification
Do we want to alert on those?
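To evaluate on periods rather than points, anomalous points first need grouping into periods. A minimal sketch, assuming one point per minute and a hypothetical `min_length` filter that drops runs too short to count as an incident:

```python
def to_periods(flags, min_length=1):
    """Group consecutive anomalous points into (start, end) periods, end exclusive,
    keeping only runs of at least min_length points."""
    periods, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                periods.append((start, i))
            start = None
    if start is not None and len(flags) - start >= min_length:
        periods.append((start, len(flags)))
    return periods


# One point per minute; "more than 5 min" read as at least 6 points (an assumption).
flags = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(to_periods(flags))                # [(1, 3), (4, 10)]
print(to_periods(flags, min_length=6))  # [(4, 10)]
```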
29. Evaluation #3: anomalous periods
When something happens, it lasts more than 5 min (example: iPhone launch).
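Period-level scoring turns each annotated period into one TP, FP, TN or FN. A minimal sketch that treats a positive period as detected when it contains at least one alert; the period boundaries and alert times are hypothetical.

```python
def evaluate_periods(periods, alert_times):
    """periods: annotated (start, end, is_positive) tuples, end exclusive.
    Positive period: TP if it contains at least one alert, else FN.
    Negative period: TN if it contains no alert, else FP."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for start, end, positive in periods:
        detected = any(start <= t < end for t in alert_times)
        if positive:
            counts["TP" if detected else "FN"] += 1
        else:
            counts["FP" if detected else "TN"] += 1
    return counts


# Hypothetical annotations in minutes since midnight, and the model's alert times.
periods = [(0, 60, False), (60, 90, True), (90, 200, False), (200, 230, True)]
alerts = [75, 80, 120]
counts = evaluate_periods(periods, alerts)
print(counts)  # {'TP': 1, 'FN': 1, 'TN': 1, 'FP': 1}
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(precision, recall)  # 0.5 0.5
```

Counting each period once means the two alerts at minutes 75 and 80 produce a single TP, so repeated alerts during one incident are neither rewarded nor penalised.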
30. Evaluation #3: anomalous periods
Annotated examples for evaluation:
Negative periods
● The model is correct if there are no detections → TN, otherwise FP
Positive periods
● The model is correct if it has at least one detection → TP, otherwise FN
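The period rules above can be sketched as follows, counting a positive period as a TP when it contains at least one alert and a negative period as a TN when it contains none. Period boundaries, alert times and function names are hypothetical illustrations.

```python
def score_periods(periods, alert_times):
    """periods: annotated (start, end, is_positive) tuples, end exclusive.
    Returns counts of TP, FN, TN and FP at the period level."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for start, end, positive in periods:
        detected = any(start <= t < end for t in alert_times)
        if positive:
            counts["TP" if detected else "FN"] += 1
        else:
            counts["FP" if detected else "TN"] += 1
    return counts


# Hypothetical annotated periods (minutes) and alert times.
annotated = [(0, 30, False), (30, 45, True), (45, 120, False), (120, 150, True)]
alert_minutes = [40, 130]
result = score_periods(annotated, alert_minutes)
print(result)  # {'TP': 2, 'FN': 0, 'TN': 2, 'FP': 0}
```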