1. The document discusses embedding-based retrieval techniques for search ranking systems. It outlines approaches for training embeddings using triplet loss and hard negative mining, and serving embeddings using approximate nearest neighbor techniques like product quantization.
2. When serving, techniques like product quantization are used to efficiently find the top-K nearest neighbors from large embedding spaces. This improves efficiency over brute force approaches.
3. Major companies like Facebook, Alibaba, and LinkedIn are developing next-generation systems using cross-attention and cost-aware models to further improve ranking performance while maintaining query efficiency.
4. ● Full pipeline composed of multiple stages
○ Matching / Pre-ranking: focuses on reducing the search space for later stages by dropping irrelevant
candidates while guaranteeing high recall (ensure all positive samples are included).
○ Ranking: focuses on high precision; guarantees the top-K aligns with user interest.
○ Reranking: overrides model results for customized business purposes, such as promoting
new/high-quality content or compensating for known model weaknesses.
Main concept
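The funnel described above can be sketched as follows; the scoring functions, candidate-set sizes, and item ids are made up for illustration, not taken from the slides.

```python
# Illustrative three-stage funnel; all scores and sizes are stand-ins.
def cheap_score(q, d):
    # Stand-in for a cheap matching score (e.g. term overlap).
    return -abs(q - d)

def precise_score(q, d):
    # Stand-in for an expensive learned ranking score.
    return -(q - d) ** 2

def matching(query, corpus, k=100):
    # Pre-ranking: cheap score over the whole corpus, prioritizing recall.
    return sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:k]

def ranking(query, candidates, k=10):
    # Ranking: precise score over the much smaller candidate set.
    return sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)[:k]

def reranking(ranked, boosted_ids):
    # Reranking: business override, e.g. promote selected items to the top.
    return sorted(ranked, key=lambda d: d in boosted_ids, reverse=True)
```

Each stage only ever sees the survivors of the previous one, which is what keeps the expensive scorers affordable.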
5. ● Matching: Recall@K, QPS (queries per second), RT (response time)
● Ranking: NDCG@K, MAP@K
● Reranking: your business objectives.
(AUC is only for quick evaluation; it does not directly align with business value.)
Evaluation according to purpose
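For concreteness, Recall@K and NDCG@K can be computed as below; this is a minimal sketch with my own helper names, not code from the slides.

```python
import math

def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant items that appear in the top-k results.
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(retrieved, relevance, k):
    # relevance maps item -> graded gain; positions get a log2 discount.
    dcg = sum(relevance.get(item, 0) / math.log2(i + 2)
              for i, item in enumerate(retrieved[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Recall@K ignores ordering (matching only needs the positives to survive), while NDCG@K rewards putting high-gain items early, which is the ranking stage's job.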
6. ● Over-simplified here; missing the CF (collaborative filtering) family, factorization family, GCN family, etc.
● Two-tower is an architecture that prioritizes engineering concerns (speed, cost, complexity).
○ Avoids cross-attention.
○ Uses ANN (approximate nearest neighbor) search at inference, without the model.
● COLD: wants to include cross-attention; the question is how to balance performance and computing cost.
Retrieval stage evolution
(diagram: a precise-vs-cheap trade-off, with two-tower as the current mainstream)
7. ● Key idea of two-tower: train embeddings, but avoid cross-attention.
○ Query and candidate are encoded independently all the way to the last layer (which calculates similarity).
○ Running the whole model for online inference is very expensive; we want to use only the final embeddings
online (forget about the model after training).
○ Cost: dropping cross-attention means sacrificing some performance.
● Note: the JEM team deployed BERT + cross-attention since their data volume is small.
● Alibaba COLD tried to reduce cost while keeping cross-attention.
○ Benchmarked two-tower, COLD, and their Deep Interest Network.
Why avoid cross-attention?
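A minimal sketch of the two-tower serving idea; the random tower weights below stand in for trained encoders. Documents are embedded offline once, and online inference is just a dot product against the query embedding, with no model applied to the documents.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tower weights; in practice each tower is a deep encoder,
# trained jointly and then frozen (only the embeddings are served).
W_query = rng.normal(size=(8, 4))   # query features -> 4-d embedding
W_doc = rng.normal(size=(8, 4))     # doc features  -> 4-d embedding

def encode(x, W):
    e = np.tanh(x @ W)              # independent encoding, no cross-attention
    return e / np.linalg.norm(e)    # normalize so dot product = cosine

# Offline: pre-compute document embeddings once.
docs = rng.normal(size=(100, 8))
doc_emb = np.stack([encode(d, W_doc) for d in docs])

# Online: encode the query, take dot products, keep the top-K.
query_emb = encode(rng.normal(size=8), W_query)
scores = doc_emb @ query_emb
top_k = np.argsort(-scores)[:5]
```

Because the two sides never interact before the final similarity, the document side collapses to a lookup table, which is exactly what makes ANN indexing possible.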
8. Why Embedding?
From “syntactic matching” to “semantic matching”.
1. Factorized features
○ Features shouldn’t be naively one-hot encoded as black-or-white; they have implicit
relationships in a high-dimensional space (the Factorization Machine concept).
○ For a linear model, even with quadratic features like <xi*yi>: if the pair rarely or
never occurs in the training set, the linear model cannot learn a weight for it.
An FM or embedding can still learn the relationship through indirect paths like x-z-y.
2. Fuzzy text match
○ Matches the query "kacis creations" to “Kasie’s creations”, which term-based
matching cannot.
3. Personalization
○ User embeddings naturally enable personalized matching results.
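The factorization idea can be made concrete with the second-order Factorization Machine score; the parameters below are random placeholders. Because every pair interaction is scored via the dot product of shared factor vectors, a pair that never co-occurred in training still gets a sensible score.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical FM parameters: 5 features, 3-d factor vectors.
n_features, k = 5, 3
w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(size=(n_features, k))   # one factor vector per feature

def fm_score(x):
    # Second-order FM: pairwise interactions scored via <v_i, v_j>.
    linear = w0 + w @ x
    # O(nk) identity for sum_{i<j} <v_i, v_j> x_i x_j
    s = V.T @ x
    pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    return linear + pairwise
```

The O(nk) rewrite of the pairwise sum is what makes FMs practical at scale, compared with materializing all quadratic cross features.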
12. ● A good review of the main concept (nice animation by Google):
○ In the two-tower setup, queries and database items are mapped into a shared embedding space.
○ The model responds to natural-language queries.
Review: main concept of using embeddings
13. 1. “Unified” Embedding (by Facebook EBR)
○ Two-sided model: one side encodes the search request (character n-grams), the other side encodes the document.
○ Other features (social/location) are included in the encoder input (thus called “unified”).
2. Triplet loss: keeps enlarging the distance
between positive query-doc pairs and negative ones.
○ Has a margin term m to tune, which is both good and bad.
○ (?) Why not just use “clicked” as positive
and “seen but not clicked” as negative?
○ (?) Slower to converge.
Best practice as of today (2020)
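A minimal triplet-loss sketch over embeddings, using squared Euclidean distance; the margin value is arbitrary. The hinge is zero once the positive document is at least m closer to the query than the negative one.

```python
import numpy as np

def triplet_loss(q, pos, neg, m=0.2):
    # Hinge on squared distances: push the positive doc at least
    # margin m closer to the query than the negative doc.
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return max(0.0, d_pos - d_neg + m)
```

In training, this is averaged over sampled (query, positive, negative) triplets, which is why the negative-sampling strategy on the next slides matters so much.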
14. Data sampling is crucial
1. Using “clicked” vs. “seen” as the positive sample performs equally well in online tests.
○ Since “seen” items were already chosen by the ranking stage, it is fine to choose the same in the retrieval stage.
2. Hard negative mining
○ Online: choose K docs from other positive query-doc pairs as hard negatives (K=2 works best).
○ Offline: choose results ranked 101~500 in historical SERPs as hard negatives.
3. Random negatives work better than hard negatives only (seen but not clicked)!
○ Hypothesis: a model that focuses too much on hard negatives loses the ability to handle obvious ones.
(Example: if all hard negatives share the anchor job’s location, the model concludes location is
unimportant, which is obviously wrong.)
○ Also, the random-sample distribution aligns with the serving distribution.
○ Best practice (two styles):
■ random negatives : hard negatives = 100:1
■ Transfer learning: train on hard negatives first, then on random negatives.
15. Hard Negative Mining
1. Facebook, in “Embedding-based Retrieval in Facebook Search”
○ Online hard negatives
■ Choose K docs from other positive query-doc pairs as hard negatives (K=2 works best).
○ Offline hard negatives
■ Choose results ranked 101~500 in historical SERPs as hard negatives.
2. Airbnb, in “Real-time Personalization using Embeddings for Search Ranking at Airbnb”
○ Randomly sample items in the same location as the positive samples as hard negatives.
○ Add “rejected by room owner” as hard negative samples.
16. Embeddings everywhere
1. The query is converted to an embedding.
2. Documents are indexed with embeddings.
3. The retrieval stage uses embeddings,
and also passes them to the ranking model
to keep ranking aligned with retrieval
(avoiding the Matthew effect).
Engineering questions
1. How often are embeddings re-trained/updated?
2. What are the details of embedding-based indexing?
Facebook search ranking system
17. Other topics
● Matthew Effect
○ Current ranking stages are designed for existing retrieval scenarios
=> the ranker won’t agree with a new retrieval algorithm; it rejects its candidates (no
impressions) or gives them poor positions (hard to be seen).
○ Solution: the ranking model uses retrieval-stage embeddings as features, so it
can learn from the new signal. (Facebook: empirically, just adding the query-item
cosine similarity as a ranking-model feature works.)
● Embedding ensemble: weighted concatenation
○ Cascade multiple embeddings trained for different purposes
(each embedding focuses on one specific purpose, just like multi-channel retrieval).
○ Alibaba COLD spends effort on choosing the best embeddings.
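One reason weighted concatenation plays well with dot-product retrieval: if the ensemble weights are applied to the query-side chunks only, a single dot product over the concatenated vector equals the weighted sum of the per-embedding scores. A toy sketch (weights and dimensions are illustrative):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_concat(chunks, weights=None):
    # Concatenate several purpose-specific embeddings into one vector,
    # optionally scaling each chunk by its ensemble weight.
    if weights is None:
        weights = [1.0] * len(chunks)
    out = []
    for chunk, w in zip(chunks, weights):
        out.extend(w * x for x in chunk)
    return out

q = weighted_concat([[1.0, 0.0], [0.0, 2.0]], weights=[0.7, 0.3])  # query side carries the weights
d = weighted_concat([[0.5, 0.5], [1.0, 1.0]])                      # doc side stays unweighted
print(dot(q, d))  # 0.7 * 0.5 + 0.3 * 2.0 ≈ 0.95
```

Keeping the doc side unweighted means the index never has to be rebuilt when the ensemble weights are re-tuned.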
19. ● The two towers encode embeddings independently, so the inference (serving) stage
no longer needs the model.
● The only task at serving time: find the top-K nearest neighbors.
● The brute-force way scans every item (O(N·d) per query); how do we reduce this?
Serving: main challenge
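For reference, the brute-force baseline is just a full scan over all N items; a minimal sketch with toy 2-dim embeddings:

```python
import heapq

def brute_force_topk(query, items, k):
    # Score every item against the query and keep the k best: O(N * d) per query.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return heapq.nlargest(k, range(len(items)), key=lambda i: dot(query, items[i]))

top = brute_force_topk([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]], k=2)
print(top)  # [1, 2]: indices of the two items most similar to the query
```

Every ANN technique on the next slide trades a little recall against this exact scan for a large speed-up.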
20. Serving: ANN (Approximate Nearest Neighbor search)
1. Tree based
○ KD-tree: good for low-dimensional embeddings, but in high dimensions it is no better than brute force.
○ For high dimensions, use hash-based or vector-quantization methods (the following two categories).
2. Hash based
○ LSH (Locality-Sensitive Hashing)
○ This category works well below ~10 million vectors.
○ Open source: FALCONN, Annoy (by Spotify), NMSLIB (used by AWS Elasticsearch; best as of 2019).
3. Vector quantization
○ Mainstream for hundred-million-scale data. Product quantization is the best practice. (Deep dive
in the following slides.)
○ Open source: FAISS (by Facebook), ScaNN (by Google).
4. Others
○ Milvus: open-source vector similarity search engine; use it like a database.
○ NGT (by Yahoo): best in some benchmarks.
○ NSG (by Alibaba-Taobao)
All the giants develop their own embedding+ANN stacks; serving faster without losing precision is the key!
21. ● Benchmark references: (1 / 2 / 3)
○ NMSLIB is the best among hash-based algorithms.
○ FAISS speeds up with GPU, and ScaNN further improves performance (recall@10).
(All new algorithms claim to beat NMSLIB, so at this moment (2020) NMSLIB may still be the most stable choice
if you don’t trust new things.)
Serving: ANN (Approximate Nearest Neighbor search)
FAISS (vector quantization) is
much faster than NMSLIB with GPU
ScaNN (vector quantization) is now
best in both performance & speed
22. Serving: Product Quantization
(Note that before product quantization there is “coarse quantization”: cluster with k-means and pick clusters.)
1. Say you have 50K jobs, each represented as a 1024-dim embedding.
2. Break the 1024-dim embedding vector into 8 chunks of 128 dims each.
3. Encode each chunk into one of 256 (8-bit) groups, each group represented by its centroid.
23. 1. When calculating all 50K distance(query, item) pairs:
○ Prepare a lookup table of the 256 distance(query, centroid) values per chunk.
○ Then each of the 50K distance(query, item) values = SUM of 8 distance(query_chunk, centroid_i) lookups.
2. Computation reduced ~a thousand times:
○ From: a 1024-dim root-mean-square distance
○ To: a SUM of 8 lookup-table values
3. Memory reduced ~500 times:
○ 4096 bytes of floats => 8 bytes of ids
Serving: Product Quantization
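The lookup-table trick can be illustrated with a tiny pure-Python sketch. The sizes here are toy values (2 chunks with 2 centroids each instead of 8 chunks × 256 centroids), and the codebooks are hand-written rather than learned with k-means:

```python
# Toy codebooks: one list of centroids per 2-dim subspace.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for chunk 0 (dims 0-1)
    [[0.0, 1.0], [1.0, 0.0]],  # centroids for chunk 1 (dims 2-3)
]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def encode(vec):
    # Replace each chunk of the vector by the id of its nearest centroid.
    return [min(range(len(book)), key=lambda c: sq_dist(vec[2*m:2*m+2], book[c]))
            for m, book in enumerate(CODEBOOKS)]

def adc_table(query):
    # Per query, precompute distance(query_chunk, centroid) for every centroid.
    return [[sq_dist(query[2*m:2*m+2], cent) for cent in book]
            for m, book in enumerate(CODEBOOKS)]

def approx_dist(table, code):
    # Distance to any encoded item is just one table lookup per chunk.
    return sum(table[m][c] for m, c in enumerate(code))

code = encode([0.9, 1.1, 0.1, 0.9])   # -> [1, 0]
table = adc_table([1.0, 1.0, 0.0, 1.0])
print(approx_dist(table, code))       # 0.0: the query sits exactly on those centroids
```

The table costs one pass over the centroids per query; after that, scoring each of the 50K items is just 8 additions.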
24. Serving: Other techniques
● Coarse quantization (inverted file index)
○ Cluster all items into groups; only search the top-K groups whose centroids are closest to the query.
○ After coarse quantization, do product quantization to choose the final candidates.
● Residual encoding
○ After vectors are grouped, quantize the residual instead of the original embedding vector to improve
resolution after quantization. (It removes the offset and centers the vectors at the origin, as in the left figure.)
○ Note that the query vector has a different residual for each group, since each group has a different centroid.
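Residual encoding itself is a one-liner; a sketch, with toy values chosen so the arithmetic is exact:

```python
def residual(vec, centroid):
    # After coarse quantization assigns `vec` to a group, PQ encodes
    # vec - centroid instead of vec: residuals cluster around the
    # origin, so the same codebook resolution covers a smaller range.
    return [v - c for v, c in zip(vec, centroid)]

r = residual([1.5, 0.75], [1.0, 0.5])
print(r)  # [0.5, 0.25]
```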
26. TL;DR
(If you can only remember two things today, here they are):
1. In the training stage,
find the best way to compress all knowledge into
user/item embeddings.
2. In the serving stage,
find the fastest/cheapest way to
find the nearest neighbors.
28. Alibaba’s latest best practices (both expensive and high engineering effort, just FYI):
● Retrieval: COLD (Cost-aware, Online, Lightweight Deep pre-ranking)
○ Eager to capture the performance gains from cross-attention.
○ Feature selection to reduce computing cost (avoid assembling too many
embeddings).
○ In short, choose the ensembled embeddings with the best AUC while maintaining
acceptable QPS (queries per second) and RT (response time).
○ Also takes a lot of engineering effort on speed-up and cost reduction.
● Ranking: DIEN (Deep Interest Evolution Network)
○ Rather than synthesizing the user embedding from the “latest K clicked items,” use attention to
select the “latest K relevant clicked items.”
○ Drawback: the user embedding has to be synthesized online.
“Maybe” next generation: Ali COLD+DIEN
29. Deep Interest Network (DIN)
● Base model (left): trained on the user_vec and item_vec of the latest K clicked items.
● DIN (right): the currently viewed item decides the attention weights over the latest K clicked items.
● Note that everything (Goods/Shop/Category/Other) is an embedding rather than a one-hot coding.
○ Everything is factorized, not just black-or-white (0/1).
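The DIN idea above, a user vector built as an attention-weighted sum of the latest K clicked-item embeddings, can be sketched as follows. Real DIN learns the attention weight with a small MLP; the dot-product-plus-softmax here is a simplified stand-in:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def din_user_vector(candidate, clicked_items):
    # Weight each clicked item by its similarity to the candidate item,
    # then sum: history relevant to the current candidate dominates.
    weights = softmax([sum(a * b for a, b in zip(candidate, item))
                       for item in clicked_items])
    dim = len(candidate)
    return [sum(w * item[i] for w, item in zip(weights, clicked_items))
            for i in range(dim)]

u = din_user_vector([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# The clicked item aligned with the candidate gets the larger weight.
```

Because the weights depend on the candidate item, this user vector cannot be precomputed once per user, which is exactly the "synthesized online" drawback noted on the previous slide.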
30. An alternative future: the user-interest capsule
● The latest (since 2019) user-vector technique, MIND (Multi-Interest Network with Dynamic routing) at Alibaba:
○ DIN attends to items; why not also attend to multiple user interests? It is naive to model a user as a
single user-interest vector.
○ We could indeed skip this, since job seekers rarely have multiple career interests.
32. ● DeepFM vs. unified embedding
● Two-tower or Siamese
● Character-level n-grams
● Triplet loss
● Random negatives + hard negative mining (100:1)
● Residual encoding
● Weighted concatenation of embeddings
● Multitask
Review: tricks worth trying
33. Impact size: Facebook, Tencent
1. Facebook EBR
a. Location features and social embeddings help a lot! (Don’t forget domain-specific data!)
2. Tencent ranking model (CTR)
a. Naive DNN: AUC = 0.7618
b. Multi-task (CTR + Favorite + Like…) DNN: AUC = 0.7678 (+0.6%)
c. DeepFM: AUC +0.2%
d. Last View + DIN: AUC +0.2%
e. Last Display + GRU (?): AUC +0.4%
34. Trade-off between performance (recall) and computing cost: strike a balance between
vector-product-based models and full DIN.
Impact size: Alibaba