Title:
Semantic Equivalence of e-Commerce Queries
Authors:
Aritra Mandal, Daniel Tunkelang, Zhe Wu
Presented at KDD 2023 Workshop on E-Commerce and Natural Language Processing (ECNLP 2023).
The importance of model fairness and interpretability in AI systems, by Francesca Lazzeri, PhD
Machine learning model fairness and interpretability are critical for data scientists, researchers and developers to explain their models and understand the value and accuracy of their findings. Interpretability is also important to debug machine learning models and make informed decisions about how to improve them.
In this session, Francesca will go over a few methods and tools that enable you to "unpack" machine learning models, gain insights into how and why they produce specific results, assess your AI system's fairness, and mitigate any observed fairness issues.
Using open-source fairness and interpretability packages, attendees will learn how to:
- Explain model predictions by generating feature importance values for the entire model and/or individual data points (see the sketch after this list).
- Achieve model interpretability on real-world datasets at scale, during training and inference.
- Use an interactive visualization dashboard to discover patterns in data and explanations at training time.
- Leverage additional interactive visualizations to assess which groups of users might be negatively impacted by a model and compare multiple models in terms of their fairness and performance.
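As a minimal illustration of the first bullet above, here is a sketch of generating global feature importance values with scikit-learn's permutation importance; the dataset, model, and package choice are illustrative assumptions rather than the exact tools covered in the session.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative dataset and model (stand-ins for whatever model you are explaining).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Permutation importance: how much does held-out accuracy drop when a single
# feature's values are shuffled? Larger drops indicate more important features.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
top = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, score in top:
    print(f"{name}: {score:.3f}")
```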
Multi-modal sources for predictive modeling using deep learning, by Sanghamitra Deb
Using vision-language models: Is it possible to prompt them the way we prompt LLMs? When should they be used out of the box, and when should they be pre-trained? The talk also covers general multi-modal deep learning models, machine learning metrics, feature engineering, and setting up an ML problem.
The need for sophistication in modern search engine implementations, by Ben DeMott
The need for more sophisticated search implementations is often at odds with the limited feature set available in modern out of the box open source search engines.
This presentation discusses the challenges associated with properly modeling information within a domain and why it's critically needed.
Top 40 Data Science Interview Questions and Answers 2022.pdf, by Suraj Kumar
1 – What is F1 score?
The F1 score is a measure of a classification model's performance, defined as the harmonic mean of precision and recall.
It is one of the most popular metrics for assessing how well a machine learning algorithm predicts a target variable. The F1 score ranges from 0 to 1, with higher values indicating better performance.
The F1 score evaluates a classifier by balancing how many of its positive predictions are correct (precision) against how many of the actual positive cases it recovers (recall).
The higher the F1 score, the better the performance of the algorithm.
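For concreteness, a minimal sketch of the definition above using scikit-learn; the labels are made-up illustrative data.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # model predictions (illustrative)

p = precision_score(y_true, y_pred)  # TP / (TP + FP)
r = recall_score(y_true, y_pred)     # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * p * r / (p + r)
print(f1_manual, f1_score(y_true, y_pred))  # both print 0.8 here
```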
2 – What is pickling and unpickling?
Pickling is the process of serializing a Python object into a byte stream. The byte stream can be stored in a file, sent over a network, or saved to disk.
Unpickling is the inverse process: it reconstructs the original object from its serialized byte stream.
In machine learning, pickling and unpickling are commonly used to save a trained model to disk and load it back later to make predictions.
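A minimal sketch of that workflow using Python's built-in pickle module; the scikit-learn model is purely illustrative.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Pickling: serialize the trained model into a byte stream and write it to disk.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Unpickling: reconstruct the object from the byte stream and use it as before.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(X[:3]))
```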
3 – Difference between likelihood and probability?
Probability measures how likely an outcome is when the model and its parameters are fixed. For example, a machine learning model might output the probability that a person will buy a product.
Likelihood runs in the other direction: it treats the observed data as fixed and measures how well different parameter values (or hypotheses) explain that data. For example, after observing 7 heads in 10 coin flips, the likelihood tells us how plausible each candidate value of the coin's bias is.
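A small sketch of the distinction using a coin-flip example with SciPy; the observed data (7 heads in 10 flips) is made up for illustration.

```python
import numpy as np
from scipy.stats import binom

# Probability: the parameter is fixed (a fair coin, p = 0.5) and we ask how
# likely a particular outcome is: P(7 heads in 10 flips | p = 0.5).
print(binom.pmf(7, n=10, p=0.5))

# Likelihood: the data are fixed (we observed 7 heads in 10 flips) and we ask
# how well different parameter values explain that observation: L(p | data).
p_grid = np.linspace(0.01, 0.99, 99)
likelihood = binom.pmf(7, n=10, p=p_grid)
print(p_grid[np.argmax(likelihood)])  # maximum-likelihood estimate, ~0.7
```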
4 – Which machine learning algorithm is known as a lazy learner?
KNN (k-nearest neighbors) is a machine learning algorithm known as a lazy learner. K-NN is lazy because it does not learn any parameters or variables from the training data during a training phase; instead, it memorizes the training dataset and dynamically computes distances to it every time it needs to classify a new point.
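A minimal sketch of this behavior with scikit-learn's KNeighborsClassifier; the dataset and query point are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# "Training" a lazy learner just stores the data; no parameters are fit here.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# The real work happens at prediction time: distances from the query point to
# every stored training example are computed, and the majority label among the
# 5 nearest neighbors is returned.
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))
```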
5 – How to fix multicollinearity?
Multicollinearity is a statistical problem that arises when two or more independent variables are highly correlated.
One way to fix multicollinearity is to drop or replace one of the correlated variables with a variable that is less correlated with the others. If no other variables are available, one can apply a transformation to (or combine) the correlated variables and then re-run the regression. Regularized models such as ridge regression also tolerate multicollinearity better than ordinary least squares.
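A sketch of detecting multicollinearity with variance inflation factors (VIF) using statsmodels, on synthetic data built so that two columns are nearly collinear; the threshold in the comment is a common rule of thumb rather than a hard rule.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                  # independent of the others
X = pd.DataFrame({"const": np.ones(200), "x1": x1, "x2": x2, "x3": x3})

# A VIF well above roughly 5-10 flags a multicollinearity problem; here x1 and
# x2 will show very large values while x3 stays close to 1.
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, variance_inflation_factor(X.values, i))
```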
6 – Significance of gamma and Regularization in SVM?
The significance of gamma and regularization in SVM is that they are used to control the trade-off between the training error and the generalization error. In other words, these two parameters are used to balance the bias-variance trade-off.
Regularization is a technique to reduce overfitting by penalizing models that are more complex than necessary. The goal of regularization is to find a model with good generalization performance, meaning it can predict new data points accurately. Gamma, on the other hand, is a kernel parameter that controls how much weight is given to each training example: with an RBF kernel, it determines how far the influence of a single training example reaches, so large values of gamma produce tighter, more complex decision boundaries.
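A small sketch of how C and gamma behave in an RBF-kernel SVM with scikit-learn on a synthetic dataset; the parameter values are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C (regularization): small C tolerates margin violations (higher bias), large
# C fits the training data tightly (higher variance). gamma: how far the
# influence of one training example reaches; large gamma gives wiggly boundaries.
for C, gamma in [(0.1, 0.1), (1.0, 1.0), (100.0, 10.0)]:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
    print(C, gamma, clf.score(X_train, y_train), clf.score(X_test, y_test))
```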
Machine Learning 2 Deep Learning: An Intro, by Si Krishan
Provides a brief introduction to machine learning, reasons for its popularity, and a simple walk-through example, then motivates the need for deep learning and some of its characteristics. This is an updated version of an earlier presentation.
Strategies for using alternative queries to mitigate zero results, by Jean Silva
The talk "Strategies for using alternative queries to mitigate zero results and their application to online marketplaces" took center stage at Haystack EU 2023.
The presenters:
Jean Silva
@wallapop, ex @trivago
Working with various programming languages since 2008; fell in love with the search world in 2014.
Jean has worked in different industries such as travel, e-commerce, and classified listings.
René Kriegler
@OpenSourceConnections
Has worked in search since ~2008.
Jean and René shed light on the challenges associated with zero results queries, emphasizing the critical importance of avoiding empty search result lists to enhance user experience and prevent negative business impacts. This issue is particularly pertinent in user-generated content platforms like Wallapop, where the diversity of items and user backgrounds complicates search outcomes.
Addressing these challenges, they discussed recent advancements in vector search and the availability of powerful language models for semantic search. While these developments offer potential solutions by ensuring some results are always returned, they also pose usability and computational challenges, especially in platforms with short-lived content.
Building upon strategies presented at Haystack US 2019, Jean and René explored approaches to providing users with alternative queries, such as query relaxation and term substitution, leveraging Large Language Models. Additionally, they shared insights from experiments involving simpler query relaxation techniques based on query term statistics. These strategies not only improve user experience but also offer versatile alternatives to vector search, applicable beyond classified ads platforms.
For those interested in delving deeper into these discussions, the conference talk can be accessed via the following link: https://haystackconf.com/eu2023/talk-9/
Jean Silva
Search Engineer @Wallapop.
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017, by MLconf
Scott is co-founder and CEO of SigOpt, a YC and a16z backed “Optimization as a Service” startup in San Francisco. Scott has been applying optimal learning techniques in industry and academia for years, from bioinformatics to production advertising systems. Before SigOpt, Scott worked on the Ad Targeting team at Yelp leading the charge on academic research and outreach with projects like the Yelp Dataset Challenge and open sourcing MOE. Scott holds a PhD in Applied Mathematics and an MS in Computer Science from Cornell University and BS degrees in Mathematics, Physics, and Computational Physics from Oregon State University. Scott was chosen as one of Forbes’ 30 under 30 in 2016.
Abstract summary
Bayesian Global Optimization: Using Optimal Learning to Tune Deep Learning Models:
In this talk we introduce Bayesian Optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different configurations is time-consuming or expensive. Deep learning pipelines are notoriously expensive to train and often have many tunable parameters, including hyperparameters, the architecture, and feature transformations, all of which can have a large impact on the efficacy of the model.
We will motivate the problem by giving several example applications using open source deep learning frameworks and open datasets. We’ll compare the results of Bayesian Optimization to standard techniques like grid search, random search, and expert tuning.
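As a rough sketch of the idea (not SigOpt's service), here is hyperparameter tuning with a Gaussian-process surrogate using the open-source scikit-optimize package; the model, dataset, and search space are illustrative assumptions.

```python
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

def objective(params):
    learning_rate, max_depth = params
    clf = GradientBoostingClassifier(learning_rate=learning_rate,
                                     max_depth=int(max_depth), random_state=0)
    # Each evaluation is expensive (a full training run), so we minimize the
    # negative cross-validated accuracy with as few evaluations as possible.
    return -cross_val_score(clf, X, y, cv=3).mean()

# The Gaussian-process surrogate picks the next configuration to try instead
# of exhaustively sweeping a grid or sampling at random.
result = gp_minimize(objective,
                     [Real(0.01, 0.5, prior="log-uniform"), Integer(1, 5)],
                     n_calls=20, random_state=0)
print(result.x, -result.fun)
```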
MLconf 2017 Seattle Lunch Talk - Using Optimal Learning to tune Deep Learning..., by SigOpt
In this talk we introduce Bayesian Optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different parameters is time-consuming or expensive. Deep learning pipelines are notoriously expensive to train and often have many tunable parameters, including hyperparameters, the architecture, and feature transformations, all of which can have a large impact on the efficacy of the model.
We will motivate the problem by giving several example applications using multiple open source deep learning frameworks and open datasets. We’ll compare the results of Bayesian Optimization to standard techniques like grid search, random search, and expert tuning.
Extraction of Data Using Comparable Entity Mining, by iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publication of high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Helping Searchers Satisfice through Query Understanding, by Daniel Tunkelang
Behavioral economics transformed how we think about human decision making, rejecting expected utility maximization for the real world of heuristics, biases, and satisficing. In this talk, I'll argue that our thinking about search engines needs a similar transformation. I will compare the Probability Ranking Principle to expected utility maximization and offer ways that AI can help searchers satisfice through query understanding.
This was an invited talk given at the 2023 Walmart AI Summit.
Speaker Bio
Daniel Tunkelang is an independent consultant specializing in search, machine learning / AI, and data science. He completed undergraduate and master's degrees in Computer Science and Math at MIT and a PhD in computer science at CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired in 2011. He then led engineering and data science teams at Google and LinkedIn. He has written a book on Faceted Search, and he blogs on Medium about search-related topics — particularly query understanding. He has worked with numerous tech companies, retailers, and others, including Algolia, Apple, Canva, Coupang, eBay, Etsy, Flipkart, Home Depot, Oracle, Pinterest, Salesforce, Target, Yelp, and Zoom.
How to fine-tune and develop your own large language model.pptx, by Knoldus Inc.
In this session, we will cover what large language models are and how we can fine-tune a pre-trained LLM with our own data, including data preparation, model training, and model evaluation.
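As a minimal sketch of that workflow (not necessarily the stack used in the session), here is causal-LM fine-tuning with Hugging Face Transformers, assuming GPT-2 as the base model and a local plain-text file my_corpus.txt as the training data; both are illustrative placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Data preparation: load and tokenize a plain-text corpus (path is a placeholder).
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Model training: continue training the pre-trained weights on our own data.
model = AutoModelForCausalLM.from_pretrained("gpt2")
args = TrainingArguments(output_dir="finetuned-gpt2", num_train_epochs=1,
                         per_device_train_batch_size=4)
trainer = Trainer(model=model, args=args, train_dataset=tokenized,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()

# Model evaluation (e.g. perplexity on held-out text) would follow here.
```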
VMworld vBrownbag vmtn6739e - machine learning (ai) for workload analytics an..., by Kenneth Moore
Designing cloud solutions can be a challenging task. Complications occur when critical specification criteria are unavailable. Crowd-sourced data provides knowledge and understanding, enabling the calculation of any missing criteria. Analytical intelligence, machine learning and cloud community data form the recipe for successfully drafting cloud solutions.
View the VMworld 2017 vBrownBag Tech Talk on YouTube: https://youtu.be/DqojNKVPQKE
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018, by Sri Ambati
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining viable techniques for debugging, explaining, and testing machine learning models.
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L'ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still based.
Usage of AI and machine learning models is likely to become more commonplace as larger swaths of the economy embrace automation and data-driven decision-making. While these predictive systems can be quite accurate, they have historically been treated as inscrutable black boxes that produce only numeric predictions with no accompanying explanations. Unfortunately, recent studies and recent events have drawn attention to mathematical and sociological flaws in prominent weak AI and ML systems, yet practitioners usually don't have the right tools to pry open machine learning black boxes and debug them.
This presentation introduces several new approaches that increase transparency, accountability, and trustworthiness in machine learning models. If you are a data scientist or analyst and you want to explain a machine learning model to your customers or managers (or if you have concerns about documentation, validation, or regulatory requirements), then this presentation is for you!
In this workshop we covered an introduction to Generative AI and Large Language Models (LLMs), an explanation of AWS Foundation Models and their role in providing pre-trained LLMs, the benefits of leveraging LLMs in enterprises, deploying LLMs on AWS Infrastructure including infrastructure requirements and available AWS services and tools, and a demo showcasing Text-to-Image and Text Summarization using Foundation Models, as well as utilising Retrieval Augmented Generation and LangChain with AWS tools for Enterprise use cases.
Connect with me for interesting sessions in the future:
https://www.linkedin.com/in/jayyanar/
MMM, Search!
An opinionated discussion of search metrics, models, and methods. Presented to the Wikimedia Foundation on April 27, 2020.
About the Speaker
Daniel Tunkelang is an independent consultant specializing in search, discovery, machine learning / AI, and data science.
He was a founding employee of Endeca, a search pioneer that Oracle acquired. After 10 years at Endeca, he moved to Google, where he led a local search team. He then served as a director of data science and search at LinkedIn.
After leaving LinkedIn in 2015, he became an independent consultant. His clients have included Apple, eBay, Coupang, Etsy, Flipkart, Gartner, Pinterest, Salesforce, and Yelp; as well as some of the largest traditional retailers.
Daniel completed undergraduate and master's degrees in Computer Science and Math at MIT and a Ph.D. in computer science at CMU. He wrote a book on Faceted Search, published by Morgan & Claypool, and he blogs on Medium about search-related topics -- particularly about query understanding. He is also active on Twitter, LinkedIn, and Quora.
Enterprise Intelligence: Putting the Pieces Together
http://enterpriserelevance.com/kdd2016/keynote.html
These slides are for a keynote presentation delivered at the Workshop on Enterprise Intelligence, held in conjunction with the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016).
About the author:
Daniel Tunkelang is a data science and engineering executive who has built and led some of the strongest teams in the software industry. He studied computer science and math at MIT and has a PhD in computer science from CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He led a local search team at Google. He was a director of data science and engineering at LinkedIn, and he established their query understanding team. Daniel is a widely recognized writer and speaker. He is frequently invited to speak at academic and industry conferences, particularly in the areas of information retrieval, web science, and data science. He has written the definitive textbook on faceted search (now a standard for ecommerce sites), established an annual symposium on human-computer interaction and information retrieval, and authored 24 US patents. His social media posts have attracted over a million page views. Daniel advises and consults for companies that can benefit strategically from his expertise. His clients range from early-stage startups to "unicorn" technology companies like Etsy and Pinterest. He helps companies make decisions around algorithms, technology, product strategy, hiring, and organizational structure.
More Related Content
Query understanding is about focusing less on the results and more on the query. It’s about figuring out what the searcher wants, rather than scoring and ranking results. Once you’ve established this mindset, your approach to search changes: you focus on query performance rather than ranking.
Presented at QConSF 2016: https://qconsf.com/sf2016/presentation/query-understanding-manifesto
I delivered this keynote at the Fast Forward Labs Data Leadership Conference on April 28, 2016. You can find related materials in the following publications:
https://www.oreilly.com/ideas/where-should-you-put-your-data-scientists
http://firstround.com/review/doing-data-science-right-your-most-common-questions-answered/
Data Science: A Mindset for Productivity
Keynote at 2015 Ronin Labs West Coast CTO Summit
https://www.eventjoy.com/e/west-coast-cto-summit-2015
Abstract
Data science isn't just about using a collection of technologies and algorithms. Data science requires a mindset that solves problems at a higher level of abstraction. How do we model utility when we think about optimization? How do we decide which hypotheses to test? How do we allocate our scarce resources to make progress?
There are no silver bullets. But I'll share what I've learned from a variety of contexts over the course of my work at Endeca, Google, and LinkedIn; and I hope you'll leave this talk with some practical wisdom you can apply to your next data science project.
My Three Ex's: A Data Science Approach for Applied Machine Learning, by Daniel Tunkelang
My Three Ex’s: A Data Science Approach for Applied Machine Learning
Daniel Tunkelang (LinkedIn)
Presented at QCon San Francisco 2014 in the Applied Machine Learning and Data Science track
https://qconsf.com/presentation/my-three-ex%E2%80%99s-data-science-approach-applied-machine-learning
Abstract
This talk is about applying machine learning to solve problems.
It’s not a talk about machine learning — or at least not about the theory of machine learning. Theoretical machine learning requires a deep understanding of computer science and statistics. It’s one of the most studied areas of computer science, and advances in theoretical machine learning give us hope of solving the world’s “AI-hard” problems.
Applied machine learning is more grounded but no less important. We are surrounded by opportunities to apply classifiers, learn rules, compute similarity, and assemble clusters. We don’t need to develop new algorithms for any of these problems — our textbooks and open-source libraries have done that hard work for us.
But algorithms are not enough. Applying machine learning to solve problems requires a data science mindset that transcends the algorithmic details.
In this talk, I’ll communicate the data science mindset by describing my three ex’s: express, explain, and experiment. These three activities are the pillars of a successful strategy for applying machine learning to solve problems. Whether you’re a machine learning novice or expert, I hope you’ll leave this talk with some practical wisdom you can apply to your next project.
Web Science: How is it different?
Daniel Tunkelang, LinkedIn
Keynote Address at ACM Web Science 2014 Conference
The scientific method of observation, measurement, and experiment may be our greatest achievement as a species. The technological innovation we enjoy today is the product of a culture of systematized scientific experimentation.
But historically scientific experimentation has been expensive. Experiments consumed natural resources, took a long time to conduct, and required even more time and labor to analyze. In order to be productive, scientists have had to factor these costs into their work and to optimize accordingly.
Web science is different. Not, as some have speciously argued, because big data has made the scientific method obsolete. The key difference is that web science has changed the economics of scientific experimentation. Thus, even as web scientists apply the traditional scientific method, they optimize based on very different economics.
In this talk, I'll survey how web science has changed our approach to experimentation, for better and for worse. Specifically, I'll talk about differences in hypothesis generation, offline analysis, and online testing.
Bio
Daniel Tunkelang is Head of Query Understanding at LinkedIn, where he previously formed and led the product data science team. LinkedIn search allows members to find people, companies, jobs, groups and other content. His team aims to provide users with the best possible results that satisfy their information needs and help to get insights from professional data. Tunkelang has BS and MS degrees in computer science and math from MIT, and a PhD in computer science from CMU. He co-founded the annual symposium on human-computer interaction and information retrieval (HCIR) and wrote the first book on Faceted Search (Morgan and Claypool 2009). Prior to joining LinkedIn, Tunkelang was Chief Scientist of Endeca (acquired by Oracle in 2011 for $1.1B) and leader of the local search quality team at Google, mapping local businesses to their home pages. He is the co-inventor of 20 patents.
Better Search Through Query Understanding
Presented as a Data Talk at Intuit on April 22, 2014
Search is a fundamental problem of our time — we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.
About the Speaker
Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Keynote at CIKM 2013 Workshop on Data-driven User Behavioral Modelling and Mining from Social Media
Social Search in a Professional Context
Daniel Tunkelang (LinkedIn)
Social networks bring a new dimension to search. Instead of looking for web pages or text documents, LinkedIn members search a world of entities connected by a rich graph of relationships. Search is a fundamental part of the LinkedIn ecosystem, as it helps our members find and be found. Unlike most search applications, LinkedIn's search experience is highly personalized: two LinkedIn members performing the same search query are likely to see completely different results. Delivering the right results to the right person depends on our ability to leverage each member's unique professional identity and network. In this talk, I'll describe the kinds of search behavior we see on LinkedIn, and some of the approaches we've taken to help our members address their information needs.
Find and be Found: Information Retrieval at LinkedIn, by Daniel Tunkelang
Find and Be Found: Information Retrieval at LinkedIn
SIGIR 2013 Industry Track Presentation
http://sigir2013.ie/industry_track.html
LinkedIn has a unique data collection: the 200M+ members who use LinkedIn are also the most valuable entities in our corpus, which consists of people, companies, jobs, and a rich content ecosystem. Our members use LinkedIn to satisfy a diverse set of navigational and exploratory information needs, which we address by leveraging semi-structured and social content to understand their query intent and deliver a personalized search experience. In this talk, we will discuss some of the unique challenges we face in building the LinkedIn search platform, the solutions we've developed so far, and the open problems we see ahead of us.
Shakti Sinha heads LinkedIn's search relevance team, and has been making key contributions to LinkedIn's search products since 2010. He previously worked at Google as both a research intern and a software engineer. He has an MS in Computer Science from Stanford, as well as a BS degree from College of Engineering, Pune.
Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Search as Communication: Lessons from a Personal Journey, by Daniel Tunkelang
Search as Communication: Lessons from a Personal Journey
by Daniel Tunkelang (Head of Query Understanding, LinkedIn)
Presented at Etsy's Code as Craft Series on May 21, 2013
When I tell people I spent a decade studying computer science at MIT and CMU, most assume that I focused my studies in information retrieval — after all, I’ve spent most of my professional life working on search.
But that’s not how it happened. I learned about information extraction as a summer intern at IBM Research, where I worked on visual query reformulation. I learned how search engines work by building one at Endeca. It was only after I’d hacked my way through the problem for a few years that I started to catch up on the rich scholarly literature of the past few decades.
As a result, I developed a point of view about search without the benefit of academic conventional wisdom. Specifically, I came to see search not so much as a ranking problem as a communication problem.
In this talk, I’ll explain my communication-centric view of search, offering examples, general techniques, and open problems.
--
Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has spent his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as its Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Before joining LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.
Enterprise Search: How do we get there from here?, by Daniel Tunkelang
Enterprise Search: How Do We Get There From Here?
by Daniel Tunkelang (Head of Query Understanding, LinkedIn)
Keynote at 2013 Enterprise Search Summit
We've been tackling the challenges of enterprise and site search for at least 3 decades. We've succeeded to the point that search is the gateway to many of our information repositories. Nonetheless, users of enterprise search systems are frustrated with these systems' shortcomings. We see this frustration in surveys, but, more importantly, most of us experience it personally in our daily work life. We all dream of a world where searching any information repository is as effective as searching the web—perhaps even more so. A world where we find what we're looking for, or quickly determine that it doesn't exist. Is this Utopia possible? If so, how do we get there from here? Or at least somewhere close? In this talk, Tunkelang reviews the track record of enterprise search. He talks about what's worked and what hasn't, especially as compared to web search. Finally, he proposes some paths to bring us closer to our dream.
--
Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has spent his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as its Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Before joining LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.
Big Data, We Have a Communication Problem
by Daniel Tunkelang
Presented on April 30, 2013 at the TTI/Vanguard Conference on Ginormous Systems
http://www.ttivanguard.com/conference/2013/ginormous.html
It's a cliché that we live in a world of Big Data. But the bottleneck in understanding data is not computational. Rather, the biggest challenge is designing technical solutions that effectively leverage human cognitive ability. Data analysis systems should augment people's capabilities rather than replace them. This argument is as old as computer science itself: in 1962, Doug Engelbart said that the goal of technology is “the enhancement of human intellect by increasing the capability of a human to approach a complex problem situation.” Algorithms extract signal from raw data, but people fill in the gaps, creating models and evaluating analyses.
Empowering people to understand data is not just a surface problem of building better interfaces and visualizations. We need to interact with data not only after performing computational analysis, but throughout the analysis process in order to improve our models and algorithms. In order to do so, we need tools and processes specifically designed to offer people transparency, guidance, and control.
Human-computer information retrieval has been revolutionizing our approach to information seeking -- no modern search engine limits users to black-box relevance ranking and ten blue links. We need to take similar steps in our analysis of big data, making people the center of the analysis process and developing the technical innovations that enable people to fulfill this role.
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
Information, Attention, and Trust: A Hierarchy of Needs, by Daniel Tunkelang
Presented by Daniel Tunkelang, LinkedIn Director of Data Science, at Stanford's 2nd annual conference on Computational Social Science (CSS), hosted by Institute for Research in the Social Sciences (IRiSS).
Details at https://iriss.stanford.edu/css/conference-agenda-2013
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Content, Connections, and Context
Daniel Tunkelang, LinkedIn
Keynote at Workshop on Recommender Systems and the Social Web
At 6th ACM International Conference on Recommender Systems (RecSys 2012)
Recommender systems for the social web combine three kinds of signals to relate the subject and object of recommendations: content, connections, and context.
Content comes first - we need to understand what we are recommending and to whom we are recommending it in order to decide whether the recommendation is relevant. Connections supply a social dimension, both as inputs to improve relevance and as social proof to explain the recommendations. Finally, context determines where and when a recommendation is appropriate.
I'll talk about how we use these three kinds of signals in LinkedIn's recommender systems, as well as the challenges we see in delivering social recommendations and measuring their relevance.
Keynote at 2012 Semantic Technology and Business Conference
Scale, Structure, and Semantics
Daniel Tunkelang, LinkedIn
Science fiction has a mixed track record when it comes to anticipating technological innovations. While Jules Verne fared well with his predictions of submarine and space technology, artificial intelligence hasn't produced anything like Arthur C. Clarke's HAL 9000.
Instead, we've managed to elicit intelligence from machines through unexpected means. Search engines have achieved remarkable success in organizing the world's information by crawling the web, indexing documents, and exploiting link structure to establish authoritativeness. At LinkedIn, we apply large-scale analytics to terabytes of semistructured data to deliver products and insights that serve our 150M+ members. Semantics emerge when we apply the right analytical techniques to a sufficient quality and quantity of data.
In this talk, I will describe how LinkedIn's huge and rich graph of relationship data powers the products our users love. I believe that the lessons we have learned apply broadly to other semantic applications. While quantity and quality of data are the key challenges to delivering a semantically rich experience, the key is to create the right ecosystem that incents people to give you good data, which then forms the basis for great data products.
Strata 2012: Humans, Machines, and the Dimensions of Microwork, by Daniel Tunkelang
Presentation from O'Reilly Strata 2012 on Big Data
Humans, Machines, and the Dimensions of Microwork
Daniel Tunkelang (LinkedIn)
Claire Hunsaker (Samasource)
The advent of crowdsourcing has wildly expanded the ways we think of incorporating human judgments into computational workflows. Computer scientists, economists, and sociologists have explored how to effectively and efficiently distribute microwork tasks to crowds and use their work as inputs to create or improve data products. Simultaneously, crowdsourcing providers are exploring the bounds of mechanical QA flows, worker interfaces, and workforce management systems.
But what tasks should be performed by humans rather than algorithms? And what makes a set of human judgments robust? Quantity? Consensus? Quality or trustworthiness of the workers? Moreover, the robustness of judgments depends not only on the workers, but on the task design. Effective crowdsourcing is a cooperative endeavor.
In this talk, we will analyze various dimensions of microwork that characterize applications, tasks, and crowds. Drawing on our experience at companies that have pioneered the use of microwork (Samasource) and data science (LinkedIn), we will offer practical advice to help you design crowdsourcing workflows to meet your data product needs.
These slides are from a tutorial at the 5th ACM International Conference on Recommender Systems (RecSys 2011).
Recommender systems aim to provide users with products or content that satisfy the users' stated or inferred needs. The primary evaluation measures for recommender systems emphasize either the perceived relevance of the recommendations or the actions associated with those recommendations (e.g., purchases or clicks). Unfortunately, this transactional emphasis neglects how users interact with recommendations in the context of information seeking tasks. The effectiveness of this interaction determines the user's experience beyond a single transaction. This tutorial explores the role of recommendations as part of a conversation between the user and an information seeking system. The tutorial does not require any special background in interfaces or usability, and will focus on practical techniques to make recommender systems most effective for users.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis (Globus)
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR (Tier1 app)
Although ‘java.lang.OutOfMemoryError’ appears on the surface to be a single error, there are in fact nine underlying types of OutOfMemoryError. Each type has different causes, diagnostic approaches, and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce? (XfilesPro)
Worried about the security of your documents when sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to keep your Salesforce documents secure when sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
Designing for Privacy in Amazon Web Services (KrzysztofKkol1)
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, we will demonstrate the various challenges: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed us to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
First Steps with Globus Compute Multi-User Endpoints (Globus)
In this presentation we will share our experiences getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that offloads computationally expensive steps in the researchers' workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Among the challenges we encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint, and that the workloads had varying resource requirements (CPUs, memory, and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges, and we share an update on our progress here.
SOCRadar Research Team: Latest Activities of IntelBroker (SOCRadar)
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled what has happened over the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... (Globus)
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined, on-demand data workflows that can apply many data reduction and data analysis steps to the large ESGF data archives, transferring only the resulting analysis products (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Cyaniclab: Software Development Agency Portfolio.pdf (Cyanic lab)
CyanicLab, an offshore custom software development company based in Sweden, India, and Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv... (Shahin Sheidaei)
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
How to Position Your Globus Data Portal for Success: Ten Good Practices (Globus)
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Prosigns: Transforming Business with Tailored Technology Solutions (Prosigns)
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll know how to organize and improve your code review process.
How Recreation Management Software Can Streamline Your Operations.pptx (wottaspaceseo)
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Enhancing Research Orchestration Capabilities at ORNL.pdf (Globus)
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
2. Search query != search intent.
● Information retrieval researchers worry about queries that map to multiple intents.
jaguar: the car or the animal?
● Practitioners worry more about multiple queries that map to the same intent.
lightning to 3.5mm = iphone to aux
4. Opportunity to increase recall while preserving precision.
Similar but not equivalent intent.
5. High-level strategy to leverage query equivalence.
Map queries to vectors.
Store in a nearest-neighbor database.
(i.e., optimize for user or business outcome)
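As a concrete illustration of this strategy, here is a minimal Python sketch (not eBay's production system): embed each query, store the vectors in a nearest-neighbor index, and retrieve stored queries whose cosine similarity clears a threshold. The random embeddings, the scikit-learn index, and the 0.98 threshold are illustrative assumptions.

```python
# A minimal sketch of the high-level strategy: query vectors in a
# nearest-neighbor index, lookup by cosine similarity. Not eBay's system;
# the vectors and the 0.98 threshold are illustrative placeholders.
import numpy as np
from sklearn.neighbors import NearestNeighbors

queries = ["lightning to 3.5mm", "iphone to aux", "prop money"]
vectors = np.random.rand(len(queries), 128)          # stand-in for real query embeddings
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

index = NearestNeighbors(metric="cosine").fit(vectors)

def equivalent_queries(query_vec, threshold=0.98, k=10):
    """Return stored queries whose cosine similarity to query_vec exceeds threshold."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    dist, ids = index.kneighbors([query_vec], n_neighbors=min(k, len(queries)))
    return [(queries[i], 1.0 - d) for i, d in zip(ids[0], dist[0]) if 1.0 - d >= threshold]

print(equivalent_queries(vectors[0]))  # the query itself matches at similarity ~1.0
```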
6. Two strategies for recognizing equivalent queries.
● Surface Similarity
○ Variation in inflection, word order, compounding, noise words.
black tshirts for men = mens black t-shirt
● Behavioral Similarity
○ Queries lead to engagement with equivalent or similar results.
lightning to 3.5mm = iphone to aux
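For the surface-similarity case, a minimal sketch of query normalization might look like the following; the noise-word list and the crude singularization rule are assumptions rather than eBay's actual rules, but they are enough to make the two example queries above normalize to the same form.

```python
# Illustrative surface-similarity normalization; the rules are assumptions,
# not the production logic.
import re

NOISE_WORDS = {"for", "the", "a", "an"}  # assumed noise-word list

def normalize(query: str) -> frozenset:
    query = re.sub(r"[^a-z0-9 ]", "", query.lower())   # drop punctuation; "t-shirt" -> "tshirt"
    tokens = [t.rstrip("s") for t in query.split() if t not in NOISE_WORDS]  # crude singularization
    return frozenset(tokens)                            # ignore word order

print(normalize("black tshirts for men") == normalize("mens black t-shirt"))  # True
```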
8. Query vectors are centroids of associated product vectors
[Diagram: the queries "black tshirts for men" and "mens black t-shirt" each map to the centroid of their associated product embedding vectors (e.g., [0.13, 0.81, …], [0.09, 0.75, …], …); the cosine similarity of the two centroids exceeds 0.98, so the queries are treated as equivalent.]
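A minimal sketch of this bag-of-documents construction, assuming each query comes with the embedding vectors of the products users engaged with after issuing it; the two-dimensional vectors below reuse the numbers shown in the diagram purely for illustration, and the 0.98 threshold is the one quoted above.

```python
# Represent each query as the centroid of its engaged-product embeddings, then
# treat two queries as equivalent when their centroids' cosine similarity
# exceeds a threshold (0.98 on the slide). Illustrative data only.
import numpy as np

def query_vector(product_vectors):
    centroid = np.mean(product_vectors, axis=0)
    return centroid / np.linalg.norm(centroid)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

products_a = np.array([[0.13, 0.81], [0.09, 0.75], [0.11, 0.79]])  # "black tshirts for men"
products_b = np.array([[0.13, 0.81], [0.09, 0.77], [0.12, 0.78]])  # "mens black t-shirt"

similarity = cosine(query_vector(products_a), query_vector(products_b))
print(similarity > 0.98)  # True: the queries are treated as equivalent
```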
9. Works well, but only for head and torso queries.
● Offline approach works for queries with enough engagement history.
● Would be expensive to compute aggregates of result vectors online.
● Still, head and torso queries tend to represent a large fraction of traffic.
10. Train online sentence transformer model for tail queries.
● Train using (query1, query2, similarity) triples from offline model.
● Oversample similar query pairs to increase sensitivity where it matters.
● Fine-tune a pre-trained micro-BERT sentence transformer model.
● Concatenate the output of a query classifier to the query keywords.
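The sketch below shows what this fine-tuning step could look like using the sentence-transformers library, assuming (query1, query2, similarity) triples distilled from the offline centroid model. The base checkpoint, hyperparameters, and example triples are illustrative assumptions, and the query-classifier concatenation from the last bullet is omitted for brevity.

```python
# Illustrative fine-tuning of a small sentence transformer on (query1, query2,
# similarity) triples from the offline model; not the exact training recipe.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Hypothetical triples; in practice, similar pairs are oversampled.
triples = [
    ("lightning to 3.5mm", "iphone to aux", 0.99),
    ("black tshirts for men", "mens black t-shirt", 0.98),
    ("prop money", "garden hose", 0.05),
]
train_examples = [InputExample(texts=[q1, q2], label=sim) for q1, q2, sim in triples]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# A small public checkpoint stands in for the micro-BERT model named above.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
train_loss = losses.CosineSimilarityLoss(model)  # regress cosine similarity toward the labels

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```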
12. Results
Model                Dataset             Pearson's correlation
query-sim-ecom       eBay Internal       0.87
query-sim-ecom       ESCI query-query    0.85
all-MiniLM-L12-v2    ESCI query-query    0.68
Examples from ESCI of queries with low surface but high behavioral similarity:
Query 1                        Query 2        Cosine
hdmi to galaxy s8              s9 hdmi        0.9993
movie money                    prop money     0.9995
cassette adapter for iphone    tape to aux    0.9993
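For context on the table above, Pearson's correlation can be computed by scoring each labeled query pair with a model and correlating the predicted cosine similarities against the gold labels, roughly as in this sketch; the pairs and labels below are placeholders, and all-MiniLM-L12-v2 is the off-the-shelf baseline row from the results table.

```python
# Illustrative evaluation: Pearson's correlation between model cosine scores
# and gold similarity labels; the pairs and labels are placeholders.
from scipy.stats import pearsonr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")  # baseline row in the table

pairs = [("hdmi to galaxy s8", "s9 hdmi"),
         ("movie money", "prop money"),
         ("cassette adapter for iphone", "tape to aux"),
         ("prop money", "garden hose")]
gold = [0.90, 0.95, 0.90, 0.05]  # hypothetical labels

predicted = [float(util.cos_sim(model.encode(q1), model.encode(q2))) for q1, q2 in pairs]
print("Pearson's r:", pearsonr(predicted, gold)[0])
```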
13. Summary
● Queries with equivalent intent should yield equivalent experiences.
● Query similarity can increase recall while preserving precision.
● Signals can come from either surface or behavioral similarity.
● Offline bag-of-documents model: queries as means of product vectors.
● Fine-tune an online micro-BERT sentence transformer model for tail queries.
● It just works!