Behavioral economics transformed how we think about human decision making, rejecting expected utility maximization for the real world of heuristics, biases, and satisficing. In this talk, I'll argue that our thinking about search engines needs a similar transformation. I will compare the Probability Ranking Principle to expected utility maximization and offer ways that AI can help searchers satisfice through query understanding.
This was an invited talk given at the 2023 Walmart AI Summit.
Speaker Bio
Daniel Tunkelang is an independent consultant specializing in search, machine learning / AI, and data science. He completed undergraduate and master's degrees in Computer Science and Math at MIT and a PhD in computer science at CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired in 2011. He then led engineering and data science teams at Google and LinkedIn. He has written a book on Faceted Search, and he blogs on Medium about search-related topics — particularly query understanding. He has worked with numerous tech companies, retailers, and others, including Algolia, Apple, Canva, Coupang, eBay, Etsy, Flipkart, Home Depot, Oracle, Pinterest, Salesforce, Target, Yelp, and Zoom.
Deciphering AI - Unlocking the Black Box of AIML with State-of-the-Art Techno...Analytics India Magazine
Most organizations understand the predictive power and the potential gains from AIML, but AI and ML are still now a black box technology for them. While deep learning and neural networks can provide excellent inputs to businesses, leaders are challenged to use them because of the complete blind faith required to ‘trust’ AI. In this talk we will use the latest technological developments from researchers, the US defense department, and the industry to unbox the black box and provide businesses a clear understanding of the policy levers that they can pull, why, and by how much, to make effective decisions?
Deciphering AI - Unlocking the Black Box of AIML with State-of-the-Art Techno...Analytics India Magazine
Most organizations understand the predictive power and the potential gains from AIML, but AI and ML are still now a black box technology for them. While deep learning and neural networks can provide excellent inputs to businesses, leaders are challenged to use them because of the complete blind faith required to ‘trust’ AI. In this talk we will use the latest technological developments from researchers, the US defense department, and the industry to unbox the black box and provide businesses a clear understanding of the policy levers that they can pull, why, and by how much, to make effective decisions?
Data Con LA 2022 - Real world consumer segmentationData Con LA
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm.
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transform from ""segment 1"", ""segment 2"", etc. to something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did team across Shopkick change their approach given what Analytics had discovered.
Modern Search: Using ML & NLP advances to enhance search and discoveryAll Things Open
Presented at Open Source Charlotte
Presented by Grant Ingersoll
Title: Modern Search: Using ML & NLP advances to enhance search and discovery
Abstract: With the recent advances in natural language processing and machine learning thanks to deep learning and large general purpose models, many search applications are confronted with how best to upgrade their systems, if at all. In this talk, we’ll look at practical ways to enhance search using neural and other machine learning techniques across ranking, content understanding and query understanding. We’ll also look at the tradeoffs of traditional approaches with a goal of helping you decide what’s best for your application.
For more info on Open Source Charlotte: https://www.meetup.com/open-source-charlotte/
THINKING ABOUT THINKING
Audience: PM & BA
Level: All
Date: May 26
Time: 11:30 AM - 12:30 PM
Description
Thinking is a big part of a Project Manager’s and Business Analyst's job. But how often have you spent time thinking about thinking? This presentation looks at thinking as a critical soft skill for project managers and how a disciplined approach to thinking improves you effectiveness as a change agent for the company in the role of project manager. The presentation will discuss the Thinking Hats, Five Types of Thinking, and brush into the entire world of Business Analytics. The presentation focuses on how the skills of Strategic Analysis, Tactical Analysis, Predictive Analysis, Data mining work together for the complete business management cycle. To add to the thinking equation, the session will explore the power of Social Media sentiment and how the way people "feel" about things is an important factor in the business equation. Think about it !!!!
1. Participants will understand the relationship between planning, analysis, problem solving, decision making and thinking.
2. Students will be able to explain an "Adapting to Whats Happening Model" that includes Data Recording, Strategic Analysis, Tactical Analysis, Predictive Analysis, and Social Media Sentiment. And how it impacts the business.
3. Students will explore various factors of human bias and how that impacts thinking. The student will understand that bias cannot not be completely eliminated, but should be embraced as a human factor in any thinking exercise. The student will understand that personal perspective/bias is a factor, but not THE factor in thinking.
Introduction to Enterprise Search. A two hour class to introduce Enterprise Search. It covers:
The problems enterprise search can solve
History of (web) search
How we search and find?
Current state of Enterprise Search + stats
Technical concept
Information quality
Feedback cycle
Five dimensions of Findability
How AI will change the way you help students succeed - SchooLinksKatie Fang
In this presentation, we are going to uncover
1) why there's so much hype about AI/Machine Learning (and what these things really are)
2) Whirlwind tour of machine learning/statistics techniques and what they mean for counselors
3) Optimism for what the future brings - data as your friend rather than something to be managed.
As more and more organizations move from recognizing that unstructured data exists, and remains untapped, the field of semantic technology and text analysis capabilities is
A recommendation system, often referred to as a recommender system or recommendation engine, is a type of machine learning application that provides personalized suggestions or recommendations to users. These systems are widely used in various domains to help users discover products, services, or content that are likely to be of interest to them. There are several approaches to building recommendation systems in machine learning:
How to Write an Analytical Essay (with Samples) | EssayPro. How to write an Analytical Essay? - The English Digest. Analytical Essay - 6+ Examples, Format, Pdf | Examples. Learn How to Write an Analytical Essay on Trust My Paper. Top 100 Ideas for Analytical Essay Topics. 100+ Interesting Analytical Essay Topics And Ideas For 2022. 40 Analytical Essay Topics To Write - YourPerfectEssay. 100+ Analytical Essay Topics for 2022 Recommended By Professionals. Example Of A Analytical Essay – Telegraph. How to Write an Analytical Essay: A Complete Guide & Examples .... Analytical Essay Writing - Guide, Topics and Examples. How to Write Analytical Essay - Complete Essay Format | Analytical .... Analytical Essay: 5 Essential Tips for Writing the Best Paper .... Calaméo - Analytical Essay Writing Ideas and Topics. Top 50 Analytical Essay Topics and Ideas for College/UNI Students. Complete Analytical Essay Writing Guide | Topics & Examples | Essay .... Analytical Essay Writing Tips For College Students - Blog BuyEssayClub.com. College Essay: Analytical essay introduction example. 250 Best Analytical Essay Topics and Ideas. How To Write A Winning Analytical Essay ? – Retail Study Tour. How To Write A Analytical Essay. PPT - Best Analytical Essay Tips PowerPoint Presentation, free download .... Analytical Essay Topics | OneEssay Writer Tips. College essay: Analytical expository essay example. How To Write Analytical Essays With Ease? Essay Writing Help. Thesis Statement In Analytical Essay - Thesis Title Ideas for College. Analytical Essay: Analytical topics for essays. Analytical Essay Writing | Essay starters, Essay writing tips, Creative .... Analytical Essay | English - Year 12 QCE | Thinkswap. Writing An Analytical Essay. Analytical essay: Analytical essay example topics. How to Write an Analytical Essay: 15 Steps (with Pictures). Complete Analytical Essay Writing Guide | Topics & Tips | Essay writing ... Topics For Analytical Essay
Leading and leaning-in on Ai in Recruitment
● What is Ai and why does it matter?
● What value does Ai add to the recruitment life cycle?
● What risks should you be aware of?
● Key questions to ask to evaluate and mitigate risks
● The FAIR™ Framework
● The Power of intelligent chat to Hire with Heart
Fantastic Problems and Where to Find Them: Daryl WeirFuturice
Machine learning and big data have been buzz words for years now, but how do you know you have a machine learning problem on your hands? These slides, taken from a Futurice Beer & Tech talk, describe the types of problems ML methods are well suited to solve, with examples from a wide variety of industries. The deck also tells you where to get started if you want to try solving one of these problems yourself.
Title:
Semantic Equivalence of e-Commerce Queries
Authors:
Aritra Mandal, Daniel Tunkelang, Zhe Wu
Presented at KDD 2023 Workshop on E-Commerce and Natural Language Processing (ECNLP 2023).
MMM, Search!
An opinionated discussion of search metrics, models, and methods. Presented to the Wikimedia Foundation on April 27, 2020.
About the Speaker
Daniel Tunkelang is an independent consultant specializing in search, discovery, machine learning / AI, and data science.
He was a founding employee of Endeca, a search pioneer that Oracle acquired. After 10 years at Endeca, he moved to Google, where he led a local search team. He then served as a director of data science and search at LinkedIn.
After leaving LinkedIn in 2015, he became an independent consultant. His clients have included Apple, eBay, Coupang, Etsy, Flipkart, Gartner, Pinterest, Salesforce, and Yelp; as well as some of the largest traditional retailers.
Daniel completed undergraduate and master's degrees in Computer Science and Math at MIT and a Ph.D. in computer science at CMU. He wrote a book on Faceted Search, published by Morgan & Claypool, and he blogs on Medium about search-related topics -- particularly about query understanding. He is also active on Twitter, LinkedIn, and Quora.
More Related Content
Similar to Helping Searchers Satisfice through Query Understanding
Data Con LA 2022 - Real world consumer segmentationData Con LA
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm.
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transform from ""segment 1"", ""segment 2"", etc. to something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did team across Shopkick change their approach given what Analytics had discovered.
Modern Search: Using ML & NLP advances to enhance search and discoveryAll Things Open
Presented at Open Source Charlotte
Presented by Grant Ingersoll
Title: Modern Search: Using ML & NLP advances to enhance search and discovery
Abstract: With the recent advances in natural language processing and machine learning thanks to deep learning and large general purpose models, many search applications are confronted with how best to upgrade their systems, if at all. In this talk, we’ll look at practical ways to enhance search using neural and other machine learning techniques across ranking, content understanding and query understanding. We’ll also look at the tradeoffs of traditional approaches with a goal of helping you decide what’s best for your application.
For more info on Open Source Charlotte: https://www.meetup.com/open-source-charlotte/
THINKING ABOUT THINKING
Audience: PM & BA
Level: All
Date: May 26
Time: 11:30 AM - 12:30 PM
Description
Thinking is a big part of a Project Manager’s and Business Analyst's job. But how often have you spent time thinking about thinking? This presentation looks at thinking as a critical soft skill for project managers and how a disciplined approach to thinking improves you effectiveness as a change agent for the company in the role of project manager. The presentation will discuss the Thinking Hats, Five Types of Thinking, and brush into the entire world of Business Analytics. The presentation focuses on how the skills of Strategic Analysis, Tactical Analysis, Predictive Analysis, Data mining work together for the complete business management cycle. To add to the thinking equation, the session will explore the power of Social Media sentiment and how the way people "feel" about things is an important factor in the business equation. Think about it !!!!
1. Participants will understand the relationship between planning, analysis, problem solving, decision making and thinking.
2. Students will be able to explain an "Adapting to Whats Happening Model" that includes Data Recording, Strategic Analysis, Tactical Analysis, Predictive Analysis, and Social Media Sentiment. And how it impacts the business.
3. Students will explore various factors of human bias and how that impacts thinking. The student will understand that bias cannot not be completely eliminated, but should be embraced as a human factor in any thinking exercise. The student will understand that personal perspective/bias is a factor, but not THE factor in thinking.
Introduction to Enterprise Search. A two hour class to introduce Enterprise Search. It covers:
The problems enterprise search can solve
History of (web) search
How we search and find?
Current state of Enterprise Search + stats
Technical concept
Information quality
Feedback cycle
Five dimensions of Findability
How AI will change the way you help students succeed - SchooLinksKatie Fang
In this presentation, we are going to uncover
1) why there's so much hype about AI/Machine Learning (and what these things really are)
2) Whirlwind tour of machine learning/statistics techniques and what they mean for counselors
3) Optimism for what the future brings - data as your friend rather than something to be managed.
As more and more organizations move from recognizing that unstructured data exists, and remains untapped, the field of semantic technology and text analysis capabilities is
A recommendation system, often referred to as a recommender system or recommendation engine, is a type of machine learning application that provides personalized suggestions or recommendations to users. These systems are widely used in various domains to help users discover products, services, or content that are likely to be of interest to them. There are several approaches to building recommendation systems in machine learning:
How to Write an Analytical Essay (with Samples) | EssayPro. How to write an Analytical Essay? - The English Digest. Analytical Essay - 6+ Examples, Format, Pdf | Examples. Learn How to Write an Analytical Essay on Trust My Paper. Top 100 Ideas for Analytical Essay Topics. 100+ Interesting Analytical Essay Topics And Ideas For 2022. 40 Analytical Essay Topics To Write - YourPerfectEssay. 100+ Analytical Essay Topics for 2022 Recommended By Professionals. Example Of A Analytical Essay – Telegraph. How to Write an Analytical Essay: A Complete Guide & Examples .... Analytical Essay Writing - Guide, Topics and Examples. How to Write Analytical Essay - Complete Essay Format | Analytical .... Analytical Essay: 5 Essential Tips for Writing the Best Paper .... Calaméo - Analytical Essay Writing Ideas and Topics. Top 50 Analytical Essay Topics and Ideas for College/UNI Students. Complete Analytical Essay Writing Guide | Topics & Examples | Essay .... Analytical Essay Writing Tips For College Students - Blog BuyEssayClub.com. College Essay: Analytical essay introduction example. 250 Best Analytical Essay Topics and Ideas. How To Write A Winning Analytical Essay ? – Retail Study Tour. How To Write A Analytical Essay. PPT - Best Analytical Essay Tips PowerPoint Presentation, free download .... Analytical Essay Topics | OneEssay Writer Tips. College essay: Analytical expository essay example. How To Write Analytical Essays With Ease? Essay Writing Help. Thesis Statement In Analytical Essay - Thesis Title Ideas for College. Analytical Essay: Analytical topics for essays. Analytical Essay Writing | Essay starters, Essay writing tips, Creative .... Analytical Essay | English - Year 12 QCE | Thinkswap. Writing An Analytical Essay. Analytical essay: Analytical essay example topics. How to Write an Analytical Essay: 15 Steps (with Pictures). Complete Analytical Essay Writing Guide | Topics & Tips | Essay writing ... Topics For Analytical Essay
Leading and leaning-in on Ai in Recruitment
● What is Ai and why does it matter?
● What value does Ai add to the recruitment life cycle?
● What risks should you be aware of?
● Key questions to ask to evaluate and mitigate risks
● The FAIR™ Framework
● The Power of intelligent chat to Hire with Heart
Fantastic Problems and Where to Find Them: Daryl WeirFuturice
Machine learning and big data have been buzz words for years now, but how do you know you have a machine learning problem on your hands? These slides, taken from a Futurice Beer & Tech talk, describe the types of problems ML methods are well suited to solve, with examples from a wide variety of industries. The deck also tells you where to get started if you want to try solving one of these problems yourself.
Title:
Semantic Equivalence of e-Commerce Queries
Authors:
Aritra Mandal, Daniel Tunkelang, Zhe Wu
Presented at KDD 2023 Workshop on E-Commerce and Natural Language Processing (ECNLP 2023).
MMM, Search!
An opinionated discussion of search metrics, models, and methods. Presented to the Wikimedia Foundation on April 27, 2020.
About the Speaker
Daniel Tunkelang is an independent consultant specializing in search, discovery, machine learning / AI, and data science.
He was a founding employee of Endeca, a search pioneer that Oracle acquired. After 10 years at Endeca, he moved to Google, where he led a local search team. He then served as a director of data science and search at LinkedIn.
After leaving LinkedIn in 2015, he became an independent consultant. His clients have included Apple, eBay, Coupang, Etsy, Flipkart, Gartner, Pinterest, Salesforce, and Yelp; as well as some of the largest traditional retailers.
Daniel completed undergraduate and master's degrees in Computer Science and Math at MIT and a Ph.D. in computer science at CMU. He wrote a book on Faceted Search, published by Morgan & Claypool, and he blogs on Medium about search-related topics -- particularly about query understanding. He is also active on Twitter, LinkedIn, and Quora.
Enterprise Intelligence: Putting the Pieces Together
http://enterpriserelevance.com/kdd2016/keynote.html
These slides are for a keynote presentation delivered at the Workshop on Enterprise Intelligence, held in conjunction with the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016).
About the author:
Daniel Tunkelang is a data science and engineering executive who has built and led some of the strongest teams in the software industry. He studied computer science and math at MIT and has a PhD in computer science from CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He led a local search team at Google. He was a director of data science and engineering at LinkedIn, and he established their query understanding team. Daniel is a widely recognized writer and speaker. He is frequently invited to speak at academic and industry conferences, particularly in the areas of information retrieval, web science, and data science. He has written the definitive textbook on faceted search (now a standard for ecommerce sites), established an annual symposium on human-computer interaction and information retrieval, and authored 24 US patents. His social media posts have attracted over a million page views. Daniel advises and consults for companies that can benefit strategically from his expertise. His clients range from early-stage startups to "unicorn" technology companies like Etsy and Pinterest. He helps companies make decisions around algorithms, technology, product strategy, hiring, and organizational structure.
Query understanding is about focusing less on the results and more on the query. It’s about figuring out what the searcher wants, rather than scoring and ranking results. Once you’ve established this mindset, your approach to search changes: you focus on query performance rather than ranking.
Presented at QConSF 2016: https://qconsf.com/sf2016/presentation/query-understanding-manifesto
I delivered this keynote at the Fast Forward Labs Data Leadership Conference on April 28, 2016. You can find related materials in the following publications:
https://www.oreilly.com/ideas/where-should-you-put-your-data-scientists
http://firstround.com/review/doing-data-science-right-your-most-common-questions-answered/
Data Science: A Mindset for Productivity
Keynote at 2015 Ronin Labs West Coast CTO Summit
https://www.eventjoy.com/e/west-coast-cto-summit-2015
Abstract
Data science isn't just about using a collection of technologies and algorithms. Data science requires a mindset that solves problems at a higher level of abstraction. How do we model utility when we think about optimization? How do we decide which hypotheses to test? How do we allocate our scarce resources to make progress?
There are no silver bullets. But I'll share what I've learned from a variety of contexts over the course of my work at Endeca, Google, and LinkedIn; and I hope you'll leave this talk with some practical wisdom you can apply to your next data science project.
My Three Ex’s: A Data Science Approach for Applied Machine LearningDaniel Tunkelang
My Three Ex’s: A Data Science Approach for Applied Machine Learning
Daniel Tunkelang (LinkedIn)
Presented at QCon San Francisco 2014 in the Applied Machine Learning and Data Science track
https://qconsf.com/presentation/my-three-ex%E2%80%99s-data-science-approach-applied-machine-learning
Abstract
This talk is about applying machine learning to solve problems.
It’s not a talk about machine learning — or at least not about the theory of machine learning. Theoretical machine learning requires a deep understanding of computer science and statistics. It’s one of the most studied areas of computer science, and advances in theoretical machine learning give us hope of solving the world’s “AI-hard” problems.
Applied machine learning is more grounded but no less important. We are surrounded by opportunities to apply classifiers, learn rules, compute similarity, and assemble clusters. We don’t need to develop new algorithms for any of these problems — our textbooks and open-source libraries have done that hard work for us.
But algorithms are not enough. Applying machine learning to solve problems requires a data science mindset that transcends the algorithmic details.
In this talk, I’ll communicate the data science mindset by describing my three ex’s: express, explain, and experiment. These three activities are the pillars of a successful strategy for applying machine learning to solve problems. Whether you’re a machine learning novice or expert, I hope you’ll leave this talk with some practical wisdom you can apply to your next project.
Web Science: How is it different?
Daniel Tunkelang, LinkedIn
Keynote Address at ACM Web Science 2014 Conference
The scientific method of observation, measurement, and experiment may be our greatest achievement as a species. The technological innovation we enjoy today is the product of a culture of systematized scientific experimentation.
But historically scientific experimentation has been expensive. Experiments consumed natural resources, took a long time to conduct, and required even more time and labor to analyze. In order to be productive, scientists have had to factor these costs into their work and to optimize accordingly.
Web science is different. Not, as some have speciously argued, because big data has made the scientific method obsolete. The key difference is that web science has changed the economics of scientific experimentation. Thus, even as web scientists apply the traditional scientific method, they optimize based on very different economics.
In this talk, I'll survey how web science has changed our approach to experimentation, for better and for worse. Specifically, I'll talk about differences in hypothesis generation, offline analysis, and online testing.
Bio
Daniel Tunkelang is Head of Query Understanding at LinkedIn, where he previously formed and led the product data science team. LinkedIn search allows members to find people, companies, jobs, groups and other content. His team aims to provide users with the best possible results that satisfy their information needs and help to get insights from professional data. Tunkelang has BS and MS degrees in computer science and math from MIT, and a PhD in computer science from CMU. He co-founded the annual symposium on human-computer interaction and information retrieval (HCIR) and wrote the first book on Faceted Search (Morgan and Claypool 2009). Prior to joining LinkedIn, Tunkelang was Chief Scientist of Endeca (acquired by Oracle in 2011 for $1.1B) and leader of the local search quality team at Google, mapping local businesses to their home pages. He is the co-inventor of 20 patents.
Better Search Through Query Understanding
Presented as a Data Talk at Intuit on April 22, 2014
Search is a fundamental problem of our time — we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.
About the Speaker
Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Keynote at CIKM 2013 Workshop on Data-driven User Behavioral Modelling and Mining from Social Media
Social Search in a Professional Context
Daniel Tunkelang (LinkedIn)
Social networks bring a new dimension to search. Instead of looking for web pages or text documents, LinkedIn members search a world of entities connected by a rich graph of relationships. Search is a fundamental part of the LinkedIn ecosystem, as it helps our members find and be found. Unlike most search applications, LinkedIn's search experience is highly personalized: two LinkedIn members performing the same search query are likely to see completely different results. Delivering the right results to the right person depends on our ability to leverage our each member's unique professional identity and network. In this talk, I'll describe the kinds of search behavior we see on LinkedIn, and some of the approaches we've taken to help our members address their information needs.
Find and be Found: Information Retrieval at LinkedInDaniel Tunkelang
Find and Be Found: Information Retrieval at LinkedIn
SIGIR 2013 Industry Track Presentation
http://sigir2013.ie/industry_track.html
LinkedIn has a unique data collection: the 200M+ members who use LinkedIn are also the most valuable entities in our corpus, which consists of people, companies, jobs, and a rich content ecosystem. Our members use LinkedIn to satisfy a diverse set of navigational and exploratory information needs, which we address by leveraging semi-structured and social content to understanding their query intent and deliver a personalized search experience. In this talk, we will discuss some of the unique challenges we face in building the LinkedIn search platform, the solutions we've developed so far, and the open problems we see ahead of us.
Shakti Sinha heads LinkedIn's search relevance team, and has been making key contributions to LinkedIn's search products since 2010. He previously worked at Google as both a research intern and a software engineer. He has an MS in Computer Science from Stanford, as well as a BS degree from College of Engineering, Pune.
Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Search as Communication: Lessons from a Personal JourneyDaniel Tunkelang
Search as Communication: Lessons from a Personal Journey
by Daniel Tunkelang (Head of Query Understanding, LinkedIn)
Presented at Etsy's Code as Craft Series on May 21, 2013
When I tell people I spent a decade studying computer science at MIT and CMU, most assume that I focused my studies in information retrieval — after all, I’ve spent most of my professional life working on search.
But that’s not how it happened. I learned about information extraction as a summer intern at IBM Research, where I worked on visual query reformulation. I learned how search engines work by building one at Endeca. It was only after I’d hacked my way through the problem for a few years that I started to catch up on the rich scholarly literature of the past few decades.
As a result, I developed a point of view about search without the benefit of academic conventional wisdom. Specifically, I came to see search not so much as a ranking problem as a communication problem.
In this talk, I’ll explain my communication-centric view of search, offering examples, general techniques, and open problems.
--
Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as its Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Previous to LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.
Enterprise Search: How do we get there from here?Daniel Tunkelang
Enterprise Search: How Do We Get There From Here?
by Daniel Tunkelang (Head of Query Understanding, LinkedIn)
Keynote at 2013 Enterprise Search Summit
We've been tackling the challenges of enterprise and site search for at least 3 decades. We've succeeded to the point that search is the gateway to many of our information repositories. Nonetheless, users of enterprise search systems are frustrated with these systems' shortcomings. We see this frustration in surveys, but, more importantly, most of us experience it personally in our daily work life. We all dream of a world where searching any information repository is as effective as searching the web—perhaps even more so. A world where we find what we're looking for, or quickly determine that it doesn't exist. Is this Utopia possible? If so, how do we get there from here? Or at least somewhere close? In this talk, Tunkelang reviews the track record of enterprise search. He talks about what's worked and what hasn't, especially as compared to web search. Finally, he proposes some paths to bring us closer to our dream.
--
Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as its Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Previous to LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.
Big Data, We Have a Communication Problem
by Daniel Tunkelang
Presented on April 30, 2013 at the TTI/Vanguard Conference on Ginormous Systems
http://www.ttivanguard.com/conference/2013/ginormous.html
It's a cliché that we live in a world of Big Data. But the bottleneck in understanding data is not computational. Rather, the biggest challenge is designing technical solutions that effectively leverage human cognitive ability. Data analysis systems should augment people's capabilities rather than replace them. This argument is as old as computer science itself: in 1962, Doug Engelbart said that the goal of technology is “the enhancement of human intellect by increasing the capability of a human to approach a complex problem situation.” Algorithms extract signal from raw data, but people fill in the gaps, creating models and evaluating analyses.
Empowering people to understand data is not just a surface problem of building better interfaces and visualizations. We need to interact with data not only after performing computational analysis, but throughout the analysis process in order to improve our models and algorithms. In order to do so, we need tools and processes specifically designed to offer people transparency, guidance, and control.
Human-computer information retrieval has been revolutionizing our approach to information seeking -- no modern search engine limits users to black-box relevance ranking and ten blue links. We need to take similar steps in our analysis of big data, making people the center of the analysis process and developing the technical innovations that enable people to fulfill this role.
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
Information, Attention, and Trust: A Hierarchy of NeedsDaniel Tunkelang
Presented by Daniel Tunkelang, LinkedIn Director of Data Science, at Stanford's 2nd annual conference on Computational Social Science (CSS), hosted by Institute for Research in the Social Sciences (IRiSS).
Details at https://iriss.stanford.edu/css/conference-agenda-2013
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Content, Connections, and Context
Daniel Tunkelang, LinkedIn
Keynote at Workshop on Recommender Systems and the Social Web
At 6th ACM International Conference on Recommender Systems (RecSys 2012)
Recommender systems for the social web combine three kinds of signals to relate the subject and object of recommendations: content, connections, and context.
Content comes first - we need to understand what we are recommending and to whom we are recommending it in order to decide whether the recommendation is relevant. Connections supply a social dimension, both as inputs to improve relevance and as social proof to explain the recommendations. Finally, context determines where and when a recommendation is appropriate.
I'll talk about how we use these three kinds of signals in LinkedIn's recommender systems, as well as the challenges we see in delivering social recommendations and measuring their relevance.
Keynote at 2012 Semantic Technology and Business Conference
Scale, Structure, and Semantics
Daniel Tunkelang, LinkedIn
Science fiction has a mixed track record when it comes to anticipating technological innovations. While Jules Verne fared well with with his predictions of submarine and space technology, artificial intelligence hasn't produced anything like Arthur C. Clarke's HAL 9000.
Instead, we've managed to elicit intelligence from machines through unexpected means. Search engines have achieved remarkable success in organizing the world's information by crawling the web, indexing documents, and exploiting link structure to establish authoritativeness. At LinkedIn, we apply large-scale analytics to terabytes of semistructured data to deliver products and insights that serve our 150M+ members. Semantics emerge when we apply the right analytical techniques to a sufficient quality and quantity of data.
In this talk, I will describe how LinkedIn's huge and rich graph of relationship data that powers the products our users love. I believe that the lessons we have learned apply broadly to other semantic applications. While quantity and quality of data are the key challenges to delivering a semantically rich experience, the key is to create the right ecosystem that incents people to give you good data, which then forms the basis for great data products.
Strata 2012: Humans, Machines, and the Dimensions of MicroworkDaniel Tunkelang
Presentation from O'Reilly Strata 2012 on Big Data
Humans, Machines, and the Dimensions of Microwork
Daniel Tunkelang (LinkedIn)
Claire Hunsaker (Samasource)
The advent of crowdsourcing has wildly expanded the ways we think of incorporating human judgments into computational workflows. Computer scientists, economists, and sociologists have explored how to effectively and efficiently distribute microwork tasks to crowds and use their work as inputs to create or improve data products. Simultaneously, crowdsourcing providers are exploring the bounds of mechanical QA flows, worker interfaces, and workforce management systems.
But what tasks should be performed by humans rather than algorithms? And what makes a set of human judgments robust? Quantity? Consensus? Quality or trustworthiness of the workers? Moreover, the robustness of judgments depends not only on the workers, but on the task design. Effective crowdsourcing is a cooperative endeavor.
In this talk, we will analyze various dimensions of microwork that characterize applications, tasks, and crowds. Drawing on our experience at companies that have pioneered the use of microwork (Samasource) and data science (LinkedIn), we will offer practical advice to help you design crowdsourcing workflows to meet your data product needs.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden,India, Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
2. Overview
● A Brief Introduction to Behavioral Economics
● The Probability Ranking Principle and its Discontents
● Can Dense Retrieval Help?
● Problems that Dense Retrieval is still Neglecting
● Helping Searchers Satisfice through Query Understanding
4. Classical Economics: Expected Utility Maximization
● People make decisions between alternatives everyday.
● Utility maximization: people choose alternative with higher utility.
● When outcomes are uncertain, people maximize expected utility.
● Risk aversion makes people discount utility of risky outcomes.
● Assumes people are rational and have complete information.
● Homo economicus model dominated economics for a long time.
5. Behavioral Economics: What People Really Do
● People make decisions using bounded rationality. (Herb Simon, 1955)
○ Limited information and limited resources to act on it.
○ Instead of maximizing expected utility, people “satisfice”.
● People act irrationally within their constraints. (Tversky and Kahneman, 1974)
○ Risk evaluation depends on framing (e.g, winning vs. losing).
○ Utility of an outcome depends on other options.
○ Heuristics and biases maps people “predictably irrational”.
7. Probability Ranking Principle
● “The Probability Ranking Principle In IR” (Stephen Robertson, 1977)
● Asserts that results should be sorted by expected relevance.
● Serves as foundation of most search engines to this day.
● We just need to get better at computing expected relevance.
● Sounds like expected utility maximization!
● Does it address what searchers really do?
8. Problems with the Probability Ranking Principle
● Similar results tend to be relevant to same requests. (van Rijsbergen, 1979)
● Results are evaluated independently, but utility of results is not additive.
● Relevance measures do not predict user benefit. (Turpin and Scholer, 2008)
● Search engines should help people help themselves. (Marchionini, 2006)
● Perhaps search engines should help users satisfice rather than maximize?
9. The Real World Intervenes
● Search applications separate boolean retrieval from scored ranking.
● Recognition that relevance and utility are distinct concepts.
● If retrieval is effective, probability of relevance has low variance.
● But conditional utility given relevance often has high variance.
● Satisfice on relevance, but maximize conditional utility given relevance.
10. Techniques that Have Worked
● Multi-stage ranking for computational efficiency.
● Reranking results to increase diversity.
● Training a separate relevance model.
● Suggesting refinements to elicit more specific intents.
● Autocomplete to nudge users to better queries.
● Content and query understanding.
12. AI Enables the Vector Space Model on Steroids!
● Embeddings transform content and queries into vectors.
● Similarity-based nearest-neighbor retrieval.
● But it is still a vector space model! (Salton, 1974)
● Efforts to improve embedding quality and efficiency.
● Expected utility maximization, but with better tools.
13. But does this really help?
● Most queries are still short, often a single entity.
● Eliciting searcher's intent still as big a challenge as ranking.
● Relevance still feels binary for most applications.
● Which is a challenge for similarity-based retrieval.
● Fine-grained scoring is poor fit for short queries.
● Focus is expected utility maximization, not satisficing.
14. So how can we use AI to address these problems?
15. Focus on content and query understanding.
● Content understanding is often binary classification.
● Same with query understanding, but more nuanced.
● Improve representations and matching, not ranking.
● Learn from head and torso to improve the tail.
● Focus on helping searchers satisfice!
16. Let’s explore query understanding a bit.
● Assumptions
○ Content is already classified into a taxonomy.
○ Content has robust string representation (e.g., title).
○ We can learn from historical search behavior.
● Let’s focus on ecommerce.
○ Tends to have good structured data.
○ Mostly short, entity-centric queries.
○ Query traffic has a power law distribution.
21. Some Nuances
● Learning from behavior introduces presentation bias.
● Behavior may not cover less popular categories.
● Queries vary in specificity.
○ “mens black tee” -> Men’s T-Shirts; “mens clothes” -> Men’s Clothing
● Queries can span multiple categories or no category.
○ “sony” -> TVs, Audio Video Games; “returns” -> Returns Page
● Category overlap confuses the classifier.
○ Cell Phone Accessories overlap with Audio and Computer Accessories.
23. Query Similarity: Beyond the Most Significant Bit
● Query category is “most significant bit” of relevance.
● But queries supply more signal than a category.
● Distinct queries can express equivalent search intent.
►
24. How do we compute query similarity?
● Superficial Query Similarity
○ Variations in stemming, word order, stop words, etc.
○ Effective but limited approach.
○ Needs guardrails for precision, e.g., shirt dress != dress shirt.
● Embeddings
○ Looks beyond tokens and superficial signals.
○ But most queries are short!
26. Compute query vector as mean of document vectors.
● Aggregate mean of document vectors for clicks or purchases.
● Documents have more robust string representations than queries.
● Aggregating document vectors reduces noise and variance.
● Even works well with “primitive” embeddings like fastText.
● Requires retrieval and ranking to be good enough.
● Biggest risk: no engagement for unretrieved results.
28. Simple, effective, but limited to offline computation.
● Only works for queries that have engagement history.
● Would be expensive to compute aggregates online.
● Still, head and torso queries are a large fraction of traffic.
● Can use nearest neighbors to improve recall.
● Can replace query-level analytics with intent-level analytics.
● But what do we do for tail queries?
29. We can train a sentence transformer model for the tail.
30. Same trick: use head and torso queries for training.
● Training data consists of (query1
, query2
, similarity) triples.
● We focus on how well query2
substitutes for query1
.
● Below a certain similarity, we don’t care about precise score.
● Important to oversample relatively similar query pairs.
● Fine-tune a pre-trained sentence transformer model.
● Can use output of query classifier as an input.
● Generalizes “bag of documents” model to tail queries.
32. Summary: It’s All About Satisficing
● Classical economics focused on maximizing utility.
● But real life is mostly about satisficing.
● The same holds true for search.
● AI is great, but let’s use to help searchers satisfice.
● Huge opportunity to do so through query understanding.