Shakti Sinha and Daniel Tunkelang discuss how LinkedIn's search functionality works. They explain that LinkedIn search is personalized based on a user's profile and network. Query understanding involves tagging queries to determine entity types such as people, companies, or skills. Ranking is also personalized, using machine learning models trained on search logs to determine relevance for a specific user's query. The system aims to provide results that are both globally and personally relevant, as about two-thirds of clicks come from outside a user's network.
Better Search Through Query Understanding
Presented as a Data Talk at Intuit on April 22, 2014
Search is a fundamental problem of our time — we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.
About the Speaker
Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Tanvi Motwani, Lead Data Scientist, Guided Search at A9.com at MLconf ATL 2016
E-commerce Query Tagging System Using Unsupervised Training Methods: Amazon is one of the world’s largest e-commerce sites, and Amazon Search powers the majority of Amazon’s sales. A key component of Amazon Search is the query understanding pipeline, which extracts the semantic information used to precisely display products for billions of queries every day. In this talk, we will go through the primary building blocks of the query understanding pipeline.
Amazon Search enables users to search against structured products, so it is necessary to extract information from queries in a format consistent with the structured information about the products. Query tagging is the task of semantically annotating query terms with pre-defined labels (such as brand, product-type, and color). We propose a scalable system to train large-scale machine learning algorithms to solve this problem. Our system improved precision by 10% over the baseline, a dictionary-lookup-based tagger, and approximately doubled its recall.
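To make the idea concrete, here is a minimal sketch of the dictionary-lookup baseline style of query tagger described above. The label dictionaries and query are invented for illustration; a production tagger would use learned models over far larger vocabularies.

```python
# Toy dictionary-lookup query tagger: annotate each query term with a
# pre-defined label such as brand, product-type, or color.
BRANDS = {"nike", "adidas", "sony"}
PRODUCT_TYPES = {"shoes", "headphones", "tv"}
COLORS = {"red", "black", "white"}

def tag_query(query):
    """Return (term, label) pairs for each term in the query."""
    tags = []
    for term in query.lower().split():
        if term in BRANDS:
            tags.append((term, "brand"))
        elif term in PRODUCT_TYPES:
            tags.append((term, "product-type"))
        elif term in COLORS:
            tags.append((term, "color"))
        else:
            tags.append((term, "other"))
    return tags

print(tag_query("nike red shoes"))
# → [('nike', 'brand'), ('red', 'color'), ('shoes', 'product-type')]
```

A machine-learned tagger improves on this baseline by using context (e.g., "apple" as brand vs. product) rather than term membership alone.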
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges an... (Qi Guo)
*** Please check out our LinkedIn Engineering blog post: https://engineering.linkedin.com/blog/2019/04/ai-behind-linkedin-recruiter-search-and-recommendation-systems ***
LinkedIn Talent Solutions business contributes to around 65% of LinkedIn’s annual revenue, and provides tools for job providers to reach out to potential candidates and for job seekers to find suitable career opportunities. LinkedIn’s job ecosystem has been designed as a platform to connect job providers and job seekers, and to serve as a marketplace for efficient matching between potential candidates and job openings. A key mechanism to help achieve these goals is the LinkedIn Recruiter product, which enables recruiters to search for relevant candidates and obtain candidate recommendations for their job postings.
We highlight a few unique information retrieval, system, and modeling challenges associated with talent search and recommendation systems.
In this talk, we will present how we formulated and addressed the problems, the overall system design and architecture, the challenges encountered in practice, and the lessons learned from the production deployment of these systems at LinkedIn. By presenting our experiences of applying techniques at the intersection of recommender systems, information retrieval, machine learning, and statistical modeling in a large-scale industrial setting and highlighting the open problems, we hope to stimulate further research and collaborations within the SIGIR community.
MongoDB World 2019: The Sights (and Smells) of a Bad Query
“Why is MongoDB so slow?” you may ask yourself on occasion. You’ve created indexes, you’ve learned how to use the aggregation pipeline. What the heck? Could it be your queries? This talk will outline what tools are at your disposal (both in MongoDB Atlas and in MongoDB server) to identify inefficient queries.
Anatomy of an eCommerce Search Engine by Mayur Datar (Naresh Jain)
In this talk, the Chief Data Scientist of Flipkart will uncover the various challenges in running an e-commerce search platform, such as scale, recency, update rates, and business shaping. He will also explain the overall system architecture of the search platform and get into the details of some of its sub-systems, including the query understanding and rewriting sub-system.
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as items to be recommended, in response to a user's need. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this tutorial will be on the fundamentals of neural networks and their applications to learning to rank.
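As a minimal illustration of the LTR idea, the sketch below trains a linear scoring function on pairwise preferences (relevant document should outrank non-relevant one for the same query) with a hinge loss. The features and data are invented; real systems, as the tutorial notes, use neural networks and far richer signals.

```python
# Toy pairwise learning to rank: learn weights w so that score(x_pos) exceeds
# score(x_neg) by a margin for every preference pair derived from labels/clicks.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs, dim, lr=0.1, epochs=100):
    """pairs: list of (x_pos, x_neg) where x_pos should outrank x_neg."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            margin = dot(w, x_pos) - dot(w, x_neg)
            if margin < 1.0:  # hinge loss: update only on violated pairs
                for i in range(dim):
                    w[i] += lr * (x_pos[i] - x_neg[i])
    return w

# Each vector: [text-match score, historical click-through rate] (hypothetical features).
pairs = [([2.0, 0.3], [1.0, 0.1]), ([1.5, 0.4], [0.5, 0.2])]
w = train_pairwise(pairs, dim=2)
assert dot(w, [2.0, 0.3]) > dot(w, [1.0, 0.1])
```

Swapping the linear scorer for a small neural network yields the neural LTR setting the tutorial focuses on; the pairwise training loop stays essentially the same.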
How to Build your Training Set for a Learning To Rank Project (Sease)
Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr supports it from Jan 2017), organisations struggle with the problem of how to collect and structure relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
– model and collect the necessary feedback from the users (implicit or explicit)
– calculate for each training sample a relevance label that is meaningful and unambiguous (Click Through Rate, Sales Rate …)
– transform the raw data collected into an effective training set (in the numerical vector format most LTR training libraries expect)
Join us as we explore real world scenarios and dos and don’ts from the e-commerce industry.
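The steps above can be sketched end to end: aggregate raw click logs per (query, document), derive a click-through-rate relevance label, and emit numeric feature vectors. The event fields and feature values here are invented for illustration, not from any specific system.

```python
# Sketch of turning raw click logs into an LTR training set.
from collections import defaultdict

def build_training_set(log_events, features):
    """log_events: (query, doc_id, clicked) tuples.
    features: maps (query, doc_id) -> numeric feature vector."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, doc_id, clicked in log_events:
        impressions[(query, doc_id)] += 1
        clicks[(query, doc_id)] += int(clicked)

    training_set = []
    for key, n in impressions.items():
        ctr = clicks[key] / n  # relevance label: click-through rate
        training_set.append((features[key], ctr))
    return training_set

events = [("shoes", "d1", True), ("shoes", "d1", False), ("shoes", "d2", False)]
feats = {("shoes", "d1"): [0.8, 1.0], ("shoes", "d2"): [0.3, 0.0]}
print(build_training_set(events, feats))
# d1 gets label 0.5 (1 click / 2 impressions); d2 gets 0.0
```

In practice one would also correct for position bias and require a minimum impression count before trusting a CTR label, as the talk's "meaningful and unambiguous" guidance suggests.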
Basic functions and terminology of recommendation systems, with some algorithmic implementations on sample datasets for understanding. All the layers of the recommender system framework are well explained.
Feature Engineering - Getting most out of data for predictive models (Gabriel Moreira)
How should data be preprocessed for use in machine learning algorithms? How do we identify the most predictive attributes of a dataset? What features can we generate to improve the accuracy of a model?
Feature Engineering is the process of extracting and selecting, from raw data, features that can be used effectively in predictive models. As the quality of the features greatly influences the quality of the results, knowing the main techniques and pitfalls will help you to succeed in the use of machine learning in your projects.
In this talk, we will present methods and techniques that allow us to extract the maximum potential from the features of a dataset, increasing the flexibility, simplicity, and accuracy of models. We will cover the analysis of feature distributions and their correlations, the transformation of numeric attributes (scaling, normalization, log-based transformation, binning), categorical attributes (one-hot encoding, feature hashing), temporal attributes (date/time), and free-text attributes (text vectorization, topic modeling).
Python, Scikit-learn, and Spark SQL examples will be presented, along with how to use domain knowledge and intuition to select and generate features relevant to predictive models.
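Two of the transformations mentioned above can be sketched in a few lines of plain Python (in practice one would typically reach for Scikit-learn's `MinMaxScaler` and `OneHotEncoder`); the sample values are invented:

```python
# Min-max scaling: map numeric values onto [0, 1].
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# One-hot encoding: turn a categorical value into a binary indicator vector
# against a fixed vocabulary.
def one_hot(value, vocabulary):
    return [1 if value == v else 0 for v in vocabulary]

print(min_max_scale([10, 20, 30]))               # [0.0, 0.5, 1.0]
print(one_hot("red", ["red", "green", "blue"]))  # [1, 0, 0]
```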
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a senior member of the support team, I will share the most common mistakes observed, along with some tips and tricks for avoiding them.
Slide deck presented at http://devternity.com/ on MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistency models, as well as the definition of documents and general data structures.
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as webpages, in response to a user's need, which may be expressed as a query. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this lecture will be on the fundamentals of neural networks and their applications to learning to rank.
Video available here: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677
We all know that MongoDB is one of the most flexible and feature-rich databases available. In this webinar we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience... (Abhimanyu Lad)
We describe the challenges that we faced while building the instant search experience at LinkedIn, and present techniques that we developed to overcome them. We discuss three aspects of instant search – performance, tolerance to user errors, and accuracy of search results.
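One way to picture the "tolerance to user errors" aspect is typo-tolerant prefix matching: accept a completion whose prefix is within a small edit distance of what the user typed. The names, threshold, and functions below are a hypothetical illustration, not LinkedIn's implementation.

```python
# Typo-tolerant prefix matching for an instant-search-style experience.

def edit_distance(a, b):
    """Classic single-row dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def instant_matches(prefix, names, max_typos=1):
    """Return names whose prefix is within max_typos edits of the typed prefix."""
    return [n for n in names
            if edit_distance(prefix.lower(), n[:len(prefix)].lower()) <= max_typos]

names = ["Daniel Tunkelang", "Danielle Smith", "David Jones"]
print(instant_matches("Danil", names))
# both "Daniel ..." and "Danielle ..." are within one typo of "Danil"
```

A production system would precompute an index (e.g., a trie or n-gram structure) rather than scanning all names per keystroke, which is where the "performance" challenge of instant search comes in.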
In this lecture, I will first cover recent advances in neural recommender systems, such as autoencoder-based and MLP-based recommender systems. Then, I will introduce recent achievements in automatic playlist continuation for music recommendation.
Discovering User's Topics of Interest in Recommender Systems (Gabriel Moreira)
This talk introduces the main techniques of Recommender Systems and Topic Modeling.
Then, we present a case of how we've combined those techniques to build Smart Canvas (www.smartcanvas.com), a service that allows people to bring, create and curate content relevant to their organization, and also helps to tear down knowledge silos.
We present some of Smart Canvas' features powered by its recommender system, such as:
- Highlight relevant content, explaining to users which of their topics of interest generated each recommendation.
- Associate tags to users’ profiles based on topics discovered from content they have contributed. These tags become searchable, allowing users to find experts or people with specific interests.
- Recommend people with similar interests, explaining which topics bring them together.
We give a deep dive into the design of our large-scale recommendation algorithms, giving special attention to our content-based approach that uses topic modeling techniques (like LDA and NMF) to discover people’s topics of interest from unstructured text, and social-based algorithms using a graph database connecting content, people and teams around topics.
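To illustrate the content-based side, here is a toy implementation of NMF (one of the two topic-modeling techniques named above) via the standard multiplicative updates, factoring a term-document count matrix V ≈ W·H so that rows of H group co-occurring terms into "topics". The matrix is invented; real pipelines use library implementations over large corpora.

```python
# Toy NMF (non-negative matrix factorization) with multiplicative updates.
import random

def nmf(V, k, iters=300):
    n, m = len(V), len(V[0])
    random.seed(0)
    W = [[random.random() for _ in range(k)] for _ in range(n)]
    H = [[random.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        WH = [[sum(W[i][t] * H[t][j] for t in range(k)) for j in range(m)] for i in range(n)]
        # update H: H *= (W^T V) / (W^T W H)
        for t in range(k):
            for j in range(m):
                num = sum(W[i][t] * V[i][j] for i in range(n))
                den = sum(W[i][t] * WH[i][j] for i in range(n)) + 1e-9
                H[t][j] *= num / den
        WH = [[sum(W[i][t] * H[t][j] for t in range(k)) for j in range(m)] for i in range(n)]
        # update W: W *= (V H^T) / (W H H^T)
        for i in range(n):
            for t in range(k):
                num = sum(H[t][j] * V[i][j] for j in range(m))
                den = sum(H[t][j] * WH[i][j] for j in range(m)) + 1e-9
                W[i][t] *= num / den
    return W, H

# Toy term-document counts: documents 0-1 share one vocabulary, document 2 another.
V = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1]]
W, H = nmf(V, k=2)
```

Each row of H then acts as a topic (a weighting over terms), and each row of W describes how strongly a document, or by extension a person's contributed content, expresses each topic.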
Our typical data pipeline includes the ingestion of millions of user events (using Google PubSub and BigQuery), the batch processing of the models (with PySpark, MLlib, and Scikit-learn), the online recommendations (with Google App Engine, Titan Graph Database, and Elasticsearch), and the data-driven evaluation of UX and algorithms through A/B testing. We also touch on the non-functional requirements of software-as-a-service, such as scalability, performance, availability, reliability, and multi-tenancy, and how we addressed them in a robust architecture deployed on Google Cloud Platform.
Enterprise Intelligence: Putting the Pieces Together
http://enterpriserelevance.com/kdd2016/keynote.html
These slides are for a keynote presentation delivered at the Workshop on Enterprise Intelligence, held in conjunction with the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016).
About the author:
Daniel Tunkelang is a data science and engineering executive who has built and led some of the strongest teams in the software industry. He studied computer science and math at MIT and has a PhD in computer science from CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He led a local search team at Google. He was a director of data science and engineering at LinkedIn, and he established their query understanding team. Daniel is a widely recognized writer and speaker. He is frequently invited to speak at academic and industry conferences, particularly in the areas of information retrieval, web science, and data science. He has written the definitive textbook on faceted search (now a standard for ecommerce sites), established an annual symposium on human-computer interaction and information retrieval, and authored 24 US patents. His social media posts have attracted over a million page views. Daniel advises and consults for companies that can benefit strategically from his expertise. His clients range from early-stage startups to "unicorn" technology companies like Etsy and Pinterest. He helps companies make decisions around algorithms, technology, product strategy, hiring, and organizational structure.
Query understanding is about focusing less on the results and more on the query. It’s about figuring out what the searcher wants, rather than scoring and ranking results. Once you’ve established this mindset, your approach to search changes: you focus on query performance rather than ranking.
Presented at QConSF 2016: https://qconsf.com/sf2016/presentation/query-understanding-manifesto
Keynote at CIKM 2013 Workshop on Data-driven User Behavioral Modelling and Mining from Social Media
Social Search in a Professional Context
Daniel Tunkelang (LinkedIn)
Social networks bring a new dimension to search. Instead of looking for web pages or text documents, LinkedIn members search a world of entities connected by a rich graph of relationships. Search is a fundamental part of the LinkedIn ecosystem, as it helps our members find and be found. Unlike most search applications, LinkedIn's search experience is highly personalized: two LinkedIn members performing the same search query are likely to see completely different results. Delivering the right results to the right person depends on our ability to leverage each member's unique professional identity and network. In this talk, I'll describe the kinds of search behavior we see on LinkedIn, and some of the approaches we've taken to help our members address their information needs.
Data Science: A Mindset for Productivity
Keynote at 2015 Ronin Labs West Coast CTO Summit
https://www.eventjoy.com/e/west-coast-cto-summit-2015
Abstract
Data science isn't just about using a collection of technologies and algorithms. Data science requires a mindset that solves problems at a higher level of abstraction. How do we model utility when we think about optimization? How do we decide which hypotheses to test? How do we allocate our scarce resources to make progress?
There are no silver bullets. But I'll share what I've learned from a variety of contexts over the course of my work at Endeca, Google, and LinkedIn; and I hope you'll leave this talk with some practical wisdom you can apply to your next data science project.
Web Science: How is it different?
Daniel Tunkelang, LinkedIn
Keynote Address at ACM Web Science 2014 Conference
The scientific method of observation, measurement, and experiment may be our greatest achievement as a species. The technological innovation we enjoy today is the product of a culture of systematized scientific experimentation.
But historically scientific experimentation has been expensive. Experiments consumed natural resources, took a long time to conduct, and required even more time and labor to analyze. In order to be productive, scientists have had to factor these costs into their work and to optimize accordingly.
Web science is different. Not, as some have speciously argued, because big data has made the scientific method obsolete. The key difference is that web science has changed the economics of scientific experimentation. Thus, even as web scientists apply the traditional scientific method, they optimize based on very different economics.
In this talk, I'll survey how web science has changed our approach to experimentation, for better and for worse. Specifically, I'll talk about differences in hypothesis generation, offline analysis, and online testing.
Bio
Daniel Tunkelang is Head of Query Understanding at LinkedIn, where he previously formed and led the product data science team. LinkedIn search allows members to find people, companies, jobs, groups and other content. His team aims to provide users with the best possible results that satisfy their information needs and help to get insights from professional data. Tunkelang has BS and MS degrees in computer science and math from MIT, and a PhD in computer science from CMU. He co-founded the annual symposium on human-computer interaction and information retrieval (HCIR) and wrote the first book on Faceted Search (Morgan and Claypool 2009). Prior to joining LinkedIn, Tunkelang was Chief Scientist of Endeca (acquired by Oracle in 2011 for $1.1B) and leader of the local search quality team at Google, mapping local businesses to their home pages. He is the co-inventor of 20 patents.
My Three Ex’s: A Data Science Approach for Applied Machine Learning (Daniel Tunkelang)
My Three Ex’s: A Data Science Approach for Applied Machine Learning
Daniel Tunkelang (LinkedIn)
Presented at QCon San Francisco 2014 in the Applied Machine Learning and Data Science track
https://qconsf.com/presentation/my-three-ex%E2%80%99s-data-science-approach-applied-machine-learning
Abstract
This talk is about applying machine learning to solve problems.
It’s not a talk about machine learning — or at least not about the theory of machine learning. Theoretical machine learning requires a deep understanding of computer science and statistics. It’s one of the most studied areas of computer science, and advances in theoretical machine learning give us hope of solving the world’s “AI-hard” problems.
Applied machine learning is more grounded but no less important. We are surrounded by opportunities to apply classifiers, learn rules, compute similarity, and assemble clusters. We don’t need to develop new algorithms for any of these problems — our textbooks and open-source libraries have done that hard work for us.
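As a small illustration of how little new algorithm development the applied practitioner needs, here is a textbook cosine-similarity function of the kind libraries already provide. This is a hedged, pure-Python sketch for exposition; in practice you would reach for an off-the-shelf library rather than write even this much yourself.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts using bag-of-words term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Counter returns 0 for absent terms, so this handles disjoint vocabularies.
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("machine learning for search", "search with machine learning"))  # 0.75
```

The same "use what exists" point holds for classifiers, rule learners, and clustering: the algorithms are commodities; the data science judgment around them is not.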
But algorithms are not enough. Applying machine learning to solve problems requires a data science mindset that transcends the algorithmic details.
In this talk, I’ll communicate the data science mindset by describing my three ex’s: express, explain, and experiment. These three activities are the pillars of a successful strategy for applying machine learning to solve problems. Whether you’re a machine learning novice or expert, I hope you’ll leave this talk with some practical wisdom you can apply to your next project.
I delivered this keynote at the Fast Forward Labs Data Leadership Conference on April 28, 2016. You can find related materials in the following publications:
https://www.oreilly.com/ideas/where-should-you-put-your-data-scientists
http://firstround.com/review/doing-data-science-right-your-most-common-questions-answered/
The Top Skills That Can Get You Hired in 2017
LinkedIn
We analyzed all the recruiting activity on LinkedIn this year and identified the Top Skills employers seek. Starting Oct 24, learn these skills and much more for free during the Week of Learning.
#AlwaysBeLearning https://learning.linkedin.com/week-of-learning
PDF, audio, and voiceover are now available on designintechreport.wordpress.com
Today’s most beloved technology products and services balance design and engineering in a way that perfectly blends form and function. Businesses started by designers have created billions of dollars of value, are raising billions in capital, and VC firms increasingly see the importance of design. The third annual Design in Tech Report examines how design trends are revolutionizing the entrepreneurial and corporate ecosystems in tech. This report covers related M&A activity, new patterns in creativity × business, and the rise of computational design.
This is the presentation I gave at the 2012 CIPD Exhibition in Manchester. If you would like a copy of the notes or have any questions then please email me.
I presented this informative session on LinkedIn to the Lake Villa District Library patrons on January 16, 2019. Topics include enhancing your professional profile, building your network of connections, which LinkedIn Groups to join as well as searching individual companies and jobs.
LinkedIn Basics and Best Practices, July 2018
Bruce Bennett
I presented this to the St. Joseph Employment Ministry on Wednesday July 26, 2018. The topics include an overview of LinkedIn and its general usage, building your own network of connections, identifying LinkedIn Groups to join, and searching individuals, companies and jobs. Learn about creating a job search agent and the options for job applications using LinkedIn. Additionally, I covered the best practices and activities for getting the most out of LinkedIn.
Create a professional profile and utilize the tremendous amount of information on LinkedIn. Topics include building your own network of connections, identifying LinkedIn Groups to join, and searching individuals, companies and jobs. Learn to create a job search agent and the options for job applications using LinkedIn.
Referrals are the #1 Source of Hires.
Referrals Get Hired is an online brand-building strategy used by executives, entry-level candidates, and all types of job seekers to get the job they want.
Finding talent on LinkedIn
Grow your company and hire more effectively with the world's most advanced professional network. In this guide, you will learn the ways that LinkedIn can support and accelerate hiring for your company. Even if you're not an experienced recruiting professional, these tips will help you effectively hire on LinkedIn.
Use LinkedIn, the world's most advanced professional network, to help you make your next hire. Find tips on how to use LinkedIn's free and premium features to publicize your open positions or search the network for that perfect candidate.
I will present this at the McHenry County Workforce Network in October. It covers the general usage of LinkedIn and the elements that need to be developed for your profile. My recommendations for regular activity are presented too.
Learn about the basics for creating a professional profile and utilizing the tremendous amount of information on LinkedIn. Topics include an overview of LinkedIn and its general usage, building your own network of connections, identifying LinkedIn Groups to join, and searching individuals, companies and jobs. Learn about creating a job search agent and the options for job applications using LinkedIn. Additionally, discover the best practices or activities for getting the most out of LinkedIn.
In this edition of the Quarterly Product Release Webinar, LinkedIn's product experts give you a full look at all the Q1 product updates rolling out across LinkedIn Talent Solutions.
Speaker Names:
Lauren Kuemmeler, Senior Customer Success Manager
Tucker Johns, Senior Customer Success Manager
Sankar Venkatraman, Global Product Evangelist
Learn about new LinkedIn Recruiter enhancements and how they can help corporate recruiters track, share, and manage talent more efficiently. Also, learn how to increase your InMail response rate and strengthen your employer brand.
Learn more about LinkedIn Talent Solutions: http://linkd.in/1bgERGj
Subscribe to the LinkedIn Talent Blog: http://linkd.in/18yp4Cg
Follow the LinkedIn Talent Solutions page: http://linkd.in/1cNvIFT
Tweet with us: http://bit.ly/HireOnLinkedIn
Sam Marshall of ClearBox Consulting and David Francoeur of Bonzai take a non-technical look at intranet search, to help you improve results and the overall experience.
If you're responsible for search configuration then we welcome you, but this webinar is also for intranet managers and digital team members who care about content and ensuring the intranet is truly useful to colleagues.
The business cost of poor search
Why intranet search is hard
How to improve the search user experience
Ways to diagnose why search fails
Quick ways to enhance your search results.
Click through to see key topics from ConnectIn 2013 in Toronto, including predictions of talent acquisition in 2015 and Moneyball sourcing.
Similar to Find and be Found: Information Retrieval at LinkedIn
Semantic Equivalence of e-Commerce Queries
Aritra Mandal, Daniel Tunkelang, Zhe Wu
Presented at the KDD 2023 Workshop on E-Commerce and Natural Language Processing (ECNLP 2023).
Helping Searchers Satisfice through Query Understanding
Daniel Tunkelang
Behavioral economics transformed how we think about human decision making, rejecting expected utility maximization for the real world of heuristics, biases, and satisficing. In this talk, I'll argue that our thinking about search engines needs a similar transformation. I will compare the Probability Ranking Principle to expected utility maximization and offer ways that AI can help searchers satisfice through query understanding.
This was an invited talk given at the 2023 Walmart AI Summit.
Speaker Bio
Daniel Tunkelang is an independent consultant specializing in search, machine learning / AI, and data science. He completed undergraduate and master's degrees in Computer Science and Math at MIT and a PhD in computer science at CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired in 2011. He then led engineering and data science teams at Google and LinkedIn. He has written a book on Faceted Search, and he blogs on Medium about search-related topics — particularly query understanding. He has worked with numerous tech companies, retailers, and others, including Algolia, Apple, Canva, Coupang, eBay, Etsy, Flipkart, Home Depot, Oracle, Pinterest, Salesforce, Target, Yelp, and Zoom.
MMM, Search!
An opinionated discussion of search metrics, models, and methods. Presented to the Wikimedia Foundation on April 27, 2020.
About the Speaker
Daniel Tunkelang is an independent consultant specializing in search, discovery, machine learning / AI, and data science.
He was a founding employee of Endeca, a search pioneer that Oracle acquired. After 10 years at Endeca, he moved to Google, where he led a local search team. He then served as a director of data science and search at LinkedIn.
After leaving LinkedIn in 2015, he became an independent consultant. His clients have included Apple, eBay, Coupang, Etsy, Flipkart, Gartner, Pinterest, Salesforce, and Yelp; as well as some of the largest traditional retailers.
Daniel completed undergraduate and master's degrees in Computer Science and Math at MIT and a Ph.D. in computer science at CMU. He wrote a book on Faceted Search, published by Morgan & Claypool, and he blogs on Medium about search-related topics -- particularly about query understanding. He is also active on Twitter, LinkedIn, and Quora.
Search as Communication: Lessons from a Personal Journey
by Daniel Tunkelang (Head of Query Understanding, LinkedIn)
Presented at Etsy's Code as Craft Series on May 21, 2013
When I tell people I spent a decade studying computer science at MIT and CMU, most assume that I focused my studies in information retrieval — after all, I’ve spent most of my professional life working on search.
But that’s not how it happened. I learned about information extraction as a summer intern at IBM Research, where I worked on visual query reformulation. I learned how search engines work by building one at Endeca. It was only after I’d hacked my way through the problem for a few years that I started to catch up on the rich scholarly literature of the past few decades.
As a result, I developed a point of view about search without the benefit of academic conventional wisdom. Specifically, I came to see search not so much as a ranking problem as a communication problem.
In this talk, I’ll explain my communication-centric view of search, offering examples, general techniques, and open problems.
--
Daniel Tunkelang is Head of Query Understanding at LinkedIn. Educated at MIT and CMU, he has spent his career working on big data, addressing key challenges in search, data mining, user interfaces, and network analysis. He co-founded enterprise search and business intelligence pioneer Endeca, where he spent a decade as its Chief Scientist. In 2011, Endeca was acquired by Oracle for over $1B. Prior to joining LinkedIn, he led a team at Google working on local search quality. Daniel has authored fifteen patents, written a textbook on faceted search, and created the annual symposium on human-computer interaction and information retrieval.
Enterprise Search: How Do We Get There From Here?
by Daniel Tunkelang (Head of Query Understanding, LinkedIn)
Keynote at 2013 Enterprise Search Summit
We've been tackling the challenges of enterprise and site search for at least 3 decades. We've succeeded to the point that search is the gateway to many of our information repositories. Nonetheless, users of enterprise search systems are frustrated with these systems' shortcomings. We see this frustration in surveys, but, more importantly, most of us experience it personally in our daily work life. We all dream of a world where searching any information repository is as effective as searching the web—perhaps even more so. A world where we find what we're looking for, or quickly determine that it doesn't exist. Is this Utopia possible? If so, how do we get there from here? Or at least somewhere close? In this talk, Tunkelang reviews the track record of enterprise search. He talks about what's worked and what hasn't, especially as compared to web search. Finally, he proposes some paths to bring us closer to our dream.
Big Data, We Have a Communication Problem
by Daniel Tunkelang
Presented on April 30, 2013 at the TTI/Vanguard Conference on Ginormous Systems
http://www.ttivanguard.com/conference/2013/ginormous.html
It's a cliché that we live in a world of Big Data. But the bottleneck in understanding data is not computational. Rather, the biggest challenge is designing technical solutions that effectively leverage human cognitive ability. Data analysis systems should augment people's capabilities rather than replace them. This argument is as old as computer science itself: in 1962, Doug Engelbart said that the goal of technology is “the enhancement of human intellect by increasing the capability of a human to approach a complex problem situation.” Algorithms extract signal from raw data, but people fill in the gaps, creating models and evaluating analyses.
Empowering people to understand data is not just a surface problem of building better interfaces and visualizations. We need to interact with data not only after performing computational analysis, but throughout the analysis process in order to improve our models and algorithms. In order to do so, we need tools and processes specifically designed to offer people transparency, guidance, and control.
Human-computer information retrieval has been revolutionizing our approach to information seeking -- no modern search engine limits users to black-box relevance ranking and ten blue links. We need to take similar steps in our analysis of big data, making people the center of the analysis process and developing the technical innovations that enable people to fulfill this role.
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
Information, Attention, and Trust: A Hierarchy of Needs
Presented by Daniel Tunkelang, LinkedIn Director of Data Science, at Stanford's 2nd annual conference on Computational Social Science (CSS), hosted by Institute for Research in the Social Sciences (IRiSS).
Details at https://iriss.stanford.edu/css/conference-agenda-2013
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Content, Connections, and Context
Daniel Tunkelang, LinkedIn
Keynote at Workshop on Recommender Systems and the Social Web
At 6th ACM International Conference on Recommender Systems (RecSys 2012)
Recommender systems for the social web combine three kinds of signals to relate the subject and object of recommendations: content, connections, and context.
Content comes first - we need to understand what we are recommending and to whom we are recommending it in order to decide whether the recommendation is relevant. Connections supply a social dimension, both as inputs to improve relevance and as social proof to explain the recommendations. Finally, context determines where and when a recommendation is appropriate.
I'll talk about how we use these three kinds of signals in LinkedIn's recommender systems, as well as the challenges we see in delivering social recommendations and measuring their relevance.
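As an illustrative sketch only (not LinkedIn's actual implementation), a recommender that relates the three signal families might combine per-candidate scores with a weighted blend. The candidate names, signal values, and weights below are invented for illustration; real systems typically learn such weights from training data rather than hand-tuning them.

```python
def blend_score(signals: dict, weights: dict) -> float:
    """Weighted linear blend of signal scores, each assumed normalized to [0, 1]."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

# Illustrative weights -- in practice these would be learned from data.
WEIGHTS = {"content": 0.5, "connections": 0.3, "context": 0.2}

candidates = {
    "job_posting_a": {"content": 0.9, "connections": 0.4, "context": 0.8},
    "job_posting_b": {"content": 0.6, "connections": 0.9, "context": 0.5},
}

# Rank candidates by blended score, highest first.
ranked = sorted(candidates, key=lambda c: blend_score(candidates[c], WEIGHTS), reverse=True)
```

Even this toy version shows the design tension the talk describes: a candidate strong on content can be overtaken by one with stronger social proof, depending on how the signals are weighted.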
Keynote at 2012 Semantic Technology and Business Conference
Scale, Structure, and Semantics
Daniel Tunkelang, LinkedIn
Science fiction has a mixed track record when it comes to anticipating technological innovations. While Jules Verne fared well with his predictions of submarine and space technology, artificial intelligence hasn't produced anything like Arthur C. Clarke's HAL 9000.
Instead, we've managed to elicit intelligence from machines through unexpected means. Search engines have achieved remarkable success in organizing the world's information by crawling the web, indexing documents, and exploiting link structure to establish authoritativeness. At LinkedIn, we apply large-scale analytics to terabytes of semistructured data to deliver products and insights that serve our 150M+ members. Semantics emerge when we apply the right analytical techniques to a sufficient quality and quantity of data.
In this talk, I will describe how LinkedIn's huge and rich graph of relationship data powers the products our users love. I believe that the lessons we have learned apply broadly to other semantic applications. While quantity and quality of data are the key challenges to delivering a semantically rich experience, the solution is to create the right ecosystem that incents people to give you good data, which then forms the basis for great data products.
Presentation from O'Reilly Strata 2012 on Big Data
Humans, Machines, and the Dimensions of Microwork
Daniel Tunkelang (LinkedIn)
Claire Hunsaker (Samasource)
The advent of crowdsourcing has wildly expanded the ways we think of incorporating human judgments into computational workflows. Computer scientists, economists, and sociologists have explored how to effectively and efficiently distribute microwork tasks to crowds and use their work as inputs to create or improve data products. Simultaneously, crowdsourcing providers are exploring the bounds of mechanical QA flows, worker interfaces, and workforce management systems.
But what tasks should be performed by humans rather than algorithms? And what makes a set of human judgments robust? Quantity? Consensus? Quality or trustworthiness of the workers? Moreover, the robustness of judgments depends not only on the workers, but on the task design. Effective crowdsourcing is a cooperative endeavor.
In this talk, we will analyze various dimensions of microwork that characterize applications, tasks, and crowds. Drawing on our experience at companies that have pioneered the use of microwork (Samasource) and data science (LinkedIn), we will offer practical advice to help you design crowdsourcing workflows to meet your data product needs.
These slides are from a tutorial at the 5th ACM International Conference on Recommender Systems (RecSys 2011).
Recommender systems aim to provide users with products or content that satisfy the users' stated or inferred needs. The primary evaluation measures for recommender systems emphasize either the perceived relevance of the recommendations or the actions associated with those recommendations (e.g., purchases or clicks). Unfortunately, this transactional emphasis neglects how users interact with recommendations in the context of information seeking tasks. The effectiveness of this interaction determines the user's experience beyond a single transaction. This tutorial explores the role of recommendations as part of a conversation between the user and an information seeking system. The tutorial does not require any special background in interfaces or usability, and will focus on practical techniques to make recommender systems most effective for users.
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
Daniel Tunkelang (LinkedIn)
LinkedIn operates the world's largest professional network on the Internet with more than 100 million members in over 200 countries. In order to connect its users to the people, opportunities, and content that best advance their careers, LinkedIn has developed a variety of algorithms that surface relevant content, offer personalized recommendations, and establish topic-sensitive reputation -- all at a massive scale. In this talk, I will discuss some of the most challenging technical problems we face at LinkedIn, and the approaches we are taking to address them.
Note: This talk was presented at the Carnegie Mellon University School of Computer Science Intelligence Seminar on September 20, 2011. As of May 2013, LinkedIn has over 225 million members.
The War on Attention Poverty: Measuring Twitter Authority
As social networks like Facebook and Twitter have grown in popularity, we've had ample opportunity to appreciate Herb Simon's admonition that "a wealth of information creates a poverty of attention". Since there is no way we can hope to follow all of the information being shared by our social networks, we need some filtering or ranking mechanism.
A broad class of approaches involves determining which authors are the most authoritative or influential. There are already a variety of proposed authority measures, as well as research on their effectiveness. In this talk, I will review the various attempts that have been made to measure Twitter authority. In particular, I will discuss the work on TunkRank, a measure inspired by PageRank that explicitly models attention scarcity.
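A minimal sketch of a TunkRank-style computation follows, assuming the commonly cited recurrence: a user's influence sums, over each follower, that follower's attention (diluted by how many accounts the follower follows) plus a retweet probability p times the follower's own influence. The fixed iteration count, graph encoding, and parameter value are my simplifications for illustration, not the published algorithm verbatim.

```python
def tunkrank(followers, p=0.05, iterations=50):
    """Iteratively compute TunkRank-style influence scores.

    followers: dict mapping each user to the set of users who follow them.
    p: assumed probability that a follower passes attention along (retweets).
    Influence(X) = sum over followers F of (1 + p * Influence(F)) / |Following(F)|
    """
    # Derive how many accounts each user follows from the followers map.
    following_count = {u: 0 for u in followers}
    for u, fans in followers.items():
        for f in fans:
            following_count[f] = following_count.get(f, 0) + 1

    influence = {u: 0.0 for u in followers}
    for _ in range(iterations):
        influence = {
            u: sum((1 + p * influence[f]) / following_count[f] for f in fans)
            for u, fans in followers.items()
        }
    return influence

# Toy graph: bob and carol follow alice; carol also follows bob.
graph = {"alice": {"bob", "carol"}, "bob": {"carol"}, "carol": set()}
scores = tunkrank(graph)
```

The division by each follower's following count is what models attention scarcity: a follower who follows thousands of accounts confers far less influence than one who follows only a few.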
Design for Interaction
by Daniel Tunkelang, Chief Scientist of Endeca
An invited presentation at SIGMOD '09 (http://sigmod09.org/)
Research in information retrieval has focused on presenting the most relevant results to a user in response to a free-text search query. Research in database systems assumes a model where the user enters a formal query, and the results are exactly those the user requested. Neither community has emphasized user interaction—a critical concern for practical information access.
As William Goffman noted in the 1960s and Nick Belkin continually reminds us today, the relationship between a document and query, though necessary, is not sufficient to determine relevance—yet ranked retrieval approaches rely heavily or exclusively on this relationship. Meanwhile, recent work on database usability by Jeff Naughton and H.V. Jagadish surfaces the rigidity of database systems that return nothing unless users know how to formulate precise queries.
This talk presents human-computer information retrieval (HCIR) as a general approach that addresses some of the key challenges facing both research communities. A vision first put forward by Gary Marchionini, HCIR expects people and systems to work together to implement information access. Such an approach requires rethinking information access not as a matching or ranking problem, but rather as a communication problem. Specifically, we need interfaces that optimize the bidirectional communication between the user and the system, thus optimizing the symbiotic division of labor between the two.
This talk reviews the history of HCIR efforts and presents ongoing work to implement the HCIR vision. In particular, it presents an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration.
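The interactive set retrieval idea can be sketched in a few lines: instead of returning only a ranked list, respond to a query with the matching set plus facet counts that summarize the current context and offer organized options for refinement. The toy corpus, field names, and matching rule below are invented for illustration; a real engine would use an inverted index and richer matching.

```python
from collections import Counter

# Toy corpus with illustrative facet fields.
DOCS = [
    {"title": "faceted search", "topic": "IR", "year": 2009},
    {"title": "query understanding", "topic": "IR", "year": 2013},
    {"title": "graph drawing", "topic": "algorithms", "year": 1999},
]

def set_retrieval(query, docs=DOCS):
    """Return the matching set plus facet counts that both summarize the
    result set and serve as refinement options for incremental exploration."""
    matches = [d for d in docs if query in d["title"] or query == d["topic"]]
    facets = {
        "topic": Counter(d["topic"] for d in matches),
        "year": Counter(d["year"] for d in matches),
    }
    return matches, facets
```

Calling `set_retrieval("IR")` returns two documents along with facet counts over topic and year; each facet value doubles as a clickable refinement that narrows the set, which is the bidirectional communication the talk advocates.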
Enterprises are awash in textual documents that represent valuable information assets. The limited access of conventional search interfaces, however, prevents enterprises from unlocking this value. This presentation offers:
* An expert guide to how richer interfaces enable exploration and discovery and how these typically rely on content enrichment techniques that can be unreliable, labor-intensive, or both. It is essential to maximize the effectiveness of content enrichment, not only to achieve the desired value, but also to incent organizations to make the necessary investment.
* Useful insight about content enrichment approaches that have demonstrated success in supporting exploration and discovery.
* Insight into both the enrichment techniques and the ways they are used to enable exploratory search.
Daniel Tunkelang, Chief Scientist, Endeca
Exploring Semantic Means
Daniel Tunkelang, Chief Scientist of Endeca
Endeca is a leading provider of enterprise information access. While Endeca is not a “semantic web” company (we’re more of an XML / XQuery shop), we share Tim Berners-Lee’s dream of exposing the semantic content of data to reduce the tedious and brittle processes that people use today in order to meet their information needs, whether on the web or in the enterprise. Our emphasis is on exploratory search, as contrasted with the “10 blue links” approach that characterizes conventional search engines. Come join us for a tour of how we are enabling a conversation between humans and data through content enrichment and set-oriented retrieval and analysis.
This presentation outlines the principles of information seeking as a dialogue and walks through concrete examples that illustrate the principles of human-computer information retrieval (HCIR). The foundation is an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration. Contextual summaries of document sets optimize the system's communication with the user, while query refinement options optimize the user's communication with the system.
By enabling bidirectional communication between the user and the system, we can address the inherent limitations of best-match approaches.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Find and be Found: Information Retrieval at LinkedIn
1. Recruiting Solutions
Find and be Found: Information Retrieval at LinkedIn
Shakti Sinha, Head, Search Relevance
Daniel Tunkelang, Head, Query Understanding
15. Query tagging: key to query understanding.
§ Using human judgments to evaluate tag precision.
– Extremely accurate (> 99%) for identifying person names.
– Harder to distinguish company vs. title vs. skill (e.g., oracle dba).
§ Comparing CTR for tag matches vs. non-matches.
– The difference can be large enough to suggest filtering on the tag rather than just using it for ranking.
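The tagging step above can be sketched with a minimal dictionary-based tagger. This is an illustrative assumption, not LinkedIn's implementation (which would use statistical sequence models over far larger entity dictionaries); the entity lists and tag names here are made up, but the `oracle dba` ambiguity the slide mentions falls out naturally:

```python
# Minimal dictionary-based query tagger (illustrative; entity lists are made up).
# Real systems use statistical sequence models, not plain lookups.
ENTITY_DICTS = {
    "person": {"daniel tunkelang", "shakti sinha"},
    "company": {"oracle", "linkedin"},
    "title": {"dba", "software engineer"},
    "skill": {"information retrieval", "oracle"},  # "oracle" is ambiguous: company or skill
}

def tag_query(query):
    """Return every (span, tag) pair whose span appears in an entity dictionary."""
    words = query.lower().split()
    tags = []
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            span = " ".join(words[i:j])
            for tag, entries in ENTITY_DICTS.items():
                if span in entries:
                    tags.append((span, tag))
    return tags

print(tag_query("oracle dba"))
# "oracle" matches both company and skill, the ambiguity the slide calls out.
```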
16. Detecting navigational vs. exploratory queries.
§ Pre-retrieval signal: the sequence of query tags.
§ Post-retrieval signal: the distribution of result scores / features.
§ Click behavior: title searches are >50x more likely to get 2+ clicks than name searches.
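The pre- and post-retrieval signals above can be combined into a simple rule-of-thumb classifier. This is a sketch under assumptions: the tag names and thresholds are illustrative, not LinkedIn's actual values.

```python
# Sketch: navigational vs. exploratory classification combining a
# pre-retrieval signal (tag sequence) and a post-retrieval signal
# (score distribution). Tag names and thresholds are illustrative.
import statistics

def classify(query_tags, result_scores):
    # Pre-retrieval: a bare person-name query is usually navigational.
    if query_tags == ["first_name", "last_name"]:
        return "navigational"
    # Post-retrieval: one result scoring far above the rest suggests navigational.
    if len(result_scores) >= 2:
        top, rest = result_scores[0], result_scores[1:]
        if top > statistics.mean(rest) + 3 * statistics.pstdev(rest):
            return "navigational"
    return "exploratory"

print(classify(["title"], [0.90, 0.88, 0.85, 0.84]))  # flat scores -> exploratory
```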
17. Query expansion for exploratory queries.
Example query: software patent lawyer
§ Query expansions derived from reformulations.
– e.g., lawyer -> attorney
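Mining expansions from reformulations can be sketched as counting term substitutions across consecutive queries in a session and keeping the frequent ones. The pair counts and threshold below are illustrative assumptions:

```python
# Toy query expansion mined from reformulation pairs (counts are illustrative).
from collections import Counter

# (original term, reformulated term) pairs mined from consecutive session queries.
reformulations = ([("lawyer", "attorney")] * 90
                  + [("lawyer", "laywer")] * 5      # noise: rare pair, filtered out
                  + [("patent", "ip")] * 40)

def build_expansions(pairs, min_count=10):
    """Keep only substitutions seen at least min_count times."""
    expansions = {}
    for (src, dst), n in Counter(pairs).items():
        if n >= min_count:
            expansions.setdefault(src, []).append(dst)
    return expansions

def expand(query, expansions):
    """Return, per query term, the term plus its mined alternatives."""
    return [[t] + expansions.get(t, []) for t in query.split()]

exp = build_expansions(reformulations)
print(expand("software patent lawyer", exp))
# [['software'], ['patent', 'ip'], ['lawyer', 'attorney']]
```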
18. Understanding misspelled queries.
§ daniel tankalong -> Did you mean daniel tunkelang?
§ infomation retrieval -> Did you mean information retrieval?
§ marisa meyer -> Did you mean marissa mayer?
§ ingenero eletrico -> Did you mean ingeniero electrico?
§ jonathan podemsky -> Did you mean johnathan podemsky?
§ desenista industrail -> Did you mean desenhista industrial?
19. Spelling out the details.
The spelling-correction index is built from several sources:
§ entity data – people, companies
§ successful queries – e.g., tunkelang
§ reformulations – e.g., marisa => marissa
§ n-grams – e.g., dublin => du ub bl li in
§ metaphones – e.g., mark/marc => MRK
§ word pairs – e.g., johnathan podemsky
Example: the misspelled query marisa meyer yoohoo matches index entries marissa/marisa, meyer/mayer, and yahoo/yoohoo.
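Of the signals above, the n-gram one is easy to sketch: break words into character bigrams (as in the slide's "du ub bl li in" example) and rank vocabulary words by bigram overlap. This is only one of the index's signals and the similarity measure and threshold are my assumptions:

```python
# Sketch: generating "did you mean" candidates by character-bigram overlap.
# The slide's index also uses entity data, successful queries, reformulations,
# metaphone keys, and word pairs; this sketch shows only the n-gram signal.
def bigrams(word):
    padded = f"_{word}_"  # pad so first/last letters contribute edge bigrams
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def candidates(term, vocabulary, threshold=0.4):
    """Rank vocabulary words by Jaccard similarity of character bigrams."""
    tb = bigrams(term)
    scored = []
    for w in vocabulary:
        wb = bigrams(w)
        sim = len(tb & wb) / len(tb | wb)
        if sim >= threshold:
            scored.append((sim, w))
    return [w for sim, w in sorted(scored, reverse=True)]

vocab = {"marissa", "mayer", "yahoo", "tunkelang", "retrieval"}
print(candidates("marisa", vocab))  # -> ['marissa']
```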
23. Relevant results can be in or out of network.
§ Searcher's network matters for relevance.
– Within-network results have higher CTR.
§ But the network is not enough.
– About two thirds of search clicks come from out-of-network results.
24. Personalized machine-learned ranking.
§ Each data point is a triple (searcher, query, document).
– Searcher features are important!
§ Labels: is this document relevant to the query and the user?
– Depends on the user's network, location, etc.
– Too much to ask a random person to judge.
§ Training data has to be collected from search logs.
25. Search log data has biases.
§ Presentation bias
– Results shown higher tend to get clicked more often.
– Use FairPairs [Radlinski and Joachims, AAAI '06]: randomly flip adjacent result pairs and derive training data from clicks within those pairs.
[Figure: a ranked result list with some adjacent pairs flipped; clicks within pairs become training data.]
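The FairPairs idea can be sketched as follows. This is a simplified reading of the technique, assuming clicks on the lower element of a randomized pair count as preference evidence; consult the cited paper for the exact procedure:

```python
# Sketch of FairPairs [Radlinski & Joachims, AAAI '06]: randomly swap adjacent
# result pairs at presentation time, then learn relative preferences only
# within pairs, which removes the bias of the original ranking order.
import random

def fairpairs_present(results, rng):
    """Group results into adjacent pairs; flip each pair with probability 0.5."""
    presented, pairs = [], []
    for i in range(0, len(results) - 1, 2):
        a, b = results[i], results[i + 1]
        upper, lower = (b, a) if rng.random() < 0.5 else (a, b)
        presented += [upper, lower]
        pairs.append((upper, lower))
    return presented, pairs

def preferences(pairs, clicked):
    """A click on the lower result of a pair is evidence it beats the upper one."""
    return [(lower, upper) for upper, lower in pairs
            if lower in clicked and upper not in clicked]
```

Because each pair's order is randomized, a click on the lower element cannot be explained by position alone, so the extracted preferences are usable as (relatively) unbiased training labels.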
26. Search log data has biases.
§ Sample bias
– Users click or skip only what is shown.
– What about low-scoring results from the existing model?
– Add low-scoring results as 'easy negatives' (label 0) so the model learns about bad results never presented to the user.
[Figure: low-ranked results from pages 1 through n added to the training data with label 0.]
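The easy-negatives idea above can be sketched as a small data-augmentation step. The field names and the way scores are represented are illustrative assumptions:

```python
# Sketch: augment click-derived training data with "easy negatives" --
# low-scoring results the user never saw, labeled 0 so the model also
# learns what bad results look like. Field names are illustrative.
def build_training_data(shown, clicked, unshown_scores, n_easy_negatives=2):
    examples = []
    for doc in shown:  # standard click-based labels for presented results
        examples.append({"doc": doc, "label": 1 if doc in clicked else 0,
                         "source": "log"})
    # Take the lowest-scoring unshown results as easy negatives.
    worst_first = sorted(unshown_scores, key=unshown_scores.get)
    for doc in worst_first[:n_easy_negatives]:
        examples.append({"doc": doc, "label": 0, "source": "easy_negative"})
    return examples

data = build_training_data(
    shown=["d1", "d2", "d3"],
    clicked={"d2"},
    unshown_scores={"d9": 0.01, "d8": 0.05, "d7": 0.20},
)
```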
28. How to train your model.
§ Train simple models to resemble complex ones.
– Build an Additive Groves model [Sorokina et al., ECML '07], which is good at detecting interactions.
§ Build a tree with logistic regression leaves.
§ By restricting the tree to user and query features, only one regression model is evaluated per document.
[Figure: a decision tree with internal splits on user/query features (e.g., x2 = ?, x10 < 0.1234) and logistic-regression leaves of the form β0 + β1·T(x1) + ... + βn·xn.]
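The tree-with-regression-leaves structure can be sketched as follows. The tree shape, feature meanings, and weights are illustrative assumptions; the point is that the tree walk uses only user/query ("context") features, so for a given search one leaf is selected once and scoring each document costs a single regression evaluation:

```python
# Sketch: a decision tree whose splits use only user/query features, with a
# logistic regression at each leaf. For one (user, query), the leaf is chosen
# once, so scoring each candidate document costs just one dot product.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LeafLR:
    """Logistic-regression leaf over document features."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias
    def score(self, doc_features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, doc_features))
        return sigmoid(z)

class Node:
    """Internal split on a user/query ('context') feature."""
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

def select_leaf(node, context_features):
    """Walk the tree using context features only; returns a LeafLR."""
    while isinstance(node, Node):
        node = node.left if context_features[node.feature] < node.threshold else node.right
    return node

# Illustrative tree: split on one query feature (e.g., "query looks like a name").
tree = Node(feature=0, threshold=0.5,
            left=LeafLR(weights=[1.0, 0.0], bias=-1.0),   # name-like queries
            right=LeafLR(weights=[0.2, 0.8], bias=0.0))   # other queries

leaf = select_leaf(tree, context_features=[0.9])          # goes right
scores = [leaf.score(d) for d in ([0.5, 0.5], [0.1, 0.9])]
```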
29. Take-Aways
§ LinkedIn's search problem is unique because of the deep role of personalization: users are an integral part of the corpus.
§ Query understanding allows us to optimize for entity-oriented search against semi-structured content.
§ Ranking requires us to contextually apply global and personalized user, query, and document features.