This document discusses using neural language models and embeddings to improve search relevance and understand document "aboutness". It describes how word embeddings map words with similar meanings, which appear in similar contexts, to nearby points in a vector space. Document embeddings can likewise represent full texts. However, embeddings derived from corpus data may not match searchers' language. The document then covers using recurrent neural networks (RNNs) as language models to predict likely next words based on context, which could help characterize a document or suggest additional contextually relevant terms. Finally, the document discusses using sequence-to-sequence models to "translate" documents into queries.
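The vector-space idea in this summary can be made concrete in a few lines. Below is a minimal sketch using hypothetical 4-dimensional toy vectors (a real word2vec-style model learns hundreds of dimensions from corpus co-occurrence, and these particular words and values are invented for illustration): words used in similar contexts end up with nearby vectors, and cosine similarity recovers that neighborhood.

```python
import math

# Hypothetical toy embeddings; real models learn these from corpus co-occurrence.
embeddings = {
    "dress":  [0.9, 0.8, 0.1, 0.0],
    "gown":   [0.8, 0.9, 0.2, 0.0],
    "tuxedo": [0.7, 0.3, 0.2, 0.1],
    "laptop": [0.0, 0.1, 0.9, 0.8],
}

def cosine(u, v):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def neighbors(word):
    # Rank every other vocabulary word by similarity to `word`.
    return sorted(
        (w for w in embeddings if w != word),
        key=lambda w: cosine(embeddings[word], embeddings[w]),
        reverse=True,
    )

print(neighbors("dress"))  # "gown" ranks ahead of "laptop"
```

A search engine can use exactly this ranking to expand a query with terms the document collection actually uses, even when the searcher's wording differs.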
Top-Down? Bottom Up? A Survey of Hierarchical Design Methodologies - Trent McConaghy
How do you optimize, synthesize, or evolve a design that has 10 thousand parts? 10 billion? Doing it flat could easily fail.
There is a way! In many practical cases, we can recursively decompose the problem into many sub-problems. We then solve each sub-problem and stitch it together to solve the main problem. We can do this in a well-structured fashion using a hierarchical design methodology. There are top-down and bottom-up variants; both achieve remarkable results. This talk explores those approaches. Could this be how nature scales?
Video:
T. McConaghy, "Top Down? Bottom Up? A Survey of Hierarchical Design Methodologies", Machine Learning Group Berlin, Berlin, Feb. 26, 2018
https://www.youtube.com/watch?v=FvxwIplXBQw.
Original paper:
G. G. E. Gielen, T. McConaghy, T. Eeckelaert, "Performance space modeling for hierarchical synthesis of analog integrated circuits", in Proc. Design Automation Conference (DAC), pp. 881-886, June 13-17, 2005.
http://trent.st/content/2005_DAC_hierarchy.pdf
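The decompose/solve/stitch recursion described in the abstract above can be sketched schematically. The example below is deliberately a toy: "solving a leaf flat" is stood in for by sorting a small list and "stitching" by merging, but the recursive shape is the point — decompose until a sub-problem is small enough to solve flat, then combine the sub-solutions upward.

```python
def solve(parts, leaf_size=4):
    """Hierarchically solve a problem over `parts`.

    Toy stand-ins: a leaf is 'solved' by sorting it, and two solved
    halves are 'stitched' by merging. In a real design flow these would
    be a flat synthesis step and a composition step.
    """
    if len(parts) <= leaf_size:          # base case: small enough to solve flat
        return sorted(parts)
    mid = len(parts) // 2                # decompose into two sub-problems
    left = solve(parts[:mid], leaf_size)
    right = solve(parts[mid:], leaf_size)
    merged, i, j = [], 0, 0              # stitch the sub-solutions together
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(solve([9, 2, 7, 4, 8, 1, 6, 3, 5, 0]))
```

The depth of the recursion grows only logarithmically with the number of parts, which is why the hierarchical approach can reach designs with billions of parts where a flat solve fails.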
Deep Learning for Natural Language Processing - Jonathan Mugan
Deep Learning represents a significant advance in artificial intelligence because it enables computers to represent concepts using vectors instead of symbols. Representing concepts using vectors is particularly useful in natural language processing, and this talk will elucidate those benefits and provide an understandable introduction to the technologies that make up deep learning. The talk will outline ways to get started in deep learning, and it will conclude with a discussion of the gaps that remain between our current technologies and true computer understanding.
An Introduction to the Emerging JSON-Based Identity and Security Protocols (O... - Brian Campbell
A short technical introduction, presented at an OWASP Vancouver chapter meeting, to some aspects of JOSE (JWS, JWE, and JWK) as well as JSON Web Token (JWT).
JSON Web Token (JWT) is emerging as the go-to format for security tokens in next-generation identity systems. This talk will provide a technical overview of JWT and its underpinnings, the JavaScript Object Signing and Encryption (JOSE) suite of specifications, and equip you with the knowledge and skills needed to talk about and use JWT and JOSE effectively. All the cool kids are doing it, and JWT+JOSE recently won a Special European Identity Award for Best Innovation for Security in the API Economy at the 2014 European Identity & Cloud Conference.
A technical overview of JSON Web Token (JWT) and its JOSE underpinnings, which are poised to be the next generation identity token, as well as a look at using one open source implementation (jose4j).
Also some (bad) jokes.
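To make the JWT structure concrete, here is a hand-rolled sketch of the JWS compact serialization that a JWT rides on: base64url-encode a JSON header and payload, sign `header.payload` with HMAC-SHA256 (the "HS256" algorithm), and append the base64url signature. The key and claims below are made up, and production code should use a vetted library such as jose4j (mentioned above) or PyJWT rather than this illustration.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JOSE uses base64url without '=' padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_jwt(claims: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def verify_jwt(token: str, key: bytes) -> dict:
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    payload_b64 = signing_input.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore padding for decoding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

key = b"a-demo-key-not-for-production"
token = make_jwt({"sub": "alice", "iss": "example"}, key)
print(token.count("."))  # compact serialization: three dot-separated parts
print(verify_jwt(token, key)["sub"])  # alice
```

Note that the payload is only encoded, not encrypted: anyone holding the token can read the claims, and only the signature check requires the key.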
Better Machine Learning with Less Data - Slater Victoroff (Indico Data) - Shift Conference
Deep learning has been making headlines around the world for its unparalleled performance in everything from recognizing cat videos to playing Go. However, deep learning remains well beyond the reach of average developers and researchers due to its massive compute and data requirements. Transfer learning is an old field of study going through a renaissance that promises to change that by re-using portions of deep network models to drastically reduce both compute and data requirements. Come learn about some traditional transfer learning techniques such as word2vec and newer technologies like GPT and BERT.
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics - Jason Anderson
Meetup Link: https://www.meetup.com/Cognitive-Computing-Enthusiasts/events/250444108/
Recording Link: https://www.youtube.com/watch?v=4uXg1KTXdQc
When developing a machine learning system, the possibilities are limitless. However, with the recent explosion of Big Data and AI, there are more options than ever to filter through: which technologies to select, which model topologies to build, and which infrastructure to use for deployment, just to name a few. We have explored these options for our faceted refinement system for video content (consisting of 100K+ videos), along with their many roadblocks. Three primary areas of focus are natural language processing, video frame sampling, and infrastructure deployment.
The BioMoby Semantic Annotation Experiment - Mark Wilkinson
My presentation to the Canadian Semantic Web Workshop, Kelowna BC, June 2009.
The slideshow describes a portion of Dr. Benjamin Good's PhD thesis work, in which he examines the quality of annotation produced by an open tagging process when the tags are constrained by a controlled vocabulary. The target for annotation was a set of BioMoby Semantic Web Services.
Now is the time to unravel the mysteries of cloud computing. In this session, part of Study Jams 2023, the speaker gives a briefing on Generative AI.
Nature is the ultimate complex system. Nature 1.0 is seeds & soil. *Evolving.* Nature 2.0 adds silicon & steel. *Evolving.*
Presented to Complex Systems Group, Stanford University, on May 4, 2018.
This is a brief introduction to text analysis given at the April, 2020 meeting of the Ann Arbor Statistical Association. It covers sentiment scoring using TextBlob, and basic use of the NLTK module. All work is done in Python. The Jupyter notebook and .csv data set are in: https://drive.google.com/open?id=1o90unxJQrm0i3iycCJAqFpePGOLnovvg
Introduction to NLP with some practical exercises (tokenization, keyword extraction, topic modelling) using Python libraries like NLTK, Gensim and TextBlob, plus a general overview of the field.
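As a taste of what the tokenization and keyword-extraction exercises look like, here is a dependency-free sketch of the simplest possible pipeline: lowercase, tokenize on word characters, drop stopwords, and rank by frequency. Libraries like NLTK, Gensim, and TextBlob replace each step with something far more robust (trained tokenizers, TF-IDF weighting, topic models); the stopword list here is a tiny illustrative subset, and the sample document is invented.

```python
import re
from collections import Counter

# Tiny illustrative subset; real stopword lists (e.g. NLTK's) are much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

def tokenize(text):
    # Lowercase and split on runs of word characters.
    return re.findall(r"[a-z0-9']+", text.lower())

def keywords(text, k=3):
    # Rank non-stopword tokens by raw frequency.
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]

doc = ("Search relevance depends on understanding queries. "
       "Relevance tuning aligns search results with the intent behind queries.")
print(keywords(doc))
```

Raw frequency is a crude relevance signal; the usual next step in these exercises is TF-IDF, which downweights words that are frequent across the whole corpus.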
Modern Machine Learning (ML) represents everything as vectors, from documents, to videos, to user behavior. This representation makes it possible to accurately search, retrieve, rank, and classify different items by similarity and relevance.
Running real-time applications that rely on large numbers of such high dimensional vectors requires a new kind of data infrastructure. In this talk we will discuss the need for such infrastructure, the algorithmic and engineering challenges in working with vector data at scale, and open problems we still have no adequate solutions for.
Time permitting, I will introduce Pinecone as a solution to some of these challenges.
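The retrieval primitive behind all of this is k-nearest-neighbor search over embedding vectors. A brute-force version fits in a few lines (sketched below in pure Python with made-up document vectors); the reason dedicated infrastructure exists is that this linear scan stops being viable at millions of high-dimensional vectors, which is where approximate nearest-neighbor indexes come in.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_k(query, items, k=2):
    # Exact brute-force scan: score every stored vector against the query.
    # Cost is O(n * d); ANN indexes trade a little recall for sub-linear time.
    scored = sorted(
        items.items(),
        key=lambda kv: dot(query, kv[1]) / (math.sqrt(dot(kv[1], kv[1])) or 1.0),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

items = {  # hypothetical document vectors
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.7, 0.6, 0.2],
    "doc-c": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], items))
```

Everything a vector database adds — sharding, filtering, index maintenance under live updates — exists to make this one operation fast and fresh at scale.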
Learn how to get small teams to embrace both using and contributing to open source in a healthy way. See case studies of open source projects and how they have contributed to the growth of a small team.
Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017 - MLconf
Representation Learning @ Red Hat:
For many companies, the vast majority of their data is unstructured and unlabeled; however, the data often contains information that could be useful in a variety of scenarios. Representation learning is the process of extracting meaningful features from unlabeled data so that it can be used in other tasks. In this talk, you’ll hear about how Red Hat is using deep learning to discover meaningful entity representations in a number of different settings, including: (1) identifying duplicate documents on the Customer Portal, (2) finding contextually similar URLs with word2vec, and (3) clustering behaviorally similar customers with doc2vec. To close, we will walk through an example demonstrating how representation learning can be applied to Major League Baseball players.
Bio: Michael first developed his data crunching chops as an undergraduate at Auburn University (War Eagle!) where he used a number of different statistical techniques to investigate various aspects of salamander biology (work that led to several publications). He then went on to earn a M.S. in evolutionary biology from The University of Chicago (where he wrote a thesis on frog ecomorphology) before changing directions and earning a second M.S. in computer science (with a focus on intelligent systems) from The University of Texas at Dallas. As a Machine Learning Engineer – Information Retrieval at Red Hat, Michael is constantly looking for ways to use the latest and greatest machine learning technology to improve search.
Search is the Tip of the Spear for Your B2B eCommerce Strategy - Lucidworks
With ecommerce experiencing explosive growth, it seems intuitive that the B2B segment of that ecosystem is mirroring the same trajectory. That said, B2B has very different needs when it comes to transacting with the same style of experiences that we see in B2C. For instance, B2B ecommerce is about precision findability, whereas B2C customers can convert at higher rates when they’re just browsing online. In order for the B2B buying experience to be successful, search needs to be tuned to meet the unique needs of the segment.
In this webinar with Forrester senior analyst Joe Cicman, you’ll learn:
-Which verticals in B2B will drive the most growth, and how machine-learning powered personalization tactics can be deployed to support those specific verticals
-Why an omnichannel selling approach must be deployed in order to see success in B2B
-How deploying content search capabilities will support a longer sales cycle at scale
-What the next steps are to support a robust B2B commerce strategy supported by new technology
Speakers
Joe Cicman, Senior Analyst, Forrester
Jenny Gomez, VP of Marketing, Lucidworks
Customer loyalty starts with quickly responding to your customer’s needs. When it comes to resolving open support cases, time is of the essence. Time spent searching for answers adds up and creates inefficiencies in resolving cases at scale. Relevant answers need to be a few clicks away and easily accessible for agents directly from their service console.
We will explore how Lucidworks’ Agent Insights application automatically connects agents with the correct answers and resources. You’ll learn how to:
-Configure a proactive widget in an agent’s case view page to access resources across third-party systems (such as Sharepoint, Confluence, JIRA, Zendesk, and ServiceNow).
-Easily set up query pipelines to autonomously route assets and resources that are relevant to the case-at-hand—directly to the right agent.
-Identify subject matter experts within your support data and access tribal knowledge with lightning-fast speed.
Similar to The Neural Search Frontier - Doug Turnbull, OpenSource Connections
How Crate & Barrel Connects Shoppers with Relevant Products - Lucidworks
Lunch and Learn during Retail TouchPoints #RIC21 virtual event.
***
Crate & Barrel’s previous search solution couldn’t provide its shoppers with an online search and browse experience consistent with the customer-centric Crate & Barrel brand. Meanwhile, Crate & Barrel merchandisers spent the bulk of their time manually creating and maintaining search rules. The search experience impacted customer retention, loyalty, and revenue growth.
Join this lunch & learn for an interactive chat on how Crate & Barrel partnered with Lucidworks to:
-Improve search and browse by modernizing the technology stack with ML-based personalization and merchandising solutions
-Enhance the experience for both shoppers and merchandisers
-Explore signals to transform the omnichannel shopping experience
Questions? Visit https://lucidworks.com/contact/
Learn how to guide customers to relevant products using eCommerce search, hyper-personalisation, and recommendations in our ‘Best-In-Class Retail Product Discovery’ webinar.
Nowadays, shoppers want their online experience to be engaging, inspirational, and fulfilling. They want to find what they’re looking for quickly and easily. If the sought-after item isn’t available, they want the next best product or content surfaced to them. They want a website to understand their goals as though they were talking to a sales assistant in person, in-store.
In this webinar, we explore IMRG industry data insights and a best-in-class example of retail product discovery. You’ll learn:
- How AI can drive increased revenue through hyper-personalised experiences
- How user intent can be easily understood and results displayed immediately
- How merchandisers can be empowered to curate results and product placement – all without having to rely on IT.
Presented by:
Dave Hawkins, Principal Sales Engineer - Lucidworks
Matthew Walsh, Director of Data & Retail - IMRG
Connected Experiences Are Personalized Experiences - Lucidworks
Many companies claim personalization and omnichannel capabilities are top priorities. Few are able to deliver on those experiences.
For a recent Lucidworks-commissioned study, Forrester Consulting surveyed 350+ global business decision-makers to see what gets in the way of achieving these goals. They discovered that inefficient technology, lack of behavioral insights, and failure to tie initiatives to enterprise-wide goals are some of the most frequent blockers to personalization success.
Join guest speaker, Forrester VP and Principal Analyst, Brendan Witcher, and Lucidworks CEO, Will Hayes, to hear the results of the Forrester Consulting study, how to avoid “digital blindness,” and how to apply VoC data in real-time to delight customers with personalized experiences connected across every touchpoint.
In this webinar, you’ll learn:
- Why companies who utilize real-time customer signals report more effective personalization
- How to connect employees and customers in a shared experience through search and browse
- How Lucidworks clients Lenovo, Morgan Stanley and Red Hat fast-tracked improvements in conversion, engagement and customer satisfaction
Featuring
- Will Hayes, CEO, Lucidworks
- Brendan Witcher, VP, Principal Analyst, Forrester
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc... - Lucidworks
Intelligent Policing. Leveraging Data to more effectively Serve Communities.
Policing in the next decade is anticipated to be very different from historical methods: more data-driven, more focused on the intricacies of the communities served, and more open and collaborative, to make informed recommendations a reality. Whether it's social populations, NIBRS, or organizational improvement that's the driver, the IT requirement is largely the same: provide access to large volumes of siloed data to gain a full 360-degree understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring:
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com... - Lucidworks
Policing in the next decade is anticipated to be very different from historical methods: more data-driven, more focused on the intricacies of the communities served, and more open and collaborative, to make informed recommendations a reality. Whether it's social populations, NIBRS, or organizational improvement that's the driver, the IT requirement is largely the same: provide access to large volumes of siloed data to gain a full 360-degree understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C... - Lucidworks
Wish your conversion rates were higher? Can’t figure out how to efficiently and effectively serve all the visitors on your site? Embarrassed by the quality of your product discovery experience? The bar is high and the influx of online shopping over recent months has reminded us that the opportunities are real. We’re all deep in holiday prep, but let’s take a few minutes to think about January 2021 and beyond. How can we position ourselves for success with our customers and against our competition?
Grab your lunch and let’s dive into three strategies that need to be part of your 2021 roadmap. You don’t need an army to get there. But you do need to take action and capitalize on the shoppers abandoning the product discovery journey on your site.
In this session, attendees will find out how to:
-Take control of merchandising at scale;
-Implement hands-free search relevancy; and
-Address personalization challenges.
AI-Powered Linguistics and Search with Fusion and Rosette - Lucidworks
For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.
For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.
Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.
We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.
In this webinar, we’ll cover:
-How Fusion uses the Rosette Basic Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-Use Rosette’s packages with Fusion Pipelines to build custom entities for specific domain use cases
Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment - Lucidworks
Before COVID-19, almost 80% of the US workforce worked in service jobs that involve in-person interaction with strangers. Now, leaders of service organizations must reshape their offerings during the pandemic and prepare for whatever the new normal turns out to be. Our three panelists will share ideas for adapting their service businesses, now that closer-than-six-feet isn’t an option.
Join Lucidworks as we talk shop with 3 service business leaders, covering:
-Common impacts of the pandemic on service businesses (and what to do about them),
-How service teams can maintain a human touch across virtual channels, and
-Plans for the future, before and after the pandemic subsides.
Featuring
-Sara Nathan, President & CEO, AMIGOS
-Anthony Carruesco, Founder, AC Fly Fishing
-Sara Bradley, Chef and Proprietor, Freight House
-Justin Sears, VP Product Marketing, Lucidworks
Webinar: Smart answers for employee and customer support after covid 19 - Europe - Lucidworks
The COVID-19 pandemic has forced companies to support far more customers and employees through digital channels than ever before. Many are turning to chatbots to help meet increasing demand, but traditional rules-based approaches can’t keep up. Our new Smart Answers add-on to Lucidworks Fusion makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
Smart Answers for Employee and Customer Support After COVID-19 - Lucidworks
Watch our on-demand webinar showcasing Smart Answers on Lucidworks Fusion. This technology makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
In this webinar, we’ll cover:
-How search and deep learning extend conversational frameworks for improved experiences
-How Smart Answers improves customer care, call deflection, and employee self-service
-A live demo of Smart Answers for multi-channel self-service support
Applying AI & Search in Europe - featuring 451 Research - Lucidworks
In the current climate, it’s now more important than ever to digitally enable your workforce and customers.
Hear from Simon Taylor, VP Global Partners & Alliances, Lucidworks and Matt Aslett, Research Vice President, 451 Research to get the inside scoop on how industry leaders in Europe are developing and executing their digital transformation strategies.
In this webinar, we’ll discuss:
The top challenges and aspirations European business and technology leaders are solving using AI and search technology
Which search and AI use cases are making the biggest impact in industries such as finance, healthcare, retail and energy in Europe
What technology buyers should look for when evaluating AI and search solutions
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
In this webinar with 451 Research, you'll understand how retailers are using AI to predict customer intent and learn which key performance metrics are used by more than 120 online retailers in Lucidworks’ 2019 Retail Benchmark Survey.
In this webinar, you’ll learn:
● What trends and opportunities are facing the ecommerce industry in 2020
● Why search is the universal path to understanding customer intent
● How large online retailers apply AI to maximize the effectiveness of their personalization efforts
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
Nordstrom Rack | Hautelook curates and serves customers a wide selection of on-trend apparel, accessories, and shoes at an everyday savings of up to 75 percent off regular prices. With over a million visitors shopping across different platforms every day, and a realization that customers have become accustomed to robust and personalized search interactions, Nordstrom Rack | Hautelook launched an initiative over a year ago to provide data science-driven digital experiences to their customers.
In this session, we’ll discuss Nordstrom Rack | Hautelook’s journey of operationalizing a hefty strategy, optimizing a fickle infrastructure, and rallying troops around a single vision of building an expansible machine-learning driven product discovery engine.
The audience will learn about:
-The key technical challenges and outcomes that come with onboarding a solution
-The lessons learned of creating and executing operational design
-The use of Lucidworks Fusion to plug custom data science models into search and browse applications to understand user intent and deliver personalized experiences
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
Knowledge graphs and machine learning are on the rise as enterprises hunt for more effective ways to connect the dots between the data and the business world. With newer technologies, the digital workplace can dramatically improve employee engagement, data-driven decisions, and actions that serve tangible business objectives.
In this webinar, you will learn
-- Introduction to knowledge graphs and where they fit in the ML landscape
-- How breakthroughs in search affect your business
-- The key features to consider when choosing a data discovery platform
-- Best practices for adopting AI-powered search, with real-world examples
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days, 6.6.2024.
The talk discusses the climate impact / sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My slides, and Rik Marselis's, from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also had a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communication Mining: its importance and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
3. Relevance: challenge of 'aboutness'
Query: "bitcoin regulation"
[Diagram: Doc 1 and Doc 2 plotted by 'bitcoin' aboutness vs 'regulation' aboutness, with the query plotted as a point]
Doc 2 is more about the query's terms, so it is ranked 'higher'.
4. One hypothesis: TF*IDF vector similarity (and BM25 ...)
Query: "bitcoin regulation"
[Diagram: Doc 1 and Doc 2 plotted by 'bitcoin' TF*IDF vs 'regulation' TF*IDF, with the query plotted as a point]
Doc 2 is more 'about' the query because it has a higher concentration of the corpus's 'bitcoin' and 'regulation' terms.
5. Sadly, 'aboutness' is hard
[Chart: search quality over time. After minutes, simple TF*IDF gets you part way; reaching the possible quality ceiling takes years of relevance / feature engineering work]
Why!?
● Searcher vocab mismatches
● Searchers expect context (personal, emotional, ...)
● Different entities to match to
● Language evolves
● Misspellings, typos, etc.
● TF*IDF doesn't always matter
6. There's Something About Aboutness?
Neural Search Frontier
Hunt for the About October?
The Last Sharknado: It's *About* Time?
7. Embeddings: another vector similarity
An embedding is a vector storing information about the contexts a word appears in.
Docs:
“Have you heard the hype about bitcoin currency?”
“Bitcoin a currency for hackers?”
“Hacking on my Bitcoin server! It's not hyped”
Encode contexts: [0.6,-0.4] bitcoin
8. Easier to see as a graph
[Graph (from Desmos): 'cryptocurrency' at [0.5,-0.5] and 'bitcoin' at [0.6,-0.4] plotted as nearby points]
These two terms occur in similar contexts.
9. Document embeddings
Encode the full text as the context for the document:
"Have you heard of cryptocurrency? It's the cool currency being worked on by hackers! Bitcoin is one prominent example. Some say it's all hype..."
Encode: [0.5,-0.4] doc1234
10. Is shared 'context' the same as 'aboutness'!?
Embeddings:
[0.5,-0.5] cryptocurrency
[0.6,-0.4] bitcoin
[0.5,-0.4] doc1234
[0.4,-0.2] doc5678
Query 'bitcoin': closer docs are ranked higher:
1. doc1234
2. doc5678
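Ranking by embedding closeness can be sketched with the toy 2-d vectors from these slides, using cosine similarity (one common choice of 'closeness'):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# The toy 2-d embeddings shown on the slide
embeddings = {
    "bitcoin":        [0.6, -0.4],
    "cryptocurrency": [0.5, -0.5],
    "doc1234":        [0.5, -0.4],
    "doc5678":        [0.4, -0.2],
}

query = embeddings["bitcoin"]
ranked = sorted(["doc1234", "doc5678"],
                key=lambda d: cosine(query, embeddings[d]),
                reverse=True)
```

With these values, doc1234 lands closest to 'bitcoin' and ranks first, matching the slide's ordering.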
11. Sadly no :(
"I want to cancel my reservation"
"I want to confirm my reservation"
Words that occur in similar contexts can have opposite meanings!
A doc about canceling is not relevant for 'confirm'.
12. Fundamentally there’s still a mismatch
A crypto expert creates content in nerd-speak: "Fiat currencies have a global conspiracy against dogecoin"
The average citizen searches in lay-speak: "bit dollar?"
Embeddings are derived from the corpus, not from our searchers' language.
15. Trying to predict: a word2vec model
The 'Bitcoin' embedding [5.5, 8.1, ..., 0.6] and the 'Energy' embedding [6.0, 1.7, ..., 8.4] are each looked up in a term embedding table (initialized with random values).
Dot product ('similarity'): 5.5*6.0 + 8.1*1.7 + ... + 0.6*8.4 = 102.65
Sigmoid (forces to 0..1, a 'probability'): 0.67, somewhere between 'out of context' (0) and 'true context' (1).

Term | Other Term | In Same Context? | Model Prediction
bitcoin | energy | 1 | 0.67
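The model on this slide is just the sigmoid of a dot product. A sketch with invented 3-d embeddings (the slide's full vectors are elided, so these numbers are made up):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_same_context(term_vec, other_vec):
    """word2vec-style score: sigmoid of the dot product of two embeddings."""
    dot = sum(a * b for a, b in zip(term_vec, other_vec))
    return sigmoid(dot)

# Invented 3-d embeddings standing in for the slide's vectors
bitcoin = [0.9, -0.2, 0.4]
energy  = [0.5,  0.1, 0.3]
p = predict_same_context(bitcoin, energy)  # a 'probability' in 0..1
```

A positive dot product pushes the score above 0.5 ("probably same context"); orthogonal vectors give exactly 0.5.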
16. Tweak…
(showing the skipgram / negative sampling method)

Term | Other Term | True Context? | Prediction
bitcoin | energy | 1 | 0.67 (terms sharing context; goal: tweak 'bitcoin' to get the prediction closer to 1)
bitcoin | dongles | 0 | 0.78 (random unrelated terms; goal: tweak 'bitcoin' to get the prediction closer to 0)
bitcoin | relevance | 0 | 0.56
bitcoin | cables | 0 | 0.34

To get into the weeds on the loss function & gradient descent for word2vec: Stanford CS224D Deep Learning for NLP Lecture Notes, Chaubard, Mundra, Socher.
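The 'tweak' step can be sketched as one stochastic-gradient update of the skip-gram negative-sampling log-loss. The vectors and learning rate below are made up; the gradient of the loss with respect to the dot product is sigmoid(dot) - label, as derived in the lecture notes cited above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgns_step(center, context, label, lr=0.1):
    """One skip-gram negative-sampling update, in place.
    label=1 for a true context pair, 0 for a sampled negative."""
    dot = sum(a * b for a, b in zip(center, context))
    g = sigmoid(dot) - label  # gradient of the log-loss wrt the dot product
    for i in range(len(center)):
        c, x = center[i], context[i]
        center[i]  -= lr * g * x
        context[i] -= lr * g * c

bitcoin = [0.1, -0.3]
energy  = [0.2,  0.1]   # true context: push the prediction toward 1
dongles = [0.4,  0.4]   # sampled negative: push the prediction toward 0

before_true = sigmoid(sum(a * b for a, b in zip(bitcoin, energy)))
sgns_step(bitcoin, energy, 1)
after_true = sigmoid(sum(a * b for a, b in zip(bitcoin, energy)))

before_neg = sigmoid(sum(a * b for a, b in zip(bitcoin, dongles)))
sgns_step(bitcoin, dongles, 0)
after_neg = sigmoid(sum(a * b for a, b in zip(bitcoin, dongles)))
```

Each call nudges both embeddings so the true-context prediction rises and the negative-sample prediction falls, which is exactly the "tweak toward 1 / toward 0" in the table.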
17. Maximize Me!
F = (the model over 'true' contexts, maximized) - (the model over 'false' contexts, minimized)
The model is just the sigmoid of the dot product of the two embeddings (e.g. [5.5, 8.1, ..., 0.6] and [6.0, 1.7, ..., 8.4]): maximize it for 'true' contexts, minimize it for 'false' contexts.
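Written out, the standard skip-gram negative-sampling objective (per the Stanford CS224D notes cited on the previous slide), for one true context word o of center word c with K sampled negatives, is maximized; the sigma of the negated dot product is the term that penalizes 'false' contexts:

```latex
F = \log \sigma\!\left(u_o^{\top} v_c\right)
  + \sum_{k=1}^{K} \log \sigma\!\left(-\,u_k^{\top} v_c\right),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```

Here $v_c$ is the center-word embedding, $u_o$ a true-context embedding, and $u_k$ the embeddings of the K randomly sampled negative terms.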
18. Did you just neural network?
[Diagram: Word1 and Word2 embeddings feed a sigmoid neuron; the dot product is implicit]
F is our fitness function. Backpropagate tweaks to the weights to reduce error: compute dF/dv[0], dF/dv[1], dF/dv[2], ... and tweak each component to maximize fit.
More complex/'deep' neural nets can propagate the error back to earlier layers to learn their weights.
19. Doc2Vec: train a paragraph vector with the term vectors
The 'Bitcoin' embedding [5.5, 8.1, ..., 0.6] and the 'Energy' embedding [6.0, 1.7, ..., 8.4] come from the term embedding matrix; a paragraph embedding [2.0, 0.5, ..., 1.4] comes from a paragraph embedding matrix.
The dot product ('similarity') now includes the paragraph vector: 5.5*6.0*2.0 + 8.1*1.7*0.5 + ... + 0.6*8.4*1.4 = 102.65
Sigmoid (forces to 0..1): 0.67, somewhere between 'out of context' and 'true context'.
Backpropagate into both matrices.
20. The same negative sampling can apply here

Term | Para | True Context? | Prediction
bitcoin | Doc 1234, Para 1 | 1 | 0.67 (terms sharing context; tweak embeddings to get the prediction closer to 1)
bitcoin | Doc 5678, Para 5 | 0 | 0.78 (random unrelated docs; tweak embeddings to get the prediction closer to 0)
bitcoin | Doc 1234, Para 8 | 0 | 0.56
bitcoin | Doc 1537, Para 1 | 0 | 0.34

I’m continuing the thread of negative sampling, but doc2vec need not use negative sampling.
21. Embedding Modeling: your new clay to mold
[0.5,-0.4] doc1234 ... for years you've been manipulating this.
Manipulate the construction of embeddings to measure something *domain specific*.
24. Embeddings from judgments? (derived from clicks, etc.)

Query Term | Doc | Relevant? | Relevance Score
bitcoin | Doc 1234 | 1 | 0.67 (a relevant doc for the term; tweak embeddings to get the query-doc score closer to 1)
bitcoin | Doc 5678 | 0 | 0.78 (random unrelated docs; tweak embeddings to get the prediction closer to 0)
bitcoin | Doc 1234 | 0 | 0.56
bitcoin | Doc 1537 | 0 | 0.34
25. Pretrain word2vec with corpus/sessions?

Query Term | Doc | Corpus Model | True Relevance | Tweaked Score
bitcoin | Doc 1234 | 0 | 1 | 0.01
bitcoin | Doc 5678 | 1 | 0 | 0.99
bitcoin | Doc 1234 | 0 | 0 | 0.56
bitcoin | Doc 1537 | 0 | 0 | 0.34

It takes a lot to overcome the original unified model (a kind of prior), and it's a case-by-case basis.
An online / evolutionary approach to converge on improved embeddings?
27. Embeddings go beyond language

User | Item | Purchased? | Recommended?
Tom | Jeans | 1 | 0.67
Tom | Khakis | 0 | 0.78
Tom | iPad | 0 | 0.56
Tom | Dress Shoes | 0 | 0.34

Tweak embeddings to get the user-item score closer to 1 for purchases and closer to 0 otherwise (use them to find similar items / users).
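The same sigmoid-of-dot-product scoring works for user-item pairs; the embeddings below are invented purely for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Invented 2-d user and item embeddings
users = {"Tom": [0.8, -0.3]}
items = {
    "Jeans": [0.9, -0.2],   # points the same way as Tom: likely recommendation
    "iPad":  [-0.5, 0.7],   # points the opposite way
}

def score(user, item):
    """Recommendation score: sigmoid of the user-item dot product."""
    dot = sum(a * b for a, b in zip(users[user], items[item]))
    return sigmoid(dot)
```

After training, nearby item vectors identify similar items, and nearby user vectors identify similar users, which is the "find similar items / users" note above.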
28. More features beyond just 'context'

Query Term | Doc | Sentiment | Same Context | Tweaked Score
cancel | Doc 1234 | Angry | 1 | 0.01
confirm | Doc 5678 | Angry | 0 | 0.99
cancel | Doc 1234 | Happy | 0 | 0.56
confirm | Doc 1537 | Happy | 0 | 0.34

See: https://arxiv.org/abs/1805.07966
29. Evolutionary/contextual bandit embeddings?

Query Term | Doc | Corpus Model | True Relevance | Tweaked Score
bitcoin | Doc 1234 | 0 | 1 | 0.01
bitcoin | Doc 5678 | 1 | 0 | 0.99
bitcoin | Doc 1234 | 0 | 0 | 0.56
bitcoin | Doc 1537 | 0 | 0 | 0.34

Pull the embedding (a kind of prior) in the direction your KPIs seem to say is successful: an online / evolutionary approach to converge on improved embeddings?
30. Embedding Frontier...
There's a lot we could do:
- Model specificity: high-specificity terms cluster differently than low-specificity terms
- You often data-model/clean A LOT before computing your embeddings (just like you do your search index)
- See Simon Hughes's "Vectors in Search" talk on encoding vectors and using them in Solr
33. Language Model
Can seeing text, and guessing what would "come next", help us on our quest for 'aboutness'?
“eat __?_____”
pizza: 0.02 (highest probability)
cat: 0.001, chair: 0.001, nap: 0.001, ...
Obvious applications: autosuggest, etc.
34. Markov language model, vocab size V
From the corpus we can build a V x V transition matrix reflecting the frequency of word transitions:

      | pizza  | chair  | nap    | ...
eat   | 0.02   | 0.0002 | 0.0001 | ...
cat   | 0.0001 | 0.0001 | 0.02   | ...
...   | ...    | ...    | ...    | ...

e.g. 0.02 is the probability of 'pizza' following 'eat' (as in “eat pizza”).
35. Transition Matrix Math
Let's say we don't 100% know the 'current' word. Can we still guess the 'next' word?
Represent the current word as a probability distribution P_curr over the V words (e.g. eat 0.25, cat 0.40, ...), then:
P_next = P_curr x (transition matrix)
e.g. the probability that 'pizza' comes next: 0.25*0.02 + 0.40*0.0001 + ...
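The matrix-vector product above in a few lines of Python, using the slide's toy probabilities (dictionaries stand in for the sparse V x V matrix; missing entries are treated as 0):

```python
# Sparse stand-in for the V x V transition matrix
# (rows: current word, columns: next word)
T = {
    "eat": {"pizza": 0.02,   "chair": 0.0002, "nap": 0.0001},
    "cat": {"pizza": 0.0001, "chair": 0.0001, "nap": 0.02},
}

# Uncertain 'current' word: a probability distribution over words
p_curr = {"eat": 0.25, "cat": 0.40}

def p_next(word):
    """One entry of P_next = P_curr x T."""
    return sum(p_curr[w] * T[w].get(word, 0.0) for w in p_curr)
```

With this distribution, 'nap' comes out more probable than 'pizza' because 'cat' carries more of the current-word mass.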
36. Language model for embeddings?
Multiply the 'eat' embedding [5.5, 8.1, ..., 0.6] (a row of the term embedding matrix, h dimensions) by an h x h transition matrix, with weights learned through backpropagation, to get the embedding of the next word: [0.5, 6.1, ..., -4.8] (which probably clusters near 'pizza', etc.).
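A sketch of that linear step: multiply the current word's embedding by a transition matrix (hand-made here; in practice its weights would be learned by backpropagation) to get a predicted next-word embedding:

```python
def mat_vec(W, v):
    """Multiply an h x h matrix by an h-dimensional vector."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

eat = [0.2, -0.4]          # toy 'eat' embedding (h = 2)
W_transition = [           # hand-made weights, standing in for learned ones
    [0.5, -0.1],
    [0.3,  0.9],
]
next_embedding = mat_vec(W_transition, eat)  # predicted next-word embedding
```

The output vector can then be compared (e.g. by cosine similarity) against the term embedding table to find which words it clusters near.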
41. Not a silver bullet
[Diagram: an RNN (weight matrices W_xh, W_hh, W_hy) unrolled over "A dog is loose! Please call the animal control"]
The corpus language says 'animal control'; the searcher language says 'dog catcher'.
The model won't learn this ideal language model if the corpus never says 'dog catcher'.
44. 'Translate' documents to queries?
An encoder RNN reads the doc ("Animal Control Law: Herefore let it be … And therefore… with much to be blah blah blah … The End") and a decoder RNN emits predicted queries.
Using judgments/clicks as training data:

Doc | Query | Relevant?
Animal Control Law | Dog catcher | 1
Littering Law | Dog Catcher | 0
45. Can we use graded judgments?
The same encoder RNN / decoder RNN setup over the doc, but now the judgments carry grades:

Doc | Query | Grade
Animal Control Law | Dog catcher | 4
Animal Leash Law | Dog Catcher | 2

Construct the training data to reflect the weighting.
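One simple way to 'construct training data to reflect the weighting' is to repeat each doc-query pair in proportion to its grade; the grades below echo the slide's (hypothetical) table:

```python
# Hypothetical graded judgments, as in the slide's table
judgments = [
    ("Animal Control Law", "dog catcher", 4),
    ("Animal Leash Law",   "dog catcher", 2),
]

# Repeat each (doc, query) pair grade-many times so higher-graded
# pairs contribute more gradient steps during training
training_pairs = [
    (doc, query)
    for doc, query, grade in judgments
    for _ in range(grade)
]
```

An alternative to duplication is passing the grade as a per-example loss weight, which most training frameworks support; duplication is just the simplest to show.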
46. Skip-Thought Vectors (a kind of ‘sentence2vec’)
An encoder RNN reads "Its fleece was white as snow"; one decoder RNN predicts the sentence before ("Mary had a little lamb") and another predicts the sentence after ("And everywhere that Mary went, that lamb was sure to go").
The encoding becomes an embedding for the sentence, capturing semantic meaning.
47. Skip-Thought Vectors for queries? (‘doc2queryVec’)
An encoder RNN reads the doc ("Animal Control Law: Herefore let it be … And therefore… with much to be blah blah blah … The End"); decoder RNNs each predict a relevant query: q=dog catcher, q=loose dog, q=dog fight, ...
The encoding becomes an embedding mapping docs into the queries' semantic space.
My Thoughts on Skip Thoughts
49. Anything2vec
Deep learning is especially good at learning representations of:
- Images
- User history
- Songs
- Video
- Text
If everything can be ‘tokenized’ into some kind of token space for retrieval, then everything can be ‘vectorized’ into some embedding space for retrieval / ranking.
50. The Lucene community needs better first-class vector support
The Similarity API and vectors?
long computeNorm(FieldInvertState state)
SimWeight computeWeight(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
SimScorer simScorer(SimWeight weight, LeafReaderContext context)
51. Some progress
- Delimited Term Frequency Filter Factory (use the term frequency to encode vector weights)
- Payloads & Payload Score
Check out Simon Hughes's talk: "Vectors in Search"
52. It’s not magic, it’s math!
Join the Search Relevancy Slack Community
http://o19s.com/slack
(projects, chat, conferences, help, book authors, and more…)
53. Matching: vocab mismatch!
Taxonomical Semantical Magical Search, Doug Turnbull, Lucene/Solr Revolution 2017
The searcher's "dog catcher" (lay-speak) and the doc "Animal Control Law" (legalese) only meet through a taxonomist / librarian mapping both to Concept 1234, and it is costly to manually maintain that mapping between searcher & corpus vocabulary.
No match, despite relevant results!
54. Ranking optimization is hard!
Query "dog catcher" is scored against the doc ("Animal Control Law: Herefore let it be … And therefore… with much to be blah blah blah … The End") to produce ranked results.
Ranking can be a hard optimization problem: fine-tuning heuristics such as TF*IDF, Solr/ES queries, analyzers, etc.
LTR? LTR is only as good as your features: Solr/ES queries based on your skill with Solr/ES queries.
55. How can deep learning help?
Taxonomical Semantical Magical Search, Doug Turnbull, Lucene/Solr Revolution 2017
Can we learn, quickly at scale, the semantic relationships between concepts ('dog catcher' <-> 'animal control')?
What other forms of every-day enrichment could deep learning accelerate?
57. Can we translate between searcher & corpus?
Feed the search term embedding [5.5, 8.1, ..., 0.6] and the doc embedding [2.0, 0.5, ..., 1.4] into [Insert Machine Learning Here] to produce a ranking prediction (classic LTR could go here).
58. Embeddings
Words with similar contexts share close embeddings.
"Cryptocurrency is not hyped! It's the future"
"Hackers invent their own cryptocurrency"
"Cryptocurrency is a digital currency"
"Hyped cryptocurrency is for hackers!"
Encode contexts:
[0.5,-0.5] cryptocurrency
[0.6,-0.4] bitcoin
Editor's Notes
This doesn’t sufficiently generalize the queries
Likely don’t have as much training data in relevance judgments as in the unlabeled data from the corpus
This is a kind of point-wise learning to rank
Alternatively, the relevance-based embeddings are a kind of prior
We can leverage information about possibly relevant documents (clicks, query logs, etc.) wrt queries while learning embeddings. We can adjust query vectors and mitigate the possible vocabulary distinction between docs and queries sets