Describes techniques for injecting "Semantic Intelligence" into search applications. Focuses on Apache Solr and Lucidworks Fusion, but these techniques are generally applicable to any search engine because all of them use the same basic mechanism - inverted token mapping at their 'core'.
1. The Well-Tempered Search Application
Variations on a Theme:
Why does my search app suck, and what can I do about it?
Ted Sullivan – (old Phuddy Duddy)
Senior (very much so I’m afraid) Solutions (I hope) Architect (and sometime plumber)
Lucidworks Technical Services
2. Our Basic Premises (Premisi?)
• Lemma 1: Search Applications use algorithms that make
finding chunks of text within large datasets possible in
HTT (human-tolerable time).
• Lemma 2: These algorithms work by breaking text into
primitive components and building up a search
“experience” from that.
• Lemma 3: Lemma 2 is not sufficient to achieve Lemma 1.
3. The Basic Disconnect
• Text can be analyzed at the level of tokens
(syntax) and at the level of meaning
(semantics).
• We think one way (semantics), search engines
think another (syntax – i.e. token order).
• How do we bridge the gap? … More clever
algorithms!
4. Art and Science
• We need to be intelligent curators of these
algorithms. Craftsmen (craftswomen?) that
think of these as tools with a specific purpose.
• Like any good craftsperson – we need a wide
array of tools to get the job done (well
almost).
5. When is my search app done?
• Quick answer: NEVER (ain’t consultin’ great?)
• Long answer – As long as it continues to
improve, like fine wine or bourbon, you are on
the path to enlightenment.
• How do you get there grasshopper? Add
semantic intelligence to the engine!
7. Search cannot be shrink-wrapped!!
What have we got for Donny behind Curtain #1 Jay?
Well Monty - Heeeeeeeeeeeerrrrrrrreeeeesssss the
Google … SEARCH Appliance!!!!*
Sorry Donny – It’s a ZONK!
* but Google Web Search has some Serious Mojo!
8. Prelude part 1 – The basic problem
The inverted index and “bag-of-words” search:
The red fox jumped over the fence.
Time flies like an arrow. Fruit flies like a banana.
the 1,6
red 2
fox 3
jumped 4
over 5
fence 7
flies 2,7
like 3,8
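The position lists above can be reproduced with a few lines of code. This is a minimal sketch, not Lucene's implementation: the class name `ToyInvertedIndex` and the tokenization rule (lowercase, split on anything that is not a letter) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy positional inverted index: term -> 1-based token positions within one text. */
final class ToyInvertedIndex {

    /** Lowercases the text, splits on non-letters, and records where each term occurs. */
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            position++;
            index.computeIfAbsent(token, t -> new ArrayList<>()).add(position);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(positions("The red fox jumped over the fence."));
        System.out.println(positions("Time flies like an arrow. Fruit flies like a banana."));
    }
}
```

Note what the index keeps and what it throws away: it knows where "flies" occurs, but not that one "flies" is a verb and the other a noun. That is the bag-of-words problem in miniature.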
9. Prelude part B – The Tried and True
• Phrase and Proximity boosting and “Slop”
• Synonyms and stop words
• Stemming or Lemmatization
• Autocomplete
• Best Bets / Landing Pages – the
sledgehammer
• Spell check – spell suggest – aka the warm
fuzzies.
10. Fugue - Subject or Exposition
Search engines need more ‘semantic
awareness’ or at least the illusion of this.
There is a heavy duty solution called Artificial
Intelligence – which except in the fertile
imagination of Hollywood screenwriters, is not
there yet. So we need to fake it just a bit.
11. Theme and Variations I
autophrasing and the red sofa
Theme: When multiple words mean just one thing.
Fuzzy way: Boosting phrases (proximity and phrase slop)
- pushes false positives down – i.e. out of the limelight
- i.e. - shoves ‘em under the rug
This runs into a problem with faceted search: boosting only reorders results, but facets count every document that matched.
Like the eye of Sauron in LOTR or Santa Claus, the faceting engine SEES ALL (sins)!
Brake pads example: the query also hits things that have ‘brake’ (like children’s stroller brakes) or ‘pads’ (like mattress pads).
12. Variation I: Autophrasing
AutophrasingTokenFilter tells Lucene not to
tokenize when a noun phrase represents a single
thing - by providing a flat list of phrases.
Creates one-to-one token mapping that Lucene
prefers because it avoids the “sausagization”
problem.
https://github.com/LucidWorks/auto-phrase-tokenfilter
13. income tax refund
income tax
tax refund
“income tax” is not income.
A “tax refund” is not a tax.
Solution: Autophrasing + synonym mapping
income tax => tax
tax refund => refund
Autophrasing Example
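The idea of autophrasing followed by synonym mapping can be sketched without Lucene. The code below is an illustration only and is not the AutophrasingTokenFilter source: the class `AutophraseSketch`, the greedy longest-match strategy, and the whitespace tokenizer are all assumptions. A real token filter would also place the synonym at the same position as the phrase token; here it is simply appended after it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch: collapse known phrases into single tokens, then map them to synonyms. */
final class AutophraseSketch {

    /** Whitespace tokenizer (assumed; punctuation handling is left out). */
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** Greedy longest match: a run of tokens found in `phrases` becomes one underscore-joined token. */
    static List<String> autophrase(List<String> tokens, Set<String> phrases, int maxPhraseLength) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            int longest = Math.min(maxPhraseLength, tokens.size() - i);
            for (int len = longest; len >= 2; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (phrases.contains(candidate)) {
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                out.add(String.join("_", tokens.subList(i, i + matched)));
                i += matched;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    /** Emits each token followed by its mapped synonym, if it has one. */
    static List<String> expand(List<String> tokens, Map<String, String> synonyms) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            String synonym = synonyms.get(token);
            if (synonym != null) {
                out.add(synonym);
            }
        }
        return out;
    }
}
```

With the phrase list `income tax`, `tax refund`, `income tax refund` and the mappings `income_tax => tax`, `tax_refund => refund`, the text "my income tax is due" yields the tokens `my, income_tax, tax, is, due`. The bare token `income` never reaches the index, so a search for income no longer hits documents about income tax.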
15. Multi-term synonym problem
• New York, New York – it’s a HELLOVA town!
Subject was inspired by an old JIRA ticket: LUCENE-1622
“if multi-word synonyms are indexed together with the
original token stream (at overlapping positions), then a query
for a partial synonym sequence (e.g., “big” in the synonym
“big apple” for “new york city”) causes the document to
match”
(or “apple” which will hit on my blog post if you crawl
lucidworks.com !)
16. This means certain phrase queries should match but don't (e.g.: "hotspot
is down"), and other phrase queries shouldn't match but do (e.g.: "fast
hotspot fi").
Other cases do work correctly (e.g.: "fast hotspot"). We refer to this
"lossy serialization" as sausagization, because the incoming graph is
unexpectedly turned from a correct word lattice into an incorrect
sausage.
This limitation is challenging to fix: it requires changing the index format
(and Codec APIs) to store an additional int position length per position,
and then fixing positional queries to respect this value.
Sausagization: from Mike McCandless blog Changing Bits:
Lucene's TokenStreams are actually graphs!
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
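The three phrase queries quoted above can be reproduced with a toy positional index. This sketch assumes the sentence "fast wi fi network is down" and a synonym rule that maps "wi fi network" to "hotspot"; the class `SausageSketch` and its methods are hypothetical and only model the effect of dropping position length at index time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: a positional index that, like the one described, stores no position length. */
final class SausageSketch {

    /** term -> positions at which the term was indexed. */
    private final Map<String, List<Integer>> postings = new HashMap<>();

    void add(String term, int position) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(position);
    }

    /** Exact phrase match: each term must sit one position after the previous one. */
    boolean matchesPhrase(String... terms) {
        for (int start : postings.getOrDefault(terms[0], List.of())) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings.getOrDefault(terms[i], List.of()).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Indexes the assumed example sentence plus its multi-word synonym. */
    static SausageSketch example() {
        SausageSketch index = new SausageSketch();
        String[] words = {"fast", "wi", "fi", "network", "is", "down"};
        for (int i = 0; i < words.length; i++) {
            index.add(words[i], i);
        }
        // "hotspot" stands for "wi fi network": it starts at position 1 and should
        // span three positions, but only the start position is recorded.
        index.add("hotspot", 1);
        return index;
    }
}
```

Because "hotspot" is recorded as if it were one position wide, "fast hotspot" and the nonsensical "fast hotspot fi" both match, while "hotspot is down" does not. Autophrasing avoids this by giving each side of the mapping exactly one token.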
17. Multi-term synonym demo
autophrases.txt:
new york
new york state
empire state
new york city
new york new york
big apple
ny ny
city of new york
state of new york
ny state
synonyms.txt:
new_york => new_york_state,new_york_city,big_apple,new_york_new_york,ny_ny,nyc,empire_state,ny_state,state_of_new_york
new_york_state,empire_state,ny_state,state_of_new_york
new_york_city,big_apple,new_york_new_york,ny_ny,nyc,city_of_new_york
18. This document is about new york state.
This document is about new york city.
There is a lot going on in NYC.
I heart the big apple.
The empire state is a great state.
New York, New York is a hellova town.
I am a native of the great state of New York.
new york new york city new york state
Multi-term synonym demo
/select /autophrase
19. This document is about new york state.
This document is about new york city.
There is a lot going on in NYC.
I heart the big apple.
The empire state is a great state.
New York, New York is a hellova town.
I am a native of the great state of New York.
empire state
Multi-term synonym demo
/select /autophrase
(Even a blind squirrel finds a nut once in a while)
20. Variation II: The Red Sofa Problem
{
"response":{"numFound":3,"start":0,"docs":[
{
"color":"red",
"text":"This is the red sofa example. Please find with 'red sofa' query."},
{
"color":"red",
"text":"This is a red beach ball. It is red in color but is not something that you should sit on because you would tend to roll off."},
{
"color":"blue",
"text":"This is a blue sofa, it should only hit on sofas that are blue in color."}]
}}
Out of the box (OOTB), q=red sofa is interpreted as text:red text:sofa (default OR), so all three documents match.
http://localhost:8983/solr/collection1/select?q=red+sofa&wt=json
21. Closing the Loop:
Content Tagging and Intelligent Query Filtering
Using the search index itself as the knowledge source:
22. Solution for the Red Sofa problem
Query Autofiltering: Search Index driven query introspection /
query rewriting:
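A minimal sketch of the idea, using the three red sofa documents as plain in-memory objects. Nothing here is Solr API: `AutofilterSketch`, `Doc`, and both search methods are hypothetical names, and the beach ball text is shortened. The point is only that query terms recognised as indexed values of a metadata field are turned into a filter on that field, while the remaining terms stay a text query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch of query autofiltering over the three "red sofa" documents. */
final class AutofilterSketch {

    /** Plain holder for the two fields shown in the example (not a Solr class). */
    static final class Doc {
        final String color;
        final String text;

        Doc(String color, String text) {
            this.color = color;
            this.text = text;
        }
    }

    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("red", "This is the red sofa example. Please find with 'red sofa' query."));
        docs.add(new Doc("red", "This is a red beach ball. It is red in color."));
        docs.add(new Doc("blue", "This is a blue sofa, it should only hit on sofas that are blue in color."));
        return docs;
    }

    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Default OR behaviour: keep a document if its text contains any query term. */
    static List<Doc> searchOr(List<Doc> docs, String query) {
        Set<String> terms = tokens(query);
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            Set<String> overlap = tokens(d.text);
            overlap.retainAll(terms);
            if (!overlap.isEmpty()) {
                hits.add(d);
            }
        }
        return hits;
    }

    /** Autofiltered: terms that are indexed color values become a filter; the rest stay a text query. */
    static List<Doc> searchAutofiltered(List<Doc> docs, String query) {
        Set<String> knownColors = new HashSet<>();
        for (Doc d : docs) {
            knownColors.add(d.color); // the index itself is the knowledge source
        }
        Set<String> colorFilter = new HashSet<>();
        StringBuilder rest = new StringBuilder();
        for (String term : tokens(query)) {
            if (knownColors.contains(term)) {
                colorFilter.add(term);
            } else {
                rest.append(term).append(' ');
            }
        }
        List<Doc> candidates = rest.length() == 0 ? docs : searchOr(docs, rest.toString());
        List<Doc> hits = new ArrayList<>();
        for (Doc d : candidates) {
            if (colorFilter.isEmpty() || colorFilter.contains(d.color)) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

For the query "red sofa", `searchOr` returns all three documents, while `searchAutofiltered` returns only the red sofa: "red" is recognised as a color value and applied as a filter, leaving "sofa" as the text query.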
23. Lucene FieldCache Magic
Lucene FieldCache (to be renamed UninvertedIndex in Lucene 5.0)
Inverted Index:
Show all documents that have this term value in this field.
Uninverted or Forward Index:
Show all term values that have been indexed in this field.
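// Get the uninverted (forward) view of the category field.
SolrIndexSearcher searcher = rb.req.getSearcher();
SortedDocValues fieldValues = FieldCache.DEFAULT.getTermsIndex(
    searcher.getAtomicReader(), categoryField);
…
// Split the query on whitespace and punctuation; the double quote must be escaped.
StringTokenizer strtok = new StringTokenizer(query, " .,:;\"'");
while (strtok.hasMoreTokens()) {
    String tok = strtok.nextToken().toLowerCase();
    // BytesRef(CharSequence) encodes as UTF-8, independent of the platform charset.
    BytesRef key = new BytesRef(tok);
    if (fieldValues.lookupTerm(key) >= 0) {
        // tok is an indexed value of categoryField
        …
    }
}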
25. Architecture: it’s all about Plumbing
• Pipelines for every occasion.
Indexing Pipelines – good ol’ ETL
- Content enrichment, tagging
- Metadata cleanup
Query Pipelines
- identification, query preprocessing
- introspection
One is the “hand”, the other the “glove”
26. Index Pipelines
Lots of choices here:
• Internal to Solr – DIH, UpdateRequestProcessor
Pros and cons
• External – Morphlines, Open Pipeline, Flume,
Spark, Hadoop, Custom SolrJ
• Lucidworks Fusion
27. Entity and Fact Extraction
Entities:
Things, Locations, Dates, People, Organizations, Concepts
Entity Relationships
Company was acquired by Company
Drug cures Disease
Person likes Pizza
Annotation Pipelines (UIMA, Lucidworks Fusion):
Entity Extraction followed by Fact Extraction
Pattern method:
$Drug is used to treat $Condition
Parts of Speech (POS) analysis
Subject Predicate Object
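The pattern method can be illustrated with a regular expression. This is a rough sketch under stated assumptions: a real annotation pipeline would run entity extraction first and match on the tagged entities, whereas here a single capitalised word stands in for $Drug and a run of lowercase words for $Condition. The class `FactPatternSketch`, the `Fact` holder, and the predicate label "treats" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: pattern-based fact extraction producing subject / predicate / object triples. */
final class FactPatternSketch {

    /** One extracted triple. */
    static final class Fact {
        final String subject;
        final String predicate;
        final String object;

        Fact(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    // "$Drug is used to treat $Condition", with crude stand-ins for the two entity slots.
    private static final Pattern TREATS = Pattern.compile(
            "(?<drug>[A-Z][a-z]+) is used to treat (?<condition>[a-z]+(?: [a-z]+)*)");

    static List<Fact> extract(String text) {
        List<Fact> facts = new ArrayList<>();
        Matcher m = TREATS.matcher(text);
        while (m.find()) {
            facts.add(new Fact(m.group("drug"), "treats", m.group("condition")));
        }
        return facts;
    }

    public static void main(String[] args) {
        System.out.println(extract("Aspirin is used to treat headaches. Search is hard."));
    }
}
```

The extracted triples can be indexed as additional fields, which is what lets a query pipeline later answer "what treats headaches?" rather than just matching the words.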
28. Theme and Variations II
The Classification Wars
• Machine Learning or Taxonomy – is it a Floor
Wax or a Dessert Topping?
Answer: It’s a floor wax AND a dessert topping! It’s delicious and just look at that shine!
29. Machine Learning
Use mathematical vector-crunching algorithms like Latent
Dirichlet Allocation (LDA), Bayesian Inference, Maximum
Entropy, log likelihood, Support Vector Machines (SVM) etc., to
find patterns and to associate those patterns with concepts.
Can be supervised (i.e. given a training set) or unsupervised (the
algorithm just finds clusters). Supervised learners are called
semi-automatic classifiers.
Check out Taming Text by Ingersoll, Morton and
Farris (Manning)
31. Taxonomy or Ontology
“Knowledge graphs” that relate things and concepts to each other
either hierarchically or associatively.
Pros:
Works without large amounts of content to analyze
Encapsulates the knowledge of human subject matter experts
Cons:
Often not well designed for search (mixes semantic relationship
types / organizational logic)
Requires curation by subject matter experts whose time is costly
32. Taxonomies Designed for Search
Category nodes and Evidence nodes
Category Node:
A ‘parent’ node
Can have child nodes that are:
Sub Categories
Evidence Nodes
Evidence Node:
Tends to be a leaf node (no children)
Contains keyterms (synonyms)
May contain “rules” e.g. (if contains term a and term b but not term c)
Evidence Nodes can have more than one category node parent
Hits on Evidence Nodes add to the cumulative score of a Category Node.
Scores can be diluted as they accumulate up the hierarchy – so that the
nearest category gets the strongest ‘vote’.
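The scoring rule described above can be sketched as follows. Everything specific here is an assumption for illustration: the class `TaxonomyScoringSketch`, the dilution factor of 0.5 per level, single-word keyterms only, and the small category tree (Automobiles under Manufacturing under Fortune 100 Companies, with US Corporations as a separate root).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch: evidence-node hits vote for category nodes, diluted per level up the hierarchy. */
final class TaxonomyScoringSketch {

    static final double DILUTION = 0.5; // assumed decay per level; not a value from the talk

    /** child category -> parent category (root categories have no entry). */
    private final Map<String, String> parentOf = new HashMap<>();
    /** keyterm of an evidence node -> the category nodes it hangs under. */
    private final Map<String, List<String>> evidence = new HashMap<>();

    void addCategory(String name, String parent) {
        if (parent != null) {
            parentOf.put(name, parent);
        }
    }

    void addEvidence(String keyterm, String... categories) {
        evidence.put(keyterm.toLowerCase(Locale.ROOT), List.of(categories));
    }

    /** Each keyterm hit gives a full vote to its direct categories and a diluted vote to each ancestor. */
    Map<String, Double> score(String text) {
        Map<String, Double> scores = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            for (String category : evidence.getOrDefault(token, List.of())) {
                double vote = 1.0;
                for (String node = category; node != null; node = parentOf.get(node)) {
                    scores.merge(node, vote, Double::sum);
                    vote *= DILUTION;
                }
            }
        }
        return scores;
    }

    /** Small assumed taxonomy with evidence nodes that have more than one category parent. */
    static TaxonomyScoringSketch example() {
        TaxonomyScoringSketch t = new TaxonomyScoringSketch();
        t.addCategory("Fortune 100 Companies", null);
        t.addCategory("Manufacturing", "Fortune 100 Companies");
        t.addCategory("Automobiles", "Manufacturing");
        t.addCategory("US Corporations", null);
        t.addEvidence("Ford", "Automobiles", "US Corporations");
        t.addEvidence("Toyota", "Automobiles");
        return t;
    }
}
```

Scoring the text "Ford and Toyota announce new models" with this tree gives Automobiles two full votes, Manufacturing two half votes, and Fortune 100 Companies two quarter votes, so the nearest category wins.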
34. Ford, GM, Chrysler
Fortune 100 Companies
Energy
Financial Services
Investment Banks
Commercial Banks
Health Care
Health Insurance
HMO
Medical Devices
Pharmaceuticals
Hospitality
Manufacturing
Aircraft
Automobiles
Electrical Equipment
US Corporations
Foreign Corporations
British
Chinese
French
German
Japanese
Russian
etc.
35. Ford, GM, Chrysler,Toyota,BMW
Fortune 100 Companies
Energy
Financial Services
Investment Banks
Commercial Banks
Health Care
Health Insurance
HMO
Medical Devices
Pharmaceuticals
Hospitality
Manufacturing
Aircraft
Automobiles
Electrical Equipment
US Corporations
Foreign Corporations
British
Chinese
French
German
Japanese
Russian
etc.
36. Fortune 100 Companies
Energy
Financial Services
Investment Banks
Commercial Banks
Health Care
Health Insurance
HMO
Medical Devices
Pharmaceuticals
Hospitality
Manufacturing
Aircraft
Automobiles
Electrical Equipment
Ford, GM, Chrysler,Toyota,BMW
GE, Boeing
US Corporations
Foreign Corporations
British
Chinese
French
German
Japanese
Russian
etc.
37. Fortune 100 Companies
Energy
Financial Services
Investment Banks
Commercial Banks
Health Care
Health Insurance
HMO
Medical Devices
Pharmaceuticals
Hospitality
Manufacturing
Aircraft
Automobiles
Electrical Equipment
Ford, GM, Chrysler,Toyota,BMW
GE, Boeing
Bank of America, Hyatt
US Corporations
Foreign Corporations
British
Chinese
French
German
Japanese
Russian
etc.
38. Query Pipelines
The ‘Wh’ Words: Who, What, When, Where
Who are they (authentication)?
What can they see (security - authorization)?
When can they see it (entitlement)?
What are they interested in (personalization / recommendation)?
Where are they now (location)?
39. Query Pipelines
Inferential Search
Query introspection -> Query modification.
Query Autofiltering
Are you feeling lucky today?
Topic boosting / spotlighting
Use ML to detect the topic, then boost and/or spotlight
results tagged this way.
Use a specialized collection to store ‘facet knowledge’
40. The Art of the Fugue:
Inferential Search
• Infer what the user is looking for and give them
that
• Clever software infers meaning aka query
“intent”
• When we do this right, it appears to be magic!
41. Machine Learning Drives
Query Introspection
Training Data
NLP Trainer
Stage
NLP Model
Test Data
NLP Classifier
Stage
Classified
Documents
43. Da Capo al Coda
• Killer search apps are crafted from fine
ingredients and like fine whiskey will get
better with age - if you are paying attention to
‘what’ your users are looking for.
• Putting the pieces together requires an
understanding of ‘what’ things are, independent
of the words users choose to describe them.
44. Thanks for your attention!
Ted Sullivan
Lucidworks, Technical Services
ted.sullivan@lucidworks.com
Skype: ted.sullivan5
LinkedIn
Metuchen, New Jersey
(You gotta problem with that?)
Editor's Notes
( * but the Google web app has some serious Mojo!)
Suga “Plague” story here – on Spell Check
Sales tax is a type of tax, not a type of sale.
Cue – YouTube Video here
Who are they? – Security – what can they see? Personalization – what do they like?