Taxonomies in Search: Leveraging your content semantically. An SLA Webinar, Aug 10, 1:00pm-2:00pm EST. Marjorie Hlava, President, Access Innovations, Inc. mhlava@accessinn.com www.accessinn.com
Agenda How search works Measuring accuracy in search Precision Recall Relevance Search theoretical basis Bayes, Boole and the rest of the guys The taxonomy effect
How does search work? Many parts Search software – of course Computer network Parsing of text Well formed or structured text CLEAN DATA Computer software – network Computer hardware Telecommunications connection Training sets for statistical systems
Technical parts of search Search technology Ranking algorithms Query language Federators Cache Inverted index Other enhancements Presentation Layer
My Main Frustration Select hardware Select software Design system Try to load the data Add the taxonomy That’s BACKWARDS
Data First! What are you building the system for? Assess the data Do the design Decide what else needs to be added Taxonomy terms Other controls Find a system that will work with your data
Access Innovations – Complex Farm With Perfect Search [architecture diagram] Components: Query Federators; Query Servers; Search Harmony Presentation Layer; Deploy Hub; Index Builders; Cleanup, etc.; Repository XIS (cache); Cache Builders; Source Data
FAST Search example: Core Architectural Components [architecture diagram] Content sources: Web Content; Files, Documents; Databases; Email, Groupware; Custom Applications. Connectors: Web Crawler; File Traverser; Database Connector; Email Connector; Custom Connector. APIs: Content API; Query API; Management API; Data Harmony Governance API. Processing: Document Processor; Pipeline; Query Pipeline; Query Processor; Filter Server; Search Server; Index DB; Agent DB; Content Push; Results; Alerts; MAIstro. Front ends: Administrator’s Dashboard; Vertical Applications; Portals; Custom Front-Ends; Mobile Devices; Search Harmony
Measuring accuracy in search Relevance Recall Precision Accuracy – Hits, misses, noise Ranking Linguistics Query Processing Results Processing Display Search refinement Usability Business Rules
Relevance How well a set of returned documents answers the information need “Accuracy” Related to objective of search Different user communities Information resources Tension of user needs and context available A confidence “guesstimate”
The formulas
Recall = (Number of relevant items retrieved) / (Number of relevant items in the collection)
Precision = (Number of relevant items retrieved) / (Number of items retrieved)
Relevance: Germane (Precision); Pertinent (Recall)
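The two ratios above can be computed directly from sets of document identifiers. This is a minimal sketch; the function name and the sample document ids are illustrative assumptions, not part of the webinar.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the search
    relevant:  document ids judged relevant in the whole collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 relevant documents exist.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5", "d6", "d7"],
)
print(p, r)
```

Someone still has to supply the `relevant` set by judging documents, which is the subjective step the slides point out.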
Measuring Relevance Concepts  Context Age of documents  Completeness (recall)  Quality Statistically determined ? Nope, it is subjective  Someone has to determine the rightness of the item A confidence factor = canard!
Kinds of search Bayesian – FAST Lucene Autonomy / Verity Boolean Dialog Endeca Perfect Search Ranking algorithms Google
Search Theoretical Basis: Those Famous Guys Boole Bayes Bayesian Techniques Turney Turney algorithm Enriched structured data Marco Dorigo Ant Colony This is only a sample of a large body of research
George Boole and Boolean algebra George Boole Mathematician 1815-1864 Boolean algebra An algebraic system of logic AND, OR, NOT, ANDNOT Dialog, BRS, STAIRS
Boolean representation Venn diagram showing the intersection of sets A AND B (in violet), the union of sets A OR B (all the colored regions), and set A XOR B (all the colored regions except the violet). The "universe" is represented by the rectangular frame.
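The Boolean operators map directly onto set operations over posting lists. The sketch below uses Python sets; the terms and document ids are hypothetical.

```python
# Hypothetical postings: the set of document ids that contain each term.
postings = {
    "taxonomy": {1, 2, 3, 5},
    "search": {2, 3, 4},
}
universe = {1, 2, 3, 4, 5, 6}  # every document in the collection

a, b = postings["taxonomy"], postings["search"]

a_and_b = a & b       # intersection: documents containing both terms
a_or_b = a | b        # union: documents containing either term
a_xor_b = a ^ b       # union minus the intersection
a_andnot_b = a - b    # documents containing A but not B
not_a = universe - a  # NOT is only defined relative to the universe

print(a_and_b, a_or_b, a_xor_b, a_andnot_b, not_a)
```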
Bayes and Bayes’ Theorem Thomas Bayes Mathematician 1702 - 1761 Bayesian theorem Uses probability inductively Established a mathematical basis for probability inference WHAT? A means of calculating, from the number of times an event has occurred, the probability that it will occur in future trials
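One simple way to estimate the chance of an event in the next trial from counts of past trials is Laplace's rule of succession, which is the Bayesian posterior under a uniform prior. The sketch below assumes that uniform prior; it is an illustration of the idea, not the method of any search product named in these slides.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Probability that the next trial succeeds, assuming a uniform prior.

    With s successes observed in n trials the estimate is (s + 1) / (n + 2).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return Fraction(successes + 1, trials + 2)


# Hypothetical counts: the event occurred in 8 of 10 past trials.
print(rule_of_succession(8, 10))  # 3/4
```

With no observations at all the estimate falls back to 1/2, which shows how much the result depends on the prior.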
Bayesian methods - Cautions A user might wish to change the distribution of probabilities.  A user will make a novel request for information in a previously unanticipated way. The computational difficulty of exploring a previously unknown network.  The quality and extent of the prior beliefs used in Bayesian inference processing.
Bayesian cautions (cont.) A Bayesian network is only as useful as the prior knowledge is reliable. An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results. Must take care in selecting the statistical distribution used in modeling the data. Must have the proper distribution model to describe the data. That is, you have to constantly train and retrain on the data
Peter Turney and the Turney Algorithm Peter D. Turney, Canada, present Learning algorithms for keyphrase extraction Tree Induction Algorithm Lexical Semantics GenEx – with human input 80% acceptable Extraction vs. generation and sentiment of words
log2( (hits(word AND "excellent") × hits("poor")) / (hits(word AND "poor") × hits("excellent")) )
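The log-ratio above can be evaluated from four hit counts. In this sketch the counts are hypothetical, and the small `smoothing` constant added to the co-occurrence counts is an assumption made here to avoid division by zero.

```python
import math


def sentiment_score(hits_word_excellent, hits_word_poor,
                    hits_excellent, hits_poor, smoothing=0.01):
    """log2 of (hits(word AND excellent) * hits(poor)) over
    (hits(word AND poor) * hits(excellent)).

    A positive score means the word co-occurs more with "excellent",
    a negative score means it co-occurs more with "poor".
    """
    numerator = (hits_word_excellent + smoothing) * hits_poor
    denominator = (hits_word_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical hit counts returned by a search engine.
score = sentiment_score(800, 100, hits_excellent=1000, hits_poor=1000,
                        smoothing=0)
print(score)
```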
Marco Dorigo and Ant Colony Optimization Marco Dorigo Research director for the Belgian Fonds de la Recherche Scientifique Research director of the IRIDIA lab at the Université Libre de Bruxelles Ant Colony Optimization metaheuristic for combinatorial optimization problems Swarm intelligence Value importance vs. heuristic importance Useful in search prediction
Natural Language Processing Syntactic Semantic Morphological Phraseological Lemmatization (stemming) Statistical Grammatical Common Sense
Basic areas of Automatic Language Processing (ALP) Auto Translation Auto Indexing Auto Abstracting Artificial Intelligence Searching Spell Checking Semantic Web Natural Language Processes (NLP) Computational Linguistics
Statistical Search Cluster analysis Neural networks Co-occurrence Bayesian inference Latent Semantic Etc.
Inverted Files and Boolean  are basic to all search  Searchable Index Inverted File Index Taxonomy Thesaurus Hierarchical Display
Sample Slide for Inverted File Index Demonstration
Outline of Presentation
Define key terminology
Thesaurus tools
Features
Functions
Costs
Thesaurus construction
Thesaurus tools
Why & when
Complex Inverted File Index Example 1
key - L2, P2, H
of - Stop
outline - L1, P1, T
presentation - L1, P3, T
terminology - L2, P3, H
thesaurus - (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
tools - (1) L3, P2, H; (2) L8, P2, SH
when - L9, P3, H
why - L9, P1, H
& - Stop
1 - Stop
2 - Stop
3 - Stop
4 - Stop
construction - L7, P2, SH
costs - L6, P1, H
define - L2, P1, H
features - L4, P1, SH
functions - L5, P1, SH
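An index of this shape can be built in a few lines. The sketch below records line (L), word position (P) and a tag for each term and skips stop words while still counting their positions, which matches the entries above (for example, "presentation" at L1, P3 after the stop word "of"). The sample lines are reconstructed from the index entries, and reading the tags T, H and SH as title, heading and subheading is an assumption.

```python
from collections import defaultdict

# Stop list taken from the entries marked "Stop" in the slide.
STOP_WORDS = {"of", "&", "1", "2", "3", "4"}


def build_inverted_index(lines):
    """lines: list of (text, tag) pairs.

    Returns a dict mapping term -> list of (line, position, tag).
    """
    index = defaultdict(list)
    for line_no, (text, tag) in enumerate(lines, start=1):
        for position, word in enumerate(text.lower().split(), start=1):
            if word in STOP_WORDS:
                continue  # not indexed, but the position is still consumed
            index[word].append((line_no, position, tag))
    return dict(index)


document = [
    ("Outline of Presentation", "T"),
    ("Define Key Terminology", "H"),
    ("Thesaurus tools", "H"),
]
index = build_inverted_index(document)
for term in sorted(index):
    print(term, index[term])
```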
Word and Term Parsing Stemming -ing, -ed, -es, -’s, -s’, etc.  Depluralization Truncation Left and right Wild cards Organi*ation Variant Spellings Centre, center Hyphens
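A rough sketch of three of these parsing steps follows: suffix stripping, wildcard matching and variant-spelling normalization. The suffix stripper is deliberately naive (it is not a real stemming algorithm such as Porter's), and the variant table is a hypothetical example.

```python
import fnmatch

# Suffixes from the slide, plus a plain "s" for simple depluralization.
SUFFIXES = ("ing", "ed", "es", "'s", "s'", "s")

# Hypothetical variant-spelling table: variant -> preferred form.
VARIANTS = {"centre": "center"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def wildcard_match(pattern, word):
    """Case-insensitive wildcard match; '*' stands for any run of characters."""
    return fnmatch.fnmatchcase(word.lower(), pattern.lower())


def normalize(word):
    """Map a variant spelling to its preferred form."""
    return VARIANTS.get(word.lower(), word.lower())


print(naive_stem("indexing"), naive_stem("indexed"), naive_stem("indexes"))
print(wildcard_match("Organi*ation", "organisation"))
print(normalize("Centre"))
```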
The taxonomy effect Where do the terms go? How are they used in search What other ways can I use the taxonomy in search?
Site search Search of 53 crawled sites including journals, books, web site,  conference sites, etc. Navigation  Bookstore search  Search database for Journals and pubs For search all publications
Navigate the full taxonomy “tree” BROWSE Auto-completion using the taxonomy Guide the user Taxonomy Driven Search Presentation
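Auto-completion from a taxonomy can be as simple as a prefix lookup over the sorted term list. This is a minimal sketch with a hypothetical term list; a production system would more likely use the search engine's own suggester.

```python
import bisect


def build_completer(terms):
    """Return a function that suggests taxonomy terms for a typed prefix."""
    keys = sorted((term.lower(), term) for term in terms)
    lowered = [key for key, _ in keys]

    def complete(prefix, limit=5):
        prefix = prefix.lower()
        start = bisect.bisect_left(lowered, prefix)  # first candidate
        suggestions = []
        for key, label in keys[start:]:
            if not key.startswith(prefix) or len(suggestions) == limit:
                break
            suggestions.append(label)
        return suggestions

    return complete


# Hypothetical taxonomy terms.
complete = build_completer(
    ["Taxonomies", "Thesauri", "Thesaurus construction", "Indexing"]
)
print(complete("thes"))
```

Because only taxonomy terms are ever suggested, the user is guided toward vocabulary that the indexed content actually uses.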
A quick look behind the scenes Database Management System
Validate term entry
Block invalid terms
Record candidates
Establish rules for term use
… terms
Thesaurus tool Indexing tool
Add terms and rules
Change terms and rules
Delete terms and rules
Where does the subject metadata go? Apply to content itself Use meta name field in HTML header Connect search to the keywords in the SQL or other database tables
HTML Header
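For the HTML header option, the subject terms can be written into a `meta` element. The sketch below uses the conventional `keywords` name; the helper function and the sample terms are assumptions for illustration.

```python
from html import escape


def keywords_meta_tag(terms):
    """Build a <meta name="keywords"> element from taxonomy terms."""
    content = ", ".join(terms)
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


# Hypothetical descriptors assigned to one page.
tag = keywords_meta_tag(["Taxonomies", "Search & retrieval"])
print(tag)
```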
RDBMS Connection Taxonomy term table
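One way to connect search to a taxonomy term table is a join table between documents and terms. The schema below is a hypothetical sketch using an in-memory SQLite database; the table names, column names and rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE taxonomy_term (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
    CREATE TABLE document_term (
        document_id INTEGER REFERENCES document(id),
        term_id INTEGER REFERENCES taxonomy_term(id)
    );
""")
conn.executemany("INSERT INTO document VALUES (?, ?)",
                 [(1, "Thesaurus construction basics"),
                  (2, "Search engine ranking")])
conn.executemany("INSERT INTO taxonomy_term VALUES (?, ?)",
                 [(10, "Thesauri"), (11, "Ranking algorithms")])
conn.executemany("INSERT INTO document_term VALUES (?, ?)",
                 [(1, 10), (2, 11)])


def documents_for_term(label):
    """Return titles of documents indexed with the given taxonomy term."""
    rows = conn.execute(
        """SELECT d.title
           FROM document d
           JOIN document_term dt ON dt.document_id = d.id
           JOIN taxonomy_term t ON t.id = dt.term_id
           WHERE t.label = ?
           ORDER BY d.id""",
        (label,),
    )
    return [title for (title,) in rows]


print(documents_for_term("Thesauri"))
```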
Suggested taxonomy descriptors
Integrate taxonomy to enhance findability Browsable categories of a directory Browsable faceted navigation Smart search for term equivalents Taxonomy terms (original or modified) as labels Navigation aids incorporate taxonomy terms and relationships
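"Smart search for term equivalents" can be sketched as query expansion: whatever equivalent the user types, the query is rewritten to cover the preferred term and its variants. The equivalence table and the quoted OR syntax below are hypothetical and would need to match the target search engine.

```python
# Hypothetical equivalence table: preferred term -> non-preferred variants.
EQUIVALENTS = {
    "Automobiles": ["cars", "autos", "motor cars"],
}


def expand_query(user_term):
    """Rewrite a single-term query as an OR over all equivalent terms."""
    needle = user_term.lower()
    for preferred, variants in EQUIVALENTS.items():
        group = [preferred] + variants
        if needle in (term.lower() for term in group):
            return " OR ".join(f'"{term}"' for term in group)
    return f'"{user_term}"'  # no equivalents known: search as typed


print(expand_query("cars"))
```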
More Taxonomy Enrichment Spelling alternatives and correction Related concepts Statistical information about the metadata Navigation or drill downs Search refinement Recursive sets Concept linking Dictionary lookup (in taxonomy glossary)
Brand is repeated in several spots and tied to search as well
Data Base Plus Search Workflow [workflow diagram] Sources: raw full-text data feeds; SQL for ecommerce; printed source materials; data crawls on 53+ sources. Stages: XIS creation; add metadata; XIS repository; load to Perfect Search; Search Harmony display; search. Taxonomy components: taxonomy terms; MAI Concept Extractor; Taxonomy Thesaurus Master; MAI Rule Base. Save data to search and repositories at the same time
Data Base Plus Search Workflow [workflow diagram, generic version] Sources: raw full-text data feeds; SQL for ecommerce; printed source materials; data crawls on data sources. Stages: XIS creation; XIS repository; add metadata; load to search; Search Harmony display; search. Taxonomy components: MAI Concept Extractor; MAI Rule Base; Taxonomy Thesaurus Master. Legend: source data; taxonomy terms; search data; clean and enhance data

More Related Content

What's hot

Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015
Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015
Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015Gina Montgomery, V-TSP
 
Information architecture search_bettertogether
Information architecture search_bettertogetherInformation architecture search_bettertogether
Information architecture search_bettertogetherAgnes Molnar
 
SPSBOS -- How your metadata strategy impacts everything you do
SPSBOS -- How your metadata strategy impacts everything you doSPSBOS -- How your metadata strategy impacts everything you do
SPSBOS -- How your metadata strategy impacts everything you doChristian Buckley
 
How your metadata strategy impacts everything you do
How your metadata strategy impacts everything you doHow your metadata strategy impacts everything you do
How your metadata strategy impacts everything you doChristian Buckley
 
Enterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FASTEnterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FASTBert Johnson
 
Introduction To Enterprise Search - OKCSUG 2010
Introduction To Enterprise Search - OKCSUG 2010Introduction To Enterprise Search - OKCSUG 2010
Introduction To Enterprise Search - OKCSUG 2010Corey Roth
 
Metadata management in SharePoint
Metadata management in SharePointMetadata management in SharePoint
Metadata management in SharePointMetataxis
 
OnePlaceMail 6.6 for Outlook and SharePoint Highlights
OnePlaceMail 6.6 for Outlook and SharePoint HighlightsOnePlaceMail 6.6 for Outlook and SharePoint Highlights
OnePlaceMail 6.6 for Outlook and SharePoint HighlightsDavid J Rosenthal
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”VOGIN-academie
 
Enhancing Relevancy & User Experience with #SharePoint Search sps-philly 2015
Enhancing Relevancy & User Experience with #SharePoint Search   sps-philly 2015Enhancing Relevancy & User Experience with #SharePoint Search   sps-philly 2015
Enhancing Relevancy & User Experience with #SharePoint Search sps-philly 2015Gina Montgomery, V-TSP
 
3 25 11 Term Store Best Practices
3 25 11 Term Store Best Practices3 25 11 Term Store Best Practices
3 25 11 Term Store Best Practicespuckmiller3
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic SearchPaul Wlodarczyk
 
BPC10 BuckleyMigration-share
BPC10 BuckleyMigration-shareBPC10 BuckleyMigration-share
BPC10 BuckleyMigration-shareChristian Buckley
 
Taxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information ArchitectureTaxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information ArchitectureAccess Innovations, Inc.
 
Advanced Taxonomy for Content Strategists
Advanced Taxonomy for Content StrategistsAdvanced Taxonomy for Content Strategists
Advanced Taxonomy for Content StrategistsDawn Bovasso
 
SharePoint 2010 Managed Metadata
SharePoint 2010 Managed MetadataSharePoint 2010 Managed Metadata
SharePoint 2010 Managed MetadataNick Hobbs
 

What's hot (20)

Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015
Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015
Enhancing Relevancy & User Experience with SharePoint Search - SPSBMORE 2015
 
Information architecture search_bettertogether
Information architecture search_bettertogetherInformation architecture search_bettertogether
Information architecture search_bettertogether
 
SPSBOS -- How your metadata strategy impacts everything you do
SPSBOS -- How your metadata strategy impacts everything you doSPSBOS -- How your metadata strategy impacts everything you do
SPSBOS -- How your metadata strategy impacts everything you do
 
KMA Taxonomy TBC2010
KMA Taxonomy TBC2010KMA Taxonomy TBC2010
KMA Taxonomy TBC2010
 
How your metadata strategy impacts everything you do
How your metadata strategy impacts everything you doHow your metadata strategy impacts everything you do
How your metadata strategy impacts everything you do
 
Enterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FASTEnterprise Search Using SharePoint 2010 and FAST
Enterprise Search Using SharePoint 2010 and FAST
 
Introduction To Enterprise Search - OKCSUG 2010
Introduction To Enterprise Search - OKCSUG 2010Introduction To Enterprise Search - OKCSUG 2010
Introduction To Enterprise Search - OKCSUG 2010
 
Metadata management in SharePoint
Metadata management in SharePointMetadata management in SharePoint
Metadata management in SharePoint
 
OnePlaceMail 6.6 for Outlook and SharePoint Highlights
OnePlaceMail 6.6 for Outlook and SharePoint HighlightsOnePlaceMail 6.6 for Outlook and SharePoint Highlights
OnePlaceMail 6.6 for Outlook and SharePoint Highlights
 
Share point metadata
Share point metadataShare point metadata
Share point metadata
 
Managed metadata in SharePoint 2010
Managed metadata in SharePoint 2010Managed metadata in SharePoint 2010
Managed metadata in SharePoint 2010
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
 
Enhancing Relevancy & User Experience with #SharePoint Search sps-philly 2015
Enhancing Relevancy & User Experience with #SharePoint Search   sps-philly 2015Enhancing Relevancy & User Experience with #SharePoint Search   sps-philly 2015
Enhancing Relevancy & User Experience with #SharePoint Search sps-philly 2015
 
3 25 11 Term Store Best Practices
3 25 11 Term Store Best Practices3 25 11 Term Store Best Practices
3 25 11 Term Store Best Practices
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
 
BPC10 BuckleyMigration-share
BPC10 BuckleyMigration-shareBPC10 BuckleyMigration-share
BPC10 BuckleyMigration-share
 
Taxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information ArchitectureTaxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information Architecture
 
Tools for Taxonomies
Tools for TaxonomiesTools for Taxonomies
Tools for Taxonomies
 
Advanced Taxonomy for Content Strategists
Advanced Taxonomy for Content StrategistsAdvanced Taxonomy for Content Strategists
Advanced Taxonomy for Content Strategists
 
SharePoint 2010 Managed Metadata
SharePoint 2010 Managed MetadataSharePoint 2010 Managed Metadata
SharePoint 2010 Managed Metadata
 

Viewers also liked

ecdl_windows_8_office_2013_biblia_minta
ecdl_windows_8_office_2013_biblia_mintaecdl_windows_8_office_2013_biblia_minta
ecdl_windows_8_office_2013_biblia_mintaKrist P
 
Yousef Aburub - Resume - Final(1)
Yousef Aburub - Resume - Final(1)Yousef Aburub - Resume - Final(1)
Yousef Aburub - Resume - Final(1)Yousef Aburub
 
Trend 2025 tanyer sonmezer finansbank
Trend 2025 tanyer sonmezer finansbankTrend 2025 tanyer sonmezer finansbank
Trend 2025 tanyer sonmezer finansbankTanyer Sonmezer
 
Instructor powerpoint
Instructor powerpointInstructor powerpoint
Instructor powerpointtanglin
 
Tvigle & Media3 - NOAH13 London
Tvigle & Media3 - NOAH13 LondonTvigle & Media3 - NOAH13 London
Tvigle & Media3 - NOAH13 LondonNOAH Advisors
 
Wonder Woman Feminist Icon 1031 (3)
Wonder Woman Feminist Icon 1031 (3)Wonder Woman Feminist Icon 1031 (3)
Wonder Woman Feminist Icon 1031 (3)Simmons Jessie
 
It’s A Trap! Don't fly into budget airline traps!
It’s A Trap! Don't fly into budget airline traps!It’s A Trap! Don't fly into budget airline traps!
It’s A Trap! Don't fly into budget airline traps!Joyce Lim
 
CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)
CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)
CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)The Linux Foundation
 
الفجوة الرقمية
الفجوة الرقميةالفجوة الرقمية
الفجوة الرقميةimi zeghmati
 
CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...
CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...
CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...The Linux Foundation
 
Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...
Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...
Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...Clotilde Chenevoy
 
XPDS16: CPUID handling for guests - Andrew Cooper, Citrix
XPDS16:  CPUID handling for guests - Andrew Cooper, CitrixXPDS16:  CPUID handling for guests - Andrew Cooper, Citrix
XPDS16: CPUID handling for guests - Andrew Cooper, CitrixThe Linux Foundation
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the TrenchesYan Cui
 
plan educacion_artistica
plan educacion_artisticaplan educacion_artistica
plan educacion_artisticaGermán oña
 

Viewers also liked (18)

Owuor paradigm
Owuor paradigmOwuor paradigm
Owuor paradigm
 
ecdl_windows_8_office_2013_biblia_minta
ecdl_windows_8_office_2013_biblia_mintaecdl_windows_8_office_2013_biblia_minta
ecdl_windows_8_office_2013_biblia_minta
 
Yousef Aburub - Resume - Final(1)
Yousef Aburub - Resume - Final(1)Yousef Aburub - Resume - Final(1)
Yousef Aburub - Resume - Final(1)
 
Trend 2025 tanyer sonmezer finansbank
Trend 2025 tanyer sonmezer finansbankTrend 2025 tanyer sonmezer finansbank
Trend 2025 tanyer sonmezer finansbank
 
Instructor powerpoint
Instructor powerpointInstructor powerpoint
Instructor powerpoint
 
Tvigle & Media3 - NOAH13 London
Tvigle & Media3 - NOAH13 LondonTvigle & Media3 - NOAH13 London
Tvigle & Media3 - NOAH13 London
 
Wonder Woman Feminist Icon 1031 (3)
Wonder Woman Feminist Icon 1031 (3)Wonder Woman Feminist Icon 1031 (3)
Wonder Woman Feminist Icon 1031 (3)
 
Conceptboek Grill & Chill
Conceptboek Grill & ChillConceptboek Grill & Chill
Conceptboek Grill & Chill
 
5 Signs You Are In A Waterfall Agile Transformation
5 Signs You Are In A Waterfall Agile Transformation5 Signs You Are In A Waterfall Agile Transformation
5 Signs You Are In A Waterfall Agile Transformation
 
It’s A Trap! Don't fly into budget airline traps!
It’s A Trap! Don't fly into budget airline traps!It’s A Trap! Don't fly into budget airline traps!
It’s A Trap! Don't fly into budget airline traps!
 
CCNAS Ch01
CCNAS Ch01 CCNAS Ch01
CCNAS Ch01
 
CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)
CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)
CIF16: Knock, Knock: Unikernels Calling! (Richard Mortier, Cambridge University)
 
الفجوة الرقمية
الفجوة الرقميةالفجوة الرقمية
الفجوة الرقمية
 
CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...
CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...
CIF16: Unikernels, Meet Docker! Containing Unikernels (Richard Mortier, Anil ...
 
Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...
Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...
Etude Prêt à porter féminin : Qu'est ce qui incite les femmes à se rendre en ...
 
XPDS16: CPUID handling for guests - Andrew Cooper, Citrix
XPDS16:  CPUID handling for guests - Andrew Cooper, CitrixXPDS16:  CPUID handling for guests - Andrew Cooper, Citrix
XPDS16: CPUID handling for guests - Andrew Cooper, Citrix
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the Trenches
 
plan educacion_artistica
plan educacion_artisticaplan educacion_artistica
plan educacion_artistica
 

Similar to Taxonomies in Search: Leveraging Content Semantically

Public PhD Defense - Ben De Meester
Public PhD Defense - Ben De MeesterPublic PhD Defense - Ben De Meester
Public PhD Defense - Ben De MeesterBen De Meester
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
Keyword searching idc
Keyword searching idcKeyword searching idc
Keyword searching idcSuchittaU
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Alison Hitchens
 
Business Research Methods. search strategies for online databases
Business Research Methods. search strategies for online databasesBusiness Research Methods. search strategies for online databases
Business Research Methods. search strategies for online databasesAhsan Khan Eco (Superior College)
 
E-LEARN: Search Strategies
E-LEARN: Search StrategiesE-LEARN: Search Strategies
E-LEARN: Search StrategiesRose Petralia
 
Tracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracingNetworks
 
The Internet
The InternetThe Internet
The Internetmscuttle
 
Question Answering over Linked Data - Reasoning Issues
Question Answering over Linked Data - Reasoning IssuesQuestion Answering over Linked Data - Reasoning Issues
Question Answering over Linked Data - Reasoning IssuesMichael Petychakis
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabadGeohedrick
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inKumari Naveen
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.pptHaHa501620
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...Enrico Santus Aversano
 
Faceted search using Solr and Ontopia
Faceted search using Solr and OntopiaFaceted search using Solr and Ontopia
Faceted search using Solr and OntopiaGeir Ove Grønmo
 

Similar to Taxonomies in Search: Leveraging Content Semantically (20)

Public PhD Defense - Ben De Meester
Public PhD Defense - Ben De MeesterPublic PhD Defense - Ben De Meester
Public PhD Defense - Ben De Meester
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
Keyword searching idc
Keyword searching idcKeyword searching idc
Keyword searching idc
 
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...Changing Data: Implementing Primo for the Tri University Group of Libraries (...
Changing Data: Implementing Primo for the Tri University Group of Libraries (...
 
Business Research Methods. search strategies for online databases
Business Research Methods. search strategies for online databasesBusiness Research Methods. search strategies for online databases
Business Research Methods. search strategies for online databases
 
Searching techniques
Searching techniquesSearching techniques
Searching techniques
 
Searching techniques
Searching techniquesSearching techniques
Searching techniques
 
E-LEARN: Search Strategies
E-LEARN: Search StrategiesE-LEARN: Search Strategies
E-LEARN: Search Strategies
 
Tracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a Nutshell
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
 
The Internet
The InternetThe Internet
The Internet
 
Question Answering over Linked Data - Reasoning Issues
Question Answering over Linked Data - Reasoning IssuesQuestion Answering over Linked Data - Reasoning Issues
Question Answering over Linked Data - Reasoning Issues
 
2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabad
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful in
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.ppt
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
 
Faceted search using Solr and Ontopia
Faceted search using Solr and OntopiaFaceted search using Solr and Ontopia
Faceted search using Solr and Ontopia
 

More from TSoholt

2011 Taxonomy Standards Update
2011 Taxonomy Standards Update2011 Taxonomy Standards Update
2011 Taxonomy Standards UpdateTSoholt
 
Dealing the Cards
Dealing the CardsDealing the Cards
Dealing the CardsTSoholt
 
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...TSoholt
 
Using KOS as a Basis for Text Analytics and Trend Forecasting
Using KOS as a Basis for Text Analytics and Trend ForecastingUsing KOS as a Basis for Text Analytics and Trend Forecasting
Using KOS as a Basis for Text Analytics and Trend ForecastingTSoholt
 
Solving the Challenge of Connecting People and Author Networks
Solving the Challenge of Connecting People and Author NetworksSolving the Challenge of Connecting People and Author Networks
Solving the Challenge of Connecting People and Author NetworksTSoholt
 
Taxonomies for Publishing: Enhancing the User Experience
Taxonomies for Publishing: Enhancing the User ExperienceTaxonomies for Publishing: Enhancing the User Experience
Taxonomies for Publishing: Enhancing the User ExperienceTSoholt
 

More from TSoholt (6)

2011 Taxonomy Standards Update
2011 Taxonomy Standards Update2011 Taxonomy Standards Update
2011 Taxonomy Standards Update
 
Dealing the Cards
Dealing the CardsDealing the Cards
Dealing the Cards
 
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
Found in Space: Creating and Visualizing IEEE Abstract Space for Publication ...
 
Using KOS as a Basis for Text Analytics and Trend Forecasting
Using KOS as a Basis for Text Analytics and Trend ForecastingUsing KOS as a Basis for Text Analytics and Trend Forecasting
Using KOS as a Basis for Text Analytics and Trend Forecasting
 
Solving the Challenge of Connecting People and Author Networks
Solving the Challenge of Connecting People and Author NetworksSolving the Challenge of Connecting People and Author Networks
Solving the Challenge of Connecting People and Author Networks
 
Taxonomies for Publishing: Enhancing the User Experience
Taxonomies for Publishing: Enhancing the User ExperienceTaxonomies for Publishing: Enhancing the User Experience
Taxonomies for Publishing: Enhancing the User Experience
 


Taxonomies in Search: Leveraging Content Semantically

  • 1. Taxonomies in Search: An SLA Webinar Aug 10, 1:00pm-2:00pm EST Marjorie Hlava, President mhlava@accessinn.com Access Innovations, Inc. www.accessinn.com Leveraging your content semantically
  • 2. Agenda How search works Measuring accuracy in search Precision Recall Relevance Search theoretical basis Bayes, Boole and the rest of the guys The taxonomy effect
  • 3. How does search work? Many parts Search software – of course Computer network Parsing of text Well formed or structured text CLEAN DATA Computer software – network Computer hardware Telecommunications connection Training sets for statistical systems
  • 4. Technical parts of search Search technology Ranking algorithms Query language Federators Cache Inverted index Other enhancements Presentation Layer
  • 5. My Main Frustration Select hardware Select software Design system Try to load the data Add the taxonomy That’s BACKWARDS
  • 6. Data First! What are you building the system for? Assess the data Do the design Decide what else needs to be added Taxonomy terms Other controls Find a system that will work with your data
  • 7. Access Innovations – Complex Farm With Perfect Search Query Federators Query Servers Search Harmony Presentation Layer Deploy Hub Index Builders Cleanup, etc. Repository XIS (cache) Cache Builders Source Data
  • 8. CUSTOM CONNECTOR EMAIL CONNECTOR DATABASE CONNECTOR FILE TRAVERSER WEB CRAWLER MANAGEMENT API QUERY API CONTENT API Data Harmony Governance API SEARCH SERVER FILTERSERVER FAST Search example Core Architectural Components Administrator’s Dashboard Web Content Vertical Applications Pipeline Query Pipeline Files, Documents QUERY PROCESSOR Portals Index DB Databases DOCUMENT PROCESSOR Results Custom Front-Ends Alerts Email, Groupware Search harmony Mobile Devices Custom Applications Content Push MAIstro Agent DB
  • 9. Measuring accuracy in search Relevance Recall Precision Accuracy – Hits, miss, noise Ranking Linguistics Query Processing Results Processing Display Search refinement Usability Business Rules
  • 10. Relevance How well a set of returned documents answers the information need “Accuracy” Related to objective of search Different user communities Information resources Tension of user needs and context available A confidence “guesstimate”
  • 11. The formulas Recall = (Number of relevant items retrieved) / (Number of relevant items in the collection) Precision = (Number of relevant items retrieved) / (Number of items retrieved) Relevance = Germane (Precision) Pertinent (Recall)
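The two ratios on the formulas slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the webinar: the function name `precision_recall` and the document ids are hypothetical, and relevance judgments are assumed to be supplied by a person, as the slides insist.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids of the items the search returned
    relevant:  ids of the items judged relevant in the whole collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 items returned, 3 of them relevant,
# with 6 relevant items in the collection overall.
p, r = precision_recall(
    {"d1", "d2", "d3", "d9"},
    {"d1", "d2", "d3", "d4", "d5", "d6"},
)
print(p, r)
```

In the vocabulary of the accuracy slide, the items in `relevant` that were not retrieved are the misses, and the items in `retrieved` that are not relevant are the noise.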
  • 12. Measuring Relevance Concepts Context Age of documents Completeness (recall) Quality Statistically determined? Nope, it is subjective Someone has to determine the rightness of the item A confidence factor = canard!
  • 13. Kinds of search Bayesian – FAST Lucene Autonomy / Verity Boolean Dialog Endeca Perfect Search Ranking algorithms Google
  • 14. Search Theoretical Basis: Those Famous Guys Boole Bayes Bayesian Techniques Turney Turney algorithm Enriched structured data Marco Dorigo Ant Colony This is only a sample of a large body of research
  • 15. George Boole and Boolean algebra George Boole Mathematician 1815-1864 Boolean algebra An algebraic system of logic AND, OR, NOT, ANDNOT, Dialog, BRS, Stairs
  • 16. Boolean representation Venn diagram showing the intersection of sets A AND B (in violet), the union of sets A OR B (all the colored regions), and set A XOR B (all the colored regions except the violet). The "universe" is represented by the rectangular frame.
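The regions of the Venn diagram map directly onto set operations. The following is a small sketch with made-up document ids: `a` and `b` stand for the result sets of two hypothetical search terms, and `universe` stands for the rectangular frame.

```python
# Hypothetical posting sets: ids of documents containing term A and term B.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
universe = set(range(1, 9))  # every document in the collection

a_and_b = a & b        # intersection: documents containing both terms
a_or_b = a | b         # union: documents containing either term
a_xor_b = a ^ b        # exactly one of the two terms
a_andnot_b = a - b     # A ANDNOT B: term A without term B
not_a = universe - a   # NOT A only makes sense relative to the universe

print(sorted(a_and_b), sorted(a_or_b), sorted(a_xor_b))
```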
  • 17. Bayes and Bayes’ Theorem Thomas Bayes Mathematician 1702 - 1761 Bayesian theorem Uses probability inductively Established a mathematical basis for probability inference WHAT? A means of calculating, from the number of times an event has not occurred, the probability that it will occur in future trials
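As a rough illustration of how probability inference can be applied to search, the sketch below uses Bayes' theorem to update the probability that a document is relevant once a query term is seen in it. The function name and all of the probabilities are invented for the example; it does not describe how FAST, Lucene, or any other product named in this deck works.

```python
def posterior_relevance(prior, p_term_given_relevant, p_term_given_irrelevant):
    """P(relevant | term) by Bayes' theorem.

    prior: P(relevant) before looking at the document
    p_term_given_relevant: P(term appears | document is relevant)
    p_term_given_irrelevant: P(term appears | document is not relevant)
    """
    evidence = (p_term_given_relevant * prior
                + p_term_given_irrelevant * (1 - prior))  # P(term appears)
    return p_term_given_relevant * prior / evidence


# Same evidence, two different prior beliefs (hypothetical numbers).
cautious = posterior_relevance(0.10, 0.80, 0.20)
optimistic = posterior_relevance(0.50, 0.80, 0.20)
print(round(cautious, 3), round(optimistic, 3))
```

The two calls differ only in the prior, which is one way to see why the quality of the prior beliefs matters so much to the result.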
  • 18. Bayesian methods - Cautions A user might wish to change the distribution of probabilities. A user will make a novel request for information in a previously unanticipated way. The computational difficulty of exploring a previously unknown network. The quality and extent of the prior beliefs used in Bayesian inference processing.
  • 19. Bayesian cautions (cont.) A Bayesian network is only as useful as the prior knowledge is reliable. An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results. Must ensure the selection of the statistical distribution induced in modeling the data. Must have the proper distribution model to describe the data. That is, you have to constantly train and retrain the data
  • 20. Peter Turney and the Turney Algorithm Peter D. Turney, Canada, present Learning algorithms for keyphrase extraction Tree Induction Algorithm Lexical Semantics GenEx – with human input 80% acceptable Extraction vs. generation and sentiment of words log2( (hits(word AND "excellent") × hits("poor")) / (hits(word AND "poor") × hits("excellent")) )
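The log-ratio on the Turney slide can be computed directly from four hit counts. The sketch below is an illustration under stated assumptions: the function name, the hit counts, and the `smoothing` parameter are all hypothetical. The small constant is added to the co-occurrence counts only to avoid dividing by zero and is this example's choice, not something stated on the slide.

```python
import math


def semantic_orientation(hits_with_excellent, hits_with_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """log2 ratio from the slide, given search-engine hit counts.

    hits_with_excellent: hits(word AND "excellent")
    hits_with_poor:      hits(word AND "poor")
    hits_excellent:      hits("excellent"), must be > 0
    hits_poor:           hits("poor"), must be > 0
    smoothing:           assumed constant guarding against zero counts
    """
    numerator = (hits_with_excellent + smoothing) * hits_poor
    denominator = (hits_with_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical counts: the word co-occurs with "excellent" twice as often
# as with "poor", and the two reference words are equally common.
score = semantic_orientation(200, 100, 1000, 1000, smoothing=0.0)
print(score)
```

A positive score suggests the word leans toward "excellent", a negative score toward "poor", and a score near zero suggests no clear leaning.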
  • 21. Marco Dorigo and Ant Colony Optimization Marco Dorigo Research director for the Belgian Fonds de la Recherche Scientifique Research director of the IRIDIA lab at the Université Libre de Bruxelles Ant Colony Optimization metaheuristic for combinatorial optimization problems Swarm intelligence Value importance vs. heuristic importance Useful in search prediction
  • 22. Natural Language Processing Syntactic Semantic Morphological Phraseological Lemmatization (stemming) Statistical Grammatical Common Sense
  • 23. Basic areas of Automatic Language Processing (ALP) Auto Translation Auto Indexing Auto Abstracting Artificial Intelligence Searching Spell Checking Semantic Web Natural Language Processes (NLP) Computational Linguistics
  • 24. Statistical Search Cluster analysis Neural networks Co-occurrence Bayesian inference Latent Semantic Etc.
  • 25. Inverted Files and Boolean are basic to all search Searchable Index Inverted File Index Taxonomy Thesaurus Hierarchical Display
  • 28. Complex Inverted File Index Example 1: key - L2, P2, H; of - Stop; outline - L1, P1, T; presentation - L1, P3, T; terminology - L2, P3, H; thesaurus - (1) L3, P1, H, (2) L7, P1, SH, (3) L8, P1, SH; tools - (1) L3, P2, H, (2) L8, P2, SH; when - L9, P3, H; why - L9, P1, H; & - Stop; 1 - Stop; 2 - Stop; 3 - Stop; 4 - Stop; construction - L7, P2, SH; costs - L6, P1, H; define - L2, P1, H; features - L4, P1, SH; functions - L5, P1, SH
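An index of the kind shown on this slide can be built with a short loop. The sketch below assumes that L is the line number, P is the word position within the line, and the trailing code (T, H, SH) names the field the line belongs to, such as title, heading, or subheading; the slide does not spell these out. The three sample lines are reconstructed from the postings and are hypothetical.

```python
from collections import defaultdict

# Words the slide marks as "Stop"; they are skipped but still count as positions.
STOP_WORDS = {"of", "&", "1", "2", "3", "4"}


def build_index(lines):
    """Map each non-stop word to a list of (line, position, field) postings.

    lines: list of (text, field) pairs, in document order
    """
    index = defaultdict(list)
    for line_no, (text, field) in enumerate(lines, start=1):
        for position, word in enumerate(text.lower().split(), start=1):
            if word not in STOP_WORDS:
                index[word].append((line_no, position, field))
    return dict(index)


document = [
    ("Outline of presentation", "T"),
    ("Define key terminology", "H"),
    ("Thesaurus tools", "H"),
]
index = build_index(document)
for word in sorted(index):
    print(word, index[word])
```

Keeping the position and field alongside the line number is what lets a search engine support phrase and proximity queries, or weight a hit in a title above a hit in a subheading.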
  • 29. Word and Term Parsing Stemming -ing, -ed, -es, -’s, -s’, etc. Depluralization Truncation Left and right Wild cards Organi*ation Variant Spellings Centre, center Hyphens
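Two of the parsing steps listed on this slide, suffix stripping and wildcard matching, can be sketched with the standard library. The suffix list comes from the slide; the function names, the minimum stem length of three characters, and the sample vocabulary are assumptions made for the example. A production stemmer handles far more cases than this.

```python
from fnmatch import fnmatchcase

# Suffixes from the slide, longest first so "es" is tried before "s".
SUFFIXES = ("ing", "ed", "es", "'s", "s'", "s")


def naive_stem(word):
    """Strip one listed suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_terms(pattern, vocabulary):
    """Return vocabulary terms matching a pattern where * matches any run."""
    pattern = pattern.lower()
    return [term for term in vocabulary if fnmatchcase(term.lower(), pattern)]


vocabulary = ["organisation", "organization", "organic", "centre", "center"]
print(naive_stem("indexing"), naive_stem("indexed"), naive_stem("indexes"))
print(match_terms("organi*ation", vocabulary))  # internal wild card
print(match_terms("organ*", vocabulary))        # right truncation
print(match_terms("*ter", vocabulary))          # left truncation
```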
  • 30. The taxonomy effect Where do the terms go? How are they used in search What other ways can I use the taxonomy in search?
  • 31. Site search Search of 53 crawled sites including journals, books, web site, conference sites, etc. Navigation Bookstore search Search database for Journals and pubs For search all publications
  • 32. Navigate the full taxonomy “tree” BROWSE Auto-completion using the taxonomy Guide the user Taxonomy Driven Search Presentation
  • 41. Where does the subject metadata go? Apply to content itself Use meta name field in HTML header Connect search to the keywords in the SQL or other database tables
  • 46. Integrate taxonomy to enhance findability Browsable categories of a directory Browsable faceted navigation Smart search for term equivalents Taxonomy terms (original or modified) as labels Navigation aids incorporate taxonomy terms and relationships
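One way to picture "smart search for term equivalents" is query expansion against a thesaurus. The sketch below uses an invented two-entry thesaurus and a hypothetical `expand_query` function; it does not reflect the data format of Data Harmony, Thesaurus Master, or any other product named in this deck.

```python
# Hypothetical thesaurus: preferred term -> equivalent and narrower terms.
THESAURUS = {
    "automobiles": {
        "synonyms": ["cars", "motor vehicles"],
        "narrower": ["electric vehicles"],
    },
    "taxonomies": {
        "synonyms": ["classification schemes"],
        "narrower": [],
    },
}


def expand_query(term, thesaurus, include_narrower=False):
    """Return the preferred term plus its equivalents for a user's term.

    Unknown terms are passed through unchanged as a one-item list.
    """
    term = term.lower()
    for preferred, entry in thesaurus.items():
        equivalents = [preferred, *entry.get("synonyms", [])]
        if term in equivalents:
            expanded = list(equivalents)
            if include_narrower:
                expanded += entry.get("narrower", [])
            return expanded
    return [term]


print(expand_query("cars", THESAURUS))
print(expand_query("cars", THESAURUS, include_narrower=True))
print(expand_query("widgets", THESAURUS))
```

The expanded list can then be ORed together in the query, so a user who types a non-preferred term still reaches content tagged with the preferred one.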
  • 47. More Taxonomy Enrichment Spelling alternatives and correction Related concepts Statistical information about the metadata Navigation or drill downs Search refinement Recursive sets Concept linking Dictionary lookup (in taxonomy glossary)
  • 48. Brand is repeated in several spots and tied to search as well
  • 49. Raw Full text data feeds Data Base Plus Search Workflow XIS Creation SQL for ecommerce Printed source materials Add metadata Data Crawls on 53+ sources XIS repository Taxonomy terms Load to Perfect Search MAI Concept Extractor Taxonomy Thesaurus Master MAI Rule Base Search Harmony Display Search Save data to search and repositories at the same time
  • 50. Raw Full text data feeds Data Base Plus Search Workflow XIS Creation SQL for ecommerce Printed source materials XIS repository Data Crawls on data sources Add metadata Load to Search MAI Concept Extractor MAI Rule Base Search Harmony Display Search Taxonomy Thesaurus Master Source data Taxonomy terms Search data Clean and enhance data
  • 51. Client Data Full Text HTML, PDF, Data Feeds, etc. Taxonomy In Sharepoint Automatic Summarization Search Presentation:90% accuracy Browse by Subject Auto-completion Broader Terms Narrower Terms Related Terms Machine Aided Indexer (M.A.I.™) Repository Search Software Inline Tagging Client taxonomy Client Taxonomy Metadata and Entity Extractor Thesaurus Master
  • 52. What we covered How search works Measuring accuracy in search Search theoretical basis Bayes, Boole and the rest of the guys The taxonomy effect
  • 53. Do the data FIRST What do you have? What does it need? How would you LIKE to access it? Look at the data BEFORE you create the specifications DTD built without data is not going to work Then choose the system that will support your data
  • 54. Next Month Same time, same station Solving the Challenge of Connecting People and Author Networks Jay Ven Eman, Ph.D. September 14 As online digital publishing continues to grow, taxonomies can be increasingly useful in connecting people with author networks through directory creation with author disambiguation and subject metadata tagging to increase the usefulness of information for researchers and community-building.
  • 56. Headquartered in Albuquerque