GammaWare Technology, June 2002
Yiftach Ravid, VP R&D, GammaSite Inc.
[email_address]
Overview
- The challenge
- Taxonomies
- Classification
- Focused Crawler
- Q&A
- Categorization
The challenge: Generate structured taxonomies of text repositories [Diagram: XML, Word, Domino, Web, Catalogues, Forms, Mail, Internal DB; Structured Data; Unstructured Data; Business, Relevant Content; Information Application Services]
Taxonomy
What is a Taxonomy
Web Taxonomy
Taxonomy - Sample
Taxonomy vs. Thesaurus
Criteria  | Taxonomy                          | Thesaurus
Focus     | Documents and their organization  | Terms used in the organization
Retrieval | Mainly browsing                   | Keyword queries
Size      | Restricted to the necessary terms | Very large (terms may be added freely)
Classification
What is a Classifier
Methods for Automatic Classification
What is Machine Learning
Sample for Machine Learning [Images: Dogs, Cats]
Discriminating Features: Q1: Who is this person? Q2: What are the most discriminating features?
Discriminating Features
Discriminating Features: The “Margaret Thatcher effect”
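The slides ask which features discriminate best between classes, but their own answer is not recoverable. As an illustration only, this sketch ranks the words of two document sets by an add-one-smoothed log-odds ratio of document frequency, one common way to score how discriminating a feature is. The function names and toy documents are hypothetical, and this is not necessarily the measure GammaWare used.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def doc_freq(docs: Iterable[str]) -> Counter:
    """Count how many documents contain each (lowercased) word."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts


def discriminating_features(pos: List[str], neg: List[str],
                            top: int = 3) -> List[Tuple[str, float]]:
    """Rank words by |log-odds(word in pos doc) - log-odds(word in neg doc)|.

    Add-one smoothing keeps words seen in only one class at a finite score.
    Ties are broken alphabetically so the output is deterministic.
    """
    df_pos, df_neg = doc_freq(pos), doc_freq(neg)
    n_pos, n_neg = len(pos), len(neg)
    scores = {}
    for word in set(df_pos) | set(df_neg):
        p = (df_pos[word] + 1) / (n_pos + 2)
        q = (df_neg[word] + 1) / (n_neg + 2)
        scores[word] = abs(math.log(p / (1 - p)) - math.log(q / (1 - q)))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # Hypothetical toy documents for two classes.
    tea_docs = ["herbal tea recipe", "green tea leaves", "tea brewing tips"]
    family_docs = ["new born baby", "baby sleep tips", "born in june"]
    for word, score in discriminating_features(tea_docs, family_docs):
        print(f"{word}: {score:.2f}")
```

A word such as "tips", which occurs equally often in both sets, scores zero: it says nothing about the class.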
Supervised Inductive Learning
Supervised Inductive Learning Example [Diagram: Training, Test; errors, correct]
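Here is a minimal sketch of supervised inductive learning, assuming a single numeric feature. A rule is induced from labeled training examples and then applied to held-out test examples, and each test prediction is counted as correct or as an error. The decision-stump learner, the function names, and the toy dog/cat weights are illustrative assumptions and are not taken from the slides.

```python
from typing import List, Tuple

Example = Tuple[float, str]    # (feature value, class label)
Rule = Tuple[float, str, str]  # (threshold, label below, label at/above)


def train_stump(training: List[Example]) -> Rule:
    """Induce a one-feature threshold rule from labeled training examples.

    Every observed value is tried as a threshold, with every pair of labels on
    either side; the rule with the fewest training errors wins (first on ties).
    """
    labels = sorted({label for _, label in training})
    best = None
    for threshold, _ in training:
        for below in labels:
            for above in labels:
                errors = sum(
                    1 for x, y in training
                    if (below if x < threshold else above) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def predict(rule: Rule, x: float) -> str:
    threshold, below, above = rule
    return below if x < threshold else above


def evaluate(rule: Rule, test: List[Example]) -> Tuple[int, int]:
    """Apply the induced rule to unseen test examples; return (correct, errors)."""
    correct = sum(1 for x, y in test if predict(rule, x) == y)
    return correct, len(test) - correct


if __name__ == "__main__":
    # Hypothetical toy data: body weight in kg, labeled dog or cat.
    training = [(4.0, "cat"), (3.5, "cat"), (5.0, "cat"),
                (20.0, "dog"), (30.0, "dog"), (12.0, "dog")]
    test = [(4.5, "cat"), (25.0, "dog"), (8.0, "dog"), (6.0, "cat")]
    rule = train_stump(training)
    print(rule, evaluate(rule, test))  # (12.0, 'cat', 'dog') (3, 1)
```

The rule fits the training set perfectly, yet the 8 kg dog in the test set is an error. That gap is why a classifier is judged on test data it was not trained on.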
Evaluating a Classifier [Diagram: Category, Classifier]
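Recall and Precision
Use a confusion matrix to count. Rows give the classifier's output (Good/Bad), columns give the true label (Yes/No):

                     True label
                     Yes     No    Total
Classified  Good      70     50     120
            Bad       30    150     180
            Total    100    200     300

Notation: GY = classified Good, true label Yes (true positive); GN = classified Good, true label No (false positive); BY = classified Bad, true label Yes (false negative); BN = classified Bad, true label No (true negative).

Precision: $P = \frac{GY}{GY+GN} = \frac{70}{70+50} \approx 0.58$
Recall: $R = \frac{GY}{GY+BY} = \frac{70}{70+30} = 0.70$
F-measure: $F = \frac{2}{1/P + 1/R} = \frac{2\,GY}{(GY+GN)+(GY+BY)} = \frac{2 \cdot 70}{120+100} \approx 0.64$
Accuracy: $A = \frac{GY+BN}{GY+GN+BY+BN} = \frac{70+150}{300} \approx 0.73$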
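The metric definitions can be checked with a few lines of Python. This sketch assumes the slide's notation, in which the first letter is the classifier's output (G = Good, B = Bad) and the second is the true label (Y = Yes, N = No). The class and method names are hypothetical. With GY=70, GN=50, BY=30 and BN=150, it gives P = 70/120, R = 70/100, A = 220/300, and F = 140/220 ≈ 0.636.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts in the slide's notation: classifier output, then true label."""
    gy: int  # classified Good, true label Yes (true positive)
    gn: int  # classified Good, true label No (false positive)
    by: int  # classified Bad, true label Yes (false negative)
    bn: int  # classified Bad, true label No (true negative)

    def precision(self) -> float:
        # Of everything classified Good, how much is truly Yes?
        return self.gy / (self.gy + self.gn)

    def recall(self) -> float:
        # Of everything truly Yes, how much was classified Good?
        return self.gy / (self.gy + self.by)

    def f_measure(self) -> float:
        # Harmonic mean of P and R, equal to 2*GY / (2*GY + GN + BY).
        p, r = self.precision(), self.recall()
        return 2 / (1 / p + 1 / r)

    def accuracy(self) -> float:
        total = self.gy + self.gn + self.by + self.bn
        return (self.gy + self.bn) / total


if __name__ == "__main__":
    cm = ConfusionMatrix(gy=70, gn=50, by=30, bn=150)
    print(f"P={cm.precision():.2f} R={cm.recall():.2f} "
          f"F={cm.f_measure():.2f} A={cm.accuracy():.2f}")
```

Running it prints `P=0.58 R=0.70 F=0.64 A=0.73`. The sketch does not guard against empty denominators, such as a classifier that never outputs Good.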
Supervised Statistical Machine Learning
How to Classify Documents
Getting Started
GammaWare Work Flow [Diagram: Requirements → Design the Taxonomy → Seeding Process → Check Seed → Train Classifiers → Catalogue Documents → Improve Classifiers → Ready]
Requirements
Taxonomy
Seeding Process
Check Seed
Train Classifiers
Classify Documents
Improve Classifiers
Categorization
Hierarchical Categorization
Hierarchical Categorization
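The slides do not preserve the details of GammaWare's hierarchical categorization. One common scheme is top-down routing. Each node in the taxonomy has its own classifier, and a document descends from the root by following the best-scoring child until no child is confident enough. This sketch assumes that scheme. `Category`, `keyword_scorer` (a toy stand-in for a trained classifier), and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Category:
    name: str
    # Hypothetical per-node classifier returning a score in [0, 1].
    score: Callable[[str], float] = lambda doc: 1.0
    children: List["Category"] = field(default_factory=list)


def keyword_scorer(*keywords: str) -> Callable[[str], float]:
    """Toy stand-in for a trained classifier: fraction of keywords present."""
    def score(doc: str) -> float:
        words = set(doc.lower().split())
        return sum(k in words for k in keywords) / len(keywords)
    return score


def categorize(root: Category, doc: str, threshold: float = 0.5) -> List[str]:
    """Descend from the root, following the best-scoring child at each level.

    Stops when no child scores at or above the threshold, so a document can be
    filed at an inner node instead of being forced into a leaf.
    """
    path = [root.name]
    node = root
    while node.children:
        best = max(node.children, key=lambda c: c.score(doc))
        if best.score(doc) < threshold:
            break
        path.append(best.name)
        node = best
    return path


if __name__ == "__main__":
    root = Category("Root", children=[
        Category("Health", keyword_scorer("health", "herbal", "tea"), children=[
            Category("Herbal Remedies", keyword_scorer("herbal", "tea")),
            Category("Fitness", keyword_scorer("exercise", "gym")),
        ]),
        Category("Family", keyword_scorer("baby", "born", "child")),
    ])
    print(categorize(root, "herbal tea for better sleep"))
```

Routing top-down means that only the classifiers along one path are evaluated for each document. The trade-off is that a wrong decision near the root cannot be corrected lower down.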
Focused Crawler
Topic Specific Crawling
Simple Crawling [Diagram: Starting Document]
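A simple crawler starts from one document and follows every link it finds, with no notion of relevance. This breadth-first sketch uses a hypothetical in-memory link graph (`links`) in place of real HTTP fetching and HTML parsing so that it can run offline. The function name and graph are illustrative.

```python
from collections import deque
from typing import Dict, List


def simple_crawl(links: Dict[str, List[str]], start: str,
                 max_pages: int = 100) -> List[str]:
    """Breadth-first crawl from a starting document, following every link.

    `links[url]` stands in for fetching `url` and extracting its outgoing URLs.
    Returns the pages in the order they were visited.
    """
    visited: List[str] = []
    seen = {start}              # URLs already queued, to avoid revisits
    frontier = deque([start])   # FIFO queue gives breadth-first order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                frontier.append(out)
    return visited


if __name__ == "__main__":
    web = {"start": ["a", "b"], "a": ["c", "start"], "b": ["c", "d"],
           "c": [], "d": ["e"]}
    print(simple_crawl(web, "start"))  # ['start', 'a', 'b', 'c', 'd', 'e']
```

Everything reachable gets fetched, which is what makes unfocused crawling expensive for topic-specific collections.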
Focused Crawling via Link Classifiers [Diagram: a Link Classifier judges each link before it is fetched; for the link “My brother's newborn child” it decides “Link is irrelevant”, and for the link “Herbal tea specialist” it decides “Retrieve the URL”]
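A focused crawler differs from a simple one because a link classifier scores each link before its URL is retrieved. Here the score is computed from the anchor text. This best-first sketch assumes a hypothetical in-memory graph of (URL, anchor text) pairs and uses a toy keyword rule, `tea_link_score`, as a stand-in for a trained link classifier. Neither is GammaWare's API.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]  # (target URL, anchor text)


def focused_crawl(links: Dict[str, List[Link]], start: str,
                  link_score: Callable[[str], float],
                  threshold: float = 0.5, max_pages: int = 100) -> List[str]:
    """Best-first crawl that consults a link classifier before fetching.

    Links scoring below `threshold` are treated as irrelevant and never
    retrieved; the rest are fetched highest score first.
    """
    visited: List[str] = []
    queued = {start}
    frontier = [(-1.0, start)]  # heapq is a min-heap, so scores are negated
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for target, anchor in links.get(url, []):
            if target in queued:
                continue
            score = link_score(anchor)
            if score < threshold:
                continue  # link is irrelevant: skip without fetching
            queued.add(target)
            heapq.heappush(frontier, (-score, target))
    return visited


def tea_link_score(anchor: str) -> float:
    """Hypothetical keyword rule standing in for a trained link classifier."""
    return 1.0 if "tea" in anchor.lower().split() else 0.0


if __name__ == "__main__":
    web = {
        "home": [("family-page", "My brother's new born child"),
                 ("tea-expert", "Herbal tea specialist")],
        "tea-expert": [("tea-shop", "Green tea shop"), ("car-hire", "Car rentals")],
        "family-page": [("tea-shop", "Tea for new parents")],
    }
    print(focused_crawl(web, "home", tea_link_score))
```

The page behind “My brother's new born child” is never fetched, so the crawler saves bandwidth on off-topic branches. The cost is that anything reachable only through such a link is missed.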
Focused Crawler: The Learning Process [Diagram: Herbal tea specialist → Link Classifier → Crawler: retrieve the content of the link → Classifier: send acknowledgment to the “link classifier” (learning process)]
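The learning loop on this slide can be sketched as online training of the link classifier. After the crawler retrieves a page, the page classifier's verdict is sent back as an acknowledgment and used as the label for the link that led there. The slides do not say which learner GammaWare used. This sketch assumes a perceptron over anchor-text words, and the class and method names are hypothetical.

```python
from typing import Dict, Set


class OnlineLinkClassifier:
    """Perceptron over anchor-text words, trained online from crawler feedback."""

    def __init__(self, learning_rate: float = 1.0) -> None:
        self.weights: Dict[str, float] = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    @staticmethod
    def _features(anchor: str) -> Set[str]:
        return set(anchor.lower().split())

    def score(self, anchor: str) -> float:
        return self.bias + sum(self.weights.get(w, 0.0)
                               for w in self._features(anchor))

    def predict(self, anchor: str) -> bool:
        """True means "retrieve the URL", False means "link is irrelevant"."""
        return self.score(anchor) > 0

    def acknowledge(self, anchor: str, page_was_relevant: bool) -> None:
        """Feedback from the page classifier about a retrieved page.

        Perceptron rule: weights change only when the link prediction was wrong.
        """
        if self.predict(anchor) == page_was_relevant:
            return
        step = self.learning_rate if page_was_relevant else -self.learning_rate
        for w in self._features(anchor):
            self.weights[w] = self.weights.get(w, 0.0) + step
        self.bias += step


if __name__ == "__main__":
    clf = OnlineLinkClassifier()
    clf.acknowledge("Herbal tea specialist", True)
    clf.acknowledge("My brother's new born child", False)
    print(clf.predict("Green tea shop"), clf.predict("My brother's new born child"))
```

Because a perceptron updates only on mistakes, each acknowledgment is cheap. In a real crawler, feedback arrives only for links that were actually retrieved, so some exploration of low-scoring links may be needed to correct links that were wrongly rejected.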
GammaWare API
Architecture: Basic [Diagram: Customer Client; GammaWare API; CORBA; GammaWare Proxy; Proxy Client; GammaWare Software; ODBC; Relational Database; GW File System; sources: File System, Document Management, Web, Notes Domino, Outlook]
Multiple Servers [Diagram: GammaWare Proxy Client; GammaWare Proxy; GammaWare Server, GammaWare Server 2, GammaWare Server 3, GammaWare Server 4; Database]
Q & A
Editor's Notes

  1. If this presentation is independent