SlideShare a Scribd company logo
1 of 17
Download to read offline
[ RMLL 2013, Bruxelles – Thursday 11th
July 2013 ]
Presentation of OpenNLP
Presenter : Dr Ir Robert Viseur
2
What is OpenNLP ?
• Toolkit for the processing of natural language text.
• Project of the Apache Foundation.
• Developped in Java.
• Under Apache License, Version 2.
• Download and documentation:
http://opennlp.apache.org/.
3
What are the features ?
• For common NLP tasks :
• tokenization,
• sentence segmentation,
• part-of-speech tagging,
• named entity extraction,
• chuncking.
4
What is the part-of-speech tagging ?
• Example :
• See more:
http://opennlp.apache.org/documentation/1.5.3
/manual/opennlp.html.
5
What is the named entity
extraction ?
• Example :
• See more:
http://opennlp.apache.org/documentation/1.5.3
/manual/opennlp.html.
6
How does it work ? (1/2)
• The features are associated to pre-trained models.
• Each pre-trained model is created for one language
and for one type of use.
• Supported languages: da, de, en, es, nl, pt, se.
• Warnings :
– The functional coverage varies with languages.
– The french language is not supported !
• See http://opennlp.sourceforge.net/models-
1.5/.
• Use in command line or as a Java library.
• Warning : loading time of models with CLI.
7
How does it work ? (2/2)
• Example (English vs Spanish languages) :
8
What are the criteria of choice ?
• Support of the product.
• License.
• Available languages.
• Precision / Recall.
• Speed of text processing.
9
Are there free (as freedom)
alternative tools ?
• Other light tools :
• Stanford Log-linear Part-Of-Speech Tagger (POST),
• Stanford Named Entity Recognizer (NER),
• TagEN,
• Java Automatic Term Extraction toolkit.
• Frameworks :
• In Java : UIMA (Java), GATE (Java).
• In other languages : NLTK (Python).
10
Example:
tag cloud creation (1/6)
• Starting point: website.
• Example: www.adacore.com.
• What we want (from website content):
• common tag cloud,
• circular tag cloud.
• Main steps : crawl, cleaning of HTML documents,
named entities (person) and terminology
extractions (+ merge) and display (tag cloud).
11
Example:
tag cloud creation (2/6)
• Cleaning:
• Remove the HTML tags and keep only the useful
content.
• Warnings:
• NLP tools are sensitive to noise in raw data.
• Pay attention to the language of the document.
• Use of HTML boilerplate tool (HTML -> TXT).
• Tool: Boilerpipe.
• See http://code.google.com/p/boilerpipe/.
• Next: normalization of the text.
12
Example:
tag cloud creation (3/6)
• Named entities extraction.
• Standard in OpenNLP : OpenNLP adds tags in text.
• Here : extraction of Person NE.
• Terminology extraction.
• First : part-of-speech tagging (POST).
• Next : identification et filtering (threshold) of :
• collocations (i.e: Name_Name, Adjective_Name,...),
• proper names (often: brands or people).
13
Example:
tag cloud creation (4/6)
• Process :
Raw HTML
document
---- --- -- ----.
--- -- -- -- ----
--- -- ----.
---- --- -- ----.
--- -- -- -- ----
--- -- ----.
_--- _-- _-- _
_---- _--.
_--- _-- _-- _--
_____
_____
_____
Conversion
to text
Normalization
POS
tagging
_____
_____
_____
Terminology
extraction
NE extraction
Tag cloud
(for a website)
Website
(Internet)
Website
(local)
Crawl
Tags
Merge
14
Example:
tag cloud creation (5/6)
• Result: common tag cloud.
15
Example:
tag cloud creation (6/6)
• Result: circular tag cloud.
16
Thanks for your attention.
Any questions ?
17
Contact
Dr Ir Robert Viseur
Email (@CETIC) : robert.viseur@cetic.be
Email (@UMONS) : robert.viseur@umons.ac.be
Phone : 0032 (0) 479 66 08 76
Website : www.robertviseur.be
This presentation is covered by « CC-BY-ND » license.

More Related Content

What's hot

Opinion Mining
Opinion MiningOpinion Mining
Opinion Mining
Ali Habeeb
 

What's hot (20)

An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Machine learning
Machine learningMachine learning
Machine learning
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Natural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLPNatural Language Processing: Comparing NLTK and OpenNLP
Natural Language Processing: Comparing NLTK and OpenNLP
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
NLTK
NLTKNLTK
NLTK
 
Comparative studies on detecting abusive language on twitter
Comparative studies on detecting abusive language on twitterComparative studies on detecting abusive language on twitter
Comparative studies on detecting abusive language on twitter
 
Introduction to NP Completeness
Introduction to NP CompletenessIntroduction to NP Completeness
Introduction to NP Completeness
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
bag-of-words models
bag-of-words models bag-of-words models
bag-of-words models
 
Opinion Mining
Opinion MiningOpinion Mining
Opinion Mining
 
BERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil KumarBERT - Part 1 Learning Notes of Senthil Kumar
BERT - Part 1 Learning Notes of Senthil Kumar
 
BERT
BERTBERT
BERT
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
Greedy algorithms
Greedy algorithmsGreedy algorithms
Greedy algorithms
 
A Big Data Timeline
A Big Data TimelineA Big Data Timeline
A Big Data Timeline
 

Similar to Presentation of OpenNLP

Similar to Presentation of OpenNLP (20)

Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
Python presentation of Government Engineering College Aurangabad, Bihar
Python presentation of Government Engineering College Aurangabad, BiharPython presentation of Government Engineering College Aurangabad, Bihar
Python presentation of Government Engineering College Aurangabad, Bihar
 
01 html-introduction
01 html-introduction01 html-introduction
01 html-introduction
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache Stanbol
 
Its2 ontology-localization
Its2 ontology-localizationIts2 ontology-localization
Its2 ontology-localization
 
Building OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web toolsBuilding OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web tools
 
Aspects of NLP Practice
Aspects of NLP PracticeAspects of NLP Practice
Aspects of NLP Practice
 
Lecture semantic augmentation
Lecture semantic augmentationLecture semantic augmentation
Lecture semantic augmentation
 
Medical Heritage Library (MHL) on ArchiveSpark
Medical Heritage Library (MHL) on ArchiveSparkMedical Heritage Library (MHL) on ArchiveSpark
Medical Heritage Library (MHL) on ArchiveSpark
 
Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...
 
Apache cTAKES - NLP in Healthcare
Apache cTAKES - NLP in HealthcareApache cTAKES - NLP in Healthcare
Apache cTAKES - NLP in Healthcare
 
Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101
 
Doctrine Project
Doctrine ProjectDoctrine Project
Doctrine Project
 
How to Write the Fastest JSON Parser/Writer in the World
How to Write the Fastest JSON Parser/Writer in the WorldHow to Write the Fastest JSON Parser/Writer in the World
How to Write the Fastest JSON Parser/Writer in the World
 
The State of #NLProc
The State of #NLProcThe State of #NLProc
The State of #NLProc
 
Apache cTAKES- NLP in Healthcare
Apache cTAKES- NLP in HealthcareApache cTAKES- NLP in Healthcare
Apache cTAKES- NLP in Healthcare
 
Approaches to document/report generation
Approaches to document/report generation Approaches to document/report generation
Approaches to document/report generation
 
Basics of python
Basics of pythonBasics of python
Basics of python
 
OpenTelemetry 101 FTW
OpenTelemetry 101 FTWOpenTelemetry 101 FTW
OpenTelemetry 101 FTW
 

More from Robert Viseur

L'open hardware dans l'électronique (et au delà...)
L'open hardware dans l'électronique (et au delà...)L'open hardware dans l'électronique (et au delà...)
L'open hardware dans l'électronique (et au delà...)
Robert Viseur
 
Open Source Hardware for Dummies
Open Source Hardware for DummiesOpen Source Hardware for Dummies
Open Source Hardware for Dummies
Robert Viseur
 
Comment gérer le risque de lock-in technique en cas d'usage de services de cl...
Comment gérer le risque de lock-in technique en cas d'usage de services de cl...Comment gérer le risque de lock-in technique en cas d'usage de services de cl...
Comment gérer le risque de lock-in technique en cas d'usage de services de cl...
Robert Viseur
 

More from Robert Viseur (20)

La PI dans les espaces de co-création et d'innovation ouverte. Propriété inte...
La PI dans les espaces de co-création et d'innovation ouverte. Propriété inte...La PI dans les espaces de co-création et d'innovation ouverte. Propriété inte...
La PI dans les espaces de co-création et d'innovation ouverte. Propriété inte...
 
L'écosystème régional du Big Data
L'écosystème régional du Big DataL'écosystème régional du Big Data
L'écosystème régional du Big Data
 
Piloter son appareil photo numérique avec des logiciels libres
Piloter son appareil photo  numérique avec des logiciels  libresPiloter son appareil photo  numérique avec des logiciels  libres
Piloter son appareil photo numérique avec des logiciels libres
 
Exploiter les données issues de Wikipedia
Exploiter les données issues de WikipediaExploiter les données issues de Wikipedia
Exploiter les données issues de Wikipedia
 
De l’open source à l’open cloud
De l’open source à l’open cloudDe l’open source à l’open cloud
De l’open source à l’open cloud
 
Développer ses photos avec RawTherapee
Développer ses photos avec RawTherapeeDévelopper ses photos avec RawTherapee
Développer ses photos avec RawTherapee
 
Convertir ses photos en N/B avec Gimp
Convertir ses photos en N/B avec GimpConvertir ses photos en N/B avec Gimp
Convertir ses photos en N/B avec Gimp
 
L'open hardware : l'ouverture au service de l'innovation
L'open hardware : l'ouverture au service de l'innovationL'open hardware : l'ouverture au service de l'innovation
L'open hardware : l'ouverture au service de l'innovation
 
Pechakucha (Mons) : Street Art à Mons
Pechakucha (Mons) : Street Art à MonsPechakucha (Mons) : Street Art à Mons
Pechakucha (Mons) : Street Art à Mons
 
L'open hardware dans l'électronique (et au delà...)
L'open hardware dans l'électronique (et au delà...)L'open hardware dans l'électronique (et au delà...)
L'open hardware dans l'électronique (et au delà...)
 
Analyse des concepts de Fab Lab, Living Lab et Hub créatif
Analyse des concepts de Fab Lab, Living Lab et Hub créatifAnalyse des concepts de Fab Lab, Living Lab et Hub créatif
Analyse des concepts de Fab Lab, Living Lab et Hub créatif
 
Open Source Hardware for Dummies
Open Source Hardware for DummiesOpen Source Hardware for Dummies
Open Source Hardware for Dummies
 
Pratiques innovantes dans le secteur automobile: du champion de produit à l'i...
Pratiques innovantes dans le secteur automobile: du champion de produit à l'i...Pratiques innovantes dans le secteur automobile: du champion de produit à l'i...
Pratiques innovantes dans le secteur automobile: du champion de produit à l'i...
 
Etude du secteur des prestataires FLOSS en Belgique
Etude du secteur des prestataires FLOSS en BelgiqueEtude du secteur des prestataires FLOSS en Belgique
Etude du secteur des prestataires FLOSS en Belgique
 
Hacker son appareil photo avec des outils libres
Hacker son appareil photo avec des outils libresHacker son appareil photo avec des outils libres
Hacker son appareil photo avec des outils libres
 
Comment gérer le risque de lock-in technique en cas d'usage de services de cl...
Comment gérer le risque de lock-in technique en cas d'usage de services de cl...Comment gérer le risque de lock-in technique en cas d'usage de services de cl...
Comment gérer le risque de lock-in technique en cas d'usage de services de cl...
 
Hacker son appareil photo, c'est possible !
Hacker son appareil photo, c'est possible !Hacker son appareil photo, c'est possible !
Hacker son appareil photo, c'est possible !
 
Comprendre les licences de logiciels libres
Comprendre les licences de logiciels libresComprendre les licences de logiciels libres
Comprendre les licences de logiciels libres
 
Impact of cloud computing on FOSS editors
Impact of cloud computing on FOSS editorsImpact of cloud computing on FOSS editors
Impact of cloud computing on FOSS editors
 
Une introduction à la co-création dans le domaine des TIC
Une introduction à la co-création dans le domaine des TICUne introduction à la co-création dans le domaine des TIC
Une introduction à la co-création dans le domaine des TIC
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Presentation of OpenNLP

  • 1. [ RMLL 2013, Bruxelles – Thursday 11th July 2013 ] Presentation of OpenNLP Presenter : Dr Ir Robert Viseur
  • 2. 2 What is OpenNLP ? • Toolkit for the processing of natural language text. • Project of the Apache Foundation. • Developped in Java. • Under Apache License, Version 2. • Download and documentation: http://opennlp.apache.org/.
  • 3. 3 What are the features ? • For common NLP tasks : • tokenization, • sentence segmentation, • part-of-speech tagging, • named entity extraction, • chuncking.
  • 4. 4 What is the part-of-speech tagging ? • Example : • See more: http://opennlp.apache.org/documentation/1.5.3 /manual/opennlp.html.
  • 5. 5 What is the named entity extraction ? • Example : • See more: http://opennlp.apache.org/documentation/1.5.3 /manual/opennlp.html.
  • 6. 6 How does it work ? (1/2) • The features are associated to pre-trained models. • Each pre-trained model is created for one language and for one type of use. • Supported languages: da, de, en, es, nl, pt, se. • Warnings : – The functional coverage varies with languages. – The french language is not supported ! • See http://opennlp.sourceforge.net/models- 1.5/. • Use in command line or as a Java library. • Warning : loading time of models with CLI.
  • 7. 7 How does it work ? (2/2) • Example (English vs Spanish languages) :
  • 8. 8 What are the criteria of choice ? • Support of the product. • License. • Available languages. • Precision / Recall. • Speed of text processing.
  • 9. 9 Are there free (as freedom) alternative tools ? • Other light tools : • Stanford Log-linear Part-Of-Speech Tagger (POST), • Stanford Named Entity Recognizer (NER), • TagEN, • Java Automatic Term Extraction toolkit. • Frameworks : • In Java : UIMA (Java), GATE (Java). • In other languages : NLTK (Python).
  • 10. 10 Example: tag cloud creation (1/6) • Starting point: website. • Example: www.adacore.com. • What we want (from website content): • common tag cloud, • circular tag cloud. • Main steps : crawl, cleaning of HTML documents, named entities (person) and terminology extractions (+ merge) and display (tag cloud).
  • 11. 11 Example: tag cloud creation (2/6) • Cleaning: • Remove the HTML tags and keep only the useful content. • Warnings: • NLP tools are sensitive to noise in raw data. • Pay attention to the language of the document. • Use of HTML boilerplate tool (HTML -> TXT). • Tool: Boilerpipe. • See http://code.google.com/p/boilerpipe/. • Next: normalization of the text.
  • 12. 12 Example: tag cloud creation (3/6) • Named entities extraction. • Standard in OpenNLP : OpenNLP adds tags in text. • Here : extraction of Person NE. • Terminology extraction. • First : part-of-speech tagging (POST). • Next : identification et filtering (threshold) of : • collocations (i.e: Name_Name, Adjective_Name,...), • proper names (often: brands or people).
  • 13. 13 Example: tag cloud creation (4/6) • Process : Raw HTML document ---- --- -- ----. --- -- -- -- ---- --- -- ----. ---- --- -- ----. --- -- -- -- ---- --- -- ----. _--- _-- _-- _ _---- _--. _--- _-- _-- _-- _____ _____ _____ Conversion to text Normalization POS tagging _____ _____ _____ Terminology extraction NE extraction Tag cloud (for a website) Website (Internet) Website (local) Crawl Tags Merge
  • 14. 14 Example: tag cloud creation (5/6) • Result: common tag cloud.
  • 15. 15 Example: tag cloud creation (6/6) • Result: circular tag cloud.
  • 16. 16 Thanks for your attention. Any questions ?
  • 17. 17 Contact Dr Ir Robert Viseur Email (@CETIC) : robert.viseur@cetic.be Email (@UMONS) : robert.viseur@umons.ac.be Phone : 0032 (0) 479 66 08 76 Website : www.robertviseur.be This presentation is covered by « CC-BY-ND » license.