Development and testing of the media monitoring tool MedISys for the early identification and reporting of existing and emerging plant health threats guided by a plant health threats ontology
Monitoring plant health threats with a multilingual ontology
1. GRIHO Research Group, INSPIRES Research Centre, Universitat de Lleida
Roberto García, Josep Maria Brunetti*, Rosa Gil, Jordi Virgili, Toni Granollers
Multilingual Ontology for
Plant Health Threats
Media Monitoring
(A Smart Data Approach)
2. Media Monitoring for New and (Re)Emerging Plant Health Threats
• Project: development and testing of the media monitoring tool
MedISys for the early identification and reporting of existing and
emerging plant health threats
• Timing (duration): January 2014 – June 2016 (2.5 years)
• Funding: EFSA
• Coordination: Universitat de Lleida (UdL)
• Partners: IRTA and UdL
• Other participants: Joint Research Centre (European Commission)
• Objectives:
• Collate new and appropriate media information sources
• Multilingual ontology for the global identification of emerging new plant health threats to be appended to MedISys
• English, Spanish, Italian, French, Dutch, German, Portuguese, Russian, Chinese and Arabic
• Develop and test strategies to monitor re-emerging plant health threats on global and regional scale
• Analyse and test approaches to report identified signals to EFSA Units and experts through MedISys
3. Approach
• Ontology: key component of the developed system that structures and
provides knowledge about plant health threats
• Knowledge captured from existing sources and experts
• Guides applications for
• Knowledge capture
• Indirect sources search
• Terms translation
• Media monitoring categories generation
An ontology is a formal, explicit specification of a shared conceptualisation.
• Formal: implies machine-readable and understandable
• Explicit: means expressed in terms of concepts, properties,...
• Shared: based on a consensus
• Conceptualisation: is an abstract model of a portion of the world
4. Ontology Skeleton
• Collected 140 pests/diseases from EPPO Alerts, 2000/29-1-A-1 and
EU Emergency Control Measures
• 117 linked to UniProt Taxonomy:
• Taxonomical information, scientific/common/other names,…
• 47 linked also to Wikipedia
• Common names in multiple languages
5. Plant Health Threats Ontology
• Enrich the ontology with affected crops, hosts, vectors, symptom
expressions…
6. Plant Health Threats Ontology
• All concepts linked to labels in different languages
• Extract as keywords for MedISys or Web search filters,…
• Example: “Maladie de Pierce” OR ( “grapevine” AND “sharpshooter” )
Example concept: Xylella fastidiosa
• subClassOf: Gammaproteobacteria
• host: Nerium oleander, Prunus salicina, Medicago sp., Sorghum halepense,…
• vector: Homalodisca coagulata, Graphocephala sp., Oncometopia sp., Draeculacephala sp.,…
• crop: Grapevine, Citrus, Olive, Almond, Peach, Coffee,…
• Threat labels: “Pierce's disease”, “Citrus variegated chlorosis” (en); “Maladie de Pierce” (fr); “葉緣焦枯病菌” (zh)
• Vector labels: “Glassy-winged sharpshooter”, “Spittlebugs”, “Froghoppers”, “Planthoppers”,… (en)
• Crop labels: “vite” (it),…
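The keyword extraction described on this slide can be sketched as follows. This is a minimal illustration, assuming the labels have already been read from the ontology into plain lists; the function names `group` and `build_filter` are hypothetical and are not part of MedISys or the ontology tooling.

```python
def group(terms):
    """Render terms as a quoted OR-group, parenthesised when there are several."""
    joined = " OR ".join(f'"{t}"' for t in terms)
    return f"( {joined} )" if len(terms) > 1 else joined


def build_filter(names, crops, vectors):
    """Match an explicit threat name, OR a crop label together with a vector label."""
    name_part = " OR ".join(f'"{n}"' for n in names)
    return f"{name_part} OR ( {group(crops)} AND {group(vectors)} )"


# Labels taken from the slide's example.
example = build_filter(["Maladie de Pierce"], ["grapevine"], ["sharpshooter"])
print(example)
```

Passing labels in several languages to the same function (for instance both "grapevine" and "vite" as crops) would yield one multilingual filter per threat.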
7. Ontology Editor
• Assist experts during the knowledge capture process
http://indagus.udl.cat/medisys/editor/
12. Multilingual Ontology
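• Threat names
• 1609 terms
• 27 languages
• Terms per language (count, share of total as shown on the chart):
• Not available: 617 (38%)
• Latin: 375 (23%)
• English: 262 (16%)
• French: 81 (5%)
• German: 68 (4%)
• Spanish: 65 (4%)
• Japanese: 21 (1%)
• Dutch: 17 (1%)
• Italian: 16 (1%)
• Portuguese: 15 (1%)
• Finnish: 8 (1%)
• Chinese: 7 (1%)
• Russian: 6 (1%)
• Other: 51 (3%)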
15. Ontology Browser
• Complex queries
• Example: “all threats with symptoms affecting the leaves”
http://indagus.udl.cat/plantHealthThreats/
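To make the example query concrete, here is a small in-memory stand-in. It is not the browser's actual query mechanism; the `Threat` and `Symptom` classes and the symptom entries are illustrative assumptions, not data exported from the ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Symptom:
    description: str
    plant_part: str  # hypothetical field linking a symptom to a plant part


@dataclass
class Threat:
    name: str
    symptoms: list = field(default_factory=list)


def threats_affecting(threats, plant_part):
    """Return the names of threats with at least one symptom on `plant_part`."""
    return sorted(
        t.name for t in threats
        if any(s.plant_part == plant_part for s in t.symptoms)
    )


# Illustrative entries only.
THREATS = [
    Threat("Xylella fastidiosa", [Symptom("leaf scorch", "leaf")]),
    Threat("Agrilus planipennis", [Symptom("exit holes", "bark")]),
]

print(threats_affecting(THREATS, "leaf"))
```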
16. Identification of Information Sources to Monitor
• Objective: collect relevant information sources to be monitored by
MedISys
• Methodology
• Direct sources: identify information sources already known from experts, previous research
projects, official sources like EPPO, journals,…
• Indirect sources: identify previously unknown web information sources (newspapers, blogs,
websites, etc.) discovered using search engines and ontology terms
• Analyse and evaluate all collected sources using an Information Quality measure
• First, filter out duplicates, irrelevant sources, non-monitorable sources, etc.
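The first filtering pass could look roughly like the sketch below. It assumes each candidate source is a dictionary with hypothetical `url`, `relevant` and `monitorable` fields filled in during review; the Information Quality measure itself is not modelled here.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path so trivial variants count as duplicates."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def first_filter(sources):
    """Drop duplicates, then keep only relevant and monitorable sources."""
    seen = set()
    kept = []
    for source in sources:
        key = normalise(source["url"])
        if key in seen:
            continue  # duplicate of a source already reviewed
        seen.add(key)
        if source["relevant"] and source["monitorable"]:
            kept.append(source)
    return kept


# Placeholder URLs, not sources from the project inventory.
candidates = [
    {"url": "http://www.example.org/news/", "relevant": True, "monitorable": True},
    {"url": "https://example.org/news", "relevant": True, "monitorable": True},
    {"url": "https://example.com/recipes", "relevant": False, "monitorable": True},
    {"url": "https://example.net/plant-health", "relevant": True, "monitorable": False},
]
kept = first_filter(candidates)
```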
17. Methodology
Plant Health Threats Sources Inventory
• Known sources
• Reference resources (expert knowledge)
• Existing projects related to pest and food/feed risks (EFSA)
• MedISys sources (JRC)
• Filtering and evaluation process
• List of relevant sources
• Web search
• Ontology
• Search mechanisms (query process)
• List of relevant sources
• Filtering process (avoid duplicates & evaluation)
• Final list: 1956 sources (72 known + 1884 web search)
18. Monitor Known Threats
• Known threats: explicit mention of the threat name
• Categories generated automatically from the ontology
• One MedISys category per threat, with a
list of keywords (terms) and a threshold
• 117 categories for known threats:
• Bacteria: Xylella fastidiosa, Acidovorax citrulli,… (6)
• Fungi: Ceratocystis fagacearum, Diplocarpon mali,… (18)
• Insects: Agrilus coxalis auroguttatus, Agrilus planipennis,… (54)
• Mollusks: Pomacea (1)
• Nematodes: Bursaphelenchus xylophilus, Nacobbus aberrans,… (7)
• Oomycetes: Phytophthora ramorum (1)
• Phytoplasma: Elm yellows phytoplasma, Candidatus Phytoplasma pruni,… (7)
• Viroid: Tomato apical stunt viroid, Potato spindle tuber viroid (2)
• Virus: Andean potato latent virus, Andean potato mottle virus,… (21)
http://medisys.newsbrief.eu/medisys/groupedition/en/PlantHealthAll.html
Keyword source: threshold
• Scientific names: 100
• Common names (all languages): 100
• Other names: 100
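Generating one category per known threat might be sketched as below. The input record and the output dictionary are hypothetical layouts chosen for illustration; they are not the MedISys category format, and only the value 100 per keyword source comes from the slide.

```python
THRESHOLD = 100  # value shown on the slide for every keyword source

KEYWORD_SOURCES = ("scientific_names", "common_names", "other_names")


def build_category(threat):
    """Collect every name of a threat into one keyword list, without repeats."""
    keywords = []
    seen = set()
    for source in KEYWORD_SOURCES:
        for term in threat.get(source, []):
            key = term.lower()
            if key not in seen:
                seen.add(key)
                keywords.append({"term": term, "source": source, "threshold": THRESHOLD})
    return {"category": threat["id"], "keywords": keywords}


# Names taken from the slides; the record structure is an assumption.
xylella = {
    "id": "XylellaFastidiosa",
    "scientific_names": ["Xylella fastidiosa"],
    "common_names": ["Pierce's disease", "Maladie de Pierce"],
}
category = build_category(xylella)
```

Running the same function over all threats in the ontology would give the 117 categories listed above, one per threat.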
19. Monitor Unknown Threats
• Unknown Threats: name not explicitly mentioned
• Approach 1: manual generation of MedISys categories by experts
http://medisys.newsbrief.eu/medisys/filteredition/en/EFSAUnknownPestFilteredEmailAlert.html
A combination of Combinations (Proximity: 15)
at least one of alien, danger, dangerous, deadly…
and at least one of agricultural, agriculture, almond…
and at least one of bacteria, bacterial, crop+failure,…
but none of allergies, allergy, animal+abuse,…
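One way to read such a combination is sketched below. The semantics are an assumption: at least one term from every "at least one of" group must occur within a window of 15 words, and no excluded term may occur anywhere in the text. Multi-word terms such as crop+failure are left out for brevity, and the term lists are shortened to what the slide shows.

```python
import re


def matches(text, groups, excluded, proximity=15):
    """True if some window of `proximity` words holds a term from every group."""
    tokens = re.findall(r"\w+", text.lower())
    if any(token in excluded for token in tokens):
        return False
    for start in range(len(tokens)):
        window = set(tokens[start:start + proximity])
        if all(window & group for group in groups):
            return True
    return False


GROUPS = [
    {"alien", "danger", "dangerous", "deadly"},
    {"agricultural", "agriculture", "almond"},
    {"bacteria", "bacterial"},
]
EXCLUDED = {"allergies", "allergy"}

hit = matches("A deadly bacterial disease threatens almond orchards", GROUPS, EXCLUDED)
```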
20. Monitor Unknown Threats
• Approach 2: automatic generation from ontology (multilingual)
• Concepts associated with the threats (but not their names)
• Affected crops, vectors, hosts, symptoms, plant parts,...
• Currently, the ontology models the symptoms for just 7 threats:
• Phytophthora ramorum, Anoplophora glabripennis, Bactrocera tryoni, Agrilus planipennis, Xylella
fastidiosa, Candidatus Liberibacter and Rhynchophorus ferrugineus
• http://medisys.newsbrief.eu/medisys/alertedition/en/AgrilusPlanipennis-PHT-Symptoms.html
• http://medisys.newsbrief.eu/medisys/alertedition/en/AnoplophoraGlabripennis-PHT-Symptoms.html
• http://medisys.newsbrief.eu/medisys/alertedition/en/BactroceraTryoni-PHT-Symptoms.html
• http://medisys.newsbrief.eu/medisys/alertedition/en/CandidatusLiberibacter-PHT-Symptoms.html
• http://medisys.newsbrief.eu/medisys/alertedition/en/PhytophthoraRamorum-PHT-Symptoms.html
• http://medisys.newsbrief.eu/medisys/alertedition/en/RhynchophorusFerrugineus-PHT-Symptoms.html
• http://medisys.newsbrief.eu/medisys/alertedition/en/XylellaFastidiosa-PHT-Symptoms.html
Combinations tree (Proximity 10), with examples:
• Affected crop AND Symptom AND Plant Part, e.g. “walnut” AND “necrosis” AND “tree”
OR
• Affected crop AND Vectors, e.g. “lime” AND “asian citrus psyllid”
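The combinations tree can be expanded mechanically from the concepts attached to a threat. The sketch below assumes a hypothetical per-threat record with `crops`, `symptoms`, `plant_parts` and `vectors` lists; the two records reuse the slide's example terms and are not ontology exports.

```python
from itertools import product


def combinations_for(threat):
    """Expand crop/symptom/plant-part and crop/vector term combinations."""
    combos = list(product(threat["crops"], threat["symptoms"], threat["plant_parts"]))
    combos += list(product(threat["crops"], threat["vectors"]))
    return combos


def render(combos):
    """Join each combination with AND, and the alternatives with OR."""
    parts = [" AND ".join(f'"{term}"' for term in combo) for combo in combos]
    return " OR ".join(f"( {part} )" for part in parts)


walnut_record = {"crops": ["walnut"], "symptoms": ["necrosis"],
                 "plant_parts": ["tree"], "vectors": []}
citrus_record = {"crops": ["lime"], "symptoms": [],
                 "plant_parts": [], "vectors": ["asian citrus psyllid"]}

print(render(combinations_for(walnut_record)))
print(render(combinations_for(citrus_record)))
```

Because every crop is paired with every symptom and plant part, the number of combinations grows quickly, and generic terms such as "tree" match many unrelated texts.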
21. Results
• Known threats
• MedISys categories using threat names as keywords are very effective
• Example, Xylella fastidiosa:
• 5078 relevant news items selected from February 2015 to May 2016 (16 months)
• However, they miss items that do not explicitly mention the threat
• Unknown threats
• Categories manually defined by experts
• 80% of items relevant
• 10 items per day
• Categories generated automatically using symptoms, crops, vectors…
• 60% of items relevant
• Just 7 items per week
• A lot of noise due to term ambiguity
• Adding negative words filtered out false positives but increased false negatives
• Still, this is only preliminary work (just 7 threats modelled)…
22. Future work
Build a disease–symptom network, as has been done for human health?
Zhou, X., Menche, J., Barabási, A. L., & Sharma, A. (2014).
Human symptoms–disease network. Nature Communications, 5.
23. Thank you very much for your attention
Questions?
Roberto García
rgarcia@diei.udl.cat
http://rhizomik.net/~roberto/
Editor's Notes
Direct sources: identification of what is already established in the domain (plant health), that is, the information sources recognised as relevant by the community, the state of the art, official sources, scientific and technical publications, etc.
Indirect sources: news sources (digital magazines and newspapers), blogs, unofficial websites, etc. These sources are collected automatically by a custom search engine (adapted for the project), which is useful for discovering new and/or re-emerging plant diseases.
In this first process of identifying and reviewing sources, a first filter is already applied in order to:
Eliminate / discard duplicate sources
Identify pertinent resources (discard resources that are not related to "plant health")
Identify information sources related to "plant health" but with static (descriptive) information: discarded for monitoring (MedISys).
Identify resources with relevant, up-to-date information on "plant health" and with news sections but NOT monitorable: discarded for monitoring (MedISys).
Identify relevant, up-to-date resources with a news section and alert mechanisms that do allow monitoring (RSS): included in MedISys.