This document discusses developing an automated framework for epidemiological analytics that allows event extraction and named entity recognition in the domain of veterinary medicine. It aims to protect public health by collecting, sharing, managing, modeling and analyzing data on emerging infectious animal diseases. The goals are to recognize disease names, locations, dates and other entities, and extract and classify disease outbreak events from unstructured web data. Research questions focus on constructing an animal disease ontology, resolving location ambiguities, merging extracted entities into event tuples, and classifying events to reason about confidence.
This document summarizes research on constructing a lexicon called CLex that explores associations between colors, concepts, and emotions based on crowdsourced data. The lexicon contains over 15,000 annotations linking colors to concepts and emotions. The research found cultural differences in color-emotion associations between the US and India and identified frequent color-concept pairs. The lexicon could help applications like sentiment analysis by capturing meaning conveyed through color terms.
Multimodal Information Extraction: Disease, Date and Location RetrievalSvitlana volkova
The document describes a multimodal information extraction system for disease, date, time, and location retrieval. It outlines document level analysis including entity recognition for diseases, dates, locations. It describes temporal and spatial tagging modules to extract dates/times and locations. It discusses representing extracted information through timelines and maps. It proposes future improvements like integrating extraction and visualization components and applying clustering and relationship extraction among events.
This paper presents a methodology to boost biomedical entity extraction through automated ontology learning from unlabeled text. The methodology involves (1) manual construction of an initial ontology, (2) using syntactic patterns to automatically extract relationships and expand the ontology, and (3) evaluating the expanded ontology on a biomedical entity extraction task. Experimental results show the automatically constructed ontology improves precision from 54% to 85% and recall from 25% to 79% for entity extraction, compared to using the initial manually constructed ontology. Future work involves generalizing the automated ontology construction approach to other domains and entity types.
This document outlines Svitlana Volkova's thesis on entity extraction and animal disease-related event recognition from web documents. It provides background on existing animal disease monitoring systems, both manually supported web interfaces and automated web services. It then discusses related work on text categorization, entity extraction, relation extraction, and event recognition. The document outlines Volkova's proposed framework for epidemiological analytics, including the main system components of data collection, data sharing, search, data analysis, and visualization. It provides details on disease-related document classification, domain-specific entity extraction, and ontology-based entity extraction. The goal is to build a system that can automatically extract information on animal disease outbreaks from unstructured web data.
This document summarizes a thesis presentation on entity extraction, animal disease-related event recognition and classification from web data. The presentation discusses developing a framework for automated epidemiological analysis that can classify disease-related documents, extract domain-specific entities like disease names and locations, and recognize and classify events related to animal diseases mentioned in unstructured web data. The methodology involves classifying documents, recognizing entities using techniques like ontology-based and sequence labeling approaches, and classifying recognized events. Experimental results on document classification, entity extraction and event recognition show promising precision, recall and F1-measure. The work aims to address limitations of existing disease monitoring systems and automate online epidemiological analysis.
This document discusses computational knowledge and information management for veterinary epidemiology. It summarizes various animal disease monitoring systems including manually supported web interfaces from international organizations and automated web services. It then outlines a proposed framework for epidemiological analytics including web crawling, domain-specific entity extraction, and animal disease event recognition. The framework aims to address challenges in aggregating and analyzing unstructured data from multiple sources to monitor animal infectious disease outbreaks.
The document describes a rule-based approach to recognize animal disease events from unstructured text in 3 steps: 1) Entity recognition of diseases, locations, species, dates. 2) Sentence classification into event-related vs unrelated and confirmation status. 3) Generation of event tuples combining extracted entities and aggregation of related tuples. The methodology was tested on 100 documents about foot-and-mouth disease and Rift valley fever, achieving a pyramid score between 0.6-1.0 for most event tuples extracted. Future work will focus on deeper syntactic analysis and co-reference resolution to improve accuracy.
This document summarizes a presentation on multilingual named entity recognition using Wikipedia. It discusses crawling Wikipedia to build gazetteers in multiple languages, using Google Sets to discover synonyms and expand the gazetteers, designing an experiment to perform named entity extraction on disease texts using the expanded gazetteers, and presenting conclusions on the novelty and limitations of the approach.
This document summarizes research on constructing a lexicon called CLex that explores associations between colors, concepts, and emotions based on crowdsourced data. The lexicon contains over 15,000 annotations linking colors to concepts and emotions. The research found cultural differences in color-emotion associations between the US and India and identified frequent color-concept pairs. The lexicon could help applications like sentiment analysis by capturing meaning conveyed through color terms.
Multimodal Information Extraction: Disease, Date and Location RetrievalSvitlana volkova
The document describes a multimodal information extraction system for disease, date, time, and location retrieval. It outlines document level analysis including entity recognition for diseases, dates, locations. It describes temporal and spatial tagging modules to extract dates/times and locations. It discusses representing extracted information through timelines and maps. It proposes future improvements like integrating extraction and visualization components and applying clustering and relationship extraction among events.
This paper presents a methodology to boost biomedical entity extraction through automated ontology learning from unlabeled text. The methodology involves (1) manual construction of an initial ontology, (2) using syntactic patterns to automatically extract relationships and expand the ontology, and (3) evaluating the expanded ontology on a biomedical entity extraction task. Experimental results show the automatically constructed ontology improves precision from 54% to 85% and recall from 25% to 79% for entity extraction, compared to using the initial manually constructed ontology. Future work involves generalizing the automated ontology construction approach to other domains and entity types.
This document outlines Svitlana Volkova's thesis on entity extraction and animal disease-related event recognition from web documents. It provides background on existing animal disease monitoring systems, both manually supported web interfaces and automated web services. It then discusses related work on text categorization, entity extraction, relation extraction, and event recognition. The document outlines Volkova's proposed framework for epidemiological analytics, including the main system components of data collection, data sharing, search, data analysis, and visualization. It provides details on disease-related document classification, domain-specific entity extraction, and ontology-based entity extraction. The goal is to build a system that can automatically extract information on animal disease outbreaks from unstructured web data.
This document summarizes a thesis presentation on entity extraction, animal disease-related event recognition and classification from web data. The presentation discusses developing a framework for automated epidemiological analysis that can classify disease-related documents, extract domain-specific entities like disease names and locations, and recognize and classify events related to animal diseases mentioned in unstructured web data. The methodology involves classifying documents, recognizing entities using techniques like ontology-based and sequence labeling approaches, and classifying recognized events. Experimental results on document classification, entity extraction and event recognition show promising precision, recall and F1-measure. The work aims to address limitations of existing disease monitoring systems and automate online epidemiological analysis.
This document discusses computational knowledge and information management for veterinary epidemiology. It summarizes various animal disease monitoring systems including manually supported web interfaces from international organizations and automated web services. It then outlines a proposed framework for epidemiological analytics including web crawling, domain-specific entity extraction, and animal disease event recognition. The framework aims to address challenges in aggregating and analyzing unstructured data from multiple sources to monitor animal infectious disease outbreaks.
The document describes a rule-based approach to recognize animal disease events from unstructured text in 3 steps: 1) Entity recognition of diseases, locations, species, dates. 2) Sentence classification into event-related vs unrelated and confirmation status. 3) Generation of event tuples combining extracted entities and aggregation of related tuples. The methodology was tested on 100 documents about foot-and-mouth disease and Rift valley fever, achieving a pyramid score between 0.6-1.0 for most event tuples extracted. Future work will focus on deeper syntactic analysis and co-reference resolution to improve accuracy.
This document summarizes a presentation on multilingual named entity recognition using Wikipedia. It discusses crawling Wikipedia to build gazetteers in multiple languages, using Google Sets to discover synonyms and expand the gazetteers, designing an experiment to perform named entity extraction on disease texts using the expanded gazetteers, and presenting conclusions on the novelty and limitations of the approach.
1. The document presents a method for information extraction of animal disease entities from unstructured web documents using a dictionary lookup approach and ontology construction.
2. Key steps include collecting a gazetteer of animal disease terms from various sources, discovering relations like synonyms between concepts to expand the ontology, and using the ontology to extract disease names and other entities from texts through dictionary matching.
3. Experimental results show that combining the gazetteer with synonyms and abbreviations achieved the best performance, with an average disease name extraction rate of 84.36% over 50 websites. The size and quality of the ontology was found to influence extraction accuracy.
This document summarizes the agenda and key topics for a CIS 890 project final presentation on topics modelling with LDA. The presentation will cover LDA modelling, HMMLDA modelling, LDA with collocations modelling, and experimental results on the NIPS collection. It will discuss topic modelling approaches like LDA, discriminative vs generative methods, and limitations of bag-of-words assumptions.
This document proposes using Wikipedia concepts to improve topic modeling. It discusses using n-grams like bigrams and phrases from Wikipedia categories rather than only individual words. The goal is to develop a topic model that associates words and Wikipedia concepts with mixtures of topics. Important steps include collecting a dataset, preprocessing it, performing topic modeling using an LDA model that incorporates Wikipedia concepts, and evaluating the results against models using only unigrams or n-grams. Key benefits noted are the ability to represent topics with more representative concepts and reduce ambiguity compared to models using only individual words.
The document summarizes Svitlana Volkova's presentation on link prediction in social networks. The presentation covers: introducing link prediction and related studies; the methodology including mathematical representations and similarity measures; planned experiments crawling Facebook and using visualization tools; and conclusions on approaches to predicting previously unobserved links. Key approaches discussed are supervised vs. unsupervised methods, and similarity measures for node-wise link prediction.
Gender differences exist in how life satisfaction changes with parenthood. A study found that males reported higher life satisfaction after becoming parents compared to before, while females reported lower satisfaction. Specifically:
- Males ranked family as a higher priority and spent more time on it after parenthood, while females' priorities and time spent did not change as much.
- Females' life satisfaction score was higher before parenthood when they had more opportunities for activities like career and money making that increased independence.
- Younger females aged 25-30 reported a smaller decrease in life satisfaction compared to females aged over 25 or under 30.
Ukraine gained independence in 1991. It has a population of over 48 million people and borders Russia, Belarus, Moldova, Poland, Hungary, Romania and Slovakia. The document provides details on Ukraine's geography, climate, natural resources, cities, transportation infrastructure and economy. It also discusses Ukraine's system of government, which consists of legislative, executive and judicial branches, and outlines the country's constitution, symbols and history of independence.
This document provides an overview of Ukraine. It notes that Ukraine is located in Eastern Europe and has a population of over 46 million people. It is divided into 24 regions and its capital and largest city is Kyiv. The document discusses Ukraine's geography, climate, natural resources, transportation infrastructure, government structure, economy, and culture. It aims to give the reader a broad understanding of the country of Ukraine.
http://www.testassistant.com);
project management system (MS Project);
version control system (CVS, Subversion);
bug tracking system (Bugzilla);
instant messengers (Skype, ICQ, MSN Messenger);
e-mail.
• The choice of communication tools depends on the project tasks,
participants’ locations and their technical capabilities.
The Effectiveness of Communications
- Timely information delivery
- Feedback and response
- Understanding and clarity
- Participation and involvement
- Coordination and cooperation
- Trust and relationships
- Motivation
The document discusses Svitlana O. Volkova, a PhD student at Mykolaiv State Humanities University. It provides details about the university such as its location in Mykolaiv, Ukraine, population statistics, and academic programs. It also mentions the university's consortium with Kyiv-Mohyla Academy and focus on combining Western educational technologies with national traditions.
1. The document presents a method for information extraction of animal disease entities from unstructured web documents using a dictionary lookup approach and ontology construction.
2. Key steps include collecting a gazetteer of animal disease terms from various sources, discovering relations like synonyms between concepts to expand the ontology, and using the ontology to extract disease names and other entities from texts through dictionary matching.
3. Experimental results show that combining the gazetteer with synonyms and abbreviations achieved the best performance, with an average disease name extraction rate of 84.36% over 50 websites. The size and quality of the ontology was found to influence extraction accuracy.
This document summarizes the agenda and key topics for a CIS 890 project final presentation on topics modelling with LDA. The presentation will cover LDA modelling, HMMLDA modelling, LDA with collocations modelling, and experimental results on the NIPS collection. It will discuss topic modelling approaches like LDA, discriminative vs generative methods, and limitations of bag-of-words assumptions.
This document proposes using Wikipedia concepts to improve topic modeling. It discusses using n-grams like bigrams and phrases from Wikipedia categories rather than only individual words. The goal is to develop a topic model that associates words and Wikipedia concepts with mixtures of topics. Important steps include collecting a dataset, preprocessing it, performing topic modeling using an LDA model that incorporates Wikipedia concepts, and evaluating the results against models using only unigrams or n-grams. Key benefits noted are the ability to represent topics with more representative concepts and reduce ambiguity compared to models using only individual words.
The document summarizes Svitlana Volkova's presentation on link prediction in social networks. The presentation covers: introducing link prediction and related studies; the methodology including mathematical representations and similarity measures; planned experiments crawling Facebook and using visualization tools; and conclusions on approaches to predicting previously unobserved links. Key approaches discussed are supervised vs. unsupervised methods, and similarity measures for node-wise link prediction.
Gender differences exist in how life satisfaction changes with parenthood. A study found that males reported higher life satisfaction after becoming parents compared to before, while females reported lower satisfaction. Specifically:
- Males ranked family as a higher priority and spent more time on it after parenthood, while females' priorities and time spent did not change as much.
- Females' life satisfaction score was higher before parenthood when they had more opportunities for activities like career and money making that increased independence.
- Younger females aged 25-30 reported a smaller decrease in life satisfaction compared to females aged over 25 or under 30.
Ukraine gained independence in 1991. It has a population of over 48 million people and borders Russia, Belarus, Moldova, Poland, Hungary, Romania and Slovakia. The document provides details on Ukraine's geography, climate, natural resources, cities, transportation infrastructure and economy. It also discusses Ukraine's system of government, which consists of legislative, executive and judicial branches, and outlines the country's constitution, symbols and history of independence.
This document provides an overview of Ukraine. It notes that Ukraine is located in Eastern Europe and has a population of over 46 million people. It is divided into 24 regions and its capital and largest city is Kyiv. The document discusses Ukraine's geography, climate, natural resources, transportation infrastructure, government structure, economy, and culture. It aims to give the reader a broad understanding of the country of Ukraine.
http://www.testassistant.com);
project management system (MS Project);
version control system (CVS, Subversion);
bug tracking system (Bugzilla);
instant messengers (Skype, ICQ, MSN Messenger);
e-mail.
• The choice of communication tools depends on the project tasks,
participants’ locations and their technical capabilities.
The Effectiveness of Communications
- Timely information delivery
- Feedback and response
- Understanding and clarity
- Participation and involvement
- Coordination and cooperation
- Trust and relationships
- Motivation
The document discusses Svitlana O. Volkova, a PhD student at Mykolaiv State Humanities University. It provides details about the university such as its location in Mykolaiv, Ukraine, population statistics, and academic programs. It also mentions the university's consortium with Kyiv-Mohyla Academy and focus on combining Western educational technologies with national traditions.
1. Automated Event Extraction and Named Entity
Recognition in the Domain of Veterinary Medicine
K-State Lab for Knowledge
Discovery in Databases
Svitlana Volkova, PhD Student, Johns Hopkins University
MOTIVATION
ONTOLOGY-BASED RECOGNITION
EVENT EXTRACTION
Global epidemic surveillance is an essential task for national biosecurity Synonym relationships
ANIMAL
Type 1: Emergent Outbreak-Related Events
“On 2 Jun 2010, a total of 35 individuals infected with a matching strain of salmonella”
management and bioterrorism prevention.
<E1 is a kind of E2>
DISEASES
Type 2: Non-Emergent Outbreak-Related Events
Animal Infectious Disease Outbreaks
E1 = “swine influenza” is a kind of
Dipylidium Baylisasca- “The US saw its latest FMD outbreak in Montebello, California in 1929”
E2 = “swine fever”
infection
Q fever
riasis
Type 3: Disease Outbreak Non-Related Events
Hyponym relationships
“A meeting on foot and mouth disease was held in Brussels on Oct 17, 2007”
<E3 and E4 are diseases>
Tapeworm
Coxiella
B. melis
Types 4 & 5: Hypothetical Events or Negation of the Events
burnetii
E3 = “anthrax”, E4 = “yellow fever”
are diseases
EVENT TUPLE
C. burnetii
B. procyonis
Eventi =< disease; date; location; species; status >
Causal relationships <E5 is caused by V5>
Class 1 – Susceptible Status
influence on cause economic can cause loss of E5 = “Ovine epididymitis” is caused by V5 = “Brucella ovis”
healthi
popul
open
vulner
expos
respons
sign
separ
contamin
international travel crises, political human life (61% of B. transfuga
and trade
instability
animal disease)
Synonymic
Hyponymic
Causal
Class 2 – Infected Status
E1 is a E2 and E3
D such as E1, E2
E1 is caused by V1
outbreak
infect
report
confirm
affect
sick
diagnos
readi
inciner
The goal is to protect the public from major health threads by developing the E1 also known as E2
D for example E1
V2 causes E2
B. columnaris
Class 3 – Recovered Status
framework for epidemiological analytics that allows automated data E1 is also called E2
D including E1, E2
V3 is an agent of E3
destroi
burn
erad
dispos
dead
buri
slaughter
elimin
cull
collection, sharing, management, modeling and analysis in the domain of
emerging infectious diseases.
APPLYING SYNTACTIC FEATURES
DATA
“Severe disease in dairy cattle caused by Salmonella Newport”
Step 1: Entity Recognition
POS = [NNP, IN, NNS, VBN, …] = [2, 0, 2, 5, …]
Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC]
Literature
News & RSS
E-mails
Blog Posts
WWW - reports, Xi = [POSi, CAPi, ICAPi, SPOSi, DPOSi, FREQi]
Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth
surveillance networks
Xi-3 = [2, 0, 0, 5, 5, 1]
disease[DIS] on a hog[SP] farm in Taoyuan[LOC]. On 9 Jun 2009[DT], the farm's owner reported
Xi = [2, 1, 0, 8, 8, 1]
Xi-2 = [5, 0, 0, 6, 6, 1]
symptoms of FMD[DIS] in more than 30 hogs[SP]. Subsequent testing confirmed FMD[DIS]. Agricultural
Crawler Component
Newport
authorities asked the farmer to strengthen immunization. The outbreak has not affected other farms.
Xi-1 = [0, 0, 0, 7, 7, 1]
Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
Document …
…
Query
Collection
wi-3
wi-2
wi-1
wi
wi+1
wi+2
wi+3
Step 2: Sentence Classification
1. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].
cattle
caused
by
Salmonella
Xi+1 = [2, 1, 0, 9, 9, 1]
2. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-
PROBLEM FORMULATION
Fi = [Xi, Xi-1, Xi-2, Xi-3, Xi+1, Xi+2, Xi+3], w = 3
Xi+2 = [-1, -1, -1, -1, -1, -1]
and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC].
3. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30
Class = {0, 1}
Xi+3 = [-1, -1, -1, -1, -1, -1]
Introduce the following functionality to the framework for epidemiological hogs[SP].
analytics:
Fi = [2, 1, 0, 8, 8, 1, 0, 0, 0, 7, 7, 1, 5, 0, 0, 6, 6, 1, 2, 0, 0, 5, 5, 1,
2, 1, 0, 9, 9, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], Class = [1]
4. Subsequent testing confirmed FMD[DIS].
5. Agricultural authorities asked the farmer to strengthen immunization.
Domain-specific and domain-independent named entity recognition:
ontology-based and using syntactic features:
RESULTS
6. The outbreak has not affected other farms.
7. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
disease names (e.g. “foot and mouth disease”);
Step 3a: Tuple Generation
viruses (e.g. “picornavirus”) and serotypes (e.g. “Asia-1”);
E1 = <Foot-and-mouth disease, ?, Taoyuan, hog, ?>
species (e.g. “sheep”, “cattle”);
E2 = <Foot-and-mouth disease, ?, Taoyuan, hog, confirmed>
locations (e.g. “United Kingdom”, “eastern provinces of Shandong and E3 = <FMD, 9 Jun 2009, ?, hog, reported>
Jiangsu, China” – different level of granularity);
E4 = <FMD, ?, ?, ?, confirmed>
dates in different formats including special cases (e.g. “last Tuesday”, Step 3b: Tuple Aggregation
“two month ago”).
E = <disease, date, location, species, status> =
<Foot-and-mouth disease, 9 Jun 2009, Taoyuan, hog, infected>
Automated animal disease event extraction and classification
from unstructured web data.
EVENT VISUALIZATION
RESEARCH QUESTIONS
How do we construct an ontology of animal disease names, their synonyms
and corresponding viruses and learn semantic relationships between them?
How should we resolve location disambiguation “Rabies in Isle of Wight”, Classifier
Wi+1
Wi-1
Wi+/-1
Wi+/-3
Wi+5
Wi-5
Wi+/-5
Wi+/-7
geo-tag in Virginia, USA or UK?
Random Forest
0.771
0.773
0.782
0.775
0.764
0.771
0.757
0.745
AdaBoost
0.758
0.759
0.759
0.758
0.761
0.761
0.761
0.761
How should we merge extracted entities into corresponding event tuples?
Naïve Bayes
0.700
0.706
0.685
0.661
0.662
0.600
0.647
0.639
How do we classify extracted event tuples in order to reason about event Logistic
0.738
0.739
0.739
0.739
0.734
0.736
0.753
0.735
confidence?
2001 foot-and-mouth disease outbreak over time in United Kingdom: February, March, April
Acknowledgements: William H. Hsu, Doina Caragea, Chris Callison-Burch