MS Thesis Short

Published in: Education
Transcript

  • 1. Laboratory for Knowledge Discovery in Databases
    Entity Extraction, Animal Disease-related Event Recognition and Classification from Web
    Presenter: Svitlana Volkova
    Adviser: William H. Hsu
    Committee: Dr. Doina Caragea, Dr. Gurdip Singh
    Supported by: K-State National Agricultural Biosecurity Center (NABC), US Department of Defense
  • 2. Agenda
    Thesis "Entity Extraction, Animal Disease-related Event Recognition and Classification from Web", July 30 2010
    Background
    Related Work
    Framework for Epidemiological Analysis
    Disease-related Document Classification
    Domain-specific Entity Extraction
    • Ontology-based Entity Extraction
    • Sequence Labeling using Syntactic Features
    Disease-related Event Recognition and Classification
    Summary & Future Work
  • 4. Importance of the Problem
    influence travel and trade
    cause economic crises and political instability
    zoonotic diseases can cause loss of life
  • 5. Animal Disease Monitoring Systems - Automated Web Services
    Information retrieval system MedISys - http://medusa.jrc.it/medisys/homeedition/all/home.html
    Pattern-based Understanding and Learning System (PULS) - http://sysdb.cs.helsinki.fi/puls/jrc/all
    BioCaster - http://biocaster.nii.ac.jp/
    HealthMap - http://healthmap.org/en
    EpiSpider - http://www.epispider.org/
  • 6. Limitations of the Existing Systems
    No timeline visualization (BioCaster)
  • 7. Problem Statement
    Introduce the following features to the framework for the epidemiological analysis:
    Classification of the disease-related documents collected from different domains
    Domain-specific entity extraction - animal disease names, viruses, disease serotypes
    Automated animal disease-related event recognition and classification from unstructured web data
  • 8. Methodology
    Suppose we have a document collection D with documents collected from different domains C:
    news, web pages, scientific papers, medical literature, e-mails.
    We classify documents into two classes:
    disease-related documents DR;
    non-disease-related documents DNR.
    We extract a set of events E from every document di in DR for every domain cj.
    For every event ek in E we extract a set of domain-specific and domain-independent entities:
    disease, species, location, date, event status.
    We classify recognized events from E into:
    two classes – suspected or confirmed;
    three classes – susceptible, infected or recovered.
  • 9. Related Work
    Text categorization approaches: supervised, unsupervised and semi-supervised learning; feature representations: “bag-of-words”, term frequency, binary features, word bigrams; classification algorithms: lazy learners, decision trees, Naïve Bayes, Maximum Entropy.
    Entity extraction approaches: gazetteers, regular expressions, Hidden Markov Models and Conditional Random Fields; ontology-based biomedical entity extraction.
    Work on relation extraction for automated ontology construction.
    Animal disease-related event recognition methods.
  • 10. Framework for Epidemiological Analytics
    Framework for Epidemiological Analysis
    Main Functional Components
    Data Collection (Document Relevance Classification) -> Data Sharing -> Search -> Data Analysis (Entity Extraction and Event Recognition) -> Visualization
  • 11. Advantages of the Designed System
  • 12. Phases of Data Processing
  • 13. 1. Data Collection (1)
    Crawl the web using Heritrix crawler - http://crawler.archive.org/
    set of seeds (ProMED-Mail, DEFRA etc.)
    set of terms (animal disease names from the ontology)
  • 14. 2. Data Sharing
    Document relevance classification
    Relevant
    Non-relevant
  • 15. 3. Search
    Lucene-based* ranking
    Query-based keyword search
    Search by animal disease name and/or location
    *Lucene - http://lucene.apache.org
  • 16. 4. Data Analysis
    Event example:
    “On 12 September 2007, a new foot-and-mouth disease outbreak was confirmed in Egham, Surrey”
  • 17. 5. Visualization
    Map View
    GoogleMaps API - http://code.google.com/apis/maps/
    TimeLine View
    SIMILE API - http://www.simile-widgets.org/timeline/
  • 18. Disease-related Document Classification
    Binary Classification using Supervised Learning
    Feature Representations: “Bag-of-words”, TF, Bigrams
    Classifiers: Naïve Bayes, MaxEntropy, J48
  • 19. Supervised Learning Framework
    [Diagram: crawled documents DTrain are converted into feature representations R1 … Rn and used to learn models M1 … Mk; the classifier then labels new documents DTest as Disease Related – DR (processed to the next phases) or Disease Non-related – DNR (eliminated from the index).]
    Feature Representations:
    R1 – “bag-of-words” binary, |R1|=28908
    R2 – “bag-of-words” term frequency, |R2|=28908
    R3 – “bag-of-words” bigrams, |R3|=99108
    R4 – noun and verb keywords represented as binary counts, |R4|=2
    R5 – noun and verb keywords normalized frequency, |R5|=2
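A minimal sketch of how representations in the style of R1, R2 and R3 could be computed for one document. The whitespace tokenizer, the toy vocabulary and all function names are illustrative assumptions, not the thesis implementation.

```python
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the slides do not specify one.
    return text.lower().split()

def binary_bow(tokens, vocab):
    # R1-style: 1 if the vocabulary term occurs in the document, else 0.
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]

def tf_bow(tokens, vocab):
    # R2-style: raw term frequency for each vocabulary term.
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def bigrams(tokens):
    # R3-style: adjacent word pairs used as features.
    return list(zip(tokens, tokens[1:]))

doc = "FMD outbreak confirmed FMD spreads"
vocab = ["fmd", "outbreak", "vaccine"]  # toy vocabulary, hypothetical
tokens = tokenize(doc)
r1 = binary_bow(tokens, vocab)
r2 = tf_bow(tokens, vocab)
r3 = bigrams(tokens)
```

For this toy document the binary vector is [1, 1, 0] and the term frequency vector is [2, 1, 0]; the two differ only where a term repeats.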
  • 20. Experiment A: Disease-related Document Classification
    ~1500 crawled documents
    Foot-and-mouth disease (FMD)
    Rift Valley fever (RVF)
    Focused Crawl Terms
    [foot and mouth disease, FMD, rift valley fever, RVF]
    After labeling - 813 related and 752 non-related docs
    Testing with 10-fold cross validation
    + OR -
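A rough sketch of how the 10-fold cross-validation splits could be generated for the labeled documents. The round-robin fold assignment and the function name are assumptions; the slides do not say whether the folds were shuffled or stratified, and a real run would probably do both.

```python
def k_fold_indices(n_items, k=10):
    # Assign item i to fold i % k, so fold sizes differ by at most one.
    folds = [[] for _ in range(k)]
    for i in range(n_items):
        folds[i % k].append(i)
    # Each fold is held out once; the remaining k - 1 folds form the training set.
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

n_docs = 813 + 752  # labeled related + non-related documents from the slide
splits = list(k_fold_indices(n_docs, k=10))
```

Every document appears in exactly one test fold, so each reported metric is an average over ten held-out evaluations.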
  • 21. Classification Results: Precision, Recall, F-Measure, Area Under Curve
    Simplified Binary Counts as Features
    Simplified Noun and Verb Frequency as Features
  • 22. Classification Results: Accuracy
    Comprehensive “Bag-of-words” Binary Features
    Comprehensive
    “Bag-of-words”, unigrams, bigrams and term frequency features
  • 23. Summary (1)
    “Bag-of-words” representation gives higher accuracy;
    Probabilistic classifiers give the highest accuracy:
    Naïve Bayes with the comprehensive feature representation R3 using bigrams as features – 0.97;
    Maximum Entropy classifier using the unigram “bag-of-words” representation R2 – 0.96;
    Maximum Entropy classifier using comprehensive binary counts as feature representation R1 – 0.94.
    Normalized term frequency performs much better than binary features alone.
  • 24. Summary (2)
  • 25. Entity Extraction in the Domain of Veterinary Medicine (1)
    Ontology-based Entity Extraction
    Automated Ontology Construction
  • 26. Domain Meta-data
    Domain-independent knowledge
    Domain-specific knowledge
    Location hierarchy
    names of countries, states, cities;
    Time hierarchy
    canonical dates.
    Medical ontology
    diseases, serotypes, and viruses.
  • 27. Manually-constructed Initial Ontology
    |OINIT|=429 |OS|=581 |OA|=581 |OS+A|=605
    1. Disease names and fact sheets from Iowa State University Center for Food Security and Public Health (CFSPH):
    • http://www.cfsph.iastate.edu/diseaseinfo/animaldiseaseindex.htm
    2. World Organisation for Animal Health (OIE) Animal Disease Data:
    • http://www.oie.int/eng/maladies/en_alpha.htm
    3. Department for Environment, Food and Rural Affairs, UK (DEFRA):
    • http://www.defra.gov.uk/animalh/diseases/vetsurveillance/az_index.htm
    4. Wikipedia
    • http://en.wikipedia.org/wiki/Animal_diseases
  • Relationship Types
    • Synonymic relationships – “E1 is a kind of E2”
    E1 = “swine influenza” is a kind of E2 = “swine fever”
    • Hyponymic relationships – “E1 and E2 are diseases”
    E1 = “anthrax”, E2 = “yellow fever” are diseases
    • Causal relationships – “E1 is caused by E2”
    E1 = “Ovine epididymitis” is caused by E2 = “Brucella ovis”
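One way such relationships could be picked up from text is with surface patterns. The sketch below handles only the causal pattern “E1 is caused by E2”; the regular expression and the function name are hypothetical and are not taken from the thesis.

```python
import re

# Hypothetical surface pattern for the causal relationship "E1 is caused by E2".
# Entity spans are approximated as runs of word characters, hyphens and spaces.
CAUSAL = re.compile(
    r"(?P<e1>[A-Za-z][\w\- ]*?) is caused by (?P<e2>[A-Za-z][\w\- ]*)"
)

def extract_causal(sentence):
    # Return (E1, E2) if the sentence matches the pattern, otherwise None.
    match = CAUSAL.search(sentence)
    if match is None:
        return None
    return match.group("e1").strip(), match.group("e2").strip()

pair = extract_causal("Ovine epididymitis is caused by Brucella ovis.")
```

Here `pair` is `("Ovine epididymitis", "Brucella ovis")`. A pattern this loose would also capture leading context words in longer sentences, so a practical extractor would need tighter entity boundaries.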
  • 28. Experiment B: Ontology-based Entity Extraction
    • 100 unlabeled documents for ontology expansion
    • 100 manually labeled documents for entity extraction
  • 30. Entity Extraction Results: ROC Curves
  • 31. Entity Extraction Results: Learning Curves
    |OG|=754..1238
    |OR|=772..1287
  • 32. Entity Extraction in the Domain of Veterinary Medicine (2)
    Sequence Labeling using Syntactic Features
    with Sliding Window
  • 33. Syntactic Feature Extraction
    POS tag – numeric word-level feature
    Capitalization – binary word-level feature
    Capitalization inside – binary word-level feature for identifying abbreviations
    Position in the sentence – numeric document-level feature
    Position in the document – numeric document-level feature
    Frequency – numeric document-level feature
  • 34. Sequence Labeling Approach
  • 35. An Example of Syntactic Feature Extraction
    “Severe disease in dairy cattle caused by Salmonella Newport”
    POS = [NNP, IN, NNS, VBN, …] = [2, 0, 2, 5, …]
    Xi = [POSi, CAPi, ICAPi, SPOSi, DPOSi, FREQi]
    wi-3 = cattle: Xi-3 = [2, 0, 0, 5, 5, 1]
    wi-2 = caused: Xi-2 = [5, 0, 0, 6, 6, 1]
    wi-1 = by: Xi-1 = [0, 0, 0, 7, 7, 1]
    wi = Salmonella: Xi = [2, 1, 0, 8, 8, 1]
    wi+1 = Newport: Xi+1 = [2, 1, 0, 9, 9, 1]
    wi+2 (past the end of the sentence): Xi+2 = [-1, -1, -1, -1, -1, -1]
    wi+3 (past the end of the sentence): Xi+3 = [-1, -1, -1, -1, -1, -1]
    Fi = [Xi, Xi-1, Xi-2, Xi-3, Xi+1, Xi+2, Xi+3], w = 3
    Class = {0, 1}
    Fi = [2, 1, 0, 8, 8, 1, 0, 0, 0, 7, 7, 1, 5, 0, 0, 6, 6, 1, 2, 0, 0, 5, 5, 1,
    2, 1, 0, 9, 9, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], Class = [1]
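The window construction in this example can be sketched as follows. The per-word vectors are copied from the slide; the function name and the list-based layout are assumptions made for illustration.

```python
PAD = [-1, -1, -1, -1, -1, -1]  # vector for positions outside the sentence

def window_features(vectors, i, w=3):
    # Order follows the slide: current word, then left neighbours from nearest
    # to farthest, then right neighbours from nearest to farthest.
    def at(j):
        return vectors[j] if 0 <= j < len(vectors) else PAD
    offsets = [0] + [-d for d in range(1, w + 1)] + [d for d in range(1, w + 1)]
    features = []
    for off in offsets:
        features.extend(at(i + off))
    return features

# Per-word vectors [POS, CAP, ICAP, SPOS, DPOS, FREQ] for the last five words
# of the example sentence, as given on the slide.
vectors = [
    [2, 0, 0, 5, 5, 1],  # cattle
    [5, 0, 0, 6, 6, 1],  # caused
    [0, 0, 0, 7, 7, 1],  # by
    [2, 1, 0, 8, 8, 1],  # Salmonella
    [2, 1, 0, 9, 9, 1],  # Newport
]
f_i = window_features(vectors, i=3, w=3)  # i=3 selects "Salmonella"
```

With w = 3 and six features per word, every example has 7 × 6 = 42 values, and `f_i` matches the Fi vector listed in the example.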
  • 36. Experiment C: Sequence Labeling using Syntactic Features
    100 manually labeled documents from Experiment B
    Each document contains more than 5 disease names
    Keep capitalization
    Remove stop words
    202977 examples in the dataset
    80% for training (approx. 160000 examples)
    20% for testing (approx. 40000 examples)
    Results are averaged over 3 runs
    We do not report accuracy because the data set is unbalanced (approx. 8570 positive examples vs. approx. 194430 negative examples)
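A short worked example of why accuracy is uninformative here, using the approximate class counts from the slide. The "always negative" baseline is a hypothetical classifier introduced only for illustration.

```python
positives = 8570      # approximate counts from the slide
negatives = 194430

# A classifier that always predicts "not a disease name" reaches this accuracy:
majority_accuracy = negatives / (positives + negatives)

# ...while never finding a single disease name:
true_positives = 0
recall = true_positives / positives
```

The baseline scores roughly 0.958 accuracy with a recall of 0, which is why precision, recall and AUC are the more useful measures for this dataset.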
  • 37. Entity Extraction Results: Precision, Recall, AUC (1)
  • 38. Entity Extraction Results: Precision, Recall, AUC (2)
  • 39. Summary
    BioCaster named entity recognition system
    200 news articles
    F-score – 76.9 for all named entity classes
    SVM and feature window -2/+1 including surface word, orthography, biomedical prefixes/suffixes, lemma, head noun etc.
    DNA, RNA, cell type extraction
    SVM and orthographic features
    F-score – 79.9 during the identification phase and 66.5 during the classification phase;
  • 40. Disease-related Event Recognition and Classification (1)
    Sentence-based Event Recognition and Classification
  • 41. Animal Disease-related Event Types
  • 42. Event Recognition Methodology
    Step 1. Entity recognition from raw text.
    Step 2. Classification of the sentences from which entities were extracted as event-related or not; event-related sentences are further classified as confirmed or suspected.
    Step 3. Combination of entities within an event sentence into structured tuples and aggregation of tuples related to the same event into one comprehensive tuple.
  • 43. Step 1. Entity Recognition
    Locate and classify atomic elements into predefined categories:
    Disease names: “foot and mouth disease”, “rift valley fever”; viruses: “picornavirus”; serotypes: “Asia-1”;
    Species: “sheep”, “pigs”, “cattle” and “livestock”;
    Locations of events specified at different levels of geo-granularity: “United Kingdom”, “eastern provinces of Shandong and Jiangsu, China”;
    Dates in different formats: “last Tuesday”, “two months ago”.
  • 44. Entity Recognition Tools
    Animal Disease Extractor*
    relies on a medical ontology, automatically-enriched with synonyms and causative viruses.
    Species Extractor*
    pattern matching on a stemmed dictionary of animal names from Wikipedia.
    Location Extractor
    Stanford NER Tool** (uses conditional random fields);
    NGA GEOnet Names Database (GNS)*** for location disambiguation and retrieving latitude/longitude.
    Date/Time Extractor
    set of regular expressions.
    *KDD KSU DSEx - http://fingolfin.user.cis.ksu.edu:8080/diseaseextractor/
    **Stanford NER - http://nlp.stanford.edu/ner/index.shtml
    ***GNS - http://earth-info.nga.mil/gns/html/
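As an illustration of the regular-expression approach to date extraction, here is a small sketch with two hypothetical patterns covering numeric dates and day-month-year dates. These are not the expressions used in the thesis, and relative dates such as “last Tuesday” are not handled.

```python
import re

MONTHS = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
DATE_PATTERNS = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),        # e.g. 02/18/01
    re.compile(r"\b\d{1,2} " + MONTHS + r" \d{4}\b"),  # e.g. 9 Jun 2009
]

def extract_dates(text):
    # Collect (start offset, matched string) pairs from every pattern,
    # then return the matched strings in document order.
    found = []
    for pattern in DATE_PATTERNS:
        found.extend((m.start(), m.group()) for m in pattern.finditer(text))
    return [matched for _, matched in sorted(found)]

dates = extract_dates("On 9 Jun 2009, the farm's owner reported symptoms of FMD.")
```

Keeping the patterns in a list makes it easy to add further formats one at a time and to test each in isolation.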
  • 45. Step 2. Event Sentence Classification
    Constraint: true events should include a disease name together with a status verb from Google Sets* and WordNet**; sentences without one are eliminated as not event-related, for example:
    “Foot and mouth disease is[V] a highly pathogenic animal disease”.
    Confirmed status: verbs such as “happened” and verb phrases such as “strike out”, for example:
    “On 9 Jun 2009, the farm's owner reported[V] symptoms of FMD in more than 30 hogs”.
    Suspected status: verbs such as “catch” and verb phrases such as “be taken in”, for example:
    “RVF is suspected[V] in Saudi Arabia in September 2000”.
    *GoogleSets - http://labs.google.com/sets
    **WordNet - http://wordnet.princeton.edu/
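The constraint above can be sketched as a simple rule-based classifier. The verb and disease lists below are tiny hypothetical stand-ins for the lists built from Google Sets, WordNet and the ontology; the substring match for disease names is also a simplification.

```python
# Hypothetical, tiny word lists used only for illustration.
CONFIRMED_VERBS = {"confirmed", "reported", "happened"}
SUSPECTED_VERBS = {"suspected", "suggested"}
DISEASE_NAMES = {"foot and mouth disease", "fmd", "rift valley fever", "rvf"}

def classify_sentence(sentence):
    lowered = sentence.lower()
    words = set(lowered.replace(",", " ").replace(".", " ").split())
    # A sentence without a disease name cannot be an event sentence.
    if not any(name in lowered for name in DISEASE_NAMES):
        return "non-event"
    # Assumption: a suspected-status verb takes priority over a confirmed one.
    if words & SUSPECTED_VERBS:
        return "suspected"
    if words & CONFIRMED_VERBS:
        return "confirmed"
    # Disease name present but no status verb, e.g. a definitional sentence.
    return "non-event"

label = classify_sentence("RVF is suspected in Saudi Arabia in September 2000.")
```

The three example sentences from the slide come out as non-event, confirmed and suspected respectively under these lists.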
  • 46. Step 3. Event Tuple Generation
    Event attributes:
    disease
    date
    location
    species
    confirmation status
    Event tuple:
    Eventi = < disease; date; location; species; status > =
    <FMD, 9 Jun 2009, Taoyuan, hog, confirmed>
    Event tuple with missing attributes:
    Eventj = <FMD, ?, ?, ?, confirmed>
  • 47. Event Recognition Workflow
    Step 1: Entity Recognition
    Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].
    Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC]. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP]. Subsequent testing confirmed FMD[DIS]. Agricultural authorities asked the farmer to strengthen immunization. The outbreak has not affected other farms. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
    Step 2: Sentence Classification
    YES 1. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].
    YES 2. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC].
    YES 3. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP].
    YES 4. Subsequent testing confirmed FMD[DIS].
    NO 5. Agricultural authorities asked the farmer to strengthen immunization.
    NO 6. The outbreak has not affected other farms.
    NO 7. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
    Step 3a: Tuple Generation
    E1 = <Foot-and-mouth disease, ?, Taoyuan, hog, ?>
    E2 = <Foot-and-mouth disease, ?, Taoyuan, hog, confirmed>
    E3 = <FMD, 9 Jun 2009, ?, hog, reported>
    E4 = <FMD, ?, ?, ?, confirmed>
    Step 3b: Tuple Aggregation
    E = <disease, date, location, species, status> = <Foot-and-mouth disease, 9 Jun 2009, Taoyuan, hog, confirmed >
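A minimal sketch of the aggregation in Step 3b, assuming the four tuples have already been grouped as describing the same event (so "FMD" and "Foot-and-mouth disease" are not reconciled here). The merge rule, first non-missing value per attribute with "confirmed" overriding weaker statuses, is an assumption chosen to reproduce the slide's result, not the documented heuristic.

```python
FIELDS = ("disease", "date", "location", "species", "status")

def aggregate(tuples):
    # Take, for each attribute, the first non-missing value; None marks "?".
    # Assumption: a "confirmed" status overrides weaker ones such as "reported".
    merged = {field: None for field in FIELDS}
    for t in tuples:
        for field, value in zip(FIELDS, t):
            if value is None:
                continue
            if merged[field] is None or (field == "status" and value == "confirmed"):
                merged[field] = value
    return tuple(merged[field] for field in FIELDS)

e1 = ("Foot-and-mouth disease", None, "Taoyuan", "hog", None)
e2 = ("Foot-and-mouth disease", None, "Taoyuan", "hog", "confirmed")
e3 = ("FMD", "9 Jun 2009", None, "hog", "reported")
e4 = ("FMD", None, None, None, "confirmed")
event = aggregate([e1, e2, e3, e4])
```

The result is `("Foot-and-mouth disease", "9 Jun 2009", "Taoyuan", "hog", "confirmed")`, the same tuple as E in Step 3b.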
  • 48. Experiment D: Event Recognition and Classification
    The First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx 2010)
    ~100 event-related documents
    Foot-and-mouth disease (FMD)
    Rift Valley fever (RVF)
    Manually created 2 sets of summaries for 100 docs
    DUCView Pyramid Scoring Tool* – Score [0..1]
    relies on multiple summaries to assign the significance weights to summarization content units (i.e., entities)
    to compare automatically generated event tuples with entities from human summaries.
    Score_i = < w_d disease; w_t date; w_l location; w_s species; w_c status; … >,
    subject to disease + status = 2
  • 49. Event Score Distribution by Range
    We interpret the Pyramid score values as an event extraction accuracy:
    # of unique contributing entities (TP);
    # of entities not in the summary (FP);
    # of extra contributing entities from summary (FN).
    multiple summaries – majority voting for annotation.
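Given TP, FP and FN counts of this kind, the usual precision, recall and F1 can be computed as below. This is the standard definition of those measures, not the Pyramid score itself, and the counts in the example are made up.

```python
def precision_recall_f1(tp, fp, fn):
    # Standard definitions; zero denominators are mapped to 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Made-up counts for one event tuple: 4 entities match the human summaries,
# 1 extracted entity is absent from them, 1 summary entity was missed.
p, r, f1 = precision_recall_f1(tp=4, fp=1, fn=1)
```

With these counts all three measures come out at 0.8.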
  • 50. Disease-related Event Recognition and Classification (2)
    Event Recognition and Classification in Predictive Epidemiology Domain
  • 51. ENTITY EXTRACTION
    Document 3, sentence s31
    Almost 2000 cattle[SP] are waiting to be slaughtered on 02/28/2001[DATE] since the resurgence of FMD[DIS] in Northumberland[LOC].
    Document 2, sentence s21
    The UK Ministry of Agriculture confirmed on 2/20/01[DATE] that 27 pigs[SP] found with vesicles in an abattoir near Brentwood, Essex[LOC] have FMD[DIS].
    Document 1, sentence s11, s12
    The signs suggested the 27 pigs[SP] could be suffering from foot and mouth disease[DIS] in Anglesey, Wales[LOC]. It was reported on 02/18/01[DATE].


    EVENT TUPLE GENERATION
    e11 = [27 pigs, FMD, ?, Anglesey, Wales, “suggest”]
    e12 = [?,?, 02/18/01, ?, “report”]
    e21 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, “confirm”]
    e31 = [2000 cattle, FMD, 2/28/01, Northumberland, “slaughter”]
    EVENT TUPLE CLASSIFICATION
    Susceptible
    Recovered
    Infected
    EVENT TUPLE AGGREGATION
    E1 = [27 pigs, FMD, 02/18/01, Anglesey, Wales, Susceptible]
    E2 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, Infected]
    E3 = [2000 cattle, FMD, 2/28/01, Northumberland, Recovered]
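The classification step in this example can be sketched as a lookup from the status verb to one of the three classes. The mapping below is read off the three example tuples only; any broader verb list, and the "Unknown" fallback, would be assumptions.

```python
# Verb-to-class mapping taken from the example tuples on the slide.
STATUS_TO_CLASS = {
    "suggest": "Susceptible",
    "confirm": "Infected",
    "slaughter": "Recovered",
}

def classify_event(event_tuple):
    # The status verb is the last element of each tuple in the slide's layout.
    *attributes, verb = event_tuple
    return (*attributes, STATUS_TO_CLASS.get(verb, "Unknown"))

e21 = ("27 pigs", "FMD", "2/20/01", "Brentwood, Essex", "confirm")
e31 = ("2000 cattle", "FMD", "2/28/01", "Northumberland", "slaughter")
E2 = classify_event(e21)
E3 = classify_event(e31)
```

E1 on the slide additionally merges the date from e12 into e11 before classification, which this sketch leaves out.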
  • 52. The spread of the foot-and-mouth disease outbreak in the UK, 2001
    118 ProMed-Mail reports
    yellow - susceptible
    red - infected
    green - recovered
  • 53. Summary
    The accuracy of event recognition depends on the accuracy of each separate entity extraction step
    Event aggregation and deduplication require more comprehensive heuristics and additional knowledge, for example co-reference resolution
    BioCaster
    950 disease-location pairs per month
    reported results - 887/950 correct disease-location pairs and 0.934 precision
    MedISys/PULS
    100 English-language documents with 156 events
    Reported results – 0.88 precision
  • 54. Conclusions, Contributions and Future Work
    Summary:
    1. Disease-related Document Classification
    2. Ontology-based Entity Extraction
    3. Entity Extraction using Sequence Labeling
    4. Event Recognition and Classification
  • 55. Conclusions
    Disease-related Document Classification
    supervised framework
    feature representations and classification algorithms
    Ontology-based Domain-specific Entity Extraction
    semantic relationship extraction approach
    sequence labeling using syntactic patterns
    Event Recognition and Classification
    novel sentence-based approach
  • 56. Contributions
    Paper “Computational Knowledge and Information Management in Veterinary Epidemiology”
    IEEE Intelligence and Security Informatics Conference (ISI'10), 23-26 May 2010, Vancouver, BC, Canada
    Paper “Animal Disease Event Recognition and Classification”
    First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx'10), WWW Conference, 26-30 April 2010, Raleigh, NC, USA
    Paper “Boosting Biomedical Entity Extraction by Using Syntactic Patterns for Semantic Relation Discovery” (to appear)
    2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI'10), August 31 - September 3, York University, Toronto, Canada
    Poster “Named Entity Recognition and Tagging in the Domain of Epizootics”
    Women in Machine Learning Workshop (WiML'09), 6-7 Dec 2009, Vancouver, Canada
    ACM Poster Presentation Competition “Automated Event Extraction and Named Entity Recognition in the Domain of Veterinary Medicine”
    2010 Grace Hopper Celebration of Women in Computing (GHC'10), September 28 - October 1, Atlanta, Georgia, USA
  • 57. Future Work
    Domain-specific Entity Extraction
    multilingual ontology construction using Wikipedia.
    Automated Ontology Construction
    generalize for other named entities
    Event Recognition and Classification
    deeper syntactic analysis
    co-reference resolution
  • 58. Acknowledgments
    Faculty:
    Dr. William H. Hsu
    Dr. Doina Caragea
    Dr. Gurdip Singh
    KDD Lab alumni:
    Tim Weninger (crawler deployment) and Jing Xia (rule-based event extraction)
    KDD Lab assistants:
    Information Extraction Team: John Drouhard, Landon Fowles, Swathi Bujuru
    Spatial Data Mining Team: Wesam Elshamy, Andrew Berggren
    Topic Detection & Tracking Team: Danny Jones, Srinivas Reddy
    Fulbright Program supported by the US Department of State's Bureau of Educational and Cultural Affairs
  • 59. Thank you!
    Svitlana Volkova, svitlana.volkova@gmail.com
    http://people.cis.ksu.edu/~svitlana