MS Thesis Short

Published in: Education

  1. 1. Laboratory for Knowledge Discovery in Databases<br />Entity Extraction, Animal Disease-related Event Recognition and Classification from Web<br />Presenter: Svitlana Volkova <br />Adviser: William H. Hsu<br />Committee: Dr. Doina Caragea, Dr. Gurdip Singh <br />Supported by: K-State National Agricultural Biosecurity Center (NABC), US Department of Defense<br />
  2. 2. Agenda<br />Thesis "Entity Extraction, Animal Disease-related Event Recognition and Classification from Web", July 30 2010<br />Background<br />Related Work<br />Framework for Epidemiological Analysis<br />Disease-related Document Classification<br />Domain-specific Entity Extraction<br /><ul><li>Ontology-based Entity Extraction</li><li>Sequence Labeling using Syntactic Features</li></ul>Disease-related Event Recognition and Classification<br />Summary & Future Work<br />
  4. 4. Importance of the Problem<br />Influence on travel and trade<br />Can cause economic crises and political instability<br />Zoonotic diseases can cause loss of life<br />
  5. 5. Animal Disease Monitoring Systems - Automated Web Services<br />Information retrieval system MedISys - http://medusa.jrc.it/medisys/homeedition/all/home.html<br />Pattern-based Understanding and Learning System (PULS) - http://sysdb.cs.helsinki.fi/puls/jrc/all<br />BioCaster - http://biocaster.nii.ac.jp/<br />HealthMap - http://healthmap.org/en<br />EpiSpider - http://www.epispider.org/<br />
  6. 6. Limitations of the Existing Systems<br />No timeline visualization (BioCaster)<br />
  7. 7. Problem Statement<br />Introduce the following features to the framework for epidemiological analysis:<br />Classification of disease-related documents collected from different domains<br />Domain-specific entity extraction - animal disease names, viruses, disease serotypes<br />Automated animal disease-related event recognition and classification from unstructured web data<br />
  8. 8. Methodology<br />Suppose we have a document collection D with documents collected from different domains C:<br />news, web pages, scientific papers, medical literature, e-mails.<br />We classify documents into two classes:<br />disease-related documents DR;<br />disease non-related documents DNR.<br />We extract a set of events E from every document di in DR for every domain cj.<br />For every event ek in E we extract a set of domain-specific and domain-independent entities:<br />disease, species, location, date, event status.<br />We classify recognized events from E into:<br />two classes – suspected or confirmed;<br />three classes – susceptible, infected or recovered.<br />
  9. 9. Related Work<br />Text categorization: supervised, unsupervised and semi-supervised learning; feature representations (“bag-of-words”, term frequency, binary features, word bigrams); classification algorithms (lazy learners, decision trees, Naïve Bayes, Maximum Entropy).<br />Entity extraction approaches: gazetteers, regular expressions, Hidden Markov Models and Conditional Random Fields; ontology-based biomedical entity extraction.<br />Relation extraction work for automated ontology construction.<br />Animal disease-related event recognition methods.<br />
  10. 10. Framework for Epidemiological Analytics<br />Framework for Epidemiological Analysis<br />Main Functional Components<br />Data Collection (Document Relevance Classification) -> Data Sharing -> Search -> Data Analysis (Entity Extraction and Event Recognition) -> Visualization<br />
  11. 11. Advantages of the Designed System<br />
  12. 12. Phases of Data Processing<br />
  13. 13. 1. Data Collection (1)<br />Crawl the web using Heritrix crawler - http://crawler.archive.org/<br />set of seeds (ProMED-Mail, DEFRA etc.)<br />set of terms (animal disease names from the ontology)<br />
  14. 14. 2. Data Sharing<br />Document relevance classification<br />Relevant<br />Non-relevant<br />
  15. 15. 3. Search<br />Lucene-based* ranking<br />Query-based keyword search<br />Search by animal disease name and/or location<br />*Lucene - http://lucene.apache.org<br />
  16. 16. 4. Data Analysis<br />Event example:<br />“On 12 September 2007, a new foot-and-mouth disease outbreak was confirmed in Egham, Surrey”<br />
  17. 17. 5. Visualization<br />Map View<br />GoogleMaps API - http://code.google.com/apis/maps/<br />TimeLine View<br />SIMILE API - http://www.simile-widgets.org/timeline/<br />
  18. 18. Disease-related Document Classification<br />Binary Classification using Supervised Learning<br />Feature Representations: “Bag-of-words”, TF, Bigrams<br />Classifiers: Naïve Bayes, MaxEntropy, J48<br />
  19. 19. Supervised Learning Framework<br />Diagram: crawled documents DTrain and new documents DTest are mapped to feature representations R1 … Rn; learned models M1 … Mk drive a classifier that labels each new document as:<br />Disease Related - DR (passed on to the next phases)<br />Disease Non-related - DNR (eliminated from the index)<br />Feature Representations:<br />R1 – “bag-of-words” binary, |R1|=28908<br />R2 – “bag-of-words” term frequency, |R2|=28908<br />R3 – “bag-of-words” bigrams, |R3|=99108<br />R4 – noun and verb keywords represented as binary counts, |R4|=2<br />R5 – noun and verb keywords normalized frequency, |R5|=2<br />
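The representations R1 to R3 on this slide can be illustrated with a minimal Python sketch. The whitespace tokenizer, the function names, the length-normalized term frequency, and the two toy documents are assumptions made for illustration; they are not taken from the thesis implementation.

```python
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocab(docs, n=1):
    # Every n-gram seen in the corpus, sorted for a stable feature order.
    grams = set()
    for doc in docs:
        grams.update(ngrams(tokenize(doc), n))
    return sorted(grams)

def featurize(doc, vocab, n=1, binary=True):
    # binary=True  -> presence/absence features (R1-style, or R3-style with n=2)
    # binary=False -> term frequency normalized by document length (assumed R2-style)
    counts = Counter(ngrams(tokenize(doc), n))
    if binary:
        return [1 if counts[g] > 0 else 0 for g in vocab]
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocab]

docs = ["fmd outbreak confirmed", "fmd fmd vaccine"]  # toy corpus
unigram_vocab = build_vocab(docs, n=1)
bigram_vocab = build_vocab(docs, n=2)
r1 = featurize(docs[1], unigram_vocab)                # binary unigrams
r2 = featurize(docs[1], unigram_vocab, binary=False)  # term frequency
r3 = featurize(docs[1], bigram_vocab, n=2)            # binary bigrams
```

The vector length equals the vocabulary size, which is why the bigram representation on the slide is several times larger than the unigram ones.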
  20. 20. Experiment A: Disease-related Document Classification<br />~1500 crawled documents<br />Foot-and-mouth disease (FMD)<br />Rift valley fever (RVF)<br />Focused Crawl Terms<br />[foot and mouth disease, FMD, rift valley fever, RVF]<br />After labeling (+ or -) - 813 related and 752 non-related docs<br />Testing with 10-fold cross validation<br />
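As a rough illustration of the 10-fold cross-validation protocol mentioned on this slide, the sketch below splits document indices into ten folds. The round-robin assignment without shuffling and the function name are assumptions; the slides do not say how the folds were drawn.

```python
def k_fold_indices(n, k=10):
    # Assign index i to fold i % k, then yield (train, test) index lists,
    # using each fold exactly once as the test set.
    folds = [list(range(f, n, k)) for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test

# 813 related + 752 non-related labeled documents, as reported on the slide.
n_docs = 813 + 752
splits = list(k_fold_indices(n_docs, k=10))
```

Each document is tested exactly once, and the reported metric is normally the average over the ten held-out folds.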
  21. 21. Classification Results: Precision, Recall, F-Measure, Area Under Curve<br />Simplified Binary Counts as Features<br />Simplified Noun and Verb Frequency as Features<br />
  22. 22. Classification Results: Accuracy<br />Comprehensive “Bag-of-words” Binary Features<br />Comprehensive “Bag-of-words”, unigrams, bigrams and term frequency features<br />
  23. 23. Summary (1)<br />“Bag-of-words” representation gives higher accuracy;<br />Probabilistic classifiers (generative Naïve Bayes and discriminative Maximum Entropy) give the highest accuracy:<br />Naïve Bayes together with comprehensive feature representation R3 using bigrams as features – 0.97;<br />Maximum Entropy classifier using unigram “bag-of-words” representation R2 – 0.96;<br />Maximum Entropy classifier using comprehensive binary counts as feature representation R1 – 0.94.<br />Normalized term frequency performs much better than binary features alone.<br />
  24. 24. Summary (2)<br />
  25. 25. Entity Extraction in the Domain of Veterinary Medicine (1)<br />Ontology-based Entity Extraction<br />Automated Ontology Construction<br />
  26. 26. Domain Meta-data<br />Domain-independent knowledge<br />Location hierarchy: names of countries, states, cities;<br />Time hierarchy: canonical dates.<br />Domain-specific knowledge<br />Medical ontology: diseases, serotypes, and viruses.<br />
  27. 27. Manually-constructed Initial Ontology<br />|OINIT|=429 |OS|=581 |OA|=581 |OS+A|=605<br />1. Disease names and fact sheets from Iowa State University Center for Food Security and Public Health (CFSPH): <br /><ul><li>http://www.cfsph.iastate.edu/diseaseinfo/animaldiseaseindex.htm</li></ul>2. World Organisation for Animal Health (OIE) Animal Disease Data:<br /><ul><li>http://www.oie.int/eng/maladies/en_alpha.htm</li></ul>3. Department for Environment, Food and Rural Affairs, UK (DEFRA):<br /><ul><li>http://www.defra.gov.uk/animalh/diseases/vetsurveillance/az_index.htm</li></ul>4. Wikipedia<br /><ul><li>http://en.wikipedia.org/wiki/Animal_diseases</li></ul>Relationship Types<br /><ul><li>Synonymic relationships – “E1 is a kind of E2”</li></ul> E1 = “swine influenza” is a kind of E2 = “swine fever”<br /><ul><li>Hyponymic relationships – “E1 and E2 are diseases”</li></ul> E1 = “anthrax”, E2 = “yellow fever” are diseases<br /><ul><li>Causal relationships – “E1 is caused by E2”</li></ul> E1 = “Ovine epididymitis” is caused by E2 = “Brucella ovis”<br />
  28. 28. Experiment B: Ontology-based Entity Extraction<br /><ul><li>100 unlabeled documents for ontology expansion</li><li>100 manually labeled documents for entity extraction</li></ul>
  30. 30. Entity Extraction Results: ROC Curves<br />
  31. 31. Entity Extraction Results: Learning Curves<br />|OG|=754..1238<br />|OR|=772..1287<br />
  32. 32. Entity Extraction in the Domain of Veterinary Medicine (2)<br />Sequence Labeling using Syntactic Features<br /> with Sliding Window <br />
  33. 33. Syntactic Feature Extraction<br />POS tag: numeric word-level feature<br />Capitalization: binary word-level feature<br />Capitalization inside: binary word-level feature for identifying abbreviations<br />Position in the sentence: numeric document-level feature<br />Position in the document: numeric document-level feature<br />Frequency: numeric document-level feature<br />
  34. 34. Sequence Labeling Approach<br />
  35. 35. An Example of Syntactic Feature Extraction<br />“Severe disease in dairy cattle caused by Salmonella Newport”<br />POS = [NNP, IN, NNS, VBN, …] = [2, 0, 2, 5, …]<br />Xi = [POSi, CAPi, ICAPi, SPOSi, DPOSi, FREQi]<br />Window around wi = “Salmonella”: wi-3 = “cattle”, wi-2 = “caused”, wi-1 = “by”, wi+1 = “Newport”; wi+2 and wi+3 fall outside the sentence<br />Xi-3 = [2, 0, 0, 5, 5, 1]<br />Xi-2 = [5, 0, 0, 6, 6, 1]<br />Xi-1 = [0, 0, 0, 7, 7, 1]<br />Xi = [2, 1, 0, 8, 8, 1]<br />Xi+1 = [2, 1, 0, 9, 9, 1]<br />Xi+2 = [-1, -1, -1, -1, -1, -1]<br />Xi+3 = [-1, -1, -1, -1, -1, -1]<br />Fi = [Xi, Xi-1, Xi-2, Xi-3, Xi+1, Xi+2, Xi+3], w = 3<br />Class = {0, 1}<br />Fi = [2, 1, 0, 8, 8, 1, 0, 0, 0, 7, 7, 1, 5, 0, 0, 6, 6, 1, 2, 0, 0, 5, 5, 1,<br /> 2, 1, 0, 9, 9, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], Class = [1]<br />
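The window construction on this slide can be reproduced with a short Python sketch. The function name and the list-of-lists layout are assumptions for illustration; the per-word vectors, the padding value -1, the window size w = 3, and the concatenation order (current word, then left context, then right context) are taken from the slide.

```python
PAD = [-1] * 6  # padding vector for window positions outside the sentence

def window_features(X, i, w=3):
    # Concatenate X[i], then X[i-1] .. X[i-w], then X[i+1] .. X[i+w],
    # substituting PAD wherever the window runs past the sentence.
    def at(j):
        return X[j] if 0 <= j < len(X) else PAD
    offsets = [0] + [-k for k in range(1, w + 1)] + [k for k in range(1, w + 1)]
    features = []
    for off in offsets:
        features.extend(at(i + off))
    return features

# Per-word vectors [POS, CAP, ICAP, SPOS, DPOS, FREQ] copied from the slide.
X = [
    [2, 0, 0, 5, 5, 1],  # cattle
    [5, 0, 0, 6, 6, 1],  # caused
    [0, 0, 0, 7, 7, 1],  # by
    [2, 1, 0, 8, 8, 1],  # Salmonella
    [2, 1, 0, 9, 9, 1],  # Newport
]
F_i = window_features(X, i=3, w=3)  # window centered on "Salmonella"
```

With six features per word and seven window positions, every example has 42 features regardless of where the word sits in the sentence.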
  36. 36. Experiment C: Sequence Labeling using Syntactic Features<br />100 manually labeled documents from Experiment B<br />Number of disease names is more than 5 per document<br />Keep capitalization<br />Remove stop words<br />202977 examples in the dataset<br />80% for training (approx. 160000 examples)<br />20% for testing (approx. 40000 examples)<br />Results are averaged over 3 runs<br />We do not report accuracy because the data set is unbalanced (approx. 8570 positive examples vs. approx. 194430 negative examples)<br />
  37. 37. Entity Extraction Results: Precision, Recall, AUC (1)<br />
  38. 38. Entity Extraction Results: Precision, Recall, AUC (2)<br />
  39. 39. Summary<br />BioCaster named entity recognition system<br />200 news articles<br />F-score – 76.9 for all named entity classes<br />SVM and feature window -2/+1 including surface word, orthography, biomedical prefixes/suffixes, lemma, head noun etc.<br />DNA, RNA, cell type extraction<br />SVM and orthographic features<br />F-score – 79.9 during the identification phase and 66.5 during the classification phase.<br />
  40. 40. Disease-related Event Recognition and Classification (1)<br />Sentence-based Event Recognition and Classification<br />
  41. 41. Animal Disease-related Event Types<br />
  42. 42. Event Recognition Methodology<br />Step 1. Entity recognition from raw text.<br />Step 2. Classification of the sentences from which entities were extracted as event-related or not; event-related sentences are further classified as confirmed or suspected.<br />Step 3. Combination of the entities within an event sentence into structured tuples, and aggregation of tuples related to the same event into one comprehensive tuple.<br />
  43. 43. Step 1. Entity Recognition<br />Locate and classify atomic elements into predefined categories:<br />Disease names: “foot and mouth disease”, “rift valley fever”; viruses: “picornavirus”; serotypes: “Asia-1”;<br />Species: “sheep”, “pigs”, “cattle” and “livestock”;<br />Locations of events specified at different levels of geo-granularity: “United Kingdom”, “eastern provinces of Shandong and Jiangsu, China”;<br />Dates in different formats: “last Tuesday”, “two months ago”.<br />
  44. 44. Entity Recognition Tools<br />Animal Disease Extractor*<br />relies on a medical ontology, automatically-enriched with synonyms and causative viruses.<br />Species Extractor*<br />pattern matching on a stemmed dictionary of animal names from Wikipedia.<br />Location Extractor<br />Stanford NER Tool** (uses conditional random fields);<br />NGA GEOnet Names Database (GNS)*** for location disambiguation and retrieving latitude/longitude.<br />Date/Time Extractor<br />set of regular expressions.<br />*KDD KSU DSEx - http://fingolfin.user.cis.ksu.edu:8080/diseaseextractor/<br />**Stanford NER - http://nlp.stanford.edu/ner/index.shtml<br />***GNS - http://earth-info.nga.mil/gns/html/<br />
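The date/time extractor is described only as a set of regular expressions. A minimal Python sketch of that idea follows; the two patterns and the function name are hypothetical and cover only absolute date formats that appear in the example sentences of this deck (such as “9 Jun 2009” and “2/20/01”), not relative expressions like “last Tuesday”.

```python
import re

# Hypothetical patterns; the thesis does not list its actual expressions.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
DATE_PATTERNS = [
    re.compile(r"\b\d{1,2} " + MONTH + r" \d{4}\b"),     # 9 Jun 2009, 12 September 2007
    re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b"),  # 02/28/2001, 2/20/01
]

def extract_dates(text):
    # Return every substring matched by any pattern, grouped by pattern order.
    found = []
    for pattern in DATE_PATTERNS:
        found.extend(m.group(0) for m in pattern.finditer(text))
    return found

dates = extract_dates("On 9 Jun 2009, the farm's owner reported symptoms of FMD.")
```

A production extractor would also need to normalize the matches to a canonical date, which is what the time hierarchy in the domain meta-data is for.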
  45. 45. Step 2. Event Sentence Classification<br />Constraint: true events should include a disease name together with a status verb from Google Sets* and WordNet** (this eliminates sentences that are not event-related).<br />Non-event example: “Foot and mouth disease is[V] a highly pathogenic animal disease”.<br />Confirmed status: verbs such as “happened” and verb phrases such as “strike out”<br />“On 9 Jun 2009, the farm's owner reported[V] symptoms of FMD in more than 30 hogs”.<br />Suspected status: verbs such as “catch” and verb phrases such as “be taken in”<br />“RVF is suspected[V] in Saudi Arabia in September 2000”.<br /> *GoogleSets - http://labs.google.com/sets<br /> **WordNet - http://wordnet.princeton.edu/<br />
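The constraint on this slide (a disease name plus a status verb) can be sketched in Python as follows. The small disease and verb lists are hypothetical stand-ins for the ontology and the Google Sets/WordNet verb lists, and checking suspected verbs before confirmed ones is an assumption, not something the slides specify.

```python
import re

# Hypothetical stand-ins for the ontology and the status verb lists.
DISEASES = {"foot and mouth disease", "foot-and-mouth disease", "fmd",
            "rift valley fever", "rvf"}
CONFIRMED_VERBS = {"confirmed", "reported", "happened"}
SUSPECTED_VERBS = {"suspected"}

def classify_sentence(sentence):
    # Returns "confirmed", "suspected", or "non-event".
    text = sentence.lower()
    has_disease = any(
        re.search(r"\b" + re.escape(name) + r"\b", text) for name in DISEASES
    )
    if not has_disease:
        return "non-event"
    words = set(re.findall(r"[a-z]+", text))
    if words & SUSPECTED_VERBS:
        return "suspected"
    if words & CONFIRMED_VERBS:
        return "confirmed"
    return "non-event"  # disease mentioned, but no status verb

label = classify_sentence("RVF is suspected in Saudi Arabia in September 2000")
```

A definitional sentence such as the first example on the slide mentions a disease but contains no status verb, so it is filtered out.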
  46. 46. Step 3. Event Tuple Generation<br />Event attributes:<br />disease<br />date<br />location<br />species<br />confirmation status<br />Event tuple:<br />Eventi = < disease; date; location; species; status > = <br /> <FMD, 9 Jun 2009, Taoyuan, hog, confirmed><br />Event tuple with missing attributes:<br />Eventj = <FMD, ?, ?, ?, confirmed><br />
  47. 47. Event Recognition Workflow<br />Step 1: Entity Recognition<br />Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].<br />Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC]. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP]. Subsequent testing confirmed FMD[DIS]. Agricultural authorities asked the farmer to strengthen immunization. The outbreak has not affected other farms. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.<br />Step 2: Sentence Classification<br />YES 1. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].<br />YES 2. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC].<br />YES 3. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP].<br />YES 4. Subsequent testing confirmed FMD[DIS].<br />NO 5. Agricultural authorities asked the farmer to strengthen immunization.<br />NO 6. The outbreak has not affected other farms.<br />NO 7. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.<br />Step 3a: Tuple Generation<br />E1 = <Foot-and-mouth disease, ?, Taoyuan, hog, ?><br />E2 = <Foot-and-mouth disease, ?, Taoyuan, hog, confirmed><br />E3 = <FMD, 9 Jun 2009, ?, hog, reported><br />E4 = <FMD, ?, ?, ?, confirmed><br />Step 3b: Tuple Aggregation<br />E = <disease, date, location, species, status> = <Foot-and-mouth disease, 9 Jun 2009, Taoyuan, hog, confirmed><br />
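Step 3b can be illustrated with a small Python sketch that merges the four tuples from this slide. The merge rule used here (take the first non-missing value for each attribute) is an assumption chosen because it reproduces the aggregated tuple shown above; the thesis heuristics may differ, and the sketch presumes all input tuples already refer to the same event.

```python
FIELDS = ("disease", "date", "location", "species", "status")
MISSING = "?"

def aggregate(tuples):
    # For each attribute, keep the first value that is not missing;
    # fall back to "?" when no tuple supplies one.
    merged = []
    for idx in range(len(FIELDS)):
        value = next((t[idx] for t in tuples if t[idx] != MISSING), MISSING)
        merged.append(value)
    return tuple(merged)

# Tuples E1..E4 as generated in Step 3a on the slide.
e1 = ("Foot-and-mouth disease", "?", "Taoyuan", "hog", "?")
e2 = ("Foot-and-mouth disease", "?", "Taoyuan", "hog", "confirmed")
e3 = ("FMD", "9 Jun 2009", "?", "hog", "reported")
e4 = ("FMD", "?", "?", "?", "confirmed")
event = aggregate([e1, e2, e3, e4])
```

Note that this rule does not decide whether “FMD” and “Foot-and-mouth disease” denote the same disease; that grouping has to happen before aggregation, for instance through the ontology's synonym relations.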
  48. 48. Experiment D: Event Recognition and Classification<br />The First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx 2010) <br />~100 event-related documents<br />Foot-and-mouth disease (FMD)<br />Rift valley fever (RVF)<br />Manually created 2 sets of summaries for 100 docs<br />DUCView Pyramid Scoring Tool – Score [0..1]<br />relies on multiple summaries to assign significance weights to summarization content units (i.e., entities)<br />used to compare automatically generated event tuples with entities from human summaries.<br />Scorei = < wd disease; wt date; wl location; ws species; wc status; … >,<br />subject to disease + status = 2<br />
  49. 49. Event Score Distribution by Range<br />We interpret the Pyramid score values as event extraction accuracy:<br /># of unique contributing entities (TP);<br /># of entities not in the summary (FP);<br /># of extra contributing entities from summary (FN).<br />multiple summaries – majority voting for annotation.<br />
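Given the TP, FP, and FN counts defined on this slide, the standard precision, recall, and F1 definitions can be computed as in the Python sketch below. This is not the Pyramid score itself, and the counts in the usage line are made up for illustration rather than taken from the experiment.

```python
def precision_recall_f1(tp, fp, fn):
    # Standard definitions; each ratio is defined as 0.0 when its denominator is 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts for one event tuple: 4 entities match the human summary,
# 1 extracted entity is not in the summary, 1 summary entity was missed.
p, r, f1 = precision_recall_f1(tp=4, fp=1, fn=1)
```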
  50. 50. Disease-related Event Recognition and Classification (2)<br />Event Recognition and Classification in Predictive Epidemiology Domain<br />
  51. 51. ENTITY EXTRACTION<br />Document 1, sentences s11, s12<br />The signs suggested the 27 pigs[SP] could be suffering from foot and mouth disease[DIS] in Anglesey, Wales[LOC]. It was reported on 02/18/01[DATE].<br />Document 2, sentence s21<br />The UK Ministry of Agriculture confirmed on 2/20/01[DATE] that 27 pigs[SP] found with vesicles in an abattoir near Brentwood, Essex[LOC] have FMD[DIS].<br />Document 3, sentence s31<br />Almost 2000 cattle[SP] are waiting to be slaughtered on 02/28/2001[DATE] since the resurgence of FMD[DIS] in Northumberland[LOC].<br />EVENT TUPLE GENERATION<br />e11 = [27 pigs, FMD, ?, Anglesey, Wales, “suggest”]<br />e12 = [?,?, 02/18/01, ?, “report”]<br />e21 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, “confirm”]<br />e31 = [2000 cattle, FMD, 2/28/01, Northumberland, “slaughter”]<br />EVENT TUPLE CLASSIFICATION<br />Classes: Susceptible, Infected, Recovered<br />EVENT TUPLE AGGREGATION<br />E1 = [27 pigs, FMD, 02/18/01, Anglesey, Wales, Susceptible]<br />E2 = [27 pigs, FMD, 2/20/01, Brentwood, Essex, Infected]<br />E3 = [2000 cattle, FMD, 2/28/01, Northumberland, Recovered]<br />
  52. 52. Spread of the foot-and-mouth disease outbreak in the UK, 2001<br />118 ProMED-Mail reports<br />yellow - susceptible<br />red - infected<br />green - recovered<br />
  53. 53. Summary<br />The accuracy of event recognition depends on the accuracy of each individual entity extractor<br />Event aggregation and deduplication require more comprehensive heuristics and additional knowledge, for example co-reference resolution<br />BioCaster<br />950 disease-location pairs per month<br />reported results - 887/950 correct disease-location pairs and 0.934 precision<br />MedISys/PULS<br />100 English-language documents with 156 events<br />reported results - 0.88 precision<br />
  54. 54. Conclusions, Contributions and Future Work<br />Summary:<br /> 1. Disease-related Document Classification<br /> 2. Ontology-based Entity Extraction<br /> 3. Entity Extraction using Sequence Labeling<br /> 4. Event Recognition and Classification<br />
  55. 55. Conclusions<br />Disease-related Document Classification<br />supervised framework<br />feature representations and classification algorithms<br />Ontology-based Domain-specific Entity Extraction<br />semantic relationship extraction approach<br />sequence labeling using syntactic patterns<br />Event Recognition and Classification<br />novel sentence-based approach<br />
  56. 56. Contributions<br />Paper “Computational Knowledge and Information Management in Veterinary Epidemiology”<br />IEEE Intelligence and Security Informatics Conference (ISI'10), 23-26 May 2010, Vancouver, BC, Canada<br />Paper “Animal Disease Event Recognition and Classification”<br />First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx'10), WWW Conference, 26-30 April 2010, Raleigh, NC, USA<br />Paper “Boosting Biomedical Entity Extraction by Using Syntactic Patterns for Semantic Relation Discovery” (to appear)<br />2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI'10), August 31 - September 3, York University, Toronto, Canada<br />Poster “Named Entity Recognition and Tagging in the Domain of Epizootics”<br />Women in Machine Learning Workshop (WiML'09), 6-7 Dec 2009, Vancouver, Canada<br />ACM Poster Presentation Competition “Automated Event Extraction and Named Entity Recognition in the Domain of Veterinary Medicine”<br />2010 Grace Hopper Celebration of Women in Computing (GHC'10), September 28 - October 1, Atlanta, Georgia, USA<br />
  57. 57. Future Work<br />Domain-specific Entity Extraction<br />multilingual ontology construction using Wikipedia.<br />Automated Ontology Construction<br />generalize for other named entities<br />Event Recognition and Classification<br />deeper syntactic analysis<br />co-reference resolution<br />
  58. 58. Acknowledgments<br />Faculty: <br />Dr. William H. Hsu<br />Dr. Doina Caragea<br />Dr. Gurdip Singh<br />KDD Lab alumni: <br />Tim Weninger (crawler deployment) and Jing Xia (rule-based event extraction)<br />KDD Lab assistants:<br />Information Extraction Team: John Drouhard, Landon Fowles, Swathi Bujuru<br />Spatial Data Mining Team: Wesam Elshamy, Andrew Berggren<br />Topic Detection & Tracking Team: Danny Jones, Srinivas Reddy<br />Fulbright Program supported by the US Department of State's Bureau of Educational and Cultural Affairs<br />
  59. 59. Thank you!<br />Svitlana Volkova, svitlana.volkova@gmail.com<br />http://people.cis.ksu.edu/~svitlana<br />
