Experiences on integrating explicit knowledge on information access tools in the medical domain

Talk at the Research Institute in Information and Language Processing


  1. Experiences on integrating explicit knowledge on information access tools in the medical domain. Manuel de la Villa, Department of Information Technologies, University of Huelva. Diagram labels: extractive summarization, computer-aided summarization, post-retrieval clustering, query user-defined expansion.
  2. Index: Brief CV; Why a research stay? In Wolverhampton?; Teaching; Integrating explicit knowledge on information access tools; Knowledge sources (UMLS & Freebase); Automatic Text Summarization; Information Retrieval. Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  3. Brief CV
  4. Teaching experience. Software Engineering: process and methodologies, metrics, requirements analysis, design, … Software Engineering Lab: UML, NetBeans, Subversion, Java, JUnit, persistence, … Multimedia applications development: Adobe Director, Flash, Photoshop, Premiere; Sony Sound Forge, Audacity.
  5. Knowledge integration
  6. Specific Domain Knowledge source. UMLS (I). A saturation of different terminologies: ICD-10, LOINC, SNOMED-CT, UK Clinical Terms, MeSH, DSM-IV, Gene Ontology, RxNorm, … UMLS: a homogeneous group of terminologies. UMLS aims to overcome a significant barrier: the variety of ways the same concepts are expressed in different machine-readable sources.
  7. Specific Domain Knowledge source. UMLS (II). The NLM Unified Medical Language System (UMLS) project aims to develop tools that help researchers with the representation, retrieval and integration of biomedical knowledge. It provides the UMLS Knowledge Sources and software tools. Three main components. SPECIALIST Lexicon: a compilation of lexical elements (>200,000) with grammatical information and linguistic variants. Example, the two entries for "Anaesthetic": noun entry {base=anesthetic spelling_variant=anaesthetic entry=E0330018 cat=noun variants=reg variants=uncount}; adjective entry {base=anesthetic spelling_variant=anaesthetic entry=E0330019 cat=adj variants=inv position=attrib(3) position=pred stative}.
  8. Specific Domain Knowledge source. UMLS (III). Metathesaurus: a very large, multi-purpose, multi-lingual vocabulary database that compiles more than 100 source vocabularies, https://uts.nlm.nih.gov/metathesaurus.html. Every term (>5M) is associated with a concept (>1.5M), and terms are related to each other (e.g., as synonyms) through 16M relations. Each concept is assigned to one or more of the 135 existing semantic types. In short: different terms… for the same concept… included in a semantic type.
  9. Specific Domain Knowledge source. UMLS (IV). https://uts.nlm.nih.gov/semanticnetwork.html UMLS Semantic Network: an ontology with 135 semantic types and 54 types of relationships between them.
  10. General Domain Knowledge Source: Freebase (I). Freebase is a large public database that collects three kinds of information (data, texts and media) that reference entities or topics (≈ 12 million). An entity is a single, unique person, place or thing: a single concept or real-world thing. A topic, which could also be called an entity, resource, element or thing, is the fundamental unit in Freebase (/common/topic). Each topic has a GUID, or globally unique ID: http://www.freebase.com/view/en/barack_obama and http://www.freebase.com/guid/9202a8c04000641f800000000029c277
  11. General Domain Knowledge Source: Freebase (II). Freebase connects entities together as a graph: it defines its data structure as a set of nodes and a set of links that establish relationships between the nodes. Most topics are associated with one or more types (such as people, places, books, films, etc.) and may have additional properties, like "date of birth" for a person or latitude and longitude for a location. These types, properties and related concepts are called the Schema.
  12. General Domain Knowledge Source: Freebase (III). The Schema. The Schema (the way Freebase's data is laid out) is expressed through Types and Properties. Types are grouped together in Domains.
  16. General Domain Knowledge Source: Freebase (IV). The Schema: Medicine
  17. General Domain Knowledge Source: Freebase (V). How can we use it… As a reference or information source. Create interesting Views and Visualizations and share them with others. Embed Freebase data in your website. Use our API or Acre, our hosted app development platform, to build apps that use Freebase data. Download our Data dumps. Use Freebase's RDF for Semantic Web applications.
  18. General Domain Knowledge Source: Freebase (IV). The Freebase approach
  19. MQL (Metaweb Query Language). Example queries: http://api.freebase.com/api/service/mqlread?query={"query":{"type":"/music/artist","name":"U2","album":[]}} and http://api.freebase.com/api/service/mqlread?query={"query":[{"type":"/medicine/disease","name":null,"symptoms":{"name":"Nausea"}}]}. Query Editor.
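
The second query on the MQL slide (diseases that list "Nausea" as a symptom) can be assembled programmatically. The following is a minimal Python sketch that only builds and decodes the request URL; it does not contact the service, which may no longer be reachable. The endpoint is the one shown on the slide, and the helper name `build_mqlread_url` is an assumption made for illustration.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoint as it appears on the slide; nothing here sends a request to it.
MQLREAD_ENDPOINT = "http://api.freebase.com/api/service/mqlread"


def build_mqlread_url(query):
    """Wrap an MQL query in the {"query": ...} envelope and URL-encode it.

    Hypothetical helper, not part of any Freebase client library.
    """
    envelope = json.dumps({"query": query}, separators=(",", ":"))
    return MQLREAD_ENDPOINT + "?" + urlencode({"query": envelope})


# Diseases whose symptoms include "Nausea"; None is serialized as JSON null,
# which in MQL asks the service to fill in the value.
disease_query = [{
    "type": "/medicine/disease",
    "name": None,
    "symptoms": {"name": "Nausea"},
}]

url = build_mqlread_url(disease_query)
print(url)

# Decode the URL again to check that the envelope survives the round trip.
decoded = json.loads(parse_qs(urlparse(url).query)["query"][0])
print(decoded)
```

Building the envelope with `json.dumps` rather than by string concatenation keeps quoting and escaping correct when names contain spaces or quotes.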
  20. Knowledge integration
  21. Experiences in Automatic summarization (I). We developed a proposal with these main characteristics: sentence extraction; document representation as a graph; centered on biomedical concepts; using concept frequency to measure relevance.
  22. Experiences in Automatic summarization (II). Phase I, graph generation: identification of sentences and UMLS concepts. Phase II, similarity algorithm: concepts shared between sentences form edges, and an edge means a "recommendation". Phase III, ranking algorithm: the weight associated with each edge depends on similarity. Phase IV, summary building: the top-ranked sentences are selected.
  27. Automatic Summarization. Evaluation. Evaluation with ROUGE (based on n-grams) against generic summarizers. Our method obtains good results, especially with small n-grams. de la Villa, M., Maña, M. “Propuesta y evaluación de un método de generación de resúmenes extractivo basado en conceptos en el ámbito biomédico”. XXV edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2009 (SEPLN´09), San Sebastián (Sept-2009).
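
The four phases listed in the summarization slides above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: the concept dictionary `TOY_CONCEPTS` is a made-up stand-in for UMLS concept identification, Jaccard overlap is assumed as the similarity measure, and a weighted PageRank-style iteration is assumed as the ranking algorithm, since the slides name neither.

```python
from itertools import combinations

# Hypothetical stand-in for UMLS concept identification: surface terms are
# mapped to made-up concept labels (these are NOT real UMLS identifiers).
TOY_CONCEPTS = {
    "melanoma": "C_MELANOMA",
    "skin cancer": "C_SKIN_CANCER",
    "biopsy": "C_BIOPSY",
    "sunlight": "C_SUNLIGHT",
}


def concepts_in(sentence):
    """Phase I: return the set of toy concepts mentioned in a sentence."""
    text = sentence.lower()
    return {cid for term, cid in TOY_CONCEPTS.items() if term in text}


def build_graph(sentences):
    """Phases I-II: sentences are nodes; edge weight is the Jaccard overlap
    of their concept sets (an assumed similarity measure)."""
    sets = [concepts_in(s) for s in sentences]
    weights = {}
    for i, j in combinations(range(len(sentences)), 2):
        union = sets[i] | sets[j]
        if union:
            w = len(sets[i] & sets[j]) / len(union)
            if w > 0:
                weights[(i, j)] = weights[(j, i)] = w
    return weights


def rank(n, weights, damping=0.85, iterations=50):
    """Phase III: weighted PageRank-style scores (an assumed ranking method)."""
    out_sum = [sum(w for (a, _), w in weights.items() if a == i) for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                scores[j] * w / out_sum[j]
                for (j, k), w in weights.items() if k == i
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Phase IV: keep the k top-ranked sentences, in document order."""
    if not sentences:
        return []
    scores = rank(len(sentences), build_graph(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "Melanoma is an aggressive skin cancer.",
    "A biopsy confirms melanoma.",
    "Exposure to sunlight raises skin cancer risk.",
    "The clinic opens at nine.",
]
print(summarize(sentences, k=1))
```

In this toy input the first sentence shares a concept with two others, so it collects the most "recommendations", while the last sentence mentions no concept and stays isolated in the graph.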
  28. Knowledge integration
  29. Experiences in Computer-aided summarization (I). Computer-aided summarization (CAS) combines automatic and human summarization. The CAS system suggests an initial summary by selecting relevant sentences. The human can change the sentence selection and manually edit the summary. Purpose: construction of a Gold-Standard building assistant. Novelty: considering the distribution of biomedical concepts (Reeve et al., 2006).
  30. Experiences in Computer-aided summarization (and II). Experience in the design and construction of a Gold-Standard building assistant (or computer-aided summarization tool), considering the distribution of biomedical concepts (Reeve et al., 2006). Client-server app; centralized repository; supports PDF and XML.
  31. Knowledge integration
  32. Experiences in Information Retrieval and Post-retrieval clustering. Experience in the design and construction of an information retrieval system with post-retrieval clustering, an orientation to biomedical documents, and support for mobile devices.
  33. Search and Information Retrieval. Our implementation. Document sources: BioMed Central (web crawling in progress). Text processing: lowercasing, stemming, stop-words, … Lucene for indexing…
  34. Search and Information Retrieval. Our implementation (and II). … and Lucene for searching.
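
The text-processing and indexing steps named on the two implementation slides above (lowercasing, stop-word removal, stemming, indexing, searching) can be illustrated with a small Python sketch. It does not use Lucene, which is a Java library; the stop-word list, the crude suffix-stripping `stem` function and the AND-only `search` are simplifying assumptions chosen to keep the example self-contained.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def stem(token):
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop-words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Ligaments of the knee",
    2: "Skin cancer screening",
    3: "Cancers of the skin and ligament injuries",
}
index = build_index(docs)
print(search(index, "skin cancer"))
print(search(index, "ligaments"))
```

Running the same `analyze` function over documents and queries is the important design point: it is what lets the query "ligaments" match a document that only contains "ligament".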
  35. Clustering. Our implementation. Weka for clustering. Post-retrieval clustering groups the set of documents retrieved for a query into different subsets according to their similarity.
  36. Clustering. Why Simple-K-Means? Clustering algorithm: Simple-K-Means vs Expectation Maximization (EM). Time taken to perform the grouping, in seconds, per query (number of documents): Ligaments (10): Simple-K-Means 1, EM 2; Cancer Skin (25): Simple-K-Means 4, EM 12; Cancer (46): Simple-K-Means 5, EM 26; Disease (62): Simple-K-Means 8, EM 57. K? It depends on the number of documents retrieved.
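
For readers unfamiliar with the algorithm compared on the clustering slides, here is a minimal Python sketch of plain k-means (Lloyd's algorithm). It is not Weka's SimpleKMeans, which is a Java implementation with more options; the two-dimensional points stand in for document vectors, and `k` is left as a parameter because the slides only say that it depends on the number of documents retrieved.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids


# Toy "document vectors": two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = k_means(points, k=2)
print(assignment)
```

Each iteration costs one pass over the points per centroid, which is consistent with the short grouping times the slide reports for Simple-K-Means.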
  37. Visualization on Mobile Devices. Our interface. Example query: Cancer skin.
  38. Knowledge integration
  39. Experiences in Information Retrieval and Query user-defined expansion (I). Users have problems defining their information needs in a query string (Jansen, Spink and Koshman, 2007). Queries contain fewer than three terms (75.2%), and the majority of queries contained one (18.5%) or two (32.2%) terms. Methods to improve (expand) the query: relevance feedback; local analysis or global analysis; Natural Language Processing resources. Experiments with users show that they prefer to keep control over how the query is reformulated (Belkin et al., 2001).
  40. Experiences in Information Retrieval and Query user-defined expansion (II). Experience in using ontologies to assist the definition of the search string… previously.
  41. Experiences in Information Retrieval and Query user-defined expansion (II). How does it work? Pre-retrieval construction of the graph.
  42. Research: Information Retrieval (and III). … or using ontologies to build an enriched concept graph that assists the definition of the search string. http://www.uhu.es/manuel.villa/viewmed/ de la Villa, M., Garcia, S., Maña, M. “¿De verdad sabes lo que quieres buscar? Expansión guiada visualmente de la cadena de búsqueda usando ontologías y grafos de conceptos”. XXVII edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2011 (SEPLN´11), Huelva (Sept-2011).
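
The idea in the query-expansion slides above, where a concept graph proposes related terms and the user decides which ones enter the search string, can be sketched as follows. This is an illustrative Python sketch, not the ViewMed tool: the graph `CONCEPT_GRAPH` is a hypothetical hand-written stand-in for an ontology, and joining terms with `OR` is an assumed query syntax.

```python
# Hypothetical toy concept graph standing in for an ontology:
# each term maps to the set of terms it is related to.
CONCEPT_GRAPH = {
    "melanoma": {"skin cancer", "nevus"},
    "skin cancer": {"melanoma", "carcinoma"},
}


def suggest(query_terms, graph):
    """Pre-retrieval step: propose graph neighbours of the query terms,
    leaving out terms the user has already typed."""
    present = set(query_terms)
    candidates = set()
    for term in query_terms:
        candidates |= graph.get(term, set())
    return sorted(candidates - present)


def expand(query_terms, accepted):
    """Build the expanded query from the original terms plus only those
    suggestions the user explicitly accepted."""
    parts = list(query_terms) + [t for t in accepted if t not in query_terms]
    # Quote multi-word terms so they are treated as phrases.
    return " OR ".join(f'"{p}"' if " " in p else p for p in parts)


query = ["melanoma"]
suggestions = suggest(query, CONCEPT_GRAPH)
print(suggestions)

# The user keeps one suggestion and rejects the other.
print(expand(query, ["skin cancer"]))
```

Keeping `suggest` and `expand` separate mirrors the point made on slide 39: the system proposes, but the user retains control over how the query is reformulated.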
  43. Known tools. Expectations. Known tools: UMLS (Metathesaurus, Semantic Network; tools: MetaMap, MMTx API, SemRep, UTS Web Services, …); Freebase (MQL, Metaweb Query Language); newbie with UIMA & GATE. Expectations: I offer my collaboration if you're interested in using any of these resources; I'm open to collaborating on whatever task you consider related; and I hope to receive some guidelines to improve the summarization method. Any questions?