Presentation at IDEAL 2008

Presentation of our translation + MetaMap paper at IDEAL 2008.

  1. Building a Spanish MMTx by Using Automatic Translation and Biomedical Ontologies. Francisco Carrero (1, 2); José Carlos Cortizo (1, 2); José Mª Gómez (3). Affiliations: (1) Wipley, Social Gaming Platform; (2) Universidad Europea de Madrid; (3) Optenet.
  2. Outline: the MIRCAT project; the challenge; English MetaMap, a big effort; approaching a Spanish MetaMap; experiments; discussion of the results and future work. Francisco Carrero Garcia
  3. The MIRCAT Project: the interface.
  4. The MIRCAT Project: system architecture.
  5. The Challenge: our goal (diagram linking English docs, Spanish docs, and the medical record).
  6. The Challenge: the problem. We can extract UMLS concepts from English texts using MetaMap, but there is no Spanish version of MetaMap. Is it difficult to build a tool like MetaMap?
  7. English MetaMap: a big effort (about 3 years!).
  8. Approaching a Spanish MetaMap: the two main approaches considered.
  9. Approaching a Spanish MetaMap: our approach, translation and reuse (one step in the diagram is marked as optional).
  10. Experimental Design: text collections. MedLine Plus medical news, an excellent online resource: 2,000 news items, some in English and some in Spanish, of which 600 are available in both languages.
  11. Experiments: experimental design. MetaMap extracts concepts, allowing multiple representations. A uses compound concepts; B uses simple concepts. Variant 1 resolves ambiguity by adding all the concepts; variant 2 ignores ambiguity by choosing the first possibility. This gives 4 representations: A1, A2, B1, B2.
  12. Experiments: filtering. Data representations with many features do not usually perform well in text tasks: many classifiers lose prediction accuracy when faced with many irrelevant, redundant, or correlated features (the "curse of dimensionality"). We apply Zipf's Law to filter the attributes.
  13. Experiments: results. Number of concepts for each representation.
  14. Experiments: results. Average similarities.
  15. Experiments: results. Latest experiments (not in the IDEAL paper).
  16. Discussion of the Results: translation. The worst results (similarity) are obtained with the most complex (closest to human) representation, A1. B1 is less complex and produces the best results, so our model seems better suited to a plain bag-of-concepts representation, similar to the bag-of-words representation widely used in text processing tasks.
  17. Discussion of the Results: classification. All results are comparable to classification on the original English texts, and in some cases even better. The best result is obtained with A2+Zipf (+7.8% in AUC). UNMKD representations never yield worse classification than English.
  18. Conclusions and Future Work. The "easy way" to build a Spanish MetaMap is promising. Google Translate seems a good tool for adapting English resources to other languages (like Spanish). We should try other translation tools. We are working on applying this approach to other text tasks (like information retrieval and filtering).
  19. Ending: thank you very much for your attention.
  20. Any questions?
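Slides 6 and 9 describe the core idea: rather than building a Spanish MetaMap from scratch, translate Spanish text into English and reuse the English tool. A minimal sketch of that composition, assuming a translator and a concept extractor can each be treated as a plain function. `spanish_concepts`, `toy_translate`, `toy_extract`, and the placeholder identifier `C_HEADACHE` are hypothetical stand-ins, not the MIRCAT code or MetaMap's real interface.

```python
from typing import Callable, List

# Hypothetical signatures: a translator maps Spanish text to English text, and
# a concept extractor maps English text to a list of concept identifiers.
Translator = Callable[[str], str]
ConceptExtractor = Callable[[str], List[str]]


def spanish_concepts(text_es: str,
                     translate: Translator,
                     extract: ConceptExtractor) -> List[str]:
    """Translate a Spanish document into English, then run an
    English-only concept extractor on the translation."""
    text_en = translate(text_es)
    return extract(text_en)


# Toy stand-ins for illustration only; "C_HEADACHE" is a placeholder, not a real CUI.
TOY_DICTIONARY = {"dolor de cabeza": "headache"}
TOY_CONCEPTS = {"headache": "C_HEADACHE"}


def toy_translate(text: str) -> str:
    # Naive phrase substitution standing in for a real MT system.
    for es, en in TOY_DICTIONARY.items():
        text = text.replace(es, en)
    return text


def toy_extract(text: str) -> List[str]:
    # Substring lookup standing in for MetaMap's concept mapping.
    return [cui for term, cui in TOY_CONCEPTS.items() if term in text]


print(spanish_concepts("tengo dolor de cabeza", toy_translate, toy_extract))
```

Passing the two stages in as functions keeps the pipeline independent of any particular translation service, which matches the slides' point that other translation tools should be tried.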
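Slide 11 defines four representations by crossing two choices: compound (A) vs. simple (B) concepts, and keeping all ambiguous candidates (1) vs. only the first (2). A minimal sketch of how such bags of concepts could be built, assuming each phrase comes with an ordered list of candidates and each candidate carries both a compound concept and the simpler concepts it splits into. The `Candidate` type, `represent`, and the toy data are hypothetical; MetaMap's actual output format is richer than this.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical model of one candidate mapping for a phrase: a compound
    concept plus the simpler concepts it can be split into."""
    compound: str
    simple: List[str]


def represent(phrases: List[List[Candidate]], compound: bool,
              all_candidates: bool) -> Counter:
    """Build a bag of concepts.
    compound=True -> variant A (compound); False -> variant B (simple).
    all_candidates=True -> variant 1 (keep every candidate of an ambiguous
    phrase); False -> variant 2 (keep only the first candidate)."""
    bag = Counter()
    for candidates in phrases:
        chosen = candidates if all_candidates else candidates[:1]
        for cand in chosen:
            if compound:
                bag[cand.compound] += 1
            else:
                bag.update(cand.simple)  # count each simple concept
    return bag


# Two toy phrases; the second is ambiguous (two candidates).
doc = [
    [Candidate("heart attack", ["heart", "attack"])],
    [Candidate("cold temperature", ["cold", "temperature"]),
     Candidate("common cold", ["common", "cold"])],
]

representations = {
    "A1": represent(doc, compound=True, all_candidates=True),
    "A2": represent(doc, compound=True, all_candidates=False),
    "B1": represent(doc, compound=False, all_candidates=True),
    "B2": represent(doc, compound=False, all_candidates=False),
}
for name, bag in representations.items():
    print(name, dict(bag))
```

In this toy example variant 1 produces more distinct concepts than variant 2, which is one way the concept counts per representation (slide 13) can differ.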
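Slide 12 says only that Zipf's Law is used to filter the attributes. One common reading is the classic pair of frequency cutoffs: under Zipf's Law most terms are very rare and a few are nearly everywhere, and both tails carry little discriminative information. A minimal sketch under that assumption, using document frequency; `zipf_filter`, `apply_filter`, `min_docs`, and `max_fraction` are hypothetical names and illustrative cutoffs, not the values used in the paper.

```python
from collections import Counter
from typing import List, Set


def zipf_filter(docs: List[Counter], min_docs: int = 2,
                max_fraction: float = 0.5) -> Set[str]:
    """Return the attributes to keep, cutting both tails of the
    document-frequency distribution (illustrative cutoffs)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each attribute once per document
    n_docs = len(docs)
    return {term for term, freq in doc_freq.items()
            if freq >= min_docs and freq / n_docs <= max_fraction}


def apply_filter(docs: List[Counter], keep: Set[str]) -> List[Counter]:
    # Drop every attribute that did not survive the cutoffs.
    return [Counter({t: c for t, c in doc.items() if t in keep}) for doc in docs]


docs = [
    Counter({"patient": 1, "fever": 2, "malaria": 1}),
    Counter({"patient": 1, "fever": 1, "cough": 1}),
    Counter({"patient": 2, "cough": 1}),
    Counter({"patient": 1, "flu": 1}),
]
keep = zipf_filter(docs)
print(sorted(keep))  # ['cough', 'fever']: "patient" is too common, "malaria"/"flu" too rare
print(apply_filter(docs, keep))
```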
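Slide 14 reports average similarities, presumably between the concepts extracted from the English version of each news item and those extracted from the translated Spanish version of the same item. The slides do not name the measure; a minimal sketch assuming cosine similarity over bag-of-concepts vectors, averaged over aligned document pairs. `cosine`, `average_similarity`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-concepts vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag shares nothing with anything
    return dot / (norm_a * norm_b)


def average_similarity(english: List[Counter], translated: List[Counter]) -> float:
    """Mean cosine similarity over aligned pairs of documents."""
    if len(english) != len(translated) or not english:
        raise ValueError("expected two non-empty, aligned lists of documents")
    return sum(cosine(e, t) for e, t in zip(english, translated)) / len(english)


english = [Counter({"fever": 1, "cough": 1}), Counter({"influenza": 2})]
translated = [Counter({"fever": 1, "cough": 1}), Counter({"common cold": 1})]
# First pair matches exactly, second pair shares nothing, so the mean is about 0.5.
print(average_similarity(english, translated))
```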
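Slide 17 compares classification quality using AUC (area under the ROC curve). For reference, a minimal pure-Python sketch of the metric in its rank form: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. The function name `auc` is hypothetical, and the slides do not say which classifiers or AUC implementation were used.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    ties counting as half. O(P*N), fine for small illustrative inputs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly -> 0.75
```

Because AUC depends only on the ranking of scores, it allows the Spanish-derived representations to be compared against the English baseline without fixing a decision threshold.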