Porting the QALL-ME framework to Romanian


Invited talk at Processing ROmanian in Multilingual, Interoperational and Scalable Environments (PROMISE 2010) on how to port the QALL-ME framework to a new language


  1. 1. Porting the QALL-ME framework to Romanian Constantin Orăsan, Research Group in Computational Linguistics, Research Institute in Information and Language Processing, University of Wolverhampton http://www.wlv.ac.uk/~in6093/ 29th March 2010
  2. 2. 1 Introduction 2 The QALL-ME project 3 Multilingual information access in QALL-ME 4 Conclusions
  4. 4. Need to access information • as the Internet develops, more and more information becomes available • this information is in many languages • fields from computational linguistics such as automatic summarisation, question answering, text mining, etc. can help people deal with information
  6. 6. Question answering (QA) • Question answering aims at identifying the answer to a question in a large collection of documents • the information provided by QA is more focused than that returned by information retrieval • the output can be the exact answer or a text snippet which contains the answer • the field took off with the introduction of the QA track at TREC, and cross-lingual QA took off with CLEF
  9. 9. Types of QA systems • open-domain QA systems: can answer any question from any collection + can potentially answer any question - very low accuracy (especially in cross-lingual settings) • canned QA systems: rely on a very large repository of questions for which the answer is known + very little processing necessary - limited to the answers in the database • closed-domain QA systems: are built for very specific domains and exploit expert knowledge in them + very high accuracy - can require extensive language processing and are limited to one domain
  11. 11. Purpose of the presentation • briefly present the QALL-ME project • show how it was adapted to answer questions in Romanian about movies
  13. 13. The QALL-ME project • QALL-ME = Question Answering Learning technologies in a multiLingual and Multimodal Environment • EU-funded project, part of FP6 • 7 partners: • FBK-irst, Italy • University of Wolverhampton, UK • University of Alicante, Spain • DFKI, Germany • Comdata, Italy • UbiEST, Italy • WayCom, Italy • Web page: http://qallme.fbk.eu
  14. 14. The QALL-ME project • aimed at establishing a shared infrastructure for multilingual and multimodal QA in the domain of tourism • In the QALL-ME system • users ask natural language questions in several languages (in both text and speech) using a variety of input devices (e.g. mobile phones), and • the system returns a list of specific answers formatted in the most appropriate modality, ranging from short texts to maps, videos and pictures.
  15. 15. [Figure: QALL-ME architecture diagram. Labels: Local Information Sources, Semantic representation, Service Provider, QALL-ME central QA planner, English Answer Extractor, German Answer Extractor, Spanish Answer Extractor, Italian Answer Extractor, Question Type ontology, Answer Type ontology, Speech Recognizers, Dialog Models]
  16. 16. Main outputs of the project • an ontology for the domain of tourism • entailment based QA framework • the QALL-ME benchmark • an entailment framework (all accessible from the project’s web page: http://qallme.fbk.eu)
  17. 17. The ontology • A domain-specific ontology for the tourism domain was developed and shared among all the partners. • The ontology serves as: • a bridge between different languages • a communication language between different components of the system • The ontology was linked to domain-independent ontologies such as MultiWordNet and SUMO • For more information see (Ou et al., 2008)
  18. 18. Design of the ontology • Analysis of data from content providers • Analysis of user requirements • Inspired by similar ontologies: • Harmonise and eTourism: focus on static information (e.g. accommodation and events/activities) • Similar to eTourism in that it is written in OWL rather than RDFS • but wider coverage • Introspection
  19. 19. The ontology • Main classes: Country, Destination, Site (i.e. Accommodation, Attraction, Gastro, and Infrastructure), Transportation, EventContent and Event • Element classes: Facility, Room, PersonOrganization, Language, and Currency • Attribute classes: Contact, Location, Period and Price. • Element and attribute classes cannot exist independently and have to be attached to other main or element classes
  20. 20. [Figure: fragment of the ontology for the cinema domain. Classes: Site, Cinema, CinemaRoom, SiteFacility, RoomFacility, Event, MovieShow, EventContent, Movie, Director, Producer, Star, Writer, Price, TicketPrice, Currency, Contact, GPSCoordinate, PostalAddress, DirectionLocation, Period, TimePeriod, DatePeriod, DateTimePeriod. Object properties: subClassOf, isInSite, hasGPSCoordinate, hasPostalAddress, hasContact, hasSiteFacility, hasRoom, hasRoomFacility, hasPrice, hasCurrency, hasPeriod, hasTimePeriod, hasDatePeriod, hasEventContent, hasDirector, hasProducer, hasStar, hasWriter. Datatype properties: name, description, priceType, priceValue, startTime, endTime, startDate, endDate, certificate, synposis, genre]
  21. 21. The ontology • Encoded using OWL DL, since it has more expressive power than OWL Lite and has more efficient reasoning support than OWL Full • Used Protege-OWL as the editor and RacerPro7 as the reasoner • The ontology contains • 122 classes (concepts), • 55 datatype properties and • 52 object properties which indicate the relationships among the 122 classes. • 15 top-level classes. • The class hierarchy has a maximum depth of 4.
  22. 22. The QALL-ME framework • is an architecture skeleton for multilingual QA systems for closed domains • designed in such a way that it allows fast development of closed domain QA systems • freely available from http://qallme.sourceforge.net/ • is based on a Service Oriented Architecture (SOA) which is realised using web services • relies on textual entailment recognisers
  23. 23. Web services 1 Context providers: are used to anchor questions in space and time 2 Annotators: three types of annotators are currently available: • named entity annotators which identify names of cinemas, movies, persons, etc. • term annotators which identify hotel facilities, movie genres and other domain-specific terminology • temporal annotators which recognise and normalise temporal expressions in user questions 3 Entailment engine: determines whether a user question entails a retrieval procedure 4 Query generator: relies on the entailment engine to generate a query that extracts the answer 5 Answer pool: retrieves the answers from a database.
  24. 24. Context providers • are used to anchor a question in space and time • return the current position and time • used by the presentation module when maps are displayed • used by temporal processing to normalise temporal expressions • determine which services are used in a cross-lingual scenario • can be static or determined from a mobile phone
  25. 25. Named entity and term annotators • named entity recogniser = identifies names of hotels, movies, persons, etc. • term annotator = identifies domain-specific terms such as hotel facilities, movie genres, etc. • the entities and terms are known, so the task reduces to a database lookup • Gazetteers are the main source for determining the entities • The annotation module needs to determine the canonical form of an entity • matching uses character-based similarity, a modified TF*IDF and a greedy algorithm • does not allow overlapping annotations, and there are few ambiguities
  26. 26. Named entity and term annotators • Annotates both standard and non-standard entities: cinema, movie, location, genre, certificate • Needs to deal with noisy input: • misspelt words/input from ASR engines/SMS input, e.g. becaming Jane, becoming Jade • free word order (Will Smith / Smith, Will) • equivalent strings (saw III / three / 3; Smith, Will / Smith, W.) • Needs to deal with questions in mixed languages • Needs to deal with ambiguous entities
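The lookup described on the two slides above can be sketched as a fuzzy match against a gazetteer. This is only an illustration under stated assumptions: the gazetteer entries, the `canonical_form` helper and the 0.8 cutoff are hypothetical, and Python's `difflib` ratio stands in for the character-based similarity and modified TF*IDF that the slides mention. Numeric equivalents such as "saw 3" are not handled here.

```python
import difflib
import re

# Hypothetical gazetteer: canonical form -> entity type.
GAZETTEER = {
    "Becoming Jane": "MOVIE",
    "Saw III": "MOVIE",
    "Will Smith": "PERSON",
}


def normalise(text):
    """Lowercase, drop punctuation and sort the tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(sorted(tokens))


def canonical_form(mention, cutoff=0.8):
    """Return (canonical name, entity type) for the closest entry, or None."""
    keys = {normalise(name): name for name in GAZETTEER}
    match = difflib.get_close_matches(
        normalise(mention), list(keys), n=1, cutoff=cutoff
    )
    if not match:
        return None
    name = keys[match[0]]
    return name, GAZETTEER[name]
```

With these entries, `canonical_form("becaming Jane")` and `canonical_form("Smith, Will")` resolve to the canonical entries "Becoming Jane" and "Will Smith", while an unknown title yields `None`.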
  27. 27. Temporal annotator • questions from the domain of tourism contain a large number of temporal expressions • we use a simplified version of the tagger implemented by Puşcaşu (2004) • the simplification was done to reduce the processing time (Varga, Puşcaşu, and Orăsan, 2009) • identifies both self-contained temporal expressions (TEs) and indexical/under-specified TEs • uses the TIMEX2 standard • the output is used by the TIMEX2SPARQL service to restrict the extracted answers
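Normalising an indexical expression means resolving it against a reference time, such as the one returned by a context provider. The sketch below shows the idea for three expressions. The rule table, the 18:00 start for "tonight" and the `normalise_expression` name are assumptions made for illustration; the sketch returns plain datetime intervals and does not reproduce TIMEX2 attribute values or the tagger cited on the slide.

```python
from datetime import datetime, time, timedelta

# Hypothetical rules: expression -> (day offset, start of interval, end of interval).
RULES = {
    "today": (0, time(0, 0), time(23, 59)),
    "tonight": (0, time(18, 0), time(23, 59)),
    "tomorrow": (1, time(0, 0), time(23, 59)),
}


def normalise_expression(expression, reference):
    """Resolve an indexical expression to a (start, end) interval.

    `reference` is the question time; unknown expressions return None.
    """
    rule = RULES.get(expression.lower())
    if rule is None:
        return None
    offset, start, end = rule
    day = (reference + timedelta(days=offset)).date()
    return datetime.combine(day, start), datetime.combine(day, end)
```

An interval of this kind is what a later step could use to restrict the answers, for example to shows starting within it.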
  28. 28. Entailment engine • closed-domain QA systems often transform a question into a Prolog fact or an SQL query • this solution often works only partially because of language variability • in QALL-ME this problem is solved using textual entailment • the entailment engine determines whether two questions entail the same meaning, so that they share the same retrieval procedure: • T is the input question • H is a textual pattern stored in a repository • textual patterns have SPARQL retrieval procedures • we calculate the similarity between the two sentences to determine whether an entailment relation holds between them
  29. 29. Query generation service • produces a SPARQL query that can be used to answer the question • has a list of question templates with their associated SPARQL queries • relies on the entailment engine to determine which of the question patterns entail the same meaning as the user question • fills in the slots of the question patterns
  31. 31. Example User question (T): What movie can I see tonight in Wolverhampton? → What movie can I see [TIMEX] in [DESTINATION]? List of patterns (H): • Who is the director of [MOVIE]? • Where can I see [MOVIE] [TIMEX]? • What movies are on in [DESTINATION] [TIMEX]? • What is the address of [CINEMA]? • ... Select the retrieval pattern associated with the question "What movies are on in Wolverhampton tonight?"
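The example above can be walked through in code: abstract the annotated question, pick the closest stored pattern, and fill the slots of its query. This is a rough sketch, not the QALL-ME entailment engine. It assumes token overlap (Jaccard) as the similarity measure and only considers patterns whose slots match those found in the question. The SPARQL templates use placeholder property names (`:showsMovie`, `:inDestination`, `:onDate` and so on) that are not taken from the QALL-ME ontology.

```python
import re

# Hypothetical repository: question pattern (H) -> SPARQL template.
PATTERNS = {
    "Who is the director of [MOVIE]?":
        'SELECT ?d WHERE { ?m :title "[MOVIE]" ; :directedBy ?d }',
    "Where can I see [MOVIE] [TIMEX]?":
        'SELECT ?c WHERE { ?s :showsMovie "[MOVIE]" ; :onDate "[TIMEX]" ; :atCinema ?c }',
    "What movies are on in [DESTINATION] [TIMEX]?":
        'SELECT ?t WHERE { ?s :inDestination "[DESTINATION]" ; :onDate "[TIMEX]" ; :showsMovie ?t }',
    "What is the address of [CINEMA]?":
        'SELECT ?a WHERE { ?c :cinemaName "[CINEMA]" ; :address ?a }',
}


def tokens(text):
    """Set of lowercased words and slot markers such as [timex]."""
    return {t.lower() for t in re.findall(r"\[[A-Z]+\]|\w+", text)}


def slots(text):
    """Set of slot labels, e.g. {'TIMEX', 'DESTINATION'}."""
    return set(re.findall(r"\[([A-Z]+)\]", text))


def abstract(question, annotations):
    """Replace each annotated surface string with its slot label."""
    for surface, (label, _value) in annotations.items():
        question = question.replace(surface, f"[{label}]")
    return question


def select_pattern(abstracted):
    """Pick the most similar pattern among those with the same slots."""
    candidates = [p for p in PATTERNS if slots(p) == slots(abstracted)]

    def score(pattern):
        a, b = tokens(abstracted), tokens(pattern)
        return len(a & b) / len(a | b)

    return max(candidates, key=score, default=None)


def generate_query(question, annotations):
    """Return (selected pattern, filled SPARQL query), or None if nothing fits."""
    pattern = select_pattern(abstract(question, annotations))
    if pattern is None:
        return None
    query = PATTERNS[pattern]
    for _surface, (label, value) in annotations.items():
        query = query.replace(f"[{label}]", value)
    return pattern, query
```

Here `annotations` maps each surface string to a label and a normalised value, for example `{"tonight": ("TIMEX", "2010-03-29"), "Wolverhampton": ("DESTINATION", "Wolverhampton")}`. The slot filter matters in this sketch: without it, plain token overlap would prefer "Where can I see [MOVIE] [TIMEX]?" for the example question, even though no movie was mentioned.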
  32. 32. Answer Pool service • takes the SPARQL query generated by the query generator and extracts the answer • SPARQL is a query language for accessing RDF graphs, developed by the W3C RDF Data Access Working Group • SPARQL provides interoperability between languages
  34. 34. Cross-lingual QA • the QALL-ME tourism prototype is designed to allow both monolingual and cross-lingual QA • relevant web services are activated depending on the source and target language • user scenario: a Romanian tourist in the UK who wants to find out more about the movies in Wolverhampton
  35. 35. Cross-lingual QA
  36. 36. Prototype for Romanian • we wanted to find out how long it takes to develop a demo for Romanian • components had to be adapted: • named entity and term annotators had to be trained on a different list of entities • a simple temporal annotator was implemented on the basis of the English one • the language-independent similarity-based entailment engine was used • the question patterns were translated into Romanian • the answer pool did not require any change • the whole process took under one week
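One way to picture why the port was quick is to separate language-specific resources from the shared retrieval procedures. The layout below is a hypothetical sketch: the dictionary structure, the query ids, the Romanian patterns and lexicon entries, and the placeholder SPARQL strings are all invented for illustration and are not taken from the QALL-ME code base.

```python
# Language-independent part: retrieval procedures keyed by an id.
# The SPARQL strings use placeholder property names.
QUERIES = {
    "movies_in_destination":
        'SELECT ?t WHERE { ?s :inDestination "[DESTINATION]" ; :onDate "[TIMEX]" ; :showsMovie ?t }',
    "director_of_movie":
        'SELECT ?d WHERE { ?m :title "[MOVIE]" ; :directedBy ?d }',
}

# Language-specific part: question patterns and a small temporal lexicon.
RESOURCES = {
    "en": {
        "patterns": {
            "What movies are on in [DESTINATION] [TIMEX]?": "movies_in_destination",
            "Who is the director of [MOVIE]?": "director_of_movie",
        },
        "temporal": {"tonight": "TONIGHT", "tomorrow": "TOMORROW"},
    },
    "ro": {
        "patterns": {
            "Ce filme rulează în [DESTINATION] [TIMEX]?": "movies_in_destination",
            "Cine este regizorul filmului [MOVIE]?": "director_of_movie",
        },
        "temporal": {"diseară": "TONIGHT", "mâine": "TOMORROW"},
    },
}


def retrieval_procedure(language, pattern):
    """Look up the shared query behind a language-specific question pattern."""
    query_id = RESOURCES[language]["patterns"][pattern]
    return QUERIES[query_id]
```

In this layout, adding a language means adding one entry to `RESOURCES`; `QUERIES` and whatever consumes the resulting query stay untouched, which mirrors the slide's point that the answer pool needed no change.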
  37. 37. Romanian demo http://qallme.wlv.ac.uk:8080/QALL-ME-web-demo/index.jsp
  39. 39. Conclusions • multilinguality is a very important issue for the QALL-ME project • the ontology constitutes the bridge between languages • the QALL-ME framework can be used to quickly develop prototypes for other languages
  40. 40. Thank you!
  41. 41. References
  42. 42. Ou, Shiyan, Viktor Pekar, Constantin Orăsan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May 28 – 30.
      Puşcaşu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26 – 28.
      Varga, Andrea, Georgiana Puşcaşu, and Constantin Orăsan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29 – 32, Cluj-Napoca, Romania, July 2 – 4.