Porting the QALL-ME framework to Romanian

                    Constantin Orăsan

           Research Group in Computational Linguistics
    Research Institute in Information and Language Processing
                   University of Wolverhampton
                http://www.wlv.ac.uk/~in6093/


                      29th March 2010
Structure of the presentation



1 Introduction


2 The QALL-ME project


3 Multilingual information access in QALL-ME


4 Conclusions
Need to access information




• as a result of the development of the Internet, more and more
  information is becoming available
• this information is in many languages
• fields of computational linguistics such as automatic
  summarisation, question answering and text mining can help
  people deal with this information
Question answering (QA)



• Question answering aims at identifying the answer to a
  question in a large collection of documents
• the information provided by QA is more focused than that
  returned by information retrieval
• the output can be the exact answer or a text snippet which
  contains the answer
• the field took off after the introduction of the QA track at
  TREC, and cross-lingual QA after its introduction at CLEF
Types of QA systems

• open-domain QA systems: can answer any question from any
  collection
  + can potentially answer any question
  - very low accuracy (especially in cross-lingual settings)
• canned QA systems: rely on a very large repository of
  questions for which the answer is known
  + very little processing necessary
  - limited to the answers in the database
• closed-domain QA systems: are built for very specific domains
  and exploit expert knowledge about them
  + very high accuracy
  - can require extensive language processing and are limited to
  one domain
Purpose of the presentation




• briefly present the QALL-ME project
• show how it was adapted to answer questions in Romanian
  about movies
2 The QALL-ME project
The QALL-ME project


• QALL-ME = Question Answering Learning technologies in a
  multiLingual and Multimodal Environment
• EU-funded project, part of FP6
• 7 partners:
    • FBK-irst, Italy
    • University of Wolverhampton, UK
    • University of Alicante, Spain
    • DFKI, Germany
    • Comdata, Italy
    • UbiEST, Italy
    • WayCom, Italy
• Web page: http://qallme.fbk.eu
The QALL-ME project



• aimed at establishing a shared infrastructure for multilingual
  and multimodal QA in the domain of tourism
• In the QALL-ME system:
     • users ask natural language questions in several languages (in
       both textual and speech modality) using a variety of input
       devices (e.g. mobile phones), and
     • the system returns a list of specific answers formatted in the
       most appropriate modality, ranging from small texts to maps,
       videos and pictures.
[Figure: QALL-ME architecture. Components shown: local information sources, semantic representation, service provider, QALL-ME central QA planner, English, German, Spanish and Italian answer extractors, question type ontology, answer type ontology, speech recognisers and dialog models.]
Main outputs of the project




  • an ontology for the domain of tourism
  • an entailment-based QA framework
  • the QALL-ME benchmark
  • an entailment framework

(all accessible from the project’s web page:
http://qallme.fbk.eu)
The ontology



• A domain-specific ontology for the tourism domain was
  developed and shared among all the partners.
• The ontology was used as:
    • a bridge between different languages
    • a communication language between the different components
      of the system
• The ontology was linked to domain-independent ontologies
  such as MultiWordNet and SUMO
• For more information see (Ou et al., 2008)
Design of the ontology



• Analysis of data from content providers
• Analysis of users' requirements
• Inspired by similar ontologies:
     • Harmonise and eTourism: focus on static information (e.g.
       accommodation and events/activities)
     • similar to eTourism in that it is written in OWL rather than
       RDFS,
     • but with wider coverage
• Introspection
The ontology



• Main classes: Country, Destination, Site (i.e.
  Accommodation, Attraction, Gastro, and Infrastructure),
  Transportation, EventContent and Event
• Element classes: Facility, Room, PersonOrganization,
  Language, and Currency
• Attribute classes: Contact, Location, Period and Price.

• Element and attribute classes cannot exist independently and
  have to be attached to other main or element classes
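The attachment rule above can be illustrated with a minimal plain-Python sketch (not OWL). The class names come from the slide; the `Kind` enum, the `KIND` table and the `is_valid` helper are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    MAIN = "main"
    ELEMENT = "element"
    ATTRIBUTE = "attribute"


# Class categories as listed on the slide; Site's subclasses are main classes too.
KIND = {
    **{c: Kind.MAIN for c in ("Country", "Destination", "Site", "Accommodation",
                               "Attraction", "Gastro", "Infrastructure",
                               "Transportation", "EventContent", "Event")},
    **{c: Kind.ELEMENT for c in ("Facility", "Room", "PersonOrganization",
                                  "Language", "Currency")},
    **{c: Kind.ATTRIBUTE for c in ("Contact", "Location", "Period", "Price")},
}


def is_valid(cls: str, attached_to: Optional[str] = None) -> bool:
    """Check the attachment rule for an instance of `cls`.

    Main classes may stand alone; element and attribute classes must be
    attached to an instance of a main or element class.
    """
    if KIND[cls] is Kind.MAIN:
        return True
    return attached_to is not None and KIND[attached_to] in (Kind.MAIN, Kind.ELEMENT)
```

For example, `is_valid("Price", "Event")` holds, while a free-standing `Price` or a `Contact` attached to a `Period` is rejected.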
[Figure: fragment of the ontology for the movie domain. TicketPrice, Cinema, MovieShow and Movie are shown as subclasses of Price, Site, Event and EventContent respectively; other classes shown: Currency, Period, TimePeriod, DatePeriod, DateTimePeriod, CinemaRoom, Contact, PostalAddress, GPSCoordinate, DirectionLocation, SiteFacility, RoomFacility, Director, Producer, Star and Writer; properties shown include isInSite, hasPrice, hasPeriod, hasEventContent, hasRoom, hasContact, hasDirector, hasProducer, hasStar, hasWriter, name, genre, certificate and synopsis.]
The ontology


• Encoded using OWL DL, since it has more expressive power
  than OWL Lite and has more efficient reasoning support than
  OWL Full
• Used Protégé-OWL as the editor and RacerPro as the
  reasoner
• The ontology contains
    • 122 classes (concepts), of which 15 are top-level classes,
    • 55 datatype properties and
    • 52 object properties, which express the relationships among
      the classes.
• The class hierarchy has a maximum depth of 4.
The QALL-ME framework



• is an architecture skeleton for multilingual QA systems for
  closed domains
• designed to allow fast development of closed-domain QA
  systems
• freely available from http://qallme.sourceforge.net/
• is based on a Service Oriented Architecture (SOA) which is
  realised using web services
• relies on textual entailment recognisers
Web services
1   Context providers: anchor questions in space and time
2   Annotators: Currently three types of annotators are
    available:
      • named entity annotators which identify names of cinemas,
        movies, persons, etc.
      • term annotators which identify hotel facilities, movie genres
        and other domain-specific terminology
      • temporal annotators that are used to recognise and normalise
        temporal expressions in user questions
3   Entailment engine: determines whether a user question
    entails a stored question pattern, and therefore which
    retrieval procedure applies
4   Query generator: relies on the entailment engine to
    generate a query that extracts the answer.
5   Answer pool: retrieves the answers from a database.
Context providers



• are used to anchor a question in space and time
• return the current position and time
• are used by the presentation module when maps are displayed
• are used by temporal processing to normalise temporal
  expressions
• determine which services are used in a cross-lingual scenario
• can be static or determined from a mobile phone
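The two kinds of context provider can be sketched as follows, assuming a Python interface rather than the web services QALL-ME actually uses. `Context`, `StaticContextProvider` and `DeviceContextProvider` are hypothetical names, and the coordinates in the usage line are only approximate values for Wolverhampton.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Protocol


@dataclass(frozen=True)
class Context:
    latitude: float
    longitude: float
    time: datetime
    language: str  # user's language, used to pick services in cross-lingual QA


class ContextProvider(Protocol):
    def current(self) -> Context: ...


class StaticContextProvider:
    """Always returns the same preconfigured context (e.g. for a web demo)."""

    def __init__(self, context: Context) -> None:
        self._context = context

    def current(self) -> Context:
        return self._context


class DeviceContextProvider:
    """Builds the context from readings supplied by a device such as a phone."""

    def __init__(self, read_gps: Callable[[], tuple[float, float]],
                 clock: Callable[[], datetime], language: str) -> None:
        self._read_gps = read_gps
        self._clock = clock
        self._language = language

    def current(self) -> Context:
        lat, lon = self._read_gps()
        return Context(lat, lon, self._clock(), self._language)
```

Usage: `StaticContextProvider(Context(52.59, -2.13, datetime(2010, 3, 29, 18, 0), "ro")).current()` anchors every question in Wolverhampton on the evening of 29 March 2010. Downstream services only depend on the shared `current()` method, so a static provider and a device-based provider can be swapped freely.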
Named entity and term annotators

• named entity recogniser = identifies names of hotels, movies,
  persons, etc.
• term annotator = identifies domain specific terms such as
  hotel facilities, movie genres, etc.
• the entities and terms are known in advance, so the task is
  reduced to a database lookup
• gazetteers are the main source for determining the entities
• the annotation module needs to determine the canonical form
  of an entity
• a greedy algorithm combines character-based similarity and a
  modified TF*IDF
• annotations are not allowed to overlap, and there are few
  ambiguities
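Below is a minimal sketch of greedy, non-overlapping gazetteer lookup that returns canonical names. It assumes Python's `difflib.SequenceMatcher` ratio as the character-based similarity and leaves out the modified TF*IDF component, whose details are not given here. The gazetteer, the 0.85 threshold and the four-token span limit are hypothetical.

```python
import re
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate(question: str, gazetteer: dict[str, str],
             threshold: float = 0.85, max_len: int = 4) -> list[tuple[int, int, str, str]]:
    """Greedily tag non-overlapping token spans that resemble a gazetteer entry.

    `gazetteer` maps canonical names to entity types (e.g. "MOVIE").
    Returns (start, end, canonical_name, type) over token positions, end exclusive.
    """
    tokens = re.findall(r"\w+", question)
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            span = " ".join(tokens[start:end])
            for name, etype in gazetteer.items():
                score = similarity(span, name)
                if score >= threshold:
                    candidates.append((score, end - start, start, end, name, etype))
    # Greedy selection: highest score first (longer span on ties), no overlaps.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
    taken: set[int] = set()
    result = []
    for _, _, start, end, name, etype in candidates:
        if taken.isdisjoint(range(start, end)):
            taken.update(range(start, end))
            result.append((start, end, name, etype))
    return sorted(result)
```

For example, on "Where can I see becaming Jane tonight in Wolverhampton?" with the gazetteer `{"Becoming Jane": "MOVIE", "Wolverhampton": "DESTINATION"}`, the misspelt span is mapped to the canonical name Becoming Jane. The overlapping candidate "in Wolverhampton" is dropped because the better-scoring span "Wolverhampton" has already been selected.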
Named entity and term annotators


• Annotates both standard and non-standard entities: cinema,
  movie, location, genre, certificate
• Needs to deal with noisy input:
    • misspelt words, input from ASR engines and SMS input, e.g.
       becaming Jane or becoming Jade for Becoming Jane
    • free word order (Will Smith / Smith, Will)
    • equivalent strings (Saw III / three / 3; Smith, Will /
       Smith, W.)
• Needs to deal with questions in mixed languages
• Needs to deal with ambiguous entities
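The free-word-order and equivalent-string cases can be handled by normalising names to a comparison key before lookup. The following is a minimal sketch under that assumption. The equivalence table and the initial-matching rule are hypothetical and are not taken from the QALL-ME annotators.

```python
import re

# Hypothetical equivalence table: Roman numerals and number words map to digits.
# A lone "i" is deliberately left out so it is not confused with an initial.
NUMBER_EQUIVALENTS = {"ii": "2", "iii": "3", "iv": "4",
                      "one": "1", "two": "2", "three": "3", "four": "4"}


def normalise(name: str) -> str:
    """Map surface variants of a name to one comparison key."""
    if name.count(",") == 1:  # "Smith, Will" -> "Will Smith" (free word order)
        surname, forename = (part.strip() for part in name.split(","))
        name = f"{forename} {surname}"
    words = re.findall(r"\w+", name.lower())
    return " ".join(NUMBER_EQUIVALENTS.get(w, w) for w in words)


def same_name(a: str, b: str) -> bool:
    """Compare normalised names, letting an initial stand for a full forename."""
    wa, wb = normalise(a).split(), normalise(b).split()
    return len(wa) == len(wb) and all(
        x == y or (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y))
        for x, y in zip(wa, wb))
```

With this sketch, "Saw III", "Saw three" and "Saw 3" all normalise to `saw 3`, and `same_name("Smith, W.", "Will Smith")` is true.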
Temporal annotator


• questions from the domain of tourism contain a large number
  of temporal expressions
• we use a simplified version of the tagger implemented by
  Puşcaşu (2004)
• the simplification was done to reduce the processing time
  (Varga, Puşcaşu, and Orăsan, 2009)
• identifies both self-contained temporal expressions (TEs) and
  indexical/under-specified TEs
• uses the TIMEX2 standard
• the output is used by the TIMEX2SPARQL service to restrict the
  extracted answers
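This toy sketch shows the two steps above: resolving an indexical expression against the context date into a TIMEX2-style value, then turning that value into a SPARQL restriction. It is not Puşcaşu's tagger. It assumes TIMEX2 part-of-day values such as `2010-03-29TNI` for "tonight". The lexicon, the 18:00 start of "night", the `?start` variable and the FILTER shape are all hypothetical stand-ins for what a TIMEX2SPARQL service might produce.

```python
import re
from datetime import date, datetime, time, timedelta

# Hypothetical mini-lexicon of indexical expressions: (day offset, part-of-day token).
LEXICON = {
    "yesterday": (-1, None),
    "today": (0, None),
    "tonight": (0, "NI"),
    "tomorrow": (1, None),
}

# Assumed clock times for part-of-day tokens; None means "until midnight".
PART_OF_DAY = {"NI": (time(18, 0), None)}


def to_timex2(expression: str, reference: date) -> str:
    """Resolve an indexical expression to a TIMEX2-style VAL anchored at the context date."""
    offset, part = LEXICON[expression.lower()]
    day = reference + timedelta(days=offset)
    return day.isoformat() + (f"T{part}" if part else "")


def _literal(moment: datetime) -> str:
    """Format a datetime as a typed SPARQL literal."""
    return f'"{moment.isoformat()}"^^xsd:dateTime'


def timex2_to_sparql_filter(val: str, var: str = "?start") -> str:
    """Turn a day-level TIMEX2 VAL into a SPARQL FILTER restricting `var`."""
    m = re.fullmatch(r"(\d{4}-\d{2}-\d{2})(?:T([A-Z]{2}))?", val)
    if m is None:
        raise ValueError(f"unsupported TIMEX2 value: {val}")
    day = date.fromisoformat(m.group(1))
    start_time, end_time = PART_OF_DAY[m.group(2)] if m.group(2) else (time(0, 0), None)
    start = datetime.combine(day, start_time)
    end = (datetime.combine(day, end_time) if end_time is not None
           else datetime.combine(day + timedelta(days=1), time(0, 0)))
    return f"FILTER ({var} >= {_literal(start)} && {var} < {_literal(end)})"
```

With the context date 29 March 2010, "tonight" resolves to `2010-03-29TNI`. That value then becomes `FILTER (?start >= "2010-03-29T18:00:00"^^xsd:dateTime && ?start < "2010-03-30T00:00:00"^^xsd:dateTime)`.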
Entailment engine

• closed-domain QA systems often transform a question into a
  Prolog fact or an SQL query
• this solution often works only partially because of language
  variability
• in QALL-ME this problem is solved using textual entailment
• the entailment engine determines whether the user question
  entails a stored question, in which case both share the same
  retrieval procedure:
    • T is the input question
    • H is a textual pattern stored in a repository
    • each textual pattern has an associated SPARQL retrieval
      procedure
• we calculate the similarity between the two sentences to
  determine whether there is an entailment relation between them
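The following is a minimal sketch of similarity-based entailment between a user question T and stored patterns H. It assumes that entities in T have already been replaced by their types, and it uses a simple word-overlap score: the share of H's tokens found in T, after a crude plural strip. A pattern is only eligible if T can fill every one of its slots. The score, the stemming rule and the 0.6 threshold are hypothetical and are simpler than the QALL-ME engines.

```python
import re
from typing import Optional

SLOT = re.compile(r"\[\w+\]")
TOKEN = re.compile(r"\[\w+\]|\w+")


def tokens(text: str) -> set[str]:
    """Lower-cased tokens with a crude plural strip; slots such as [timex] stay whole."""
    result = set()
    for tok in TOKEN.findall(text.lower()):
        if not tok.startswith("[") and len(tok) > 3 and tok.endswith("s"):
            tok = tok[:-1]  # "movies" -> "movie"
        result.add(tok)
    return result


def entailment_score(t: str, h: str) -> float:
    """Share of H's tokens also found in T; 0.0 if H has a slot that T cannot fill."""
    t_tokens, h_tokens = tokens(t), tokens(h)
    if not set(SLOT.findall(h.lower())) <= t_tokens:
        return 0.0
    return len(t_tokens & h_tokens) / len(h_tokens)


def select_pattern(t: str, patterns: list[str], threshold: float = 0.6) -> Optional[str]:
    """Return the stored pattern H that T most plausibly entails, or None."""
    best = max(patterns, key=lambda h: entailment_score(t, h))
    return best if entailment_score(t, best) >= threshold else None
```

Take T = "What movie can I see [TIMEX] in [DESTINATION]?" and the four patterns of the example that follows. "What movies are on in [DESTINATION] [TIMEX]?" scores 5/7. "Where can I see [MOVIE] [TIMEX]?" shares more surface words but is ruled out because T has no [MOVIE] slot. The slot check is what keeps plain word overlap from choosing the wrong pattern.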
Query generation service



• produces a SPARQL query that can be used to answer the
  question
• has a list of question templates with their associated SPARQL
  queries
• relies on the entailment engine to determine which of the
  question patterns has the same meaning as the user question
• fills in the slots of the selected question pattern
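Here is a minimal sketch of the template step, assuming each question pattern is stored next to a SPARQL template whose placeholders share the pattern's slot names. The `qm:` namespace, the property names (loosely modelled on the ontology slides, with `isInDestination` invented) and the repository layout are hypothetical, not the project's real IRIs.

```python
from string import Template

# Hypothetical repository pairing a question pattern with a SPARQL template.
# The qm: namespace and property names are illustrative only.
REPOSITORY = {
    "What movies are on in [DESTINATION] [TIMEX]?": Template(
        "PREFIX qm: <http://example.org/qallme#>\n"
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT DISTINCT ?title WHERE {\n"
        "  ?show qm:hasEventContent ?movie ;\n"
        "        qm:isInSite ?cinema ;\n"
        "        qm:hasPeriod ?period .\n"
        "  ?period qm:startTime ?start .\n"
        "  ?movie qm:name ?title .\n"
        '  ?cinema qm:isInDestination "$DESTINATION" .\n'
        "  $TIMEX\n"
        "}\n"
    ),
}


def generate_query(pattern: str, slots: dict[str, str]) -> str:
    """Fill the selected pattern's SPARQL template with the entities found in the question.

    Raises KeyError if a slot has no value. Real code would also escape literal values.
    """
    return REPOSITORY[pattern].substitute(slots)
```

The [DESTINATION] slot receives a plain name such as Wolverhampton. The [TIMEX] slot receives a FILTER clause of the kind the TIMEX2SPARQL service is described as producing. `Template.substitute` fails loudly on a missing slot, which is preferable to sending a half-filled query to the answer pool.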
Example
User question (T): What movie can I see tonight in
Wolverhampton? → What movie can I see [TIMEX] in
[DESTINATION]?


List of patterns (H):
  • Who is the director of [MOVIE]?
  • Where can I see [MOVIE] [TIMEX]?
  • What movies are on in [DESTINATION] [TIMEX]?
  • What is the address of [CINEMA]?
  • ...



Select the retrieval procedure associated with the entailed pattern
What movies are on in [DESTINATION] [TIMEX]? (here: What movies are on in Wolverhampton tonight?)
Answer Pool service




• takes the SPARQL query generated by the query generator
  and extracts the answer
• SPARQL is a query language for RDF graphs, developed by the
  W3C RDF Data Access Working Group
• because the queries are expressed over the language-independent
  ontology, SPARQL provides interoperability between languages
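A SPARQL engine answers a query's basic graph pattern by finding the variable bindings under which every triple pattern occurs in the RDF graph. The toy matcher below illustrates only that idea in plain Python. A real answer pool would send the generated query to an RDF store. The triples, the `qm:` names and the two-show graph are hypothetical.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]


def _unify(term: str, value: str, binding: dict[str, str]) -> bool:
    """Match one query term against one graph term, extending `binding` for variables."""
    if not term.startswith("?"):
        return term == value
    if term in binding:
        return binding[term] == value
    binding[term] = value
    return True


def match(patterns: list[Triple], graph: set[Triple],
          binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every variable binding under which all triple patterns occur in the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = dict(binding)
        if all(_unify(t, v, extended) for t, v in zip(first, triple)):
            yield from match(rest, graph, extended)


# Hypothetical data: two shows in two different towns.
GRAPH = {
    ("show1", "qm:hasEventContent", "movie1"), ("show1", "qm:isInSite", "cinema1"),
    ("movie1", "qm:name", "Becoming Jane"), ("cinema1", "qm:isInDestination", "Wolverhampton"),
    ("show2", "qm:hasEventContent", "movie2"), ("show2", "qm:isInSite", "cinema2"),
    ("movie2", "qm:name", "Saw III"), ("cinema2", "qm:isInDestination", "Birmingham"),
}
QUERY = [
    ("?show", "qm:hasEventContent", "?movie"),
    ("?show", "qm:isInSite", "?cinema"),
    ("?cinema", "qm:isInDestination", "Wolverhampton"),
    ("?movie", "qm:name", "?title"),
]
```

`sorted({b["?title"] for b in match(QUERY, GRAPH)})` gives `['Becoming Jane']`, because the Birmingham show fails the destination triple.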
3 Multilingual information access in QALL-ME
Cross-lingual QA




• the QALL-ME tourism prototype is designed to allow both
  monolingual and cross-lingual QA
• relevant web services are activated depending on the source
  and target languages
• user scenario: Romanian tourist in UK who wants to find out
  more about the movies in Wolverhampton
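Service activation for this scenario can be pictured as a lookup keyed on the two languages. The minimal sketch below makes one assumption: question-side services (annotators, question patterns) follow the language the user asks in, and the answer pool follows where the data comes from. The split and every service name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical service names. Question-side services depend on the language the
# user asks in (source); the answer pool depends on where the data comes from (target).
QUESTION_SERVICES = {
    "en": ("annotator-en", "patterns-en"),
    "de": ("annotator-de", "patterns-de"),
    "es": ("annotator-es", "patterns-es"),
    "it": ("annotator-it", "patterns-it"),
    "ro": ("annotator-ro", "patterns-ro"),
}
ANSWER_POOLS = {"en": "answer-pool-en", "it": "answer-pool-it"}


@dataclass(frozen=True)
class ServicePlan:
    annotator: str
    patterns: str
    answer_pool: str


def plan(source: str, target: str) -> ServicePlan:
    """Choose the web services to activate for a (source, target) language pair."""
    if source not in QUESTION_SERVICES:
        raise ValueError(f"no question services for {source!r}")
    if target not in ANSWER_POOLS:
        raise ValueError(f"no answer pool for {target!r}")
    annotator, patterns = QUESTION_SERVICES[source]
    return ServicePlan(annotator, patterns, ANSWER_POOLS[target])
```

For the Romanian tourist in Wolverhampton, `plan("ro", "en")` selects the Romanian annotator and patterns together with the English (UK) answer pool.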
Cross-lingual QA
Prototype for Romanian


• we wanted to find out how long it takes to develop a demo for
  Romanian
• components had to be adapted:
    • named entity and term annotators had to be trained on a
      different list of entities
    • a simple temporal annotator was implemented on the basis of
      the English one
    • the language-independent, similarity-based entailment engine
      was used
    • the question patterns were translated into Romanian
    • the answer pool did not require any change
• the whole process took less than one week
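Translating only the question patterns works because each pattern points to a retrieval procedure that is shared across languages. The following is a minimal sketch of that arrangement. The procedure identifiers are hypothetical, and the Romanian patterns are illustrative translations, not the ones used in the demo.

```python
# Hypothetical identifiers for retrieval procedures (SPARQL templates), shared by all languages.
PATTERNS = {
    "en": {
        "Who is the director of [MOVIE]?": "director_of_movie",
        "Where can I see [MOVIE] [TIMEX]?": "movie_showings",
        "What movies are on in [DESTINATION] [TIMEX]?": "movies_in_destination",
        "What is the address of [CINEMA]?": "cinema_address",
    },
    # Romanian port: only the question patterns are translated (illustrative translations).
    "ro": {
        "Cine este regizorul filmului [MOVIE]?": "director_of_movie",
        "Unde pot vedea [MOVIE] [TIMEX]?": "movie_showings",
        "Ce filme rulează în [DESTINATION] [TIMEX]?": "movies_in_destination",
        "Care este adresa cinematografului [CINEMA]?": "cinema_address",
    },
}


def procedures(language: str) -> set[str]:
    """Retrieval procedures reachable from the question patterns of one language."""
    return set(PATTERNS[language].values())


def untranslated(source: str = "en", target: str = "ro") -> set[str]:
    """Procedures reachable in `source` that have no pattern in `target` yet."""
    return procedures(source) - procedures(target)
```

An empty `untranslated()` result means every English retrieval procedure is also reachable from Romanian. This fits the observation that the answer pool needed no change.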
Romanian demo




http://qallme.wlv.ac.uk:8080/QALL-ME-web-demo/index.jsp
4 Conclusions
Conclusions




• multilinguality is a very important issue for the QALL-ME
  project
• the ontology constitutes the bridge between languages
• the QALL-ME framework can be used to quickly develop
  prototypes for other languages
Thank you!
References
Ou, Shiyan, Viktor Pekar, Constantin Orăsan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 28–30.
Puşcaşu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26–28.
Varga, Andrea, Georgiana Puşcaşu, and Constantin Orăsan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29–32, Cluj-Napoca, Romania, July 2–4.
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 

Recently uploaded (20)

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 

Porting the QALL-ME framework to Romanian

  • 1. Porting the QALL-ME framework to Romanian. Constantin Orăsan, Research Group in Computational Linguistics, Research Institute in Information and Language Processing, University of Wolverhampton, http://www.wlv.ac.uk/~in6093/, 29th March 2010
  • 2. 1 Introduction 2 The QALL-ME project 3 Multilingual information access in QALL-ME 4 Conclusions
  • 3. Structure of the presentation 1 Introduction 2 The QALL-ME project 3 Multilingual information access in QALL-ME 4 Conclusions
  • 4. Need to access information • as a result of the development of the Internet, more and more information is becoming available • this information is in many languages • fields of computational linguistics such as automatic summarisation, question answering and text mining can help people deal with this information
  • 6. Question answering (QA) • question answering aims to identify the answer to a question in a large collection of documents • the information provided by QA is more focused than that returned by information retrieval • the output can be the exact answer or a text snippet which contains the answer • the field took off with the introduction of the QA track at TREC, and cross-lingual QA took off as a result of CLEF
  • 9. Types of QA systems • open-domain QA systems: can answer questions on any topic from any collection + can potentially answer any question - very low accuracy (especially in cross-lingual settings) • canned QA systems: rely on a very large repository of questions for which the answer is known + very little processing necessary - limited to the answers in the database • closed-domain QA systems: are built for very specific domains and exploit expert knowledge about them + very high accuracy - can require extensive language processing and are limited to one domain
  • 11. Purpose of the presentation • briefly present the QALL-ME project • show how it was adapted to answer questions in Romanian about movies
  • 13. The QALL-ME project • QALL-ME = Question Answering Learning technologies in a multiLingual and Multimodal Environment • EU-funded project, part of FP6 • 7 partners: • FBK-irst, Italy • University of Wolverhampton, UK • University of Alicante, Spain • DFKI, Germany • Comdata, Italy • UbiEST, Italy • WayCom, Italy • web page: http://qallme.fbk.eu
  • 14. The QALL-ME project • aimed to establish a shared infrastructure for multilingual and multimodal QA in the domain of tourism • in the QALL-ME system: • users ask natural language questions in several languages (in both text and speech modalities) using a variety of input devices (e.g. mobile phones), and • the system returns a list of specific answers formatted in the most appropriate modality, ranging from short texts to maps, videos and pictures
  • 15. [Figure: architecture of the QALL-ME system. Recoverable component labels: Service Provider, Local Information Sources, Semantic representation, QALL-ME central QA planner, English, German, Spanish and Italian Answer Extractors, Question Type ontology, Answer Type ontology, Dialog Models, Speech Recognizers.]
  • 16. Main outputs of the project • an ontology for the domain of tourism • an entailment-based QA framework • the QALL-ME benchmark • an entailment framework (all accessible from the project's web page: http://qallme.fbk.eu)
  • 17. The ontology • a domain-specific ontology for tourism was developed and shared among all the partners • the ontology was used to serve as: • a bridge between different languages • a communication language between the different components of the system • the ontology was linked to domain-independent ontologies such as MultiWordNet and SUMO • for more information see Ou et al. (2008)
  • 18. Design of the ontology • analysis of data from content providers • analysis of users' requirements • inspired by similar ontologies: • Harmonise and eTourism: focus on static information (e.g. accommodation and events/activities) • similar to eTourism in that it is written in OWL rather than RDFS • but with wider coverage • introspection
  • 19. The ontology • Main classes: Country, Destination, Site (i.e. Accommodation, Attraction, Gastro, and Infrastructure), Transportation, EventContent and Event • Element classes: Facility, Room, PersonOrganization, Language, and Currency • Attribute classes: Contact, Location, Period and Price. • Element and attribute classes cannot exist independently and have to be attached to other main or element classes
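The dependency rule on this slide (element and attribute classes must be attached to a main or element class) can be illustrated with a small instance-level check. This is a plain-Python sketch, not how the OWL ontology itself encodes the constraint; the function name and the instance IDs are hypothetical, and the subclasses of Site are counted as main classes.

```python
from typing import Dict, List

# Class groups as listed on the slide; Site's subclasses count as main classes.
MAIN = {"Country", "Destination", "Site", "Accommodation", "Attraction", "Gastro",
        "Infrastructure", "Transportation", "EventContent", "Event"}
ELEMENT = {"Facility", "Room", "PersonOrganization", "Language", "Currency"}
ATTRIBUTE = {"Contact", "Location", "Period", "Price"}


def dependency_violations(instances: Dict[str, str],
                          attached_to: Dict[str, str]) -> List[str]:
    """Return IDs of element/attribute instances not attached to a main or element instance.

    instances maps an instance ID to its class name; attached_to maps a
    dependent instance ID to the ID of the instance it belongs to.
    """
    violations = []
    for inst_id in sorted(instances):
        if instances[inst_id] not in ELEMENT | ATTRIBUTE:
            continue  # main classes may exist on their own
        parent = attached_to.get(inst_id)
        parent_class = instances.get(parent) if parent is not None else None
        if parent_class not in MAIN | ELEMENT:
            violations.append(inst_id)
    return violations
```

For instance, a Price attached to a CinemaRoom-like Room instance passes, while a Price with no owner, or one attached to another Price, is reported.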
  • 20. [Figure: fragment of the QALL-ME ontology for the cinema domain. Recoverable classes include Site, Cinema, CinemaRoom, Event, MovieShow, EventContent, Movie, Director, Producer, Writer, Star, Price, TicketPrice, Currency, Contact, PostalAddress, GPSCoordinate, DirectionLocation, SiteFacility, RoomFacility, Period, DatePeriod, TimePeriod and DateTimePeriod; object properties include hasPrice, hasCurrency, hasContact, isInSite, hasRoom, hasPeriod, hasEventContent, hasDirector, hasProducer, hasWriter and hasStar; datatype properties include name, description, priceType, priceValue, startTime, endTime, startDate, endDate, certificate, genre and synopsis.]
  • 21. The ontology • encoded in OWL DL, since it has more expressive power than OWL Lite and more efficient reasoning support than OWL Full • Protégé-OWL was used as the editor and RacerPro as the reasoner • the ontology contains: • 122 classes (concepts) • 55 datatype properties • 52 object properties, which indicate the relationships among the 122 classes • 15 top-level classes • the class hierarchy has a maximum depth of 4
  • 22. The QALL-ME framework • is an architecture skeleton for multilingual QA systems for closed domains • is designed to allow fast development of closed-domain QA systems • is freely available from http://qallme.sourceforge.net/ • is based on a Service Oriented Architecture (SOA), realised using web services • relies on textual entailment recognisers
  • 23. Web services 1 Context providers: anchor questions in space and time 2 Annotators: three types of annotators are currently available: • named entity annotators, which identify names of cinemas, movies, persons, etc. • term annotators, which identify hotel facilities, movie genres and other domain-specific terminology • temporal annotators, which recognise and normalise temporal expressions in user questions 3 Entailment engine: determines whether a user question entails a stored question pattern, and therefore which retrieval procedure applies 4 Query generator: relies on the entailment engine to generate a query that extracts the answer 5 Answer pool: retrieves the answers from a database
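The five kinds of web services can be read as a pipeline. The sketch below wires them together as plain Python callables; the class names, the signatures and the in-process calls are assumptions for illustration, since in QALL-ME each step is a separate web service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Context:
    """What a context provider returns: where and when the question is asked."""
    location: str
    time: str  # ISO 8601 timestamp, e.g. "2010-03-29T18:00"


@dataclass
class Annotation:
    kind: str   # slot type, e.g. "DESTINATION" or "TIMEX"
    text: str   # surface string found in the question
    value: str  # canonical or normalised value


# Hypothetical service signatures; in QALL-ME these are web services.
Annotator = Callable[[str, Context], List[Annotation]]
QueryGenerator = Callable[[str, List[Annotation]], Optional[str]]
AnswerPool = Callable[[str], List[str]]


def answer_question(question: str, context: Context,
                    annotators: List[Annotator],
                    generate_query: QueryGenerator,
                    answer_pool: AnswerPool) -> List[str]:
    # Annotators (named entity, term, temporal) all see the same context.
    annotations: List[Annotation] = []
    for annotate in annotators:
        annotations.extend(annotate(question, context))
    # The query generator, backed by the entailment engine, returns a SPARQL
    # query, or None when no stored pattern is entailed by the question.
    query = generate_query(question, annotations)
    if query is None:
        return []
    # The answer pool executes the query against the data.
    return answer_pool(query)
```

Keeping each step behind a small interface mirrors the service-oriented design: a language port can swap the annotators without touching the answer pool.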
  • 24. Context providers • are used to anchor a question in space and time • return the current position and time • are used by the presentation module when maps are displayed • are used by the temporal annotator to normalise temporal expressions • determine which services are used in a cross-lingual scenario • can be static or determined from a mobile phone
  • 25. Named entity and term annotators • named entity recogniser = identifies names of hotels, movies, persons, etc. • term annotator = identifies domain-specific terms such as hotel facilities, movie genres, etc. • the entities and terms are known in advance, so the task is reduced to a database lookup • gazetteers are the main source for determining the entities • the annotation module needs to determine the canonical form of an entity • it uses a greedy algorithm that combines character-based similarity with a modified TF*IDF • overlapping annotations are not allowed, and there are few ambiguities
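A minimal sketch of gazetteer-based annotation, assuming a gazetteer that maps canonical names to entity types: every span of up to four tokens is compared with each entry using difflib's character-based ratio, and a greedy pass keeps the best-scoring non-overlapping matches. The modified TF*IDF component mentioned on the slide is not specified there, so it is left out; the 0.8 threshold and the four-token limit are arbitrary choices.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def char_similarity(a: str, b: str) -> float:
    """Case-insensitive character-based similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate(question: str, gazetteer: Dict[str, str], threshold: float = 0.8,
             max_len: int = 4) -> List[Tuple[int, int, str, str]]:
    """Return non-overlapping (start, end, canonical form, type) token spans.

    gazetteer maps canonical entity names to their type, e.g. "MOVIE".
    """
    tokens = re.findall(r"[\w']+", question)
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            span = " ".join(tokens[start:end])
            for canonical, etype in gazetteer.items():
                score = char_similarity(span, canonical)
                if score >= threshold:
                    candidates.append((score, end - start, start, end, canonical, etype))
    # Greedy selection: highest score first (longer span on ties), no overlaps.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
    taken = set()
    selected = []
    for _score, _length, start, end, canonical, etype in candidates:
        if taken.isdisjoint(range(start, end)):
            taken.update(range(start, end))
            selected.append((start, end, canonical, etype))
    return sorted(selected)
```

With the gazetteer {"Becoming Jane": "MOVIE", "Wolverhampton": "DESTINATION"}, the misspelt question "Where can I see Becaming Jane in Wolverhampton?" yields the spans (4, 6) for Becoming Jane and (7, 8) for Wolverhampton; longer overlapping spans such as "in Wolverhampton" score lower and are discarded.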
  • 26. Named entity and term annotators • annotate both standard and non-standard entities: cinema, movie, location, genre, certificate • need to deal with noisy input: • misspelt words and input from ASR engines or SMS, e.g. becaming Jane or becoming Jade for Becoming Jane • free word order (Will Smith / Smith, Will) • equivalent strings (Saw III / three / 3; Smith, Will / Smith, W.) • need to deal with questions in mixed languages • need to deal with ambiguous entities
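The word-order and equivalent-string problems can be handled by normalising names before comparing them. The sketch below assumes a hand-written numeral table and treats a single letter as a possible initial; both choices, and the function names, are illustrative rather than taken from the QALL-ME annotators.

```python
import re

# Hypothetical equivalence table; one-letter roman numerals (i, v) are left out
# because they clash with ordinary words and initials.
NUMERAL_EQUIVALENTS = {"ii": "2", "iii": "3", "iv": "4",
                       "two": "2", "three": "3", "four": "4"}


def normalise(name: str) -> str:
    """Map surface variants of an entity name to one comparison key."""
    name = name.strip().lower()
    # Free word order: "Smith, Will" -> "will smith".
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    # Equivalent strings: "III", "three" and "3" become the same token.
    tokens = re.findall(r"[\w']+", name)
    return " ".join(NUMERAL_EQUIVALENTS.get(t, t) for t in tokens)


def names_match(a: str, b: str) -> bool:
    """True if the names agree token by token, allowing initials ("W." ~ "Will")."""
    ta, tb = normalise(a).split(), normalise(b).split()
    if len(ta) != len(tb):
        return False
    for x, y in zip(ta, tb):
        is_initial = (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y))
        if x != y and not is_initial:
            return False
    return True
```

Here "Saw III", "Saw three" and "Saw 3" all normalise to "saw 3", and "Smith, W." matches "Will Smith".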
  • 27. Temporal annotator • questions from the domain of tourism contain a large number of temporal expressions • we use a simplified version of the tagger implemented by Puşcaşu (2004) • the simplification was done to reduce the processing time (Varga, Puşcaşu, and Orăsan, 2009) • it identifies both self-contained temporal expressions (TEs) and indexical/under-specified TEs • it uses the TIMEX2 standard • its output is used by the TIMEX2SPARQL service to restrict the extracted answers
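To show how indexical expressions depend on the context provider, and what a TIMEX2SPARQL-style step might produce, the sketch below normalises a few relative day expressions against a reference date and turns the value into a SPARQL FILTER. The TNI (night) suffix follows the TIMEX2 part-of-day convention as we understand it; the 18:00 start of "night", the variable name and the function names are assumptions, not the behaviour of the QALL-ME services.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional, Tuple

# Indexical expressions, resolved against the date given by the context provider.
DAY_OFFSETS = {"yesterday": -1, "today": 0, "tonight": 0, "tomorrow": 1}


def normalise_timex(expression: str, reference: date) -> Optional[str]:
    """Return a TIMEX2-style VAL for a few indexical expressions, else None."""
    word = expression.strip().lower()
    if word not in DAY_OFFSETS:
        return None
    value = (reference + timedelta(days=DAY_OFFSETS[word])).isoformat()
    # TIMEX2 marks parts of the day with tokens such as TNI (night).
    return value + "TNI" if word == "tonight" else value


def timex_to_interval(value: str) -> Tuple[datetime, datetime]:
    """Map a VAL to a half-open [start, end) interval; night = 18:00-24:00 (assumed)."""
    day = date.fromisoformat(value[:10])
    start_time = time(18, 0) if value.endswith("TNI") else time(0, 0)
    end = datetime.combine(day + timedelta(days=1), time(0, 0))
    return datetime.combine(day, start_time), end


def interval_to_sparql_filter(var: str, interval: Tuple[datetime, datetime]) -> str:
    """Build a SPARQL FILTER restricting ?var to the interval (xsd prefix assumed)."""
    start, end = interval
    return (f'FILTER (?{var} >= "{start.isoformat()}"^^xsd:dateTime && '
            f'?{var} < "{end.isoformat()}"^^xsd:dateTime)')
```

With the reference date 29 March 2010, "tonight" becomes 2010-03-29TNI and then a filter on ?startTime between 2010-03-29T18:00:00 and 2010-03-30T00:00:00.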
  • 28. Entailment engine • closed-domain QA systems often transform a question into a Prolog fact or an SQL query • this solution often works only partially because of language variability • in QALL-ME this problem is addressed using textual entailment • the entailment engine determines whether T entails H, i.e. whether the two questions have the same meaning and can therefore share the same retrieval procedure: • T is the input question • H is a textual pattern stored in a repository • each textual pattern has an associated SPARQL retrieval procedure • the similarity between the two sentences is calculated to determine whether there is an entailment relation between them
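The slide does not say which similarity measure the engine uses. As a stand-in, the sketch below scores a pattern H by the share of its tokens that also occur in T, and only considers patterns whose slots can all be filled from T; the measure, the 0.5 threshold and the slot check are assumptions, not the QALL-ME entailment engine.

```python
import re
from typing import List, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens; slots such as [TIMEX] are kept as single tokens."""
    return set(re.findall(r"\[\w+\]|[\w']+", text.lower()))


def slots(text: str) -> Set[str]:
    return {t for t in tokens(text) if t.startswith("[")}


def overlap_score(t: str, h: str) -> float:
    """Share of H's tokens that also occur in T: a crude similarity stand-in."""
    h_tokens = tokens(h)
    return len(h_tokens & tokens(t)) / len(h_tokens) if h_tokens else 0.0


def entailed_pattern(t: str, patterns: List[str],
                     threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return the best pattern H judged to be entailed by T, or None."""
    # A pattern is only usable if every one of its slots can be filled from T.
    candidates = [(h, overlap_score(t, h)) for h in patterns if slots(h) <= slots(t)]
    if not candidates:
        return None
    best, score = max(candidates, key=lambda c: c[1])
    return (best, score) if score >= threshold else None
```

With the patterns listed in slide 31, "What movie can I see [TIMEX] in [DESTINATION]?" selects "What movies are on in [DESTINATION] [TIMEX]?" with a score of 4/7; the patterns needing [MOVIE] or [CINEMA] are excluded by the slot check.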
  • 29. Query generation service • produces a SPARQL query that can be used to answer the question • has a list of question templates with their associated SPARQL queries • relies on the entailment engine to determine which of the question patterns entail the same meaning as the user question • fills in the slots of the question patterns
  • 31. Example User question (T): What movie can I see tonight in Wolverhampton? → after annotation: What movie can I see [TIMEX] in [DESTINATION]? List of patterns (H): • Who is the director of [MOVIE]? • Where can I see [MOVIE] [TIMEX]? • What movies are on in [DESTINATION] [TIMEX]? • What is the address of [CINEMA]? • ... The entailed pattern is What movies are on in [DESTINATION] [TIMEX]?, so its retrieval procedure is selected and instantiated as What movies are on in Wolverhampton tonight?
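Once the entailed pattern is known, the query generator replaces the entity strings with slot labels and fills the SPARQL retrieval procedure associated with that pattern. In the sketch below the template, the qm: prefix IRI and the qm:city and qm:date properties are hypothetical placeholders (hasEventContent, isInSite and name appear in the ontology diagram, but how the project's queries use them is not shown); pattern selection is assumed to have been done already by the entailment engine.

```python
from typing import Dict, List, Tuple

# (surface string, slot type, normalised value) as produced by the annotators.
Annotation = Tuple[str, str, str]

# Hypothetical retrieval procedure for one pattern. The prefix IRI and the
# qm:city / qm:date properties are placeholders, not QALL-ME's vocabulary.
TEMPLATES: Dict[str, str] = {
    "What movies are on in [DESTINATION] [TIMEX]?": "\n".join([
        "PREFIX qm: <http://example.org/qallme#>",
        "SELECT DISTINCT ?title WHERE {",
        "  ?show qm:hasEventContent ?movie ; qm:isInSite ?cinema ; qm:date ?date .",
        "  ?movie qm:name ?title .",
        '  ?cinema qm:city "[DESTINATION]" .',
        '  FILTER (?date = "[TIMEX]")',
        "}",
    ]),
}


def abstract_question(question: str, annotations: List[Annotation]) -> str:
    """Replace each annotated surface string with its [SLOT] label."""
    for surface, slot, _value in annotations:
        question = question.replace(surface, f"[{slot}]")
    return question


def fill_template(pattern: str, annotations: List[Annotation]) -> str:
    """Instantiate the SPARQL template of the selected pattern with normalised values."""
    query = TEMPLATES[pattern]
    for _surface, slot, value in annotations:
        query = query.replace(f"[{slot}]", value)
    return query
```

With the annotations ("tonight", "TIMEX", "2010-03-29") and ("Wolverhampton", "DESTINATION", "Wolverhampton"), the question abstracts to "What movie can I see [TIMEX] in [DESTINATION]?", matching the slide, and the filled query restricts cinemas to Wolverhampton and shows to that date.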
  • 32. Answer Pool service • takes the SPARQL query generated by the query generator and extracts the answer • SPARQL is a query language for RDF graphs, developed by the W3C RDF Data Access Working Group • SPARQL provides interoperability between languages
  • 34. Cross-lingual QA • the QALL-ME tourism prototype is designed to allow both monolingual and cross-lingual QA • the relevant web services are activated depending on the source and target languages • user scenario: a Romanian tourist in the UK who wants to find out more about the movies showing in Wolverhampton
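One way to picture service activation in the cross-lingual scenario is a registry keyed by language: annotators and question patterns follow the language of the question, while the answer pool follows the language of the data. The registry contents and service names below are hypothetical.

```python
from typing import Dict, List

# Hypothetical service registry.
ANNOTATORS: Dict[str, List[str]] = {
    "en": ["ne-annotator-en", "term-annotator-en", "temporal-annotator-en"],
    "ro": ["ne-annotator-ro", "term-annotator-ro", "temporal-annotator-ro"],
}
PATTERN_REPOSITORIES: Dict[str, str] = {"en": "patterns-en", "ro": "patterns-ro"}
ANSWER_POOLS: Dict[str, str] = {"en": "answer-pool-en"}


def select_services(question_lang: str, data_lang: str) -> Dict[str, object]:
    """Pick the services for a source (question) and target (data) language."""
    if question_lang not in ANNOTATORS:
        raise ValueError(f"no annotators for language {question_lang!r}")
    if data_lang not in ANSWER_POOLS:
        raise ValueError(f"no answer pool for language {data_lang!r}")
    return {
        "annotators": ANNOTATORS[question_lang],
        "patterns": PATTERN_REPOSITORIES[question_lang],
        "answer_pool": ANSWER_POOLS[data_lang],
        "cross_lingual": question_lang != data_lang,
    }
```

For the Romanian tourist in Wolverhampton, select_services("ro", "en") activates the Romanian annotators and patterns together with the English answer pool.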
  • 36. Prototype for Romanian • we wanted to find out how long it takes to develop a demo for Romanian • the following components had to be adapted: • the named entity and term annotators had to be trained on a different list of entities • a simple temporal annotator was implemented on the basis of the English one • the language-independent, similarity-based entailment engine was used • the question patterns were translated into Romanian • the answer pool did not require any change • the whole process took less than one week
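Because the retrieval procedures are expressed over the ontology, porting the patterns amounts to translating them while keeping their links to the same SPARQL procedures. The sketch below checks that the slot labels survive translation; the procedure IDs are hypothetical and the Romanian sentences are illustrative translations, not the patterns used in the prototype.

```python
import re
from typing import Dict, List


def slot_list(pattern: str) -> List[str]:
    return sorted(re.findall(r"\[\w+\]", pattern))


def port_patterns(source: Dict[str, str], translations: Dict[str, str]) -> Dict[str, str]:
    """Build a new language's pattern table from translated patterns.

    source maps each pattern to the ID of its language-independent SPARQL
    retrieval procedure; translations maps each source pattern to its translation.
    """
    ported: Dict[str, str] = {}
    for pattern, procedure_id in source.items():
        translated = translations[pattern]
        # Slot labels must survive translation so the same query can be filled.
        if slot_list(translated) != slot_list(pattern):
            raise ValueError(f"slots changed when translating {pattern!r}")
        ported[translated] = procedure_id
    return ported


# Hypothetical procedure IDs.
EN_PATTERNS = {
    "What movies are on in [DESTINATION] [TIMEX]?": "movies_in_destination",
    "Who is the director of [MOVIE]?": "director_of_movie",
}
# Illustrative Romanian translations (not the project's actual patterns).
RO_TRANSLATIONS = {
    "What movies are on in [DESTINATION] [TIMEX]?": "Ce filme rulează în [DESTINATION] [TIMEX]?",
    "Who is the director of [MOVIE]?": "Cine este regizorul filmului [MOVIE]?",
}
RO_PATTERNS = port_patterns(EN_PATTERNS, RO_TRANSLATIONS)
```

Only the pattern table changes; the answer pool keeps executing the same queries, which is consistent with it needing no change for Romanian.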
  • 39. Conclusions • multilinguality is a very important issue for the QALL-ME project • the ontology constitutes the bridge between languages • the QALL-ME framework can be used to quickly develop prototypes for other languages
  • 42. References • Ou, Shiyan, Viktor Pekar, Constantin Orăsan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 28-30. • Puşcaşu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26-28. • Varga, Andrea, Georgiana Puşcaşu, and Constantin Orăsan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29-32, Cluj-Napoca, Romania, July 2-4.