SemChat: Extracting Personal Information from Chat Conversations (EKAW 2010)

This paper was presented in the 1st Workshop on Personal Semantic Data (PSD 2010: http://semanticweb.org/wiki/Personal_Semantic_Data) at EKAW 2010 (http://ekaw2010.inesc-id.pt/) Conference on Knowledge Engineering and Knowledge Management by the Masses in Lisbon, Portugal on 11 October 2010.

The full paper can be found on: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-629/psd2010_paper2.pdf

  • As a brief introduction: the Internet has brought about a radical change in the way people interact. Online communities have flourished, first fueled by electronic mail (e-mail) and nowadays complemented by instant messaging (IM). IM was developed in 1993 to overcome the slowness of e-mail, and it has become very popular over the last couple of years. This led to an increase...
  • The Semantic Desktop focuses on providing a personal information management (PIM) system that integrates and presents relevant content within the user’s desktop in a manageable, practical and user-relevant manner. Concepts are the most important forms of semantics. A concept can be a location, person, organization, date or event (a meeting mentioned during a conversation). Quick overview of the structure of the talk: Aims and Objectives, SemChat, Evaluation, Related Work, Future Work, Conclusions.
  • PIMO: contains all concepts, resources and relationships found on a user’s desktop. A concept can be a location, person, organization, date or event. Extraction targets concepts which have not already been stored within the user’s PIMO. Annotation of concepts deemed relevant to the user takes place within the user’s PIMO. An event refers to a meeting mentioned during a conversation. An option to annotate such events should be available in the task/event scheduler. The event scheduler notifies users when forthcoming saved events are due.
  • All concepts and events (from different users) related to the search criterion are displayed to the user in a comprehensive manner. The aim is to create a chat client plug-in which includes all the functionality mentioned. Extraction of concepts and events is done once the user ends a chat conversation session. Multiple protocols are covered, such as AOL IM, ICQ, MSN Messenger, Yahoo! and IRC, amongst many others.
  • General architecture and its main components. The motivation behind this architecture partly came from work performed on Semanta and SemNotes, which are applications that also exploit the ideas behind the SD and SSD. The former is a semantic email component while the latter is a note-taking tool, and both integrate closely with NEPOMUK. Once the closure of a chat session or a chat room is detected, SemChat starts its processing. The chat conversation of both users is retrieved and passed to the XtraK4Me key phrase extractor, which identifies the main key words (concepts and events) and finds the ones which are not already stored within the user’s PIMO in NEPOMUK. GATE’s ANNIE is used to identify the entities of the extracted key phrases. SemChat also searches for any possible events within the chat conversation using ANNIE NER. Annotated concepts are stored in the user’s PIMO, whilst annotated events are stored in the Spark IM task scheduler. Knowledge obtained from chat conversations can thus be exposed and linked to the knowledge found on the user’s desktop through NEPOMUK.
  • XMPP: Extensible Messaging and Presence Protocol. Spark IM was an ideal candidate for SemChat: it is an open source, cross-platform IM client that can be further extended through plug-in development. Multiprotocol chat client: the possibility to connect to multiple chat protocols from within the same client. Openfire is the XMPP server used; it provides the possibility of using several IM networks such as GaduGadu, Yahoo!, AIM, MSN, IRC, ICQ, GTalk and XMPP.
  • Non-intrusive: no notifications are displayed whilst the user is still chatting. Iqbal and Horvitz, interruption management: the disrupted task in our case would be the current chat activity. Once the concept extraction process is complete, the user is presented with a notification linked to a list of extracted concepts, which is displayed in a separate tab. The intention behind this feature is to make the whole process less disruptive and distracting. All concepts are displayed according to the date on which they were extracted, in descending order.
  • Context menu: if a user right-clicks on the concept’s name, a context menu is displayed showing three options that the user can choose from. The first option saves the concept within the user’s PIMO in NEPOMUK. The second option deletes the concept from the list. The third option retrieves more information about the concept: the first paragraph from Wikipedia containing information about that concept (if any is found) is displayed in a separate pop-up window.
  • The Liverpool concept is stored in the user’s PIMO.
  • ANNIE NER isn’t able to recognize Event entities, therefore we extended it to be able to do so. JAPE is the Java Annotation Patterns Engine, which provides finite state transduction over annotations based on regular expressions. This grammar is used to add to the resources which can already be found in GATE. The implemented JAPE rules look for different kinds of text sequences, such as phrases that may indicate a possible meeting, and different types of dates and times. Extracted events are also listed beneath each other according to the date of their extraction. The user can edit both the title of an extracted event and its prospective date. Any annotated event will automatically be saved within Spark’s Task List.
  • JAPE rule: implemented to look up phrases which might indicate a possible event within a conversation. Examples of possible events catered for: event_trigger grammar + time + person; event_trigger grammar + date + person; event_trigger grammar + person + time; event_trigger grammar + person + date; event_trigger grammar + date and time + person; event_trigger grammar + person + date and time; event_trigger grammar + date and time. The EventRule rule will match any text that is annotated by the event_trigger grammar. Examples of event_trigger grammar entries: meeting at, meeting with, etc. When the rule matches a sequence of text, the whole sequence is allocated a label by the rule, in our case eventTrigger. This sequence of text is therefore given an annotation of type EventTrigger (the name of the new ‘Event’ class) and a rule feature set to EventRule.
  • The date can be edited by selecting an appropriate day and/or month from their dropdown menus, and by editing the year if required.
  • This feature helps the user to retrieve any of the past annotated concepts. The user doesn’t need to go through the whole transcript to find any previously annotated concepts, including events. The user is returned any semantically related concepts that satisfy the search criteria. The Jena framework was chosen as an RDF store for SemChat since it is aimed at building semantic web applications. Jena supports a query language, SPARQL, that can be used for querying an RDF store with ease.
  • To our knowledge, no formal evaluation was conducted. Previous research (Dumas and Redish, 1999) outlined that 6-12 participants are enough to test the usability of a system. Past tests have shown that this number is normally enough to reach conclusions once the evaluation of the system is complete. The session was split into 3 parts. Part 1: a walkthrough of SemChat to show how it can be used and what its main features are. Part 2: each user evaluated SemChat by chatting with another user for approximately 20 minutes. Part 3: each user filled in a feedback form which targeted several aspects of the system.
  • We were able to identify limitations as well as possible improvements to our system. Search feature: users did not see the need to search for any past annotated concepts. Search feature: users may not have been accustomed to searching within chat conversations. Search feature: users may have been unaware of the potential that a semantic search facility can offer, possibly due to the limited search facilities offered by most chat clients. Search feature: there was a high level of satisfaction amongst the users who did use the semantic search facility.
  • It took between 3 and 5 seconds to extract concepts from a conversation of approximately 20 minutes. If certain important key phrases are not mentioned repeatedly, they will be deemed not relevant within the scope of the conversation.
  • Example of an event which was not recognized: “will be going to Holland”. No date or person’s name was included in the phrase describing such an event. In research conducted by Creswell et al., the main problems in extracting information from chats are attributed to the “noisy” structure of a chat conversation: misspellings and non-standard use of orthography, punctuation and grammar present difficulties for generic information extraction engines.
  • Semantic chat is a topic which is still being developed; thus, several of the works found were either partially developed and no longer continued, still being developed to improve their functionality, or not evaluated. ConChat: a context-aware chat program which improves electronic communication by presenting contextual information. ConChat: context in this application is accessible either via a first order predicate having four arguments (Context Type, Subject, Relater and Object) or via Boolean algebra. In the case of time and date formats (semantic ambiguities), SemChat caters for them in a different manner from ConChat, since several JAPE rules were implemented to recognize the different types of date formats that can be used within a chat conversation.
  • GaChat: appends all related information about the dialogue text between its users. GaChat: this additional data is automatically displayed on the chat windows of both the user and the sender of the message. GaChat: helps in reducing elements of ambiguity, such as the need to search for, or ask about, the details of a particular phrase. In SemChat, the user has the option to seek further information from Wikipedia about each extracted concept.
  • SAM: Semantic Aware Instant Messaging. Extends the BuddySpace chat client with semantic annotations, semantic search, semantic browsing and semantic meta-data communication. Taxonomy panel: annotation of messages helps outline which messages are more relevant than others. Semantic querying by attribute, e.g. search by date. SemChat: we extend Spark IM, which is also an XMPP (Extensible Messaging and Presence Protocol) client. SemChat: semantic annotations of concepts extracted from a chat conversation. SemChat: semantic search feature based on the concepts that are annotated by the user. On the other hand, in SemChat we store extracted concepts within NEPOMUK’s PIMO and events in an event scheduler, making SemChat more versatile and in line with PIM tools.
  • Semanta (Simon Scerri): a plug-in to two popular email clients. It is not directly related to semantic chat but has some similarities to SemChat. The architecture behind SemChat was inspired by Semanta. Semanta introduced the message concept to the semantic desktop so that it would be able to link people, projects, events and tasks together.
  • SmartMail: Simon Corston-Oliver, Eric Ringger, Michael Gamon and Richard Campbell (Microsoft Research). Summary of a message: consists of a list of action items extracted from the message. SmartMail performs a superficial analysis of an email message to distinguish the header, the message body (containing the new message content) and forwarded sections. 1) SmartMail breaks the message body into sentences; 2) it then determines the speech act of each sentence by consulting a machine-learned classifier; 3) if a sentence is classified as a task, it performs linguistic processing to reformulate the sentence as a task description; 4) the task description is then presented to the user. SemChat: able to extract events from chat conversations, which are manually annotated by the user and stored within Spark’s task list scheduler.
  • E.g. the Lightning scheduler, which is a Thunderbird plug-in.
  • Optimize the searching process: it has to sift through many annotated concepts, and it takes some time to find all the semantic relations between the concepts satisfying the search criteria. Chat transcript feature: the user would better recall the context within which a particular concept was mentioned during a chat conversation. Quantitative evaluation: users are assigned a set of tasks to be conducted first on a normal chat client and then on SemChat, to establish the costs and benefits of using a semantic chat client.
  • Currently the only slang that SemChat caters for is related to events that are mentioned in chats, e.g. ‘mtg’ for meeting and ‘gotta’ for got to.
  • This is our initial effort. Each and every annotated concept is directly stored in the user’s PIMO, thus offering the possibility of linking important resources found on the user’s desktop with the concepts extracted from chat conversations. The area is relatively young, and is still being developed and researched. We are confident in labelling SemChat as successful, even though it is still a prototype, novel in its kind, and requires further work to be realized as a complete PIM tool. Difficulties were related to this novelty, since we found it difficult to find similar practical projects which could have faced and overcome problems similar to the ones we had. In conclusion, further development in this area is a must, and evolution in the domain of semantic chat continues.

Presentation Transcript

  • SemChat: Extracting Personal Information from Chat Conversations
    By Keith Cortis & Charlie Abela
  • Instant Messaging (IM) - communication in real time where messages are transferred in a seemingly peer-to-peer manner
    Increase in the fragmentation of personal information
    Several tools developed to aid users in the management of their personal information space
    Introduction
  • Vision behind Semantic Desktop (SD) - tackling the difficulties when managing personal information
    Research - towards this area & extraction of semantics from chat conversations
    Improve PIM by linking the different content found on the desktop with the extracted semantics
    Introduction (cont)
  • Exploiting and extending NEPOMUK’s Social Semantic Desktop framework with a semantic chat client component, ‘SemChat’
    Extraction and annotation of important concepts from a chat conversation
    Storage of any concepts that were not annotated, for reference in future SemChat sessions
    Aims and Objectives
  • Semantic search for specific concepts (incl. events) in different ways, for example by date
    Ability to use this plug-in from different chat clients achievable by using a client that can handle multiple protocols
    Aims and Objectives (cont)
  • General Architecture
    SemChat
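The processing flow behind this architecture (a chat session closes, key phrases are extracted, phrases already held in the PIMO are dropped, and the remainder are typed as entities) can be sketched roughly as follows. This is only an illustration under stated assumptions: `TOY_GAZETTEER`, `extract_key_phrases`, `classify` and `process_closed_session` are hypothetical stand-ins and do not reflect the real XtraK4Me, ANNIE or NEPOMUK interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    text: str
    entity_type: str


# Hypothetical toy lookup standing in for both the key phrase
# extractor and the named entity recogniser (assumption, not a real API).
TOY_GAZETTEER = {"Liverpool": "Location", "Keith": "Person"}


def extract_key_phrases(transcript):
    """Stand-in for key phrase extraction: known names found in the text."""
    return [name for name in TOY_GAZETTEER if name in transcript]


def classify(phrase):
    """Stand-in for entity recognition: look the phrase up in the toy table."""
    return TOY_GAZETTEER.get(phrase, "Unknown")


def process_closed_session(transcript, pimo):
    """Run once a chat session has ended, never while the user is chatting."""
    phrases = extract_key_phrases(transcript)
    # Keep only phrases that are not already stored in the user's PIMO.
    new_phrases = [p for p in phrases if p not in pimo]
    return [Concept(p, classify(p)) for p in new_phrases]


pimo = {"Keith"}  # concepts the user has already saved
suggestions = process_closed_session("Keith is flying to Liverpool on Friday", pimo)
print(suggestions)
```

Running the extraction only after the session closes is what keeps the sketch in line with the non-intrusive design: the user sees the suggestions as a list afterwards and decides which ones to annotate.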
  • NEPOMUK – allows user to manage all data found on her desktop and to link the documents within the PIMO
    Spark IM – XMPP chat client that satisfied our needs
    Spark IM – enhanced with multiprotocol functionality via the availability of an XMPP server
    SemChat – Technologies
  • End of chat session - non-intrusive system
    Cost of interruptions: on average, users take 10-15 minutes to return their focus to the disrupted task
    SemChat – Concept Extraction
  • Context menus used to represent operations that a user can do, for each extracted concept
    SemChat – Concept Extraction (cont)
  • SemChat – Concept Extraction (cont)
  • JAPE rules implemented – to recognize possible events within a chat conversation using regular expressions in annotations
    SemChat – Concept Extraction (cont)
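  • Rule: EventRule
    // Match any text annotated by the event_trigger gazetteer lookup
    (
    { Lookup.majorType == event_trigger }
    ):eventTrigger
    -->
    {
    // Annotations bound to the eventTrigger label on the left-hand side
    AnnotationSet matchedAnns = (AnnotationSet) bindings.get("eventTrigger");
    FeatureMap newFeatures = Factory.newFeatureMap();
    newFeatures.put("rule", "EventRule");
    // Add an EventTrigger annotation spanning the matched text
    outputAS.add(matchedAnns.firstNode(), matchedAnns.lastNode(),
    "EventTrigger", newFeatures);
    }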
    SemChat – Concept Extraction (cont)
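As a rough illustration of what the event rules look for, the sketch below flags a message as a candidate event when a trigger phrase occurs together with a time or a weekday. It is written in Python with plain regular expressions rather than JAPE, and the patterns and the `find_event` helper are assumptions made for illustration only; the actual SemChat grammar also uses person names and more date formats.

```python
import re

# Illustrative patterns only (assumptions, not the SemChat grammar).
TRIGGER = re.compile(r"\b(?:meeting|meet)\s+(?:at|with)\b", re.IGNORECASE)
TIME = re.compile(r"\b\d{1,2}[:.]\d{2}(?:\s?(?:am|pm))?", re.IGNORECASE)
WEEKDAY = re.compile(
    r"\b(?:mon|tues|wednes|thurs|fri|satur|sun)day\b", re.IGNORECASE
)


def find_event(message):
    """Return the matched parts of a candidate event, or None."""
    trigger = TRIGGER.search(message)
    if trigger is None:
        return None  # no trigger phrase, so nothing to anchor an event on
    time = TIME.search(message)
    day = WEEKDAY.search(message)
    if time is None and day is None:
        return None  # a trigger alone is not enough to schedule anything
    return {
        "trigger": trigger.group(0),
        "time": time.group(0) if time else None,
        "day": day.group(0) if day else None,
    }


print(find_event("Meeting with Charlie on Friday at 11.30am"))
print(find_event("will be going to Holland"))
```

The second call returns `None`, which mirrors the limitation reported in the evaluation: a phrase with no trigger, date or time does not conform to the expected structure and is therefore not picked up.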
  • SemChat – Concept Extraction (cont)
    Title and prospective date of the extracted event can be edited by the user
    Annotated event will automatically be saved within Spark’s Task List
  • User can filter out a search by several criteria for example by date
    SemChat – Semantic Search
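The idea of filtering past annotations by criteria such as date can be sketched as below. SemChat itself stores annotations in an RDF store and queries them with SPARQL through Jena; this sketch swaps that for a plain in-memory list purely to show the filtering logic, and the `ANNOTATIONS` records and `search` function are hypothetical.

```python
from datetime import date

# Hypothetical in-memory stand-in for the annotated concepts and events.
ANNOTATIONS = [
    {"label": "Liverpool", "type": "Location", "annotated_on": date(2010, 10, 11)},
    {"label": "Project meeting", "type": "Event", "annotated_on": date(2010, 10, 11)},
    {"label": "Lisbon", "type": "Location", "annotated_on": date(2010, 10, 12)},
]


def search(annotations, annotated_on=None, concept_type=None):
    """Return labels of annotations matching every criterion that was given."""
    results = []
    for item in annotations:
        if annotated_on is not None and item["annotated_on"] != annotated_on:
            continue
        if concept_type is not None and item["type"] != concept_type:
            continue
        results.append(item["label"])
    return results


print(search(ANNOTATIONS, annotated_on=date(2010, 10, 11)))
print(search(ANNOTATIONS, concept_type="Location"))
```

Criteria left as `None` are ignored, so the same function covers a search by date, by type, or by both.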
  • No formal evaluation was performed on any of the semantic chat clients’ projects that we considered in the related works section
    A session was organized where 8 users tried out SemChat
    6-12 participants are enough to test the usability of a system (Dumas and Redish)
    Evaluation
  • The feature for extracting concepts from chat conversations proved to be a popular choice
    Semantic search feature proved to be less popular with several users
    Majority of users experienced the extraction of concepts and/or events from their chat conversation
    Evaluation - Main Findings
  • All extracted concepts/events annotated by users were successfully stored in the PIMO and Task List respectively
    In some cases important concepts flagged within a conversation were not extracted
    Problem – XtraK4Me selects most important key phrases ordered by occurrence rate
    Evaluation - Main Findings (cont)
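Why occurrence-based ranking can drop an important concept is easy to see in a small sketch. The code below is not XtraK4Me's algorithm; it is a simplified count-based ranking, with a hypothetical `top_key_phrases` helper, that reproduces the reported effect.

```python
from collections import Counter


def top_key_phrases(phrases, limit):
    """Rank candidate phrases by how often they occur and keep the top few."""
    counts = Counter(phrases)
    return [phrase for phrase, _ in counts.most_common(limit)]


# Candidate phrases in the order they appeared in a conversation.
mentions = ["football", "Liverpool", "football", "match", "match", "football"]
print(top_key_phrases(mentions, 2))
```

Here "Liverpool" is mentioned only once, so it falls below the cut-off even though it may be the concept the user cares about most.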
  • Problem could be addressed by improving XtraK4Me or possibly by using a better key phrase extractor
    Limitation – some events not extracted since they didn’t conform to the structure that SemChat was implemented to recognize
    Possible solution – further extend ANNIE NER to recognize all possible types of events that can be present within a chat conversation
    Evaluation - Main Findings (cont)
  • Context-aware chat program
    Tries to solve semantic conflicts which occur between chatting users through the tagging of ambiguous chat messages
    Solves part of this problem and is a step forward towards eliminating semantic conflicts which occur in chat sessions
    Related Work - ConChat
  • Morphological analysis used to extract proper nouns from the dialogue text
    Online images and articles from Wikipedia related to the extracted nouns are simultaneously displayed alongside the dialogue text
    Helps in reducing the elements of ambiguity like searching
    Related Work - GaChat
  • Identify and improve problems that IM systems encounter moving towards the Networked Semantic Desktop
    Chat window offers a taxonomy panel where annotation of messages is permitted whilst a user is chatting
    Semantic Querying - search of messages wanted by specifying a particular attribute
    Related Work - SAM
  • System uses existing email transport technology
    Is integrated with NEPOMUK
    Handles and keeps track of action items within email messages
    Extracts tasks and appointments found within email messages which are then added to the email client’s scheduler
    Information Extraction within emails: Semanta
  • Prototype system
    Automatically identifies action items (tasks) in email messages
    Presents user with a task-focused summary of a message
    User can add action items to their “to do” list
    Information Extraction within emails: SmartMail
  • Integration of SemChat with popular applications such as an email client like Thunderbird
    Extracted events would be logged automatically into the client’s event scheduler
    Extend ANNIE NER through JAPE so that other entities could be extracted from conversations such as: emails, products, etc.
    Future Work
  • Semantic search feature – further optimize the searching process
    Semantic search feature – further enhanced to display part of chat transcript satisfying the search criteria
    Semantic annotations generated by SemChat – quantitatively evaluated in the future
    Future Work (cont)
  • Investigate slang language in IM in more depth so that SemChat can be adapted to handle it
    Ex. : “mt b4 lunch @11.30am nxttue”
    We can further extend ANNIE NER with JAPE to be able to recognize such an event
    ‘mt b4’ as being ‘meet before’ and ‘nxttue’ as being ‘next Tuesday’
    Future Work (cont)
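One simple way to picture this slang handling is a lookup table applied before event recognition. The slides propose extending ANNIE NER with JAPE instead, so the sketch below is a different and much cruder technique; the `SLANG` table and `normalise` function are hypothetical, with entries taken from the examples in this talk.

```python
# Hypothetical slang table built from the examples mentioned in the talk.
SLANG = {
    "mt": "meet",
    "b4": "before",
    "nxttue": "next Tuesday",
    "mtg": "meeting",
    "gotta": "got to",
}


def normalise(message):
    """Replace known slang tokens, leaving every other token untouched."""
    return " ".join(SLANG.get(token.lower(), token) for token in message.split())


print(normalise("mt b4 lunch @11.30am nxttue"))
```

A token-level lookup like this only handles slang that is separated by spaces; fused or misspelled forms would need the pattern-based rules that the future work describes.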
  • We have presented a semantic chat component in SemChat which was integrated with a SSD application – NEPOMUK
    SemChat contributes further to area of PIM through the integration of concepts in the user’s PIMO and the integration of events within an events scheduler
    SemChat also reflects the research being done in the area of the SD in relation to Semantic Chat
    Conclusion
  • Thank you for your attention!
    Any Questions?