SemChat: Extracting Personal Information from Chat Conversations (EKAW 2010)

This paper was presented at the 1st Workshop on Personal Semantic Data (PSD 2010: http://semanticweb.org/wiki/Personal_Semantic_Data) at EKAW 2010 (http://ekaw2010.inesc-id.pt/), the Conference on Knowledge Engineering and Knowledge Management by the Masses, in Lisbon, Portugal, on 11 October 2010.

The full paper can be found at: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-629/psd2010_paper2.pdf

  • -As a brief introduction...
    -The Internet has brought about a radical change in the way people interact
    -Online communities have flourished, first fueled by electronic mail (e-mail), and nowadays complemented by instant messaging (IM)
    -IM was developed in 1993 as a faster alternative to e-mail and has become very popular over the last couple of years
    -This led to an increase...
  • Semantic Desktop focuses on the provision of a personal information management (PIM) system that integrates and presents relevant content within the user’s desktop in a manageable and practical fashion / user-relevant manner
    Concepts are the most important forms of semantics.
    A concept can be a location, person, organization, date or event (e.g. a meeting mentioned during a conversation)
    Quick overview structure of talk: Aims and Objectives, SemChat, Evaluation, Related Works to our research, Future Work, Conclusions
  • PIMO - contains all concepts, resources and relationships found on a user’s desktop.
    A concept can be a location, person, organization, date or event
    Extracted concepts which have not already been stored within the user’s PIMO
    Annotation of concepts deemed relevant to the user, within the user’s PIMO
    An event refers to a meeting mentioned during a conversation
    An option to annotate such events should be available in the task/Event scheduler
    An event scheduler notifying users when forthcoming saved events are due




  • -All concepts and events (from different users) related to the search criterion are displayed to the user in a comprehensive manner
    -Create a chat client plug-in which includes all the functionality mentioned
    Extraction of concepts and events is done once the user ends a chat conversation session
    Multiple protocols such as AOL IM, ICQ, MSN Messenger, Yahoo! and IRC, amongst many others


  • -General architecture and its main components
    -Motivation behind this architecture partly came from work performed on Semanta and SemNotes, which are applications that also exploit the ideas behind SD and SSD. The former is a semantic email component while the latter is a note-taking tool, and both integrate closely with NEPOMUK
    Once the closure of a chat session or a chat room is detected – SemChat starts its processing
    Chat conversation of both users is retrieved and passed to the XtraK4Me key phrase extractor
    XtraK4Me key phrase extractor used to identify the main key words (concepts and events) from it and find the ones which are not already stored within the user’s PIMO in NEPOMUK
    GATE’s ANNIE was used for identifying the entities of the extracted key phrases
    SemChat searches for any possible events within the chat conversation via the use of ANNIE NER
    Annotated concepts stored in user’s PIMO whilst annotated events stored in Spark IM task scheduler
    Knowledge obtained from chat conversations can be exposed and linked to the knowledge found on the user’s desktop through NEPOMUK
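The processing steps in this note can be sketched in Python as follows. This is only an illustrative outline under stated assumptions, not SemChat's actual code: `process_closed_session`, `toy_key_phrases` and `toy_tagger` are hypothetical names standing in for the XtraK4Me extractor and ANNIE, and the PIMO is reduced to a plain set of concept names.

```python
def process_closed_session(transcript, pimo_concepts, extract_key_phrases, tag_entity):
    """Return (phrase, entity_type) suggestions for concepts not yet in the PIMO."""
    known = {name.lower() for name in pimo_concepts}
    suggestions = []
    for phrase in extract_key_phrases(transcript):
        if phrase.lower() in known:
            continue  # already stored in the user's PIMO, so nothing to suggest
        suggestions.append((phrase, tag_entity(phrase)))
    return suggestions


# Hypothetical stand-in for the key phrase extractor: keeps capitalised words.
def toy_key_phrases(transcript):
    return [word.strip(".,!?") for word in transcript.split() if word[:1].isupper()]


# Hypothetical stand-in for the named entity recognition step.
def toy_tagger(phrase):
    return {"Liverpool": "Location", "Lisbon": "Location"}.get(phrase, "Unknown")


suggestions = process_closed_session(
    "are you going to Liverpool after Lisbon?",
    pimo_concepts={"Lisbon"},
    extract_key_phrases=toy_key_phrases,
    tag_entity=toy_tagger,
)
print(suggestions)
```

Passing the extractor and tagger in as functions keeps the sketch independent of any particular tool; in SemChat the suggestions are only stored once the user chooses to annotate them.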



  • -XMPP : Extensible Messaging and Presence Protocol 
    -Spark IM ideal candidate for SemChat - open source cross platform IM client and could be further extended through plug-in development
    -multiprotocol chat client possibility to connect to multiple chat protocols from within the same client
    -Openfire server - XMPP server used which provides the possibility of using several IM networks such as GaduGadu, Yahoo!, AIM, MSN, IRC, ICQ, GTalk and XMPP
  • -non-intrusive – no notifications displayed whilst user is still chatting
    -Iqbal and Horvitz – Interruption management : disrupted task in our case would be the current chat activity
    -Once concept extraction process is complete the user is presented with a notification linked to a list of extracted concepts which is displayed in a separate tab
    -Intention behind this feature is to make the whole process less disruptive and distracting
    -All concepts are displayed according to the date of when they were extracted, in descending order

  • -Context Menu: If a user right clicks on the Concept’s name, a context menu is displayed showing three options that a user can choose from
    -1st option – to save the concept within the user’s PIMO in NEPOMUK
    -2nd option - to delete the concept from the list
    -3rd option - to retrieve more information about the concept, where the first paragraph from Wikipedia containing information about that concept (if any is found) is displayed in a separate pop-up window

  • -Liverpool Concept stored in user’s PIMO
  • -ANNIE NER isn’t able to recognize Event entities, therefore we extended it to be able to do so
    -JAPE (Java Annotation Patterns Engine) provides finite state transduction over annotations, based mainly on regular expressions
    -This grammar is used to add resources which can already be found in GATE
    -Implemented JAPE rules look for different kinds of text sequences, such as phrases that may indicate a possible meeting, and different types of dates and times.
    -Extracted events are also represented beneath each other according to the date of their extraction
    -User has the possibility to edit both the title of an extracted event, and also the prospective date
    -Any annotated event will automatically be saved within Spark’s Task List


  • JAPE rule : implemented to look up phrases which might indicate a possible event within a conversation
    Examples of possible events catered for:
    --event_trigger grammar + time + person
    --event_trigger grammar + date + person
    --event_trigger grammar + person + time
    --event_trigger grammar + person + date
    --event_trigger grammar + date and time + person
    --event_trigger grammar + person + date and time
    --event_trigger grammar + date and time
    The EventRule rule will match any text that is an annotation of the event_trigger grammar
    event_trigger grammar: "meeting at", "meeting with", etc.
    When rule matches a sequence of text, the whole sequence is allocated a label by the rule, in our case this is eventTrigger
    Therefore we say that this sequence of text will be given an annotation of type EventTrigger (name of new ‘Event’ class) and a rule feature set to EventRule



  • -edit of date possible by selecting an appropriate day and/or month from their dropdown menu and edit the year if required
  • -this feature helps user to retrieve any of the past annotated concepts
    -User doesn’t need to go through whole transcript to find any previously annotated concepts incl. events
    -User is presented with any semantically related concepts that satisfy the search criteria
    -Jena framework - chosen as an RDF store for SemChat since it is aimed at building semantic web applications
    -Jena supports SPARQL, a query language that can be used for querying an RDF store with ease
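To illustrate the idea of searching annotated concepts by date, here is a minimal Python sketch of triple-pattern matching over an in-memory list. It does not use Jena or SPARQL; the `ex:` vocabulary, the sample triples and the function names are invented for the example, and a SPARQL query would express the same pattern declaratively.

```python
from datetime import date

# Hypothetical (subject, predicate, object) triples; the vocabulary is made up.
TRIPLES = [
    ("ex:Liverpool", "ex:type", "ex:Location"),
    ("ex:Liverpool", "ex:extractedOn", date(2010, 10, 11)),
    ("ex:ProjectMeeting", "ex:type", "ex:Event"),
    ("ex:ProjectMeeting", "ex:extractedOn", date(2010, 10, 11)),
    ("ex:Lisbon", "ex:type", "ex:Location"),
    ("ex:Lisbon", "ex:extractedOn", date(2010, 10, 12)),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def concepts_extracted_on(triples, day):
    """Names of all concepts (including events) extracted on the given day."""
    return sorted(s for s, _, _ in match(triples, predicate="ex:extractedOn", obj=day))


print(concepts_extracted_on(TRIPLES, date(2010, 10, 11)))
```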
  • -To our knowledge, no formal evaluation has been conducted on the semantic chat clients considered in the related work
    -Previous research (Dumas and Redish, 1999) outlined that 6-12 participants are enough to test the usability of a system
    -Past tests have shown that this amount is normally enough to come up with certain conclusions after the evaluation of the system is complete
    -Session was split in 3 parts:
    --Part 1 – walkthrough of SemChat to show how it can be used and what its main features are
    --Part 2 – involved each user evaluating SemChat by chatting with another user for approximately 20 minutes
    --Part 3 – each user had to fill in a feedback form which targeted several aspects of the system
  • -We were able to identify limitations as well as possible improvements to our system
    -Search feature: users did not see the need to search for any past annotated concepts
    -Search feature: users may not have been accustomed to search within chat conversations
    -Search feature: users may have been unaware of the potential that a semantic search facility can offer, possibly due to limited search facilities offered by most chat clients
    -Search feature: there was a high level of satisfaction amongst the users who used the semantic search facility
  • -It took between 3 and 5 seconds to process a conversation of approximately 20 minutes
    -If certain important key phrases are not repeatedly mentioned, they are deemed not relevant within the scope of the conversation
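The limitation described in this note follows directly from ranking by occurrence. The sketch below is not XtraK4Me's algorithm; it is a bare frequency count in Python, with an assumed stopword list and a made-up transcript, showing how a concept mentioned only once falls outside the top-ranked phrases.

```python
import re
from collections import Counter

# Assumed, minimal stopword list for the example only.
STOPWORDS = {"the", "a", "to", "is", "and", "we", "for", "are", "on", "in"}


def top_key_words(transcript, k=2):
    """Rank words purely by how often they occur, ignoring stopwords."""
    words = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


transcript = (
    "Liverpool match tonight. Are we going to the Liverpool match? "
    "Tickets for Liverpool are on sale. Holland trip next."
)
top = top_key_words(transcript)
print(top)
```

Here "holland" appears once and is dropped, even though the user may consider it important.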
  • -Example of an event which was not recognized: “will be going to Holland” – no date or person’s name was included in the phrase
    -In research conducted by Creswell et al., the main problems in extracting information from chats are attributed to the “noisy” structure of a chat conversation: misspellings and non-standard use of orthography, punctuation and grammar present difficulties for generic information extraction engines
  • -Semantic Chat: a topic which is still being developed; several of the works found were only partially developed and then discontinued, are still being developed to improve their functionality, or have not been evaluated.

    -ConChat: Context-aware chat program which improves electronic communication by presenting contextual information
    ConChat: Context in this application is accessible either via a first order predicate having four arguments being: Context Type, Subject, Relater and Object, or via Boolean algebra
    In the case of time and date formats (semantic ambiguities), SemChat caters for them in a different manner from ConChat since several JAPE rules were implemented to recognize different types of date formats that can be used within a chat conversation.


  • -GaChat: Appends all related information about dialogue text between its users
    -GaChat: This additional data is automatically displayed on the chat windows of both user and sender of the message
    -GaChat: Helps reduce sources of ambiguity, such as having to search for, or ask about, the details of a particular phrase
    -In SemChat, the user has the option to seek further information from Wikipedia about each extracted concept



  • - SAM: Semantic Aware Instant Messaging
    Extends the BuddySpace chat client by semantic annotations, semantic search, semantic browsing and semantic meta-data communication
    Taxonomy panel: annotation of messages helps outline which messages are more relevant or not
    Semantic Querying – attribute ex. Search by date
    -SemChat: we extend Spark IM, which is also an XMPP (Extensible Messaging and Presence Protocol) client
    -SemChat: semantic annotations of concepts extracted from a chat conversation
    -SemChat: semantic search feature based on the concepts that are annotated by the user
    -(on the other hand in) SemChat: we store extracted concepts within NEPOMUK’s PIMO and events in an event scheduler making SemChat more versatile and in line with PIM tools



  • -Simon Scerri
    -Semanta: Plug-in to two popular email clients
    -Not directly related to semantic chat
    -Has some similarities to SemChat
    -Architecture behind SemChat was inspired by Semanta
    -Semanta: introduced the message concept to the semantic desktop so that it would be able to link people, projects, Events and tasks together
  • -Simon Corston-Oliver, Eric Ringger, Michael Gamon and Richard Campbell – Microsoft Research
    -Summary of message: consists of a list of action items extracted from the message
    SmartMail performs a superficial analysis of an email message to distinguish the header, message body (containing the new message content), and forwarded sections
    1) SmartMail breaks the message body into sentences
    2) then determines the speech act of each sentence by consulting a machine-learned classifier
    3) if a sentence is classified as a task, performs linguistic processing to reformulate the sentence as a task description
    4) task description then presented to user

    SemChat: able to extract events from chat conversations which are manually annotated by the user and stored within Spark’s task list scheduler
  • -ex. Lightning scheduler which is a Thunderbird plug-in
  • -optimize searching process: since it has to sift through many annotated concepts and it takes some time to find all the semantic relations between the concepts satisfying the search criteria
    -chat transcript feature: user would better recall the context within which a particular concept was mentioned during a chat conv.
    -quantitative evaluation: users assigned a set of tasks that will be conducted first on a normal chat client and then on SemChat – to establish the costs and benefits of using a semantic chat client
  • -Currently the only slang that SemChat caters for is related to events that are mentioned in chats for ex. ‘mtg’ – meeting and ‘gotta’ – got to
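One simple way to handle such slang is to normalise the message before event recognition. The Python sketch below is an assumption about how this could be done, not how SemChat does it: the `SLANG` table and `normalise` function are hypothetical, with the first two entries taken from this note and the remaining ones from the future-work example "mt b4 lunch @11.30am nxt tue".

```python
import re

# Hypothetical lookup table; entries come from the examples in this talk.
SLANG = {
    "mtg": "meeting",
    "gotta": "got to",
    "mt": "meet",
    "b4": "before",
    "nxt": "next",
    "tue": "Tuesday",
}


def normalise(message):
    """Replace known slang tokens and leave everything else untouched."""
    def swap(match):
        token = match.group(0)
        return SLANG.get(token.lower(), token)

    return re.sub(r"[A-Za-z0-9]+", swap, message)


print(normalise("mt b4 lunch @11.30am nxt tue"))
print(normalise("gotta go to mtg"))
```

A table lookup like this cannot resolve ambiguous abbreviations, which is part of why the topic is listed as future work.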
  • -Our initial effort
    -Each and every annotated concept is directly stored in the user’s PIMO, thus offering the possibility of linking important resources found on the user’s desktop with the concepts extracted from chat conversations
    -area is relatively young, and is still being developed and researched
    -We are confident in labelling SemChat as successful, even though it is still a prototype, is novel in its kind, and requires further work to be realized as a complete PIM tool
    -Our difficulties were related to this novelty, since we found it hard to find similar practical projects which had faced and overcome problems similar to the ones we had
    -In conclusion, further development in this area is a must, and evolution in the domains of semantic chat continues

  • SemChat: Extracting Personal Information from Chat Conversations (EKAW 2010)

    1. By Keith Cortis & Charlie Abela
    2. -Instant Messaging (IM) - communication in real time where messages are transferred in a seemingly peer-to-peer manner
       -Increase in the fragmentation of personal information
       -Several tools developed to aid users in the management of their personal information space
    3. -Vision behind Semantic Desktop (SD) - tackling the difficulties when managing personal information
       -Research - towards this area & extraction of semantics from chat conversations
       -Improve PIM by linking the different content found on the desktop with the extracted semantics
    4. -Exploiting and extending NEPOMUK’s Social Semantic Desktop framework with a semantic chat client component, ‘SemChat’
       -Extraction and annotation of important concepts from a chat conversation
       -Storage of any concepts that were not annotated, for reference in future SemChat sessions
    5. -Semantic search for specific concepts (incl. events) in different ways, for example by date
       -Ability to use this plug-in from different chat clients achievable by using a client that can handle multiple protocols
    6. -General Architecture
    7. -NEPOMUK – allows user to manage all data found on her desktop and to link the documents within the PIMO
       -Spark IM – XMPP chat client that satisfied our needs
       -Spark IM – enhanced with multiprotocol functionality via the availability of an XMPP server
    8. -End of chat session - non-intrusive system
       -Cost of interruptions varies on average between 10-15 minutes before users return their focus to the disrupted task
    9. -Context menus used to represent operations that a user can do, for each extracted concept
    10. -JAPE rules implemented – to recognize possible events within a chat conversation using regular expressions in annotations
    11. Rule: EventRule
        ( { Lookup.majorType == event_trigger } ):eventTrigger
        -->
        {
          // Annotations matched by the left-hand side, bound to the eventTrigger label
          AnnotationSet matchedAnns = (AnnotationSet) bindings.get("eventTrigger");
          FeatureMap newFeatures = Factory.newFeatureMap();
          newFeatures.put("rule", "EventRule");
          // Add a new EventTrigger annotation spanning the matched text
          outputAS.add(matchedAnns.firstNode(), matchedAnns.lastNode(), "EventTrigger", newFeatures);
        }
    12. -Title and prospective date of the extracted event can be edited by the user
        -Annotated event will automatically be saved within Spark’s Task List
    13. -User can filter a search by several criteria, for example by date
    14. -No formal evaluation was performed on any of the semantic chat client projects that we considered in the related works section
        -A session was organized where 8 users tried out SemChat
        -6-12 participants are enough to test the usability of a system (Dumas and Redish)
    15. -Feature for extracting concepts from chat conversations – proved to be a popular choice
        -Semantic search feature proved to be less popular with several users
        -Majority of users experienced the extraction of concepts and/or events from their chat conversation
    16. -All extracted concepts/events annotated by users were successfully stored in the PIMO and Task List respectively
        -In some cases important concepts flagged within a conversation were not extracted
        -Problem – XtraK4Me selects most important key phrases ordered by occurrence rate
    17. -Problem addressed by improving XtraK4Me or possibly using a better key phrase extractor
        -Limitation – some events not extracted since they didn’t conform to the structure that SemChat was implemented to recognize
        -Possible solution – further extend ANNIE NER to recognize all possible types of events that can be present within a chat conversation
    18. -Context-aware chat program
        -Tries to solve semantic conflicts which occur between chatting users through the tagging of ambiguous chat messages
        -Solves part of this problem and is a step forward towards eliminating semantic conflicts which occur in chat sessions
    19. -Morphological analysis used to extract proper nouns from the dialogue text
        -Online images and articles from Wikipedia related to the extracted nouns are simultaneously displayed alongside the dialogue text
        -Helps in reducing the elements of ambiguity like searching
    20. -Identify and improve problems that IM systems encounter moving towards the Networked Semantic Desktop
        -Chat window offers a taxonomy panel where annotation of messages is permitted whilst a user is chatting
        -Semantic Querying - search of messages wanted by specifying a particular attribute
    21. -System uses existing email transport technology
        -Is integrated with NEPOMUK
        -Handles and keeps track of action items within email messages
        -Extracts tasks and appointments found within email messages which are then added to the email client’s scheduler
    22. -Prototype system
        -Automatically identifies action items (tasks) in email messages
        -Presents user with a task-focused summary of a message
        -User can add action items to their “to do” list
    23. -Integration of SemChat with popular applications such as an email client like Thunderbird
        -Extracted events would be logged automatically into the client’s event scheduler
        -Extend ANNIE NER through JAPE so that other entities could be extracted from conversations such as: emails, products, etc.
    24. -Semantic search feature – further optimize the searching process
        -Semantic search feature – further enhanced to display part of chat transcript satisfying the search criteria
        -Semantic annotations generated by SemChat – quantitatively evaluated in the future
    25. -Investigate slang language in IM in more depth so that SemChat can be adapted to handle it
        -Ex.: “mt b4 lunch @11.30am nxt tue”
        -We can further extend ANNIE NER with JAPE to be able to recognize such an event
        -‘mt b4’ as being ‘meet before’ and ‘nxt tue’ as being ‘next Tuesday’
    26. -We have presented a semantic chat component in SemChat which was integrated with an SSD application – NEPOMUK
        -SemChat contributes further to the area of PIM through the integration of concepts in the user’s PIMO and the integration of events within an events scheduler
        -SemChat also reflects the research being done in the area of the SD in relation to Semantic Chat
    27. Thank you for your attention! Any questions?
