A Noun Phrase Analysis Tool for Mining Online Community Conversations

    Presentation Transcript

    1. A Noun Phrase Analysis Tool for Mining Online Community Conversations Caroline Haythornthwaite <haythorn@uiuc.edu> Anatoliy Gruzd <agruzd2@uiuc.edu>
    2. Problem
      • Online communities are creating a growing volume of texts
        • By 2010, more than 70% of all digital information will be user-generated (technology consultancy IDC).
        • And the majority of it will still be text-based
      • How can we analyze and make sense of such a vast amount of textual data?
      • Possible solution:
        • Automated text analysis
    3. Research Questions
      • Can we automate the process of creating an “effective representation” of the texts produced by online communities?
      • A “representation” is an excerpt [of the original text] that describes or may stand in for the original text.
      • An “effective” representation can help us to answer:
        • Can we discover what the community interests and priorities are?
        • Can we discover patterns of language and interaction that characterize a community?
    4. Why Nouns & Noun Phrases?
      • Nouns and noun phrases tend to be the most informative elements of any sentence
      • Noun phrases make it easy to disambiguate the meaning of words
        • e.g. ‘travel information’, ‘information center’, ‘information management’
      • Noun phrase extraction is easy and cheap to accomplish with today’s Natural Language Processing (NLP) techniques
        • NLP is a set of computational techniques for processing natural (human) languages
    5. Example of Noun-Phrase Extraction
      • “We will have minivan during the conference to help shuttle attendees from the other hotels to and from the Kellogg Center.”
      • Step 1. Part-of-Speech Tagging
      • <We/PRP> <will/MD> <have/VB> <minivan/NN> <during/IN> <the/DT> <conference/NN> <to/TO> <help/VB> <shuttle/VB> <attendees/NNS> <from/IN> <the/DT> <other/JJ> <hotels/NNS> <to/TO> <and/CC> <from/IN> <the/DT> <Kellogg/NNP> <Center/NNP>
      • Step 2. Chunking
      • <We/PRP> <will/MD> <have/VB> <minivan/NN> <during/IN> <the/DT> <conference/NN> <to/TO> <help/VB> <shuttle/VB> <attendees/NNS> <from/IN> <the/DT> (NP: <other/JJ> <hotels/NNS>) <to/TO> <and/CC> <from/IN> <the/DT> (NP: <Kellogg/NNP> <Center/NNP>)
      • Representation
      • <We/PRP> <will/MD> <have/VB> (NP: <minivan/NN>) <during/IN> (NP: <the/DT> <conference/NN>) <to/TO> <help/VB> <shuttle/VB> (NP: <attendees/NNS>) <from/IN> (NP: <the/DT> <other_hotels/NNS>) <to/TO> <and/CC> <from/IN> (NP: <the/DT> <Kellogg_Center/NNP>)
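The chunking step on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the presenters' implementation: the chunk rule (skip an optional determiner, then take any adjectives followed by one or more nouns) and the names `TAGGED`, `is_noun`, and `extract_noun_phrases` are assumptions made for the example. The part-of-speech tags are copied from the slide rather than produced by a tagger.

```python
# Hypothetical sketch: pull noun phrases out of pre-tagged tokens.
# Assumed rule (not taken from the slides): optional DT, any JJ, one or more NN*.

TAGGED = [
    ("We", "PRP"), ("will", "MD"), ("have", "VB"), ("minivan", "NN"),
    ("during", "IN"), ("the", "DT"), ("conference", "NN"), ("to", "TO"),
    ("help", "VB"), ("shuttle", "VB"), ("attendees", "NNS"), ("from", "IN"),
    ("the", "DT"), ("other", "JJ"), ("hotels", "NNS"), ("to", "TO"),
    ("and", "CC"), ("from", "IN"), ("the", "DT"), ("Kellogg", "NNP"),
    ("Center", "NNP"),
]


def is_noun(tag):
    """True for NN, NNS, NNP, NNPS."""
    return tag.startswith("NN")


def extract_noun_phrases(tagged):
    """Return noun phrases as underscore-joined strings, without determiners."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":              # optional determiner, not kept
            j += 1
        content_start = j
        while j < n and tagged[j][1] == "JJ":  # zero or more adjectives
            j += 1
        noun_start = j
        while j < n and is_noun(tagged[j][1]):  # one or more nouns
            j += 1
        if j > noun_start:                     # at least one noun was found
            phrases.append("_".join(word for word, _ in tagged[content_start:j]))
            i = j
        else:                                  # no phrase starts here
            i += 1
    return phrases


print(extract_noun_phrases(TAGGED))
```

Run on the tagged sentence above, the rule yields the same five units shown under "Representation": minivan, conference, attendees, other_hotels, and Kellogg_Center.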
    6. Examples of Part-of-Speech Tags Commonly Used in NLP
    7. Examples of Open-source NLP Toolkits
      • NLTK - http://nltk.sourceforge.net
      • LingPipe - http://www.alias-i.com/lingpipe/
      • MII NLP Toolkit - http://www.mii.ucla.edu/nlp/
      • OpenNLP - http://opennlp.sourceforge.net/
      WARNING! Advanced knowledge of computational linguistics and programming skills required!
    8. ICTA: Internet Community Text Analyzer
      • Data source: bulletin board messages
      • Course name: Information Organization and Access
      • No. of messages per class: 1200 - 2100
      • No. of students per class: 31 - 54
      • Duration of each class: 15 weeks
      • School years: 2001 – 2004
      • Classes: 8
    9. Preliminary Exploration of the dataset using ICTA
      • Most frequently used words
      • Important Topics Over Time
      • Community Style
      • Community Support
    10. Preliminary Exploration 1. Most frequently used words
      • Profession-related words
        • book/s, information, library/libraries, librarian/s
        • user/s, and patron/s, people
        • database/s, search, document/s
      • Learning-related words
        • question/s, article/s, example/s, way, study, class, course, research, journal, reading, method, problem, hard time
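A frequency list like the one on this slide could be produced by counting extracted nouns and noun phrases across messages. The sketch below is an assumption about how such a count might work, not a description of ICTA itself. The helper names `fold_plural` and `count_phrases` are hypothetical, the plural folding is deliberately naive (it would mangle words such as "analysis"), and the sample input is made up for illustration rather than taken from the course data.

```python
from collections import Counter


def fold_plural(word):
    """Naive singular/plural folding so that book/books count together."""
    if word.endswith("ies"):
        return word[:-3] + "y"          # libraries -> library
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # books -> book, but class stays class
    return word


def count_phrases(phrases_per_message):
    """Count lower-cased, plural-folded phrases over all messages."""
    counts = Counter()
    for phrases in phrases_per_message:
        counts.update(fold_plural(p.lower()) for p in phrases)
    return counts


# Made-up input: one list of extracted phrases per message.
sample = [
    ["library", "Books"],
    ["libraries", "database"],
    ["book", "databases", "class"],
]
counts = count_phrases(sample)
print(counts.most_common(3))
```

Folding singular and plural forms mirrors the way the slide reports pairs such as book/s and library/libraries as a single entry.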
    11. Preliminary Exploration 2. Important Topics Over Time
    12. Preliminary Exploration 3. Community Style
    13. Preliminary Exploration 4. Community Support
    14. Future Work
      • Exploring the use of other word classes (e.g. verbs)
      • Training NLP algorithms on CMC-type (computer-mediated communication) corpora
      • Automatic grouping of noun phrases into concepts (with manual override)
        • e.g. RDB, database, relational database
      • Connecting ICTA to external textual data from other public online communities (e.g. blogs, MySpace)
        • RSS feeds / web APIs
      • Understanding the social science of language use in online communities
      • Collaboration with the NCSA’s DISCUS project
        • concept maps
        • social networks
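The "grouping of noun phrases into concepts (with manual override)" item above can be pictured as a lookup table in which manually supplied entries take precedence over automatically proposed ones. The sketch below only illustrates that idea: `build_concept_map` and `to_concept` are hypothetical names, and how the automatic groups would actually be derived is left open, since the slides do not say.

```python
def build_concept_map(auto_groups, manual_overrides=None):
    """Map each phrase variant to a concept label; manual entries win."""
    mapping = {}
    for concept, variants in auto_groups.items():
        for variant in variants:
            mapping[variant.lower()] = concept
    for phrase, concept in (manual_overrides or {}).items():
        mapping[phrase.lower()] = concept   # applied last, so it overrides
    return mapping


def to_concept(phrase, mapping):
    """Return the concept for a phrase, or the phrase itself if unmapped."""
    return mapping.get(phrase.lower(), phrase.lower())


# Grouping taken from the slide's example; the concept label is an assumption.
auto = {"database": ["RDB", "database", "relational database"]}

plain = build_concept_map(auto)
overridden = build_concept_map(auto, {"RDB": "relational database"})

print(to_concept("RDB", plain))       # database
print(to_concept("RDB", overridden))  # relational database
```

Keeping the overrides in a separate dictionary means an analyst's corrections survive whenever the automatic grouping is recomputed.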

    Dalhousie University, Canada, 3 years ago

    © All Rights Reserved