
A Noun Phrase Analysis Tool for Mining Online Community Conversations

Caroline Haythornthwaite and Anatoliy Gruzd (U of Illinois)
See the full paper at http://www.iisi.de/fileadmin/IISI/upload/C_T/2007/Haythornthwaite.pdf


Abstract: Online communities are creating a growing legacy of texts. These texts record conversation, knowledge exchange, and variation in topic and orientation as groups grow, mature, and decline; they represent a rich history of group interaction and an opportunity to explore the purpose and development of online communities. The problem is how to approach and make sense of the vast amount of data stored by these communities and to use that information for some useful outcome. In this paper we use automated processes, including natural language processing, to explore the case of text accumulated from bulletin board postings from eight iterations of an online class. The paper presents work done on creating and refining the natural language processing procedures used to examine these data, and a description of results so far from these examinations.

  1. A Noun Phrase Analysis Tool for Mining Online Community Conversations. Caroline Haythornthwaite <haythorn@uiuc.edu>, Anatoliy Gruzd <agruzd2@uiuc.edu>
  2. Problem <ul><li>Online communities are creating a growing volume of texts </li></ul><ul><ul><li>By 2010, more than 70% of all digital information will be user-generated (technology consultancy IDC) </li></ul></ul><ul><ul><li>And the majority of it will still be text-based </li></ul></ul><ul><li>How can we analyze and make sense of such a vast amount of textual data? </li></ul><ul><li>Possible solution: </li></ul><ul><ul><li>Automated text analysis </li></ul></ul>
  3. Research Questions <ul><li>Can we automate the process of creating an “effective representation” of the texts produced by online communities? </li></ul><ul><li>A “representation” is an excerpt [of the original text] that describes or may stand in for the original text. </li></ul><ul><li>An “effective” representation can help us answer: </li></ul><ul><ul><li>Can we discover what the community’s interests and priorities are? </li></ul></ul><ul><ul><li>Can we discover patterns of language and interaction that characterize a community? </li></ul></ul>
  4. Why Nouns & Noun Phrases? <ul><li>Nouns and noun phrases tend to be the most informative elements of any sentence </li></ul><ul><li>Noun phrases help disambiguate the meaning of individual words </li></ul><ul><ul><li>e.g. ‘travel information’, ‘information center’, ‘information management’ </li></ul></ul><ul><li>Noun phrase extraction is easy and cheap to accomplish with today’s Natural Language Processing (NLP) techniques </li></ul><ul><ul><li>NLP is a set of computational techniques for processing natural (human) languages </li></ul></ul>
  5. Example of Noun-Phrase Extraction <ul><li>“We will have minivan during the conference to help shuttle attendees from the other hotels to and from the Kellogg Center.” </li></ul><ul><li>Step 1. Part-of-Speech Tagging </li></ul><ul><li><We/PRP> <will/MD> <have/VB> <minivan/NN> <during/IN> <the/DT> <conference/NN> <to/TO> <help/VB> <shuttle/VB> <attendees/NNS> <from/IN> <the/DT> <other/JJ> <hotels/NNS> <to/TO> <and/CC> <from/IN> <the/DT> <Kellogg/NNP> <Center/NNP> </li></ul><ul><li>Step 2. Chunking </li></ul><ul><li><We/PRP> <will/MD> <have/VB> <minivan/NN> <during/IN> <the/DT> <conference/NN> <to/TO> <help/VB> <shuttle/VB> <attendees/NNS> <from/IN> <the/DT> (NP: <other/JJ> <hotels/NNS>) <to/TO> <and/CC> <from/IN> <the/DT> (NP: <Kellogg/NNP> <Center/NNP>) </li></ul><ul><li>Representation </li></ul><ul><li><We/PRP> <will/MD> <have/VB> (NP: <minivan/NN>) <during/IN> (NP: <the/DT> <conference/NN>) <to/TO> <help/VB> <shuttle/VB> (NP: <attendees/NNS>) <from/IN> (NP: <the/DT> <other_hotels/NNS>) <to/TO> <and/CC> <from/IN> (NP: <the/DT> <Kellogg_Center/NNP>) </li></ul>
  6. Examples of Part-of-Speech Tags Commonly Used in NLP [table: tags for the example sentence, “We” … “Center”]
  7. Examples of Open-Source NLP Toolkits <ul><li>NLTK - http://nltk.sourceforge.net </li></ul><ul><li>LingPipe - http://www.alias-i.com/lingpipe/ </li></ul><ul><li>MII NLP Toolkit - http://www.mii.ucla.edu/nlp/ </li></ul><ul><li>OpenNLP - http://opennlp.sourceforge.net/ </li></ul>WARNING! Advanced knowledge of computational linguistics and programming skills required!
  8. ICTA: Internet Community Text Analyzer <ul><li>Data source: bulletin board messages </li></ul><ul><li>Course name: Information Organization and Access </li></ul><ul><li>No. of messages per class: 1,200–2,100 </li></ul><ul><li>No. of students per class: 31–54 </li></ul><ul><li>Duration of each class: 15 weeks </li></ul><ul><li>School years: 2001–2004 </li></ul><ul><li>Classes: 8 </li></ul>
  9. Preliminary Exploration of the Dataset Using ICTA <ul><li>Most Frequently Used Words </li></ul><ul><li>Important Topics Over Time </li></ul><ul><li>Community Style </li></ul><ul><li>Community Support </li></ul>
  10. Preliminary Exploration 1: Most Frequently Used Words <ul><li>Profession-related words </li></ul><ul><ul><li>book/s, information, library/libraries, librarian/s </li></ul></ul><ul><ul><li>user/s, patron/s, people </li></ul></ul><ul><ul><li>database/s, search, document/s </li></ul></ul><ul><li>Learning-related words </li></ul><ul><ul><li>question/s, article/s, example/s, way, study, class, course, research, journal, reading, method, problem, hard time </li></ul></ul>
  11. Preliminary Exploration 2: Important Topics Over Time
  12. Preliminary Exploration 3: Community Style
  13. Preliminary Exploration 4: Community Support
  14. Future Work <ul><li>Exploring the use of other word classes (e.g. verbs) </li></ul><ul><li>Training NLP algorithms on computer-mediated communication (CMC) corpora </li></ul><ul><li>Automatic grouping of noun phrases into concepts (with manual override) </li></ul><ul><ul><li>e.g. RDB, database, relational database </li></ul></ul><ul><li>Connecting ICTA to external textual data from other public online communities (e.g. blogs, MySpace) </li></ul><ul><ul><li>RSS feeds / web APIs </li></ul></ul><ul><li>Understanding the social science of language use in online communities </li></ul><ul><li>Collaboration with the NCSA’s DISCUS project </li></ul><ul><ul><li>concept maps </li></ul></ul><ul><ul><li>social networks </li></ul></ul>
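
The noun-phrase extraction walkthrough on slide 5 (part-of-speech tagging, then chunking) can be illustrated with a minimal Python sketch that starts from the already-tagged tokens shown on that slide. The chunk grammar here, an optional determiner, any number of adjectives, then one or more nouns, is an assumption chosen for illustration and is not claimed to be ICTA's actual rule set; the function name `chunk_noun_phrases` is hypothetical. Following the slide's representation, words inside a phrase are joined with underscores and the determiner is matched but left out of the joined phrase.

```python
# Penn Treebank noun tags, as used in the slide's tagged output.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def chunk_noun_phrases(tagged):
    """Return noun phrases from a POS-tagged sentence as underscore-joined strings.

    Assumed (hypothetical) chunk grammar: NP -> DT? JJ* NOUN+
    The determiner is matched but not included in the joined phrase.
    """
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                      # optional determiner
            j += 1
        content_start = j
        while j < n and tagged[j][1] == "JJ":         # any number of adjectives
            j += 1
        noun_start = j
        while j < n and tagged[j][1] in NOUN_TAGS:    # one or more nouns
            j += 1
        if j > noun_start:                            # found a noun head: emit chunk
            phrases.append("_".join(w for w, _ in tagged[content_start:j]))
            i = j
        else:                                         # no noun: move on one token
            i += 1
    return phrases


# Tagged tokens copied from the Step 1 output on slide 5.
SENTENCE = [
    ("We", "PRP"), ("will", "MD"), ("have", "VB"), ("minivan", "NN"),
    ("during", "IN"), ("the", "DT"), ("conference", "NN"), ("to", "TO"),
    ("help", "VB"), ("shuttle", "VB"), ("attendees", "NNS"), ("from", "IN"),
    ("the", "DT"), ("other", "JJ"), ("hotels", "NNS"), ("to", "TO"),
    ("and", "CC"), ("from", "IN"), ("the", "DT"), ("Kellogg", "NNP"),
    ("Center", "NNP"),
]

if __name__ == "__main__":
    print(chunk_noun_phrases(SENTENCE))
    # ['minivan', 'conference', 'attendees', 'other_hotels', 'Kellogg_Center']
```

A hand-written rule like this is only a stand-in; in practice the tagging and chunking would typically come from one of the toolkits listed on slide 7.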
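
The most-frequent-word lists on slide 10 fold singular and plural forms together (book/s, library/libraries). A minimal sketch of that kind of count, assuming the messages are already POS-tagged, uses `collections.Counter`. The suffix rule in the hypothetical helper `fold_plural` is a crude stand-in for a real lemmatizer and is not the procedure ICTA uses; `top_nouns` is likewise a hypothetical name, and the toy messages are invented.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def fold_plural(word, tag):
    """Crude, hypothetical plural folding so 'books' and 'book' count together."""
    if tag in ("NNS", "NNPS"):
        if word.endswith("ies") and len(word) > 3:
            return word[:-3] + "y"        # libraries -> library
        if word.endswith("s") and not word.endswith("ss"):
            return word[:-1]              # books -> book
    return word


def top_nouns(messages, k=10):
    """Count nouns across POS-tagged messages and return the k most frequent."""
    counts = Counter()
    for tagged in messages:
        for word, tag in tagged:
            if tag in NOUN_TAGS:
                counts[fold_plural(word.lower(), tag)] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    # Toy, invented input: two short tagged "messages".
    messages = [
        [("The", "DT"), ("library", "NN"), ("has", "VBZ"), ("books", "NNS")],
        [("Libraries", "NNS"), ("lend", "VBP"), ("books", "NNS"),
         ("to", "TO"), ("patrons", "NNS")],
    ]
    print(top_nouns(messages, k=3))
```

The suffix rule will mishandle irregular plurals (e.g. "people", which the slide lists separately), which is one reason a dictionary-based lemmatizer is usually preferred.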
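
Slide 14 lists automatic grouping of noun phrases into concepts, with manual override, as future work (e.g. RDB, database, relational database). One possible sketch, which is an assumption and not the authors' design, groups phrases automatically by their head noun (taken here to be the last word) and lets a hand-maintained override table reassign phrases, such as abbreviations, that the automatic rule cannot catch. The names `head_noun`, `group_concepts`, and `overrides` are hypothetical.

```python
from collections import defaultdict


def head_noun(phrase):
    """Treat the last word of an underscore-joined phrase as its head noun."""
    return phrase.lower().split("_")[-1]


def group_concepts(phrases, overrides=None):
    """Group noun phrases into concepts.

    Automatic rule (assumed): phrases sharing a head noun form one concept.
    Manual override: an entry in `overrides` maps a phrase directly to a
    concept label and takes precedence over the automatic rule.
    """
    overrides = {k.lower(): v for k, v in (overrides or {}).items()}
    groups = defaultdict(list)
    for phrase in phrases:
        concept = overrides.get(phrase.lower(), head_noun(phrase))
        groups[concept].append(phrase)
    return dict(groups)


if __name__ == "__main__":
    phrases = ["RDB", "database", "relational_database", "information_center"]
    print(group_concepts(phrases, overrides={"RDB": "database"}))
    # {'database': ['RDB', 'database', 'relational_database'], 'center': ['information_center']}
```

Grouping by head noun alone is coarse (it would merge "information_center" with any other "center"), so the override table is what keeps a human in control of the final concept list.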
