SEASR and UIMA

Transcript

  • 1. SEASR and UIMA Mike Haberman mikeh@ncsa.uiuc.edu National Center for Supercomputing Applications University of Illinois at Urbana-Champaign
  • 2. UIMA Unstructured Information Management Architecture
  • 3. UIMA to SEASR SEASR
  • 4. UIMA + P.O.S. tagging Four Analysis Engines analyze a document and record POS information: OpenNLP SentenceDetector, OpenNLP Tokenizer, OpenNLP PosTagger, and POSWriter (serialization of the UIMA CAS)
  • 5. UIMA Structured data •  POSWriter is a CAS Consumer –  Extracted data from the CAS –  Ready for import into SEASR
  • 6. UIMA + P.O.S. tagging: step 1
  • 7. UIMA + P.O.S. tagging: step 2
  • 8. UIMA + P.O.S. tagging: step 3
  • 9. UIMA + P.O.S. tagging: step 4
  • 10. UIMA Structured data •  Two SEASR examples using UIMA POS data –  Frequent patterns (association rules) on nouns (fpgrowth) –  Sentiment analysis on adjectives
  • 11. UIMA to SEASR: Experiment I •  Finding patterns
  • 12. SEASR + UIMA: Frequent Patterns Frequent Pattern Analysis on nouns •  Goal: –  Discover a cast of characters within the text –  Discover nouns that frequently occur together •  character relationships
  • 13. Frequent Patterns: nouns •  Use of item sets in fpgrowth •  What’s new: –  handling sparse item sets
      Transaction Id | Item A | Item B | Item C | •••
      1              | 0      | 1      | 1      |
      2              | 1      | 1      | 1      |
      3              | 1      | 0      | 1      |
      4              | 1      | 0      | 0      |

  • 14. Frequent Patterns: nouns •  What’s new: –  handling sparse item sets
      Transaction
      {A,B,C}
      {X,Y}
      {F,E,A,C,E}
      {A,Z,X,U,I,O}
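
The contrast between the 0/1 table on slide 13 and the item sets on slide 14 can be sketched in a few lines of Python. This is only an illustration of the two representations; the slides do not show how SEASR stores item sets internally, and the function names `to_item_sets` and `to_dense` are made up for this example.

```python
# Dense form: one row per transaction, one 0/1 column per item (slide 13).
ITEMS = ["A", "B", "C"]
DENSE = [
    [0, 1, 1],  # transaction 1
    [1, 1, 1],  # transaction 2
    [1, 0, 1],  # transaction 3
    [1, 0, 0],  # transaction 4
]


def to_item_sets(rows, items):
    """Convert 0/1 rows into sparse item sets listing only the items present."""
    return [{item for item, flag in zip(items, row) if flag} for row in rows]


def to_dense(item_sets):
    """Convert sparse item sets into 0/1 rows over the sorted item vocabulary."""
    vocab = sorted(set().union(*item_sets))
    rows = [[1 if item in s else 0 for item in vocab] for s in item_sets]
    return vocab, rows


sparse = to_item_sets(DENSE, ITEMS)
for number, items in enumerate(sparse, start=1):
    print(number, sorted(items))
```

With a vocabulary of thousands of nouns, most cells of the dense table would be 0, so listing only the items present in each transaction is presumably what "handling sparse item sets" refers to.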

  • 15. Frequent Patterns: nouns •  SEASR Flow (similar to fpgrowth demo) –  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl •  Reads UIMA’s CAS consumer output –  url of the UIMA data source: http://repository.seasr.org/Datasets/POS/ (tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is) •  Enter number of sentences to group •  Enter support: 10% •  Item sets shown:
      {word=tom}
      {word=answer}
      {word=tom}
      {word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,wor
      {word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}
      {word=aunt,word=polly,word=moment,word=laugh}
      {word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=
  • 16. Frequent Patterns: visualization Analysis of Tom Sawyer 10 paragraph window Support set to 10%
  • 17. Frequent Patterns: nouns •  Recap: SEASR flow information •  The repository location is: –  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl •  Reads UIMA’s CAS consumer output –  Select file/url of the UIMA data source –  http://repository.seasr.org/Datasets/POS/ (tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is) •  Similar to fpgrowth demo
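
The two flow parameters (number of sentences to group, and support) can be illustrated with a small Python sketch. It is not the SEASR flow and it is not FP-growth: it counts item sets by brute force, which is only practical for tiny inputs. The input nouns, and the names `window_transactions` and `frequent_item_sets`, are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def window_transactions(sentence_nouns, window):
    """Merge the nouns of every `window` consecutive sentences into one transaction."""
    transactions = []
    for start in range(0, len(sentence_nouns), window):
        merged = set()
        for nouns in sentence_nouns[start:start + window]:
            merged.update(nouns)
        if merged:
            transactions.append(merged)
    return transactions


def frequent_item_sets(transactions, min_support, max_size=2):
    """Return {item set: support} for item sets of up to max_size items.

    Support is the fraction of transactions containing the item set.
    Brute-force counting stands in for FP-growth here.
    """
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


# Hypothetical nouns per sentence, not taken from the actual data files.
sentences = [["tom", "aunt"], ["polly"], ["tom", "cat"],
             ["aunt", "polly"], ["boy"], ["tom", "aunt"]]
transactions = window_transactions(sentences, window=2)
result = frequent_item_sets(transactions, min_support=0.5)
for item_set, support in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(item_set), round(support, 2))
```

A larger window produces fewer, bigger transactions, so nouns that merely appear in the same stretch of text start to count as occurring together.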
  • 18. UIMA + SEASR: Frequent Patterns •  Extensions –  Analysis for separate chapters •  Discover new relationships that occur over small windows –  Adjectives, Adverbs •  Common, repeating word usage, phrases –  Entity Extraction: Dates, Locations, Geo
  • 19. UIMA to SEASR: Experiment II •  Sentiment Analysis
  • 20. UIMA + SEASR: Sentiment Analysis •  Classifying text based on its sentiment –  Determining the attitude of a speaker or a writer –  Determining whether a review is positive/negative
  • 21. UIMA + SEASR: Sentiment Analysis •  Ask: What emotion is being conveyed within a body of text? –  Look at only adjectives (UIMA POS) •  lots of issues, challenges, and “but …” caveats
  • 22. UIMA + SEASR: Sentiment Analysis •  Need to Answer: –  What emotions to track? –  How to measure/classify an adjective to one of the selected emotions? –  How to visualize the results
  • 23. UIMA + SEASR: Sentiment Analysis •  Which emotions: –  http://en.wikipedia.org/wiki/List_of_emotions –  http://changingminds.org/explanations/emotions/basic%20emotions.htm –  http://www.emotionalcompetency.com/recognizing.htm •  Parrott’s classification (2001) –  six core emotions –  Love, Joy, Surprise, Anger, Sadness, Fear
  • 24. UIMA + SEASR: Sentiment Analysis
  • 25. UIMA + SEASR: Sentiment Analysis •  How to classify adjectives: –  Lots of metrics we could use … •  Lists of adjectives already classified –  http://www.derose.net/steve/resources/emotionwords/ewords.html –  Need a “nearness” metric for missing adjectives –  How about the thesaurus game ?
  • 26. UIMA + SEASR: Sentiment Analysis •  Using only a thesaurus, find a path between two words –  no antonyms –  no colloquialisms or slang
  • 27. UIMA + SEASR: Sentiment Analysis •  How to get from delightful to rainy? ['delightful', 'fair', 'balmy', 'moist', 'rainy'] •  sexy to joyless? ['sexy', 'provocative', 'blue', 'joyless'] •  bitter to lovable? ['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
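
The thesaurus game amounts to a shortest-path search over a graph of synonyms. A minimal sketch, assuming a toy graph built only from the three example paths above and treating every synonym link as undirected (a real thesaurus is far larger and its links need not be symmetric):

```python
from collections import deque

# Toy synonym links taken from the example paths on the slide.
EDGES = [
    ("delightful", "fair"), ("fair", "balmy"), ("balmy", "moist"), ("moist", "rainy"),
    ("sexy", "provocative"), ("provocative", "blue"), ("blue", "joyless"),
    ("bitter", "acerbic"), ("acerbic", "tangy"), ("tangy", "sweet"), ("sweet", "lovable"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (word, synonym) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest list of words, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == goal:
            return path
        for nxt in sorted(graph[word]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(EDGES)
print(shortest_path(graph, "delightful", "rainy"))
print(shortest_path(graph, "delightful", "joyless"))  # no link between the two chains
```

Breadth-first search visits words in order of distance from the start, so the first path that reaches the goal is a shortest one.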
  • 28. UIMA + SEASR: Sentiment Analysis •  Use this game as a metric for measuring the distance from a given adjective to each of the six emotions. •  Assume the longer the path, the “farther away” the two words are. •  address some of the issues
  • 29. UIMA + SEASR: Sentiment Analysis •  SynNet: a traversable graph of synonyms (adjectives)
  • 30. SynNet: rainy to pleasant
  • 31. UIMA + SEASR: Sentiment Analysis •  SynNet Metrics •  Common nodes •  Path length •  Symmetric: a->b->c c->b->a •  Link strength: •  tangy->sweet •  sweet->lovable •  Use of slang or informal usage
  • 32. UIMA + SEASR: Sentiment Analysis •  Common Nodes •  depth of common
  • 33. UIMA + SEASR: Sentiment Analysis •  Symmetry of path in common nodes
  • 34. UIMA + SEASR: Sentiment Analysis •  Find the shortest path between adjective and each emotion: •  ['delightful', 'beatific', 'joyful'] •  ['delightful', 'ineffable', 'unspeakable', 'fearful'] •  Pick the emotion with shortest path length •  tie breaking procedures
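
The labeling rule on this slide (shortest path wins) could look roughly like the sketch below. The graph holds only the two example paths for "delightful", the mapping from emotion to target adjective (`EMOTION_WORDS`) is an assumption, and so is the alphabetical tie-break, since the slides mention tie-breaking procedures without describing them.

```python
from collections import deque

# Directed toy graph containing just the two paths shown on the slide.
GRAPH = {
    "delightful": ["beatific", "ineffable"],
    "beatific": ["joyful"],
    "ineffable": ["unspeakable"],
    "unspeakable": ["fearful"],
    "joyful": [],
    "fearful": [],
}

# Assumed target adjective for each emotion; only two emotions are covered.
EMOTION_WORDS = {"joy": "joyful", "fear": "fearful"}


def path_length(graph, start, goal):
    """Number of links on a shortest path from start to goal, or None."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            return dist[word]
        for nxt in graph.get(word, ()):
            if nxt not in dist:
                dist[nxt] = dist[word] + 1
                queue.append(nxt)
    return None


def classify(graph, adjective, emotion_words):
    """Pick the emotion with the shortest path; ties go to the alphabetically
    first emotion name (a placeholder rule for this sketch)."""
    scored = []
    for emotion, target in emotion_words.items():
        length = path_length(graph, adjective, target)
        if length is not None:
            scored.append((length, emotion))
    return min(scored)[1] if scored else None


print(classify(GRAPH, "delightful", EMOTION_WORDS))
```

An adjective with no path to any target word comes back as `None`, which leaves the caller free to skip it or handle it separately.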
  • 35. UIMA + SEASR: Sentiment Analysis •  Not a perfect solution –  still need context to get quality •  Vain –  ['vain', 'insignificant', 'contemptible', 'hateful'] –  ['vain', 'misleading', 'puzzling', 'surprising'] •  Animal –  ['animal', 'sensual', 'pleasing', 'joyful'] –  ['animal', 'bestial', 'vile', 'hateful'] –  ['animal', 'gross', 'shocking', 'fearful'] –  ['animal', 'gross', 'grievous', 'sorrowful'] •  Negation –  “My mother was not a hateful person.”
  • 36. UIMA + SEASR: Sentiment Analysis •  A word about WordNet •  http://wordnetweb.princeton.edu/ •  English nouns, verbs, adjectives and adverbs organized into sets of synonyms (synsets)
  • 37. UIMA + SEASR: Sentiment Analysis •  Adjective islands •  There is no path from delightful to happy •  happy: {beaming, beamy, effulgent, felicitous, glad, happy, radiant, refulgent, well-chosen}
  • 38. UIMA + SEASR: Sentiment Analysis •  Process Overview •  Extract the adjectives (UIMA POS analysis) •  Read in adjectives (SEASR library) •  Label each adjective (SynNet) •  Summarize windows of adjectives •  lots of experimentation here •  Visualize the windows
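
The "summarize windows of adjectives" step is where the slides say most of the experimentation happened, so the sketch below shows only one simple option: counting emotion labels in consecutive, non-overlapping windows. The label sequence and the names `summarize_windows` and `dominant` are invented for illustration.

```python
from collections import Counter


def summarize_windows(labels, window):
    """Count emotion labels in consecutive, non-overlapping windows.

    `labels` holds one emotion name per adjective, in document order;
    None marks an adjective that could not be labeled and is skipped.
    """
    summaries = []
    for start in range(0, len(labels), window):
        chunk = [lab for lab in labels[start:start + window] if lab is not None]
        summaries.append(Counter(chunk))
    return summaries


def dominant(summary):
    """Most frequent emotion in a window, or None for an empty window."""
    return summary.most_common(1)[0][0] if summary else None


# Hypothetical labels for seven adjectives in reading order.
labels = ["joy", "joy", "fear", None, "sadness", "sadness", "joy"]
summaries = summarize_windows(labels, window=3)
print([dominant(s) for s in summaries])
```

The per-window counts are the kind of sequential data a visualization component could plot across the length of a text.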
  • 39. UIMA + SEASR: Sentiment Analysis •  Visualization •  New SEASR visualization component •  Based on flare ActionScript Library •  http://flare.prefuse.org/ •  Still in development •  http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html
  • 40. UIMA + SEASR: Sentiment Analysis
  • 41. UIMA + SEASR: Sentiment Analysis •  Extensions •  Adverbs, nouns, verbs •  Analysis of metrics, etc •  Goal and Relevancy •  Two new components •  SynNet •  Flash based visualization of sequential based data