SEASR and UIMA
Transcript

  • 1. SEASR and UIMA Mike Haberman mikeh@ncsa.uiuc.edu National Center for Supercomputing Applications University of Illinois at Urbana-Champaign
  • 2. UIMA Unstructured Information Management Architecture
  • 3. UIMA to SEASR SEASR
  • 4. UIMA + P.O.S. tagging Four Analysis Engines analyze a document and record POS information: OpenNLP SentenceDetector, OpenNLP Tokenizer, OpenNLP PosTagger, POSWriter. Serialization of the UIMA CAS
  • 5. UIMA Structured data •  POSWriter is a CAS Consumer –  Extracted data from the CAS –  Ready for import into SEASR
  • 6. UIMA + P.O.S. tagging: step 1
  • 7. UIMA + P.O.S. tagging: step 2
  • 8. UIMA + P.O.S. tagging: step 3
  • 9. UIMA + P.O.S. tagging: step 4
  • 10. UIMA Structured data •  Two SEASR examples using UIMA POS data –  Frequent patterns (rule associations) on nouns (fpgrowth) –  Sentiment analysis on adjectives
  • 11. UIMA to SEASR: Experiment I •  Finding patterns
  • 12. SEASR + UIMA: Frequent Patterns Frequent Pattern Analysis on nouns •  Goal: –  Discover a cast of characters within the text –  Discover nouns that frequently occur together •  character relationships
  • 13. Frequent Patterns: nouns •  Use of item sets in fpgrowth •  What’s new: –  handling sparse item sets
       Transaction Id | Item A | Item B | Item C | •••
       1              | 0      | 1      | 1      |
       2              | 1      | 1      | 1      |
       3              | 1      | 0      | 1      |
       4              | 1      | 0      | 0      |

  • 14. Frequent Patterns: nouns •  What’s new: –  handling sparse item sets
       Transaction
       {A,B,C}
       {X,Y}
       {F,E,A,C,E}
       {A,Z,X,U,I,O}
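
The two representations on slides 13 and 14 can be compared with a short Python sketch. This is only an illustration, not the SEASR component: the function name `dense_to_item_sets` is hypothetical, and the rows are the four transactions from the table on slide 13.

```python
def dense_to_item_sets(columns, rows):
    """Convert 0/1 rows (one column per item) into sparse item sets."""
    return [
        {item for item, flag in zip(columns, row) if flag}
        for row in rows
    ]


# Dense form: every transaction stores a flag for every possible item.
columns = ["A", "B", "C"]
dense_rows = [
    [0, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
]

# Sparse form: every transaction stores only the items it contains.
item_sets = dense_to_item_sets(columns, dense_rows)
```

With nouns as items, the dense form would need one column per distinct noun in the text, while the sparse form grows only with the nouns that actually appear in each window.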

  • 15. Frequent Patterns: nouns •  SEASR Flow (similar to fpgrowth demo) –  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl •  Reads UIMA’s CAS consumer output –  url of the UIMA data source –  http://repository.seasr.org/Datasets/POS/ tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is •  Enter number of sentences to group •  Enter support: 10% •  Example item sets: {word=tom} {word=answer} {word=tom} {word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,wor {word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat} {word=aunt,word=polly,word=moment,word=laugh} {word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=
  • 16. Frequent Patterns: visualization •  Analysis of Tom Sawyer •  10 paragraph window •  Support set to 10%
  • 17. Frequent Patterns: nouns •  Recap: SEASR flow information •  The repository location is: –  http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl •  Reads UIMA’s CAS consumer output –  Select file/url of the UIMA data source –  http://repository.seasr.org/Datasets/POS tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is •  Similar to fpgrowth demo
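
The flow's two inputs (number of sentences to group, and support) can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the SEASR flow: it counts noun pairs by brute force rather than running fpgrowth, the function names `window_item_sets` and `frequent_pairs` are hypothetical, and the sample nouns are made up for the example rather than taken from the Tom Sawyer data set.

```python
from collections import Counter
from itertools import combinations


def window_item_sets(sentence_nouns, window):
    """Group consecutive sentences into windows; each window becomes one item set."""
    return [
        set().union(*sentence_nouns[i:i + window])
        for i in range(0, len(sentence_nouns), window)
    ]


def frequent_pairs(item_sets, min_support):
    """Return noun pairs found in at least min_support (a fraction) of the windows.

    Brute-force pair counting, used here only to show what "support" means.
    """
    counts = Counter()
    for items in item_sets:
        counts.update(combinations(sorted(items), 2))
    total = len(item_sets)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Made-up nouns per sentence (illustrative only).
sentence_nouns = [
    ["tom", "aunt"],
    ["aunt", "polly"],
    ["tom", "boy"],
    ["aunt", "polly", "tom"],
    ["cat"],
    ["broom", "cat"],
]

windows = window_item_sets(sentence_nouns, window=2)
pairs = frequent_pairs(windows, min_support=0.5)
```

Here three windows are formed, and only the pairs among `aunt`, `polly`, and `tom` appear in at least half of them.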
  • 18. UIMA + SEASR: Frequent Patterns •  Extensions –  Analysis for separate chapters •  Discover new relationships that occur over small windows –  Adjectives, Adverbs •  Common, repeating word usage, phrases –  Entity Extraction: Dates, Locations, Geo
  • 19. UIMA to SEASR: Experiment II •  Sentiment Analysis
  • 20. UIMA + SEASR: Sentiment Analysis •  Classifying text based on its sentiment –  Determining the attitude of a speaker or a writer –  Determining whether a review is positive/negative
  • 21. UIMA + SEASR: Sentiment Analysis •  Ask: What emotion is being conveyed within a body of text? –  Look at only adjectives (UIMA POS) •  lots of issues, challenges, and buts (“but …”)
  • 22. UIMA + SEASR: Sentiment Analysis •  Need to Answer: –  What emotions to track? –  How to measure/classify an adjective to one of the selected emotions? –  How to visualize the results?
  • 23. UIMA + SEASR: Sentiment Analysis •  Which emotions: –  http://en.wikipedia.org/wiki/List_of_emotions –  http://changingminds.org/explanations/emotions/basic%20emotions.htm –  http://www.emotionalcompetency.com/recognizing.htm •  Parrott’s classification (2001) –  six core emotions –  Love, Joy, Surprise, Anger, Sadness, Fear
  • 24. UIMA + SEASR: Sentiment Analysis
  • 25. UIMA + SEASR: Sentiment Analysis •  How to classify adjectives: –  Lots of metrics we could use … •  Lists of adjectives already classified –  http://www.derose.net/steve/resources/emotionwords/ewords.html –  Need a “nearness” metric for missing adjectives –  How about the thesaurus game ?
  • 26. UIMA + SEASR: Sentiment Analysis •  Using only a thesaurus, find a path between two words –  no antonyms –  no colloquialisms or slang
  • 27. UIMA + SEASR: Sentiment Analysis •  How to get from delightful to rainy? ['delightful', 'fair', 'balmy', 'moist', 'rainy'] •  sexy to joyless? ['sexy', 'provocative', 'blue', 'joyless'] •  bitter to lovable? ['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
  • 28. UIMA + SEASR: Sentiment Analysis •  Use this game as a metric for measuring the distance from a given adjective to each of the six emotions. •  Assume the longer the path, the “farther away” the two words are. •  Addresses some of the issues
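
The thesaurus game as a distance metric can be sketched as a breadth-first search over a synonym graph. The sketch below is an assumption, not the SynNet implementation: `build_graph` and `shortest_path` are hypothetical names, the toy graph contains only the three example paths from slide 27, and each synonym link is treated as one-directional in the order listed.

```python
from collections import deque


def build_graph(paths):
    """Build a directed synonym graph from example paths (toy data, not a thesaurus)."""
    graph = {}
    for path in paths:
        for word, synonym in zip(path, path[1:]):
            graph.setdefault(word, []).append(synonym)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest word list from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = word
            if neighbor == goal:
                # Walk the predecessor links back to the start word.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


graph = build_graph([
    ["delightful", "fair", "balmy", "moist", "rainy"],
    ["sexy", "provocative", "blue", "joyless"],
    ["bitter", "acerbic", "tangy", "sweet", "lovable"],
])

path = shortest_path(graph, "delightful", "rainy")
distance = len(path) - 1  # number of synonym hops
```

Breadth-first search is a natural fit because every synonym hop costs the same, so the first path found to the goal is a shortest one.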
  • 29. UIMA + SEASR: Sentiment Analysis •  SynNet: a traversable graph of synonyms (adjectives)
  • 30. SynNet: rainy to pleasant
  • 31. UIMA + SEASR: Sentiment Analysis •  SynNet Metrics •  Common nodes •  Path length •  Symmetric: a->b->c c->b->a •  Link strength: •  tangy->sweet •  sweet->lovable •  Use of slang or informal usage
  • 32. UIMA + SEASR: Sentiment Analysis •  Common Nodes •  depth of common
  • 33. UIMA + SEASR: Sentiment Analysis •  Symmetry of path in common nodes
  • 34. UIMA + SEASR: Sentiment Analysis •  Find the shortest path between adjective and each emotion: •  ['delightful', 'beatific', 'joyful'] •  ['delightful', 'ineffable', 'unspeakable', 'fearful'] •  Pick the emotion with shortest path length •  tie breaking procedures
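
Picking the emotion with the shortest path can be sketched as follows. This is a minimal illustration under stated assumptions: the names `path_length` and `closest_emotions` are hypothetical, the toy graph holds only the two paths shown on this slide, and because the slide does not say which tie-breaking procedure is used, the sketch simply returns every emotion word tied at the minimum distance.

```python
from collections import deque


def path_length(graph, start, goal):
    """Number of hops on the shortest path from start to goal, or None if unreachable."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            return distances[word]
        for neighbor in graph.get(word, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[word] + 1
                queue.append(neighbor)
    return None


def closest_emotions(graph, adjective, emotion_words):
    """Return (emotion words at the minimum distance, that distance).

    All ties are returned; choosing among them is left to the caller.
    """
    reachable = {}
    for emotion in emotion_words:
        length = path_length(graph, adjective, emotion)
        if length is not None:
            reachable[emotion] = length
    if not reachable:
        return [], None
    best = min(reachable.values())
    return sorted(e for e, d in reachable.items() if d == best), best


# Toy graph containing only the two example paths from the slide.
toy_graph = {
    "delightful": ["beatific", "ineffable"],
    "beatific": ["joyful"],
    "ineffable": ["unspeakable"],
    "unspeakable": ["fearful"],
}

labels, distance = closest_emotions(toy_graph, "delightful", ["joyful", "fearful"])
```

In this toy graph `joyful` is two hops from `delightful` and `fearful` is three, so `joyful` is selected.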
  • 35. UIMA + SEASR: Sentiment Analysis •  Not a perfect solution –  still need context to get quality •  Vain –  ['vain', 'insignificant', 'contemptible', 'hateful'] –  ['vain', 'misleading', 'puzzling', 'surprising'] •  Animal –  ['animal', 'sensual', 'pleasing', 'joyful'] –  ['animal', 'bestial', 'vile', 'hateful'] –  ['animal', 'gross', 'shocking', 'fearful'] –  ['animal', 'gross', 'grievous', 'sorrowful'] •  Negation –  “My mother was not a hateful person.”
  • 36. UIMA + SEASR: Sentiment Analysis •  A word about WordNet •  http://wordnetweb.princeton.edu/ •  English nouns, verbs, adjectives and adverbs organized into sets of synonyms (synsets)
  • 37. UIMA + SEASR: Sentiment Analysis •  Adjective islands •  There is no path from delightful to happy •  happy: {beaming, beamy, effulgent, felicitous, glad, happy, radiant, refulgent, well-chosen}
  • 38. UIMA + SEASR: Sentiment Analysis •  Process Overview •  Extract the adjectives (UIMA POS analysis) •  Read in adjectives (SEASR library) •  Label each adjective (SynNet) •  Summarize windows of adjectives •  lots of experimentation here •  Visualize the windows
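
The "summarize windows of adjectives" step could look something like the Python sketch below. It is only one possible summary, since the slide notes that this step involved a lot of experimentation: the function name `summarize_windows`, the fixed non-overlapping windows, and the sample labels are all assumptions made for illustration.

```python
from collections import Counter


def summarize_windows(labels, window):
    """Count emotion labels in consecutive, non-overlapping windows of adjectives.

    A label of None stands for an adjective that could not be classified.
    """
    return [
        Counter(label for label in labels[i:i + window] if label is not None)
        for i in range(0, len(labels), window)
    ]


# Made-up sequence of per-adjective labels, in document order.
labels = ["joy", "joy", None, "fear", "sadness", "fear", "joy"]
summary = summarize_windows(labels, window=3)
```

Each entry of `summary` gives the emotion counts for one window, which is the kind of sequential data a visualization could then plot over the course of the text.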
  • 39. UIMA + SEASR: Sentiment Analysis •  Visualization •  New SEASR visualization component •  Based on flare ActionScript Library •  http://flare.prefuse.org/ •  Still in development •  http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html
  • 40. UIMA + SEASR: Sentiment Analysis
  • 41. UIMA + SEASR: Sentiment Analysis •  Extensions •  Adverbs, nouns, verbs •  Analysis of metrics, etc. •  Goal and Relevancy •  Two new components •  SynNet •  Flash-based visualization of sequential data
