SEASR Text
Published on

Pathway to SEASR Workshop in March 2009 in North Carolina


Published in: Technology, Education
Transcript

  • 1. Text. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
  • 2. MONK Project MONK provides: •  1400 works of literature in English from the 16th - 19th century = 108 million words, POS-tagged, TEI-tagged, in a MySQL database. •  Several different open-source interfaces for working with this data •  A public API to the datastore •  SEASR under the hood, for analytics
  • 3. MONK Project Executes flows for each analysis requested –  Predictive modeling using Naïve Bayes –  Predictive modeling using Support Vector Machines (SVM)
  • 4. Dunning Log-likelihood TagCloud •  Words that are under-represented in writings by Victorian women as compared to Victorian men. —Sara Steger
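The Dunning scores behind a TagCloud like this can be computed directly. A minimal sketch, using the common two-term corpus-linguistics form of the log-likelihood statistic; the word counts are hypothetical, not MONK's actual data:

```python
import math

def dunning_g2(a, b, c, d):
    """Dunning's log-likelihood (G^2) for comparing one word across two corpora:
    a = word count in corpus 1 (must be > 0), b = word count in corpus 2 (> 0),
    c = total words in corpus 1, d = total words in corpus 2."""
    e1 = c * (a + b) / (c + d)  # expected count in corpus 1
    e2 = d * (a + b) / (c + d)  # expected count in corpus 2
    return 2 * (a * math.log(a / e1) + b * math.log(b / e2))

# Hypothetical counts: a word appearing 150 times per million words in
# one corpus and 60 times per million in the other.
score = dunning_g2(150, 60, 1_000_000, 1_000_000)
```

A higher score marks a word whose frequency differs more sharply between the two corpora; the sign of the observed-minus-expected difference tells you which corpus over- or under-represents it.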
  • 5. Feature Lens “The discussion of the children introduces each of the short internal narratives. This champions the view that her method of repetition was patterned: controlled, intended, and a measured means to an end. It would have been impossible to discern through traditional reading.”
  • 6. Semantic Analysis: Information Extraction •  Definition: Information extraction is the identification of specific semantic elements within a text (e.g., entities, properties, relations) •  Extract the relevant information and ignore non-relevant information (important!) •  Link related information and output in a predetermined format

  • 7. Information Extraction. State of the art accuracy by information type:
    –  Entities (an object of interest such as a person or organization): 90–98%
    –  Attributes (a property of an entity such as its name, alias, descriptor, or type): 80%
    –  Facts (a relationship held between two or more entities, such as the Position of a Person in a Company): 60–70%
    –  Events (an activity involving several entities, such as a terrorist act, airline crash, management change, or new product introduction): 50–60%
    “Introduction to Text Mining,” Ronen Feldman, Computer Science Department, Bar-Ilan University, Israel
  • 8. Information Extraction Approaches •  Terminology (name) lists –  This works very well if the list of names and name expressions is stable and available •  Tokenization and morphology –  This works well for things like formulas or dates, which are readily recognized by their internal format (e.g., DD/MM/YY or chemical formulas) •  Use of characteristic patterns –  This works fairly well for novel entities –  Rules can be created by hand or learned via machine learning or statistical algorithms –  Rules capture local patterns that characterize entities from instances of annotated training data

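The characteristic-pattern approach above can be sketched with plain regular expressions; the rules and sample text here are illustrative, not SEASR's actual rule set:

```python
import re

# Characteristic-pattern rules: each rule pairs an entity type with a
# regex capturing its local surface pattern.
RULES = [
    ("DATE",    re.compile(r"\b\d{2}/\d{2}/\d{2}\b")),       # DD/MM/YY
    ("FORMULA", re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")),  # e.g. H2O, CO2
]

def extract(text):
    """Return (type, match) pairs for every rule that fires."""
    hits = []
    for etype, pattern in RULES:
        for m in pattern.finditer(text):
            hits.append((etype, m.group()))
    return hits

extract("The sample of H2O was logged on 12/03/09.")
```

Hand-written rules like these work well for internally-structured entities; for novel entity types the slide notes that such patterns can also be learned from annotated training data.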
  • 9. Semantic Analytics: Named Entity (NE) Tagging. Mayor Rex Luthor [NE:Person] announced today [NE:Time] the establishment of a new research facility in Alderwood [NE:Location]. It will be known as Boynton Laboratory [NE:Organization].
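The simplest route to tags like these is the terminology-list approach from slide 8: a gazetteer lookup. A sketch over this slide's sample sentence, with hand-built lists for illustration only:

```python
# Gazetteer-based NE tagging: match known names against hand-built
# terminology lists (hypothetical lists, not a SEASR component).
GAZETTEER = {
    "Rex Luthor": "NE:Person",
    "today": "NE:Time",
    "Alderwood": "NE:Location",
    "Boynton Laboratory": "NE:Organization",
}

def tag(text):
    """Return (surface form, NE label) pairs found in the text."""
    return [(name, label) for name, label in GAZETTEER.items() if name in text]

sentence = ("Mayor Rex Luthor announced today the establishment of a new "
            "research facility in Alderwood. It will be known as Boynton Laboratory.")
tag(sentence)
```

This works only when the name list is stable and available, which is exactly the limitation the previous slide states; statistical taggers handle unseen names.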
  • 10. Semantic Analysis: Co-reference Resolution for entities and unnamed entities. Mayor Rex Luthor announced today the establishment of a new research facility in Alderwood. It [UNE:Organization] will be known as Boynton Laboratory.
  • 11. Semantic Analysis: Semantic Role Analysis. Mayor Rex Luthor [ACTOR] announced [ACTION] today [WHEN] the establishment [OBJECT] of a new research facility in Alderwood [WHERE]. It [OBJECT] will be known as [ACTION] Boynton Laboratory [COMPL].
  • 12. Semantic Analysis: Concept-Relation Extraction [diagram: an “announce” action node linked to Rex Luthor (person, who/actor), today (time, when), and an establishment event (what/object), which is in turn linked to Boynton Lab (organization), and Alderwood (location, where)]
  • 13. Results: Timeline
  • 14. Results: Maps
  • 15. UIMA Structured data •  Two SEASR examples using UIMA POS data –  Frequent patterns (rule associations) on nouns (fpgrowth) –  Sentiment analysis on adjectives
  • 16. UIMA Unstructured Information Management Applications
  • 17. UIMA + P.O.S. tagging. Four Analysis Engines analyze a document and record Part-Of-Speech information: OpenNLP SentenceDetector, OpenNLP Tokenizer, OpenNLP POSTagger, and POSWriter (serialization of the UIMA CAS).
  • 18. UIMA to SEASR: Experiment I •  Finding patterns
  • 19. SEASR + UIMA: Frequent Patterns Frequent Pattern Analysis on nouns •  Goal: –  Discover a cast of characters within the text –  Discover nouns that frequently occur together •  character relationships
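The window-based frequent-pattern idea above can be sketched as a brute-force pair count with a support threshold; a real run would use an FP-growth implementation, and the noun windows below are hypothetical, not actual Tom Sawyer output:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(noun_windows, min_support):
    """noun_windows: one set of nouns per paragraph window.
    Returns noun pairs co-occurring in at least min_support (a fraction)
    of the windows."""
    counts = Counter()
    for window in noun_windows:
        for pair in combinations(sorted(window), 2):
            counts[pair] += 1
    threshold = min_support * len(noun_windows)
    return {pair for pair, n in counts.items() if n >= threshold}

# Hypothetical 4 paragraph windows of extracted nouns
windows = [
    {"Tom", "Huck", "river"},
    {"Tom", "Huck", "cave"},
    {"Tom", "Becky"},
    {"Tom", "Huck", "fence"},
]
frequent_pairs(windows, 0.5)  # pairs present in at least half the windows
```

Nouns (characters) that survive the support threshold together are the candidate character relationships the slide describes.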
  • 20. Frequent Patterns: visualization Analysis of Tom Sawyer 10 paragraph window Support set to 10%
  • 21. UIMA to SEASR: Experiment II •  Sentiment Analysis
  • 22. UIMA + SEASR: Sentiment Analysis •  Classifying text based on its sentiment –  Determining the attitude of a speaker or a writer –  Determining whether a review is positive/negative •  Ask: What emotion is being conveyed within a body of text? –  Look at only adjectives (UIMA POS) •  lots of issues, challenges, and “but”s •  Need to Answer: –  What emotions to track? –  How to measure/classify an adjective to one of the selected emotions? –  How to visualize the results?
  • 23. UIMA + SEASR: Sentiment Analysis •  Which emotions: –  http://en.wikipedia.org/wiki/List_of_emotions –  http://changingminds.org/explanations/emotions/basic%20emotions.htm –  http://www.emotionalcompetency.com/recognizing.htm •  Parrott’s classification (2001) –  six core emotions –  Love, Joy, Surprise, Anger, Sadness, Fear
  • 24. UIMA + SEASR: Sentiment Analysis
  • 25. UIMA + SEASR: Sentiment Analysis •  How to classify adjectives: –  Lots of metrics we could use … •  Lists of adjectives already classified –  http://www.derose.net/steve/resources/emotionwords/ewords.html –  Need a “nearness” metric for missing adjectives –  How about the thesaurus game? •  Using only a thesaurus, find a path between two words –  no antonyms –  no colloquialisms or slang
  • 26. UIMA + SEASR: Sentiment Analysis •  How to get from delightful to rainy? ['delightful', 'fair', 'balmy', 'moist', 'rainy'] •  sexy to joyless? ['sexy', 'provocative', 'blue', 'joyless'] •  bitter to lovable? ['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
  • 27. UIMA + SEASR: Sentiment Analysis •  Use this game as a metric for measuring the distance from a given adjective to each of the six emotions. •  Assume the longer the path, the “farther away” the two words are. •  addresses some of the issues
  • 28. SynNet: rainy to pleasant
  • 29. UIMA + SEASR: Sentiment Analysis •  SynNet Metrics •  Common nodes •  Path length •  Symmetric: a->b->c c->b->a •  Link strength: •  tangy->sweet •  sweet->lovable •  Use of slang or informal usage
  • 30. UIMA + SEASR: Sentiment Analysis •  Common Nodes •  depth of common nodes
  • 31. UIMA + SEASR: Sentiment Analysis •  Symmetry of path in common nodes
  • 32. UIMA + SEASR: Sentiment Analysis •  Find the shortest path between the adjective and each emotion: •  ['delightful', 'beatific', 'joyful'] •  ['delightful', 'ineffable', 'unspeakable', 'fearful'] •  Pick the emotion with the shortest path length •  tie-breaking procedures
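The thesaurus game reduces to breadth-first search over a synonym graph. A sketch with a tiny hand-built graph (the real system walks a full thesaurus, so these edges are illustrative):

```python
from collections import deque

# Hypothetical synonym links, chosen to reproduce the slide's example paths.
SYNONYMS = {
    "delightful": ["beatific", "fair"],
    "beatific": ["joyful"],
    "fair": ["balmy"],
    "balmy": ["moist"],
    "moist": ["rainy"],
}

def shortest_path(start, goal, graph):
    """Breadth-first search over synonym links; returns the word path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

shortest_path("delightful", "joyful", SYNONYMS)
```

To classify an adjective, run this search once per core emotion and keep the emotion whose path is shortest, applying a tie-breaking rule when lengths match.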
  • 33. UIMA + SEASR: Sentiment Analysis •  Not a perfect solution –  still need context to get quality •  Vain –  ['vain', 'insignificant', 'contemptible', 'hateful'] –  ['vain', 'misleading', 'puzzling', 'surprising'] •  Animal –  ['animal', 'sensual', 'pleasing', 'joyful'] –  ['animal', 'bestial', 'vile', 'hateful'] –  ['animal', 'gross', 'shocking', 'fearful'] –  ['animal', 'gross', 'grievous', 'sorrowful'] •  Negation –  “My mother was not a hateful person.”
  • 34. UIMA + SEASR: Sentiment Analysis •  Process Overview •  Extract the adjectives (UIMA POS analysis) •  Read in adjectives (SEASR library) •  Label each adjective (SynNet) •  Summarize windows of adjectives •  lots of experimentation here •  Visualize the windows
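The summarize-windows step of this process can be sketched as a fixed-size window count over the labeled adjectives; the labels below stand in for hypothetical SynNet output, and the slide notes the real window scheme was a matter of experimentation:

```python
from collections import Counter

def emotion_windows(labeled_adjectives, window_size):
    """Slide a fixed-size, non-overlapping window over (adjective, emotion)
    pairs in document order and count the emotions seen in each window."""
    summaries = []
    for i in range(0, len(labeled_adjectives), window_size):
        window = labeled_adjectives[i:i + window_size]
        summaries.append(Counter(emotion for _, emotion in window))
    return summaries

# Hypothetical output of the adjective-labeling step
labels = [("delightful", "joy"), ("gloomy", "sadness"),
          ("sunny", "joy"), ("fearful", "fear")]
emotion_windows(labels, 2)
```

Each per-window Counter is one data point for the visualization step that follows.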
  • 35. UIMA + SEASR: Sentiment Analysis •  Visualization •  New SEASR visualization component •  Based on the flare ActionScript library •  http://flare.prefuse.org/ •  Still in development •  http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html
  • 36. UIMA + SEASR: Sentiment Analysis
