7. Non-Topical Analysis
Agent: Opinion holder
Target: Target of the opinion being expressed (a topic, a person, an organization, etc.)
Attitude: includes Expressive Element
[anSary nE kha myry ray^E myN eamr shyl ayk bd dmaG awr Zdy XKS hyN]
[Ansari said, “according to me Aamir Sohail is one crazy and stubborn man”]
AGENT
Attributes: ID:a1, Nested-source: “w”, TargetID:t1
TARGET
Attributes: ID:t1
EXPRESSIVE ELEMENT
Attributes: ID:ex1, TargetID:t1, Emotion:anger, Intensity:high, Nested-Source: “w”, a1, Polarity:negative
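The annotation scheme in this example (agent, target, and expressive element, each with an ID and attributes) could be held in a small data structure. The following is only a minimal sketch: the class names, field names, and the `resolve_sources` helper are hypothetical, the text spans use the English gloss because the original Urdu spans are not available here, and reading “w” as the writer is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Target:
    id: str
    text: str


@dataclass
class Agent:
    id: str
    text: str
    target_id: str
    nested_source: List[str] = field(default_factory=list)


@dataclass
class ExpressiveElement:
    id: str
    text: str
    target_id: str
    emotion: str
    intensity: str
    polarity: str
    nested_source: List[str] = field(default_factory=list)


def resolve_sources(element: ExpressiveElement, agents: Dict[str, Agent]) -> List[str]:
    """Turn the nested-source chain into readable names.

    Assumption: "w" stands for the writer of the text.
    """
    names = []
    for src in element.nested_source:
        names.append("writer" if src == "w" else agents[src].text)
    return names


# Values taken from the slide's attribute boxes; text spans are English glosses.
target = Target(id="t1", text="Aamir Sohail")
agent = Agent(id="a1", text="Ansari", target_id="t1", nested_source=["w"])
element = ExpressiveElement(
    id="ex1",
    text="crazy and stubborn man",
    target_id="t1",
    emotion="anger",
    intensity="high",
    polarity="negative",
    nested_source=["w", "a1"],
)
agents = {agent.id: agent}

print(resolve_sources(element, agents))
```

The nested source records who reports the opinion: here the writer reports what Ansari said about the target.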
8. FACETED SEARCH: DRILL DOWN TO RELEVANT CONTENT/DATA
People are filled with anger and sorrow because of the policies made by Musharaf.
OPINION HOLDER – Writer, People
TARGET – Musharaf’s policies (Musharaf is an implied target)
www.janyainc.com
9. Human Behavior Analysis
• Process social media content, provide tools for analysts to:
• Identify social networks: groups, members
• Identify topics of discussion and sentiment
• E.g. angry at govt., wanting retaliation, peacemakers
• Thought influencers
• Identify social goals through analysis of verbal communication
• Manipulation: Persuasion, threats, coercion
• Religious supremacy: religious analogues
• Recruitment
[Diagram labels: Social Media Content, Link Diagrams, Predictive Modeling]
14. Code Mixing, Switching
• Use of Latin script: lack of transliteration standards makes it difficult to process
• Urdish, Spanglish, Hinglish etc.
Afsoos key baat hai . kal tak jo batain Non Muslim bhi kartay
hoay dartay thay abhi this man has brought it out in the open.
[It is sad that things even a non-Muslim would have feared to say until yesterday, this man has now brought out into the open]
Solutions:
• Apply a “romanized” POS tagger and an English tagger in tandem: use machine learning to combine the evidence and generate the final tag and language ID
• For longer English spans, use an English NLP system
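The tandem-tagger idea above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two lexicons stand in for real taggers, the tag names and confidence values are invented, and a hand-set context bonus replaces the machine-learned combiner mentioned on the slide.

```python
# Toy stand-ins for a romanized-Urdu tagger and an English tagger.
# Each maps a lowercased token to (tag, confidence); all values are invented.
ROMAN_URDU = {
    "afsoos": ("NN", 0.9), "key": ("PSP", 0.6), "baat": ("NN", 0.9),
    "hai": ("VB", 0.9), "abhi": ("RB", 0.9),
}
ENGLISH = {
    "key": ("NN", 0.6), "this": ("DT", 0.9), "man": ("NN", 0.8),
    "has": ("VBZ", 0.9), "brought": ("VBN", 0.9),
}
CONTEXT_BONUS = 0.3  # hand-set weight; a learned model would replace this


def tag_tokens(tokens):
    """Return (token, language ID, tag) per token by combining both taggers."""
    results = []
    prev_lang = None
    for tok in tokens:
        candidates = []
        for lang, lexicon in (("ur", ROMAN_URDU), ("en", ENGLISH)):
            if tok.lower() in lexicon:
                tag, conf = lexicon[tok.lower()]
                # Favor the language of the previous token (code switches are rarer).
                score = conf + (CONTEXT_BONUS if lang == prev_lang else 0.0)
                candidates.append((score, lang, tag))
        if candidates:
            # On a tie, the first listed candidate wins.
            _, lang, tag = max(candidates, key=lambda c: c[0])
            prev_lang = lang
        else:
            lang, tag = "unk", "UNK"
        results.append((tok, lang, tag))
    return results


def english_spans(tagged, min_len=3):
    """Collect runs of at least min_len English tokens for an English NLP system."""
    spans, current = [], []
    for tok, lang, _ in tagged:
        if lang == "en":
            current.append(tok)
        else:
            if len(current) >= min_len:
                spans.append(current)
            current = []
    if len(current) >= min_len:
        spans.append(current)
    return spans


tagged = tag_tokens("Afsoos key baat hai abhi this man has brought".split())
print(tagged)
print(english_spans(tagged))
```

The ambiguous token "key" shows why the evidence has to be combined: after an Urdu word it is tagged as an Urdu postposition, while after an English word the same token comes out as an English noun.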
15. Language Resource Acquisition
Less Commonly Taught Languages (LCTL)
• Yoruba, Russian, Swahili
• Dialects
Very few linguistic resources available
• electronic lexicons
• translation lexicons
• part-of-speech taggers, chunkers
• Typically, very expensive to produce these resources by hand
• The web provides a new opportunity to automatically acquire these resources: “web as corpus”