VionLabs Movie & TV Hackathon @ 24-26/01/14

Movie Hack Attack!

Roelof Pieters
roelof@vionlabs.com
Aim of the Hack
(As it’s really only a small hack, the real aim is slightly less grand:
playing with Python as a REST API/server, making some nice stuff in a
couple of days… and not spending time on that wasteful activity called “sleep”)

❖ Provide extra content information to users while watching a Movie or TV Show:
    ❖ Moods and Sentiment of a Movie
    ❖ Persons and Places mentioned or featured in a Movie
and now… The Hack!

Persons/Locations/Organizations
In this case the name “Howard Beale” is seen as a “Person”.

Objects/Expressions
In this case the phrase “mandarin of television” is recognized as an
Expression. The words “Mandarin” and “Television” are recognized as well,
and matching pictures are searched for and shown.
Network (1976) [Movie]


Sentiment
Positive or Negative Sentiment of each Sentence, both current and over time
(you can see it needs more normalization for long video items)
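
One possible way to normalize the over-time curve for long items is to average sentence scores into a fixed number of time slices, so a three-hour movie and a ten-minute talk yield curves of the same length. This is only a sketch of that idea, not the method used in the hack; the function name `bucket_sentiment`, the `(start_seconds, score)` input shape, and the default of 10 buckets are all assumptions made for illustration.

```python
def bucket_sentiment(timed_scores, duration, n_buckets=10):
    """Average sentence scores into n_buckets equal slices of the running time.

    timed_scores: iterable of (start_seconds, score) pairs, one per sentence,
                  with 0 <= start_seconds <= duration (assumed input shape).
    duration:     total running time in seconds (must be > 0).
    Returns a list of n_buckets floats; empty buckets get 0.0.
    """
    sums = [0.0] * n_buckets
    counts = [0] * n_buckets
    for start, score in timed_scores:
        # Clamp so a sentence starting exactly at `duration` lands in the last bucket.
        index = min(int(start / duration * n_buckets), n_buckets - 1)
        sums[index] += score
        counts[index] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Made-up scores for a 100-second clip, reduced to two slices.
scores = [(0, 1.0), (10, 0.5), (50, -0.5), (100, -1.0)]
print(bucket_sentiment(scores, duration=100, n_buckets=2))
```

Averaging per slice trades detail for comparability: single very negative or very positive sentences are smoothed out, which may or may not be wanted.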
Obama 2012 victory speech [Talk]
Persons/Locations/Organizations
In this case the word “America” is seen as the “Location”, “USA”.

Emotions
In this case the word “Fall” is seen as the emotion “Triumph”
(because of the context of the sentence).

The Pictures are random matches on the specific concept, grabbed from
Google Images/Flickr in real time.
Language Tech
(Don’t worry, only 2 slides about tech… but c’mon, it’s a Hackathon, isn’t it?)

❖ Analysis of Subtitles is done through:
    ❖ Language pre- and post-processing (tokenizing, removing stopwords,
      punctuation, etc.) [nltk]
    ❖ Part-of-Speech (POS) Tagging for identifying the grammar of a
      sentence [nltk POS tagger + Brown text corpus]
    ❖ Named Entity Recognition (NER) for identifying “objects” of
      interest: Persons, Locations, Organizations [Stanford’s NER tagger]
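
The pre-processing step above can be sketched roughly as follows. To stay self-contained this uses only the standard library as a stand-in for nltk, so the regular-expression tokenizer and the tiny `STOPWORDS` set are assumptions for illustration, not what the hack actually ran.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; a real run would use a full
# list such as the one that ships with nltk.
STOPWORDS = {"a", "an", "and", "as", "in", "is", "of", "the", "to"}


def tokenize(sentence):
    """Split a subtitle line into lowercase word tokens and punctuation marks."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", sentence.lower())


def preprocess(sentence):
    """Tokenize, then drop stopwords and punctuation-only tokens."""
    return [
        token for token in tokenize(sentence)
        if token not in STOPWORDS
        and not all(ch in string.punctuation for ch in token)
    ]


print(preprocess("Howard Beale is the mandarin of television!"))
```

Lowercasing suits sentiment lookups, but entity taggers usually lean on capitalization, so a pipeline like this would probably hand the original, un-lowercased sentence to the NER step.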
Language Tech (2)
❖ Analysis of Subtitles is done through:
    ❖ Sentiment extraction through a trained sentiment model,
      adapted/hacked to be more applicable to movie data [SentiWordNet
      (sentiment lexicon annotated on Princeton’s WordNet) + hack]
    ❖ Matching Emotions through many different techniques [Princeton’s
      WordNet annotated synset lexicon, all previous steps, and many…
      many… hacks]
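
A lexicon-based sentence score of the kind described above might look like the sketch below. The `TOY_LEXICON` values are made up and keyed by word, whereas SentiWordNet assigns positive and negative scores per synset; the averaging rule is likewise an assumption, and none of the movie-specific hacks are reproduced.

```python
import re

# Hypothetical toy lexicon: word -> (positive score, negative score), each in
# [0, 1]. Values are invented; SentiWordNet scores synsets, not bare words.
TOY_LEXICON = {
    "happy": (0.875, 0.0),
    "good": (0.75, 0.0),
    "victory": (0.5, 0.0),
    "mad": (0.0, 0.625),
    "bad": (0.0, 0.625),
    "hell": (0.0, 0.5),
}


def sentence_sentiment(sentence, lexicon=TOY_LEXICON):
    """Mean (positive - negative) score over the words found in the lexicon.

    Returns 0.0 when no word of the sentence is in the lexicon.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = [lexicon[w][0] - lexicon[w][1] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


print(sentence_sentiment("I'm as mad as hell"))        # negative
print(sentence_sentiment("A good and happy victory"))  # positive
```

A word-level lookup like this ignores negation and context (the “Fall” as “Triumph” case shown earlier), which is presumably where the extra hacks come in.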
Possible use cases
❖ Get sentiment and extracted emotional values from news broadcasts from different
  channels (e.g. Al Jazeera, CNN, Russia Today) and get a quick indication of their specific
  viewpoint (or “bias”) on a “news event”;

❖ Filter content by emotional thresholds (today I only want to read “Happy” news/
  items with an overall positive sentiment / emotional values);

❖ Plug movie/video content into the Semantic Web by linking extracted subtitle entities/
  chunks to their specific ontologies (adding ref header tags to movie information pages);

❖ Enable a richer user interaction by adding extra meta information to existing
  content and user interfaces;

❖ Enable smart semantic (textual) searching for non-textual content through feature
  extraction by some of the technologies showcased here.

❖ (…)
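
The “emotional thresholds” use case from the list above could be prototyped along these lines. The item layout (`sentiment` and `emotions` keys), the function name `filter_items`, and the sample headlines are all hypothetical; the scores are assumed to come from an analysis step like the ones described earlier.

```python
def filter_items(items, emotion, min_emotion=0.5, min_sentiment=0.0):
    """Keep items whose score for `emotion` and overall sentiment clear both thresholds."""
    return [
        item for item in items
        if item["emotions"].get(emotion, 0.0) >= min_emotion
        and item["sentiment"] >= min_sentiment
    ]


# Made-up items with made-up scores, purely for illustration.
news = [
    {"title": "Local team wins cup", "sentiment": 0.6, "emotions": {"happy": 0.8}},
    {"title": "Storm damages harbour", "sentiment": -0.7, "emotions": {"fear": 0.9}},
    {"title": "New park opens", "sentiment": 0.3, "emotions": {"happy": 0.4}},
]
print([item["title"] for item in filter_items(news, "happy")])
```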

Hackathon 2014 NLP Hack
