
Hackathon 2014 NLP Hack


A one-weekend software hack called "Movie Hack Attack".
Video content is played and analyzed in real time for sentiment, emotions, and more.

Sentiment is shown in the chart below. Emotions and objects of attention appear on the left, and persons, locations, and organizations on the right, each with a random picture grabbed from a Google search.

Published in: Technology

  1. VionLabs Movie & TV Hackathon, 24-26/01/14: Movie Hack Attack. Roelof Pieters
  2. Aim of the Hack (As it's really only a small hack, the real aim is slightly less grand: playing with Python as a REST API/server, making some nice stuff in a couple of days… and not spending time on that wasteful activity called "sleep")
     ❖ Provide extra content information to users while they watch a movie or TV show:
       ❖ Moods and sentiment of a movie
       ❖ Persons and places mentioned or featured in a movie
  3. And now… The Hack!
     Objects/Expressions: in this case the phrase "mandarin of television" is recognized as an Expression. The words "Mandarin" and "Television" are recognized as well, and matching pictures are searched for and shown.
     Persons/Locations/Organizations: in this case "Howard Beale" is seen as a "Person".
  4. The Network (1976) [Movie]
     Sentiment: the current positive or negative sentiment of each sentence over time (you can see it needs more normalization for long video items).
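
One possible way to approach the normalization mentioned for long videos is to average the per-sentence scores into a fixed number of time buckets, so a two-hour film and a ten-minute talk fill the same chart width. The sketch below is only an assumption about how this could be done, not what the hack did: `resample_scores` and `n_buckets` are hypothetical names, and scores are assumed to be plain numbers, one per subtitle sentence.

```python
from statistics import mean


def resample_scores(scores, n_buckets=20):
    """Average per-sentence scores into at most n_buckets equal-width buckets.

    Hypothetical helper: `scores` is a list with one sentiment value per
    subtitle sentence, in playback order.
    """
    if not scores:
        return []
    # Never ask for more buckets than there are sentences.
    n_buckets = min(n_buckets, len(scores))
    buckets = []
    for i in range(n_buckets):
        start = i * len(scores) // n_buckets
        end = (i + 1) * len(scores) // n_buckets
        buckets.append(mean(scores[start:end]))
    return buckets
```

Bucketing also smooths the sentence-to-sentence jitter, at the cost of hiding short emotional spikes. A moving average would be an alternative if the chart should keep its original length.
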
  5. Obama 2012 victory speech [Talk]
     Persons/Locations/Organizations: in this case the word "America" is seen as the "Location" "USA".
     Emotions: in this case the word "Fall" is seen as the emotion "Triumph" (because of the context of the sentence).
     The pictures are random matches on the specific concept, grabbed from Google Image/Flickr in real time.
  6. Language Tech (Don't worry, only 2 slides about tech… but c'mon, it's a hackathon, isn't it?)
     ❖ Analysis of subtitles is done through:
       ❖ Language pre- and post-processing (tokenizing, removing stopwords and punctuation, etc.) [nltk]
       ❖ Part-of-Speech (POS) tagging for identifying the grammar of a sentence [nltk POS tagger + Brown text corpus]
       ❖ Named Entity Recognition (NER) tagging for identifying "objects" of interest: persons, locations, organizations [Stanford's NER tagger]
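
The pre-processing step listed on this slide (tokenizing, dropping stopwords and punctuation) can be sketched with the standard library alone. This is a stand-in for illustration, not the hack's code, which used nltk: the `preprocess` function and the tiny `STOPWORDS` set are assumptions. POS and NER tagging need trained models and are left out here.

```python
import re

# Tiny illustrative stopword list (an assumption; nltk ships a much fuller one).
STOPWORDS = {"a", "an", "the", "of", "is", "in", "and", "to", "as"}


def preprocess(subtitle_line):
    """Lowercase a subtitle line, tokenize it, and drop stopwords.

    Punctuation is removed implicitly because the regex keeps only runs of
    letters, digits, and apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", subtitle_line.lower())
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("The mandarin of television!"))
```
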
  7. Language Tech (2)
     ❖ Analysis of subtitles is done through:
       ❖ Sentiment extraction through a trained sentiment model, adapted/hacked to be more applicable to movie data [SentiWordNet (a sentiment lexicon annotated over Princeton's WordNet synsets) + hack]
       ❖ Matching emotions through many different techniques [Princeton's WordNet annotated synset lexicon, all previous steps, and many… many… hacks]
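
As a rough illustration of lexicon-based sentiment scoring, the sketch below looks up each token in a word list that carries a positive and a negative score, in the spirit of SentiWordNet's per-entry scores. Everything here is assumed for the example: the `TOY_LEXICON` values are invented rather than taken from SentiWordNet, `sentence_sentiment` is a hypothetical name, and the movie-specific hacks the slide mentions are not reproduced.

```python
# Toy lexicon: word -> (positive score, negative score), both in [0, 1].
# Values are invented for illustration, not taken from SentiWordNet.
TOY_LEXICON = {
    "happy": (0.75, 0.0),
    "triumph": (0.5, 0.0),
    "mad": (0.0, 0.5),
    "hell": (0.0, 0.75),
}


def sentence_sentiment(tokens, lexicon=TOY_LEXICON):
    """Return the mean (positive - negative) score over tokens in the lexicon.

    Tokens missing from the lexicon are ignored; a sentence with no known
    tokens scores 0.0 (neutral).
    """
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 0.0
    return sum(pos - neg for pos, neg in hits) / len(hits)


print(sentence_sentiment(["mad", "as", "hell"]))
```

A per-sentence score like this is what a sentiment-over-time chart would plot, one point per subtitle line.
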
  8. Possible use cases
     ❖ Get sentiment and extracted emotional values from news broadcasts on different channels (e.g. Al Jazeera, CNN, Russia Today) and get a quick indication of their specific viewpoint (or "bias") on a "news event";
     ❖ Filter content by emotional thresholds (today I only want to read "happy" news/items with an overall positive sentiment/emotional value);
     ❖ Plug movie/video content into the Semantic Web by linking extracted subtitle entities/chunks to their specific ontologies (adding ref header tags to movie information pages);
     ❖ Enable richer user interaction by adding extra meta information to existing content and user interfaces;
     ❖ Enable smart semantic (textual) searching of non-textual content through feature extraction by some of the technologies showcased here.
     ❖ (…)
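
The "filter content by emotional thresholds" use case could look something like the sketch below. It is a minimal illustration under stated assumptions: the `Item` type, the `filter_by_sentiment` function, and the `minimum` cutoff of 0.2 are all hypothetical, and each item is assumed to already carry an overall sentiment score in [-1, 1].

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A news or video item with a precomputed overall sentiment (hypothetical)."""
    title: str
    sentiment: float  # assumed to lie in [-1, 1]


def filter_by_sentiment(items, minimum=0.2):
    """Keep only items whose overall sentiment is at least `minimum`."""
    return [item for item in items if item.sentiment >= minimum]


feed = [Item("Local team wins cup", 0.6), Item("Storm damages harbour", -0.4)]
print([item.title for item in filter_by_sentiment(feed)])
```
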