Using opinion mining techniques for early crisis detection


Transcript

  • 1. “Al. I. Cuza” University of Iasi, Romania, Faculty of Computer Science. Adrian Iftene, Alexandru Lucian Gînscă. ICCCC 2012, 8-12 May, Băile Felix, Oradea, Romania
  • 2. System overview; Data acquisition; Topic detection; Data processing; Identification of opinions; Results; Visualization; Conclusions
  • 4. Scenario: street protests in Romania (13 to 26 January 2012). Acquisition uses a crawler component and RSS feeds. Scraping removes links, photos, menus, and special characters. The data is stored locally.
  • 5. The topic is very important in detecting articles referring to a crisis situation. Latent Dirichlet Allocation (LDA): state-of-the-art topic model. Problems: • The number of topics must be specified from the start • The results are lists of representative words for each topic, so human intervention is needed to interpret them. Solution: WordNet-based similarity measures • Wu-Palmer • Lin • Resnik (best results)
  • 6. Computing the similarity between two sets of words: T1, T2 = two sets of words; sim(t1, t2) = one of the Wu and Palmer, Resnik, or Lin similarity measures.
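The set-level formula from slide 6 did not survive in this transcript, so the following is only a minimal sketch of one common way to lift a word-level measure sim(t1, t2) to two word sets. The aggregation (symmetric average of best matches) is an assumption, not the authors' formula, and `toy_sim` with its hand-written scores is a hypothetical stand-in for a WordNet measure such as Wu and Palmer, Resnik, or Lin.

```python
from typing import Callable, Iterable

# Hypothetical word-pair scores standing in for a WordNet-based measure.
TOY_SCORES = {
    frozenset(("protest", "riot")): 0.8,
    frozenset(("street", "road")): 0.9,
}


def toy_sim(a: str, b: str) -> float:
    """Stand-in for sim(t1, t2); identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def set_similarity(t1: Iterable[str], t2: Iterable[str],
                   sim: Callable[[str, str], float]) -> float:
    """Assumed aggregation: average best match, taken in both directions."""
    words1, words2 = list(t1), list(t2)
    if not words1 or not words2:
        return 0.0

    def directed(src, dst):
        # For every word in src, keep its best score against dst.
        return sum(max(sim(a, b) for b in dst) for a in src) / len(src)

    return 0.5 * (directed(words1, words2) + directed(words2, words1))


score = set_similarity(["protest", "street"], ["riot", "road"], toy_sim)
```

Because the word-level measure is passed in as a function, the same aggregation can be reused with any of the three measures named on the slide.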
  • 7. LDA results for our street protests corpus when tracking 3 topics
  • 8. Language-specific resources that contain cities (Iasi, Bucuresti, Ploiesti, etc.) and regions (Bucovina, Moldova, Transilvania, etc.) (Iftene et al., 2011). Introducing a more localized approach: new resources and rules for identifying streets (Iasi, Bulevardul Independentei; Bucuresti, Calea Victoriei; etc.) and smaller inner-city regions (Pacurari district, center of Iasi, Arch of Triumph Square). Example rules: to identify streets (Street + entity, Boulevard + entity, etc.); to identify small regions (the area between street A and street B, or the area of building A).
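A rule of the form "Street + entity" from slide 8 can be approximated with a regular expression. This is a minimal sketch, not the authors' rule set: the trigger words in `TRIGGERS` and the helper `find_streets` are illustrative assumptions, and the pattern only accepts names whose words start with an unaccented capital letter.

```python
import re

# Assumed trigger words (Romanian street-type prefixes); not the authors' list.
TRIGGERS = ("Strada", "Bulevardul", "Calea", "Soseaua")

# Trigger word followed by one or more capitalized tokens (the "entity").
STREET_PATTERN = re.compile(
    r"\b(" + "|".join(TRIGGERS) + r")\s+([A-Z][\w-]*(?:\s+[A-Z][\w-]*)*)"
)


def find_streets(text: str) -> list:
    """Return every 'trigger + entity' match found in the text."""
    return [f"{m.group(1)} {m.group(2)}" for m in STREET_PATTERN.finditer(text)]


sample = ("Protesters marched on Bulevardul Independentei "
          "and later on Calea Victoriei in the evening.")
streets = find_streets(sample)
# streets == ["Bulevardul Independentei", "Calea Victoriei"]
```

A pattern like this cannot tell a street named after a person from the person, which is the ambiguity slide 9 reports for the classification step.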
  • 9. 538 files with 2,806 entities of “street” and “area” types. The overall quality of the NE identification component is around 92%, and the quality of the NE classification component is around 67%. Problems: ◦ incorrect spelling ◦ anaphora resolution ◦ ambiguous situations in which the context does not show whether the NE is a person name or a street name
  • 10. Rule-based opinion mining system (Gînscă et al., 2011). Easily adaptable from one crisis scenario to another, in contrast with a statistical approach. Uses manually built resources to identify opinion keywords (good, bad, etc.), amplifiers (most, more, etc.), diminishers (less, etc.), and negation (not, never, etc.). Calculates the valences for groups of feelings and pairs named entities with scores based on distance, punctuation, and context. Uses a dedicated vocabulary for a specific crisis situation, with 21 initial words (protest, conflict, fight, etc.) plus similar words from WordNet (synonyms, hypernyms, etc.).
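The interplay of opinion keywords, amplifiers, diminishers, and negation on slide 10 can be illustrated with a small valence calculator. This is a sketch under stated assumptions: the word lists, the numeric weights, and the function `score_tokens` are hypothetical, and the pairing of scores with named entities is left out.

```python
# Hypothetical lexicons and weights; the original resources are manually built
# and are not reproduced in the slides.
OPINION = {"good": 1.0, "peaceful": 1.0, "bad": -1.0, "violent": -1.0}
AMPLIFIERS = {"most": 2.0, "more": 1.5, "very": 1.5}
DIMINISHERS = {"less": 0.5}
NEGATIONS = {"not", "never"}


def score_tokens(tokens):
    """Sum the valences of opinion words, adjusted by the modifiers before them."""
    total = 0.0
    weight = 1.0      # product of pending amplifiers and diminishers
    negated = False   # True while a negation is pending
    for token in tokens:
        word = token.lower()
        if word in NEGATIONS:
            negated = True
        elif word in AMPLIFIERS:
            weight *= AMPLIFIERS[word]
        elif word in DIMINISHERS:
            weight *= DIMINISHERS[word]
        else:
            if word in OPINION:
                valence = OPINION[word] * weight
                total += -valence if negated else valence
            # Any word that is not a modifier clears the pending modifiers.
            weight, negated = 1.0, False
    return total


score = score_tokens("the protest was not very good".split())
# score == -1.5: "very" scales "good" to 1.5, then "not" flips the sign
```

Adapting such a system to another crisis scenario amounts to swapping the dictionaries, which is the adaptability argument the slide makes against a statistical approach.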
  • 11. Greedy approach: iteratively adding intermediate green points to the current path until the solution cannot be improved. Advantages: we reduce the search space for optimal routes, and the Greedy solution is obtained very fast. Disadvantages: the Greedy solution is close to, but not necessarily equal to, the optimal solution.
  • 12. [Chart: cumulated sentiment values by day, days 13 to 25; vertical axis from -40 to 30]
  • 13. [Chart: location-type entity mentions by day, days 13 to 25; vertical axis from 0 to 250]
  • 14. GoogleMaps API. Our algorithm is able to find another (longer) path that passes near, rather than through, the red islands and prefers the ways near the green islands. Thus, at every step it is possible to apply penalties when the partial solution crosses red islands (potential risk) and to add bonuses when it crosses green islands (no potential risk).
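The Greedy insertion from slide 11, combined with the penalties and bonuses described on slide 14, can be sketched on a flat 2D plane. Everything concrete here is an assumption made for illustration: red islands are modeled as circles of a fixed radius, green islands as single points, and the names `path_cost` and `greedy_route` and the penalty and bonus values are hypothetical. The real system works on map routes through the GoogleMaps API.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def path_cost(path, green, red, radius, penalty, bonus):
    """Path length, plus a penalty per red crossing, minus a bonus per green point."""
    cost = 0.0
    for a, b in zip(path, path[1:]):
        cost += math.hypot(b[0] - a[0], b[1] - a[1])
        cost += penalty * sum(
            1 for r in red if point_segment_distance(r, a, b) <= radius)
    cost -= bonus * sum(1 for p in path if p in green)
    return cost


def greedy_route(start, end, green, red, radius=1.0, penalty=10.0, bonus=0.5):
    """Insert the best green point at each step until the cost stops improving."""
    path = [start, end]
    unused = list(green)
    best = path_cost(path, green, red, radius, penalty, bonus)
    while True:
        improvement = None
        for g in unused:
            for i in range(1, len(path)):
                trial = path[:i] + [g] + path[i:]
                cost = path_cost(trial, green, red, radius, penalty, bonus)
                if cost < best:
                    best, improvement = cost, (g, trial)
        if improvement is None:
            return path, best
        chosen, path = improvement
        unused.remove(chosen)


route, cost = greedy_route((0, 0), (10, 0),
                           green=[(5, 3), (5, -8)], red=[(5, 0)])
# route == [(0, 0), (5, 3), (10, 0)]: the detour avoids the red island at (5, 0)
```

Each step only evaluates single-point insertions, which keeps the search small and fast but gives no guarantee that the final route is optimal, matching the trade-off stated on slide 11.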
  • 17. When there are no green islands, we must specify another method for selecting intermediate points in order to improve the quality of the current solution. The GoogleMaps API is able to place streets and boulevards on the map, but not specific squares and areas. For those cases we built an additional resource that specifies their GIS coordinates.
  • 18. We present a system that can be easily adapted from one crisis situation to another (by changing the dictionaries and the topics of interest). Efficient topic identification using LDA. Suggestive visualization using the GoogleMaps API.