Maritime safety events extraction from news articles

This is a presentation for my master's thesis. The system created for the thesis extracts maritime safety events from news articles, mainly using text classification and information extraction.

Published in: Business, Technology

Transcript

  • 1. Vrije Universiteit, MSc Information Sciences. Maritime Safety Events Extraction from News Articles. Anastasios Martidis, anastasios.martidis@student.vu.nl. July 31, 2012. Supervisors: Willem R. van Hage, Dr; Davide Ceolin, MSc
  • 2. Outline: Introduction; Information Spectrum; Problem Statement; Significance of Research; Research Questions; Hypotheses; Materials and Methods; Training Sets; System Overview; Test Sets; Evaluation; Results; Conclusions
  • 3. Introduction: “We are drowning in information, and starved for knowledge.” John Naisbitt
  • 4. Information Spectrum. Structured data: Automatic Identification System (AIS) (image: theoceandreamer.files.wordpress.com/2011/03/img_21861.jpg). Free text: news articles (image: http://www.tideway.nl/images/NorthWestEveningMail-PortSettoRockasTurbinesGetBoostfromaRollingstone-Walney2010-kleinbestan.jpg)
  • 5. Problem Statement. News articles are descriptive and informative, but they are vast in number, growing and updated daily, and written as free text that is difficult to process automatically. Generic Natural Language Processing tools are popular and useful, but they have limitations in recognizing specific types of maritime safety events and ship names.
  • 6. Significance of the Research. Applications: risk assessments; improvement of vessel safety standards; port facility security assessments; recognition of problematic areas (piracy); identification of shipping companies, ships and ship constructors with a history of maritime safety events; maritime education and training. Potential stakeholders: ship owners, operators and managers; insurance companies; Coast Guard; International Maritime Organization (IMO); International Maritime Security (IMS); Private Security Companies (PSCs)
  • 7. Research Questions. (1) Can we automatically process a news article in order to determine if it concerns a maritime safety event? (2) Can we automatically extract a description of a maritime safety event? The objective of the description is to automatically recognize the type of maritime safety event, the ships involved, the location, and the date and time. (3) Can we recognize relations and significance of the extracted information from the text? (3a) Can we recognize the dominant event? The dominant event is the event that is primarily described in the news article. (3b) Can we identify relations between extracted locations and specific event types described in the text?
  • 8. Hypotheses. (1) We can define sets of keywords that, if present in certain combinations in the text being processed, indicate that it concerns a maritime safety event. (2) We can extract a description of the event described in the news article using rule-based text classification and sets of keywords, datasets of ship names, regular expression matching and Named Entity Recognition tasks. (3) We can evaluate the information extracted from the text by (3a) identifying the dominant event by measuring the frequency of keyword indicators for each event type, and (3b) recognizing relations between locations and event types by examining the positions of locations and event type indicators in the text.
  • 9. Materials & Methods: rule-based text classification; information extraction; OpenCalais; NLTK; AIS; DBpedia
  • 10. Training Set. 200 news articles (retrieved from CBS News): 100 related to maritime safety (53,937 tokens) and 100 of general domains (47,053 tokens). Figure: word frequency for the maritime safety related and the general domain articles.
  • 11. Training Set Outcomes: manual discrimination of significant words; categorization into sets of keywords by their meaning; use of keywords for text classification; mapping of keywords onto maritime safety event types; use of keywords as event type indicators
  • 12. Text Classification. Input: document D. Lists of keywords: L1, most frequent keywords; L2, safety related keywords; L3, vessel type keywords; L4, maritime related keywords; L5, naval hierarchy keywords; L6, part of ship keywords; L7, water based locations keywords
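A rule of this kind can be sketched in Python. This is only an illustration: the slides name the keyword categories but give neither their contents nor the exact combinations used, so the three sample lists and the rule in `is_maritime_safety` are assumptions, as are all function names.

```python
import re

# Hypothetical keyword lists: the slides name the categories (L1..L7)
# but do not show which words each list contains.
KEYWORD_LISTS = {
    "safety": {"sank", "collision", "pirates", "rescued", "capsized"},
    "vessel_type": {"ship", "tanker", "ferry", "vessel", "freighter"},
    "maritime": {"port", "crew", "coast", "harbor", "cargo"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matched_lists(text):
    """Return the names of the keyword lists with at least one hit in the text."""
    tokens = set(tokenize(text))
    return {name for name, words in KEYWORD_LISTS.items() if tokens & words}


def is_maritime_safety(text):
    """Assumed rule: a safety keyword plus a vessel-type or maritime keyword."""
    hits = matched_lists(text)
    return "safety" in hits and bool(hits & {"vessel_type", "maritime"})
```

Requiring a combination of lists, rather than a single hit, is what keeps a sentence such as "Shares sank after the earnings report." out of the maritime class in this sketch.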
  • 13. Event Type Recognition. Input: document D. Event types (ET): Piracy; Capsizing; Sinking; Drifting; Oil spill; Leakage; Fire/Explosion; Evacuation; Grounding; Collision
  • 14. Ship Names Extraction. Dataset of ship names retrieved from AIS messages and DBpedia. Comparison of the dataset entries to the text. Compromises: location names; parts of names
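The dataset comparison can be sketched as a whole-word lookup. The four ship names below are a hypothetical sample, not the thesis dataset, and `extract_ship_names` is an illustrative name; the thesis system may have matched differently.

```python
import re

# Hypothetical sample of ship names; the thesis dataset was retrieved
# from AIS messages and DBpedia and is not reproduced in the slides.
SHIP_NAMES = ["Costa Concordia", "Rena", "Maersk Alabama", "Victoria"]


def extract_ship_names(text, ship_names=SHIP_NAMES):
    """Return dataset names that occur in the text as whole words, longest first."""
    found = []
    for name in sorted(ship_names, key=len, reverse=True):
        # re.escape guards against names containing regex metacharacters.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text):
            found.append(name)
    return found
```

In this sketch, "Police in Victoria reported the incident." still returns `["Victoria"]`, which illustrates the location-name compromise listed on the slide: a plain dataset lookup cannot tell a ship from a place with the same name.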
  • 15. Locations Extraction. Use of OpenCalais for NER tasks; interested in locations only. Four types of locations recognized by Calais: Continent; Country; City; Province or State
  • 16. Date and Time Extraction. Chunked sentences. Pattern matching using regular expressions: numeric representation of date (e.g., 1322012, 22-07-12); months (e.g., January or Jan.); days (e.g., Monday or Mon.); day periods (e.g., morning, afternoon); time (e.g., 11:00am or 11.00 a.m.). Presented in a specific order for each sentence
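The pattern families listed on this slide can be approximated with Python's `re` module. The expressions below are assumptions written for illustration, not the thesis patterns; they cover delimited numeric dates such as 22-07-12 but not undelimited ones such as 1322012, and the fixed output order (date, month, day, period, time) is likewise an assumed choice.

```python
import re

# Assumed patterns, applied in a fixed order so that every sentence
# yields its matches grouped the same way.
PATTERNS = [
    ("date", re.compile(r"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}\b")),
    ("month", re.compile(
        r"\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
        r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b")),
    ("day", re.compile(
        r"\b(?:Mon|Tues?|Wed(?:nes)?|Thu(?:rs)?|Fri|Sat(?:ur)?|Sun)(?:day)?\b")),
    ("period", re.compile(r"\b(?:morning|afternoon|evening|night)\b", re.IGNORECASE)),
    ("time", re.compile(r"\b\d{1,2}[:.]\d{2}\s?(?:a\.m\.|p\.m\.|am|pm)", re.IGNORECASE)),
]


def extract_datetime(sentence):
    """Return (label, matched text) pairs for one sentence, grouped by pattern order."""
    results = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            results.append((label, match.group(0)))
    return results
```

The function expects a single sentence; splitting the article into sentences first (the slides mention chunked sentences) is left outside this sketch.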
  • 17. Dominant Event Recognition. For each list of event type indicator keywords, sum the occurrences of its keywords in the text. The event type with the highest sum is predicted as the dominant event
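The frequency rule described on this slide can be sketched as follows. The indicator keywords are hypothetical placeholders for the thesis lists, and returning `None` when no indicator occurs is an added assumption.

```python
import re
from collections import Counter

# Hypothetical indicator keywords for a few of the event types.
EVENT_INDICATORS = {
    "Piracy": {"pirates", "hijacked", "ransom"},
    "Sinking": {"sank", "sinking", "sunk"},
    "Collision": {"collided", "collision"},
    "Fire/Explosion": {"fire", "explosion", "blaze"},
}


def event_scores(text):
    """Sum the occurrences of each event type's indicator keywords in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        event: sum(counts[word] for word in words)
        for event, words in EVENT_INDICATORS.items()
    }


def dominant_event(text):
    """Return the event type with the highest sum, or None if nothing matched."""
    scores = event_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Ties are resolved here by dictionary order, which is arbitrary; the slides do not say how the thesis system handled them.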
  • 18. Location to Event Relations. Chunked sentences. For every sentence containing an extracted location, if a keyword indicator of an event type also occurs in the same sentence, then the location is predicted to be related to that event type
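The same-sentence co-occurrence rule can be sketched as below. The locations are passed in as a plain list (in the thesis they come from OpenCalais), the sentence splitter is a naive stand-in for a proper tokenizer, and the indicator keywords are hypothetical.

```python
import re

# Hypothetical indicator keywords for two event types.
EVENT_INDICATORS = {
    "Piracy": {"pirates", "hijacked"},
    "Grounding": {"aground", "grounded"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def relate_locations_to_events(text, locations):
    """Pair each location with every event type indicated in the same sentence."""
    relations = set()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for location in locations:
            if location not in sentence:
                continue
            for event, words in EVENT_INDICATORS.items():
                if tokens & words:
                    relations.add((location, event))
    return relations
```

A location mentioned in a sentence without any indicator keyword, such as a port where the crew was taken afterwards, produces no relation under this rule.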
  • 19. Test Set. 200 news articles (BBC, Reuters): 100 maritime safety related and 100 of general domains (50 of them selected in an attempt to mislead the system). Each news article was manually labeled and automatically processed by the system. The system's results were compared to the labeled news articles
  • 20. Labeled News Article
  • 21. Results of the System
  • 22. Evaluation
  • 23. Results: Text Classification. Precision: 100%; Recall: 100%; F-measure: 100%
  • 24. Results: Event Type Recognition. Precision: 88%; Recall: 97%; F-measure: 92.2%
  • 25. Results: Ship Name Extraction. Precision: 18.5%; Recall: 45.3%; F-measure: 26.3%
  • 26. Results: Location Extraction. Precision: 88.5%; Recall: 74.7%; F-measure: 81%
  • 27. Results: Date and Time Extraction. Precision: 95.3%; Recall: 89.4%; F-measure: 92.3%
  • 28. Results: Dominant Event Recognition. Precision: 92%; Recall: 92%; F-measure: 92%
  • 29. Results: Location to Event Relations. Precision: 81%; Recall: 67.8%; F-measure: 73.8%
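The slides do not state which F-measure was used. Assuming the balanced F1 score, the harmonic mean of precision and recall, the reported figures can be recomputed with a short helper (`f_measure` is an illustrative name):

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Ship name extraction, with the percentages reported on slide 25.
ship_names_f1 = round(f_measure(18.5, 45.3), 1)
```

Because F1 is a harmonic mean, it stays close to the lower of the two inputs, which is why the low precision of ship name extraction dominates that task's score.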
  • 30. Conclusions. The system accomplished the extraction of maritime safety events from news articles. Overall performance of the system was satisfactory. The system can be improved and refined. Ship name extraction requires a different approach
  • 31. Vrije Universiteit, MSc Information Sciences. Maritime Safety Events Extraction from News Articles. Anastasios Martidis, anastasios.martidis@student.vu.nl. July 31, 2012. Supervisors: Willem R. van Hage, Dr; Davide Ceolin, MSc