Vrije Universiteit
MSc Information Sciences




     Maritime Safety Events Extraction
            from News Articles

                  Anastasios Martidis
            anastasios.martidis@student.vu.nl
                        July 31, 2012
   Supervisors:
   Willem R. van Hage, Dr
   Davide Ceolin, MSc
                                                1
Outline

• Introduction
• Information Spectrum
• Problem Statement
• Significance of Research
• Research Questions
• Hypotheses
• Materials and Methods
• Training Sets
• System Overview
• Test Sets
• Evaluation
• Results
• Conclusions
                                           2
Introduction


• “We are drowning in information, and starved for knowledge.”
  John Naisbitt




                                        3
Information Spectrum


Structured Data: Automatic Identification System (AIS)
Image source: theoceandreamer.files.wordpress.com/2011/03/img_21861.jpg

Free Text: News Articles
Image source: http://www.tideway.nl/images/NorthWestEveningMail-PortSettoRockasTurbinesGetBoostfromaRollingstone-Walney2010-kleinbestan.jpg
                                                                                                                                     4
Problem Statement

• News Articles:
  - Descriptive and informative, but…
  - Vast in number, growing and updated daily
  - Free text, difficult to process automatically
• Generic Natural Language Processing tools:
  - Popular and useful, but…
  - Limited in recognizing specific types of maritime safety
    events and ship names
                                                  5
Significance of the Research

Applications
• Risk assessments
• Improvement of vessel safety standards
• Port facility security assessments
• Recognition of problematic areas (Piracy)
• Identification of shipping companies, ships and ship
  constructors with a history of maritime safety events
• Maritime education and training

Potential Stakeholders
• Ship owners, operators and managers
• Insurance Companies
• Coast Guard
• International Maritime Organization (IMO)
• International Maritime Security (IMS)
• Private Security Companies (PSCs)


                                                          6
Research Questions

1.   Can we automatically process a news article in order to
     determine if it concerns a maritime safety event?

2.   Can we automatically extract a description of a maritime
     safety event? The objective of the description is to
     automatically recognize the type of maritime safety event,
     ships involved, location, date and time.

3.   Can we recognize relations and significance of the
     extracted information from the text?
       - Can we recognize the dominant event? The dominant
         event is the event that is primarily described in the
         news article.
       - Can we identify relations between extracted locations
         and specific event types described in the text?

                                                                   7
Hypotheses

1.   We can define sets of keywords that, when present in
     certain combinations in the text under processing, indicate
     that it concerns a maritime safety event.

2.   We can extract a description of the event described in
     the news article using rule-based text classification,
     sets of keywords, datasets of ship names, regular
     expression matching and Named Entity Recognition tasks.

3.   We can evaluate the extracted information from the text by:
      - identifying the dominant event by measuring the
        frequency of keyword indicators for each event type
      - recognizing relations between locations and event types
        by examining the positions of locations and event type
        indicators in the text

                                                                   8
Materials & Methods

• Rule-Based Text Classification
• Information Extraction
• OpenCalais
• NLTK
• AIS
• DBpedia




                                   9
Training Set

• 200 news articles (retrieved from CBS News)
• 100 related to maritime safety (53,937 tokens)
• 100 of general domains (47,053 tokens)
• Word frequency (figure panels: Maritime Safety Related, General Domains)




                                                  10
Training Set Outcomes

• Manual identification of significant words
• Categorization of the words into sets of keywords by their
  meaning
• Use of keywords for text classification
• Mapping of keywords to maritime safety event types
• Use of keywords as event type indicators



                                                11
Text Classification

• Document D
• Lists of keywords:
  L1, most frequent keywords
  L2, safety related keywords
  L3, vessel type keywords
  L4, maritime related keywords
  L5, naval hierarchy keywords
  L6, part of ship keywords
  L7, water based locations keywords
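
The slide names the keyword lists, but the rule that combines them is not shown in the extracted text. The following is a minimal sketch only, assuming a hypothetical rule (at least one safety keyword plus at least one keyword from any other list) and made-up sample keywords; the actual lists and rule used in the thesis may differ.

```python
import re

# Hypothetical sample keywords; the real lists were derived from the training set.
KEYWORD_LISTS = {
    "safety": {"sank", "collision", "pirates", "fire", "rescued"},
    "vessel_type": {"tanker", "ferry", "freighter", "trawler"},
    "maritime": {"port", "crew", "coast guard", "cargo"},
    "water_location": {"sea", "gulf", "strait", "ocean"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def matched_lists(text):
    """Return the names of the keyword lists with at least one hit in the text."""
    tokens = tokenize(text)
    joined = " ".join(tokens)
    hits = set()
    for name, keywords in KEYWORD_LISTS.items():
        for kw in keywords:
            # Multi-word keywords are matched as phrases, single words as tokens.
            if (" " in kw and kw in joined) or kw in tokens:
                hits.add(name)
                break
    return hits

def is_maritime_safety(text):
    """Assumed rule: a safety keyword AND a keyword from some other list."""
    hits = matched_lists(text)
    return "safety" in hits and len(hits - {"safety"}) >= 1
```

Requiring a combination of lists, rather than any single keyword, is what keeps a sentence such as "The company will fire ten employees" from being classified as a maritime safety event in this sketch.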



                                       12
Event Type Recognition

• Document D
• Event Types (ET):
  - Piracy
  - Sinking
  - Oil spill
  - Fire/Explosion
  - Grounding
  - Capsizing
  - Drifting
  - Leakage
  - Evacuation
  - Collision




                                    13
Ship Names Extraction

• Dataset of ship names retrieved from AIS
  messages and DBpedia
• Comparison of the dataset entries to the
  text
• Compromises
  - Location names
  - Part of names
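
The dictionary lookup described on this slide can be sketched as follows. This is an illustration under assumptions, not the thesis implementation: the three dataset entries and the sample sentence are made up, and whole-phrase matching with word boundaries is one possible way to compare entries to the text.

```python
import re

# Hypothetical sample of a ship-name dataset (the real one came from
# AIS messages and DBpedia).
SHIP_NAMES = {"Costa Concordia", "Rena", "Rotterdam"}

def extract_ship_names(text, ship_names=SHIP_NAMES):
    """Return the dataset entries that occur in the text as whole words or phrases."""
    found = set()
    for name in ship_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text):
            found.add(name)
    return found

# Made-up sample sentence for illustration.
article = ("The Costa Concordia capsized off Giglio; "
           "salvage experts flew in from Rotterdam.")
print(sorted(extract_ship_names(article)))
```

In this sample the lookup returns both "Costa Concordia" and "Rotterdam", although the latter is used as a city. That is the kind of location-name compromise listed on the slide, and it is a plausible contributor to false positives with this approach.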




                                             14
Locations Extraction

• Use of OpenCalais for NER tasks
• Interested in locations only
• Four types of locations recognized by
  Calais:
  - Continent
  - Country
  - City
  - Province or State




                                          15
Date and Time Extraction

• Chunked sentences
• Pattern matching using regular
  expressions
  - Numeric representation of date (e.g., 1322012, 22-07-12)
  - Months (e.g., January or Jan.)
  - Days (e.g., Monday or Mon.)
  - Day periods (e.g., morning, afternoon)
  - Time (e.g., 11:00am or 11.00 a.m.)

• Results presented in a specific order for each
  sentence
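
The pattern categories above can be sketched with Python's `re` module. The expressions below are assumptions written for this illustration (the thesis patterns are not shown on the slide); they cover separated numeric dates such as 22-07-12, and the input is assumed to be a single sentence that has already been split from the article.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|September|"
          "October|November|December|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec")
DAYS = ("Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday|"
        "Mon|Tue|Wed|Thu|Fri|Sat|Sun")

# The list order fixes the order in which results are reported per sentence.
PATTERNS = [
    ("numeric_date", re.compile(r"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}\b")),
    ("month", re.compile(r"\b(?:" + MONTHS + r")\b\.?")),
    ("day", re.compile(r"\b(?:" + DAYS + r")\b\.?")),
    ("day_period", re.compile(r"\b(?:morning|afternoon|evening|night)\b",
                              re.IGNORECASE)),
    ("time", re.compile(r"\b\d{1,2}[:.]\d{2}\s?(?:a\.m\.|p\.m\.|am|pm)",
                        re.IGNORECASE)),
]

def extract_datetime(sentence):
    """Return (category, matched text) pairs, grouped in PATTERNS order."""
    return [(label, match.group(0))
            for label, regex in PATTERNS
            for match in regex.finditer(sentence)]
```

A call such as `extract_datetime("Reported on Jan. 13 at 11:00am.")` yields the month and time matches in that fixed order. Month and day names are matched case-sensitively here so that the modal verb "may" in running text is not picked up as a month.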
                                                                16
Dominant Event Recognition

• For each list of event type indicator
  keywords
• Sum the keyword occurrences in the text
• Event type with the highest sum is
  predicted as the dominant event
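
The three steps above amount to a count followed by an argmax. A minimal sketch, assuming made-up indicator keywords for three of the event types and simple lowercase word tokens; the tie-breaking behaviour (first event type in dictionary order) is also an assumption of this sketch.

```python
import re
from collections import Counter

# Hypothetical indicator keywords per event type.
EVENT_INDICATORS = {
    "Piracy": {"pirates", "hijacked", "ransom"},
    "Sinking": {"sank", "sinking", "sunk"},
    "Collision": {"collided", "collision"},
}

def event_scores(text, indicators=EVENT_INDICATORS):
    """Sum the occurrences of each event type's indicator keywords in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {event: sum(counts[kw] for kw in keywords)
            for event, keywords in indicators.items()}

def dominant_event(text, indicators=EVENT_INDICATORS):
    """Return the event type with the highest sum, or None if nothing matched."""
    scores = event_scores(text, indicators)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For a text like "The tanker collided with a ferry. After the collision the ferry sank.", the Collision indicators occur twice and the Sinking indicators once, so Collision is predicted as the dominant event.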




                                           17
Location to Event Relations

• Chunked sentences
• For every sentence containing an
  extracted location, if a keyword indicator
  of an event type also occurs in the same
  sentence,
• then the location is predicted to be
  related to the event type
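
The sentence-level co-occurrence rule can be sketched as below. The function takes sentences and locations as inputs, on the assumption that sentence splitting and location extraction have already been done elsewhere; the indicator keywords and sample sentences are made up.

```python
import re

# Hypothetical indicator keywords per event type.
EVENT_INDICATORS = {
    "Piracy": {"pirates", "hijacked", "ransom"},
    "Sinking": {"sank", "sinking", "sunk"},
}

def relate_locations_to_events(sentences, locations, indicators=EVENT_INDICATORS):
    """Pair each location with every event type indicated in the same sentence."""
    relations = set()
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        events_here = {event for event, keywords in indicators.items()
                       if tokens & keywords}
        for location in locations:
            if location in sentence:
                relations.update((location, event) for event in events_here)
    return relations

sentences = ["Pirates hijacked a tanker off Somalia.",
             "The crew was later flown to Kenya."]
relations = relate_locations_to_events(sentences, ["Somalia", "Kenya"])
```

Here only Somalia is related to Piracy, because Kenya appears in a sentence without any event indicator. A rule of this kind misses relations that span sentence boundaries.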



                                               18
Test Set

• 200 news articles (BBC, Reuters)
• 100 maritime safety related
• 100 of general domains (50 of them
  selected as an attempt to mislead the
  system)
• Each news article manually labeled and
  automatically processed by the system
• Comparison of the results to the labeled
  news article
                                             19
Labeled News Article




                       20
Results of the System




                        21
Evaluation




             22
Results: Text Classification



Precision: 100%
Recall: 100%
F-measure: 100%




                                23
Results: Event Type Recognition



Precision: 88%
Recall: 97%
F-measure: 92.2%




                                   24
Results: Ship Name Extraction



Precision: 18.5%
Recall: 45.3%
F-measure: 26.3%




                                 25
Results: Location Extraction



Precision: 88.5%
Recall: 74.7%
F-measure: 81%




                                26
Results: Date and Time Extraction



Precision: 95.3%
Recall: 89.4%
F-measure: 92.3%




                                     27
Results: Dominant Event Recognition



Precision: 92%
Recall: 92%
F-measure: 92%




                                       28
Results: Location to Event Relations



Precision: 81%
Recall: 67.8%
F-measure: 73.8%
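
The slides do not state how the F-measure was computed. Assuming the standard balanced F-measure (the harmonic mean of precision and recall), the reported values can be checked with a short sketch:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F-measure) in percent, taken from the results slides.
REPORTED = {
    "Ship Name Extraction": (18.5, 45.3, 26.3),
    "Location Extraction": (88.5, 74.7, 81.0),
    "Date and Time Extraction": (95.3, 89.4, 92.3),
    "Dominant Event Recognition": (92.0, 92.0, 92.0),
    "Location to Event Relations": (81.0, 67.8, 73.8),
}

for task, (p, r, reported) in REPORTED.items():
    print(f"{task}: computed {f_measure(p, r):.1f}, reported {reported}")
```

Under this assumption the computed values agree with the slides for these tasks. For Event Type Recognition, the rounded inputs 88 and 97 give about 92.3 rather than the reported 92.2, which may simply reflect rounding of the precision and recall shown on the slide.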




                                        29
Conclusions

• The system accomplished the extraction
  of maritime safety events from news
  articles
• Overall performance of the system was
  satisfactory
• The system can be improved and refined
• Ship name extraction requires a different
  approach

                                              30
