DATA-DRIVEN VESSEL SERVICE TIME FORECASTING USING LONG
SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS
Ibrahim AbuAlhaol, Rafael Falcon, Rami Abielmona, and Emil Petriu
Project: Big Data Analytics for the Maritime Internet of Things
Funding: NSERC CRD 499024-16, Ontario Centres of Excellence (OCE), Larus Technologies
Motivation
12/11/18 2
‣ Maritime port congestion delays
shipping services, which results
in financial and reputational losses.
‣ Disruption management mitigates the
impact of disruption events but
requires data-driven insights to
evaluate the disruption.
‣ Can we model and forecast average
Vessel Service Time using
Automatic Identification System (AIS) data?
Contributions
‣ Spatiotemporal mining
algorithms to calculate
convex hull area,
geohash area, and
vessel proximity.
‣ Analytical formulation of two Port Congestion Indicators (PCIs) to
capture port spatial complexity and spatial density.
‣ AIS-based mining algorithm to estimate average Vessel Service Time.
‣ LSTM-based model to forecast average Vessel Service Time.
‣ Providing reliable shipping services is an important challenge for maritime port
authorities and liner shipping companies.
‣ Regular uncertainties (e.g., port productivity) and disruptions (e.g., weather
conditions) are the main factors that affect the quality of maritime port services.
‣ Port authorities need to mitigate disruption by considering all possible causes
and appropriate countermeasures.
‣ We propose data-driven Port Congestion Indicators (PCIs) and an average Vessel
Service Time forecasting model to provide actionable insights to port authority
operation managers and other stakeholders (e.g., liner shipping companies).
‣ Seaborne trade is estimated to account for 90% of
the volume of global trade; maritime port
performance and resilience are therefore
crucial in sustaining global economic growth.
Connecting the dots!
Data: Automatic Identification System (AIS)
* Draught of a ship’s hull is the vertical distance
between the waterline and the bottom of the hull.
‣ AIS is a vessel tracking system
that provides regular updates
on a vessel’s movement and
other relevant ship voyage
data to other parties.
‣ Static and Dynamic vessel
information can be
electronically exchanged
between AIS-receiving
stations (on board, ashore or
satellite). *
Big Data characteristics of AIS
‣ Volume: AIS data is large in volume;
one year of terrestrial data is around 1
terabyte.
‣ Velocity: AIS dynamic messages are broadcast
at different time intervals depending on the
vessel’s speed and rate of turn (from 2 seconds
to 3 minutes).
‣ Veracity: Some of the fields in an AIS message could be either left outdated
or intentionally spoofed.
‣ Variety: AIS messages report different data types (e.g., Destination is text
whereas Maritime Mobile Service Identity (MMSI) is an integer).
Area of Interest [Port of Singapore]
Convex Hull Area
Geohash Area
Geohash precision and cell height/width
‣ Geohash is a geocoding system that encodes a geographic
location into a short string of letters and digits.
‣ We used precision-7 (PREC = 7) geohashes, which divide the
Area of Interest into 153 m x 153 m cells.
Geohash Area
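As an illustration of the geohash encoding described above, the following is a minimal sketch of the standard public geohash algorithm in plain Python, together with a GeoArea estimate computed as the number of unique precision-7 cells times the nominal 153 m x 153 m cell area quoted on the slide. The function names `geohash_encode` and `geohash_area_km2` and the `cell_side_m` parameter are assumptions for this example and are not taken from the authors' Spark implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair (degrees) into a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def geohash_area_km2(positions, precision=7, cell_side_m=153.0):
    """Approximate GeoArea: unique occupied cells times the nominal cell area.

    cell_side_m is the nominal precision-7 cell size from the slide (assumption:
    the same value is used for every cell in the Area of Interest).
    """
    cells = {geohash_encode(lat, lon, precision) for lat, lon in positions}
    return len(cells) * (cell_side_m / 1000.0) ** 2
```

For example, `geohash_area_km2(positions)` can be applied to the vessel positions reported within one aggregation period; repeated reports from the same cell are counted once.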
AIS Data mining Framework
The Framework is composed of
• Cassandra for the consumption of AIS data,
• Spark for mining and extracting, and
• TensorFlow for LSTM modeling and forecasting.
Spatiotemporal Data Mining
‣ Convex Hull Area (ConvArea): The port
convex hull area is defined as the area
enclosed by the smallest-perimeter fence
around all vessels.
‣ Geohash Area (GeoArea):
Geohash is a geocoding system that
encodes a geographic location into a
short string of letters and digits.
The geohash area at precision 7 is the
sum of the areas of all blue squares
shown in the figure.
Convex hull and precision-7 geohashes for the Port
of Hong Kong in July 2015 (ShipType: 70-79,
ShipSpeed <= 5 knots, ConvArea = 63.55 km²,
GeoArea = 4.28 km²).
‣ Average Vessel Proximity (D): The average distance between the
locations of all vessels that are reported as either “Anchored” or
“Moored” and have a speed less than a predefined threshold.
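The two geometric quantities above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: positions are (latitude, longitude) pairs in degrees that have already been filtered by navigation status and speed, the convex hull is computed on a local equirectangular projection (adequate for a port-sized area), and "average distance between the locations of all vessels" is interpreted as the mean pairwise great-circle distance. All function names are hypothetical and do not come from the authors' Algorithms 1 and 3.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0

def to_local_km(points):
    """Project (lat, lon) degrees to planar (x, y) km around the mean latitude."""
    lat0 = math.radians(sum(lat for lat, _ in points) / len(points))
    return [(EARTH_RADIUS_KM * math.radians(lon) * math.cos(lat0),
             EARTH_RADIUS_KM * math.radians(lat)) for lat, lon in points]

def convex_hull(xy):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(xy))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def conv_area_km2(points):
    """ConvArea: area of the convex hull enclosing all vessel positions."""
    return polygon_area(convex_hull(to_local_km(points)))

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def average_proximity_km(points):
    """Mean pairwise distance between vessel positions (assumed reading of D)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs)
```

The pairwise computation is quadratic in the number of vessels, which is acceptable for a single port and aggregation period but would need a distributed formulation for larger areas.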
Port Congestion Indicators (PCI) [1/2]
‣ Spatial Complexity (SpComplexity): calculated from the convex
hull area (ConvArea) and the average vessel proximity (D), which are mined as
presented in Algorithm 1 and Algorithm 3.
• i is the hour index over all hours (I) in January and February 2018.
• G(i) denotes the number of unique geohashes in the i-th aggregation period.
Port Congestion Indicators (PCI) [2/2]
‣ Spatial Density (SpDensity): calculated from the convex hull area
(ConvArea) and the geohash area (GeoArea), which are mined as presented in
Algorithm 1 and Algorithm 2.
• i is the hour index over all hours (I) in January and February 2018.
Time series visualization with one-hour aggregation
Average VST auto-correlation
Cross-correlation analysis
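A lagged cross-correlation of the kind named in this slide title can be sketched as below. This is a minimal illustration, assuming the Pearson coefficient is used and that both series are hourly aggregates of equal length; the function names are hypothetical and the slides do not state which estimator the authors used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cross_correlation(feature, target, max_lag):
    """Correlation between target[t] and feature[t - k] for k = 0..max_lag."""
    result = {}
    for k in range(max_lag + 1):
        lagged_feature = feature[:len(feature) - k]  # values at t - k
        aligned_target = target[k:]                  # values at t
        result[k] = pearson(lagged_feature, aligned_target)
    return result
```

Calling `cross_correlation(series, series, max_lag)` with the same series for both arguments gives its auto-correlation at each lag.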
LSTM with dense layer architecture to forecast Average VST
‣ Lag Features: Current and past VST, spatiotemporal characteristics (i.e., ConvArea,
GeoArea, and D), and congestion indicators (i.e., SpComplexity and SpDensity).
Features are taken at lags k = 0, 1, 2, 3, 4, i.e., the current aggregation period
(k = 0) and the past four (the maximum lag of 4 was selected after trying many
possible lags).
[Figure: LSTM inputs at lags k = 0, 1, 2, 3, 4]
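The lag-feature construction can be sketched as follows. This is an illustration only, assuming each feature is stored as an hourly list, that the model forecasts average VST one aggregation period ahead (the `horizon` parameter is an assumption, not stated on the slide), and that samples are shaped as (timesteps, features), which is the layout an LSTM layer typically expects. The function name `make_lag_samples` and the series names in the usage note are hypothetical.

```python
def make_lag_samples(series, target, max_lag=4, horizon=1):
    """Build lagged input windows and forecast targets.

    series:  dict mapping a feature name to a list of per-period values
    target:  name of the series to forecast (e.g., average VST)
    Returns (X, y, names), where X[n] is a list of max_lag + 1 timesteps,
    ordered from the oldest lag (k = max_lag) to the current period (k = 0),
    and each timestep lists the features in the order given by names.
    """
    names = sorted(series)
    n_periods = len(series[target])
    X, y = [], []
    for t in range(max_lag, n_periods - horizon):
        window = [[series[name][t - k] for name in names]
                  for k in range(max_lag, -1, -1)]
        X.append(window)
        y.append(series[target][t + horizon])
    return X, y, names
```

With series such as `{"avg_vst": [...], "conv_area": [...], "sp_density": [...]}`, each sample has five timesteps (k = 4 down to k = 0) and one column per feature.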
Architecture parameters and MSE Performance
We ran each experiment 50 times
and provided the 95% confidence
intervals.
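A 95% confidence interval over repeated runs can be computed as in the sketch below. It assumes the MSE values of the runs are independent and uses the normal approximation (z of about 1.96); with 50 runs the Student-t critical value is about 2.01, so the interval would be slightly wider under that choice. The slides do not state which variant the authors used, and the function name is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using the normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem
```

Here `values` would hold the test MSE of each of the 50 runs for one architecture configuration.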
LSTM Models Performance with
different time granularities
Summary and Conclusion
‣ The mined ConvArea and GeoArea were negatively correlated with
average VST, which aligns with the fact that a larger area reduces
congestion and therefore decreases average VST values.
‣ SpComplexity, SpDensity, and D were positively correlated with
average VST; this corroborates the fact that smaller values of these
congestion indicators lead to lower average VST values.
‣ The results provide empirical evidence of the practicality of using
LSTM Recurrent Neural Networks to model and forecast average VST
using current/past spatiotemporal characteristics and congestion
indicators mined from AIS data.
Limitations and Future Work
‣ One necessary extension of this work is to train and validate the
models on more AIS data.
‣ Investigate the model on several time and geohash granularities.
‣ The LSTM architecture could be enhanced by adding more layers
and/or incorporating more lags.
‣ Investigate advanced LSTM architectures such as
§ Bi-directional LSTM
§ Stacked LSTM
Questions and Feedback!
