This document discusses automated summarization of large datasets from the Catlin Seaview Survey, a global coral reef monitoring effort that collects reef images automatically during surveys and annotates them using machine learning. The author aims to summarize this big data efficiently using dynamic RMarkdown reports connected to a MySQL database, allowing non-experts to explore trends in the data through interactive visualizations and maps.
2nd e-ROSA Stakeholder Workshop: EO-Based Global Public Goods (e-ROSA)
This document discusses using earth observation data and global public goods to help smallholder farming. It describes smallholder farming as the world's largest industry but also the most poorly quantified. It outlines the STARS project which experiments with very high spatial resolution satellite images in smallholder contexts. It discusses developing global public goods like an open crop spectral signature library and image analysis algorithm repository. It also proposes integrating these resources into the CGIAR system to better support smallholder farmers worldwide.
Identifying Land Patterns from Satellite Images using Deep Learning (Soumyadeep Debnath)
▫️ Research Domain :
Machine Learning (ML), Deep Learning (DL) and Convolutional Neural Network (CNN).
▫️ Conference Details :
International Conference on the Networked Digital Earth (ICNDE 2018) at Indian Institute of Technology Kharagpur (IITkgp), India during March 7 - 9, 2018.
https://cse.iitkgp.ac.in/conf/NSDE/sds/ICNDE2018/
▫️ Presentation Details :
Presented the conference poster at ICNDE 2018 in front of Prof. Ravi Sundaram [Northeastern University, Boston, USA], Organizing Chair and Dr. Anil Vullikanti [Virginia Tech, USA], Invited Chair.
We use georeferenced results of the 2010 Census in Mexico to train machine learning algorithms that detect urban growth and contribute new information for estimating the total population. The talk shares the experience and results of combining this census data with Earth observation imagery to generate useful population estimates for non-census years.
How to set achievable tree canopy goals (Josh Behounek)
Does your community have a canopy goal that sounds good but might not be achievable? Are you thinking about setting a goal but concerned you don't know what it should be? This presentation will demonstrate how to set an achievable urban tree canopy (UTC) goal that considers parameters like land use, development, climate, ordinances, and possible planting spaces. Case studies will be shown to demonstrate methodologies for setting achievable canopy goals, as well as systems to track progress toward achieving them.
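The arithmetic behind an achievable canopy goal can be sketched very simply: start from the existing canopy fraction and add only the share of possible planting space that can realistically be planted. This is a hypothetical illustration with invented numbers, not figures from the presentation.

```python
# Hypothetical sketch of achievable urban tree canopy (UTC) goal
# setting. All percentages and the realization rate are illustrative
# assumptions, not values from the presentation.

def achievable_utc_goal(existing_pct, plantable_pct, realization_rate):
    """Existing canopy plus the share of possible planting space
    realistically expected to be planted and to survive."""
    return existing_pct + plantable_pct * realization_rate

# A town with 30% existing canopy and 20% possible planting space,
# assuming half of that space can realistically be planted:
goal = achievable_utc_goal(30.0, 20.0, 0.5)
print(goal)  # 40.0
```

The point of the parameters listed above (land use, ordinances, climate) is precisely that they shrink the realization rate below 1.0.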
AAPG GTW 2017: Deep Water and Shelf Reservoirs (Dustin Dewett)
The document discusses multispectral fault enhancement techniques for seismic interpretation. It provides a brief history of using spectral decomposition to better identify faults. It then outlines a typical spectral similarity workflow involving filtering, spectral decomposition, attribute analysis, lineament identification, and combining results. The workflow allows faults to be more clearly defined at specific frequencies. It also discusses using machine learning and additional attributes like peak frequency for more complete geological understanding. Future areas of research are expected to integrate more attributes and comparisons between multispectral and broadband methods.
Fault Enhancement Using Spectrally Based Seismic Attributes -- Dewett and Hen... (Dustin Dewett)
Fault interpretation in seismic data is a critical task that must be completed to thoroughly understand the structural history of the subsurface. The development of similarity-based attributes has allowed geoscientists to effectively filter a seismic data set to highlight discontinuities that are often associated with fault systems. Furthermore, there are numerous workflows that provide, to varying degrees, the ability to enhance this seismic attribute family. We have developed a new method, spectral similarity, to improve the similarity enhancement by integrating spectral decomposition, swarm intelligence, magnitude filtering, and orientated smoothing. In addition, the spectral similarity method has the ability to take any seismic attribute (e.g., similarity, curvature, total energy, coherent energy gradient, reflector rotation, etc.), combine it with the benefits of spectral decomposition, and create an accurate enhancement to similarity attributes. The final result is an increase in the quality of the similarity enhancement over previously used methods, and it can be computed entirely in commercial software packages. Specifically, the spectral similarity method provides a more realistic fault dip, reduction of noise, and removal of the discontinuous “stair-step” pattern common to similarity volumes.
This document discusses using ontologies and semantic web technologies to integrate heterogeneous agricultural data sources for estimating rice harvests in Thailand. It describes using an ontology registry and integrated ontologies to combine satellite imagery, digital elevation models, land use maps, field survey data, meteorological data, and rice growth models. The goal is to develop a data integration environment to estimate rice harvests using small, distributed agricultural databases from various sources.
1) The document presents an optimization model for designing biogas infrastructure in Wisconsin using object-oriented programming in Julia. The model considers factors like costs, emissions, and trade-offs to determine optimal placement of dairy farm waste processing facilities.
2) The model defines variables, constraints, objectives and stakeholders to generate solutions for minimizing costs and emissions. Solutions show reasonable placement of more processing facilities when stakeholders value emissions savings highly.
3) Future work will implement a more complex, stochastic multi-stakeholder formulation using the CVaR method to find a compromise solution over different stakeholders rather than a single "utopia point" solution. This will provide insights into how dissatisfactions change with the CVaR parameter.
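The cost-vs-emissions trade-off described in the summary above can be illustrated with a toy brute-force search. This is a sketch only: the candidate sites, costs, and emissions savings below are invented, and the actual model is a far richer object-oriented Julia formulation with stochastic, multi-stakeholder extensions.

```python
from itertools import combinations

# Toy sketch of the cost-vs-emissions trade-off: pick which candidate
# processing sites to build under a weighted objective. Site data are
# invented for illustration.

sites = {"A": (10.0, 4.0), "B": (6.0, 2.0), "C": (8.0, 5.0)}  # site: (cost, emissions saved)

def best_subset(weight_emissions):
    """Pick the subset of sites minimizing cost - w * emissions_saved."""
    best, best_obj = (), float("inf")
    for r in range(len(sites) + 1):
        for subset in combinations(sites, r):
            cost = sum(sites[s][0] for s in subset)
            saved = sum(sites[s][1] for s in subset)
            obj = cost - weight_emissions * saved
            if obj < best_obj:
                best, best_obj = subset, obj
    return set(best)

print(best_subset(0.5))  # emissions barely valued: build nothing
print(best_subset(3.0))  # emissions valued highly: build more sites
```

This reproduces the qualitative finding in the summary: more facilities are placed when stakeholders value emissions savings highly.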
The document describes the GEM Foundation's efforts to create a centralized database of global seismic hazard models using common data formats and open-source software. This will allow models to be more easily compared, reproduced and inspected. It will also facilitate combining models and generating new data. Currently the database includes major models from regions around the world. Quality assurance testing has revealed some differences between models when reproduced, calling for further investigation.
The document discusses tools and datasets for seismic hazard analysis from site-specific to global scales. It describes the OpenQuake engine and Hazard Modeller's Toolkit (HMTK) which can be used for classical and event-based probabilistic seismic hazard analysis (PSHA) at various scales. The OpenQuake Ground Motion Toolkit helps with selection and weighting of ground motion prediction equations. These tools are applied in site-specific analyses, and for developing national, regional, and global seismic hazard models using various data sources on earthquakes, faults, and strain.
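The core of classical PSHA, as implemented at much greater sophistication in the OpenQuake engine, is summing over seismic sources the event rate times the probability that the (typically lognormal) ground motion exceeds a given level. The minimal sketch below is not the OpenQuake API; source rates, medians, and the log-standard deviation are invented.

```python
import math

# Minimal classical-PSHA sketch (not the OpenQuake engine): annual
# rate of exceeding ground-motion level x = sum over sources of
# event rate * P(lognormal IM > x). All source parameters are invented.

def p_exceed(x, median, sigma_ln):
    """P(IM > x) for a lognormal IM with given median and log-std."""
    z = (math.log(x) - math.log(median)) / sigma_ln
    return 0.5 * math.erfc(z / math.sqrt(2.0))

sources = [  # (annual rate of events, median PGA in g at the site)
    (0.01, 0.30),
    (0.05, 0.10),
]

def annual_exceedance_rate(x, sigma_ln=0.6):
    return sum(rate * p_exceed(x, med, sigma_ln) for rate, med in sources)

for pga in (0.1, 0.2, 0.4):
    print(f"lambda(PGA > {pga:.1f} g) = {annual_exceedance_rate(pga):.5f}")
```

A hazard curve is exactly this quantity evaluated over a range of intensity levels; real GMPE selection and weighting is what the OpenQuake Ground Motion Toolkit supports.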
How to empower community by using GIS, lecture 1 (wang yaohui)
The document provides an outline for a course on applying geographic information systems (GIS) to empower communities. It discusses key GIS concepts like projections, scale, coordinate systems, and data formats. It aims to familiarize students with ArcGIS software and with using GIS for community applications like education, environmental management, and public participation. Students will learn skills like querying spatial data and integrating external data to solve problems in community-empowerment projects.
TexelTek - Andrew Levine - Hadoop World 2010 (Cloudera, Inc.)
The document discusses using an open cloud consortium to process map imagery for disaster relief. It aims to make imagery available online for relief workers, enable large-scale image processing of satellite data, and provide image deltas showing changes over time. The framework uses Apache Hadoop on a testbed platform to break images into tiles via mappers and assemble them from reducers into layers for a web map service. It demonstrates change detection over time for disasters like oil spills and floods.
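The tile-based map/reduce workflow described above can be illustrated with a toy, in-process stand-in: mappers cut an image into fixed-size tiles keyed by tile coordinates, and reducers group the tiles into a layer for a web map service. The real pipeline runs these steps as Apache Hadoop jobs over satellite imagery; the tiny image below is invented.

```python
from collections import defaultdict

# Toy stand-in for the Hadoop tiling workflow: mapper emits
# ((tile_x, tile_y), tile_pixels) pairs, reducer groups them by key
# as the shuffle/reduce phase would.

TILE = 2  # tile edge length in pixels (tiny, for illustration)

def mapper(image):
    """Emit ((tile_x, tile_y), tile_pixels) pairs for one image."""
    h, w = len(image), len(image[0])
    for ty in range(0, h, TILE):
        for tx in range(0, w, TILE):
            tile = [row[tx:tx + TILE] for row in image[ty:ty + TILE]]
            yield (tx // TILE, ty // TILE), tile

def reducer(pairs):
    """Group tiles by key into a layer."""
    layer = defaultdict(list)
    for key, tile in pairs:
        layer[key].append(tile)
    return dict(layer)

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
layer = reducer(mapper(image))
print(sorted(layer))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Image deltas for change detection would then compare tiles with the same key across two acquisition dates.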
- Expert elicitation was used to develop fragility functions characterizing building vulnerability to earthquakes around the world. Thirteen experts evaluated vulnerability for generic building types in eight countries, and twelve US and one Canadian experts evaluated selected building types in the US.
- Cooke's method was used to score experts based on their accuracy on seed questions and assign weights to their responses on target questions. This allowed fragility curves to be developed accounting for expert uncertainties.
- The exercises generated over 50 new fragility functions for use in earthquake modeling, providing critical data where empirical models are lacking. Further research is needed to better understand the expert scoring approach.
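The weighting step in Cooke's classical model can be sketched in greatly simplified form: each expert receives a performance score from the seed questions, scores are normalized into weights, and target-question answers are pooled with those weights. Real Cooke scoring combines a calibration term and an information term; the scores and answers below are invented.

```python
# Greatly simplified sketch of Cooke's-method weighting: normalize
# seed-question performance scores into weights, then linearly pool
# the experts' target-question estimates. All numbers are invented.

def cooke_weights(scores):
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def pooled_estimate(weights, answers):
    """Weighted linear pool of the experts' point estimates."""
    return sum(weights[name] * answers[name] for name in weights)

scores = {"expert_1": 0.6, "expert_2": 0.3, "expert_3": 0.1}
answers = {"expert_1": 0.40, "expert_2": 0.55, "expert_3": 0.80}  # e.g. P(collapse)

w = cooke_weights(scores)
print(round(pooled_estimate(w, answers), 3))  # 0.485
```

In the actual study the pooled quantities are full fragility curves (probability of damage as a function of shaking intensity), not single point estimates.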
This document describes a geospatial modeling tool developed to retrieve climate data from large climate model databases in an efficient manner. The tool integrates R programming with ArcGIS to subset and extract grid point data for specific study areas from netCDF climate model files. It was tested on CORDEX climate model data and found to accurately obtain grid points, providing a less tedious method than manual retrieval. The tool allows climate data to be efficiently obtained and prepared as model inputs.
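The core subsetting idea, selecting only the grid points of a regular lat/lon climate grid that fall inside a study-area bounding box, can be sketched without any GIS stack. The actual tool does this with R and ArcGIS on netCDF files; the grid and bounding box below are invented.

```python
# Library-free sketch of bounding-box subsetting of a regular
# lat/lon grid, the core of the retrieval tool described above.

def frange(start, stop, step):
    """Inclusive float range with rounding to avoid drift."""
    vals, x = [], start
    while x <= stop + 1e-9:
        vals.append(round(x, 4))
        x += step
    return vals

lats = frange(10.0, 20.0, 0.5)   # grid latitudes
lons = frange(95.0, 105.0, 0.5)  # grid longitudes

def subset(lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Return the (lat, lon) grid points inside the bounding box."""
    return [(la, lo) for la in lats for lo in lons
            if lat_min <= la <= lat_max and lon_min <= lo <= lon_max]

pts = subset(lats, lons, 14.0, 16.0, 100.0, 101.0)
print(len(pts))  # 5 lats x 3 lons = 15 points
```

With netCDF the same selection is done on the file's coordinate variables before extracting the data slab, which is what makes it so much faster than manual retrieval.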
In this project the group members will play with daily rainfall data collected on the Gulf Coast (535 stations in total) from 1949 to 2017. The purposes of this exercise are to:
1) give students an idea of a typical climate data set (spatio-temporal data) and some associated scientific questions (e.g. how rainfall extremes vary in space and time and how that might be affected by other things like greenhouse gases or temperatures);
2) get students familiar with data analysis using R, including data manipulation, data visualization, and data summary;
3) introduce some statistical methods (e.g. time series analysis, spatial statistics, extreme value analysis) to analyze this kind of data and "answer" (perform statistical inference on) the questions of interest.
Group members: Lin Ge, Jianan Jang, Jessica Robinson, Erin Song, Seth Temple, Adam Wu
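A first step of the extreme value analysis mentioned above is reducing each station's daily series to annual (block) maxima. The course works in R; the same idea is sketched here in Python with invented records of the form (year, day_of_year, rainfall_mm).

```python
from collections import defaultdict

# Block-maxima extraction: map each year to its largest daily
# rainfall total. Records are invented for illustration.

records = [
    (1949, 12, 30.2), (1949, 200, 85.0), (1949, 310, 12.4),
    (1950, 45, 61.7), (1950, 178, 140.3), (1950, 300, 95.5),
]

def annual_maxima(records):
    """Map each year to its largest daily rainfall total."""
    maxima = defaultdict(float)
    for year, _, rain in records:
        maxima[year] = max(maxima[year], rain)
    return dict(maxima)

print(annual_maxima(records))  # {1949: 85.0, 1950: 140.3}
```

The resulting annual maxima are what a generalized extreme value distribution would be fitted to when studying how rainfall extremes change over time.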
The document summarizes the products and applications of GEM's Hazard program. It outlines five global datasets created through international projects including historical earthquake archives, instrumental seismicity catalogs, active fault databases, and ground motion prediction equations. It also describes regional seismic hazard models compiled in a database and the OpenQuake open-source software for calculating seismic hazard and risk. Key applications of the products include use in building codes, insurance catastrophe modeling, and site-specific engineering analyses.
OpendTect is attribute-analysis software used in exploration seismology, developed by the dGB Earth Sciences group.
In this presentation, OpendTect 5 is used to extract the fault system of the F3_Demo seismic dataset from the North Sea, the Netherlands.
Application packaging and systematic processing in earth observation exploita... (terradue)
An overview of Terradue's solutions supporting Earth Observation (EO) Exploitation Platforms across multiple domains.
Presentation given as part of the Open Geospatial Consortium (OGC) Technical Committee ad hoc meeting for setting up a new domain working group on EO Exploitation Platforms.
This document discusses the potential applications of artificial intelligence in space science and planetary defense. It summarizes:
1) The Frontier Development Lab (FDL) is a collaboration between NASA and AI/ML researchers to address challenges in planetary defense, space resources, and other areas. In 2016, 12 researchers worked on 3 problem areas including radar shape modeling.
2) FDL has since expanded, with 24 researchers addressing 5 challenges in 2017, including long period comets and applied AI. Future plans include 28 researchers on 7 problem areas in 2018 such as lunar route planning and solar storm warnings.
3) AI has applications for planetary defense such as identifying meteorites, modeling asteroid shapes from radar images, and selecting
Of course, you know what data is. You probably know what big data and small data are. But what is all that buzz about data? Why is it so important today? These questions are the topic of this session, which goes beyond definitions and descriptions. We will talk about data, the different options for using it, and how we can benefit from it.
MoonDB: Restoration & Synthesis of Planetary Geochemical Data (Kerstin Lehnert)
This presentation explains the MoonDB project that will restore and synthesize geochemical and petrological data acquired on lunar samples over more than 4 decades. The project is a collaboration between the IEDA data facility (http://www.iedadata.org) at the Lamont-Doherty Earth Observatory of Columbia University and the Astromaterials Acquisition and Curation Office (AACO) at Johnson Space Center (JSC).
Nuclear emergency response and Big Data technologies (BigData_Europe)
This document discusses using big data technologies to improve nuclear emergency response. It describes how real-time systems currently use deterministic modeling to simulate radiological situations and consequences of countermeasures. Ensemble modeling is proposed to better account for input uncertainties. Existing scenarios could be further analyzed and accessed through web tools to support decision making. Case-based reasoning is presented as an approach to integrate historical information and suggest emergency strategies. An analytical platform demonstrates retrieving similar historical cases and reusing or adapting their solutions to support response to new nuclear events.
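The case-based reasoning retrieval step described above, finding historical cases similar to a new event, can be sketched as a nearest-neighbor search over feature vectors. The cases and features below are invented; a real system would use much richer case descriptions and similarity measures.

```python
import math

# Sketch of case-based-reasoning retrieval: each historical case is
# a feature vector; retrieve the nearest one to a new event.
# Cases and features are invented for illustration.

cases = {  # case id: (release magnitude, wind speed, population density)
    "case_1995": (2.0, 3.5, 120.0),
    "case_2003": (5.0, 1.0, 40.0),
    "case_2011": (8.0, 6.0, 300.0),
}

def nearest_case(query):
    """Return the id of the case closest to the query in feature space."""
    return min(cases, key=lambda cid: math.dist(query, cases[cid]))

new_event = (4.5, 1.5, 55.0)
print(nearest_case(new_event))  # case_2003
```

The retrieved case's emergency strategy would then be reused or adapted, which is the reuse/revise step of the CBR cycle the platform demonstrates.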
The document discusses the value of data and the rise of big data. It notes that Matthew Fontaine Maury in the 1800s recognized the value of analyzing ship log data collectively. Today, new sources of data like sensors have exploded the volume of data. Characteristics of big data include volume, variety, and velocity. Technological challenges include scalability, heterogeneity, and low latency. The document provides examples of non-relational databases and MapReduce as approaches to handle big data.
This document discusses using machine learning and decision making for sustainability. It describes three major challenges: high dimensional spaces with many variables, uncertainty with limited information requiring stochastic models, and accounting for preferences and utilities in optimization criteria. The document outlines ongoing research using machine learning for applications like poverty mapping, natural resource management, materials discovery, and modeling migratory pastoralism. The research aims to address global challenges like poverty, food security, and environmental sustainability.
Netica is a Bayesian network modeling and inference software package developed by Norsys Software Corp. It allows users to build and evaluate causal probabilistic models known as Bayesian networks.
R: R is a programming language and software environment for statistical analysis, graphics, and statistical computing. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues.
Weka: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules
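The kind of causal probabilistic model Netica evaluates can be illustrated with a tiny two-node Bayesian network, Rain -> WetGrass, queried by exact enumeration. The probabilities are invented; this shows only the inference idea, not Netica's API.

```python
# Tiny Bayesian-network sketch: a two-node network Rain -> WetGrass
# with invented probabilities, queried via Bayes' rule.

p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass=true | Rain)

def p_rain_given_wet():
    """P(Rain=true | WetGrass=true) by exact enumeration."""
    joint_true = p_rain * p_wet_given_rain[True]
    joint_false = (1 - p_rain) * p_wet_given_rain[False]
    return joint_true / (joint_true + joint_false)

print(round(p_rain_given_wet(), 3))  # 0.692
```

Tools like Netica generalize this to networks with many nodes, where efficient inference algorithms replace brute-force enumeration.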
IRJET- Geological Boundary Detection for Satellite Images using AI Technique (IRJET Journal)
This document summarizes a research paper that proposes a method for detecting geological boundaries in satellite images using artificial intelligence techniques. The method involves pre-processing images, generating histograms to analyze pixel values, performing 2D convolution on image planes, applying a particle swarm optimization algorithm to identify boundaries, and testing the approach on pre-flood and post-flood satellite images of Kerala, India. The results show differences in detected geological boundaries between the two images, allowing changes from flooding to be identified. The method provides a way to automatically analyze satellite imagery and extract geological boundary information.
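The 2D convolution step in that pipeline can be sketched with a small Laplacian-style kernel, which responds at pixels where intensity changes, i.e. candidate boundaries. This illustrates only the convolution stage (the paper additionally uses histograms and particle swarm optimization), and the image is invented.

```python
# Valid-mode 2D convolution with a Laplacian-style kernel, the
# boundary-highlighting step of the pipeline described above.

KERNEL = [[0, -1, 0],
          [-1, 4, -1],
          [0, -1, 0]]

def convolve2d(image, kernel):
    """Valid-mode 2D convolution (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + u][j + v] * kernel[u][v]
                      for u in range(kh) for v in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A flat dark region (0s) meeting a brighter region (9s): the response
# is nonzero only along their boundary.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
print(convolve2d(image, KERNEL))  # [[-9, 9], [-9, 9]]
```

Comparing such responses between pre-flood and post-flood images is what lets boundary changes be detected.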
Enviromental impact assesment for highway projectsKushal Patel
Environmental Impact Assessment (EIA) is a tool to study various impact to be occurred due to new development actions.
Transportation Project are the projects which provides ease to the movement of vehicles.
This Paper presents a case study for analysis of EIA for a transportation project. This Paper would provide a methodology which will allow transportation planers to make a cost effective coordination of environmental information and data management.
The results assess the environmental vulnerability around the road and its impact on environment by integration the merits of GIS.
Model Build ArcPy Into Your FME WorkflowsSafe Software
This presentation will delve into utilizing ArcGIS geoprocessing within FME using PythonCaller. It will show how to harness the capabilities of both tools for efficient and flexible data manipulation and conversion, using ArcPy script to call ArcGIS from within FME. Real-world examples will be provided to illustrate the benefits of this approach in areas such as raster-vector data conversion and spatial analysis.
Tips & tricks will be demonstrated for creating ArcPy geoprocessing snippets from ArcPRO, manipulating the python for appropriate use within FME Python caller, configuring environments and extension licenses, how to pass fme objects feature attributes or user parameters to be used within geoprocessing parameters, using integration transformers to read file path results and how GP result notification strings can inform the fme user of data processing progress to the transaction log.
IRJET- Land Cover Index Classification using Satellite Images with Different ...IRJET Journal
This document presents a study on land cover index classification of satellite images of the Ayeyarwaddy Delta region of Myanmar. The study uses Google Earth satellite images from 2004-2014. The images are classified into three indices: buildings, vegetation, and roads. Three image enhancement methods are applied prior to classification - V-channel enhancement, histogram equalization, and adaptive histogram equalization. K-means clustering is then used to classify the enhanced images into the three indices in CIE L*a*b* color space. The classification results of each enhancement method are evaluated and compared using mean squared error and peak signal-to-noise ratio. According to the results, V-channel enhancement provides the best classification results compared to
The document discusses three projects related to analyzing large graph datasets:
1. The CASS-MT project designs software to analyze massive interaction networks using multithreaded architectures like the Cray XMT. Algorithms include betweenness centrality and dynamic clustering coefficients.
2. The Graph500 benchmark was developed to evaluate parallel architectures for data-intensive graph computations. Reference codes were provided for OpenMP and Cray XMT.
3. The STING project develops and optimizes a dynamic graph package for Intel platforms to analyze streaming graph-structured data from sources like Facebook in real-time.
This document describes a study that uses machine learning algorithms to analyze flood data and predict flood impacts. The study collected flood data from various states in India containing information on start/end dates, duration, causes, affected districts/states, and casualties including human injuries and deaths as well as animal fatalities. Various machine learning models like decision trees, random forests, SVMs, and neural networks were trained on the data. The models' performance was evaluated based on metrics like accuracy, precision, recall, and F1-score. The results showed that some states experienced higher numbers of human/animal casualties from floods compared to others. Graphs and charts were used to analyze relationships between variables in the data and compare flood impacts like casualties and
This document summarizes a project that used NASA satellite imagery to map and monitor mangrove extent in Everglades National Park over multiple time periods. The objectives were to create a replicable methodology using Earth observations and Google Earth Engine to map changes over time. The methodology included collecting Landsat data, random sampling, image processing, classification, and accuracy assessment. The results showed changes in mangrove extent between 1995, 2005, and 2015. Future work could include more in situ data and samples to focus on ecological forecasting.
Performance Analysis of 5 MWP Grid-Connected Solar PV Power Plant Using IE...IRJET Journal
This document analyzes the performance of a 5 MW grid-connected solar PV power plant in India using data recorded over 2016. Key parameters like energy output, performance ratio, and final yield are calculated and compared to simulated results from PV Syst software. The plant's annual performance ratio was 73.02%, lower than the simulated 78.10% due mainly to a transformer failure. Monthly energy output and final yield varied with weather conditions. While most months matched simulated results within 9%, August differed by 21%. The analysis provides insights to improve plant maintenance and optimize performance.
This thesis presents research on using deep learning methods for feature extraction from satellite imagery to identify landslide pixels. The objectives are to classify land cover using machine learning algorithms like SVM and random forests in Google Earth Engine, design and evaluate a deep neural network for landslide identification, and compare performance of deep learning models in MATLAB. Results show that a neural network achieved over 98% accuracy at identifying landslide pixels. Future work proposes developing new indices for improved identification and an automatic landslide monitoring platform.
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshopICIMOD
The document provides an overview of CORDEX (Coordinated Regional Climate Downscaling Experiment) for South Asia. It discusses how regional climate models are used to provide higher resolution climate data for impact studies. It outlines the history and coordination of CORDEX, including the establishment of the Science Advisory Team. It describes the generation of regional climate projections for South Asia using multiple regional climate models driven by several global climate models. The data is archived and disseminated via an Earth System Grid Federation node in India to support regional climate change research and applications.
Climate Monitoring and Prediction using Supervised Machine LearningIRJET Journal
This document describes a study that uses supervised machine learning models to monitor and predict climate changes based on historical climate data. The methodology involves collecting climate data from various sources, preprocessing the data, and training classification and regression models. Random forest classification achieved the best accuracy of 87% for predicting precipitation type. Polynomial regression was also effective for predicting temperature variations over time. The models can help monitor climate changes and irregularities to improve preparedness and reduce impacts on sectors like agriculture. In conclusion, accurately predicting climate trends through data-driven methods and raising awareness can help balance natural weather cycles with human activities.
Using explainable machine learning to evaluate climate change projectionsZachary Labe
5 October 2023…
Atmosphere and Ocean Climate Dynamics Seminar (Presentation): Using explainable machine learning to evaluate climate change projections, Yale University, New Haven, CT. Remote Presentation.
References...
Labe, Z.M., E.A. Barnes, and J.W. Hurrell (2023). Identifying the regional emergence of climate patterns in the ARISE-SAI-1.5 simulations. Environmental Research Letters, DOI:10.1088/1748-9326/acc81a, https://iopscience.iop.org/article/10.1088/1748-9326/acc81a
Assessing the performance of random forest regression for estimating canopy h...IJECEIAES
Accurate estimation of forest canopy height is essential for monitoring forest ecosystems and assessing their carbon storage potential. This study evaluates the effectiveness of different remote sensing techniques for estimating forest canopy height in tropical dry forests. Using field data and remote sensing data from airborne lidar and polarimetric synthetic aperture radar (SAR), a random forest (RF) model was developed to estimate canopy height based on different indices. Results show that the normalize difference build-up index (NDBI) has the highest correlation with canopy height, outperforming other indices such as relative vigor index (RVI) and polarimetric vertical and horizontal variables. The RF model with NDBI as input showed a good fit and predictive ability, with low concentration of errors around 0. These findings suggest that NDBI can be a useful tool for accurately estimating forest canopy height in tropical dry forests using remote sensing techniques, providing valuable information for forest management and conservation efforts.
Similar to Automated Summarisation of Big Data, useR! 2018 (20)
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
1. IntRoduction
Automated Summarisation of Big Data
Using data from the Catlin Seaview Survey - a global coral reef
monitoring effort
Amy StringeR
1University of Queensland
UseR! July 2018
Amy StringeR (Global Change Institute) Automated Summarisation of Big Data UseR! July 2018 1 / 35
2. IntRoduction
Outline
1 IntRoduction
2 Context
The Catlin Data
Image Collection
Image Annotation
3 The Need
Efficient SummaRisation
4 New Challenges
Contextual Challenges
Data Challenges
5 Solutions
Dynamic Plotting Environments
Interactive Maps using Leaflet
RMySQL and Parameterised Rmarkdown
Child Documents
6 Future Work
3. IntRoduction
[Insert witty crowd banter]
4. Context
The Catlin Seaview Survey
Coral reef monitoring
program endeavouring to
develop a global baseline on
reef health and then monitor
the state of reefs through
resurvey efforts
5 regions around the world
so far: Australia, the
Caribbean, Southeast Asia,
the Indian Ocean, The
Pacific
Within these 5 major
regions, we have a total of
25 survey countries
(a) Bleaching at the
Maldives, 2016
(b) Bleaching at Heron
Island, 2016
6. Context The Catlin Data
Efficient Monitoring
Three main stages:
1 Collection of images
2 Annotation of images
3 Calculating proportions and visualising trends between surveys
8. Context Image Collection
The Catlin Seaview Survey - Image Collection
High definition images
collected in 2km transects
along a reef section - taken
automatically every 3 seconds
Each image is GPS located
Speed of collection increased
from the traditional 60 m² per dive
(45 min) to 2,000 m² per dive
Figure: A diver pushing the SVII scooter
during a survey of the Great Barrier Reef.
For more on collection methodology, see
[7]. © XL Catlin Seaview Survey
9. Context Image Collection
Figure: An example image from a survey of the Great Barrier Reef. Images like
this, along with the data, are available on the XL Catlin Global Reef Record [1].
11. Context Image Annotation
Neural Network for Image Annotations
Previously a time-consuming, manual task (potentially 3 decades of
work for the CSS images)
An automatic point-annotation method is now used, based on
machine learning algorithms (see [3])
Colour and texture of images are used as descriptors for label
categories
Coverage estimates are uploaded within a week of collection
12. Context Image Annotation
Figure: The same image from earlier showing the points used for annotation
14. The Need Efficient SummaRisation
The Next Stage
Fast data collection → fast annotation → bottleneck in processing
Data are stored in a MySQL database, allowing for easy integration
with R [6, 5]
Introducing Rmarkdown [2]
rmarkdown provides a solution for quick/consistent exploratory
analysis of the data in a report format
Visualisations! [8]
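The kind of coverage plot the reports build can be sketched with ggplot2 [8]; the data frame below is invented for illustration, not taken from the Catlin database:

```r
library(ggplot2)

# Hypothetical coverage summaries in the shape the reports work with:
# one percentage-cover value per year per functional group
cover <- data.frame(
  year  = rep(2012:2016, 2),
  group = rep(c("Hard coral", "Algae"), each = 5),
  pct   = c(30, 28, 25, 18, 15, 20, 22, 25, 30, 33)
)

# Trend lines of percentage coverage through the resurvey years
ggplot(cover, aes(year, pct, colour = group)) +
  geom_line() +
  labs(y = "Coverage (%)", colour = "Functional group")
```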
16. New Challenges Contextual Challenges
Contextual Challenges
Usability for non R users
Meaningful visualisations
They need to be useful for more than just the researchers; local
government and others in charge of marine protection need to get
some value
Comparisons to literature
Many label groups, and spatial scales in the dataset
18. New Challenges Data Challenges
Data Challenges - Structure
Sub-region Reef Count Transect Count Image Count
Cairns-Cooktown 13 43 87631
Coral Sea 3 32 23573
Far Northern 12 33 68367
Mackay-Capricorn 4 12 14151
Townsville-Whitsunday 4 10 5722
Total 36 130 199444
Table: A summary of the various spatial scales within just one region, the Great
Barrier Reef. This structure is consistent across all 5 regions.
19. New Challenges Data Challenges
Data Challenges - Labels
Benthic labels describe the community benthic category
Global labels describe morphological categories
Each region has 5 functional groups
Hard corals, soft corals
Algae
Other invertebrates, other
Region Benthic Labels Global Labels
GBR 27 13
Indian Ocean 49 17
Caribbean 67 16
Southeast Asia 71 17
Pacific 40 12
21. Solutions Dynamic Plotting Environments
Plotting Hiccups
Plots have been created at multiple spatial scales: reef scale,
subregion scale, transect scale
Plots will have varying sizes based on the number of
reefs/subregions/transects in the respective regional dataset
Differing label sets among regions make visualisation an exciting
challenge at each of these spatial scales
Nearly impossible to find clearly distinguishable colours at the
community benthic level
Visualisations at this level are more overwhelming than helpful
Single-survey regions need some kind of conditioning on their
temporal plot construction
22. Solutions Dynamic Plotting Environments
Dynamic Plotting
The code wrapper in the rmarkdown source script allows you to set
variables for the figure heights and widths
Figure: Plot wrapper with an example of the figure height adjustment
according to the number of plot facets. Also shown here is a boolean
variable for evaluation of the code segment.
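The deck shows this wrapper only as a screenshot; a minimal sketch with assumed object names (reef_data, reef_name are placeholders) looks roughly like:

```r
# Hypothetical data: one plot facet per reef in the regional subset
n_facets <- length(unique(reef_data$reef_name))

# Scale figure height with the number of facet rows (3 facets per row),
# with a sensible minimum so small regions still render legibly
fig_h <- max(3, 2 * ceiling(n_facets / 3))

# Boolean controlling whether the plotting chunk is evaluated at all
run_plot <- n_facets > 0
```

The chunk header can then read `{r, fig.height = fig_h, eval = run_plot}`, so each region's report sizes its own figures and skips empty plots.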
23. Solutions Dynamic Plotting Environments
Reef Scale Visualisations
Figure: An example visualisation of only 3 of the 36 GBR reefs. This plot is
created at the functional group scale. Note that the x axis is year, and the y axis
is percentage coverage over the reef in question.
24. Solutions Dynamic Plotting Environments
Reef Scale Visualisations
Figure: An example of the reef scale visualisation for the reefs only surveyed once.
Coverage here is represented in the same way as the previous plot, giving a
percentage coverage for each basic functional group.
25. Solutions Dynamic Plotting Environments
Change at the Transect Scale
Extra challenges that arise
from survey design
Visualised at the global label
level
Investigate change only over
consecutive survey years
27. Solutions Interactive Maps using Leaflet
Leaflet
Using Leaflet [4] for interactive maps allows readers to see exactly
where the surveys take place. Each transect marker represents a 2 km
survey region.
Figure: Disclaimer - this image is not so interactive
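A minimal leaflet [4] sketch of such a map; the transect names and coordinates below are invented for illustration, not the survey's real locations:

```r
library(leaflet)

# Hypothetical transect midpoints; the real coordinates come from the
# GPS-located survey images in the database
transects <- data.frame(
  name = c("Transect 1", "Transect 2"),
  lat  = c(-23.44, -23.45),
  lng  = c(151.91, 151.93)
)

# One marker per transect, with a popup naming it
leaflet(transects) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~name)
```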
29. Solutions RMySQL and Parameterised Rmarkdown
Parameterising Rmarkdown Documents and RMySQL
RMySQL [5] allows for accessing the database through RStudio,
negating the need for an external program
(a) The header when making use of
document parameters. In future, more
parameters may be added to simplify the
source script, but in the current stages
things have been kept simple.
(b) Working example accessing the
database with the document input
parameters. The use of RMySQL allows
for connection to the database and
extraction of data within the Rmarkdown
source script.
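Roughly what those two screenshots show, reconstructed as a sketch; the parameter name, table name, and connection details are placeholders, not the project's real setup:

```r
# --- YAML header of the .Rmd source (sketched as a comment) ---
# params:
#   region: "GBR"

library(RMySQL)

# Placeholder connection details, not the project's real credentials
con <- dbConnect(MySQL(), dbname = "reef_record",
                 host = "localhost", user = "reader", password = "...")

# Pull the coverage rows for the region named in the document parameters
cover <- dbGetQuery(con,
  sprintf("SELECT * FROM coverage WHERE region = '%s'", params$region))

dbDisconnect(con)
```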
30. Solutions RMySQL and Parameterised Rmarkdown
How’s the SeRenity?
Figure: The only bit of code a user needs to deal with to generate up to 25
reports.
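The call the figure refers to is presumably along these lines; the file name and region vector are assumptions for illustration:

```r
library(rmarkdown)

# Hypothetical vector of survey countries; one report per entry
regions <- c("Australia", "Maldives", "Philippines")

# Render the parameterised source once per region, writing a
# separately named report each time
for (r in regions) {
  render("report_source.Rmd",
         params      = list(region = r),
         output_file = paste0("report_", r, ".pdf"))
}
```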
32. Solutions Child Documents
Child Documents for Textual Components
Introductions and discussions will need to be different across the
regions
Using the parameters of the source we can import a specific
introduction file for each desired region
Child documents allow for easy editing of
introductions/methods/discussions without needing to open the main
source document, which is overwhelming and complicated
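A sketch of how the parameter can select the child file; the naming scheme here is an assumption, not the project's actual layout:

```r
# Pick the region-specific introduction using the document parameter
intro_file <- paste0("intro_", params$region, ".Rmd")
```

A chunk with the header `{r, child = intro_file}` then splices that file into the report, so each region gets its own introduction without touching the main source.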
33. Future Work
Future Work
Currently this isn’t a fully automated process
Talk of linking these reports to a website
Extra parameterisation of the document - perhaps a structure change
based on the individual generating the report (e.g. management,
research etc)
34. Appendix For Further Reading
References I
The Ocean Agency. Global Reef Record. 2017. url:
http://globalreefrecord.org/data.
JJ Allaire et al. rmarkdown: Dynamic Documents for R. R package
version 1.6. 2017. url:
https://CRAN.R-project.org/package=rmarkdown.
O. Beijbom et al. “Towards Automated Annotation of Benthic
Survey Images: Variability of Human Experts and Operational Modes
of Automation”. In: (2015).
Joe Cheng, Bhaskar Karambelkar, and Yihui Xie. leaflet: Create
Interactive Web Maps with the JavaScript ’Leaflet’ Library. R
package version 1.1.0. 2017. url:
https://CRAN.R-project.org/package=leaflet.
35. Appendix For Further Reading
References II
Jeroen Ooms et al. RMySQL: Database Interface and ’MySQL’
Driver for R. R package version 0.10.13. 2017. url:
https://CRAN.R-project.org/package=RMySQL.
R Core Team. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing. Vienna,
Austria, 2013. url: http://www.R-project.org/.
Manuel González-Rivero et al. “Scaling up ecological measurements
of coral reefs using semi-automated field image collection and
analysis”. In: Remote Sensing 8 (2016). url:
http://www.mdpi.com/2072-4292/8/1/30.
Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis.
Springer-Verlag New York, 2009. isbn: 978-0-387-98140-6. url:
http://ggplot2.org.