Automated Summarisation of Big Data
Using data from the Catlin Seaview Survey - a global coral reef
monitoring effort
Amy StringeR
University of Queensland
UseR! July 2018
Amy StringeR (Global Change Institute) Automated Summarisation of Big Data UseR! July 2018 1 / 35
Outline
1 IntRoduction
2 Context
    The Catlin Data
    Image Collection
    Image Annotation
3 The Need
    Efficient SummaRisation
4 New Challenges
    Contextual Challenges
    Data Challenges
5 Solutions
    Dynamic Plotting Environments
    Interactive Maps using Leaflet
    RMySQL and Parameterised Rmarkdown
    Child Documents
6 Future Work
IntRoduction
[Insert witty crowd banter]
Context
The Catlin Seaview Survey
Coral reef monitoring program endeavouring to develop a global baseline on reef health and then monitor the state of reefs through resurvey efforts
5 regions around the world so far: Australia, the Caribbean, Southeast Asia, the Indian Ocean, the Pacific
Within these 5 major regions, we have a total of 25 survey countries
Figure: (a) Bleaching at the Maldives, 2016. (b) Bleaching at Heron Island, 2016.
Context: The Catlin Data
Efficient Monitoring
Three main stages:
1 Collection of images
2 Annotation of images
3 Calculating proportions, and visualising trends between surveys
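The third stage can be sketched in base R as below. This is a minimal illustration, not the survey's actual pipeline: the `annotations` data frame, its column names and its values are all hypothetical, and proportional cover is assumed to be the share of annotated points carrying a given label within a transect and year.

```r
# Hypothetical point annotations: one row per annotated point.
# Column names and values are illustrative, not the survey's schema.
annotations <- data.frame(
  transect = c("T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2"),
  year     = c(2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012),
  label    = c("Hard coral", "Hard coral", "Algae", "Other",
               "Algae", "Algae", "Soft coral", "Hard coral")
)

# Count points per transect, year and label.
counts <- as.data.frame(
  table(annotations[c("transect", "year", "label")]),
  responseName = "n_points"
)

# Divide by the number of points in that transect and year
# to get proportional cover.
totals <- ave(counts$n_points, counts$transect, counts$year, FUN = sum)
counts$cover <- counts$n_points / totals

# Drop label/transect combinations that were never observed.
cover <- counts[counts$n_points > 0, ]
print(cover[order(cover$transect, cover$label), ])
```

Trends between surveys would then follow from comparing `cover` across values of `year` for the same transect and label.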
Context: Image Collection
The Catlin Seaview Survey - Image Collection
High definition images collected in 2 km transects along a reef section - taken automatically every 3 seconds
Each image is GPS located
Speed of collection increased from the traditional 60 m² per dive (45 min) to 2000 m² per dive
For more on collection methodology, see [7]
Figure: A diver pushing the SVII scooter during a survey of the Great Barrier Reef. © XL Catlin Seaview Survey
Figure: An example image from a survey of the Great Barrier Reef. Images like
this, along with the data, are available on the XL Catlin Global Reef Record [1].
Context: Image Annotation
Neural Network for Image Annotations
Previously a time-consuming, manual task (potentially 3 decades of work for the Catlin Seaview Survey images)
An automatic point-annotation method based on machine learning algorithms is now used (see [3])
Colour and texture of images are used as descriptors for label categories
Coverage estimates are uploaded within a week of collection
Figure: The same image from earlier showing the points used for annotation
The Need: Efficient SummaRisation
The Next Stage
Fast data collection → fast annotation → bottleneck in processing
Data are stored in a MySQL database, allowing for easy integration with R [6, 5]
Introducing Rmarkdown [2]
rmarkdown provides a solution for quick/consistent exploratory
analysis of the data in a report format
Visualisations! [8]
New Challenges: Contextual Challenges
Contextual Challenges
Usability for non-R users
Meaningful visualisations
  They need to be useful for more than just the researchers; local government and others in charge of marine protection need to get some value from them
Comparisons to literature
Many label groups and spatial scales in the dataset
New Challenges: Data Challenges
Data Challenges - Structure
Sub-region             Reef Count  Transect Count  Image Count
Cairns-Cooktown                13              43        87631
Coral Sea                       3              32        23573
Far Northern                   12              33        68367
Mackay-Capricorn                4              12        14151
Townsville-Whitsunday           4              10         5722
Total                          36             130       199444
Table: A summary of the various spatial scales within just one region, the Great
Barrier Reef. This structure is consistent across all 5 regions.
Data Challenges - Labels
Benthic labels describe the community benthic category
Global labels describe morphological categories
Each region has 5 functional groups: hard corals, soft corals, algae, other invertebrates, other
Region          Benthic Labels  Global Labels
GBR                         27             13
Indian Ocean                49             17
Caribbean                   67             16
Southeast Asia              71             17
Pacific                     40             12
Solutions: Dynamic Plotting Environments
Plotting Hiccups
Plots have been created at multiple spatial scales: reef scale, subregion scale, transect scale
Plots will vary in size with the number of reefs/subregions/transects in the respective regional dataset
Differing label sets among regions make visualisation an exciting challenge at each of these spatial scales
  Clearly identifiable colours are near impossible at the community benthic level
  Visualisations at this level are more overwhelming than helpful
Regions surveyed only once need their temporal plots constructed conditionally
Dynamic Plotting
The chunk header (code wrapper) in the rmarkdown source script allows you to set the figure height and width from a variable
Figure: Plot wrapper with an example of the figure height adjustment according to the number of plot facets. Also shown here is a boolean variable controlling evaluation of the code segment.
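Since the slide's figure does not survive in this transcript, here is a rough sketch of the idea rather than the original code. The data frame, the helper `facet_fig_height()`, the 3-column layout and the 2.5 inches per row are all assumptions made for illustration. The chunk options `fig.height`, `fig.width` and `eval` are standard knitr options and accept R expressions.

```r
# Hypothetical regional data: one row per reef and survey year.
reef_data <- data.frame(
  reef = c("Reef A", "Reef A", "Reef B", "Reef B", "Reef C"),
  year = c(2012, 2014, 2012, 2014, 2012)
)

# Scale figure height with the number of facet rows (assumed layout:
# 3 facets per row, 2.5 inches per row; both values are arbitrary).
facet_fig_height <- function(n_facets, n_col = 3, row_height = 2.5) {
  ceiling(n_facets / n_col) * row_height
}

n_reefs <- length(unique(reef_data$reef))
fig_h   <- facet_fig_height(n_reefs)

# Only evaluate the temporal plot when the region has been resurveyed.
has_resurvey <- length(unique(reef_data$year)) > 1

# In the Rmarkdown source, a later chunk header can then use these values:
#   {r reef-plot, fig.height = fig_h, fig.width = 8, eval = has_resurvey}
```

Because the variables are computed in an earlier chunk, the same source can produce a short figure for a region with a handful of reefs and a tall one for a region with dozens.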
Reef Scale Visualisations
Figure: An example visualisation of only 3 of the 36 GBR reefs. This plot is
created at the functional group scale. Note that the x axis is year, and the y axis
is percentage coverage over the reef in question.
Reef Scale Visualisations
Figure: An example of the reef scale visualisation for the reefs surveyed only once. Coverage here is represented in the same way as in the previous plot, giving a percentage coverage for each basic functional group.
Change at the Transect Scale
Extra challenges that arise from survey design
Visualised at the global label level
Investigate change only over consecutive survey years
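One way to restrict change estimates to consecutive survey years is sketched below. It reads "consecutive" as surveys made exactly one calendar year apart, which is an assumption, and the cover values are invented for a single hypothetical transect and label.

```r
# Hypothetical cover estimates for one transect and one label.
cover <- data.frame(
  transect = "T1",
  year     = c(2012, 2014, 2015, 2016),
  cover    = c(0.40, 0.35, 0.30, 0.20)
)

cover  <- cover[order(cover$year), ]
gap    <- diff(cover$year)   # years between successive surveys
change <- diff(cover$cover)  # change in cover between successive surveys

# Keep only pairs of surveys made in consecutive years (gap of 1).
consecutive <- data.frame(
  from   = cover$year[-nrow(cover)][gap == 1],
  to     = cover$year[-1][gap == 1],
  change = change[gap == 1]
)
print(consecutive)
```

Here the 2012 to 2014 pair is dropped because no survey was made in 2013.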
Solutions: Interactive Maps using Leaflet
Leaflet
Using Leaflet [4] for interactive maps lets readers see exactly where the surveys take place. Each transect marker represents a 2 km survey region.
Figure: Disclaimer - this image is not so interactive
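A map of this kind could be built roughly as follows with the leaflet package. The transect names and coordinates are placeholders rather than real survey locations, and the popup content is an assumption.

```r
# Hypothetical transect coordinates (not real survey locations).
transects <- data.frame(
  name = c("Transect 1", "Transect 2"),
  lat  = c(-20.0, -18.5),
  lng  = c(150.0, 147.5)
)

# Build the map only if the leaflet package is installed.
if (requireNamespace("leaflet", quietly = TRUE)) {
  survey_map <- leaflet::addMarkers(
    leaflet::addTiles(leaflet::leaflet(data = transects)),
    lng = ~lng, lat = ~lat, popup = ~name
  )
}
```

Printing `survey_map` inside a chunk of an HTML report embeds the interactive map, with one marker per transect.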
Solutions: RMySQL and Parameterised Rmarkdown
Parameterising Rmarkdown Documents and RMySQL
RMySQL [5] allows the database to be accessed from within RStudio, negating the need for an external program
Figure: (a) The header when making use of document parameters. In future, more parameters may be added to simplify the source script, but at this stage things have been kept simple. (b) Working example accessing the database with the document input parameters. The use of RMySQL allows for connection to the database and extraction of data within the Rmarkdown source script.
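The two panels are not reproduced in this transcript, so the following is only a sketch of how a document parameter might drive a database query. The parameter name `region`, the database name, the table and column names, and the environment variables are all hypothetical. `DBI::sqlInterpolate()` is used here to quote the parameter value, which is a choice made for this sketch and not necessarily what the original script does.

```r
# Assumed YAML header of the Rmarkdown source (shown here as comments):
#   ---
#   title: "Regional summary"
#   params:
#     region: "GBR"
#   ---

# Fetch the rows for one region. The database, table and column names are
# hypothetical; credentials are read from environment variables.
fetch_region <- function(region) {
  con <- DBI::dbConnect(
    RMySQL::MySQL(),
    dbname   = "reef_db",
    host     = Sys.getenv("REEF_DB_HOST"),
    user     = Sys.getenv("REEF_DB_USER"),
    password = Sys.getenv("REEF_DB_PASSWORD")
  )
  on.exit(DBI::dbDisconnect(con))

  # sqlInterpolate() quotes the value safely before it enters the query.
  sql <- DBI::sqlInterpolate(
    con,
    "SELECT * FROM cover_estimates WHERE region = ?region",
    region = region
  )
  DBI::dbGetQuery(con, sql)
}

# Inside the report, the document parameter supplies the region:
#   region_data <- fetch_region(params$region)
```

Keeping the connection inside one function means the rest of the source script only ever sees a data frame.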
How’s the SeRenity?
Figure: The only bit of code a user needs to deal with to generate up to 25
reports.
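The code in the figure is not preserved here. A plausible equivalent, assuming a source file called `summary.Rmd` with a `region` parameter (both names hypothetical), loops over regions and calls `rmarkdown::render()` once per report.

```r
# Build an output file name from a region name.
report_file <- function(region) {
  paste0("summary_", gsub("[^A-Za-z0-9]+", "_", region), ".html")
}

# Render one report per region from a single parameterised source file.
# The file name "summary.Rmd" and the parameter name are assumptions.
render_reports <- function(regions, input = "summary.Rmd") {
  for (region in regions) {
    rmarkdown::render(
      input,
      params      = list(region = region),
      output_file = report_file(region)
    )
  }
}

# Usage (requires the rmarkdown package and the source file):
#   render_reports(c("GBR", "Indian Ocean", "Caribbean"))
```

A non-R user would only need to edit the vector of regions in the final call.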
Solutions: Child Documents
Child Documents for Textual Components
Introductions and discussions will need to be different across the regions
Using the parameters of the source we can import a specific introduction file for each desired region
Child documents allow for easy editing of introductions/methods/discussions without needing to open the main source document, which is overwhelming and complicated
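As a sketch of how a parameter could select the child document, the helper below maps a region name to a file path. The `text/` directory and the `intro_<region>.Rmd` naming scheme are invented for illustration. The `child` chunk option itself is a standard knitr feature.

```r
# Map a region parameter to its introduction file. The "text" directory
# and the file naming scheme are hypothetical.
intro_file <- function(region, dir = "text") {
  slug <- tolower(gsub("[^A-Za-z0-9]+", "_", region))
  file.path(dir, paste0("intro_", slug, ".Rmd"))
}

# In the report this would be intro_file(params$region).
intro_path <- intro_file("Southeast Asia")

# In the main source, a chunk with the child option then pulls the file in:
#   {r intro, child = intro_path}
```

Each regional introduction then lives in its own small file that can be edited without touching the main source.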
Future Work
Currently this isn’t a fully automated process
There is talk of linking these reports to a website
Extra parameterisation of the document - perhaps a structure change based on the individual generating the report (e.g. management, research, etc.)
Appendix For Further Reading
References
[1] The Ocean Agency. Global Reef Record. 2017. url: http://globalreefrecord.org/data.
[2] JJ Allaire et al. rmarkdown: Dynamic Documents for R. R package version 1.6. 2017. url: https://CRAN.R-project.org/package=rmarkdown.
[3] O. Beijbom et al. “Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation”. In: (2015).
[4] Joe Cheng, Bhaskar Karambelkar, and Yihui Xie. leaflet: Create Interactive Web Maps with the JavaScript 'Leaflet' Library. R package version 1.1.0. 2017. url: https://CRAN.R-project.org/package=leaflet.
[5] Jeroen Ooms et al. RMySQL: Database Interface and 'MySQL' Driver for R. R package version 0.10.13. 2017. url: https://CRAN.R-project.org/package=RMySQL.
[6] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2013. url: http://www.R-project.org/.
[7] Manuel González-Rivero et al. “Scaling up ecological measurements of coral reefs using semi-automated field image collection and analysis”. In: Remote Sensing 8 (2016). url: http://www.mdpi.com/2072-4292/8/1/30.
[8] Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009. isbn: 978-0-387-98140-6. url: http://ggplot2.org.

06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 

Recently uploaded (20)

DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 

Automated Summarisation of Big Data, useR! 2018

  • 1. IntRoduction: Automated Summarisation of Big Data. Using data from the Catlin Seaview Survey, a global coral reef monitoring effort. Amy StringeR, University of Queensland. UseR! July 2018. Amy StringeR (Global Change Institute) Automated Summarisation of Big Data UseR! July 2018 1 / 35
  • 2. IntRoduction: Outline. 1 IntRoduction; 2 Context (The Catlin Data, Image Collection, Image Annotation); 3 The Need (Efficient SummaRisation); 4 New Challenges (Contextual Challenges, Data Challenges); 5 Solutions (Dynamic Plotting Environments, Interactive Maps using Leaflet, RMySQL and Parameterised Rmarkdown, Child Documents); 6 Future Work
  • 3. IntRoduction: [Insert witty crowd banter]
  • 4. Context: The Catlin Seaview Survey. A coral reef monitoring program endeavouring to develop a global baseline on reef health and then monitor the state of reefs through resurvey efforts. 5 regions around the world so far: Australia, the Caribbean, Southeast Asia, the Indian Ocean and the Pacific. Within these 5 major regions there are a total of 25 survey countries. Figures: (a) Bleaching at the Maldives, 2016; (b) Bleaching at Heron Island, 2016.
  • 6. Context, The Catlin Data: Efficient Monitoring. Three main stages: 1 Collection of images; 2 Annotation of images; 3 Calculating proportions, and visualising trends between surveys.
  • 8. Context, Image Collection: The Catlin Seaview Survey - Image Collection. High definition images are collected in 2 km transects along a reef section, taken automatically every 3 seconds. Each image is GPS located. Speed of collection increased from the traditional 60 m² per dive (45 min) to 2000 m² per dive. Figure: A diver pushing the SVII scooter during a survey of the Great Barrier Reef. For more on collection methodology, see [7]. © XL Catlin Seaview Survey
  • 9. Context, Image Collection: Figure: An example image from a survey of the Great Barrier Reef. Images like this, along with the data, are available on the XL Catlin Global Reef Record [1].
  • 11. Context, Image Annotation: Neural Network for Image Annotations. Previously a time-consuming, manual task (potentially 3 decades of work for the CSS images). An automatic point-annotation method based on machine learning algorithms is now used (see [3]). Colour and texture of images are used as descriptors for label categories. Coverage estimates are uploaded within a week of collection.
  • 12. Context, Image Annotation: Figure: The same image from earlier, showing the points used for annotation.
  • 14. The Need, Efficient SummaRisation: The Next Stage. Fast data collection → fast annotation → bottleneck in processing. Data are stored in a MySQL database, allowing for easy integration with R [6, 5]. Introducing Rmarkdown [2]: rmarkdown provides a solution for quick, consistent exploratory analysis of the data in a report format. Visualisations! [8]
  • 16. New Challenges: Contextual Challenges. Usability for non-R users. Meaningful visualisations: they need to be useful for more than just the researchers; local government and others in charge of marine protection need to get some value from them. Comparisons to literature. Many label groups and spatial scales in the dataset.
  • 18. New Challenges: Data Challenges - Structure.
      Sub-region             Reef Count  Transect Count  Image Count
      Cairns-Cooktown                13              43        87631
      Coral Sea                       3              32        23573
      Far Northern                   12              33        68367
      Mackay-Capricorn                4              12        14151
      Townsville-Whitsunday           4              10         5722
      Total                          36             130       199444
    Table: A summary of the various spatial scales within just one region, the Great Barrier Reef. This structure is consistent across all 5 regions.
  • 19. New Challenges: Data Challenges - Labels. Benthic labels describe the community benthic category. Global labels describe morphological categories. Each region has 5 functional groups: hard corals, soft corals, algae, other invertebrates, and other.
      Region          Benthic Labels  Global Labels
      GBR                         27             13
      Indian Ocean                49             17
      Caribbean                   67             16
      Southeast Asia              71             17
      Pacific                     40             12
  • 21. Solutions, Dynamic Plotting Environments: Plotting Hiccups. Plots have been created at multiple spatial scales: reef scale, subregion scale, transect scale. Plots will have varying sizes based on the number of reefs/subregions/transects in the respective regional dataset. Differing label sets among regions make visualisation an exciting challenge at each of these spatial scales. It is near impossible to find clearly identifiable colours at the community benthic level, and visualisations at this level are more overwhelming than helpful. Single-survey regions need some kind of conditioning on their temporal plot construction.
  • 22. Solutions, Dynamic Plotting Environments: Dynamic Plotting. The code wrapper in the rmarkdown source script allows you to set the figure heights and widths from a variable. Figure: Plot wrapper with an example of the figure height adjustment according to the number of plot facets. Also shown here is a boolean variable for evaluation of the code segment.
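The plot wrapper figure is not reproduced in this transcript. As a minimal sketch of the idea, assuming a hypothetical helper `facet_fig_height()` and made-up sizing constants (neither is from the talk), the figure height can be computed from the number of facets and passed to the standard knitr chunk options `fig.height`, `fig.width` and `eval`, which accept R expressions:

```r
# Hypothetical helper: scale figure height (in inches) with the number of
# facet rows. The constants are illustrative, not taken from the talk.
facet_fig_height <- function(n_facets, n_col = 3, row_height = 2.5, min_height = 3) {
  n_rows <- ceiling(n_facets / n_col)
  max(min_height, n_rows * row_height)
}

# Example values that would be computed in a setup chunk.
n_reefs   <- 36                          # e.g. reefs in the regional dataset
fig_h     <- facet_fig_height(n_reefs)   # height grows with the facet count
multi_srv <- TRUE                        # boolean used to switch a chunk on or off

# In the .Rmd source, a later chunk header could then read:
#   {r reef-plot, fig.height = fig_h, fig.width = 9, eval = multi_srv}
```

Because the options are evaluated when the chunk runs, the same source script can produce a short figure for a region with a handful of reefs and a tall one for a region with dozens.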
  • 23. Solutions, Dynamic Plotting Environments: Reef Scale Visualisations. Figure: An example visualisation of only 3 of the 36 GBR reefs. This plot is created at the functional group scale. Note that the x axis is year, and the y axis is percentage coverage over the reef in question.
  • 24. Solutions, Dynamic Plotting Environments: Reef Scale Visualisations. Figure: An example of the reef scale visualisation for the reefs surveyed only once. Coverage here is represented in the same way as in the previous plot, giving a percentage coverage for each basic functional group.
  • 25. Solutions, Dynamic Plotting Environments: Change at the Transect Scale. Extra challenges arise from survey design. Visualised at the global label level. Investigate change only over consecutive survey years.
  • 27. Solutions, Interactive Maps using Leaflet: Leaflet. Using Leaflet [4] for interactive maps allows readers to see exactly where the surveys take place. Each transect marker represents a 2 km survey region. Figure: Disclaimer - this image is not so interactive.
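A map of this kind can be sketched with the leaflet package as follows. The data frame, its column names and the coordinates are placeholders invented for illustration, not survey data; only `leaflet()`, `addTiles()` and `addMarkers()` are the package's own functions.

```r
library(leaflet)

# Hypothetical transect start points; coordinates are placeholders,
# not survey data.
transects <- data.frame(
  transect_id = c("T001", "T002"),
  lat = c(-16.5, -23.44),
  lng = c(145.9, 151.91)
)

transect_map <- leaflet(transects) %>%
  addTiles() %>%                                   # default OpenStreetMap base layer
  addMarkers(lng = ~lng, lat = ~lat, popup = ~transect_id)

transect_map  # printing the widget in an HTML report renders the interactive map
```

The widget is only interactive in HTML output; in a PDF report it would need to be replaced by a static image.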
  • 29. Solutions, RMySQL and Parameterised Rmarkdown: Parameterising Rmarkdown Documents and RMySQL. RMySQL [5] allows the database to be accessed from within RStudio, removing the need for an external program. Figures: (a) The header when making use of document parameters. In future, more parameters may be added to simplify the source script, but at this stage things have been kept simple. (b) Working example accessing the database with the document input parameters. The use of RMySQL allows for connection to the database and extraction of data within the Rmarkdown source script.
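Neither figure survives in this transcript. The sketch below shows one way a parameterised header and a database query could fit together. The parameter name `region`, the table name `annotations`, and the helper `fetch_region()` are all assumptions made for illustration; the talk's actual header and schema are not shown.

```r
# Hypothetical YAML header of the .Rmd source (parameter name is illustrative):
#   ---
#   title: "Regional summary"
#   params:
#     region: "GBR"
#   ---

# Inside a knitted document, `params` is a read-only list supplied by rmarkdown.
# It is mocked here so the sketch runs on its own.
params <- list(region = "GBR")

# Build the query with safe quoting; table and column names are assumptions.
query <- DBI::sqlInterpolate(
  DBI::ANSI(),
  "SELECT * FROM annotations WHERE region = ?region",
  region = params$region
)

# Connecting needs real credentials, so this part is wrapped in a function
# and not called here.
fetch_region <- function(query, dbname, host, user, password) {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname, host = host,
                        user = user, password = password)
  on.exit(DBI::dbDisconnect(con))   # always release the connection
  DBI::dbGetQuery(con, query)
}
```

Keeping the query inside the source script means the report always reflects the current state of the database for whichever region it is rendered with.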
  • 30. Solutions, RMySQL and Parameterised Rmarkdown: How’s the SeRenity? Figure: The only bit of code a user needs to deal with to generate up to 25 reports.
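The code in that figure is not available here. A minimal sketch of what such a driver could look like uses `rmarkdown::render()` with its `params` argument; the region codes, file names and the helpers `report_file()` and `render_reports()` are hypothetical.

```r
# Hypothetical region codes and file names; the talk does not list them.
regions <- c("GBR", "IndianOcean", "Caribbean")

# Hypothetical naming scheme for the rendered reports.
report_file <- function(region) paste0("summary_", region, ".html")

render_reports <- function(regions, input = "regional_summary.Rmd") {
  for (r in regions) {
    rmarkdown::render(
      input,
      output_file = report_file(r),
      params = list(region = r),   # overrides the default in the YAML header
      envir = new.env()            # fresh environment so reports do not share state
    )
  }
}

# render_reports(regions)  # uncomment once the .Rmd source exists
```

A non-R user would only need to edit the vector of regions and run the last line.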
  • 32. Solutions, Child Documents: Child Documents for Textual Components. Introductions and discussions will need to be different across the regions. Using the parameters of the source script, a specific introduction file can be imported for each desired region. Child documents allow for easy editing of introductions/methods/discussions without needing to open the main source document, which is overwhelming and complicated.
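One way to select a child document from a parameter is sketched below. The directory layout, file naming scheme and the helper `child_path()` are assumptions; `child` is a standard knitr chunk option.

```r
# Hypothetical naming scheme for per-region text files.
child_path <- function(region, section = "intro", dir = "text") {
  file.path(dir, paste0(section, "_", region, ".Rmd"))
}

params <- list(region = "GBR")   # mocked; rmarkdown supplies this when knitting
intro_file <- child_path(params$region)

# In the main .Rmd, an otherwise empty chunk can then pull the file in:
#   {r intro, child = intro_file}
```

Each region's text then lives in its own small file that collaborators can edit without touching the plotting code.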
  • 33. Future Work. Currently this isn’t a fully automated process. There is talk of linking these reports to a website. Extra parameterisation of the document, perhaps a structure change based on the individual generating the report (e.g. management, research).
  • 34. Appendix, For Further Reading: References I.
      [1] The Ocean Agency. Global Reef Record. 2017. url: http://globalreefrecord.org/data.
      [2] JJ Allaire et al. rmarkdown: Dynamic Documents for R. R package version 1.6. 2017. url: https://CRAN.R-project.org/package=rmarkdown.
      [3] O. Beijbom et al. “Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation”. In: (2015).
      [4] Joe Cheng, Bhaskar Karambelkar, and Yihui Xie. leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library. R package version 1.1.0. 2017. url: https://CRAN.R-project.org/package=leaflet.
  • 35. Appendix, For Further Reading: References II.
      [5] Jeroen Ooms et al. RMySQL: Database Interface and ’MySQL’ Driver for R. R package version 0.10.13. 2017. url: https://CRAN.R-project.org/package=RMySQL.
      [6] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2013. url: http://www.R-project.org/.
      [7] Manuel González-Rivero et al. “Scaling up ecological measurements of coral reefs using semi-automated field image collection and analysis”. In: Remote Sensing 8 (2016). url: http://www.mdpi.com/2072-4292/8/1/30.
      [8] Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009. isbn: 978-0-387-98140-6. url: http://ggplot2.org.