This document discusses automated summarization of large datasets from the Catlin Seaview Survey, a global coral reef monitoring effort that collects reef images automatically during surveys and annotates them using machine learning. The author aims to summarize this big data efficiently using dynamic RMarkdown reports connected to a MySQL database, allowing non-experts to explore trends in the data through interactive visualizations and maps.
2nd e-ROSA Stakeholder Workshop: EO-Based Global Public Goods (e-ROSA)
This document discusses using earth observation data and global public goods to help smallholder farming. It describes smallholder farming as the world's largest industry but also the most poorly quantified. It outlines the STARS project which experiments with very high spatial resolution satellite images in smallholder contexts. It discusses developing global public goods like an open crop spectral signature library and image analysis algorithm repository. It also proposes integrating these resources into the CGIAR system to better support smallholder farmers worldwide.
Identifying Land Patterns from Satellite Images using Deep Learning (Soumyadeep Debnath)
▫️ Research Domain :
Machine Learning (ML), Deep Learning (DL) and Convolutional Neural Network (CNN).
▫️ Conference Details :
International Conference on the Networked Digital Earth (ICNDE 2018) at Indian Institute of Technology Kharagpur (IITkgp), India during March 7 - 9, 2018.
https://cse.iitkgp.ac.in/conf/NSDE/sds/ICNDE2018/
▫️ Presentation Details :
Presented the conference poster at ICNDE 2018 in front of Prof. Ravi Sundaram [Northeastern University, Boston, USA], Organizing Chair and Dr. Anil Vullikanti [Virginia Tech, USA], Invited Chair.
We use georeferenced results of the 2010 Census in Mexico to train machine learning algorithms that detect urban growth and contribute new information for estimating the total population. The talk shares the experience and results of combining this census data with Earth observation imagery to generate useful population estimates for non-census years.
How to set achievable tree canopy goals (Josh Behounek)
Does your community have a canopy goal that sounds good but might not be achievable? Are you thinking about setting a goal but concerned you don't know what it should be? This presentation will demonstrate how to set an achievable urban tree canopy (UTC) goal that considers parameters like land use, development, climate, ordinances, and possible planting spaces. Case studies will be shown to demonstrate methodologies for setting achievable canopy goals, as well as systems to track progress toward achieving them.
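The arithmetic behind an achievable canopy goal can be sketched very simply: start from the existing canopy fraction and add only the share of possible planting space that can realistically be planted. This is a hypothetical illustration with invented numbers, not figures from the presentation.

```python
# Hypothetical sketch of achievable urban tree canopy (UTC) goal
# setting. All percentages and the realization rate are illustrative
# assumptions, not values from the presentation.

def achievable_utc_goal(existing_pct, plantable_pct, realization_rate):
    """Existing canopy plus the share of possible planting space
    realistically expected to be planted and to survive."""
    return existing_pct + plantable_pct * realization_rate

# A town with 30% existing canopy and 20% possible planting space,
# assuming half of that space can realistically be planted:
goal = achievable_utc_goal(30.0, 20.0, 0.5)
print(goal)  # 40.0
```

The point of the parameters listed above (land use, ordinances, climate) is precisely that they shrink the realization rate below 1.0.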
AAPG GTW 2017: Deep Water and Shelf Reservoirs (Dustin Dewett)
The document discusses multispectral fault enhancement techniques for seismic interpretation. It provides a brief history of using spectral decomposition to better identify faults. It then outlines a typical spectral similarity workflow involving filtering, spectral decomposition, attribute analysis, lineament identification, and combining results. The workflow allows faults to be more clearly defined at specific frequencies. It also discusses using machine learning and additional attributes like peak frequency for more complete geological understanding. Future areas of research are expected to integrate more attributes and comparisons between multispectral and broadband methods.
Fault Enhancement Using Spectrally Based Seismic Attributes -- Dewett and Hen... (Dustin Dewett)
Fault interpretation in seismic data is a critical task that must be completed to thoroughly understand the structural history of the subsurface. The development of similarity-based attributes has allowed geoscientists to effectively filter a seismic data set to highlight discontinuities that are often associated with fault systems. Furthermore, there are numerous workflows that provide, to varying degrees, the ability to enhance this seismic attribute family. We have developed a new method, spectral similarity, to improve the similarity enhancement by integrating spectral decomposition, swarm intelligence, magnitude filtering, and orientated smoothing. In addition, the spectral similarity method has the ability to take any seismic attribute (e.g., similarity, curvature, total energy, coherent energy gradient, reflector rotation, etc.), combine it with the benefits of spectral decomposition, and create an accurate enhancement to similarity attributes. The final result is an increase in the quality of the similarity enhancement over previously used methods, and it can be computed entirely in commercial software packages. Specifically, the spectral similarity method provides a more realistic fault dip, reduction of noise, and removal of the discontinuous “stair-step” pattern common to similarity volumes.
This document discusses using ontologies and semantic web technologies to integrate heterogeneous agricultural data sources for estimating rice harvests in Thailand. It describes using an ontology registry and integrated ontologies to combine satellite imagery, digital elevation models, land use maps, field survey data, meteorological data, and rice growth models. The goal is to develop a data integration environment to estimate rice harvests using small, distributed agricultural databases from various sources.
1) The document presents an optimization model for designing biogas infrastructure in Wisconsin using object-oriented programming in Julia. The model considers factors like costs, emissions, and trade-offs to determine optimal placement of dairy farm waste processing facilities.
2) The model defines variables, constraints, objectives and stakeholders to generate solutions for minimizing costs and emissions. Solutions show reasonable placement of more processing facilities when stakeholders value emissions savings highly.
3) Future work will implement a more complex, stochastic multi-stakeholder formulation using the CVaR method to find a compromise solution over different stakeholders rather than a single "utopia point" solution. This will provide insights into how dissatisfactions change with the CVaR parameter.
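The cost-vs-emissions trade-off described in the summary above can be illustrated with a toy brute-force search. This is a sketch only: the candidate sites, costs, and emissions savings below are invented, and the actual model is a far richer object-oriented Julia formulation with stochastic, multi-stakeholder extensions.

```python
from itertools import combinations

# Toy sketch of the cost-vs-emissions trade-off: pick which candidate
# processing sites to build under a weighted objective. Site data are
# invented for illustration.

sites = {"A": (10.0, 4.0), "B": (6.0, 2.0), "C": (8.0, 5.0)}  # site: (cost, emissions saved)

def best_subset(weight_emissions):
    """Pick the subset of sites minimizing cost - w * emissions_saved."""
    best, best_obj = (), float("inf")
    for r in range(len(sites) + 1):
        for subset in combinations(sites, r):
            cost = sum(sites[s][0] for s in subset)
            saved = sum(sites[s][1] for s in subset)
            obj = cost - weight_emissions * saved
            if obj < best_obj:
                best, best_obj = subset, obj
    return set(best)

print(best_subset(0.5))  # emissions barely valued: build nothing
print(best_subset(3.0))  # emissions valued highly: build more sites
```

This reproduces the qualitative finding in the summary: more facilities are placed when stakeholders value emissions savings highly.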
The document describes the GEM Foundation's efforts to create a centralized database of global seismic hazard models using common data formats and open-source software. This will allow models to be more easily compared, reproduced and inspected. It will also facilitate combining models and generating new data. Currently the database includes major models from regions around the world. Quality assurance testing has revealed some differences between models when reproduced, calling for further investigation.
The document discusses tools and datasets for seismic hazard analysis from site-specific to global scales. It describes the OpenQuake engine and Hazard Modeller's Toolkit (HMTK) which can be used for classical and event-based probabilistic seismic hazard analysis (PSHA) at various scales. The OpenQuake Ground Motion Toolkit helps with selection and weighting of ground motion prediction equations. These tools are applied in site-specific analyses, and for developing national, regional, and global seismic hazard models using various data sources on earthquakes, faults, and strain.
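The core of classical PSHA, as implemented at much greater sophistication in the OpenQuake engine, is summing over seismic sources the event rate times the probability that the (typically lognormal) ground motion exceeds a given level. The minimal sketch below is not the OpenQuake API; source rates, medians, and the log-standard deviation are invented.

```python
import math

# Minimal classical-PSHA sketch (not the OpenQuake engine): annual
# rate of exceeding ground-motion level x = sum over sources of
# event rate * P(lognormal IM > x). All source parameters are invented.

def p_exceed(x, median, sigma_ln):
    """P(IM > x) for a lognormal IM with given median and log-std."""
    z = (math.log(x) - math.log(median)) / sigma_ln
    return 0.5 * math.erfc(z / math.sqrt(2.0))

sources = [  # (annual rate of events, median PGA in g at the site)
    (0.01, 0.30),
    (0.05, 0.10),
]

def annual_exceedance_rate(x, sigma_ln=0.6):
    return sum(rate * p_exceed(x, med, sigma_ln) for rate, med in sources)

for pga in (0.1, 0.2, 0.4):
    print(f"lambda(PGA > {pga:.1f} g) = {annual_exceedance_rate(pga):.5f}")
```

A hazard curve is exactly this quantity evaluated over a range of intensity levels; real GMPE selection and weighting is what the OpenQuake Ground Motion Toolkit supports.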
How to empower community by using GIS, lecture 1 (wang yaohui)
The document provides an outline for a course on applying geographic information systems (GIS) to empower communities. It discusses key GIS concepts like projections, scale, coordinate systems, and data formats. It aims to familiarize students with ArcGIS software and with using GIS for community applications like education, environmental management, and public participation. Students will learn skills like querying spatial data and integrating external data to solve problems in community-empowerment projects.
TexelTek - Andrew Levine - Hadoop World 2010 (Cloudera, Inc.)
The document discusses using an open cloud consortium to process map imagery for disaster relief. It aims to make imagery available online for relief workers, enable large-scale image processing of satellite data, and provide image deltas showing changes over time. The framework uses Apache Hadoop on a testbed platform to break images into tiles via mappers and assemble them from reducers into layers for a web map service. It demonstrates change detection over time for disasters like oil spills and floods.
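The tile-based map/reduce workflow described above can be illustrated with a toy, in-process stand-in: mappers cut an image into fixed-size tiles keyed by tile coordinates, and reducers group the tiles into a layer for a web map service. The real pipeline runs these steps as Apache Hadoop jobs over satellite imagery; the tiny image below is invented.

```python
from collections import defaultdict

# Toy stand-in for the Hadoop tiling workflow: mapper emits
# ((tile_x, tile_y), tile_pixels) pairs, reducer groups them by key
# as the shuffle/reduce phase would.

TILE = 2  # tile edge length in pixels (tiny, for illustration)

def mapper(image):
    """Emit ((tile_x, tile_y), tile_pixels) pairs for one image."""
    h, w = len(image), len(image[0])
    for ty in range(0, h, TILE):
        for tx in range(0, w, TILE):
            tile = [row[tx:tx + TILE] for row in image[ty:ty + TILE]]
            yield (tx // TILE, ty // TILE), tile

def reducer(pairs):
    """Group tiles by key into a layer."""
    layer = defaultdict(list)
    for key, tile in pairs:
        layer[key].append(tile)
    return dict(layer)

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
layer = reducer(mapper(image))
print(sorted(layer))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Image deltas for change detection would then compare tiles with the same key across two acquisition dates.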
- Expert elicitation was used to develop fragility functions characterizing building vulnerability to earthquakes around the world. Thirteen experts evaluated vulnerability for generic building types in eight countries, and twelve US and one Canadian experts evaluated selected building types in the US.
- Cooke's method was used to score experts based on their accuracy on seed questions and assign weights to their responses on target questions. This allowed fragility curves to be developed accounting for expert uncertainties.
- The exercises generated over 50 new fragility functions for use in earthquake modeling, providing critical data where empirical models are lacking. Further research is needed to better understand the expert scoring approach.
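The weighting step in Cooke's classical model can be sketched in greatly simplified form: each expert receives a performance score from the seed questions, scores are normalized into weights, and target-question answers are pooled with those weights. Real Cooke scoring combines a calibration term and an information term; the scores and answers below are invented.

```python
# Greatly simplified sketch of Cooke's-method weighting: normalize
# seed-question performance scores into weights, then linearly pool
# the experts' target-question estimates. All numbers are invented.

def cooke_weights(scores):
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def pooled_estimate(weights, answers):
    """Weighted linear pool of the experts' point estimates."""
    return sum(weights[name] * answers[name] for name in weights)

scores = {"expert_1": 0.6, "expert_2": 0.3, "expert_3": 0.1}
answers = {"expert_1": 0.40, "expert_2": 0.55, "expert_3": 0.80}  # e.g. P(collapse)

w = cooke_weights(scores)
print(round(pooled_estimate(w, answers), 3))  # 0.485
```

In the actual study the pooled quantities are full fragility curves (probability of damage as a function of shaking intensity), not single point estimates.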
This document describes a geospatial modeling tool developed to retrieve climate data from large climate model databases in an efficient manner. The tool integrates R programming with ArcGIS to subset and extract grid point data for specific study areas from netCDF climate model files. It was tested on CORDEX climate model data and found to accurately obtain grid points, providing a less tedious method than manual retrieval. The tool allows climate data to be efficiently obtained and prepared as model inputs.
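The core subsetting idea, selecting only the grid points of a regular lat/lon climate grid that fall inside a study-area bounding box, can be sketched without any GIS stack. The actual tool does this with R and ArcGIS on netCDF files; the grid and bounding box below are invented.

```python
# Library-free sketch of bounding-box subsetting of a regular
# lat/lon grid, the core of the retrieval tool described above.

def frange(start, stop, step):
    """Inclusive float range with rounding to avoid drift."""
    vals, x = [], start
    while x <= stop + 1e-9:
        vals.append(round(x, 4))
        x += step
    return vals

lats = frange(10.0, 20.0, 0.5)   # grid latitudes
lons = frange(95.0, 105.0, 0.5)  # grid longitudes

def subset(lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Return the (lat, lon) grid points inside the bounding box."""
    return [(la, lo) for la in lats for lo in lons
            if lat_min <= la <= lat_max and lon_min <= lo <= lon_max]

pts = subset(lats, lons, 14.0, 16.0, 100.0, 101.0)
print(len(pts))  # 5 lats x 3 lons = 15 points
```

With netCDF the same selection is done on the file's coordinate variables before extracting the data slab, which is what makes it so much faster than manual retrieval.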
In this project the group members will play with daily rainfall data collected on the Gulf Coast (535 stations in total) from 1949 to 2017. The purposes of this exercise are to:
1) give students an idea of a typical climate data set (spatio-temporal data) and some associated scientific questions (e.g. how rainfall extremes vary in space and time and how that might be affected by other things like greenhouse gases or temperatures);
2) get students familiar with data analysis using R, including data manipulation, data visualization, and data summary;
3) introduce some statistical methods (e.g. time series analysis, spatial statistics, extreme value analysis) to analyze this kind of data and "answer" (perform statistical inference on) the questions of interest.
Group members: Lin Ge, Jianan Jang, Jessica Robinson, Erin Song, Seth Temple, Adam Wu
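A first step of the extreme value analysis mentioned above is reducing each station's daily series to annual (block) maxima. The course works in R; the same idea is sketched here in Python with invented records of the form (year, day_of_year, rainfall_mm).

```python
from collections import defaultdict

# Block-maxima extraction: map each year to its largest daily
# rainfall total. Records are invented for illustration.

records = [
    (1949, 12, 30.2), (1949, 200, 85.0), (1949, 310, 12.4),
    (1950, 45, 61.7), (1950, 178, 140.3), (1950, 300, 95.5),
]

def annual_maxima(records):
    """Map each year to its largest daily rainfall total."""
    maxima = defaultdict(float)
    for year, _, rain in records:
        maxima[year] = max(maxima[year], rain)
    return dict(maxima)

print(annual_maxima(records))  # {1949: 85.0, 1950: 140.3}
```

The resulting annual maxima are what a generalized extreme value distribution would be fitted to when studying how rainfall extremes change over time.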
The document summarizes the products and applications of GEM's Hazard program. It outlines five global datasets created through international projects including historical earthquake archives, instrumental seismicity catalogs, active fault databases, and ground motion prediction equations. It also describes regional seismic hazard models compiled in a database and the OpenQuake open-source software for calculating seismic hazard and risk. Key applications of the products include use in building codes, insurance catastrophe modeling, and site-specific engineering analyses.
OpendTect is attribute-analysis software used in exploration seismology, developed by the dGB Earth Sciences group.
In this presentation, OpendTect 5 is used to extract the fault system of the F3_Demo seismic dataset from the North Sea, the Netherlands.
Application packaging and systematic processing in earth observation exploita... (terradue)
An overview of Terradue's solutions supporting Earth Observation (EO) Exploitation Platforms across multiple domains.
Presentation given as part of the Open Geospatial Consortium (OGC) Technical Committee ad hoc meeting for setting up a new domain working group on EO Exploitation Platforms.
This document discusses the potential applications of artificial intelligence in space science and planetary defense. It summarizes:
1) The Frontier Development Lab (FDL) is a collaboration between NASA and AI/ML researchers to address challenges in planetary defense, space resources, and other areas. In 2016, 12 researchers worked on 3 problem areas including radar shape modeling.
2) FDL has since expanded, with 24 researchers addressing 5 challenges in 2017, including long period comets and applied AI. Future plans include 28 researchers on 7 problem areas in 2018 such as lunar route planning and solar storm warnings.
3) AI has applications for planetary defense such as identifying meteorites, modeling asteroid shapes from radar images, and selecting
Of course, you know what data is. You probably know what big data and small data are. But what is all that buzz about data? Why is it so important today? These questions are the topic of this session, which goes beyond definitions and descriptions. We will talk about data, the different options for using it, and how we can benefit from it.
MoonDB: Restoration & Synthesis of Planetary Geochemical Data (Kerstin Lehnert)
This presentation explains the MoonDB project that will restore and synthesize geochemical and petrological data acquired on lunar samples over more than 4 decades. The project is a collaboration between the IEDA data facility (http://www.iedadata.org) at the Lamont-Doherty Earth Observatory of Columbia University and the Astromaterials Acquisition and Curation Office (AACO) at Johnson Space Center (JSC).
Nuclear emergency response and Big Data technologies (BigData_Europe)
This document discusses using big data technologies to improve nuclear emergency response. It describes how real-time systems currently use deterministic modeling to simulate radiological situations and consequences of countermeasures. Ensemble modeling is proposed to better account for input uncertainties. Existing scenarios could be further analyzed and accessed through web tools to support decision making. Case-based reasoning is presented as an approach to integrate historical information and suggest emergency strategies. An analytical platform demonstrates retrieving similar historical cases and reusing or adapting their solutions to support response to new nuclear events.
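The case-based reasoning retrieval step described above, finding historical cases similar to a new event, can be sketched as a nearest-neighbor search over feature vectors. The cases and features below are invented; a real system would use much richer case descriptions and similarity measures.

```python
import math

# Sketch of case-based-reasoning retrieval: each historical case is
# a feature vector; retrieve the nearest one to a new event.
# Cases and features are invented for illustration.

cases = {  # case id: (release magnitude, wind speed, population density)
    "case_1995": (2.0, 3.5, 120.0),
    "case_2003": (5.0, 1.0, 40.0),
    "case_2011": (8.0, 6.0, 300.0),
}

def nearest_case(query):
    """Return the id of the case closest to the query in feature space."""
    return min(cases, key=lambda cid: math.dist(query, cases[cid]))

new_event = (4.5, 1.5, 55.0)
print(nearest_case(new_event))  # case_2003
```

The retrieved case's emergency strategy would then be reused or adapted, which is the reuse/revise step of the CBR cycle the platform demonstrates.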
The document discusses the value of data and the rise of big data. It notes that Matthew Fontaine Maury in the 1800s recognized the value of analyzing ship log data collectively. Today, new sources of data like sensors have exploded the volume of data. Characteristics of big data include volume, variety, and velocity. Technological challenges include scalability, heterogeneity, and low latency. The document provides examples of non-relational databases and MapReduce as approaches to handle big data.
This document discusses using machine learning and decision making for sustainability. It describes three major challenges: high dimensional spaces with many variables, uncertainty with limited information requiring stochastic models, and accounting for preferences and utilities in optimization criteria. The document outlines ongoing research using machine learning for applications like poverty mapping, natural resource management, materials discovery, and modeling migratory pastoralism. The research aims to address global challenges like poverty, food security, and environmental sustainability.
Netica is a Bayesian network modeling and inference software package developed by Norsys Software Corp. It allows users to build and evaluate causal probabilistic models known as Bayesian networks.
R: R is a programming language and software environment for statistical analysis, graphics, and statistical computing. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues.
Weka: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules
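The kind of causal probabilistic model Netica evaluates can be illustrated with a tiny two-node Bayesian network, Rain -> WetGrass, queried by exact enumeration. The probabilities are invented; this shows only the inference idea, not Netica's API.

```python
# Tiny Bayesian-network sketch: a two-node network Rain -> WetGrass
# with invented probabilities, queried via Bayes' rule.

p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass=true | Rain)

def p_rain_given_wet():
    """P(Rain=true | WetGrass=true) by exact enumeration."""
    joint_true = p_rain * p_wet_given_rain[True]
    joint_false = (1 - p_rain) * p_wet_given_rain[False]
    return joint_true / (joint_true + joint_false)

print(round(p_rain_given_wet(), 3))  # 0.692
```

Tools like Netica generalize this to networks with many nodes, where efficient inference algorithms replace brute-force enumeration.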
IRJET- Geological Boundary Detection for Satellite Images using AI Technique (IRJET Journal)
This document summarizes a research paper that proposes a method for detecting geological boundaries in satellite images using artificial intelligence techniques. The method involves pre-processing images, generating histograms to analyze pixel values, performing 2D convolution on image planes, applying a particle swarm optimization algorithm to identify boundaries, and testing the approach on pre-flood and post-flood satellite images of Kerala, India. The results show differences in detected geological boundaries between the two images, allowing changes from flooding to be identified. The method provides a way to automatically analyze satellite imagery and extract geological boundary information.
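The 2D convolution step in that pipeline can be sketched with a small Laplacian-style kernel, which responds at pixels where intensity changes, i.e. candidate boundaries. This illustrates only the convolution stage (the paper additionally uses histograms and particle swarm optimization), and the image is invented.

```python
# Valid-mode 2D convolution with a Laplacian-style kernel, the
# boundary-highlighting step of the pipeline described above.

KERNEL = [[0, -1, 0],
          [-1, 4, -1],
          [0, -1, 0]]

def convolve2d(image, kernel):
    """Valid-mode 2D convolution (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + u][j + v] * kernel[u][v]
                      for u in range(kh) for v in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A flat dark region (0s) meeting a brighter region (9s): the response
# is nonzero only along their boundary.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
print(convolve2d(image, KERNEL))  # [[-9, 9], [-9, 9]]
```

Comparing such responses between pre-flood and post-flood images is what lets boundary changes be detected.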
Enviromental impact assesment for highway projectsKushal Patel
Environmental Impact Assessment (EIA) is a tool to study various impact to be occurred due to new development actions.
Transportation Project are the projects which provides ease to the movement of vehicles.
This Paper presents a case study for analysis of EIA for a transportation project. This Paper would provide a methodology which will allow transportation planers to make a cost effective coordination of environmental information and data management.
The results assess the environmental vulnerability around the road and its impact on environment by integration the merits of GIS.
Model Build ArcPy Into Your FME WorkflowsSafe Software
This presentation will delve into utilizing ArcGIS geoprocessing within FME using PythonCaller. It will show how to harness the capabilities of both tools for efficient and flexible data manipulation and conversion, using ArcPy script to call ArcGIS from within FME. Real-world examples will be provided to illustrate the benefits of this approach in areas such as raster-vector data conversion and spatial analysis.
Tips & tricks will be demonstrated for creating ArcPy geoprocessing snippets from ArcPRO, manipulating the python for appropriate use within FME Python caller, configuring environments and extension licenses, how to pass fme objects feature attributes or user parameters to be used within geoprocessing parameters, using integration transformers to read file path results and how GP result notification strings can inform the fme user of data processing progress to the transaction log.
IRJET- Land Cover Index Classification using Satellite Images with Different ...IRJET Journal
This document presents a study on land cover index classification of satellite images of the Ayeyarwaddy Delta region of Myanmar. The study uses Google Earth satellite images from 2004-2014. The images are classified into three indices: buildings, vegetation, and roads. Three image enhancement methods are applied prior to classification - V-channel enhancement, histogram equalization, and adaptive histogram equalization. K-means clustering is then used to classify the enhanced images into the three indices in CIE L*a*b* color space. The classification results of each enhancement method are evaluated and compared using mean squared error and peak signal-to-noise ratio. According to the results, V-channel enhancement provides the best classification results compared to
The document discusses three projects related to analyzing large graph datasets:
1. The CASS-MT project designs software to analyze massive interaction networks using multithreaded architectures like the Cray XMT. Algorithms include betweenness centrality and dynamic clustering coefficients.
2. The Graph500 benchmark was developed to evaluate parallel architectures for data-intensive graph computations. Reference codes were provided for OpenMP and Cray XMT.
3. The STING project develops and optimizes a dynamic graph package for Intel platforms to analyze streaming graph-structured data from sources like Facebook in real-time.
This document describes a study that uses machine learning algorithms to analyze flood data and predict flood impacts. The study collected flood data from various states in India containing information on start/end dates, duration, causes, affected districts/states, and casualties including human injuries and deaths as well as animal fatalities. Various machine learning models like decision trees, random forests, SVMs, and neural networks were trained on the data. The models' performance was evaluated based on metrics like accuracy, precision, recall, and F1-score. The results showed that some states experienced higher numbers of human/animal casualties from floods compared to others. Graphs and charts were used to analyze relationships between variables in the data and compare flood impacts like casualties and
This document summarizes a project that used NASA satellite imagery to map and monitor mangrove extent in Everglades National Park over multiple time periods. The objectives were to create a replicable methodology using Earth observations and Google Earth Engine to map changes over time. The methodology included collecting Landsat data, random sampling, image processing, classification, and accuracy assessment. The results showed changes in mangrove extent between 1995, 2005, and 2015. Future work could include more in situ data and samples to focus on ecological forecasting.
Performance Analysis of 5 MWP Grid-Connected Solar PV Power Plant Using IE...IRJET Journal
This document analyzes the performance of a 5 MW grid-connected solar PV power plant in India using data recorded over 2016. Key parameters like energy output, performance ratio, and final yield are calculated and compared to simulated results from PV Syst software. The plant's annual performance ratio was 73.02%, lower than the simulated 78.10% due mainly to a transformer failure. Monthly energy output and final yield varied with weather conditions. While most months matched simulated results within 9%, August differed by 21%. The analysis provides insights to improve plant maintenance and optimize performance.
This thesis presents research on using deep learning methods for feature extraction from satellite imagery to identify landslide pixels. The objectives are to classify land cover using machine learning algorithms like SVM and random forests in Google Earth Engine, design and evaluate a deep neural network for landslide identification, and compare performance of deep learning models in MATLAB. Results show that a neural network achieved over 98% accuracy at identifying landslide pixels. Future work proposes developing new indices for improved identification and an automatic landslide monitoring platform.
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshopICIMOD
The document provides an overview of CORDEX (Coordinated Regional Climate Downscaling Experiment) for South Asia. It discusses how regional climate models are used to provide higher resolution climate data for impact studies. It outlines the history and coordination of CORDEX, including the establishment of the Science Advisory Team. It describes the generation of regional climate projections for South Asia using multiple regional climate models driven by several global climate models. The data is archived and disseminated via an Earth System Grid Federation node in India to support regional climate change research and applications.
Climate Monitoring and Prediction using Supervised Machine LearningIRJET Journal
This document describes a study that uses supervised machine learning models to monitor and predict climate changes based on historical climate data. The methodology involves collecting climate data from various sources, preprocessing the data, and training classification and regression models. Random forest classification achieved the best accuracy of 87% for predicting precipitation type. Polynomial regression was also effective for predicting temperature variations over time. The models can help monitor climate changes and irregularities to improve preparedness and reduce impacts on sectors like agriculture. In conclusion, accurately predicting climate trends through data-driven methods and raising awareness can help balance natural weather cycles with human activities.
Using explainable machine learning to evaluate climate change projectionsZachary Labe
5 October 2023…
Atmosphere and Ocean Climate Dynamics Seminar (Presentation): Using explainable machine learning to evaluate climate change projections, Yale University, New Haven, CT. Remote Presentation.
References...
Labe, Z.M., E.A. Barnes, and J.W. Hurrell (2023). Identifying the regional emergence of climate patterns in the ARISE-SAI-1.5 simulations. Environmental Research Letters, DOI:10.1088/1748-9326/acc81a, https://iopscience.iop.org/article/10.1088/1748-9326/acc81a
Assessing the performance of random forest regression for estimating canopy h...IJECEIAES
Accurate estimation of forest canopy height is essential for monitoring forest ecosystems and assessing their carbon storage potential. This study evaluates the effectiveness of different remote sensing techniques for estimating forest canopy height in tropical dry forests. Using field data and remote sensing data from airborne lidar and polarimetric synthetic aperture radar (SAR), a random forest (RF) model was developed to estimate canopy height based on different indices. Results show that the normalize difference build-up index (NDBI) has the highest correlation with canopy height, outperforming other indices such as relative vigor index (RVI) and polarimetric vertical and horizontal variables. The RF model with NDBI as input showed a good fit and predictive ability, with low concentration of errors around 0. These findings suggest that NDBI can be a useful tool for accurately estimating forest canopy height in tropical dry forests using remote sensing techniques, providing valuable information for forest management and conservation efforts.
Similar to Automated Summarisation of Big Data, useR! 2018 (20)
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
1. IntRoduction
Automated Summarisation of Big Data
Using data from the Catlin Seaview Survey - a global coral reef
monitoring effort
Amy StringeR
1University of Queensland
UseR! July 2018
Amy StringeR (Global Change Institute) Automated Summarisation of Big Data UseR! July 2018 1 / 35
2. IntRoduction
Outline
1 IntRoduction
2 Context
The Catlin Data
Image Collection
Image Annotation
3 The Need
Efficient SummaRisation
4 New Challenges
Contextual Challenges
Data Challenges
5 Solutions
Dynamic Plotting Environments
Interactive Maps using Leaflet
RMySQL and Parameterised Rmarkdown
Child Documents
6 Future Work
3. IntRoduction
[Insert witty crowd banter]
4. Context
The Catlin Seaview Survey
Coral reef monitoring
program endeavouring to
develop a global baseline on
reef health and then monitor
the state of reefs through
resurvey efforts
5 regions around the world
so far: Australia, the
Caribbean, Southeast Asia,
the Indian Ocean, The
Pacific
Within these 5 major
regions, we have a total of
25 survey countries
(a) Bleaching at the
Maldives, 2016
(b) Bleaching at Heron
Island, 2016
6. Context The Catlin Data
Efficient Monitoring
Three main stages:
1 Collection of images
2 Annotation of images
3 Calculating proportions and visualising trends between surveys
8. Context Image Collection
The Catlin Seaview Survey - Image Collection
High definition images
collected in 2km transects
along a reef section - taken
automatically every 3 seconds
Each image is GPS located
Speed of collection increased
from the traditional 60 m² per dive
(45 min) to 2,000 m² per dive
Figure: A diver pushing the SVII scooter
during a survey of the Great Barrier Reef.
For more on collection methodology, see
[7]. © XL Catlin Seaview Survey
9. Context Image Collection
Figure: An example image from a survey of the Great Barrier Reef. Images like
this, along with the data, are available on the XL Catlin Global Reef Record [1].
11. Context Image Annotation
Neural Network for Image Annotations
Previously a time-consuming, manual task (potentially 3 decades of
work for the CSS images)
An automatic point-annotation method is now used, based on
machine learning algorithms (see [3])
Colour and texture of images are used as descriptors for label
categories
Coverage estimates are uploaded within a week of collection
12. Context Image Annotation
Figure: The same image from earlier showing the points used for annotation
14. The Need Efficient SummaRisation
The Next Stage
Fast data collection → fast annotation → bottleneck in processing
Data are stored in a MySQL database, allowing for easy integration
with R [6, 5]
Introducing Rmarkdown [2]
rmarkdown provides a solution for quick/consistent exploratory
analysis of the data in a report format
Visualisations! [8]
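The kind of coverage plot the reports build can be sketched with ggplot2 [8]; the data frame below is invented for illustration, not taken from the Catlin database:

```r
library(ggplot2)

# Hypothetical coverage summaries in the shape the reports work with:
# one percentage-cover value per year per functional group
cover <- data.frame(
  year  = rep(2012:2016, 2),
  group = rep(c("Hard coral", "Algae"), each = 5),
  pct   = c(30, 28, 25, 18, 15, 20, 22, 25, 30, 33)
)

# Trend lines of percentage coverage through the resurvey years
ggplot(cover, aes(year, pct, colour = group)) +
  geom_line() +
  labs(y = "Coverage (%)", colour = "Functional group")
```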
16. New Challenges Contextual Challenges
Contextual Challenges
Usability for non R users
Meaningful visualisations
They need to be useful for more than just the researchers; local
government and others in charge of marine protection need to get
some value
Comparisons to literature
Many label groups, and spatial scales in the dataset
18. New Challenges Data Challenges
Data Challenges - Structure
Sub-region Reef Count Transect Count Image Count
Cairns-Cooktown 13 43 87631
Coral Sea 3 32 23573
Far Northern 12 33 68367
Mackay-Capricorn 4 12 14151
Townsville-Whitsunday 4 10 5722
Total 36 130 199444
Table: A summary of the various spatial scales within just one region, the Great
Barrier Reef. This structure is consistent across all 5 regions.
19. New Challenges Data Challenges
Data Challenges - Labels
Benthic labels describe the community benthic category
Global labels describe morphological categories
Each region has 5 functional groups
Hard corals, soft corals
Algae
Other invertebrates, other
Region Benthic Labels Global Labels
GBR 27 13
Indian Ocean 49 17
Caribbean 67 16
Southeast Asia 71 17
Pacific 40 12
21. Solutions Dynamic Plotting Environments
Plotting Hiccups
Plots have been created at multiple spatial scales: reef scale,
subregion scale, transect scale
Plots will have varying sizes based on the number of
reefs/subregions/transects in the respective regional dataset
Differing label sets among regions make visualisation an exciting
challenge at each of these spatial scales
Nearly impossible to find clearly distinguishable colours at the
community benthic level
Visualisations at this level are more overwhelming than helpful
Single-survey regions need some kind of conditioning on their
temporal plot construction
22. Solutions Dynamic Plotting Environments
Dynamic Plotting
The code wrapper in the rmarkdown source script allows you to set
variables for the figure heights and widths
Figure: Plot wrapper with an example of the figure height adjustment
according to the number of plot facets. Also shown here is a boolean
variable for evaluation of the code segment.
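The deck shows this wrapper only as a screenshot; a minimal sketch with assumed object names (reef_data, reef_name are placeholders) looks roughly like:

```r
# Hypothetical data: one plot facet per reef in the regional subset
n_facets <- length(unique(reef_data$reef_name))

# Scale figure height with the number of facet rows (3 facets per row),
# with a sensible minimum so small regions still render legibly
fig_h <- max(3, 2 * ceiling(n_facets / 3))

# Boolean controlling whether the plotting chunk is evaluated at all
run_plot <- n_facets > 0
```

The chunk header can then read `{r, fig.height = fig_h, eval = run_plot}`, so each region's report sizes its own figures and skips empty plots.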
23. Solutions Dynamic Plotting Environments
Reef Scale Visualisations
Figure: An example visualisation of only 3 of the 36 GBR reefs. This plot is
created at the functional group scale. Note that the x axis is year, and the y axis
is percentage coverage over the reef in question.
24. Solutions Dynamic Plotting Environments
Reef Scale Visualisations
Figure: An example of the reef scale visualisation for the reefs only surveyed once.
Coverage here is represented in the same way as the previous plot, giving a
percentage coverage for each basic functional group.
25. Solutions Dynamic Plotting Environments
Change at the Transect Scale
Extra challenges that arise
from survey design
Visualised at the global label
level
Investigate change only over
consecutive survey years
27. Solutions Interactive Maps using Leaflet
Leaflet
Using Leaflet [4] for interactive maps allows readers to see exactly
where the surveys take place. Each transect marker represents a 2 km
survey region.
Figure: Disclaimer - this image is not so interactive
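A minimal leaflet [4] sketch of such a map; the transect names and coordinates below are invented for illustration, not the survey's real locations:

```r
library(leaflet)

# Hypothetical transect midpoints; the real coordinates come from the
# GPS-located survey images in the database
transects <- data.frame(
  name = c("Transect 1", "Transect 2"),
  lat  = c(-23.44, -23.45),
  lng  = c(151.91, 151.93)
)

# One marker per transect, with a popup naming it
leaflet(transects) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~name)
```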
29. Solutions RMySQL and Parameterised Rmarkdown
Parameterising Rmarkdown Documents and RMySQL
RMySQL [5] allows for accessing the database through RStudio,
negating the need for an external program
(a) The header when making use of
document parameters. In future, more
parameters may be added to simplify the
source script, but in the current stages
things have been kept simple.
(b) Working example accessing the
database with the document input
parameters. The use of RMySQL allows
for connection to the database and
extraction of data within the Rmarkdown
source script.
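Roughly what those two screenshots show, reconstructed as a sketch; the parameter name, table name, and connection details are placeholders, not the project's real setup:

```r
# --- YAML header of the .Rmd source (sketched as a comment) ---
# params:
#   region: "GBR"

library(RMySQL)

# Placeholder connection details, not the project's real credentials
con <- dbConnect(MySQL(), dbname = "reef_record",
                 host = "localhost", user = "reader", password = "...")

# Pull the coverage rows for the region named in the document parameters
cover <- dbGetQuery(con,
  sprintf("SELECT * FROM coverage WHERE region = '%s'", params$region))

dbDisconnect(con)
```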
30. Solutions RMySQL and Parameterised Rmarkdown
How’s the SeRenity?
Figure: The only bit of code a user needs to deal with to generate up to 25
reports.
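The call the figure refers to is presumably along these lines; the file name and region vector are assumptions for illustration:

```r
library(rmarkdown)

# Hypothetical vector of survey countries; one report per entry
regions <- c("Australia", "Maldives", "Philippines")

# Render the parameterised source once per region, writing a
# separately named report each time
for (r in regions) {
  render("report_source.Rmd",
         params      = list(region = r),
         output_file = paste0("report_", r, ".pdf"))
}
```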
32. Solutions Child Documents
Child Documents for Textual Components
Introductions and discussions will need to be different across the
regions
Using the parameters of the source we can import a specific
introduction file for each desired region
Child documents allow for easy editing of
introductions/methods/discussions without needing to open the main
source document, which is overwhelming and complicated
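A sketch of how the parameter can select the child file; the naming scheme here is an assumption, not the project's actual layout:

```r
# Pick the region-specific introduction using the document parameter
intro_file <- paste0("intro_", params$region, ".Rmd")
```

A chunk with the header `{r, child = intro_file}` then splices that file into the report, so each region gets its own introduction without touching the main source.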
33. Future Work
Future Work
Currently this isn’t a fully automated process
Talk of linking these reports to a website
Extra parameterisation of the document - perhaps a structure change
based on the individual generating the report (e.g. management,
research etc)
34. Appendix For Further Reading
References I
The Ocean Agency. Global Reef Record. 2017. url:
http://globalreefrecord.org/data.
JJ Allaire et al. rmarkdown: Dynamic Documents for R. R package
version 1.6. 2017. url:
https://CRAN.R-project.org/package=rmarkdown.
O. Beijbom et al. “Towards Automated Annotation of Benthic
Survey Images: Variability of Human Experts and Operational Modes
of Automation”. In: (2015).
Joe Cheng, Bhaskar Karambelkar, and Yihui Xie. leaflet: Create
Interactive Web Maps with the JavaScript ’Leaflet’ Library. R
package version 1.1.0. 2017. url:
https://CRAN.R-project.org/package=leaflet.
35. Appendix For Further Reading
References II
Jeroen Ooms et al. RMySQL: Database Interface and ’MySQL’
Driver for R. R package version 0.10.13. 2017. url:
https://CRAN.R-project.org/package=RMySQL.
R Core Team. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing. Vienna,
Austria, 2013. url: http://www.R-project.org/.
Manuel González-Rivero et al. “Scaling up ecological measurements
of coral reefs using semi-automated field image collection and
analysis”. In: Remote Sensing 8 (2016). url:
http://www.mdpi.com/2072-4292/8/1/30.
Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis.
Springer-Verlag New York, 2009. isbn: 978-0-387-98140-6. url:
http://ggplot2.org.