Slide show for the webinar on "Spatial Data Science with R" organized for the GeoDevelopers.org community. The video of the webinar and all the related materials including source code and sample data can be downloaded from this link: http://amsantac.co/blog/en/2016/08/07/spatial-data-science-r.html
In this webinar I talked about Data Science in the context of its application to spatial data and explained how we can use the R language for the analysis of geographic information within the different stages of a data science workflow, from the import and processing of spatial data to visualization and publication of results.
We show how deep learning can be effectively applied to remote sensing. Many problems we faced, solutions we have had discovered were highlighted too. Remotely sensed data, unlike other vision tasks are very challenging and posses extra difficulties. Objects are very small compared to the image size, and even small pixel sizes of 8*10 pixel can contain huge amount of informations.
To the best of our knowledge there is no automated or simi-automated tool that uses deep learning to detect features from satellite imagery.
Today very high resolution DEM from satellite image data with resolution of about one meter allows to depict very detailed surface changes.
High resolution DEM increase accurate satellite image geometry and adding DGPS ground control points increases x.y.z accuracy.
Wrong positioning of objects or bad parameters calculation often result in bad image geometry.
From along track stereo pairs of VHR satellite optical data it’s possible to generate an automatic DEM.
Applications :
Ortho-rectification of satellite images, 3D display.
Creation of accurate topographic reference, relief maps.
Topographic profiles and contour generation.
Surface analysis.
Calculations of slope, orientation and shading.
Calculations of volume and elevation
Extraction of terrain and morphometric parameters.
Geomorphology and structural analysis.
Geological quantifications (dips, lithological thicknesses, faults and folds of geometry, etc.).
3D Reference map of resources extraction zones (quarries, open-pits).
Calculation of hydrographic networks and watershed basin.
Determination of hypsometric curves, knickpoints, etc.
Characterization of eroded areas.
Floods simulation, risks evaluation.
Volume calculation for restraints of dams.
We show how deep learning can be effectively applied to remote sensing. Many problems we faced, solutions we have had discovered were highlighted too. Remotely sensed data, unlike other vision tasks are very challenging and posses extra difficulties. Objects are very small compared to the image size, and even small pixel sizes of 8*10 pixel can contain huge amount of informations.
To the best of our knowledge there is no automated or simi-automated tool that uses deep learning to detect features from satellite imagery.
Today very high resolution DEM from satellite image data with resolution of about one meter allows to depict very detailed surface changes.
High resolution DEM increase accurate satellite image geometry and adding DGPS ground control points increases x.y.z accuracy.
Wrong positioning of objects or bad parameters calculation often result in bad image geometry.
From along track stereo pairs of VHR satellite optical data it’s possible to generate an automatic DEM.
Applications :
Ortho-rectification of satellite images, 3D display.
Creation of accurate topographic reference, relief maps.
Topographic profiles and contour generation.
Surface analysis.
Calculations of slope, orientation and shading.
Calculations of volume and elevation
Extraction of terrain and morphometric parameters.
Geomorphology and structural analysis.
Geological quantifications (dips, lithological thicknesses, faults and folds of geometry, etc.).
3D Reference map of resources extraction zones (quarries, open-pits).
Calculation of hydrographic networks and watershed basin.
Determination of hypsometric curves, knickpoints, etc.
Characterization of eroded areas.
Floods simulation, risks evaluation.
Volume calculation for restraints of dams.
In this webinar in partnership with Focal Point, you learn about the future of Indoor Mapping & the role that next-generation GPS technology can play in a range of indoor use cases. You can watch the recorded webinar at: https://go.carto.com/webinars/focalpoint-indoor-mapping
A spatial database, or geodatabase is a database that is optimized to store and query data
that represents objects defined in a geometric space. Most spatial databases allow representing simple geometric objects such as points, lines and polygons.
Role of GIS in Health Care Management by Dr. Dipti MukherjiPriyanka_vshukla
Presentation on Role of GIS in Health Care Management by Dr. Dipti Mukherji during Seminar on Spatial Dimensions on Health Care - Use of GIS in Health Studies Organised by CEHAT and University of Mumbai on 24th Sep 2010
The following presentation was delivered by Robert Morrison, Principal Consultant at Esri Ireland, at the 2019 NICS ICT Conference in October 2019.
The presentation focuses on taking a geographic approach to machine learning to help you "see what other's can't".
Imagery and remotely sensed data is a valuable resource for many organisations who have made substantial investment obtaining the data. The field of Machine Learning is both broad and deep and is constantly evolving. Using ArcGIS and Machine Learning allows organisations to derive valuable new content.
ArcGIS is an open, interoperable platform that allows for the integration of complementary methods and techniques that empower ArcGIS users to solve complex, real-world problems in a fundamentally spatial way.
Learn how by combining powerful built-in Image analysis tools with any machine learning package users can benefit from the spatial validation, geo-enrichment and visualisation. See how this Machine Learning is being applied in real world use-cases from marine farming and crime analysis to agriculture and sustainability.
hyperspectral remote sensing and its geological applicationsabhijeet_banerjee
this is an introductory presentation on hyperspectral remote sensing, which essential deals with the distinguishing features, imaging spectrometers and its types, and some of the geological applications of hyperspectral remote sensing.
Spatial data is comprised of objects in multi-dimensional space.
Storing spatial data in a standard database would require excessive amounts of space.Queries to retrieve and analyze spatial data from a standard database would be long and cumbersome leaving a lot of room for error.
Spatial databases provide much more efficient storage, retrieval, and analysis of spatial data.
TYBSC IT PGIS Unit I Chapter I- Introduction to Geographic Information SystemsArti Parab Academics
A Gentle Introduction to GIS The nature of GIS: Some fundamental observations, Defining GIS, GISystems, GIScience and GIApplications, Spatial data and Geoinformation. The real world and representations of it: Models and modelling, Maps, Databases, Spatial databases and spatial analysis
7 Reasons Why CPG Marketers Are Turning To Location AnalyticsCARTO
In this webinar, you will discover 7 reasons why CPG Marketers & Analysts are evolving their analytics by turning to location to drive category growth.
The presentation was given by Mr. Bas Kempen, ISRIC, during the GSOC Mapping Global Training hosted by ISRIC - World Soil Information, 6 - 23 June 2017, Wageningen (The Netherlands).
Big Data and Geospatial with HPCC SystemsHPCC Systems
This presentation covers one topic that we have mastered after several years : Geospatial.
We will reveal how we deal with very specific spatial challenges in our day to day use cases :
• Answer questions combining the best of BigData and geospatial analysis.
• Ingestion and use of raster and vector data with our Massive Parallel Processing platform (Thor).
• Store and query spatial information with sub-second queries, using our data refinery (Roxie)
And much more under the umbrella of LexisNexis HPCC Systems (High Performance Computing Cluster), an open source platform for Big Data processing and analytics.
In this webinar in partnership with Focal Point, you learn about the future of Indoor Mapping & the role that next-generation GPS technology can play in a range of indoor use cases. You can watch the recorded webinar at: https://go.carto.com/webinars/focalpoint-indoor-mapping
A spatial database, or geodatabase is a database that is optimized to store and query data
that represents objects defined in a geometric space. Most spatial databases allow representing simple geometric objects such as points, lines and polygons.
Role of GIS in Health Care Management by Dr. Dipti MukherjiPriyanka_vshukla
Presentation on Role of GIS in Health Care Management by Dr. Dipti Mukherji during Seminar on Spatial Dimensions on Health Care - Use of GIS in Health Studies Organised by CEHAT and University of Mumbai on 24th Sep 2010
The following presentation was delivered by Robert Morrison, Principal Consultant at Esri Ireland, at the 2019 NICS ICT Conference in October 2019.
The presentation focuses on taking a geographic approach to machine learning to help you "see what other's can't".
Imagery and remotely sensed data is a valuable resource for many organisations who have made substantial investment obtaining the data. The field of Machine Learning is both broad and deep and is constantly evolving. Using ArcGIS and Machine Learning allows organisations to derive valuable new content.
ArcGIS is an open, interoperable platform that allows for the integration of complementary methods and techniques that empower ArcGIS users to solve complex, real-world problems in a fundamentally spatial way.
Learn how by combining powerful built-in Image analysis tools with any machine learning package users can benefit from the spatial validation, geo-enrichment and visualisation. See how this Machine Learning is being applied in real world use-cases from marine farming and crime analysis to agriculture and sustainability.
hyperspectral remote sensing and its geological applicationsabhijeet_banerjee
this is an introductory presentation on hyperspectral remote sensing, which essential deals with the distinguishing features, imaging spectrometers and its types, and some of the geological applications of hyperspectral remote sensing.
Spatial data is comprised of objects in multi-dimensional space.
Storing spatial data in a standard database would require excessive amounts of space.Queries to retrieve and analyze spatial data from a standard database would be long and cumbersome leaving a lot of room for error.
Spatial databases provide much more efficient storage, retrieval, and analysis of spatial data.
TYBSC IT PGIS Unit I Chapter I- Introduction to Geographic Information SystemsArti Parab Academics
A Gentle Introduction to GIS The nature of GIS: Some fundamental observations, Defining GIS, GISystems, GIScience and GIApplications, Spatial data and Geoinformation. The real world and representations of it: Models and modelling, Maps, Databases, Spatial databases and spatial analysis
7 Reasons Why CPG Marketers Are Turning To Location AnalyticsCARTO
In this webinar, you will discover 7 reasons why CPG Marketers & Analysts are evolving their analytics by turning to location to drive category growth.
The presentation was given by Mr. Bas Kempen, ISRIC, during the GSOC Mapping Global Training hosted by ISRIC - World Soil Information, 6 - 23 June 2017, Wageningen (The Netherlands).
Big Data and Geospatial with HPCC SystemsHPCC Systems
This presentation covers one topic that we have mastered after several years : Geospatial.
We will reveal how we deal with very specific spatial challenges in our day to day use cases :
• Answer questions combining the best of BigData and geospatial analysis.
• Ingestion and use of raster and vector data with our Massive Parallel Processing platform (Thor).
• Store and query spatial information with sub-second queries, using our data refinery (Roxie)
And much more under the umbrella of LexisNexis HPCC Systems (High Performance Computing Cluster), an open source platform for Big Data processing and analytics.
Geospatial data appears to be simple right up until the part when it becomes intractable. There are many gotcha moments with geospatial data in spark and we will break those down in our talk. Users who are new to geospatial analysis in spark will find this portion useful as projections, geometry types, indices, and geometry storage can cause issues.
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
This talk demonstrates some of the benefits of using R to visualize spatial data efficiently and clearly.
It was originally presented by Guy Lansley (UCL and the Consumer Data Research Centre) to the GIS for Social Data and Crisis Mapping Workshop at the University of Kent.
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...Geoffrey Fox
Advances in high-performance/parallel computing in the 1980's and 90's was spurred by the development of quality high-performance libraries, e.g., SCALAPACK, as well as by well-established benchmarks, such as Linpack.
Similar efforts to develop libraries for high-performance data analytics are underway. In this talk we motivate that such benchmarks should be motivated by frequent patterns encountered in high-performance analytics, which we call Ogres.
Based upon earlier work, we propose that doing so will enable adequate coverage of the "Apache" bigdata stack as well as most common application requirements, whilst building upon parallel computing experience.
Given the spectrum of analytic requirements and applications, there are multiple "facets" that need to be covered, and thus we propose an initial set of benchmarks - by no means currently complete - that covers these characteristics.
We hope this will encourage debate
RasterFrames: Enabling Global-Scale Geospatial Machine LearningAstraea, Inc.
RasterFrames™, a proposed LocationTech project, brings the power of Spark SQL and Spark ML to the analysis of global-scale geospatial-temporal raster data. Employing the rich geospatial primitives of LocationTech GeoTrellis and GeoMesa, RasterFrames provides scientists, data scientists and software developers with a unified data and compute model for building image processing pipelines for ETL, data-product creation, statistical analysis, supervised & unsupervised machine learning, and deep learning. Data scientists particularly benefit from the DataFrame-centric entrypoint into big data geospatial analytics.
This talk will introduce RasterFrames, explaining the need it fulfills, the capabilities it provides, and context for determining if RasterFrames is right for the problems you're trying to solve.
By Simeon Fitch
Exploring Abandoned GIS Research to Augment Applied Geography EducationMichael DeMers
Applied geography has enjoyed a resurgence since the increased availabilty of geospatial software and the advancement of an ever-increasing sophistication of these analytical tools designed to solve complex geospatial problems. These advancements have quickly been translated into coursework at colleges and universities – often adopted wholesale into complete applied geography programs throughout academia. One unintended consequence of this adoption is that much of the conceptual content responsible for the development of these tools is not covered in the applied geography coursework. In many cases the conceptual frameworks were chosen more out of expediency rather than geographical foundations, thus leaving the applied geography student with the misconception that the fundamental geographic underpinnings upon which the software is based, are thoroughly understood and extensively tested. A direct result of this is that students in applied geography programs often employ the tools with little or no understanding of their limitations for modeling real geographic processes. I propose that one aspect of an applied geography curriculum must include the study of the underlying principles upon which the software is based, and perhaps more importantly, the study of concepts that were abandoned in the early days of tool development. While this is obvious for programs that emphasize the more theoretical aspects of geography, I argue that it is equally important for those who use the tools so they are aware of the fundamental limitations of the results derived from analysis.
Giving MongoDB a Way to Play with the GIS CommunityMongoDB
The Geographic Information System (GIS), industry is booming, especially with the continued reliance on online maps and the rise of location-aware mobile devices. GIS tech can be one of the key players in the mobile internet, big data, and the internet of things, and is an essential tool for the next generation of the global IT industry.
Yet, the GIS community is not prepared. With all the data available, GIS experts lack an off-the-shelf solutions to manage the growing volume of spatial data. Relational spatial databases (RSDB) were the leader in this field for decades, but RSDBs have failed to innovate to handle massive volumes of data coming in at high velocity.
Fortunately, MongoDB a useful tool for this challenge, but needs some tooling to create a connector to the GIS tech ecosystem. In order to bridge the gap, we built a pipeline to comply with the architecture of the Geospatial Data Abstraction Library (GDAL), so that MongoDB can work with most of popular GIS tools such as OpenLayers, Mapserver, GeoServer, QGIS, ArcGIS and others with ease. In this talk, I'll go through this pipeline tool and showcase some examples of how you can use this in your next application.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
1. 1
Intro to Spatial Data Science
with R
Alí Santacruz
amsantac.co
JULY 2016
2. 2
About me
• Expert in geomatics with a background in environmental sciences
• R geek
• PhD candidate in Geography
• Interested in Spatial Data Science
• Author of several R packages (available on CRAN)
3. 3
Purpose of this talk
• Discuss what Spatial Data Science is
• Give an introductory explanation about how to conduct Spatial Data
Science with R
4. 4
What is Spatial Data Science?
Spatial Data Scientist (n.):
Statistician GIS/RS expertGIS developer Software
engineer
Spatial Data
Scientist
Spatial Data Science
Data Science
Spatial
Person that is better in spatial data analysis than a GIS developer and
better in software engineering than a GIS/RS expert
5. 5
Spatial Data Science
All they are combined for data
analysis in order to …
Support a better decision
making
"The key word in data science is not data; it is science"
Jeff Leek. Data Science Specialization. Coursera.
7. 7
Hacking skills
• Programming languages: Python and R (and others)
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
8. 8
Why should we use R?
• Free and open-source
• A large and comprehensive set of packages (> 8600)
• Data access
• Data cleaning
• Analysis
• Visualization and report generation
• Excellent development environments – RStudio IDE
• An active and friendly developers community
• A huge users community: > 2 million
9. 9
Why R for spatial analysis
• 160+ packages in CRAN Task View: Analysis of Spatial Data
• Classes for spatial (and spatio-temporal) data
• Spatial data import/export
• Exploratory spatial data analysis
• Support for vector and raster operations
• Spatial statistics
• Data visualization through static and dynamic (web) graphics
• Integration with GIS software
• Easy integration with techniques from non-spatial packages
10. 10
R classes for spatial data
• Before 2003:
• Several packages with different assumptions on how spatial data was structured
• From 2003:
• ‘sp’ package: extends R classes and methods for spatial data (vector and raster)
• From 2010:
• ‘raster’ package: deals with raster files stored in disk that are too large to be
loaded on memory (RAM)
11. 11
R classes for spatial data
SpatialPointsDataFrame SpatialLinesDataFrame SpatialPolygonsDataFrame
SpatialPixelsDataFrame
SpatialGridDataFrame
sp package
RasterLayer
RasterStack
RasterBrick
raster package
(recommended)
13. 13
MODELAR
los datos
MODEL
the data
EXPLORAR
los datos
EXPLORE
the data
PREPARAR
los datos
PREPARE
the data
• Is this A or B or C? :: classification
• Is this weird? :: anomaly detection
• How much/how many? :: regression
• How is it organized? :: clustering
• How will it change? :: prediction
"The key word in data science is not data; it is science"
Jeff Leek. Data Science Specialization. Coursera.
OBTENER
los datos
GET
the data
Domain expertise
PLANTEAR la
pregunta correcta
PLANTEAR la
pregunta correcta
ASK the right
question
COMUNICAR
los resultados
COMMUNICATE
the results
14. 14
• Import vector layers: rgdal, raster packages
• Import raster layers: raster package
• Get geocoded data from APIs: twitteR package, see example
• Download satellite images/geographic data: raster, modis, MODISTools packages
For this slide and following ones see code
and examples in in this webpage
MODELAR
los datos
MODEL
the data
EXPLORAR
los datos
EXPLORE
the data
PREPARAR
los datos
PREPARE
the data
GET
the data
PLANTEAR la
pregunta correcta
PLANTEAR la
pregunta correcta
ASK the right
question
COMUNICAR
los resultados
COMMUNICATE
the results
15. 15
• Data cleaning, subset, etc.
• Manipulate data with “verbs” from dplyr and other Hadley-verse packages
• Spatial subset (sp, raster packages)
• Vector operations:
• Operations on the attribute table (sp package)
• Overlay: union, intersection, clip, extract values from raster data using
points/polygons (raster, rgeos packages)
• Dissolve (sp, rgeos packages), buffer (rgeos package)
• Rasterize vector data (raster package)
• Raster operations:
• Map algebra, spatial filters, resampling, … (raster package)
• Vectorize raster data (rgdal, raster packages)
For slides 14 - 18 see code and examples
in this webpage
MODELAR
los datos
MODEL
the data
EXPLORAR
los datos
EXPLORE
the data
PREPARE
the data
OBTENER
los datos
GET
the data
PLANTEAR la
pregunta correcta
PLANTEAR la
pregunta correcta
ASK the right
question
COMUNICAR
los resultados
COMMUNICATE
the results
16. 16
• Descriptive statistics: central tendency and spread measures
• Exploratory graphics (2D, 3D): scatter plot, box plot, histogram, …
• Spatial autocorrelation:
• Global spatial autocorrelation statistics: Moran’s I, Geary’s C, Getis and Ord’s
G(d) (spdep package)
• Local spatial autocorrelation statistics: Moran’s Ii, Getis and Ord’s Gi y Gi*(d)
(spdep package)
MODELAR
los datos
MODEL
the data
EXPLORE
the data
PREPARAR
los datos
PREPARE
the data
OBTENER
los datos
GET
the data
PLANTEAR la
pregunta correcta
PLANTEAR la
pregunta correcta
ASK the right
question
COMUNICAR
los resultados
COMMUNICATE
the results
For slides 14 - 18 see code and examples
in this webpage
17. 17
• Regression:
• Spatial autoregressive models (spdep package)
• Geographically weighted regression (spgwr package)
• Classification (Machine Learning):
• Supervised: RandomForests, SVM, boosting, … (caret package)
• Non-supervised: k-means clustering (stats package)
• Spatial statistics:
• Geostatistics (gstat, geoR, geospt packages and others)
• Spatial point patterns (spatstat package)
MODEL
the data
EXPLORAR
los datos
EXPLORE
the data
PREPARAR
los datos
PREPARE
the data
OBTENER
los datos
GET
the data
PLANTEAR la
pregunta correcta
PLANTEAR la
pregunta correcta
ASK the right
question
COMUNICAR
los resultados
COMMUNICATE
the results
For slides 14 - 18 see code and examples
in this webpage
18. 18
• Static or interactive maps: tmap, leaflet, mapview packages
• Interactive graphics, web apps and dashboards:
• plotly (example), rcharts, googleVis (example) packages
• shiny, see example
• flexdashboard, see example
MODELAR
los datos
MODEL
the data
EXPLORAR
los datos
EXPLORE
the data
PREPARAR
los datos
PREPARE
the data
OBTENER
los datos
GET
the data
PLANTEAR la
pregunta correcta
PLANTEAR la
pregunta correcta
ASK the right
question
COMMUNICATE
the results
For slides 14 - 18 see code and examples
in this webpage
19. 19
Don’t forget: Reproducibility!
• R code and output for examples shown in this webinar (slides 17-21) can
be reproduced with this .Rmd document using RMarkdown
• See this example about reproducible spatial analysis using interactive
notebooks
• Learn more about reproducible geoscientific research
20. 20
Integrating R with GIS software
• QGIS: see example in this post
• ArcGIS: arcgisbinding package, see example in this post
• GRASS GIS: version 6, spgrass6 package; version 7, rgrass7 package
• gvSIG: more info in this post
• SAGA: RSAGA package
• GME (Geospatial Modelling Environment): more info in this webpage
21. 21
References / Online resources
• Bivand, R., Pebesma, E., Gómez-Rubio, V. 2013. Applied Spatial Data
Analysis with R. New York: Springer. 2nd ed.
• R-SIG-Geo mailing list
• CRAN Task View: Analysis of Spatial Data
• Facebook groups: GIS with R, R project en Español
• Google+ groups: Statistics and R, R Programming for Data Analysis
• My blog: amsantac.co/blog.html
22. If you have any question feel free to contact me:
amsantac.co/contact.html
Thanks!