In this research, we propose a MapReduce algorithm for creating contiguity-based spatial weights. This algorithm provides the ability to create spatial weights from very large spatial datasets efficiently by using computing resources that are organized in the Hadoop framework. It works in the paradigm of MapReduce: mappers are distributed across computing clusters to find contiguous neighbors in parallel, then reducers collect the results and generate the weights matrix. To test the performance of this algorithm, we design an experiment to create a contiguity-based weights matrix from artificial spatial data with up to 190 million polygons using Amazon's Hadoop framework, Elastic MapReduce. The experiment demonstrates the scalability of this parallel algorithm, which utilizes large computing clusters to solve the problem of creating contiguity weights on big data.
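The abstract doesn't include code, so here is a minimal single-process sketch of the rook-contiguity map/reduce idea it describes (not the authors' implementation): the mapper emits each polygon boundary edge under a canonical key, the reducer pairs up polygons that share an edge, and the dictionary stands in for Hadoop's shuffle. Function names and toy polygons are illustrative only.

from collections import defaultdict

def map_edges(polygon_id, vertices):
    # emit each boundary edge with a canonical (sorted) key;
    # rook-contiguous polygons share at least one full edge
    for a, b in zip(vertices, vertices[1:] + vertices[:1]):
        yield (min(a, b), max(a, b)), polygon_id

def reduce_edge(polygon_ids):
    # every pair of polygons seen on the same edge is a neighbour pair
    ids = sorted(set(polygon_ids))
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            yield ids[i], ids[j]

# toy run: two unit squares sharing the edge x=1
polygons = {
    'A': [(0, 0), (1, 0), (1, 1), (0, 1)],
    'B': [(1, 0), (2, 0), (2, 1), (1, 1)],
}
shuffle = defaultdict(list)          # stands in for the MapReduce shuffle
for pid, verts in polygons.items():
    for key, value in map_edges(pid, verts):
        shuffle[key].append(value)
neighbours = set()
for polygon_ids in shuffle.values():
    neighbours.update(reduce_edge(polygon_ids))
print(neighbours)                    # {('A', 'B')}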
Enabling Access to Big Geospatial Data with LocationTech and Apache projects - Rob Emanuele
LocationPowers OGC BigGeoData 2016
This presentation will discuss tools in the open source landscape that are used to handle big geospatial data. In particular, we will focus on how Apache frameworks such as Spark and Accumulo are "geospatially enabled" by four projects: GeoTrellis, GeoWave, GeoMesa, and GeoJinni. These four projects all participate in LocationTech, a working group under the Eclipse Foundation. We will discuss how each of these LocationTech technologies implements spatial indexing (e.g. by using space-filling curves) to provide quick access to data, along with other themes common to the four projects. Attendees should walk away from this presentation understanding important parts of the Apache big data ecosystem, the set of cutting-edge LocationTech projects that enable those Apache projects to handle geospatial data, and some solutions to common problems when dealing with large geospatial data.
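None of the four projects' code is shown here, but the space-filling-curve idea is easy to sketch: a Z-order (Morton) code interleaves the bits of grid-cell x and y coordinates so that points close in 2D tend to get nearby 1D keys, which is what lets a key-value store like Accumulo answer a spatial query with a few range scans. A minimal illustrative sketch, not taken from any of these projects:

def morton_key(x, y, bits=16):
    # interleave the bits of x and y: ... y1 x1 y0 x0
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# nearby cells get nearby keys, so a bounding box becomes a few key ranges
print(morton_key(3, 5))   # 39
print(morton_key(4, 5))   # 50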
GeoMesa presentation from the LocationTech Tour - DC - November 14th, 2013. Presented by Anthony Fox (@algoriffic) of CCRi.
GeoMesa is an open source project providing spatio-temporal indexing, querying, and visualization capabilities to Accumulo. Learn more at http://geomesa.github.io/
Processing Geospatial at Scale at LocationTech - Rob Emanuele
These slides were for a talk at EclipseCon Europe 2015 about LocationTech projects that provide capabilities for processing geospatial data at scale. Video of the talk should be released at some point through the Eclipse Foundation.
RasterFrames: Enabling Global-Scale Geospatial Machine Learning - Astraea, Inc.
RasterFrames™, a proposed LocationTech project, brings the power of Spark SQL and Spark ML to the analysis of global-scale geospatial-temporal raster data. Employing the rich geospatial primitives of LocationTech GeoTrellis and GeoMesa, RasterFrames provides scientists, data scientists and software developers with a unified data and compute model for building image processing pipelines for ETL, data-product creation, statistical analysis, supervised & unsupervised machine learning, and deep learning. Data scientists particularly benefit from the DataFrame-centric entry point into big data geospatial analytics.
This talk will introduce RasterFrames, explaining the need it fulfills, the capabilities it provides, and context for determining if RasterFrames is right for the problems you're trying to solve.
By Simeon Fitch
Using Deep Learning to Derive 3D Cities from Satellite Imagery - Astraea, Inc.
Detection and reconstruction of 3D buildings in urban areas has been a hot topic of research due to its many applications, including 3D population density studies, emergency planning, and building value estimation. Standard approaches to extracting building footprints and measuring building heights rely on either aerial or spaceborne point cloud data, which in many areas is unavailable. In contrast, high resolution satellite imagery has become more readily available in recent years, and could provide enough information to estimate a building's height. Recent successes of deep learning on semantic segmentation have shown that convolutional neural networks can be effective tools for extracting 2D building footprints. Using a digital surface model derived using FOSS and LiDAR data as ground truth, this study goes a step further by employing state-of-the-art deep learning architectures such as U-Net to infer both building footprints and estimated building heights in one pass from a single satellite image. This application of open deep learning frameworks can bring the benefits of 3D cities to a larger portion of the world.
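The talk's exact network isn't given in this description, but the general shape of such a model is easy to sketch: a small U-Net-style encoder-decoder with two heads, a sigmoid mask for the footprint and a linear per-pixel regression for height. This assumes TensorFlow/Keras, and every layer size below is illustrative rather than taken from the study.

import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(size=256, channels=3):
    inp = layers.Input((size, size, channels))
    # encoder
    c1 = layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding='same', activation='relu')(p1)
    p2 = layers.MaxPooling2D()(c2)
    # bottleneck
    b = layers.Conv2D(64, 3, padding='same', activation='relu')(p2)
    # decoder with skip connections (the "U" shape)
    u2 = layers.concatenate([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(32, 3, padding='same', activation='relu')(u2)
    u1 = layers.concatenate([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(16, 3, padding='same', activation='relu')(u1)
    # two heads: footprint probability mask, and per-pixel height
    footprint = layers.Conv2D(1, 1, activation='sigmoid', name='footprint')(c4)
    height = layers.Conv2D(1, 1, activation='linear', name='height')(c4)
    return tf.keras.Model(inp, [footprint, height])

model = tiny_unet()
model.compile(optimizer='adam',
              loss={'footprint': 'binary_crossentropy', 'height': 'mse'})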
ePOM - Intro to Ocean Data Science - Raster and Vector Data Formats - Giuseppe Masetti
E-learning Python for Ocean Mapping (ePOM) project.
Complementary slides to the Raster and Vector Data Formats module (part of the Introduction to Ocean Data Science training).
More details at https://www.hydroffice.org/epom
Big Data and Geospatial with HPCC Systems - HPCC Systems
This presentation covers one topic that we have mastered after several years: geospatial.
We will reveal how we deal with very specific spatial challenges in our day-to-day use cases:
• Answering questions combining the best of Big Data and geospatial analysis.
• Ingesting and using raster and vector data with our Massively Parallel Processing platform (Thor).
• Storing and querying spatial information with sub-second response times, using our data refinery (Roxie).
And much more under the umbrella of LexisNexis HPCC Systems (High Performance Computing Cluster), an open source platform for Big Data processing and analytics.
Building maps for apps in the cloud - a SoftLayer Use Case - Timan Rebel
Together with SoftLayer, Snowciety gave a presentation at GOTO Amsterdam 2013 about building custom maps using OpenStreetMap and SRTM data, PostGIS/PostgreSQL as a database, Mapnik as a renderer, TileStache and Apache as the HTTP servers, and Leaflet as the JavaScript client.
Presentation at workshop: Reducing the costs of GHG estimates in agriculture to inform low emissions development
November 10-12, 2014
Sponsored by the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS) and the Food and Agriculture Organization of the United Nations (FAO)
In this paper we propose Regularised Cross-Modal Hashing (RCMH), a new cross-modal hashing model that projects annotation and visual feature descriptors into a common Hamming space. RCMH optimises the hashcode similarity of related data-points in the annotation modality using an iterative three-step hashing algorithm: in the first step, each training image is assigned a K-bit hashcode based on hyperplanes learnt at the previous iteration; in the second step, the binary bits are smoothed by a formulation of graph regularisation so that similar data-points have similar bits; in the third step, a set of binary classifiers is trained to predict the regularised bits with maximum margin. Visual descriptors are projected into the annotation Hamming space by a set of binary classifiers learnt using the bits of the corresponding annotations as labels. RCMH is shown to consistently improve retrieval effectiveness over state-of-the-art baselines.
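RCMH itself isn't reproduced here; the sketch below only illustrates the common-Hamming-space retrieval setup the abstract describes, using random projection hyperplanes where RCMH would use learnt ones. All sizes and variable names are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d_text, d_img, K = 50, 128, 32
W_text = rng.normal(size=(d_text, K))   # hyperplanes for the annotation modality
W_img = rng.normal(size=(d_img, K))     # hyperplanes for the visual modality

def encode(X, W):
    # project descriptors onto K hyperplanes and binarise to K-bit codes
    return (X @ W >= 0).astype(np.uint8)

queries = encode(rng.normal(size=(5, d_text)), W_text)     # annotation queries
database = encode(rng.normal(size=(1000, d_img)), W_img)   # image database
# rank database items by Hamming distance to the first query
dists = np.count_nonzero(database != queries[0], axis=1)
print(np.argsort(dists)[:10])   # indices of the ten nearest codes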
Amin Tayyebi: Big Data and Land Use Change Science - knowdiff
Ph.D.
University of California-Riverside, Center for Conservation Biology
(1) Time: Tuesday, August 25, 2015, 15:30-16:30
(1) Location: Amirkabir University of Technology, Department of Civil and Environmental Engineering
(2) Time: Wednesday, August 26, 2015, 14:00-16:00
(2) Location: Department of Surveying Engineering, University of Tehran, N. Kargar St.
A variety of algorithms may be applied depending on the nature of the earth science exploration. Some algorithms may perform significantly better than others for particular objectives. For example, convolutional neural networks (CNNs) are good at interpreting images, and artificial neural networks (ANNs) perform well in soil classification[4] but are more computationally expensive to train than support-vector machines (SVMs). The application of machine learning has become popular in recent decades, as the development of other technologies, such as unmanned aerial vehicles (UAVs),[5] ultra-high-resolution remote sensing technology and high-performance computing units,[6] has led to the availability of large, high-quality datasets and more advanced algorithms.
The basic intention of this presentation is to help beginners in GIS understand what GIS is. It is a simple, introductory presentation about GIS. I hope anyone who reads it finds it useful.
This content shows geospatial data sources for Japan and global data, coordinate reference systems, and how to create a map of population density (vector analysis: dissolving vectors, joining tables, and calculating area and population density).
Introduction to GIS - Basic spatial concepts - Coordinate systems - GIS and information systems - Definitions - History of GIS - Components of a GIS - Hardware, software, data, people, methods - Proprietary and open source software - Types of data - Spatial and attribute data - Types of attributes - Scales/levels of measurement.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... - 2023240532
Quantitative Data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph contains no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
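As a rough single-machine sketch of the levelwise idea (not the report's own implementation), assuming networkx is available and the graph has no dead ends: process the strongly connected components in topological order, so each component's ranks can be finalised using only already-final upstream ranks.

import networkx as nx

def levelwise_pagerank(G, d=0.85, tol=1e-10):
    # precondition (as in the abstract): G has no dead ends
    n = G.number_of_nodes()
    ranks = {v: 1.0 / n for v in G}
    C = nx.condensation(G)            # DAG of strongly connected components
    for scc in nx.topological_sort(C):
        comp = C.nodes[scc]['members']
        while True:                   # iterate only within this component;
            err = 0.0                 # upstream ranks are already final
            for v in comp:
                r = (1 - d) / n + d * sum(
                    ranks[u] / G.out_degree(u) for u in G.predecessors(v))
                err += abs(r - ranks[v])
                ranks[v] = r
            if err < tol:
                break
    return ranks

# two components: {1, 2} feeds into {3, 4}; no dead ends
G = nx.DiGraph([(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)])
print(levelwise_pagerank(G))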
4. Geospatial Data
o Geographical: related to the Earth’s surface
o Spatial: about space (locations, distances etc)
o Data: yep, this is also data
o Usually handled by Geographical Information Systems (GIS) tools
o …many of which are written in Python…
28. The GDAL toolset
Example terminal-line tools:
• OGR2OGR: convert between vector formats
• GDALwarp: cookie-cut raster files
• gdal_polygonize: convert raster to vector
Python libraries:
• gdal, ogr2ogr etc
• fiona (“pythonic GDAL”)
29. Reading shapefiles: the Fiona library
from fiona import collection

# open the shapefile read-only; each feature is a GeoJSON-like dict
with collection('example_data/TZwards/TZwards.shp', 'r') as input:
    for f in input:
        print(f)
30. Convert vector file formats: OGR2OGR
From the terminal window:
ogr2ogr -f GeoJSON -where "ADM0_A3 = 'YEM'" outfile.json ne_10m_admin_1_states_provinces.shp
38. Vector data calculations
• Point location (e.g. lat/long from address)
• Area and area overlap sizes (e.g. overlap between village and protected area)
• Belonging (e.g. finding which district a lat/long is in)
• Straight-line distance between points (e.g. great circles; see the sketch after this list)
• Practical distance and time between points (e.g. using roads)
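For the straight-line case above, a minimal haversine sketch in pure Python (the 6371 km Earth radius is an approximation):

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance over a spherical Earth (radius ~6371 km)
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

print(haversine_km(51.5074, -0.1278, 40.7128, -74.0060))  # London-NYC, ~5570 km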
39. Geopy: get lat/ longs from addresses
from geopy.geocoders import Nominatim
from geopy.geocoders import GoogleV3

geolocator = Nominatim()    # free, OpenStreetMap-based geocoder
googlocator = GoogleV3()    # Google geocoder, used as a fallback

address = '1600 Pennsylvania Ave NW, Washington, DC'
result = geolocator.geocode(address, timeout=10)
if result is None:
    # Nominatim couldn't find the address: try Google
    result = googlocator.geocode(address, timeout=10)
if result is None:
    latlon = (0.0, 0.0)     # give up, and flag with a dummy location
else:
    latlon = (float(result.latitude), float(result.longitude))
Note that spatial data doesn’t have to be geographical: it could be at much smaller (e.g. building) or larger (e.g. universe) level.
There’s an estimate floating around GIS circles that at least 80% of data has a geospatial component.
Printed maps are images depicting an area, often carefully created by hand (yes, there are people who spend hours working out how to get those labels in just the right places).
Road maps are a great example of the art of making map data easily readable by humans. Note, for instance, how the label “Tioga State Forest” has been carefully placed around the roads in the bottom left corner of this map.
The University of Texas has a great raster map collection online, e.g. the Libya image is at http://www.lib.utexas.edu/maps/libya.html
Road map is from http://www.austinques.com/znz-9683-poxq.htm
For many places, the only detailed maps available are still printed maps. Many of the older maps were created by surveyors triangulating (as in forming triangles out of survey points and using the angles between them to estimate new distances) their way across countries, marking heights and features of interest. More recently, maps have been created using satellite-based positions, either from georeferenced satellite images (e.g. the satellite images have been stretched to match the lat-longs on an area, then roads etc have been traced onto the map) or by geographers using hand-held GPS units to mark the locations of places and features of interest.
Left: An example of an aerial map. This image is from https://www.flickr.com/photos/jeffreywarren/4774701213/ - Jeff is part of PublicLab, a group that amongst other great citizen science projects, creates aerial maps using balloons, kites, ordinary cameras and rubber bands. You’re probably already aware of the satellite images in e.g. Google Maps, Bing maps etc. If you look closely, the PublicLab balloon maps are included in GoogleMaps etc too.
Right: NASA satellite data for Tanzania, showing trees. NASA openly releases a *lot* of satellite data for the earth, at many different wavelengths, only one of which is visible light. The tree data is from a satellite that measures differences in height at the earth's surface using radar; this was originally intended to measure polar ice depths, but turned out to be useful for measuring tree heights too. Other satellite data includes infrared, other radars (including wavelengths tuned to water frequencies) and visible light, at intervals ranging from hourly upwards. More at https://earthdata.nasa.gov
Vector maps represent the world as a series of labelled points (e.g. the labelled subway stations in this map), lines (e.g. the roads in this map), and polygons (e.g. the building outlines in this map). Vector maps are easier to update and use than printed maps; most of the major vector maps (e.g. OpenStreetMap, GoogleMaps) also have APIs that you can use to find map features, feature locations and details.
Some maps are community-based. For example, anyone can add or edit features on OpenStreetMap.
Here, I’m using the ID editor to add buildings to an OpenStreetMap, as part of Humanitarian OpenStreetMap task http://tasks.hotosm.org/project/1686#task/48. HOT OSM puts out requests for editing (and verifying) help like this on the HOT tasking manager, http://tasks.hotosm.org.
Some geospatial data images don’t map their features to the underlying lat/longs. The classic geospatial schematic is the London Underground map, shown here, which was created (using an electronics diagram as a template) to be easy to use by people looking for where they should change between underground lines to get between two points. Plotting the London Underground map so it’s realistic about lat/longs produces something a lot messier and harder for most people to read.
You can add your own datapoints to both maps and schematics. Here’s the New York subway map, with the nearest good coffee shop (subjective!) to each subway station.
Image: http://www.businessinsider.com/nyc-best-coffee-shops-by-subway-stop-2014-2
Maps, like datasets, are intensely political, and when you look at a map, you need to ask similar questions to when you look at a dataset: who created this, what did they create it for, what were their biases, what was important to them. One of the things you need to be aware of is map projections: “projection” because you’re trying to represent all or part of an oblate spheroid (squished ball) as a flat image.
Here are two maps of the world, from http://geoawesomeness.com/map-distortions/
The one on the left uses the Mercator projection: designed for sailors, so when they wanted to sail between two points on the world, they could read the angle (direction) they needed to sail in from the map. This is great for sailors, but terrible for geopolitics: it shows the USA as much larger than Africa, South America and Australia, all of which, in reality, are huge.
The map on the right uses the Gall-Peters projection: this would be terrible for sailors, but does a much better job at showing the relative area size of each continent. Note incidentally that both maps are north-up and centred on the UK, which splits the Pacific Ocean in two.
Projections also matter on a smaller scale. When you’re dealing with map data, you also need to know the coordinate system it’s in. Maps, and the coordinate systems they’re in, are created by fitting a flat plane to the world in one of 3 ways (it helps at this point to think about how you’d wrap a ball using a sheet of paper): a tangent plane that touches the world at a specific location on earth, a cylinder wrapped around the world, or a cone wrapped around part of the world, with its point at a specific location on earth. Each of these will produce different lat/long numbers for the same point on the earth’s surface – which means that, in most GPS coordinate systems, the Greenwich Meridian (the 0 line from north pole to south pole) is about 100m to the West. To make this even more complicated, parts of the Earth are also moving (blame the tectonic plates…).
QGIS can convert between coordinate systems, if you’re combining GIS data from different ones.
Image from http://www.movable-type.co.uk/scripts/latlong-convert-coords.html
More: https://en.wikipedia.org/wiki/Geographic_coordinate_system
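QGIS handles this conversion interactively; if you're scripting, the pyproj library (an assumption here — it isn't mentioned in these notes) does the same job. A minimal sketch:

from pyproj import Transformer

# WGS84 lat/long (EPSG:4326) to British National Grid (EPSG:27700)
transformer = Transformer.from_crs('EPSG:4326', 'EPSG:27700', always_xy=True)
easting, northing = transformer.transform(-0.1278, 51.5074)  # x=lon, y=lat
print(easting, northing)  # roughly (530000, 180000) for central London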
And we’re done with map views… except we’re not, because Tableau only contains country outlines and USA outline data. If we want to cover other places in the world, we’re going to need another tool.
QGIS is a map data viewer, but it’s Python-based, and also has GIS data tools hidden inside it.
File used here is TZwards.shp
This uses data file TZwards.shp
Right-click on the name of the layer (e.g. TzWards), then select “open attribute table”
Here’s your dataset. You can also change what’s displayed using the “Properties” option, and save this layer to a different data format (e.g. CSV).
File is TreeCover_250m_MODIS_Tanzania_2010.tif
Shapefiles are very common vector map formats. KMZ files are zipped versions of KML files. OpenStreetMap uses GPX as its internal data representation. GeoJSON and TopoJSON are commonly used in D3 GIS code.
For most tools, you’ll only need the .shp and .shx file. For some tools, you need the .shp, .shx and .dbf.
More about all those files: https://en.wikipedia.org/wiki/Shapefile
List of GDAL tools at http://www.gdal.org/gdal_utilities.html. Get these from http://www.gdal.org, unless you're a Mac user, in which case try http://www.kyngchaos.com/software/archive#gdal
Fiona needs both the .shp and .shx file. You’ll recognise the output as a dictionary.
Intro to Fiona and Shapely: http://www.macwright.org/2012/10/31/gis-with-python-shapely-fiona.html
OGR2OGR is part of the GDAL toolset. This code filters and converts a shapefile to GeoJSON. ne_10m_admin_1_states_provinces.shp is a shapefile from the Natural Earth shapefile collection: this file has all the provinces (states) in all the countries in the world in it.
You can also do this in Python, e.g.
import ogr2ogr
ogr2ogr.main(['', '-f', 'GeoJSON', 'test2.json', 'TZwards.shp'])
The Ogr2ogr formats list is: http://www.gdal.org/ogr_formats.html
Most of the time, you'll see tif and jpg images. If you're handling satellite data, NITF and HDF5 are common. More at https://www.bluemarblegeo.com/products/global-mapper-formats-raster.php
Most raster data files have “bands” in them: often 1 band (e.g. those tree heights), but sometimes 3 (red/green/blue) or more (e.g. monthly rainfall numbers).
If you zoom into a GeoTIFF file, you'll see something like this: lots of pixels, with a greyscale value (usually between 0 and 127) for each pixel. This is just like an image file – and you can use image processing code on GeoTIFF files too.
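Reading those pixel values in Python with the gdal bindings mentioned earlier is straightforward (a minimal sketch, using the tree cover file named above):

from osgeo import gdal

ds = gdal.Open('TreeCover_250m_MODIS_Tanzania_2010.tif')
band = ds.GetRasterBand(1)    # band numbers start at 1
pixels = band.ReadAsArray()   # a numpy array of pixel values
print(ds.RasterCount, pixels.shape, pixels.min(), pixels.max())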
You need:
A shapefile with the outline in it: both the .shp and .shx files
a Geotiff (or other) file that needs cookiecutting, IN THE SAME COORDINATE SYSTEM as the shapefile.
The result is in file yourresult.tif.
The GDAL library does cookie-cutting as part of its gdalwarp tool.
The shapefile restriction applies to other tech too, e.g. ArcView won't display a shape if you don't give it the .shp, .shx and .dbf files.
where yourshapefile.shp is the shapefile you’re using as an outline, and yourgeotiff.tif is the file that you want to cookiecut.
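The notes don't include the command itself; a typical gdalwarp cutline invocation (the -cutline and -crop_to_cutline flags are standard gdalwarp options, file names as above) looks like:

gdalwarp -cutline yourshapefile.shp -crop_to_cutline yourgeotiff.tif yourresult.tif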
You can also cookie-cut with QGIS: see http://www.qgistutorials.com/en/docs/raster_mosaicing_and_clipping.html
You can also use postcodes, e.g.
country = 'US'
postcode = '20500'
result = googlocator.geocode('', components={'postal_code': postcode, 'country': country}, timeout=10)
Great circle distances: map straight lines aren’t shortest paths
Nautical miles aren’t land miles – and aren’t constant lengths
This is an extract from a larger file: it won’t run on its own (you need the station lat/longs, in stationlat and stationlon).
You can also do choropleths in Python: http://matplotlib.org/basemap/api/basemap_api.html#mpl_toolkits.basemap.Basemap.contour