Gazetteer creation guide
A gazetteer lets users refer to locations in their datasets simply by name. The spatial data for each location is then matched automatically from the shared gazetteer, so users do not need to provide coordinates for every location.
The locations in a gazetteer can be any simple GIS feature:
- a simple point (for a sampling point),
- a line (for a transect), or
- a polygon (to provide plot or region boundaries).
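As a rough illustration, the sketch below shows how one feature of each type might look as a GeoJSON feature, written as plain Python dictionaries. The names and coordinates are illustrative only, and `make_feature` is a hypothetical helper rather than part of any library. GeoJSON stores each position as [longitude, latitude], and a polygon ring repeats its first position at the end.

```python
import json


def make_feature(name, geometry_type, coordinates):
    """Build a GeoJSON feature dictionary with a 'name' property (hypothetical helper)."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": geometry_type, "coordinates": coordinates},
    }


# A sampling point: a single [longitude, latitude] position
point = make_feature("point_A1", "Point", [117.900, 4.250])

# A transect: a list of positions along the line
transect = make_feature(
    "transect_B57", "LineString", [[117.900, 4.250], [117.902, 4.251]]
)

# A plot boundary: a list of rings, each ring closed by repeating its first position
plot = make_feature(
    "plot_C324",
    "Polygon",
    [
        [
            [117.900, 4.250],
            [117.900, 4.255],
            [117.855, 4.255],
            [117.855, 4.250],
            [117.900, 4.250],
        ]
    ],
)

collection = {"type": "FeatureCollection", "features": [point, transect, plot]}
print(json.dumps(collection, indent=2))
```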
The gazetteer is maintained by the data manager and then shared with data providers
through the safedata system. Data providers cannot edit the gazetteer file, to prevent
accidental renaming, deletion or modification of coordinate data or attributes. If data
providers sample from genuinely new points, they can either request that the data manager
adds them to the gazetteer or alternatively they can just add them to their datasets as
unique locations.
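To make the idea of name matching concrete, here is a minimal sketch of how location names in a dataset could be split into those found in a gazetteer and those that are new. This is an illustration only, not how the safedata system itself performs the matching: `match_locations` and the example names are hypothetical, and the gazetteer is assumed to be a GeoJSON feature collection already loaded as a Python dictionary.

```python
def match_locations(dataset_names, gazetteer):
    """Split dataset location names into gazetteer matches and unknown names."""
    # Index the gazetteer geometries by their name attribute
    known = {
        feature["properties"]["name"]: feature["geometry"]
        for feature in gazetteer["features"]
    }
    matched = {name: known[name] for name in dataset_names if name in known}
    unknown = [name for name in dataset_names if name not in known]
    return matched, unknown


# Illustrative gazetteer containing a single named point
gazetteer = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "point_A1"},
            "geometry": {"type": "Point", "coordinates": [117.900, 4.250]},
        }
    ],
}

matched, unknown = match_locations(["point_A1", "point_Z9"], gazetteer)
print(matched)  # geometry looked up from the gazetteer
print(unknown)  # names that would need to be added or supplied as new locations
```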
The gazetteer file must be a GeoJSON file defined using the WGS84 coordinate system,
with a local transformation defined so that distances between points can be calculated.
Each GIS feature needs to have a name attribute, which is used to match location names
in datasets to the gazetteer features. Location names must therefore be unique.
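The requirements above lend themselves to a quick automated check before a gazetteer is shared. The sketch below is one possible approach, using only the Python standard library: `check_gazetteer` and `iter_positions` are hypothetical helpers, and the check assumes that every feature has a simple geometry with a `coordinates` member. It reports missing names, duplicated names, and positions outside the valid longitude and latitude ranges, which often indicates that latitude and longitude have been swapped.

```python
from collections import Counter


def iter_positions(coords):
    """Yield every [lon, lat] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def check_gazetteer(gazetteer):
    """Return a list of human-readable problems found in a gazetteer dictionary."""
    problems = []
    names = []
    for index, feature in enumerate(gazetteer["features"]):
        name = (feature.get("properties") or {}).get("name")
        if not name:
            problems.append(f"feature {index} has no name attribute")
        else:
            names.append(name)
        # Report at most one out-of-range position per feature
        for lon, lat, *_ in iter_positions(feature["geometry"]["coordinates"]):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(
                    f"feature {index} has an out-of-range position ({lon}, {lat})"
                )
                break
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"name '{name}' is used {count} times")
    return problems


# Illustrative gazetteer with a duplicated name and a swapped coordinate pair
example = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "point_A1"},
            "geometry": {"type": "Point", "coordinates": [117.900, 4.250]},
        },
        {
            "type": "Feature",
            "properties": {"name": "point_A1"},
            "geometry": {"type": "Point", "coordinates": [4.250, 117.900]},
        },
    ],
}

problems = check_gazetteer(example)
for problem in problems:
    print(problem)
```

A check like this cannot detect swapped coordinates when both values happen to fall within the valid latitude range, so it complements rather than replaces a visual check in GIS software.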
Other attributes can also be populated if required, but these are not currently searchable using the metadata server. The metadata server can, however, provide an up-to-date copy of the gazetteer for use by scripts or applications written by the data manager.
The gazetteer can be generated either using GIS software or programmatically. Guides for both approaches are provided below.
Creation using GIS software
This can be done using whatever GIS software you like, provided that the software can
export the final feature data as a GeoJSON file. The developers of the safedata system
use QGIS, but if you already have a preferred GIS package you should stick with that.
Sampling locations (i.e. points, lines and polygons) can all be digitised manually,
provided that the name attribute is then completed.
However, manual digitisation can be imprecise, so we recommend loading existing shape
files for the sampling locations, and then using the GIS software to verify that the
locations are correct and to add the relevant attributes. If shape files do not
already exist, they can be generated from the .gpx files used in GPS units.
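Where only .gpx files are available, one alternative to generating shape files first is to read the waypoints directly. The sketch below is a minimal illustration using the Python standard library: it assumes a GPX 1.1 file containing named waypoints (`wpt` elements), ignores tracks and routes, and uses a hypothetical helper, `gpx_waypoints_to_features`, to turn each waypoint into a GeoJSON point feature. The GPX content shown is made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 files (an assumption about the input files)
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_waypoints_to_features(gpx_text):
    """Convert the waypoints in a GPX 1.1 document to GeoJSON point features."""
    root = ET.fromstring(gpx_text)
    features = []
    for waypoint in root.findall("gpx:wpt", GPX_NS):
        name = waypoint.findtext("gpx:name", default=None, namespaces=GPX_NS)
        # GPX stores lat and lon as attributes; GeoJSON wants [lon, lat]
        position = [float(waypoint.get("lon")), float(waypoint.get("lat"))]
        features.append(
            {
                "type": "Feature",
                "properties": {"name": name},
                "geometry": {"type": "Point", "coordinates": position},
            }
        )
    return features


# Illustrative GPX content with two named waypoints
gpx_text = """
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="4.250" lon="117.900"><name>point_A1</name></wpt>
  <wpt lat="4.251" lon="117.902"><name>point_A2</name></wpt>
</gpx>
"""

features = gpx_waypoints_to_features(gpx_text)
for feature in features:
    print(feature["properties"]["name"], feature["geometry"]["coordinates"])
```

Waypoints without a `name` element end up with a name of `None`, so they would still need a name to be assigned before being added to the gazetteer.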
Programmatic generation
If shape files already exist, or GPS points describing the locations are available in some format, then the gazetteer can be generated using code ('programmatically'). If you have the relevant files (or details of the points) and at least some familiarity with coding, we strongly advise taking this approach. It makes the procedure used to create your gazetteer reproducible, which reduces the chance of entering erroneous information in the first place and makes it easier to track down any errors that are introduced.
We provide examples for R and Python below, as both are popular scientific programming
languages. However, if you prefer a different language, it is completely fine to use
that instead.
Generation using R
In R, we recommend that you use the sf package to define your locations, to combine
files and to export the gazetteer. If you want to load GeoJSON files into R we
recommend using either the geojsonio package or the jsonlite package.
The example below demonstrates how to create points, lines and polygons manually
using coordinates, how to load in shape files, and finally how to export everything
as a combined GeoJSON file.
library(sf)
# Create points, lines, and polygon features
sampling_point <- st_point(c(117.900, 4.250))
transect <- st_linestring(
  cbind(c(117.900, 117.902, 117.904, 117.906), c(4.250, 4.251, 4.252, 4.253))
)
sampling_area <- st_polygon(list(cbind(
  c(117.900, 117.900, 117.855, 117.855, 117.900),
  c(4.250, 4.255, 4.255, 4.250, 4.250)
)))
# Combine into a single geometry collection with name attributes defined
# (coordinate reference system must be 4326 for GeoJSON)
geometry_column <- st_sfc(sampling_point, transect, sampling_area, crs = 4326)
manual_locations <- st_sf(
  name = c("point_A1", "transect_B57", "plot_C324"),
  geometry = geometry_column
)
# Load features from an existing shapefile
shape_file <- st_read("example_shape_file.shp")
# Combine features from different sources (rbind needs both objects to have the
# same columns and the same coordinate reference system)
all_locations <- rbind(manual_locations, shape_file)
# Export as GeoJSON file
file <- "gazetteer.geojson"
st_write(all_locations, file, driver = "GeoJSON")
Generation using Python
In Python, we recommend that you use the shapely package to define your locations,
the geopandas package to load in existing shape files and the geojson package to
combine and export the gazetteer.
The example below demonstrates how to create points, lines and polygons manually
using coordinates, how to load in shape files, and finally how to export everything
as a combined GeoJSON file.
"""Example script showing the basics of creating a gazetteer."""
import geojson
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon, mapping
# The easiest way to create features from coordinates is to create instances of
# Point, LineString and Polygon features using the Shapely package. Shapely takes
# coordinates in (x, y) order, which for WGS84 means (longitude, latitude).
# A point sampling location with latitude = 4.250 and longitude = 117.900
sampling_point = Point(117.900, 4.250)
# A line (transect) created by specifying each sampling point along the transect
transect = LineString(
    [(117.900, 4.250), (117.902, 4.251), (117.904, 4.252), (117.906, 4.253)]
)
# A polygon or plot feature is created by specifying a point for each vertex of the
# polygon - the ring of the polygon is automatically closed if the first and last
# coordinates are not identical.
sampling_area = Polygon(
    [(117.900, 4.250), (117.900, 4.255), (117.855, 4.255), (117.855, 4.250)]
)
# Now convert the shapely points into GeoJSON features
feature_sampling_point = geojson.Feature(
    geometry=mapping(sampling_point), properties={"name": "point_A1"}
)
feature_transect = geojson.Feature(
    geometry=mapping(transect), properties={"name": "transect_B57"}
)
feature_sampling_area = geojson.Feature(
    geometry=mapping(sampling_area), properties={"name": "plot_C324"}
)
# Then combine all the manually defined features into a single feature collection
manual_points = geojson.FeatureCollection(
    [feature_sampling_point, feature_transect, feature_sampling_area]
)
# We can also add features loaded from an existing shapefile
shape_file = gpd.read_file("example_shape_file.shp")
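# Convert the loaded features to a GeoJSON feature collection, reprojecting to WGS84
# first (this assumes the shapefile has a coordinate reference system defined)
shape_file_points = geojson.loads(shape_file.to_crs(epsg=4326).to_json())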
# Finally combine the points from the different sources
all_points = geojson.FeatureCollection(
    manual_points["features"] + shape_file_points["features"]
)
# Then export everything as a geojson
with open("gazetteer.geojson", "w") as gazetteer_file:
    geojson.dump(all_points, gazetteer_file)