Gazetteer creation guide

A gazetteer should be created so that users can refer to locations in their datasets simply by name. The spatial data for each location can then be matched automatically from the shared gazetteer, rather than users needing to provide the coordinates for every single location.

The locations in a gazetteer can be any simple GIS feature:

  • a simple point (for a sampling point),
  • a line (for a transect), or
  • a polygon (to provide plot or region boundaries).
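As a rough illustration of how these three feature types look in the GeoJSON format that this guide asks for further down, the sketch below writes one of each as a plain Python dictionary. The names and coordinates are illustrative values only, and positions are given as [longitude, latitude].

```python
import json

# Illustrative features only: positions are [longitude, latitude] pairs.
point_feature = {
    "type": "Feature",
    "properties": {"name": "point_A1"},
    "geometry": {"type": "Point", "coordinates": [117.900, 4.250]},
}
line_feature = {
    "type": "Feature",
    "properties": {"name": "transect_B57"},
    "geometry": {
        "type": "LineString",
        "coordinates": [[117.900, 4.250], [117.902, 4.251], [117.904, 4.252]],
    },
}
polygon_feature = {
    "type": "Feature",
    "properties": {"name": "plot_C324"},
    "geometry": {
        "type": "Polygon",
        # A single outer ring: the first and last positions are identical.
        "coordinates": [[
            [117.900, 4.250], [117.900, 4.255], [117.855, 4.255],
            [117.855, 4.250], [117.900, 4.250],
        ]],
    },
}

# A gazetteer is a feature collection holding all of the locations.
example_gazetteer = {
    "type": "FeatureCollection",
    "features": [point_feature, line_feature, polygon_feature],
}
gazetteer_text = json.dumps(example_gazetteer, indent=2)
```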

The gazetteer is maintained by the data manager and then shared with data providers through the safedata system. Data providers cannot edit the gazetteer file, to prevent accidental renaming, deletion or modification of coordinate data or attributes. If data providers sample from genuinely new points, they can either request that the data manager adds them to the gazetteer or alternatively they can just add them to their datasets as unique locations.

The gazetteer file must be a GeoJSON file. This file has to be defined using the WGS84 coordinate system, with a local transformation defined so that distances between points can be calculated. Each GIS feature needs to have a name attribute specified, and this is used to match location names in datasets to the gazetteer features: location names must be unique.
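Some of these requirements can be checked with a short script before the file is shared. The sketch below is not part of the safedata system: `check_gazetteer` and `iter_positions` are hypothetical helper names, and the script assumes the gazetteer has already been loaded into a Python dictionary (for example with `json.load`). It reports features without a name, duplicated names, and positions that fall outside the valid longitude and latitude ranges. The range check will catch some swapped coordinates but cannot confirm that a file really uses WGS84.

```python
def iter_positions(coordinates):
    """Yield every [longitude, latitude] position in a nested coordinate list."""
    if coordinates and isinstance(coordinates[0], (int, float)):
        yield coordinates
    else:
        for item in coordinates:
            yield from iter_positions(item)


def check_gazetteer(gazetteer):
    """Return a list of problems found in a GeoJSON feature collection."""
    problems = []
    seen_names = set()
    for index, feature in enumerate(gazetteer.get("features", [])):
        # Every feature needs a name, and names must not repeat.
        name = (feature.get("properties") or {}).get("name")
        if not name:
            problems.append(f"feature {index}: no name attribute")
        elif name in seen_names:
            problems.append(f"feature {index}: duplicate name {name!r}")
        else:
            seen_names.add(name)
        # Positions should be plausible longitude and latitude values.
        geometry = feature.get("geometry") or {}
        for position in iter_positions(geometry.get("coordinates", [])):
            lon, lat = position[0], position[1]
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(
                    f"feature {index}: position {position} outside WGS84 range"
                )
    return problems


# Illustrative input: the second feature repeats a name and swaps the coordinates.
suspect_gazetteer = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "point_A1"},
         "geometry": {"type": "Point", "coordinates": [117.900, 4.250]}},
        {"type": "Feature", "properties": {"name": "point_A1"},
         "geometry": {"type": "Point", "coordinates": [4.250, 117.900]}},
    ],
}
for problem in check_gazetteer(suspect_gazetteer):
    print(problem)
```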

Other attributes can also be populated if required, but these are not currently searchable using the metadata server. The metadata server can, however, provide an up-to-date copy of the gazetteer for use by scripts or applications written by the data manager.
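As an example of the kind of script this allows, the sketch below matches location names from a dataset against a copy of the gazetteer. It assumes the copy has already been obtained and loaded into a Python dictionary; how it is retrieved from the metadata server is not shown here. `build_location_index` and `match_locations` are hypothetical helper names.

```python
def build_location_index(gazetteer):
    """Map each location name to the GeoJSON geometry of its feature."""
    return {
        feature["properties"]["name"]: feature["geometry"]
        for feature in gazetteer["features"]
    }


def match_locations(dataset_names, index):
    """Split dataset location names into matched geometries and unknown names."""
    matched = {name: index[name] for name in dataset_names if name in index}
    unmatched = [name for name in dataset_names if name not in index]
    return matched, unmatched


# Illustrative gazetteer copy and dataset location names.
gazetteer_copy = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "point_A1"},
         "geometry": {"type": "Point", "coordinates": [117.900, 4.250]}},
    ],
}
location_index = build_location_index(gazetteer_copy)
matched, unmatched = match_locations(["point_A1", "point_Z9"], location_index)
```

Names that end up in `unmatched` are candidates for new locations, which can either be requested as additions to the gazetteer or be supplied with the dataset.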

The gazetteer can either be generated using GIS software, or can be generated programmatically. Guides for both approaches are provided below.

Creation using GIS software

This can be done using whatever GIS software you like, provided that the software can export the final feature data as a GeoJSON file. The developers of the safedata system use QGIS, but if you already have a preferred GIS package you should stick with that. Sampling locations (i.e. points, lines and polygons) can all be digitised manually, provided that the name attribute is then completed.

However, manual digitisation can be imprecise, so we recommend loading existing shape files for the sampling locations and then using the GIS software to verify that the locations are correct and to add the relevant attributes. If shape files do not already exist, they can be generated from the .gpx files used in GPS units.
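GPS waypoints can also be read directly with a short script. The sketch below uses only the Python standard library and assumes a GPX 1.1 file in which each waypoint (`wpt` element) has `lat` and `lon` attributes and a `name` child element. `gpx_waypoints_to_features` is a hypothetical helper name and the sample data are invented for illustration. Tracks and routes are not handled.

```python
import xml.etree.ElementTree as ET

# XML namespace used by GPX 1.1 files (an assumption about the input file).
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_waypoints_to_features(gpx_text):
    """Convert the waypoints in a GPX document into GeoJSON point features."""
    root = ET.fromstring(gpx_text)
    features = []
    for waypoint in root.findall("gpx:wpt", GPX_NS):
        name = waypoint.findtext("gpx:name", default=None, namespaces=GPX_NS)
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            "geometry": {
                "type": "Point",
                # GeoJSON positions are ordered [longitude, latitude].
                "coordinates": [
                    float(waypoint.get("lon")),
                    float(waypoint.get("lat")),
                ],
            },
        })
    return features


# Invented sample data standing in for the contents of a .gpx file.
sample_gpx = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="4.250" lon="117.900"><name>point_A1</name></wpt>
  <wpt lat="4.251" lon="117.902"><name>point_A2</name></wpt>
</gpx>"""
waypoint_features = gpx_waypoints_to_features(sample_gpx)
```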

Programmatic generation

If either shape files already exist or GPS points describing the locations are available (in some format), then the gazetteer can be generated using code ('programmatically'). If you have the relevant files (or details of the points) and have at least some familiarity with coding, we strongly advise taking this approach. It ensures that the procedure used to create your gazetteer is reproducible, which makes it easier to track down any errors that are introduced, and it reduces the chance of entering erroneous information to begin with.
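As a minimal sketch of the idea, suppose the GPS points are held in a CSV table. The column names `name`, `longitude` and `latitude` are assumptions made for this example, as is the helper name `csv_points_to_gazetteer`. The standard library is then enough to build a GeoJSON feature collection of point locations that can be regenerated whenever the table changes.

```python
import csv
import io
import json


def csv_points_to_gazetteer(csv_file):
    """Build a GeoJSON feature collection from rows of named point coordinates."""
    features = []
    for row in csv.DictReader(csv_file):
        features.append({
            "type": "Feature",
            "properties": {"name": row["name"]},
            "geometry": {
                "type": "Point",
                # GeoJSON positions are ordered [longitude, latitude].
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Invented sample table standing in for a real CSV file opened with open().
sample_csv = io.StringIO(
    "name,longitude,latitude\n"
    "point_A1,117.900,4.250\n"
    "point_A2,117.902,4.251\n"
)
point_gazetteer = csv_points_to_gazetteer(sample_csv)
point_gazetteer_text = json.dumps(point_gazetteer, indent=2)
```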

We provide examples for R and Python below, as both are widely used scientific programming languages. However, if you prefer a different language it is completely fine to use that instead.

Generation using R

In R, we recommend that you use the sf package to define your locations, to combine files and to export the gazetteer. If you want to load GeoJSON files into R we recommend using either the geojsonio package or the jsonlite package.

The example below demonstrates how to create points, lines and polygons manually from coordinates, how to load shape files, and finally how to export everything as a combined GeoJSON file.

library(sf)

# Create points, lines, and polygon features
sampling_point <- st_point(c(117.900, 4.250))
transect <- st_linestring(
  cbind(c(117.900, 117.902, 117.904, 117.906), c(4.250, 4.251, 4.252, 4.253))
)
sampling_area <- st_polygon(list(cbind(
  c(117.900, 117.900, 117.855, 117.855, 117.900),
  c(4.250, 4.255, 4.255, 4.250, 4.250)
)))

# Combine into a single geometry column with name attributes defined
# (the coordinate reference system must be EPSG:4326, i.e. WGS84, for GeoJSON)
geometry_column <- st_sfc(sampling_point, transect, sampling_area, crs = 4326)
manual_locations <- st_sf(
  name = c("point_A1", "transect_B57", "plot_C324"),
  geometry = geometry_column
)

# Load features from an existing shapefile
shape_file <- st_read("example_shape_file.shp")

# Combine features from different sources (rbind requires both objects to have
# the same columns and the same coordinate reference system)
all_locations <- rbind(manual_locations, shape_file)

# Export as GeoJSON file
gazetteer_path <- "gazetteer.geojson"
st_write(all_locations, gazetteer_path, driver = "GeoJSON")

Generation using Python

In Python, we recommend that you use the shapely package to define your locations, the geopandas package to load existing shape files, and the geojson package to combine the features and export the gazetteer.

The example below demonstrates how to create points, lines and polygons manually from coordinates, how to load shape files, and finally how to export everything as a combined GeoJSON file.

"""Example script showing the basics of creating a gazetteer."""

import geojson
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon, mapping

# The easiest way to create features from coordinates is to create instances of
# Point, LineString and Polygon features using the Shapely package.

# A point sampling location with latitude = 4.250 and longitude = 117.900. Shapely
# and GeoJSON both use (x, y) order, so the longitude is given first.
sampling_point = Point(117.900, 4.250)

# A line (transect) created by specifying each sampling point along the transect
transect = LineString(
    [(117.900, 4.250), (117.902, 4.251), (117.904, 4.252), (117.906, 4.253)]
)

# A polygon or plot feature is created by specifying a point for each vertex of the
# polygon - the ring of the polygon is automatically closed if the first and last
# coordinates are not identical.
sampling_area = Polygon(
    [(117.900, 4.250), (117.900, 4.255), (117.855, 4.255), (117.855, 4.250)]
)

# Now convert the shapely geometries into GeoJSON features
feature_sampling_point = geojson.Feature(
    geometry=mapping(sampling_point), properties={"name": "point_A1"}
)
feature_transect = geojson.Feature(
    geometry=mapping(transect), properties={"name": "transect_B57"}
)
feature_sampling_area = geojson.Feature(
    geometry=mapping(sampling_area), properties={"name": "plot_C324"}
)

# Then combine all the manually defined features into a single feature collection
manual_points = geojson.FeatureCollection(
    [feature_sampling_point, feature_transect, feature_sampling_area]
)

# We can also add features loaded from an existing shapefile
shape_file = gpd.read_file("example_shape_file.shp")

# Convert the loaded features to a geojson feature collection (if the shapefile is
# not already in WGS84, reproject it first with shape_file.to_crs(epsg=4326))
shape_file_points = geojson.loads(shape_file.to_json())

# Finally combine the points from the different sources
all_points = geojson.FeatureCollection(
    manual_points["features"] + shape_file_points["features"]
)
# Then export everything as a GeoJSON file
with open("gazetteer.geojson", "w") as gazetteer_file:
    geojson.dump(all_points, gazetteer_file)