The GBIFTaxa worksheet

The GBIFTaxa worksheet is used to record taxonomic details of organisms referred to in the Data worksheets. It is generally intended for use with field observations of organisms, such as direct observation, camera trapping, audio recordings and similar. The taxonomic information is used in three ways:

  • The listed taxa are validated against a version of the GBIF backbone taxonomy. This is done to try to identify the common typographic errors and taxonomic issues that make it difficult to reuse field data.
  • The list of taxa is cross-checked against taxon fields in the data worksheets to confirm that there is a complete list of the taxa recorded in the datasets.
  • The taxonomy is used to generate a hierarchical taxon index that can be used to search datasets for taxa of interest.
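
The cross-check in the second point can be pictured as a simple set comparison. The following is a minimal sketch only, not the actual validation code: the function name `cross_check_names` and the sample lists are assumptions made for illustration.

```python
def cross_check_names(taxa_names, data_names):
    """Compare names declared in a taxa worksheet with names used in data fields.

    Both arguments are plain lists of strings (hypothetical inputs).
    """
    declared = set(taxa_names)
    used = set(data_names)
    return {
        # Names used in the data worksheets but missing from the taxa worksheet
        "undeclared": sorted(used - declared),
        # Names listed in the taxa worksheet but never used in the data
        "unused": sorted(declared - used),
    }


report = cross_check_names(
    ["crbo", "dolic_sp", "gannets"],
    ["crbo", "crbo", "gannets", "morpho1"],
)
print(report)
```

Here `morpho1` would be flagged as used but not declared, and `dolic_sp` as declared but never used.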

If you have taxa that have been identified via sequencing - rather than field observation - then one or more sequenced taxa worksheets should be included instead. If you have both field observations and sequenced taxa, you will need to provide both.

GBIF Taxon validation

To help keep the taxonomy as clean as possible and to allow us to index the taxonomic coverage of datasets, we check all taxon names in the GBIFTaxa worksheet against the GBIF backbone taxonomy database. If you want to check your taxon names and ranks, then the search engine is here:

https://www.gbif.org/species/search

No online taxonomy is ever going to be 100% up to date (or 100% agree with your taxonomic usage!) but the GBIF backbone has very good taxonomic coverage and is well curated.
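
If you have many names to check, the GBIF web API offers a species match endpoint that can be queried from a script. The sketch below is illustrative only: the helper names (`build_match_url`, `match_taxon`, `summarise_match`) are assumptions, the response fields are used as we understand the public API to return them, and the live service may differ from the backbone version used for validation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF species match endpoint
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def build_match_url(name, rank=None):
    """Build a match query URL for a scientific name (no authority)."""
    params = {"name": name, "strict": "true"}
    if rank is not None:
        params["rank"] = rank.upper()  # the API expects ranks such as SPECIES
    return f"{GBIF_MATCH_URL}?{urlencode(params)}"


def match_taxon(name, rank=None, timeout=10):
    """Query GBIF and return the decoded JSON response (needs network access)."""
    with urlopen(build_match_url(name, rank), timeout=timeout) as response:
        return json.load(response)


def summarise_match(result):
    """Turn a match response dictionary into a one-line summary."""
    if result.get("matchType", "NONE") == "NONE":
        return "no match"
    return (
        f"{result.get('canonicalName')} ({result.get('rank')}, "
        f"{result.get('status')}, ID {result.get('usageKey')})"
    )


print(build_match_url("Panthera tigris", "Species"))
```

Calling `summarise_match(match_taxon("Panthera tigris", "Species"))` would then give a quick indication of whether a name is found and how GBIF treats it.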

Taxon table layout

The table format looks like this:

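| Name            | Taxon name               | Taxon type    | Taxon ID | Ignore ID | Parent name | Parent type | Parent ID | Comments                          |
|-----------------|--------------------------|---------------|----------|-----------|-------------|-------------|-----------|-----------------------------------|
| crbo            | Crematogaster borneensis | Species       |          |           |             |             |           |                                   |
| dolic_sp        | Dolichoderus             | Genus         |          |           |             |             |           |                                   |
| gannets         | Morus                    | Genus         | 2480962  |           |             |             |           |                                   |
| lost_orang      | Pongo tapanuliensis      | Species       |          |           | Pongo       | Genus       |           | New species                       |
| morpho1         | NA                       | Morphospecies |          |           | Formicidae  | Family      |           |                                   |
| bombines        | Bombini                  | Tribe         |          |           | Apidae      | Family      |           |                                   |
| new_gannet      | Morus rubra              | Species       |          | 5361886   | Morus       | Genus       | 2480962   | New gannet (not mulberry) species |
| Tiger leech     | Haemadipsa picta         | Species       |          |           |             |             |           |                                   |
| Brown leech     | Haemadipsa zeylanica     | Species       |          |           |             |             |           |                                   |
| Clouded leopard | Neofelis nebulosa        | Species       |          |           |             |             |           |                                   |
| Flat headed cat | Prionailurus planiceps   | Species       |          |           |             |             |           |                                   |
| Brown rat       | Rattus norvegicus        | Species       |          |           |             |             |           |                                   |
| Moon rat        | Echinosorex gymnura      | Species       |          |           |             |             |           |                                   |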

The table must contain column headers in the first row of the worksheet. The first three columns (Name, Taxon name and Taxon type) are mandatory and contain the following:

  • Name: This column must contain a local name for each of the taxa that you are going to use in the rest of the dataset. You cannot have duplicated names! Note that these can be abbreviations or codes: if you want to use crbo in your data worksheets, rather than typing out Crematogaster borneensis every time, then that is fine.

Please also note that if a taxon is detected in multiple ways, and hence represented in multiple taxa worksheets, it needs a different name in each worksheet.

Note

These are the names that you are going to use in your data worksheet. The other columns are to help us validate the taxonomy of your names.

  • Taxon name: This column must contain the scientific name of the taxon, which will be used for taxon validation via GBIF. Note that this should not include any taxon authority, so Panthera tigris not Panthera tigris (Linnaeus, 1758).

  • Taxon type: This column must provide the taxonomic type of the named taxon, which is usually the taxonomic rank. For example, the taxon Pongo pygmaeus would be of type Species and the taxon Formicidae would be of type Family. If the taxon type is not one of the core GBIF backbone ranks (Kingdom, Phylum, Class, Order, Family, Genus, Species and Subspecies), or is one of the special values Morphospecies or Functional group, then you will need to provide a parent taxon (see below).

  • Comments: This is entirely optional - if you have a fairly standard set of taxa with no serious issues then you can leave it out entirely or it can be empty. If you do have particular notes that you want to make - explaining disagreements with GBIF taxonomy, new species notes and the like - then these can be very useful for future researchers trying to place taxa.

The other fields are optional and are used to handle the following taxonomic issues. You only need to put anything in these optional fields for the rows they apply to: leave them blank for all other rows as in the example table above.
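
Before submitting, the mandatory columns can be given a quick local sanity check. This is a minimal sketch under stated assumptions, not the validator itself: `check_rows` is a hypothetical helper, each row is assumed to be a dictionary keyed by the column headers, rows are numbered from the first data row, and the authority pattern is only a rough heuristic.

```python
import re

MANDATORY = ("Name", "Taxon name", "Taxon type")

# Heuristic for a trailing authority such as "(Linnaeus, 1758)"
AUTHORITY = re.compile(r"\(?[A-Z][^()]*,\s*\d{4}\)?\s*$")


def check_rows(rows):
    """Return a list of problems found in the mandatory columns."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        # Every mandatory field must contain something
        for column in MANDATORY:
            if not str(row.get(column) or "").strip():
                problems.append(f"Row {number}: {column} is empty")
        # Local names must be unique within the worksheet
        name = str(row.get("Name") or "").strip()
        if name and name in seen:
            problems.append(f"Row {number}: duplicated Name '{name}'")
        seen.add(name)
        # Scientific names should not carry a taxon authority
        if AUTHORITY.search(str(row.get("Taxon name") or "")):
            problems.append(f"Row {number}: Taxon name appears to include an authority")
    return problems


rows = [
    {"Name": "crbo", "Taxon name": "Crematogaster borneensis", "Taxon type": "Species"},
    {"Name": "crbo", "Taxon name": "Panthera tigris (Linnaeus, 1758)", "Taxon type": "Species"},
    {"Name": "dolic_sp", "Taxon name": "Dolichoderus", "Taxon type": ""},
]
for problem in check_rows(rows):
    print(problem)
```

With these sample rows, the second row is reported for a duplicated name and an apparent authority, and the third for an empty Taxon type.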

Ambiguous taxon names

Some taxon names map to more than one taxon. For example, the genus Morus can refer to either mulberries or gannets. The checking code will raise an error if it encounters an ambiguous name. In these rare cases, you will need to look up the taxon in GBIF and provide the GBIF ID number in the Taxon ID field.

The example in the table allows us to confirm that you mean this Morus: https://www.gbif.org/species/2480962.

Incorrect GBIF taxonomy

In general, we follow the GBIF 'accepted usage' of a taxon name, and this includes following GBIF when it treats a taxon name as a synonym. We include your taxon name in our taxonomic indices, but under the taxonomic hierarchy suggested by GBIF.

If you strongly disagree with that accepted usage, then you can use the Ignore ID field to explicitly ignore the GBIF match. You will need to insert the GBIF ID number of the accepted usage (reported in the warning message) in this field. You will then need to provide a parent taxon (see below).

In the example in the table, a (fictional) recently discovered species of gannet (Morus) has been included. Unfortunately, this species shares a scientific name (Morus rubra) with a species of mulberry (Morus). This results in information about the mulberry species being incorrectly included in the taxonomic hierarchy. To stop this from happening, information on the correct parent genus must be included, and the incorrect species match must be explicitly ignored using the Ignore ID field.

The example in the table is made up, but similar situations have arisen in previous datasets published using the safedata system. That said, conflicts of this type are rare, so you are unlikely ever to need Ignore ID.

Parent taxon details

If a taxon name cannot be matched against GBIF, then you will need to provide details of a parent taxon. This is a taxon that can be validated in GBIF and provides a hook to allow us to place the child taxon in the backbone taxonomy. Note that you should still provide the taxon information as usual.

In these cases, the next three columns (Parent name, Parent type and Parent ID) are used to provide a parent taxon. They are used in exactly the same way as the Taxon columns: the only restriction is that the Parent type must be one of the core ranks used in the GBIF backbone: Kingdom, Phylum, Class, Order, Family, Genus, Species and Subspecies. Again, Parent ID is only needed to resolve ambiguous taxon names.

The table shows examples for the following cases:

New and unrecognized taxa

If a taxon is new or not recognized by GBIF (and you're sure it isn't just a typo!) then provide a parent name and type to allow us to hook the taxon into the index. For example, before Pongo tapanuliensis was recognised as a species, you would need to provide Pongo as a parent name of type 'Genus'.

You would then get a message in the file validation report saying:

- Row 4 (lost_orang): not found in GBIF but has valid parent information

Morphospecies and Functional groups

For morphospecies and functional groups, the Taxon type should be specified as 'Morphospecies' or 'Functional group'. The Name field should be filled in with whatever name is used to label the taxon in the dataset. In this case, the Taxon name field is not checked or used anywhere in the validation process, so it can be filled in with whatever you wish. However, the field still has to be filled in. If you do not wish to provide a name here, we recommend just filling it in with 'NA'.

Now you need to provide a parent taxon and type. The level of taxonomic certainty for morphospecies and functional groups is quite variable, but we'd like the finest taxonomic level you can provide. As an example, in the table above, 'morpho1' is simply identified as being an ant. The validation report will show:

- Row 5 (morpho1): Morphospecies with valid parent information

Less common taxonomic levels

If you want to use taxa defined at any intermediate taxonomic levels that are not included in the GBIF backbone, then you will again need to provide a parent taxon. In the example in the table, taxa identified to tribe level (Bombini) are hooked in as being in the family Apidae. The subfamily Apinae would be more precise, but subfamily isn't one of the backbone taxonomic levels.

Ignored taxa

If you have set an Ignore ID value for a taxon, then you are explicitly rejecting the parent taxon that we would otherwise take from GBIF, so you will need to provide a replacement. The example in the table is explicitly saying that Morus rubra is not a species of mulberry (genus Morus) but is a species of gannet that belongs in the Animalia genus Morus.
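
The four cases above can be summarised as a small decision function. This is a sketch of the rules as described on this page, not the validator's own code: `parent_required` and its arguments are hypothetical, and whether a name is found in GBIF is passed in rather than looked up.

```python
BACKBONE_RANKS = {
    "kingdom", "phylum", "class", "order",
    "family", "genus", "species", "subspecies",
}
SPECIAL_TYPES = {"morphospecies", "functional group"}


def parent_required(taxon_type, ignore_id=None, found_in_gbif=True):
    """Return the reason a parent taxon is needed, or None if it is not."""
    taxon_type = taxon_type.strip().lower()
    if taxon_type in SPECIAL_TYPES:
        return "morphospecies or functional group"
    if taxon_type not in BACKBONE_RANKS:
        return "rank is not in the GBIF backbone"
    if ignore_id is not None:
        return "GBIF match explicitly ignored"
    if not found_in_gbif:
        return "not found in GBIF"
    return None


# Rows from the example table
print(parent_required("Species"))                       # crbo
print(parent_required("Species", found_in_gbif=False))  # lost_orang
print(parent_required("Morphospecies"))                 # morpho1
print(parent_required("Tribe"))                         # bombines
print(parent_required("Species", ignore_id=5361886))    # new_gannet
```

Only the first call returns `None`; each of the others returns the reason that a parent taxon has to be supplied.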

Deleted taxa

The GBIF backbone also includes a large number of deleted taxon IDs: these are a mix of duplicated names, typos in the database and other errors. The GBIF IDs of these taxa are preserved, along with some information, but we do not allow deleted taxa to be used in the GBIFTaxa worksheet.

My data doesn't contain taxa

Fine. You can omit both the GBIFTaxa worksheet and the sequenced taxa worksheets!