Overview
The safedata_validator package is a dataset validation and publishing tool for use with large collections of related but heterogeneous datasets. It was originally built to handle the field data collections from the SAFE Project, which has generated a large number of related datasets.
It forms one element of the wider safedata system, which provides a general framework for data capture, validation, publication and discovery within long-term ecological projects.
User starting points
- Data providers: these pages provide an overview of the data preparation and submission process.
- Data managers: these pages introduce the command line tools used to validate and publish datasets.
- Developers: these pages document the API for the package classes and methods.
Package components
The safedata_validator package uses the following elements:
- A defined format for annotating data tables with metadata, validating sampling locations and taxonomic references and providing a standard set of summary metadata.
- The safedata_validator package itself, written in Python, which is used to check that a dataset meets the standard format. The package provides:
  - A programmatic API: safedata_validator can be imported and then used to implement validation within another program.
  - An implementation of the format checking process for data stored in Excel files, although implementations for other formats are possible.
  - Command line tools for both the validation (safedata_validate) and publication (safedata_zenodo) of datasets.
- A data store to archive validated datasets. The safedata_validator package currently uses Zenodo to store datasets.
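As an illustration only, the following minimal Python sketch shows the kind of check that a defined metadata format makes possible: every column in a data table must be described by metadata, numeric columns must hold numbers, and taxon and location columns must refer to entries defined elsewhere in the dataset. The names `FieldDescriptor` and `validate_table`, and the field types `"numeric"`, `"taxa"` and `"location"`, are hypothetical assumptions; this is not the safedata_validator API.

```python
from dataclasses import dataclass


@dataclass
class FieldDescriptor:
    """Hypothetical metadata describing one column of a data table."""
    name: str
    field_type: str  # assumed types: "numeric", "taxa" or "location"


def validate_table(rows, fields, known_taxa, known_locations):
    """Return a list of error messages; an empty list means the table passed."""
    errors = []
    described = {f.name: f for f in fields}
    columns = set().union(*(row.keys() for row in rows))

    # Every column present in the data must have a metadata descriptor.
    for col in sorted(columns - set(described)):
        errors.append(f"Column '{col}' has no metadata descriptor")

    # Check each value against the type declared in its descriptor.
    for i, row in enumerate(rows, start=1):
        for name, field in described.items():
            value = row.get(name)
            if value is None:
                errors.append(f"Row {i}: missing value for '{name}'")
            elif field.field_type == "numeric" and not isinstance(value, (int, float)):
                errors.append(f"Row {i}: '{name}' is not numeric")
            elif field.field_type == "taxa" and value not in known_taxa:
                errors.append(f"Row {i}: unknown taxon '{value}'")
            elif field.field_type == "location" and value not in known_locations:
                errors.append(f"Row {i}: unknown location '{value}'")
    return errors


# Hypothetical example data: the second row has an undefined site and a non-numeric count.
fields = [
    FieldDescriptor("site", "location"),
    FieldDescriptor("species", "taxa"),
    FieldDescriptor("count", "numeric"),
]
rows = [
    {"site": "Plot_A", "species": "Pongo pygmaeus", "count": 3},
    {"site": "Plot_Z", "species": "Pongo pygmaeus", "count": "many"},
]
errors = validate_table(rows, fields, known_taxa={"Pongo pygmaeus"}, known_locations={"Plot_A", "Plot_B"})
for message in errors:
    print(message)
# Row 2: unknown location 'Plot_Z'
# Row 2: 'count' is not numeric
```

Collecting all errors rather than stopping at the first one mirrors the goal of giving data providers a complete report they can fix in a single pass.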
The wider safedata system also uses:
- The safedata_server web application, which implements a metadata server for published datasets and is used to maintain a searchable interface to the full metadata extracted from datasets.
- The safedata package, which provides data discovery and download for published datasets into the R statistical computing framework.
Code availability
The safedata_validator package is open source Python and is maintained on GitHub. It can be installed from PyPI. See the installation notes for setup instructions.