
Zenodo and metadata tools overview

The safedata_validator package also contains tools to publish validated datasets to the Zenodo data repository and to update the metadata server.

Data publication process

The process of publishing a dataset involves both the safedata_validate tool and the various subcommands of the safedata_zenodo tool. These commands make use of two different JSON metadata files.

Dataset metadata

The safedata_validate command generates a JSON file containing a standard description of the dataset metadata and of the data tables it contains. Some of this metadata is used to populate the Zenodo description of the published dataset files.

The dataset metadata is also used to populate the database of a metadata server. This is a separate website that provides the API for searching available data and forms the main data discovery backend for the safedata R package.

Zenodo deposit metadata

The Zenodo API returns JSON metadata that provides key details on a Zenodo deposit that is being prepared or published. It also contains the API links used to access the deposit and its files.
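The sketch below pulls a few fields out of deposit metadata to make the structure of this file more concrete. The field names (`id`, `conceptrecid`, `state`, `links`, `files`) follow the Zenodo REST API representation of a deposition. The sample record is hypothetical, and the file saved by safedata_zenodo may contain more content or arrange it differently.

```python
import json
from pathlib import Path


def load_deposit(zenodo_json: Path) -> dict:
    """Load a saved Zenodo deposit JSON file."""
    return json.loads(Path(zenodo_json).read_text())


def summarise_deposit(deposit: dict) -> dict:
    """Pick out a few key fields from Zenodo deposit metadata."""
    return {
        "id": deposit.get("id"),
        "concept_id": deposit.get("conceptrecid"),
        "state": deposit.get("state"),
        # Names of the API links, such as the 'bucket' link used for file uploads
        "links": sorted(deposit.get("links", {})),
        "files": [f.get("filename") for f in deposit.get("files", [])],
    }


# Hypothetical minimal deposit record, for demonstration only
sample = {
    "id": 1059375,
    "conceptrecid": "1059374",
    "state": "unsubmitted",
    "links": {
        "bucket": "https://zenodo.org/api/files/0000-example-bucket",
        "html": "https://zenodo.org/deposit/1059375",
    },
    "files": [{"filename": "Test_format_good.xlsx"}],
}
print(summarise_deposit(sample))
```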

The safedata_zenodo tool

Info

The subcommands of the safedata_zenodo tool require that the zenodo section of the resources configuration be completed. This is not required for simply validating datasets.

The safedata_zenodo command line tool provides the following subcommands which are used to publish data, post metadata and help maintain and document published datasets.

The top-level command line help for the tool, which lists the available subcommands, is shown below:

$ safedata_zenodo -h
usage: safedata_zenodo [-h] [-r RESOURCES] [-s] [-q]  ...

Publish validated datasets to Zenodo using a command line interface.

    This is the command line interface for publishing safedata validated
    datasets to Zenodo, downloading information and maintaining a local
    copy of the datasets in the file structure required by the R safedata
    package.

    The safedata_zenodo command is used by providing subcommands for the
    different actions required to publish a validated dataset. The list of
    subcommands (with aliases) is shown below and individual help is
    available for each of the subcommands:

        safedata_zenodo subcommand -h

    The subcommands for this tool use two different JSON format metadata
    files:

    * A dataset metadata file (`dataset_json`). This is the output from
        using the `safedata_validate` tool. Some of the information in this
        file is used to create the Zenodo dataset description, and all of
        the data is used to describe a dataset on the separate metadata
        server.

    * A Zenodo metadata file (`zenodo_json`), that describes the metadata
        associated with a Zenodo deposit or published record.

    Note that most of these actions are also available via the Zenodo website.

positional arguments:

    create_deposit
                Create a new Zenodo draft deposit
    discard_deposit
                Discard an unpublished deposit
    get_deposit
                Download and display deposit metadata
    publish_deposit
                Publish a draft deposit
    upload_file
                Upload a file to an unpublished deposit
    delete_file
                Delete a file from an unpublished deposit
    upload_metadata
                Populate the Zenodo metadata
    amend_metadata
                Update published Zenodo metadata
    sync_local_dir
                Create or update a local safedata directory
    maintain_ris
                Maintain a RIS bibliography file for datasets
    generate_html
                Generate an HTML dataset description
    generate_xml
                Create INSPIRE compliant metadata XML

options:
  -h, --help    show this help message and exit
  -r RESOURCES, --resources RESOURCES
                Path to a safedata_validator resource configuration file
  -s, --show-resources
                Validate and display the selected resources and exit
  -q, --quiet   Suppress normal information messages.
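This help output has global options such as `-r` and `-q` placed ahead of a subcommand. That layout is the standard pattern produced by Python's `argparse` subparsers. The sketch below reproduces the pattern for two of the subcommands. It illustrates the shape of the interface only: it is not the safedata_zenodo source, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the shape of the safedata_zenodo interface."""
    parser = argparse.ArgumentParser(prog="safedata_zenodo")
    # Global options come before the subcommand name
    parser.add_argument("-r", "--resources", help="Path to a resource configuration file")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress information messages")

    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    # Each subcommand has its own parser and its own positional arguments
    publish = subparsers.add_parser("publish_deposit", help="Publish a draft deposit")
    publish.add_argument("zenodo_json", help="Path to a Zenodo JSON file for a deposit")

    upload = subparsers.add_parser("upload_file", help="Upload a file to an unpublished deposit")
    upload.add_argument("zenodo_json")
    upload.add_argument("filepath")
    upload.add_argument("--zenodo_filename", default=None)

    return parser


args = build_parser().parse_args(["-q", "upload_file", "zenodo_1059375.json", "data.xlsx"])
print(args.subcommand, args.zenodo_json, args.filepath, args.quiet)
```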

Simple publication process

As an initial example, the process for publishing a simple dataset (without any external data files) would be:

# Validate the file, creating the Test_format_good.json metadata file
safedata_validate Test_format_good.xlsx

# Create a new deposit. This saves the Zenodo deposit metadata to a JSON file,
# for example zenodo_1059375.json
safedata_zenodo create_deposit

# Upload the file
safedata_zenodo upload_file zenodo_1059375.json Test_format_good.xlsx

# Populate the Zenodo deposit metadata from the dataset metadata
safedata_zenodo upload_metadata zenodo_1059375.json Test_format_good.json

# Publish the deposit
safedata_zenodo publish_deposit zenodo_1059375.json

# Post the metadata to the metadata server
safedata_zenodo post_metadata zenodo_1059375.json Test_format_good.json
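If you publish datasets regularly, you can script the steps that follow `create_deposit`. The sketch below builds the command sequence from a validated Excel file and a deposit JSON file, and only runs the commands when `dry_run` is false. It assumes, as in the example above, that the dataset JSON file is named after the Excel file. The function names are hypothetical; the subcommands and argument order are copied from the example.

```python
import subprocess
from pathlib import Path


def publication_commands(xlsx: str, zenodo_json: str) -> list:
    """Return the safedata_zenodo calls that follow create_deposit."""
    # Assumption: safedata_validate names the metadata file X.json for X.xlsx
    dataset_json = str(Path(xlsx).with_suffix(".json"))
    return [
        ["safedata_zenodo", "upload_file", zenodo_json, xlsx],
        ["safedata_zenodo", "upload_metadata", zenodo_json, dataset_json],
        ["safedata_zenodo", "publish_deposit", zenodo_json],
        ["safedata_zenodo", "post_metadata", zenodo_json, dataset_json],
    ]


def run_publication(xlsx: str, zenodo_json: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in publication_commands(xlsx, zenodo_json):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failure, so a failed
            # upload is never followed by the irreversible publish step
            subprocess.run(cmd, check=True)


run_publication("Test_format_good.xlsx", "zenodo_1059375.json")
```

Stopping on the first error matters here because `publish_deposit` cannot be undone.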

The safedata_zenodo subcommands

The command line help for each of the various subcommands is shown below:

The create_deposit subcommand

$ safedata_zenodo create_deposit -h
usage: safedata_zenodo create_deposit [-h] [-c CONCEPT_ID]

Create a new deposit draft. The concept_id option uses a provided Zenodo
concept ID to create a draft as a new version of an existing dataset.

When successful, the function downloads and saves a JSON file containing the
resulting Zenodo deposit metadata. This file is used as an input to other
subcommands that work with an existing deposit.

options:
  -h, --help    show this help message and exit
  -c CONCEPT_ID, --concept_id CONCEPT_ID
                A Zenodo concept ID
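For orientation, creating a brand new draft corresponds to a generic Zenodo REST API call: a `POST` to the `/api/deposit/depositions` endpoint with an empty JSON body and a bearer token. The sketch below builds that request but does not send it. The token is a placeholder, and it is an assumption that safedata_zenodo issues exactly this call. Creating a new version of an existing dataset uses Zenodo's separate `newversion` action instead.

```python
import json
import urllib.request

ZENODO_API = "https://zenodo.org/api"  # the Zenodo sandbox uses https://sandbox.zenodo.org/api


def new_deposit_request(token: str, api: str = ZENODO_API) -> urllib.request.Request:
    """Build (but do not send) a request that would create an empty draft deposit."""
    return urllib.request.Request(
        url=f"{api}/deposit/depositions",
        data=json.dumps({}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = new_deposit_request(token="<your-zenodo-token>")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return the new deposit's metadata as JSON. That response is comparable to the deposit JSON file this subcommand saves.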

The discard_deposit subcommand

$ safedata_zenodo discard_deposit -h
usage: safedata_zenodo discard_deposit [-h] zenodo_json

Discard an unpublished deposit. The deposit and all uploaded files will be
removed from Zenodo.

positional arguments:
  zenodo_json  Path to a Zenodo JSON file for a deposit

options:
  -h, --help   show this help message and exit
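The deposit JSON file is what allows this subcommand to find the draft again. In the Zenodo REST API, a draft is discarded with a `POST` to its `discard` action, and Zenodo normally lists that URL in the `links` section of the deposit metadata. The sketch below looks the link up, falls back to the endpoint pattern built from the deposit ID, and builds the request without sending it. The helper name and token are hypothetical.

```python
import urllib.request

ZENODO_API = "https://zenodo.org/api"


def discard_request(deposit: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that discards an unpublished deposit."""
    links = deposit.get("links", {})
    if "discard" in links:
        # Zenodo lists the action URLs for a draft in its 'links' section
        url = links["discard"]
    else:
        # Fall back to the endpoint pattern for the deposit ID
        url = f"{ZENODO_API}/deposit/depositions/{deposit['id']}/actions/discard"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="POST"
    )


req = discard_request({"id": 1059375}, token="<your-zenodo-token>")
print(req.get_method(), req.full_url)
```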

The get_deposit subcommand

$ safedata_zenodo get_deposit -h
usage: safedata_zenodo get_deposit [-h] zenodo_id

Download the Zenodo metadata for a deposit and print out summary information.

positional arguments:
  zenodo_id   An ID for an existing Zenodo deposit

options:
  -h, --help  show this help message and exit
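A summary like this typically involves two steps: fetch the deposit from the Zenodo endpoint `GET /api/deposit/depositions/{id}`, then print a few of its fields. The sketch below builds the URL and formats a record that has already been downloaded. The field names follow the Zenodo deposition representation. The output layout is an assumption, not the tool's actual format.

```python
ZENODO_API = "https://zenodo.org/api"


def deposit_url(zenodo_id: int, api: str = ZENODO_API) -> str:
    """URL that returns the metadata for one deposit (requires a token)."""
    return f"{api}/deposit/depositions/{zenodo_id}"


def format_summary(deposit: dict) -> str:
    """Render a short plain-text summary of Zenodo deposit metadata."""
    lines = [
        f"ID:    {deposit.get('id')}",
        f"Title: {deposit.get('metadata', {}).get('title', '<untitled>')}",
        f"State: {deposit.get('state')}",
        "Files:",
    ]
    for f in deposit.get("files", []):
        # Zenodo reports file sizes in bytes
        lines.append(f"  {f.get('filename')} ({f.get('filesize', 0)} bytes)")
    return "\n".join(lines)


print(deposit_url(1059375))
print(format_summary({
    "id": 1059375,
    "state": "done",
    "metadata": {"title": "Example dataset"},
    "files": [{"filename": "data.xlsx", "filesize": 20480}],
}))
```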

The publish_deposit subcommand

$ safedata_zenodo publish_deposit -h
usage: safedata_zenodo publish_deposit [-h] zenodo_json

Publishes a Zenodo deposit. This is the final step in publishing a dataset and
is not reversible. Once a dataset is published, the DOI associated with the
record is published to Datacite.

It may be worth reviewing the deposit webpage (https://zenodo.org/deposit/###)
before finally publishing.

positional arguments:
  zenodo_json  Path to a Zenodo JSON file for a deposit

options:
  -h, --help   show this help message and exit
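Publishing cannot be undone, so a wrapper script might ask for explicit confirmation before calling this subcommand. The sketch below shows one way to do that. `confirm_publish` is a hypothetical helper, and the deposit URL pattern is the one given in the help text above. A real script would read the reply with `input()`; here it is fixed so that the example runs unattended.

```python
def confirm_publish(zenodo_id: int, answer: str) -> bool:
    """Return True only if the user typed the deposit ID back exactly."""
    # Requiring the ID, rather than just 'y', makes accidental confirmation unlikely
    return answer.strip() == str(zenodo_id)


zenodo_id = 1059375
print(f"Review https://zenodo.org/deposit/{zenodo_id} before publishing.")
reply = "1059375"  # in practice: input(f"Type {zenodo_id} to publish: ")
if confirm_publish(zenodo_id, reply):
    print(f"Confirmed: safedata_zenodo publish_deposit zenodo_{zenodo_id}.json")
else:
    print("Not published.")
```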

The upload_file subcommand

$ safedata_zenodo upload_file -h
usage: safedata_zenodo upload_file [-h] [--zenodo_filename ZENODO_FILENAME]
                                   zenodo_json filepath

Uploads the contents of a specified file to an _unpublished_ Zenodo deposit,
optionally using an alternative filename. If you upload a new file to the same
filename, it will replace the existing uploaded file.

positional arguments:
  zenodo_json   Path to a Zenodo JSON file for a deposit
  filepath      The path to the file to be uploaded

options:
  -h, --help    show this help message and exit
  --zenodo_filename ZENODO_FILENAME
                An optional alternative file name to be used on Zenodo
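For reference, Zenodo's files API uploads a file with an HTTP `PUT` of the raw bytes to the deposit's `bucket` link, with the target filename appended to the URL. A second `PUT` to the same name overwrites the first, which matches the replacement behaviour described above. The sketch below builds such a request without sending it. It is an assumption that safedata_zenodo uses exactly this route, and the bucket URL and token are placeholders.

```python
import urllib.parse
import urllib.request
from pathlib import PurePath
from typing import Optional


def upload_request(bucket_url: str, local_path: str, data: bytes, token: str,
                   zenodo_filename: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT request uploading data into a deposit bucket."""
    # Mirror --zenodo_filename: use the alternative name if given, else the local name
    name = zenodo_filename or PurePath(local_path).name
    return urllib.request.Request(
        url=f"{bucket_url}/{urllib.parse.quote(name)}",
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
        method="PUT",
    )


# Hypothetical bucket link, as found under links["bucket"] in the deposit JSON
bucket = "https://zenodo.org/api/files/0000-example-bucket"
req = upload_request(bucket, "data/Test_format_good.xlsx", b"<file bytes>", "<your-zenodo-token>")
print(req.get_method(), req.full_url)
```

In real use, `data` would be `Path(local_path).read_bytes()`. Very large files would be better streamed than read into memory.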

The delete_file subcommand

$ safedata_zenodo delete_file -h
usage: safedata_zenodo delete_file [-h] zenodo_json filename

Delete an uploaded file from an unpublished deposit. The deposit metadata will
be re-downloaded to ensure an up to date list of files in the deposit.

positional arguments:
  zenodo_json  Path to a Zenodo JSON file for a deposit
  filename     The name of the file to delete

options:
  -h, --help   show this help message and exit
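In the Zenodo REST API, a file is deleted by its file ID rather than its name, using `DELETE /api/deposit/depositions/{id}/files/{file_id}`. The sketch below maps the filename to that ID using the `files` list in the deposit metadata and builds the request without sending it. Because the local file list goes stale once a file is deleted, refreshing the metadata afterwards makes sense. The helper name and token are hypothetical.

```python
import urllib.request

ZENODO_API = "https://zenodo.org/api"


def delete_file_request(deposit: dict, filename: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a named file in a deposit."""
    matches = [f for f in deposit.get("files", []) if f.get("filename") == filename]
    if not matches:
        raise ValueError(f"{filename!r} is not in deposit {deposit.get('id')}")
    # Zenodo addresses deposit files by ID, so translate the name first
    file_id = matches[0]["id"]
    return urllib.request.Request(
        f"{ZENODO_API}/deposit/depositions/{deposit['id']}/files/{file_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


deposit = {"id": 1059375, "files": [{"id": "0000-file-id", "filename": "Test_format_good.xlsx"}]}
req = delete_file_request(deposit, "Test_format_good.xlsx", "<your-zenodo-token>")
print(req.get_method(), req.full_url)
```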

The upload_metadata subcommand

$ safedata_zenodo upload_metadata -h
usage: safedata_zenodo upload_metadata [-h] zenodo_json dataset_json

Uses the dataset metadata created using `safedata_validate` to populate the
required Zenodo metadata for an unpublished deposit.

positional arguments:
  zenodo_json   Path to a Zenodo JSON file for a deposit
  dataset_json  Path to a JSON metadata file for a dataset

options:
  -h, --help    show this help message and exit
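Conceptually, this step maps fields from the dataset metadata onto Zenodo's deposit metadata. On the Zenodo side, the keys used below (`upload_type`, `title`, `description`, `creators`, `keywords`, `access_right`) are real Zenodo deposit fields, wrapped in a `metadata` object as the API expects. On the dataset side, the input keys (`title`, `description`, `authors`, `keywords`, `access`) are hypothetical stand-ins for the safedata_validate output, whose exact structure is not shown here.

```python
import json


def to_zenodo_metadata(dataset: dict) -> dict:
    """Map (hypothetical) dataset metadata keys onto Zenodo deposit metadata."""
    creators = []
    for author in dataset.get("authors", []):
        creator = {"name": author["name"]}  # Zenodo expects 'Family, Given'
        # Only include optional fields that actually have a value
        for key in ("affiliation", "orcid"):
            if author.get(key):
                creator[key] = author[key]
        creators.append(creator)

    return {
        "metadata": {
            "upload_type": "dataset",
            "title": dataset["title"],
            "description": dataset.get("description", ""),
            "creators": creators,
            "keywords": dataset.get("keywords", []),
            "access_right": dataset.get("access", "open"),
        }
    }


dataset = {  # hypothetical dataset metadata
    "title": "Example dataset",
    "description": "A short description.",
    "authors": [{"name": "Smith, Jo", "affiliation": "Example University"}],
    "keywords": ["ecology"],
}
print(json.dumps(to_zenodo_metadata(dataset), indent=2))
```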

The amend_metadata subcommand

$ safedata_zenodo amend_metadata -h
usage: safedata_zenodo amend_metadata [-h] zenodo_json_update

Updates the Zenodo metadata for a published deposit. To use this, make sure
you have the most recent Zenodo metadata for the deposit and then edit the
JSON file to the new values. You can only edit the contents of the
`metadata` section.

Caution: this command should only be used to make urgent changes - such as
access restrictions. It is also easy to submit invalid metadata!

positional arguments:
  zenodo_json_update
                Path to an updated Zenodo metadata file

options:
  -h, --help    show this help message and exit
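Before running this command, a cheap safeguard is to compare the edited file with the freshly downloaded one and confirm that nothing outside `metadata` has changed. The sketch below performs that comparison. It does not check that the new metadata values are valid for Zenodo, and the helper names are hypothetical.

```python
import json


def changed_sections(original: dict, edited: dict) -> set:
    """Return the top-level keys whose values differ between two records."""
    keys = set(original) | set(edited)
    return {k for k in keys if original.get(k) != edited.get(k)}


def check_amendment(original: dict, edited: dict) -> None:
    """Raise ValueError if anything outside the 'metadata' section was edited."""
    outside = changed_sections(original, edited) - {"metadata"}
    if outside:
        raise ValueError(f"Only 'metadata' may be edited; also changed: {sorted(outside)}")


original = {"id": 1, "metadata": {"title": "Old", "access_right": "open"}}
edited = json.loads(json.dumps(original))  # deep copy via a JSON round trip
edited["metadata"]["access_right"] = "closed"
check_amendment(original, edited)  # passes: only 'metadata' changed
print(changed_sections(original, edited))
```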

The sync_local_dir subcommand

$ safedata_zenodo sync_local_dir -h
usage: safedata_zenodo sync_local_dir [-h] [--not-just-xlsx]
                                      [--replace-modified]
                                      datadir

Synchronize a local data directory

This subcommand allows a safedata developer or community maintainer to
create or update a local data directory with _all_ of the resources in the Zenodo
community, regardless of their public access status. This forms a backup
(although Zenodo is heavily backed up) but also provides local copies of the
files for testing and development of the code packages.

The file structure of the directory follows that used by the safedata R
package, used to store metadata and files downloaded from a safedata
community on Zenodo and from a safedata metadata server. The
`safedata_validator` configuration file will need to include the metadata
API.

By default, only the XLSX files containing metadata and data tables are
downloaded, ignoring any additional files, which are often large.

positional arguments:
  datadir       The path to a local directory containing an existing safedata
                directory or an empty folder in which to create one

options:
  -h, --help    show this help message and exit
  --not-just-xlsx
                Should large non-xlsx files also be downloaded.
  --replace-modified
                Should locally modified files be overwritten with the archive
                version
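The two options above come down to a per-file decision. The sketch below shows one way to make it. By default, only missing XLSX files are selected. `not_just_xlsx` also admits other files. `replace_modified` re-fetches local copies whose MD5 checksum differs from the archive. The `filename` and `checksum` keys follow Zenodo file records, where checksums may carry an `md5:` prefix. The flat directory and the function names are simplifying assumptions; the real tool follows the safedata R package layout.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path) -> str:
    """Compute the MD5 hex digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_download(remote_files: list, local_dir: Path,
                      not_just_xlsx: bool = False,
                      replace_modified: bool = False) -> list:
    """Choose which remote files a sync should fetch."""
    selected = []
    for rf in remote_files:
        name = rf["filename"]
        if not not_just_xlsx and not name.lower().endswith(".xlsx"):
            continue  # skip (often large) non-XLSX files by default
        local = Path(local_dir) / name
        if not local.exists():
            selected.append(name)
            continue
        # Strip an optional 'md5:' prefix from the archive checksum
        remote_md5 = rf["checksum"].split(":", 1)[-1]
        if md5sum(local) != remote_md5 and replace_modified:
            selected.append(name)  # overwrite a locally modified copy
    return selected


remote = [  # hypothetical Zenodo file entries
    {"filename": "Test_format_good.xlsx", "checksum": "md5:d41d8cd98f00b204e9800998ecf8427e"},
    {"filename": "large_raster.tif", "checksum": "md5:d41d8cd98f00b204e9800998ecf8427e"},
]
print(files_to_download(remote, Path("safedata_dir")))
```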

The maintain_ris subcommand

$ safedata_zenodo maintain_ris -h
usage: safedata_zenodo maintain_ris [-h] ris_file

This command maintains a RIS format bibliography file of the datasets
uploaded to a Zenodo community. It can update an existing RIS format file
to add new records or it can create the file from scratch.

The program uses both the Zenodo API (to find the records in the community)
and the Datacite API to access machine readable bibliographic records.

positional arguments:
  ris_file    The file path to populate with RIS records. If this file already
              exists, it is assumed to be a RIS file to update with any new
              records not already included in the file.

options:
  -h, --help  show this help message and exit
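RIS is a line-based format. Each line holds a two-letter tag, two spaces, a hyphen and a space. A record starts with `TY` (here `DATA` for a dataset) and ends with `ER`. The sketch below shows one way to add only new records, by collecting the DOIs (`DO` tags) already in the file. In the real tool the record details come from the Zenodo and DataCite APIs. Here they are plain dicts supplied by the caller, and both the helper names and the example DOI are hypothetical.

```python
from pathlib import Path


def existing_dois(ris_text: str) -> set:
    """Collect the DOIs already recorded in RIS text (DO tags), lower-cased."""
    return {
        line[6:].strip().lower()
        for line in ris_text.splitlines()
        if line.startswith("DO  - ")
    }


def ris_record(doi: str, title: str, authors: list, year: int) -> str:
    """Format one dataset as a RIS record."""
    lines = ["TY  - DATA", f"TI  - {title}"]
    lines += [f"AU  - {a}" for a in authors]
    lines += [f"PY  - {year}", f"DO  - {doi}", "ER  - "]
    return "\n".join(lines) + "\n"


def update_ris(ris_file: Path, records: list) -> int:
    """Append records whose DOI is not already in ris_file; return the count added."""
    text = ris_file.read_text() if ris_file.exists() else ""
    seen = existing_dois(text)
    new = [r for r in records if r["doi"].lower() not in seen]
    with open(ris_file, "a", encoding="utf-8") as handle:
        for r in new:
            handle.write(ris_record(r["doi"], r["title"], r["authors"], r["year"]) + "\n")
    return len(new)


print(ris_record("10.5281/zenodo.0000000", "Example dataset", ["Smith, Jo"], 2020))
```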

The generate_html subcommand

$ safedata_zenodo generate_html -h
usage: safedata_zenodo generate_html [-h] zenodo_json dataset_json html_out

Generates an HTML file containing a standard description of a dataset from the
JSON metadata. Usually this will be generated and uploaded as part of the
dataset publication process, but this subcommand can be used for local
checking of the resulting HTML.

positional arguments:
  zenodo_json   Path to a Zenodo JSON file for a deposit
  dataset_json  Path to a JSON metadata file for a dataset
  html_out      Output path for the HTML file

options:
  -h, --help    show this help message and exit
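The most important detail when turning metadata into HTML is to escape free-text fields such as titles and descriptions, so that characters like `&` and `<` render correctly. The sketch below builds a minimal description with `html.escape`. Its fields and layout are hypothetical and much simpler than the page the tool generates.

```python
import html


def dataset_html(title: str, authors: list, description: str, tables: list) -> str:
    """Render a minimal HTML description of a dataset, escaping all text."""
    parts = [
        f"<h2>{html.escape(title)}</h2>",
        f"<p><b>Authors:</b> {html.escape('; '.join(authors))}</p>",
        f"<p>{html.escape(description)}</p>",
        "<ul>",
    ]
    # One list item per data table (e.g. worksheet names)
    parts += [f"<li>{html.escape(t)}</li>" for t in tables]
    parts.append("</ul>")
    return "\n".join(parts)


print(dataset_html("Ants & termites", ["Smith, Jo"], "Counts for plots <10 m apart.", ["Counts"]))
```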

The generate_xml subcommand


$ safedata_zenodo generate_xml -h
usage: safedata_zenodo generate_xml [-h] [-l LINEAGE_STATEMENT]
                                    zenodo_json dataset_json xml_out

Creates an INSPIRE compliant XML metadata file for a published dataset,
optionally including a user provided lineage statement (such as project
details).

positional arguments:
  zenodo_json   Path to a Zenodo JSON file for a deposit
  dataset_json  Path to a JSON metadata file for a dataset
  xml_out       Output path for the XML file

options:
  -h, --help    show this help message and exit
  -l LINEAGE_STATEMENT, --lineage-statement LINEAGE_STATEMENT
                Path to a text file containing a lineage statement
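In ISO 19139 XML, which is the encoding used by INSPIRE metadata, a lineage statement sits at `gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:statement`. The sketch below builds only that fragment with `xml.etree.ElementTree`, to show where the text from `--lineage-statement` might end up. A complete INSPIRE record needs many more elements (for example a data quality `scope`), and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def lineage_element(statement: str) -> ET.Element:
    """Build the ISO 19139 data quality block that carries a lineage statement."""
    dq_info = ET.Element(f"{{{GMD}}}dataQualityInfo")
    dq = ET.SubElement(dq_info, f"{{{GMD}}}DQ_DataQuality")
    lineage = ET.SubElement(ET.SubElement(dq, f"{{{GMD}}}lineage"), f"{{{GMD}}}LI_Lineage")
    # The statement text is wrapped in a gco:CharacterString element
    text = ET.SubElement(ET.SubElement(lineage, f"{{{GMD}}}statement"), f"{{{GCO}}}CharacterString")
    text.text = statement.strip()
    return dq_info


# In use, the statement would be read from the file given to --lineage-statement
statement = "Data collected as part of a hypothetical field project."
print(ET.tostring(lineage_element(statement), encoding="unicode"))
```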