Publishing data on Zenodo

Publishing a dataset to the Zenodo data repository requires a validated dataset and the subcommands of the safedata_zenodo tool. These subcommands use two different JSON metadata files.

Dataset metadata

The safedata_validate command generates a JSON file containing a standard description of the dataset metadata and of the data tables that the dataset contains. Some of this metadata is used to populate the Zenodo description of the published dataset files.

The dataset metadata is also used to populate the database of a metadata server. This is a separate website that provides the API for searching available data and forms the main data discovery backend for the safedata R package.

Zenodo deposit metadata

The Zenodo API returns JSON metadata giving key details of a Zenodo deposit that is being prepared or has been published. This metadata includes key API links that are used to provide file details. The file is generated when a new deposit is created and is then used to carry out the other publication steps.
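As a rough illustration of how a saved deposit metadata file might be inspected, the sketch below reads the JSON and pulls out a few fields. The key names (`id`, `state`, `submitted`, `links`) are assumptions about the general shape of a Zenodo deposit response and are not taken from this page, so check them against a real file before relying on them. The function name `summarise_deposit` is hypothetical.

```python
import json
from pathlib import Path


def summarise_deposit(zenodo_json_path):
    """Return a small summary of a saved Zenodo deposit metadata file.

    The keys read here are assumptions about the file layout. Missing
    keys are reported as None (or an empty list) rather than raising.
    """
    deposit = json.loads(Path(zenodo_json_path).read_text(encoding="utf-8"))
    links = deposit.get("links", {})
    return {
        "id": deposit.get("id"),
        "state": deposit.get("state"),
        "submitted": deposit.get("submitted"),
        # Only the names of the API links are listed, not the URLs.
        "link_names": sorted(links),
    }
```

Using `.get` throughout keeps the sketch tolerant of files whose layout differs from the assumed one.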

The safedata_zenodo tool

Info

The subcommands of the safedata_zenodo tool require that the zenodo section of the resources configuration be completed. This is not required for simply validating datasets.

The safedata_zenodo command line tool provides subcommands that are used to publish data, post metadata, and help maintain and document published datasets.

The top level command line help for the tool, showing the available subcommands, is shown below:

cl_prompt $ safedata_zenodo -h
usage: safedata_zenodo [-h] [-r RESOURCES] [-s] [-q]  ...

Publish validated datasets to Zenodo using a command line interface.

    This is a the command line interface for publishing safedata validated
    datasets to Zenodo, downloading information and maintaining a local
    copy of the datasets in the file structure required by the R safedata
    package.

    The safedata_zenodo command is used by providing subcommands for the
    different actions required to publish a validated dataset. The list of
    subcommands is shown below and individual help is available for each
    of the subcommands:

        safedata_zenodo subcommand -h

    The subcommands for this tool use two different JSON format metadata
    files:

    * A dataset metadata file (`dataset_json`). This is the output from
        using the `safedata_validate` tool. Some of the information in this
        file is used to create the Zenodo dataset description, and all of
        the data is used to describe a dataset on the separate metadata
        server.

    * A Zenodo metadata file (`zenodo_json`), that describes the metadata
        associated with a Zenodo deposit or published record.

    The subcommands that send and receive data from Zenodo also accept the
    `--live` and `--sandbox` options which can be used to override the
    `use_sandbox` setting in the configuration file. If the configuration is set
    to `true` then `--live` will use the live site and if the configuration is set
    to `false` then the `--sandbox` can be used to use the sandbox instead.

    Note that most of these actions are also available via the Zenodo website.

positional arguments:

    create_deposit
                Create a new Zenodo draft deposit
    discard_deposit
                Discard an unpublished deposit
    get_deposit
                Download and display deposit metadata
    publish_deposit
                Publish a draft deposit
    upload_files
                Upload a file to an unpublished deposit
    delete_files
                Delete files from an unpublished deposit
    upload_metadata
                Populate the Zenodo metadata
    sync_local_dir
                Create or update a local safedata directory
    maintain_ris
                Maintain a RIS bibliography file for datasets
    generate_html
                Generate an HTML dataset description
    generate_xml
                Create INSPIRE compliant metadata XML
    publish_dataset
                Publish a validated dataset

options:
  -h, --help    show this help message and exit
  -r RESOURCES, --resources RESOURCES
                Path to a safedata_validator resource configuration file
  -s, --show-resources
                Validate and display the selected resources and exit
  -q, --quiet   Suppress normal information messages.
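The interaction between the `use_sandbox` configuration setting and the `--live` and `--sandbox` options described in the help text can be summarised as a small decision function. This is only a sketch of the documented behaviour, not the tool's own code, and the function name `resolve_use_sandbox` is hypothetical.

```python
def resolve_use_sandbox(config_use_sandbox, live=False, sandbox=False):
    """Decide whether the Zenodo sandbox site would be used.

    config_use_sandbox mirrors the `use_sandbox` configuration setting.
    live and sandbox mirror the mutually exclusive command line options.
    """
    if live and sandbox:
        raise ValueError("--live and --sandbox are mutually exclusive")
    if live:
        # --live overrides a configuration that is set to use the sandbox.
        return False
    if sandbox:
        # --sandbox overrides a configuration that is set to use the live site.
        return True
    # With neither option, the configuration file decides.
    return config_use_sandbox
```

For example, `resolve_use_sandbox(True, live=True)` returns `False`: the configuration points at the sandbox, but the live site is used for that one command.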

The safedata_zenodo subcommands

The command line help for each of the various subcommands is shown below:

The publish_dataset subcommand

cl_prompt $ safedata_zenodo publish_dataset -h
usage: safedata_zenodo publish_dataset [-h] [--live | --sandbox]
                                       [-n NEW_VERSION] [--no-xml]
                                       [-e [EXTERNAL_FILES ...]]
                                       dataset_json dataset

This subcommand runs through the complete publication process for a validated
dataset and any external files. It does not provide all of the options of the
subcommands for individual steps but covers the main common usage of the
safedata_zenodo command. If the publication process fails, the resulting partial
deposit is discarded. All of the files in the deposit must be replaced.

positional arguments:
  dataset_json  Path to a JSON metadata file for a dataset
  dataset       Path to the Excel file to be published

options:
  -h, --help    show this help message and exit
  --live        Use the Zenodo live site, overriding the configuration file
  --sandbox     Use the Zenodo sandbox site, overriding the configuration file
  -n NEW_VERSION, --new-version NEW_VERSION
                Create a new version of the dataset with the provided Zenodo
                ID.
  --no-xml      Do not include metadata XML in the published record
  -e [EXTERNAL_FILES ...], --external-files [EXTERNAL_FILES ...]
                A set of external files documented in the dataset to be
                included.
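When publication is scripted, the `publish_dataset` command line can be assembled as an argument list. The sketch below builds that list from the options shown in the help above without running anything. The helper name `publish_dataset_args`, its `site` parameter and the file names in the usage note are hypothetical.

```python
def publish_dataset_args(dataset_json, dataset, external_files=(),
                         new_version=None, no_xml=False, site=None):
    """Build an argument list for `safedata_zenodo publish_dataset`.

    site may be "live", "sandbox" or None (use the configuration file).
    """
    if site not in (None, "live", "sandbox"):
        raise ValueError("site must be 'live', 'sandbox' or None")
    args = ["safedata_zenodo", "publish_dataset"]
    if site is not None:
        args.append(f"--{site}")
    if new_version is not None:
        args += ["--new-version", str(new_version)]
    if no_xml:
        args.append("--no-xml")
    # Positional arguments come before --external-files, because that
    # option accepts a variable number of values.
    args += [str(dataset_json), str(dataset)]
    if external_files:
        args += ["--external-files", *(str(f) for f in external_files)]
    return args
```

A list such as `publish_dataset_args("Example.json", "Example.xlsx", site="sandbox")` could then be passed to `subprocess.run`. Trying the sandbox first is a cautious choice, since publication on the live site cannot be reversed.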

The create_deposit subcommand

cl_prompt $ safedata_zenodo create_deposit -h
usage: safedata_zenodo create_deposit [-h] [--live | --sandbox]
                                      [-n NEW_VERSION] [-i]

Create a new deposit draft. 

The new version option takes the record ID of the most recent version of an
existing dataset and creates a new deposit as a new version of that dataset. 
Versions of datasets are grouped under a single concept ID, which always 
redirects to the most recent version. Use the most recent version ID and 
_not_ the concept ID here.

When successful, the function downloads and saves a JSON file containing the
resulting Zenodo deposit metadata. This file is used as an input to other
subcommands that work with an existing deposit.

The --id-to-stdout option can be provided to explicitly return the new
deposit ID to stdout, where it can be captured for use in shell scripts.
All other logging is written to stderr.

options:
  -h, --help    show this help message and exit
  --live        Use the Zenodo live site, overriding the configuration file
  --sandbox     Use the Zenodo sandbox site, overriding the configuration file
  -n NEW_VERSION, --new-version NEW_VERSION
                Create a new version of the dataset with the provided Zenodo
                ID.
  -i, --id-to-stdout
                Write the deposit record ID to stdout.
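The `--id-to-stdout` option is described in terms of shell scripts, but the same idea carries over to other scripting languages. The sketch below captures the ID from Python, assuming that stdout contains only the deposit ID when the option is given, as the help text suggests. The wrapper name `create_deposit` and its `runner` parameter are hypothetical. The `runner` parameter exists so the function can be exercised without contacting Zenodo.

```python
import subprocess


def create_deposit(new_version=None, sandbox=True, runner=subprocess.run):
    """Create a draft deposit and return its ID as an integer.

    If sandbox is False, no site option is passed and the configuration
    file decides which site is used.
    """
    args = ["safedata_zenodo", "create_deposit", "--id-to-stdout"]
    if sandbox:
        args.append("--sandbox")
    if new_version is not None:
        # This must be the most recent version ID, not the concept ID.
        args += ["--new-version", str(new_version)]
    # Logging goes to stderr, so stdout is assumed to hold only the ID.
    result = runner(args, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

With `check=True`, a failed command raises `subprocess.CalledProcessError` instead of returning an unusable ID.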

The discard_deposit subcommand

cl_prompt $ safedata_zenodo discard_deposit -h
usage: safedata_zenodo discard_deposit [-h] [--live | --sandbox] zenodo_json

Discard an unpublished deposit. The deposit and all uploaded files will be
removed from Zenodo.

positional arguments:
  zenodo_json  Path to a Zenodo JSON file for a deposit

options:
  -h, --help   show this help message and exit
  --live       Use the Zenodo live site, overriding the configuration file
  --sandbox    Use the Zenodo sandbox site, overriding the configuration file

The get_deposit subcommand

cl_prompt $ safedata_zenodo get_deposit -h
usage: safedata_zenodo get_deposit [-h] [--live | --sandbox] zenodo_id

Download the Zenodo metadata for a deposit and print out summary information.

positional arguments:
  zenodo_id   An ID for an existing Zenodo deposit

options:
  -h, --help  show this help message and exit
  --live      Use the Zenodo live site, overriding the configuration file
  --sandbox   Use the Zenodo sandbox site, overriding the configuration file

The publish_deposit subcommand

cl_prompt $ safedata_zenodo publish_deposit -h
usage: safedata_zenodo publish_deposit [-h] [--live | --sandbox] zenodo_json

Publishes a Zenodo deposit. This is the final step in publishing a dataset and
is not reversible. Once a dataset is published, the DOI associated with the
record is published to Datacite.

It may be worth reviewing the deposit webpage (https://zenodo.org/deposit/###)
before finally publishing.

positional arguments:
  zenodo_json  Path to a Zenodo JSON file for a deposit

options:
  -h, --help   show this help message and exit
  --live       Use the Zenodo live site, overriding the configuration file
  --sandbox    Use the Zenodo sandbox site, overriding the configuration file

The upload_files subcommand

cl_prompt $ safedata_zenodo upload_files -h
usage: safedata_zenodo upload_files [-h] [--live | --sandbox]
                                    zenodo_json [filepaths ...]

Uploads a set of files to an _unpublished_ Zenodo deposit. If you upload a new file
to the same filename, it will replace the existing uploaded file.

positional arguments:
  zenodo_json  Path to a Zenodo JSON file for a deposit
  filepaths    The paths to the file to be uploaded

options:
  -h, --help   show this help message and exit
  --live       Use the Zenodo live site, overriding the configuration file
  --sandbox    Use the Zenodo sandbox site, overriding the configuration file

The delete_files subcommand

cl_prompt $ safedata_zenodo delete_files -h
usage: safedata_zenodo delete_files [-h] [--live | --sandbox]
                                    zenodo_json [filenames ...]

Delete an list of uploaded file from an unpublished deposit. The deposit metadata
will be re-downloaded to ensure an up to date list of files in the deposit.

positional arguments:
  zenodo_json  Path to a Zenodo JSON file for a deposit
  filenames    The names of files to delete

options:
  -h, --help   show this help message and exit
  --live       Use the Zenodo live site, overriding the configuration file
  --sandbox    Use the Zenodo sandbox site, overriding the configuration file

The upload_metadata subcommand

cl_prompt $ safedata_zenodo upload_metadata -h
usage: safedata_zenodo upload_metadata [-h] [--live | --sandbox]
                                       zenodo_json dataset_json

Uses the dataset metadata created using `safedata_validate` to populate the
required Zenodo metadata for an unpublished deposit.

positional arguments:
  zenodo_json   Path to a Zenodo JSON file for a deposit
  dataset_json  Path to a JSON metadata file for a dataset

options:
  -h, --help    show this help message and exit
  --live        Use the Zenodo live site, overriding the configuration file
  --sandbox     Use the Zenodo sandbox site, overriding the configuration file
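The individual deposit subcommands documented above can be chained to publish step by step rather than through `publish_dataset`. The sketch below lists one possible sequence for a draft deposit that already exists. Running `upload_files` before `upload_metadata` is an assumption, whereas `publish_deposit` is documented as the final step. The helper name `stepwise_publication` and the file names in the usage note are hypothetical.

```python
def stepwise_publication(zenodo_json, dataset_json, filepaths):
    """Return argument lists, in order, for publishing to a draft deposit.

    zenodo_json is the deposit metadata file saved by create_deposit and
    dataset_json is the metadata file produced by safedata_validate.
    """
    tool = "safedata_zenodo"
    return [
        # Upload the data files to the unpublished deposit.
        [tool, "upload_files", zenodo_json, *filepaths],
        # Populate the Zenodo metadata from the dataset metadata.
        [tool, "upload_metadata", zenodo_json, dataset_json],
        # Final step: publication is not reversible.
        [tool, "publish_deposit", zenodo_json],
    ]
```

Because the last step cannot be undone, a script might run only `stepwise_publication("deposit.json", "Example.json", ["Example.xlsx"])[:-1]` and leave `publish_deposit` until the deposit webpage has been reviewed.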

The sync_local_dir subcommand

cl_prompt $ safedata_zenodo sync_local_dir -h
usage: safedata_zenodo sync_local_dir [-h] [--not-just-xlsx]
                                      [--replace-modified] [--dry-run]
                                      datadir

Synchronize a local data directory

This subcommand allows a safedata developer or community maintainer to
create or update such a directory with _all_ of the resources in the Zenodo
community, regardless of their public access status. This forms a backup
(although Zenodo is heavily backed up) but also provides local copies of the
files for testing and development of the code packages.

The file structure of the directory follows that used by the safedata R
package, used to store metadata and files downloaded from a safedata
community on Zenodo and from a safedata metadata server. The
`safedata_validator` configuration file will need to include the metadata
API.

By default, only the XLSX files containing metadata and data tables are
downloaded, ignoring any additional files, which are often large.

positional arguments:
  datadir       The path to a local directory containing an existing safedata
                directory or an empty folder in which to create one

options:
  -h, --help    show this help message and exit
  --not-just-xlsx
                Should large non-xlsx files also be downloaded.
  --replace-modified
                Should locally modified files be overwritten with the archive
                version
  --dry-run     Run the synchronisation process without altering the local
                directory.
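Since `--replace-modified` can overwrite local files, it may be worth previewing a synchronisation with `--dry-run` first. The sketch below builds the argument list with the dry run switched on by default. The helper name `sync_local_dir_args` and its parameter names are hypothetical, but the options are those shown in the help above.

```python
def sync_local_dir_args(datadir, dry_run=True, all_files=False,
                        replace_modified=False):
    """Build an argument list for `safedata_zenodo sync_local_dir`.

    dry_run defaults to True so that the first run reports what would
    happen without altering the local directory.
    """
    args = ["safedata_zenodo", "sync_local_dir"]
    if all_files:
        # Also download the non-xlsx files, which are often large.
        args.append("--not-just-xlsx")
    if replace_modified:
        args.append("--replace-modified")
    if dry_run:
        args.append("--dry-run")
    args.append(str(datadir))
    return args
```

Calling the helper again with `dry_run=False` gives the command for the real synchronisation once the preview looks right.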

The maintain_ris subcommand

cl_prompt $ safedata_zenodo maintain_ris -h
usage: safedata_zenodo maintain_ris [-h] ris_file

This command maintains a RIS format bibliography file of the datasets
uploaded to a Zenodo community. It can update an existing RIS format file
to add new records or it can create the file from scratch.

The program uses both the Zenodo API (to find the records in the community)
and the Datacite API to access machine readable bibliographic records.

positional arguments:
  ris_file    The file path to populate with RIS records. If this file already
              exists, it is assumed to be RIS file to update with any new
              records not already included in the file.

options:
  -h, --help  show this help message and exit
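To check what an update added to the bibliography, the RIS file can be read back with a few lines of code. The sketch below is a minimal parser for the general RIS layout, in which each line holds a two-character tag followed by two spaces, a hyphen and the value, and an `ER` tag closes each record. It assumes the file written by `maintain_ris` follows that layout. The function name `parse_ris` and the DOI in the usage note are made up for illustration.

```python
import re

# A two-character tag, two spaces, a hyphen, then an optional value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text):
    """Split RIS text into records, each a dict of tag -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            # Skip blank lines and anything that is not a tagged line.
            continue
        tag, value = match.groups()
        if tag == "ER":
            # ER marks the end of a reference.
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    return records
```

Comparing `len(parse_ris(text))` before and after running the subcommand gives the number of records added. If the records carry a DOI tag such as `DO  - 10.0000/example.1`, the values could also be used to spot duplicates.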

The generate_html subcommand

cl_prompt $ safedata_zenodo generate_html -h
usage: safedata_zenodo generate_html [-h] dataset_json html_out

Generates an html file containing a standard description of a dataset from the
JSON metadata. Usually this will be generated and uploaded as part of the
dataset publication process, but this subcommand can be used for local
checking of the resulting HTML and developing custom templates.

positional arguments:
  dataset_json  Path to a JSON metadata file for a dataset
  html_out      Output path for the HTML file

options:
  -h, --help    show this help message and exit

The generate_xml subcommand

cl_prompt $ safedata_zenodo generate_xml -h
usage: safedata_zenodo generate_xml [-h] [-l LINEAGE_STATEMENT]
                                    zenodo_json dataset_json xml_out

Creates an INSPIRE compliant XML metadata file for a published dataset,
optionally including a user provided lineage statement (such as project
details).

positional arguments:
  zenodo_json   Path to a Zenodo JSON file for a deposit
  dataset_json  Path to a JSON metadata file for a dataset
  xml_out       Output path for the XML file

options:
  -h, --help    show this help message and exit
  -l LINEAGE_STATEMENT, --lineage-statement LINEAGE_STATEMENT
                Path to a text file containing a lineage statement