Publishing data on Zenodo
Publishing a dataset to the Zenodo data repository requires a validated dataset and
the subcommands of the safedata_zenodo tool. These subcommands use two different JSON
metadata files.
Dataset metadata
The safedata_validate command generates a JSON file containing a standard
description of the dataset metadata and of the data tables the dataset contains.
Some of this metadata is used to populate the Zenodo description of the published
dataset files.
The dataset metadata is also used to populate the database of a metadata server.
This is a separate website that provides the API for searching available data and forms
the main data discovery backend for the safedata R package.
Zenodo deposit metadata
The Zenodo API uses JSON metadata to return key details on a Zenodo deposit that is
being prepared or published, including the key API links that are used to provide
file details. This file is generated when a new deposit is created and is then used
to carry out the other publication steps.
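As a rough illustration of how a deposit metadata file might be inspected, the sketch below loads the JSON and pulls out the deposit ID and the API links. The top-level `id` and `links` keys are assumptions based on the general shape of Zenodo API responses; this page does not specify the file layout, so treat the key names as hypothetical.

```python
import json
from pathlib import Path


def summarise_deposit(zenodo_json):
    """Return the deposit ID and API links found in a deposit metadata file.

    Assumes (hypothetically) top-level "id" and "links" keys. Missing keys
    give None or an empty dict rather than raising an error.
    """
    with open(Path(zenodo_json), encoding="utf-8") as handle:
        metadata = json.load(handle)

    return {
        "id": metadata.get("id"),
        "links": dict(metadata.get("links", {})),
    }
```

Printing the returned dictionary is a quick way to check which deposit a saved file refers to before passing it to another subcommand.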
The safedata_zenodo tool
Info
The subcommands of the safedata_zenodo tool require that the zenodo
section of the resources configuration
be completed. This is not required for simply validating datasets.
The safedata_zenodo command line tool provides the following subcommands which
are used to publish data, post metadata and help maintain and document published
datasets.
The top-level command line help for the tool, which lists the available subcommands, is shown below:
cl_prompt $ safedata_zenodo -h
usage: safedata_zenodo [-h] [-r RESOURCES] [-s] [-q] ...
Publish validated datasets to Zenodo using a command line interface.
This is the command line interface for publishing safedata validated
datasets to Zenodo, downloading information and maintaining a local
copy of the datasets in the file structure required by the R safedata
package.
The safedata_zenodo command is used by providing subcommands for the
different actions required to publish a validated dataset. The list of
subcommands is shown below and individual help is available for each
of the subcommands:
safedata_zenodo subcommand -h
The subcommands for this tool use two different JSON format metadata
files:
* A dataset metadata file (`dataset_json`). This is the output from
using the `safedata_validate` tool. Some of the information in this
file is used to create the Zenodo dataset description, and all of
the data is used to describe a dataset on the separate metadata
server.
* A Zenodo metadata file (`zenodo_json`), that describes the metadata
associated with a Zenodo deposit or published record.
The subcommands that send and receive data from Zenodo also accept the
`--live` and `--sandbox` options which can be used to override the
`use_sandbox` setting in the configuration file. If the configuration is set
to `true` then `--live` will use the live site and if the configuration is set
to `false` then the `--sandbox` can be used to use the sandbox instead.
Note that most of these actions are also available via the Zenodo website.
positional arguments:
create_deposit
Create a new Zenodo draft deposit
discard_deposit
Discard an unpublished deposit
get_deposit
Download and display deposit metadata
publish_deposit
Publish a draft deposit
upload_files
Upload a file to an unpublished deposit
delete_files
Delete files from an unpublished deposit
upload_metadata
Populate the Zenodo metadata
sync_local_dir
Create or update a local safedata directory
maintain_ris
Maintain a RIS bibliography file for datasets
generate_html
Generate an HTML dataset description
generate_xml
Create INSPIRE compliant metadata XML
publish_dataset
Publish a validated dataset
options:
-h, --help show this help message and exit
-r RESOURCES, --resources RESOURCES
Path to a safedata_validator resource configuration file
-s, --show-resources
Validate and display the selected resources and exit
-q, --quiet Suppress normal information messages.
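The interaction between the `use_sandbox` setting and the `--live` and `--sandbox` options described in the help text can be summarised in a few lines. This is only a sketch of the documented rule, not the tool's own code; the function name `select_site` is made up for illustration.

```python
def select_site(use_sandbox, live=False, sandbox=False):
    """Return "sandbox" or "live" following the documented override rule.

    use_sandbox stands in for the value in the configuration file; live and
    sandbox stand in for the --live and --sandbox command line options.
    """
    if live and sandbox:
        # The subcommand usage lines show [--live | --sandbox], so the two
        # options are mutually exclusive.
        raise ValueError("--live and --sandbox cannot be combined")
    if live:
        return "live"
    if sandbox:
        return "sandbox"
    # With neither option given, the configuration file decides.
    return "sandbox" if use_sandbox else "live"
```

For example, `select_site(True, live=True)` gives `"live"`, while `select_site(True)` falls back to the configured sandbox.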
The safedata_zenodo subcommands
The command line help for each of the various subcommands is shown below:
The publish_dataset subcommand
cl_prompt $ safedata_zenodo publish_dataset -h
usage: safedata_zenodo publish_dataset [-h] [--live | --sandbox]
[-n NEW_VERSION] [--no-xml]
[-e [EXTERNAL_FILES ...]]
dataset_json dataset
This subcommand runs through the complete publication process for a validated
dataset and any external files. It does not provide all of the options of the
subcommands for individual steps but covers the main common usage of the
safedata_zenodo command. If the publication process fails, the resulting partial
deposit is discarded. All of the files in the deposit must be replaced.
positional arguments:
dataset_json Path to a JSON metadata file for a dataset
dataset Path to the Excel file to be published
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
-n NEW_VERSION, --new-version NEW_VERSION
Create a new version of the dataset with the provided Zenodo
ID.
--no-xml Do not include metadata XML in the published record
-e [EXTERNAL_FILES ...], --external-files [EXTERNAL_FILES ...]
A set of external files documented in the dataset to be
included.
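When publication is scripted, it can help to assemble the publish_dataset arguments in one place. The sketch below builds the argument list from the options shown in the help text above; the function name and keyword names are hypothetical, and nothing is run until the list is passed to something like `subprocess.run`.

```python
def publish_dataset_command(
    dataset_json,
    dataset,
    new_version=None,
    no_xml=False,
    external_files=(),
    sandbox=False,
):
    """Build an argument list for `safedata_zenodo publish_dataset`."""
    cmd = ["safedata_zenodo", "publish_dataset"]
    if sandbox:
        cmd.append("--sandbox")
    if new_version is not None:
        cmd += ["--new-version", str(new_version)]
    if no_xml:
        cmd.append("--no-xml")

    # The two positional arguments, in the order given in the usage line.
    cmd += [str(dataset_json), str(dataset)]

    # The external file list takes a variable number of values, so it goes
    # last to keep it from absorbing the positional arguments.
    if external_files:
        cmd += ["--external-files", *[str(f) for f in external_files]]
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` would then run the command and raise an error if the tool reports a failure.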
The create_deposit subcommand
cl_prompt $ safedata_zenodo create_deposit -h
usage: safedata_zenodo create_deposit [-h] [--live | --sandbox]
[-n NEW_VERSION] [-i]
Create a new deposit draft.
The new version option takes the record ID of the most recent version of an
existing dataset and creates a new deposit as a new version of that dataset.
Versions of datasets are grouped under a single concept ID, which always
redirects to the most recent version. Use the most recent version ID and
_not_ the concept ID here.
When successful, the function downloads and saves a JSON file containing the
resulting Zenodo deposit metadata. This file is used as an input to other
subcommands that work with an existing deposit.
The --id-to-stdout option can be provided to explicitly return the new
deposit ID to stdout, where it can be captured for use in shell scripts.
All other logging is written to stderr.
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
-n NEW_VERSION, --new-version NEW_VERSION
Create a new version of the dataset with the provided Zenodo
ID.
-i, --id-to-stdout
Write the deposit record ID to stdout.
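The --id-to-stdout option is intended for scripting. A minimal sketch of capturing the ID from Python is shown below. The `run` parameter is an illustrative hook that makes the function testable without contacting Zenodo, and the ID is returned as a string because the help text does not state its exact format.

```python
import subprocess


def create_deposit(new_version=None, sandbox=False, run=subprocess.run):
    """Create a draft deposit and return the ID written to stdout."""
    cmd = ["safedata_zenodo", "create_deposit", "--id-to-stdout"]
    if sandbox:
        cmd.append("--sandbox")
    if new_version is not None:
        # This should be the most recent version ID, not the concept ID.
        cmd += ["--new-version", str(new_version)]

    # Only stdout is captured: the logging goes to stderr, which is left
    # alone so that messages still appear in the terminal.
    result = run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    return result.stdout.strip()
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError` rather than returning an empty ID.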
The discard_deposit subcommand
cl_prompt $ safedata_zenodo discard_deposit -h
usage: safedata_zenodo discard_deposit [-h] [--live | --sandbox] zenodo_json
Discard an unpublished deposit. The deposit and all uploaded files will be
removed from Zenodo.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
The get_deposit subcommand
cl_prompt $ safedata_zenodo get_deposit -h
usage: safedata_zenodo get_deposit [-h] [--live | --sandbox] zenodo_id
Download the Zenodo metadata for a deposit and print out summary information.
positional arguments:
zenodo_id An ID for an existing Zenodo deposit
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
The publish_deposit subcommand
cl_prompt $ safedata_zenodo publish_deposit -h
usage: safedata_zenodo publish_deposit [-h] [--live | --sandbox] zenodo_json
Publishes a Zenodo deposit. This is the final step in publishing a dataset and
is not reversible. Once a dataset is published, the DOI associated with the
record is published to Datacite.
It may be worth reviewing the deposit webpage (https://zenodo.org/deposit/###)
before finally publishing.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
The upload_files subcommand
cl_prompt $ safedata_zenodo upload_files -h
usage: safedata_zenodo upload_files [-h] [--live | --sandbox]
zenodo_json [filepaths ...]
Uploads a set of files to an _unpublished_ Zenodo deposit. If you upload a new file
to the same filename, it will replace the existing uploaded file.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
filepaths The paths to the file to be uploaded
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
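Because an upload to an existing filename replaces the earlier file, it may be worth checking a batch of local paths for repeated names before uploading. The sketch below assumes, without confirmation from this page, that the uploaded name is the final component of each local path; the function name is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def find_name_clashes(filepaths):
    """Group the paths that share a file name.

    Returns a dict mapping each repeated file name to the paths using it.
    An empty dict means no two paths would collide under the assumption
    that uploads are keyed on the file name alone.
    """
    by_name = defaultdict(list)
    for path in filepaths:
        by_name[Path(path).name].append(str(path))
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Any clash reported here could be resolved by renaming one of the files before running upload_files.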
The delete_files subcommand
cl_prompt $ safedata_zenodo delete_files -h
usage: safedata_zenodo delete_files [-h] [--live | --sandbox]
zenodo_json [filenames ...]
Delete a list of uploaded files from an unpublished deposit. The deposit metadata
will be re-downloaded to ensure an up to date list of files in the deposit.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
filenames The names of files to delete
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
The upload_metadata subcommand
cl_prompt $ safedata_zenodo upload_metadata -h
usage: safedata_zenodo upload_metadata [-h] [--live | --sandbox]
zenodo_json dataset_json
Uses the dataset metadata created using `safedata_validate` to populate the
required Zenodo metadata for an unpublished deposit.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
dataset_json Path to a JSON metadata file for a dataset
options:
-h, --help show this help message and exit
--live Use the Zenodo live site, overriding the configuration file
--sandbox Use the Zenodo sandbox site, overriding the configuration file
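The individual subcommands can be chained by hand when more control is needed than publish_dataset offers. The sketch below assumes a draft has already been made with create_deposit and that its deposit metadata file is available. The order of the upload steps is an assumption, and the `run` parameter is an illustrative hook for testing. It mirrors the behaviour described for publish_dataset by discarding the draft if an upload step fails.

```python
import subprocess


def publish_by_steps(zenodo_json, dataset_json, files, run=subprocess.run):
    """Upload files and metadata, then publish an existing draft deposit."""
    uploads = [
        ["safedata_zenodo", "upload_files", zenodo_json, *files],
        ["safedata_zenodo", "upload_metadata", zenodo_json, dataset_json],
    ]
    try:
        for cmd in uploads:
            run(cmd, check=True)
    except subprocess.CalledProcessError:
        # Remove the partial draft rather than leave it half populated.
        run(["safedata_zenodo", "discard_deposit", zenodo_json], check=True)
        raise

    # Publishing is not reversible, so it is kept as a separate final call.
    run(["safedata_zenodo", "publish_deposit", zenodo_json], check=True)
```

In practice it may be preferable to stop before the final call and review the deposit web page first.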
The sync_local_dir subcommand
cl_prompt $ safedata_zenodo sync_local_dir -h
usage: safedata_zenodo sync_local_dir [-h] [--not-just-xlsx]
[--replace-modified] [--dry-run]
datadir
Synchronize a local data directory
This subcommand allows a safedata developer or community maintainer to
create or update such a directory with _all_ of the resources in the Zenodo
community, regardless of their public access status. This forms a backup
(although Zenodo is heavily backed up) but also provides local copies of the
files for testing and development of the code packages.
The file structure of the directory follows that used by the safedata R
package, used to store metadata and files downloaded from a safedata
community on Zenodo and from a safedata metadata server. The
`safedata_validator` configuration file will need to include the metadata
API.
By default, only the XLSX files containing metadata and data tables are
downloaded, ignoring any additional files, which are often large.
positional arguments:
datadir The path to a local directory containing an existing safedata
directory or an empty folder in which to create one
options:
-h, --help show this help message and exit
--not-just-xlsx
Should large non-xlsx files also be downloaded.
--replace-modified
Should locally modified files be overwritten with the archive
version
--dry-run Run the synchronisation process without altering the local
directory.
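The default download behaviour can be pictured as a simple filter on file names. The sketch below illustrates the rule stated in the help text and is not the tool's own code; the function name is hypothetical, and matching the extension case-insensitively is an assumption.

```python
from pathlib import PurePosixPath


def files_to_download(filenames, not_just_xlsx=False):
    """Select the files a sync would fetch under the documented default.

    With not_just_xlsx left False, only names ending in .xlsx are kept.
    Setting it True keeps every file, as with the --not-just-xlsx option.
    """
    if not_just_xlsx:
        return list(filenames)
    return [
        name
        for name in filenames
        if PurePosixPath(name).suffix.lower() == ".xlsx"
    ]
```

Combining this kind of preview with the --dry-run option is a cautious way to see what a synchronisation would change before touching the local directory.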
The maintain_ris subcommand
cl_prompt $ safedata_zenodo maintain_ris -h
usage: safedata_zenodo maintain_ris [-h] ris_file
This command maintains a RIS format bibliography file of the datasets
uploaded to a Zenodo community. It can update an existing RIS format file
to add new records or it can create the file from scratch.
The program uses both the Zenodo API (to find the records in the community)
and the Datacite API to access machine readable bibliographic records.
positional arguments:
ris_file The file path to populate with RIS records. If this file already
exists, it is assumed to be RIS file to update with any new
records not already included in the file.
options:
-h, --help show this help message and exit
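Updating a RIS file with only the records it lacks amounts to comparing identifiers. The sketch below shows one way this could be done, assuming each record stores its DOI under the standard RIS `DO` tag. It is an illustration of the idea rather than the tool's implementation, and both function names are hypothetical.

```python
import re

# A RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def existing_dois(ris_text):
    """Collect the DOIs already present in the text of a RIS file."""
    dois = set()
    for line in ris_text.splitlines():
        match = RIS_LINE.match(line)
        if match and match.group(1) == "DO":
            # DOIs are case insensitive, so compare them in lower case.
            dois.add(match.group(2).strip().lower())
    return dois


def new_records(candidate_dois, ris_text):
    """Return the candidate DOIs that are not yet in the RIS text."""
    known = existing_dois(ris_text)
    return [doi for doi in candidate_dois if doi.lower() not in known]
```

Only the DOIs returned by `new_records` would then need their bibliographic records fetched and appended to the file.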
The generate_html subcommand
cl_prompt $ safedata_zenodo generate_html -h
usage: safedata_zenodo generate_html [-h] dataset_json html_out
Generates an html file containing a standard description of a dataset from the
JSON metadata. Usually this will be generated and uploaded as part of the
dataset publication process, but this subcommand can be used for local
checking of the resulting HTML and developing custom templates.
positional arguments:
dataset_json Path to a JSON metadata file for a dataset
html_out Output path for the HTML file
options:
-h, --help show this help message and exit
The generate_xml subcommand
cl_prompt $ safedata_zenodo generate_xml -h
usage: safedata_zenodo generate_xml [-h] [-l LINEAGE_STATEMENT]
zenodo_json dataset_json xml_out
Creates an INSPIRE compliant XML metadata file for a published dataset,
optionally including a user provided lineage statement (such as project
details).
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
dataset_json Path to a JSON metadata file for a dataset
xml_out Output path for the XML file
options:
-h, --help show this help message and exit
-l LINEAGE_STATEMENT, --lineage-statement LINEAGE_STATEMENT
Path to a text file containing a lineage statement
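Note that the lineage option takes a path to a text file rather than the statement itself. The sketch below writes a statement held in a Python string to a temporary file and builds the matching argument list. The function name is hypothetical, and the temporary file is deliberately left on disk so that it still exists when the command is run.

```python
import tempfile


def generate_xml_command(zenodo_json, dataset_json, xml_out, lineage_text=None):
    """Build an argument list for `safedata_zenodo generate_xml`."""
    cmd = ["safedata_zenodo", "generate_xml"]
    if lineage_text is not None:
        # Write the statement to a file, since the option expects a path.
        handle = tempfile.NamedTemporaryFile(
            "w", suffix=".txt", delete=False, encoding="utf-8"
        )
        with handle:
            handle.write(lineage_text)
        cmd += ["--lineage-statement", handle.name]

    # Positional arguments in the order given in the usage line.
    cmd += [str(zenodo_json), str(dataset_json), str(xml_out)]
    return cmd
```

The caller is responsible for deleting the temporary file once the command has finished.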