Zenodo and metadata tools overview
The safedata_validator package also contains tools to publish validated datasets to the Zenodo data repository and to update the metadata server.
Data publication process
The process of publishing a dataset involves both the safedata_validate tool and the various subcommands of the safedata_zenodo tool. These commands use two different JSON metadata files.
Dataset metadata
The safedata_validate command generates a JSON file containing a standard description of the dataset metadata and of the data tables it contains. Some of this metadata is used to populate the Zenodo description of the published dataset files.
The dataset metadata is also used to populate the database of a metadata server. This is a separate website that provides an API for searching the available data, and it is the main data discovery backend for the safedata R package.
Zenodo deposit metadata
The Zenodo API returns JSON metadata describing a Zenodo deposit that is being prepared or has been published. This metadata includes the API links that are used to access details of the deposit files.
The safedata_zenodo tool
Info: The subcommands of the safedata_zenodo tool require the zenodo section of the resources configuration to be completed. This is not needed if you are only validating datasets.
The safedata_zenodo command line tool provides the subcommands listed below, which are used to publish data, post metadata, and maintain and document published datasets.
The top-level command line help for the tool, which lists the available subcommands, is shown below:
$ safedata_zenodo -h
usage: safedata_zenodo [-h] [-r RESOURCES] [-s] [-q] ...
Publish validated datasets to Zenodo using a command line interface.
This is a the command line interface for publishing safedata validated
datasets to Zenodo, downloading information and maintaining a local
copy of the datasets in the file structure required by the R safedata
package.
The safedata_zenodo command is used by providing subcommands for the
different actions required to publish a validated dataset. The list of
subcommands (with aliases) is shown below and individual help is
available for each of the subcommands:
safedata_zenodo subcommand -h
The subcommands for this tool use two different JSON format metadata
files:
* A dataset metadata file (`dataset_json`). This is the output from
using the `safedata_validate` tool. Some of the information in this
file is used to create the Zenodo dataset description, and all of
the data is used to describe a dataset on the separate metadata
server.
* A Zenodo metadata file (`zenodo_json`), that describes the metadata
associated with a Zenodo deposit or published record.
Note that most of these actions are also available via the Zenodo website.
positional arguments:
create_deposit
Create a new Zenodo draft deposit
discard_deposit
Discard an unpublished deposit
get_deposit
Download and display deposit metadata
publish_deposit
Publish a draft deposit
upload_file
Upload a file to an unpublished deposit
delete_file
Delete a file from an unpublished deposit
upload_metadata
Populate the Zenodo metadata
amend_metadata
Update published Zenodo metadata
sync_local_dir
Create or update a local safedata directory
maintain_ris
Maintain a RIS bibliography file for datasets
generate_html
Generate an HTML dataset description
generate_xml
Create INSPIRE compliant metadata XML
options:
-h, --help show this help message and exit
-r RESOURCES, --resources RESOURCES
Path to a safedata_validator resource configuration file
-s, --show-resources
Validate and display the selected resources and exit
-q, --quiet Suppress normal information messages.
Simple publication process
As an initial example, the process for publishing a simple dataset (without any external data files) would be:
# Validate the file, creating the Test_format_good.json metadata file
safedata_validate Test_format_good.xlsx
# Create a new deposit, creating a JSON file of metadata for the Zenodo
# deposit as - for example - zenodo_1059375.json
safedata_zenodo create_deposit
# Upload the file
safedata_zenodo upload_file zenodo_1059375.json Test_format_good.xlsx
# Populate the Zenodo deposit metadata from the dataset metadata
safedata_zenodo upload_metadata zenodo_1059375.json Test_format_good.json
# Publish the deposit
safedata_zenodo publish_deposit zenodo_1059375.json
# Post the metadata to the metadata server
safedata_zenodo post_metadata zenodo_1059375.json Test_format_good.json
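The steps above can be collected into a small bash script that stops at the first failing command. This is a minimal sketch, not part of safedata_validator, and the function names are hypothetical. It assumes that safedata_validate writes the dataset metadata next to the spreadsheet with the same base name (as in the Test_format_good example) and that create_deposit saves a single new zenodo_<id>.json file in the current directory, so the newest such file is taken to be the new deposit.

```bash
#!/usr/bin/env bash
# Sketch: run the simple publication process end to end.
# Assumptions (hypothetical, not guaranteed by safedata_validator):
#  - safedata_validate writes <name>.json alongside <name>.xlsx
#  - safedata_zenodo create_deposit saves zenodo_<id>.json in the current directory
set -euo pipefail

# Map a spreadsheet name to its dataset metadata file name,
# e.g. Test_format_good.xlsx -> Test_format_good.json
dataset_json_for() {
    printf '%s.json\n' "${1%.xlsx}"
}

# Print the most recently modified zenodo_*.json file in the current
# directory, or an empty line if there is none.
latest_zenodo_json() {
    local f newest=""
    for f in zenodo_*.json; do
        [[ -e "$f" ]] || continue   # unmatched glob stays literal; skip it
        if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
            newest="$f"
        fi
    done
    printf '%s\n' "$newest"
}

publish_simple_dataset() {
    local xlsx="$1" dataset_json zenodo_json
    dataset_json="$(dataset_json_for "$xlsx")"

    safedata_validate "$xlsx"
    safedata_zenodo create_deposit
    zenodo_json="$(latest_zenodo_json)"
    if [[ -z "$zenodo_json" ]]; then
        echo "No zenodo_*.json deposit file found" >&2
        return 1
    fi

    safedata_zenodo upload_file "$zenodo_json" "$xlsx"
    safedata_zenodo upload_metadata "$zenodo_json" "$dataset_json"
    safedata_zenodo publish_deposit "$zenodo_json"
    safedata_zenodo post_metadata "$zenodo_json" "$dataset_json"
}

# Example: publish_simple_dataset Test_format_good.xlsx
```

Picking the newest file by modification time avoids hard-coding the deposit ID, but it relies on no other process writing zenodo_*.json files into the same directory at the same time.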
The safedata_zenodo subcommands
The command line help for each of the various subcommands is shown below:
The create_deposit subcommand
$ safedata_zenodo create_deposit -h
usage: safedata_zenodo create_deposit [-h] [-c CONCEPT_ID]
Create a new deposit draft. The concept_id option uses a provided Zenodo
concept ID to creates a draft as a new version of an existing data set.
When successful, the function downloads and saves a JSON file containing the
resulting Zenodo deposit metadata. This file is used as an input to other
subcommands that work with an existing deposit.
options:
-h, --help show this help message and exit
-c CONCEPT_ID, --concept_id CONCEPT_ID
A Zenodo concept ID
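To publish an updated version of an existing dataset, the concept ID of that dataset is passed with -c or --concept_id. Because Zenodo record and concept IDs are integers, a small wrapper can reject obviously wrong values before calling the tool. This is a hypothetical helper; the SAFEDATA_ZENODO variable exists only in this sketch so that the real command can be replaced by a stub such as echo when testing.

```bash
#!/usr/bin/env bash
# Sketch: create a draft deposit as a new version of an existing dataset.
# SAFEDATA_ZENODO is a hypothetical override used only in this example,
# so the real command can be swapped for a stub (e.g. echo) in tests.

new_version_deposit() {
    local concept_id="${1:-}"
    # Zenodo concept IDs are integers, so reject anything else
    if [[ ! "$concept_id" =~ ^[0-9]+$ ]]; then
        echo "Expected a numeric Zenodo concept ID, got: '$concept_id'" >&2
        return 1
    fi
    "${SAFEDATA_ZENODO:-safedata_zenodo}" create_deposit --concept_id "$concept_id"
}

# Example (hypothetical concept ID): new_version_deposit 1234567
```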
The discard_deposit subcommand
$ safedata_zenodo discard_deposit -h
usage: safedata_zenodo discard_deposit [-h] zenodo_json
Discard an unpublished deposit. The deposit and all uploaded files will be
removed from Zenodo.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
options:
-h, --help show this help message and exit
The get_deposit subcommand
$ safedata_zenodo get_deposit -h
usage: safedata_zenodo get_deposit [-h] zenodo_id
Download the Zenodo metadata for a deposit and print out summary information.
positional arguments:
zenodo_id An ID for an existing Zenodo deposit
options:
-h, --help show this help message and exit
The publish_deposit subcommand
$ safedata_zenodo publish_deposit -h
usage: safedata_zenodo publish_deposit [-h] zenodo_json
Publishes a Zenodo deposit. This is the final step in publishing a dataset and
is not reversible. Once a dataset is published, the DOI associated with the
record is published to Datacite.
It may be worth reviewing the deposit webpage (https://zenodo.org/deposit/###)
before finally publishing.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
options:
-h, --help show this help message and exit
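Because publishing cannot be undone, it may help to wrap publish_deposit in a confirmation prompt that first prints the deposit page to review. This sketch assumes the Zenodo JSON file is named zenodo_<id>.json, as in the simple publication example, and builds the review URL from the https://zenodo.org/deposit/### pattern in the help text above. The function names and the SAFEDATA_ZENODO override are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: ask for confirmation before the irreversible publish step.
# Assumes deposit files are named zenodo_<id>.json (hypothetical convention).
# SAFEDATA_ZENODO is a hypothetical override so tests can use a stub.

# zenodo_1059375.json -> 1059375
deposit_id_from_json() {
    local name
    name="$(basename "$1" .json)"
    printf '%s\n' "${name#zenodo_}"
}

confirm_and_publish() {
    local zenodo_json="$1" answer=""
    echo "Review the deposit first: https://zenodo.org/deposit/$(deposit_id_from_json "$zenodo_json")"
    # read fails at end of input, leaving answer empty (treated as "no")
    read -r -p "Publish this deposit? This cannot be undone [y/N] " answer || true
    case "$answer" in
        y|Y|yes|YES)
            "${SAFEDATA_ZENODO:-safedata_zenodo}" publish_deposit "$zenodo_json"
            ;;
        *)
            echo "Deposit not published." >&2
            return 1
            ;;
    esac
}

# Example: confirm_and_publish zenodo_1059375.json
```

Defaulting to "no" means that an empty answer or a closed input stream never publishes anything.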
The upload_file subcommand
$ safedata_zenodo upload_file -h
usage: safedata_zenodo upload_file [-h] [--zenodo_filename ZENODO_FILENAME]
zenodo_json filepath
Uploads the contents of a specified file to an _unpublished_ Zenodo deposit,
optionally using an alternative filename. If you upload a new file to the same
filename, it will replace the existing uploaded file.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
filepath The path to the file to be uploaded
options:
-h, --help show this help message and exit
--zenodo_filename ZENODO_FILENAME
An optional alternative file name to be used on Zenodo
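A dataset with external data files needs more than one upload to the same unpublished deposit. The loop below checks that every file exists before uploading any of them, so a mistyped name does not leave a half-populated deposit. This is a sketch: the function name and the SAFEDATA_ZENODO override are hypothetical, and files keep their local names here, although --zenodo_filename could be added to rename an individual upload.

```bash
#!/usr/bin/env bash
# Sketch: upload several files to one unpublished deposit.
# SAFEDATA_ZENODO is a hypothetical override so tests can use a stub.

upload_files() {
    local zenodo_json="$1"
    shift
    local f missing=0

    # Check every file before uploading anything
    for f in "$@"; do
        if [[ ! -f "$f" ]]; then
            echo "Missing file: $f" >&2
            missing=1
        fi
    done
    if (( missing )); then
        return 1
    fi

    # Uploading a file with an existing name replaces the earlier upload
    for f in "$@"; do
        "${SAFEDATA_ZENODO:-safedata_zenodo}" upload_file "$zenodo_json" "$f" || return 1
    done
}

# Example (hypothetical file names):
# upload_files zenodo_1059375.json Dataset.xlsx external_data.zip
```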
The delete_file subcommand
$ safedata_zenodo delete_file -h
usage: safedata_zenodo delete_file [-h] zenodo_json filename
Delete an uploaded file from an unpublished deposit. The deposit metadata will
be re-downloaded to ensure an up to date list of files in the deposit.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
filename The name of the file to delete
options:
-h, --help show this help message and exit
The upload_metadata subcommand
$ safedata_zenodo upload_metadata -h
usage: safedata_zenodo upload_metadata [-h] zenodo_json dataset_json
Uses the dataset metadata created using `safedata_validate` to populate the
required Zenodo metadata for an unpublished deposit.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
dataset_json Path to a JSON metadata file for a dataset
options:
-h, --help show this help message and exit
The amend_metadata subcommand
$ safedata_zenodo amend_metadata -h
usage: safedata_zenodo amend_metadata [-h] zenodo_json_update
Updates the Zenodo metadata for an published deposit. To use this, make sure
you have the most recent Zenodo metadata for the deposit and then edit the
JSON file to the new values. You can only edit the contents of the
`metadata` section.
Caution: this command should only be used to make urgent changes - such as
access restrictions. It is also easy to submit invalid metadata!
positional arguments:
zenodo_json_update
Path to an updated Zenodo metadata file
options:
-h, --help show this help message and exit
The sync_local_dir subcommand
$ safedata_zenodo sync_local_dir -h
usage: safedata_zenodo sync_local_dir [-h] [--not-just-xlsx]
[--replace-modified]
datadir
Synchronize a local data directory
This subcommand allows a safedata developer or community maintainer to
create or update such a directory with _all_ of the resources in the Zenodo
community, regardless of their public access status. This forms a backup
(although Zenodo is heavily backed up) but also provides local copies of the
files for testing and development of the code packages.
The file structure of the directory follows that used by the safedata R
package, used to store metadata and files downloaded from a safedata
community on Zenodo and from a safedata metadata server. The
`safedata_validator` configuration file will need to include the metadata
API.
By default, only the XLSX files containing metadata and data tables are
downloaded, ignoring any additional files, which are often large.
positional arguments:
datadir The path to a local directory containing an existing safedata
directory or an empty folder in which to create one
options:
-h, --help show this help message and exit
--not-just-xlsx
Should large non-xlsx files also be downloaded.
--replace-modified
Should locally modified files be overwritten with the archive
version
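The top-level usage line places global options such as -r and -q before the subcommand name, so a scheduled backup might call the tool as in the sketch below. The function name, paths and SAFEDATA_ZENODO override are hypothetical; creating the directory first relies on the help text's statement that an empty folder is accepted.

```bash
#!/usr/bin/env bash
# Sketch: create or refresh a local safedata directory, e.g. from a scheduled job.
# SAFEDATA_ZENODO is a hypothetical override so tests can use a stub.

sync_backup() {
    local config="$1" datadir="$2"
    # sync_local_dir accepts an existing safedata directory or an empty folder
    mkdir -p "$datadir"
    # Top-level options (-r, -q) go before the subcommand name
    "${SAFEDATA_ZENODO:-safedata_zenodo}" -r "$config" -q sync_local_dir "$datadir"
}

# Example (hypothetical paths):
# sync_backup /path/to/resources.cfg /path/to/safedata_dir
```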
The maintain_ris subcommand
$ safedata_zenodo maintain_ris -h
usage: safedata_zenodo maintain_ris [-h] ris_file
This command maintains a RIS format bibliography file of the datasets
uploaded to a Zenodo community. It can update an existing RIS format file
to add new records or it can create the file from scratch.
The program uses both the Zenodo API (to find the records in the community)
and the Datacite API to access machine readable bibliographic records.
positional arguments:
ris_file The file path to populate with RIS records. If this file already
exists, it is assumed to be RIS file to update with any new
records not already included in the file.
options:
-h, --help show this help message and exit
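In the RIS format each record starts with a TY tag line (for example "TY  - DATA") and ends with an ER line, so counting the TY lines before and after running maintain_ris shows how many records were added. A minimal sketch, with hypothetical function names and the same hypothetical SAFEDATA_ZENODO override:

```bash
#!/usr/bin/env bash
# Sketch: report how many records maintain_ris added to a RIS file.
# SAFEDATA_ZENODO is a hypothetical override so tests can use a stub.

# Count RIS records by their "TY  - " start tag; 0 if the file does not exist yet
count_ris_records() {
    if [[ -f "$1" ]]; then
        # grep -c prints 0 but exits 1 when nothing matches
        grep -c '^TY  - ' "$1" || true
    else
        echo 0
    fi
}

update_ris() {
    local ris_file="$1" before after
    before="$(count_ris_records "$ris_file")"
    "${SAFEDATA_ZENODO:-safedata_zenodo}" maintain_ris "$ris_file" || return 1
    after="$(count_ris_records "$ris_file")"
    echo "Added $((after - before)) record(s) to $ris_file"
}

# Example (hypothetical file name): update_ris datasets.ris
```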
The generate_html subcommand
$ safedata_zenodo generate_html -h
usage: safedata_zenodo generate_html [-h] zenodo_json dataset_json html_out
Generates an html file containing a standard description of a dataset from the
JSON metadata. Usually this will be generated and uploaded as part of the
dataset publication process, but this subcommand can be used for local
checking of the resulting HTML.
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
dataset_json Path to a JSON metadata file for a dataset
html_out Output path for the HTML file
options:
-h, --help show this help message and exit
The generate_xml subcommand
$ safedata_zenodo generate_xml -h
usage: safedata_zenodo generate_xml [-h] [-l LINEAGE_STATEMENT]
zenodo_json dataset_json xml_out
Creates an INSPIRE compliant XML metadata file for a published dataset,
optionally including a user provided lineage statement (such as project
details).
positional arguments:
zenodo_json Path to a Zenodo JSON file for a deposit
dataset_json Path to a JSON metadata file for a dataset
xml_out Output path for the XML file
options:
-h, --help show this help message and exit
-l LINEAGE_STATEMENT, --lineage-statement LINEAGE_STATEMENT
Path to a text file containing a lineage statement
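The lineage statement is read from a text file, so a script that produces XML for several datasets might write each statement to a temporary file first and remove it afterwards. A hypothetical sketch; the function name and the SAFEDATA_ZENODO override are assumptions:

```bash
#!/usr/bin/env bash
# Sketch: write a lineage statement to a temporary file and pass it to generate_xml.
# SAFEDATA_ZENODO is a hypothetical override so tests can use a stub.

make_inspire_xml() {
    local zenodo_json="$1" dataset_json="$2" xml_out="$3" lineage_text="$4"
    local lineage_file status
    lineage_file="$(mktemp)"
    printf '%s\n' "$lineage_text" > "$lineage_file"
    "${SAFEDATA_ZENODO:-safedata_zenodo}" generate_xml -l "$lineage_file" \
        "$zenodo_json" "$dataset_json" "$xml_out"
    status=$?
    # Remove the temporary file whether or not generation succeeded
    rm -f "$lineage_file"
    return "$status"
}

# Example (hypothetical lineage text):
# make_inspire_xml zenodo_1059375.json Test_format_good.json Test_format_good.xml \
#     "Data collected as part of a field project."
```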