Using the safedata system
The sections below provide examples of using the safedata_validator package to
administer datasets. Typical use will be from the command line using a Unix-like shell
or Windows Subsystem for Linux, but the examples also show how to use the programmatic
API for safedata_validator from within Python. The examples assume that a user has
provided a SAFE formatted dataset and a ZIP file containing additional files:
SAFE_dataset.xlsx
Supplementary_files.zip
Both of the examples below include a stage for creating and including a GEMINI compliant XML metadata file in a published dataset. We recommend this as good practice, but it is optional.
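Before starting either workflow, it can help to confirm that both input files are present and that the ZIP archive is readable. The sketch below uses only the Python standard library; the function name `check_inputs` is hypothetical and is not part of safedata_validator.

```python
"""Pre-flight check for the two input files (illustrative sketch only)."""
import zipfile
from pathlib import Path


def check_inputs(dataset_path, zip_path):
    """Return a list of problems found with the input files (empty if none)."""
    problems = []
    dataset_path, zip_path = Path(dataset_path), Path(zip_path)

    # The dataset must exist and should be an Excel workbook
    if not dataset_path.is_file():
        problems.append(f"Dataset not found: {dataset_path}")
    elif dataset_path.suffix.lower() != ".xlsx":
        problems.append(f"Dataset is not an .xlsx workbook: {dataset_path}")

    # The supplementary archive must exist and be a valid ZIP file
    if not zip_path.is_file():
        problems.append(f"ZIP file not found: {zip_path}")
    elif not zipfile.is_zipfile(zip_path):
        problems.append(f"Not a readable ZIP archive: {zip_path}")

    return problems
```

Calling `check_inputs("SAFE_dataset.xlsx", "Supplementary_files.zip")` returns an empty list when both files pass these basic checks. It does not validate the dataset content, which remains the job of safedata_validate.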
Validating and publishing as a new dataset
These examples show the typical workflow for publishing these data and accompanying
metadata as a completely new dataset using safedata_validator.
#!/bin/sh
# The code below assumes that the safedata_validator tools find a configuration
# file in one of the standard locations. If not, the path can be specified
# explicitly using the -r flag:
# -r /path/to/safedata_validator_local_test.cfg
# Validate the dataset. If successful this step will export a dataset metadata
# file called SAFE_dataset.json
safedata_validate SAFE_dataset.xlsx
# Publish the dataset to Zenodo
# 1) Create a new deposit, which will generate a deposit metadata file called
# something like zenodo_1143714.json
safedata_zenodo create_deposit
# 2) Generate a GEMINI XML metadata file for the deposit
safedata_zenodo generate_xml zenodo_1143714.json SAFE_dataset.json 1143714_GEMINI.xml
# 3) Upload the dataset file, external files named in the dataset summary and the XML
# metadata. This uses the zenodo metadata file to confirm the upload destination.
safedata_zenodo upload_file zenodo_1143714.json SAFE_dataset.xlsx
safedata_zenodo upload_file zenodo_1143714.json Supplementary_files.zip
safedata_zenodo upload_file zenodo_1143714.json 1143714_GEMINI.xml
# 4) Update the Zenodo deposit webpage - this populates the deposit description
# on Zenodo from the dataset metadata
safedata_zenodo upload_metadata zenodo_1143714.json SAFE_dataset.json
# 5) Finally, publish the deposit to create the final record and DOI
safedata_zenodo publish_deposit zenodo_1143714.json
# 6) Publish the dataset and zenodo metadata to the metadata server
safedata_metadata post_metadata zenodo_1143714.json SAFE_dataset.json
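The command sequence above can also be driven from a script once the deposit has been created and its metadata file name is known. The sketch below only assembles the same commands as argument lists and runs them in order; the helper names `publish_commands` and `run_commands` are hypothetical, and stopping on failure assumes that the tools exit with a non-zero status when a step fails.

```python
"""Build the safedata CLI calls for publishing a dataset (illustrative sketch)."""
import subprocess


def publish_commands(zenodo_json, dataset_xlsx, dataset_json, xml_file, extra_files=()):
    """Return the commands that follow create_deposit, in order, as argument lists."""
    uploads = [dataset_xlsx, *extra_files, xml_file]
    return [
        ["safedata_zenodo", "generate_xml", zenodo_json, dataset_json, xml_file],
        *(["safedata_zenodo", "upload_file", zenodo_json, f] for f in uploads),
        ["safedata_zenodo", "upload_metadata", zenodo_json, dataset_json],
        ["safedata_zenodo", "publish_deposit", zenodo_json],
        ["safedata_metadata", "post_metadata", zenodo_json, dataset_json],
    ]


def run_commands(commands):
    """Run each command in turn; check=True raises on the first non-zero exit."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping the command construction separate from execution means the list can be printed and inspected before anything is sent to Zenodo.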
"""Example script to publish a dataset using safedata_validator from within Python."""
import simplejson
from safedata_validator.field import Dataset
from safedata_validator.logger import LOGGER
from safedata_validator.resources import Resources
from safedata_validator.server import post_metadata
from safedata_validator.zenodo import (
    create_deposit,
    generate_inspire_xml,
    publish_deposit,
    upload_file,
    upload_metadata,
)
# Local paths to the configuration file and the dataset to be validated
config_path = "config.cfg"
dataset = "SAFE_dataset.xlsx"
extra_file = "Supplementary_files.zip"
xml_file = "SAFE_dataset_GEMINI.xml"
# Create a Resources object from the config file and then create a dataset instance
# using those validation resources
resources = Resources(config_path)
ds = Dataset(resources)
# Load the dataset from the Excel workbook, which validates the content
ds.load_from_workbook(dataset)
# Extract the validated dataset metadata
data_metadata = simplejson.loads(ds.to_json())
# Create the new deposit to publish the dataset
zenodo_metadata, error = create_deposit(resources=resources)
# Monitor the success of individual steps
all_good = error is None
# Generate XML
xml_content = generate_inspire_xml(
    dataset_metadata=data_metadata, zenodo_metadata=zenodo_metadata, resources=resources
)
with open(xml_file, "w") as xml_out:
    xml_out.write(xml_content)
# Post the files
for file in [dataset, extra_file, xml_file]:
    if all_good:
        file_upload_response, error = upload_file(
            metadata=zenodo_metadata, filepath=file, resources=resources
        )
        all_good = error is None
# Post the metadata
if all_good:
    md_upload_response, error = upload_metadata(
        metadata=data_metadata, zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None
# Publish the deposit
if all_good:
    publish_response, error = publish_deposit(
        zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None
# Show the new publication, which only exists if publication succeeded
if all_good:
    print(publish_response["links"]["html"])
# Post the dataset metadata to the safedata server
if all_good:
    response, error = post_metadata(
        zenodo=zenodo_metadata, metadata=data_metadata, resources=resources
    )
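The script above tracks success with an `all_good` flag that is tested before every step. One alternative is a small runner that calls each step in turn and stops at the first error. This is only a sketch of that pattern: `run_steps` and the `fake_*` functions are hypothetical stand-ins, and the one thing assumed about the real functions is that they return a `(response, error)` pair, as the script above does.

```python
"""Run (response, error) steps in order, stopping at the first error (sketch)."""


def run_steps(steps):
    """Call each (name, function) pair in turn.

    Each function must return a (response, error) tuple. Returns a dict of
    responses keyed by step name, plus the name and error of the failed step
    (both None if every step succeeded).
    """
    responses = {}
    for name, step in steps:
        response, error = step()
        if error is not None:
            return responses, name, error
        responses[name] = response
    return responses, None, None


# Hypothetical stand-ins for the real upload, publish and post calls
def fake_upload():
    return {"filename": "SAFE_dataset.xlsx"}, None


def fake_publish():
    return None, "deposit is incomplete"


def fake_post():
    return {"posted": True}, None


responses, failed_step, error = run_steps(
    [("upload", fake_upload), ("publish", fake_publish), ("post", fake_post)]
)
print(failed_step, error)
```

With the real functions, each step could be wrapped in a `lambda` that supplies its arguments, for example `lambda: publish_deposit(zenodo=zenodo_metadata, resources=resources)`. The later steps are never called once one step reports an error.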
Validating and publishing as an update
A new version of an existing dataset can be created by requesting a new Zenodo record
using the Concept ID of an existing dataset. This is a Zenodo record ID that
identifies a collection of versions of a dataset. In the previous example, the specific
record ID for the dataset was 1143714; the concept record ID might be 1143713.
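The concept ID for an existing dataset can usually be read from its deposit metadata. The sketch below assumes that the deposit metadata file written when the deposit was created (for example zenodo_1143714.json) holds the Zenodo API response unchanged, in which case it should contain an `id` field and a `conceptrecid` field. The function name `read_zenodo_ids` is hypothetical.

```python
"""Read the record and concept IDs from a deposit metadata file (sketch)."""
import json


def read_zenodo_ids(metadata_path):
    """Return (record_id, concept_id) as integers from a deposit metadata file.

    Assumes the file holds the Zenodo API response for the deposit, with "id"
    and "conceptrecid" fields. Zenodo reports the concept ID as a string, so
    both values are converted to int.
    """
    with open(metadata_path) as metadata_file:
        metadata = json.load(metadata_file)
    return int(metadata["id"]), int(metadata["conceptrecid"])
```

If the file does not have these fields, the function raises a `KeyError`, and the concept ID can be looked up on the Zenodo page for the dataset instead.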
The workflows below are almost identical to those for a new dataset, except for the initial step, which creates the deposit using an existing concept ID.
#!/bin/sh
# The code below assumes that the safedata_validator tools find a configuration
# file in one of the standard locations. If not, the path can be specified
# explicitly using the -r flag:
# -r /path/to/safedata_validator_local_test.cfg
# Validate the dataset. If successful this step will export a dataset metadata
# file called SAFE_dataset.json
safedata_validate SAFE_dataset.xlsx
# Publish the dataset to Zenodo
# 1) Create a new deposit as a new version of an existing record. Again this will
# generate a deposit metadata file called something like zenodo_1156212.json
safedata_zenodo create_deposit -c 1143713
# 2) Generate a GEMINI XML metadata file for the deposit
safedata_zenodo generate_xml zenodo_1156212.json SAFE_dataset.json 1156212_GEMINI.xml
# 3) Upload the dataset file, external files named in the dataset summary and the XML
# metadata. This uses the zenodo metadata file to confirm the upload destination.
safedata_zenodo upload_file zenodo_1156212.json SAFE_dataset.xlsx
safedata_zenodo upload_file zenodo_1156212.json Supplementary_files.zip
safedata_zenodo upload_file zenodo_1156212.json 1156212_GEMINI.xml
# 4) Update the Zenodo deposit webpage - this populates the deposit description
# on Zenodo from the dataset metadata
safedata_zenodo upload_metadata zenodo_1156212.json SAFE_dataset.json
# 5) Finally, publish the deposit to create the final record and DOI
safedata_zenodo publish_deposit zenodo_1156212.json
# 6) Publish the dataset and zenodo metadata to the metadata server
safedata_metadata post_metadata zenodo_1156212.json SAFE_dataset.json
"""Example script to publish a new dataset version using safedata_validator in Python."""
import simplejson
from safedata_validator.field import Dataset
from safedata_validator.logger import LOGGER
from safedata_validator.resources import Resources
from safedata_validator.server import post_metadata
from safedata_validator.zenodo import (
    create_deposit,
    generate_inspire_xml,
    publish_deposit,
    upload_file,
    upload_metadata,
)
# Local paths to the configuration file and the dataset to be validated
config_path = "path/to/config.cfg"
dataset = "SAFE_dataset.xlsx"
extra_file = "Supplementary_files.zip"
xml_file = "SAFE_dataset_GEMINI.xml"
# Create a Resources object from the config file and then create a dataset
# instance using those validation resources
resources = Resources(config_path)
ds = Dataset(resources)
# Load the dataset from the Excel workbook, which validates the content
ds.load_from_workbook(dataset)
# Extract the validated dataset metadata
data_metadata = simplejson.loads(ds.to_json())
# Create a new version of an existing dataset, using the concept ID.
zenodo_metadata, error = create_deposit(
    concept_id=1143713,
    resources=resources,
)
# Monitor the success of individual steps
all_good = error is None
# Generate XML
xml_content = generate_inspire_xml(
    dataset_metadata=data_metadata, zenodo_metadata=zenodo_metadata, resources=resources
)
with open(xml_file, "w") as xml_out:
    xml_out.write(xml_content)
# Post the files
for file in [dataset, extra_file, xml_file]:
    if all_good:
        file_upload_response, error = upload_file(
            metadata=zenodo_metadata, filepath=file, resources=resources
        )
        all_good = error is None
# Post the metadata
if all_good:
    md_upload_response, error = upload_metadata(
        metadata=data_metadata, zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None
# Publish the deposit
if all_good:
    publish_response, error = publish_deposit(
        zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None
# Post the dataset metadata to the safedata server
if all_good:
    response, error = post_metadata(
        zenodo=zenodo_metadata, metadata=data_metadata, resources=resources
    )