Using the `safedata` system

The sections below provide examples of using the safedata_validator package to administer datasets. Typical use is from the command line, in a Unix-like shell or the Windows Subsystem for Linux, but the examples also show how to use the programmatic API for safedata_validator from within Python. The examples assume that a user has provided a SAFE formatted dataset and a ZIP file containing additional files:

SAFE_dataset.xlsx
Supplementary_files.zip

Both of the examples below include a stage that creates a GEMINI-compliant XML metadata file and includes it in the published dataset. We recommend this as good practice, but it is optional.

Validating and publishing as a new dataset

These examples show the typical workflow for publishing these data and accompanying metadata as a completely new dataset using safedata_validator.

#!/bin/sh

# The code below assumes that the safedata_validator tools find a configuration
# file in one of the standard locations. If not, the path can be specified 
# explicitly using the -r flag:
#   -r /path/to/safedata_validator_local_test.cfg 

# Validate the dataset. If successful this step will export a dataset metadata
# file called SAFE_dataset.json
safedata_validate SAFE_dataset.xlsx

# Publish the dataset to Zenodo
# 1) Create a new deposit, which will generate a deposit metadata file called
#    something like zenodo_1143714.json
safedata_zenodo create_deposit

# 2) Generate a GEMINI XML metadata file for the deposit
safedata_zenodo generate_xml zenodo_1143714.json SAFE_dataset.json 1143714_GEMINI.xml

# 3) Upload the dataset file, external files named in the dataset summary and the XML
#    metadata. This uses the zenodo metadata file to confirm the upload destination.
safedata_zenodo upload_file zenodo_1143714.json SAFE_dataset.xlsx
safedata_zenodo upload_file zenodo_1143714.json Supplementary_files.zip
safedata_zenodo upload_file zenodo_1143714.json 1143714_GEMINI.xml

# 4) Update the Zenodo deposit webpage - this populates the deposit description
#    on Zenodo from the dataset metadata
safedata_zenodo upload_metadata zenodo_1143714.json SAFE_dataset.json

# 5) Finally, publish the deposit to create the final record and DOI
safedata_zenodo publish_deposit zenodo_1143714.json

# 6) Publish the dataset and zenodo metadata to the metadata server
safedata_metadata post_metadata zenodo_1143714.json SAFE_dataset.json

"""Example script to publish a dataset using safedata_validator from within Python."""

import simplejson

from safedata_validator.field import Dataset
from safedata_validator.logger import LOGGER
from safedata_validator.resources import Resources
from safedata_validator.server import post_metadata
from safedata_validator.zenodo import (
    create_deposit,
    generate_inspire_xml,
    publish_deposit,
    upload_file,
    upload_metadata,
)

# Local paths to the configuration file and the dataset to be validated
config_path = "config.cfg"
dataset = "SAFE_dataset.xlsx"
extra_file = "Supplementary_files.zip"
xml_file = "SAFE_dataset_GEMINI.xml"

# Create a Resources object from the config file and then create a dataset instance
# using those validation resources
resources = Resources(config_path)
ds = Dataset(resources)

# Load the dataset from the Excel workbook, which validates the content
ds.load_from_workbook(dataset)

# Extract the validated dataset metadata
data_metadata = simplejson.loads(ds.to_json())

# Create the new deposit to publish the dataset
zenodo_metadata, error = create_deposit(resources=resources)

# Monitor the success of individual steps
all_good = error is None

# Generate the GEMINI XML metadata, but only if the deposit was created
if all_good:
    xml_content = generate_inspire_xml(
        dataset_metadata=data_metadata,
        zenodo_metadata=zenodo_metadata,
        resources=resources,
    )
    with open(xml_file, "w") as xml_out:
        xml_out.write(xml_content)

# Post the files
for file in [dataset, extra_file, xml_file]:
    if all_good:
        file_upload_response, error = upload_file(
            metadata=zenodo_metadata, filepath=file, resources=resources
        )
        all_good = error is None

# Post the metadata
if all_good:
    md_upload_response, error = upload_metadata(
        metadata=data_metadata, zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None

# Publish the deposit
if all_good:
    publish_response, error = publish_deposit(
        zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None

# Show the URL of the new publication, which only exists if publishing succeeded
if all_good:
    print(publish_response["links"]["html"])

# Post the dataset metadata to the safedata server
if all_good:
    response, error = post_metadata(
        zenodo=zenodo_metadata, metadata=data_metadata, resources=resources
    )
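
The script above repeats the same pattern at every stage: call a function, unpack a response and an error, and continue only if the error is None. As a minimal sketch of that pattern, the helper below runs a list of steps in order and stops at the first failure. It assumes only that each step returns a (response, error) pair, as in the example above. The `run_steps` helper and the `fake_*` functions are hypothetical stand-ins for illustration and are not part of safedata_validator.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Each step is assumed to return (response, error), with error set to None
# on success, matching the calls shown in the example script.
StepResult = Tuple[Any, Optional[str]]


def run_steps(
    steps: List[Tuple[str, Callable[[], StepResult]]],
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Run named steps in order, stopping at the first reported error."""
    results: Dict[str, Any] = {}
    for name, step in steps:
        response, error = step()
        if error is not None:
            # Report which step failed and skip all later steps
            return results, f"{name} failed: {error}"
        results[name] = response
    return results, None


# Hypothetical stand-ins for the real upload, publish and post calls
def fake_upload() -> StepResult:
    return {"status": "ok"}, None


def fake_publish() -> StepResult:
    return None, "network timeout"


def fake_post() -> StepResult:
    return {"status": "ok"}, None


results, error = run_steps(
    [("upload", fake_upload), ("publish", fake_publish), ("post", fake_post)]
)
print(error)  # publish failed: network timeout
```

In real use, each entry could be a `lambda` that wraps one of the calls from the script with its arguments already filled in.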

Validating and publishing as an update

A new version of an existing dataset can be created by requesting a new Zenodo record using the concept ID of the existing dataset. The concept ID is a Zenodo record ID that identifies the whole collection of versions of a dataset, rather than one specific version. In the previous example, the specific record ID for the dataset was 1143714; the concept record ID might be 1143713.
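
To make the distinction between the two IDs concrete, the sketch below pulls both out of a deposit metadata dictionary. It assumes the metadata carries the record ID under `id` and the concept ID under `conceptrecid`, which is how Zenodo's REST API usually reports them. The key names and the `version_ids` helper are assumptions for illustration, so check the contents of your own deposit metadata file before relying on them.

```python
from typing import Tuple


def version_ids(zenodo_metadata: dict) -> Tuple[int, int]:
    """Return (record_id, concept_id) from a deposit metadata dictionary.

    Assumes the keys 'id' and 'conceptrecid' are present; this is an
    assumption about the metadata layout, not a documented guarantee.
    """
    record_id = int(zenodo_metadata["id"])
    concept_id = int(zenodo_metadata["conceptrecid"])
    return record_id, concept_id


# Illustrative values taken from the example IDs used in the text
example_metadata = {"id": 1143714, "conceptrecid": "1143713"}

record_id, concept_id = version_ids(example_metadata)
print(f"Record {record_id} is one version within concept {concept_id}")
```

The concept ID is the value to pass when creating a new version; the record ID identifies only the one version that was published.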

The workflows below are almost identical to those above. The only difference is the initial step, which creates the deposit using an existing concept ID.

#!/bin/sh

# The code below assumes that the safedata_validator tools find a configuration
# file in one of the standard locations. If not, the path can be specified 
# explicitly using the -r flag:
#   -r /path/to/safedata_validator_local_test.cfg 

# Validate the dataset. If successful this step will export a dataset metadata
# file called SAFE_dataset.json
safedata_validate SAFE_dataset.xlsx

# Publish the dataset to Zenodo
# 1) Create a new deposit as a new version of an existing record. Again this will
#    generate a deposit metadata file called something like zenodo_1156212.json
safedata_zenodo create_deposit -c 1143713

# 2) Generate a GEMINI XML metadata file for the deposit
safedata_zenodo generate_xml zenodo_1156212.json SAFE_dataset.json 1156212_GEMINI.xml

# 3) Upload the dataset file, external files named in the dataset summary and the XML
#    metadata. This uses the zenodo metadata file to confirm the upload destination.
safedata_zenodo upload_file zenodo_1156212.json SAFE_dataset.xlsx
safedata_zenodo upload_file zenodo_1156212.json Supplementary_files.zip
safedata_zenodo upload_file zenodo_1156212.json 1156212_GEMINI.xml

# 4) Update the Zenodo deposit webpage - this populates the deposit description
#    on Zenodo from the dataset metadata
safedata_zenodo upload_metadata zenodo_1156212.json SAFE_dataset.json

# 5) Finally, publish the deposit to create the final record and DOI
safedata_zenodo publish_deposit zenodo_1156212.json

# 6) Publish the dataset and zenodo metadata to the metadata server
safedata_metadata post_metadata zenodo_1156212.json SAFE_dataset.json

"""Example script to publish a new version of a dataset using safedata_validator."""

import simplejson

from safedata_validator.field import Dataset
from safedata_validator.logger import LOGGER
from safedata_validator.resources import Resources
from safedata_validator.server import post_metadata
from safedata_validator.zenodo import (
    create_deposit,
    generate_inspire_xml,
    publish_deposit,
    upload_file,
    upload_metadata,
)

# Local paths to the configuration file and the dataset to be validated
config_path = "path/to/config.cfg"
dataset = "SAFE_dataset.xlsx"
extra_file = "Supplementary_files.zip"
xml_file = "SAFE_dataset_GEMINI.xml"

# Create a Resources object from the config file and then create a dataset
# instance using those validation resources
resources = Resources(config_path)
ds = Dataset(resources)

# Load the dataset from the Excel workbook, which validates the content
ds.load_from_workbook(dataset)

# Extract the validated dataset metadata
data_metadata = simplejson.loads(ds.to_json())

# Create a new version of an existing dataset, using the concept ID.
zenodo_metadata, error = create_deposit(
    concept_id=1143713,
    resources=resources,
)

# Monitor the success of individual steps
all_good = error is None

# Generate the GEMINI XML metadata, but only if the deposit was created
if all_good:
    xml_content = generate_inspire_xml(
        dataset_metadata=data_metadata,
        zenodo_metadata=zenodo_metadata,
        resources=resources,
    )
    with open(xml_file, "w") as xml_out:
        xml_out.write(xml_content)

# Post the files, including the generated XML metadata
for file in [dataset, extra_file, xml_file]:
    if all_good:
        file_upload_response, error = upload_file(
            metadata=zenodo_metadata, filepath=file, resources=resources
        )
        all_good = error is None

# Post the metadata
if all_good:
    md_upload_response, error = upload_metadata(
        metadata=data_metadata, zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None

# Publish the deposit
if all_good:
    publish_response, error = publish_deposit(
        zenodo=zenodo_metadata, resources=resources
    )
    all_good = error is None

# Post the dataset metadata to the safedata server
if all_good:
    response, error = post_metadata(
        zenodo=zenodo_metadata, metadata=data_metadata, resources=resources
    )
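
The two shell workflows differ only in the create_deposit step, and every later command is built from the deposit ID. The sketch below makes that explicit by building the command lines as lists of arguments, without running them. It uses only the commands and the -c flag shown in the shell examples. The file naming patterns (zenodo_<id>.json and <id>_GEMINI.xml) follow those examples, but the text describes them as "something like" these names, so treat them as assumptions. The `create_command` and `publish_commands` helpers are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List, Optional, Sequence


def create_command(concept_id: Optional[int] = None) -> List[str]:
    """Build the create_deposit command, adding -c for a new version."""
    command = ["safedata_zenodo", "create_deposit"]
    if concept_id is not None:
        command += ["-c", str(concept_id)]
    return command


def publish_commands(
    deposit_id: int,
    dataset: str = "SAFE_dataset.xlsx",
    extra_files: Sequence[str] = ("Supplementary_files.zip",),
) -> List[List[str]]:
    """Build the commands that follow create_deposit, in workflow order."""
    # Assumed naming patterns, copied from the shell examples
    zenodo_json = f"zenodo_{deposit_id}.json"
    dataset_json = Path(dataset).with_suffix(".json").name
    xml_file = f"{deposit_id}_GEMINI.xml"

    commands = [
        ["safedata_zenodo", "generate_xml", zenodo_json, dataset_json, xml_file]
    ]
    # Upload the dataset, any external files and the XML metadata
    for path in [dataset, *extra_files, xml_file]:
        commands.append(["safedata_zenodo", "upload_file", zenodo_json, path])
    commands.append(
        ["safedata_zenodo", "upload_metadata", zenodo_json, dataset_json]
    )
    commands.append(["safedata_zenodo", "publish_deposit", zenodo_json])
    commands.append(
        ["safedata_metadata", "post_metadata", zenodo_json, dataset_json]
    )
    return commands


# New version of an existing dataset: only the first command changes
print(shlex.join(create_command(concept_id=1143713)))
for command in publish_commands(1156212):
    print(shlex.join(command))
```

The deposit ID is only known once create_deposit has run, which is why the later commands are built by a separate function.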
