Writing a new profile

This page is about writing a SHACL validation profile for a new or existing RO-Crate profile. It does not offer guidance on creating the RO-Crate profile itself - for that, see the RO-Crate page on Profiles.

Learning SHACL

The validator profiles are written in SHACL (Shapes Constraint Language), a language for validating RDF graphs against a set of conditions. To use SHACL effectively, you also need some familiarity with RDF (Resource Description Framework), the technology which underpins JSON-LD and therefore RO-Crate.
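
To make the idea of "validating a graph against a set of conditions" concrete, here is a minimal sketch in plain Python. It is not SHACL and does not use the validator; the triples and the `missing_property` helper are invented for illustration. It only mimics what a shape with `sh:targetClass` and a property constraint with `sh:minCount 1` would express.

```python
# Plain-Python illustration of the idea behind a SHACL shape; this is NOT SHACL.
# All identifiers and data below are invented for the example.

RDF_TYPE = "rdf:type"

# An RDF graph is a set of (subject, predicate, object) triples.
graph = {
    ("./", RDF_TYPE, "schema:Dataset"),
    ("./", "schema:name", "My crate"),
    ("data/", RDF_TYPE, "schema:Dataset"),  # no schema:name, so it is reported
    ("paper.pdf", RDF_TYPE, "schema:CreativeWork"),
}


def missing_property(triples, target_class, required_predicate):
    """Return subjects of type `target_class` that lack `required_predicate`.

    Roughly the condition a shape with sh:targetClass and a property
    constraint with sh:minCount 1 describes in SHACL.
    """
    targets = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    satisfied = {s for s, p, _ in triples if p == required_predicate}
    return sorted(targets - satisfied)


violations = missing_property(graph, "schema:Dataset", "schema:name")
print(violations)
```

In a real profile the same condition would be written declaratively in a Turtle file and evaluated by the validator, rather than coded by hand.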

For an RDF introduction, try the RDF 1.1 Primer or Introduction to the Principles of Linked Open Data.

This chapter on SHACL from the book Validating RDF Data has examples of most of SHACL’s features and is a good place to start learning. Other chapters in that book may provide an understanding of why SHACL is our language of choice for this purpose.

For complex validation, you may also need some knowledge of SPARQL, an RDF query language. You can learn about SPARQL in the tutorial Using SPARQL to access Linked Open Data.

All these tools are best learned through practice and examples, so when building a profile, use the existing profiles as a point of reference.

Setting up profile files and tests

These instructions assume you are familiar with code development using Python and Git.

  1. Install the repository from source.

  2. From the root folder of the repo, create a folder for the profile under rocrate_validator/profiles.

  3. To set up the profile metadata, copy profile.ttl from another profile folder into the folder you created (example), then update that metadata to reflect your profile. In particular:

    1. Change the token for the profile to a new and unique name, e.g. prof:hasToken "workflow-ro-crate-linkml". This is the name used to select the profile with the --profile-identifier argument (it should also be the name of the folder).

    2. Ensure the URI of the profile is unique (the first line after the @prefix statements), to prevent conflation between this profile and any other profile in the package.

    3. If this profile inherits from another profile in the validator (including the base specification), set prof:isProfileOf / prof:isTransitiveProfileOf to that profile’s URI (which can be found in that profile’s own profile.ttl).

  4. Create a profile-name.ttl file in the folder you created - this is where you will write the SHACL for the validation. If you have a lot of checks to write, you can create multiple files - the validator will collect them all automatically at runtime.

    • Note: some profiles split the checks into folders called must/, should/ and may/ according to the requirement severity. This is not mandatory - you can also label individual checks/shapes with sh:severity in the SHACL code instead.

  5. From the root folder of the repo, create a test folder for the profile under tests/integration/profiles. The name should match the folder you made earlier.

  6. Copy the style of other profiles’ tests to build up a test suite for the profile. Add any required RO-Crate test data under tests/data/crates/ and create corresponding classes in tests/ro_crates.py which can be used to fetch the data during the tests.

  7. When your profile and tests are written, open a pull request to contribute them back to the repository!
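
The folder-creation steps above can be sketched as a small script. This is only an illustration under stated assumptions: `scaffold_profile` is a hypothetical helper, not part of the validator, and naming the shapes file after the token is a choice made here. Copying and editing profile.ttl, and writing the tests, still have to be done by hand.

```python
import tempfile
from pathlib import Path


def scaffold_profile(repo_root: Path, token: str) -> dict:
    """Create the profile and test folders for a new profile.

    `token` is used as the folder name in both places, so that it matches
    the prof:hasToken value. Existing folders and files are left untouched.
    """
    profile_dir = repo_root / "rocrate_validator" / "profiles" / token
    test_dir = repo_root / "tests" / "integration" / "profiles" / token
    profile_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)

    # Empty file for the SHACL shapes (file name is an assumption);
    # profile.ttl must still be copied from an existing profile and edited.
    shapes_file = profile_dir / f"{token}.ttl"
    shapes_file.touch(exist_ok=True)

    return {
        "profile_dir": profile_dir,
        "test_dir": test_dir,
        "shapes_file": shapes_file,
    }


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_profile(Path(tmp), "my-new-profile")
        for label, path in created.items():
            print(label, path.relative_to(tmp))
```

To use it on a real checkout, pass the repository root instead of the temporary directory.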

Running the validator and tests during profile development

To run the test suite, run pytest. New tests should be picked up automatically for the new profile.

When running the validator manually, use --profile-identifier to select the desired profile.
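
If you are unsure which identifier to pass, one option is to read the tokens straight from the profile.ttl files. The sketch below is a hypothetical helper, not part of the validator. It assumes each token is declared on one line as `prof:hasToken "..."` with double quotes, and it uses a regular expression rather than a real Turtle parser, so it may miss declarations written differently.

```python
import re
from pathlib import Path

# Matches a declaration such as: prof:hasToken "workflow-ro-crate-linkml"
TOKEN_PATTERN = re.compile(r'prof:hasToken\s+"([^"]+)"')


def find_token(profile_ttl_text: str):
    """Return the first prof:hasToken value found in the text, or None."""
    match = TOKEN_PATTERN.search(profile_ttl_text)
    return match.group(1) if match else None


def list_profile_tokens(profiles_root: Path) -> dict:
    """Map each profile folder name to the token found in its profile.ttl."""
    tokens = {}
    for ttl in sorted(profiles_root.glob("*/profile.ttl")):
        tokens[ttl.parent.name] = find_token(ttl.read_text(encoding="utf-8"))
    return tokens


if __name__ == "__main__":
    # Run from the root folder of the repo.
    root = Path("rocrate_validator/profiles")
    for folder, token in list_profile_tokens(root).items():
        note = "" if folder == token else "  <- folder name and token differ"
        print(f"{folder}: {token}{note}")
```

The folder/token comparison reflects the recommendation that the token should also be the name of the profile folder.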

The crates in tests/data/crates can be used as examples for running the validator. For example:

rocrate-validator validate --profile-identifier your-profile-name tests/data/crates/invalid/1_wroc_crate/no_mainentity/
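
When checking a profile against many test crates, the same command can be repeated from a short script. The following is a sketch, not an official tool: the helper names are hypothetical, it assumes every subfolder of the chosen directory is a crate, and it only reports the process exit code without interpreting it.

```python
import shutil
import subprocess
from pathlib import Path


def build_validate_command(profile: str, crate: str) -> list:
    """Build the argument list for one validator run."""
    return ["rocrate-validator", "validate", "--profile-identifier", profile, crate]


def validate_crates(profile: str, crates_root: Path) -> dict:
    """Run the validator on each subfolder of `crates_root`; return exit codes."""
    results = {}
    for crate in sorted(p for p in crates_root.iterdir() if p.is_dir()):
        proc = subprocess.run(build_validate_command(profile, str(crate)))
        results[crate.name] = proc.returncode
    return results


if __name__ == "__main__":
    # Run from the root folder of the repo.
    root = Path("tests/data/crates/invalid/1_wroc_crate")
    if shutil.which("rocrate-validator") is None:
        print("rocrate-validator is not on PATH; install the repository first")
    elif not root.is_dir():
        print(f"{root} not found; run this from the root folder of the repo")
    else:
        for name, code in validate_crates("your-profile-name", root).items():
            print(f"{name}: exit code {code}")
```

Replace your-profile-name with the token of the profile under development.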