Common Provenance Model RO-Crate profile
- Title: Common Provenance Model RO-Crate profile
- Authors: Rudolf Wittner, Stian Soiland-Reyes, Simone Leo
- Date: 2023-01-11
- Version: 0.2
- Persistent identifier: https://w3id.org/cpm/ro-crate/0.2
Research objects, such as data, experimental results, computational models, or biological samples, are exchanged between organizations, so each organization can provide provenance information about only part of the research object’s life cycle. As a result, the complete provenance description of the object is spread across multiple heterogeneous organizations.
The Common Provenance Model (CPM) provides a baseline for such distributed provenance chains. It defines how to interconnect distributed provenance parts encapsulated in PROV bundles, how to express standardized derivation paths between the inputs and outputs of a process within a single bundle (the so-called provenance backbone), and how to attach domain-specific information to the chain in a harmonized way.
This document specifies how to identify and handle CPM-compliant provenance files and CPM-compliant meta-provenance files in an RO-Crate.
General Requirements
- Each CPM-compliant provenance bundle MUST be serialized into a standalone file.
- The RO-Crate MAY contain multiple CPM-compliant bundles/files.
- The RO-Crate MUST include references to all CPM-compliant provenance bundles/files present in the crate (other provenance files and log files do not need to be referenced).
  - Rationale: Each CPM provenance bundle is part of a distributed provenance chain. Consequently, any such bundle can be referenced from other parts of the chain, which may be stored externally (outside the crate). For that reason, the RO-Crate must provide a means to identify and locate every CPM-compliant provenance bundle present in the crate.
- The RO-Crate MAY include a meta-provenance file. Multiple meta-provenance bundles MAY be present in the meta-provenance file.
  - Rationale: This keeps meta-provenance handling simple. If multiple meta-provenance files were allowed, we would have to set requirements on how meta-provenance can be split across the files, which might introduce unnecessary complexity.
- The RO-Crate MUST include a reference to the meta-provenance file, if present.
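The crate-level requirements above lend themselves to a mechanical check. The Python sketch below is illustrative only and rests on assumptions this profile does not state: "reference" is taken to mean the RO-Crate 1.1 convention of listing data entities in the root dataset's hasPart (with the root found through the ro-crate-metadata.json descriptor), and the meta-provenance rationale is read as "at most one CPMMetaProvenanceFile entity". The function names, file names and URIs are hypothetical.

```python
# Hypothetical crate metadata; all file names are illustrative.
CRATE = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "hasPart": [{"@id": "prov/bundle1.provn"}, {"@id": "prov/meta.provn"}]},
        {"@id": "prov/bundle1.provn", "@type": ["File", "CPMProvenanceFile"]},
        {"@id": "prov/meta.provn", "@type": ["File", "CPMMetaProvenanceFile"]},
    ],
}

CPM_TYPES = {"CPMProvenanceFile", "CPMMetaProvenanceFile"}


def types_of(entity):
    """Return @type as a list; JSON-LD allows a single string or an array."""
    value = entity.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


def check_general_requirements(crate):
    """Return a list of problems found; an empty list means none were detected."""
    graph = {entity["@id"]: entity for entity in crate["@graph"]}
    # RO-Crate 1.1: the metadata descriptor's "about" points at the root dataset.
    root = graph[graph["ro-crate-metadata.json"]["about"]["@id"]]
    # Assumption: "referenced" means listed in the root dataset's hasPart.
    referenced = {part["@id"] for part in root.get("hasPart", [])}
    problems = []
    cpm_files = [e for e in graph.values() if CPM_TYPES & set(types_of(e))]
    for entity in cpm_files:
        if entity["@id"] not in referenced:
            problems.append(f"{entity['@id']} is not referenced from the root dataset")
    # Assumption: the rationale is read as "at most one meta-provenance file".
    metas = [e for e in cpm_files if "CPMMetaProvenanceFile" in types_of(e)]
    if len(metas) > 1:
        problems.append("more than one CPM meta-provenance file")
    return problems


print(check_general_requirements(CRATE))  # []
```

Removing prov/bundle1.provn from the root dataset's hasPart would make the check report that file as unreferenced.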
Type/Property | Required? | Description |
---|---|---|
CPMProvenanceFile extends MediaObject (@id is resolvable), dataEntity | | |
@type | MUST | Type that identifies the CPM provenance file. Array MUST include "File". Array MUST include "CPMProvenanceFile". |
@id | MUST | Identifier of the CPM provenance file. SHOULD be a relative URI to a data entity in the crate (e.g. |
identifier | SHOULD | Identifier of the provenance bundle present in the CPM provenance file. MUST be an absolute URI. MUST match the expanded bundle identifier. MAY be equal to @id if @id is an absolute URI. Note: PROV formats that support identified bundles SHOULD ensure that their internally defined identifier also matches this identifier. |
dateModified | SHOULD | The time this CPM provenance file was last modified/written (not necessarily when the included bundle was finalized or when the file was added to the RO-Crate). |
encodingFormat | MUST | Encoding of the CPM provenance file. Array MUST contain a string indicating the IANA media type of the file, e.g. Array MUST also contain a reference to a CreativeWork that indicates the PROV format used in the serialization, which |
about | SHOULD | Array of identifiers of the entities documented by the CPM provenance file. SHOULD contain at least one identifier. |
CPMMetaProvenanceFile extends CPMProvenanceFile | | |
@type | MUST | Type that identifies the CPM meta-provenance file. Array MUST include |
@id | MUST | Identifier of the CPM meta-provenance file. SHOULD be an absolute URI, but MAY be a relative URI to a data entity in the crate (e.g. |
dateModified | SHOULD | The time this CPM meta-provenance file was last modified/written (not necessarily when the included bundle was finalized or when the file was added to the RO-Crate). |
encodingFormat | MUST | Encoding of the CPM meta-provenance file. Array MUST contain a string indicating the IANA media type of the file, e.g. Array MUST also contain a reference to a CreativeWork that indicates the PROV format used in the serialization, which |
hasPart | MUST | Identifiers of the meta-provenance bundles present in the CPM meta-provenance file. Array MUST contain absolute URIs. URIs MUST match the expanded bundle identifiers as used internally in the CPM provenance files. |
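The property table can also be read as a checklist for a validator. The following Python sketch treats JSON-LD entities as plain dictionaries and reports MUST violations as errors and missing SHOULD properties as warnings. Several details are assumptions, not part of this profile: the @type value checked for the meta-provenance file (the table does not spell out the required values), the example media type text/provenance-notation, the https://www.w3.org/TR/prov-n/ CreativeWork reference, and all example.org URIs are hypothetical. The absolute-URI test is simplified and only checks for a scheme.

```python
from urllib.parse import urlparse


def as_list(value):
    """JSON-LD allows a single value or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_absolute_uri(uri):
    """Simplified test: a URI counts as absolute if it has a scheme (https:, urn:, ...)."""
    return isinstance(uri, str) and bool(urlparse(uri).scheme)


def check_encoding_format(entity, errors):
    values = as_list(entity.get("encodingFormat"))
    # MUST: a plain string holding the IANA media type of the file ...
    if not any(isinstance(v, str) for v in values):
        errors.append("encodingFormat must contain an IANA media type string")
    # ... and a reference ({"@id": ...}) to the CreativeWork naming the PROV format.
    if not any(isinstance(v, dict) and "@id" in v for v in values):
        errors.append("encodingFormat must reference a PROV format CreativeWork")


def validate_provenance_file(entity):
    """Return (errors, warnings) for the CPMProvenanceFile MUST/SHOULD rows."""
    errors, warnings = [], []
    types = as_list(entity.get("@type"))
    for required in ("File", "CPMProvenanceFile"):
        if required not in types:
            errors.append(f"@type must include {required!r}")
    check_encoding_format(entity, errors)
    identifier = entity.get("identifier")
    if identifier is None:
        warnings.append("identifier is recommended")
    elif not is_absolute_uri(identifier):
        errors.append("identifier must be an absolute URI")
    if "dateModified" not in entity:
        warnings.append("dateModified is recommended")
    if not as_list(entity.get("about")):
        warnings.append("about should contain at least one identifier")
    return errors, warnings


def validate_meta_provenance_file(entity):
    """Return (errors, warnings) for the CPMMetaProvenanceFile rows."""
    errors, warnings = [], []
    # Assumption: the @type array includes "CPMMetaProvenanceFile".
    if "CPMMetaProvenanceFile" not in as_list(entity.get("@type")):
        errors.append("@type must include 'CPMMetaProvenanceFile'")
    if not is_absolute_uri(entity.get("@id")):
        warnings.append("@id should be an absolute URI")
    check_encoding_format(entity, errors)
    parts = as_list(entity.get("hasPart"))
    if not parts:
        errors.append("hasPart must list the meta-provenance bundles")
    for part in parts:
        uri = part.get("@id") if isinstance(part, dict) else part
        if not is_absolute_uri(uri):
            errors.append(f"hasPart entry {uri!r} must be an absolute URI")
    if "dateModified" not in entity:
        warnings.append("dateModified is recommended")
    return errors, warnings


# Hypothetical entities; media type, format reference and URIs are illustrative.
PROV_N = ["text/provenance-notation", {"@id": "https://www.w3.org/TR/prov-n/"}]
bundle_file = {
    "@id": "prov/bundle1.provn",
    "@type": ["File", "CPMProvenanceFile"],
    "identifier": "https://example.org/bundles/bundle1",
    "dateModified": "2023-01-11T10:00:00Z",
    "encodingFormat": PROV_N,
    "about": [{"@id": "https://example.org/samples/sample1"}],
}
meta_file = {
    "@id": "https://example.org/meta/meta.provn",
    "@type": ["File", "CPMMetaProvenanceFile"],
    "dateModified": "2023-01-11T10:00:00Z",
    "encodingFormat": PROV_N,
    "hasPart": [{"@id": "https://example.org/meta/bundle1"}],
}
print(validate_provenance_file(bundle_file))  # ([], [])
print(validate_meta_provenance_file(meta_file))  # ([], [])
```

Keeping errors and warnings separate mirrors the MUST/SHOULD split in the table, so a caller can reject a crate on errors while merely reporting warnings.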
Example
An example of the CPM RO-Crate profile in use is available in the Zenodo repository: https://doi.org/10.5281/zenodo.7676923
Notes
- Some PROV formats, such as PROV-N and PROV-O in JSON-LD, support multiple bundles in the same document. This feature can be used if the bundles do not need different access control.
- PROV formats that support identified bundles SHOULD ensure that their internally defined identifier also matches the identifier declared for the bundle in the RO-Crate.
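To check that internally defined bundle identifiers match the crate metadata, a tool can expand each bundle's qualified name and compare it with the identifiers declared in the crate (identifier for a CPM provenance file, hasPart for a meta-provenance file). The Python sketch below does this for PROV-N with regular expressions. It is not a PROV-N parser: it ignores comments, the default namespace, and the scoping of bundle-level prefix declarations. The function names and example.org URIs are hypothetical.

```python
import re

# "prefix name <namespace-uri>" declarations and "bundle qualifiedName" openers.
PREFIX_RE = re.compile(r"^\s*prefix\s+([A-Za-z_][\w.-]*)\s+<([^>]*)>", re.MULTILINE)
BUNDLE_RE = re.compile(r"^\s*bundle\s+(\S+)", re.MULTILINE)


def expanded_bundle_ids(provn_text):
    """Expand every bundle's qualified name into an absolute URI (simplified)."""
    prefixes = dict(PREFIX_RE.findall(provn_text))
    ids = []
    for qname in BUNDLE_RE.findall(provn_text):
        prefix, sep, local = qname.partition(":")
        if not sep or prefix not in prefixes:
            raise ValueError(f"cannot expand bundle identifier {qname!r}")
        ids.append(prefixes[prefix] + local)
    return ids


def compare_with_crate(provn_text, declared_ids):
    """Report bundle ids that appear only in the file or only in the crate metadata."""
    found = set(expanded_bundle_ids(provn_text))
    declared = set(declared_ids)
    return {
        "only_in_file": sorted(found - declared),
        "only_in_crate": sorted(declared - found),
    }


# Hypothetical PROV-N document with two bundles; all URIs are illustrative.
DOC = """document
  prefix b <https://example.org/bundles/>
  bundle b:bundle1
    prefix ex <https://example.org/samples/>
    entity(ex:sample1)
  endBundle
  bundle b:bundle2
    prefix ex <https://example.org/samples/>
    entity(ex:sample2)
  endBundle
endDocument
"""

print(expanded_bundle_ids(DOC))
# ['https://example.org/bundles/bundle1', 'https://example.org/bundles/bundle2']
print(compare_with_crate(DOC, ["https://example.org/bundles/bundle1"]))
# {'only_in_file': ['https://example.org/bundles/bundle2'], 'only_in_crate': []}
```

A production tool would be better served by a real PROV library, which resolves namespaces correctly instead of relying on line-based patterns.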