An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult for automated systems to process. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approach to packaging research artefacts along with their metadata in a machine-readable manner. RO-Crate is based on annotations in JSON-LD, aiming to establish best practices for formally describing metadata in an accessible and practical way, suitable for use in a wide variety of situations.

An RO-Crate is a structured archive of all the items that contributed to a research outcome, including their identifiers, provenance, relations and annotations. As a general purpose packaging approach for data and their metadata, RO-Crate is used across multiple areas, including bioinformatics, digital humanities and regulatory sciences. By applying “just enough” Linked Data standards, RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility.

Stian Soiland-Reyes, Peter Sefton, Mercè Crosas, Leyla Jael Castro, Frederik Coppens, José M. Fernández, Daniel Garijo, Björn Grüning, Marco La Rosa, Simone Leo, Eoghan Ó Carragáin, Marc Portier, Ana Trisovic, RO-Crate Community, Paul Groth, Carole Goble (2022):
Packaging research artefacts with RO-Crate.
Data Science 5(2)

[Figure: RO Metadata File with structured metadata about the RO and its content; archive file format; Linked Data approach; RO-Content with links to web resources and a directory of data.]

Conceptual overview of RO-Crate. A Persistent Identifier (PID) points to a Research Object (RO), which may be archived using different packaging approaches like BagIt, OCFL, git or ZIP. The RO is described within an RO-Crate Metadata File, providing identifiers for authors using ORCID, organisations using Research Organization Registry (ROR) and licences such as Creative Commons using SPDX identifiers. The RO-Crate content is further described with additional metadata following a Linked Data approach. Data can be embedded files and directories, as well as links to external Web resources, PIDs and nested RO-Crates.
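
To make the overview more concrete, here is a minimal Python sketch of what such a metadata file might contain. It is an illustration under stated assumptions, not the authors' tooling: the file name ro-crate-metadata.json, the context URL and the entity layout follow the RO-Crate 1.1 conventions as commonly documented, none of which are spelled out in the text above. The function build_crate_metadata, the ORCID and ROR values, and the data file path are hypothetical placeholders.

```python
import json


def build_crate_metadata(name, author_orcid, org_ror, license_spdx, files):
    """Return a minimal RO-Crate-style JSON-LD document as a dict.

    Hypothetical helper for illustration; all arguments are plain strings
    except `files`, which is a list of relative paths inside the crate.
    """
    # Metadata descriptor: the entity that describes the metadata file itself
    # (file name and profile URL assumed from RO-Crate 1.1).
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset: the Research Object as a whole.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "author": {"@id": author_orcid},
        "license": {"@id": license_spdx},
        "hasPart": [{"@id": path} for path in files],
    }
    # Contextual entities: the author (ORCID) and their organisation (ROR).
    author = {
        "@id": author_orcid,
        "@type": "Person",
        "affiliation": {"@id": org_ror},
    }
    organisation = {"@id": org_ror, "@type": "Organization"}
    # Data entities: one per embedded file.
    file_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, author, organisation, *file_entities],
    }


crate = build_crate_metadata(
    name="Example research object",
    author_orcid="https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
    org_ror="https://ror.org/00example0",  # placeholder ROR identifier
    license_spdx="https://spdx.org/licenses/CC-BY-4.0",
    files=["data/results.csv"],  # hypothetical file in the crate
)
print(json.dumps(crate, indent=2))
```

In this sketch every entity is a flat entry in the @graph list and refers to others only by @id, which is what lets the same file mix embedded data, external Web resources and PIDs without nesting.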

About the author

Stian works in the School of Computer Science at the University of Manchester, in Carole Goble’s eScience Lab, as a technical software architect and researcher. In addition to BioExcel, Stian’s involvements include Open PHACTS (pharmacological data warehouse), Common Workflow Language (CWL), Apache Taverna (scientific workflow system), Linked Data and identifiers, research objects (open science) and digital preservation, myExperiment (sharing scientific workflows), provenance (where did things come from and who did it) and annotations (who said what).