CADET-RDM is a Research Data Management toolbox developed at Forschungszentrum Jülich. It supports computational research projects by tracking code, data, environments, and generated results in a reproducible and shareable way.
The toolbox is domain-agnostic and can be applied to any computational project with a structured workflow.
CADET-RDM helps manage and version:
- input data
- source code
- configurations and metadata
- software and environment versions
- generated output data
The primary goal is to ensure reproducibility, traceability, and reuse of computational results by explicitly linking them to the project state that produced them.
A CADET-RDM project consists of two independent but coupled Git repositories:
- Project repository: Contains source code, configuration files, documentation, and metadata required to execute the computations.
- Output repository: Contains the results generated by running the project code, including data products, models, figures, and run-specific metadata.
Both repositories have separate Git histories and remotes. CADET-RDM provides workflows that operate on both repositories to maintain a consistent link between code and results.
Each execution of project code creates a new output branch that contains only the files generated by that run.
In addition, a central run history records:
- the project repository commit used for the run
- software and environment information
- metadata required to reproduce the result
This commit structure allows results to be reproduced and inspected without manual bookkeeping.
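As a rough illustration of what one entry in such a run history might hold, the following sketch builds a record from the items listed above and serializes it as one JSON line. The class `RunRecord`, its field names, and the JSON layout are assumptions made for this example; they are not CADET-RDM's actual storage format.

```python
import json
import platform
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """Hypothetical run-history entry; all field names are illustrative."""

    # Commit of the project repository that produced the results.
    project_commit: str
    # Name of the output branch holding the files generated by the run.
    output_branch: str
    # Software and environment information captured at run time.
    python_version: str = field(default_factory=lambda: sys.version.split()[0])
    os_platform: str = field(default_factory=platform.platform)
    # Time at which the record was created, in UTC (ISO 8601).
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record: RunRecord) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    example = RunRecord(project_commit="0123456", output_branch="example-branch")
    print(to_json_line(example))
```

Storing one record per line keeps such a history append-only and easy to compare between commits, which is one plausible reason to choose a layout like this.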
CADET-RDM can be used through:
- a command line interface (CLI), e.g. for scripted or automated bash workflows
- a Python interface, e.g. for tracking results directly from within existing Python workflows
Additionally, CADET-RDM can be used within JupyterLab with some limitations.
Detailed descriptions of commands and APIs are provided in the dedicated interface documentation.
- Initialize or clone a CADET-RDM project
- Develop and commit project code
- Execute computations with CADET-RDM result tracking
- Generate versioned output branches automatically
- Push project and output repositories to their remotes
- Reuse or reference results via their output branches
Results are referenced by unique output branch names that encode the timestamp, active project branch, and project commit hash. CADET-RDM provides a local cache mechanism that allows results from previous runs or from other CADET-RDM projects to be reused as input data while preserving provenance information.
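To make the naming scheme concrete, here is a minimal sketch of how a branch name could be assembled from the three components mentioned above. The function `build_output_branch_name`, the timestamp format, the underscore separator, and the seven-character hash abbreviation are all assumptions for illustration; the exact format used by CADET-RDM is described in the full documentation.

```python
from datetime import datetime


def build_output_branch_name(
    timestamp: datetime, project_branch: str, commit_hash: str
) -> str:
    """Combine timestamp, project branch, and short commit hash into one name.

    The layout is a hypothetical example, not CADET-RDM's guaranteed format.
    """
    # Use only characters that are valid in Git branch names (no colons or spaces).
    stamp = timestamp.strftime("%Y-%m-%d_%H-%M-%S")
    # Abbreviate the commit hash to its first seven characters.
    short_hash = commit_hash[:7]
    return f"{stamp}_{project_branch}_{short_hash}"


if __name__ == "__main__":
    name = build_output_branch_name(
        datetime(2024, 1, 15, 9, 30, 0), "main", "0123456789abcdef"
    )
    print(name)  # 2024-01-15_09-30-00_main_0123456
```

Because the name carries the project branch and commit hash, a result can be traced back to the code state that produced it from the branch name alone.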
The full documentation is available at https://cadet-rdm.readthedocs.io
It includes installation instructions, usage guides for the different interfaces, and detailed descriptions of repository and result management workflows.