Have you ever thought about keeping a registry of the runs of a particular pipeline? This is more or less what experiment tracking tools like MLflow are used for, but I think a simple registry (even plain txt files) can be good enough.
I was thinking of using the after-pipeline-run Hook to autogenerate a log file inside the project folder with the following info:
- Name of pipeline.
- Nodes run.
- Inputs used (this is more important if you have versioned datasets).
- Outputs generated.
- Parameters used in the run.
- Git commit reference of the code.
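
To make the list above concrete, here is a rough sketch of the record I have in mind, written as plain Python with no dependency on any framework. Everything here is an assumption for illustration: the function names (`current_git_commit`, `build_run_record`, `append_run_record`), the field names, and the choice of a JSON Lines file called `run_registry.jsonl`. The idea would be to call something like this from the after-pipeline-run Hook, passing in whatever the hook exposes.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_git_commit():
    """Return the current commit hash, or None if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a repository
        return None


def build_run_record(pipeline_name, nodes, inputs, outputs, parameters):
    """Collect the fields from the list above into one dictionary."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "pipeline_name": pipeline_name,
        "nodes": list(nodes),
        "inputs": sorted(inputs),
        "outputs": sorted(outputs),
        "parameters": dict(parameters),
        "git_commit": current_git_commit(),
    }


def append_run_record(record, log_path="run_registry.jsonl"):
    """Append one run as a single JSON line; hypothetical file layout."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # default=str keeps the write from failing on non-JSON values
        f.write(json.dumps(record, default=str) + "\n")
```

One line per run keeps the file append-only and easy to diff, which seems to fit the "simple registry" idea better than rewriting one large file on every run.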
Having this kind of structured info, it won’t be very difficult to have a Jupyter notebook that allows you to inspect the runs, search for runs with particular parameters, and things like that… Would you find it interesting, or do you think it is better to use a tool like MLflow, for example?
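
For the notebook side, a minimal sketch of the kind of search I mean could look like this. It assumes each run is stored as one JSON object per line with a `parameters` key; that file layout and the helper names `load_runs` and `find_runs` are hypothetical, not part of any existing tool.

```python
import json
from pathlib import Path


def load_runs(log_path):
    """Read a JSON Lines registry into a list of dictionaries."""
    runs = []
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        if line.strip():  # skip blank lines
            runs.append(json.loads(line))
    return runs


def find_runs(runs, **wanted):
    """Keep only the runs whose parameters match every keyword given."""
    return [
        run
        for run in runs
        if all(run.get("parameters", {}).get(k) == v for k, v in wanted.items())
    ]
```

In a notebook this would be used as something like `find_runs(load_runs("run_registry.jsonl"), learning_rate=0.01)`, where `learning_rate` is just an example parameter name.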