Registry of runs?

Have you ever thought about creating a kind of registry of the runs of a particular pipeline? Maybe this is more or less what experiment tracking tools like MLflow are for, but I think that a simple registry (even with plain txt files) can be good enough.

I was thinking of using the after_pipeline_run hook to autogenerate a log file inside the project folder with the following info:

  • Timestamp.
  • Name of pipeline.
  • Nodes run.
  • Inputs used (this is more important if you have versioned datasets).
  • Outputs generated.
  • Parameters used in the run.
  • Git commit reference of the code.
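To make the idea concrete, here is a minimal sketch of what writing one such record could look like. It is plain Python with no Kedro imports, so everything in it is an assumption for illustration: the function names, the field names, and the logs/run_registry.jsonl path are hypothetical, and one JSON object per line is just one possible file format.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_git_commit():
    """Return the current commit hash, or None if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_run_record(pipeline_name, nodes, inputs, outputs, parameters):
    """Collect the fields from the list above into one dictionary."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "pipeline": pipeline_name,
        "nodes": list(nodes),
        "inputs": sorted(inputs),
        "outputs": sorted(outputs),
        "parameters": parameters,
        "git_commit": current_git_commit(),
    }


def append_run_record(record, registry_path="logs/run_registry.jsonl"):
    """Append the record as one JSON line (hypothetical default path)."""
    path = Path(registry_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # default=str keeps the sketch from failing on non-JSON values.
        f.write(json.dumps(record, default=str) + "\n")
    return path
```

In a Kedro project these two functions could presumably be called from an after_pipeline_run hook implementation, filling the arguments from whatever the hook receives. How exactly to extract node names, dataset names, and parameters depends on the Kedro version, so that part is left out here.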

With this kind of structured info, it wouldn’t be very difficult to write a Jupyter notebook that lets you inspect the runs, search for runs with particular parameters, and things like that… Would you find it interesting, or do you think it is better to use a tool like MLflow, for example?
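As a rough illustration of the "search for runs with particular parameters" part, something like this could live in a notebook cell. It assumes a registry file with one JSON object per line, each holding a "parameters" dictionary. That format and the names load_runs and find_runs are my own assumptions for the sketch, not anything Kedro provides.

```python
import json
from pathlib import Path


def load_runs(registry_path):
    """Read every non-empty line of the registry file as one run record."""
    runs = []
    with Path(registry_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                runs.append(json.loads(line))
    return runs


def find_runs(runs, **wanted):
    """Return the runs whose parameters match every keyword given."""
    return [
        run
        for run in runs
        # A run with no "parameters" entry simply matches nothing.
        if all(
            (run.get("parameters") or {}).get(key) == value
            for key, value in wanted.items()
        )
    ]
```

For example, find_runs(load_runs("logs/run_registry.jsonl"), learning_rate=0.01) would list every recorded run that used that value, where both the path and the parameter name are made up for the example.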

If I am not mistaken, a subset of these things is covered by the Kedro Journal and Kedro Logging.

For log analytics I am currently experimenting a bit with Amazon Kinesis and streaming the logs to AWS.

Oh, yes, this is 99% what I was looking for. It might be sufficient. Thank you very much. Kedro is awesome :smiley:

@Jaime_Arboleda_Casti it really is, I have learned a lot by just reading through the code and the awesome docs. :heart_eyes:
