I am pretty new to the Kedro world, but I am starting off with a project where I have to run a pipeline individually for more than 100 samples, merge the results, and then feed the merged data into another pipeline.
However, I found several ways to do this (partitioned datasets, catalog.yml creation through a before_pipeline_run hook, and more). Some methods have the downside that they create things dynamically at runtime, which is not so nice to maintain and also means kedro viz and ipython sessions will not work as intended.
I ended up with a nice solution that uses namespaces: I create n pipelines in pipeline_registry.py, each with the sample name as its namespace. See HERE. This works as expected, with one problem: the namespaced datasets have to be registered in the data catalog. Right now, they are just handled as MemoryDatasets.
Is there a way to set this dynamically as well, so that the dataset name and the filepath are modified per namespace?
From the documentation I would have guessed that this works, as it states:
The namespace ensures that outputs are not overwritten, so intermediate and final outputs are prefixed, i.e. **beta.intermediary_output**, **beta.output**.
I found a solution with jinja2 in the catalog, but in that case I have to define the samples in two places.
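For reference, the jinja2 workaround looks roughly like the following catalog.yml fragment. The dataset type, filepaths, and sample names are hypothetical; the point is that the sample list has to be repeated here, separately from pipeline_registry.py:

```yaml
{% for sample in ["sample_a", "sample_b", "sample_c"] %}
{{ sample }}.output_data:
  type: pandas.CSVDataSet
  filepath: data/07_model_output/{{ sample }}/output.csv
{% endfor %}
```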
So the core question is: can the namespace be applied to the catalog dynamically?