Run a Kedro pipeline n times

Hi Kedro community,

I humbly ask for assistance with my Kedro problem.

I have written a Kedro pipeline that mines and transforms data from 2D airfoil simulations. The problem is that each run of the pipeline mines data for only one operating condition. I would therefore like to run the pipeline n times to mine data for n operating conditions (the operating conditions change dynamically each time the Kedro pipeline is run).

I have tried running the pipeline in a loop and using batch to run it n times, but both feel a bit hacky, and I cannot help feeling that there is a more elegant solution out there.

Thanks in advance for any help.

Hi @vincent ,

Hope this is not too late. Are you using Kedro >=0.17.0? I am assuming yes. As far as I know, Kedro doesn't have a built-in feature for your specific use case, but you can easily use KedroSession (for Kedro >=0.17.0) or KedroContext for this.

What you can do is set up a .py file in your Kedro project folder that looks something like this:
For Kedro >=0.17.0

from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import _get_project_metadata, _add_src_to_path

# Locate the project and make its source package importable.
project_path = Path.cwd()
metadata = _get_project_metadata(project_path)
_add_src_to_path(metadata.source_dir, project_path)

n_rounds = 10  # placeholder: set this to the number of runs you need
operating_condition = 1234
for i in range(n_rounds):
    # Create a fresh session for every run and pass the current condition in.
    with KedroSession.create(
        metadata.package_name,
        project_path,
        extra_params={"operating_condition": operating_condition},
    ) as session:
        output = session.run()
        # If the data is a MemoryDataSet, you can read it from the run output:
        operating_condition = output["operating_condition"]
        # If the data is saved to disk, you can load it through the catalog:
        catalog = session.load_context().catalog
        operating_condition_on_disk = catalog.load("operating_condition_on_disk")

For Kedro <=0.16.6

from kedro.framework.context import load_context

n_rounds = 10  # placeholder: set this to the number of runs you need
operating_condition = 1234
for i in range(n_rounds):
    # Load a fresh context for every run and pass the current condition in.
    context = load_context("./", extra_params={"operating_condition": operating_condition})
    output = context.run()
    # If the data is a MemoryDataSet, you can read it from the run output:
    operating_condition = output["operating_condition"]
    # If the data is saved to disk, you can load it through the catalog:
    catalog = context.catalog
    operating_condition_on_disk = catalog.load("operating_condition_on_disk")

And then you can run your code by simply using "python my_code.py".
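If it helps, the feedback loop itself can be pulled into a small helper so that it can be tried out without Kedro installed. This is only a rough sketch, not anything Kedro ships: `run_n_times`, `run_once`, `output_key` and `fake_run` are names made up for illustration, and it assumes each run returns a dict containing the next operating condition, as in the snippets above.

```python
from typing import Any, Callable, Dict, List


def run_n_times(
    run_once: Callable[[Any], Dict[str, Any]],
    initial_condition: Any,
    n_rounds: int,
    output_key: str = "operating_condition",
) -> List[Any]:
    """Call run_once n_rounds times, feeding each run's output into the next.

    Returns the list of operating conditions that were passed to the runs.
    """
    conditions_used = []
    condition = initial_condition
    for _ in range(n_rounds):
        conditions_used.append(condition)
        output = run_once(condition)
        # The next run starts from whatever this run produced.
        condition = output[output_key]
    return conditions_used


# Hypothetical stand-in for a real pipeline run: it just increments the condition.
def fake_run(condition: int) -> Dict[str, int]:
    return {"operating_condition": condition + 1}


used = run_n_times(fake_run, 1234, 3)
print(used)  # [1234, 1235, 1236]
```

In a real project, `run_once` would wrap the session (or context) creation and the `run()` call from the snippets above, passing the condition through `extra_params`.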


Hi @ljam,

Thanks so much for the help. It was definitely not too late, and it was exactly what I needed.