How to deploy a Kedro pipeline with other Python applications, or how can we pass a Python object instead of a string into a Kedro pipeline?

I have a use case where my pipeline performs a few steps:

  1. Get data (For development use)
  2. Train a Model
  3. Save the output

Let's say I package up this pipeline and want to hand it to another team. I would like to pass an actual Python object into the pipeline instead of relying only on the catalog (since that would mean they must first write the data to some kind of file).


data = release_team_get_data()
context.run(pipeline="my_publish_pipeline", input=data)
# I want to pass in data that is already loaded in memory

AFAIK, Kedro inputs and outputs can only be catalog entries. You might be able to pass an object in by creating a dataset for it and then calling save on that dataset. Another, less appealing option might be storing the data in globals and pulling it from globals in your function, but that would be a last-ditch effort.

data = release_team_get_data()
# Save the in-memory object to a catalog entry first
context.catalog.datasets.your_pickle_data_set.save(data)
# Now your pipeline can load 'your_pickle_data_set'
context.run(pipeline_name="my_publish_pipeline")
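
To make the idea concrete without depending on a particular Kedro version, here is a rough plain-Python sketch of the same pattern: register the in-memory object under the dataset name the pipeline expects, then run. `InMemoryCatalog`, `run_pipeline`, `train_model`, and `publish` are all hypothetical stand-ins I made up for illustration, not Kedro's API. Kedro does ship an in-memory dataset type in `kedro.io` (`MemoryDataSet`) that plays a similar role to the catalog class here.

```python
from typing import Any, Callable, Dict, List, Tuple


class InMemoryCatalog:
    """Hypothetical stand-in for a catalog that keeps objects in memory.

    This is NOT Kedro's API; it only illustrates the save-then-run pattern.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def save(self, name: str, obj: Any) -> None:
        self._data[name] = obj

    def load(self, name: str) -> Any:
        return self._data[name]


# A node is (function, input dataset name, output dataset name).
Node = Tuple[Callable[[Any], Any], str, str]


def run_pipeline(nodes: List[Node], catalog: InMemoryCatalog) -> None:
    """Run nodes in order, reading and writing through the catalog."""
    for func, input_name, output_name in nodes:
        catalog.save(output_name, func(catalog.load(input_name)))


def train_model(data: List[float]) -> float:
    # Placeholder "model": just the mean of the data.
    return sum(data) / len(data)


def publish(data: Any) -> float:
    """Entry point another team could call without touching the catalog."""
    catalog = InMemoryCatalog()
    catalog.save("input_data", data)  # the object is already in memory
    run_pipeline([(train_model, "input_data", "model")], catalog)
    return catalog.load("model")
```

With a wrapper like `publish`, the calling team only passes a Python object and gets a result back; the catalog handling stays inside the packaged project.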

Thanks @waylonwalker. This is a possible solution. However, I think it should be much simpler: a pipeline should be able to run as part of other Python code, instead of only being run on its own.

I would not expect other teams to need to understand how Kedro works or to manipulate the catalog themselves. Thoughts?

Have you ever needed to hand a pipeline over to other teams so they can work on it?

I don’t have experience with that. I originally packaged up all of our projects as libraries. I could import as many projects as I wanted, combine them, run them separately, run them together, and so on. I feel that Kedro has taken a different direction that makes it hard to treat a Kedro project as a library; it’s more of a standalone application that is difficult to run in the same process as another. Things such as the global hooks being defined as singletons come to mind. I have yet to explore it, but the session, settings, and now importing my pipeline from kedro all make it feel less and less like I can treat Kedro projects as a standalone library. Maybe someone else has a better answer than I do. It might be worth reigniting the conversation on GitHub or Discord.

Thanks for sharing your experience. It’s a good time to try out GitHub Discussions.