Simplify API hierarchy

I created an issue advocating for a simpler top-level API. Please give it a :-1::+1: and add your thoughts here or in the issue. I feel this change would go a long way toward simplifying things for users like us.

I agree that getting started with Kedro takes some time, and maybe this is an indicator that a simplified API might be of interest.

I don’t know if this belongs in this issue or not, but for me the more obscure things in Kedro are:

  • The context and the catalog. Especially the catalog, which cannot be edited for the whole session (because load_context generates a new catalog every time it is called). If you want to modify the catalog dynamically using a Hook, this can get ugly. I propose that the Session keep a single catalog instance instead of creating new ones.
  • The fact that, in a pipeline, all inputs and outputs are strings (keys in the catalog) is somewhat confusing at first. What if you want to pass an argument directly to a given node? For example, imagine a node that takes an int among its arguments. Can you pass a particular integer directly, one that is neither a parameter nor a previous output, when you define the pipeline?
  • I don't understand namespaces very well. Have you used them?
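
To make the "inputs are strings" point concrete, here is a toy model. ToyNode and run are hypothetical names, not Kedro's real classes. The only assumption is the one stated above: every node input is a key that gets looked up in the catalog, and parameters live in the catalog under "params:<name>". Under that rule a bare integer in the input list is treated as a key, not a value, so it fails.

```python
from typing import Any, Callable, Dict, List


class ToyNode:
    """Hypothetical, simplified stand-in for a Kedro node: inputs/outputs are catalog keys."""

    def __init__(self, func: Callable[..., Any], inputs: List[Any], output: str) -> None:
        self.func = func
        self.inputs = inputs
        self.output = output


def run(nodes: List[ToyNode], catalog: Dict[Any, Any]) -> Dict[Any, Any]:
    """Run nodes in order, resolving every input name against the catalog."""
    for n in nodes:
        # Each input is a *name*; the value always comes from the catalog,
        # never from the pipeline definition itself.
        args = [catalog[name] for name in n.inputs]
        catalog[n.output] = n.func(*args)
    return catalog


def scale(values: List[int], factor: int) -> List[int]:
    return [v * factor for v in values]


# The literal 3 has to live in the catalog (here as a parameter) to reach the node.
catalog = {"raw_values": [1, 2, 3], "params:factor": 3}
result = run([ToyNode(scale, ["raw_values", "params:factor"], "scaled_values")], catalog)
print(result["scaled_values"])  # [3, 6, 9]
```

Passing `3` itself in the inputs list would raise a KeyError in this toy, because `3` is looked up as a catalog key.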

I should correct what I said. I think I had misunderstood the register_catalog Hook, which does precisely what I intended… At least from the code, it seems clear that if you want to alter the catalog once and for all, this is the Hook to use.

I was misled by the name of the Hook. register_catalog sounds like something that runs only when the session starts, not every time an instance of the catalog is required. This seems somewhat odd to me…

I think the better hook for this is after_catalog_created. See the documentation for more details. It gives you the catalog after Kedro creates it, lets you modify it or add things to it, and then lets Kedro pass it on through the rest of the execution timeline.

https://kedro.readthedocs.io/en/stable/kedro.framework.hooks.specs.DataCatalogSpecs.html#kedro.framework.hooks.specs.DataCatalogSpecs.after_catalog_created
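
Why does a hook that fires after creation cover the "new catalog every time" problem? Here is a plain-Python sketch. ToyContext and add_extra_dataset are hypothetical stand-ins, not Kedro classes. The context rebuilds its catalog on every access, as described for load_context above, and runs each registered hook on the fresh object before handing it back. So a change made in the hook is present in every catalog the caller receives. In a real project the hook would be a method named `after_catalog_created` decorated with `@hook_impl` from `kedro.framework.hooks`. Check the linked spec for the exact arguments in your Kedro version.

```python
from typing import Any, Callable, Dict, List

Catalog = Dict[str, Any]
Hook = Callable[[Catalog], None]


class ToyContext:
    """Hypothetical stand-in for a Kedro context (not Kedro's real class).

    It builds a *new* catalog object on every access, and runs each hook
    on that fresh object before returning it (analogous to after_catalog_created).
    """

    def __init__(self, config: Catalog, hooks: List[Hook]) -> None:
        self._config = config
        self._hooks = hooks

    @property
    def catalog(self) -> Catalog:
        catalog = dict(self._config)  # fresh object each time
        for hook in self._hooks:
            hook(catalog)  # hooks mutate the same object that is returned
        return catalog


def add_extra_dataset(catalog: Catalog) -> None:
    # Mutate in place: the caller keeps using this exact object afterwards.
    catalog["extra_dataset"] = [42]


context = ToyContext({"raw_values": [1, 2, 3]}, hooks=[add_extra_dataset])
first, second = context.catalog, context.catalog
print(first is second)            # False: two separate catalogs
print("extra_dataset" in second)  # True: the hook ran for each one
```

The design point is that the hook mutates the object that is passed on, so nothing needs to hold on to a single catalog instance across the session.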

re: literal values in pipeline inputs/outputs, you could use partial (functools.partial) to do this, e.g. as described in this SO answer. Alternatively, you could use params and define the literal values in your parameters (if that makes sense).
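
A minimal sketch of the partial approach, assuming a node function `scale` that takes a dataset plus a literal int (the function name and values are hypothetical). The literal is bound ahead of time, so the resulting callable only needs the dataset as a pipeline input. `update_wrapper` copies `__name__` and related attributes from `scale`, since a bare `partial` object has no `__name__` of its own.

```python
from functools import partial, update_wrapper
from typing import List


def scale(values: List[int], factor: int) -> List[int]:
    """Hypothetical node function that needs a literal int besides a dataset."""
    return [v * factor for v in values]


# Bind the literal up front; the result only needs the dataset as input.
scale_by_3 = update_wrapper(partial(scale, factor=3), scale)

print(scale_by_3([1, 2, 3]))  # [3, 6, 9]
print(scale_by_3.__name__)    # scale
```

In a pipeline this would be used as `node(scale_by_3, inputs="raw_values", outputs="scaled_values")`. With the params alternative you would instead put `factor: 3` in your parameters and use `inputs=["raw_values", "params:factor"]` with the unmodified `scale`.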

The Kedro team has looked into different alternatives, see this issue for more detail!

Thanks, Zain. I'll try again. I played with this Hook, and my impression was that, after modifying the catalog, the pipeline will run with a non-modified version of the catalog. But maybe I was mistaken…

pipeline will run with a non-modified version of the catalog

That’s not my understanding or experience!