How exactly does Kedro manage datasets and parameters?

I’m sorry if this is too basic a question. I have read all the docs and I still don’t have a clear picture of what is going on. I come from Luigi, and in Luigi:

  • Each task (the analogue of a Kedro node) depends on its parameters, so a task_id is generated by hashing all the parameters of the task.
  • The persisted objects can use that task_id to set their paths, so that for each combination of parameters the CSVs, pickles… whatever persisted objects, end up at a different path.
  • This means that any time the task runs, it first checks whether the output already exists for this particular combination of parameters. If it does, the task does not run at all: the existing dataset is read and passed along the pipeline (see the sketch below).
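For concreteness, the Luigi pattern I mean looks roughly like this (a minimal sketch; the `TrainModel` task and its parameters are made up for illustration):

```python
import luigi

class TrainModel(luigi.Task):
    # Every declared parameter contributes to the task_id hash.
    learning_rate = luigi.FloatParameter(default=0.01)
    n_estimators = luigi.IntParameter(default=100)

    def output(self):
        # self.task_id is the task name plus a hash of its parameters,
        # so each parameter combination gets its own output path.
        return luigi.LocalTarget(f"data/models/{self.task_id}.pkl")

    def run(self):
        # Luigi only calls run() if output() does not exist yet;
        # otherwise the task counts as complete and is skipped.
        with self.output().open("w") as f:
            f.write("trained model would be persisted here")
```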

I have not been able to figure out whether this basic functionality is achievable in Kedro or not. Thanks in advance!

I think this might be the answer to this question, at least partially:

https://kedro.readthedocs.io/en/stable/06_nodes_and_pipelines/03_modular_pipelines.html?highlight=namespace#how-to-use-a-modular-pipeline-with-different-parameters

But if I understand it correctly, this means that by default parameters don’t affect the outputs of nodes. So unless you manually set a different namespace, different runs of the same pipeline with different parameters will overwrite the outputs. Am I right?
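Concretely, my reading of that page is that to keep outputs separate you have to wrap the pipeline yourself, something like this sketch (assuming a recent Kedro version where `pipeline()` accepts a `namespace`; all the names here are hypothetical):

```python
from kedro.pipeline import node, pipeline

def train_model(features, model_options):  # hypothetical node function
    ...

template = pipeline(
    [node(train_model, ["features", "params:model_options"], "model")]
)

# namespace= prefixes the pipeline's datasets and parameters, so each
# instance writes "candidate_1.model" / "candidate_2.model" instead of
# both overwriting a single "model" dataset. Mapping "features" via
# `inputs` keeps the raw input shared rather than namespaced.
candidate_1 = pipeline(template, namespace="candidate_1",
                       inputs={"features": "features"})
candidate_2 = pipeline(template, namespace="candidate_2",
                       inputs={"features": "features"})
```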

If I’m summarising this right, you want to run only the parts of your pipeline that changed?

Hi again! I think it’s better if we continue this discussion in the other topic I opened:

Both topics are in fact related; I just explained it better in the second one, so I’ll answer there.