How can I iterate over subsets of data from a database and pass them into nodes?

In order to get the best help, it is suggested to answer the following questions:

What is the goal you are trying to achieve?
I have a dataset of ~30M rows in a MySQL table. I would like to pull subsets of the data based on a key, pass each subset through a node, and consolidate the results from all subsets at the end.

What have you tried, in order to accomplish the goal?
I have made an attempt that seems to work: I use a generator object for the database results, pass each subset into the node, and append the results to an existing table. (It feels very filthy.)
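For reference, the generator approach described above might look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual code from my project: the table name `events`, the columns `key` and `value`, and the helpers `iter_subsets`, `process_subset` and `run_all` are all hypothetical, and an in-memory SQLite database stands in for MySQL so the snippet runs on its own. With MySQL you would typically pass a SQLAlchemy engine instead, and the parameter placeholder style would differ.

```python
import sqlite3
from typing import Iterator, Tuple

import pandas as pd


def iter_subsets(conn, table: str, key_col: str) -> Iterator[Tuple[object, pd.DataFrame]]:
    """Yield (key, DataFrame) pairs, loading only one key's rows at a time."""
    keys = pd.read_sql_query(
        f"SELECT DISTINCT {key_col} FROM {table} ORDER BY {key_col}", conn
    )[key_col]
    for key in keys:
        # "?" is the SQLite placeholder; MySQL drivers use a different style.
        subset = pd.read_sql_query(
            f"SELECT * FROM {table} WHERE {key_col} = ?", conn, params=(key,)
        )
        yield key, subset


def process_subset(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the node logic: total `value` per key."""
    return df.groupby("key", as_index=False)["value"].sum()


def run_all(conn, table: str = "events", key_col: str = "key") -> pd.DataFrame:
    """Process every subset in turn and consolidate the results at the end."""
    results = [process_subset(subset) for _, subset in iter_subsets(conn, table, key_col)]
    return pd.concat(results, ignore_index=True)


# Demo data: in-memory SQLite standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"key": ["a", "a", "b", "c", "c"], "value": [1, 2, 3, 4, 5]}
).to_sql("events", conn, index=False)

consolidated = run_all(conn)
print(consolidated)
```

Collecting the per-subset results in a list and concatenating once at the end avoids repeatedly appending to a growing table, which is the part that felt awkward in my attempt.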

What version of Kedro are you using? (Use kedro -V)

Do you have any custom plugins?

Perhaps I am overthinking this and there is a simpler way to do this…


If I understand you correctly, you still want to process all ~30M rows but in parallel based on some partitioning keys?

Yes, that is correct.

What processing engine do you currently use? For example, do you have Spark, or is it plain Python / pandas?

It's in pandas.
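To make the "process in parallel based on partitioning keys" idea concrete in plain pandas, here is a minimal sketch. It is not Kedro-specific and not a recommended final design: the column names `key` and `value` and the functions `process_partition` and `process_in_parallel` are hypothetical, and the small DataFrame is made-up demo data. It splits the frame by key, processes each partition in a thread pool from the standard library, and concatenates the results.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def process_partition(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-partition logic: mean of `value` for this key."""
    return pd.DataFrame(
        {"key": [df["key"].iloc[0]], "mean_value": [df["value"].mean()]}
    )


def process_in_parallel(
    data: pd.DataFrame, key_col: str = "key", max_workers: int = 4
) -> pd.DataFrame:
    """Split by key, process the partitions concurrently, then consolidate."""
    partitions = [part for _, part in data.groupby(key_col, sort=True)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input partitions.
        results = list(pool.map(process_partition, partitions))
    return pd.concat(results, ignore_index=True)


# Made-up demo data (assumption, not from the real table).
data = pd.DataFrame(
    {"key": ["a", "b", "a", "b"], "value": [1.0, 10.0, 3.0, 30.0]}
)
summary = process_in_parallel(data)
print(summary)
```

Threads mainly help when each partition spends its time waiting on I/O, such as a database query. For CPU-heavy pandas work, `ProcessPoolExecutor` from the same module is a possible alternative, at the cost of pickling each partition.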