How can I iterate over subsets of data from a database and pass them into nodes?

I have a dataset of ~30M rows in a MySQL table. I would like to pull subsets of the data based on a key, pass each subset through a node, and consolidate the results from all subsets at the end.

I have made an attempt that seems to work: I use a generator object for the database results, pass each subset into the node, and append the results to an existing table (it feels very filthy).

Perhaps I am overthinking this and there is a simpler way to do this…


If I understand you correctly, you still want to process all ~30M rows but in parallel based on some partitioning keys?

Yes. That is correct.

What processing engine do you currently use? For example, do you have Spark, or is it plain Python / pandas?

It's in pandas.
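
For reference, here is a minimal sketch of the generator-per-key pattern described in the question, written with plain pandas. It is only an illustration under some assumptions: an in-memory SQLite database stands in for the MySQL table so the example runs on its own, and the table name `orders`, the columns `customer` and `amount`, and the functions `iter_subsets`, `summarise` and `run` are all hypothetical names rather than anything from the original setup. Instead of appending to an existing table, the sketch collects the per-key results and concatenates them once at the end.

```python
import sqlite3
from typing import Callable, Iterator, Tuple

import pandas as pd


def iter_subsets(conn, table: str, key_col: str) -> Iterator[Tuple[object, pd.DataFrame]]:
    """Yield (key, DataFrame) pairs, loading one key's rows at a time."""
    keys = pd.read_sql_query(
        f"SELECT DISTINCT {key_col} FROM {table} ORDER BY {key_col}", conn
    )[key_col]
    for key in keys:
        # "?" is the sqlite3 placeholder; MySQL drivers typically use "%s".
        subset = pd.read_sql_query(
            f"SELECT * FROM {table} WHERE {key_col} = ?", conn, params=(key,)
        )
        yield key, subset


def summarise(key, df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical node function: one summary row per key."""
    return pd.DataFrame({"customer": [key], "total": [df["amount"].sum()]})


def run(conn, table: str, key_col: str, node_func: Callable) -> pd.DataFrame:
    """Apply node_func to every subset and consolidate the results."""
    results = [node_func(key, subset) for key, subset in iter_subsets(conn, table, key_col)]
    return pd.concat(results, ignore_index=True)


# Tiny stand-in dataset (assumption: the real data lives in MySQL).
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"customer": ["a", "a", "b"], "amount": [1.0, 2.0, 5.0]}
).to_sql("orders", conn, index=False)

combined = run(conn, "orders", "customer", summarise)
print(combined)
```

Only one key's rows are held in memory at a time, so the full ~30M rows never need to be loaded together. Because each subset is processed independently, the loop in `run` is also the natural place to introduce parallelism later if needed.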