'versioning' a PartitionedDataset


Every week I produce a new export of some forecasting data. Because the export is rather large, I do not want to write it out as one big .csv file, but rather as many files split up by customer_id.

So in the end I could have something like this:

├── 2021-03-10
│   ├── id_1.csv
│   └── id_2.csv
├── 2021-03-11
│   ├── id_1.csv
│   ├── id_2.csv
│   └── id_3.csv
├── 2021-03-12
│   ├── id_5.csv
│   ├── id_6.csv
│   └── id_11.csv


Now, if I have understood the code and the documentation correctly, this is not something that PartitionedDataSet supports. Is this correct?

Has anyone else encountered this scenario and, if so, what was your solution? At the moment I have a rather hacky solution that patches paths at runtime. I guess the better option is to implement a custom PartitionedDataSet that, on each run, stores all the partitions under a timestamped subfolder.

The minimum viable implementation is possibly fairly trivial, but I wanted to hear other opinions to see if there are better approaches for this.
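To make the idea concrete, here is a rough plain-Python sketch of the layout I have in mind. It deliberately does not use any Kedro API: the function names `save_versioned_partitions` and `latest_version_dir` are hypothetical, and I am assuming each partition is just a list of row dicts. A custom dataset would presumably do something similar in its save and load logic.

```python
import csv
from datetime import date
from pathlib import Path


def save_versioned_partitions(base_dir, partitions, version=None):
    """Write each partition to <base_dir>/<version>/<partition_id>.csv.

    `partitions` maps a partition id (e.g. "id_1") to a list of row dicts.
    `version` defaults to today's date in ISO format (e.g. "2021-03-10").
    Both names are hypothetical and not part of Kedro.
    """
    version = version or date.today().isoformat()
    target = Path(base_dir) / version
    target.mkdir(parents=True, exist_ok=True)
    for partition_id, rows in partitions.items():
        if not rows:
            continue  # nothing to write for an empty partition
        with open(target / f"{partition_id}.csv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return target


def latest_version_dir(base_dir):
    """Return the newest version folder, or None if there is none.

    Relies on ISO dates (YYYY-MM-DD) sorting lexicographically.
    """
    versions = sorted(p for p in Path(base_dir).iterdir() if p.is_dir())
    return versions[-1] if versions else None
```

Calling `save_versioned_partitions("data/exports", {"id_1": rows_1, "id_2": rows_2})` once per weekly run would give the folder structure shown above, and `latest_version_dir("data/exports")` would pick the most recent export to load.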

Thanks! 🙂