Every week I produce a new export of some forecasting data. Because the export is rather large, I do not want to save it as one large `.csv` file, but rather as many files split up by ID, with each export stored in its own dated folder.
In the end, I would have something like this:
```text
├── 2021-03-10
│   ├── id_1.csv
│   └── id_2.csv
├── 2021-03-11
│   ├── id_1.csv
│   ├── id_2.csv
│   └── id_3.csv
├── 2021-03-12
│   ├── id_5.csv
│   ├── id_6.csv
│   └── id_11.csv
...
```
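To make the "split up by ID" part concrete, here is a small plain-Python sketch that groups the rows of one export into one partition per ID. The result is a dict mapping a partition id (such as `id_1`) to its rows, which is the dict-of-partitions shape that Kedro's `PartitionedDataSet` saves from. It assumes the export is a list of row dicts with an `id` column; `split_by_id` is a hypothetical helper, and the `ds`/`yhat` column names are purely illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

Row = Mapping[str, object]


def split_by_id(rows: Iterable[Row], key: str = "id") -> Dict[str, List[Row]]:
    """Group export rows into one partition per distinct value of ``key``."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        # Each distinct ID becomes one partition (one file, e.g. id_1.csv).
        partitions[str(row[key])].append(row)
    return dict(partitions)


# Hypothetical export rows; only the "id" column matters for the split.
rows = [
    {"id": "id_1", "ds": "2021-03-17", "yhat": 1.0},
    {"id": "id_2", "ds": "2021-03-17", "yhat": 2.5},
    {"id": "id_1", "ds": "2021-03-24", "yhat": 1.2},
]
partitions = split_by_id(rows)
print(sorted(partitions))  # ['id_1', 'id_2']
```

With a pandas DataFrame, the equivalent would be `{str(k): g for k, g in df.groupby("id")}`.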
Now, if I have understood the code and the documentation correctly, this is not something that `PartitionedDataSet` supports. Is that correct?
Has anyone else encountered this scenario, and if so, what was your solution? At the moment I have a rather hacky solution that patches paths at runtime. I suspect the better option is to implement a custom `PartitionedDataSet` that, on each run, stores all partitions under a timestamped subfolder.
A minimum viable implementation is probably fairly trivial, but I wanted to hear other opinions in case there are better approaches.
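For reference, here is a minimal, stdlib-only sketch of the timestamped-subfolder idea, written independently of Kedro. `TimestampedPartitionedCSV`, `save` and `load_latest` are hypothetical names, not Kedro API. The sketch assumes each partition is a list of row dicts and that run folders are named with ISO dates, so lexical order is chronological. In a real project this logic would presumably live in a custom dataset class rather than a standalone writer.

```python
import csv
import datetime as dt
import tempfile
from pathlib import Path
from typing import Dict, List, Mapping, Optional

Rows = List[Mapping[str, object]]


class TimestampedPartitionedCSV:
    """Hypothetical writer: each save goes to <base_path>/<YYYY-MM-DD>/<partition_id>.csv."""

    def __init__(self, base_path: str) -> None:
        self.base_path = Path(base_path)

    def save(self, partitions: Mapping[str, Rows], run_date: Optional[dt.date] = None) -> Path:
        # Default to today's date; ISO format keeps folder names sortable.
        run_date = run_date or dt.date.today()
        run_dir = self.base_path / run_date.isoformat()
        run_dir.mkdir(parents=True, exist_ok=True)
        for partition_id, rows in partitions.items():
            if not rows:
                continue  # no rows means no header to infer, so skip the file
            with open(run_dir / f"{partition_id}.csv", "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
                writer.writeheader()
                writer.writerows(rows)
        return run_dir

    def load_latest(self) -> Dict[str, List[Dict[str, str]]]:
        # ISO dates sort lexicographically in chronological order.
        run_dirs = sorted(p for p in self.base_path.iterdir() if p.is_dir())
        if not run_dirs:
            raise FileNotFoundError(f"no runs found under {self.base_path}")
        latest: Dict[str, List[Dict[str, str]]] = {}
        for csv_path in sorted(run_dirs[-1].glob("*.csv")):
            with open(csv_path, newline="") as f:
                latest[csv_path.stem] = list(csv.DictReader(f))  # values come back as str
        return latest


# Usage: two runs on different dates; only the most recent one is loaded back.
with tempfile.TemporaryDirectory() as tmp:
    store = TimestampedPartitionedCSV(tmp)
    store.save({"id_1": [{"ds": "2021-03-10", "yhat": 1.0}]}, run_date=dt.date(2021, 3, 10))
    store.save(
        {"id_5": [{"ds": "2021-03-12", "yhat": 2.0}], "id_6": [{"ds": "2021-03-12", "yhat": 3.0}]},
        run_date=dt.date(2021, 3, 12),
    )
    latest_run = store.load_latest()

print(sorted(latest_run))  # ['id_5', 'id_6']
```

One consequence of keying folders by date alone: re-running on the same day writes into the existing folder, so partitions from an earlier same-day run that are not re-saved will linger. Clearing the run folder first, or using a finer-grained timestamp, avoids that.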