I have been periodically running Kedro pipelines that I started from the pyspark-iris starter. Everything worked fine until recently, when I suddenly started getting this Spark error, originating from py4j:
java.lang.IllegalArgumentException: Too large frame: 5785721462337832960
This happens even when I just run the default pipeline with iris.csv.
When I create a new kedro project (0.17.3) and try to run the example pipeline with iris.csv everything works fine, so I don’t think this has to do with my environment. Based on my research some people get this error when their pyspark version doesn’t match the version of their spark cluster, but I am running this all locally for now.
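For anyone who wants to rule out the version-mismatch explanation on a local setup, here is a minimal sketch of the check. The helper names (`major_minor`, `versions_match`, `report_local_versions`) are made up for illustration, and the `local[1]` master is an assumption about how the session is created; only `pyspark.__version__`, `SparkSession.builder` and `spark.version` come from PySpark itself.

```python
def major_minor(version: str) -> tuple:
    """Return the (major, minor) part of a version string such as '3.1.2'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(pyspark_version: str, spark_version: str) -> bool:
    """True when both versions share the same major.minor release line."""
    return major_minor(pyspark_version) == major_minor(spark_version)


def report_local_versions() -> bool:
    """Print the Python package version and the JVM-side Spark version.

    Hypothetical helper: requires pyspark and a working Java install.
    """
    import pyspark
    from pyspark.sql import SparkSession

    # Assumption: a throwaway single-core local session is good enough here.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    try:
        print("pyspark package:", pyspark.__version__)
        print("Spark runtime:  ", spark.version)
        return versions_match(pyspark.__version__, spark.version)
    finally:
        spark.stop()
```

Calling `report_local_versions()` inside the same virtual environment that runs the failing project should show whether the two versions have drifted apart, for example after an unpinned dependency upgrade.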
I thought I might genuinely have a Spark partition that is too large, but how could I even tell which catalog item is causing this? I tried temporarily swapping out my catalog for an empty one that only includes iris.csv and I still get the error.
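One way to narrow this down might be to load every catalog entry in turn and record which ones raise. The sketch below is only an illustration: `find_failing_datasets` is a hypothetical helper, and it assumes the catalog object exposes `list()` and `load(name)` the way Kedro's `DataCatalog` does. Because Spark is lazy, a `force` callback (for example `lambda df: df.count()`) is needed to actually trigger the read.

```python
from typing import Any, Callable, Dict, Optional


def find_failing_datasets(
    catalog: Any,
    force: Optional[Callable[[Any], Any]] = None,
) -> Dict[str, str]:
    """Try to load each dataset and collect the ones that raise.

    `catalog` is assumed to provide `list()` and `load(name)`.
    `force` is an optional callable applied to the loaded data,
    used to trigger lazy evaluation (e.g. a Spark action).
    """
    failures = {}
    for name in catalog.list():
        try:
            data = catalog.load(name)
            if force is not None:
                force(data)
        except Exception as exc:  # keep going so every entry is checked
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Entries that simply have no data yet (such as in-memory intermediate outputs) would also show up as failures, so the result still needs to be read with that in mind.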
kedro, version 0.17.3
Custom plugins? No
Thank you for your help on this!!