pandas.CSVDataSet

Hello, everyone.

I’m having trouble converting the type of an entire DataFrame column while loading a dataset of type pandas.CSVDataSet. I’m trying to use the converters option in load_args, but none of the setups I have tried works.

For instance, by configuring the catalog as

alias:
  type: pandas.CSVDataSet
  filepath: data/01_raw/alias.csv
  load_args:
    encoding: 'utf-8'
    converters:
      volume: str

I would like to convert the volume column (which is an int64) to a string. Nevertheless, I’m getting the following error:

DataSetError: Failed while loading data from data set CSVDataSet(filepath=/home/jupyter/data/01_raw/alias.csv, load_args={'converters': {'volume': str}, 'encoding': utf-8}, protocol=file, save_args={'index': False}).
'str' object is not callable

Do you know how I can solve this problem?

The converters parameter expects a “dict of functions”, which is why you get that error: YAML loads str as a plain string, not as the built-in function. I think what you really want to use is the dtype parameter. In other words:

load_args:
  dtype: {'volume': str}
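
For anyone who wants to see the difference outside of Kedro, here is a minimal sketch using pandas directly. The CSV contents are made up for illustration, and it assumes the YAML value arrives in pandas as the string "str", which is what the error message above suggests.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for data/01_raw/alias.csv (contents are made up).
csv_text = "name,volume\na,10\nb,20\n"

# YAML hands pandas the *string* "str", which is not callable,
# so it cannot act as a converter function.
print(callable("str"))  # False
print(callable(str))    # True

# dtype accepts a type name given as a string, so the YAML value works here.
df = pd.read_csv(io.StringIO(csv_text), dtype={"volume": "str"})
print(df["volume"].tolist())  # ['10', '20']

# converters also works, but only with a real callable passed from Python code.
df_conv = pd.read_csv(io.StringIO(csv_text), converters={"volume": str})
print(df_conv["volume"].tolist())  # ['10', '20']
```

So dtype is the easier route from a YAML catalog, because it does not need a function object at all.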

Yeah! That solved my problem, thank you. So the YAML loader is not expected to turn str into a function rather than a string, right?

As far as I can tell, you are correct. In this case it is not an issue, I guess, but I am not sure how it would work when you really want to pass in a callable… :thinking:

Welcome to the community @vinialbert! Make sure to share your story in the welcome thread if that’s your kind of thing.

YAML is a superset of JSON, so as @sebastianbertoli has pointed out, you can nest objects with standard JSON syntax, but more often it’s done with indents. Here is an example from the Kedro docs.

motorbikes:
  type: pandas.CSVDataSet
  filepath: s3://your_bucket/data/02_intermediate/company/motorbikes.csv
  credentials: dev_s3
  load_args:
    sep: ','
    skiprows: 5
    skipfooter: 1
    na_values: ['#NA', NA]
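
To check that the JSON-style and indented forms really are equivalent, here is a small sketch, assuming PyYAML is installed. It parses both spellings of the dtype setting from earlier in the thread and also shows that str comes back as a plain string.

```python
import yaml  # PyYAML

# JSON-style (flow) nesting, as in the dtype answer above.
flow_style = """
load_args:
  dtype: {'volume': str}
"""

# The same mapping written with indentation (block style).
block_style = """
load_args:
  dtype:
    volume: str
"""

flow = yaml.safe_load(flow_style)
block = yaml.safe_load(block_style)

print(flow == block)  # True
# The value is the string "str", not the built-in type.
print(repr(flow["load_args"]["dtype"]["volume"]))  # 'str'
```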

You’d probably need to switch to defining your catalog using python instead!

Alternatively, maybe an after_catalog_created hook that converts a string into a function using importlib?
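
A rough sketch of the importlib part of that idea is below. The helper names resolve_callable and resolve_converters are hypothetical, and so is the convention of writing the converter as a dotted path such as builtins.str in the catalog. Wiring this into an actual after_catalog_created hook is not shown, since how to reach a dataset's load arguments from there depends on Kedro internals.

```python
import importlib
from typing import Any, Callable, Dict


def resolve_callable(dotted_path: str) -> Callable[..., Any]:
    """Turn a string like 'package.module.name' into the callable it names."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.name', got {dotted_path!r}")
    obj = getattr(importlib.import_module(module_name), attr_name)
    if not callable(obj):
        raise TypeError(f"{dotted_path!r} does not name a callable")
    return obj


def resolve_converters(load_args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of load_args with converter strings replaced by callables."""
    resolved = dict(load_args)
    if "converters" in load_args:
        resolved["converters"] = {
            column: resolve_callable(value) if isinstance(value, str) else value
            for column, value in load_args["converters"].items()
        }
    return resolved


# Hypothetical load_args as they might come out of a YAML catalog.
load_args = {"encoding": "utf-8", "converters": {"volume": "builtins.str"}}
resolved = resolve_converters(load_args)
print(resolved["converters"]["volume"] is str)  # True
```

The original dict is left untouched, so the YAML-derived config can still be logged or compared as plain data.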

Hi @sebastianbertoli ,

PyYAML supports including a Python callable in YAML via a Python tag such as !!python/name:my_awesome_module.my_awesome_func

You can find the details in this document:
https://pipelinex.readthedocs.io/en/latest/section04.html#python-tags-in-yaml
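
Here is a minimal sketch of that tag with plain PyYAML, using builtins.str in place of the placeholder module above. Note the assumption: the tag is only resolved by PyYAML's unsafe loader, which should only be used on YAML you trust. Whether your Kedro config loader accepts these tags depends on which loader it uses, so treat this as an illustration of the tag itself, not of Kedro's behaviour.

```python
import yaml  # PyYAML

text = """
load_args:
  converters:
    volume: !!python/name:builtins.str
"""

# The safe loader refuses Python-specific tags.
try:
    yaml.safe_load(text)
    safe_loader_accepted = True
except yaml.YAMLError:
    safe_loader_accepted = False
print(safe_loader_accepted)  # False

# The unsafe loader resolves the tag to the actual Python object.
config = yaml.load(text, Loader=yaml.UnsafeLoader)
converter = config["load_args"]["converters"]["volume"]
print(converter is str)  # True
print(converter(123))    # 123 converted to the string '123'
```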

Thanks @zain @waylonwalker @Minyus - this is very helpful advice! :slight_smile: