Handling differences below the decimal point

Hi all,

I ran the same pipeline twice with the same data sources in the same VM environment.
When I compared the two outputs, some calculated values differed in the digits after the decimal point.

I would like to know why this happens and how to make the results consistent.
I ran the pipeline with the following settings.

  • VM: RHEL
  • kedro version: 0.16.6
  • python version: 3.7.10
  • plugins: probably some are installed

I would appreciate it if you could share your opinion or a solution.
Thank you!
@waylonwalker @Minyus

I would suggest consulting experts in the data processing tool you use (e.g. Spark, pandas, SQL).

If you use pandas, the df.round(2) method could be a workaround.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.round.html
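
To show what I mean, here is a minimal sketch of the rounding workaround. The two DataFrames and their column names are made up for illustration; they stand in for the outputs of your two runs.

```python
import pandas as pd

# Hypothetical outputs of two pipeline runs (made-up data).
# The "amount" values differ only far below the decimal point.
run_a = pd.DataFrame({"id": [1, 2], "amount": [10.123456789, 20.5]})
run_b = pd.DataFrame({"id": [1, 2], "amount": [10.123456788, 20.5]})

# Exact comparison fails because of the tiny difference.
exact_match = run_a.equals(run_b)

# Rounding both sides to 2 decimals before comparing hides the noise.
rounded_match = run_a.round(2).equals(run_b.round(2))

print(exact_match, rounded_match)
```

One caveat: rounding can still give different results when a value sits right on a rounding boundary, so comparing with a small tolerance may be safer in tests.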

I agree with Minyus. I would also look into the catalog dataset type you are using (MemoryDataSet, CSV, Parquet, SQL).

Thank you for your help, @Minyus and @waylonwalker!
We mainly use PySpark in our code, and all of the data is stored as Spark tables.
I raised this issue because I want to write tests for the datasets, so I will use a round function when checking the values, as Minyus suggested :sparkles:
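
For anyone finding this later: one possible cause (my assumption, it was not confirmed in this thread) is that floating-point addition is not associative, so summing the same values in a different order can change the last digits. The sketch below shows that effect in plain Python and a tolerance-based comparison that a dataset test could use instead of exact equality. The helper `rows_close` and the sample rows are hypothetical, not part of kedro or PySpark.

```python
import math

# Same three numbers, added in a different order.
values = [0.1, 0.2, 0.3]
forward = (values[0] + values[1]) + values[2]
backward = values[0] + (values[1] + values[2])
print(forward == backward)  # the two sums are not bit-identical


def rows_close(rows_a, rows_b, abs_tol=1e-9):
    """Return True if both row lists have the same shape and all floats agree within abs_tol.

    Hypothetical helper: rows are plain tuples, already sorted in the same order.
    """
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if isinstance(x, float) and isinstance(y, float):
                # Compare floats with an absolute tolerance only.
                if not math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol):
                    return False
            elif x != y:
                # Everything else (keys, strings, ints) must match exactly.
                return False
    return True


# Made-up rows standing in for the collected outputs of two runs.
first_run = [("a", forward), ("b", 2.5)]
second_run = [("a", backward), ("b", 2.5)]
print(rows_close(first_run, second_run))
```

Both row lists need to be sorted by the same key before comparing, since row order is not guaranteed between runs. The tolerance of 1e-9 is only a placeholder and should be chosen to fit the scale of the data.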