Passing None as one of the inputs to a Kedro pipeline Node

What is the goal you are trying to achieve?
Hi!
I’m trying to pass None as one of the input arguments to a pipeline Node (in the following code, df_y may be None in some cases):
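from typing import Optional
import pandas as pd
from kedro.pipeline.node import Node

def evaluate_model(df_x: pd.DataFrame, df_y: Optional[pd.Series] = None) -> float:
    ...  # compute and return the evaluation score
Node(evaluate_model, [df_x, df_y], ["output"], name="evaluate_model", tags="ev")  # df_x, df_y: dataset names; df_y may be None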
When df_y is None, I get:
split_name = element.split(TRANSCODING_SEPARATOR)
AttributeError: 'NoneType' object has no attribute 'split'
How should one deal with inputs to a pipeline Node that may be set to None?
Thanks for your help!

What version of Kedro are you using? (Use kedro -V)
0.17.0

Do you have any custom plugins?
No

What is the full stack trace of the error (if applicable)
Traceback (most recent call last):
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/__main__.py", line 38, in <module>
    main()
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/framework/cli/cli.py", line 696, in main
    cli_collection(**cli_context)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/IL/Desktop/github/phd/PROJECTS/INTINP/INTINP/src/INTINP/cli.py", line 228, in run
    session.run(
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/framework/session/session.py", line 377, in run
    pipeline = context._get_pipeline(name=pipeline_name)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/framework/context/context.py", line 250, in _get_pipeline
    pipelines = self._get_pipelines()
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/framework/context/context.py", line 269, in _get_pipelines
    hook_manager.hook.register_pipelines()  # pylint: disable=no-member
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
    raise ex[1].with_traceback(ex[2])
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/Users/IL/Desktop/github/phd/PROJECTS/INTINP/INTINP/src/INTINP/hooks.py", line 58, in register_pipelines
    eval_pipeline = eval.create_pipelines(input_df_x="merged_views_df",
  File "/Users/IL/Desktop/github/phd/PROJECTS/INTINP/INTINP/src/INTINP/pipelines/evaluate_model/pipeline.py", line 94, in create_pipelines
    evaluate_model_pipeline = Pipeline([
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/pipeline/pipeline.py", line 174, in __init__
    _validate_transcoded_inputs_outputs(nodes)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/pipeline/pipeline.py", line 819, in _validate_transcoded_inputs_outputs
    name = _strip_transcoding(dataset_name)
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/pipeline/pipeline.py", line 84, in _strip_transcoding
    return _transcode_split(element)[0]
  File "Users/IL/miniconda3/envs/INTINP/lib/python3.8/site-packages/kedro/pipeline/pipeline.py", line 60, in _transcode_split
    split_name = element.split(TRANSCODING_SEPARATOR)
AttributeError: 'NoneType' object has no attribute 'split'

Could you try changing ["output"] to "output" instead?

Thanks for your suggestion! The code is just an example; I included the square brackets because my node has multiple outputs. The problem I was facing concerns passing None among the inputs of a node. For example, I would like to provide [“a”, “b”, None, “params:c”] as the inputs to a node.

From what I understood, the problem is that all the inputs that are not parameters are checked for transcoding when the Pipeline is built (Pipeline.__init__ in the traceback), before anything is looked up in the catalog. Kedro therefore assumes each input is a string and tries to split it on the TRANSCODING_SEPARATOR, which is “@” by default. As a consequence, if I provide None as one of the inputs, I get “AttributeError: ‘NoneType’ object has no attribute ‘split’”. So I was looking for a way to pass None as one of the inputs without running into this problem. For now, I solved it by checking whether the input is None before plugging it into the node and, if so, replacing it with “params:none”, which I defined as None in the config.
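
To make the failure and the workaround concrete, here is a minimal pure-Python sketch. split_dataset_name is a hypothetical, simplified stand-in for Kedro's internal _transcode_split (it only mimics the element.split("@") call seen in the traceback, not Kedro's full validation), and resolve_optional_inputs is a hypothetical helper that replaces None with “params:none” as described above. It assumes the project's parameters file contains an entry `none: null`, which YAML loads as Python None.

```python
from typing import List, Optional, Tuple

# Kedro's default transcoding separator, as mentioned in the thread.
TRANSCODING_SEPARATOR = "@"

# Name used in place of None. Assumes parameters.yml contains `none: null`
# (hypothetical entry; YAML null loads as Python None).
NONE_PLACEHOLDER = "params:none"


def split_dataset_name(element: str) -> Tuple[str, str]:
    """Hypothetical, simplified stand-in for Kedro's internal transcoding split.

    Like the call in the traceback, it assumes `element` is a string, so
    passing None raises AttributeError: 'NoneType' object has no attribute 'split'.
    """
    parts = element.split(TRANSCODING_SEPARATOR, 1)
    name = parts[0]
    transcode = parts[1] if len(parts) == 2 else ""
    return name, transcode


def resolve_optional_inputs(inputs: List[Optional[str]]) -> List[str]:
    """Replace None entries with the placeholder parameter name (hypothetical helper)."""
    return [NONE_PLACEHOLDER if name is None else name for name in inputs]


if __name__ == "__main__":
    raw_inputs = ["a", "b", None, "params:c"]

    # Passing None straight through reproduces the error from the thread.
    try:
        [split_dataset_name(name) for name in raw_inputs]
    except AttributeError as err:
        print(f"Fails as in the traceback: {err}")

    # After substitution every input is a string, so the split succeeds.
    resolved = resolve_optional_inputs(raw_inputs)
    print(resolved)  # ['a', 'b', 'params:none', 'params:c']
    print([split_dataset_name(name)[0] for name in resolved])
```

In a pipeline factory such as the create_pipelines(input_df_x=...) function in the traceback, the substitution would be applied to the input list before it is passed to Node(...). Depending on the Kedro version, a parameter whose value is None may be wrapped in a MemoryDataSet that treats None as "nothing saved yet", so a quick kedro run is worth doing to confirm that “params:none” actually loads.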

I see. As far as I remember, Kedro treats every input as a string name, so if you pass None as an input, it will fail.

I have a similar pain point when an output is a dummy variable.

In plain Python I would just assign it to _, but in Kedro I have to give it a name. If I have several variables like these, I have to name them dummy1, dummy2, etc.
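
Since Kedro needs every node output to be named, and output names must be unique across a pipeline, one way to avoid hand-writing dummy1, dummy2, ... is to generate them. A minimal sketch, assuming a hypothetical helper dummy_outputs that prefixes the names with the node name so they cannot collide between nodes; the resulting list would then be passed as the outputs of Node(...) as usual.

```python
from typing import List


def dummy_outputs(node_name: str, count: int) -> List[str]:
    """Build `count` throwaway output names scoped to one node (hypothetical helper).

    Prefixing with the node name keeps the names unique across the pipeline,
    because two nodes may not produce outputs with the same name.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return [f"{node_name}_dummy{i}" for i in range(1, count + 1)]


def node_outputs(node_name: str, real: List[str], n_dummies: int) -> List[str]:
    """Real output names followed by generated dummy names.

    Assumes the node function returns its real outputs first, then the
    values to be discarded, since outputs are matched by position.
    """
    return real + dummy_outputs(node_name, n_dummies)


if __name__ == "__main__":
    # e.g. a node returning (score, <unused>, <unused>)
    print(node_outputs("evaluate_model", ["score"], 2))
    # ['score', 'evaluate_model_dummy1', 'evaluate_model_dummy2']
```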

I think a simple API for adding a MemoryDataSet to the catalog would be beneficial.

Well, it seems that this example in the docs does something like that:

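from kedro.io import DataCatalog, MemoryDataSet

io = DataCatalog()  # empty catalog to register datasets in
memory = MemoryDataSet(data=None)
io.add("cars_cache", memory)
io.save("cars_cache", "Memory can store anything.")
io.load("cars_cache")  # returns "Memory can store anything."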

I think you could do that in the register_catalog hook; that way, the name “None” would always be assigned to None. I have not tried it, but it might work.

Hi! Thanks for your answers, @Jaime_Arboleda_Casti and @nok. I found them really useful.
So, at the moment it seems like there are 3 strategies:

  • store a parameter none that can be referenced as “params:none”
  • store None in the catalog as a dummy variable
  • store a variable in a MemoryDataSet added in the ProjectHooks

I think @Jaime_Arboleda_Casti’s option fits my needs really well. Commonly used input values like None could probably be registered directly in the ProjectHooks of any Kedro project, so that they can always be referenced. For project-specific dummy variables, I would instead opt for @nok’s solution. Thanks again for your help!

I hope it works! I have not tested it; I just read the code of Kedro’s context._get_catalog() and the corresponding hook yesterday, and I think it should be OK. If it works, please let me know! Thanks!

You’re welcome! Hope it helps.