Kedro-Diff (work in progress)

I’ve been doing a lot of code review lately, and sometimes the PR has a lot going on. It’s really hard to see from the diffs what has changed in the pipeline without combing through each line very carefully. I started a CLI plugin to show pipeline diffs between two commits/branches with a familiar interface.

The end goal is something along these lines:

kedro diff --stat develop..master
M  __default__      | 6 ++++-
M  data_science     | 3 +++
M  data_engineering | 3 ++-
?? new_pipeline

4 pipelines changed, 5 insertions(+), 4 deletions(-)

Now compares inputs, outputs, and tags when checking for modified nodes.
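A rough sketch of what that comparison could look like, not the plugin's actual code. It assumes each node has been reduced to a plain dict with `inputs`, `outputs`, and `tags` keys (a hypothetical shape, not kedro's `Node` class), and it assumes the order of inputs and outputs matters while the order of tags does not.

```python
from typing import Any, Mapping, Tuple


def node_signature(node: Mapping[str, Any]) -> Tuple[tuple, tuple, frozenset]:
    """Reduce a node to the three fields used for the comparison.

    Assumption: `node` is a plain mapping, not a kedro Node object.
    """
    return (
        tuple(node.get("inputs", ())),      # order-sensitive (assumed)
        tuple(node.get("outputs", ())),     # order-sensitive (assumed)
        frozenset(node.get("tags", ())),    # order-insensitive (assumed)
    )


def is_modified(old: Mapping[str, Any], new: Mapping[str, Any]) -> bool:
    """True when inputs, outputs, or tags differ between two versions of a node."""
    return node_signature(old) != node_signature(new)
```

Keeping the signature in one function means adding another compared field later only touches one place.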



Fixed how modified nodes are printed. I was only counting the addition in the last example. When you run git diff --stat on a file that has a modified line, it shows up as 2 +- rather than 1 +.
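The counting rule can be sketched like this, as an illustration only. The function names and the idea of mapping node name to a comparable signature are assumptions made for the example, not the plugin's internals.

```python
from typing import Any, Dict, Tuple


def count_changes(old: Dict[str, Any], new: Dict[str, Any]) -> Tuple[int, int]:
    """Return (insertions, deletions) for one pipeline.

    `old` and `new` map node name -> comparable signature (hypothetical shape).
    """
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    # A modified node counts on both sides, like a changed line in git diff --stat.
    return len(added) + len(modified), len(removed) + len(modified)


def stat_line(name: str, insertions: int, deletions: int, width: int = 16) -> str:
    """Format one row in the style of the --stat mock-up above."""
    total = insertions + deletions
    return f"M  {name:<{width}} | {total} {'+' * insertions}{'-' * deletions}"
```

With one added node and one modified node, `count_changes` gives 2 insertions and 1 deletion, which `stat_line` renders as `3 ++-`.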



Was able to play with kedro_diff on a larger pipeline. It’s definitely a bit slow without caching or parallelization: listing nodes for each commit creates two temp copies of the project and loads a kedro session, and each following pipeline currently loads a separate kedro session.

I am excited to see the pieces coming together to enable more complex diffs than just --stat.

Getting started on the full detailed diff view, which will be the default.

Continuing a bit on kedro diff tonight. Made a big change to how individual nodes are diffed and it was much easier to do at this level.


Made more progress yesterday and got node_diff working with full test coverage. I tried it on a few pipelines, and on a project with about 10 pipelines it took 1m to 1m20s. If I manually enforce some caching I can get it down to about 1.6s.

Currently it’s super inefficient, as it runs a new subshell to get the pipeline object for every pipeline for both commits being compared. Getting this down to one subshell for each commit being compared and turning on some caching will reap some big rewards in performance.
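One way the "one load per commit, then cache" idea could look, as a minimal sketch. The `load_commit` callable stands in for whatever expensive step fetches every pipeline for a commit (the subshell, in the current implementation); its name and return shape are assumptions for the example.

```python
from functools import lru_cache
from typing import Callable, Dict, List

# Hypothetical shape: pipeline name -> list of node names for one commit.
PipelineMap = Dict[str, List[str]]


def cached_loader(load_commit: Callable[[str], PipelineMap]) -> Callable[[str], PipelineMap]:
    """Wrap an expensive per-commit loader so each commit is loaded at most once."""

    @lru_cache(maxsize=None)
    def load(commit: str) -> PipelineMap:
        # One expensive call returns every pipeline for the commit,
        # instead of one call per pipeline per commit.
        return load_commit(commit)

    return load
```

The cache here is keyed on whatever string is passed in, so a branch name that moves would return stale results within the same process. Resolving to a commit hash first would be the safer key.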

This looks interesting… Probably going to need something like this soon.


Let me know if you give it a try. It’s something I have been using quite a bit in PRs. I could see it being essential for sussing out changes if you were on a larger team.