Kedro-Diff (work in progress)

I’ve been doing a lot of code review lately, and sometimes a PR has a lot going on. It’s really hard to tell from the diffs what has changed in the pipeline without combing through each line very carefully. So I started a CLI plugin that shows pipeline diffs between two commits or branches through a familiar, git-like interface.

The end goal is something along these lines:

kedro diff --stat develop..master
M  __default__      | 6 ++++-
M  data_science     | 3 +++
M  data_engineering | 3 ++-
?? new_pipeline

4 pipelines changed, 5 insertions(+), 4 deletions(-)

Now compares inputs, outputs, and tags when checking for modified nodes.
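To illustrate the idea, here is a minimal sketch of that comparison. It is not kedro-diff's actual code. It assumes each commit's nodes have already been loaded into a plain dict keyed by node name, and that nodes are matched across commits by name. `NodeInfo` and `classify_nodes` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical, simplified view of a node (not a Kedro class)."""
    inputs: Tuple[str, ...]   # order kept: node inputs map to function args
    outputs: Tuple[str, ...]
    tags: FrozenSet[str]      # tags are unordered


def classify_nodes(
    old: Dict[str, NodeInfo], new: Dict[str, NodeInfo]
) -> Tuple[List[str], List[str], List[str]]:
    """Split node names into (added, removed, modified) between two commits."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A node present in both commits counts as modified when its inputs,
    # outputs, or tags differ; dataclass equality compares all three fields.
    modified = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return added, removed, modified


# Assumed example data: "fit" gains an input, "score" is a brand-new node.
old = {
    "split": NodeInfo(("raw",), ("train", "test"), frozenset()),
    "fit": NodeInfo(("train",), ("model",), frozenset({"ds"})),
}
new = {
    "split": NodeInfo(("raw",), ("train", "test"), frozenset()),
    "fit": NodeInfo(("train", "params:lr"), ("model",), frozenset({"ds"})),
    "score": NodeInfo(("model", "test"), ("metrics",), frozenset()),
}
print(classify_nodes(old, new))  # (['score'], [], ['fit'])
```

Matching by name is the weak spot of this sketch: if a node's name is derived from its inputs and outputs, changing them would surface as one removal plus one addition rather than a modification.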


Fixed how modified nodes are counted in the output: in the last example I was only counting the addition. When you run git diff --stat on a file with one modified line, it shows up as 2 +- (one insertion and one deletion) rather than 1 +, and modified nodes are now counted the same way.
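Here is a rough sketch of that counting rule, rendered in the same shape as the --stat mockup above. It assumes the per-pipeline counts of added, removed, and modified nodes are already known. `format_stat` is a hypothetical helper, not part of kedro-diff. It also skips two things git does: it does not scale long +/- bars to the terminal width, and it does not handle the `??` status for new pipelines.

```python
from typing import Dict, Tuple


def format_stat(changes: Dict[str, Tuple[int, int, int]]) -> str:
    """Render git-style --stat lines from (added, removed, modified) node counts.

    Follows git's convention: a modified node counts as one insertion
    plus one deletion, like a modified line in `git diff --stat`.
    """
    width = max(len(name) for name in changes)  # align the "|" column
    lines = []
    total_ins = total_del = 0
    for name, (added, removed, modified) in changes.items():
        ins = added + modified
        dels = removed + modified
        total_ins += ins
        total_del += dels
        lines.append(f"M  {name:<{width}} | {ins + dels} {'+' * ins}{'-' * dels}")
    lines.append("")
    lines.append(
        f"{len(changes)} pipelines changed, "
        f"{total_ins} insertions(+), {total_del} deletions(-)"
    )
    return "\n".join(lines)


# One modified node in data_science, two added nodes in data_engineering.
print(format_stat({"data_science": (0, 0, 1), "data_engineering": (2, 0, 0)}))
```

With this rule the single modified node in `data_science` shows up as `2 +-`, matching how git reports one changed line.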


I was able to try kedro_diff on a larger pipeline, and it’s definitely a bit slow without caching or parallelization. Listing the nodes for each commit means creating two temporary copies of the project and loading a Kedro session, and each additional pipeline currently loads yet another separate Kedro session.

I’m excited to see the pieces coming together to enable more complex diffs than just --stat.

Getting started on the full detailed diff view, which will be the default.