VCS For Jupyter Notebooks!

Hi everyone! I think it would be a good idea to introduce VCS support for Jupyter notebooks. Normally I use external solutions to track my Jupyter notebooks and keep their metadata out of git.

What do you think?

By the way, I created a Kedro help group in our community (it is in Spanish) to encourage people to use Kedro in our projects.


Yes! VCS is really useful, but Jupyter notebooks are notoriously hard to version: outputs and execution counts are stored in the .ipynb file, so diffs get noisy. However, there are tools that clean notebooks by removing the outputs, the run numbers, and similar metadata, which makes them much easier to add to VCS.
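To make it concrete what "removing the outputs and the run numbers" means, here is a minimal sketch using only the standard library. It assumes the nbformat 4 layout (a top-level "cells" list in which code cells carry "outputs" and "execution_count"); the function name strip_notebook and the example notebook are made up for illustration, and this is not how any particular tool is implemented.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with outputs and run numbers cleared.

    Hypothetical helper for illustration; assumes the nbformat 4 layout.
    """
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the run number
    return cleaned


# A tiny in-memory notebook, invented for this example.
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
            "source": ["print('hi')"],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

stripped = strip_notebook(example)
print(json.dumps(stripped, indent=1))
```

On a real file you would json.load the .ipynb, pass it through the function, and json.dump the result back. In practice a maintained tool is a better choice, since it also handles edge cases this sketch ignores.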

Another solution that may be more consistent and easier to automate is to use pre-commit with nbstripout or nbconvert. This should produce a much cleaner git diff.

Personally, I would lean toward the one that uses nbconvert. Some people may find it infuriating to see their notebook output gone after a git commit, whereas exporting a copy to HTML/Python would be far less intrusive.

Further searching brought me to this one, which appears to be more popular:

https://jupytext.readthedocs.io/en/latest/using-pre-commit.html


All you need to do is add a .pre-commit-config.yaml file at the root of your project, then have developers run pip install pre-commit && pre-commit install, and it will take care of the rest.
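As a rough sketch of what that file could look like, here is a .pre-commit-config.yaml that runs nbstripout on every commit. The rev value is an assumption on my part: pin it to whichever release you actually want, and check the nbstripout README for the currently recommended setup.

```yaml
# .pre-commit-config.yaml at the project root (illustrative sketch)
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1  # assumed version; pin to the release you want
    hooks:
      - id: nbstripout  # strips outputs from staged .ipynb files
```

Once pre-commit install has been run, the hook fires on git commit. If it modifies a notebook, the commit is stopped so you can stage the cleaned file and commit again.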