Handling Zip Files in Kedro Using the de1 Python package!

Got a special one today! Announcing a brand new package that you can use with your data pipelines: the de1 package. It lives on GitHub at dataengineerone/de1-python, a curated collection of DE1's favorite Kedro pieces.

In this video, we use the ZipFileDataSet from the package to play with compressed CSV files.

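from io import StringIO

import pandas as pd
from kedro.pipeline import Pipeline, node


def create_pipeline(**kwargs):
    # Build the same decode -> parse -> count chain for each zipped CSV dataset.
    pipes = Pipeline([])
    for source_dataset in ['first', 'second', 'third', 'fourth']:
        pipes += Pipeline([
            node(
                # Decode the raw bytes loaded from the catalog entry into text.
                lambda x: x.decode('utf8'),
                inputs=source_dataset,
                outputs=f"decoded_{source_dataset}",
                name=f"decode_{source_dataset}",
                tags=[source_dataset],
            ),
            node(
                # Parse the CSV text into a pandas DataFrame.
                lambda x: pd.read_csv(StringIO(x)),
                inputs=f"decoded_{source_dataset}",
                outputs=f"pandas_{source_dataset}",
                name=f"pandify_{source_dataset}",
                tags=[source_dataset],
            ),
            node(
                # Print the row count; this node produces no output dataset.
                lambda x: print(f"Row Count: {len(x)}"),
                inputs=f"pandas_{source_dataset}",
                outputs=None,
                name=f"count_{source_dataset}",
                tags=[source_dataset],
            ),
        ])
    return pipes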
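
If you want to see what each of those three nodes is doing without setting up Kedro, here is a rough standalone sketch using only the standard library. It assumes, as the `decode` step in the pipeline suggests, that each zipped file arrives as raw bytes. The file names and the helpers `make_zip`, `read_member` and `count_rows` are made up for illustration, and `csv.DictReader` stands in for `pd.read_csv`; none of this is the de1 package's own API.

```python
import csv
import io
import zipfile


def make_zip(files):
    """Build an in-memory zip archive from a {member_name: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text.encode("utf8"))
    return buffer.getvalue()


def read_member(zip_bytes, member):
    """Return the raw bytes of a single member of the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(member)


def count_rows(raw):
    """Mirror the decode -> parse -> count chain from the pipeline."""
    decoded = raw.decode("utf8")                        # decode step
    rows = list(csv.DictReader(io.StringIO(decoded)))   # stand-in for pd.read_csv
    return len(rows)                                    # count step


# Hypothetical sample data: two small CSV files inside one archive.
archive_bytes = make_zip({
    "first.csv": "id,name\n1,a\n2,b\n",
    "second.csv": "id,name\n3,c\n",
})

row_counts = {
    member: count_rows(read_member(archive_bytes, member))
    for member in ("first.csv", "second.csv")
}
print(row_counts)
```

In the Kedro version, the catalog entry takes care of the `read_member` part, so the nodes only have to deal with decoding, parsing and counting.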