Using the @transformation() decorator

There are two main decorators in the Daipe framework - @transformation() and @notebook_function().

  1. @transformation() understands Spark DataFrames and provides extra Spark-related functionality such as display and duplicate-column checking.
  2. @notebook_function() should be used for functions and procedures which don't manipulate a DataFrame, e.g. downloading data.

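To illustrate the difference, here is a minimal, hypothetical sketch of the @notebook_function() pattern. The stub decorator below is NOT Daipe's actual implementation (the real one comes from datalakebundle.imports); it only mimics the framework's behavior of running the decorated function eagerly when the notebook cell executes, with no DataFrame involved:

```python
# Hypothetical stand-in for Daipe's @notebook_function() decorator,
# shown only to illustrate the usage pattern. In real pipelines you
# would import the actual decorator from datalakebundle.imports.
def notebook_function(*decorator_args):
    def decorate(fn):
        # Like Daipe, execute the function eagerly when the cell runs
        return fn(*decorator_args)
    return decorate

@notebook_function()
def download_data():
    # e.g. fetch records from an external source; no DataFrame manipulation
    return ["record1", "record2"]
```

After the cell runs, `download_data` holds the function's result, which is why side-effecting tasks like downloads fit this decorator rather than @transformation().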
First, import everything necessary for a Daipe pipeline workflow:

from datalakebundle.imports import *

Any decorator can also take functions as parameters:

@transformation(
    read_csv(
        "/data.csv",
        options=dict(header=True, inferSchema=True)
    ),
)
def read_csv_data(df: DataFrame):
    return df

See the technical reference for the list of all functions which can be used as decorator parameters.

The display=True option can be used to display the resulting DataFrame.

@transformation(
    read_table("bronze.tbl_customers"),
    display=True
)
def read_tbl_customers(df: DataFrame):
    return df

For more information see the technical reference.

Environments

Each table is prefixed with an environment tag (dev, test, prod) to separate production data from development code and vice versa. The Daipe framework automatically inserts the prefix based on your selected environment, so the code stays the same across all environments.
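The prefixing idea can be sketched in plain Python. The helper below is hypothetical (it is not Daipe's actual implementation, and the exact prefix format is an assumption); it only shows how one logical table name can resolve to different physical tables per environment:

```python
# Hypothetical sketch of environment-based table-name prefixing.
# The real resolution is done automatically by the Daipe framework;
# the "env_db.table" format here is an illustrative assumption.
def resolve_table_name(table: str, env: str) -> str:
    db, name = table.split(".", 1)
    return f"{env}_{db}.{name}"

dev_table = resolve_table_name("bronze.tbl_customers", "dev")
prod_table = resolve_table_name("bronze.tbl_customers", "prod")
```

Because the resolution happens inside the framework, pipeline code references only the logical name ("bronze.tbl_customers") and never hard-codes an environment.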