Writing function output

Output decorators are used in conjunction with the @dp.transformation input decorator. They take either a string identifier of the table...

import daipe as dp

@dp.transformation(dp.read_csv, display=True)
def save(df: DataFrame, logger: Logger):
    logger.info(f"Saving {df.count()} records")
    return df.withColumn("Birthdate", f.to_date(f.col("Birthdate"), "yyyy-MM-dd"))

...or you also skip Hive and write directly into delta:

import daipe as dp

@dp.transformation(dp.read_table("bronze.tbl_customers"), dp.read_table("bronze.tbl_contracts"))
def join_tables(df1: DataFrame, df2: DataFrame):
    return df1.join(df2, "Customer_ID")

For more details see the Output decorators reference.

Automatic schema

When using the string table identifier, the @dp.table_overwrite decorator saves the data using the DataFrame schema. This is useful for prototyping. It is highly recommended to use explicit schema for production pipelines.