Using an explicit table schema

A table schema is created with the TableSchema class:

import daipe as dp
from pyspark.sql import types as t

def get_schema():
    return dp.TableSchema(
        [
            t.StructField("ReportAsOfEOD", t.DateType(), True),
            t.StructField("LoanID", t.StringType(), True),
            t.StructField("Date", t.DateType(), True),
            t.StructField("PrincipalRepayment", t.DoubleType(), True),
            t.StructField("InterestRepayment", t.DoubleType(), True),
            t.StructField("LateFeesRepayment", t.DoubleType(), True),
        ],
        primary_key=["LoanID", "Date"],
        # partition_by = "Date"
    )

For more details, see the TableSchema reference.

Select all fields from the schema before writing them to the table:

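import daipe as dp
from pyspark.sql import DataFrame

@dp.table_overwrite("bronze.tbl_loans", get_schema())
def save(df: DataFrame):
    return (
        # Assumption: TableSchema exposes fieldNames(), as pyspark's StructType does
        df.select(get_schema().fieldNames())
    )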

Schema autosuggestion

When using @table_* decorators without an explicit schema,...

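import daipe as dp
from pyspark.sql import DataFrame

@dp.transformation(dp.read_csv("/RepaymentsData.csv", options=dict(header=True)))
@dp.table_overwrite("bronze.tbl_repayments")  # table name is illustrative; no schema is passed
def load_csv_and_save(df: DataFrame):
    return df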

...Daipe raises a warning and generates a schema based on the DataFrame for you.
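
To make the idea of a generated schema concrete, here is a minimal sketch of how such a suggestion could be rendered as source text. It is not Daipe's implementation: the helper suggest_schema_source and the (name, type, nullable) tuples standing in for the DataFrame's fields are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical stand-in for the fields of a DataFrame schema:
# (column name, Spark type class name, nullable)
Field = Tuple[str, str, bool]


def suggest_schema_source(fields: List[Field]) -> str:
    """Render Python source for a get_schema() function from field descriptors."""
    lines = [
        "def get_schema():",
        "    return dp.TableSchema(",
        "        [",
    ]
    for name, type_name, nullable in fields:
        lines.append(
            f'            t.StructField("{name}", t.{type_name}(), {nullable}),'
        )
    lines += [
        "        ],",
        "        # primary_key cannot be inferred from the data; add it by hand",
        "    )",
    ]
    return "\n".join(lines)


suggestion = suggest_schema_source(
    [("LoanID", "StringType", True), ("Date", "DateType", True)]
)
print(suggestion)
```

The printed text can be pasted into the notebook and completed by hand, which turns an inferred schema into an explicit one.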


Schema checking

When using @table_* decorators with an explicit schema, Daipe checks whether the schema of the DataFrame matches the explicit schema and raises an exception if it does not.

The error message also shows the difference between the two schemas, so you can see exactly what needs to be fixed.
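
The kind of check described above can be sketched in plain Python. This is an illustration of the idea under simplifying assumptions, not Daipe's actual code: the functions diff_schemas and check_schema are hypothetical, and schemas are reduced to (column name, type name) pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified field: (column name, type name)
Field = Tuple[str, str]


def diff_schemas(expected: List[Field], actual: List[Field]) -> List[str]:
    """Return human-readable differences between two schemas."""
    exp: Dict[str, str] = dict(expected)
    act: Dict[str, str] = dict(actual)
    problems = []
    # Columns the explicit schema requires
    for name, type_name in exp.items():
        if name not in act:
            problems.append(f"missing column: {name} ({type_name})")
        elif act[name] != type_name:
            problems.append(
                f"type mismatch: {name} expected {type_name}, got {act[name]}"
            )
    # Columns present in the data but absent from the explicit schema
    for name, type_name in act.items():
        if name not in exp:
            problems.append(f"unexpected column: {name} ({type_name})")
    return problems


def check_schema(expected: List[Field], actual: List[Field]) -> None:
    """Raise if the schemas differ, listing every difference in the message."""
    problems = diff_schemas(expected, actual)
    if problems:
        raise ValueError("Schemas do not match:\n" + "\n".join(problems))


expected = [("LoanID", "string"), ("Date", "date"), ("PrincipalRepayment", "double")]
actual = [("LoanID", "string"), ("Date", "string"), ("LateFee", "double")]
problems = diff_schemas(expected, actual)
print("\n".join(problems))
```

Reporting every difference at once, rather than stopping at the first one, lets you fix the schema or the DataFrame in a single pass.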