Daipe 2.0: Release Notes


  • Tables no longer need to be defined in a local environment; the YAML config is optional. A local environment is required only for the initial setup of the project
  • Daipe can now be used without Databricks on any Spark environment, or even without Spark entirely, using just Pandas
  • Functions such as read_csv() and read_table() can be used as arguments for decorators. This completely replaces the functionality of @data_frame_loader; see docs. Example:
    # Old Daipe
    def my_transformation(spark: SparkSession):
        return spark.read.table("my_database.my_table")
    # New Daipe
    @transformation(read_table("my_database.my_table"), display=True)
    def my_transformation(df: DataFrame):
        return df
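    Conceptually, the decorator-argument pattern can be sketched in plain Python. This is a simplified illustration only, not Daipe's actual implementation; the read_table and transformation definitions below are stand-ins:

    ```python
    # Minimal sketch of the decorator-argument pattern (illustration only;
    # the real Daipe decorators wire loaders to Spark/Pandas and a container).

    def read_table(name):
        # Stand-in loader: returns a callable that produces the "DataFrame".
        tables = {"my_database.my_table": [{"id": 1}, {"id": 2}]}
        return lambda: tables[name]

    def transformation(*loaders, display=False):
        # Decorator factory: evaluates each loader argument and passes the
        # results to the decorated function, optionally displaying the result.
        def decorator(fn):
            result = fn(*(loader() for loader in loaders))
            if display:
                print(result)
            return result
        return decorator

    @transformation(read_table("my_database.my_table"))
    def my_transformation(df):
        return df
    ```

    The key point of the design is that loaders become declarative inputs to the decorator, so the function body receives ready-made DataFrames instead of a SparkSession.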
  • Support for DBR 8.x
  • Decorator @table_overwrite, which overwrites all data in a table with the data from a DataFrame; see docs
  • Decorator @table_append, which appends the data from a DataFrame to a table; see docs
  • Decorator @table_upsert, which updates existing data based on the primary_key and inserts new data; see docs
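    The upsert behaviour can be sketched with plain Pandas. This is an illustrative sketch of the semantics only, assuming a single string primary key; the upsert function name is hypothetical:

    ```python
    import pandas as pd

    def upsert(table: pd.DataFrame, incoming: pd.DataFrame, primary_key: str) -> pd.DataFrame:
        # Rows whose primary key already exists are replaced by the incoming
        # version; rows with new keys are appended.
        kept = table[~table[primary_key].isin(incoming[primary_key])]
        return pd.concat([kept, incoming], ignore_index=True)

    table = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
    incoming = pd.DataFrame({"id": [2, 3], "value": ["B", "c"]})
    result = upsert(table, incoming, primary_key="id")
    ```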

  • Schema now allows you to define a primary_key (used for @table_upsert), partition_by, and tbl_properties; see docs
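    As a rough sketch of what such a schema might carry, consider the class below. It is a hypothetical stand-in, not Daipe's actual schema object; the real class and import path are in the docs:

    ```python
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class TableSchemaSketch:
        # Hypothetical stand-in mirroring the options listed above.
        fields: list
        primary_key: Optional[str] = None
        partition_by: list = field(default_factory=list)
        tbl_properties: dict = field(default_factory=dict)

    schema = TableSchemaSketch(
        fields=[("id", "integer"), ("created_at", "date"), ("value", "string")],
        primary_key="id",                    # used by @table_upsert
        partition_by=["created_at"],         # physical partitioning columns
        tbl_properties={"delta.autoOptimize.optimizeWrite": "true"},
    )
    ```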

  • A schema will be generated for you if you do not provide one to the @table_* decorators; see docs

  • Schema checking output is greatly improved; see docs for a schema diff example

Backwards incompatible changes

  • Schema is no longer loaded automatically from the schema.py file in the notebook folder. A schema can now be defined inside the notebook or imported from a separate file; see docs
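    A minimal sketch of the two options follows; the schema shape, function name, and import path are illustrative only, the real schema API is described in the docs:

    ```python
    # Option 1: schema defined directly in the notebook (illustrative shape).
    def get_schema():
        return {
            "fields": {"id": "integer", "value": "string"},
            "primary_key": "id",
        }

    # Option 2: the same function lives in a separate module and is imported
    # into the notebook instead, e.g. (hypothetical path):
    # from myproject.schema.my_table import get_schema

    schema = get_schema()
    ```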

  • Command console datalake:table:create-missing has been removed, because it is no longer possible to rely on tables being defined in the YAML config
  • Command console datalake:table:delete has been renamed to console datalake:table:delete-including-data


  • Decorator @data_frame_loader has been deprecated
  • Decorator @data_frame_saver has been deprecated