Daipe 2.0: Release Notes

Enhancements

  • Tables no longer need to be defined in a local environment, and the YAML config is optional. A local environment is needed only for the initial setup of the project
  • Daipe can now be used without Databricks on any Spark environment, or even without Spark using only Pandas
  • Functions such as dp.read_csv() and dp.read_table() can be used as arguments for decorators. This completely replaces the functionality of @dp.data_frame_loader, see docs. Example:
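    # Imports
    import daipe as dp
    from pyspark.sql import DataFrame, SparkSession

    # Old Daipe: the function receives a SparkSession and loads the table itself
    @dp.data_frame_loader(display=True)
    def my_transformation(spark: SparkSession):
        return spark.read.table("my_database.my_table")

    # New Daipe: the table is read by dp.read_table() and passed in as a DataFrame
    @dp.transformation(dp.read_table("my_database.my_table"), display=True)
    def my_transformation(df: DataFrame):
        return df
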
  • Support for DBR 8.x
  • Decorator @dp.table_overwrite, which overwrites all data in a table with the data from a DataFrame, see docs
  • Decorator @dp.table_append, which appends the data from a DataFrame to a table, see docs
  • Decorator @dp.table_upsert, which updates existing data based on primary_key and inserts new data, see docs
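
To make the difference between the three write modes concrete, here is a minimal pure-Python sketch of the semantics described for @dp.table_overwrite, @dp.table_append and @dp.table_upsert. It is not Daipe's implementation and does not use Daipe or Spark: a table is modelled as a list of dicts, and the helper names overwrite_rows, append_rows and upsert_rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def overwrite_rows(table: List[Row], new_rows: List[Row]) -> List[Row]:
    """Overwrite: discard everything in the table and keep only the new rows."""
    return list(new_rows)


def append_rows(table: List[Row], new_rows: List[Row]) -> List[Row]:
    """Append: keep the existing rows and add the new rows after them."""
    return list(table) + list(new_rows)


def upsert_rows(table: List[Row], new_rows: List[Row], primary_key: str) -> List[Row]:
    """Upsert: update rows whose primary key already exists, insert the rest."""
    # Index existing rows by primary key; dicts preserve insertion order.
    merged = {row[primary_key]: row for row in table}
    for row in new_rows:
        # An existing key is replaced in place, a new key is added at the end.
        merged[row[primary_key]] = row
    return list(merged.values())


existing = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
incoming = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]

overwritten = overwrite_rows(existing, incoming)
appended = append_rows(existing, incoming)
upserted = upsert_rows(existing, incoming, primary_key="id")
```

With these inputs, the overwrite keeps only the two incoming rows, the append yields four rows (including two rows with id 2), and the upsert yields three rows, with the row for id 2 updated. This is why the upsert mode needs a primary_key.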

  • Schema now allows you to define a primary_key (used for @dp.table_upsert), partition_by and tbl_properties, see docs
  • Schema will be generated for you if you do not provide it to the @dp.table_* decorators, see example:

  • Schema checking output is greatly improved. Schema diff example:
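
As a rough illustration of the idea behind a schema diff (not Daipe's actual output format), the sketch below compares an expected schema with the schema found in a table and reports missing, extra and mismatched columns. Schemas are modelled as plain column-to-type dicts; the function name schema_diff and the "-", "+", "~" markers are assumptions made for this example.

```python
from typing import Dict, List


def schema_diff(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return human-readable differences between two column -> type mappings."""
    lines: List[str] = []
    # Columns declared in the expected schema: missing or of a different type.
    for name, expected_type in expected.items():
        if name not in actual:
            lines.append(f"- {name}: {expected_type} (missing in table)")
        elif actual[name] != expected_type:
            lines.append(f"~ {name}: expected {expected_type}, found {actual[name]}")
    # Columns present in the table but not declared in the expected schema.
    for name, actual_type in actual.items():
        if name not in expected:
            lines.append(f"+ {name}: {actual_type} (not in schema)")
    return lines


expected_schema = {"id": "int", "name": "string", "created": "date"}
table_schema = {"id": "bigint", "name": "string", "note": "string"}

diff = schema_diff(expected_schema, table_schema)
for line in diff:
    print(line)
```

An empty result means the two schemas agree on both column names and types.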

Backwards incompatible changes

  • Schema is no longer loaded automatically from the schema.py file in the notebook folder. The schema can now be defined inside the notebook or imported from a separate file, see docs and example:

  • Command console datalake:table:create-missing has been removed, because tables can no longer be assumed to be defined in the YAML config
  • Command console datalake:table:delete has been renamed to console datalake:table:delete-including-data

Deprecations

  • Decorator @dp.data_frame_loader has been deprecated
  • Decorator @dp.data_frame_saver has been deprecated