Input decorators
These decorators are used to wrap the entire content of a cell.
@dp.transformation
@dp.transformation(*objects, display=False, check_duplicate_columns=True)
Used for decorating a function which manipulates a DataFrame. Runs the decorated function upon declaration.
Parameters:
*objects
: an arbitrary number of objects passed to the decorated function
display
: bool, default False - if True, the output DataFrame is displayed
check_duplicate_columns
: bool, default True - if True, raises an Exception if there are duplicate columns in the DataFrame
Example:
import daipe as dp
from pyspark.sql import DataFrame

@dp.transformation(dp.read_table("silver.tbl_loans"), dp.read_table("silver.tbl_repayments"), display=True)
@dp.table_overwrite("silver.tbl_joined_loans_and_repayments", get_joined_schema())
def join_loans_and_repayments(df1: DataFrame, df2: DataFrame):
    return df1.join(df2, "LoanID")
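The run-upon-declaration behavior can be illustrated with a plain-Python sketch. This is a hypothetical simplification, not Daipe's actual implementation: the decorator calls the function immediately with the supplied objects and rebinds the decorated name to the result.

```python
# Minimal sketch (assumption: not the real dp.transformation internals).
# The decorator runs the function at declaration time, passing the
# pre-resolved objects as positional arguments.
def transformation(*objects, display=False):
    def decorator(func):
        result = func(*objects)  # executed immediately upon declaration
        if display:
            print(result)  # Daipe would render the DataFrame instead
        return result  # the decorated name now refers to the result
    return decorator

@transformation([1, 2], [3, 4])
def concat(a, b):
    return a + b

# After declaration, concat holds the returned value, not the function:
# concat == [1, 2, 3, 4]
```

This is why decorated functions can be chained: a later cell can pass `concat` (already a value) into another decorated function's `*objects`.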
@dp.notebook_function
@dp.notebook_function(*objects)
Used for decorating any other function which is not decorated with the @dp.transformation decorator. Runs the decorated function upon declaration.
Parameters:
*objects
: an arbitrary number of objects passed to the decorated function
Example:
import urllib.request

@dp.notebook_function()
def download_data():
    opener = urllib.request.URLopener()
    opener.addheader(
        "User-Agent",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    )
    opener.retrieve("https://www.bondora.com/marketing/media/LoanData.zip", "/loanData.zip")
    opener.retrieve("https://www.bondora.com/marketing/media/RepaymentsData.zip", "/repaymentsData.zip")
Objects available in @dp.transformation and @dp.notebook_function
- spark: SparkSession
- dbutils: DBUtils
- logger: Logger
Using Spark and Logger
from logging import Logger
from pyspark.sql.session import SparkSession

@dp.notebook_function()
def customers_table(spark: SparkSession, logger: Logger):
    logger.info('Reading my_crm.customers')
    return spark.read.table('my_crm.customers')
Using DBUtils
from pyspark.dbutils import DBUtils

@dp.notebook_function()
def create_input_widgets(dbutils: DBUtils):
    dbutils.widgets.dropdown("base_year", "2015", list(map(str, range(2009, 2022))), "Base year")
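One way parameters such as spark, dbutils, and logger could be resolved is by matching parameter names against a registry of known services. The following is a hypothetical pure-Python sketch of that injection pattern (the `CONTAINER` dict and `notebook_function` body are assumptions for illustration, not Daipe's actual mechanism): explicit `*objects` fill the leading parameters, and any remaining parameters are looked up by name.

```python
import inspect
import logging

# Hypothetical registry of injectable services; in a Databricks runtime,
# "spark" and "dbutils" would be registered here as well.
CONTAINER = {
    "logger": logging.getLogger("notebook"),
}

def notebook_function(*objects):
    def decorator(func):
        params = list(inspect.signature(func).parameters)
        args = list(objects)  # explicit objects fill the leading parameters
        for name in params[len(args):]:
            args.append(CONTAINER[name])  # the rest are resolved by name
        return func(*args)  # runs upon declaration, name rebound to result
    return decorator

@notebook_function(10)
def add_log(x, logger):
    logger.info("running with x=%s", x)
    return x * 2

# add_log == 20
```

Resolving by parameter name (rather than by type annotation) keeps the sketch simple; either convention gives the same notebook-facing behavior shown in the examples above.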