Decorator functions
read_csv
dp.read_csv(path: str, schema: StructType = None, options: dict = None)
Reads a CSV file into a Spark DataFrame.
Parameters:
path : str - path to the CSV file
schema : StructType, default None - schema of the CSV file
options : dict, default None - options passed to spark.read.options(**options)
Example:
```python
from pyspark.sql import DataFrame

@dp.transformation(dp.read_csv("/LoanData.csv", options=dict(header=True, inferSchema=True)), display=True)
@dp.table_overwrite("bronze.tbl_loans")
def save(df: DataFrame):
    return df.orderBy("LoanDate")
```
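The `header` and `inferSchema` options in the example are standard Spark CSV reader options: `header=True` takes the column names from the first row, and `inferSchema=True` makes Spark scan the data to guess column types instead of reading every column as a string. The sketch below imitates that behaviour in plain Python with the standard `csv` module so it runs without Spark. `read_csv_like_spark` is a hypothetical helper, and its int/float/str inference is much simpler than Spark's (for example, it keeps dates as strings).

```python
import csv
import io


def _infer(values):
    """Return the narrowest of int, float or str that parses every value."""
    for cast in (int, float):
        try:
            for v in values:
                cast(v)
            return cast
        except ValueError:
            continue
    return str


def read_csv_like_spark(text, header=False, infer_schema=False):
    """Hypothetical stand-in for spark.read.options(header=..., inferSchema=...).csv(...)."""
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        # First row supplies the column names
        names, rows = rows[0], rows[1:]
    else:
        # Spark names headerless columns _c0, _c1, ...
        names = [f"_c{i}" for i in range(len(rows[0]))]
    columns = list(zip(*rows))
    # Without inferSchema every column stays a string
    types = [_infer(col) if infer_schema else str for col in columns]
    return [
        {name: cast(value) for name, cast, value in zip(names, types, row)}
        for row in rows
    ]
```

For example, `read_csv_like_spark("LoanDate,Amount\n2021-01-05,1000\n2020-11-30,250.5\n", header=True, infer_schema=True)` yields `Amount` values as floats, while the same call without the two flags returns every cell, including the header row, as strings under `_c0` and `_c1`.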
read_delta
dp.read_delta(path: str, schema: StructType = None, options: dict = None)
Reads a Delta table from a path into a Spark DataFrame.
Parameters:
path : str - path to the Delta table
schema : StructType, default None - schema of the Delta table
options : dict, default None - options passed to spark.read.options(**options)
read_json
dp.read_json(path: str, schema: StructType = None, options: dict = None)
Reads a JSON file from a path into a Spark DataFrame.
Parameters:
path : str - path to the JSON file
schema : StructType, default None - schema of the JSON file
options : dict, default None - options passed to spark.read.options(**options)
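By default Spark's JSON reader expects JSON Lines input, where every line is one complete JSON object; a file holding a single JSON document spread over several lines needs `options=dict(multiLine=True)`. The plain-Python sketch below shows the difference between the two layouts. `parse_json_records` is a hypothetical helper for illustration only, not part of `dp` or Spark.

```python
import json


def parse_json_records(text, multi_line=False):
    """Hypothetical illustration of Spark's default vs. multiLine JSON parsing."""
    if multi_line:
        # multiLine=True: the whole file is one JSON value;
        # a top-level array yields one record per element
        value = json.loads(text)
        return value if isinstance(value, list) else [value]
    # Default (JSON Lines): each non-empty line is a separate JSON object
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

A pretty-printed array such as `'[\n  {"id": 1},\n  {"id": 2}\n]'` parses only with `multi_line=True`; in the default mode its first line, `[`, is not valid JSON on its own.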
read_parquet
dp.read_parquet(path: str, schema: StructType = None, options: dict = None)
Reads a Parquet file from a path into a Spark DataFrame.
Parameters:
path : str - path to the Parquet file
schema : StructType, default None - schema of the Parquet file
options : dict, default None - options passed to spark.read.options(**options)
read_table
dp.read_table(identifier: str)
Reads a table into a Spark DataFrame.
Parameters:
identifier : str - full table name, format db.table_name
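Example:
```python
import pyspark.sql.functions as f
from pyspark.dbutils import DBUtils  # available on Databricks clusters
from pyspark.sql import DataFrame
@dp.transformation(dp.read_table("silver.tbl_loans"))
def read_table_bronze_loans_tbl_loans(df: DataFrame, dbutils: DBUtils):
    base_year = dbutils.widgets.get("base_year")
    # Keep loans that defaulted in or after base_year
    return df.filter(f.col("DefaultDate") >= base_year)
```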
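In the example, the decorated function receives the DataFrame produced by `dp.read_table` as its first argument and a `DBUtils` instance as its second, which suggests that the remaining arguments are supplied according to their type annotations. The sketch below shows one way such a decorator could work in plain Python. `transformation`, `SERVICES`, `FakeDBUtils`, `read_loans` and the list of dicts standing in for a DataFrame are all hypothetical; this is not how `dp.transformation` is actually implemented.

```python
import inspect
from typing import Any, Callable, Dict, List


class FakeDBUtils:
    """Hypothetical stand-in for Databricks DBUtils exposing widgets.get()."""

    class widgets:
        _values = {"base_year": "2020"}

        @classmethod
        def get(cls, name: str) -> str:
            return cls._values[name]


# Hypothetical registry: parameter annotation -> object to inject
SERVICES: Dict[type, Any] = {FakeDBUtils: FakeDBUtils()}


def transformation(*inputs: Callable[[], Any]):
    """Feed reader results into the leading parameters, inject the rest by type."""

    def decorator(func: Callable[..., Any]) -> Any:
        params = list(inspect.signature(func).parameters.values())
        # Each input reader is called and its result fills one leading parameter
        args = [read() for read in inputs]
        # Every remaining parameter is looked up by its type annotation
        for param in params[len(args):]:
            args.append(SERVICES[param.annotation])
        # Run immediately and bind the decorated name to the result
        return func(*args)

    return decorator


def read_loans() -> List[Dict[str, str]]:
    # Hypothetical reader standing in for dp.read_table("silver.tbl_loans")
    return [{"DefaultDate": "2019-06-01"}, {"DefaultDate": "2021-03-15"}]


@transformation(read_loans)
def recent_defaults(df: List[Dict[str, str]], dbutils: FakeDBUtils):
    base_year = dbutils.widgets.get("base_year")
    # String comparison, mirroring f.col("DefaultDate") >= base_year
    return [row for row in df if row["DefaultDate"] >= base_year]
```

Resolving extra parameters by annotation keeps the function signature self-describing: adding a new dependency only requires a new typed parameter, not a change to the decorator call.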
table_params
dp.table_params(identifier: str, param_path_parts: list = None)
Reads parameters from datalakebundle.tables.[identifier]
Parameters:
identifier : str - full table name, format db.table_name
param_path_parts : list, default None - list of parameter levels leading to the result
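`table_params` looks up the section `datalakebundle.tables.[identifier]` of the project configuration and, when `param_path_parts` is given, descends one level per element of that list. The sketch below models the configuration as a nested Python dict. The config contents, the `CONFIG` name and the `lookup_table_params` helper are assumptions for illustration, not the bundle's actual config loader.

```python
from typing import Any, Dict, List, Optional

# Hypothetical parsed configuration; only the
# datalakebundle.tables.[identifier] path comes from the docs above
CONFIG: Dict[str, Any] = {
    "datalakebundle": {
        "tables": {
            "bronze.tbl_loans": {
                "base_year": 2015,
                "partitioning": {"column": "LoanDate", "buckets": 8},
            }
        }
    }
}


def lookup_table_params(
    identifier: str,
    param_path_parts: Optional[List[str]] = None,
    config: Dict[str, Any] = CONFIG,
) -> Any:
    """Hypothetical sketch of a table_params-style lookup."""
    # The full "db.table_name" identifier is a single key, not split on the dot
    node = config["datalakebundle"]["tables"][identifier]
    if param_path_parts is None:
        # No path given: return the whole parameter section of the table
        return node
    # Descend one level per path part
    for part in param_path_parts:
        node = node[part]
    return node
```

With this configuration, `lookup_table_params("bronze.tbl_loans", ["partitioning", "column"])` returns `"LoanDate"`, and omitting `param_path_parts` returns the table's whole parameter dict.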