Notebook explanation

This notebook creates a temporary view called target_store, which contains the machine learning targets and a "no target" placeholder.

<aside> 🎯 Targets are the dates on which important events that we want to predict happened, for example the day a client took out a mortgage or the day a client closed their account.

</aside>

Example targets table

client_id timestamp target
1 2020-01-15 mortgage_taken
2 2021-03-10 mortgage_taken
2 2022-07-19 mortgage_taken
3 2021-08-09 mortgage_taken
1 2022-02-13 closed_account
4 2022-09-13 closed_account

<aside> 📅 No target is a placeholder target that takes all existing IDs and assigns the same constant date to each of them. It is used for everyday calculations.

</aside>

Example no target for date 2022-01-01

Note that this company has exactly 6 clients.

client_id timestamp target
1 2022-01-01 no target
2 2022-01-01 no target
3 2022-01-01 no target
4 2022-01-01 no target
5 2022-01-01 no target
6 2022-01-01 no target
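
As a rough illustration of how the no target rows above come about, the sketch below builds them in plain Python. The helper name build_no_target and the hard-coded range of IDs are hypothetical and exist only for this example; the notebook itself does this step in Spark.

```python
from datetime import date


def build_no_target(client_ids, timestamp):
    """Give every client ID the same constant date and the label 'no target'."""
    return [
        {"client_id": cid, "timestamp": timestamp, "target": "no target"}
        for cid in client_ids
    ]


# Hypothetical company with exactly 6 clients, as in the example above
rows = build_no_target(range(1, 7), date(2022, 1, 1))
for row in rows:
    print(row["client_id"], row["timestamp"], row["target"])
```

Every client appears exactly once, regardless of whether any real event ever happened for them.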

Notebook code

Import functions

from pyspark.sql import functions as f

The target_store table is controlled by widgets for timestamp, target and timeshift. The timeshift widget moves the calculation timestamp back by a given number of days.

dbutils.widgets.text("timestamp", "")
dbutils.widgets.text("timeshift", "0")
dbutils.widgets.text("target", "no target")
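
The effect of the timeshift widget can be sketched with the standard library alone. This is an illustration of the date arithmetic only, not the notebook's Spark expression; the function name shifted_timestamp is hypothetical, and an empty timestamp (the widget default) is not handled here.

```python
from datetime import datetime, timedelta


def shifted_timestamp(timestamp, timeshift="0"):
    """Move the timestamp back by `timeshift` days.

    Both arguments are strings because widget values arrive as strings.
    """
    return datetime.fromisoformat(timestamp) - timedelta(days=int(timeshift))


print(shifted_timestamp("2022-01-01", "7"))  # 2021-12-25 00:00:00
print(shifted_timestamp("2022-01-01"))       # 2022-01-01 00:00:00
```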

The actual code for creating the target_store

It builds a base of all clients with a constant date taken from the timestamp widget (moved back by timeshift days), unions it with the targets table, and keeps only the rows whose target matches the target widget.

(
    # insert your table containing all ids
    spark.table("odap_offline_sdm_l2.customer")
    .select(
        "customer_id",
        f.lit("no target").alias("target"),
        (
            f.lit(dbutils.widgets.get("timestamp")).cast("timestamp")
            - f.expr(f"interval {dbutils.widgets.get('timeshift')} days")
        ).alias("timestamp"),
    )
    # insert your table containing targets
    .unionByName(spark.table("odap_targets.targets"))
    .filter(f.col("target") == dbutils.widgets.get("target"))
).createOrReplaceTempView("target_store")

print("Target store successfully initialized")

if dbutils.widgets.get("target") == "no target":
    print(f"Timestamp '{dbutils.widgets.get('timestamp')}' used with timeshift of '{dbutils.widgets.get('timeshift')}' days")
else:
    print("Timestamp and timeshift widgets are being ignored")
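
To see why the timestamp and timeshift widgets only matter for no target, the union and filter can be traced on the example data in plain Python. This is a sketch under the assumption that the data looks like the example tables above; the names targets, no_target and target_store_rows are hypothetical and not part of the notebook.

```python
from datetime import date

# Rows from the example targets table
targets = [
    {"client_id": 1, "timestamp": date(2020, 1, 15), "target": "mortgage_taken"},
    {"client_id": 2, "timestamp": date(2021, 3, 10), "target": "mortgage_taken"},
    {"client_id": 2, "timestamp": date(2022, 7, 19), "target": "mortgage_taken"},
    {"client_id": 3, "timestamp": date(2021, 8, 9), "target": "mortgage_taken"},
    {"client_id": 1, "timestamp": date(2022, 2, 13), "target": "closed_account"},
    {"client_id": 4, "timestamp": date(2022, 9, 13), "target": "closed_account"},
]

# Base of all clients with the constant widget date (here 2022-01-01)
no_target = [
    {"client_id": cid, "timestamp": date(2022, 1, 1), "target": "no target"}
    for cid in range(1, 7)
]


def target_store_rows(selected_target):
    """Union both sources, then keep only the rows for the selected target."""
    return [row for row in no_target + targets if row["target"] == selected_target]


print(len(target_store_rows("no target")))       # 6
print(len(target_store_rows("mortgage_taken")))  # 4
print(len(target_store_rows("closed_account")))  # 2
```

Only the no target rows carry the widget date, so once the filter selects a real target, none of the remaining rows depend on the timestamp or timeshift widgets.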