This notebook creates a temporary view called target_store,
which contains machine learning targets and a "no target".
<aside> 🎯 Targets are the dates on which the important events we want to predict happened, for example the day a client took out a mortgage or the day a client closed their account.
</aside>
Example targets table
client_id | timestamp | target |
---|---|---|
1 | 2020-01-15 | mortgage_taken |
2 | 2021-03-10 | mortgage_taken |
2 | 2022-07-19 | mortgage_taken |
3 | 2021-08-09 | mortgage_taken |
1 | 2022-02-13 | closed_account |
4 | 2022-09-13 | closed_account |
<aside> 📅 The "no target" is a placeholder target that takes all existing client IDs and assigns the same constant date to each of them. It is used for regular, day-to-day calculations.
</aside>
Example "no target" for the date 2022-01-01
Note that this company has exactly 6 clients.
client_id | timestamp | target |
---|---|---|
1 | 2022-01-01 | no target |
2 | 2022-01-01 | no target |
3 | 2022-01-01 | no target |
4 | 2022-01-01 | no target |
5 | 2022-01-01 | no target |
6 | 2022-01-01 | no target |
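The "no target" table above can be modelled with a minimal plain-Python sketch. It only illustrates the idea and is not the notebook's Spark code; the function name `build_no_target_rows` and the hard-coded list of six client IDs are assumptions made for this example.

```python
from datetime import date


def build_no_target_rows(client_ids, calculation_date):
    """Pair every client ID with the same date and the 'no target' label."""
    return [
        {"client_id": cid, "timestamp": calculation_date, "target": "no target"}
        for cid in client_ids
    ]


# Hypothetical client base: the six clients from the example above.
rows = build_no_target_rows(range(1, 7), date(2022, 1, 1))
for row in rows:
    print(row["client_id"], row["timestamp"].isoformat(), row["target"])
```

Every client appears exactly once, so the result always has as many rows as there are clients, regardless of whether any real event happened to them.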
Import functions
from pyspark.sql import functions as f
The target_store table is controlled by widgets for timestamp, target, and timeshift. The timeshift widget moves the calculation timestamp back by a given number of days.
dbutils.widgets.text("timestamp", "")
dbutils.widgets.text("timeshift", "0")
dbutils.widgets.text("target", "no target")
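To make the effect of the timeshift widget concrete, here is a small sketch in plain Python of the date arithmetic it implies. It assumes the widget values arrive as strings (widget values are text) and that the timestamp is given in ISO format; the helper name `shifted_timestamp` is hypothetical and does not appear in the notebook.

```python
from datetime import datetime, timedelta


def shifted_timestamp(timestamp: str, timeshift: str) -> datetime:
    """Move the calculation timestamp back by `timeshift` days."""
    return datetime.fromisoformat(timestamp) - timedelta(days=int(timeshift))


print(shifted_timestamp("2022-01-01", "0"))  # 2022-01-01 00:00:00
print(shifted_timestamp("2022-01-01", "7"))  # 2021-12-25 00:00:00
```

With the default timeshift of "0" the timestamp is used unchanged.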
The actual code for creating the target_store.
It reads a targets table and unions it with a base of all clients, each assigned a constant date taken from the timestamp widget.
(
    # insert your table containing all ids
    spark.table("odap_offline_sdm_l2.customer")
    .select(
        "customer_id",
        f.lit("no target").alias("target"),
        (
            f.lit(dbutils.widgets.get("timestamp")).cast("timestamp")
            - f.expr(f"interval {dbutils.widgets.get('timeshift')} days")
        ).alias("timestamp"),
    )
    # insert your table containing targets
    .unionByName(spark.table("odap_targets.targets"))
    .filter(f.col("target") == dbutils.widgets.get("target"))
).createOrReplaceTempView("target_store")
print("Target store successfully initialized")
if dbutils.widgets.get("target") == "no target":
    print(f"Timestamp '{dbutils.widgets.get('timestamp')}' used with timeshift of '{dbutils.widgets.get('timeshift')}' days")
else:
    print("Timestamp and timeshift widgets are being ignored")
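The union-then-filter logic can be easier to follow on a toy model. The sketch below reproduces it with plain Python lists, using the example tables from the top of this notebook; it is an illustration under those assumptions, not a replacement for the Spark code, and `select_target_rows` is a hypothetical helper name.

```python
from datetime import date


def select_target_rows(no_target_rows, target_rows, target):
    """Union both row sets, then keep only rows whose target matches."""
    combined = list(no_target_rows) + list(target_rows)
    return [row for row in combined if row["target"] == target]


# "No target" rows: every client paired with the same constant date.
no_target_rows = [
    {"client_id": cid, "timestamp": date(2022, 1, 1), "target": "no target"}
    for cid in range(1, 7)
]

# Real targets, copied from the example targets table.
target_rows = [
    {"client_id": 1, "timestamp": date(2020, 1, 15), "target": "mortgage_taken"},
    {"client_id": 2, "timestamp": date(2021, 3, 10), "target": "mortgage_taken"},
    {"client_id": 2, "timestamp": date(2022, 7, 19), "target": "mortgage_taken"},
    {"client_id": 3, "timestamp": date(2021, 8, 9), "target": "mortgage_taken"},
    {"client_id": 1, "timestamp": date(2022, 2, 13), "target": "closed_account"},
    {"client_id": 4, "timestamp": date(2022, 9, 13), "target": "closed_account"},
]

print(len(select_target_rows(no_target_rows, target_rows, "no target")))       # 6
print(len(select_target_rows(no_target_rows, target_rows, "mortgage_taken")))  # 4
```

Because the filter runs after the union, choosing a real target drops all "no target" rows, which is why the timestamp and timeshift widgets have no effect in that case.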