Run init_target_store notebook
%run ../init_target_store
Imports
from pyspark.sql import functions as f
Create widgets. For target-based calculation you need a second widget called target.
dbutils.widgets.text("timestamp", "2022-01-01")
dbutils.widgets.text("target", "no target")
Load source data
df = spark.read.table("odap_digi_sdm_l2.web_visits")
Load target store
target_store = spark.read.table("target_store")
Join the loaded data with target_store on the entity id
df = df.join(target_store, on="customer_id")
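What the join does can be illustrated with a minimal plain-Python sketch, without Spark. The rows below are made up, and the sketch assumes the target store holds `customer_id` and `timestamp` columns, which the later filter on `timestamp` suggests but the notebook does not show. `DataFrame.join` defaults to an inner join, so customers missing from the target store drop out.

```python
# Hypothetical sample rows; column names follow the notebook, values are invented.
web_visits = [
    {"customer_id": 1, "web_visit_timestamp": "2021-12-20"},
    {"customer_id": 2, "web_visit_timestamp": "2021-12-25"},
    {"customer_id": 3, "web_visit_timestamp": "2021-12-28"},
]
target_store = [
    {"customer_id": 1, "timestamp": "2022-01-01"},
    {"customer_id": 2, "timestamp": "2021-12-31"},
]

# Inner join on customer_id: every visit is paired with each matching
# target_store row, so each visit carries its customer's timestamp.
joined = [
    {**visit, **target}
    for visit in web_visits
    for target in target_store
    if visit["customer_id"] == target["customer_id"]
]

for row in joined:
    print(row)
```

Customer 3 has no row in the hypothetical target store, so its visit does not appear in `joined`.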
! Important !
Filter the input data for the time window which you are using in your features.
Always use both a lower and an upper bound. Your code might be run to calculate features for a past date, so you need to cut off anything after timestamp.
# Keep only visits from the 30 days up to and including each row's timestamp
df = df.filter(
    f.col("web_visit_timestamp").between(
        f.col("timestamp") - f.expr("interval 30 days"), f.col("timestamp")
    )
)
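To see why both bounds matter, here is a minimal plain-Python sketch of the same 30-day window check, without Spark. The `in_window` helper and the sample dates are hypothetical and only illustrate the logic. `between` is inclusive on both ends, and the sketch mirrors that.

```python
from datetime import datetime, timedelta


def in_window(event_ts, run_ts, days=30):
    """Return True if event_ts lies in [run_ts - days, run_ts], both ends inclusive."""
    return run_ts - timedelta(days=days) <= event_ts <= run_ts


# Hypothetical run date in the past, e.g. when backfilling features.
run_ts = datetime(2022, 1, 1)

visits = [
    datetime(2021, 11, 15),  # older than 30 days: excluded by the lower bound
    datetime(2021, 12, 20),  # inside the window: kept
    datetime(2022, 1, 5),    # after run_ts: excluded by the upper bound
]

kept = [v for v in visits if in_window(v, run_ts)]
print(kept)
```

Without the upper bound, the visit on 2022-01-05 would leak into features computed as of 2022-01-01, even though it had not happened yet at that date.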
Compute the same features for both target and no target modes