Simple target-based mode

Run the init_target_store notebook

Init target store notebook

%run ../init_target_store

Imports

from pyspark.sql import functions as f

Create widgets. For a target-based calculation you need a second widget called target

dbutils.widgets.text("timestamp", "2022-01-01")
dbutils.widgets.text("target", "no target")

Load source data

df = spark.read.table("odap_digi_sdm_l2.web_visits")

Load target store

target_store = spark.read.table("target_store")

Join the loaded data with target_store on the entity ID (customer_id)

df = df.join(target_store, on="customer_id")

! Important !

Filter the input data to the time window that your features use.

Always use both a lower and an upper bound. Your code might be run to calculate features in the past, so you need to cut off anything after the calculation timestamp; otherwise the features would leak data from the future.

# Keep only web visits within the 30 days leading up to the calculation timestamp.
df = df.filter(
    f.col("web_visit_timestamp").between(
        f.col("timestamp") - f.expr("interval 30 days"), f.col("timestamp")
    )
)

Compute the same features for both the target and no target modes
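The feature calculation itself does not change between the two modes; only the set of entity/timestamp rows coming from the target store differs. A minimal sketch of such a shared calculation, assuming one output row per customer_id and timestamp is expected and using a hypothetical feature name web_visits_count_30d:

# Count web visits per customer within the 30-day window defined above.
# Grouping by "timestamp" keeps one row per entity and calculation time.
features_df = (
    df.groupBy("customer_id", "timestamp")
    .agg(f.count(f.lit(1)).alias("web_visits_count_30d"))
)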