The segment DataFrame must be assigned to the `df_final` variable.
<aside>
💡 The expected schema is only the entity ID column. The columns to be exported are defined in `config.yaml`.
</aside>
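For illustration, the export columns could be declared in `config.yaml` along these lines. This is a hypothetical sketch only; the key names below (`segments`, `export_columns`) are assumptions for illustration, not the framework's actual configuration schema:

```yaml
# Hypothetical sketch — the real config.yaml keys may differ.
# The segment itself yields only the entity ID column (customer_id);
# the columns listed here are joined in at export time.
segments:
  investors_with_high_turnover:
    export_columns:
      - customer_id
      - email
```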
```python
from pyspark.sql import functions as f

# Load the customer feature table from the metastore
df = spark.read.table("hive_metastore.odap_features.customer")

# Keep customers with at least 50,000 in transactions over the last 30 days
# who also visited the investment web page in the last 90 days, then select
# only the entity ID column, as the expected schema requires
df_final = (
    df.filter(
        (f.col("transactions_sum_amount_in_last_30d") >= 50000)
        & (f.col("investice_web_visits_count_in_last_90d") > 0)
    ).select("customer_id")
)
```