The segment DataFrame must be assigned to the `df_final` variable.
<aside>
💡 The expected schema is only the entity ID column. The columns to be exported are defined in `config.yaml`.
</aside>
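For illustration, the export columns could be declared in `config.yaml` along these lines. This is a hypothetical sketch only; the key names below (`segments`, `export_columns`) are assumptions for illustration, not the framework's actual configuration schema:

```yaml
# Hypothetical sketch — the real config.yaml keys may differ.
# The segment itself yields only the entity ID column (customer_id);
# the columns listed here are joined in at export time.
segments:
  investors_with_high_turnover:
    export_columns:
      - customer_id
      - email
```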
```python
from pyspark.sql import functions as f

# Load the customer feature table from the metastore
df = spark.read.table("hive_metastore.odap_features.customer")

# Keep customers with at least 50,000 in transactions over the last 30 days
# who also visited the investment web page in the last 90 days, then select
# only the entity ID column, as the expected schema requires
df_final = (
    df.filter(
        (f.col("transactions_sum_amount_in_last_30d") >= 50000)
        & (f.col("investice_web_visits_count_in_last_90d") > 0)
    ).select("customer_id")
)
```