To unit test the functions used by a feature, you first have to extract them from the feature notebook.

The proposed way to do this is to move the function into a functions folder, where we create a file with the same name as the feature.

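The original screenshot showed the resulting folder layout; a minimal sketch of what it might look like (the file and folder names besides functions are illustrative assumptions):

```
repo/
├── features/
│   └── product_agg_features       # the feature notebook
└── functions/
    └── product_agg_features.py    # the extracted function, named after the feature
```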

In this case we extracted the aggregating function product_agg_features.

```python
from typing import List

from pyspark.sql import functions as f
from odap.feature_factory import time_windows as tw

products = ["investice", "pujcky", "hypoteky"]


def product_agg_features(time_window: str) -> List[tw.WindowedColumn]:
    # For each product, count the web visits whose URL mentions
    # that product within the given time window.
    return [
        tw.sum_windowed(
            f"{product}_web_visits_count_in_last_{time_window}",
            f.lower("url").contains(product).cast("integer"),
        )
        for product in products
    ]
```

Now we can import this function into the feature notebook and into the test notebook as well.
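Assuming the folder layout sketched above, the import might look like the following; the module path functions.product_agg_features is an assumption based on that layout:

```python
# Module path is an assumption based on the folder layout above.
from functions.product_agg_features import product_agg_features

# Build the windowed column definitions, e.g. for the last 90 days:
windowed_columns = product_agg_features("90d")
```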

In this case the test notebook is located in the tests folder.


In it, you can define a test class that has to inherit from PySparkTestCase. This base class makes it possible to run the test outside of Databricks during CI/CD.

```python
from odap.common.test.PySparkTestCase import PySparkTestCase


class ProductWebVisitsCountTest(PySparkTestCase):
    ...
```
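Filled in, such a test might look like the sketch below. It assumes PySparkTestCase behaves like a unittest.TestCase and exposes a Spark session as self.spark (a common pattern for such base classes, not confirmed here); the sample data and the tested expression are illustrative:

```python
from odap.common.test.PySparkTestCase import PySparkTestCase
from pyspark.sql import functions as f


class ProductWebVisitsCountTest(PySparkTestCase):
    def test_pujcky_visit_is_counted(self):
        # `self.spark` as the session is an assumption about PySparkTestCase.
        df = self.spark.createDataFrame(
            [("https://banka.cz/pujcky/kalkulacka",), ("https://banka.cz/blog",)],
            ["url"],
        )
        # The same counting expression product_agg_features builds per product:
        visits = df.select(
            f.sum(f.lower("url").contains("pujcky").cast("integer")).alias("cnt")
        )
        self.assertEqual(visits.first()["cnt"], 1)
```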

and then run the test by calling

```python
from functions.core import run_test

run_test(ProductWebVisitsCountTest)
```

<aside> 🚨 If you want your test to be part of a CI/CD process, it has to contain a cell in which the test is actually run, e.g. by calling the run_test method.

</aside>