You can use Soda data checks to test the quality of the data in your features DataFrame.

Checks are written in Soda CL (Soda Checks Language).

Soda CL Documentation

Documentation can be found here: https://docs.soda.io/soda-cl/soda-cl-overview.html

In the documentation, each set of checks starts with a header line such as "checks for dim_product:".

Here, dim_product is the name of the DataFrame to be checked. In odap you don't have to write this header; the odap framework adds it automatically.

If you run an orchestration or a dry run on more than one feature, the features are first joined into a single DataFrame and the checks are then run on that DataFrame.


YAML → Python

The checks in the Soda CL documentation are written in YAML. To apply them to your features, you need to rewrite them as Python data (lists, dictionaries and strings, i.e. JSON-like structures).

💡 An online YAML → Python converter can handle much of this conversion, though you may need to fix its output by hand.

Example

Checks in YAML

checks for my_dataframe:
  - invalid_count(column_name) = 0:
      valid values: ['hodnota1', 'hodnota2', 'hodnota3']
  - missing_percent(web_visit_count) < 50%

Checks in Python

dq_checks = [
    {
        "invalid_count(column_name) = 0": {
            "valid values": ["hodnota1", "hodnota2", "hodnota3"]
        }
    },
    "missing_percent(web_visit_count) < 50%"
]
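Because the Python form uses only lists, dictionaries and strings, it is plain JSON data. If your converter outputs JSON rather than Python, one option is to paste the JSON for the list under the "checks for …" header and load it with Python's standard json module. This is a minimal sketch; the variable name converted is illustrative and not part of odap.

```python
import json

# JSON text as a YAML -> JSON converter might output it for the example checks.
# Only the list under "checks for my_dataframe" is kept, because odap adds
# that header automatically. The name `converted` is illustrative.
converted = """
[
    {"invalid_count(column_name) = 0": {"valid values": ["hodnota1", "hodnota2", "hodnota3"]}},
    "missing_percent(web_visit_count) < 50%"
]
"""

# json.loads maps JSON arrays to lists, objects to dicts and strings to str,
# which gives the same shape as the hand-written dq_checks list.
dq_checks = json.loads(converted)

# Print the result to compare it with the converter output.
print(json.dumps(dq_checks, indent=4))
```

Loading the JSON instead of retyping it avoids quoting and bracket mistakes in longer check lists.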

How to define checks

You can define checks inside a Feature Notebook by adding a dq_checks list. Write the checks you need into that list using the Python syntax described above; see the Soda CL documentation for the full set of available checks.
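As an illustration, a dq_checks list in a Feature Notebook cell could look like the sketch below. The column names customer_id and web_visit_count are hypothetical; row_count, missing_count, duplicate_count, invalid_count and missing_percent are standard Soda CL metrics, but confirm the exact syntax of each check and option in the Soda CL documentation.

```python
# Data quality checks for this Feature Notebook, written as Python data.
# Column names are hypothetical; replace them with your own feature columns.
dq_checks = [
    # The features DataFrame must not be empty.
    "row_count > 0",
    # Every row must have an entity ID ...
    "missing_count(customer_id) = 0",
    # ... and the IDs must be unique.
    "duplicate_count(customer_id) = 0",
    # A check with options becomes a one-key dict: {check: {option: value}}.
    {
        "invalid_count(web_visit_count) = 0": {
            "valid min": 0
        }
    },
    # Fewer than 5 % of visit counts may be missing.
    "missing_percent(web_visit_count) < 5%",
]
```

Plain checks stay as strings, and only checks that need extra configuration become dictionaries, mirroring the YAML structure.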