Feature table strategy

How many feature tables to have?

The basic recommendation is to group features that share the same computation period into the same table.

For example, suppose I have three feature notebooks:

customer_gender
- calculated once every 3 months
monthly_repayments
- calculated every month
daily_transactions
- calculated daily

Because each notebook has a different computation period, each one should go into its own table, giving three separate tables.
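The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the `notebook_periods` mapping, the `group_by_period` helper and the `features_<period>` table names are hypothetical and not part of any particular feature store API.

```python
from collections import defaultdict

# Hypothetical metadata: notebook name -> computation period.
notebook_periods = {
    "customer_gender": "quarterly",    # calculated once every 3 months
    "monthly_repayments": "monthly",   # calculated every month
    "daily_transactions": "daily",     # calculated daily
}


def group_by_period(periods):
    """Assign notebooks with the same computation period to the same table."""
    tables = defaultdict(list)
    for notebook, period in periods.items():
        # The table naming scheme is an assumption made for this sketch.
        tables[f"features_{period}"].append(notebook)
    return dict(tables)


tables = group_by_period(notebook_periods)
for table_name, notebooks in tables.items():
    print(table_name, notebooks)
```

With the three notebooks above, every period is distinct, so the sketch yields three tables with one notebook each.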

What if I have 50 notebooks calculated daily?

In this case it is hard to give a general rule for grouping features into tables.

It depends on how many feature columns are calculated in each notebook, whether different input data arrives at different times, and so on.

Our recommendation for a project with N notebooks is to have approximately N/10 tables, where each table corresponds to a particular dataset, e.g. web data, card transactions, account transactions, etc. When one of these tables grows large in number of columns, we recommend splitting it into two, keeping roughly 150-300 columns in each table.
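As a rough illustration of this rule of thumb, the following Python sketch computes a suggested table count and splits an oversized column list in two. The helper names are hypothetical, and treating 300 columns as the point at which a split is triggered is an assumption taken from the upper end of the range above, not a hard limit.

```python
import math


def suggested_table_count(n_notebooks):
    """Rule of thumb from the text: roughly N/10 tables, and at least one."""
    return max(1, round(n_notebooks / 10))


def split_columns(columns, max_columns=300):
    """Split a column list in two until each part has at most max_columns.

    Using 300 as the split threshold is an assumption for this sketch.
    """
    if len(columns) <= max_columns:
        return [columns]
    mid = math.ceil(len(columns) / 2)
    return (split_columns(columns[:mid], max_columns)
            + split_columns(columns[mid:], max_columns))


# 50 daily notebooks -> about 5 tables, one per dataset.
print(suggested_table_count(50))

# A hypothetical table with 400 feature columns is split into two of 200.
columns = [f"feature_{i}" for i in range(400)]
parts = split_columns(columns)
print([len(part) for part in parts])
```

In practice the split would more likely follow a logical boundary within the dataset than a plain positional cut; the halving here only shows the size arithmetic.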