Scalable data science
- All features stored in a few wide tables (or just one), while still being calculated in multiple notebooks
- Data scientists doing data science, not engineering (orchestration, optimized writes)
- Data scientists using SQL or PySpark
- The same feature code is used for training and inference
- Features easily reusable across many use cases by multiple teams
- Always up-to-date business and technical feature documentation
- Production-ready code that does not depend on particular developers
Solution?
- Feature store + Feature engineering framework
What is a feature store in general and what are the benefits?
- This article sums it up pretty well
What are the benefits of Databricks Feature Store?
Differences and synergy between Databricks Feature Store and ODAP
FAQ
Main benefits of using ODAP Feature factory
Feature factory overview