Machine Learning General Workflow
General Workflow in Steps (WIP)
Below are the general steps I learnt from the Machine Learning course; this list will be updated as the course progresses.
- Define problem statement
- Gather data (e.g., from a data lake or data pipeline)
- Explore the data and gain insights
  - null inspection
  - pair plot (correlation between features)
  - scatter plot (density view with alpha=0.1)
  - shape and size
  - data types
- Data cleansing
  - fillna / dropna
  - one-hot encoding
  - imputer
- Build a baseline model, perhaps using linear regression
- Iterative process
  - P-value inspection
  - Feature selection
  - Feature scaling and normalization
  - Feature engineering, e.g., polynomial features for polynomial regression
  - Prepare the dataset: training, testing, and holdout sets
  - Train the model with cross-validation
  - Model evaluation
    - R² score as a function of polynomial degree
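The exploration checks listed above (shape and size, data types, null inspection, correlation between features) can be sketched with pandas as below. This is a minimal illustration, assuming a small hypothetical housing-style dataset; the column names `sqft`, `rooms`, `city` and `price` are made up, and the correlation matrix is the numeric counterpart of what a pair plot shows visually.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset standing in for the gathered data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sqft": rng.uniform(500, 3500, size=200),
    "rooms": rng.integers(1, 6, size=200),
    "city": rng.choice(["A", "B", "C"], size=200),
})
df["price"] = 100 * df["sqft"] + 5000 * df["rooms"] + rng.normal(0, 20000, size=200)
# Blank out every 25th sqft value so the null inspection has something to find
df.loc[::25, "sqft"] = np.nan

print("shape:", df.shape)      # (rows, columns)
print("size:", df.size)        # rows * columns
print(df.dtypes)               # data type of each column
null_counts = df.isnull().sum()
print(null_counts)             # missing values per column

# Pairwise Pearson correlation between the numeric columns (NaNs skipped pairwise)
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```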
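For the pair plot and the alpha=0.1 scatter plot mentioned in the exploration step, a minimal sketch using pandas' `scatter_matrix` and matplotlib follows (seaborn's `pairplot` is a common alternative). The data is synthetic, and the off-screen `Agg` backend plus saving to an in-memory buffer are assumptions made so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: y depends on x, z is unrelated noise
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
df = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(size=n), "z": rng.normal(size=n)})

# Pair plot: every feature against every other, histograms on the diagonal
axes = scatter_matrix(df, alpha=0.1, figsize=(6, 6), diagonal="hist")

# Single scatter plot; with alpha=0.1 overlapping points darken,
# so dense regions stand out instead of merging into a solid blob
fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], alpha=0.1, s=5)
ax.set_xlabel("x")
ax.set_ylabel("y")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # in a notebook, plt.show() would display it instead
```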
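The data cleansing tools listed above (fillna / dropna, an imputer, one-hot encoding) can be illustrated on a tiny hypothetical table. This sketch assumes pandas and scikit-learn's `SimpleImputer` with a mean strategy; the column names and fill values are invented for the example, and `sklearn.preprocessing.OneHotEncoder` would be an alternative to `pd.get_dummies`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical raw data with gaps in numeric and categorical columns
raw = pd.DataFrame({
    "sqft": [850.0, np.nan, 1200.0, 1500.0, np.nan],
    "rooms": [2.0, 3.0, np.nan, 4.0, 3.0],
    "city": ["A", "B", "A", None, "C"],
})

# dropna: discard every row that has at least one missing value
dropped = raw.dropna()

# fillna: replace missing values with a per-column choice
filled = raw.fillna({"rooms": raw["rooms"].median(), "city": "unknown"})

# SimpleImputer learns the column means in fit and reuses them in transform,
# so the same fill values can later be applied to test or new data
clean = raw.copy()
imputer = SimpleImputer(strategy="mean")
clean[["sqft", "rooms"]] = imputer.fit_transform(clean[["sqft", "rooms"]])
clean["city"] = clean["city"].fillna("unknown")

# One-hot encoding: one 0/1 column per category of "city"
encoded = pd.get_dummies(clean, columns=["city"], dtype=int)
print(encoded)
```

Here `dropna` keeps only the first row, which shows why imputing is often preferred on small datasets.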
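For the baseline linear regression, P-value inspection and feature selection steps, here is a minimal sketch that fits ordinary least squares with NumPy and computes each coefficient's two-sided t-test p-value by hand with SciPy. The course may have used statsmodels instead (`sm.OLS(y, sm.add_constant(X)).fit().pvalues` reports the same quantities). The data is synthetic, and keeping only features with p < 0.05 is an assumed, deliberately simple selection rule rather than the only option.

```python
import numpy as np
from scipy import stats

# Hypothetical data: only the first two of three features actually drive y
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)


def ols_with_p_values(X, y):
    """Fit ordinary least squares; return coefficients and two-sided p-values."""
    A = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    dof = A.shape[0] - A.shape[1]                # n minus number of parameters
    sigma2 = residuals @ residuals / dof         # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)        # covariance of the estimates
    t_stats = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
    return beta, p_values


beta, p = ols_with_p_values(X, y)

# Assumed selection rule: keep features whose p-value is below 0.05
# (index 0 of beta/p is the intercept, so feature i sits at i + 1)
significance = 0.05
selected = [i for i in range(X.shape[1]) if p[i + 1] < significance]
print("coefficients:", beta.round(2))
print("p-values:", p.round(4))
print("selected feature indices:", selected)
```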
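The later steps of the iterative process (feature scaling, polynomial feature engineering, a training/testing/holdout split, cross-validation, and R² as a function of polynomial degree) fit together in a scikit-learn pipeline. This is a minimal sketch under stated assumptions: the cubic target is synthetic, and the 60/20/20 split, 5 folds and degree range 1 to 7 are illustrative choices, not values from the course.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical data with a cubic relationship plus noise
rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] ** 2 + 2 + rng.normal(scale=2.0, size=300)

# Hold out 20% first, then split the rest 75/25 into training and testing (60/20/20 overall)
X_rest, X_hold, y_rest, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)


def make_model(degree):
    # Polynomial terms, then standard scaling, then an ordinary linear model
    return make_pipeline(PolynomialFeatures(degree, include_bias=False),
                         StandardScaler(),
                         LinearRegression())


# Mean 5-fold cross-validated R^2 on the training portion, per degree
cv_r2 = {}
for degree in range(1, 8):
    cv_r2[degree] = cross_val_score(make_model(degree), X_train, y_train,
                                    cv=5, scoring="r2").mean()

best_degree = max(cv_r2, key=cv_r2.get)
final = make_model(best_degree).fit(X_train, y_train)
print({d: round(s, 3) for d, s in cv_r2.items()})
print("best degree:", best_degree)
print("test R^2:", round(final.score(X_test, y_test), 3))
print("holdout R^2:", round(final.score(X_hold, y_hold), 3))  # touched only once, at the end
```

Plotting `cv_r2` against the degree gives the R² curve from the evaluation step; such curves typically flatten or dip once the degree exceeds what the data supports.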