General Workflow in Steps (WIP)

Below are the general steps I learnt from the Machine Learning course, to be updated as the course progress

  • Define problem statement
  • Gather data (eg, data lake, data pipeline)
  • Exploring data and gain insights
    • null inspection
    • pair plot (correlation between features)
    • scatter plot (density view with alpha=0.1)
    • shape and size
    • data type
  • Data cleansing
    • fillna / dropna
    • one-hot loading
    • imputer
  • Build baseline model, perhaps using linear regression
  • Iterative Process
    • P-value inspection
    • Feature Selection
    • Feature Scaling & Normalization
    • Feature Engineering, eg: for polynomial regression
    • Prepare Data set: training, testing, holdout
    • Train the model with cross validation
    • Model evaluation
      • R2 score as a function of polynomial degree

workflow