ML Modeling 101: Principles & Practice
A series on the foundational ideas behind machine learning: how to think about problems, design experiments, and operate models in production. Each post focuses on a principle rather than a tool. Examples are drawn from credit risk and other real-world classification and regression problems.
Series: ML Modeling 101
| # | Post | What it covers |
|---|---|---|
| 0 | The Philosophy of Modeling: A Controlled Approximation | Models don't need to be right — they need to be useful. Parsimony and three questions to answer before you start |
| 1 | Problem Formulation: Getting the Question Right Before Choosing a Model | Translating a business problem into an ML problem. Target variable, unit of observation, horizon |
| 2 | Data Understanding: How Your Data Was Born Matters | Data Generating Process, population vs sample, EDA as hypothesis testing |
| 3 | Feature Engineering: Where Domain Knowledge Becomes Signal | Signal vs noise, transformation trade-offs, and preventing leakage, the most dangerous mistake in ML |
| 4 | Experimental Design: The Split Determines the Conclusion | Train/val/test roles, no peeking, cross-validation, temporal split |
| 5 | Model Selection: No Free Lunch | No Free Lunch Theorem, inductive bias, bias-variance tradeoff |
| 6 | Training & Optimization: Minimizing a Proxy of the Real Objective | Loss functions, gradient descent, regularization as a prior belief |
| 7 | Model Evaluation: Measure What You Actually Need to Optimize | Classification metrics, calibration, slice-based evaluation, metric hacking |
| 8 | Hyperparameter Tuning: Searching with a Strategy | Grid search, random search, Bayesian optimization, overfitting hyperparameters to the validation set |
| 9 | Interpretability & Explainability: Models Must Be Trusted | SHAP, LIME, PDP/ICE, WoE scorecard, global vs local explanation |
| 10 | Productionization & Monitoring: Models Decay Over Time | Training-serving skew, data drift vs concept drift, PSI, retraining strategy |
| 11 | The Modeling Mindset: Synthesis | Iterative process, every decision is a hypothesis, domain knowledge beats algorithms |
The caketool API is documented in the API Reference.