Regularization and Early Stopping

Regularization takes model complexity into account when calculating the error. It's a major field of ML research, but we are going to focus on L1 and L2 regularization.

L2 vs. L1 Regularization

L1 and L2 regularization are so-called parameter norm penalties. They both penalize the loss function by adding a term based on the model's weights, but they use different norms: L1 adds the sum of the absolute values of the weights, while L2 adds the sum of their squares.
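As a quick illustration (not from the original post), here is a minimal NumPy sketch of how the two penalties attach to a training loss; the weight values, the data loss, and the regularization strength lam are made up for the example:

```python
import numpy as np

def l1_penalty(weights, lam):
    # L1 norm penalty: lam * sum(|w|); tends to push weights to exactly zero (sparsity).
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    # L2 norm penalty: lam * sum(w^2); tends to push weights toward small but nonzero values.
    return lam * np.sum(np.square(weights))

# The regularized loss is the data loss plus the chosen penalty.
weights = np.array([0.5, -1.2, 0.0, 3.0])   # made-up weight vector
data_loss = 0.37                            # e.g. a mean squared error value (made up)
loss_with_l1 = data_loss + l1_penalty(weights, lam=0.01)
loss_with_l2 = data_loss + l2_penalty(weights, lam=0.01)
print(loss_with_l1, loss_with_l2)
```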

ML Activation and Loss Functions

Common Activation Functions

Placeholder

Failure Modes for Gradient Descent

Problem
- Gradients can vanish
- Gradients can explode
- ReLU layers can die

Insight
- Each additional layer can reduce the signal vs. noise
- Learning rates are important here
- Monitor the fraction of zero weights in TensorBoard

Solution
- Using ReLU instead of sigmoid/tanh can help
- Batch normalization (useful knob) can …
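Not part of the original note, but as a concrete sketch of the mitigations listed above, here is a minimal TensorFlow/Keras example that combines ReLU activations, batch normalization, and TensorBoard histogram logging (so you can watch how many weights sit at or near zero). The layer sizes, input shape, and learning rate are arbitrary choices for illustration:

```python
import tensorflow as tf

# Small fully connected network using ReLU activations and batch normalization.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    # A moderate learning rate; too high a rate is a common cause of
    # exploding gradients and dead ReLU units.
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
)

# histogram_freq=1 logs weight histograms every epoch, which lets you inspect
# in TensorBoard whether many weights are stuck at or near zero.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs", histogram_freq=1)

# model.fit(x_train, y_train, epochs=10, callbacks=[tensorboard_cb])
```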