ML Activation and Loss Functions

Common Activation Functions (figure)

Failure modes for Gradient Descent

Problem
- Gradients can vanish
- Gradients can explode
- ReLU layers can die

Insight
- Each additional layer can further reduce the signal relative to the noise
- Learning rates are important here
- Monitor the fraction of zero weights in TensorBoard

Solution
- Using ReLU instead of sigmoid/tanh can help (see the sketches after this list)
- Batch normalization (a useful knob) can help
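To make the vanishing-gradient point concrete, here is a minimal NumPy sketch. It is purely illustrative (the function names, sample inputs, and layer counts are my own, not from the notes): the sigmoid derivative never exceeds 0.25, so each extra sigmoid layer multiplies another factor of at most 0.25 into the backpropagated gradient, while ReLU's local gradient is either 0 (a dead unit) or 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # never exceeds 0.25

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # never exceeds 1.0, shrinks fast away from 0

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 when active, 0 when the unit is "dead"

# Backprop multiplies one local gradient per layer, so the best-case sigmoid
# contribution shrinks like 0.25 ** n_layers, while ReLU's factor stays 0 or 1.
for n_layers in (1, 5, 10):
    print(n_layers,
          sigmoid_grad(0.0) ** n_layers,        # 0.25, ~0.00098, ~9.5e-7
          relu_grad(np.array(1.0)) ** n_layers)  # 1.0 at every depth
```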
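The monitoring and batch-normalization points can be sketched with Keras and TensorBoard. This is a hedged illustration, not the original setup: the architecture, the layer names "hidden1"/"hidden2", the synthetic data, and the log directory are assumptions. The note says to monitor the fraction of zero weights; the callback below tracks the fraction of exactly-zero ReLU activations instead, a closely related signal for spotting dying ReLU layers, and a BatchNormalization layer stands in for the "useful knob".

```python
import numpy as np
import tensorflow as tf

# Small functional model; all names and sizes here are illustrative.
inputs = tf.keras.Input(shape=(20,))
h1 = tf.keras.layers.Dense(64, activation="relu", name="hidden1")(inputs)
h1 = tf.keras.layers.BatchNormalization()(h1)   # the "useful knob"
h2 = tf.keras.layers.Dense(64, activation="relu", name="hidden2")(h1)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(h2)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="binary_crossentropy")

# Second model exposing the hidden ReLU activations; it shares the same weights.
probe = tf.keras.Model(inputs, [model.get_layer("hidden1").output,
                                model.get_layer("hidden2").output])

class DeadReluMonitor(tf.keras.callbacks.Callback):
    """Log the fraction of exactly-zero ReLU outputs to TensorBoard each epoch."""
    def __init__(self, probe_model, probe_x, log_dir="logs/dead_relu"):
        super().__init__()
        self.probe_model = probe_model
        self.probe_x = probe_x
        self.writer = tf.summary.create_file_writer(log_dir)

    def on_epoch_end(self, epoch, logs=None):
        activations = self.probe_model(self.probe_x, training=False)
        with self.writer.as_default():
            for name, act in zip(("hidden1", "hidden2"), activations):
                frac_zero = float(np.mean(act.numpy() == 0.0))
                tf.summary.scalar(f"fraction_zero/{name}", frac_zero, step=epoch)

# Toy data, just so the example runs end to end.
x = np.random.randn(256, 20).astype("float32")
y = (x.sum(axis=1, keepdims=True) > 0).astype("float32")
model.fit(x, y, epochs=5, verbose=0, callbacks=[DeadReluMonitor(probe, x)])
```

After a run, `tensorboard --logdir logs` shows the `fraction_zero/*` curves; a layer whose fraction climbs toward 1.0 is dying, and lowering the learning rate or adding batch normalization are the usual first knobs to try.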