Question 1

What are vanishing gradients?

Accepted Answer

During backpropagation in deep networks, the gradient of the loss can shrink toward zero as it is propagated back to earlier layers. Those layers then receive tiny weight updates and learn very slowly, or effectively stop learning.
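To make the effect concrete, here is a minimal sketch, assuming a toy "network" that is just a chain of scalar sigmoid layers sharing one weight. The function name `input_gradient` and its default values are illustrative choices, not taken from any library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def input_gradient(depth, weight=1.0, x=0.5):
    """Return d(output)/d(input) for a chain of `depth` scalar sigmoid layers."""
    # Forward pass: keep every layer's activation for the backward pass.
    activations = []
    a = x
    for _ in range(depth):
        a = sigmoid(weight * a)
        activations.append(a)

    # Backward pass: each layer contributes the local factor w * s * (1 - s).
    grad = 1.0
    for a in reversed(activations):
        grad *= weight * a * (1.0 - a)
    return grad


for depth in (1, 5, 10, 20):
    print(depth, input_gradient(depth))
```

With `weight=1.0` every local factor is below 0.25, so the printed gradient drops by several orders of magnitude as `depth` grows.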

Question 2

What are exploding gradients?

Accepted Answer

Gradients grow excessively large during backpropagation, which produces unstable weight updates, possible numerical overflow (infinite or NaN values), and divergent training.
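The opposite case can be sketched the same way. This is an illustrative toy, assuming a chain of scalar linear layers that all share one weight larger than 1; `backprop_linear_chain` is a hypothetical helper name.

```python
def backprop_linear_chain(depth, weight):
    """Gradient of the output w.r.t. the input for `depth` scalar linear layers."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight  # each layer contributes one factor of `weight`
    return grad


for depth in (10, 100, 2000):
    print(depth, backprop_linear_chain(depth, 1.5))
```

At a depth of 2000 the product exceeds the largest representable double and becomes `inf`, which is the kind of overflow that shows up as a non-finite loss in real training runs.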

Question 3

Why do these problems occur in deep networks?

Accepted Answer

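Backpropagation applies the chain rule, so the gradient that reaches an early layer is a product with one factor per layer (roughly, the layer's weights times the derivative of its activation). If those factors are mostly smaller than 1 in magnitude, as with saturating activation functions or weights initialized too small, the product shrinks roughly exponentially with depth. If they are mostly larger than 1, as with weights initialized too large, the product grows just as quickly.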
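The chain-rule product can be written out explicitly. The notation here is an assumption introduced for illustration: a feed-forward network with layers $h_l = f(z_l)$, $z_l = W_l h_{l-1} + b_l$ for $l = 1, \dots, L$, and loss $\mathcal{L}$.

```latex
\frac{\partial \mathcal{L}}{\partial h_0}
  = J_1^{\top} J_2^{\top} \cdots J_L^{\top}\,
    \frac{\partial \mathcal{L}}{\partial h_L},
\qquad
J_l = \frac{\partial h_l}{\partial h_{l-1}}
    = \operatorname{diag}\!\bigl(f'(z_l)\bigr)\, W_l

\left\lVert \frac{\partial \mathcal{L}}{\partial h_0} \right\rVert
  \le \Bigl( \prod_{l=1}^{L} \lVert J_l \rVert \Bigr)
      \left\lVert \frac{\partial \mathcal{L}}{\partial h_L} \right\rVert
```

If every $\lVert J_l \rVert \le \gamma < 1$, the bound decays like $\gamma^{L}$, which is the vanishing case. When the Jacobians consistently stretch the gradient by more than 1, the product can instead grow geometrically with $L$.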

Question 4

How do activation functions affect gradients?

Accepted Answer

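Saturating activations such as sigmoid and tanh have derivatives that approach zero for large-magnitude inputs (the sigmoid derivative never exceeds 0.25), so they shrink gradients. ReLU has a derivative of exactly 1 for positive inputs, which helps preserve gradient magnitude. Its derivative is 0 for negative inputs, however, so a unit that stays in that regime stops updating (a "dead" neuron).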
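A quick way to compare these activations is to evaluate their derivatives at a few inputs. The sketch below uses only the standard library; the helper names `d_sigmoid`, `d_tanh`, and `d_relu` are illustrative.

```python
import math


def d_sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # peaks at 0.25 when x == 0


def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2  # peaks at 1.0 when x == 0


def d_relu(x):
    # Derivative is 1 for positive inputs and 0 otherwise
    # (the value at exactly 0 is a convention; 0 is used here).
    return 1.0 if x > 0 else 0.0


for x in (-5.0, 0.0, 5.0):
    print(x, d_sigmoid(x), d_tanh(x), d_relu(x))
```

At `x = 5.0` both sigmoid and tanh derivatives are already close to zero, while the ReLU derivative is still 1. At `x = -5.0` the ReLU derivative is 0, which is the dead-neuron regime.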

Question 5

How can I mitigate vanishing and exploding gradients?

Accepted Answer

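Use variance-preserving initialization (Xavier/Glorot for sigmoid or tanh layers, He for ReLU layers), non-saturating activations (ReLU, Leaky ReLU), batch normalization, and architectural techniques such as residual connections to maintain gradient flow. Gradient clipping specifically targets exploding gradients; it does not help with vanishing ones.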
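Two of these mitigations are simple enough to sketch without a deep learning framework. The code below is a minimal illustration, assuming gradients are held in a flat Python list and weights in a list of lists; `clip_by_norm`, `xavier_std`, `he_std`, and `init_weights` are hypothetical helper names, and real frameworks provide their own equivalents.

```python
import math
import random


def clip_by_norm(grads, max_norm):
    """Rescale a list of gradient values so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def xavier_std(fan_in, fan_out):
    # Xavier/Glorot normal initialization: variance 2 / (fan_in + fan_out).
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in):
    # He normal initialization: variance 2 / fan_in, commonly paired with ReLU.
    return math.sqrt(2.0 / fan_in)


def init_weights(fan_in, fan_out, std, seed=0):
    """Draw a fan_out x fan_in weight matrix from a zero-mean Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)
print(clipped)  # same direction as the input, norm rescaled to about 5

weights = init_weights(fan_in=256, fan_out=128, std=he_std(256))
print(len(weights), len(weights[0]))
```

Clipping by norm keeps the gradient's direction and only limits its length, which is why it tames exploding gradients without changing which way the update points.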

Understanding Vanishing and Exploding Gradients

