Machine learning mathematics involves foundational concepts like gradient descent and regularization. Gradient descent is an optimization algorithm used to minimize a loss function by iteratively adjusting model parameters in the direction of steepest descent. Regularization introduces additional terms to the loss function, discouraging overly complex models and preventing overfitting. Together, these techniques let models learn efficiently from data while balancing fit against complexity, which improves generalization and robustness.
What is gradient descent and what is it used for?
Gradient descent is an optimization algorithm that minimizes a loss function by iteratively updating model parameters in the direction of steepest descent, using the gradient of the loss with respect to the parameters.
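To make the update rule concrete, here is a minimal sketch of gradient descent on a one-dimensional quadratic loss; the specific loss function, starting point, learning rate, and step count are illustrative assumptions, not taken from the text above.

```python
# Minimal gradient descent sketch on the quadratic loss L(w) = (w - 3)^2.
# The loss, its gradient, and all hyperparameters are assumed for illustration.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w = 0.0             # assumed initial parameter value
learning_rate = 0.1
for step in range(50):
    w = w - learning_rate * grad(w)  # step against the gradient

print(w, loss(w))   # w approaches 3, the minimizer of the loss
```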
What is learning rate and how does it affect training?
The learning rate controls the step size of each update. A too-large rate can cause divergence; a too-small rate leads to slow convergence. It can be fixed or scheduled during training.
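As a sketch of what "scheduled" might mean, the snippet below implements a simple step-decay schedule; the initial rate, decay factor, and decay interval are illustrative assumptions rather than recommended values.

```python
# Step-decay learning rate schedule: halve the rate every `every` iterations.
# All constants here are assumed for illustration.
def step_decay(initial_lr, iteration, drop=0.5, every=20):
    return initial_lr * (drop ** (iteration // every))

for it in (0, 19, 20, 40):
    print(it, step_decay(0.1, it))  # 0.1, 0.1, 0.05, 0.025
```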
What is regularization and why is it used in ML?
Regularization adds a penalty to the loss to discourage overly complex models, helping prevent overfitting and improve generalization.
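A minimal sketch of a regularized objective is shown below: a data-fitting loss plus a penalty scaled by a strength parameter. The mean-squared-error loss and the generic penalty interface are illustrative assumptions.

```python
import numpy as np

# Regularized objective: data loss plus lambda times a penalty on the weights.
# The choice of mean squared error and the callable penalty are assumptions.
def regularized_loss(w, X, y, lam, penalty):
    predictions = X @ w
    data_loss = np.mean((predictions - y) ** 2)  # how well w fits the data
    return data_loss + lam * penalty(w)          # discourages complex w
```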
What are L1 and L2 regularization, and how do they differ?
L1 regularization (Lasso) uses the sum of absolute weights and can drive some weights to zero, promoting sparsity. L2 regularization (Ridge) uses the sum of squared weights and shrinks weights without necessarily zeroing them.
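The difference can be sketched directly in code: the L1 (sub)gradient pushes every nonzero weight toward zero with constant magnitude, while the L2 gradient shrinks each weight in proportion to its size. The example weight vector is an illustrative assumption.

```python
import numpy as np

def l1_penalty(w):
    return np.sum(np.abs(w))   # Lasso: sum of absolute weights

def l2_penalty(w):
    return np.sum(w ** 2)      # Ridge: sum of squared weights

def l1_subgradient(w):
    return np.sign(w)          # constant-magnitude push toward zero

def l2_gradient(w):
    return 2.0 * w             # push proportional to the weight itself

w = np.array([0.5, -0.01, 2.0])  # assumed example weights
print(l1_penalty(w), l2_penalty(w))
# L1's constant push can drive small weights exactly to zero (sparsity),
# while L2 shrinks all weights smoothly without necessarily zeroing them.
```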
What are common gradient descent variants, and when should you use them?
Batch gradient descent uses the full dataset per update and is stable but slow; stochastic gradient descent updates per example and is fast but noisy; mini-batch gradient descent uses small batches and balances stability and efficiency, making it a common default choice.
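The sketch below shows mini-batch gradient descent for linear regression with squared error; setting the batch size to the full dataset recovers batch gradient descent, and setting it to 1 recovers stochastic gradient descent. The synthetic data, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

# Mini-batch gradient descent on a synthetic linear regression problem.
# Data, batch size, learning rate, and epoch count are assumed for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

def gradient(w, Xb, yb):
    # Gradient of mean squared error over the given batch.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))                  # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        w -= lr * gradient(w, X[batch], y[batch])  # mini-batch update

print(w)  # approaches true_w
# batch_size = len(X) gives batch GD; batch_size = 1 gives SGD.
```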