Numerical optimization with gradient methods uses iterative techniques to find a minimum or maximum of a function by exploiting its derivatives. Methods such as gradient descent update the variables along the direction of steepest decrease of the objective, as indicated by the negative gradient. They are widely used in machine learning and engineering because they scale well to high-dimensional problems, though they typically require careful tuning of parameters such as the step size to ensure convergence.
What is gradient descent, and what is its goal?
Gradient descent is an iterative optimization method that aims to minimize a function by moving parameters in the direction of steepest descent, i.e., opposite the gradient ∇f.
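As a minimal sketch (the one-dimensional objective f(θ) = θ² is an assumed example, not taken from the text), the negative gradient points from the current point toward the minimizer at θ = 0:

```python
# Assumed toy objective: f(theta) = theta**2, minimized at theta = 0.
def f(theta):
    return theta ** 2

def grad_f(theta):
    return 2.0 * theta  # derivative of theta**2

theta = 5.0
# The steepest-descent direction is -grad_f(theta) = -10.0, i.e. toward 0.
print(f(theta), -grad_f(theta))
```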
How is the update step computed in gradient descent?
The update is θ_{k+1} = θ_k − α_k ∇f(θ_k), where α_k is the learning rate (step size) that controls how far the parameters move at each iteration.
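A hedged implementation of this update rule; the quadratic objective, starting point, iteration count, and fixed α = 0.1 are all illustrative assumptions:

```python
import numpy as np

def grad_f(theta):
    # Gradient of the assumed objective f(theta) = ||theta - 3||^2 / 2,
    # whose minimizer is 3 in every coordinate.
    return theta - 3.0

theta = np.zeros(2)   # initial parameters
alpha = 0.1           # fixed learning rate (illustrative choice)
for k in range(100):
    theta = theta - alpha * grad_f(theta)  # theta_{k+1} = theta_k - alpha * grad f(theta_k)

print(theta)  # approaches [3. 3.]
```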
What is the learning rate and why does it matter?
The learning rate α scales the update. If it is too large, the iterates may oscillate or diverge; if too small, convergence is slow. It can be held fixed or decayed over time according to a schedule.
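The toy comparison below makes the trade-off concrete; the objective f(θ) = θ² and the three step sizes are assumed purely for illustration:

```python
def run_gd(alpha, steps=20, theta0=5.0):
    # Gradient descent on f(theta) = theta**2, with gradient 2*theta.
    theta = theta0
    for _ in range(steps):
        theta -= alpha * 2.0 * theta
    return theta

print(run_gd(alpha=1.1))    # |1 - 2*alpha| > 1: iterates grow, the method diverges
print(run_gd(alpha=0.01))   # tiny alpha: still far from 0 after 20 steps (slow)
print(run_gd(alpha=0.5))    # alpha = 0.5 happens to jump exactly to the minimum here
```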
What are stochastic and mini-batch gradient descent?
They approximate the full gradient using a subset of the data: SGD uses a single sample, while mini-batch gradient descent uses a small batch. This makes each iteration far cheaper on large datasets, and the gradient noise can help the iterates escape poor stationary points.
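A rough mini-batch SGD sketch for least-squares linear regression; the synthetic data, batch size of 32, and learning rate are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # synthetic features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=1000)   # noisy targets

w = np.zeros(3)
alpha, batch_size = 0.05, 32
for epoch in range(50):
    perm = rng.permutation(len(X))             # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Mini-batch gradient of the mean squared error (1/2)*mean((Xb @ w - yb)**2)
        grad = Xb.T @ (Xb @ w - yb) / len(idx)
        w -= alpha * grad

print(w)  # close to w_true; batch_size=1 would make this plain SGD
```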
How do you know when gradient methods have converged?
Common criteria include a small gradient norm ||∇f(θ)||, negligible parameter updates, or minimal changes in the objective value, often with a maximum iteration limit.
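A sketch combining these criteria, using a gradient-norm tolerance plus an iteration cap; the objective, tolerance, and cap are assumed values:

```python
import numpy as np

def grad_f(theta):
    # Gradient of the assumed objective f(theta) = ||theta||^2 / 2.
    return theta

theta = np.array([4.0, -3.0])
alpha, tol, max_iter = 0.1, 1e-6, 10_000
for k in range(max_iter):
    g = grad_f(theta)
    if np.linalg.norm(g) < tol:          # stop when ||grad f(theta)|| is small
        break
    theta = theta - alpha * g

print(f"stopped at iteration {k}, ||grad|| = {np.linalg.norm(grad_f(theta)):.2e}")
```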