Machine learning mathematics involves foundational concepts like gradient descent and regularization. Gradient descent is an optimization algorithm used to minimize a loss function by iteratively adjusting model parameters in the direction of steepest descent. Regularization introduces additional terms to the loss function, discouraging overly complex models and preventing overfitting. Together, these techniques let models learn efficiently from data while balancing fit against complexity, which improves generalization and robustness.
What is gradient descent and what is it used for?
Gradient descent is an optimization algorithm that minimizes a loss function by iteratively updating model parameters in the direction of steepest descent, using the gradient of the loss with respect to the parameters.
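To make the update rule concrete, here is a minimal sketch of gradient descent on a one-dimensional quadratic loss; the specific loss function, starting point, learning rate, and step count are illustrative assumptions, not taken from the text above.

```python
# Minimal gradient descent sketch on the quadratic loss L(w) = (w - 3)^2.
# The loss, its gradient, and all hyperparameters are assumed for illustration.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w = 0.0             # assumed initial parameter value
learning_rate = 0.1
for step in range(50):
    w = w - learning_rate * grad(w)  # step against the gradient

print(w, loss(w))   # w approaches 3, the minimizer of the loss
```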
What is learning rate and how does it affect training?
The learning rate controls the step size of each update. A too-large rate can cause divergence; a too-small rate leads to slow convergence. It can be fixed or scheduled during training.
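As a sketch of what "scheduled" might mean, the snippet below implements a simple step-decay schedule; the initial rate, decay factor, and decay interval are illustrative assumptions rather than recommended values.

```python
# Step-decay learning rate schedule: halve the rate every `every` iterations.
# All constants here are assumed for illustration.
def step_decay(initial_lr, iteration, drop=0.5, every=20):
    return initial_lr * (drop ** (iteration // every))

for it in (0, 19, 20, 40):
    print(it, step_decay(0.1, it))  # 0.1, 0.1, 0.05, 0.025
```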
What is regularization and why is it used in ML?
Regularization adds a penalty to the loss to discourage overly complex models, helping prevent overfitting and improve generalization.
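A minimal sketch of a regularized objective is shown below: a data-fitting loss plus a penalty scaled by a strength parameter. The mean-squared-error loss and the generic penalty interface are illustrative assumptions.

```python
import numpy as np

# Regularized objective: data loss plus lambda times a penalty on the weights.
# The choice of mean squared error and the callable penalty are assumptions.
def regularized_loss(w, X, y, lam, penalty):
    predictions = X @ w
    data_loss = np.mean((predictions - y) ** 2)  # how well w fits the data
    return data_loss + lam * penalty(w)          # discourages complex w
```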
What are L1 and L2 regularization, and how do they differ?
L1 regularization (Lasso) uses the sum of absolute weights and can drive some weights to zero, promoting sparsity. L2 regularization (Ridge) uses the sum of squared weights and shrinks weights without necessarily zeroing them.
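The difference can be sketched directly in code: the L1 (sub)gradient pushes every nonzero weight toward zero with constant magnitude, while the L2 gradient shrinks each weight in proportion to its size. The example weight vector is an illustrative assumption.

```python
import numpy as np

def l1_penalty(w):
    return np.sum(np.abs(w))   # Lasso: sum of absolute weights

def l2_penalty(w):
    return np.sum(w ** 2)      # Ridge: sum of squared weights

def l1_subgradient(w):
    return np.sign(w)          # constant-magnitude push toward zero

def l2_gradient(w):
    return 2.0 * w             # push proportional to the weight itself

w = np.array([0.5, -0.01, 2.0])  # assumed example weights
print(l1_penalty(w), l2_penalty(w))
# L1's constant push can drive small weights exactly to zero (sparsity),
# while L2 shrinks all weights smoothly without necessarily zeroing them.
```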
What are common gradient descent variants, and when should you use them?
Batch gradient descent uses the full dataset per update and is stable but slow; stochastic gradient descent updates per example and is fast but noisy; mini-batch gradient descent uses small batches and balances stability and efficiency, making it a common default choice.
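The sketch below shows mini-batch gradient descent for linear regression with squared error; setting the batch size to the full dataset recovers batch gradient descent, and setting it to 1 recovers stochastic gradient descent. The synthetic data, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

# Mini-batch gradient descent on a synthetic linear regression problem.
# Data, batch size, learning rate, and epoch count are assumed for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

def gradient(w, Xb, yb):
    # Gradient of mean squared error over the given batch.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))                  # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        w -= lr * gradient(w, X[batch], y[batch])  # mini-batch update

print(w)  # approaches true_w
# batch_size = len(X) gives batch GD; batch_size = 1 gives SGD.
```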