Linear regression is a fundamental statistical and machine learning model used to predict a continuous outcome from one or more input features. It models the dependent variable as a linear function of the independent variables: a straight line with one feature, a plane or hyperplane with several. The model estimates coefficients that minimize the sum of squared differences between predicted and actual values, and it is widely used for trend analysis, forecasting, and understanding relationships between variables.
What is linear regression?
A method for modeling the relationship between a dependent variable and one or more independent variables by fitting a straight line (or, with several predictors, a hyperplane) that minimizes the sum of squared residuals (ordinary least squares).
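As a minimal sketch of the least-squares idea for a single predictor, the closed-form estimates can be computed directly from the data. The function name `fit_simple_ols` and the sample data are hypothetical, chosen only for illustration; the points lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # undefined if every x is identical
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)
```

With several predictors the same criterion is usually solved with matrix methods from a numerical library rather than by hand.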
What does the slope coefficient represent in simple linear regression?
The amount the dependent variable is expected to change for a one-unit increase in the predictor (the rate of change).
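To make the interpretation concrete, here is a small sketch using made-up coefficients (an intercept of 50.0 and a slope of 3.5 are assumptions, not fitted from real data). Raising the predictor by one unit changes the prediction by exactly the slope, wherever the starting point is.

```python
def predict(intercept, slope, x):
    """Prediction from a simple linear regression line."""
    return intercept + slope * x


# Hypothetical fitted coefficients, for illustration only.
intercept, slope = 50.0, 3.5

# The difference between predictions one unit apart equals the slope.
change = predict(intercept, slope, 11.0) - predict(intercept, slope, 10.0)
print(change)
```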
What is R-squared and what does it tell you about the model fit?
R-squared is the proportion of variance in the dependent variable explained by the model, ranging from 0 to 1 for a least-squares fit with an intercept. Higher values indicate a closer fit to the data, but a high R-squared doesn’t prove causation or guarantee predictive accuracy.
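The definition can be sketched as one minus the ratio of the residual sum of squares to the total sum of squares. The helper name `r_squared` and the data are hypothetical. The two calls show the extremes: predictions that match every observation, and predictions that are just the mean of the observed values.

```python
def r_squared(ys, predictions):
    """1 - SS_res / SS_tot for observed values ys and model predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)  # zero if all ys are equal
    return 1.0 - ss_res / ss_tot


ys = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations

perfect = r_squared(ys, [1.0, 2.0, 3.0, 4.0])   # predictions match exactly
baseline = r_squared(ys, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
print(perfect, baseline)
```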
What are the key assumptions of linear regression?
Linear relationship, independence of observations, homoscedasticity (constant residual variance), normally distributed residuals for inference, and no perfect multicollinearity among predictors.
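One of these assumptions, constant residual variance, can be given a rough informal check by comparing the size of the residuals at low and high predictor values. The sketch below is a crude heuristic under stated assumptions, not a formal statistical test: the helper names, the coefficients (intercept 0, slope 1), and the data are all hypothetical.

```python
def residuals(xs, ys, intercept, slope):
    """Observed value minus the value predicted by the line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, res):
    """Mean squared residual in the upper half of x divided by the lower half.

    A ratio far from 1 hints at non-constant variance. Assumes the
    lower half has at least one nonzero residual.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    half = len(order) // 2
    low = [res[i] ** 2 for i in order[:half]]
    high = [res[i] ** 2 for i in order[-half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))


# Hypothetical data whose scatter around y = x grows with x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.1, 3.0, 6.0, 5.0]
res = residuals(xs, ys, intercept=0.0, slope=1.0)
ratio = spread_ratio(xs, res)
print(ratio)
```

In practice a plot of residuals against fitted values is the more common way to spot this pattern.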
When should you use linear regression, and when might you avoid it?
Use linear regression when the relationship is approximately linear and assumptions are reasonably met. Avoid it if the relationship is non-linear, residuals show patterns, variances are non-constant, or the outcome is categorical.