Logistic Regression is a statistical method for binary classification problems, where the output can take one of two possible values (e.g., yes/no, 0/1, true/false). It models the probability that a given input belongs to a particular category by applying a logistic function (also known as the sigmoid function) to a linear combination of the input features.
- Probabilistic Interpretation: Outputs a probability value between 0 and 1.
- Linear Decision Boundary: The decision boundary is linear in the feature space.
- Non-linear Mapping: The logistic function maps linear combinations of inputs to a probability.
- Interpretable Coefficients: Coefficients indicate the strength and direction of the relationship between features and the target (see the odds-ratio sketch after this list).
- Linearity of Independent Variables and Log Odds: The relationship between the independent variables and the log odds of the dependent variable is linear.
- Independence of Errors: The observations should be independent of each other.
- No Multicollinearity: Independent variables should not be highly correlated with each other.
- Large Sample Size: Maximum likelihood estimation can be unstable on small samples, so logistic regression needs a reasonably large number of observations to produce reliable coefficient estimates.
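To make the coefficient-interpretation point concrete: exponentiating a fitted weight gives an odds ratio. A minimal sketch, where the feature name and coefficient value are made up for illustration rather than taken from a fitted model:

```python
import numpy as np

# Illustrative coefficient for a hypothetical feature "hours_studied";
# in practice this value would come from a fitted model.
w_hours = 0.8

# exp(w) is the multiplicative change in the odds of the positive class
# for a one-unit increase in the feature, holding other features fixed.
odds_ratio = np.exp(w_hours)
print(f"Odds ratio per extra hour: {odds_ratio:.2f}")  # about 2.23
```

A positive coefficient (odds ratio above 1) raises the odds of the positive class; a negative coefficient (odds ratio below 1) lowers them.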
The logistic regression model predicts the probability of the positive class as:

$$P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}$$

Where:

- $$\sigma$$ is the logistic (sigmoid) function.
- $$\mathbf{w}$$ is the vector of weights.
- $$\mathbf{x}$$ is the input feature vector.
- $$b$$ is the bias term.

The logit function (log-odds) is given by:

$$\text{logit}(P) = \log\left(\frac{P}{1 - P}\right) = \mathbf{w}^\top \mathbf{x} + b$$
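As a concrete illustration of the formulas above, here is a minimal NumPy sketch; the weight, bias, and input values are made up for the example:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weights, bias, and a single input feature vector.
w = np.array([0.5, -1.2])
b = 0.3
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b            # linear combination w^T x + b
p = sigmoid(z)                  # predicted probability P(y = 1 | x)
log_odds = np.log(p / (1 - p))  # the logit; recovers z

print(f"P(y=1|x) = {p:.3f}, log-odds = {log_odds:.3f}")
```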
Training proceeds by gradient descent:

1. Initialize the weights $$\mathbf{w}$$ and the bias $$b$$ (for example, to zeros or small random values).
2. For each input $$\mathbf{x}_i$$, compute the predicted probability $$\hat{y}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$$.
3. Compute the loss function using the binary cross-entropy loss:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

4. Compute the gradients of the loss function with respect to the weights and the bias:

$$\frac{\partial L}{\partial \mathbf{w}} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)\,\mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)$$

5. Update the weights and the bias:

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \frac{\partial L}{\partial \mathbf{w}}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}$$

where $$\eta$$ is the learning rate.

6. Repeat steps 2 to 5 until the loss function converges or a predefined number of iterations is reached.
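The steps above map directly to code. Below is a minimal NumPy sketch of the training loop; the synthetic data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 2 features, roughly linearly separable.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Step 1: initialize weights and bias.
w = np.zeros(X.shape[1])
b = 0.0
eta = 0.1  # learning rate

for _ in range(1000):
    # Step 2: predicted probabilities for all inputs.
    y_hat = sigmoid(X @ w + b)
    # Step 3: binary cross-entropy loss (clipped for numerical safety).
    p = np.clip(y_hat, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Step 4: gradients of the loss w.r.t. weights and bias.
    grad_w = (y_hat - y) @ X / len(y)
    grad_b = np.mean(y_hat - y)
    # Step 5: gradient descent update.
    w -= eta * grad_w
    b -= eta * grad_b

print("weights:", w, "bias:", b, "final loss:", round(loss, 4))
```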
- Binary Outcomes: When the dependent variable is binary (i.e., it has two possible outcomes like Yes/No, True/False, 0/1).
- Linearly Separable Data: When the data can be linearly separated, meaning a straight line (or hyperplane in higher dimensions) can be used to separate the two classes.
- Probability Prediction: When you need not only the classification outcome but also the probability of a particular class (see the scikit-learn sketch after this list).
- Simple and Fast: When you need a quick and easy-to-implement model that works well for a baseline or when computational resources are limited.
- Interpretability: When model interpretability is important, as logistic regression coefficients can provide insights into the relationship between the predictor variables and the probability of the outcome.
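For instance, scikit-learn's LogisticRegression exposes both the predicted class and the class probabilities. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Both the hard class label and the underlying probabilities are available.
x_new = np.array([[1.0, -0.5]])
print("predicted class:", model.predict(x_new)[0])
print("P(y=0), P(y=1):", model.predict_proba(x_new)[0])
```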
- Simplicity: Logistic Regression is straightforward to implement and understand, making it a good baseline model.
- Efficiency: It is computationally cheap to train compared with more complex models and handles large datasets efficiently.
- Probability Interpretation: It provides probability scores for observations, which can be useful in various decision-making processes.
- Feature Importance: The coefficients of the model can be used to understand the influence of different features on the outcome.
- Regularization: Techniques like L1 (Lasso) and L2 (Ridge) regularization can be easily incorporated to prevent overfitting.
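To illustrate the regularization point: in scikit-learn, both penalties are a constructor argument away. The strength C = 0.1 below is an arbitrary value for the example (C is the inverse regularization strength, so smaller C means a stronger penalty):

```python
from sklearn.linear_model import LogisticRegression

# L2 (Ridge) regularization is scikit-learn's default penalty.
ridge_model = LogisticRegression(penalty="l2", C=0.1)

# L1 (Lasso) regularization needs a solver that supports it,
# e.g. "liblinear" or "saga".
lasso_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
```

Both are drop-in replacements for the unregularized fit; the L1 penalty additionally tends to drive uninformative coefficients exactly to zero, acting as a form of feature selection.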
- Linear Decision Boundary: Logistic Regression assumes a linear relationship between the independent variables and the log odds of the outcome, which might not capture complex patterns.
- Not Suitable for Non-linear Problems: For non-linear classification problems, logistic regression may not perform well without transformations or feature engineering (see the sketch after this list).
- Sensitive to Outliers: Outliers in the feature values can distort the fitted coefficients and degrade performance.
- Overfitting with High-Dimensional Data: When the number of features is very large, logistic regression might overfit the training data, especially if regularization is not applied.
- Binary Limitation: Standard logistic regression is limited to binary classification. For multi-class classification, extensions like multinomial logistic regression or other algorithms are needed.
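To illustrate the non-linearity limitation, and one standard workaround, here is a short sketch on scikit-learn's two-moons data, which is not linearly separable. Expanding the inputs with polynomial features lets the model, while still linear in its parameters, draw a curved decision boundary in the original feature space:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two interleaving half-circles: no straight line separates the classes.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Plain logistic regression is limited to a linear boundary here;
# polynomial feature expansion is one common transformation to fix that.
linear_model = LogisticRegression().fit(X, y)
poly_model = make_pipeline(PolynomialFeatures(degree=3),
                           LogisticRegression()).fit(X, y)

print("linear accuracy:    ", round(linear_model.score(X, y), 3))
print("polynomial accuracy:", round(poly_model.score(X, y), 3))
```

The polynomial pipeline should score noticeably higher on this data, at the cost of less directly interpretable coefficients.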