- Naive Bayes classifier for categorical data from scratch in Python
- Naive Bayes classifier for continuous data from scratch in Python (a minimal from-scratch sketch appears after this list)
- Data Visualization: Showing the Iris dataset with the Blender API
- Norms in vector spaces: A review of norms, including the family of p-norms, followed by a comparison of some special p-norms.
- Inner products in vector spaces: A review of the dot product and the Frobenius inner product, and of the canonical norms based on them, with examples using the numpy module (a short sketch covering this item and the previous one appears after this list).
- Gram-Schmidt process: An algorithm to convert a linearly independent set of vectors into an orthogonal set of vectors (a minimal numpy sketch appears after this list).
- Boxplot: The elements of a boxplot are reviewed here, including medians, quartiles, fences, and outliers (see the small matplotlib sketch after this list).
- Probability, standard terms: sample space, trial, outcome, and event.
- Logistic function: An S-shaped curve that is widely used in machine learning and neural networks (a short sketch, together with the sigmoid item below, appears after this list).
- Sigmoid functions (curves): Some examples are included. They are widely used in neural networks and deep learning.
- Conditional probability: We review conditional probability and, based on it, derive the multiplication rule.
- Inclusion-exclusion principle: We review this principle both in set theory and in probability. Python code is also provided.
- Probability, independent events: The properties of independent events are covered here, along with the multiplication rule and some examples.
- Probability, Bayes' rule: Bayes' rule is presented here along with the total probability theorem; it is stated in terms of conditional probabilities. Some Python code is included too (a small numerical sketch appears after this list).
- Linear Regression with Least Squares: When we assume the data points are related through a linear function, we can predict the dependent variable from the independent variable(s); this is linear regression. One way to find the parameters of a linear regression is the least squares estimator. The related Python code clarifies this topic (a minimal sketch appears after this list).
- Ridge Regression with Least Squares: Ridge regression is an extension of linear regression in which a penalty term, called the regularization term, is added to the loss function. Ridge regression is especially useful when the data points are noisy and/or contain outliers, and it is also more robust against overfitting (see the closed-form sketch after this list).
- Gradient Descent for Linear and Ridge Regression: This time we use the Gradient Descent method to find the minimum of the loss functions of linear and ridge regression. For a deeper look at Gradient Descent (GD), see our repository for Optimization (a short sketch appears after this list).
- Gradient and tangent planes: For a surface of the form f(x, y, z) = constant, the gradient vector is orthogonal to the surface at each point. With this property, we can get the equation of the tangent plane to a surface or of the tangent line to a level curve. Recall that a tangent plane is a linear approximation to the given function (a small sympy sketch appears after this list).
- Lasso regression and Elastic Net: After becoming familiar with Ridge regression, we should become acquainted with Lasso and Elastic Net. In Lasso, we use the L1 norm for the regularization term, whereas in Elastic Net we employ both the L1 and L2 norms. This post covers Lasso and Elastic Net with an example on a real classification dataset, using the subgradient method.
- Coordinate Descent for Lasso: In the previous post, we solved Lasso regression with the subgradient method. In this post, we use coordinate descent instead. For a deeper discussion of Coordinate Descent, see our post in the Optimization repository (a soft-thresholding sketch appears after this list).
- Probability, Discrete Random Variables: Discrete random variables take values from a countable set, and their (cumulative) distribution function is a step function. Here, we use the Bernoulli distribution as an example of a discrete distribution. We also compute the entropy of a Bernoulli distribution for different values of its parameter (a short sketch appears after this list).
- Probability, Continuous Random Variables: Continuous random variables typically take values from an interval of real numbers. In fact, when we measure quantities such as speed, voltage, or profit, we are dealing with continuous random variables. As an example, we review the continuous uniform distribution and its probability density function.
- Maximum Likelihood Estimation: When we have samples drawn from a probability distribution whose parameters we do not know, one way to estimate those parameters is maximum likelihood estimation (MLE). Here, we review MLE and apply it to two examples (a short sketch appears after this list).
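
The sketches below are minimal illustrations of some of the items above, not the repository's own implementations. For the continuous-data Naive Bayes item, this is a small from-scratch Gaussian Naive Bayes sketch; the helper names (`fit_gaussian_nb`, `predict_gaussian_nb`) and the toy two-blob data are made up for illustration.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, means, and variances (Gaussian Naive Bayes)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) + 1e-9 for c in classes])  # jitter avoids division by zero
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log-posterior (up to a constant)."""
    # log N(x | mu, sigma^2), summed over the (conditionally independent) features
    log_lik = -0.5 * (np.log(2 * np.pi * variances[None, :, :])
                      + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    return classes[np.argmax(log_post, axis=1)]

# Toy data: two Gaussian blobs (purely illustrative)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(3.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_gaussian_nb(X, y)
print("training accuracy:", np.mean(predict_gaussian_nb(X, *params) == y))
```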
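
For the norms and inner-products items, a small numpy sketch; the vector `x` and matrices `A`, `B` are arbitrary examples.

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])

# p-norms for p = 1, 2, and infinity
print(np.linalg.norm(x, 1))        # sum of absolute values
print(np.linalg.norm(x, 2))        # Euclidean norm
print(np.linalg.norm(x, np.inf))   # largest absolute value

# Dot product and the norm it induces: ||x||_2 = sqrt(<x, x>)
print(np.dot(x, x) ** 0.5)

# Frobenius inner product <A, B> = sum_ij A_ij * B_ij, and the Frobenius norm it induces
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.sum(A * B))               # Frobenius inner product
print(np.linalg.norm(A, 'fro'))    # equals sqrt(<A, A>)
```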
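
For the Gram-Schmidt item, a minimal classical Gram-Schmidt sketch with numpy; it normalizes each vector, so the output is an orthonormal basis, and the post itself may use a different (e.g. modified) variant.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, q) * q for q in basis)  # remove components along previous directions
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

V = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
Q = gram_schmidt(V)
print(np.round(Q @ Q.T, 6))  # should be (close to) the identity matrix
```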
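
For the boxplot item, a tiny matplotlib sketch showing the median, quartile box, whiskers (fences), and outliers; the sample data and the two added outliers are invented.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 200), [5.0, 6.0]])  # two artificial outliers

fig, ax = plt.subplots()
ax.boxplot(data, whis=1.5)  # whiskers at 1.5 * IQR (Tukey's fences); points beyond are drawn as outliers
ax.set_title("Boxplot: median, quartiles, fences, outliers")
plt.show()
```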
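
For the logistic-function and sigmoid items, a short sketch of the standard logistic function next to another common sigmoid (tanh).

```python
import numpy as np
import matplotlib.pyplot as plt

def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z)): an S-shaped curve with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 200)
plt.plot(z, logistic(z), label="logistic")
plt.plot(z, np.tanh(z), label="tanh (another sigmoid)")
plt.legend()
plt.show()
```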
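
For the Bayes'-rule item, a small numerical sketch combining the total probability theorem and Bayes' rule; the disease-test numbers are invented for illustration.

```python
# Hypothetical numbers: prior P(D), sensitivity P(+|D), and false positive rate P(+|not D)
p_d = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 1 - 0.90

# Total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(D|+) = P(+|D) P(D) / P(+)
print(p_pos_given_d * p_d / p_pos)  # roughly 0.088
```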
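
For the linear-regression item, a minimal least-squares sketch with numpy on synthetic data; the true line y = 2x + 1 is chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 50)      # true line y = 2x + 1 plus noise

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves min ||X beta - y||^2
print("intercept, slope:", beta)
```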
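
For the ridge-regression item, a sketch of the closed-form (regularized normal equations) solution; `lam` is the regularization strength, and penalizing the intercept column is a simplification made here, not necessarily what the post does.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||X beta - y||^2 + lam * ||beta||^2 (intercept included in X and penalized, for simplicity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
print(ridge_fit(X, y, lam=1.0))
```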
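
For the gradient-descent item, a sketch that minimizes the ridge loss by plain gradient descent; setting `lam = 0` recovers ordinary linear regression. The step size and iteration count are arbitrary choices for this toy data.

```python
import numpy as np

def ridge_gd(X, y, lam=1.0, lr=5e-3, n_iter=20000):
    """Minimize (1/n) ||X beta - y||^2 + lam ||beta||^2 by gradient descent."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ beta - y) + 2.0 * lam * beta
        beta -= lr * grad
    return beta

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 100)
X = np.column_stack([np.ones_like(x), x])
print(ridge_gd(X, y, lam=0.0))   # should be close to the least-squares solution
```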
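
For the gradient-and-tangent-planes item, a small sympy sketch: the gradient of f(x, y, z) at a point is normal to the level surface f = constant there. The surface (a sphere) and the point are chosen only as an example.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2            # level surface f = 9 is a sphere of radius 3
p = {x: 2, y: 2, z: 1}            # a point on that sphere (4 + 4 + 1 = 9)

grad = [sp.diff(f, v) for v in (x, y, z)]   # gradient vector (2x, 2y, 2z)
normal = [g.subs(p) for g in grad]          # normal to the surface at p: (4, 4, 2)

# Tangent plane: <grad f(p), (x, y, z) - p> = 0
plane = sum(n * (v - p[v]) for n, v in zip(normal, (x, y, z)))
print(sp.Eq(sp.expand(plane), 0))           # 4x + 4y + 2z - 18 = 0
```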
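
For the coordinate-descent item, a Lasso sketch with the soft-thresholding update per coordinate. The objective convention (1/(2n)) ||y - X beta||^2 + lam ||beta||_1 and the synthetic sparse data are assumptions made here, not necessarily those of the post.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return np.sign(rho) * np.maximum(np.abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) ||y - X beta||^2 + lam ||beta||_1."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            residual_j = y - X @ beta + X[:, j] * beta[j]    # residual ignoring coordinate j
            rho = X[:, j] @ residual_j / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0])             # sparse ground truth
y = X @ beta_true + rng.normal(0, 0.1, 100)
print(np.round(lasso_cd(X, y, lam=0.1), 3))                  # the true zeros should stay (near) zero
```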
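
For the discrete-random-variables item, a sketch computing the entropy of a Bernoulli distribution as a function of its parameter p (it peaks at p = 0.5); entropy is reported in nats here.

```python
import numpy as np

def bernoulli_entropy(p):
    """Entropy H(p) = -p log p - (1 - p) log(1 - p) in nats, with 0 log 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        h = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return np.where((p == 0) | (p == 1), 0.0, h)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, bernoulli_entropy(p))
```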
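
For the MLE item, a sketch that maximizes a Gaussian log-likelihood numerically (via scipy) and compares the result with the closed-form estimates (sample mean and biased standard deviation); the data and the (mu, log sigma) parameterization are choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)   # samples from N(2, 1.5^2)

def neg_log_likelihood(params, x):
    """Negative Gaussian log-likelihood in (mu, log sigma); using log sigma keeps sigma positive."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / sigma**2)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)                     # numerical MLE
print(data.mean(), data.std(ddof=0))         # closed-form MLE: sample mean, biased std
```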