Machine learning algorithms are divided into categories according to their purpose.

The main categories are:

**Supervised learning** (predictive model, “labeled” data)
- Classification (Logistic Regression, Decision Tree, KNN, Random Forest, SVM, Naive Bayes, etc.)
- Regression (Linear Regression, Gradient Descent)

**Unsupervised learning** (descriptive model, “unlabeled” data)
- Clustering (K-Means)
- Pattern discovery

**Supervised learning**

With supervised machine learning, the training data is delivered to the machine together with labels. Thus, the machine is given not only the input but also the solution. The model learns to associate the parameter values of the training data with the correct solution. When the system receives new data, the model takes that data's parameter values and uses what it learned from the training data to associate the new data with a particular value.

The primary applications of supervised machine learning include the tasks of classification and regression.

**Linear Regression**

It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s). Here, we establish a relationship between the independent and dependent variables by fitting a best line. This best-fit line is known as the regression line and is represented by the linear equation Y = mX + c,

where Y is the dependent variable, X is the independent variable, m denotes the slope, and c is the Y-intercept. The slope and intercept have the closed-form solutions:

m = (mean(X)·mean(Y) − mean(X·Y)) / (Square(mean(X)) − mean(Square(X)))

c = mean(Y) − m·mean(X)
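The closed-form mean equations above can be sketched in pure Python. This is a minimal illustration with made-up data points, not a production fitting routine:

```python
# A minimal sketch of fitting Y = mX + c using the mean-based closed-form
# equations for slope (m) and intercept (c); no external libraries needed.

def fit_line(xs, ys):
    """Return slope m and intercept c from the closed-form mean equations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x2)
    c = mean_y - m * mean_x
    return m, c

# Toy data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, c = fit_line(xs, ys)
print(m, c)  # → 2.0 1.0
```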

**Logistic Regression**

Don’t get confused by its name! It is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s). In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. Hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1 (as expected).

y = (w[0] × x[0]) + (w[1] × x[1]) + ⋅⋅⋅ + (w[i] × x[i]) + b

In this function, ‘x[i]’ represents the features of the data, which can be understood as the data together with its associated attributes. ‘w[i]’ and ‘b’ are parameters of the model, which are inferred only during training. By simplifying this function, we observe a linear relationship of the form y = wx + b; the logistic function then maps this linear score to a probability between 0 and 1.
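A minimal sketch of this prediction step in pure Python: the linear score w·x + b is squashed through the logistic (sigmoid) function to yield a probability. The weights and bias here are made-up values standing in for learned ones, not the result of actual training:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x, b):
    """Probability of the positive class for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(score)

w = [0.8, -0.4]   # hypothetical learned weights
b = 0.1           # hypothetical learned bias
print(predict_proba(w, [1.0, 2.0], b))  # a probability in (0, 1)
```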

**Multivariate Regression Algorithms**

Multivariate regression algorithms are particularly useful for predicting data values when multiple parameters of significant weight control the behavior of the data entries. These multivariate regression algorithms assign a weight to the different parameters based on the correlation of these parameter values to the true value of the data. Then while incoming data enters the system, the model interprets the parameter values to discern the value of the data entry.

The multivariate regression algorithms are mathematically modeled as: y = b_0 + b_1·x_1 + b_2·x_2

Thus, the multivariate regression model implies that if the coefficients ‘b0’, ‘b1’, and ‘b2’ are known along with the parameter values ‘x1’ and ‘x2’, then the value of ‘y’ may be predicted from the model.
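The prediction step can be sketched directly from the equation. The coefficient values below are made up for illustration, standing in for coefficients a training procedure would produce:

```python
# A minimal sketch of prediction with the model y = b0 + b1*x1 + b2*x2.
# The coefficients are hypothetical, not learned from real data.

def predict(b0, b1, b2, x1, x2):
    """Weighted sum of the two features plus the intercept."""
    return b0 + b1 * x1 + b2 * x2

print(predict(1.0, 2.0, 3.0, x1=4.0, x2=5.0))  # → 24.0
```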

**Decision Tree**

It is a type of supervised learning algorithm that is mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables.

With decision trees, the root node represents the entire sample population. The decision tree algorithm, whichever version is used, splits the root node into a series of sub-nodes. At this point, the sub-nodes may be left alone or may be further subdivided to yield decision nodes. The nodes that do not split further are known as leaves.

Decision tree algorithms are relatively easy to program because they rely on simple ‘if’/‘else’ branching. This also makes it clear why they can be applied so readily to both categorical and numerical variables.
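To see how a trained tree reduces to nested if/else branching, here is a minimal sketch. The feature names, thresholds, and class labels are invented for illustration; in practice the splits are learned from data:

```python
# A hand-written stand-in for a trained decision tree: each internal node
# is an if/else test on one feature, each return is a leaf.

def classify(petal_length, petal_width):
    if petal_length < 2.5:          # root node split
        return "setosa"             # leaf
    else:
        if petal_width < 1.8:       # decision (sub-)node split
            return "versicolor"     # leaf
        else:
            return "virginica"      # leaf

print(classify(1.4, 0.2))  # → setosa
print(classify(5.1, 2.3))  # → virginica
```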

**K Nearest Neighbours**

As a supervised classification algorithm, the K-Nearest Neighbor algorithm is one of the most ubiquitous. Like other classification methodologies, it segregates data into different groups based on their parameter values. The K-Nearest Neighbor algorithm is unique among classification algorithms in that it relies on computing the distance from the input value to the nearest values in the training data.

This might seem rather mundane, but it may be executed in a variety of ways. Firstly, our standard understanding of distance comes from the Euclidean conceptualization of distance:

D = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
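A minimal k-nearest-neighbours sketch using Euclidean distance, with made-up training points and labels: the query is assigned the majority label among its k closest training points.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of ((x, y), label) pairs; returns the majority label
    among the k training points closest to query (Euclidean distance)."""
    by_distance = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Two toy clusters: "A" near the origin, "B" near (5, 5)
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify(train, (1, 1)))      # → A
print(knn_classify(train, (5.5, 5.5)))  # → B
```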

**Naive-Bayes**

Firstly, linear classifiers are fundamentally based upon the linear relationships that exist among the various parameters of incoming data. One such linear classifier is the Naive Bayes classifier. This particular classifier endeavors to segregate data objects into groups under the assumption that there is a degree of independence between the various predictors of their group characteristics. This assumption relies on Bayes’ theorem, which is mathematically modeled as:

P(A|B) = \frac{P(B|A)P(A)}{P(B)}

Bayes’ theorem determines the probability that some event A will occur given that event B is true. The Naive Bayes classification model is particularly useful for working with large amounts of data. Furthermore, it is easy to encode as a classification model, since the category to which a data entry belongs depends on the probability computed from its parameter values.
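Bayes’ theorem itself is a one-line computation. The probabilities below are made-up numbers for a hypothetical spam-filter scenario; the “naive” classifier extends this by multiplying independent per-feature likelihoods P(B_i|A):

```python
# A minimal numeric sketch of Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).

def bayes(p_b_given_a, p_a, p_b):
    """Posterior probability of A given B."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: P(word|spam) = 0.8, P(spam) = 0.25, P(word) = 0.4
print(bayes(0.8, 0.25, 0.4))  # → 0.5
```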

**Random Forests**

The random forest is an extension of the decision tree algorithm that operates as an ensemble algorithm. The random forest algorithm builds several different decision trees, each created from random subsets of the training data. When classifying a test object, each tree produces an output, and the forest combines these outputs, typically by majority vote, to determine the final classification.

Based on this description, it might seem that the algorithm delivers a very precise methodology for finding a solution. While this can be helpful, it can also reduce the flexibility of a machine learning model, because weighting the parameters of the training data so precisely raises the possibility of overfitting.
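The ensemble idea can be sketched in miniature. This is not a full random forest: each “tree” below is a one-split stump trained on a bootstrap sample of toy one-dimensional data, and the forest classifies by majority vote over the stumps.

```python
import random
from collections import Counter

def bootstrap(data):
    """Resample with replacement, retrying until both classes appear."""
    while True:
        sample = [random.choice(data) for _ in range(len(data))]
        if len({label for _, label in sample}) == 2:
            return sample

def train_stump(sample):
    """A one-split 'tree': threshold halfway between the class means."""
    zeros = [x for x, label in sample if label == 0]
    ones = [x for x, label in sample if label == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 0 if x < threshold else 1

def forest_predict(stumps, x):
    """Majority vote across all stumps."""
    votes = Counter(stump(x) for stump in stumps)
    return votes.most_common(1)[0][0]

random.seed(0)
data = [(x, 0) for x in (1, 2, 3)] + [(x, 1) for x in (8, 9, 10)]
stumps = [train_stump(bootstrap(data)) for _ in range(25)]
print(forest_predict(stumps, 2.5))  # → 0
print(forest_predict(stumps, 9.0))  # → 1
```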

**Support Vector Machines (SVMs)**

While the support vector machine methodology may be applied to linear regression, it is particularly useful for classification. With SVMs, the training data is plotted in space according to several selected parameters, and it separates into different groups based on the values of those parameters. Once the groups are fully formed, the algorithm creates a line or series of lines that segregates the groups. When new data enters the machine learning system, the side of the line it ends up on determines the group it belongs to.
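The decision rule alone can be sketched as follows. This shows only how a new point is assigned to a side of an already-learned line w·x + b = 0; the weights here are made-up, not the result of actual max-margin training:

```python
# A minimal sketch of the SVM decision rule: classify a point by the sign
# of w·x + b, i.e. which side of the separating line it falls on.

def side_of_line(w, b, point):
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return "group 1" if score >= 0 else "group 2"

w, b = [1.0, 1.0], -6.0   # hypothetical separator: the line x + y = 6
print(side_of_line(w, b, (5, 5)))  # → group 1
print(side_of_line(w, b, (1, 1)))  # → group 2
```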

Disclaimer: Content and images collected from various blogs and websites.