Machine Learning – Unsupervised Learning

Unsupervised machine learning is another type of machine learning system, defined by how it answers a familiar question: how much human supervision is involved in training the model? As you may recall, supervised machine learning answers this question with labels: each training example carries a label that relates the data value to its correct output. Unsupervised machine learning, by contrast, trains on unlabeled data. The system receives no explicit instructions about the correct answer and must instead discover patterns in the data on its own.

Techniques of Unsupervised Machine Learning

Several unsupervised learning techniques exist for uncovering structure in unlabeled data. One of these is clustering, in which the system divides data into groups of similar items based on their features (parameters). Some clustering algorithms, known as hierarchical clustering, go further and divide these groups into subgroups, which reveals finer-grained structure, especially in large data sets.

Unsupervised learning also supports visualization tasks, which turn complex, unlabeled data into a representation that can be plotted. The output is typically a two- or three-dimensional view of the data in space.
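As an illustration of a visualization task, here is a minimal sketch of principal component analysis (PCA), a common dimensionality-reduction technique the article does not name, used to squash multi-dimensional unlabeled data down to two dimensions for plotting. It is written in pure Python using power iteration; the function names (`pca_2d`, `_top_eigen`, `_mat_vec`) and the iteration count are hypothetical choices, and a real project would more likely call a library such as scikit-learn.

```python
import math
import random


def _mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigen(m, iterations, rng):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    v = [rng.random() + 0.1 for _ in m]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = _mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, _mat_vec(m, v)))
    return eigenvalue, v


def pca_2d(data, iterations=500, seed=0):
    """Project each row of `data` onto its first two principal components."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d) of the centred data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        lam, v = _top_eigen(cov, iterations, rng)
        components.append(v)
        # Deflation: remove the component just found so the next power
        # iteration converges to the next-largest one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in centered]
```

Each returned pair can then be drawn as an (x, y) point on a scatter plot; the first coordinate captures the direction in which the data varies most.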

Finally, association rule learning is another unsupervised task that can prove especially useful. It takes multi-dimensional data as input and discovers relationships between its attributes, for example, which items tend to be bought together.
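Association rules are usually scored by support (how often items appear together) and confidence (how often B appears given A). The sketch below is a deliberately simplified, pairs-only version of that idea, not a full algorithm such as Apriori; the function name `pair_rules`, the threshold values, and the shopping-basket data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules (A, B, support, confidence) meaning 'A → B' for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates, fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # Check the rule in both directions: A → B and B → A.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "bread"],
           ["butter", "milk"], ["bread", "butter", "milk"]]
for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} → {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On this toy data every basket containing butter also contains milk, so "butter → milk" comes out with confidence 1.00.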

In this article, we take a closer look at the algorithms behind these unsupervised learning methods, starting with K-Means clustering.

K-Means Clustering

K-Means clustering aims to discover patterns in data and use the similarity between data points to partition them into groups.

To run K-Means, the user chooses a value k, the number of clusters the data set should be divided into. Each cluster has a centroid, which is the center (mean) of the points in that cluster. Each data point, including new incoming data, is assigned to the cluster whose centroid is nearest to it.
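The "nearest centroid" rule is simple enough to show directly. This minimal sketch assumes Euclidean distance (the usual choice for K-Means) and k = 2 hypothetical centroids; the helper names `nearest_centroid` and `assign_all` are illustrative, not from any library.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance).

    Ties go to the lower index, because min() returns the first minimum.
    """
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def assign_all(points, centroids):
    """Label every point with the index of its nearest centroid."""
    return [nearest_centroid(p, centroids) for p in points]


centroids = [(1.0, 1.0), (8.0, 8.0)]  # hypothetical centroids, k = 2
print(assign_all([(2.0, 1.5), (7.0, 9.0), (0.5, 0.0)], centroids))  # [0, 1, 0]
```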


The K-Means algorithm starts from randomly chosen centroids and repeatedly optimizes the position of each centroid based on the data points assigned to it. Once the centroids have settled, the model can be used to determine which cluster new data inputs belong to.
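"Optimizing the position of the centroid" means moving it to the mean of the points currently assigned to it. A minimal sketch of that update step follows; the name `update_centroids` is hypothetical, and keeping an empty cluster's centroid where it was is one common convention among several, assumed here for simplicity.

```python
def update_centroids(points, labels, old_centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    k = len(old_centroids)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, label in zip(points, labels):
        counts[label] += 1
        for j in range(dim):
            sums[label][j] += p[j]
    new = []
    for i in range(k):
        if counts[i] == 0:
            # Empty cluster: keep its previous position (assumed convention).
            new.append(tuple(old_centroids[i]))
        else:
            new.append(tuple(s / counts[i] for s in sums[i]))
    return new


points = [(1, 1), (2, 1), (8, 8), (9, 10)]
print(update_centroids(points, [0, 0, 1, 1], [(0, 0), (10, 10)]))  # [(1.5, 1.0), (8.5, 9.0)]
```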

How the K-Means Algorithm Works

To process the training data, the K-Means algorithm starts with an initial set of randomly selected centroids, one per cluster, which serve as the starting points for the clusters. It then performs iterative (repetitive) calculations, alternately assigning each point to its nearest centroid and recomputing each centroid from its assigned points, to optimize the positions of the centroids.

The algorithm stops creating and optimizing clusters when either:

  • The centroids have stabilized: their values no longer change between iterations, meaning the clustering has converged.
  • The maximum number of iterations has been reached.
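Putting the pieces together, here is a compact, pure-Python sketch of the standard (Lloyd-style) K-Means loop with both stopping conditions above: it stops when no centroid moves more than a small tolerance `tol`, or after `max_iter` iterations. The function name, the defaults for `max_iter` and `tol`, and initializing with k randomly sampled data points are assumptions for illustration, not a reference implementation.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Lloyd-style K-Means. Returns (centroids, labels, iterations_run)."""
    rng = random.Random(seed)
    # Initial centroids: k data points sampled at random (without replacement).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # empty cluster keeps old position
        # Stopping rule 1: centroids have stabilized (moved at most `tol`).
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            return centroids, labels, iteration
    # Stopping rule 2: the iteration budget was exhausted.
    return centroids, labels, max_iter


data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels, iterations = kmeans(data, k=2)
print(centroids, labels, iterations)
```

Because the result depends on the random starting centroids, practical implementations often run K-Means several times from different starts and keep the best result.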

Disclaimer: Content and images collected from various blogs and websites.

Toufiq Mahmud