Unsupervised Machine Learning
The Logic Behind It
In an ideal world, all data would come with labels. That isn't always the case. Picture patients who visit a hospital with chest pains and have chest scans taken to get a diagnosis. Some are healthy and others are not. However, you have no idea what a healthy person's scan looks like or what a sick person's scan looks like: there are no labels telling you what each scan implies. What you do notice is that some of the scans resemble each other, and that they fall neatly into two groups. You therefore suspect that one group represents healthy chest scans and the other sick ones, although the grouping alone cannot tell you which is which.
In layman's terms, that's how unsupervised machine learning algorithms work. In the example above, there is no labeled training data set. The algorithm is tasked with finding the underlying patterns and insights in the available data on its own.
Classifications of Unsupervised Machine Learning Algorithms
Broadly, unsupervised machine learning algorithms are based on two underlying principles:
- CLUSTERING
- ASSOCIATION
A) Clustering
Under this principle, unlabeled data is grouped based on similarities or differences. In the scans example I mentioned earlier, the scans are grouped (clustered) into two groups (clusters) because each set has its own distinguishing characteristics. One algorithm that uses this principle is the K-Means clustering algorithm.
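To make the grouping idea concrete, here is a minimal sketch of a K-Means style loop in plain Python. It is an illustration under stated assumptions, not a production implementation: the `scans` data is made up (two hypothetical numeric features per scan), the helper names are my own, and the starting centroids are simply the first k points, whereas real libraries usually pick them at random or with a smarter scheme such as k-means++.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, max_iterations=100):
    """Return (centroids, labels) from a basic K-Means loop."""
    # Assumption for simplicity: the first k points are the starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical data: two made-up numeric features extracted from each scan.
scans = [(1.0, 1.2), (5.0, 5.2), (0.8, 1.0), (5.3, 4.9), (1.1, 0.9), (4.8, 5.1)]
centroids, labels = kmeans(scans, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Notice that the output only says which scans belong together. The labels 0 and 1 are arbitrary, so deciding which cluster means "healthy" is still up to a person.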
B) Association
When it comes to association, the unsupervised model focuses on finding relationships between the variables in the data set. One application that falls squarely under this category is market behavior prediction. For instance, the model might show that people who purchase a MacBook also tend to purchase noise-cancelling headphones. In other words, the purchase of one item is associated with a high probability of purchasing another. And as you might have guessed, this is one of the ideas behind how recommendation systems work.
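The MacBook and headphones idea can be sketched with two commonly used association measures: support (how often two items appear together across all baskets) and confidence (how often the second item appears given that the first one does). The snippet below is only a rough illustration in plain Python. The `baskets` data is invented, and `pair_rules` is a hypothetical helper name, not a function from any library.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    # Count how many baskets contain each item and each pair of items.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair gives two candidate rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            rules.append((antecedent, consequent, support, confidence))
    return rules


# Hypothetical shopping baskets, made up for this example.
baskets = [
    ["macbook", "headphones"],
    ["macbook", "headphones", "mouse"],
    ["macbook", "mouse"],
    ["headphones"],
    ["macbook", "headphones"],
]

rules = pair_rules(baskets, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as "macbook -> headphones" here, is the kind of signal a recommendation system could use to suggest headphones to someone who has a MacBook in their cart.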
Applications
It goes without saying that unsupervised machine learning algorithms have very broad applications. Some of them might include:
- Medical Imaging
- Recommendation engines (e.g. online shopping)
- Anomaly detection
- Market Analysis
- Computer Vision, etc.
The Downsides of Unsupervised Machine Learning
- Long training time: Because the data is unlabeled, the model has to discover structure on its own. Training can take a long time, especially on large data sets, because the algorithm may have to evaluate many possible groupings.
- Requires human intervention: Once the groups are formed, human knowledge is still needed to interpret the patterns and, where appropriate, label them.
- Lower accuracy: Because there are no labels to check the results against, the output isn't always accurate and is harder to validate.
The Advantages of Unsupervised Learning Algorithms
- These algorithms are used to solve very complex problems. Interpreting the results might require domain knowledge, as in the case of medical imaging. However, they can uncover hidden patterns that the human eye might otherwise miss, which could lead to new discoveries and even better treatment methods.
- Data is easily available: Simply put, it is easier to find data without labels than nicely structured, labeled data.
Note: In the next blog post, I'll be delving into clustering, specifically K-Means clustering, its applications, and an example Jupyter notebook showing how to analyse data using this method.