Sample Whitepaper

Oct 01, 2024



Abstract

When we think of machine learning/deep learning models, two techniques come to mind immediately: supervised learning and unsupervised learning. In very simple terms, the main difference between the two approaches is the availability of labeled data: supervised learning has it, and unsupervised learning does not.

Both approaches have their advantages and shortcomings, and each has its fair share of relevance depending on the business use case in question. Over time, scientists have introduced several techniques that offer the flavors of both worlds.

The two most popular such techniques are semi-supervised learning and self-supervised learning. Both were developed, again, to create a "data-efficient" system.

We can say that these are "somewhat" an extension of unsupervised learning, as pointed out by Yann LeCun: "I now call it 'self-supervised learning', because 'unsupervised' is both a loaded and confusing term."

Semi-supervised learning is a machine learning method in which we have input data and only a fraction of it is labeled, i.e. only a few input samples of the dataset are provided with target values.

It is a mix of supervised and unsupervised learning, and it is useful for training models when little labeled data is available. The training process uses a small chunk of labeled data and pseudo-labels the rest of the dataset by learning from the feature representation of the labeled data, as sketched below.
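To make the pseudo-labeling loop concrete, here is a minimal sketch in Python with scikit-learn. The synthetic dataset, the logistic-regression model, the 10% labeled split, and the 0.90 confidence threshold are all illustrative assumptions, not prescriptions from this whitepaper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Simulate a dataset where only ~10% of samples carry labels (assumption).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
labeled_mask = rng.random(len(y)) < 0.10
X_lab, y_lab = X[labeled_mask], y[labeled_mask]
X_unlab = X[~labeled_mask]

# Step 1: train on the small labeled chunk.
model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Step 2: pseudo-label the unlabeled samples, keeping only confident predictions.
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.90                 # confidence threshold (assumption)
pseudo_y = model.classes_[proba.argmax(axis=1)][confident]

# Step 3: retrain on labeled + pseudo-labeled data.
X_combined = np.vstack([X_lab, X_unlab[confident]])
y_combined = np.concatenate([y_lab, pseudo_y])
model = LogisticRegression(max_iter=1000).fit(X_combined, y_combined)
```

In practice the confidence threshold and the number of re-labeling rounds would be tuned per use case; this sketch shows a single round only.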

Self-supervised learning is a machine learning process in which a model trains itself to learn one part of the input from another part of the input. It is also known as predictive or pretext learning.

In the case of pseudo-labeling, we have some labeled data to learn from, but in the case of self-supervised learning we don't have any labeled data, and thus we train the model using methods like contrastive learning.
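As one common formulation of contrastive learning, the sketch below computes an InfoNCE-style loss over embeddings of two augmented "views" of the same inputs: matching rows form positive pairs, and all other rows in the batch act as negatives. The batch size, embedding dimension, and temperature are assumptions for illustration; real pipelines would obtain the embeddings from an encoder applied to augmented images.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same inputs."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize embeddings
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                       # scaled cosine similarities
    # Row i's positive is column i; every other column is a negative.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy usage with random stand-in "embeddings" (assumption).
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
z2 = rng.normal(size=(8, 16))
print(info_nce_loss(z1, z2))
```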

In this process, an unsupervised problem is transformed into a supervised problem by auto-generating labels. To make use of the huge quantity of unlabeled data, it is crucial to set the right learning objectives so that supervision comes from the data itself.

The essence of the self-supervised learning method is to identify any hidden part of the input from any unhidden part of the input. This work tackles the problems surrounding data availability for computer vision (CV) use cases.
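One simple way to auto-generate labels from unlabeled images, in the spirit described above, is a rotation-prediction pretext task: each image is rotated by a random multiple of 90 degrees and the model is asked to predict which rotation was applied. The array shapes and random "images" below are assumptions; this is only an illustrative sketch, not the specific pretext task used in this work.

```python
import numpy as np

def make_rotation_pretext(images, rng):
    """images: (n, H, W) unlabeled images -> (rotated_images, rotation_labels)."""
    rotated, labels = [], []
    for img in images:
        k = int(rng.integers(4))          # 0, 1, 2 or 3 quarter-turns
        rotated.append(np.rot90(img, k))
        labels.append(k)                  # the label is generated from the data itself
    return np.stack(rotated), np.array(labels)

# Toy usage: 100 unlabeled 32x32 "images" become a supervised 4-class problem.
rng = np.random.default_rng(0)
unlabeled = rng.random((100, 32, 32))
X_pretext, y_pretext = make_rotation_pretext(unlabeled, rng)
print(X_pretext.shape, y_pretext[:10])
```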

How do these "learnings" really pan out? Let's consider a simple example. Suppose we have a significant amount of unlabeled data waiting to be labeled for modelling; such labeling tasks require a lot of manual labor, which further increases the overall resources needed.

There are two ways to handle such situations:

To label a small amount of data, use it to train the model, and pseudo-label the remaining data – the semi-supervised approach

To auto-generate labels from the unlabeled data itself (for example via a pretext task or contrastive learning) and then fine-tune on the small labeled set – the self-supervised approach
