There are two broad kinds of ML problems: supervised and unsupervised learning.
- Supervised Learning – each record has both features and a label (X and Y)
    - The goal is to predict Y from X
- Unsupervised Learning – there are only features (X), no labels
    - The goal is to “understand” or make sense of the data
    - Often this means clustering the data points into groups (a small contrast sketch follows below).
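To make the distinction concrete, here is a minimal sketch with scikit-learn (the toy data is an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.1, 1.9], [8.0, 9.0], [8.2, 9.1]])
y = np.array([0, 0, 1, 1])            # labels (Y) exist: supervised

# Supervised: learn to predict Y from X.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.05, 2.05]]))    # -> [0]

# Unsupervised: no Y; just group similar points.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                     # cluster assignments, not labels
```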
The most widely used case is supervised learning because it’s direct and, in a business use case, there is usually something concrete we want to predict. However, collecting labels is often quite expensive, so there are many labeling techniques to lower the cost of obtaining them:
- Self-supervised learning
    - The label is in the raw data.
    - This is strange if you think about it: if the label is already in the data, why try to predict something we already know? Actually there can be good reasons. One example is learning an embedding of the data.
    - Ex: BERT – we predict masked words and next-sentence relationships to build word embeddings (a masked-word sketch follows below)
    - Ex: Audio/Speech recognition (paper: Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?)
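To make the BERT example concrete, here is a minimal sketch of the masked-word pretext task using the Hugging Face transformers library (the model name and output fields assume a recent transformers version):

```python
from transformers import pipeline

# BERT was pretrained by predicting masked words; the "label" (the
# original word) comes from the raw text itself, so no human labeling
# is needed.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("Paris is the [MASK] of France."):
    print(f"{pred['token_str']!r} (score={pred['score']:.3f})")
# Expected top prediction: 'capital'
```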
- Semi-supervised learning
    - Transductive label propagation (see the sketch after this list)
        - Nearest-neighbor graph based on embeddings
        - Idea: similar inputs should produce similar targets
        - Paper: Iscen, A., Tolias, G., Avrithis, Y., & Chum, O. (2019). Label propagation for deep semi-supervised learning
    - Assumptions (need at least one of the below)
        - Continuity – the decision boundary should be relatively simple
        - Cluster – data points in feature space form clusters, and points in the same cluster are likely to have the same label
        - Manifold – data points lie on a manifold of much lower dimension. In other words, the actual data distribution has far fewer degrees of freedom than the number of dimensions of the feature space (ex: natural images)
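A minimal sketch of label propagation over a nearest-neighbor graph, using scikit-learn’s LabelSpreading (the toy dataset and hyperparameters are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Toy dataset: 200 points, but pretend we only labeled 10 of them.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)
y = np.full_like(y_true, -1)          # -1 marks unlabeled points
labeled_idx = np.random.RandomState(0).choice(len(X), size=10, replace=False)
y[labeled_idx] = y_true[labeled_idx]

# Propagate labels along a kNN graph: nearby points get similar labels
# (the cluster/manifold assumptions above).
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y)

acc = (model.transduction_ == y_true).mean()
print(f"Transductive accuracy with 10 labels: {acc:.2%}")
```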
- Active learning
    - The learning algorithm asks for data points to be labeled in an interactive fashion
    - It’s an orthogonal concept to semi-supervised learning because it can be applied in both supervised and semi-supervised settings.
    - Query strategies (a margin-sampling sketch follows this list)
        - Margin sampling
        - Cluster-based
        - Query by committee
        - Region-based sampling
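As an example of a query strategy, here is a minimal sketch of margin sampling (the classifier and pool are hypothetical; any model exposing predict_proba would do):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def margin_sampling(model, X_pool, k=10):
    """Pick the k pool points the model is least sure about.

    Margin = p(top class) - p(second class); a small margin means the
    model can barely separate its top two choices, so a human label
    there is most informative.
    """
    proba = np.sort(model.predict_proba(X_pool), axis=1)
    margins = proba[:, -1] - proba[:, -2]
    return np.argsort(margins)[:k]     # indices to send for labeling

# Usage sketch: fit on the small labeled set, query the pool,
# label the selected points, and repeat.
# model = LogisticRegression().fit(X_labeled, y_labeled)
# query_idx = margin_sampling(model, X_pool, k=10)
```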
- Weak supervision
    - Learning from noisy labels
    - Weak labels
        - Global statistics from groups: means of subsets of samples
        - Weak classifiers: heuristics from subject-matter experts
    - Typically two models: a generative model that combines the noisy weak labels into probabilistic labels, then a discriminative end model trained on them
    - Applications: Snorkel (Stanford) and Cleanlab (MIT)
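A minimal sketch of the Snorkel-style workflow: expert heuristics become labeling functions, a generative LabelModel denoises their votes, and a discriminative model trains on the result (the data, labels, and heuristics here are assumptions; check Snorkel’s docs for the exact API of your version):

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1

# Weak classifiers: cheap heuristics from a subject-matter expert.
@labeling_function()
def lf_contains_link(x):
    return SPAM if "http" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    return HAM if len(x.text.split()) < 5 else ABSTAIN

df = pd.DataFrame({"text": ["Win money http://spam.example", "see you soon"]})

# Generative step: combine the noisy votes into probabilistic labels.
L_train = PandasLFApplier(lfs=[lf_contains_link, lf_short_message]).apply(df)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=100)
probs = label_model.predict_proba(L_train)

# Discriminative step (not shown): train any classifier on `probs`
# as soft labels over the raw text features.
print(probs)
```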