Achieve More with Less Data
Active Learning is a smart machine learning technique that minimizes labeling effort by intelligently selecting the most informative data points for training. This guide provides an interactive exploration of its core concepts.
The Efficiency Advantage
The primary motivation for Active Learning is to drastically reduce the cost and time associated with data labeling, which is often the biggest bottleneck in machine learning projects. This section visually compares the resources needed in traditional supervised learning versus an active learning approach to achieve similar model performance.
Traditional Supervised Learning
Requires a massive, fully-labeled dataset from the start.
Active Learning
Starts small and strategically grows the labeled set.
The Active Learning Cycle
Active Learning is not a one-off process but an iterative loop. The model, the data, and the human expert (oracle) work in tandem to progressively improve performance. Click on each step in the diagram below to understand its role in this intelligent cycle.
Train Model
Query Strategy
Oracle Labeling
Augment & Retrain
Exploring Query Strategies
The power of Active Learning lies in its ability to intelligently select which data to label. This is handled by a "query strategy". In this section, you can explore some of the most common strategies and interact with simplified visualizations to understand how they decide which data points are the most informative.
Common Scenarios
Active Learning can be applied in different settings, depending on how data is accessed and processed. Here are the three main scenarios.
Pool-Based Sampling
This is the most common scenario. The algorithm has access to a large pool of unlabeled data and queries the most informative instances from this pool to be labeled by the oracle.
Stream-Based Selective Sampling
Data points arrive one by one in a stream. For each instance, the algorithm must quickly decide whether to query its label or discard it, without the ability to revisit it later.
Membership Query Synthesis
In this scenario, the learning algorithm can generate its own new data points from scratch and ask the oracle to label them. This is powerful but less common in practice.