Achieve More with Less Data

Active Learning is a machine learning technique that reduces labeling effort by selecting the most informative data points for a human to label, rather than labeling everything up front. This guide explores its core concepts.

The Efficiency Advantage

The primary motivation for Active Learning is to reduce the cost and time of data labeling, which is often the biggest bottleneck in machine learning projects. This section compares what traditional supervised learning and an active learning approach each require to reach similar model performance.

Traditional Supervised Learning

Requires a large, fully labeled dataset from the start.

Active Learning

Starts small and strategically grows the labeled set.

The Active Learning Cycle

Active Learning is not a one-off process but an iterative loop. The model, the data, and the human expert (the oracle) work in tandem to progressively improve performance. The cycle has four steps, repeated until the labeling budget runs out or performance is good enough.

1. Train Model
2. Query Strategy
3. Oracle Labeling
4. Augment & Retrain
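The four steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not a reference implementation: the data is one-dimensional, the "model" is a single threshold, and the names `oracle`, `train`, `query`, and `active_learning_loop` are hypothetical. The oracle's true boundary of 0.6 is made up for the example and is hidden from the learner.

```python
import random

def oracle(x):
    """Stand-in for the human expert (assumed rule: label 1 when x >= 0.6)."""
    return int(x >= 0.6)

def train(labeled):
    """Fit a 1-D threshold midway between the largest 0 and the smallest 1.

    Assumes the labeled set contains at least one example of each class.
    """
    lo = max(x for x, y in labeled if y == 0)
    hi = min(x for x, y in labeled if y == 1)
    return (lo + hi) / 2

def query(threshold, pool):
    """Pick the unlabeled point closest to the current decision threshold."""
    return min(pool, key=lambda x: abs(x - threshold))

def active_learning_loop(pool, seed_set, rounds):
    labeled = list(seed_set)
    pool = list(pool)
    threshold = train(labeled)            # 1. train model on the seed set
    for _ in range(rounds):
        if not pool:
            break
        x = query(threshold, pool)        # 2. query strategy picks a point
        y = oracle(x)                     # 3. oracle provides the label
        pool.remove(x)
        labeled.append((x, y))            # 4. augment the labeled set...
        threshold = train(labeled)        #    ...and retrain
    return threshold, labeled

rng = random.Random(0)
pool = [rng.random() for _ in range(200)]
seed_set = [(0.0, 0), (1.0, 1)]           # one labeled example per class
threshold, labeled = active_learning_loop(pool, seed_set, rounds=10)
print(f"threshold={threshold:.3f} using {len(labeled)} labels")
```

In this toy setting the learner homes in on the boundary with about a dozen labels instead of labeling all 200 points, because each query is spent where the model is least certain.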

Exploring Query Strategies

The power of Active Learning lies in its ability to select which data to label. This is handled by a "query strategy", a rule that scores unlabeled data points by how informative their labels are expected to be. The most common strategies differ mainly in how they define "informative".
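To make the idea of scoring informativeness concrete, here is a small sketch of three widely used uncertainty-based measures: least confidence, margin, and entropy. The guide itself does not name specific strategies, so treating these three as the examples is an assumption, and the probability vectors in `predictions` are invented for illustration.

```python
import math

def least_confidence(probs):
    """Higher means more uncertain: 1 minus the top class probability."""
    return 1.0 - max(probs)

def margin(probs):
    """Lower means more uncertain: gap between the top two class probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Higher means more uncertain: Shannon entropy of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical class probabilities predicted for three unlabeled points.
predictions = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.50, 0.50, 0.00],
}

by_lc = max(predictions, key=lambda k: least_confidence(predictions[k]))
by_margin = min(predictions, key=lambda k: margin(predictions[k]))
by_entropy = max(predictions, key=lambda k: entropy(predictions[k]))
print(by_lc, by_margin, by_entropy)
```

The three measures need not agree: here margin picks the point whose top two classes are tied, while least confidence and entropy pick the point whose probability is spread across all three classes.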

Common Scenarios

Active Learning can be applied in different settings, depending on how data is accessed and processed. Here are the three main scenarios.

Pool-Based Sampling

This is the most common scenario. The algorithm has access to a large pool of unlabeled data and queries the most informative instances from this pool to be labeled by the oracle.
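A pool-based query can be sketched as ranking the whole pool and sending the top few items to the oracle. The sketch below assumes a binary classifier whose positive-class probabilities are already available; the item ids, the probabilities in `p_positive`, and the helper `select_batch` are all hypothetical.

```python
import heapq

def select_batch(pool, uncertainty, k):
    """Return the k pool items with the highest uncertainty, most uncertain first."""
    return heapq.nlargest(k, pool, key=uncertainty)

# Hypothetical model output: probability of the positive class per unlabeled item.
p_positive = {"doc1": 0.97, "doc2": 0.52, "doc3": 0.10, "doc4": 0.45, "doc5": 0.80}

def uncertainty(item):
    # Binary case: a probability near 0.5 means the model is unsure.
    return 1.0 - 2.0 * abs(p_positive[item] - 0.5)

batch = select_batch(p_positive.keys(), uncertainty, k=2)
print(batch)
```

Because the whole pool is visible, the learner can compare every candidate before spending any labeling budget, which is what distinguishes this scenario from the stream-based one.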

Stream-Based Selective Sampling

Data points arrive one by one in a stream. For each instance, the algorithm must quickly decide whether to query its label or discard it, without the ability to revisit it later.
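The per-instance decision can be sketched as a confidence threshold applied to each arriving point. This is one simple rule among several possible ones, shown under assumptions: `predict_proba` is a hypothetical fixed model, and the `threshold` and `budget` values are arbitrary.

```python
def stream_selective_sampling(stream, predict_proba, threshold, budget):
    """Query a label whenever the model's confidence falls below the threshold."""
    queried = []
    for x in stream:
        if len(queried) >= budget:
            break                          # labeling budget exhausted
        p = predict_proba(x)
        confidence = max(p, 1.0 - p)       # confidence in the predicted class
        if confidence < threshold:
            queried.append(x)              # ask the oracle for this label
        # Otherwise the instance is discarded and never revisited.
    return queried

def predict_proba(x):
    # Hypothetical fixed model: x itself is read as P(positive), clipped to [0, 1].
    return min(1.0, max(0.0, x))

stream = iter([0.05, 0.48, 0.91, 0.55, 0.62, 0.30, 0.51])
queried = stream_selective_sampling(stream, predict_proba, threshold=0.6, budget=3)
print(queried)
```

A fuller version would retrain the model after each new label, so the decision rule sharpens as the stream progresses; that step is left out here to keep the sketch short.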

Membership Query Synthesis

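In this scenario, the learning algorithm synthesizes new data points itself, rather than drawing them from existing data, and asks the oracle to label them. This is powerful but less common in practice, partly because synthesized points can be hard for a human oracle to interpret.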
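A toy case shows why synthesizing queries can be efficient. In the sketch below the learner looks for a one-dimensional decision boundary by bisection, creating each query point itself rather than drawing it from a dataset. The `oracle` function and its boundary at 0.37 are hypothetical and unknown to the learner.

```python
def oracle(x):
    """Hypothetical expert: label 1 when x >= 0.37."""
    return int(x >= 0.37)

def synthesize_queries(lo, hi, n_queries):
    """Estimate the boundary by bisection, assuming oracle(lo) == 0 and oracle(hi) == 1."""
    for _ in range(n_queries):
        x = (lo + hi) / 2          # synthesized query point, not taken from any pool
        if oracle(x) == 1:
            hi = x                 # boundary lies at or below x
        else:
            lo = x                 # boundary lies above x
    return (lo + hi) / 2

estimate = synthesize_queries(0.0, 1.0, n_queries=10)
print(f"estimated boundary: {estimate:.4f}")
```

Ten queries narrow the boundary to an interval of width 1/1024. The catch in real domains such as images or text is that an arbitrary synthesized point may not look like anything a human can label.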