What is Optimal Transport?
This section introduces the core idea behind Optimal Transport (OT) using a simple, intuitive analogy. Imagine you have a pile of dirt and need to move it to fill a hole. Optimal Transport is the mathematical framework for finding the most efficient plan to move the dirt, minimizing the total work required, where the work for each shovelful is the amount of mass moved times the distance it travels. The same idea can be used to compare and transform data distributions.
The Earth Mover's Analogy
Use the slider to transport the "mass" from the source to the target. Observe how the total cost accumulates based on the amount of mass moved and the distance it travels.
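To make the arithmetic behind the demo concrete, here is a minimal sketch in NumPy of how a transport plan's total cost is computed. The pile positions, masses, and plan below are illustrative stand-ins, not the actual data driving the slider above.

```python
import numpy as np

# Three piles of dirt (source) and two holes (target), on a line.
# Positions and masses are illustrative values.
source_pos = np.array([0.0, 1.0, 2.0])   # where each pile sits
source_mass = np.array([0.4, 0.4, 0.2])  # how much dirt in each pile
target_pos = np.array([3.0, 4.0])        # where each hole sits
target_mass = np.array([0.5, 0.5])       # how much dirt each hole needs

# Cost of moving one unit of mass from pile i to hole j: the distance.
C = np.abs(source_pos[:, None] - target_pos[None, :])

# One possible (not necessarily optimal) transport plan:
# P[i, j] = mass moved from pile i to hole j.
P = np.array([[0.4, 0.0],
              [0.1, 0.3],
              [0.0, 0.2]])

# Rows must ship out each pile's mass; columns must fill each hole.
assert np.allclose(P.sum(axis=1), source_mass)
assert np.allclose(P.sum(axis=0), target_mass)

# Total work = sum over all routes of (mass moved) x (distance traveled).
total_cost = np.sum(P * C)
print(f"Total cost of this plan: {total_cost:.2f}")
```

Optimal Transport searches over all plans satisfying those row and column constraints for the one with the smallest total cost.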
The Core Problem: Comparing Distributions
In data science, we often need to compare datasets or probability distributions. Optimal Transport provides a principled way to measure the distance between them, one that stays meaningful even when the distributions barely overlap, because it accounts for how far mass must be moved. This section visualizes the kind of problem OT aims to solve: finding an optimal matching between a source distribution and a target distribution. Select different scenarios to see how the distributions can vary.
Source Distribution (μ)
Target Distribution (ν)
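For a concrete sense of this distance, the sketch below uses SciPy's wasserstein_distance to compare two illustrative 1-D samples standing in for μ and ν; the sample sizes and parameters are assumptions chosen for the example, not the scenarios shown above.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Illustrative 1-D samples standing in for a source (mu) and target (nu).
mu_samples = rng.normal(loc=0.0, scale=1.0, size=1000)   # source
nu_samples = rng.normal(loc=2.0, scale=1.5, size=1000)   # target

# The Wasserstein-1 distance between the two empirical distributions.
d = wasserstein_distance(mu_samples, nu_samples)
print(f"W1(mu, nu) ~= {d:.3f}")  # grows as the distributions drift apart
```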
The Solution: A Transport Plan
Optimal Transport finds a "transport plan" that specifies how much mass to move from each part of the source to each part of the target so that the total cost is minimized. The visualization below shows this plan as connections between the two distributions. The cost of the optimal plan defines the Wasserstein distance, also called the Earth Mover's Distance: a true metric for comparing distributions.
Visualizing the Optimal Transport Plan
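In practice, both the optimal plan and its cost can be computed with the POT (Python Optimal Transport) library. The sketch below assumes `pip install pot` and uses small, randomly generated point clouds in place of the distributions visualized above.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)

# Illustrative 2-D point clouds for the source and target.
xs = rng.normal(size=(5, 2))          # source support points
xt = rng.normal(size=(6, 2)) + 3.0    # target support points
a = np.full(5, 1 / 5)                 # uniform mass on the source
b = np.full(6, 1 / 6)                 # uniform mass on the target

# Pairwise ground costs (squared Euclidean by default).
M = ot.dist(xs, xt)

# Optimal transport plan: P[i, j] = mass sent from xs[i] to xt[j].
P = ot.emd(a, b, M)

# The optimal total cost under this ground cost.
cost = ot.emd2(a, b, M)
print(P.round(3))
print(f"Optimal cost: {cost:.3f}")
```

Each nonzero entry of P corresponds to one of the connections drawn in the visualization, weighted by the mass it carries.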
Applications in Data Science
Optimal Transport is not just a theoretical concept; it's a practical tool used to solve real-world problems in machine learning and data analysis. This section explores some key applications. Click through the tabs to see how OT is used for generating new data, adapting models to new domains, and more.
Improving Generative Models
Wasserstein GANs (WGANs) replace the standard GAN loss with an estimate of the Wasserstein distance between the real and generated distributions. Because this distance supplies useful gradients even when the two distributions do not overlap, it stabilizes the training of generative models, helping them create more realistic, higher-quality data, such as images. The animation shows a generated distribution (orange) iteratively learning to match the real data distribution (teal) by minimizing the OT distance.
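The sketch below illustrates one WGAN training step in PyTorch, using the original weight-clipping formulation; the tiny networks, 2-D data, and hyperparameters are illustrative assumptions, not the setup behind the animation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny critic and generator; real WGANs use much larger networks.
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))

opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

real = torch.randn(128, 2) + 2.0   # stand-in for real data (teal)
noise = torch.randn(128, 8)
fake = generator(noise)            # generated data (orange)

# Critic step: widen the gap between scores on real and fake samples.
# This gap estimates the Wasserstein-1 distance (Kantorovich duality).
critic_loss = -(critic(real).mean() - critic(fake.detach()).mean())
opt_c.zero_grad()
critic_loss.backward()
opt_c.step()

# Weight clipping enforces the Lipschitz constraint required by the
# duality (original WGAN; later variants use a gradient penalty).
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)

# Generator step: move the generated distribution toward the real one
# by raising the critic's score on fake samples.
gen_loss = -critic(generator(noise)).mean()
opt_g.zero_grad()
gen_loss.backward()
opt_g.step()
```

Repeating these two steps (typically several critic updates per generator update) is what drives the orange distribution in the animation toward the teal one.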