# Schedule for: 21w5120 - Entropic Regularization of Optimal Transport and Applications (Online)

Beginning on Sunday, June 20 and ending Friday, June 25, 2021

All times in Banff, Alberta time, MDT (UTC-6).

Monday, June 21 | |
---|---|

08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (Online) |

09:00 - 09:40 |
Luca Tamanini: Small-time asymptotics of the metric Schrödinger problem ↓ The interpretation of the Schrödinger problem as "noised" optimal transport is by now well established. Several natural questions stem from this perspective, for instance the rate at which many quantities (the optimal value, Schrödinger bridges and potentials...) converge as the noise parameter vanishes. For the optimal value, the works of Erbar-Maas-Renger and Pal provide a first-order Taylor expansion. The first aim of this talk is to improve this result in a twofold sense: from the first to the second order, and from the Euclidean to the Riemannian setting (and actually far beyond). The proof will make clear that the statement is in fact a particular instance of a more general result. For this reason, in the second part of the talk we introduce a large class of dynamical variational problems, extending far beyond the classical Schrödinger problem, and for them we prove $\Gamma$-convergence towards the geodesic problem and a suitable generalization of the second-order Taylor expansion. (based on joint works with G. Conforti, L. Monsaingeon and D. Vorotnikov) (Online) |

09:50 - 10:30 |
Luca Nenna: (Entropic) Optimal Transport in the Grand Canonical ensemble ↓ In this talk I will first review standard Multi-Marginal Optimal Transport (a number N of marginals is fixed), focusing in particular on applications in Quantum Mechanics (in this case the marginals are all the same and represent the electron density). I will then extend the Optimal Transport problem to the grand canonical setting: only the expected number of marginals/electrons is now given (i.e. we can now define an OT problem with a fractional number of marginals). I will compare these two problems and show how they behave differently despite considering the same cost functions. Existence of minimisers, duality, entropic formulation and numerics will be discussed. (Online) |

10:40 - 11:20 |
Young-Heon Kim: Optimal transport in Brownian motion stopping ↓ We consider an optimal transport problem that arises from stopping Brownian motion, started from a given distribution, so as to reach a fixed or free target distribution; the fixed-target case is often called the optimal Skorokhod embedding problem in the literature, a popular topic in mathematical finance pioneered by many people. Our focus is on the case of general dimensions, which has not been well understood. We explain that under certain natural assumptions on the transportation cost, the optimal stopping time is given by the hitting time to a barrier, which is determined by the solution to the dual optimization problem. In the free-target case, the problem is related to the Stefan problem, that is, a free boundary problem for the heat equation. We obtain analytical information on the optimal solutions, including certain BV estimates. The fixed-target case is mainly from joint work with Nassif Ghoussoub and Aaron Palmer at UBC, while the free-target case is recent joint work (in progress) with Inwon Kim at UCLA. (Online) |

11:30 - 12:10 |
Robert McCann: Inscribed radius bounds for lower Ricci bounded metric measure spaces with mean convex boundary ↓ Consider an essentially nonbranching metric measure space with the measure contraction property of Ohta and Sturm. We prove a sharp upper bound on the inscribed radius of any subset whose boundary has a suitably signed lower bound on its generalized mean curvature. This provides a nonsmooth analog of results dating back to Kasue (1983) and subsequent authors. We prove a stability statement concerning such bounds and --- in the Riemannian curvature-dimension (RCD) setting --- characterize the cases of equality. This represents joint work with Annegret Burtscher, Christian Ketterer and Eric Woolgar. (Online) |

14:15 - 14:20 |
Group Photo ↓ Please turn on your cameras for the "group photo" -- a screenshot in Zoom's Gallery view. (Online) |

14:30 - 15:10 |
Yongxin Chen: Graphical Optimal Transport and its Applications ↓ Multi-marginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of the solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy-regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends both optimal transport theory and the theory of probabilistic graphical models, and on the other hand leads to fast algorithms for MOT by leveraging the well-developed algorithms in Bayesian inference. We will cover recent developments of graphical optimal transport in theory and algorithms. We will also go over several applications in aggregate filtering and mean field games. (Online) |

15:20 - 16:20 | Visual Talks: Adolfo Vargas-Jiménez, David Simmons, Axel Turnquist, Johannes Wiesel (Online) |

Tuesday, June 22 | |
---|---|

09:00 - 09:40 |
Gabriel Peyré: Scaling Optimal Transport for High dimensional Learning ↓ Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool to compare probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is however plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions, approximating OT with a better sample complexity. More information and references can be found on the website of our book "Computational Optimal Transport" https://optimaltransport.github.io/ (Online) |

09:50 - 10:30 |
Anna Korba: Wasserstein Proximal Gradient ↓ Wasserstein gradient flows are continuous-time dynamics that define curves of steepest descent to minimize an objective function over the space of probability measures (i.e., the Wasserstein space). This objective is typically a divergence w.r.t. a fixed target distribution. In recent years, these continuous-time dynamics have been used to study the convergence of machine learning algorithms aiming at approximating a probability distribution. However, the discrete-time behavior of these algorithms might differ from the continuous-time dynamics. Besides, although discretized gradient flows have been proposed in the literature, little is known about their minimization power. In this work, we propose a Forward Backward (FB) discretization scheme that can tackle the case where the objective function is the sum of a smooth term and a nonsmooth geodesically convex term. Using techniques from convex optimization and optimal transport, we analyze the FB scheme as a minimization algorithm on the Wasserstein space. More precisely, we show under mild assumptions that the FB scheme has convergence guarantees similar to the proximal gradient algorithm in Euclidean spaces. (Online) |

10:40 - 11:20 |
Jonathan Niles-Weed: Asymptotics for semi-discrete entropic optimal transport ↓ We compute exact second-order asymptotics for the cost of an optimal solution to the entropic optimal transport problem in the continuous-to-discrete, or semi-discrete, setting. In contrast to the discrete-discrete or continuous-continuous case, we show that the first-order term in this expansion vanishes but the second-order term does not, so that in the semi-discrete setting the difference in cost between the unregularized and regularized solution is quadratic in the inverse regularization parameter, with a leading constant that depends explicitly on the value of the density at the points of discontinuity of the optimal unregularized map between the measures. We develop these results by proving new pointwise convergence rates of the solutions to the dual problem, which may be of independent interest. Joint work with J. Altschuler and A. Stromme. (Online) |

11:30 - 12:10 |
Zaid Harchaoui: Schrödinger Bridge with Entropic Regularization: two-sample test, chaos decomposition, and large-sample limits ↓ We consider an entropy-regularized statistic that allows one to compare two data samples drawn from possibly different distributions. The statistic admits an expression as a weighted average of Monge couplings with respect to a Gibbs measure. This coupling can be related to the static Schrödinger bridge for a finite number of particles. We establish the asymptotic consistency of the statistic as the sample sizes go to infinity and show that the population limit is the solution of Föllmer's entropy-regularized optimal transport. The proof technique relies on a chaos decomposition for paired samples. This is joint work with Lang Liu and Soumik Pal. (Online) |

14:30 - 15:10 |
Promit Ghosal: Geometry and large deviation of entropic optimal transport ↓ Optimal transport (OT) theory has flourished due to its connections with geometry, analysis, probability theory, and other fields in mathematics. Renewed interest in OT stems from applied fields such as machine learning, image processing and statistics through the introduction of entropic regularization. In this talk, we will discuss the convergence of entropically regularized optimal transport. Our first result is a large deviation principle for the associated optimizers in entropic OT; the second concerns the stability of the optimizers under weak convergence. To prove these results, we will introduce a new notion called 'cyclical invariance' of measures. This is joint work with Marcel Nutz and Espen Bernton. (Online) |

15:20 - 16:20 | Visual Talks: Tobias Schroeder, Lang Liu, Matthieu Heitz, Fang Han (Online) |

Wednesday, June 23 | |
---|---|

09:00 - 09:40 |
Beatrice Acciaio: PQ-GAN: a market generation model consistent with observed spot prices and derivative prices ↓ In this talk I will present a model for market generation that is consistent with both the observed spot prices and the market prices of derivatives. The structure used to learn the evolution of the asset prices (under the real-world measure P) is that of a conditional GAN for time series generation, that uses causal optimal transport in the training objective. On the other hand, the derivative prices are used to learn the change of measure from P to the pricing measure Q. This talk is based on a joint work with F. Krach. (Online) |

09:50 - 10:30 |
Alfred Galichon: Dynamic Matching Problems (joint w Pauline Corblet and Jeremy Fox) ↓ For the purposes of economics applications, we formulate a class of dynamic matching problems. We focus in particular on the stationary case and investigate computation and estimation issues. (Online) |

10:40 - 11:20 |
Ting-Kam Leonard Wong: Logarithmic divergences and statistical applications ↓ We consider the Dirichlet optimal transport which is a multiplicative analogue of the Wasserstein transport and is deeply connected to the Dirichlet distribution. The log-likelihood of this distribution defines a logarithmic divergence, in the same way that the square loss comes from the normal distribution. Using this divergence, which can be extended to a family of generalized exponential families, we consider statistical methodologies including clustering and nonlinear principal component analysis. Our approach extends a well-known duality between exponential family and Bregman divergence. Joint work with Zhixu Tao, Jiaowen Yang and Jun Zhang. (Online) |

11:30 - 12:00 |
Giovanni Conforti: Hamilton Jacobi equations for controlled gradient flows: the comparison principle ↓ This talk is devoted to the study of a class of Hamilton-Jacobi equations on the space of probability measures that arises naturally in connection with the study of a general form of the Schrödinger problem for interacting particle systems. After presenting the equations and their geometrical interpretation, I will move on to illustrate the main ideas behind a general strategy to prove uniqueness of viscosity solutions, i.e. the comparison principle. Joint work with D. Tonon (U. Padova) and R. Kraaij (TU Delft). (Online) |

12:00 - 12:15 | MITACS Presentation (Online) |

14:30 - 16:20 | Gathertown (social gathering) (Online) |

Thursday, June 24 | |
---|---|

09:00 - 09:40 |
Martin Huesmann: Fluctuations in the optimal matching problems ↓ The optimal matching problem is one of the classical random optimization problems. While the asymptotic behavior of the expected cost is well understood, little is known about the asymptotic behavior of the optimal couplings - the solutions to the optimal matching problem. In this talk we show that at all mesoscopic scales the displacement under the optimal coupling converges in suitable Sobolev spaces to a Gaussian field which can be identified as the curl-free part of a vector Gaussian free field. (based on joint work with Michael Goldman) (Online) |

09:50 - 10:30 |
Mathias Beiglböck: The Wasserstein space of stochastic processes ↓ Wasserstein distance induces a natural Riemannian structure for the probabilities on the Euclidean space. This insight of classical transport theory is fundamental for tremendous applications in various fields of pure and applied mathematics. We believe that an appropriate probabilistic variant, the adapted Wasserstein distance AW, can play a similar role for the class FP of filtered processes, i.e. stochastic processes together with a filtration. In contrast to other topologies for stochastic processes, probabilistic operations such as the Doob-decomposition, optimal stopping and stochastic control are continuous w.r.t. AW. We also show that (FP,AW) is a geodesic space, isometric to a classical Wasserstein space, and that martingales form a closed geodesically convex subspace. (Online) |

10:40 - 11:20 |
Anna Kausamo: Multi-marginal entropy-regularized optimal transportation for singular cost functions ↓ I will introduce multi-marginal optimal transportation (MOT) for singular cost functions and mention some of its applications. Then I move on to the entropy-regularized framework, focusing on the Gamma-convergence proof that the regularized minimizers of the singular MOT problem converge to a non-regularized solution as the regularization parameter goes to zero. When one goes from two to many marginals, and from attractive to singular cost functions, different levels of difficulty are introduced. One of the aims of my talk is to show how these difficulties can be tackled. (Online) |

14:30 - 15:10 |
Geoffrey Schiebinger: Towards a mathematical theory of development ↓ New measurement technologies like single-cell RNA sequencing are bringing 'big data' to biology. My group develops mathematical tools for analyzing time-courses of high-dimensional gene expression data, leveraging tools from probability and optimal transport. We aim to develop a mathematical theory to answer questions such as: How does a stem cell transform into a muscle cell, a skin cell, or a neuron? How can we reprogram a skin cell into a neuron? We model a developing population of cells with a curve in the space of probability distributions on a high-dimensional gene expression space. We design algorithms to recover these curves from samples at various time-points, and we collaborate closely with experimentalists to test these ideas on real data. (Online) |

15:20 - 16:20 |
Open problem discussion ↓ Discussion to be held in the main Zoom meeting room (Online) |

Friday, June 25 | |
---|---|

09:00 - 09:40 |
Max von Renesse: On Overrelaxation in the Sinkhorn Algorithm ↓ We discuss a simple but potent modification of the Sinkhorn algorithm based on overrelaxation. We provide an a priori estimate for the crucial overrelaxation parameter which guarantees both global and improved local convergence. (Online) |

09:50 - 10:30 |
Flavien Léger: Taylor expansions for the regularized optimal transport problem ↓ We prove Taylor expansions of the regularized optimal transport problem with general cost as the temperature goes to zero.
Our first contribution is a multivariate Laplace expansion formula. We show that the first-order terms involve the scalar curvature in the corresponding Hessian geometry.
We then obtain:
- first-order expansion of the potentials;
- second-order expansion of the optimal transport value.
Joint work with Pierre Roussillon, François-Xavier Vialard and Gabriel Peyré. (Online) |

10:40 - 11:20 |
Yunan Yang: Optimal transport-based objective function for physical inverse problems ↓ We have proposed the quadratic Wasserstein distance from optimal transport theory for inverse problems, including nonlinear medium reconstruction in waveform inversion and parameter identification for chaotic dynamical systems. Traditional methods for both applications suffer from longstanding difficulties such as nonconvexity and noise sensitivity. We have since discovered that the advantages of optimal transport-based metrics apply in a broader class of data-fitting problems where the continuous dependence between the parameter and the data involves a change in the phase or support of the data. The implicit regularization effect of the Wasserstein distance, similar to that of a weak norm, also helps improve the stability of parameter identification. (Online) |
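In one dimension the quadratic Wasserstein distance between two equal-size samples reduces to a sorted pairing, which makes the favorable behavior under data-phase shifts easy to verify numerically. A minimal sketch (the function name and data are illustrative assumptions, not from the talk):

```python
import numpy as np

def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between equal-size 1D empirical samples.

    In 1D the optimal coupling is monotone, so it suffices to sort both
    samples and pair them in order."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean((xs - ys) ** 2)

# translating a signal by s changes W2^2 by exactly s^2, growing smoothly
# with the shift, whereas an L2 misfit between the corresponding densities
# saturates once the supports stop overlapping
x = np.random.default_rng(0).normal(0.0, 1.0, 1000)
shifted_cost = w2_squared_1d(x, x + 2.0)   # ~ 4.0
```

This smooth, convex dependence on translations is one reason Wasserstein-type objectives mitigate the nonconvexity issues mentioned above.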

11:30 - 12:10 |
Katy Craig: A blob method for diffusion and applications to sampling and two layer neural networks. ↓ Given a desired target distribution and an initial guess of that distribution, composed of finitely many samples, what is the best way to evolve the locations of the samples so that they more accurately represent the desired distribution? A classical solution to this problem is to allow the samples to evolve according to Langevin dynamics, the stochastic particle method corresponding to the Fokker-Planck equation. In today’s talk, I will contrast this classical approach with a deterministic particle method corresponding to the porous medium equation. This method corresponds exactly to the mean-field dynamics of training a two layer neural network for a radial basis function activation function. We prove that, as the number of samples increases and the variance of the radial basis function goes to zero, the particle method converges to a bounded entropy solution of the porous medium equation. As a consequence, we obtain both a novel method for sampling probability distributions as well as insight into the training dynamics of two layer neural networks in the mean field regime. This is joint work with Karthik Elamvazhuthi (UCLA), Matt Haberland (Cal Poly), and Olga Turanova (Michigan State). (Online) |