Topological Data Analysis: Developing Abstract Foundations (17w5108)

Arriving in Banff, Alberta on Sunday, July 30 and departing Friday, August 4, 2017


(Technische Universität München)

(Columbia University)


One of the main challenges facing topological data analysis is finding a common working language for research. Important steps have been taken toward common ground among topologists, probabilists, statisticians, computer scientists, and other data scientists, including a mathematical, probabilistic definition of the space of persistence diagrams by Mileyko, Mukherjee & Harer (2012). Turner, Mileyko, Mukherjee & Harer (2012) then define probabilistic quantities such as the expectation, variance, and conditional expectation, thereby laying the foundation for answering questions commonly asked in classical statistics. Building on this work, other important statistical quantities have been studied in the context of persistent homology and topological data analysis, including the construction of sufficient statistics by Turner, Mukherjee & Boyer (2014), a topological summary for data by Bubenik (2014), inference related to dimension reduction and the embedding of higher-dimensional data by Bobrowski, Mukherjee & Taylor (2014), and limiting distributions of topological summaries by Bobrowski & Mukherjee (2013).

While much progress has been made in this direction, a great deal remains to be accomplished, both in theory and in applications. The main aim of this workshop is to build on and develop the abstract foundations of the subject. Just as probability theory is grounded in measure theory and functional analysis, we would like to establish equally rigorous theoretical foundations for topological data analysis, drawing not only on homology and cohomology theory but also on other branches of pure mathematics.

For example, topological persistence itself has been studied rather effectively in terms of commutative algebra and representation theory. The essential difficulty of multi-dimensional persistence, along with possible solutions, becomes clear in this framework; see Carlsson & Zomorodian (2007). More recently, Bubenik & Scott (2014) have recast the theory from a categorical point of view. In their more general framework, even very concrete geometric objects such as merge trees and Reeb graphs can be interpreted as persistence objects and as cosheaves.

Another such example is the parallel between sheaf theory and Gibbs sampling. Gibbs sampling is an important classical statistical technique that allows sampling from general data-generating processes and is motivated by the theory of Markov random fields, under which global joint spatial distributions are determined by local conditional distributions. Statistical sampling theory is particularly important in topological data analysis because the complicated nature of the space of persistence diagrams presents significant challenges in constructing probability distributions for topological quantities. Sheaves are algebraic constructs that follow the same idea of inferring global structure from local information; in particular, they are tools for keeping track of locally defined information associated with the open sets of a topological space. One theoretical foundation that has yet to be established is a sheaf-theoretic construction of Gibbs sampling and Markov random fields. The goal is to construct a theoretical probability distribution for persistence diagrams from which sampling is possible, so that likelihood inference, a cornerstone of statistical inference, may be performed to learn the underlying manifold on which the stochastic process is defined.
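
To make the local-to-global idea concrete, the following is a minimal sketch of Gibbs sampling on a Markov random field. It is purely illustrative and is not part of the proposed sheaf construction: the choice of an Ising model on a periodic grid, the function name `gibbs_ising`, and the parameters `n`, `beta`, `sweeps`, and `seed` are all assumptions made for this example. Each site is resampled from its conditional distribution given only its four neighbours, and repeated sweeps yield approximate samples from the global joint distribution.

```python
import math
import random


def gibbs_ising(n=8, beta=0.4, sweeps=50, seed=0):
    """Gibbs sampler for an n x n Ising model with periodic boundaries.

    Illustrative assumption: joint law P(x) proportional to
    exp(beta * sum over neighbouring pairs of x_i * x_j), x_i in {-1, +1}.
    """
    rng = random.Random(seed)
    # Random initial configuration of spins.
    spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # Local information only: the four nearest neighbours of (i, j).
                s = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                     + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
                # Local conditional: P(x_ij = +1 | neighbours).
                p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * s))
                spins[i][j] = 1 if rng.random() < p_up else -1
    return spins


if __name__ == "__main__":
    sample = gibbs_ising()
    print("\n".join("".join("+" if v == 1 else "-" for v in row) for row in sample))
```

The sampler never evaluates the global joint distribution; it uses only the local conditionals, which is the feature that the analogy with sheaves draws on.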

There are various other parallels between topological persistence and statistical and probabilistic topology that have yet to be linked theoretically in the same manner. The principal purpose of this workshop is to bring together researchers from both sides to develop new foundations that make these parallels explicit and effective.