Tuesday, July 11 |
07:00 - 09:00 |
Breakfast (Vistas Dining Room) |
09:00 - 10:30 |
Applications: Medical and Biological applications (Chair: Richard Lockhart) (TCPL 201) |
09:00 - 09:30 |
Janis Abkowitz: Describing blood cell differentiation with stochastic methods: biological insights ↓ Hematopoiesis is the process by which stem cells that reside in the bone marrow differentiate into mature and functional blood cells. Whether an individual hematopoietic stem cell differentiates, replicates (self-renews), or dies is determined by its unique genetic, epigenetic and environmental inputs. These cues cannot be directly observed. Thus, stem cell fates cannot be suitably predicted deterministically, but are very amenable to stochastic modeling and simulation. The many insights derived from 25 years of collaborative studies with Peter Guttorp will be reviewed. As examples, we showed that hematopoiesis is maintained by the successive contributions of different hematopoietic stem cell clones and that the rate at which hematopoietic stem cells replicate differs greatly among mammalian species (from once per 2.5 to once per 40 weeks) although the numbers of stem cells per animal and numbers of stem cell divisions per lifetime are relatively constant. These insights are crucial for understanding how normal blood cells develop, why diseases like aplastic anemia and leukemia occur, and why close collaborations between biology and statistics are needed to optimize discovery. (TCPL 201) |
09:30 - 10:00 |
Jason Xu: Stochastic Compartmental Modeling and Inference with Biological Applications ↓ Stochastic compartmental models have played a crucial role in statistical studies of biological processes. Inference can be challenging when the data are only partially informative or largely missing, which is almost always the case in experimental and observational studies. We present recent methodology for estimation in challenging missing data settings based on branching process techniques. These methods enable likelihood-based inference in previously intractable settings, but face computational limitations for extremely large systems, which present an open challenge for future work. Applications we consider throughout the talk include molecular epidemiology, hematopoietic lineage tracking, and SIR models of infectious disease from prevalence data. (TCPL 201) |
10:00 - 10:30 |
Vladimir Minin: Statistical analysis of compartmental models: epidemiology, molecular biology, and everything in between (Talk to motivate Breakout B) ↓ Compartmental models describe dynamics of populations, where individuals can be assigned types, and individuals are allowed to switch types as time goes by. I will review statistical challenges that arise when analyzing such models and will highlight how different research communities proposed to tackle these challenges. My main focus will be describing statistics of compartmental models under realistically complicated observation schemes with noisy observations and large fractions of missing data. (TCPL 201) |
10:30 - 11:00 |
Coffee Break (TCPL Foyer) |
11:00 - 12:00 |
Breakout A. Statistical parameter estimation and inference for dynamical models (Moderator: Jennifer Hoeting) (TCPL 202) |
11:00 - 12:00 |
Breakout B. Statistical and computational challenges posed by partially observed compartmental models (Moderator: Vladimir Minin) (TCPL 107) |
11:00 - 12:00 |
Breakout C. Spatio-temporal modeling 1 (Moderators: Michael Stein and Paul Sampson) (TCPL 201) |
12:00 - 13:30 |
Lunch (Vistas Dining Room) |
13:30 - 14:00 |
Vladimir Minin: Report back from Breakout sessions A, B and C (Hoeting; Minin; Stein and Sampson) (TCPL 201) |
14:00 - 15:00 |
Two talks to motivate breakout sessions (Chair: Debashis Mondal) (TCPL 201) |
14:00 - 14:30 |
Alexandra Mello Schmidt: Non-Gaussian processes (motivate breakout D) ↓ In the analysis of most spatio-temporal processes in environmental studies, observations present skewed distributions, with a heavy right or left tail. Usually, a single transformation of the data is used to approximate normality, and stationary Gaussian processes are assumed to model the transformed data. Spatial interpolation and/or temporal prediction are routinely performed by transforming the predictions back to the original scale. The choice of a distribution for the data is key for spatial interpolation and temporal prediction. Initially we will discuss advantages and disadvantages of using a single transformation to model such processes. Then we will focus on some recent advances in the modeling of non-Gaussian spatial and spatio-temporal processes. (TCPL 201) |
14:30 - 15:00 |
Finn Lindgren: A case study in hierarchical space-time modelling (motivate breakout F) ↓ The EUSTACE project "will give publicly available daily estimates of surface air temperature since 1850 across the globe for the first time by combining surface and satellite data using novel statistical techniques." To fulfil this ambitious mission, a spatio-temporal multiscale statistical Gaussian random field model is constructed, with a hierarchy of spatio-temporal dependence structures, ranging from weather on a daily timescale to climate on a multidecadal timescale. Connections between SPDEs and Markov random fields are used to obtain sparse matrices for the practical computation of point estimates, uncertainty estimates, and posterior samples. The extreme size of the problem necessitates the use of iterative solvers, which requires using the multiscale structure of the model to design an effective preconditioner. We raise questions about how to leverage domain specific knowledge and merge traditional statistical techniques with modern numerical methods. (TCPL 201) |
15:00 - 15:30 |
Coffee Break (TCPL Foyer) |
15:30 - 16:30 |
Breakout D. Non-Gaussian processes (Moderator: Alexandra Schmidt) (TCPL 202) |
15:30 - 16:30 |
Breakout E. Spatial point processes (Moderator: Aila Särkkä) (TCPL 107) |
15:30 - 16:30 |
Breakout F. Spatio-temporal modeling 2 (Moderators: Finn Lindgren and Wendy Meiring) (TCPL 201) |
16:30 - 17:00 |
Wendy Meiring: Report back from Breakout sessions D, E and F (Schmidt; Särkkä; Meiring and Lindgren) (TCPL 201) |
17:30 - 19:30 |
Dinner (Vistas Dining Room) |
19:30 - 21:00 |
Poster session (TCPL Lobby) |
19:30 - 19:31 |
Jim Faulkner: Locally Adaptive Spatial Smoothing with Shrinkage Prior Markov Random Fields (Poster) ↓ Non-Gaussian random fields can offer increased flexibility over Gaussian random fields for modeling complex spatial surfaces. We extend the concept of Gaussian Markov random fields (GMRF) to allow the kth-order increments to follow non-Gaussian distributions. In particular, we show that placing shrinkage priors such as the horseshoe prior on the kth-order increments can result in a combination of global smoothing and local control. This fully Bayesian formulation allows adaptation to local changes in smoothness of a surface, including abrupt changes or jumps, without compromising smoothness across the rest of the surface. We call the resulting processes shrinkage prior Markov random fields (SPMRFs). We compare the performance of SPMRFs to GMRFs using simulated data, and show that SPMRF models result in reduced bias and increased precision. We also apply the method to two real spatial data sets. This is joint work with Vladimir N. Minin, University of Washington. (TCPL Lobby) |
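The increment construction in this abstract can be sketched with a short simulation. This is an illustrative first-order (k = 1), one-dimensional version under assumed parameter values, not the authors' implementation; the function name simulate_spmrf_path is made up for the example:

```python
import numpy as np

def simulate_spmrf_path(n, tau=0.1, seed=0):
    """Simulate a 1-D, first-order shrinkage prior Markov random field:
    each increment is Gaussian with a horseshoe-style local scale, so most
    steps are shrunk toward zero while occasional large jumps survive."""
    rng = np.random.default_rng(seed)
    # Horseshoe local scales: half-Cauchy draws (|standard Cauchy|)
    lam = np.abs(rng.standard_cauchy(n - 1))
    # Increments: N(0, (tau * lambda_j)^2); tau is the global scale
    increments = rng.normal(0.0, tau * lam)
    # The field is the cumulative sum of the increments, anchored at 0
    return np.concatenate(([0.0], np.cumsum(increments)))

path = simulate_spmrf_path(200)
```

Sampled paths stay locally flat but admit abrupt jumps wherever a local scale draw is large, which is the local-adaptivity property the poster exploits.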
19:31 - 19:32 |
Shreyan Ganguly: Estimation of Locally Stationary Processes and its Application to Climate Modeling (Poster) ↓ In the analysis of climate data, it is common to build non-stationary spatio-temporal processes, often based on assuming random walk behavior over time for the error process. Random walk models may be a poor description of the temporal dynamics, leading to inaccurate uncertainty quantification. Likewise, assuming stationarity in time may not be reasonable either, especially under climate change. In our ongoing research, we present a class of time-varying autoregressive processes that are stationary in space, but locally stationary in time. We demonstrate how to parameterize the time-varying model parameters in terms of a transformation of basis functions. We present some properties of parameter estimates when the process is observed at a finite collection of spatial locations, and apply our methodology to a spatio-temporal analysis of temperature. (TCPL Lobby) |
19:32 - 19:33 |
Josh Hewitt: Remote effects spatial process models for modeling teleconnections (Poster) ↓ Local factors, like orographic effects and temperature, and processes that create remote dependence, like the El Nino-Southern Oscillation (ENSO) teleconnection, both impact local and regional weather patterns. Many statistical methods, however, can only account for either local or remote covariates when analyzing these data. While existing methods can model important phenomena like rainfall and temperature, improvements are possible by introducing new methods that simultaneously account for the effects of local and remote covariates. We propose a geostatistical model that uses covariates observed on a spatially remote domain to improve locally-driven models of a spatial process. Our model draws on ideas from spatially varying coefficient models, spatial basis functions, and predictive processes to allow several interpretations of effects and to overcome modeling challenges for teleconnections. We adopt a hierarchical Bayesian framework to conduct inference and make predictions, demonstrating how precipitation in Colorado is more accurately modeled by accounting for teleconnection effects with Pacific Ocean sea surface temperatures. We also discuss physical motivations and interpretations for our model. (TCPL Lobby) |
19:33 - 19:34 |
Mikael Kuusela: Locally stationary spatio-temporal interpolation of Argo profiling float data (Poster) ↓ Argo floats measure sea water temperature and salinity in the upper 2 km of the global ocean. The statistical analysis of the resulting spatio-temporal dataset is challenging due to its non-stationary structure and large size. We propose mapping these data using locally stationary Gaussian process regression where covariance parameter estimation and spatio-temporal prediction are carried out in a moving-window fashion. This yields computationally tractable non-stationary anomaly fields without the need to explicitly model the non-stationary covariance structure. The approach also provides local estimates of the ocean covariance parameters which may be of scientific interest in their own right. We demonstrate using a cross-validation study that the approach yields improved point predictions and uncertainty quantification of Argo anomaly fields and study the structure of the estimated spatial and temporal dependence scales. Joint work with Michael Stein. (TCPL Lobby) |
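The moving-window idea in this abstract can be illustrated with a minimal one-dimensional sketch: only observations within a window of the prediction point enter a stationary Gaussian process predictor. The squared-exponential kernel and its parameter values are assumptions for illustration; in the actual method, covariance parameters are re-estimated within each window:

```python
import numpy as np

def local_gp_predict(x_obs, y_obs, x_star, window=1.0, ell=0.5, sigma2=1.0, noise=0.1):
    """Kriging prediction at x_star using only observations within a
    moving window, with a squared-exponential covariance (1-D sketch)."""
    near = np.abs(x_obs - x_star) <= window            # moving-window selection
    xs, ys = x_obs[near], y_obs[near]
    # Local covariance matrix and cross-covariance vector
    K = sigma2 * np.exp(-0.5 * ((xs[:, None] - xs[None, :]) / ell) ** 2)
    k_star = sigma2 * np.exp(-0.5 * ((xs - x_star) / ell) ** 2)
    # Posterior mean: k_*^T (K + noise * I)^{-1} y
    alpha = np.linalg.solve(K + noise * np.eye(len(xs)), ys)
    return float(k_star @ alpha)

x = np.linspace(0.0, 10.0, 50)
pred = local_gp_predict(x, np.sin(x), 5.0)
```

Sliding the window across the domain yields predictions whose effective covariance parameters vary with location, without an explicit global non-stationary model.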
19:34 - 19:35 |
Johnny Paige: A fault in time and space: Spatial models for past and future Cascadia earthquakes (Poster) ↓ Although studies have found substantial risk of a magnitude 9.0 earthquake on the Cascadia Subduction Zone (CSZ) in the next 50 years, few fully likelihood-based spatial models have been developed to explore such events. Many studies, though important steps in the study of Cascadia, use only a handful of predetermined earthquakes to represent the full range of those possible. While Lévy processes have been used to model slips due to their convenient stability properties, they are heavy-tailed to the point of having all moments infinite, which is unrealistic. This work combines paleoseismic subsidence data collected along the US and Canadian west coast with GPS-based fault locking rate estimates over the CSZ megathrust to fit a fully stochastic spatial-statistical model for earthquake slips. To match observations, slips are tapered off in deeper parts of the fault, but the rate of tapering varies as a function of latitude and is estimated empirically. Multiple distributions for Cascadia slips are tested, including normal, truncated (positive) normal, and lognormal distributions, and error inflations for subsidence estimates are computed empirically. We also obtain predictive distributions of past earthquakes and infer how historical earthquake slips may have been distributed across the fault. We find that subsidence data uncertainties are on average higher than reported, and that the normal and truncated normal models fit the data better than the lognormal model. This study demonstrates that historical subsidence and GPS data can be used to help understand variation in earthquake slip across the CSZ, and the inferred historical earthquake slips can help us better understand the range of possible CSZ earthquakes. (TCPL Lobby) |
19:35 - 19:36 |
Trevor Ruiz: Sparse Estimation in High-dimensional Time Series (Poster) ↓ Union of Intersections for Vector Autoregression (UoI-VAR) is a method under development for estimating sparse graphs with both temporal and instantaneous edges. The method is presented with performance results on synthetic datasets, and its application to graphical representation of temporal functional connectivity in neural electrocorticography recordings is considered. The graphical estimation problem is formulated as a regression to which common regularized estimation methods can be applied. While sparse estimation in regression with dependent data is underdeveloped relative to the iid setting, the subject has been studied more recently, including, as a special case, regularized transition matrix estimation in vector autoregressions. The LASSO is the primary regularization method currently used in this context; the UoI-VAR method aims to deliver improved performance for estimation problems involving high-dimensional time series. (TCPL Lobby) |
19:36 - 19:37 |
Max Schneider: Whose Fault Is It Anyway? A Spatially-Varying Parameters Point Process Model of Earthquake Occurrence in the Pacific Northwest (Poster) ↓ A major component of the earthquake risk in the Pacific Northwest comes from onshore faults under population centers. Mitigating and reducing this risk requires earthquake occurrence modeling that not only has optimal fit to earthquake catalogs but also quantifiable measures of uncertainty that can be presented to diverse audiences. Furthermore, characterizing the spatial nonstationarity in earthquake occurrence data is crucial to properly model such a tectonically heterogeneous region. To reach this aim, we implement a series of earthquake occurrence models based on the hierarchical space-time Epidemic Type Aftershock Sequence (HIST-ETAS) method. ETAS is a space-time point process model whose conditional intensity is governed by a set of parameters, such as background seismicity and aftershock productivity, that are directly related to earthquake risk. The HIST-ETAS version allows the estimated parameters to vary over the spatial domain. We combine three catalogs for the region, incorporating error estimates for reported magnitudes and locations. We present preliminary results for several HIST-ETAS models for a subregion centered around the Puget Sound in western Washington. Methods for model comparison and uncertainty quantification (both from the catalog data and model estimations) will be solicited and discussed. (TCPL Lobby) |
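For reference, the conditional intensity underlying this family of models can be written out directly. The sketch below is the standard purely temporal ETAS form with made-up parameter values; it omits the spatial kernels, and HIST-ETAS additionally lets parameters such as the background rate vary over space:

```python
import math

def etas_intensity(t, history, mu=0.1, K=0.05, c=0.01, p=1.1, alpha=1.0, m0=3.0):
    """Temporal ETAS conditional intensity:
    lambda(t) = mu + sum_i K * exp(alpha * (m_i - m0)) * (t - t_i + c)^(-p),
    summing over past events (t_i, m_i): a constant background rate plus
    Omori-law aftershock triggering scaled by each event's magnitude."""
    lam = mu
    for t_i, m_i in history:
        if t_i < t:
            lam += K * math.exp(alpha * (m_i - m0)) * (t - t_i + c) ** (-p)
    return lam

background = etas_intensity(2.0, [])          # just the background rate mu
elevated = etas_intensity(2.0, [(1.9, 5.0)])  # shortly after an M5 event
```

The intensity spikes immediately after each event and decays as a power law, which is how the model captures aftershock clustering.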
19:37 - 19:38 |
Tyler Tucker: Snow-Man: A method of gathering snow and ice time series (Poster) ↓ This poster describes the Snow-Man toolkit for snow and ice coverage calculation and display (SACD) based on the Interactive Multisensor Snow and Ice Mapping System (IMS). Snow-Man generates time series of snow and ice coverage for any area over the Northern Hemisphere from 4 February 1997 to today. IMS is a widely used system for monitoring snow and ice cover and is supported by the National Snow and Ice Data Center. The Tibetan Plateau region is selected as an example to describe the toolkit's method, results, and use. Snow-Man's novel feature calculates areas using the shoelace formula for a sphere projected onto a 2D surface. Its main feature obtains snow coverage for a given day and region by summing the pixel areas reported as snow or ice. The IMS products include 24, 4, and 1 km nominal resolution data over the Northern Hemisphere. Although the current version of Snow-Man utilizes only the 24 and 4 km data sets, recent developments have parsed the data into a MongoDB database for easy querying and scalable storage, which will allow the addition of the 1 km data set. The Tibetan Plateau (TP) region, bounded by 25-45 latitude by 65-105 longitude, is used to demonstrate our work on SACD. Grid areas calculated by Snow-Man were compared with spherical triangle areas and differ by 0.046% for the 24 km grid and 0.033% for the 4 km grid. The differences in the snow-cover area reported by the 24 km and 4 km grids vary between -2.34% and 6.24%. Climate-averaged anomalies were calculated and shown to have a mean of -309.9 km^2, a standard deviation of 342599.71 km^2, a skew of 1.0006, and a kurtosis of 4.351. Further developments for this product include a website coupled with the database, where the data can be obtained easily and without programming experience. (TCPL Lobby) |
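The shoelace (surveyor's) formula mentioned above computes a polygon's area from its ordered vertices. A minimal planar version is sketched below; this is an illustration of the formula itself, not Snow-Man's code, which applies it to grid cells after projecting the sphere to 2D:

```python
def shoelace_area(xs, ys):
    """Area of a simple polygon from ordered vertex coordinates, via the
    shoelace formula: |sum_i (x_i * y_{i+1} - x_{i+1} * y_i)| / 2."""
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0

# Unit square with counter-clockwise vertices -> area 1.0
unit_square = shoelace_area([0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0])
```

Summing such per-pixel areas over all pixels flagged as snow or ice yields the regional coverage totals the poster reports.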