Monday, September 16 |
07:00 - 08:45 |
Breakfast ↓ Breakfast is served daily between 7 and 9am in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
08:45 - 09:00 |
Introduction and Welcome by BIRS Staff ↓ A brief introduction to BIRS with important logistical information, technology instruction, and opportunity for participants to ask questions. (TCPL 201) |
09:00 - 09:40 |
Yijuan Hu: Analyzing matched sets of microbiome data using the LDM and PERMANOVA (Presenter: Glen Satten) ↓ Matched data arise frequently in microbiome studies. For example, we may collect samples pre and post treatment from a set of subjects, or matched case-control subjects who were matched on important confounding factors. However, there is a lack of methods to provide both a global test of microbiome effect and tests of individual operational taxonomic units (OTUs) in a unified manner, while accommodating complex data such as those with unbalanced sample sizes per set, confounders varying within a set, and continuous traits of interest. PERMANOVA is a commonly used distance-based method for testing the global hypothesis of any microbiome effect. We have also developed the linear decomposition model (LDM) that includes the global test and tests of individual OTU effects while controlling the false discovery rate (FDR). Here we present a strategy that can be used in the LDM and PERMANOVA for analyzing matched-set data. We propose to include set indicators as covariates so as to constrain comparisons to samples within a set. We also propose to permute covariates within each set, which can account for exchangeable sample correlations. Additionally, the flexible nature of the LDM and PERMANOVA allows discrete or continuous variables (e.g., clinical outcomes) to be tested, within-set confounders to be adjusted, and unbalanced data to be fully exploited. Our simulations indicate that the proposed strategy outperformed alternative strategies in a wide range of scenarios. Using simulation, we also explored optimal designs for matched-set studies. The flexibility of the LDM and PERMANOVA for a variety of matched-set microbiome data is illustrated by the analysis of data from two microbiome studies. (TCPL 201) |
09:40 - 10:20 |
Zhengzheng Tang: Robust and powerful differential composition tests on clustered microbiome data ↓ Clustered microbiome data have become prevalent in recent years from designs such as longitudinal studies, family studies, and matched case-control studies. The within-cluster dependence compounds the challenge of the microbiome data analysis. Methods that properly accommodate intra-cluster correlation and features of the microbiome data are needed. We develop robust and powerful differential composition tests for clustered microbiome data. The methods do not rely on any distributional assumptions on the microbial compositions, which provides flexibility to model various correlation structures among taxa and among samples within a cluster. By leveraging the adjusted sandwich covariance estimate, the methods properly accommodate sample dependence within a cluster. Different types of confounding variables can be easily adjusted for in the methods. We perform extensive simulation studies under commonly-adopted clustered data designs to evaluate the methods. The usefulness of the proposed methods is further demonstrated with a real dataset from a longitudinal microbiome study on pregnant women. (TCPL 201) |
10:20 - 10:50 |
Coffee Break (TCPL Foyer) |
10:50 - 11:30 |
Jun Chen: A permutation framework for robust and powerful differential abundance analysis of microbiome sequencing data (Cancelled) ↓ One central theme of microbiome studies is to identify bacterial taxa/functions associated with some clinical or biological outcome (a.k.a. microbiome biomarker discovery). The discovered microbiome biomarkers can be used for disease diagnosis, prognosis, and treatment selection. Many methods have been proposed for this task, ranging from the simple Wilcoxon rank-sum test to sophisticated zero-inflated parametric models. Due to the excessive zeros, outliers, compositionality and phylogenetic structure in microbiome data, the existing methods are still far from optimal: parametric methods tend to be less robust while non-parametric methods are less powerful. To address the limitations of current approaches, we propose an efficient permutation framework (ZicoSeq) for robust and powerful microbiome biomarker discovery. The method is based on the traditional F-statistic for linear models with a novel posterior sampling step to address zero inflation and sampling variability. A multiple-stage normalization strategy is implemented to control the compositional effects. The framework takes into account the full characteristics of microbiome sequencing data including variable library sizes, the correlations among taxa, the inherent compositionality, and the phylogenetic relatedness of the taxa. An omnibus test is developed to capture various biological effects. By simulations and real data applications, we demonstrate good power and false positive control for the proposed method. (TCPL 201) |
10:50 - 11:30 |
Michael Wu: Testing Associations Between Microbiome and Other Omics Data Types ↓ Joint analysis of microbiome and other genomic data types offers to simultaneously improve power to identify novel associations and elucidate the mechanisms underlying established relationships with outcomes. However, microbiome data are subject to high dimensionality, compositionality, sparsity, phylogenetic constraints, and complexity of relationships among taxa. Combined with the myriad challenges specific to other omics data types, how to conduct integrative analysis continues to pose a grand challenge. To move towards joint analysis, we propose development of methods for identifying individual genomic features and groups of genomic features related to microbiome community structure. Specifically, using kernels to capture microbiome community structure, we develop approaches for rapidly screening genomic features that collectively, marginally or conditionally affect beta diversity. (TCPL 201) |
11:30 - 13:00 |
Lunch ↓ Lunch is served daily between 11:30am and 1:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |
13:00 - 14:00 |
Guided Tour of The Banff Centre ↓ Meet in the Corbett Hall Lounge for a guided tour of The Banff Centre campus. (Corbett Hall Lounge (CH 2110)) |
14:00 - 14:20 |
Group Photo ↓ Meet in the foyer of TCPL to participate in the BIRS group photo. The photograph will be taken outdoors, so dress appropriately for the weather. Please don't be late, or you might not be in the official group photo! (TCPL 201) |
14:20 - 15:00 |
Toby Kenney: Using Stochastic Differential Equations to Model Microbial Dynamics ↓ Most of the research to date on analysis of microbiome data has focussed on the microbial states associated with various conditions. However, understanding the temporal dynamics of the microbiome is also extremely important. Stochastic differential equations (SDEs) are widely used in ecology to describe the dynamics of ecological systems. In this talk, we present two preliminary approaches to modelling microbial dynamics using SDEs.
The first approach looks at modelling the temporal dynamics of an individual OTU using an Ornstein-Uhlenbeck (OU) process, which is based on Brownian motion with mean reversion, meaning that the abundance of the OTU, while fluctuating randomly, is drawn towards a stable level. By comparing the fit of the OU process with that of Brownian motion, we are able to provide evidence confirming the tendency towards mean reversion. By studying the Fisher information matrix for the parameters of the OU process, we are able to determine the accuracy of our modelling for various sampling schemes, and study the best sampling frequency for various systems.
The second approach looks at modelling inter-species interaction between OTUs, using a stochastic version of the generalised Lotka-Volterra equation (GLV). The deterministic GLV equation is widely used in ecology as a simple model for various forms of inter-species interaction. In this work, we study the equation with the addition of a Brownian motion stochastic term to the equation. We prove existence of a solution to this equation, and show that the stochastic process has a stationary distribution with the ergodic property. We show that the use of approximate maximum likelihood to estimate parameters of the equation is consistent, and empirically performs better than using a deterministic differential equation with measurement error. We apply this approach to real data to identify interactions between the most abundant families. (TCPL 202) |
15:00 - 15:30 |
Coffee Break (TCPL Foyer) |
15:30 - 16:10 |
Robert Beiko: Has anyone seen my plasmid? Probing the dark corners of metagenome-assembled genomes ↓ Metagenomic analyses typically produce millions of short reads, sampled from the entire diversity of genomes present in a particular sample. While direct analysis of these reads can yield useful information about the diversity of microorganisms and functions present, a great deal of information can be learned by merging short reads into longer assemblies. Algorithms to reconstruct metagenome-assembled genomes (MAGs) draw from different types of evidence, including the relative abundance of particular reads in a sample, and the similarity of “words” of length k (known as k-mers). Reconstruction of MAGs has shed new light on heretofore unknown deep lineages of bacteria, and revealed the degree of diversity of closely related organisms in different habitats. MAGs can also be very useful for the reconstruction of entire metabolic pathways and networks. However, the effectiveness of MAG assembly is not uniform, and stretches of DNA that deviate from the expected frequency or k-mer distribution can be difficult or impossible to correctly assign. This problem is especially acute in unusual constituents of the genome such as plasmids and genomic islands (GIs); since these elements often harbour useful information about antimicrobial resistance and other important pathways, their absence from a MAG can lead to underestimation of their abundance. We assessed the extent of the problem using a simulated 250 base-pair paired-end metagenome of 30 genomes displaying a broad range of GI abundance and numbers of plasmids. Across a range of methods, a median of 66.2% of all chromosomal sequence was binned into MAGs; however, only 23.1% of plasmids and 31.7% of GIs were similarly present in any bin. 
When assessing the percentage of GIs and plasmids that were correctly assigned to the same bin as the rest of their source genome, the performance is even worse (median 32.5% of GIs and 6.9% of plasmids). These results on a relatively simple simulated community point to (possibly fundamental) limitations of existing methods in assigning exotic elements to their correct source genome. Although further improvements will undoubtedly be realized through better algorithms and statistics, high accuracy may depend on the integration of additional DNA sequencing data, and better use of known reference genomes. (TCPL 202) |
17:30 - 19:30 |
Dinner ↓ A buffet dinner is served daily between 5:30pm and 7:30pm in the Vistas Dining Room, the top floor of the Sally Borden Building. (Vistas Dining Room) |