Variable Selection under Logistic Regression for Compositional Functional Data


The gut microbiome is closely related to human health. Over the course of a study, researchers often collect multiple samples for sequencing and identifying the microbiome, which naturally yields a set of trajectories describing this dynamic ecosystem. In this paper we propose a logistic model for high-dimensional functional compositional data in order to analyze the relationship between the gut microbiome and the colonization status of multi-drug resistant bacteria (MDRB) after liver transplant operations. The proposed model is based on the linear log-contrast model for compositional data, with an extension: it incorporates both scalar and functional covariates for greater flexibility. A set of basis functions is chosen to perform a low-rank approximation of both the functional covariates and their corresponding functional coefficients. In this way we achieve dimension reduction for the infinite-dimensional functions, and the functional variable selection problem takes the form of group-wise variable selection. The resulting model is a logistic regression with group-wise selection, restricted to an affine subspace. We develop an algorithm based on the MM principle to solve this problem and establish its convergence. We also give the statistical properties of the estimators for several penalty functions. Finally, the proposed method is used to study the relationship between MDRB status and the gut microbiome of patients before and after liver transplant operations. The analysis is conducted at different biological levels and with different variable selection approaches, and the selected variables are consistent across these levels, suggesting that the proposed method is promising for such studies.
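To make the structure described above concrete, the following is a minimal sketch of how such a model could be written. The notation is assumed for illustration and is not taken from the paper: $y_i$ is the binary MDRB status, $z_i$ the scalar covariates, $x_{ij}(t)$ the $j$-th compositional trajectory, $\phi(t)$ a vector of $K$ basis functions, and the zero-sum constraint is the usual log-contrast constraint, used here as one plausible reading of the constraint the abstract mentions.

```latex
% Functional log-contrast logistic model (assumed notation)
\operatorname{logit} P(y_i = 1)
  = z_i^{\top}\gamma + \sum_{j=1}^{p} \int \beta_j(t)\,\log x_{ij}(t)\,dt,
\qquad \sum_{j=1}^{p} \beta_j(t) = 0 \ \text{for all } t.

% Low-rank basis expansion of each functional coefficient
\beta_j(t) \approx \phi(t)^{\top} b_j, \quad b_j \in \mathbb{R}^{K},
\qquad
w_{ij} = \int \phi(t)\,\log x_{ij}(t)\,dt,

% so each integral reduces to an inner product with one coefficient group
\int \beta_j(t)\,\log x_{ij}(t)\,dt \approx w_{ij}^{\top} b_j,
\qquad \sum_{j=1}^{p} b_j = 0.

% Group-wise penalized estimation: \ell is the logistic log-likelihood,
% P_\lambda a group penalty (e.g. group lasso, P_\lambda(u) = \lambda u)
\min_{\gamma,\, b_1,\dots,b_p}\;
  -\ell(\gamma, b_1,\dots,b_p) + \sum_{j=1}^{p} P_{\lambda}\bigl(\lVert b_j \rVert_2\bigr)
\quad \text{subject to} \quad \sum_{j=1}^{p} b_j = 0.
```

Under this sketch, setting a whole group $b_j$ to zero removes the $j$-th functional covariate from the model, which is why functional variable selection reduces to group-wise selection on a constrained set.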

Jun 11, 2022 08:20 — Jun 12, 2022 18:00
Chao Cheng

My research interests include applied statistics and machine learning.