LIVI is a probabilistic model for single-cell RNA-seq data collected from a large population of individuals. At its core, LIVI builds on variational autoencoders (VAEs), employing structured linear decoders to decompose observed variation in single-cell expression into cell-state variation, donor-driven variation, and their interaction. The resulting model has properties that resemble classical factor analysis: the decoder is a factor loadings matrix rather than a neural network with non-linear activations.
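To make the factor-analysis analogy concrete, here is a toy sketch of an additive linear decoder with three terms. Everything in it is an assumption for illustration: the names (`linear_decode`, `W_cell`, `W_donor`, `A`, `W_inter`), the shapes, and in particular the form of the interaction term (each donor factor gated by a linear readout of the cell state) are hypothetical and are not taken from the LIVI code.

```python
# Toy sketch only: names, shapes, and the interaction form are assumptions,
# not LIVI's actual parameterization.

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def vec_mat(v, M):
    """Row vector v (length k) times matrix M (k x g) -> list of length g."""
    return [dot(v, col) for col in zip(*M)]

def linear_decode(z_cell, d_donor, W_cell, W_donor, A, W_inter):
    """Return the linear predictor for one cell and its three additive parts."""
    cell_term = vec_mat(z_cell, W_cell)      # cell-state variation
    donor_term = vec_mat(d_donor, W_donor)   # donor-driven variation
    # Hypothetical interaction: donor factor j is scaled by how strongly
    # the cell occupies the cell states selected by row A[j].
    gated = [d_j * dot(z_cell, a_j) for d_j, a_j in zip(d_donor, A)]
    inter_term = vec_mat(gated, W_inter)     # donor x cell-state interaction
    total = [c + d + i for c, d, i in zip(cell_term, donor_term, inter_term)]
    return total, cell_term, donor_term, inter_term

# Two cell-state factors, one donor factor, three genes.
W_cell = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
W_donor = [[0.0, 0.0, 1.0]]
A = [[1.0, 0.0]]             # the donor factor acts only in cell state 1
W_inter = [[0.5, 0.0, 0.0]]

total, cell_term, donor_term, inter_term = linear_decode(
    z_cell=[1.0, 0.0], d_donor=[2.0],
    W_cell=W_cell, W_donor=W_donor, A=A, W_inter=W_inter,
)
```

Because every term is linear in its loadings, each row of a loadings matrix can be read as a gene program, which is the interpretability property the paragraph above attributes to factor analysis.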
Once trained, LIVI enables efficient donor-level association testing, while retaining single-cell resolution and interpretation. Because donor latent factors are inferred without information on specific donor-level characteristics, such as SNP genotypes, they can be used as quantitative phenotypes to test for genetic effects without the risk of circularity. Following association testing at the donor level, the discovered effects can be projected back onto single cells via LIVI's latent donor-cell-state interaction model.
Install dependencies
# clone project
git clone https://github.com/PMBio/LIVI
cd LIVI
# [OPTIONAL] create conda environment
conda create -n LIVIenv python=3.11
conda activate LIVIenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
Train model with chosen experiment configuration from configs/experiment/
python src/train.py experiment=experiment_name.yaml
Train model on CPU/GPU
# train on CPU
python src/train.py trainer=cpu
# train on GPU
python src/train.py trainer=gpu
You can override any parameter from the command line like this
python src/train.py trainer.max_epochs=100 datamodule.batch_size=528
The following command performs inference on the gene expression data stored in --adata, using the "best" model checkpoint stored under --model_run_dir. It then runs association testing between the inferred donor factors and the SNP genotypes in --genotype_matrix (the prefix of the .bed, .bim, .fam PLINK files), using a linear mixed model (LMM) that accounts for the covariates (e.g. expression PCs) specified under --covariates and the population structure specified under --kinship. Output files are saved under -od.
For a full list of options please run python src/analysis/livi_analysis.py --h.
python src/analysis/livi_analysis.py \
--model_run_dir /path/to/model/checkpoints/ \
--adata /path/to/adata.h5ad \
--celltype_column CELLTYPE_COLUMN \
--individual_column INDIVIDUAL_COLUMN \
--covariates /path/to/association/testing/covariate_file.tsv \
--fdr_threshold FDR \
--genotype_matrix /path/to/PLINK/genotype/matrix --plink \
--method LMM \
--kinship /path/to/Kinship_matrix.tsv \
-od /path/to/output/directory
Examples of downstream interpretation and plotting of association testing results can be found in https://github.com/danaivagiaki/LIVI_analyses
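The --fdr_threshold option in the command above implies that p-values from many factor-SNP tests are corrected for multiple testing. As a minimal sketch of what such a correction can look like, the snippet below implements the Benjamini-Hochberg procedure in plain Python. This is an assumption: the source does not state which correction livi_analysis.py applies, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
# Sketch under assumptions: Benjamini-Hochberg is a common FDR procedure,
# but it is not confirmed to be the one used by livi_analysis.py.

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four (donor factor, SNP) pairs.
pvals = [0.01, 0.04, 0.03, 0.20]
fdr_threshold = 0.05
adjusted = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(adjusted) if q <= fdr_threshold]
```

With these made-up inputs only the first pair passes the 0.05 threshold, although three of the four raw p-values are below 0.05, which illustrates why the threshold is applied to adjusted rather than raw values.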