This page was generated from docs/tutorial/R/imputation_scRNAseq.ipynb. Interactive online version:

Imputation of scRNA-seq data#

Data#

In this R Notebook, we illustrate how to use scVAEIT on R through scRNA-seq data. These are count data that can be modeled by (zero-inflated) Negative Binomial distribution generally. Hypothetically, the zeros may or may not be missing; unless when integrating multiple scRNA-seq datasets, we would know which entries are missing for sure. Below we use generate missing values to illusrate the usage of scVAEIT.

The data we use is a subset of the DOGMAseq dataset, containing B cells.

[1]:

load("DOGMAseq-RNA.RData")

[2]:

ls()

'data'

[3]:

data[1:5,1:5]

A data.frame: 5 × 5
	X1	X3	X4	X5	X6
	<dbl>	<dbl>	<dbl>	<dbl>	<dbl>
22	0	0	0	0	0
32	0	0	0	0	0
62	0	0	0	0	0
75	0	0	0	0	0
77	0	0	0	0	0

[4]:

dim(data)

613
2166

[5]:

n <- nrow(data); p <- ncol(data)
data <- as.matrix(data, n, p)
data <- data[, colSums(data)>0]; # remove all-0 columns

comp.rna <- data
ij.na <- matrix(rbinom(n*p,prob=0.3,size=1),nrow=n) == 1
raw.rna <- comp.rna
raw.rna[ij.na] <- NA

# prepare input data for VAEIT
data <- raw.rna; data[is.na(data)] <- 0.;
masks <- - as.matrix(is.na(as.matrix(raw.rna)))

Loading Python Package from R#

[6]:

# import Python package (call after setting up python environment with reticulate)
# Use 'reticulate' to access Python environment
library(reticulate)
use_condaenv(condaenv = "tf")
Sys.setenv(PYTHONUNBUFFERED = TRUE)
# If the package is installed via PyPI in the Python environment,
# simply import it:
# scVAEIT <- import("scVAEIT")
# Otherwise, need to downloaded Github repo, set a proper path to import the module.
setwd('../../../../')
scVAEIT <- import("scVAEIT")
cat("scVAEIT version:", scVAEIT$`__version__`, "\n")

scVAEIT version: 1.1.0

Initialize scVAEIT model#

[7]:

config = list(
# A network stucture of
# x     :              dim_input -> 64 -> 16 -> z 4 -> 16 -> 64 -> dim_input
#                                 |             |
# masks : dim_input -> dim_embed ->             ->
    'dimensions'=c(16), # hidden layers
    'dim_latent'=4L,    # latent space

    # Block structure for the input and output layer
    'dim_block_enc'=c(64),
    'dim_block_dec'=c(64),
    'dist_block'=list('NB'), # use Gaussian likelihood by default

    'dim_block_embed'=128L,

    # some hyperparameters
    'beta_unobs'=0.9,

    # prob of random maskings
    "p_feat"=0.5
)
scVAEIT$reset_random_seeds(as.integer(0))

cat('Initializing model...\n')
model <- scVAEIT$VAEIT(config, data, masks)

Initializing model...

[8]:

model$config

namespace(beta_kl=2.0, beta_unobs=0.9, beta_reverse=0.0, beta_modal=array([1.], dtype=float32), p_modal=None, p_feat=0.5, uni_block_names=array(['M-0'], dtype='<U23'), block_names=array(['M-0'], dtype='<U23'), dist_block=array(['NB'], dtype='<U2'), dim_block=array([2166], dtype=int32), dim_block_enc=array([64], dtype=int32), dim_block_dec=array([64], dtype=int32), skip_conn=False, mean_vals=<tf.Tensor: shape=(2166,), dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>, min_vals=<tf.Tensor: shape=(2166,), dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>, max_vals=<tf.Tensor: shape=(2166,), dtype=float32, numpy=
array([6.354844, 6.354844, 6.354844, ..., 6.354844, 6.354844, 6.354844],
      dtype=float32)>, max_disp=6.0, max_zi_prob=None, gamma=0.0, dimensions=array([16], dtype=int32), dim_latent=4, dim_block_embed=array([128], dtype=int32), dim_input_arr=array([2166], dtype=int32))

Model training and inference#

[9]:

hist <- model$train(num_epoch=1000, verbose=FALSE)

# Save model if needed
# model$save_model('example/result/R/checkpoint/')

[10]:

plot(1:length(hist[['train']][['total']]), hist[['train']][['total']], type="l")

../../_images/tutorial_R_imputation_scRNAseq_14_0.png

After the model is trained, we can get impouted and denoised abundances:

[11]:

imp.rna <- model$get_denoised_data(return_mean=TRUE)

One can also use pulgin estimate (only the missing values are imputed):

[12]:

blend.rna <- data
ina <- masks!=0
blend.rna[ina] <- imp.rna[ina]
rownames(blend.rna) <- rownames(data)
colnames(blend.rna) <- colnames(data)

Visualize#

[13]:

plot(comp.rna[is.na(raw.rna)], imp.rna[is.na(raw.rna)],
     pch = 16, cex = .5, col = rgb(red = 0, green = 0, blue = 1, alpha = 0.1))
abline(0,1)

../../_images/tutorial_R_imputation_scRNAseq_20_0.png

[14]:

cat(mean(((comp.rna - imp.rna)**2)[is.na(raw.rna)]))

0.80719

[15]:

plot(comp.rna, imp.rna,
     pch = 16, cex = .5, col = rgb(red = 0, green = 0, blue = 1, alpha = 0.1)
)
abline(0,1)

../../_images/tutorial_R_imputation_scRNAseq_22_0.png

R function interface#

When the above procedure works properly, it is convenient to have a simple interface to iteratively the same procedure and to tune over different hyperparameters. For this purpose, we can create a wrapper function as R file (e.g. "R_wrapper.R" in the current directory) and load the wrapper function in any R program by the following commands:

source("R_wrapper.R")

When using the above command, make sure to use the correct path, or set up the workspace path properly.

Imputation of scRNA-seq data

Contents

Imputation of scRNA-seq data#

Data#

Loading Python Package from R#

Initialize scVAEIT model#

Model training and inference#

Visualize#

R function interface#