VAEIT

class scVAEIT.VAEIT.VAEIT(config: SimpleNamespace, data, masks, id_dataset=None, batches_cate=None, batches_cont=None, conditions=None)

Variational Inference for integration and transfer learning.

Methods

get_denoised_data([masks, zero_out, ...])

Get the denoised data (decoder output) from the current latent space z.

get_latent_z([masks, zero_out, num_repeat, ...])

Get the posterior mean of latent space z.

load_model(path_to_model)

Load the model weights from the specified path.

save_model(path_to_model)

Save the model weights to the specified path.

train([valid, stratify, test_size, ...])

Train the VAEIT model with the specified parameters.

update_z([masks, zero_out, batch_size_inference])

Update the latent representation z based on the current input data and masks.

visualize_latent([method, color])

Visualize the current latent space z using the scanpy visualization tools.

reset

set_dataset

get_denoised_data(masks=None, zero_out=True, return_mean=True, num_repeat=5, L=50, batch_size_inference=256, training=True)

Get the denoised data (decoder output) from the current latent space z.

Parameters:
masks : np.array, optional

Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.

zero_out : bool, optional

Whether to zero out the missing values in the output. Default is True.

return_mean : bool, optional

Whether to return the mean of the denoised data. Default is True.

num_repeat : int, optional

The number of times to repeat the dataset to remove the effect of batch shuffling during inference. Default is 5; if the sample size is smaller than batch_size_inference, it is set to 1.

L : int, optional

The number of Monte Carlo samples for denoising. Default is 50.

batch_size_inference : int, optional

The batch size for inference. Default is 256.

training : bool, optional

Whether to run the model in training mode, in which batch normalization uses the mean and variance of the current batch; otherwise, the moving-average mean and variance are used. Setting this to True is recommended for small datasets or when the model was trained for only a few epochs. Default is True.

Returns:
denoised_data : np.array

The denoised data with shape [N, d], where N is the number of cells and d is the number of features.
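The averaging over L Monte Carlo samples can be sketched independently of scVAEIT. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes a diagonal Gaussian posterior, and `mc_denoise`, `z_mean`, `z_std` and `toy_decode` are hypothetical names. Each iteration draws one latent vector, decodes it, and the decoded values are averaged.

```python
import random
from typing import Callable, List, Sequence


def mc_denoise(
    z_mean: Sequence[float],
    z_std: Sequence[float],
    decode: Callable[[List[float]], List[float]],
    L: int = 50,
    seed: int = 0,
) -> List[float]:
    """Average the decoder output over L Monte Carlo draws of z (hypothetical helper).

    Assumes a diagonal Gaussian posterior N(z_mean, diag(z_std ** 2)).
    """
    rng = random.Random(seed)
    total = None
    for _ in range(L):
        # Reparameterized draw: z = mean + std * eps, with eps ~ N(0, 1).
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(z_mean, z_std)]
        x = decode(z)
        # Accumulate the decoded features element-wise.
        total = list(x) if total is None else [t + xi for t, xi in zip(total, x)]
    return [t / L for t in total]


def toy_decode(z: List[float]) -> List[float]:
    """Toy linear decoder mapping a 2-d latent vector to 3 features."""
    return [2.0 * z[0], z[0] + z[1], -z[1]]


print(mc_denoise([1.0, 2.0], [0.0, 0.0], toy_decode, L=50))  # [2.0, 3.0, -2.0]
```

With `z_std` set to zero every draw is identical, so the average equals the decoder output at the posterior mean; with a nonzero spread, a larger L reduces Monte Carlo noise at a proportionally higher cost.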

get_latent_z(masks=None, zero_out=True, num_repeat=5, batch_size_inference=512, training=True)

Get the posterior mean of latent space z.

Parameters:
masks : np.array, optional

Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.

zero_out : bool, optional

Whether to zero out the missing values in the output. Default is True.

num_repeat : int, optional

The number of times to repeat the dataset to remove the effect of batch shuffling during inference. Default is 5; if the sample size is smaller than batch_size_inference, it is set to 1.

batch_size_inference : int, optional

The batch size for inference. Default is 512.

training : bool, optional

Whether to run the model in training mode, in which batch normalization uses the mean and variance of the current batch; otherwise, the moving-average mean and variance are used. Setting this to True is recommended for small datasets or when the model was trained for only a few epochs. Default is True.

Returns:
z : np.array

The latent means, with shape [N, d], where N is the number of cells and d is the dimension of the latent space.
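When training=True, batch normalization uses statistics of the current mini-batch, so a cell's output depends on which other cells share its batch. The sketch below assumes this is the effect num_repeat is meant to smooth out: it replaces the encoder with a toy per-batch mean centering and averages each cell's output over several shuffled passes. `batch_centered` and `averaged_outputs` are hypothetical names, not scVAEIT functions.

```python
import random
from typing import List, Sequence, Tuple


def batch_centered(batch: Sequence[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Toy 'encoder' in training mode: center each value by its batch mean."""
    mean = sum(x for _, x in batch) / len(batch)
    return [(i, x - mean) for i, x in batch]


def averaged_outputs(values: Sequence[float], batch_size: int,
                     num_repeat: int = 5, seed: int = 0) -> List[float]:
    """Average each cell's output over num_repeat shuffled passes (hypothetical)."""
    n = len(values)
    if n < batch_size:
        # Mirrors the documented rule: one batch holds every cell, so
        # repeating the pass would give the same result each time.
        num_repeat = 1
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_repeat):
        order = list(range(n))
        rng.shuffle(order)  # a new batch composition on every pass
        for start in range(0, n, batch_size):
            batch = [(i, values[i]) for i in order[start:start + batch_size]]
            for i, z in batch_centered(batch):
                totals[i] += z
    return [t / num_repeat for t in totals]


print(averaged_outputs([1.0, 2.0, 3.0], batch_size=8))  # [-1.0, 0.0, 1.0]
```

With more cells than batch_size, each pass assigns cells to different batches, and averaging over passes reduces the dependence of any single cell's output on one particular shuffle.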

load_model(path_to_model)

Load the model weights from the specified path.

Parameters:
path_to_model : str

Path to the directory or specific checkpoint file containing the model weights. If a directory is provided, the latest checkpoint will be loaded.

save_model(path_to_model)

Save the model weights to the specified path.

Parameters:
path_to_model : str

Path to the directory where the model weights will be saved.

train(valid=False, stratify=False, test_size=0.1, random_state: int = 0, learning_rate: float = 0.0003, num_repeat: int | None = 1, batch_size: int | None = None, batch_size_inference: int | None = None, L: int = 1, num_epoch: int = 200, num_step_per_epoch: int | None = None, save_every_epoch: int | None = 25, init_epoch: int | None = 1, early_stopping_patience: int = 10, early_stopping_tolerance: float = 0.0001, early_stopping_relative: bool = True, verbose: bool = False, checkpoint_dir: str | None = None, delete_existing: str | None = True, eval_func=None)

Train the VAEIT model with the specified parameters.

Parameters:
valid : bool, optional

Whether to use a validation set during training. Default is False.

stratify : bool, optional

Whether to stratify the split when creating the validation set. Default is False.

test_size : float or int, optional

The proportion or size of the validation set. Default is 0.1.

random_state : int, optional

The random state for data splitting. Default is 0.

learning_rate : float, optional

The initial learning rate for the Adam optimizer. Default is 3e-4.

batch_size : int, optional

The batch size for training. Default is 256 when using full mask matrices, or 64 otherwise.

batch_size_inference : int, optional

The batch size for inference. Default is 256 when using full mask matrices, or 64 otherwise.

L : int, optional

The number of Monte Carlo samples. Default is 1.

num_epoch : int, optional

The maximum number of epochs. Default is 200.

num_step_per_epoch : int, optional

The number of steps per epoch. If None, it will be inferred from the number of cells and batch size. Default is None.

save_every_epoch : int, optional

Frequency (in epochs) at which model checkpoints are saved. Default is 25.

init_epoch : int, optional

The initial epoch number. Default is 1.

early_stopping_patience : int, optional

The number of epochs to wait for improvement before early stopping. Default is 10.

early_stopping_tolerance : float, optional

The minimum change in loss to be considered an improvement. Default is 1e-4.

early_stopping_relative : bool, optional

Whether to monitor the relative change in loss for early stopping. Default is True.

verbose : bool, optional

Whether to print the training process. Default is False.

checkpoint_dir : str, optional

Directory to save model checkpoints. Default is None.

delete_existing : bool, optional

Whether to delete existing checkpoints in the directory; only used if init_epoch=1. Default is True.

eval_func : function, optional

A function to evaluate the model; it takes the VAE as input. Default is None.

Returns:
hist : dict

A dictionary containing the history of training and validation losses.
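The interplay of early_stopping_patience, early_stopping_tolerance and early_stopping_relative can be sketched as a small patience counter. This is an illustration under assumptions rather than scVAEIT's code: the `EarlyStopping` class is hypothetical, and which loss is monitored (training or validation) and exactly how the relative change is normalized are guesses.

```python
import math


class EarlyStopping:
    """Patience-based early stopping (a sketch, not scVAEIT's own class).

    `tolerance` is the minimum decrease in loss counted as an improvement;
    with `relative=True` the decrease is divided by the best loss so far.
    """

    def __init__(self, patience: int = 10, tolerance: float = 1e-4,
                 relative: bool = True):
        self.patience = patience
        self.tolerance = tolerance
        self.relative = relative
        self.best = math.inf
        self.wait = 0  # epochs since the last improvement

    def step(self, loss: float) -> bool:
        """Record one epoch's loss and return True if training should stop."""
        if math.isinf(self.best):
            improved = True  # the first finite loss always counts
        else:
            change = self.best - loss
            if self.relative:
                # Guard against division by zero when the best loss is 0.
                change /= max(abs(self.best), 1e-12)
            improved = change > self.tolerance
        if improved:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


stopper = EarlyStopping(patience=2, tolerance=0.01, relative=True)
decisions = [stopper.step(loss) for loss in [1.0, 0.5, 0.499, 0.498]]
print(decisions)  # [False, False, False, True]
```

The relative mode makes the same tolerance meaningful whether the loss is in the thousands or near one, at the cost of becoming sensitive when the best loss approaches zero.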

update_z(masks=None, zero_out=True, batch_size_inference=512, **kwargs)

Update the latent representation z based on the current input data and masks.

Parameters:
masks : np.array, optional

Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.

zero_out : bool, optional

Whether to zero out the missing values in the output. Default is True.

batch_size_inference : int, optional

The batch size for inference. Default is 512.

**kwargs

Additional keyword arguments to be passed to the scanpy neighbors function (scanpy.pp.neighbors).
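The masks parameter uses 1 for missing and 0 for observed, which is the reverse of the "1 = observed" convention found in some other tools. The sketch below builds such a 0/1 pattern from a NaN-coded matrix; `missing_mask` is a hypothetical helper, NaN is only one possible encoding of missingness, and the shape the methods actually expect (per cell or per dataset) is not specified here.

```python
import math
from typing import List, Sequence


def missing_mask(data: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return a mask with 1 = missing and 0 = observed, treating NaN as missing."""
    return [[1 if math.isnan(x) else 0 for x in row] for row in data]


nan = float("nan")
X = [[0.5, nan, 2.0],
     [nan, nan, 1.0]]
print(missing_mask(X))  # [[0, 1, 0], [1, 1, 0]]
```

With NumPy arrays, the same 0/1 pattern is given by np.isnan(X).astype(int).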

visualize_latent(method: str = 'UMAP', color=None, **kwargs)

Visualize the current latent space z using the scanpy visualization tools.

Parameters:
method : str, optional

Visualization method to use. Possible choices are “PCA”, “UMAP”, “diffmap”, “TSNE” and “draw_graph” (the FA plot). Default is “UMAP”.

color : str or list of str, optional

Keys for annotations of observations/cells or variables/genes, e.g., ‘ann1’ or [‘ann1’, ‘ann2’]. Same as in scanpy. Default is None.

**kwargs

Extra keyword arguments passed to the scanpy plotting function for the chosen method (scanpy.pl.<method>, e.g., scanpy.pl.umap).

Returns:
axes : matplotlib.axes.Axes

Axes object containing the visualization.
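scanpy provides a compute step (scanpy.tl.<name>) and a matching plot step (scanpy.pl.<name>) for each of the listed embeddings: pca, umap, diffmap, tsne and draw_graph. The sketch below assumes, without confirmation from the source, that the method argument is matched case-insensitively to one of those names; `SUPPORTED_METHODS` and `scanpy_function_names` are hypothetical and only return the function names rather than calling scanpy.

```python
from typing import Tuple

# Hypothetical set of accepted `method` values, lower-cased to match scanpy names.
SUPPORTED_METHODS = {"pca", "umap", "diffmap", "tsne", "draw_graph"}


def scanpy_function_names(method: str = "UMAP") -> Tuple[str, str]:
    """Return the scanpy compute and plot function names for `method`."""
    name = method.lower()
    if name not in SUPPORTED_METHODS:
        raise ValueError(
            f"unknown method {method!r}; expected one of {sorted(SUPPORTED_METHODS)}"
        )
    return f"scanpy.tl.{name}", f"scanpy.pl.{name}"


print(scanpy_function_names("TSNE"))  # ('scanpy.tl.tsne', 'scanpy.pl.tsne')
```

In a direct scanpy workflow, the corresponding calls would be, for example, sc.tl.umap(adata) followed by sc.pl.umap(adata, color=...), where the color and extra keyword arguments play the roles described above.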