VAEIT

class scVAEIT.VAEIT.VAEIT(config: SimpleNamespace, data, masks, id_dataset=None, batches_cate=None, batches_cont=None, conditions=None)

Variational Inference for integration and transfer learning.

Methods

`get_denoised_data`([masks, zero_out, ...])	Get the denoised data (decoder output) from the current latent space z.
`get_latent_z`([masks, zero_out, num_repeat, ...])	Get the posterior mean of latent space z.
`load_model`(path_to_model)	Load the model weights from the specified path.
`save_model`(path_to_model)	Save the model weights to the specified path.
`train`([valid, stratify, test_size, ...])	Train the VAEIT model with the specified parameters.
`update_z`([masks, zero_out, batch_size_inference])	Update the latent representation z based on the current input data and masks.
`visualize_latent`([method, color])	Visualize the current latent space z using the scanpy visualization tools.

reset
set_dataset

get_denoised_data(masks=None, zero_out=True, return_mean=True, num_repeat=5, L=50, batch_size_inference=256, training=True)

Get the denoised data (decoder output) from the current latent space z.

Parameters:

masks (np.array, optional): Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.
zero_out (bool, optional): Whether to zero out the missing values in the output. Default is True.
return_mean (bool, optional): Whether to return the mean of the denoised data. Default is True.
num_repeat (int, optional): The number of times to repeat the dataset to remove the effects of batch shuffling during inference. Default is 5; if the sample size is smaller than batch_size_inference, it is set to 1.
L (int, optional): The number of Monte Carlo samples for denoising. Default is 50.
batch_size_inference (int, optional): The batch size for inference. Default is 256.
training (bool, optional): Whether to run the model in training mode, in which batch normalization uses the mean and variance of the current batch; otherwise the moving-average mean and variance are used. For small datasets or a small number of training epochs, setting this to True is recommended. Default is True.

Returns:

denoised_data (np.array): The denoised data with shape [N, d], where N is the number of cells and d is the number of features.
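
The mask convention described above (1 = missing, 0 = observed) can be illustrated with plain NumPy. This is only a minimal sketch: the two-modality layout, the feature counts, and the variable names are hypothetical, and the per-cell [N, d] shape is an assumption, since this section does not state the exact shape that `masks` expects.

```python
import numpy as np

# Hypothetical layout: 6 cells, 4 RNA features followed by 3 protein features.
n_cells, n_rna, n_protein = 6, 4, 3

# Start from "everything observed" (0), following the documented convention
# that 1 marks a missing entry and 0 marks an observed one.
masks = np.zeros((n_cells, n_rna + n_protein), dtype=np.float32)

# Suppose the last two cells were profiled without the protein panel.
masks[4:, n_rna:] = 1.0

# Fraction of missing entries per cell.
missing_fraction = masks.mean(axis=1)
print(missing_fraction)
```

With `zero_out=True`, entries flagged with 1 are the ones the documentation describes as being zeroed out.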

get_latent_z(masks=None, zero_out=True, num_repeat=5, batch_size_inference=512, training=True)

Get the posterior mean of latent space z.

Parameters:

masks (np.array, optional): Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.
zero_out (bool, optional): Whether to zero out the missing values in the output. Default is True.
num_repeat (int, optional): The number of times to repeat the dataset to remove the effects of batch shuffling during inference. Default is 5; if the sample size is smaller than batch_size_inference, it is set to 1.
batch_size_inference (int, optional): The batch size for inference. Default is 512.
training (bool, optional): Whether to run the model in training mode, in which batch normalization uses the mean and variance of the current batch; otherwise the moving-average mean and variance are used. For small datasets or a small number of training epochs, setting this to True is recommended. Default is True.

Returns:

z (np.array): The latent means, with shape [N, d].

load_model(path_to_model)

Load the model weights from the specified path.

Parameters:

path_to_model (str): Path to the directory or specific checkpoint file containing the model weights. If a directory is provided, the latest checkpoint will be loaded.

save_model(path_to_model)

Save the model weights to the specified path.

Parameters:

path_to_model (str): Path to the directory where the model weights will be saved.

train(valid=False, stratify=False, test_size=0.1, random_state: int = 0, learning_rate: float = 0.0003, num_repeat: int | None = 1, batch_size: int | None = None, batch_size_inference: int | None = None, L: int = 1, num_epoch: int = 200, num_step_per_epoch: int | None = None, save_every_epoch: int | None = 25, init_epoch: int | None = 1, early_stopping_patience: int = 10, early_stopping_tolerance: float = 0.0001, early_stopping_relative: bool = True, verbose: bool = False, checkpoint_dir: str | None = None, delete_existing: str | None = True, eval_func=None)

Train the VAEIT model with the specified parameters.

Parameters:

valid (bool, optional): Whether to use a validation set during training. Default is False.
stratify (bool, optional): Whether to stratify the split when creating the validation set. Default is False.
test_size (float or int, optional): The proportion or size of the validation set. Default is 0.1.
random_state (int, optional): The random state for data splitting. Default is 0.
learning_rate (float, optional): The initial learning rate for the Adam optimizer. Default is 3e-4.
batch_size (int, optional): The batch size for training. If None, 256 is used with full mask matrices and 64 otherwise. Default is None.
batch_size_inference (int, optional): The batch size for inference. If None, 256 is used with full mask matrices and 64 otherwise. Default is None.
L (int, optional): The number of Monte Carlo samples. Default is 1.
num_epoch (int, optional): The maximum number of epochs. Default is 200.
num_step_per_epoch (int, optional): The number of steps per epoch. If None, it will be inferred from the number of cells and batch size. Default is None.
save_every_epoch (int, optional): Frequency (in epochs) at which model checkpoints are saved. Default is 25.
init_epoch (int, optional): The initial epoch number. Default is 1.
early_stopping_patience (int, optional): The number of epochs to wait for improvement before early stopping. Default is 10.
early_stopping_tolerance (float, optional): The minimum change in loss to be considered an improvement. Default is 1e-4.
early_stopping_relative (bool, optional): Whether to monitor the relative change in loss for early stopping. Default is True.
verbose (bool, optional): Whether to print training progress. Default is False.
checkpoint_dir (str, optional): Directory to save model checkpoints. Default is None.
delete_existing (bool, optional): Whether to delete existing checkpoints in the directory; only used if init_epoch=1. Default is True.
eval_func (function, optional): A function to evaluate the model, which takes the VAE as an input. Default is None.

Returns:

hist (dict): A dictionary containing the history of training and validation losses.
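
The three early-stopping parameters interact in a way that a short sketch may make clearer. The function below is a generic illustration of patience-based early stopping with an absolute or relative tolerance; `should_stop` is a hypothetical helper, and the logic is an assumption about how such criteria commonly work, not the implementation used inside `train`.

```python
def should_stop(losses, patience=10, tolerance=1e-4, relative=True):
    """Return True once `patience` consecutive epochs fail to improve.

    An epoch counts as an improvement when it lowers the best loss seen so
    far by more than `tolerance` (measured as a fraction of the best loss
    when `relative` is True, and in absolute terms otherwise).
    """
    best = losses[0]
    epochs_since_improvement = 0
    for loss in losses[1:]:
        change = best - loss
        if relative:
            # Guard against division by zero for a best loss of exactly 0.
            change = change / max(abs(best), 1e-12)
        if change > tolerance:
            best = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= patience:
            return True
    return False


# A loss that plateaus after one good epoch triggers stopping with patience=2.
print(should_stop([1.0, 0.5, 0.5, 0.5], patience=2))
```

Under this sketch, a relative tolerance makes the criterion independent of the scale of the loss: a drop of 0.005 counts as an improvement on a loss near 1 but not on a loss near 100.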

update_z(masks=None, zero_out=True, batch_size_inference=512, **kwargs)

Update the latent representation z based on the current input data and masks.

Parameters:

masks (np.array, optional): Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.
zero_out (bool, optional): Whether to zero out the missing values in the output. Default is True.
batch_size_inference (int, optional): The batch size for inference. Default is 512.
**kwargs: Additional keyword arguments to be passed to the scanpy neighbors function.

visualize_latent(method: str = 'UMAP', color=None, **kwargs)

Visualize the current latent space z using the scanpy visualization tools.

Parameters:

method (str, optional): Visualization method to use. The default is “UMAP”. Possible choices include “PCA”, “UMAP”, “diffmap”, “TSNE” and “draw_graph” (the FA plot).
color (str or list of str, optional): Keys for annotations of observations/cells or variables/genes, e.g., ‘ann1’ or [‘ann1’, ‘ann2’], as in scanpy. The default is None.
**kwargs: Extra key-value arguments that can be passed to scanpy plotting functions (scanpy.pl.XX).

Returns:

axes (matplotlib.axes.Axes): Axes object containing the visualization.
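
The methods documented above could be chained roughly as follows. This is a hedged sketch rather than an official recipe: `run_workflow` is a hypothetical helper, `model` is assumed to be an already-constructed VAEIT instance (building `config`, `data`, and `masks` for the constructor is not shown), and calling `update_z` before `visualize_latent` is an assumption about the intended order. Only keyword arguments listed in the signatures above are used.

```python
def run_workflow(model, checkpoint_dir, masks=None, color=None):
    """Train `model`, save it, and collect its main outputs.

    `model` is assumed to expose the methods documented on this page.
    """
    # Train with a held-out validation split and early stopping.
    hist = model.train(
        valid=True,
        test_size=0.1,
        num_epoch=200,
        early_stopping_patience=10,
        checkpoint_dir=checkpoint_dir,
    )

    # Persist the trained weights to the same directory.
    model.save_model(checkpoint_dir)

    # Posterior means of the latent space and the decoder output.
    z = model.get_latent_z(masks=masks)
    denoised = model.get_denoised_data(masks=masks, L=50)

    # Refresh the stored latent representation, then plot it (assumed order).
    model.update_z(masks=masks)
    axes = model.visualize_latent(method="UMAP", color=color)

    return hist, z, denoised, axes
```

Weights saved this way can later be restored with `load_model`, passing the same directory.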
