VAEIT
- class scVAEIT.VAEIT.VAEIT(config: SimpleNamespace, data, masks, id_dataset=None, batches_cate=None, batches_cont=None, conditions=None)
Variational Inference for integration and transfer learning.
Methods
get_denoised_data([masks, zero_out, ...]): Get the denoised data (decoder output) from the current latent space z.
get_latent_z([masks, zero_out, num_repeat, ...]): Get the posterior mean of latent space z.
load_model(path_to_model): Load the model weights from the specified path.
save_model(path_to_model): Save the model weights to the specified path.
train([valid, stratify, test_size, ...]): Train the VAEIT model with the specified parameters.
update_z([masks, zero_out, batch_size_inference]): Update the latent representation z based on the current input data and masks.
visualize_latent([method, color]): Visualize the current latent space z using the scanpy visualization tools.
reset
set_dataset
- get_denoised_data(masks=None, zero_out=True, return_mean=True, num_repeat=5, L=50, batch_size_inference=256, training=True)
Get the denoised data (decoder output) from the current latent space z.
- Parameters:
- masks : np.array, optional
Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.
- zero_out : bool, optional
Whether to zero out the missing values in the output. Default is True.
- return_mean : bool, optional
Whether to return the mean of the denoised data. Default is True.
- num_repeat : int, optional
The number of times to repeat the dataset to remove effects of batch shuffling for inference. Default is 5; if sample size is smaller than batch_size_inference, it is set to 1.
- L : int, optional
The number of Monte Carlo samples for denoising. Default is 50.
- batch_size_inference : int, optional
The batch size for inference. Default is 256.
- training : bool, optional
Whether to run the model in training mode, in which batch normalization uses the mean and variance of the current batch; otherwise the moving-average mean and variance are used. When using small datasets or a small number of epochs for training, it is recommended to set this to True. Default is True.
- Returns:
- denoised_data : np.array
The denoised data with shape [N, d], where N is the number of cells and d is the number of features.
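The mask convention used by the parameters above (1 for missing, 0 for observed) can be illustrated with a small NumPy sketch. The helper names `build_mask` and `zero_out_missing` are hypothetical, the per-cell [N, d] mask shape is an assumption, and the zeroing shown is only one plausible reading of `zero_out`, not scVAEIT's internal code.

```python
import numpy as np

def build_mask(data):
    """Hypothetical helper: 1.0 where `data` is NaN (missing), 0.0 where observed."""
    return np.isnan(data).astype(np.float32)

def zero_out_missing(values, mask):
    """Hypothetical helper: set entries flagged as missing (mask == 1) to zero."""
    return np.where(mask == 1, 0.0, values)

# Toy matrix with N = 2 cells and d = 3 features; NaN marks a missing entry.
data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])

mask = build_mask(data)                  # assumed shape [N, d]
cleaned = zero_out_missing(data, mask)   # missing entries replaced by 0
```

A mask built this way could then be passed as the `masks` argument, provided its shape matches what the model expects.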
- get_latent_z(masks=None, zero_out=True, num_repeat=5, batch_size_inference=512, training=True)
Get the posterior mean of latent space z.
- Parameters:
- masks : np.array, optional
Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.
- zero_out : bool, optional
Whether to zero out the missing values in the output. Default is True.
- num_repeat : int, optional
The number of times to repeat the dataset to remove effects of batch shuffling for inference. Default is 5; if sample size is smaller than batch_size_inference, it is set to 1.
- batch_size_inference : int, optional
The batch size for inference. Default is 512.
- training : bool, optional
Whether to run the model in training mode, in which batch normalization uses the mean and variance of the current batch; otherwise the moving-average mean and variance are used. When using small datasets or a small number of epochs for training, it is recommended to set this to True. Default is True.
- Returns:
- z : np.array
The latent means with shape [N, d], where N is the number of cells and d is the dimension of the latent space.
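The effect of the `training` flag on batch normalization can be sketched in plain Python. This is a simplified illustration, not scVAEIT's layer: the function name `batch_norm_1d` is hypothetical, and the learned scale and shift of a real batch normalization layer are omitted.

```python
import math

def batch_norm_1d(batch, training, moving_mean, moving_var, eps=1e-3):
    """Normalize a list of floats (hypothetical, simplified batch norm).

    training=True  -> use the mean and variance of this batch.
    training=False -> use the supplied moving-average statistics.
    """
    if training:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        mean, var = moving_mean, moving_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

batch = [1.0, 2.0, 3.0]
# Moving averages that have not yet converged, e.g. after only a few epochs.
stale_mean, stale_var = 0.0, 1.0

train_mode = batch_norm_1d(batch, True, stale_mean, stale_var)   # centered on the batch
infer_mode = batch_norm_1d(batch, False, stale_mean, stale_var)  # shifted by stale statistics
```

With unconverged moving averages the inference-mode output is not centered, which is one way to read the recommendation to keep `training=True` for small datasets or short training runs.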
- load_model(path_to_model)
Load the model weights from the specified path.
- Parameters:
- path_to_model : str
Path to the directory or specific checkpoint file containing the model weights. If a directory is provided, the latest checkpoint will be loaded.
- save_model(path_to_model)
Save the model weights to the specified path.
- Parameters:
- path_to_model : str
Path to the directory where the model weights will be saved.
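A checkpoint round trip with `save_model` and `load_model` might look like the sketch below. The function name `save_then_restore` is hypothetical. Because only weights are saved, the sketch assumes that `fresh_model` was constructed with the same config and data layout as `model`; that requirement is an assumption, not something stated above.

```python
def save_then_restore(model, fresh_model, checkpoint_dir):
    """Save weights from `model`, then load them into `fresh_model`.

    Both arguments are expected to expose the save_model / load_model
    methods documented above. `fresh_model` is assumed to have the same
    architecture as `model`.
    """
    model.save_model(checkpoint_dir)
    # Passing a directory loads the latest checkpoint found in it.
    fresh_model.load_model(checkpoint_dir)
    return fresh_model
```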
- train(valid=False, stratify=False, test_size=0.1, random_state: int = 0, learning_rate: float = 0.0003, num_repeat: int | None = 1, batch_size: int | None = None, batch_size_inference: int | None = None, L: int = 1, num_epoch: int = 200, num_step_per_epoch: int | None = None, save_every_epoch: int | None = 25, init_epoch: int | None = 1, early_stopping_patience: int = 10, early_stopping_tolerance: float = 0.0001, early_stopping_relative: bool = True, verbose: bool = False, checkpoint_dir: str | None = None, delete_existing: str | None = True, eval_func=None)
Train the VAEIT model with the specified parameters.
- Parameters:
- valid : bool, optional
Whether to use a validation set during training. Default is False.
- stratify : bool, optional
Whether to stratify the split when creating the validation set. Default is False.
- test_size : float or int, optional
The proportion or size of the validation set. Default is 0.1.
- random_state : int, optional
The random state for data splitting. Default is 0.
- learning_rate : float, optional
The initial learning rate for the Adam optimizer. Default is 3e-4.
- batch_size : int, optional
The batch size for training. If None, it is set to 256 when using full mask matrices, or 64 otherwise. Default is None.
- batch_size_inference : int, optional
The batch size for inference. If None, it is set to 256 when using full mask matrices, or 64 otherwise. Default is None.
- L : int, optional
The number of Monte Carlo samples. Default is 1.
- num_epoch : int, optional
The maximum number of epochs. Default is 200.
- num_step_per_epoch : int, optional
The number of steps per epoch. If None, it will be inferred from the number of cells and batch size. Default is None.
- save_every_epoch : int, optional
Frequency (in epochs) to save model checkpoints. Default is 25.
- init_epoch : int, optional
The initial epoch number. Default is 1.
- early_stopping_patience : int, optional
The number of epochs to wait for improvement before early stopping. Default is 10.
- early_stopping_tolerance : float, optional
The minimum change in loss to be considered as an improvement. Default is 1e-4.
- early_stopping_relative : bool, optional
Whether to monitor the relative change in loss for early stopping. Default is True.
- verbose : bool, optional
Whether to print the training process. Default is False.
- checkpoint_dir : str, optional
Directory to save model checkpoints. Default is None.
- delete_existing : bool, optional
Whether to delete existing checkpoints in the directory; only used if init_epoch=1. Default is True.
- eval_func : function, optional
A function to evaluate the model, which takes the VAE as an input. Default is None.
- Returns:
- hist : dict
A dictionary containing the history of training and validation losses.
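How the three early-stopping parameters interact can be sketched as follows. This is one common interpretation (patience counted in epochs without sufficient improvement over the best loss so far), written as a hypothetical `should_stop` helper; it is not taken from scVAEIT's source and the actual rule may differ.

```python
def should_stop(losses, patience=10, tolerance=1e-4, relative=True):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement when the loss drops below the best
    loss so far by more than `tolerance` (as a fraction of the best loss
    when `relative` is True, as an absolute difference otherwise).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if best == float("inf"):
            improvement = float("inf")  # the first epoch always sets the baseline
        else:
            improvement = best - loss
            if relative:
                improvement /= max(abs(best), 1e-12)
        if improvement > tolerance:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# The loss plateaus at 0.4 from epoch 3 onward.
losses = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4]
stop_epoch = should_stop(losses, patience=3)
```

Under this reading, a larger `early_stopping_patience` tolerates longer plateaus, while a larger `early_stopping_tolerance` demands bigger drops before the counter resets.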
- update_z(masks=None, zero_out=True, batch_size_inference=512, **kwargs)
Update the latent representation z based on the current input data and masks.
- Parameters:
- masks : np.array, optional
Masks indicating missingness, where 1 represents missing and 0 represents observed. If None, the full masks will be used. Default is None.
- zero_out : bool, optional
Whether to zero out the missing values in the output. Default is True.
- batch_size_inference : int, optional
The batch size for inference. Default is 512.
- **kwargs
Additional keyword arguments to be passed to the scanpy neighbors function.
- visualize_latent(method: str = 'UMAP', color=None, **kwargs)
Visualize the current latent space z using the scanpy visualization tools.
- Parameters:
- method : str, optional
Visualization method to use. The default is “UMAP”. Possible choices include “PCA”, “UMAP”, “diffmap”, “TSNE” and “draw_graph” (the FA plot).
- color : str or list of str, optional
Keys for annotations of observations/cells or variables/genes, e.g., ‘ann1’ or [‘ann1’, ‘ann2’]. The default is None. Same as scanpy.
- **kwargs
Extra key-value arguments that can be passed to scanpy plotting functions (scanpy.pl.XX).
- Returns:
- axes : matplotlib.axes.Axes
Axes object containing the visualization.
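Putting the methods above together, a typical session might train the model, refresh the latent representation, and plot it. The sketch below uses only the method names and keyword arguments listed in this reference. The wrapper `embed_and_plot` is hypothetical, and calling `update_z` before `visualize_latent` is an assumption about the intended order.

```python
def embed_and_plot(model, checkpoint_dir, color=None):
    """Train a VAEIT-like `model`, update its latent z, and plot it.

    `model` is expected to expose train, update_z and visualize_latent
    with the signatures documented above.
    """
    hist = model.train(
        valid=True,                  # hold out a validation set
        num_epoch=200,
        early_stopping_patience=10,
        checkpoint_dir=checkpoint_dir,
        verbose=True,
    )
    # Recompute z from the current data and masks before plotting.
    model.update_z(batch_size_inference=512)
    axes = model.visualize_latent(method="UMAP", color=color)
    return hist, axes
```

Given a constructed `model`, a call such as `embed_and_plot(model, "checkpoint/", color="ann1")` would return the loss history and the matplotlib axes, where `"ann1"` stands in for an annotation key present in the data.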