Documentation for the CLI modules

scprint.__main__

Entry point for scprint.

Classes:

Name Description
MySaveConfig

MySaveConfig is a subclass of SaveConfigCallback that further parametrizes the wandb logger when running in CLI mode.

MySaveConfig

Bases: SaveConfigCallback

MySaveConfig is a subclass of SaveConfigCallback that further parametrizes the wandb logger when running in CLI mode.
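
This page does not include the implementation; as an illustration only, such a subclass typically overrides Lightning's save_config hook to mirror the parsed CLI config into the wandb run (a hypothetical sketch, not the actual scprint code):

from lightning.pytorch.cli import SaveConfigCallback
from lightning.pytorch.loggers import WandbLogger

class MySaveConfigSketch(SaveConfigCallback):
    # Hypothetical sketch: push the parsed CLI config into the wandb run
    # so every hyperparameter shows up in the wandb dashboard.
    def save_config(self, trainer, pl_module, stage) -> None:
        if isinstance(trainer.logger, WandbLogger):
            trainer.logger.experiment.config.update(
                self.config.as_dict(), allow_val_change=True
            )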

scprint.cli

Classes:

Name Description
MyCLI

MyCLI is a subclass of LightningCLI that adds missing parameters and creates bindings between parameters of the model and the data module.

MyCLI

Bases: LightningCLI

MyCLI is a subclass of LightningCLI that adds missing parameters and creates bindings between parameters of the model and the data module.

It allows calling denoise / embed / gninfer from the command line, adds extra parameters, and links parameters between the scdataloader and the scPRINT model, as illustrated in the sketch below.
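
As an illustration only, such a CLI is typically wired into an entry point as below (the model and data module classes are assumptions, not taken from this page):

from scprint.cli import MyCLI

def main():
    # Hypothetical wiring; the actual scprint entry point may pass
    # different classes or CLI options.
    from scprint import scPrint          # assumed model class
    from scdataloader import DataModule  # assumed data module class

    MyCLI(scPrint, DataModule)

if __name__ == "__main__":
    main()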

Methods:

Name Description
instantiate_trainer

Override to customize trainer instantiation

instantiate_trainer

Override to customize trainer instantiation

Source code in scprint/cli.py
def instantiate_trainer(self, **kwargs) -> Trainer:
    """Override to customize trainer instantiation"""
    # Let the parent method create the trainer first
    trainer = super().instantiate_trainer(**kwargs)
    # If the strategy is DDP, extend its collective-ops timeout
    if "fit" in self.config and self.config["fit"]["trainer"]["strategy"] in [
        "ddp",
        "ddp_find_unused_parameters_true",
    ]:
        from datetime import timedelta

        print("updating the config")
        trainer.strategy._timeout = timedelta(seconds=7000)  # just under 2 hours
        trainer.strategy.setup_distributed()
    return trainer
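
Note that mutating the private _timeout attribute works but depends on Strategy internals. Lightning's DDPStrategy also accepts the timeout at construction; a minimal sketch of that public route (the import path varies with the Lightning version):

from datetime import timedelta

from lightning.pytorch.strategies import DDPStrategy

# Same effect through the public constructor argument instead of _timeout
strategy = DDPStrategy(timeout=timedelta(seconds=7000))
# e.g. Trainer(strategy=strategy, ...)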

scprint.trainer.trainer

Classes:

Name Description
TrainingMode

TrainingMode

Bases: Callback

TrainingMode is a callback that sets training-specific parameters on the model.

This exists because of how Lightning is structured: ideally the model would be fully separated from training, but it still carries training-specific methods, so this callback is used to configure them.

Parameters:
  • do_denoise (bool, default: True ) –

    Whether to apply denoising during training. Defaults to True.

  • noise (List[float], default: [0.6] ) –

    List of noise levels to apply if denoising is enabled. Defaults to [0.6], meaning a single forward pass in which 60% of the counts are dropped (see the sketch after this parameter list).

  • do_cce (bool, default: False ) –

    Whether to apply the Contrastive Cell Embedding from scGPT during training. Defaults to False.

  • cce_temp (float, default: 0.2 ) –

    Temperature for the CCE loss. Defaults to 0.2.

  • cce_scale (float, default: 0.1 ) –

    Scaling factor for the CCE loss. Defaults to 0.1.

  • do_ecs (bool, default: False ) –

    Whether to apply the Elastic Cell Similarity loss from scGPT during training. Defaults to False.

  • ecs_threshold (float, default: 0.4 ) –

    Threshold for ECS. Defaults to 0.4.

  • ecs_scale (float, default: 0.1 ) –

    Scaling factor for the ECS loss. Defaults to 0.1.

  • do_mvc (bool, default: False ) –

    Whether to generate cell embeddings with scGPT's MVC loss. Defaults to False.

  • mvc_scale (float, default: 1.0 ) –

    Scaling factor for MVC loss. Defaults to 1.0.

  • do_adv_cls (bool, default: False ) –

    Whether to apply adversarial classification during training. Defaults to False.

  • do_next_tp (bool, default: False ) –

    Whether to perform the next time point prediction task. Defaults to False.

  • do_generate (bool, default: True ) –

    Whether to do the bottleneck learning task. Defaults to True.

  • class_scale (float, default: 1 ) –

    Scaling factor for the classification loss. Defaults to 1.

  • mask_ratio (List[float | str], default: [] ) –

    List of mask ratios to apply during training. Defaults to [], meaning no masking is applied during pretraining.

  • warmup_duration (int, default: 500 ) –

    Number of warmup steps for learning rate scheduling. Defaults to 500.

  • fused_adam (bool, default: False ) –

    Whether to use the fused Adam optimizer. Defaults to False.

  • adv_class_scale (float, default: 0.1 ) –

    Scaling factor for adversarial classification loss. Defaults to 0.1.

  • lr_reduce_patience (int, default: 2 ) –

    Number of epochs with no improvement after which the learning rate will be reduced. Defaults to 2.

  • lr_reduce_factor (float, default: 0.6 ) –

    Factor by which the learning rate will be reduced. Defaults to 0.6.

  • lr_reduce_monitor (str, default: 'val_loss' ) –

    Quantity to be monitored for learning rate reduction. Defaults to "val_loss".

  • do_cls (bool, default: True ) –

    Whether to perform classification during training. Defaults to True.

  • do_adv_batch (bool, default: False ) –

    Whether to apply adversarial batch training. Defaults to False.

  • run_full_forward (bool, default: False ) –

    Whether to run a second forward pass without masking or denoising for the bottleneck learning / MVC case. Defaults to False.

  • lr (float, default: 0.0001 ) –

    Initial learning rate. Defaults to 0.0001.

  • optim (str, default: 'adamW' ) –

    Optimizer to use during training. Defaults to "adamW".

  • weight_decay (float, default: 0.01 ) –

    Weight decay to apply during optimization. Defaults to 0.01.

  • name (str, default: '' ) –

    Name of the training mode; should be an identifier for the model. Defaults to an empty string.

  • test_every (int, default: 20 ) –

    Number of epochs between test runs. Defaults to 20.

  • class_embd_diss_scale (float, default: 0.1 ) –

    Scaling factor for the class embedding dissimilarity loss. Defaults to 0.1.

  • zinb_and_mse (bool, default: False ) –

    Whether to use ZINB and MSE loss. Defaults to False.

  • var_context_length (bool, default: False ) –

    Whether to use variable context length. Defaults to False.

  • dropout (float, default: 0.1 ) –

    Dropout rate for the model. Defaults to 0.1.

  • set_step (int, default: None ) –

    Set the global step for the model. Defaults to None.
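
As an illustration of what a noise level of 0.6 means here, the sketch below downsamples a count vector by binomial thinning (a hedged sketch of the general technique; scPRINT's actual downsampling code may differ):

import numpy as np

def downsample_counts(counts: np.ndarray, noise: float = 0.6) -> np.ndarray:
    # Binomial thinning: each individual count is kept with probability
    # (1 - noise), so noise=0.6 drops ~60% of the total counts in expectation.
    rng = np.random.default_rng(0)
    return rng.binomial(counts, 1.0 - noise)

# counts = np.array([10, 0, 4, 25]); downsample_counts(counts) keeps ~40% of them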

Source code in scprint/trainer/trainer.py
def __init__(
    self,
    do_denoise: bool = True,
    noise: List[float] = [0.6],
    do_cce: bool = False,
    cce_temp: float = 0.2,  # .6
    cce_scale: float = 0.1,  # .01
    do_ecs: bool = False,
    ecs_threshold: float = 0.4,
    class_embd_diss_scale: float = 0.1,
    ecs_scale: float = 0.1,  # .1
    do_mvc: bool = False,
    mvc_scale: float = 1.0,
    do_adv_cls: bool = False,
    do_next_tp: bool = False,
    do_generate: bool = True,
    class_scale: float = 1,
    mask_ratio: List[float | str] = [],  # 0.3
    test_every: int = 20,
    warmup_duration: int = 500,
    fused_adam: bool = False,
    adv_class_scale: float = 0.1,
    lr_reduce_patience: int = 2,
    lr_reduce_factor: float = 0.6,
    lr_reduce_monitor: str = "val_loss",
    do_cls: bool = True,
    do_adv_batch: bool = False,
    run_full_forward: bool = False,
    lr: float = 0.0001,
    dropout: float = 0.1,
    optim: str = "adamW",
    weight_decay: float = 0.01,
    zinb_and_mse: bool = False,
    var_context_length: bool = False,
    name="",
    set_step: Optional[int] = None,
):
    """
    TrainingMode a callback to set the training specific info to the model.

    This is because lightning is unfortunately setup this way. the model should be separated from training
    but at the same time it has training specific methods... so we have to do this.

    Args:
        do_denoise (bool): Whether to apply denoising during training. Defaults to True.
        noise (List[float]): List of noise levels to apply if denoising is enabled. Defaults to [0.6], meaning only one forward path with 60% of the counts being dropped will happen.
        do_cce (bool): Whether to apply the Contrastive Cell Embedding from scGPT during training. Defaults to False.
        cce_temp (float): Similarity threshold for CCE. Defaults to 0.5.
        cce_scale (float): Scaling factor for CCE loss. Defaults to 0.002.
        do_ecs (bool): Whether to apply the Elastic Cell Similarity loss from scGPT during training. Defaults to False.
        ecs_threshold (float): Threshold for ECS. Defaults to 0.3.
        ecs_scale (float): Scaling factor for ECS loss. Defaults to 0.05.
        do_mvc (bool): Whether to do the cell embedding generation with the scGPT's MVC loss. Defaults to False.
        mvc_scale (float): Scaling factor for MVC loss. Defaults to 1.0.
        do_adv_cls (bool): Whether to apply adversarial classification during training. Defaults to False.
        do_generate (bool): Whether to do the bottleneck learning task. Defaults to True.
        class_scale (float): Scaling factor for classification loss. Defaults to 1.5.
        mask_ratio (List[float]): List of mask ratios to apply during training. Defaults to [], meaning no masking is applied during pretraining.
        warmup_duration (int): Number of warmup steps for learning rate scheduling. Defaults to 500.
        fused_adam (bool): Whether to use fused Adam optimizer. Defaults to True.
        adv_class_scale (float): Scaling factor for adversarial classification loss. Defaults to 0.1.
        lr_reduce_patience (int): Number of epochs with no improvement after which learning rate will be reduced. Defaults to 1.
        lr_reduce_factor (float): Factor by which the learning rate will be reduced. Defaults to 0.6.
        lr_reduce_monitor (str): Quantity to be monitored for learning rate reduction. Defaults to "val_loss".
        do_cls (bool): Whether to perform classification during training. Defaults to True.
        do_adv_batch (bool): Whether to apply adversarial batch training. Defaults to False.
        run_full_forward (bool): Whether to run a second forward pass without masking or denoising for the bottleneck learning / MVC case. Defaults to False.
        lr (float): Initial learning rate. Defaults to 0.001.
        optim (str): Optimizer to use during training. Defaults to "adamW".
        weight_decay (float): Weight decay to apply during optimization. Defaults to 0.01.
        name (str): Name of the training mode. Defaults to an empty string. should be an ID for the model
        test_every (int): Number of epochs between testing. Defaults to 1.
        class_embd_diss_scale (float): Scaling factor for the class embedding dissimilarity loss. Defaults to 0.1.
        zinb_and_mse (bool): Whether to use ZINB and MSE loss. Defaults to False.
        var_context_length (bool): Whether to use variable context length. Defaults to False.
        dropout (float): Dropout rate for the model. Defaults to 0.1.
        set_step (int, optional): Set the global step for the model. Defaults to None.
    """
    super().__init__()
    self.do_denoise = do_denoise
    self.noise = noise
    self.do_cce = do_cce
    self.cce_temp = cce_temp
    self.cce_scale = cce_scale
    self.do_ecs = do_ecs
    self.ecs_threshold = ecs_threshold
    self.ecs_scale = ecs_scale
    self.do_mvc = do_mvc
    self.do_adv_cls = do_adv_cls
    self.do_next_tp = do_next_tp
    self.do_generate = do_generate
    self.class_scale = class_scale
    self.mask_ratio = mask_ratio
    self.warmup_duration = warmup_duration
    self.fused_adam = fused_adam
    self.mvc_scale = mvc_scale
    self.do_cls = do_cls
    self.adv_class_scale = adv_class_scale
    self.lr_reduce_patience = lr_reduce_patience
    self.lr_reduce_factor = lr_reduce_factor
    self.lr_reduce_monitor = lr_reduce_monitor
    self.lr = lr
    self.optim = optim
    self.weight_decay = weight_decay
    self.do_adv_batch = do_adv_batch
    self.run_full_forward = run_full_forward
    self.name = name
    self.test_every = test_every
    self.class_embd_diss_scale = class_embd_diss_scale
    self.zinb_and_mse = zinb_and_mse
    self.var_context_length = var_context_length
    self.dropout = dropout
    self.set_step = set_step
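
For reference, a minimal usage sketch (assuming the import path from the module name above; the Trainer import may be pytorch_lightning in older Lightning versions):

from lightning.pytorch import Trainer

from scprint.trainer.trainer import TrainingMode

# Configure training-specific behavior through the callback
training_mode = TrainingMode(
    do_denoise=True,
    noise=[0.6],
    lr=1e-4,
    warmup_duration=500,
)
trainer = Trainer(callbacks=[training_mode])
# trainer.fit(model, datamodule=datamodule)  # model / datamodule defined elsewhere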