Documentation for the CLI modules

scprint.__main__

Entry point for scprint.

MySaveConfig

Bases: SaveConfigCallback

MySaveConfig is a subclass of SaveConfigCallback that further parametrizes the wandb logger in CLI mode.

scprint.cli

MyCLI

Bases: LightningCLI

MyCLI is a subclass of LightningCLI that adds some missing parameters and creates bindings between the parameters of the model and those of the data.

It allows calling denoise / embed / gninfer from the command line. It also adds more parameters and links parameters between the scdataloader and the scPRINT model.

scprint.trainer.trainer

TrainingMode

Bases: Callback

TrainingMode is a callback that sets the training-specific settings on the model.

This is needed because of how Lightning is set up: the model should be separate from training, yet it also carries training-specific methods, so the settings have to be passed to it this way.

Parameters:
  • do_denoise (bool, default: True ) –

    Whether to apply denoising during training. Defaults to True.

  • noise (List[float], default: [0.6] ) –

    List of noise levels to apply if denoising is enabled. Defaults to [0.6], meaning a single forward pass is run, with 60% of the counts dropped.

  • do_cce (bool, default: False ) –

    Whether to apply the Contrastive Cell Embedding from scGPT during training. Defaults to False.

  • cce_sim (float, default: 0.5 ) –

    Similarity threshold for CCE. Defaults to 0.5.

  • cce_scale (float, default: 0.002 ) –

    Scaling factor for CCE loss. Defaults to 0.002.

  • do_ecs (bool, default: False ) –

    Whether to apply the Elastic Cell Similarity loss from scGPT during training. Defaults to False.

  • ecs_threshold (float, default: 0.3 ) –

    Threshold for ECS. Defaults to 0.3.

  • ecs_scale (float, default: 0.05 ) –

    Scaling factor for ECS loss. Defaults to 0.05.

  • do_mvc (bool, default: False ) –

    Whether to generate the cell embedding with scGPT's MVC loss. Defaults to False.

  • mvc_scale (float, default: 1.0 ) –

    Scaling factor for MVC loss. Defaults to 1.0.

  • do_adv_cls (bool, default: False ) –

    Whether to apply adversarial classification during training. Defaults to False.

  • do_generate (bool, default: True ) –

    Whether to do the bottleneck learning task. Defaults to True.

  • class_scale (float, default: 1.5 ) –

    Scaling factor for classification loss. Defaults to 1.5.

  • mask_ratio (List[float], default: [] ) –

    List of mask ratios to apply during training. Defaults to [], meaning no masking is applied during pretraining.

  • warmup_duration (int, default: 500 ) –

    Number of warmup steps for learning rate scheduling. Defaults to 500.

  • fused_adam (bool, default: False ) –

    Whether to use the fused Adam optimizer. Defaults to False.

  • adv_class_scale (float, default: 0.1 ) –

    Scaling factor for adversarial classification loss. Defaults to 0.1.

  • lr_reduce_patience (int, default: 1 ) –

    Number of epochs with no improvement after which learning rate will be reduced. Defaults to 1.

  • lr_reduce_factor (float, default: 0.6 ) –

    Factor by which the learning rate will be reduced. Defaults to 0.6.

  • lr_reduce_monitor (str, default: 'val_loss' ) –

    Quantity to be monitored for learning rate reduction. Defaults to "val_loss".

  • do_cls (bool, default: True ) –

    Whether to perform classification during training. Defaults to True.

  • do_adv_batch (bool, default: False ) –

    Whether to apply adversarial batch training. Defaults to False.

  • run_full_forward (bool, default: False ) –

    Whether to run a second forward pass without masking or denoising for the bottleneck learning / MVC case. Defaults to False.

  • lr (float, default: 0.001 ) –

    Initial learning rate. Defaults to 0.001.

  • optim (str, default: 'adamW' ) –

    Optimizer to use during training. Defaults to "adamW".

  • weight_decay (float, default: 0.01 ) –

    Weight decay to apply during optimization. Defaults to 0.01.

  • name (str, default: '' ) –

    Name of the training mode; it should be an ID for the model. Defaults to an empty string.

  • test_every (int, default: 1 ) –

    Number of epochs between testing. Defaults to 1.
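To make the `noise` setting concrete, here is a minimal sketch of dropping a fraction of the counts in an expression profile by binomial thinning. It is an illustration only: the helper name `drop_counts` and the thinning scheme are assumptions, and scPRINT's actual downsampling routine may work differently.

```python
import random
from typing import List, Optional


def drop_counts(counts: List[int], noise: float, seed: Optional[int] = None) -> List[int]:
    """Hypothetical helper: keep each unit count with probability (1 - noise)."""
    if not 0.0 <= noise <= 1.0:
        raise ValueError("noise must be between 0 and 1")
    rng = random.Random(seed)
    keep_prob = 1.0 - noise
    # Binomial thinning: every unit count is kept or dropped independently.
    return [sum(1 for _ in range(c) if rng.random() < keep_prob) for c in counts]


profile = [10, 0, 25, 3, 120]
# noise=[0.6] would correspond to a single noisy version like this one,
# in which roughly 60% of the counts are expected to be removed.
noisy = drop_counts(profile, noise=0.6, seed=0)
print(noisy)
```

With several values in `noise`, one would presumably produce one such noisy profile per value and run a forward pass on each.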

Source code in scprint/trainer/trainer.py
def __init__(
    self,
    do_denoise: bool = True,
    noise: List[float] = [0.6],
    do_cce: bool = False,
    cce_sim: float = 0.5,  # .6
    cce_scale: float = 0.002,  # .01
    do_ecs: bool = False,
    ecs_threshold: float = 0.3,
    ecs_scale: float = 0.05,  # .1
    do_mvc: bool = False,
    mvc_scale: float = 1.0,
    do_adv_cls: bool = False,
    do_next_tp: bool = False,
    do_generate: bool = True,
    class_scale: float = 1.5,
    mask_ratio: List[float] = [],  # 0.3
    test_every: int = 1,
    warmup_duration: int = 500,
    fused_adam: bool = False,
    adv_class_scale: float = 0.1,
    lr_reduce_patience: int = 1,
    lr_reduce_factor: float = 0.6,
    lr_reduce_monitor: str = "val_loss",
    do_cls: bool = True,
    do_adv_batch: bool = False,
    run_full_forward: bool = False,
    lr: float = 0.001,
    optim: str = "adamW",
    weight_decay: float = 0.01,
    name: str = "",
):
    """
    TrainingMode is a callback that sets the training-specific settings on the model.

    This is needed because of how Lightning is set up: the model should be separate from training,
    yet it also carries training-specific methods, so the settings have to be passed to it this way.

    Args:
        do_denoise (bool): Whether to apply denoising during training. Defaults to True.
        noise (List[float]): List of noise levels to apply if denoising is enabled. Defaults to [0.6], meaning only one forward path with 60% of the counts being dropped will happen.
        do_cce (bool): Whether to apply the Contrastive Cell Embedding from scGPT during training. Defaults to False.
        cce_sim (float): Similarity threshold for CCE. Defaults to 0.5.
        cce_scale (float): Scaling factor for CCE loss. Defaults to 0.002.
        do_ecs (bool): Whether to apply the Elastic Cell Similarity loss from scGPT during training. Defaults to False.
        ecs_threshold (float): Threshold for ECS. Defaults to 0.3.
        ecs_scale (float): Scaling factor for ECS loss. Defaults to 0.05.
        do_mvc (bool): Whether to do the cell embedding generation with the scGPT's MVC loss. Defaults to False.
        mvc_scale (float): Scaling factor for MVC loss. Defaults to 1.0.
        do_adv_cls (bool): Whether to apply adversarial classification during training. Defaults to False.
        do_generate (bool): Whether to do the bottleneck learning task. Defaults to True.
        class_scale (float): Scaling factor for classification loss. Defaults to 1.5.
        mask_ratio (List[float]): List of mask ratios to apply during training. Defaults to [], meaning no masking is applied during pretraining.
        warmup_duration (int): Number of warmup steps for learning rate scheduling. Defaults to 500.
        fused_adam (bool): Whether to use the fused Adam optimizer. Defaults to False.
        adv_class_scale (float): Scaling factor for adversarial classification loss. Defaults to 0.1.
        lr_reduce_patience (int): Number of epochs with no improvement after which learning rate will be reduced. Defaults to 1.
        lr_reduce_factor (float): Factor by which the learning rate will be reduced. Defaults to 0.6.
        lr_reduce_monitor (str): Quantity to be monitored for learning rate reduction. Defaults to "val_loss".
        do_cls (bool): Whether to perform classification during training. Defaults to True.
        do_adv_batch (bool): Whether to apply adversarial batch training. Defaults to False.
        run_full_forward (bool): Whether to run a second forward pass without masking or denoising for the bottleneck learning / MVC case. Defaults to False.
        lr (float): Initial learning rate. Defaults to 0.001.
        optim (str): Optimizer to use during training. Defaults to "adamW".
        weight_decay (float): Weight decay to apply during optimization. Defaults to 0.01.
        name (str): Name of the training mode; it should be an ID for the model. Defaults to an empty string.
        test_every (int): Number of epochs between testing. Defaults to 1.
    """
    super().__init__()
    self.do_denoise = do_denoise
    self.noise = noise
    self.do_cce = do_cce
    self.cce_sim = cce_sim
    self.cce_scale = cce_scale
    self.do_ecs = do_ecs
    self.ecs_threshold = ecs_threshold
    self.ecs_scale = ecs_scale
    self.do_mvc = do_mvc
    self.do_adv_cls = do_adv_cls
    self.do_next_tp = do_next_tp
    self.do_generate = do_generate
    self.class_scale = class_scale
    self.mask_ratio = mask_ratio
    self.warmup_duration = warmup_duration
    self.fused_adam = fused_adam
    self.mvc_scale = mvc_scale
    self.do_cls = do_cls
    self.adv_class_scale = adv_class_scale
    self.lr_reduce_patience = lr_reduce_patience
    self.lr_reduce_factor = lr_reduce_factor
    self.lr_reduce_monitor = lr_reduce_monitor
    self.lr = lr
    self.optim = optim
    self.weight_decay = weight_decay
    self.do_adv_batch = do_adv_batch
    self.run_full_forward = run_full_forward
    self.name = name
    self.test_every = test_every
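The constructor only stores the settings; for them to have any effect they still need to reach the model. A minimal sketch of that hand-off follows, using stand-in classes rather than Lightning itself. The names `SketchTrainingMode` and `DummyModel` are hypothetical, and copying the attributes in a hook modelled on Lightning's `on_fit_start(trainer, pl_module)` is an assumption about how this could be done, not a confirmed description of the library's implementation.

```python
class DummyModel:
    """Stand-in for the Lightning module; it starts without training settings."""


class SketchTrainingMode:
    """Hypothetical, trimmed-down callback holding a few training settings."""

    def __init__(self, do_denoise=True, noise=None, lr=0.001, name=""):
        self.do_denoise = do_denoise
        # Avoid sharing one mutable default list between instances.
        self.noise = [0.6] if noise is None else list(noise)
        self.lr = lr
        self.name = name

    def on_fit_start(self, trainer, model):
        # Copy every stored setting onto the model before training begins.
        for key, value in vars(self).items():
            setattr(model, key, value)


model = DummyModel()
callback = SketchTrainingMode(lr=0.0005, name="run-1")
callback.on_fit_start(trainer=None, model=model)
print(model.do_denoise, model.noise, model.lr, model.name)
```

Keeping the settings on a callback means the same model class can be reused for inference without carrying training configuration in its own constructor.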