Documentation for the `utils` modules

`scprint2.utils.sinkhorn`

Classes:

Name	Description
`SinkhornDistance`

`SinkhornDistance`

Bases: Module

SinkhornDistance Initialize the SinkhornDistance class

Parameters:	`eps` (`float`, default: `0.01` ) – Regularization parameter. Defaults to 1e-2. `max_iter` (`int`, default: `100` ) – Maximum number of Sinkhorn iterations. Defaults to 100. `reduction` (`str`, default: `'none'` ) – Specifies the reduction to apply to the output. Defaults to "none".

Methods:

Name	Description
`M`	Modified cost for logarithmic updates
`ave`	Barycenter subroutine, used by kinetic acceleration through extrapolation.
`forward`	forward Compute the Sinkhorn distance between two measures with cost matrix c

Source code in scprint2/utils/sinkhorn.py

def __init__(self, eps: float = 1e-2, max_iter: int = 100, reduction: str = "none"):
    """
    SinkhornDistance Initialize the SinkhornDistance class

    Args:
        eps (float, optional): Regularization parameter. Defaults to 1e-2.
        max_iter (int, optional): Maximum number of Sinkhorn iterations. Defaults to 100.
        reduction (str, optional): Specifies the reduction to apply to the output. Defaults to "none".
    """
    super(SinkhornDistance, self).__init__()
    self.eps = eps
    self.max_iter = max_iter
    self.reduction = reduction

`M`

Modified cost for logarithmic updates

Source code in scprint2/utils/sinkhorn.py

def M(self, C, u, v):
    "Modified cost for logarithmic updates"
    """$M_{ij} = (-c_{ij} + u_i + v_j) / epsilon$"""
    return (-C + u.unsqueeze(-1) + v.unsqueeze(1)) / self.eps

`ave` `staticmethod`

Barycenter subroutine, used by kinetic acceleration through extrapolation.

Source code in scprint2/utils/sinkhorn.py

@staticmethod
def ave(u, u1, tau):
    "Barycenter subroutine, used by kinetic acceleration through extrapolation."
    return tau * u + (1 - tau) * u1

`forward`

forward Compute the Sinkhorn distance between two measures with cost matrix c

Parameters:	`c` (`Tensor`) – The cost matrix between the two measures.

Returns:	`Tensor` – torch.Tensor: The computed Sinkhorn distance.

Source code in scprint2/utils/sinkhorn.py

def forward(self, c: torch.Tensor) -> torch.Tensor:
    """
    forward Compute the Sinkhorn distance between two measures with cost matrix c

    Args:
        c (torch.Tensor): The cost matrix between the two measures.

    Returns:
        torch.Tensor: The computed Sinkhorn distance.
    """
    C = -c
    x_points = C.shape[-2]
    batch_size = C.shape[0]

    # both marginals are fixed with equal weights
    mu = (
        torch.empty(
            batch_size,
            x_points,
            dtype=C.dtype,
            requires_grad=False,
            device=C.device,
        )
        .fill_(1.0 / x_points)
        .squeeze()
    )
    nu = (
        torch.empty(
            batch_size,
            x_points,
            dtype=C.dtype,
            requires_grad=False,
            device=C.device,
        )
        .fill_(1.0 / x_points)
        .squeeze()
    )
    u = torch.zeros_like(mu)
    v = torch.zeros_like(nu)

    # Stopping criterion
    thresh = 1e-12

    # Sinkhorn iterations
    for i in range(self.max_iter):
        if i % 2 == 0:
            u1 = u  # useful to check the update
            u = (
                self.eps
                * (torch.log(mu) - torch.logsumexp(self.M(C, u, v), dim=-1))
                + u
            )
            err = (u - u1).abs().sum(-1).mean()
        else:
            v = (
                self.eps
                * (
                    torch.log(nu)
                    - torch.logsumexp(self.M(C, u, v).transpose(-2, -1), dim=-1)
                )
                + v
            )
            v = v.detach().requires_grad_(False)
            v[v > 9 * 1e8] = 0.0
            v = v.detach().requires_grad_(True)

        if err.item() < thresh:
            break

    U, V = u, v
    # Transport plan pi = diag(a)*K*diag(b)
    pi = torch.exp(self.M(C, U, V))

    # Sinkhorn distance

    return pi, C, U, V

`scprint2.utils.utils`

Functions:

Name	Description
`add_points`	parts of volcano plot
`category_str2int`	category_str2int converts a list of category strings to a list of category integers.
`correlationMatrix`	Make an interactive correlation matrix from an array using bokeh
`createFoldersFor`	will recursively create folders if needed until having all the folders required to save the file in this filepath
`fileToList`	loads an input file with a\n b\n.. into a list [a,b,..]
`get_free_gpu`	get_free_gpu finds the GPU with the most free memory using nvidia-smi.
`get_git_commit`	get_git_commit gets the current git commit hash.
`heatmap`	Make an interactive heatmap from a dataframe using bokeh
`inf_loop`	wrapper function for endless data loader.
`isnotebook`	check whether excuting in jupyter notebook.
`listToFile`	listToFile loads a list with [a,b,..] into an input file a\n b\n..
`prepare_device`	setup GPU device if available. get gpu device indices which are used for DataParallel
`run_command`	run_command runs a command in the shell and prints the output.
`selector`	Part of Volcano plot: A function to separate tfs from everything else
`set_seed`	set random seed.
`subset_h5ad_by_format`	Create new anndata object according to slot info specifications.
`volcano`	Make an interactive volcano plot from Differential Expression analysis tools outputs