Documentation for `GRNAnnData` module

`grnndata.GRNAnnData`

Bases: AnnData

An AnnData object with a GRN matrix in varp["GRN"]

Parameters:

grn (Optional[csr_matrix | ndarray], default: None ) –

`scipy.sparse.csr_matrix | np.ndarray`: a matrix in which zeros mark absent edges and non-zero values mark edges. The matrix should be square, and its rows and columns should correspond to the genes in the AnnData object. Row indices correspond to regulator genes and column indices to target genes, so the position of a non-zero entry gives the direction of the edge.

See https://anndata.readthedocs.io for more information on AnnData objects.
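As a minimal sketch of the orientation described above, the following builds a small dense matrix for three hypothetical genes (`TF1`, `TF2`, `G3` are made-up names) and reads edges off it by row and by column. It uses only NumPy and pandas; the matrix is the kind of object that would be passed as `grn=`.

```python
import numpy as np
import pandas as pd

genes = ["TF1", "TF2", "G3"]  # hypothetical gene names, for illustration only

# Rows are regulators, columns are targets:
# grn[i, j] != 0 means gene i regulates gene j with that weight.
grn = np.array(
    [
        [0.0, 0.8, 0.5],  # TF1 -> TF2, TF1 -> G3
        [0.0, 0.0, 0.3],  # TF2 -> G3
        [0.0, 0.0, 0.0],  # G3 regulates nothing
    ]
)
assert grn.shape == (len(genes), len(genes))  # the matrix must be square

# Label the matrix with gene names to make lookups readable.
labelled = pd.DataFrame(grn, index=genes, columns=genes)

row = labelled.loc["TF1"]  # outgoing edges of TF1
targets_of_tf1 = row[row != 0].index.tolist()

col = labelled["G3"]  # incoming edges of G3
regulators_of_g3 = col[col != 0].index.tolist()

print(targets_of_tf1)
print(regulators_of_g3)
```

Reading along a row lists what a gene regulates; reading down a column lists what regulates it.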

Methods:

Name	Description
`concat`	Concatenate two GRNAnnData objects.
`extract_links`	Extract the scores stored in `varp["GRN"]` and return them as a pandas DataFrame.
`get`	Get a sub-GRNAnnData object with only the specified genes.
`plot_subgraph`	Plot a subgraph of the gene regulatory network (GRN) centered around a seed gene.

Attributes:

Name	Description
`grn`	Property that returns the gene regulatory network (GRN) as a pandas DataFrame.
`regulators`	Outputs the regulators' connections of the GRN as a pandas DataFrame.
`targets`	Outputs the targets' connections of the GRN as a pandas DataFrame.

Source code in grnndata/GRNAnnData.py

def __init__(
    self,
    *args,
    grn: Optional[scipy.sparse.csr_matrix | np.ndarray] = None,
    **kwargs,
):
    """An AnnData object with a GRN matrix in varp["GRN"]

    Args:
        grn: scipy.sparse.csr_matrix | np.ndarray a matrix in which zeros mark
            absent edges and non-zero values mark edges. The matrix should be
            square and the rows and columns should correspond to the genes in
            the AnnData object. Row indices correspond to regulator genes and
            column indices to target genes, so the position of a non-zero
            entry gives the direction of the edge.

    See https://anndata.readthedocs.io for more information on AnnData objects
    """
    # if isinstance(args[0], AnnData) and "GRN" in args[0].varp:
    #    args[0] = args[0].copy()
    #     grn = args[0].varp["GRN"]
    # elif grn is None:
    #    raise ValueError("grn argument must be provided")
    super(GRNAnnData, self).__init__(*args, **kwargs)
    self.varp["GRN"] = grn

`grn` `property`

Property that returns the gene regulatory network (GRN) as a pandas DataFrame. The index and columns of the DataFrame are the gene names stored in 'var_names'.

Returns:	– pd.DataFrame: The GRN as a DataFrame with gene names as index and columns.

`regulators` `property`

regulators outputs the regulators' connections of the GRN as a pandas DataFrame.

Returns:	– pd.DataFrame: The regulators of the GRN as a DataFrame with gene names as index and columns.

`targets` `property`

targets outputs the targets' connections of the GRN as a pandas DataFrame.

Returns:	– pd.DataFrame: The targets of the GRN as a DataFrame with gene names as index and columns.

`concat`

Concatenate two GRNAnnData objects.

Parameters:	`other` (`GRNAnnData`) – The other GRNAnnData object to concatenate with

Raises:	`ValueError` – Can only concatenate with another GRNAnnData object

Returns:	`GRNAnnData` – The concatenated GRNAnnData object

Source code in grnndata/GRNAnnData.py

def concat(self, other):
    """
    concat two GRNAnnData objects

    Args:
        other (GRNAnnData): The other GRNAnnData object to concatenate with

    Raises:
        ValueError: Can only concatenate with another GRNAnnData object

    Returns:
        GRNAnnData: The concatenated GRNAnnData object
    """
    if not isinstance(other, GRNAnnData):
        raise ValueError("Can only concatenate with another GRNAnnData object")
    return GRNAnnData(
        self.concatenate(other),
        grn=scipy.sparse.vstack([self.varp["GRN"], other.varp["GRN"]]),
    )
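The following is a minimal shape check, assuming two hypothetical 3-gene networks, of what the `scipy.sparse.vstack` call in the listing above produces. Stacking two n × n matrices vertically gives a 2n × n matrix, whereas AnnData describes `varp` entries as having their first two dimensions equal to the number of variables. The element-wise mean at the end is only one assumed way of getting back to a square matrix; it is not taken from the library.

```python
import numpy as np
import scipy.sparse

n_genes = 3  # hypothetical number of genes shared by both objects

grn_a = scipy.sparse.csr_matrix(np.eye(n_genes))
grn_b = scipy.sparse.csr_matrix(np.ones((n_genes, n_genes)))

# What the vstack call returns: the rows of grn_b appended below grn_a.
stacked = scipy.sparse.vstack([grn_a, grn_b])
print(stacked.shape)  # (6, 3)

# Assumption for illustration: average the two networks element-wise
# so that the result stays n_genes x n_genes.
merged = (grn_a + grn_b) / 2
print(merged.shape)  # (3, 3)
```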

`extract_links`

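This function extracts the non-zero scores from `varp["GRN"]` and returns them as a pandas DataFrame.

The resulting DataFrame has the following structure:

TF	Gene	Score
A	B	5
C	D	8

Here 'TF' and 'Gene' are the indices of the regulator and target genes in the regulatory network, and 'Score' is the corresponding weight. With the default `columns`, these are named 'regulator', 'target' and 'weight'. Rows are sorted by weight in descending order.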

Parameters:	`columns` (`list`, default: `['regulator', 'target', 'weight']` ) – The names of the columns in the resulting DataFrame. Defaults to ['regulator', 'target', 'weight'].

Returns:	– pd.DataFrame: The extracted scores as a DataFrame.

Source code in grnndata/GRNAnnData.py

def extract_links(
    self,
    columns: list = [
        "regulator",
        "target",
        "weight",
    ],  # output col names (e.g. 'TF', 'gene', 'score')
):
    """
    This function extracts scores from anndata.varp['GRN'] and returns them as a pandas DataFrame.

    The resulting DataFrame has the following structure:
        TF   Gene   Score
        A    B      5
        C    D      8

    Where 'TF' and 'Gene' are the indices of the genes in the regulatory network, and 'Score' is the corresponding weight.

    Args:
        columns (list, optional): The names of the columns in the resulting DataFrame. Defaults to ['regulator', 'target', 'weight'].

    Returns:
        pd.DataFrame: The extracted scores as a DataFrame.
    """
    return pd.DataFrame(
        [
            (regulator, target, weight)
            for regulator, row in enumerate(self.varp["GRN"])
            for target, weight in enumerate(row)
            if (isinstance(row, np.ndarray) and weight != 0)
            or (scipy.sparse.issparse(row) and row.getnnz() > 0 and weight != 0)
        ],
        columns=columns,
    ).sort_values(by=columns[2], ascending=False)
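As an alternative sketch of the same idea (not the library's implementation), the edge list can be read from a COO view of the matrix, which handles dense arrays and sparse matrices the same way. The helper name `links_from_matrix` is hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.sparse


def links_from_matrix(grn, columns=("regulator", "target", "weight")):
    """Hypothetical helper: list non-zero entries as (row, column, value)."""
    # coo_matrix accepts dense arrays as well as other sparse formats.
    coo = scipy.sparse.coo_matrix(grn)
    links = pd.DataFrame(
        {columns[0]: coo.row, columns[1]: coo.col, columns[2]: coo.data}
    )
    # Sparse formats may hold explicitly stored zeros; drop them.
    links = links[links[columns[2]] != 0]
    # Strongest connections first, as in extract_links.
    return links.sort_values(by=columns[2], ascending=False).reset_index(drop=True)


dense = np.array([[0, 5, 0], [0, 0, 8], [0, 0, 0]])
links = links_from_matrix(dense)
same = links_from_matrix(scipy.sparse.csr_matrix(dense))
print(links)
```

The `regulator` and `target` columns hold integer positions, so they would still need mapping through the gene names to be human-readable.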

`get`

Get a sub-GRNAnnData object with only the specified genes.

Parameters:	`elem` (`str | list`) – The gene names to include in the sub-GRNAnnData object

Returns:	`GRNAnnData` – The sub-GRNAnnData object with only the specified genes

Source code in grnndata/GRNAnnData.py

def get(self, elem: str | list[str]) -> "GRNAnnData":
    """
    get a sub-GRNAnnData object with only the specified genes

    Args:
        elem (str | list): The gene names to include in the sub-GRNAnnData object

    Returns:
        GRNAnnData: The sub-GRNAnnData object with only the specified genes
    """
    if type(elem) is str:
        elem = [elem]
    loc = self.var.index.isin(elem)
    reg = self.varp["GRN"][loc][:, loc]
    if len(reg.shape) == 1:
        reg = np.array([reg])
    sub = GRNAnnData(X=self.X[:, loc], obs=self.obs, var=self.var[loc], grn=reg)
    sub.varm["Targets"] = self.varp["GRN"][loc]
    sub.varm["Regulators"] = self.varp["GRN"].T[loc]
    sub.uns["regulated_genes"] = self.var.index.tolist()
    return sub
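The indexing used in the listing above can be sketched with plain NumPy and pandas. The gene names and matrix values here are hypothetical; the three slices correspond to the sub-network, the `Targets` rows and the `Regulators` rows that `get` stores.

```python
import numpy as np
import pandas as pd

var_names = pd.Index(["TF1", "TF2", "G3", "G4"])  # hypothetical gene names
grn = np.arange(16, dtype=float).reshape(4, 4)    # made-up weights

wanted = ["TF2", "G4"]
loc = var_names.isin(wanted)  # boolean mask over all genes

sub_grn = grn[loc][:, loc]  # selected rows, then selected columns
targets = grn[loc]          # outgoing edges of the selected genes to every gene
regulators = grn.T[loc]     # incoming edges of the selected genes from every gene

print(sub_grn)
print(targets.shape, regulators.shape)
```

Because the selection is a boolean mask, the result follows the gene order of the original object rather than the order of the requested names.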

`plot_subgraph`

plot_subgraph plots a subgraph of the gene regulatory network (GRN) centered around a seed gene.

Parameters:

`seed` (`str | list`) – The seed gene or list of genes around which the subgraph will be centered.

`gene_col` (`str`, default: `'symbol'`) – The column name in the .var DataFrame that contains gene identifiers.

`max_genes` (`int`, default: `10`) – The maximum number of genes to include in the subgraph.

`only` (`float`, default: `0.3`) – The threshold for filtering connections. If less than 1, it is used as a minimum weight threshold. If 1 or greater, it is used as the number of top connections to retain.

`palette` (`list`, default: `base_color_palette`) – The color palette to use for plotting.

`interactive` (`bool`, default: `True`) – Whether to create an interactive plot.

`do_enr` (`bool`, default: `False`) – Whether to perform enrichment analysis on the subgraph.

Returns:	– d3graph or networkx.DiGraph: The d3graph object if interactive is True, otherwise the networkx DiGraph that was drawn.
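The two meanings of `only` can be sketched as follows. This is an illustrative helper (`filter_edges` is a hypothetical name) that follows the parameter description rather than reproducing the library's code, and the gene names and weights are made up.

```python
import numpy as np
import pandas as pd


def filter_edges(mat: pd.DataFrame, only: float) -> pd.DataFrame:
    """Hypothetical helper showing the two interpretations of `only`."""
    if only < 1:
        # Minimum-weight threshold: zero out anything weaker than `only`.
        return mat.where(mat >= only, 0.0)
    # Top-N mode: keep only the N strongest connections.
    values = mat.to_numpy()
    top_flat = np.argsort(values, axis=None)[::-1][: int(only)]
    rows, cols = np.unravel_index(top_flat, values.shape)
    kept = np.zeros_like(values)
    kept[rows, cols] = values[rows, cols]
    return pd.DataFrame(kept, index=mat.index, columns=mat.columns)


genes = ["A", "B", "C"]  # hypothetical gene names
mat = pd.DataFrame(
    [[0.0, 0.9, 0.2], [0.1, 0.0, 0.6], [0.4, 0.05, 0.0]],
    index=genes,
    columns=genes,
)

by_weight = filter_edges(mat, 0.3)  # keeps 0.9, 0.6 and 0.4
top_two = filter_edges(mat, 2)      # keeps 0.9 and 0.6 only
print(by_weight)
print(top_two)
```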

Source code in grnndata/GRNAnnData.py

def plot_subgraph(
    self,
    seed: str,
    gene_col: str = "symbol",
    max_genes: int = 10,
    only: float = 0.3,
    palette: list = base_color_palette,
    interactive: bool = True,
    do_enr: bool = False,
    **kwargs: dict,
):
    """
    plot_subgraph plots a subgraph of the gene regulatory network (GRN) centered around a seed gene.

    Args:
        seed (str or list): The seed gene or list of genes around which the subgraph will be centered.
        gene_col (str, optional): The column name in the .var DataFrame that contains gene identifiers. Defaults to "symbol".
        max_genes (int, optional): The maximum number of genes to include in the subgraph. Defaults to 10.
        only (float, optional): The threshold for filtering connections. If less than 1, it is used as a minimum weight threshold. If 1 or greater, it is used as the number of top connections to retain. Defaults to 0.3.
        palette (list, optional): The color palette to use for plotting. Defaults to base_color_palette.
        interactive (bool, optional): Whether to create an interactive plot. Defaults to True.
        do_enr (bool, optional): Whether to perform enrichment analysis on the subgraph. Defaults to False.

    Returns:
        d3graph or nx.DiGraph: The d3graph object if interactive is True, otherwise the networkx DiGraph that was drawn.
    """
    rn = {k: v for k, v in self.var[gene_col].items()}
    if type(seed) is str:
        gene_id = self.var[self.var[gene_col] == seed].index[0]
        elem = self.grn.loc[gene_id].sort_values(ascending=False).head(
            max_genes
        ).index.tolist() + [gene_id]
    else:
        elem = seed

    mat = self.grn.loc[elem, elem].rename(columns=rn).rename(index=rn)
    if only < 1:
        mat[mat < only] = 0
    else:
        top_connections = mat.stack().nlargest(only)
        top_connections.index.names = ["Gene1", "Gene2"]
        top_connections.name = "Weight"
        top_connections = top_connections.reset_index()
        mat.index.name += "_2"
        # Set anything not in the top N connections to 0
        mask = mat.stack().isin(
            top_connections.set_index(["Gene1", "Gene2"])["Weight"]
        )
        mat[~mask.unstack()] = 0
    mat = mat * 100
    color = [palette[0]] * len(mat)
    if type(seed) is str:
        color[mat.columns.get_loc(seed)] = palette[1]
    mat = mat.T
    if interactive:
        d3 = d3graph()
        d3.graph(mat, color=None)
        d3.set_node_properties(color=color, fontcolor="#000000", **kwargs)
        d3.set_edge_properties(directed=True)
        d3.show(notebook=True)
        return d3
    else:
        # Create a graph from the DataFrame
        G = nx.from_pandas_adjacency(mat, create_using=nx.DiGraph())
        # Draw the graph
        plt.figure(figsize=(15, 15))  # Increase the size of the plot
        nx.draw(G, with_labels=True, arrows=True)
        plt.show()
    if do_enr:
        enr = gp.enrichr(
            gene_list=list(G.nodes),
            gene_sets=[
                "KEGG_2021_Human",
                "MSigDB_Hallmark_2020",
                "Reactome_2022",
                "Tabula_Sapiens",
                "WikiPathway_2023_Human",
                "TF_Perturbations_Followed_by_Expression",
                "Reactome",
                "PPI_Hub_Proteins",
                "OMIM_Disease",
                "GO_Molecular_Function_2023",
            ],
            organism="Human",  # change accordingly
            # description='pathway',
            # cutoff=0.08, # test dataset, use lower value for real case
            background=self.var.symbol.tolist(),
        )
        print(enr.res2d.head(20))
    return G
