scprint
  • Home
  • structure
  • pre-training
  • usage example

example notebooks

  • scPRINT use case on BPH
  • scPRINT use case on BPH (part 2, GN analysis)

documentation

  • model
  • tasks
  • cli
  • embedders
  • utils
scprint
  • example notebooks
  • scPRINT use case on BPH (part 2, GN analysis)

scPRINT use case on BPH (part 2, GN analysis)¶

In this use-case, which some of the results are presented in Figure 5 of our manuscript, we perform an extensive analysis of gene networks generated by scPRINT in our previous notebook for fibroblasts of the prostate in both normal and benign prostatic hyperplasia states.

We can look at many patterns from the gene networks such as hub genes, gene modules, and gene-gene interactions. Some of them are related to highly variable and differentially expressed genes while others can only be infered by using the gene networks.

Table of content:

  1. Hub genes
  2. Network similarity
  3. Network communities
    1. Cancer version
    2. Normal version
  4. Differential Gene - Gene connections
  5. Using classification
In [1]:
Copied!
from bengrn import BenGRN
import scanpy as sc
from bengrn.base import train_classifier

from anndata.utils import make_index_unique
from grnndata import utils as grnutils
from grnndata import read_h5ad

from matplotlib import pyplot as plt
from scdataloader import utils as data_utils
import numpy as np

from pyvis import network as pnx
import networkx as nx
import scipy.sparse
import pandas as pd
import gseapy as gp
from gseapy import dotplot

%load_ext autoreload
%autoreload 2
from bengrn import BenGRN import scanpy as sc from bengrn.base import train_classifier from anndata.utils import make_index_unique from grnndata import utils as grnutils from grnndata import read_h5ad from matplotlib import pyplot as plt from scdataloader import utils as data_utils import numpy as np from pyvis import network as pnx import networkx as nx import scipy.sparse import pandas as pd import gseapy as gp from gseapy import dotplot %load_ext autoreload %autoreload 2
💡 connected lamindb: jkobject/scprint
In [ ]:
Copied!
prostate_combined = sc.read_h5ad("../data/prostate_combined_o2uniqsx.h5ad")
prostate_combined = sc.read_h5ad("../data/prostate_combined_o2uniqsx.h5ad")
In [2]:
Copied!
# load the generated gene networks from the previous notebook
grn = read_h5ad("../../data/prostate_fibro_grn.h5ad")
grn_c = read_h5ad("../../data/prostate_cancer_fibro_grn.h5ad")
#remove duplicates
grn.var.symbol = make_index_unique(grn.var.symbol.astype(str))
grn_c.var.symbol = make_index_unique(grn_c.var.symbol.astype(str))

# convert gene ids to symbols
grn.var['ensembl_id'] = grn.var.index
grn.var.index = grn.var.symbol
grn_c.var['ensembl_id'] = grn_c.var.index
grn_c.var.index = grn_c.var.symbol

grn.obs.cleaned_pred_disease_ontology_term_id.value_counts(),grn_c.obs.cleaned_pred_disease_ontology_term_id.value_counts()
# load the generated gene networks from the previous notebook grn = read_h5ad("../../data/prostate_fibro_grn.h5ad") grn_c = read_h5ad("../../data/prostate_cancer_fibro_grn.h5ad") #remove duplicates grn.var.symbol = make_index_unique(grn.var.symbol.astype(str)) grn_c.var.symbol = make_index_unique(grn_c.var.symbol.astype(str)) # convert gene ids to symbols grn.var['ensembl_id'] = grn.var.index grn.var.index = grn.var.symbol grn_c.var['ensembl_id'] = grn_c.var.index grn_c.var.index = grn_c.var.symbol grn.obs.cleaned_pred_disease_ontology_term_id.value_counts(),grn_c.obs.cleaned_pred_disease_ontology_term_id.value_counts()
Out[2]:
(cleaned_pred_disease_ontology_term_id
 normal    300
 Name: count, dtype: int64,
 cleaned_pred_disease_ontology_term_id
 benign prostatic hyperplasia    300
 Name: count, dtype: int64)

1. Hub genes¶

We will use 2 different definition of centrality to compare the hubs of the networks in both states. The first one is the degree centrality, which is the number of connections of a node. The second one is the eigenvector centrality, which is the sum of the centrality of the neighbors of a node.

For each gene we mostly interogate their meaning with genecards, e.g. https://www.genecards.org/cgi-bin/carddisp.pl?gene=CD99

In [7]:
Copied!
# between the two networks we have ~3000 genes in common over their 4000 genes
common = set(grn_c.var.symbol) & set(grn.var.symbol)
len(common)
# between the two networks we have ~3000 genes in common over their 4000 genes common = set(grn_c.var.symbol) & set(grn.var.symbol) len(common)
Out[7]:
2881
In [8]:
Copied!
# hub genes based on edge centrality (number of connections / strength of connections)
grn.grn.sum(0).sort_values(ascending=False).head(20)
# hub genes based on edge centrality (number of connections / strength of connections) grn.grn.sum(0).sort_values(ascending=False).head(20)
Out[8]:
symbol
S100A6            58.131020
TGIF2-RAB5IF      51.177635
MIF               44.534096
DNAJB9            40.953262
IGFBP7            38.629910
APOD              38.003906
BRME1             37.598698
SPARCL1           37.160854
TIMP1             36.671650
DCN               34.621376
C1S               32.746254
MGP               30.496748
nan-270           29.884596
SLC25A6           29.308729
BLOC1S5-TXNDC5    28.986681
CETN1             28.746281
FTL               28.266533
AKR1C1            27.641245
FBLN1             26.828733
MT2A              25.590197
dtype: float32
In [7]:
Copied!
# without taking in account genes that are not present in the cancer network
grn.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
# without taking in account genes that are not present in the cancer network grn.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
Out[7]:
symbol
TGIF2-RAB5IF      51.177635
IGFBP7            38.629910
APOD              38.003906
BRME1             37.598698
SPARCL1           37.160854
TIMP1             36.671650
DCN               34.621376
C1S               32.746254
MGP               30.496748
BLOC1S5-TXNDC5    28.986681
AKR1C1            27.641245
FBLN1             26.828733
MT2A              25.590197
MYOC              25.428144
FAM205A           25.296257
YBX3              24.937267
CST3              24.676405
C11orf96          18.731924
TFPI              18.429214
ATP6V0C           17.695705
dtype: float32
In [11]:
Copied!
# cancer hub genes
grn_c.grn.sum(0).sort_values(ascending=False).head(20)
# cancer hub genes grn_c.grn.sum(0).sort_values(ascending=False).head(20)
Out[11]:
symbol
HSPA1A          75.300415
MT2A            57.938267
CREM            42.478115
TGIF2-RAB5IF    39.980556
HSPE1           38.607624
CALD1           36.512047
SPOCK3          35.564945
HLA-A           33.446110
SPARCL1         33.403152
RBP1            32.482758
C1S             32.037743
BRME1-1         31.879770
FABP4           31.604069
nan-99          31.555256
LUM             30.523245
EIF4A1          29.690474
ATP6V0C         29.475002
PDZD2           27.333025
DEFA1           26.979454
CTSG            25.070097
dtype: float32
In [10]:
Copied!
# without taking in account genes not present in the normal network
grn_c.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
# without taking in account genes not present in the normal network grn_c.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
Out[10]:
symbol
HSPA1A            75.300415
MT2A              57.938267
TGIF2-RAB5IF      39.980556
SPOCK3            35.564945
HLA-A             33.446110
SPARCL1           33.403152
C1S               32.037743
nan-99            31.555256
LUM               30.523245
EIF4A1            29.690474
ATP6V0C           29.475002
DEFA1             26.979454
IGFBP7            23.982243
DCN               23.304916
MGP               22.596451
CD99              21.624668
BLOC1S5-TXNDC5    20.914230
SERPING1          20.824636
COL6A2            18.076559
THBS1             17.826540
dtype: float32
In [12]:
Copied!
# top differential hubs
TOP = 10
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
# top differential hubs TOP = 10 (set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[12]:
{'HLA-A', 'HSPA1A', 'MT2A', 'SPOCK3'}
In [13]:
Copied!
TOP = 20
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 20 (set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[13]:
{'ATP6V0C', 'DEFA1', 'EIF4A1', 'HLA-A', 'HSPA1A', 'LUM', 'SPOCK3', 'nan-99'}
In [14]:
Copied!
TOP = 50
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 50 (set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[14]:
{'CD99',
 'CPE',
 'DEFA1',
 'EIF4A1',
 'HLA-A',
 'HSPA1A',
 'LGALS1',
 'LUM',
 'PYDC2',
 'SERPING1',
 'SPOCK3',
 'THBS1',
 'nan-191',
 'nan-99'}

overall we see that both are somewhat similar in the way the networks classifies main nodes but at least around half of the top 200 nodes are different between the two datasets and half of those are not due to diffent gene being used by the networks (50)

genes preferentially given as top elements in the tumor version only:

  • CD99
  • CPE
  • DEFA1
  • EIF4A1
  • HLA-A
  • HSPA1A
  • LGALS1
  • LUM
  • PYDC2
  • SERPING1
  • SPOCK3
  • THBS1
In [3]:
Copied!
# we now compute eigen centrality creating a sparse network by only keeping the top 20 neighbors for each gene in the network
TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
# we now compute eigen centrality creating a sparse network by only keeping the top 20 neighbors for each gene in the network TOP = 20 grnutils.get_centrality(grn, TOP, top_k_to_disp=0) grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
Top central genes: []
Top central genes: []
Out[3]:
[]
In [12]:
Copied!
grn_c.var.centrality.sort_values(ascending=False).head(10)
grn_c.var.centrality.sort_values(ascending=False).head(10)
Out[12]:
symbol
HSPA1A     0.271584
MT2A       0.271584
CREM       0.271574
CD99       0.271517
NR4A3      0.271385
RBP1       0.271130
SOX4       0.264214
ATP6V0C    0.263235
HLA-A      0.232913
LUM        0.202007
Name: centrality, dtype: float64
In [9]:
Copied!
grn_c.var.centrality.sort_values(ascending=False).head(10)
grn_c.var.centrality.sort_values(ascending=False).head(10)
Out[9]:
symbol
HSPA1A     0.271584
MT2A       0.271584
CREM       0.271574
CD99       0.271517
NR4A3      0.271385
RBP1       0.271130
SOX4       0.264214
ATP6V0C    0.263235
HLA-A      0.232913
LUM        0.202007
Name: centrality, dtype: float64
In [10]:
Copied!
grn.var.centrality.sort_values(ascending=False).head(10)
grn.var.centrality.sort_values(ascending=False).head(10)
Out[10]:
symbol
MT2A       0.276470
SLC25A6    0.276183
CXCL8      0.275859
FTH1       0.275757
MIF        0.267968
DNAJB9     0.265635
SOX4       0.241950
RELB       0.234482
YBX3       0.216567
CST3       0.213585
Name: centrality, dtype: float64
In [13]:
Copied!
TOP = 10
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 10 (set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[13]:
{'ATP6V0C', 'CD99', 'HLA-A', 'HSPA1A', 'LUM'}
In [12]:
Copied!
TOP = 20
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 20 (set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[12]:
{'ATP6V0C',
 'CD99',
 'EIF4A1',
 'HLA-A',
 'HSPA1A',
 'LUM',
 'PAGE4',
 'RYR2',
 'SERPINF1'}
In [11]:
Copied!
TOP = 50
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 50 (set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[11]:
{'C1R',
 'CD99',
 'COL6A2',
 'HLA-A',
 'HNRNPA0',
 'HSPA1A',
 'LUM',
 'PAGE4',
 'SERPINA3',
 'SERPING1',
 'SPOCK3',
 'THBS1'}

it seems the networks are even more similar when comparing them at the level of the centrality of their nodes. this means that some are highly central while maybe not

2. Network similarity¶

We now look at the similarity of the two networks based on general overlap of their top K edges across their common nodes

In [41]:
Copied!
grn.var.index = grn.var.ensembl_id
grn_c.var.index = grn_c.var.ensembl_id
overlap = list(set(grn.var.index) & set(grn_c.var.index))
grn.var.index = grn.var.ensembl_id grn_c.var.index = grn_c.var.ensembl_id overlap = list(set(grn.var.index) & set(grn_c.var.index))
In [43]:
Copied!
K = 20
subgrn_c = grn_c.get(overlap).grn
subgrn_c = subgrn_c.apply(lambda row: row >= row.nlargest(K).min(), axis=1)

subgrn = grn.get(overlap).grn
subgrn = subgrn.apply(lambda row: row >= row.nlargest(K).min(), axis=1)
(subgrn & subgrn_c).sum(1).mean() / K
K = 20 subgrn_c = grn_c.get(overlap).grn subgrn_c = subgrn_c.apply(lambda row: row >= row.nlargest(K).min(), axis=1) subgrn = grn.get(overlap).grn subgrn = subgrn.apply(lambda row: row >= row.nlargest(K).min(), axis=1) (subgrn & subgrn_c).sum(1).mean() / K
Out[43]:
0.5055735930735931

when looking at the top 20 connection for each genes in the network we see that on average only 50% of them agree between the two networks

3. Network communities¶

Again looking at the top 20 connections for each gene in the network we use the leiden (or louvain) algorithm to find communities in both networks.

We then study the hub of these communities as well as their enrichment using enrichr and ontology databases. We will only look at communities between 20 and 200 genes

In [33]:
Copied!
TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
TOP = 20 grnutils.get_centrality(grn, TOP, top_k_to_disp=0) grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
Top central genes: []
Top central genes: []
Out[33]:
[]
In [17]:
Copied!
grn_c = grnutils.compute_cluster(grn_c, 1.5, use="leiden", n_iterations=10, max_comm_size=100)
grn_c = grnutils.compute_cluster(grn_c, 1.5, use="leiden", n_iterations=10, max_comm_size=100)
In [4]:
Copied!
grn = grnutils.compute_cluster(grn, 1.5)
grn_c = grnutils.compute_cluster(grn_c, 1.5)
grn = grnutils.compute_cluster(grn, 1.5) grn_c = grnutils.compute_cluster(grn_c, 1.5)

3.1 Cancer version¶

In [30]:
Copied!
# our communities
grn.var['cluster_1.5'].value_counts().head(10)
# our communities grn.var['cluster_1.5'].value_counts().head(10)
Out[30]:
cluster_1.5
0     1620
1     1483
2      732
3      111
4       23
5        9
6        4
16       1
23       1
22       1
Name: count, dtype: int64
In [31]:
Copied!
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='4'].index.tolist(), only=200, interactive=False)#
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='4'].index.tolist(), only=200, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['PRDM6', 'MSANTD3', 'STAG3', 'TRHDE', 'TBX5', 'HEPH', 'GALNT16',
       'NFATC4', 'SYNDIG1', 'HSF4', 'ZNF423', 'PLA2G4C', 'PLPPR2', 'SCN1B',
       'ASPA', 'SNX15', 'BAG2', 'PTK7', 'HRH2', 'PLCL1', 'PTCH2', 'TGFB3',
       'RASL11A', 'TOX2', 'CPNE5', 'GLIS2', 'HIVEP3', 'FGF13', 'ACAP3',
       'DUSP26', 'CTIF', 'CAMK1', 'SLC9A5', 'FBN2', 'LRIG1', 'LIMD1', 'TRPC1',
       'SLC2A13', 'ADAMTS12', 'FZD7', 'AFF2', 'CACNA1D', 'MRAS', 'ST6GALNAC6',
       'FGFR4', 'PTGER1', 'PKDCC', 'SLC9B2', 'NPY1R', 'ANKRD33B', 'DCLK2',
       'PURG', 'GXYLT2', 'LRRN4CL', 'CA8', 'EXOC3L1', 'BHLHE22', 'AATK',
       'CMC4', 'BOLA2', 'CCR10', 'TCEAL2', 'SLC24A3', 'ROR1', 'SLC51B',
       'EMID1', 'AGMO', 'CENPP', 'OPTC', 'SPOCK3', 'COL27A1', 'DACT3', 'STMN3',
       'DLL1', 'SMOC1', 'COLGALT2', 'ZDBF2', 'DOK6', 'HNRNPUL2-BSCL2',
       'C3orf84', 'C2orf74', 'ARHGEF25', 'NPIPB5', 'TNFSF12-TNFSF13',
       'ZFP91-CNTF', 'nan-41', 'nan-81', 'nan-96', 'nan-98', 'HERC3',
       'nan-116', 'nan-117'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [26]:
Copied!
# if you want to draw better looking networks
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_4.html")
# if you want to draw better looking networks pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_c_4.html")
In [32]:
Copied!
# Assuming 'node_names' contains the list of gene names.
# Note: it seems with enrichr, one can only use one gene set at a time, choose accordingly
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names. # Note: it seems with enrichr, one can only use one gene set at a time, choose accordingly enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn_c.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:31:53,534 [WARNING] Input library not found: Reactome. Skip
Out[32]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 Solute:Potassium Antiporter Activity (GO:0022821) 0.000523 0.050652 0 0 inf inf SLC24A3;SLC9A5
1 GO_Molecular_Function_2023 Coreceptor Activity Involved In Wnt Signaling ... 0.001547 0.050652 0 0 86.822222 561.890021 PTK7;ROR1
2 GO_Molecular_Function_2023 Sodium Ion Transmembrane Transporter Activity ... 0.001547 0.050652 0 0 86.822222 561.890021 SLC9A5;SLC9B2
3 GO_Molecular_Function_2023 Calcium Ion Transmembrane Transporter Activity... 0.001700 0.050652 0 0 16.432584 104.795452 SLC24A3;TRPC1;CACNA1D
4 GO_Molecular_Function_2023 Calcium-Dependent Phospholipid Binding (GO:000... 0.003047 0.050652 0 0 43.400000 251.445478 CPNE5;PLA2G4C
5 GO_Molecular_Function_2023 Coreceptor Activity Involved In Wnt Signaling ... 0.003047 0.050652 0 0 43.400000 251.445478 PTK7;ROR1
6 GO_Molecular_Function_2023 Metal Cation:Proton Antiporter Activity (GO:00... 0.003047 0.050652 0 0 43.400000 251.445478 SLC9A5;SLC9B2
7 GO_Molecular_Function_2023 Sodium:Proton Antiporter Activity (GO:0015385) 0.003047 0.050652 0 0 43.400000 251.445478 SLC9A5;SLC9B2
8 GO_Molecular_Function_2023 Iron Ion Binding (GO:0005506) 0.005002 0.070586 0 0 28.925926 153.247105 HEPH;AGMO
9 GO_Molecular_Function_2023 Calcium Channel Activity (GO:0005262) 0.005307 0.070586 0 0 10.099395 52.907606 SLC24A3;TRPC1;CACNA1D
10 GO_Molecular_Function_2023 Wnt Receptor Activity (GO:0042813) 0.007391 0.089364 0 0 21.688889 106.438022 FZD7;ROR1
11 GO_Molecular_Function_2023 RNA Polymerase II-specific DNA-binding Transcr... 0.010110 0.104284 0 0 7.715135 35.445265 TBX5;DUSP26;DLL1
12 GO_Molecular_Function_2023 Phospholipase Activity (GO:0004620) 0.010193 0.104284 0 0 17.346667 79.552387 PLCL1;PLA2G4C
13 GO_Molecular_Function_2023 Monoatomic Cation Channel Activity (GO:0005261) 0.011601 0.110210 0 0 7.284644 32.465172 SLC24A3;TRPC1;CACNA1D
14 GO_Molecular_Function_2023 Sodium Channel Regulator Activity (GO:0017080) 0.013389 0.118714 0 0 14.451852 62.335739 FGF13;SCN1B
15 GO_Molecular_Function_2023 RNA Polymerase II Cis-Regulatory Region Sequen... 0.022992 0.139045 0 0 2.668376 10.066740 HSF4;HIVEP3;BHLHE22;TBX5;ZNF423;NFATC4;GLIS2
16 GO_Molecular_Function_2023 Calcium:Sodium Antiporter Activity (GO:0005432) 0.023000 0.139045 0 0 inf inf SLC24A3
17 GO_Molecular_Function_2023 Potassium:Proton Antiporter Activity (GO:0015386) 0.023000 0.139045 0 0 inf inf SLC9A5
18 GO_Molecular_Function_2023 Xylosyltransferase Activity (GO:0042285) 0.023000 0.139045 0 0 inf inf GXYLT2
19 GO_Molecular_Function_2023 Tat Protein Binding (GO:0030957) 0.023000 0.139045 0 0 inf inf DLL1
In [33]:
Copied!
# we make a plot of the enrichment
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
# we make a plot of the enrichment enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image
In [34]:
Copied!
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='5'].index.tolist(), only=80, interactive=False)#
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='5'].index.tolist(), only=80, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['ABCF2', 'ITIH4', 'OLFM2', 'ICAM4', 'RIGI', 'KAZALD1', 'BST1', 'PANX1',
       'C9', 'FBXO2', 'ZNF410', 'ACVR1C', 'VAMP7', 'TOP2A', 'TAF4B',
       'KIAA1755', 'EDA', 'YPEL4', 'ENTPD3', 'GP9', 'NLGN2', 'CLCF1', 'RPRM',
       'ODF3B', 'MSC', 'NTF3', 'COL13A1', 'GP1BB', 'HSPA1A', 'TMEM158',
       'nan-82', 'NBPF19', 'SCO2', 'SMIM41'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [87]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_5.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_c_5.html")
In [44]:
Copied!
grn_c.var.loc['nan-82'] # par of the Glycoside Hydrolase Family
grn_c.var.loc['nan-82'] # par of the Glycoside Hydrolase Family
Out[44]:
uid                    4giyQrh7tkXI
symbol                       nan-82
ncbi_gene_ids                      
biotype              protein_coding
description           novel protein
synonyms                           
organism_id                       2
public_source_id                9.0
created_by_id                     1
mt                            False
ribo                          False
hb                            False
organism             NCBITaxon:9606
TFs                           False
ensembl_id          ENSG00000269590
centrality                      0.0
cluster_1.5                       5
Name: nan-82, dtype: object
In [35]:
Copied!
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)
enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn_c.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:33:16,565 [WARNING] Input library not found: Reactome. Skip
Out[35]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 Death Receptor Binding (GO:0005123) 0.000416 0.028729 0 0 123.875000 964.236668 EDA;NTF3
1 GO_Molecular_Function_2023 Purine Ribonucleoside Triphosphate Binding (GO... 0.007050 0.073312 0 0 8.626100 42.739637 ABCF2;RIGI;HSPA1A
2 GO_Molecular_Function_2023 Receptor Ligand Activity (GO:0048018) 0.007732 0.073312 0 0 5.742222 27.920829 EDA;CLCF1;NTF3;HSPA1A
3 GO_Molecular_Function_2023 Activin Receptor Activity (GO:0017002) 0.008500 0.073312 0 0 inf inf ACVR1C
4 GO_Molecular_Function_2023 Gap Junction Hemi-Channel Activity (GO:0055077) 0.008500 0.073312 0 0 inf inf PANX1
5 GO_Molecular_Function_2023 NF-kappaB Binding (GO:0051059) 0.008500 0.073312 0 0 inf inf TAF4B
6 GO_Molecular_Function_2023 Hydrolase Activity, Hydrolyzing N-glycosyl Com... 0.008500 0.073312 0 0 inf inf BST1
7 GO_Molecular_Function_2023 Ubiquitin-Protein Transferase Activator Activi... 0.008500 0.073312 0 0 inf inf RIGI
8 GO_Molecular_Function_2023 DNA Binding, Bending (GO:0008301) 0.016930 0.083096 0 0 120.151515 490.060134 TOP2A
9 GO_Molecular_Function_2023 Disordered Domain Specific Binding (GO:0097718) 0.016930 0.083096 0 0 120.151515 490.060134 HSPA1A
10 GO_Molecular_Function_2023 UDP Phosphatase Activity (GO:0045134) 0.016930 0.083096 0 0 120.151515 490.060134 ENTPD3
11 GO_Molecular_Function_2023 GDP Phosphatase Activity (GO:0004382) 0.016930 0.083096 0 0 120.151515 490.060134 ENTPD3
12 GO_Molecular_Function_2023 Nucleoside Diphosphate Phosphatase Activity (G... 0.016930 0.083096 0 0 120.151515 490.060134 ENTPD3
13 GO_Molecular_Function_2023 ATP Binding (GO:0005524) 0.017220 0.083096 0 0 11.204545 45.509349 ABCF2;HSPA1A
14 GO_Molecular_Function_2023 Ubiquitin Protein Ligase Binding (GO:0031625) 0.020064 0.083096 0 0 10.265625 40.126707 RIGI;HSPA1A
15 GO_Molecular_Function_2023 Ubiquitin-Like Protein Ligase Binding (GO:0044... 0.021555 0.083096 0 0 9.852500 37.805688 RIGI;HSPA1A
16 GO_Molecular_Function_2023 Growth Factor Activity (GO:0008083) 0.023090 0.083096 0 0 9.471154 35.690549 CLCF1;NTF3
17 GO_Molecular_Function_2023 Adenyl Ribonucleotide Binding (GO:0032559) 0.024670 0.083096 0 0 9.118056 33.756535 ABCF2;HSPA1A
18 GO_Molecular_Function_2023 C3HC4-type RING Finger Domain Binding (GO:0055... 0.025290 0.083096 0 0 60.060606 220.863728 HSPA1A
19 GO_Molecular_Function_2023 Ubiquitin Binding (GO:0043130) 0.025290 0.083096 0 0 60.060606 220.863728 TOP2A
In [36]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image

similar story to cluster above. this time with PTN often found in the litterature with 6 other m Studies on the role of STEAP1, OR51E2, PTN, ABCC4, NDRG2 have been relatively in-depth. For instance, STEAP1 is all-called six-transmembrane epithelial antigen of the prostate 1, which belongs to a family of metalloproteinases involved in iron and copper homeostasis and other cellular processes

has been listed with 8 other members as predictors of CAF-related gene prognostic index (CRGPI) in prostate cancer. https://www.nature.com/articles/s41598-023-36125-0#Sec11

(NDRG2, TSPAN1, PTN, APOE, OR51E2, P4HB, STEAP1 and ABCC4 were used to construct molecular subtypes and)

https://www.genecards.org/cgi-bin/carddisp.pl?gene=GSN

3.2 Normal version¶

In [9]:
Copied!
grn.var['cluster_1.5'].value_counts().head(10)
grn.var['cluster_1.5'].value_counts().head(10)
Out[9]:
cluster_1.5
0     1620
1     1483
2      732
3      111
4       23
5        9
6        4
16       1
23       1
22       1
Name: count, dtype: int64

https://www.genecards.org/cgi-bin/carddisp.pl?gene=MYOC

In [37]:
Copied!
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='3'].index.tolist(), only=200, interactive=False)
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='3'].index.tolist(), only=200, interactive=False)
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['TMEM132A', 'ADGRA2', 'AGPAT4', 'THAP3', 'DNAJC25', 'TRAM2', 'NTN1',
       'STAG3', 'COL4A4', 'HSD17B14',
       ...
       'nan-75', 'TRABD2B', 'nan-77', 'PCGF2', 'nan-107', 'nan-182', 'nan-192',
       'nan-195', 'nan-239', 'nan-259'],
      dtype='object', name='symbol_2', length=111)
No description has been provided for this image
In [26]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_3.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_3.html")
In [27]:
Copied!
grn.var.loc['nan-259'] # par of the Glycoside Hydrolase Family
grn.var.loc['nan-259'] # par of the Glycoside Hydrolase Family
Out[27]:
uid                    62JAXxQNqtOd
symbol                      nan-259
ncbi_gene_ids                 83737
biotype              protein_coding
description           novel protein
synonyms                           
organism_id                       2
public_source_id                9.0
created_by_id                     1
mt                            False
ribo                          False
hb                            False
organism             NCBITaxon:9606
TFs                           False
ensembl_id          ENSG00000289720
centrality                      0.0
cluster_1.5                       3
Name: nan-259, dtype: object
In [12]:
Copied!
grn.get(grn.var[grn.var['cluster_1.5']=='3'].index.tolist()).grn.sum(0).sort_values(ascending=False).head(3)##.grn # selenom
grn.get(grn.var[grn.var['cluster_1.5']=='3'].index.tolist()).grn.sum(0).sort_values(ascending=False).head(3)##.grn # selenom
Out[12]:
symbol
RNASEK     0.942102
SELENOM    0.292933
nan-259    0.256140
dtype: float32

RNASEK acidifying the internal vesicle important in endocytosis# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10547866/ RNA phosphodiester bond hydrolysis --> hydrolisis pathway in the model and RNA transcription regulation

In [38]:
Copied!
# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:33:49,291 [WARNING] Input library not found: Reactome. Skip
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/enrichr.py:643: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  self.results = pd.concat(self.results, ignore_index=True)
Out[38]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 Sulfotransferase Activity (GO:0008146) 0.007252 0.123369 0 0 20.138462 99.212171 CHST7;SULT1C4
1 GO_Molecular_Function_2023 RNA Polymerase II Transcription Regulatory Reg... 0.007713 0.123369 0 0 3.386005 16.472449 ZNF483;HAND2;NTN1;NFATC4;GLIS2;PURG;ZNF286A
2 GO_Molecular_Function_2023 ATPase Binding (GO:0051117) 0.013790 0.123369 0 0 13.415385 57.468682 SLC2A13;NKD2
3 GO_Molecular_Function_2023 Chondroitin Sulfotransferase Activity (GO:0034... 0.016750 0.123369 0 0 inf inf CHST7
4 GO_Molecular_Function_2023 Peptidyl-Proline 4-Dioxygenase Activity (GO:00... 0.016750 0.123369 0 0 inf inf P4HA3
5 GO_Molecular_Function_2023 Aryl Sulfotransferase Activity (GO:0004062) 0.016750 0.123369 0 0 inf inf SULT1C4
6 GO_Molecular_Function_2023 Histone Reader Activity (GO:0140566) 0.016750 0.123369 0 0 inf inf TONSL
7 GO_Molecular_Function_2023 Hydrolase Activity, Hydrolyzing N-glycosyl Com... 0.016750 0.123369 0 0 inf inf BST1
8 GO_Molecular_Function_2023 Tat Protein Binding (GO:0030957) 0.016750 0.123369 0 0 inf inf DLL1
9 GO_Molecular_Function_2023 WW Domain Binding (GO:0050699) 0.016750 0.123369 0 0 inf inf ENTREP3
10 GO_Molecular_Function_2023 RNA Polymerase II Cis-Regulatory Region Sequen... 0.016754 0.123369 0 0 3.180050 13.003634 ZNF483;HAND2;NTN1;NFATC4;GLIS2;ZNF286A
11 GO_Molecular_Function_2023 DNA-binding Transcription Activator Activity, ... 0.026719 0.134555 0 0 5.074219 18.380719 HAND2;INSL3;GLIS2
12 GO_Molecular_Function_2023 Procollagen-Proline Dioxygenase Activity (GO:0... 0.033223 0.134555 0 0 59.575758 202.825781 P4HA3
13 GO_Molecular_Function_2023 N-acetylglucosamine 6-O-sulfotransferase Activ... 0.033223 0.134555 0 0 59.575758 202.825781 CHST7
14 GO_Molecular_Function_2023 ABC-type Xenobiotic Transporter Activity (GO:0... 0.033223 0.134555 0 0 59.575758 202.825781 ABCC1
15 GO_Molecular_Function_2023 alpha-N-acetylgalactosaminide Alpha-2,6-Sialyl... 0.033223 0.134555 0 0 59.575758 202.825781 ST6GALNAC6
16 GO_Molecular_Function_2023 miRNA Binding (GO:0035198) 0.033223 0.134555 0 0 59.575758 202.825781 DND1
17 GO_Molecular_Function_2023 Myosin Heavy Chain Binding (GO:0032036) 0.033223 0.134555 0 0 59.575758 202.825781 NKD2
18 GO_Molecular_Function_2023 Estradiol 17-Beta-Dehydrogenase [NAD(P)] Activ... 0.033223 0.134555 0 0 59.575758 202.825781 HSD17B14
19 GO_Molecular_Function_2023 Solute:Proton Symporter Activity (GO:0015295) 0.033223 0.134555 0 0 59.575758 202.825781 SLC2A13
In [39]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
            top_term=20,
            figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image

https://www.genecards.org/cgi-bin/carddisp.pl?gene=PTN

In [40]:
Copied!
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='4'].index.tolist(), only=80, interactive=False)#
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='4'].index.tolist(), only=80, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['TRHDE', 'PGR', 'ESR1', 'RASD2', 'SYNDIG1', 'OLFM2', 'FZD10', 'BHMT2',
       'DUSP26', 'FOXF2', 'ADAM33', 'ADAMTS12', 'HS3ST3A1', 'CHODL', 'FTCD',
       'FNDC1', 'HSPB3', 'FOXL1', 'AGMO', 'ADAMTSL2', 'S100A6', 'AKR1B10',
       'SCGB1C2'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [29]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_4.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_4.html")
In [41]:
Copied!
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['GO_Molecular_Function_2023', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['GO_Molecular_Function_2023', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:34:27,578 [WARNING] Input library not found: Reactome. Skip
Out[41]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 RNA Polymerase II General Transcription Initia... 0.000032 0.000933 0 0 inf inf FOXF2;ESR1
1 GO_Molecular_Function_2023 Estrogen Response Element Binding (GO:0034056) 0.000032 0.000933 0 0 inf inf PGR;ESR1
2 GO_Molecular_Function_2023 General Transcription Initiation Factor Bindin... 0.000095 0.001860 0 0 378.666667 3508.819727 FOXF2;ESR1
3 GO_Molecular_Function_2023 Transcription Coactivator Binding (GO:0001223) 0.000188 0.002780 0 0 189.285714 1623.428489 PGR;ESR1
4 GO_Molecular_Function_2023 Transition Metal Ion Binding (GO:0046914) 0.000446 0.005263 0 0 13.515099 104.272240 BHMT2;AGMO;ADAM33;TRHDE
5 GO_Molecular_Function_2023 Metalloendopeptidase Activity (GO:0004222) 0.000551 0.005414 0 0 22.794231 171.058857 ADAMTSL2;ADAM33;ADAMTS12
6 GO_Molecular_Function_2023 Metallopeptidase Activity (GO:0008237) 0.000672 0.005666 0 0 21.155357 154.536591 ADAMTSL2;ADAM33;ADAMTS12
7 GO_Molecular_Function_2023 Transcription Coregulator Binding (GO:0001221) 0.001111 0.008195 0 0 54.013605 367.419603 PGR;ESR1
8 GO_Molecular_Function_2023 DNA-binding Transcription Activator Activity, ... 0.001429 0.009365 0 0 15.972973 104.640056 FOXF2;PGR;ESR1
9 GO_Molecular_Function_2023 ATPase Binding (GO:0051117) 0.002016 0.011893 0 0 37.780952 234.495632 PGR;ESR1
10 GO_Molecular_Function_2023 Zinc Ion Binding (GO:0008270) 0.002284 0.012250 0 0 13.407955 81.545835 BHMT2;ADAM33;TRHDE
11 GO_Molecular_Function_2023 Cis-Regulatory Region Sequence-Specific DNA Bi... 0.004517 0.021203 0 0 6.945569 37.504924 FOXF2;PGR;FOXL1;ESR1
12 GO_Molecular_Function_2023 RNA Polymerase II Cis-Regulatory Region Sequen... 0.005523 0.021203 0 0 6.541596 34.009122 FOXF2;PGR;FOXL1;ESR1
13 GO_Molecular_Function_2023 DNA Binding, Bending (GO:0008301) 0.005750 0.021203 0 0 inf inf FOXL1
14 GO_Molecular_Function_2023 TBP-class Protein Binding (GO:0017025) 0.005750 0.021203 0 0 inf inf ESR1
15 GO_Molecular_Function_2023 [Heparan Sulfate]-Glucosamine 3-Sulfotransfera... 0.005750 0.021203 0 0 inf inf HS3ST3A1
16 GO_Molecular_Function_2023 RNA Polymerase II Transcription Regulatory Reg... 0.008164 0.028333 0 0 5.812950 27.949028 FOXF2;PGR;FOXL1;ESR1
17 GO_Molecular_Function_2023 Endopeptidase Activity (GO:0004175) 0.008882 0.028593 0 0 8.021918 37.893119 ADAMTSL2;ADAM33;ADAMTS12
18 GO_Molecular_Function_2023 DNA Binding (GO:0003677) 0.009208 0.028593 0 0 7.911486 37.086673 FOXF2;PGR;FOXL1
19 GO_Molecular_Function_2023 Heparan Sulfate Sulfotransferase Activity (GO:... 0.011468 0.030756 0 0 180.727273 807.520606 HS3ST3A1
In [42]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
                top_term=20,
                figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image

4. Differential Gene - Gene connections¶

  • look at the connections of some key TFs in here
  • do GSEA over these connections
  • compare the connections of these same TFs for both conditions
In [5]:
Copied!
G = grn_c.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)#
G = grn_c.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'] Index(['MT2A', 'HSPA1A', 'CREM', 'PTN', 'HLA-A', 'SPARCL1', 'SPOCK3', 'NR4A3',
       'APOD', 'CALD1', 'MGP', 'HSPE1', 'EIF4A1', 'CD99', 'EMP1', 'FABP4',
       'NAMPT', 'NNMT', 'RBP1', 'LUM', 'IGFBP7', 'THBS1', 'YBX3', 'NAF1',
       'DCN', 'A2M', 'CPE', 'C1R', 'ATP6V0C', 'MATN2', 'COL6A2', 'UBE2S',
       'CST3', 'RHOQ', 'UBE2C', 'C1S', 'H2AZ1', 'GPRC5A', 'HLA-C', 'LGALS1',
       'PAGE4'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [96]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
first_node = list(G.nodes)[-1]
pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_PAGE4.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) first_node = list(G.nodes)[-1] pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_c_PAGE4.html")
In [22]:
Copied!
# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=[
                #'KEGG_2021_Human', 
                 #'MSigDB_Hallmark_2020',
                 #'Reactome_2022', 
                 #'Tabula_Sapiens', 
                 'WikiPathway_2023_Human', #'TF_Perturbations_Followed_by_Expression', 
                 #'PPI_Hub_Proteins', 
                 #'OMIM_Disease', 
                 #'GO_Molecular_Function_2023'
                 ],
                 organism='Human', # change accordingly
                 #description='pathway',
                # cutoff=0.02, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(10)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split('R-HSA').str[0].str.split(' WP').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
# Assuming 'node_names' contains the list of gene names enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=[ #'KEGG_2021_Human', #'MSigDB_Hallmark_2020', #'Reactome_2022', #'Tabula_Sapiens', 'WikiPathway_2023_Human', #'TF_Perturbations_Followed_by_Expression', #'PPI_Hub_Proteins', #'OMIM_Disease', #'GO_Molecular_Function_2023' ], organism='Human', # change accordingly #description='pathway', # cutoff=0.02, # test dataset, use lower value for real case background=grn.var.symbol.tolist(), ) enr.res2d.head(10) enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split('R-HSA').str[0].str.split(' WP').str[0].str[:50] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image
In [139]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             figsize=(4,5), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size figsize=(4,5), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image
In [17]:
Copied!
G = grn.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)
G = grn.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'] Index(['nan-270', 'S100A6', 'MT2A', 'SPARCL1', 'MIF', 'SLC25A6', 'IGFBP7',
       'APOD', 'MGP', 'DCN', 'A2M', 'PTN', 'RELB', 'DNAJB9', 'AKR1C1', 'C1S',
       'CST3', 'CXCL8', 'FTH1', 'BRME1', 'FAM205A', 'TIMP1', 'YBX3', 'NPPB',
       'TGIF2-RAB5IF', 'FBLN1', 'MTDH', 'IGFBP5', 'TFPI', 'MAP1B', 'COL6A2',
       'HNRNPA0', 'EIF4A1', 'CCL21', 'FTL', 'CETN1', 'NME2', 'LSAMP', 'ACR',
       'ATP6V0C', 'PAGE4'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [98]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
first_node = list(G.nodes)[-1]
pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_PAGE4.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) first_node = list(G.nodes)[-1] pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_PAGE4.html")
In [24]:
Copied!
# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 
                 #'MSigDB_Hallmark_2020',
                 #'Reactome_2022',
                 #'Tabula_Sapiens',
                 #'WikiPathway_2023_Human',
                 # 'TF_Perturbations_Followed_by_Expression', 
                 # 'Reactome', 
                 #'PPI_Hub_Proteins', 
                 #'OMIM_Disease',
                 'GO_Molecular_Function_2023'
                 ],
                 organism='Human', # change accordingly
                 #description='pathway',
                # cutoff=0.02, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['KEGG_2021_Human', #'MSigDB_Hallmark_2020', #'Reactome_2022', #'Tabula_Sapiens', #'WikiPathway_2023_Human', # 'TF_Perturbations_Followed_by_Expression', # 'Reactome', #'PPI_Hub_Proteins', #'OMIM_Disease', 'GO_Molecular_Function_2023' ], organism='Human', # change accordingly #description='pathway', # cutoff=0.02, # test dataset, use lower value for real case background=grn.var.symbol.tolist(), ) enr.res2d.head(20)
Out[24]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 Peptidase Inhibitor Activity (GO:0030414) 0.000020 0.001566 0 0 104.105263 1128.623016 CST3;TIMP1;TFPI
1 GO_Molecular_Function_2023 Transition Metal Ion Binding (GO:0046914) 0.000463 0.016191 0 0 9.025463 69.301557 ACR;MT2A;FTH1;TIMP1;FTL
2 GO_Molecular_Function_2023 Double-Stranded RNA Binding (GO:0003725) 0.000607 0.016191 0 0 101.461538 751.496153 EIF4A1;MTDH
3 GO_Molecular_Function_2023 Endopeptidase Inhibitor Activity (GO:0004866) 0.001010 0.019977 0 0 18.306502 126.278211 CST3;TIMP1;TFPI
4 GO_Molecular_Function_2023 Endopeptidase Regulator Activity (GO:0061135) 0.001498 0.019977 0 0 50.705128 329.757829 CST3;TFPI
5 GO_Molecular_Function_2023 Ferrous Iron Binding (GO:0008198) 0.001498 0.019977 0 0 50.705128 329.757829 FTH1;FTL
6 GO_Molecular_Function_2023 Protease Binding (GO:0002020) 0.002465 0.027606 0 0 12.944079 77.736001 CST3;ACR;TIMP1
7 GO_Molecular_Function_2023 Iron Ion Binding (GO:0005506) 0.002761 0.027606 0 0 33.786325 199.079000 FTH1;FTL
8 GO_Molecular_Function_2023 Calcium Ion Binding (GO:0005509) 0.003252 0.028910 0 0 7.400664 42.393777 CETN1;S100A6;SPARCL1;FBLN1
9 GO_Molecular_Function_2023 Kinase Binding (GO:0019900) 0.005211 0.038377 0 0 9.688322 50.930509 PTN;RELB;HNRNPA0
10 GO_Molecular_Function_2023 RNA Binding (GO:0003723) 0.005277 0.038377 0 0 5.000000 26.222182 EIF4A1;YBX3;DCN;MTDH;HNRNPA0
11 GO_Molecular_Function_2023 Chemokine Activity (GO:0008009) 0.007445 0.039047 0 0 18.405594 90.191859 CXCL8;CCL21
12 GO_Molecular_Function_2023 Chemokine Receptor Binding (GO:0042379) 0.007445 0.039047 0 0 18.405594 90.191859 CXCL8;CCL21
13 GO_Molecular_Function_2023 Metal Ion Binding (GO:0046872) 0.008569 0.039047 0 0 5.523471 26.289761 CETN1;S100A6;SPARCL1;FBLN1
14 GO_Molecular_Function_2023 Protein Kinase Binding (GO:0019901) 0.009285 0.039047 0 0 7.734868 36.193807 PTN;RELB;HNRNPA0
15 GO_Molecular_Function_2023 mRNA 3'-UTR Binding (GO:0003730) 0.009893 0.039047 0 0 15.566075 71.852090 YBX3;HNRNPA0
16 GO_Molecular_Function_2023 NF-kappaB Binding (GO:0051059) 0.010250 0.039047 0 0 inf inf MTDH
17 GO_Molecular_Function_2023 Phosphatase Inhibitor Activity (GO:0019212) 0.010250 0.039047 0 0 inf inf PTN
18 GO_Molecular_Function_2023 Phosphotransferase Activity, Phosphate Group A... 0.010250 0.039047 0 0 inf inf NME2
19 GO_Molecular_Function_2023 Nucleobase-Containing Compound Kinase Activity... 0.010250 0.039047 0 0 inf inf NME2
In [30]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             figsize=(4,5), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size figsize=(4,5), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image

5. Using classification¶

Here we used the network given by the mean of all heads in our scPRINT model. We could also use other scPRINT versions to see if the results are consistent. but we could also use the classification mechanism presented in our manuscript and other notebooks to get to possibly more specialized networks.

In [ ]:
Copied!
grn_normal = grn_inferer(layer=list(range(model.nlayers))[:], cell_type="naive B cell")
grn_normal, m, omni_cls = train_classifier(grn_normal, C=0.5, train_size=0.9, class_weight={
                                    1: 200, 0: 1}, shuffle=True)
grn_normal = grn_inferer(layer=list(range(model.nlayers))[:], cell_type="naive B cell") grn_normal, m, omni_cls = train_classifier(grn_normal, C=0.5, train_size=0.9, class_weight={ 1: 200, 0: 1}, shuffle=True)
number of expressed genes in this cell type: 16801
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting: 0it [00:00, ?it/s]
true elem 2115 ...
doing regression....
metrics {'used_heads': 25, 'precision': 0.0, 'random_precision': 0.0006433550820644942, 'recall': 0.0, 'predicted_true': 177.0, 'number_of_true': 219.0, 'epr': 0.0}
No description has been provided for this image
In [ ]:
Copied!
# highlight differential links on genes that are expressed in both 
grn_normal.varp['all'] = grn_normal.varp['GRN'].copy()
grn_normal.varp['GRN'] = grn_normal.varp['classified']
# highlight differential links on genes that are expressed in both grn_normal.varp['all'] = grn_normal.varp['GRN'].copy() grn_normal.varp['GRN'] = grn_normal.varp['classified']

Notes on findings and their relation to the literature¶

CAV1-expressing CAFs contribute to invasion and metastasis in breast cancer24 loss of caveolin1 (CAV1) is found in metabolically reprogrammed CAFs that promote tumorigenesis

discoidin domain-containing receptor 2 (DDR2)21 and integrin α11β122, have emerged to identify CAFs in the context of a specific TME https://www.nature.com/articles/s12276-023-01013-0

normal fibroblast populations marked by the expression of Gli1 and Hoxb6

CAFs, remain perpetually activated with a high capacity for ECM synthesis and microenvironmental remodeling, leading to stromal desmoplasia, a phenomenon characterized by increased deposition of ECM components in tumors. In this regard, CAFs share many basic characteristics, such as a secretory phenotype and capacity to synthesize ECM components, with fibroblasts found in nonmalignant tissue fibrosis. Therefore, the classic markers found to be expressed in fibroblasts, including α-SMA, vimentin, desmin, fibroblast-specific protein 1 (FSP1; also known as S100A4)

Fibrogenesis is part of a normal protective response to tissue injury that can become irreversible and progressive, leading to fatal diseases. Senescent cells are a main driver of fibrotic diseases through their secretome, known as senescence-associated secretory phenotype (SASP) CAFs (vCAFs), cycling CAFs (cCAFs), and developmental CAFs (dCAFs)

The stem-like ‘universal’ type of fibroblast cell, marked by expression of peptidase inhibitor 16 (Pi16) and Col15a, found in the steady state across tissues, activated fibroblasts, as observed by the development of LRRC15+ CAFs in PDAC.

. SOCS1: Suppressor of cytokine signaling; STAT3: Signal transducer and activator of transcription 3; LDH: Lactate dehydrogenase; PYCR1: pyrroline-5-carboxylate reductase


senescence, and multiple types of fibrotic diseases in mice and humans are characterized by the accumulation of iron. https://www.nature.com/articles/s42255-023-00928-2


Current evidence indicate that cancer-associated fibroblasts (CAFs) play an important role in prostate cancer (PCa) development and progression. playing a role in invasion.

NDRG2, TSPAN1, PTN, APOE, OR51E2, P4HB, STEAP1 and ABCC4 were used to construct molecular subtypes and CAF-related gene prognostic index (CRGPI)

Studies on the role of STEAP1, OR51E2, PTN, ABCC4, NDRG2 have been relatively in-depth. For instance, STEAP1 is all-called six-transmembrane epithelial antigen of the prostate 1, which belongs to a family of metalloproteinases involved in iron and copper homeostasis and other cellular processes37. STEAP1 is overexpressed on the plasma membrane of PCa cells and is associated with PCa invasiveness and metastasis50. The possible mechanism of STEAP1 promoting tumor proliferation and metastasis is by acting as a channel for small molecules that are involved in intercellular communication OR51E2 perhaps could be used as one of the biomarkers for PCa53,54,55,56 https://www.nature.com/articles/s41598-023-36125-0#Sec11


The protein encoded by ABCC4 is a member of the superfamily of ATP-binding cassette (ABC) transporters, which is also called multidrug resistance protein 4 (MRP4)64. MRP4 was reported to be associated with drug resistance of PCa in many studies65,66,67

PRRX1, OSR1, FOXD1, HOXC4,LHX9, TBX3 and TWIST2, targeted five or more TFs inthe fibroblast TRN. This influential-set explained 62.5% ofthe total number of regulatory edges and they collectivelytargeted 15 out of 18 TFs represented in the network. Upona closer examination of the fibroblastic network, the inter-actions among OSR1–PRRX1–TWIST2 were notably co-ordinated, regulating one another in both directions https://www.researchgate.net/publication/264633205_A_transient_disruption_of_fibroblastic_transcriptional_regulatory_network_facilitates_trans-differentiation


FGF receptors are important in PCa initiation and progression. These endocrine FGFs require alpha-Klotho (KL) and/or beta-Klotho (KLB), two related single-pass transmembrane proteins restricted in their tissue distribution. FGF19 is expressed in primary and metastatic PCa tissues where it functions as an autocrine growth factor. https://www.e-cancer.fr/Professionnels-de-sante/Veille-bibliographique/Nota-Bene-Cancer/NBC-174/The-endocrine-fibroblast-growth-factor-FGF19-promotes-prostate-cancer-progression


• Prostate cancer stroma contains diverse populations of fibroblasts. • Carcinoma associated fibroblasts can promote prostate carcinogenesis. • Interaction among fibroblast subsets modulate stromal-epithelial paracrine signaling. The stroma in the normal prostate is predominantly composed of smooth muscle cells while in cancer this is partially replaced by fibroblasts and myofibroblasts6. These fibroblastic cells are responsible for collagen and extracellular matrix (ECM) deposition, changing local organ stiffness

iCAF, myCAF, apCAF

It has been shown that SDF-1/CXCL12, secreted by CAF, acts via TGFβ–mediated upregulation of CXCR4 in epithelial cells to activate Akt signaling associated with tumorigenesis. Overexpression of SDF-1/CXCL12 and TGFβ1 in benign human prostate fibroblasts has been shown to induce malignant transformation of benign prostate epithelial cells and formation of highly invasive tumors in vivo

increased cadherin 2 (CDH2) mRNA expression in LNCaP cells

Upregulation of DNA methyltransferases (DNMT) consistent with aberrant promoter hypermethylation of tumor-suppressor genes in multiple cancer types including PCa has been reported

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8788937/#:~:text=The%20stroma%20in%20the%20normal,deposition%2C%20changing%20local%20organ%20stiffness.

CD10 and GPR77 can define a human CAF subset that sustains cancer stemness


Fibroblasts are also fundamental for inducing angiogenesis by secreting angiogenic factors such as vascular endothelial growth factor (VEGF) and matrix proteins Epithelial cells release TGFβ ligands, Kallikrein-related peptidase-4 (KLK4), CAFs are characterized by molecular markers that are upregulated compared to normal fibroblasts, such as fibroblast activation protein (FAP), PDGFR-β, fibroblast-specific protein-1 (FSP-1) (also known as S100A4)

ACTA2 α-smooth actin (α-SMA) Upregulated RT-qPCR, IHC on tissue microarrays [42,43] ASPN Asporin Upregulated Tag-based RNA profiling, microarray profiling, IHC, RT-qPCR [38,44,45] CAV1 Caveolin-1 Downregulated Tag profiling, IHC, RT-qPCR [38] COL1A1 Collagen Type-I Upregulated RT-qPCR, IHC [42,43] CXCL12 Stromal cell-derived factor 1 (SDF1)/ (C-X-C motif chemokine ligand 12 (CXCL12) Upregulated Tag-profiling, RT-qPCR, ELISA [38,46] FAP Fibroblast activation protein Upregulated IHC [42] FGF2, FGF7, FGF10 Fibroblast growth factor-2/-7/-10 Upregulated RT-qPCR, Western Blot, IHC [43,47,48] FN1 Fibronectin Upregulated Tag profiling, IHC, RT-qPCR [38] ITGA1 Integrin-α1 (CD49a) Upregulated IHC [49] OGN Osteoglycin Upregulated Tag-profiling, IHC, RT-qPCR [38] PDGFRB Platelet-derived growth factor receptor β Upregulated Microarray profiling [44,50] POSTN Periostin Upregulated Microarray profiling, IHC [44] S100A4 Fibroblast-specific protein 1 (FSP1)/S100 Calcium Binding Protein A4 (S100A4) Upregulated Immunofluorescence, RT-qPCR, Western Blot [51,52,53] S100A6 S100 Calcium Binding Protein A6 Downregulated Tag profiling, IHC, RT-qPCR [38] SPARC Secreted Protein Acidic and Cysteine Rich Up/downregulated Microarray profiling, Tag-profiling [38,44] STC1 Stanniocalcin 1 Downregulated Tag profiling, IHC, RT-qPCR [38] THY1 Cluster of differentiation 90 (CD90) antigen Upregulated IHC [54] TNC Tenascin C Upregulated RT-qPCR, IHC [42,43] VIM Vimentin Upregulated IHC

Benign prostatic hyperplasia (BPH) is a non-malignant growth of the prostate, typically occurring in older men with an occurrence of 80–90% of men in their 70s BPH develops in the transition zone differently from PCa foci, which usually develop in the peripheral zone

ESR1= . Here we studied the role of CAF estrogen receptor alpha (ERα) and found that it could protect against PCa invasion. ERα could function through a CAF–epithelial interaction via selectively upregulating thrombospondin 2 (Thbs2) and downregulating matrix metalloproteinase 3 (MMP3) at the protein and messenger RNA levels https://academic.oup.com/carcin/article/35/6/1301/449417

ACTA2: we found that “reactive CAFs” induce shear resistance to prostate tumor cells via intercellular contact and soluble derived factors. The reactive CAFs showed higher expression of α-smooth muscle actin (α-SMA) and fibroblast activation protein (FAP) compared to differentiated CAFs https://www.oncotarget.com/article/27510/


Proteins that exhibited a significant increase in CAF versus NPF were enriched for the functional categories “cell adhesion” and the “extracellular matrix.” The CAF phosphoproteome exhibited enhanced phosphorylation of proteins associated with the “spliceosome” and “actin binding.” COL1A1/2 and COL5A1; the receptor tyrosine kinase discoidin domain-containing receptor 2 (DDR2), a receptor for fibrillar collagens; and lysyl oxidase-like 2 (LOXL2) https://www.mcponline.org/article/S1535-9476(20)31548-6/fulltext


•Cancer-associated fibroblasts (CAFs) exhibit a highly aligned cytoskeleton and extracellular matrix. •CAFs are stiffer than normal fibroblasts from the prostate. •Benign prostate epithelial cells are more compliant after co-culture with CAFs. •Benign epithelial cells are more invasive and proliferative in presence of CAFs. https://www.sciencedirect.com/science/article/pii/S2590006420300338


These findings suggest that CAFs exosomes drive PCa metastasis via the miR-500a-3p/FBXW7/HSF1 axis in a hypoxic microenvironmen https://www.nature.com/articles/s41417-024-00742-2


https://www.genecards.org/cgi-bin/carddisp.pl?gene=CCT8L2 CCT8L2 (Chaperonin Containing TCP1 Subunit 8 Like 2) is a Protein Coding gene. Gene Ontology (GO) annotations related to this gene include calcium-activated potassium channel activity and monoatomic anion channel activity. An important paralog of this gene is CCT8. related to the meta transmembrane activity via

Possible molecular chaperone; assists the folding of proteins upon ATP hydrolysis. The TRiC complex mediates the folding of WRAP53/TCAB1, thereby regulating telomere maintenance (PubMed:25467444). As part of the TRiC complex may play a role in the assembly of BBSome, a complex involved in ciliogenesis regulating transports vesicles to the cilia

https://www.genecards.org/cgi-bin/carddisp.pl?gene=TIMP1 TIMP Metallopeptidase Inhibitor 1 inhibitors of the matrix metalloproteinases (MMPs), a group of peptidases involved in degradation of the extracellular matrix. In addition to its inhibitory role against most of the known MMPs, the encoded protein is able to promote cell proliferation

connected to TLL2: zinc-dependent metalloprotease part of the metzincin protein complex which also is Among its related pathways are Collagen chain trimerization and Extracellular matrix organization.

PDXP, SEPTIN5 cytoskeletal organization and regulation

PRAME inhibits the signaling of retinoic acid and mediates the regulation of selenoproteins

many genes involved in cancer. indeed, it is known that 80% adult male above 70 years old will have such hyperplastic prostate. This is due to scenesence. and that Iron accumulation drives fibrosis, senescence and the senescence-associated secretory phenotype. (of which we have some proteins present here.) A mechanism that this clustered is one of genes that strongly regulates each other in this fibroblasts from pre-cancerous lesions via fibrotic iron accumulation and a secretory mechanism GJC1 (exchange of things between cells (here likely metals?))

Unknown genes like SMIM40. integral membrane protein could be linked to this phenotype

Previous Next

Built with MkDocs using a theme provided by Read the Docs.
« Previous Next »