Tutorial 1 - Analyzing scRNA-seq data at the Protein Activity Level

This Tutorial will guide the user thorugh the analysis of single-cell RNA-seq (scRNA-seq) data at the Protein Activity level using pyviper. Specifically, we will show how to transform a gene expression signature generated from sparse scRNA-seq data into a protein activity matrix using the VIPER algorithm. After a brief description of the installation procedure and the modules needed, this Jupyter Notebook is organized in the following sections.

Table of Contents

Step 1. Load a gene expression signature for single cells
Step 2. Load and inspect a lineage-specific gene regulatory network
Step 3. Convert the gene expression signature into a protein activity matrix using VIPER
Step 4. Analyze single-cells at the Protein Activity level
Key Takeaways

Install PyVIPER

Install pyviper from PyPI using pip. Alternatively, refer to the README in the current GitHub to install from the local directory.

[1]:
# !pip install viper-in-python

Import modules

Load pyviper and additional modules required used in this tutorial.

[2]:
import pyviper
import scanpy as sc
import anndata
import pandas as pd
import numpy as np
import random

import warnings
warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*") # for jit decorator issue with sc.pp.neighbors (09/30/2023)

Setup the path to the data.

[3]:
data_location = "https://zenodo.org/records/10059791/files/"

We will import the data directly from the above link. Optionally, users are allowed to download the files from here and to load them from their local computer. If the second option is preferred, the following lines can be uncommented and run to download the data on Unix systems and set the working directory to the path where the data will be stored.

[4]:
#!mkdir Tutorial_1_data_dir
#%cd Tutorial_1_data_dir
#!curl -O https://zenodo.org/records/10059791/files/Tutorial_1_gExpr_fibroblast_5802.tsv.gz -O https://zenodo.org/records/10059791/files/fibroblast-net.tsv
#data_location = !pwd
#data_location = data_location[0]+"/" # new value for data if working on data downloaded locally

Step 1. Load a gene expression signature for single-cells

Load a gene expression signature to be used as input for the viper function for Protein Activity inference and store it as an AnnData object to enable interoperability with scanpy. The time required for this process is approximately 5 s and 30 s if loading from the local directory or remote url, respectively.

[5]:
gene_expr_path = data_location + "Tutorial_1_gExpr_fibroblast_5802.tsv.gz"

gene_expr_signature = pd.read_csv(gene_expr_path, sep="\t") # read from remote
gene_expr_signature = sc.AnnData(gene_expr_signature) # convert to AnnData object

This signature was generated from a population of cancer-asssociated fibroblasts (CAFs) from publicly avaiable data from Peng et al., 2019 following a similar procedure to the one described in Tutorial 2.

Display matrix dimensions with cells on rows and features (genes) on columns.

[6]:
gene_expr_signature
[6]:
AnnData object with n_obs × n_vars = 5802 × 5000

Step 2. Load an inspect a lineage-specific gene regulatory network

Load and inspect a lineage-specific gene regulatory network generated with the ARACNe3. The time required to load a network is approximately 1 s (from local directory) and 40 s (from remote url).

[7]:
network_path = data_location + "fibroblast-net.tsv" # path to ARACNe network

network = pd.read_csv(network_path, delimiter="\t")

Convert the pd.DataFrame to an object of class Interactome to enable easier manipulation. Show the number of regulons using the size() method. You can see all available methods by typing help(pyviper.Interactome).

[8]:
network_interactome = pyviper.Interactome('fibroblasts', network) # convert to class Interactome
network_interactome.size() # show number of regulons in the Interactome
[8]:
6437

Display the first 5 interactions in the regulatory network. mor and likelihood are the mode of regulation of the given regulator-target pair and the likelihood of the interaction, respectively. Algorithms other than ARACNe can be used to generate gene regulatory networks, but we recommend postprocessing the output to suitable data formats to use viper.

[9]:
network_interactome.net_table.head()
[9]:
regulator target mor likelihood
0 AATF CCM2 0.012159 0.999799
1 AATF THOC2 0.007487 0.999597
2 AATF RAP1A -0.021062 0.999396
3 AATF SFT2D1 0.019400 0.999195
4 AATF DNAJC2 0.047129 0.998993

Filter out targets in the Interactome that are not present in the gene expression matrix. This step is recommended before running viper.

[10]:
network_interactome.filter_targets(gene_expr_signature.var_names)
Removed 3047484 targets.

As an example, display the number of targets of two regulators, MYC and EGFR2.

[11]:
n_MYC = len(network_interactome.get_reg('MYC')) # number of MYC targets in the network
n_EGFR = len(network_interactome.get_reg('EGFR')) # number of SERPINA12 targets in the network

print("The number of targets of MYC and SERPINA12 are " + str(n_MYC) + " and " + str(n_EGFR) + ", respectively.")
The number of targets of MYC and SERPINA12 are 284 and 183, respectively.

Step 3. Convert the gene expression signature into a protein activity matrix using VIPER

viper transforms a gene expression signature into a protein activity matrix by performing enrichment analysis on regulons. Two methods for enrichment analysis are available, aREA (default) and NaRnEA (by setting enrichment="narnea").

Method 1 - protein activity inference using aREA

As shown above, different regulators can have a different number of targets. Prune each regulon to have the same number of targets (50 in this case). This preprocessing step is advisable to avoid regulators with exceedingly number of targets dominating those having fewer in the computation of Normalized Enrichment Scores (NES), to make NES comparable.

[12]:
network_pruned = network_interactome.copy() # generate a copy of the unpruned network before pruning it
network_pruned.prune(max_targets=50,eliminate=True) # prune interactome to have exactly 50 targets
Removed 664967 targets.
Removed 47 regulators.

Now all the regulators in the network have exactly 50 transcriptional targets. See e.g. MYC and EGFR2.

[13]:
n_MYC = len(network_pruned.get_reg('MYC')) # number of MYC targets in the network
EGFR = len(network_pruned.get_reg('EGFR')) # number of SERPINA12 targets in the network

print("Number of MYC targets: " + str(n_MYC) + "\nNumber of EGFR targets: " + str(EGFR))
Number of MYC targets: 50
Number of EGFR targets: 50

Run viper to compute the protein activity matrix with aREA.

Mandatory inputs to viper are 1. a gene expression signature and 2. a gene regulatory network. Set the viper output to be an ndarray (The default output would be an AnnData object). The expected runtime is approximately 10-15 s.

[14]:
ProtAct_aREA = pyviper.viper(gex_data=gene_expr_signature, # gene expression signature
                             interactome=network_pruned, # gene regulatory network
                             enrichment = "area",
                             output_as_anndata=False,
                             njobs=1,
                             verbose=False)

ProtAct_aREA contains the activity of every regulatory protein described by its NES for each single cell.

[15]:
ProtAct_aREA # display the protein activity matrix
[15]:
AAMP AATF ABCA1 ABCA12 ABCA3 ABCA7 ABCA8 ABCB1 ABCB4 ABCB5 ... ZSWIM5 ZSWIM6 ZSWIM7 ZWINT ZXDA ZXDB ZXDC ZYX ZZEF1 ZZZ3
T1_AAACCTGAGATGTCGG -4.958661 -6.340532 5.591431 0.026641 -0.952873 0.024560 6.763799 -0.220392 -0.051190 0.051432 ... -0.079490 -0.701844 -6.250761 -0.011008 -1.692154 0.018511 -0.282504 -5.392813 0.063155 -0.051505
T1_AACTGGTTCTAACGGT -7.377609 -6.798952 -6.550723 1.626537 0.039575 0.132152 -9.076912 0.024765 -0.054462 0.046818 ... -0.039977 -1.666785 -5.924595 -0.070319 1.271308 -0.008729 0.347169 8.320215 -0.504002 0.030807
T1_AAGGTTCGTGTCAATC -6.881051 -6.227523 -5.938090 0.248098 -0.075314 0.431982 8.251861 0.353197 -0.024522 0.019459 ... -0.217323 1.762852 -5.514518 -0.042265 0.096729 -0.072773 -0.204381 -8.505392 0.020706 -0.005608
T1_ACAGCCGTCAAGCCTA -7.370794 -5.838071 -5.957978 -0.356005 -0.611744 0.173595 9.672680 1.063066 0.021172 0.004548 ... -0.023732 -1.628548 -5.588137 0.008851 -0.634454 -0.010447 -0.095425 -7.882212 0.052736 -0.012834
T1_AGACGTTAGGCGACAT -7.160952 -6.351434 -6.644756 -0.020971 0.197701 0.093815 -7.264978 0.045201 -0.019334 -0.005189 ... 0.016033 -1.136795 -6.277997 0.015763 1.310698 0.022143 -1.424293 -7.538373 -0.124011 -0.004775
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
T24_TTGCGTCCAGACAAAT -3.853770 -3.744690 -5.746431 -0.770776 -0.006618 0.075021 4.486073 0.032965 -0.072500 0.025214 ... -0.040233 2.667354 5.823175 0.706549 -0.558190 -0.006004 0.520060 4.892258 0.977937 0.303938
T24_TTGCGTCCATTAACCG -7.183422 6.414101 -6.321449 -1.070418 -0.604463 0.162847 -9.382614 0.062170 -0.088907 -0.006882 ... -0.078167 -1.471368 -6.268302 -0.468904 -1.011935 0.028369 0.770259 7.526919 0.083573 0.024982
T24_TTTATGCTCCTTTACA -7.402969 -6.773775 -6.956773 0.873204 0.104889 -0.613706 8.306039 -0.014412 -0.003422 -0.008079 ... -0.045534 -2.281823 6.220555 -0.622250 0.353278 -0.006841 -0.106645 -7.967557 0.531523 -0.018268
T24_TTTCCTCCACCGGAAA -5.338859 -5.148087 -4.857930 -0.589181 1.378187 0.016195 -6.906513 -0.113458 -0.017487 -0.001037 ... -0.038365 -1.267985 5.216135 -0.087870 -0.819144 -0.005551 -0.701608 -4.798419 0.062763 -0.084813
T24_TTTGCGCTCCTTTCGG 6.571542 -5.770066 -6.173451 1.207757 -0.328110 0.027743 7.764931 0.028870 -0.032131 0.034307 ... -0.153165 -2.324143 -4.856275 0.808057 -0.381149 0.053897 -0.455522 -6.480079 0.033859 -0.373861

5802 rows × 6390 columns

Method 2 - protein activity inference using (matrix)-NaRnEA

Run viper to compute the protein activity matrix using (matrix-based) NaRnEA .

Compute the protein activity using viper with NaRnEA as the enrichment method. Unlike aREA, the NaRnEA framework is suitable also for regulons of different size. Store the output in an AnnData object (default).

[16]:
ProtAct_NaRnEA = pyviper.viper(gex_data=gene_expr_signature, # gene expression signature
                                 interactome=network_interactome, # gene regulatory network
                                 enrichment = "narnea",
                                 eset_filter=False,
                                 njobs=1,
                                 verbose=False)

Show the output type and length of ProtAct_NaRnEA.

[17]:
ProtAct_NaRnEA # display the protein activity matrix as AnnData object
[17]:
AnnData object with n_obs × n_vars = 5802 × 6413
    uns: 'gex_data'
    layers: 'pes'

When run with the NaRnEA method, viper returns an additional layer, pes that stores the Proportional Enrichment Score (PES) of each regulator, a measure of protein activity in the \(-1 \le PES \le 1\) interval that can be used as an enhanced measure of effect size of each regulon’s activity. For further details, see Griffin et al., 2022.

Display NES protein activity matrix.

[18]:
ProtAct_NaRnEA.to_df() # NES matrix
[18]:
AAMP AATF ABCA1 ABCA12 ABCA3 ABCA7 ABCA8 ABCB1 ABCB4 ABCB5 ... ZSWIM5 ZSWIM6 ZSWIM7 ZWINT ZXDA ZXDB ZXDC ZYX ZZEF1 ZZZ3
T1_AAACCTGAGATGTCGG 4.721734 -6.619827 7.008813 1.861781 1.856598 -0.870238 14.555466 -0.624870 -0.835634 0.415443 ... -1.877274 3.158300 -7.010895 -0.767114 -2.016624 1.473462 2.726915 5.796276 2.409854 -2.594848
T1_AACTGGTTCTAACGGT -7.194335 7.539468 9.435460 4.098622 -0.469555 1.067867 -18.895628 1.415011 -0.808934 1.093563 ... 2.213714 -3.225383 -7.223655 0.832463 1.240214 -1.571365 -1.406364 12.754583 1.835522 2.965396
T1_AAGGTTCGTGTCAATC 8.892053 -9.453808 -10.276663 3.200662 -1.276665 0.668082 20.592220 0.475050 -0.928214 0.497169 ... 1.669964 3.418652 -8.993046 -1.793087 -0.958978 -2.596414 1.049315 -11.111879 -1.967781 -2.485891
T1_ACAGCCGTCAAGCCTA 7.851400 6.905695 9.582031 -2.862549 -0.883946 -1.470669 23.073534 -0.614457 1.140870 0.949438 ... -1.179793 -2.539163 -6.815755 -1.127239 -1.584070 1.258887 2.088022 -11.702254 2.173031 2.511354
T1_AGACGTTAGGCGACAT 7.647149 6.960122 -8.842428 2.599533 0.562035 -0.706634 -15.241218 0.168039 2.954947 -0.942813 ... 1.897963 -2.463266 -9.299325 0.864452 2.009431 -1.286205 -2.902146 8.258941 -2.184118 3.036592
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
T24_TTGCGTCCAGACAAAT 3.936121 3.912197 -7.635411 -2.268020 0.557243 -1.325284 10.868199 0.819879 -1.106013 1.725982 ... -2.446716 2.681818 5.761685 0.989011 -2.355896 -0.963434 1.598874 6.739289 2.979156 1.230419
T24_TTGCGTCCATTAACCG -7.200858 7.800428 8.200803 -2.343332 -1.674645 1.416084 -19.099210 0.633930 -0.581409 -1.299787 ... 1.167182 -2.944020 7.269258 -0.951725 -1.750199 1.705520 -1.838097 10.667582 1.591999 2.984735
T24_TTTATGCTCCTTTACA -7.106027 8.048133 -8.375441 2.619907 0.908485 -2.109688 20.656478 -0.355086 0.885489 -0.590344 ... 1.123289 2.807651 6.441253 1.058726 1.446795 -1.593226 1.463587 -10.230168 1.784666 2.937815
T24_TTTCCTCCACCGGAAA -6.272371 -5.982345 -6.917609 -3.526581 0.889230 -1.033967 -11.206110 0.544282 -1.331863 0.820132 ... 1.272478 -3.339912 5.365629 -2.224107 -1.598793 -1.075789 0.956203 -6.464254 -1.202947 2.896665
T24_TTTGCGCTCCTTTCGG 7.662765 5.670811 -7.459790 3.314262 1.253227 -2.886863 15.754894 0.312123 -1.264925 1.451191 ... -1.528563 -2.797515 5.459777 1.624329 1.406323 1.839792 1.615043 -6.028920 1.614101 2.219795

5802 rows × 6413 columns

Display the PES protein activity matrix.

[19]:
ProtAct_NaRnEA.to_df(layer="pes") # PES matrix
[19]:
AAMP AATF ABCA1 ABCA12 ABCA3 ABCA7 ABCA8 ABCB1 ABCB4 ABCB5 ... ZSWIM5 ZSWIM6 ZSWIM7 ZWINT ZXDA ZXDB ZXDC ZYX ZZEF1 ZZZ3
T1_AAACCTGAGATGTCGG 0.241036 -0.304539 0.281272 0.075302 0.065838 -0.030641 0.378996 -0.021552 -0.028560 0.013191 ... -0.058449 0.109650 -0.352310 -0.026813 -0.070147 0.050491 0.092683 0.262461 0.085498 -0.092541
T1_AACTGGTTCTAACGGT -0.377690 0.314298 0.353704 0.161168 -0.015489 0.036149 -0.498108 0.047821 -0.026871 0.033680 ... 0.065591 -0.115920 -0.365330 0.029592 0.042860 -0.053004 -0.044979 0.546406 0.062974 0.098528
T1_AAGGTTCGTGTCAATC 0.411896 -0.426499 -0.463221 0.121913 -0.040238 0.021839 0.489513 0.015586 -0.029855 0.014794 ... 0.047508 0.107629 -0.455912 -0.057534 -0.030965 -0.085069 0.034159 -0.548071 -0.067285 -0.085520
T1_ACAGCCGTCAAGCCTA 0.374073 0.280960 0.347664 -0.111102 -0.028572 -0.049797 0.563166 -0.020199 0.037355 0.028794 ... -0.035533 -0.090658 -0.345054 -0.037077 -0.052268 0.041115 0.068854 -0.577137 0.073317 0.081781
T1_AGACGTTAGGCGACAT 0.367849 0.286135 -0.395323 0.101320 0.019492 -0.024048 -0.399215 0.005632 0.097293 -0.028792 ... 0.055635 -0.088110 -0.469940 0.030555 0.068925 -0.043036 -0.091780 0.348528 -0.076160 0.099715
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
T24_TTGCGTCCAGACAAAT 0.204764 0.174948 -0.337461 -0.093431 0.019970 -0.047278 0.288253 0.028819 -0.038329 0.055596 ... -0.077137 0.094923 0.273529 0.036050 -0.083271 -0.033768 0.054981 0.311843 0.107347 0.043343
T24_TTGCGTCCATTAACCG -0.380139 0.335792 0.321417 -0.094081 -0.056706 0.048863 -0.508320 0.021759 -0.019652 -0.040733 ... 0.035385 -0.106378 0.330039 -0.032770 -0.060073 0.057730 -0.060230 0.473601 0.055765 0.101769
T24_TTTATGCTCCTTTACA -0.374351 0.343964 -0.370559 0.104330 0.031898 -0.073233 0.525972 -0.012040 0.029794 -0.018420 ... 0.033870 0.095181 0.289916 0.037852 0.050491 -0.054353 0.049172 -0.502406 0.062200 0.099569
T24_TTTCCTCCACCGGAAA -0.333195 -0.276134 -0.311172 -0.141074 0.031418 -0.035984 -0.299382 0.018663 -0.044897 0.025645 ... 0.038321 -0.121415 0.238997 -0.075915 -0.054567 -0.036805 0.032295 -0.321070 -0.042854 0.097915
T24_TTTGCGCTCCTTTCGG 0.387580 0.246106 -0.329504 0.133253 0.044279 -0.101097 0.406666 0.010766 -0.042976 0.045788 ... -0.047366 -0.101365 0.250277 0.058417 0.049472 0.062629 0.054655 -0.296234 0.056873 0.076209

5802 rows × 6413 columns

Notice that when setting the viper output to be an object of class AnnData, the input gene expression matrix is saved under the attribute gex_data of the output AnnData object:

[20]:
ProtAct_NaRnEA.uns["gex_data"][0:5,0:5].to_df()
[20]:
RP11-54O7.3 SAMD11 PLEKHN1 RP11-54O7.17 HES4
T1_AAACCTGAGATGTCGG -0.145409 -0.513375 -0.130302 -0.134275 1.855504
T1_AACTGGTTCTAACGGT -0.108846 -0.390002 -0.096722 -0.101255 2.064095
T1_AAGGTTCGTGTCAATC -0.084144 -0.306366 -0.074026 -0.078937 1.387576
T1_ACAGCCGTCAAGCCTA -0.099796 -0.359357 -0.088408 -0.093080 -0.622520
T1_AGACGTTAGGCGACAT -0.099830 -0.359472 -0.088440 -0.093111 3.986061

Step 4. Analyze single-cells at the Protein Activity level

This section presents some useful postprocessing functions that can be used on the protein activity matrix.

First, set the set of regulatory proteins to be those for the human species. This is step is redundant, since pyviper sets "human" as default, but it demonstrates how to have complete control over the analysis, if fine-tuning is desired.

[21]:
pyviper.config.set_regulators_species_to_use(species="human")

Run PCA to reduce the dimensionality of the dataset, by projecting the PES of transcriptional regulators (TFs) and co-transcriptional regulators (coTFs) by setting filter_by_feature_groups=["tfs", "cotfs"]. In this example, we focus on TFs and coTFs, major mechanistic determinants of cell state as shown in Paull et al., 2021 and Califano & Alvarez, 2016. Possible options for this parameter are ["tfs", "cotfs", "sig", "surf"], where the latter refer to signaling proteins and surface markers, respectively. The default (None) uses all the features in the protein activity matrix.

[22]:
random.seed(0)
pyviper.tl.pca(ProtAct_NaRnEA, layer="pes", filter_by_feature_groups=["tfs","cotfs"], zero_center=True,  svd_solver='arpack', random_state=0)

The tl module provides wrappers to several scanpy.tl functions, and it can be used to simplify the postprocessing of protein activity data, for instance by subsetting the class of regulatory proteins to analyze, as shown in the cell above.

Compute the neighbors graph of cells using the PCA representation of the protein activity matrix. For sake of simplicity, set n_neighbors = 20and n_pcs = 50 (please, notice that typically, these parameters need tuning).

[23]:
sc.pp.neighbors(ProtAct_NaRnEA, metric="correlation", n_neighbors=20, n_pcs=50, random_state=0)
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

Cluster cells based on their protein activity profiles using the Leiden algorithm with resolution=0.1 (in real-world applications this parameter should also be optimized).

[24]:
sc.tl.leiden(ProtAct_NaRnEA, resolution=0.1, n_iterations=-1, random_state=0)

Generate a UMAP embedding for the graph.

[25]:
sc.tl.umap(ProtAct_NaRnEA, random_state=0)

Display the 2-dimensional UMAP embedding using the pyviper.pl module for easier post-processing of key protein activity features. Color by cluster.

[26]:
sc.set_figure_params(dpi=90)
n_cells = ProtAct_NaRnEA.shape[0]
dot_size = 120000 / n_cells*2
pyviper.pl.umap(ProtAct_NaRnEA, color='leiden', palette="Set1", size=dot_size,add_outline=True)
/Users/lucazanella7/mambaforge/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:369: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/lucazanella7/mambaforge/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:379: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/lucazanella7/mambaforge/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
_images/Tutorial-1_54_1.png

Identify the most active regulators in each cluster by cluster-based integration of their NES using the Stouffer’s method:

\[NES^C_j = \frac{\sum_{k \in C}NES_{k,j}}{\sqrt{n^C}}\]

where \(NES^C_j\) denotes the integrated NES for regulator \(j\) in cluster \(C\), \(NES_{k,j}\) is the NES of regulator \(j\) in cluster \(C\) and \(n^C\) is the number of cells in cluster \(C\).

[27]:
NES_integrated = pyviper.pp.stouffer(ProtAct_NaRnEA, "leiden", filter_by_feature_groups=["tfs","cotfs"], compute_pvals=False, return_as_df=True) # Stouffer integration
NES_integrated
[27]:
AATF ABCG1 ABCG4 ABLIM3 ABT1 ACAD8 ACTN2 ACTR5 ADH1A ADNP ... ZSWIM1 ZSWIM3 ZSWIM4 ZSWIM5 ZSWIM6 ZSWIM7 ZXDA ZXDB ZXDC ZZEF1
0_scores 67.525573 115.613861 27.080476 -28.436313 -132.404139 -33.429712 -13.909552 -47.931126 4.051971 -29.487164 ... -11.106391 -22.387029 62.365929 20.721141 -66.658539 -30.107811 -11.112864 1.628075 -26.403334 -22.699427
1_scores -91.800207 -142.827803 -17.348848 -18.037332 25.121737 36.583034 -0.532866 -34.507518 11.575850 -161.079785 ... 30.393149 26.228487 -35.711558 10.563270 -54.133110 71.437503 22.445477 -9.705253 22.332898 26.533621
2_scores 155.866707 112.772956 -28.882646 -12.207053 269.112806 34.378121 -17.958921 -0.020872 11.181962 -156.887473 ... 20.037990 -30.555842 -4.616144 -4.629360 -77.501573 52.334168 3.893929 -28.350717 23.055552 -45.316784
3_scores 60.353173 -7.015611 -18.430420 -13.188699 45.879802 13.423553 -6.909312 -5.033798 4.884707 -35.414756 ... 14.532806 -0.763232 6.192970 5.708014 -6.913988 41.258296 15.351770 -15.435729 22.549805 -13.521504

4 rows × 2164 columns

Analogously, the stouffer function can be employed to produce an output directly on the AnnData object. Remove ribosomal proteins from the Stouffer-integrated matrix (ribosomal protein genes are among the most highly expressed genes in most cell types).

[28]:
filtered_proteins = NES_integrated.columns[~NES_integrated.columns.str.startswith('RP')].to_list() # Exclude columns starting with 'RP' from integrated matrix
NES_integrated = NES_integrated.loc[:,filtered_proteins] # remove ribosomal proteins from DataFrame

Select the 10 most activated regulatory protein in each cluster.

[29]:
active_proteins = NES_integrated.apply(lambda row: row.nlargest(10).index.tolist(), axis=1)  # the top 10 most activated proteins in each cell type
active_proteins
[29]:
0_scores    [INHBA, MMP14, ID3, LOXL2, HOPX, RUNX2, LEF1, ...
1_scores    [TSHZ2, NFIA, NFIB, ZBTB16, TCF21, NR2F1, ZFP3...
2_scores    [ENO1, NACA, PFDN5, LOXL2, BTF3, HMGA1, CCDC85...
3_scores    [FOSB, NR4A1, ELL2, ID4, REL, JUND, NR4A2, NR4...
dtype: object

Finally, display the top activated proteins from ProtAct_NaRnEA on a heatmap

[30]:
protein_set = active_proteins.sum() # protein set to display on the heatmap
ax = pyviper.pl.heatmap(ProtAct_NaRnEA, var_names=protein_set, groupby="leiden", vcenter=0, cmap="RdBu_r",dendrogram=False,swap_axes=True, show_gene_labels=True)
_images/Tutorial-1_62_0.png

Key takeaways

This Tutorial describes how to inspect and process gene regulatory networks, by manipulating objects of class Interactome, and shows how to generate protein activity profiles from single-cell data using the viper function. Last, we presented key functions for the post-processing and visualization of protein activity data. In Tutorial 2 we will introduce the user to the generation of suitable gene expression signatures for viper and extend the VIPER analysis to simultaneously account for multiple cell pupulations through the exploitation of the metaVIPER paradigm.