pyVIPER (VIPER Analysis in Python for single-cell RNASeq)
This package enables network-based protein activity estimation in Python. It also provides interfaces for scanpy (single-cell RNASeq analysis in Python). Functions are partly ported from the R packages viper and NaRnEA.
The user-friendly documentation is available at: https://alevax.github.io/pyviper/index.html.
Dependencies
- scanpy for the single-cell pipeline.
- pandas and anndata for data computing and storage.
- numpy and scipy for scientific computation.
- joblib for parallel computing.
- tqdm to show progress bars.
If you are using a version of scanpy earlier than 1.9.3, it is also advisable to downgrade pandas to a version >=1.3.0 and <2.0, due to a scanpy incompatibility with newer pandas releases.
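The version rule above can be checked before running an analysis. The following is a minimal sketch using only the standard library; the helper names `parse_release` and `pandas_pin_advised` are hypothetical and not part of pyviper, scanpy, or pandas.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '1.9.1' or '2.0.0rc1' into a 3-part tuple of integers, e.g. (1, 9, 1)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2.0' so tuple comparisons behave as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def pandas_pin_advised(scanpy_version, pandas_version):
    """True when scanpy < 1.9.3 is paired with pandas outside [1.3.0, 2.0)."""
    old_scanpy = parse_release(scanpy_version) < (1, 9, 3)
    pandas_in_range = (1, 3, 0) <= parse_release(pandas_version) < (2, 0, 0)
    return old_scanpy and not pandas_in_range


if __name__ == "__main__":
    try:
        if pandas_pin_advised(version("scanpy"), version("pandas")):
            print('Consider running: pip install "pandas>=1.3.0,<2.0"')
        else:
            print("Installed scanpy and pandas versions look compatible.")
    except PackageNotFoundError as err:
        print(f"Package not installed: {err}")
```

The comparison is done on parsed integer tuples rather than on raw strings, because string comparison would rank "1.10.0" below "1.9.3".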
Installation
PyPI
pip install viper-in-python
Local
git clone https://github.com/alevax/pyviper/
cd pyviper
pip install -e .
Usage
import pandas as pd
import anndata
import pyviper
# Load sample data
ges = anndata.read_text("test/unit_tests/test_1/test_1_inputs/LNCaPWT_gExpr_GES.tsv").T
# Load network
network = pyviper.load.msigdb_regulon("h")
# Translate sample data from Ensembl IDs to gene names
pyviper.pp.translate(ges, desired_format="human_symbol")
# Keep only the network targets that are present in the sample data
network.filter_targets(ges.var_names)
# Compute regulon activities
# Option 1: aREA enrichment
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="area")
print(activity.to_df())
# Option 2: NaRnEA enrichment (this overwrites the aREA result stored in `activity`)
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="narnea", eset_filter=False)
print(activity.to_df())
Tutorials
Structure and rationale
The main functions available from pyviper are:
- pyviper.viper: the main function for Virtual Inference of Protein Activity by Enriched Regulon Analysis (VIPER). It supports two enrichment algorithms, aREA and (matrix-)NaRnEA (see below).
- pyviper.aREA: computes aREA (analytic rank-based enrichment analysis) and meta-aREA.
- pyviper.NaRnEA: computes matrix-NaRnEA, a vectorized implementation of NaRnEA.
- pyviper.pp.translate: translates gene identifiers between species (i.e. mouse and human) and between Ensembl, Entrez and gene symbols.
- pyviper.tl.path_enr: computes pathway enrichment.
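To give a feel for what a rank-based enrichment score measures, here is a deliberately simplified sketch. It is not the aREA or NaRnEA implementation used by pyviper: the function names, the toy data, and the scoring formula (a signed sum of normal-quantile-transformed ranks, scaled by the square root of the number of targets) are assumptions chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def rank_to_z(signature):
    """Map each gene's value to a normal quantile based on its rank in the signature."""
    ordered = sorted(signature, key=signature.get)  # lowest value first
    n = len(ordered)
    normal = NormalDist()
    # Rank r in 1..n maps to quantile r / (n + 1), which stays strictly inside (0, 1).
    return {
        gene: normal.inv_cdf((index + 1) / (n + 1))
        for index, gene in enumerate(ordered)
    }


def toy_enrichment(signature, regulon):
    """Signed sum of rank-based z-scores over a regulon's targets, scaled by sqrt(n).

    `regulon` maps target gene -> assumed mode of regulation
    (+1 for activated targets, -1 for repressed targets).
    """
    z = rank_to_z(signature)
    targets = [gene for gene in regulon if gene in z]
    if not targets:
        return 0.0
    total = sum(regulon[gene] * z[gene] for gene in targets)
    return total / sqrt(len(targets))


# Toy gene expression signature (gene -> differential expression value).
signature = {"A": 2.5, "B": 1.8, "C": -0.2, "D": 0.1, "E": -1.9, "F": -2.4}

# A regulator whose activated targets are up and whose repressed target is down
# receives a positive score; flipping every sign flips the score.
active = toy_enrichment(signature, {"A": 1, "B": 1, "F": -1})
inactive = toy_enrichment(signature, {"A": -1, "B": -1, "F": 1})
print(active > 0, inactive < 0)
```

The point of the sketch is the direction of the score: a regulator looks active when its activated targets rank high and its repressed targets rank low in the signature.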
Other notable functions include:
- pyviper.tl.OncoMatch: computes OncoMatch, an algorithm to assess the conservation of MR protein activity between two sets of samples (e.g. to validate GEMMs as effective models of human samples).
- pyviper.pp.stouffer: computes signatures on a cluster-by-cluster basis using a cluster integration method, for pathway enrichment.
- pyviper.pp.viper_similarity: computes the similarity between VIPER signatures.
- pyviper.pp.repr_metacells: computes representative metacells (e.g. for ARACNe) using our method to maximize unique sample usage and minimize resampling (users can specify depth, percent data usage, etc.).
- pyviper.pp.repr_subsample: selects a representative subsample of data using our method to ensure a widely distributed sampling.
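As an illustration of cluster-by-cluster integration, the sketch below assumes the integration follows Stouffer's method, as the function name `stouffer` suggests: the z-scores of the k cells in a cluster are summed and divided by the square root of k. The helper name `stouffer_by_cluster` and the dictionary-based inputs are hypothetical; the actual pyviper function may use different inputs and options.

```python
from collections import defaultdict
from math import sqrt


def stouffer_by_cluster(scores, clusters):
    """Integrate per-cell z-scores into one signature per cluster.

    scores:   {cell_id: {feature: z_score}}
    clusters: {cell_id: cluster_label}
    Returns   {cluster_label: {feature: integrated_z}}
    """
    cells_by_cluster = defaultdict(list)
    for cell, label in clusters.items():
        cells_by_cluster[label].append(cell)

    integrated = {}
    for label, cells in cells_by_cluster.items():
        # All cells are assumed to share the same set of features.
        features = scores[cells[0]].keys()
        integrated[label] = {
            feature: sum(scores[cell][feature] for cell in cells) / sqrt(len(cells))
            for feature in features
        }
    return integrated


# Toy example: three cells, two clusters, two features.
scores = {
    "cell1": {"TP53": 1.0, "MYC": -2.0},
    "cell2": {"TP53": 3.0, "MYC": -2.0},
    "cell3": {"TP53": 0.5, "MYC": 1.5},
}
clusters = {"cell1": "a", "cell2": "a", "cell3": "b"}
signatures = stouffer_by_cluster(scores, clusters)
print(signatures["a"]["TP53"])  # (1.0 + 3.0) / sqrt(2)
```

Dividing by the square root of the cluster size keeps the integrated score on a z-score scale when the per-cell scores are independent standard normal values, which is why a consistent signal across many cells yields a larger integrated score than the same signal in a single cell.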
Additionally, the following submodules are available:
- pyviper.load: submodule containing several utility functions useful for different analyses, including load_msigdb_regulon, load_TFs, etc.
- pyviper.pl: submodule containing pyviper wrappers for scanpy plotting.
- pyviper.tl: submodule containing pyviper wrappers for scanpy data transformation.
- pyviper.config: submodule allowing users to specify the current species and file paths for regulators.
Last, a new Interactome class allows users to load and interrogate ARACNe- and SCENIC-inferred gene regulatory networks.
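To make the idea of loading and interrogating a regulatory network concrete, here is a small, hypothetical stand-in written with the standard library. It is not pyviper's Interactome class: the names `Edge` and `ToyInteractome`, the `mor` and `likelihood` fields, and the methods other than the `filter_targets` name borrowed from the Usage section are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    regulator: str
    target: str
    mor: float         # assumed field: mode of regulation (sign/strength of the effect)
    likelihood: float  # assumed field: confidence weight of the interaction


class ToyInteractome:
    """Hypothetical stand-in for a regulatory network; not pyviper's Interactome."""

    def __init__(self, edges):
        self.edges = list(edges)

    def regulators(self):
        """Sorted list of distinct regulators in the network."""
        return sorted({edge.regulator for edge in self.edges})

    def targets_of(self, regulator):
        """Sorted list of targets of one regulator."""
        return sorted(edge.target for edge in self.edges if edge.regulator == regulator)

    def filter_targets(self, allowed):
        """Keep only edges whose target is in `allowed` (e.g. genes measured in the data)."""
        allowed = set(allowed)
        self.edges = [edge for edge in self.edges if edge.target in allowed]


network = ToyInteractome([
    Edge("MYC", "CDK4", 1.0, 0.9),
    Edge("MYC", "NCL", 0.8, 0.7),
    Edge("TP53", "CDKN1A", 1.0, 0.95),
])
network.filter_targets(["CDK4", "CDKN1A"])  # pretend only these genes were measured
print(network.regulators(), network.targets_of("MYC"))
```

Filtering targets before scoring matters because an enrichment score computed over targets that are absent from the expression data would be based on fewer genes than the network implies.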
Contact
Please report any issues you experience through this repository's “Issues” page.
For any other information or queries, please write to Alessandro Vasciaveo (av2729@cumc.columbia.edu).
Citation
If you use pyVIPER in your publication, please cite our work:
Wang, A.L.E., Lin, Z., Zanella, L., Vlahos, L., Girotto, M.A., Zafar, A., … & Vasciaveo, A. (2024). pyVIPER: A fast and scalable Python package for rank-based enrichment analysis of single-cell RNASeq data. bioRxiv, 2024-08. doi: https://doi.org/10.1101/2024.08.25.609585.
Manuscript in review
Contents
Tutorials
- Tutorial 1 - Analyzing scRNA-seq data at the Protein Activity Level
- Install PyVIPER
- Import modules
- Step 1. Load a gene expression signature for single-cells
- Step 2. Load and inspect a lineage-specific gene regulatory network
- Step 3. Convert the gene expression signature into a protein activity matrix using VIPER
- Step 4. Analyze single-cells at the Protein Activity level
- Key takeaways
- Tutorial 2 - Inferring Protein Activity from scRNA-seq data from multiple cell populations with the meta-VIPER approach
- Install PyVIPER
- Import modules
- Step 1. Load a gene expression matrix and associated metadata
- Step 2. Preprocess and generate a gene expression signature at the single-cell level
- Step 3. Load multiple ARACNe-inferred gene regulatory networks
- Step 4. Analyze single-cells at the Protein Activity level
- Pathway enrichment analysis
- Key takeaways
- Tutorial 3 - Generating metacells for reverse-engineering of ARACNe gene regulatory networks