pyVIPER (VIPER Analysis in Python for single-cell RNASeq)
This package enables network-based protein activity estimation in Python. It also provides interfaces for scanpy (single-cell RNASeq analysis in Python). Functions are partly ported from the R packages viper and NaRnEA.
Dependencies
- scanpy for the single-cell pipeline.
- pandas and anndata for data computing and storage.
- numpy and scipy for scientific computation.
- joblib for parallel computing.
- tqdm to show progress bars.
If you are using a version of scanpy <1.9.3, it is also advisable to downgrade pandas to (>=1.3.0 & <2.0), due to a scanpy incompatibility (issue).
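To check whether this constraint applies to a given environment, the installed versions can be compared directly. The sketch below is only an illustration: the helper names parse_version and needs_pandas_pin are hypothetical and not part of pyviper, and the parser assumes plain numeric versions such as 1.9.3.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.9.3' into (1, 9, 3); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_pandas_pin(scanpy_version, pandas_version):
    """True if scanpy is older than 1.9.3 and pandas is outside >=1.3.0,<2.0."""
    old_scanpy = parse_version(scanpy_version) < (1, 9, 3)
    pandas_ok = (1, 3, 0) <= parse_version(pandas_version) < (2, 0, 0)
    return old_scanpy and not pandas_ok


if __name__ == "__main__":
    try:
        if needs_pandas_pin(version("scanpy"), version("pandas")):
            print('Consider: pip install "pandas>=1.3.0,<2.0"')
        else:
            print("scanpy and pandas versions look compatible.")
    except PackageNotFoundError as err:
        print(f"Package not installed: {err}")
```

The version comparison is done on integer tuples rather than strings, so that 1.10.0 is correctly treated as newer than 1.9.3.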
Installation
From PyPI
pip install viper-in-python
From a local clone
git clone https://github.com/alevax/pyviper/
cd pyviper
pip install -e .
Usage
import pandas as pd
import anndata
import pyviper
# Load sample data
ges = anndata.read_text("test/unit_tests/test_1/test_1_inputs/LNCaPWT_gExpr_GES.tsv").T
# Load network
network = pyviper.load.msigdb_regulon("h")
# Translate sample data from Ensembl IDs to gene symbols
pyviper.pp.translate(ges, desired_format="human_symbol")
# Filter the interactome targets to the genes present in the data
network.filter_targets(ges.var_names)
# Compute regulon activities
## area
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="area")
print(activity.to_df())
## narnea
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="narnea", eset_filter=False)
print(activity.to_df())
Tutorials
Structure and rationale
The main functions available from pyviper are:
- pyviper.viper: the "pyviper" function for Virtual Inference of Protein Activity by Enriched Regulon Analysis (VIPER). It supports two enrichment algorithms, aREA and (matrix)-NaRnEA (see below).
- pyviper.aREA: computes aREA (analytic rank-based enrichment analysis) and meta-aREA.
- pyviper.NaRnEA: computes matrix-NaRnEA, a vectorized implementation of NaRnEA.
- pyviper.pp.translate: translates between species (i.e. mouse vs human) and between Ensembl, Entrez and gene symbols.
- pyviper.tl.path_enr: computes pathway enrichment.
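To give a feel for what a rank-based enrichment score is, here is a deliberately simplified sketch in plain Python. It is not pyviper's aREA implementation and omits much of the published method: it only rank-transforms one signature to normal quantiles and takes a weighted, signed sum over a regulon's targets. The function names, the (mode, weight) regulon layout and the toy data are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def rank_to_normal(signature):
    """Map each gene's value to a normal quantile based on its rank.

    Ties are not handled specially; they keep their input order.
    """
    genes = sorted(signature, key=signature.get)
    n = len(genes)
    norm = NormalDist()
    # rank / (n + 1) keeps probabilities strictly inside (0, 1)
    return {g: norm.inv_cdf((i + 1) / (n + 1)) for i, g in enumerate(genes)}


def enrichment_score(signature, regulon):
    """Weighted, directional enrichment of one regulon in one signature.

    regulon maps target -> (mode, weight), with mode in [-1, 1]
    (positive = activated target, negative = repressed target).
    """
    quantiles = rank_to_normal(signature)
    targets = [t for t in regulon if t in quantiles]
    if not targets:
        raise ValueError("no regulon targets found in the signature")
    total = sum(regulon[t][0] * regulon[t][1] * quantiles[t] for t in targets)
    # Scale by the norm of the weights so scores are comparable across regulons
    scale = math.sqrt(sum(regulon[t][1] ** 2 for t in targets))
    return total / scale


# Toy signature (made-up values) and a regulon whose activated targets
# are up-regulated and whose repressed target is down-regulated.
signature = {"g1": -2.0, "g2": -1.0, "g3": 0.0, "g4": 1.0, "g5": 2.0}
regulon = {"g5": (1.0, 1.0), "g4": (1.0, 1.0), "g1": (-1.0, 1.0)}
score = enrichment_score(signature, regulon)
print(f"enrichment score: {score:.3f}")
```

A positive score means the regulon's targets move in the direction expected if the regulator were active; flipping every mode sign flips the score.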
Other notable functions include:
- pyviper.tl.OncoMatch: computes OncoMatch, an algorithm to assess the activity conservation of master regulator (MR) proteins between two sets of samples (e.g. to validate GEMMs as effective models of human samples).
- pyviper.pp.stouffer: computes signatures on a cluster-by-cluster basis using the cluster integration method for pathway enrichment.
- pyviper.pp.viper_similarity: computes the similarity between VIPER signatures.
- pyviper.pp.repr_metacells: computes representative metacells (e.g. for ARACNe) using our method to maximize unique sample usage and minimize resampling (users can specify depth, percent data usage, etc.).
- pyviper.pp.repr_subsample: selects a representative subsample of the data using our method to ensure a widely distributed sampling.
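Cluster-by-cluster integration can be pictured with Stouffer's method, which combines n z-scores as their sum divided by the square root of n. The following is a minimal sketch in plain Python, under the assumption that this is the idea behind pyviper.pp.stouffer; the helper name stouffer_by_cluster and the toy numbers are hypothetical, and the real function works on AnnData objects rather than lists.

```python
import math
from collections import defaultdict


def stouffer_by_cluster(scores, clusters):
    """Integrate per-cell z-scores into one signature per cluster.

    scores   : list of per-cell score lists (cells x proteins)
    clusters : list of cluster labels, one per cell
    Returns {cluster: [integrated score per protein]}.
    """
    if len(scores) != len(clusters):
        raise ValueError("scores and clusters must have the same length")
    grouped = defaultdict(list)
    for row, label in zip(scores, clusters):
        grouped[label].append(row)

    result = {}
    for label, rows in grouped.items():
        n = len(rows)
        # Stouffer: sum of z-scores across cells divided by sqrt(number of cells)
        result[label] = [sum(col) / math.sqrt(n) for col in zip(*rows)]
    return result


# Toy example: 4 cells, 2 proteins, 2 clusters (made-up numbers)
cell_scores = [[1.0, -2.0], [3.0, 0.0], [0.5, 0.5], [1.5, -0.5]]
cell_clusters = ["A", "A", "B", "B"]
signatures = stouffer_by_cluster(cell_scores, cell_clusters)
print(signatures)
```

Dividing by the square root of n rather than n keeps the integrated score on a z-score scale, so consistent signals across many cells are reinforced rather than averaged away.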
Additionally, the following submodules are available:
- pyviper.load: submodule containing several utility functions useful for different analyses, including load_msigdb_regulon, load_TFs, etc.
- pyviper.pl: submodule containing pyviper wrappers for scanpy plotting.
- pyviper.tl: submodule containing pyviper wrappers for scanpy data transformation.
- pyviper.config: submodule allowing users to specify the current species and filepaths for regulators.
Lastly, a new Interactome class allows users to load and interrogate ARACNe- and SCENIC-inferred gene regulatory networks.
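Conceptually, a gene regulatory network of this kind can be thought of as a table of regulator-target edges. The toy class below is purely illustrative and is not pyviper's Interactome: the names Edge and ToyInteractome, the mode and weight fields, and the methods shown are hypothetical stand-ins chosen to mirror the filter_targets step from the Usage section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    regulator: str
    target: str
    mode: float    # assumed: sign/strength of regulation, in [-1, 1]
    weight: float  # assumed: confidence of the interaction


class ToyInteractome:
    """Hypothetical stand-in for a regulator -> targets network."""

    def __init__(self, edges):
        self.edges = list(edges)

    def regulators(self):
        """Sorted list of distinct regulators."""
        return sorted({e.regulator for e in self.edges})

    def targets_of(self, regulator):
        """Targets of one regulator, in insertion order."""
        return [e.target for e in self.edges if e.regulator == regulator]

    def filter_targets(self, allowed):
        """Drop edges whose target is not in `allowed` (e.g. measured genes)."""
        allowed = set(allowed)
        self.edges = [e for e in self.edges if e.target in allowed]


net = ToyInteractome([
    Edge("TF1", "geneA", 1.0, 0.9),
    Edge("TF1", "geneB", -1.0, 0.4),
    Edge("TF2", "geneC", 1.0, 0.7),
])
net.filter_targets(["geneA", "geneC"])  # geneB was not measured
print(net.regulators(), net.targets_of("TF1"))
```

Filtering targets before running enrichment ensures that each regulon only contains genes that actually appear in the expression data.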
Contact
Please report any issues that you experience through this repository's "Issues" page.
For any other information or queries, please write to Alessandro Vasciaveo (av2729@cumc.columbia.edu).
License
pyviper is distributed under the MIT License (see LICENSE).
Citation
Manuscript in review
- Tutorial 1 - Analyzing scRNA-seq data at the Protein Activity Level
  - Install PyVIPER
  - Import modules
  - Step 1. Load a gene expression signature for single cells
  - Step 2. Load and inspect a lineage-specific gene regulatory network
  - Step 3. Convert the gene expression signature into a protein activity matrix using VIPER
  - Step 4. Analyze single cells at the Protein Activity level
  - Key takeaways
- Tutorial 2 - Inferring Protein Activity from scRNA-seq data from multiple cell populations with the meta-VIPER approach
  - Install PyVIPER
  - Import modules
  - Step 1. Load a gene expression matrix and associated metadata
  - Step 2. Preprocess and generate a gene expression signature at the single-cell level
  - Step 3. Load multiple ARACNe-inferred gene regulatory networks
  - Step 4. Analyze single cells at the Protein Activity level
  - Pathway enrichment analysis
  - Key takeaways
- Tutorial 3 - Generating metacells for reverse-engineering of ARACNe gene regulatory networks