pyVIPER (VIPER Analysis in Python for single-cell RNASeq)

This package enables network-based protein activity estimation in Python. It also provides interfaces for scanpy (single-cell RNASeq analysis in Python). Functions are partly ported from the R packages viper and NaRnEA.


Dependencies

  • scanpy for the single-cell pipeline.

  • pandas and anndata for data computing and storage.

  • numpy and scipy for scientific computation.

  • joblib for parallel computing.

  • tqdm for progress bars.

If you are using scanpy <1.9.3, it is also advisable to pin pandas to >=1.3.0 and <2.0 because of a scanpy incompatibility (issue).
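The version constraint above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names `parse_release` and `pandas_pin_advised` are hypothetical (they are not part of pyviper), and the version parsing is deliberately simple: it only reads the leading numeric release segments.

```python
import re


def parse_release(version: str) -> tuple:
    """Return three numeric release segments, e.g. '1.9.3rc1' -> (1, 9, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))[:3]
    # Pad short versions such as '2.0' to three segments.
    return parts + (0,) * (3 - len(parts))


def pandas_pin_advised(scanpy_version: str, pandas_version: str) -> bool:
    """True when scanpy < 1.9.3 is paired with pandas outside >=1.3.0, <2.0."""
    scanpy_old = parse_release(scanpy_version) < (1, 9, 3)
    pandas_ok = (1, 3, 0) <= parse_release(pandas_version) < (2, 0, 0)
    return scanpy_old and not pandas_ok
```

In an existing environment, the installed version strings could be obtained with `importlib.metadata.version("scanpy")` and `importlib.metadata.version("pandas")` and passed to the helper.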

Installation

PyPI

pip install viper-in-python

Local

git clone https://github.com/alevax/pyviper/
cd pyviper
pip install -e .

Usage

import pandas as pd
import anndata
import pyviper

# Load sample data
ges = anndata.read_text("test/unit_tests/test_1/test_1_inputs/LNCaPWT_gExpr_GES.tsv").T

# Load network
network = pyviper.load.msigdb_regulon("h")

# Translate sample data from Ensembl IDs to gene symbols
pyviper.pp.translate(ges, desired_format="human_symbol")

# Filter targets in the interactome to the genes present in the data
network.filter_targets(ges.var_names)

# Compute regulon activities
## aREA
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="area")
print(activity.to_df())

## NaRnEA
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="narnea", eset_filter=False)
print(activity.to_df())
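
Filtering the network to the genes present in the expression data restricts each regulon to targets that can actually contribute to the enrichment. The sketch below illustrates the idea on a toy regulon table in plain Python. The tuple layout (regulator, target, mode of regulation, likelihood), the gene names and the function `filter_targets_toy` are assumptions made for illustration; they do not describe the internals of the Interactome class.

```python
from collections import defaultdict

# Toy network: (regulator, target, mode_of_regulation, likelihood).
toy_network = [
    ("TF1", "GENE_A", 1.0, 0.9),
    ("TF1", "GENE_B", -1.0, 0.5),
    ("TF1", "GENE_X", 1.0, 0.7),   # GENE_X is absent from the expression data
    ("TF2", "GENE_B", 1.0, 0.8),
    ("TF2", "GENE_Y", -1.0, 0.6),  # GENE_Y is absent as well
]

# Genes measured in the (hypothetical) expression matrix.
measured_genes = {"GENE_A", "GENE_B", "GENE_C"}


def filter_targets_toy(network, genes):
    """Keep only the interactions whose target is among the measured genes."""
    return [edge for edge in network if edge[1] in genes]


filtered = filter_targets_toy(toy_network, measured_genes)

# Count the targets that remain in each regulon after filtering.
regulon_sizes = defaultdict(int)
for regulator, _target, _mor, _likelihood in filtered:
    regulon_sizes[regulator] += 1

print(dict(regulon_sizes))
```

After filtering, regulons that lost many targets are smaller, which is worth checking before interpreting their activity scores.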

Tutorials

  1. Analyzing scRNA-seq data at the Protein Activity Level

  2. Inferring Protein Activity from scRNA-seq data from multiple cell populations with the meta-VIPER approach

  3. Generating Metacells for ARACNe3 network generation and VIPER protein activity analysis

Structure and rationale

The main functions available from pyviper are:

  • pyviper.viper: main function for Virtual Inference of Protein Activity by Enriched Regulon Analysis (VIPER). It supports two enrichment algorithms, aREA and (matrix-)NaRnEA (see below).

  • pyviper.aREA: computes aREA (analytic rank-based enrichment analysis) and meta-aREA

  • pyviper.NaRnEA: computes matrix-NaRnEA, a vectorized implementation of NaRnEA

  • pyviper.pp.translate: translates gene identifiers between species (e.g. mouse and human) and between Ensembl IDs, Entrez IDs and gene symbols.

  • pyviper.tl.path_enr: computes pathway enrichment
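
The general idea of rank-based enrichment of a regulon in a gene expression signature can be illustrated with a heavily simplified sketch. This is not the algorithm implemented by pyviper.aREA or pyviper.NaRnEA: it is a single-component, directional statistic written for illustration only, and the function names, the regulon layout (target mapped to mode of regulation and likelihood) and the toy data are all assumptions.

```python
import math
from statistics import NormalDist


def rank_to_normal_scores(signature):
    """Map signature values to normal scores via ranks: rank / (n + 1) -> quantile."""
    genes = sorted(signature, key=signature.get)  # ascending by signature value
    n = len(genes)
    inv_cdf = NormalDist().inv_cdf
    return {gene: inv_cdf((rank + 1) / (n + 1)) for rank, gene in enumerate(genes)}


def simple_enrichment(signature, regulon):
    """Weighted, signed sum of target scores, scaled to unit variance under the null.

    regulon maps target -> (mode_of_regulation, likelihood).
    """
    scores = rank_to_normal_scores(signature)
    targets = [t for t in regulon if t in scores]
    if not targets:
        raise ValueError("no regulon target is present in the signature")
    total = sum(regulon[t][1] for t in targets)
    # Signed weight per target: normalized likelihood times mode of regulation.
    signed = {t: regulon[t][0] * regulon[t][1] / total for t in targets}
    es = sum(signed[t] * scores[t] for t in targets)
    # If the scores were independent N(0, 1), es would have variance sum(signed^2).
    return es / math.sqrt(sum(w * w for w in signed.values()))


# Toy signature (gene -> differential expression) and regulon, made up for illustration.
signature = {"G1": -2.0, "G2": -1.0, "G3": 0.0, "G4": 1.0, "G5": 2.0}
regulon = {"G5": (1.0, 1.0), "G1": (-1.0, 1.0)}  # activates G5, represses G1
score = simple_enrichment(signature, regulon)
```

In this toy case the activated target is at the top of the signature and the repressed target at the bottom, so the score is positive, which would be read as high inferred activity of the regulator.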

Other notable functions include:

  • pyviper.tl.OncoMatch: computes OncoMatch, an algorithm to assess the conservation of master regulator (MR) protein activity between two sets of samples (e.g. to validate genetically engineered mouse models, GEMMs, as effective models of human samples)

  • pyviper.pp.stouffer: computes signatures on a cluster-by-cluster basis, integrating the samples of each cluster with Stouffer's method (e.g. for pathway enrichment)

  • pyviper.pp.viper_similarity: computes the similarity between VIPER signatures

  • pyviper.pp.repr_metacells: computes representative metacells (e.g. for ARACNe) using our method to maximize unique sample usage and minimize resampling (users can specify depth, percent data usage, etc.).

  • pyviper.pp.repr_subsample: selects a representative subsample of data using our method to ensure a widely distributed sampling.
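
The function name pyviper.pp.stouffer refers to Stouffer's method for combining z-scores. As a minimal sketch, assuming the classical unweighted definition (the sum of k z-scores divided by the square root of k), cluster-level integration can be written as follows. The function `stouffer_combine`, the cluster names and the numbers are made up for illustration and do not reflect the signature or options of the pyviper function.

```python
import math


def stouffer_combine(z_scores):
    """Combine z-scores with Stouffer's method: sum(z) / sqrt(k)."""
    k = len(z_scores)
    if k == 0:
        raise ValueError("need at least one z-score")
    return sum(z_scores) / math.sqrt(k)


# Toy activity scores of one protein across the cells of two clusters.
clusters = {
    "cluster_0": [2.0, 1.5, 2.5, 2.0],
    "cluster_1": [-1.0, 0.5, -0.5, 1.0],
}

# One integrated score per cluster.
cluster_signature = {name: stouffer_combine(z) for name, z in clusters.items()}
```

Consistent scores within a cluster reinforce each other (cluster_0), while scores of mixed sign cancel out (cluster_1).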

Additionally, the following submodules are available:

  • pyviper.load: submodule containing several utility functions useful for different analyses, including load_msigdb_regulon, load_TFs, etc.

  • pyviper.pl: submodule containing pyviper-wrappers for scanpy plotting

  • pyviper.tl: submodule containing pyviper-wrappers for scanpy data transformation

  • pyviper.config: submodule allowing users to specify current species and filepaths for regulators

Finally, a new Interactome class allows users to load and interrogate ARACNe- and SCENIC-inferred gene regulatory networks.

Contact

Please report any issues that you experience through this repository's “Issues” page.

For any other information or queries, please write to Alessandro Vasciaveo (av2729@cumc.columbia.edu).

License

pyviper is distributed under an MIT License (see LICENSE).

Citation

Manuscript in review