pyviper package

pyviper.aREA

class pyviper.aREA(gex_data, interactome, layer=None, eset_filter=False, min_targets=30, mvws=1, verbose=True)

Infers normalized enrichment scores (NES) from gene expression data using the Analytical Ranked Enrichment Analysis (aREA)[1] function.

It is the original basis of the VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis) algorithm.

The Interactome object must not contain any targets that are not in the features of gex_data. This can be accomplished by running:

interactome.filter_targets(gex_data.var_names)

It is highly recommended to do this on the unpruned network and then prune, which ensures the pruned network contains a consistent number of targets per regulator, all of which exist within gex_data. A consistent number of targets makes the NES of different regulators comparable. A regulator that has more targets than others will have a “boosted” NES that cannot be compared to those of regulators with fewer targets.
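The effect of the ordering can be illustrated with a toy network held in plain Python structures rather than in an Interactome object. The helper names below and the rule of keeping the highest-likelihood targets are assumptions made for illustration; they are not pyVIPER's implementation.

```python
from collections import defaultdict

# Toy network rows: (regulator, target, mor, likelihood). All values are made up.
network = [
    ("TF1", "A", 0.9, 0.95), ("TF1", "B", -0.5, 0.90), ("TF1", "C", 0.4, 0.80),
    ("TF2", "A", 0.7, 0.90), ("TF2", "X", 0.6, 0.85), ("TF2", "D", -0.3, 0.70),
]
measured_genes = {"A", "B", "C", "D"}  # stands in for gex_data.var_names


def filter_targets(rows, keep):
    """Drop interactions whose target is not in `keep`."""
    return [row for row in rows if row[1] in keep]


def prune(rows, max_targets):
    """Keep the `max_targets` highest-likelihood targets of each regulator and
    drop regulators with fewer than `max_targets` targets (assumed rule)."""
    by_regulator = defaultdict(list)
    for row in rows:
        by_regulator[row[0]].append(row)
    pruned = []
    for reg_rows in by_regulator.values():
        if len(reg_rows) < max_targets:
            continue
        reg_rows.sort(key=lambda row: row[3], reverse=True)
        pruned.extend(reg_rows[:max_targets])
    return pruned


def targets_per_regulator(rows):
    """Count the targets that remain for each regulator."""
    counts = defaultdict(int)
    for regulator, *_ in rows:
        counts[regulator] += 1
    return dict(counts)


# Filter first, then prune: both regulators end with two targets.
filter_then_prune = targets_per_regulator(
    prune(filter_targets(network, measured_genes), max_targets=2)
)

# Prune first, then filter: TF2 loses the unmeasured target "X" after pruning.
prune_then_filter = targets_per_regulator(
    filter_targets(prune(network, max_targets=2), measured_genes)
)

print(filter_then_prune)
print(prune_then_filter)
```

In the second ordering TF2 is left with a single target while TF1 keeps two, which is the uneven situation the recommendation is meant to avoid.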

Parameters

gex_data – Gene expression stored in an anndata object (e.g. from Scanpy) or in a pd.DataFrame.
interactome – An object of class Interactome.
layer (default: None) – The layer in the anndata object to use as the gene expression input.
eset_filter (default: False) – Whether to filter out genes not present in the interactome (True) or to keep this biological context (False). This will affect gene rankings.
min_targets (default: 30) – The minimum number of targets that each regulator in the interactome should contain. Regulators that contain fewer targets than this minimum will be culled from the network (via the Interactome.cull method). The reason users may choose to use this threshold is because adequate targets are needed to accurately predict enrichment.
mvws (default: 1) – One of the following: (A) A number giving the exponent applied to the metaVIPER weights. These weights are only applicable when enrichment = ‘area’ and are not used when enrichment = ‘narnea’. Roughly, a lower number (e.g. 1) results in networks being treated as a consensus network (useful for multiple networks of the same cell type with the same epigenetics), while a higher number (e.g. 10) results in networks being treated as separate (useful for multiple networks of different cell types with different epigenetics). (B) The name of a column in gex_data that contains the manual assignments of samples to networks using list position or network names. (C) “auto”: assign samples to networks based on how well each network allows for sample enrichment.
verbose (default: True) – Whether extended output about the progress of the algorithm should be given.

Return type

A pd.DataFrame containing NES values.

References

[1] Alvarez, M. J., Shen, Y., Giorgi, F. M., Lachmann, A., Ding, B. B., Ye, B. H., & Califano, A. (2016). Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature genetics, 48(8), 838-847.

pyviper.NaRnEA

class pyviper.NaRnEA(gex_data, interactome, layer=None, eset_filter=False, min_targets=30, verbose=True)

Infers normalized enrichment scores (NES) and proportional enrichment scores (PES) from gene expression data using the Nonparametric Analytical Rank-based Enrichment Analysis (NaRnEA)[1] function.

NaRnEA is an updated basis for the VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis) algorithm.

The Interactome object must not contain any targets that are not in the features of gex_data. This can be accomplished by running:

interactome.filter_targets(gex_data.var_names)

It is highly recommended to do this on the unpruned network and then prune, which ensures the pruned network contains a consistent number of targets per regulator, all of which exist within gex_data. A regulator that has more targets than others will have a “boosted” NES that cannot be compared to those of regulators with fewer targets.

Parameters

gex_data – Gene expression stored in an anndata object (e.g. from Scanpy) or in a pd.DataFrame.
interactome – An object of class Interactome.
layer (default: None) – The layer in the anndata object to use as the gene expression input.
eset_filter (default: False) – Whether to filter out genes not present in the interactome (True) or to keep this biological context (False). This will affect gene rankings.
min_targets (default: 30) – The minimum number of targets that each regulator in the interactome should contain. Regulators that contain fewer targets than this minimum will be culled from the network (via the Interactome.cull method). The reason users may choose to use this threshold is because adequate targets are needed to accurately predict enrichment.
verbose (default: True) – Whether extended output about the progress of the algorithm should be given.

Return type

A dictionary of numpy.ndarray objects containing NES values (key: ‘nes’) and PES values (key: ‘pes’).

References

[1] Griffin, A. T., Vlahos, L. J., Chiuzan, C., & Califano, A. (2023). NaRnEA: An Information Theoretic Framework for Gene Set Analysis. Entropy, 25(3), 542.

pyviper.config

pyviper.config.set_regulators_filepath(group, species, new_filepath)

Allows the user to use a custom list of regulatory proteins instead of the default ones within pyVIPER’s data folder.

Parameters

group – The group of regulatory proteins, one of: “tfs”, “cotfs”, “sig” or “surf”.
species – The species to which the group of proteins belongs: “human” or “mouse”.
new_filepath – The new filepath that should be used to retrieve these sets of proteins.

Return type

None

pyviper.config.set_regulators_species_to_use(species)

Allows the user to specify which species they are currently studying, so the correct sets of regulatory proteins will be used during analysis.

Parameters: species – The species to use for the sets of regulatory proteins: “human” or “mouse”.
Return type: None

pyviper.Interactome

class pyviper.Interactome(name, net_table=None, input_type=None)

Bases: object

Create an Interactome object to contain the results of ARACNe. This object describes the relationship between regulator proteins (e.g. TFs and CoTFs) and their downstream target genes, with mor (Mode Of Regulation, e.g. Spearman correlation) indicating directionality and likelihood (e.g. mutual information) indicating weight of association. An Interactome object can be given to pyviper.viper along with a gene expression signature to generate a protein activity matrix with the VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis) algorithm[1].

Parameters

name – A filepath to one’s disk to store the Interactome.
net_table (default: None) – One of the following:
(1) a pd.DataFrame containing four columns in this order: “regulator”, “target”, “mor”, “likelihood”;
(2) a filepath to this pd.DataFrame stored as a .csv, .tsv or .pkl file;
(3) a filepath to an Interactome object stored as a .pkl file.
input_type (default: None) – Only relevant when net_table is a filepath. If None, the input_type will be inferred from the net_table. Otherwise, specify “csv”, “tsv” or “pkl”.
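
As an illustration of the four-column layout, the sketch below writes a small tab-separated table with Python's csv module. The regulator names, target names and values are made up, and whether pyVIPER expects a header row or an index column in such a file is an assumption best checked against a file written by Interactome.save.

```python
import csv
import os
import tempfile

# Hypothetical interactions; names and values are placeholders.
rows = [
    ("TF1", "GENE_A", 0.82, 0.97),
    ("TF1", "GENE_B", -0.41, 0.88),
    ("TF2", "GENE_A", 0.65, 0.91),
]
columns = ["regulator", "target", "mor", "likelihood"]  # required column order

# Write the table as a tab-separated file in a temporary directory.
net_table_path = os.path.join(tempfile.mkdtemp(), "toy_network.tsv")
with open(net_table_path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(columns)
    writer.writerows(rows)

# Read the header back to confirm the column order.
with open(net_table_path, newline="") as handle:
    header = next(csv.reader(handle, delimiter="\t"))

print(header)
```

The resulting path is the kind of value that could be supplied as net_table, with input_type left as None or set to “tsv”.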

References

[1] Alvarez, M. J., Shen, Y., Giorgi, F. M., Lachmann, A., Ding, B. B., Ye, B. H., & Califano, A. (2016). Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature genetics, 48(8), 838-847.

copy()

Create a copy of this Interactome object.

Return type: An object of class Interactome.

filter_regulators(regulators_keep=None, regulators_remove=None, verbose=True)

Filter regulators by choosing, by name or by group, which ones to keep and which ones to remove from this Interactome.

Note that the names of regulators that belong to the groups “tfs”, “cotfs”, “sig” and “surf” will be sourced via the paths specified in pyviper.config. To update these paths, use the pyviper.config.set_regulators_filepath function.

Parameters

regulators_keep (default: None) – This should be either: (1) An array or list containing the names of specific regulators you wish to keep in the network. When left as None, this parameter is not used to filter. (2) An array or list containing a group or groups of regulators that you wish to keep in the network. These groups should be one of the following: “tfs”, “cotfs”, “sig”, “surf”.
regulators_remove (default: None) – This should be either: (1) An array or list containing the names of specific regulators you wish to remove from the network. When left as None, this parameter is not used to filter. (2) An array or list containing a group or groups of regulators that you wish to remove from the network. These groups should be one of the following: “tfs”, “cotfs”, “sig”, “surf”.
verbose (default: True) – Report the number of regulators removed during filtering

filter_targets(targets_keep=None, targets_remove=None, verbose=True)

Filter targets by choosing, by name, which ones to keep and which ones to remove from this Interactome.

When working with an anndata object or a gene expression array, it is highly recommended to filter the unpruned network before pruning. This ensures the pruned network contains a consistent number of targets per regulator, all of which exist within gex_data. A regulator that has more targets than others will have a “boosted” NES that cannot be compared to those of regulators with fewer targets. For example, with an anndata object named gex_data, one can run:

interactome.filter_targets(gex_data.var_names)

Parameters

targets_keep (default: None) – An array containing the names of targets you wish to keep in the network. When left as None, this parameter is not used to filter.
targets_remove (default: None) – An array containing the names of targets you wish to remove from the network. When left as None, this parameter is not used to filter.
verbose (default: True) – Report the number of targets removed during filtering

get_reg(regName)

Get the rows of the net_table where the regulator is regName.

Parameters: regName – The name of a regulator in this Interactome.
Return type: A pd.DataFrame.

get_reg_names()

Get an array of all unique regulators in this Interactome.

Return type: A np.ndarray of strings.

get_regulon_from_loom(file_path)

get_target_names()

Get an array of the unique targets in this Interactome.

Return type: A 1D NumPy array.

ic_mat()

Get the DataFrame of all the likelihood values. Targets are in the rows, while Regulators are in the columns.

Return type: A pd.DataFrame.

icp_vec()

Get the vector containing the proportion of the “Interaction Confidence” (IC) score for each interaction in a network, relative to the maximum IC score in the network. This vector is generated by taking each individual regulon in the network and calculating the likelihood index proportion to all interactions.

Return type: A np.ndarray.

integrate(network_list, network_weights=None, normalize_likelihoods=False)

Integrate this Interactome object with one or more other Interactome objects to create a consensus network. In general, this should be done when the Interactome objects have the same epigenetics (e.g. because they were made from different datasets of the same cell type). MetaVIPER should be used instead when you have multiple interactomes with different epigenetics (e.g. because they were made from data of different cell types).

Parameters

network_list – A single object or a list of objects of class Interactome.
network_weights (default: None) – An array containing weights for each network being integrated. The first weight corresponds to this network, while the others correspond to those in the network list in order. If None, equal weights are used.
normalize_likelihoods (default: False) – An extra operation that can be performed after the integration, in which the likelihood values within each regulator are ranked and scaled from 0 to 1.

mor_mat()

Get the DataFrame of all the correlation values. Targets are in the rows, while Regulators are in the columns.

Return type: A pd.DataFrame.

prune(max_targets=50, min_targets=None, eliminate=True, verbose=True)

Prune the Interactome by eliminating extra targets from regulators and, with eliminate = True, removing regulators with too few targets from the network. Note that ensuring the pruned network contains the same number of targets for each regulator makes NES values comparable. If one regulator has more targets than another, then its NES will be “boosted” and the two cannot be compared against each other.

Parameters

max_targets (default: 50) – The maximum number of targets that each regulon is allowed.
min_targets (default: None) – The minimum number of targets that each regulon is required.
eliminate (default: True) – If eliminate = True, then any regulators with fewer targets than max_targets will be removed from the network. In other words, after pruning, all regulators will have exactly max_targets number of targets. This essentially sets min_targets equal to max_targets and ensures all NES scores are comparable with aREA.
verbose (default: True) – Report the number of targets and regulators removed during pruning

save(file_path, output_type=None)

Save the Interactome object to one’s disk. If saved as “csv” or “tsv”, just the interactome.net_table will be saved. If saved as “pkl”, the whole interactome object will be saved.

Parameters

file_path – A filepath to one’s disk to store the Interactome.
output_type (default: None) – If None, the output_type will be inferred from the file_path. Otherwise, specify “csv”, “tsv” or “pkl”.

Return type

None
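
The inference of output_type from file_path can be pictured with a small helper. This is a sketch only: the function name infer_output_type is hypothetical, and pyVIPER's own handling of extensions and errors may differ.

```python
from pathlib import Path

SUPPORTED_TYPES = {"csv", "tsv", "pkl"}


def infer_output_type(file_path, output_type=None):
    """Return the explicit output_type, or guess it from the file extension."""
    if output_type is None:
        output_type = Path(file_path).suffix.lstrip(".").lower()
    if output_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported output type: {output_type!r}")
    return output_type


print(infer_output_type("networks/b_cell.tsv"))         # tsv
print(infer_output_type("networks/b_cell.net", "pkl"))  # pkl
```

Passing output_type explicitly is the safer choice whenever the file extension does not match one of the three supported formats.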

size()

Get the number of regulators in this Interactome.

Return type: An int

translate_regulators(desired_format, verbose=True)

Translate the regulators of the Interactome. The current name format of the regulators should be one of the following:

mouse_symbol, mouse_ensembl, mouse_entrez, human_symbol, human_ensembl or human_entrez

Parameters

desired_format – Desired format can be one of six strings: “mouse_symbol”, “mouse_ensembl”, “mouse_entrez”, “human_symbol”, “human_ensembl” or “human_entrez”.
verbose (default: True) – Report the number of regulators successfully and unsuccessfully translated

translate_targets(desired_format, verbose=True)

Translate the targets of the Interactome. The current name format of the targets should be one of the following:

mouse_symbol, mouse_ensembl, mouse_entrez, human_symbol, human_ensembl or human_entrez

It is recommended to do this before pruning to ensure a consistent number of targets. Targets that do not have a translation are deleted, so translating after pruning can leave different numbers of targets in a pruned interactome that once had a consistent number of targets.

Parameters

desired_format – Desired format can be one of six strings: “mouse_symbol”, “mouse_ensembl”, “mouse_entrez”, “human_symbol”, “human_ensembl” or “human_entrez”.
verbose (default: True) – Report the number of targets successfully and unsuccessfully translated

pyviper.load

pyviper.load.TFs(species=None, path_to_tfs=None)

Retrieves a list of transcription factors (TFs).

Parameters

species (default: None) – When left as None, the species setting in pyviper.config will be used. Otherwise, manually specify “human” or “mouse”.
path_to_tfs (default: None) – When left as None, the path to TFs setting in pyviper.config will be used. Otherwise, manually specify a filepath to a .txt file containing TFs, one on each line.

Return type

A list containing transcription factors.
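
The expected file layout, one regulator per line, can be sketched with the standard library alone. The helper read_regulator_list is hypothetical and the gene symbols are only placeholders; this is not the loader pyVIPER uses internally.

```python
import os
import tempfile


def read_regulator_list(path):
    """Read one regulator name per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small example file with one symbol per line.
tfs_path = os.path.join(tempfile.mkdtemp(), "custom_tfs.txt")
with open(tfs_path, "w") as handle:
    handle.write("MYC\nTP53\n\nSTAT3\n")

custom_tfs = read_regulator_list(tfs_path)
print(custom_tfs)
```

A file like this could be supplied through path_to_tfs, or registered once with pyviper.config.set_regulators_filepath("tfs", "human", tfs_path) so that later calls pick it up.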

pyviper.load.coTFs(species=None, path_to_cotfs=None)

Retrieves a list of co-transcription factors (coTFs).

Parameters

species (default: None) – When left as None, the species setting in pyviper.config will be used. Otherwise, manually specify “human” or “mouse”.
path_to_cotfs (default: None) – When left as None, the path to coTFs setting in pyviper.config will be used. Otherwise, manually specify a filepath to a .txt file containing coTFs, one on each line.

Return type

A list containing co-transcription factors.

pyviper.load.human2mouse()

Retrieves the human to mouse translation pd.DataFrame from pyVIPER’s data folder. This dataframe contains six columns: human_symbol, mouse_symbol, human_ensembl, mouse_ensembl, human_entrez, mouse_entrez

Return type: A pd.DataFrame.

pyviper.load.msigdb_regulon(collection)

Retrieves an object or a list of objects of class Interactome from pyviper’s data folder containing a set of pathways from the Molecular Signatures Database (MSigDB), downloaded from https://www.gsea-msigdb.org/gsea/msigdb. These collections can be from one of the following:

‘h’ for Hallmark gene sets. Coherently expressed signatures derived by aggregating many MSigDB gene sets to represent well-defined biological states or processes.
‘c2’ for curated gene sets. From online pathway databases, publications in PubMed, and knowledge of domain experts.
‘c5’ for ontology gene sets. Consists of genes annotated by the same ontology term.
‘c6’ for oncogenic signature gene sets. Defined directly from microarray gene expression data from cancer gene perturbations.
‘c7’ for immunologic signature gene sets. Represents cell states and perturbations within the immune system.

Parameters: collection – An individual string or a list of strings from the following: [“h”, “c2”, “c5”, “c6”, “c7”], corresponding to the collections above.
Return type: An individual object or list of objects of class pyviper.interactome.Interactome.

pyviper.load.sig(species=None, path_to_sig=None)

Retrieves a list of signaling proteins (sig).

Parameters

species (default: None) – When left as None, the species setting in pyviper.config will be used. Otherwise, manually specify “human” or “mouse”.
path_to_sig (default: None) – When left as None, the path to sig setting in pyviper.config will be used. Otherwise, manually specify a filepath to a .txt file containing signaling proteins, one on each line.

Return type

A list containing signaling proteins.

pyviper.load.surf(species=None, path_to_surf=None)

Retrieves a list of surface proteins (surf).

Parameters

species (default: None) – When left as None, the species setting in pyviper.config will be used. Otherwise, manually specify “human” or “mouse”.
path_to_surf (default: None) – When left as None, the path to surf setting in pyviper.config will be used. Otherwise, manually specify a filepath to a .txt file containing surface proteins, one on each line.

Return type

A list containing surface proteins.

pyviper.pl

pyviper.pl.__get_stored_uns_data_and_prep_to_plot(adata, uns_data_slot, obsm_slot=None, uns_slot=None)

pyviper.pl.pca(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.pca.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.obsm[‘X_pca’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.obsm[‘X_pca’].
**kwargs – Arguments to provide to the sc.pl.pca function.

Return type

A plot of Axes.

pyviper.pl.umap(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.umap.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.obsm[‘X_umap’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.obsm[‘X_umap’].
**kwargs – Arguments to provide to the sc.pl.umap function.

Return type

A plot of Axes.

pyviper.pl.tsne(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.tsne.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.obsm[‘X_tsne’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.obsm[‘X_tsne’].
**kwargs – Arguments to provide to the sc.pl.tsne function.

Return type

A plot of Axes.

pyviper.pl.diffmap(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.diffmap.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.obsm[‘X_diffmap’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.obsm[‘X_diffmap’].
**kwargs – Arguments to provide to the sc.pl.diffmap function.

Return type

A plot of Axes.

pyviper.pl.draw_graph(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.draw_graph.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.obsm[‘X_draw_graph_fa’] or adata.obsm[‘X_draw_graph_fr’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.obsm[‘X_draw_graph_fa’] or adata.obsm[‘X_draw_graph_fr’].
**kwargs – Arguments to provide to the sc.pl.draw_graph function.

Return type

A plot of Axes.

pyviper.pl.spatial(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.spatial.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.uns[‘spatial’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.uns[‘spatial’].
**kwargs – Arguments to provide to the sc.pl.spatial function.

Return type

A plot of Axes.

pyviper.pl.embedding(adata, *, basis, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.embedding.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
basis – The name of the representation in adata.obsm that should be used for plotting.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.obsm[basis].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.obsm[basis].
**kwargs – Arguments to provide to the sc.pl.embedding function.

Return type

A plot of Axes.

pyviper.pl.embedding_density(adata, *, basis='umap', plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.embedding_density.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
basis (default: 'umap') – The name of the representation in adata.obsm that should be used for plotting.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’] on adata.obsm[basis].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’] on adata.obsm[basis].
**kwargs – Arguments to provide to the sc.pl.embedding_density function.

Return type

A plot of Axes.

pyviper.pl.heatmap(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.heatmap.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.heatmap function.

Return type

A plot of Axes.

pyviper.pl.dotplot(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.dotplot.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.dotplot function.

Return type

A plot of Axes.

pyviper.pl.tracksplot(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.tracksplot.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.tracksplot function.

Return type

A plot of Axes.

pyviper.pl.violin(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.violin.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.violin function.

Return type

A plot of Axes.

pyviper.pl.stacked_violin(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.stacked_violin.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.stacked_violin function.

Return type

A plot of Axes.

pyviper.pl.matrixplot(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.matrixplot.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.matrixplot function.

Return type

A plot of Axes.

pyviper.pl.clustermap(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.clustermap.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.clustermap function.

Return type

A plot of Axes.

pyviper.pl.ranking(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.ranking.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.ranking function.

Return type

A plot of Axes.

pyviper.pl.dendrogram(adata, *, plot_stored_gex_data=False, plot_stored_pax_data=False, **kwargs)

A wrapper for the scanpy function sc.pl.dendrogram.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
plot_stored_gex_data (default: False) – Plot adata.uns[‘gex_data’].
plot_stored_pax_data (default: False) – Plot adata.uns[‘pax_data’].
**kwargs – Arguments to provide to the sc.pl.dendrogram function.

Return type

A plot of Axes.

pyviper.pp

pyviper.pp.rank_norm(adata, NUM_FUN=<function _median>, DEM_FUN=<function _mad_from_R>, layer=None, key_added=None, copy=False)

Compute a double rank normalization on an anndata, np.array, or pd.DataFrame.

Parameters

adata – Data stored in an anndata object, np.array or pd.DataFrame.
NUM_FUN (default: np.median) – The first function to be applied across each column.
DEM_FUN (default: _mad_from_R) – The second function to be applied across each column.
layer (default: None) – For an anndata input, the layer to use. When None, the input layer is anndata.X.
key_added (default: None) – For an anndata input, the name of the layer in which to store the result. When None, this is anndata.X.
copy (default: False) – Whether to return a rank-transformed copy (True) or to instead transform the original input (False).

Returns

When copy = False, saves the input data as a double rank transformed version.
When copy = True, return a double rank transformed version of the input data.
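
One reading of the description above is sketched below: features are ranked within each sample, and each feature's ranks are then centered with NUM_FUN (a median) and scaled with DEM_FUN (a median absolute deviation) across samples. The exact sequence of steps in pyVIPER is an assumption here. The constant 1.4826 is the default scaling of R's mad function, which the name _mad_from_R suggests but which is not confirmed by this page.

```python
from statistics import median


def rank_within_sample(values):
    """Rank values from 1 (smallest) upward, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def mad(values, constant=1.4826):
    """Median absolute deviation, scaled as in R's mad()."""
    center = median(values)
    return constant * median([abs(v - center) for v in values])


def rank_norm_sketch(matrix):
    """Rows are samples, columns are features (assumed layout).

    A feature whose ranks have a MAD of zero would need special handling;
    this sketch does not cover that case.
    """
    ranked = [rank_within_sample(row) for row in matrix]
    n_features = len(matrix[0])
    result = [[0.0] * n_features for _ in matrix]
    for j in range(n_features):
        column = [row[j] for row in ranked]
        center, spread = median(column), mad(column)
        for i, value in enumerate(column):
            result[i][j] = (value - center) / spread
    return result


expression = [
    [5.0, 1.0, 3.0],
    [2.0, 4.0, 9.0],
    [7.0, 8.0, 6.0],
]
normalized = rank_norm_sketch(expression)
for row in normalized:
    print([round(value, 3) for value in row])
```

Because only ranks enter the calculation, the result is insensitive to monotone rescaling of the values within a sample.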

pyviper.pp.stouffer(adata, obs_column_name=None, layer=None, filter_by_feature_groups=None, key_added='stouffer', compute_pvals=True, null_iters=1000, verbose=True, return_as_df=False, copy=False)

Compute a Stouffer signature for each cluster in an anndata object.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object, or a pandas dataframe containing input data.
obs_column_name – The name of the column of observations in adata to use as clusters, or a cluster vector corresponding to observations.
layer (default: None) – The layer to use as input data to compute the signatures.
filter_by_feature_groups (default: None) – The selected regulators, such that all other regulators are filtered out from input data. If None, all regulators will be included. Regulator sets must be from one of the following: “tfs”, “cotfs”, “sig”, “surf”.
key_added (default: 'stouffer') – The slot in adata.uns to store the stouffer signatures.
compute_pvals (default: True) – Whether to compute a p-value for each score to return in the results.
null_iters (default: 1000) – The number of iterations to use to compute a null model to assess the p-values of each of the stouffer scores.
verbose (default: True) – Whether to provide additional output during the execution of the function.
return_as_df (default: False) – If True, returns the stouffer signature in a pd.DataFrame. If False, stores it in adata.var[key_added].
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Return type

When return_as_df is False, adds the cluster stouffer signatures to adata.var[key_added]. When return_as_df is True, returns as pd.DataFrame.
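
Stouffer's method combines k z-scores into one by dividing their sum by the square root of k. The sketch below applies that formula per cluster and per feature on plain Python lists. The function name and data layout are illustrative assumptions, and the null-model p-values controlled by compute_pvals and null_iters are not reproduced.

```python
import math
from collections import defaultdict


def stouffer_signature(scores, clusters):
    """Combine per-sample z-scores into one score per cluster and feature.

    scores:   list of rows, one row of z-scores per sample
    clusters: cluster label of each sample, in the same order as `scores`
    """
    rows_by_cluster = defaultdict(list)
    for row, label in zip(scores, clusters):
        rows_by_cluster[label].append(row)

    signature = {}
    for label, rows in rows_by_cluster.items():
        n_samples = len(rows)
        # zip(*rows) iterates over features (columns) within the cluster.
        signature[label] = [sum(column) / math.sqrt(n_samples) for column in zip(*rows)]
    return signature


# Made-up z-scores for two features across five samples.
scores = [
    [1.0, -2.0],
    [3.0, 0.0],
    [1.0, 1.0],
    [1.0, -1.0],
    [2.0, 0.5],
]
clusters = ["a", "a", "a", "a", "b"]

signature = stouffer_signature(scores, clusters)
print(signature)
```

Cluster "a" has four samples, so its sums are divided by 2, while the single-sample cluster "b" keeps its scores unchanged.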

pyviper.pp.mwu(adata, obs_column_name=None, layer=None, filter_by_feature_groups=None, key_added='mwu', compute_pvals=True, verbose=True, return_as_df=False, copy=False)

Compute a Mann-Whitney U test signature for each cluster in an anndata object.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object, or a pandas dataframe containing input data.
obs_column_name – The name of the column of observations in adata to use as clusters, or a cluster vector corresponding to observations.
layer (default: None) – The layer to use as input data to compute the signatures.
filter_by_feature_groups (default: None) – The selected regulators, such that all other regulators are filtered out from input data. If None, all regulators will be included. Regulator sets must be from one of the following: “tfs”, “cotfs”, “sig”, “surf”.
key_added (default: 'mwu') – The slot in adata.uns to store the MWU signatures.
compute_pvals (default: True) – Whether to compute a p-value for each score to return in the results.
verbose (default: True) – Whether to provide additional output during the execution of the function.
return_as_df (default: False) – If True, returns the MWU signature in a pd.DataFrame. If False, stores it in adata.var[key_added].
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Return type

When return_as_df is False, adds the cluster MWU signatures to adata.var[key_added]. When return_as_df is True, returns as pd.DataFrame.
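For readers unfamiliar with the statistic, the sketch below computes a Mann-Whitney U value for one feature by comparing the samples inside a cluster against all remaining samples. The function name `mann_whitney_u` and the toy values are assumptions made for illustration; how the library converts U into a signature or p-value is not shown here.

```python
def mann_whitney_u(in_cluster, rest):
    """U statistic for in_cluster versus rest (hypothetical helper).

    Counts the pairs where the in-cluster value is larger; ties count 0.5.
    """
    u = 0.0
    for x in in_cluster:
        for y in rest:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Toy activity values for one protein
cluster_values = [3.1, 2.4, 2.9]
other_values = [0.5, 2.4, 1.0, -0.3]

u_cluster = mann_whitney_u(cluster_values, other_values)
u_rest = mann_whitney_u(other_values, cluster_values)
print(u_cluster, u_rest)  # the two values always sum to len(a) * len(b)
```

A U value close to the number of pairs (here 3 x 4 = 12) indicates that the feature is consistently higher inside the cluster than outside it.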

pyviper.pp.spearman(adata, pca_slot='X_pca', obs_column_name=None, layer=None, filter_by_feature_groups=None, key_added='stouffer', compute_pvals=True, null_iters=1000, verbose=True, return_as_df=False, copy=False)

Compute spearman correlation between each gene product and the cluster centroids along with the statistical significance for each of your clusters in an anndata object.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object, or a pandas dataframe containing input data.
pca_slot – The slot in adata.obsm where a PCA is stored.
obs_column_name – The name of the column of observations in adata to use as clusters, or a cluster vector corresponding to observations.
layer (default: None) – The layer to use as input data to compute the correlation.
filter_by_feature_groups (default: None) – The selected regulators, such that all other regulators are filtered out from input data. If None, all regulators will be included. Regulator sets must be from one of the following: “tfs”, “cotfs”, “sig”, “surf”.
key_added (default: 'spearman') – The slot in adata.uns to store the spearman correlation.
compute_pvals (default: True) – Whether to compute a p-value for each score to return in the results.
null_iters (default: 1000) – The number of iterations to use to compute a null model to assess the p-values of each of the spearman scores.
verbose (default: True) – Whether to provide additional output during the execution of the function.
return_as_df (default: False) – If True, returns the spearman signature in a pd.DataFrame. If False, stores it in adata.var[key_added].
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Return type

When return_as_df is False, adds the cluster spearman correlation to adata.var[key_added]. When return_as_df is True, returns as pd.DataFrame.

pyviper.pp.viper_similarity(adata, nn=None, ws=[4, 2], alternative=['two-sided', 'greater', 'less'], layer=None, filter_by_feature_groups=None, key_added='viper_similarity', copy=False)

Compute the similarity between the columns of a VIPER-predicted activity or gene expression matrix. While following the same concept as the two-tail Gene Set Enrichment Analysis (GSEA)[1], it is based on the aREA algorithm[2].

If ws is a single number, weighting is performed using an exponential function. If ws is a vector of two numbers, weighting is performed with a symmetric sigmoid function, using the first element as the inflection point and the second as the trend.

Parameters

adata – An anndata.AnnData containing protein activity (NES), where rows are observations/samples (e.g. cells or groups) and columns are features (e.g. proteins or pathways).
nn (default: None) – Optional number of top regulators to consider for computing the similarity
ws (default: [4, 2]) – Number indicating the weighting exponent for the signature, or vector of 2 numbers indicating the inflection point and the value corresponding to a weighting score of .1 for a sigmoid transformation. Only used if nn is omitted.
alternative (default: 'two-sided') – Character string indicating whether the most active (greater), less active (less) or both tails (two-sided) of the signature should be used for computing the similarity.
layer (default: None) – The layer to use as input data to compute the signatures.
filter_by_feature_groups (default: None) – The selected regulators, such that all other regulators are filtered out from the input data. If None, all regulators will be included. Regulator sets must be from one of the following: “tfs”, “cotfs”, “sig”, “surf”.
key_added (default: "viper_similarity") – The name of the slot in the adata.obsp to store the output.
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Return type

Saves a signature-based distance numpy.ndarray in adata.obsp[key_added].

References

[1] Julio, M. K. -d. et al. Regulation of extra-embryonic endoderm stem cell differentiation by Nodal and Cripto signaling. Development 138, 3885-3895 (2011).

[2] Alvarez, M. J., Shen, Y., Giorgi, F. M., Lachmann, A., Ding, B. B., Ye, B. H., & Califano, A. (2016). Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature genetics, 48(8), 838-847.

pyviper.pp.aracne3_to_regulon(net_file, net_df=None, anno=None, MI_thres=0, regul_size=50, normalize_MI_per_regulon=True)

Process an output from ARACNe3 to return a pd.DataFrame describing a gene regulatory network with suitable columns for conversion to an object of the Interactome class.

Parameters

net_file – A string containing the path to the ARACNe3 output
net_df (default: None) – A pd.DataFrame containing the ARACNe3 output, passed instead of the path.
anno (default: None) – Gene ID annotation
MI_thres (default: 0) – Threshold on Mutual Information (MI) to select the regulators and target pairs
regul_size (default: 50) – Number of (top) targets to include in each regulon
normalize_MI_per_regulon (default: True) – Whether to normalize the MI values within each regulon by that regulon's maximum value.

Returns

A pd.DataFrame containing an ARACNe3-inferred gene regulatory network with the following 4 columns: “regulator”, “target”, “mor” (mode of regulation) and “likelihood”.
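The roles of MI_thres, regul_size and normalize_MI_per_regulon can be pictured with the plain-Python sketch below. It works on hypothetical (regulator, target, MI) tuples, assumes the threshold is strict (MI must exceed it), and does not attempt to derive the “mor” or “likelihood” columns. It is an illustration of the three parameters, not the function's actual code.

```python
def top_targets_per_regulator(edges, mi_thres=0.0, regul_size=50, normalize=True):
    """Filter, truncate and normalize (regulator, target, mi) tuples.

    Hypothetical helper; assumes a strict MI threshold.
    """
    by_regulator = {}
    for regulator, target, mi in edges:
        if mi > mi_thres:  # drop weak regulator-target pairs
            by_regulator.setdefault(regulator, []).append((target, mi))

    result = []
    for regulator, pairs in by_regulator.items():
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        kept = pairs[:regul_size]  # keep only the top targets by MI
        max_mi = kept[0][1]
        for target, mi in kept:
            # Scale by the regulon's own maximum so the top target gets 1.0
            value = mi / max_mi if (normalize and max_mi > 0) else mi
            result.append((regulator, target, value))
    return result

edges = [
    ("TF1", "g1", 0.8), ("TF1", "g2", 0.4), ("TF1", "g3", 0.2),
    ("TF2", "g1", 0.5), ("TF2", "g4", 0.05),
]
regulon_rows = top_targets_per_regulator(edges, mi_thres=0.1, regul_size=2)
print(regulon_rows)
```

Normalizing within each regulon puts the strongest target of every regulator on the same scale, regardless of how large the raw MI values of that regulator are.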

pyviper.pp.nes_to_pval(adata, layer=None, key_added=None, lower_tail=True, adjust=True, axs=1, neg_log=False, copy=False)

Transform VIPER-computed NES into p-values.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object, or a pandas dataframe containing input data, where rows are observations/samples (e.g. cells or groups) and columns are features (e.g. proteins or pathways).
layer (default: None) – Entry of layers to transform.
key_added (default: None) – Name of layer to save result in a new layer instead of adata.X.
lower_tail (default: True) – If True (default), probabilities are P(X <= x). If False, probabilities are P(X > x).
adjust (default: True) – If True, returns adjusted p values using FDR Benjamini-Hochberg procedure. If False, does not adjust p values
axs (default: 1) – axis along which to perform the p-value correction (Used only if the input is a pd.DataFrame). Possible values are 0 or 1.
neg_log (default: False) – Whether to transform VIPER-computed NES into -log10(p-value).
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Return type

Saves the input data as a transformed version. If key_added is specified, saves the results in adata.layers[key_added].
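The transformation can be sketched with the standard library alone. The example below assumes that NES values follow a standard normal distribution under the null, takes the lower_tail description literally, and applies the Benjamini-Hochberg procedure to one vector of p-values. The helper names `nes_to_p` and `bh_adjust` are hypothetical, and the library may differ in details such as tail handling.

```python
import math

def nes_to_p(nes, lower_tail=True):
    """P(X <= nes) if lower_tail else P(X > nes), for X ~ N(0, 1)."""
    if lower_tail:
        return 0.5 * math.erfc(-nes / math.sqrt(2.0))
    return 0.5 * math.erfc(nes / math.sqrt(2.0))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

nes_values = [3.2, 1.1, -0.4, 2.5]
pvals = [nes_to_p(x, lower_tail=False) for x in nes_values]
fdr = bh_adjust(pvals)
neg_log_fdr = [-math.log10(p) for p in fdr]  # analogous to neg_log=True
print(fdr)
```

With lower_tail=False, large positive NES values map to small p-values, so the most activated proteins receive the largest -log10 values.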

pyviper.pp.repr_subsample(adata, pca_slot='X_pca', size=1000, seed=0, key_added='repr_subsample', eliminate=False, verbose=True, njobs=1, copy=False)

A tool to create a subsample of the input data such that it is representative of all the populations within the input data, rather than being a random sample. This is accomplished by pairing samples together in an iterative fashion until the desired sample size is reached.

Parameters

adata – An anndata object containing a distance object in adata.obsp.
pca_slot (default: "X_pca") – The slot in adata.obsm where the PCA object is stored. One way of generating this object is with sc.pp.pca.
size (default: 1000) – The size of the representative subsample
eliminate (default: False) – Whether to trim down adata to the subsample (True) or leave the subsample as an annotation in adata.obs[key_added].
seed (default: 0) – The random seed used when taking samples of the data.
verbose (default: True) – Whether to provide runtime information.
njobs (default: 1) – The number of cores to use for the analysis. Using more than 1 core (multicore) speeds up the analysis.
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Returns

When copy is False, saves the subsample annotation in adata.obs[key_added].
When copy is True, returns an anndata with this annotation.
When eliminate is True, modifies the adata by subsetting it down to the subsample.

pyviper.pp.repr_metacells(adata, counts=None, pca_slot='X_pca', dist_slot='corr_dist', clusters_slot=None, score_slot=None, score_min_thresh=None, size=500, n_cells_per_metacell=None, min_median_depth=10000, perc_data_to_use=None, perc_incl_data_reused=None, seed=0, key_added='metacells', verbose=True, njobs=1, copy=False)

A tool to create a representative selection of metacells from the data that aims to maximize reusing samples from the data, while simultaneously ensuring that all neighbors are close to the metacell they construct. When using this function, exactly two of the following parameters must be set: size, min_median_depth or n_cells_per_metacell, perc_data_to_use or perc_incl_data_reused. Note that min_median_depth and n_cells_per_metacell cannot both be set at the same time, since they directly relate (e.g. higher n_cells_per_metacell means more neighbors are used to construct a single metacell, meaning each metacell will have more counts, resulting in a higher median depth). Note that perc_data_to_use and perc_incl_data_reused cannot both be set at the same time, since they directly relate (e.g. higher perc_data_to_use means you include more data, which means it’s more likely to reuse more data, resulting in a higher perc_incl_data_reused).
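The "exactly two" rule can be read as three parameter groups (size; n_cells_per_metacell or min_median_depth; perc_data_to_use or perc_incl_data_reused), of which two must be specified, with at most one parameter per group. The checker below is a hypothetical helper that encodes this reading; the group names "size", "depth" and "coverage" are invented for the example and the library's own validation may behave differently.

```python
def check_metacell_args(size=None, n_cells_per_metacell=None, min_median_depth=None,
                        perc_data_to_use=None, perc_incl_data_reused=None):
    """Return the names of the two groups that are set, or raise ValueError."""
    groups = {
        "size": [size],
        "depth": [n_cells_per_metacell, min_median_depth],
        "coverage": [perc_data_to_use, perc_incl_data_reused],
    }
    counts = {name: sum(v is not None for v in vals) for name, vals in groups.items()}
    for name, count in counts.items():
        if count > 1:
            raise ValueError(f"only one parameter of group '{name}' may be set")
    if sum(counts.values()) != 2:
        raise ValueError("exactly two of the three parameter groups must be set")
    return [name for name, count in counts.items() if count == 1]

# The defaults in the signature (size=500, min_median_depth=10000) satisfy the rule
print(check_metacell_args(size=500, min_median_depth=10000))
```

Under this reading, switching to, say, perc_data_to_use requires explicitly passing None for either size or min_median_depth, since both have non-None defaults in the signature.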

Parameters

adata – An anndata object containing a distance object in adata.obsp.
counts (default: None) – A pandas DataFrame or AnnData object of unnormalized gene expression counts that has the same samples in the same order as that of adata. If counts are left as None, adata must have counts stored in adata.raw.
pca_slot (default: "X_pca") – The slot in adata.obsm where the PCA object is stored. One way of generating this object is with sc.pp.pca.
dist_slot (default: "corr_dist") – The slot in adata.obsp where the distance object is stored. One way of generating this object is with pyviper.pp.corr_distance.
clusters_slot (default: None) – The slot in adata.obs where cluster labels are stored. Cluster-specific metacells will be generated using the same parameters with the results for each cluster being stored separately in adata.uns.
score_slot (default: None) – The slot in adata.obs where a score used to determine and filter cell quality are stored (e.g. silhouette score).
score_min_thresh (default: None) – The score from adata.obs[score_slot] that a cell must have at minimum to be used for metacell construction (e.g. 0.25 is the rule of thumb for silhouette score).
size (default: 500) – A specific number of metacells to generate. If set to None, perc_data_to_use or perc_incl_data_reused can be used to specify the size when n_cells_per_metacell or min_median_depth is given.
n_cells_per_metacell (default: None) – The number of cells that should be used to generate a single metacell. Note that this parameter and min_median_depth cannot both be set as they directly relate: e.g. higher n_cells_per_metacell leads to higher min_median_depth. If left as None, perc_data_to_use or perc_incl_data_reused can be used to specify n_cells_per_metacell when size is given.
min_median_depth (default: 10000) – The desired minimum median depth for the metacells (indirectly specifies n_cells_per_metacell). The default is set to 10000 as this is recommended by PISCES[1]. Note that this parameter and n_cells_per_metacell cannot both be set as they directly relate: e.g. higher min_median_depth leads to higher n_cells_per_metacell.
perc_data_to_use (default: None) – The percent of the total amount of provided samples that will be used in the creation of metacells. Note that this parameter and perc_incl_data_reused cannot both be set as they directly relate: e.g. higher perc_data_to_use leads to higher perc_incl_data_reused.
perc_incl_data_reused (default: None) – The percent of samples that are included in the creation of metacells that will be reused (i.e. used in more than one metacell). Note that this parameter and perc_data_to_use cannot both be set as they directly relate: e.g. higher perc_incl_data_reused leads to higher perc_data_to_use.
seed (default: 0) – The random seed used when taking samples of the data.
key_added (default: "metacells") – The name of the slot in the adata.uns to store the output.
verbose (default: True) – Whether to provide runtime information and quality statistics.
njobs (default: 1) – The number of cores to use for the analysis. Using more than 1 core (multicore) speeds up the analysis.
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Return type

Saves the metacells as a pandas dataframe in adata.uns[key_added]. Attributes that contain parameters for and statistics about the construction of the metacells are stored in adata.uns[key_added].attrs. Set copy = True to return a new AnnData object.

References

[1] Obradovic, A., Vlahos, L., Laise, P., Worley, J., Tan, X., Wang, A., & Califano, A. (2021). PISCES: A pipeline for the systematic, protein activity-based analysis of single cell RNA sequencing data. bioRxiv, 6, 22.

pyviper.tl

pyviper.tl.pca(adata, *, layer=None, filter_by_feature_groups=None, **kwargs)

A wrapper for the scanpy function sc.tl.pca.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
layer (default: None) – The layer to use as input data.
filter_by_feature_groups (default: None) – The selected regulators, such that all other regulators are filtered out from the input data. If None, all regulators will be included. Regulator sets must be from one of the following: “tfs”, “cotfs”, “sig”, “surf”.
**kwargs – Arguments to provide to the sc.tl.pca function.

pyviper.tl.dendrogram(adata, *, groupby, key_added=None, layer=None, filter_by_feature_groups=None, **kwargs)

A wrapper for the scanpy function sc.tl.dendrogram.

Parameters

adata – Gene expression, protein activity or pathways stored in an anndata object.
key_added (default: None) – The key in adata.uns where the dendrogram should be stored.
layer (default: None) – The layer to use as input data.
filter_by_feature_groups (default: None) – The selected regulators, such that all other regulators are filtered out from the input data. If None, all regulators will be included. Regulator sets must be from one of the following: “tfs”, “cotfs”, “sig”, “surf”.
**kwargs – Arguments to provide to the sc.tl.dendrogram function.

pyviper.tl.oncomatch(pax_data_to_test, pax_data_for_cMRs, tcm_size=50, both_ways=False, om_max_NES_threshold=30, om_min_logp_threshold=0, enrichment='aREA', key_added='om', return_as_df=False, copy=False)

The OncoMatch algorithm[1] assesses the overlap in differentially active MR proteins between two sets of samples (e.g. to validate GEMMs as effective models of human tumor samples). It does so by computing -log10 p-values for each sample in pax_data_to_test of the MRs of each sample in pax_data_for_cMRs.

Parameters

pax_data_to_test – An anndata.AnnData or pd.DataFrame containing protein activity (NES), where rows are observations/samples (e.g. cells or groups) and columns are features (e.g. proteins or pathways).
pax_data_for_cMRs – An anndata.AnnData or pd.DataFrame containing protein activity (NES), where rows are observations/samples (e.g. cells or groups) and columns are features (e.g. proteins or pathways).
tcm_size (default: 50) – Number of top MRs from each sample to use to compute regulators.
both_ways (default: False) – Whether to also use the candidate MRs of pax_data_to_test to compute NES for the samples in pax_data_for_cMRs, and then average.
om_max_NES_threshold (default: 30) – The maximum NES scores before using a cutoff.
om_min_logp_threshold (default: 0) – The minimum logp value threshold, such that all logp values smaller than this value are set to 0.
enrichment (default: 'aREA') – The method used to compute enrichment: ‘aREA’ or ‘NaRnEA’.
key_added (default: 'om') – The slot in pax_data_to_test.obsm to store the oncomatch results.
return_as_df (default: False) – Instead of adding the OncoMatch DataFrame to pax_data_to_test.obsm, return it directly.
copy (default: False) – Determines whether a copy of the input AnnData is returned.

Return type

When copy is False, stores a pd.DataFrame of -log10 p-values with shape (n_samples in pax_data_to_test, n_samples in pax_data_for_cMRs) in pax_data_to_test.obsm[key_added]. When copy is True, a copy of the AnnData is returned with this pd.DataFrame stored. When return_as_df is True, the OncoMatch DataFrame alone is directly returned by the function.

References

[1] Alvarez, M. J. et al. A precision oncology approach to the pharmacological targeting of mechanistic dependencies in neuroendocrine tumors. Nat Genet 50, 979–989, doi:10.1038/s41588-018-0138-4 (2018).

[2] Alvarez, M. J. et al. Reply to ’H-STS, L-STS and KRJ-I are not authentic GEPNET cell lines’. Nat Genet 51, 1427–1428, doi:10.1038/s41588-019-0509-5 (2019).

pyviper.tl.find_top_mrs(adata, pca_slot='X_pca', obs_column_name=None, layer=None, N=50, both=True, method='stouffer', key_added='mr', filter_by_feature_groups=None, rank=False, filter_by_top_mrs=False, return_as_df=False, copy=False, verbose=True)

Identify the top N master regulator proteins in a VIPER AnnData object

Parameters

adata – An anndata object containing a distance object in adata.obsp.
pca_slot – The slot in adata.obsm where a PCA is stored. Only required when method is “spearman”.
obs_column_name – The name of the column of observations in adata to use as clusters, or a cluster vector corresponding to observations. Required when method is “mwu” or “spearman”.
N (default: 50) – The number of MRs to return
both (default: True) – Whether to return both the top N and bottom N MRs (True) or just the top N (False).
method (default: "stouffer") – The method used to compute a signature to identify the top candidate master regulators (MRs). The options come from functions in pyviper.pp. Choose between “stouffer”, “mwu”, or “spearman”.
key_added (default: "mr") – The name of the slot in the adata.var to store the output.
filter_by_feature_groups (default: None) – The selected regulators, such that all other regulators are filtered out from the input data. If None, all regulators will be included. Regulator sets must be from one of the following: “tfs”, “cotfs”, “sig”, “surf”.
rank (default: False) – When False, a column is added to var with identified MRs labeled as “True”, while all other proteins are labeled as “False”. When True, top MRs are labeled N, N-1, N-2, …, 1, bottom MRs are labeled -N, -(N-1), -(N-2), …, -1, and all other proteins are labeled 0. Higher rank means greater activity, while lower rank means less.
filter_by_top_mrs (default: False) – Whether to filter var to only the top MRs in adata
return_as_df (default: False) – Returns a pd.DataFrame of the top MRs per cluster
copy (default: False) – Determines whether a copy of the input AnnData is returned.
verbose (default: True) – Whether extended output about the progress of the algorithm is given.

Return type

Adds a column to adata.var[key_added] or, when clusters are given, adds multiple columns (e.g. key_added_clust1name, key_added_clust2name, etc.) to adata.var. If copy is True, returns a new adata transformed by this function. If return_as_df is True, returns a DataFrame.
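The labeling scheme used when rank is True can be illustrated with a small standalone sketch. It assumes a dictionary of per-protein signature scores and that 2N does not exceed the number of proteins; the function name `rank_labels` and the toy scores are hypothetical and not part of the package.

```python
def rank_labels(scores, n):
    """Label the n most active proteins n..1, the n least active -n..-1, others 0."""
    if 2 * n > len(scores):
        raise ValueError("need at least 2 * n proteins")
    ordered = sorted(scores, key=scores.get, reverse=True)  # most active first
    labels = {name: 0 for name in scores}
    for i, name in enumerate(ordered[:n]):
        labels[name] = n - i          # the most active protein gets n
    for i, name in enumerate(reversed(ordered[-n:])):
        labels[name] = -(n - i)       # the least active protein gets -n
    return labels

signature = {"A": 5.0, "B": 3.0, "C": 0.1, "D": -0.2, "E": -2.5, "F": -4.0}
mr_labels = rank_labels(signature, n=2)
print(mr_labels)
```

With this encoding the sign of the label gives the direction of activity and its absolute value gives the strength within the selected set.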

pyviper.tl.path_enr(gex_data, pathway_interactome, layer=None, eset_filter=True, method=None, enrichment='aREA', mvws=1, njobs=1, batch_size=10000, verbose=True, output_as_anndata=True, transfer_obs=True, store_input_data=True)

Run the variation of VIPER that is specific to pathway enrichment analysis: a single interactome and min_targets is set to 0.

Parameters

gex_data – Gene expression stored in an anndata object (e.g. from Scanpy).
pathway_interactome – An object of class Interactome or one of the following strings that corresponds to msigdb regulons: “c2”, “c5”, “c6”, “c7”, “h”.
layer (default: None) – The layer in the anndata object to use as the gene expression input.
eset_filter (default: True) – Whether to filter out genes not present in the interactome (True) or to keep this biological context (False). This will affect gene rankings.
method (default: None) – A method used to create a gene expression signature from gex_data.X. The default of None is used when gex_data.X is already a gene expression signature. Alternative inputs include “scale”, “rank”, “doublerank”, “mad”, and “ttest”.
enrichment (default: 'aREA') – The algorithm to use to calculate the enrichment. Choose between Analytical Ranked Enrichment Analysis (‘aREA’, the default) and Nonparametric Analytical Rank-based Enrichment Analysis (‘NaRnEA’).
mvws (default: 1) – One of the following: (A) A number indicating the exponent score for the metaViper weights. These are only applicable when enrichment = ‘aREA’ and are not used when enrichment = ‘NaRnEA’. Roughly, a lower number (e.g. 1) results in networks being treated as a consensus network (useful for multiple networks of the same celltype with the same epigenetics), while a higher number (e.g. 10) results in networks being treated as separate (useful for multiple networks of different celltypes with different epigenetics). (B) The name of a column in gex_data that contains the manual assignments of samples to networks using list position or network names. (C) “auto”: assign samples to networks based on how well each network allows for sample enrichment.
njobs (default: 1) – Number of cores to distribute sample batches into.
batch_size (default: 10000) – Maximum number of samples to process at once. Set to None to split all samples across provided njobs.
verbose (default: True) – Whether extended output about the progress of the algorithm is given.
output_as_anndata (default: True) – Way of delivering output.
transfer_obs (default: True) – Whether to transfer the observation metadata from the input anndata to the output anndata. Thus, not applicable when output_as_anndata==False.
store_input_data (default: True) – Whether to store the input anndata in an unstructured data slot (.uns) of the output anndata. Thus, not applicable when output_as_anndata==False. If input anndata already contains ‘gex_data’ in .uns, the input will be assumed to be protein activity and will be stored in .uns as ‘pax_data’. Otherwise, the data will be stored as ‘gex_data’ in .uns.

Return type

Returns an AnnData object containing the pathways. When store_input_data is True, the input gex_data AnnData is stored in the .uns slot of the output.

pyviper.viper

pyviper.viper(gex_data, interactome, layer=None, eset_filter=True, method=None, enrichment='aREA', mvws=1, min_targets=30, njobs=1, batch_size=10000, verbose=True, output_as_anndata=True, transfer_obs=True, store_input_data=True)

The VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis) algorithm[1] allows individuals to compute protein activity using a gene expression signature and an Interactome object that describes the relationship between regulators and their downstream targets. Users can infer normalized enrichment scores (NES) using Analytical Ranked Enrichment Analysis (aREA)[1] or Nonparametric Analytical Rank-based Enrichment Analysis (NaRnEA)[2]. NaRnEA also computes proportional enrichment scores (PES).

The Interactome object must not contain any targets that are not in the features of gex_data. This can be accomplished by running:

interactome.filter_targets(gex_data.var_names)

It is highly recommended to do this on the unPruned network and then prune to ensure the pruned network contains a consistent number of targets per regulator, all of which exist within gex_data.
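To make the effect of target filtering concrete, the sketch below uses a plain dictionary as a stand-in for an Interactome (regulator name mapped to a list of targets). It removes targets that are absent from the expression features and then drops regulators left with fewer than a minimum number of targets. The function `filter_and_cull` and the toy data are hypothetical, and the sketch does not reproduce pruning to a fixed regulon size.

```python
def filter_and_cull(regulons, gene_names, min_targets=30):
    """Keep only targets present in gene_names, then drop small regulons.

    `regulons` is a dict stand-in for an Interactome; not the pyviper class.
    """
    genes = set(gene_names)
    filtered = {
        regulator: [t for t in targets if t in genes]
        for regulator, targets in regulons.items()
    }
    # Regulators with too few remaining targets give unreliable enrichment
    return {
        regulator: targets
        for regulator, targets in filtered.items()
        if len(targets) >= min_targets
    }

regulons = {"TF1": ["g1", "g2", "g3"], "TF2": ["g1", "g9"]}
features = ["g1", "g2", "g3", "g4"]
usable = filter_and_cull(regulons, features, min_targets=2)
print(usable)
```

In the toy data, TF2 loses the target that is missing from the features and then falls below the minimum, which is why filtering before pruning matters for keeping regulon sizes comparable.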

Parameters

gex_data – Gene expression stored in an anndata object (e.g. from Scanpy).
interactome – An object of class Interactome or a list of Interactome objects.
layer (default: None) – The layer in the anndata object to use as the gene expression input.
eset_filter (default: True) – Whether to filter out genes not present in the interactome (True) or to keep this biological context (False). This will affect gene rankings.
method (default: None) – A method used to create a gene expression signature from gex_data.X. The default of None is used when gex_data.X is already a gene expression signature. Alternative inputs include “scale”, “rank”, “doublerank”, “mad”, and “ttest”.
enrichment (default: 'aREA') – The algorithm to use to calculate the enrichment. Choose between Analytical Ranked Enrichment Analysis (‘aREA’, the default) and Nonparametric Analytical Rank-based Enrichment Analysis (‘NaRnEA’).
mvws (default: 1) – One of the following: (A) A number indicating the exponent score for the metaViper weights. These are only applicable when enrichment = ‘aREA’ and are not used when enrichment = ‘NaRnEA’. Roughly, a lower number (e.g. 1) results in networks being treated as a consensus network (useful for multiple networks of the same celltype with the same epigenetics), while a higher number (e.g. 10) results in networks being treated as separate (useful for multiple networks of different celltypes with different epigenetics). (B) The name of a column in gex_data that contains the manual assignments of samples to networks using list position or network names. (C) “auto”: assign samples to networks based on how well each network allows for sample enrichment.
min_targets (default: 30) – The minimum number of targets that each regulator in the interactome should contain. Regulators that contain fewer targets than this minimum will be pruned from the network (via the Interactome.prune method). The reason users may choose to use this threshold is because adequate targets are needed to accurately predict enrichment.
njobs (default: 1) – Number of cores to distribute sample batches into.
batch_size (default: 10000) – Maximum number of samples to process at once. Set to None to split all samples across provided njobs.
verbose (default: True) – Whether extended output about the progress of the algorithm should be given.
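output_as_anndata (default: True) – Whether to return the output as an AnnData object (True) or as a pd.DataFrame (aREA) or dictionary of arrays (NaRnEA) (False).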
transfer_obs (default: True) – Whether to transfer the observation metadata from the input anndata to the output anndata. Thus, not applicable when output_as_anndata==False.
store_input_data (default: True) – Whether to store the input anndata in an unstructured data slot (.uns) of the output anndata. Thus, not applicable when output_as_anndata==False. If input anndata already contains ‘gex_data’ in .uns, the input will be assumed to be protein activity and will be stored in .uns as ‘pax_data’. Otherwise, the data will be stored as ‘gex_data’ in .uns.

Returns

A dictionary of numpy.ndarray objects containing NES values (key: ‘nes’) and PES values (key: ‘pes’) when output_as_anndata=False and enrichment = ‘NaRnEA’.
A pd.DataFrame containing NES values when output_as_anndata=False and enrichment = ‘aREA’.
An anndata object containing NES values in .X when output_as_anndata=True (default). Will contain PES values in the layer ‘pes’ when enrichment = ‘NaRnEA’. Will contain .gex_data and/or .pax_data in the unstructured data slot (.uns) when store_input_data = True. Will contain identical .obs to the input anndata when transfer_obs = True.

References

[1] Alvarez, M. J., Shen, Y., Giorgi, F. M., Lachmann, A., Ding, B. B., Ye, B. H., & Califano, A. (2016). Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nature genetics, 48(8), 838-847.

[2] Griffin, A. T., Vlahos, L. J., Chiuzan, C., & Califano, A. (2023). NaRnEA: An Information Theoretic Framework for Gene Set Analysis. Entropy, 25(3), 542.