Welcome to Drug2cell’s documentation!

drug2cell.gsea(adata, targets=None, nested=False, categories=None, absolute=False, plot_args=True, sep=',')

Perform gene set enrichment analysis on the marker gene scores computed for the original object. Uses blitzgsea.

Returns:

  • a dictionary with the clusters for which the original object markers were computed as the keys, and data frames of test results sorted on q-value as the values

  • a helper variable with plotting arguments for d2c.util.plot_gsea(), if plot_args=True. ['scores'] has the GSEA input, and ['targets'] is the gene group dictionary that was used.

Input

adata : AnnData

With marker genes computed via sc.tl.rank_genes_groups() in the original expression space.

targets : dict of lists of str, optional (default: None)

The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will use d2c.score() output if present, and if not present load the ChEMBL-derived drug target sets distributed with the package.

Accepts two forms:

  • A dictionary with the names of the groups as keys, and the entries being the corresponding gene lists.

  • A dictionary of dictionaries defined like above, with names of gene group categories as keys. If passing one of those, please specify nested=True.

nested : bool, optional (default: False)

Whether targets is a dictionary of dictionaries with group categories as keys.

categories : str or list of str, optional (default: None)

If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
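The two accepted forms of targets, and the effect of categories on the nested form, can be illustrated with plain dictionaries. This is only a minimal sketch: the group names, gene names and the helper subset_and_flatten() are hypothetical and are not part of the drug2cell API.

```python
# Flat form: group name -> list of genes (use with nested=False).
flat_targets = {
    "drug_a": ["GENE1", "GENE2"],
    "drug_b": ["GENE2", "GENE3"],
}

# Nested form: category -> {group name -> list of genes} (use with nested=True).
nested_targets = {
    "category_1": {"drug_a": ["GENE1", "GENE2"]},
    "category_2": {"drug_b": ["GENE2", "GENE3"], "drug_c": ["GENE4"]},
}


def subset_and_flatten(nested, categories=None):
    """Keep only the requested categories, then merge them into a flat dict.

    Hypothetical helper: a group present in several categories is kept once.
    """
    if isinstance(categories, str):
        categories = [categories]
    if categories is None:
        categories = list(nested)
    flat = {}
    for category in categories:
        flat.update(nested[category])
    return flat


selected = subset_and_flatten(nested_targets, categories="category_2")
everything = subset_and_flatten(nested_targets)
```

Passing a single string or a list of strings for categories gives the same kind of result, which mirrors the "str or list of str" type accepted by the argument.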

absolute : bool, optional (default: False)

If True, pass the absolute values of scores to GSEA. Improves statistical power.

plot_args : bool, optional (default: True)

Whether to return the second piece of output that holds pre-compiled information for d2c.util.plot_gsea().

sep : str, optional (default: ",")

The delimiter that was used with d2c.score() for gene group storage.

drug2cell.hypergeometric(adata, targets=None, nested=False, categories=None, pvals_adj_thresh=0.05, direction='both', corr_method='benjamini-hochberg', sep=',')

Perform a hypergeometric test to assess the overrepresentation of gene group members among marker genes computed for the original object.

Returns a dictionary with the clusters for which the original object markers were computed as the keys, and data frames of test results sorted on q-value as the values.
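As a rough illustration of the statistic involved, the overrepresentation p-value for one cluster and one gene group can be computed from four counts: the size of the gene universe, the size of the group, the number of markers, and their overlap. The sketch below uses only the standard library; the gene names and the helper hypergeom_sf() are hypothetical, and the package itself may compute the test differently.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n group members, N draws)."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


# Toy data: 20 genes in total, a 5-gene group, and 4 marker genes for a cluster.
universe = {f"G{i}" for i in range(20)}
group = {"G0", "G1", "G2", "G3", "G4"}
markers = {"G0", "G1", "G2", "G10"}

overlap = len(markers & group)
p_value = hypergeom_sf(overlap, len(universe), len(group), len(markers))
```

Here three of the four markers fall in the group, so the p-value is the probability of drawing at least three group members when sampling four genes from the universe.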

Input

adata : AnnData

With marker genes computed via sc.tl.rank_genes_groups() in the original expression space.

targets : dict of lists of str, optional (default: None)

The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will use d2c.score() output if present, and if not present load the ChEMBL-derived drug target sets distributed with the package.

Accepts two forms:

  • A dictionary with the names of the groups as keys, and the entries being the corresponding gene lists.

  • A dictionary of dictionaries defined like above, with names of gene group categories as keys. If passing one of those, please specify nested=True.

nested : bool, optional (default: False)

Whether targets is a dictionary of dictionaries with group categories as keys.

categories : str or list of str, optional (default: None)

If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.

pvals_adj_thresh : float, optional (default: 0.05)

The pvals_adj cutoff to use on the sc.tl.rank_genes_groups() output to identify markers.

direction : str, optional (default: "both")

Whether to seek out up/down-regulated genes for the groups, based on the values from scores. Can be "up", "down", or "both" (for no selection).
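The interplay of pvals_adj_thresh and direction can be pictured as a simple filter over the marker table. The sketch below is an assumption-laden illustration, not the package's code: it assumes "up" means a positive score and "down" a negative one, uses a strict less-than comparison for the cutoff, and works on hypothetical (gene, score, pvals_adj) tuples rather than the actual sc.tl.rank_genes_groups() output.

```python
def select_markers(records, pvals_adj_thresh=0.05, direction="both"):
    """Return gene names passing the adjusted p-value cutoff and direction filter."""
    if direction not in ("up", "down", "both"):
        raise ValueError("direction must be 'up', 'down' or 'both'")
    selected = []
    for gene, score, pval_adj in records:
        if pval_adj >= pvals_adj_thresh:
            continue  # not significant enough to count as a marker
        if direction == "up" and score <= 0:
            continue  # assumed: up-regulated genes have positive scores
        if direction == "down" and score >= 0:
            continue  # assumed: down-regulated genes have negative scores
        selected.append(gene)
    return selected


# Hypothetical marker statistics for a single cluster.
records = [("G1", 2.0, 0.01), ("G2", -1.5, 0.02), ("G3", 3.0, 0.2)]
both = select_markers(records)
up_only = select_markers(records, direction="up")
down_only = select_markers(records, direction="down")
```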

corr_method : str, optional (default: "benjamini-hochberg")

Which multiple testing correction to apply to the p-values of the hypergeometric test. Can be "benjamini-hochberg" (controls the false discovery rate) or "bonferroni" (controls the family-wise error rate).
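For reference, the two corrections differ as follows: Bonferroni multiplies every p-value by the number of tests, while Benjamini-Hochberg scales each p-value by the number of tests divided by its rank and then enforces monotonicity. The sketch below is a generic textbook version written with the standard library; it is not taken from drug2cell, and the function names are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Step-up adjusted p-values (q-values), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of its own scaled value and those of all larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.5]
q_bh = benjamini_hochberg(pvals)
q_bonf = bonferroni(pvals)
```

Bonferroni is the more conservative of the two, so its adjusted values are never smaller than the Benjamini-Hochberg ones.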

sep : str, optional (default: ",")

The delimiter that was used with d2c.score() for gene group storage.

drug2cell.score(adata, targets=None, nested=False, categories=None, method='mean', layer=None, use_raw=False, n_bins=25, ctrl_size=50, sep=',')

Obtain per-cell scoring of gene groups of interest. Distributed with a set of ChEMBL drug targets that can be used immediately.

Please ensure that the gene nomenclature in your target sets is compatible with your .var_names (or .raw.var_names). The ChEMBL drug targets use HGNC symbols (human gene names in line with standard Cell Ranger mapping output).

Adds .uns['drug2cell'] to the input AnnData, a new AnnData object with the same observation space but with the scored gene groups as the features. The gene group members used to compute the scores will be listed in .var['genes'] of the new object.

Input

adata : AnnData

Using log-normalised data is recommended.

targets : dict of lists of str, optional (default: None)

The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will load the ChEMBL-derived drug target sets distributed with the package.

Accepts two forms:

  • A dictionary with the names of the groups as keys, and the entries being the corresponding gene lists.

  • A dictionary of dictionaries defined like above, with names of gene group categories as keys. If passing one of those, please specify nested=True.

nested : bool, optional (default: False)

Whether targets is a dictionary of dictionaries with group categories as keys.

categories : str or list of str, optional (default: None)

If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.

method : str, optional (default: "mean")

The method to use to score the gene groups. The default is "mean", which computes the mean over all the genes. The other option is "seurat", which generates an appropriate background profile for each target set and subtracts it from the mean. This is inspired by sc.tl.score_genes() logic, which in turn was inspired by Seurat’s gene group scoring algorithm.
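The difference between the two scoring modes can be sketched on a toy expression table. In this simplified illustration the control genes are passed in by hand; the actual "seurat" mode selects them by expression bin (see n_bins and ctrl_size), which is omitted here. The data layout and both helper functions are hypothetical.

```python
from statistics import mean

# Toy expression: gene -> per-cell values for three cells.
expression = {
    "GENE1": [1.0, 0.0, 2.0],
    "GENE2": [3.0, 0.0, 0.0],
    "GENE3": [0.0, 4.0, 1.0],
    "GENE4": [1.0, 1.0, 1.0],
}
n_cells = 3


def mean_score(expression, genes, n_cells):
    """Per-cell mean over the group members that are present in the data."""
    present = [g for g in genes if g in expression]
    return [mean(expression[g][c] for g in present) for c in range(n_cells)]


def background_subtracted_score(expression, genes, control_genes, n_cells):
    """Group mean minus the mean of a control gene set, per cell."""
    target = mean_score(expression, genes, n_cells)
    control = mean_score(expression, control_genes, n_cells)
    return [t - c for t, c in zip(target, control)]


group = ["GENE1", "GENE2", "NOT_IN_DATA"]
plain = mean_score(expression, group, n_cells)
corrected = background_subtracted_score(
    expression, group, ["GENE3", "GENE4"], n_cells
)
```

Subtracting a background can turn a score negative, which signals that the group is expressed below the level of comparable control genes in that cell.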

layer : str, optional (default: None)

Which .layers of the input AnnData to use for the expression values. If None, will default to .X.

use_raw : bool, optional (default: False)

Whether to use .raw.X for the expression values.

n_bins : int, optional (default: 25)

Only used with method="seurat". The number of expression bins to partition the feature space into.

ctrl_size : int, optional (default: 50)

Only used with method="seurat". The number of genes to randomly sample from each expression bin.

sep : str, optional (default: ",")

What delimiter to use when storing the corresponding gene groups for each feature in .uns['drug2cell'].var['genes'].
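The role of sep can be shown with a plain string round trip: gene lists are joined into one string for storage and split again when read back, which is why the same delimiter has to be passed to the downstream functions. The gene names below are placeholders.

```python
sep = ","
genes = ["GENE1", "GENE2", "GENE3"]

# Storing: collapse the gene list into a single delimited string.
stored = sep.join(genes)

# Reading back: split on the same delimiter to recover the list.
recovered = stored.split(sep)

# Splitting on a different delimiter does not recover the members.
wrong = stored.split(";")
```

A delimiter that occurs inside a gene name would break this round trip, so a different sep is worth choosing in that case.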

Gene group loading

drug2cell.data.chembl()

Load the default ChEMBL drug target dictionary distributed with the package.

Returns the drug target dictionary, with ATC categories as keys and, for each category, a dictionary with the corresponding drugs as keys.

drug2cell.data.consensuspathdb()

Load the ConsensusPathDB pathway gene memberships distributed with the package.

Returns a dictionary with pathway names as keys and memberships as values.

Utility functions

drug2cell.util.plot_gsea(enrichment, targets, scores, n=10)

Display the output of d2c.gsea() with blitzgsea’s top_table() plot.

The first d2c.gsea() output variable is enrichment, and passing the second d2c.gsea() output variable with a ** in front of it provides targets and scores.
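The ** unpacking described above is ordinary Python keyword expansion: a dictionary with the keys 'targets' and 'scores' fills the parameters of the same names. The sketch below demonstrates the pattern with a hypothetical stand-in function and placeholder values, so it runs without drug2cell installed.

```python
def plot_gsea_stub(enrichment, targets, scores, n=10):
    """Hypothetical stand-in that only mirrors the parameter names of plot_gsea()."""
    return {
        "clusters": sorted(enrichment),
        "n_groups": len(targets),
        "score_clusters": sorted(scores),
        "n": n,
    }


# Placeholders for the two outputs of gsea() when plot_args=True.
enrichment = {"cluster_0": "results table placeholder"}
plot_args = {
    "targets": {"drug_a": ["GENE1", "GENE2"]},
    "scores": {"cluster_0": "GSEA input placeholder"},
}

# Equivalent to plot_gsea_stub(enrichment, targets=..., scores=...).
summary = plot_gsea_stub(enrichment, **plot_args)
```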

Input

enrichment : dict of pd.DataFrame

Cluster names as keys, blitzgsea’s gsea() output as values

targets : dict of list of str

The gene group memberships that were used to compute GSEA

scores : dict of pd.DataFrame

Cluster names as keys, the input to blitzgsea

n : int, optional (default: 10)

How many top scores to show for each group

drug2cell.util.prepare_plot_args(adata, targets=None, categories=None)

Prepare the var_names, var_group_positions and var_group_labels arguments for scanpy plotting functions to display scored gene groups and group them nicely. Returns plot_args, a dictionary of the values that can be used with scanpy plotting as **plot_args.
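To give an idea of what such a dictionary looks like, the sketch below builds the three arguments from a nested target dictionary, skipping groups that are absent from the scored features. It assumes that scanpy interprets var_group_positions as inclusive (start, end) index pairs into var_names; the helper build_plot_args() and all names in the toy data are hypothetical and do not reproduce the package's implementation.

```python
def build_plot_args(nested_targets, available):
    """Collect group names per category and record each category's index span."""
    var_names, positions, labels = [], [], []
    for category, groups in nested_targets.items():
        kept = [g for g in groups if g in available]
        if not kept:
            continue  # no group of this category was scored
        start = len(var_names)
        var_names.extend(kept)
        # Assumed convention: inclusive (start, end) positions.
        positions.append((start, len(var_names) - 1))
        labels.append(category)
    return {
        "var_names": var_names,
        "var_group_positions": positions,
        "var_group_labels": labels,
    }


nested_targets = {
    "category_1": {"drug_a": ["GENE1"], "drug_b": ["GENE2"]},
    "category_2": {"drug_c": ["GENE3"], "drug_d": ["GENE4"]},
}
# Groups that ended up as features after scoring (drug_c was dropped).
available = {"drug_a", "drug_b", "drug_d"}

plot_args = build_plot_args(nested_targets, available)
```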

Input

adata : AnnData

Point the function to the .uns['drug2cell'] slot computed by the score() function earlier. It’s required to remove gene groups that were not represented in the data.

targets : dict of lists of str, optional (default: None)

The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will load the ChEMBL-derived drug target sets distributed with the package. Must be the nested=True version of the input as described in the score function.

categories : str or list of str, optional (default: None)

If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.

drug2cell.chembl.filter_activities(dataframe, drug_max_phase=None, add_drug_mechanism=True, assay_type=None, remove_inactive=True, include_active=True, pchembl_target_column=None, pchembl_threshold=None)

Perform a sequential set of filtering operations on the provided ChEMBL data frame. The order of the filters matches the order of the arguments in the input description. Returns a data frame with the rows fulfilling the resulting criteria.

Input

dataframe : pd.DataFrame

The ChEMBL data frame to perform filtering operations on.

drug_max_phase : int or list of int, optional (default: None)

Subset the data frame to drugs in the provided clinical stages:

  • Phase 1: Testing of drug on healthy volunteers for dose-ranging

  • Phase 2: Initial testing of drug on patients to assess efficacy and safety

  • Phase 3: Testing of drug on patients to assess efficacy, effectiveness and safety (larger test group)

  • Phase 4: Approved drug

add_drug_mechanism : bool, optional (default: True)

Grant subsequent filtering immunity to rows with drug mechanism information present.

assay_type : str or list of str, optional (default: None)

Subset the data frame based on assay type information:

  • Binding (B) - Data measuring binding of compound to a molecular target, e.g. Ki, IC50, Kd.

  • Functional (F) - Data measuring the biological effect of a compound, e.g. %cell death in a cell line, rat weight.

  • ADMET (A) - ADME data e.g. t1/2, oral bioavailability.

  • Toxicity (T) - Data measuring toxicity of a compound, e.g., cytotoxicity.

  • Physicochemical (P) - Assays measuring physicochemical properties of the compounds in the absence of biological material e.g., chemical stability, solubility.

  • Unclassified (U) - A small proportion of assays cannot be classified into one of the above categories e.g., ratio of binding vs efficacy.

remove_inactive : bool, optional (default: True)

Subset the data frame to remove inactive drug-target interactions.

include_active : bool, optional (default: True)

Grant subsequent filtering immunity to active drug-target interactions.

pchembl_target_column : str, optional (default: None)

Use the selected column in the data frame to dictate custom pChEMBL thresholds for each unique value in the column.

pchembl_threshold : float or dict of float, optional (default: None)

Subset the data frame to this pChEMBL minimum. If a single float, use that value. If a dict provided in conjunction with a pchembl_target_column, have the unique values of the specified column as keys of the dictionary, with entries being the desired threshold for that category.
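The combination of filtering immunity with a single or per-category pChEMBL threshold can be sketched on plain records. This is a simplified illustration only: the real function operates on a pandas data frame, and the field names "target_class", "pchembl" and "immune", as well as the helper filter_by_pchembl(), are hypothetical. The sketch also assumes every category found in the column has an entry in the threshold dictionary.

```python
def filter_by_pchembl(rows, threshold, target_column=None):
    """Keep immune rows unconditionally, others only if they reach the cutoff."""
    kept = []
    for row in rows:
        if row["immune"]:
            kept.append(row)  # immunity granted by an earlier step
            continue
        if isinstance(threshold, dict):
            cutoff = threshold[row[target_column]]  # per-category minimum
        else:
            cutoff = threshold  # one minimum for all rows
        if row["pchembl"] >= cutoff:
            kept.append(row)
    return kept


rows = [
    {"drug": "drug_a", "target_class": "kinase", "pchembl": 7.2, "immune": False},
    {"drug": "drug_b", "target_class": "kinase", "pchembl": 5.1, "immune": False},
    {"drug": "drug_c", "target_class": "gpcr", "pchembl": 5.6, "immune": False},
    {"drug": "drug_d", "target_class": "gpcr", "pchembl": 4.0, "immune": True},
]

per_class = filter_by_pchembl(
    rows, {"kinase": 6.0, "gpcr": 5.5}, target_column="target_class"
)
single = filter_by_pchembl(rows, 6.0)
```

In this toy data, drug_d survives both calls despite its low pChEMBL value because it was marked immune, which is the behaviour the add_drug_mechanism and include_active arguments describe for later filters.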