Welcome to Drug2cell’s documentation!
- drug2cell.gsea(adata, targets=None, nested=False, categories=None, absolute=False, plot_args=True, sep=',', **kwargs)
Perform gene set enrichment analysis on the marker gene scores computed for the original object. Uses blitzgsea.
Returns:
a dictionary with the clusters for which the original object's markers were computed as keys, and data frames of test results sorted on q-value as values
a helper variable with plotting arguments for d2c.plot_gsea(), if plot_args=True. Its ['scores'] entry holds the GSEA input, and ['targets'] is the gene group dictionary that was used.
Input
- adata
AnnData With marker genes computed via sc.tl.rank_genes_groups() in the original expression space.
- targets
dict of lists of str, optional (default: None) The gene groups to evaluate. These can be targets of known drugs, GO terms, pathway memberships, or anything else you can assign genes to. If None, the gene groups stored in the d2c.score() output are used if present; otherwise, the ChEMBL-derived drug target sets distributed with the package are loaded. Accepts two forms:
  - A dictionary with the names of the groups as keys, and the corresponding gene lists as values.
  - A dictionary of dictionaries defined like above, with the names of gene group categories as keys. If passing one of those, please specify nested=True.
- nested
bool, optional (default: False) Whether targets is a dictionary of dictionaries with group categories as keys.
- categories
str or list of str, optional (default: None) If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In the case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
- absolute
bool, optional (default: False) If True, pass the absolute values of the scores to GSEA, so that genes are ranked by the magnitude of their marker scores regardless of direction. Improves statistical power.
- plot_args
bool, optional (default: True) Whether to return the second piece of output, which holds pre-compiled information for d2c.plot_gsea().
- sep
str, optional (default: ",") The delimiter that was used with d2c.score() for gene group storage.
- kwargs
Any additional arguments to pass to blitzgsea.gsea().
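To illustrate why absolute=True can help, the stdlib-only sketch below computes a simplified, unweighted running-sum enrichment score in the spirit of classic GSEA. It is not blitzgsea's implementation (which, among other things, weights hits by score and estimates significance); the helper running_sum_es and the toy gene names are hypothetical.

```python
def running_sum_es(scores, gene_set, absolute=False):
    """Simplified, unweighted GSEA-style enrichment score (hypothetical helper).

    scores: dict mapping gene name -> marker score for one cluster.
    gene_set: collection of gene names forming one gene group.
    Returns the largest deviation of the running sum from zero.
    """
    # Optionally rank on magnitude only, mirroring the idea behind absolute=True
    values = {g: abs(s) if absolute else s for g, s in scores.items()}
    ranked = sorted(values, key=values.get, reverse=True)
    hits = [g in gene_set for g in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene_set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for hit in hits:
        # Step up on a group member, down otherwise; each side sums to 1
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


scores = {"A": 3.0, "B": 2.0, "C": -1.0, "D": -4.0}
print(running_sum_es(scores, {"A", "D"}))                 # 0.5
print(running_sum_es(scores, {"A", "D"}, absolute=True))  # 1.0
```

A group with one strongly up-regulated and one strongly down-regulated member is split across both ends of the signed ranking, but sits at the top of the absolute ranking.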
- drug2cell.hypergeometric(adata, targets=None, nested=False, categories=None, pvals_adj_thresh=0.05, direction='both', corr_method='benjamini-hochberg', sep=',')
Perform a hypergeometric test to assess the overrepresentation of gene group members among marker genes computed for the original object.
Returns a dictionary with the clusters for which the original object's markers were computed as keys, and data frames of test results sorted on q-value as values.
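For intuition, here is a stdlib-only sketch of a one-sided hypergeometric overrepresentation test for a single cluster and gene group. The helper names, and the choice to restrict all gene sets to a shared universe of tested genes, are assumptions for illustration rather than drug2cell's exact procedure.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K group members, n markers)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def overrepresentation_pval(markers, group, universe):
    """One-sided p-value that group members are enriched among markers.

    Hypothetical helper: every set is first restricted to the gene universe.
    """
    universe = set(universe)
    markers = set(markers) & universe
    group = set(group) & universe
    overlap = len(markers & group)
    return hypergeom_sf(overlap, len(universe), len(group), len(markers))


universe = list("ABCDEFGHIJ")  # 10 toy genes
# "Z" is not in the universe, so the group effectively has 3 members
p = overrepresentation_pval({"A", "B", "C"}, {"A", "B", "C", "Z"}, universe)
print(p)  # 1/120, about 0.0083
```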
Input
- adata
AnnData With marker genes computed via sc.tl.rank_genes_groups() in the original expression space.
- targets
dict of lists of str, optional (default: None) The gene groups to evaluate. These can be targets of known drugs, GO terms, pathway memberships, or anything else you can assign genes to. If None, the gene groups stored in the d2c.score() output are used if present; otherwise, the ChEMBL-derived drug target sets distributed with the package are loaded. Accepts two forms:
  - A dictionary with the names of the groups as keys, and the corresponding gene lists as values.
  - A dictionary of dictionaries defined like above, with the names of gene group categories as keys. If passing one of those, please specify nested=True.
- nested
bool, optional (default: False) Whether targets is a dictionary of dictionaries with group categories as keys.
- categories
str or list of str, optional (default: None) If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In the case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
- pvals_adj_thresh
float, optional (default: 0.05) The pvals_adj cutoff to apply to the sc.tl.rank_genes_groups() output to identify markers.
- direction
str, optional (default: "both") Whether to restrict the markers to up- or down-regulated genes for each group, based on the values in scores. Can be "up", "down", or "both" (no selection).
- corr_method
str, optional (default: "benjamini-hochberg") Which multiple testing correction to apply to the p-values of the hypergeometric test. Can be "benjamini-hochberg" or "bonferroni".
- sep
str, optional (default: ",") The delimiter that was used with d2c.score() for gene group storage.
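The marker selection and correction steps controlled by pvals_adj_thresh, direction and corr_method can be sketched with plain lists standing in for one group's names, scores and pvals_adj fields from the sc.tl.rank_genes_groups() output. Both helpers are hypothetical; in particular, treating "up" as positive scores, "down" as negative scores, and using a strict < cutoff are assumptions. adjust_pvalues implements the standard Benjamini-Hochberg step-up procedure and the Bonferroni correction.

```python
def select_markers(names, scores, pvals_adj, pvals_adj_thresh=0.05, direction="both"):
    """Hypothetical marker selection from rank_genes_groups-style columns.

    Assumes "up" keeps positive scores and "down" keeps negative scores.
    """
    if direction not in ("up", "down", "both"):
        raise ValueError("direction must be 'up', 'down' or 'both'")
    selected = []
    for name, score, padj in zip(names, scores, pvals_adj):
        if padj >= pvals_adj_thresh:  # assumed strict cutoff
            continue
        if direction == "up" and score <= 0:
            continue
        if direction == "down" and score >= 0:
            continue
        selected.append(name)
    return selected


def adjust_pvalues(pvals, method="benjamini-hochberg"):
    """Multiple testing correction for a list of p-values."""
    m = len(pvals)
    if method == "bonferroni":
        return [min(1.0, p * m) for p in pvals]
    if method != "benjamini-hochberg":
        raise ValueError("method must be 'benjamini-hochberg' or 'bonferroni'")
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


names, toy_scores, padj = ["G1", "G2", "G3"], [2.5, -1.8, 0.4], [0.001, 0.01, 0.2]
print(select_markers(names, toy_scores, padj, direction="up"))  # ['G1']
print(adjust_pvalues([0.01, 0.04, 0.03, 0.20]))
```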
- drug2cell.score(adata, targets=None, nested=False, categories=None, method='mean', layer=None, use_raw=False, n_bins=25, ctrl_size=50, sep=',')
Obtain per-cell scores for gene groups of interest. The package is distributed with a set of ChEMBL drug targets that can be used immediately.
Please ensure that the gene nomenclature in your target sets is compatible with your .var_names (or .raw.var_names). The ChEMBL drug targets use HGNC symbols (human gene names, in line with standard Cell Ranger mapping output).
Adds .uns['drug2cell'] to the input AnnData: a new AnnData object with the same observation space, but with the scored gene groups as the features. The gene group members used to compute the scores are listed in .var['genes'] of the new object.
Input
- adata
AnnData Using log-normalised data is recommended.
- targets
dict of lists of str, optional (default: None) The gene groups to evaluate. These can be targets of known drugs, GO terms, pathway memberships, or anything else you can assign genes to. If None, the ChEMBL-derived drug target sets distributed with the package are loaded. Accepts two forms:
  - A dictionary with the names of the groups as keys, and the corresponding gene lists as values.
  - A dictionary of dictionaries defined like above, with the names of gene group categories as keys. If passing one of those, please specify nested=True.
- nested
bool, optional (default: False) Whether targets is a dictionary of dictionaries with group categories as keys.
- categories
str or list of str, optional (default: None) If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In the case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
- method
str, optional (default: "mean") The method to use to score the gene groups. The default, "mean", computes the mean expression over all the genes in the group. The other option, "seurat", generates an appropriate background profile for each target set and subtracts it from the mean. This is inspired by the sc.tl.score_genes() logic, which in turn was inspired by Seurat’s gene group scoring algorithm.
- layer
str, optional (default: None) Which of the input AnnData's .layers to use for the expression values. If None, will default to .X.
- use_raw
bool, optional (default: False) Whether to use .raw.X for the expression values.
- n_bins
int, optional (default: 25) Only used with method="seurat". The number of expression bins to partition the feature space into.
- ctrl_size
int, optional (default: 50) Only used with method="seurat". The number of genes to randomly sample from each expression bin.
- sep
str, optional (default: ",") The delimiter to use when storing the gene group members for each feature in .uns['drug2cell'].var['genes'].
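A stdlib-only sketch of the two scoring methods follows, with expression held in a plain dict of gene name to per-cell values rather than an AnnData. The helper names are hypothetical, and the binning and sampling details roughly follow scanpy's sc.tl.score_genes() rather than drug2cell's own code.

```python
import random
from statistics import mean


def score_mean(expr, genes):
    """Per-cell mean expression over the group members present in expr.

    expr: dict mapping gene name -> list of per-cell (log-normalised) values.
    """
    present = [g for g in genes if g in expr]
    if not present:
        raise ValueError("none of the group members are in expr")
    n_cells = len(next(iter(expr.values())))
    return [mean(expr[g][c] for g in present) for c in range(n_cells)]


def score_seurat_like(expr, genes, n_bins=25, ctrl_size=50, seed=0):
    """Group mean minus a background matched on average expression."""
    target = score_mean(expr, genes)  # also fails early if no member is present
    present = [g for g in genes if g in expr]
    rng = random.Random(seed)
    # Rank all genes by average expression and cut the ranking into n_bins bins
    ranked = sorted(expr, key=lambda g: mean(expr[g]))
    bin_of = {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}
    control = set()
    for b in sorted({bin_of[g] for g in present}):
        pool = [g for g in ranked if bin_of[g] == b]
        # Sample up to ctrl_size background genes from each bin the group hits
        control.update(rng.sample(pool, min(ctrl_size, len(pool))))
    # Group members are removed after sampling, so fewer genes may remain
    control -= set(present)
    if not control:
        raise ValueError("no background genes left; try a smaller n_bins")
    background = score_mean(expr, sorted(control))
    return [t - bg for t, bg in zip(target, background)]


expr = {
    "G1": [1.0, 3.0],
    "G2": [2.0, 4.0],
    "G3": [0.0, 0.0],
    "G4": [1.0, 1.0],
}
print(score_mean(expr, ["G1", "G2", "NOT_IN_DATA"]))   # [1.5, 3.5]
print(score_seurat_like(expr, ["G1", "G2"], n_bins=1))  # [1.0, 3.0]
```

With n_bins=1 every gene shares one bin, so the background here is simply the mean of G3 and G4 in each cell.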
Gene group loading
- drug2cell.data.chembl()
Load the default ChEMBL drug target dictionary distributed with the package.
Returns the drug target dictionary: ATC categories as keys, with each ATC category mapping to a dictionary that has the corresponding drugs as keys and their target gene lists as values.
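How a nested dictionary of this shape interacts with nested=True and categories can be illustrated with a toy example. flatten_targets is a hypothetical helper, not part of drug2cell, and the category and drug names are placeholders; the real key format of the ChEMBL dictionary may differ.

```python
def flatten_targets(targets, nested=False, categories=None):
    """Turn a (possibly nested) target dictionary into {group: genes}.

    Hypothetical helper; categories is only consulted for nested input.
    """
    if not nested:
        return dict(targets)
    if categories is None:
        categories = list(targets)
    elif isinstance(categories, str):
        categories = [categories]
    flat = {}
    for category in categories:
        # Later categories overwrite duplicate group names
        flat.update(targets[category])
    return flat


# Toy nested dictionary: category -> drug -> target genes
toy = {
    "CAT_A": {"DRUG_X": ["PTGS1", "PTGS2"]},
    "CAT_B": {"DRUG_Y": ["OPRM1"], "DRUG_Z": ["PTGS2"]},
}
print(flatten_targets(toy, nested=True, categories="CAT_B"))
```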
- drug2cell.data.consensuspathdb()
Load the ConsensusPathDB pathway gene memberships distributed with the package.
Returns a dictionary with pathway names as keys and the member gene lists as values.
Utility functions
- drug2cell.util.plot_gsea(enrichment, targets, scores, n=10, interactive_plot=True, **kwargs)
Display the output of d2c.gsea() with blitzgsea’s top_table() plot.
The first d2c.gsea() output variable is enrichment, and passing the second d2c.gsea() output variable with ** in front of it provides targets and scores.
Input
- enrichment
dict of pd.DataFrame Cluster names as keys, blitzgsea’s gsea() output as values.
- targets
dict of lists of str The gene group memberships that were used to compute GSEA.
- scores
dict of pd.DataFrame Cluster names as keys, the corresponding blitzgsea input as values.
- n
int, optional (default: 10) How many top-scoring gene groups to show for each cluster.
- interactive_plot
bool, optional (default: True) If True, will display the plots within a Jupyter Notebook. If False, will collect the figures into a list and return it at the end.
- kwargs
Any additional arguments to pass to blitzgsea.plot.top_table().
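The ** mechanism is ordinary Python keyword-argument unpacking. The sketch below uses a hypothetical stub with the documented parameters in place of the real plotting function, and toy lists in place of data frames. With real outputs, the call would be d2c.util.plot_gsea(enrichment, **plot_args), assuming plot_args holds just the 'targets' and 'scores' keys.

```python
def plot_gsea_stub(enrichment, targets, scores, n=10, interactive_plot=True):
    """Hypothetical stand-in mirroring the documented plot_gsea() parameters.

    Instead of plotting, it reports how many rows would be shown per cluster.
    """
    missing = set(enrichment) - set(scores)
    if missing:
        raise KeyError(f"no GSEA input for clusters: {sorted(missing)}")
    return {cluster: min(n, len(rows)) for cluster, rows in enrichment.items()}


# Toy stand-ins for the two d2c.gsea() outputs
enrichment = {"cluster_0": ["DRUG_X", "DRUG_Y"], "cluster_1": ["DRUG_Z"]}
plot_args = {
    "targets": {"DRUG_X": ["PTGS1"], "DRUG_Y": ["OPRM1"], "DRUG_Z": ["PTGS2"]},
    "scores": {"cluster_0": [], "cluster_1": []},
}
# ** expands plot_args into the keyword arguments targets=... and scores=...
shown = plot_gsea_stub(enrichment, **plot_args, n=1)
print(shown)  # {'cluster_0': 1, 'cluster_1': 1}
```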
- drug2cell.util.prepare_plot_args(adata, targets=None, categories=None)
Prepare the var_names, var_group_positions and var_group_labels arguments for scanpy plotting functions, so that the scored gene groups are displayed and grouped by category. Returns plot_args, a dictionary of values that can be passed to scanpy plotting functions as **plot_args.
Input:
- adata
AnnData Point the function to the .uns['drug2cell'] object computed earlier by the score() function. This is required to remove gene groups that were not represented in the data.
- targets
dict of lists of str, optional (default: None) The gene groups to evaluate. These can be targets of known drugs, GO terms, pathway memberships, or anything else you can assign genes to. If None, the ChEMBL-derived drug target sets distributed with the package are loaded. Must be the nested=True form of the input, as described for score().
- categories
str or list of str, optional (default: None) If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In the case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
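The bookkeeping behind these three arguments can be sketched on a toy nested dictionary. prepare_plot_args_sketch is hypothetical, and the inclusive (start, end) index pairs reflect our assumption about what scanpy's var_group_positions expects.

```python
def prepare_plot_args_sketch(scored_groups, targets, categories=None):
    """Build var_names / var_group_positions / var_group_labels (hypothetical).

    scored_groups: names of the gene groups present in the scored object.
    targets: nested dict, category -> group -> genes.
    """
    if categories is None:
        categories = list(targets)
    elif isinstance(categories, str):
        categories = [categories]
    present = set(scored_groups)
    var_names, positions, labels = [], [], []
    for category in categories:
        # Keep only the groups that were actually scored
        groups = [g for g in targets[category] if g in present]
        if not groups:
            continue  # skip categories with no scored groups
        start = len(var_names)
        var_names.extend(groups)
        # Assumed inclusive (start, end) pair bracketing this category
        positions.append((start, len(var_names) - 1))
        labels.append(category)
    return {
        "var_names": var_names,
        "var_group_positions": positions,
        "var_group_labels": labels,
    }


toy = {
    "CAT_A": {"DRUG_X": ["PTGS1"], "DRUG_W": ["EGFR"]},
    "CAT_B": {"DRUG_Y": ["OPRM1"], "DRUG_Z": ["PTGS2"]},
    "CAT_C": {"DRUG_Q": ["DRD2"]},
}
plot_args = prepare_plot_args_sketch(["DRUG_X", "DRUG_Y", "DRUG_Z"], toy)
print(plot_args)
```

With real data, scored_groups would correspond to adata.uns['drug2cell'].var_names, and the dictionary would be unpacked into a scanpy plotting call that accepts these arguments, for example sc.pl.dotplot(adata.uns['drug2cell'], groupby="leiden", **plot_args), where the groupby column is a placeholder.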
- drug2cell.chembl.filter_activities(dataframe, drug_max_phase=None, add_drug_mechanism=True, assay_type=None, remove_inactive=True, include_active=True, pchembl_target_column=None, pchembl_threshold=None)
Perform a sequential set of filtering operations on the provided ChEMBL data frame. The order of the filters matches the order of the arguments in the input description. Returns a data frame with the rows fulfilling the resulting criteria.
Input
- dataframe
pd.DataFrame The ChEMBL data frame to perform filtering operations on.
- drug_max_phase
int or list of int, optional (default: None) Subset the data frame to drugs in the provided clinical stages:
Phase 1: Testing of drug on healthy volunteers for dose-ranging
Phase 2: Initial testing of drug on patients to assess efficacy and safety
Phase 3: Testing of drug on patients to assess efficacy, effectiveness and safety (larger test group)
Phase 4: Approved
- add_drug_mechanism
bool, optional (default: True) Exempt rows with drug mechanism information present from subsequent filters.
- assay_type
str or list of str, optional (default: None) Subset the data frame based on assay type information:
Binding (B) - Data measuring binding of compound to a molecular target, e.g. Ki, IC50, Kd.
Functional (F) - Data measuring the biological effect of a compound, e.g. %cell death in a cell line, rat weight.
ADMET (A) - ADME data e.g. t1/2, oral bioavailability.
Toxicity (T) - Data measuring toxicity of a compound, e.g., cytotoxicity.
Physicochemical (P) - Assays measuring physicochemical properties of the compounds in the absence of biological material e.g., chemical stability, solubility.
Unclassified (U) - A small proportion of assays cannot be classified into one of the above categories e.g., ratio of binding vs efficacy.
- remove_inactive
bool, optional (default: True) Remove inactive drug-target interactions from the data frame.
- include_active
bool, optional (default: True) Exempt active drug-target interactions from subsequent filters.
- pchembl_target_column
str, optional (default: None) Use the selected column in the data frame to set custom pChEMBL thresholds for each unique value in the column.
- pchembl_threshold
float or dict of float, optional (default: None) Subset the data frame to rows meeting this pChEMBL minimum. If a single float is provided, that value is used for all rows. If a dict is provided in conjunction with pchembl_target_column, its keys should be the unique values of the specified column, and its values the desired threshold for each category.
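The threshold step, together with the idea of filtering immunity, can be sketched with a list of dicts standing in for data frame rows. pchembl_filter is hypothetical; the column names are placeholders, and dropping rows whose category has no threshold is our assumption, not documented behaviour.

```python
def pchembl_filter(rows, pchembl_threshold, pchembl_target_column=None,
                   value_column="pchembl_value", is_immune=lambda row: False):
    """Keep rows meeting a global or per-category pChEMBL minimum (hypothetical).

    rows: list of dicts standing in for data frame rows.
    value_column and the category column names are placeholders.
    """
    kept = []
    for row in rows:
        # Rows flagged as immune bypass this filter entirely
        if is_immune(row):
            kept.append(row)
            continue
        if isinstance(pchembl_threshold, dict):
            if pchembl_target_column is None:
                raise ValueError("a dict threshold needs pchembl_target_column")
            threshold = pchembl_threshold.get(row[pchembl_target_column])
            if threshold is None:
                continue  # assumption: categories without a threshold are dropped
        else:
            threshold = pchembl_threshold
        value = row.get(value_column)
        if value is not None and value >= threshold:
            kept.append(row)
    return kept


rows = [
    {"target_class": "kinase", "pchembl_value": 7.5},
    {"target_class": "kinase", "pchembl_value": 6.0},
    {"target_class": "gpcr", "pchembl_value": 6.0},
    {"target_class": "gpcr", "pchembl_value": None, "mechanism": True},
    {"target_class": "other", "pchembl_value": 9.0},
]
per_class = {"kinase": 7.0, "gpcr": 5.5}
print(len(pchembl_filter(rows, per_class, "target_class")))  # 2
```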