Welcome to Drug2cell’s documentation!
- drug2cell.gsea(adata, targets=None, nested=False, categories=None, absolute=False, plot_args=True, sep=',')
Perform gene set enrichment analysis on the marker gene scores computed for the original object. Uses blitzgsea.
Returns:
- a dictionary with the clusters for which the original object markers were computed as the keys, and data frames of test results sorted on q-value as the items
- a helper variable with plotting arguments for d2c.plot_gsea(), if plot_args=True. ['scores'] has the GSEA input, and ['targets'] is the gene group dictionary that was used.
Input
- adata: AnnData
  With marker genes computed via sc.tl.rank_genes_groups() in the original expression space.
- targets: dict of lists of str, optional (default: None)
  The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will use d2c.score() output if present, and if not present load the ChEMBL-derived drug target sets distributed with the package.
  Accepts two forms:
  - A dictionary with the names of the groups as keys, and the entries being the corresponding gene lists.
  - A dictionary of dictionaries defined like above, with names of gene group categories as keys. If passing one of those, please specify nested=True.
- nested: bool, optional (default: False)
  Whether targets is a dictionary of dictionaries with group categories as keys.
- categories: str or list of str, optional (default: None)
  If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
- absolute: bool, optional (default: False)
  If True, pass the absolute values of scores to GSEA. Improves statistical power.
- plot_args: bool, optional (default: True)
  Whether to return the second piece of output that holds pre-compiled information for d2c.plot_gsea().
- sep: str, optional (default: ",")
  The delimiter that was used with d2c.score() for gene group storage.
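The two accepted forms of targets, and the effect of categories, can be pictured with plain dictionaries. The snippet below is a minimal sketch and not the package's own code: the group names, gene names and the flatten_targets helper are hypothetical, and it assumes categories are matched as exact top-level keys.

```python
# Hypothetical gene groups; names and genes are illustrative only.
flat_targets = {
    "drug_a": ["GENE1", "GENE2"],
    "drug_b": ["GENE2", "GENE3"],
}

# Nested form: category names as keys, each holding a flat dictionary.
nested_targets = {
    "cat_x": {"drug_a": ["GENE1", "GENE2"]},
    "cat_y": {"drug_b": ["GENE2", "GENE3"], "drug_c": ["GENE4"]},
}


def flatten_targets(targets, nested=False, categories=None):
    """Return a flat {group: genes} dictionary, optionally subset by category.

    Assumption: categories are exact keys of the nested dictionary.
    """
    if not nested:
        return dict(targets)
    if categories is None:
        categories = list(targets)
    elif isinstance(categories, str):
        categories = [categories]
    flat = {}
    for category in categories:
        flat.update(targets[category])
    return flat


print(flatten_targets(nested_targets, nested=True, categories="cat_y"))
```

Passing a single string or a list of strings for categories behaves the same way in this sketch, mirroring the str or list of str type given above.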
- drug2cell.hypergeometric(adata, targets=None, nested=False, categories=None, pvals_adj_thresh=0.05, direction='both', corr_method='benjamini-hochberg', sep=',')
Perform a hypergeometric test to assess the overrepresentation of gene group members among marker genes computed for the original object.
Returns a dictionary with clusters for which the original object markers were computed as the keys, and data frames of test results sorted on q-value as the items.
Input
- adata: AnnData
  With marker genes computed via sc.tl.rank_genes_groups() in the original expression space.
- targets: dict of lists of str, optional (default: None)
  The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will use d2c.score() output if present, and if not present load the ChEMBL-derived drug target sets distributed with the package.
  Accepts two forms:
  - A dictionary with the names of the groups as keys, and the entries being the corresponding gene lists.
  - A dictionary of dictionaries defined like above, with names of gene group categories as keys. If passing one of those, please specify nested=True.
- nested: bool, optional (default: False)
  Whether targets is a dictionary of dictionaries with group categories as keys.
- categories: str or list of str, optional (default: None)
  If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
- pvals_adj_thresh: float, optional (default: 0.05)
  The pvals_adj cutoff to use on the sc.tl.rank_genes_groups() output to identify markers.
- direction: str, optional (default: "both")
  Whether to seek out up/down-regulated genes for the groups, based on the values from scores. Can be "up", "down", or "both" (for no selection).
- corr_method: str, optional (default: "benjamini-hochberg")
  Which multiple-testing correction to apply to the p-values of the hypergeometric test. Can be "benjamini-hochberg" or "bonferroni".
- sep: str, optional (default: ",")
  The delimiter that was used with d2c.score() for gene group storage.
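For orientation, the overrepresentation test and the Benjamini-Hochberg correction can be sketched with the standard library alone. This is an illustration of the general technique under assumed toy inputs, not drug2cell's implementation; the helper names, gene names and group names are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n are group members."""
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy inputs: 20 genes in total, 5 of them called as markers.
universe = [f"G{i}" for i in range(20)]
markers = {"G0", "G1", "G2", "G3", "G4"}
groups = {
    "group_a": ["G0", "G1", "G10", "G11"],
    "group_b": ["G12", "G13", "G14"],
}

pvals = {}
for name, genes in groups.items():
    members = set(genes) & set(universe)
    overlap = len(members & markers)
    pvals[name] = hypergeom_sf(overlap, len(universe), len(members), len(markers))

qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
print(pvals, qvals)
```

A Bonferroni correction would instead multiply each p-value by the number of tests and cap the result at 1.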
- drug2cell.score(adata, targets=None, nested=False, categories=None, method='mean', layer=None, use_raw=False, n_bins=25, ctrl_size=50, sep=',')
Obtain per-cell scoring of gene groups of interest. Distributed with a set of ChEMBL drug targets that can be used immediately.
Please ensure that the gene nomenclature in your target sets is compatible with your .var_names (or .raw.var_names). The ChEMBL drug targets use HGNC (human gene names in line with standard cell ranger mapping output).
Adds .uns['drug2cell'] to the input AnnData, a new AnnData object with the same observation space but with the scored gene groups as the features. The gene group members used to compute the scores will be listed in .var['genes'] of the new object.
Input
- adata: AnnData
  Using log-normalised data is recommended.
- targets: dict of lists of str, optional (default: None)
  The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will load the ChEMBL-derived drug target sets distributed with the package.
  Accepts two forms:
  - A dictionary with the names of the groups as keys, and the entries being the corresponding gene lists.
  - A dictionary of dictionaries defined like above, with names of gene group categories as keys. If passing one of those, please specify nested=True.
- nested: bool, optional (default: False)
  Whether targets is a dictionary of dictionaries with group categories as keys.
- categories: str or list of str, optional (default: None)
  If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
- method: str, optional (default: "mean")
  The method to use to score the gene groups. The default is "mean", which computes the mean over all the genes. The other option is "seurat", which generates an appropriate background profile for each target set and subtracts it from the mean. This is inspired by sc.tl.score_genes() logic, which in turn was inspired by Seurat's gene group scoring algorithm.
- layer: str, optional (default: None)
  Which .layers of the input AnnData to use for the expression values. If None, will default to .X.
- use_raw: bool, optional (default: False)
  Whether to use .raw.X for the expression values.
- n_bins: int, optional (default: 25)
  Only used with method="seurat". The number of expression bins to partition the feature space into.
- ctrl_size: int, optional (default: 50)
  Only used with method="seurat". The number of genes to randomly sample from each expression bin.
- sep: str, optional (default: ",")
  What delimiter to use when storing the corresponding gene groups for each feature in .uns['drug2cell'].var['genes'].
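The "mean" method and the role of sep can be illustrated on a toy matrix. The following is a rough sketch under stated assumptions rather than the package's code: it uses plain lists in place of an AnnData object, the mean_scores helper and all gene and group names are hypothetical, and it assumes that genes missing from the data are ignored and that groups with no genes present are dropped.

```python
from statistics import mean

# Toy expression matrix: rows are cells, columns follow var_names.
var_names = ["GENE1", "GENE2", "GENE3"]
X = [
    [1.0, 3.0, 0.0],
    [0.0, 2.0, 4.0],
]
targets = {
    "drug_a": ["GENE1", "GENE2"],
    "drug_b": ["GENE3", "GENE9"],  # GENE9 is absent from the data
    "drug_c": ["GENE9"],           # no member present at all
}


def mean_scores(X, var_names, targets, sep=","):
    """Per-cell mean over each group's genes that are present in var_names."""
    column = {gene: j for j, gene in enumerate(var_names)}
    scores, used_genes = {}, {}
    for group, genes in targets.items():
        present = [g for g in genes if g in column]
        if not present:
            continue  # assumption: unrepresented groups are skipped
        scores[group] = [mean(row[column[g]] for g in present) for row in X]
        # Store the genes actually used as one sep-delimited string.
        used_genes[group] = sep.join(present)
    return scores, used_genes


scores, used_genes = mean_scores(X, var_names, targets)
print(scores)
print(used_genes)
```

Storing the members as a single delimited string is why the downstream functions take the same sep value to split them back into lists.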
Gene group loading
- drug2cell.data.chembl()
Load the default ChEMBL drug target dictionary distributed with the package.
Returns the drug target dictionary - ATC categories as keys, with each ATC category a dictionary with corresponding drugs as keys.
- drug2cell.data.consensuspathdb()
Load the ConsensusPathDB pathway gene memberships distributed with the package.
Returns a dictionary with pathway names as keys and memberships as items.
Utility functions
- drug2cell.util.plot_gsea(enrichment, targets, scores, n=10)
Display the output of d2c.gsea() with blitzgsea's top_table() plot.
The first d2c.gsea() output variable is enrichment, and passing the second d2c.gsea() output variable with a ** in front of it provides targets and scores.
Input
- enrichment: dict of pd.DataFrame
  Cluster names as keys, blitzgsea's gsea() output as values
- targets: dict of list of str
  The gene group memberships that were used to compute GSEA
- scores: dict of pd.DataFrame
  Cluster names as keys, the input to blitzgsea
- n: int, optional (default: 10)
  How many top scores to show for each group
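The ** convention mentioned above is ordinary Python keyword unpacking. As a small sketch, the stand-in function below mirrors only the argument names of plot_gsea(); it is hypothetical, draws nothing, and the dictionary contents are placeholders.

```python
def plot_gsea_stub(enrichment, targets, scores, n=10):
    """Hypothetical stand-in that mirrors the plot_gsea() argument names."""
    return {
        "clusters": sorted(enrichment),
        "n_groups": len(targets),
        "score_keys": sorted(scores),
        "n": n,
    }


# Placeholder values standing in for the two outputs of gsea().
enrichment = {"cluster_0": "results table"}
plot_args = {
    "targets": {"drug_a": ["GENE1"]},
    "scores": {"cluster_0": "GSEA input"},
}

# The ** spreads the dictionary keys into the targets and scores arguments.
summary = plot_gsea_stub(enrichment, **plot_args)
print(summary)
```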
- drug2cell.util.prepare_plot_args(adata, targets=None, categories=None)
Prepare the var_names, var_group_positions and var_group_labels arguments for scanpy plotting functions to display scored gene groups and group them nicely. Returns plot_args, a dictionary of the values that can be used with scanpy plotting as **plot_args.
Input
- adata: AnnData
  Point the function to the .uns['drug2cell'] slot computed by the score() function earlier. It's required to remove gene groups that were not represented in the data.
- targets: dict of lists of str, optional (default: None)
  The gene groups to evaluate. Can be targets of known drugs, GO terms, pathway memberships, anything you can assign genes to. If None, will load the ChEMBL-derived drug target sets distributed with the package. Must be the nested=True version of the input as described in the score function.
- categories: str or list of str, optional (default: None)
  If targets=None or nested=True, this argument can be used to subset the gene groups to one or more categories (keys of the original dictionary). In case of the ChEMBL drug targets, these are ATC level 1/level 2 category codes.
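One plausible way to derive the three plotting arguments from a nested dictionary is sketched below. It is an illustration, not the package's implementation: build_plot_args and the group names are hypothetical, available stands in for the feature names of the scored object, and the positions are assumed to be inclusive (start, end) index pairs.

```python
def build_plot_args(nested_targets, available):
    """Collect group names per category, keeping only groups present in the data."""
    var_names, positions, labels = [], [], []
    for category, groups in nested_targets.items():
        present = [g for g in groups if g in available]
        if not present:
            continue  # nothing from this category was scored
        start = len(var_names)
        var_names.extend(present)
        # Assumed convention: inclusive start and end indices into var_names.
        positions.append((start, len(var_names) - 1))
        labels.append(category)
    return {
        "var_names": var_names,
        "var_group_positions": positions,
        "var_group_labels": labels,
    }


nested_targets = {
    "cat_a": {"drug_1": ["GENE1"], "drug_2": ["GENE2"]},
    "cat_b": {"drug_3": ["GENE9"]},
    "cat_c": {"drug_4": ["GENE3"]},
}
available = {"drug_1", "drug_2", "drug_4"}  # drug_3 was not represented

plot_args = build_plot_args(nested_targets, available)
print(plot_args)
```

The resulting dictionary has exactly the three keys named above, so it can be unpacked with ** into a plotting call.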
- drug2cell.chembl.filter_activities(dataframe, drug_max_phase=None, add_drug_mechanism=True, assay_type=None, remove_inactive=True, include_active=True, pchembl_target_column=None, pchembl_threshold=None)
Perform a sequential set of filtering operations on the provided ChEMBL data frame. The order of the filters matches the order of the arguments in the input description. Returns a data frame with the rows fulfilling the resulting criteria.
Input
- dataframe: pd.DataFrame
  The ChEMBL data frame to perform filtering operations on.
- drug_max_phase: int or list of int, optional (default: None)
  Subset the data frame to drugs in the provided clinical stages:
  - Phase 1: Testing of drug on healthy volunteers for dose-ranging
  - Phase 2: Initial testing of drug on patients to assess efficacy and safety
  - Phase 3: Testing of drug on patients to assess efficacy, effectiveness and safety (larger test group)
  - Phase 4: Approved
- add_drug_mechanism: bool, optional (default: True)
  Grant subsequent filtering immunity to rows with drug mechanism information present.
- assay_type: str or list of str, optional (default: None)
  Subset the data frame based on assay type information:
  - Binding (B): Data measuring binding of compound to a molecular target, e.g. Ki, IC50, Kd.
  - Functional (F): Data measuring the biological effect of a compound, e.g. %cell death in a cell line, rat weight.
  - ADMET (A): ADME data, e.g. t1/2, oral bioavailability.
  - Toxicity (T): Data measuring toxicity of a compound, e.g. cytotoxicity.
  - Physicochemical (P): Assays measuring physicochemical properties of the compounds in the absence of biological material, e.g. chemical stability, solubility.
  - Unclassified (U): A small proportion of assays cannot be classified into one of the above categories, e.g. ratio of binding vs efficacy.
- remove_inactive: bool, optional (default: True)
  Subset the data frame to remove inactive drug-target interactions.
- include_active: bool, optional (default: True)
  Grant subsequent filtering immunity to active drug-target interactions.
- pchembl_target_column: str, optional (default: None)
  Use the selected column in the data frame to dictate custom pChEMBL thresholds for each unique value in the column.
- pchembl_threshold: float or dict of float, optional (default: None)
  Subset the data frame to this pChEMBL minimum. If a single float, use that value. If a dict, provide it in conjunction with pchembl_target_column: the keys are the unique values of the specified column, and the entries are the desired threshold for each category.
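The interplay between a per-category pChEMBL threshold and filtering immunity can be sketched as follows. This is only an illustration of the idea: it uses a list of dictionaries instead of a data frame, the column names (target_class, pchembl, has_mechanism) and the filter_by_pchembl helper are hypothetical, and a row is assumed to pass when its value is at or above the cutoff.

```python
# Hypothetical rows standing in for a ChEMBL activity table.
rows = [
    {"drug": "drug_a", "target_class": "kinase", "pchembl": 7.2, "has_mechanism": False},
    {"drug": "drug_b", "target_class": "kinase", "pchembl": 5.1, "has_mechanism": False},
    {"drug": "drug_c", "target_class": "gpcr", "pchembl": 5.5, "has_mechanism": False},
    {"drug": "drug_d", "target_class": "gpcr", "pchembl": 4.0, "has_mechanism": True},
]


def filter_by_pchembl(rows, threshold, target_column=None):
    """Keep rows at or above the threshold; rows with a mechanism are immune."""
    kept = []
    for row in rows:
        if row["has_mechanism"]:
            kept.append(row)  # immunity: retained regardless of pChEMBL
            continue
        if isinstance(threshold, dict):
            cutoff = threshold[row[target_column]]  # per-category cutoff
        else:
            cutoff = threshold  # one cutoff for every row
        if row["pchembl"] >= cutoff:
            kept.append(row)
    return kept


per_class = filter_by_pchembl(rows, {"kinase": 6.0, "gpcr": 5.0}, "target_class")
single = filter_by_pchembl(rows, 6.0)
print([r["drug"] for r in per_class])
print([r["drug"] for r in single])
```

In this sketch drug_d survives both calls despite its low value, which is the behaviour the immunity flags describe.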