Add method to calculate embeddings for variable by distance aggregation#807
for more information, see https://pre-commit.ci
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #807      +/-   ##
==========================================
- Coverage   69.99%   69.75%   -0.24%
==========================================
  Files          39       40       +1
  Lines        5525     5561      +36
  Branches     1029     1037       +8
==========================================
+ Hits         3867     3879      +12
- Misses       1363     1387      +24
  Partials      295      295
…se/squidpy into var_by_distance_clustering
hi @LLehner, thank you for this. Would you mind elaborating a bit on when this would be used? Also, what if the embeddings are pre-calculated, or the user would like to use something other than UMAP; should that be an option? Finally, I think a test would be required before we get this in, thanks!
Hey @giovp, this feature came out of a discussion with @maiiashulman. We ran into a situation in which the "literature-curated" signature for hypoxia was either 20 or 4000 genes, the latter obviously being useless. So we wondered which other genes might show the same spatially variable pattern as a function of distance to a certain cell type (e.g. epithelial). This is essentially a graphical method to see whether a given set of genes (e.g. the 20-gene signature) even varies in a similar pattern. But I agree with your points; if we see that it's actually doing something useful, we should make it a bit more flexible.
@timtreis this function now returns an Additionally the question is whether a The function call would change from: |
Description
Adds a method in `tools` to calculate embeddings of variables by their counts aggregated by distance.

Example usage

    import squidpy as sq

Load example data set:

    adata = sq.datasets.seqfish()

Calculate distances of each observation to a specified anchor point (e.g. cell type or tissue location). Here we use cell type "Endothelium" in the annotation column "celltype_mapped_refined":

    sq.tl.var_by_distance(adata, groups="Endothelium", cluster_key="celltype_mapped_refined")

The resulting distances are stored in `adata.obsm["design_matrix"]`. Now we can calculate the embeddings, which are returned as a new anndata object:

    adata_new = sq.tl.var_embeddings(adata, group="Endothelium", design_matrix_key="design_matrix")

Note that by default the bin of distance 0, meaning the counts that belong to the anchor point, is excluded. This can be changed by setting `include_anchor=True` in `sq.tl.var_embeddings()`.

- `adata_new.X` contains the aggregated var x distance_bin count matrix.
- `adata_new.obs` contains the variables as a categorical matrix, which is required to highlight them in plots.
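
To make the "var x distance_bin" aggregation concrete, here is a minimal pure-Python sketch of the idea. It is not the code in this PR: the helper name `aggregate_by_distance_bin`, the `n_bins` parameter, and the assumption that distances are normalized to [0, 1] are all hypothetical and chosen for illustration only. Only the `include_anchor` flag mirrors the behavior described above, where bin 0 holds the anchor's own counts and is dropped by default.

```python
import math


def aggregate_by_distance_bin(counts, distances, n_bins=2, include_anchor=False):
    """Sum counts per variable within each distance bin (illustrative only).

    counts:    list of rows, one per observation, one value per variable
    distances: one distance per observation, assumed normalized to [0, 1]
    Returns (bins, matrix), where matrix has one row per variable and
    one column per retained bin.
    """
    n_vars = len(counts[0])
    per_bin = {}
    for row, d in zip(counts, distances):
        # Bin 0 is reserved for the anchor itself (distance exactly 0);
        # all other observations fall into bins 1..n_bins.
        b = 0 if d == 0 else math.ceil(d * n_bins)
        if b == 0 and not include_anchor:
            continue
        totals = per_bin.setdefault(b, [0] * n_vars)
        for j, value in enumerate(row):
            totals[j] += value
    bins = sorted(per_bin)
    # Transpose so that rows are variables and columns are distance bins.
    matrix = [[per_bin[b][j] for b in bins] for j in range(n_vars)]
    return bins, matrix


# Four observations, two variables; the first observation is the anchor.
counts = [[1, 0], [2, 1], [0, 3], [4, 4]]
distances = [0.0, 0.25, 0.75, 1.0]

bins, matrix = aggregate_by_distance_bin(counts, distances)
bins_all, matrix_all = aggregate_by_distance_bin(counts, distances, include_anchor=True)
```

With the default settings the anchor's counts are left out, so `matrix` has one column fewer than `matrix_all`. Each row of the result is the distance profile of one variable, which is the kind of input an embedding of variables would then be computed on.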
TODO