STAGATE – deciphering spatial domains from spatially resolved transcriptomics
Installation (tensorflow1 framework)
Software dependencies
scanpy
tensorflow==1.15.0
The use of the mclust algorithm requires the rpy2 package and the mclust package. See https://pypi.org/project/rpy2/ and https://cran.r-project.org/web/packages/mclust/index.html for details.
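If you want to verify the R setup before running the tutorials, a minimal check such as the following can be used. This snippet is not part of STAGATE; the R_HOME path is an assumption and must be adjusted to your local R installation.
# Optional environment check (not part of STAGATE): confirm that rpy2 can start R
# and that the mclust package is installed. Adjust R_HOME to your own installation.
import os
os.environ['R_HOME'] = '/usr/lib/R'  # hypothetical path; on Windows e.g. r'D:\Program Files\R\R-4.0.3'
import rpy2.robjects as robjects
robjects.r('library(mclust)')        # raises an error if mclust is not installed
print(robjects.r('as.character(packageVersion("mclust"))')[0])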
Installation
Download the STAGATE code from https://github.com/QIFEIDKN/STGATE
cd STAGATE-main
python setup.py build
python setup.py install
import STAGATE
Installation (pyG framework)
Software dependencies
scanpy
pytorch
pyG
The use of the mclust algorithm requires the rpy2 package and the mclust package. See https://pypi.org/project/rpy2/ and https://cran.r-project.org/web/packages/mclust/index.html for details.
The cell type-aware module is not yet supported by STAGATE_pyG.
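Before installing STAGATE_pyG, you may want to confirm that the dependencies listed above are importable. A minimal check, assuming the packages are installed as torch and torch_geometric, could look like this:
# Optional environment check (not part of STAGATE_pyG): verify that PyTorch and
# PyTorch Geometric are importable and report whether a CUDA device is visible.
import torch
import torch_geometric
print('torch', torch.__version__, '| pyG', torch_geometric.__version__)
print('CUDA available:', torch.cuda.is_available())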
Installation
Download the STAGATE_pyG code from https://github.com/QIFEIDKN/STGATE_pyG
cd STAGATE_pyG-main
python setup.py build
python setup.py install
import STAGATE_pyG
Tutorial 1: 10x Visium (DLPFC dataset)
Here we present our re-analysis of sample 151676 of the dorsolateral prefrontal cortex (DLPFC) dataset. Maynard et al. manually annotated the DLPFC layers and white matter (WM) based on morphological features and gene markers.
This tutorial demonstrates how to identify spatial domains on 10x Visium data using STAGATE. The processed data are available at https://github.com/LieberInstitute/spatialLIBD. We downloaded the manual annotation from the spatialLIBD package and provide it at https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing.
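The code below assumes a data layout like the following under the Data folder. This layout is an assumption inferred from the read calls in this tutorial (sc.read_visium additionally expects the Space Ranger 'spatial' subfolder); adjust the paths to your own setup.
# Assumed directory layout for this tutorial (adjust to your own paths):
# Data/151676/151676_filtered_feature_bc_matrix.h5   <- filtered count matrix from Space Ranger
# Data/151676/spatial/                               <- Space Ranger 'spatial' folder (images, tissue positions)
# Data/151676/151676_truth.txt                       <- manual layer annotation downloaded from the link above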
Preparation
[1]:
import warnings
warnings.filterwarnings("ignore")
[2]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
[3]:
from sklearn.metrics.cluster import adjusted_rand_score
[4]:
import STAGATE
[5]:
# the location of R (used for the mclust clustering)
os.environ['R_HOME'] = r'D:\Program Files\R\R-4.0.3'
os.environ['R_USER'] = r'D:\ProgramData\Anaconda3\Lib\site-packages\rpy2'
[6]:
section_id = '151676'
[7]:
input_dir = os.path.join('Data', section_id)
adata = sc.read_visium(path=input_dir, count_file=section_id+'_filtered_feature_bc_matrix.h5')
adata.var_names_make_unique()
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 3460 × 33538
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
[9]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
[10]:
# read the annotation
Ann_df = pd.read_csv(os.path.join('Data', section_id, section_id+'_truth.txt'), sep='\t', header=None, index_col=0)
Ann_df.columns = ['Ground Truth']
[11]:
adata.obs['Ground Truth'] = Ann_df.loc[adata.obs_names, 'Ground Truth']
[12]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.spatial(adata, img_key="hires", color=["Ground Truth"])
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical

Constructing the spatial network
[13]:
STAGATE.Cal_Spatial_Net(adata, rad_cutoff=150)
STAGATE.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 20052 edges, 3460 cells.
5.7954 neighbors per cell on average.

Running STAGATE
[14]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input: (3460, 3000)
WARNING:tensorflow:From D:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\ops\math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|████████████████████████████████████████████████████████████████████████████████| 500/500 [00:28<00:00, 17.60it/s]
[15]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
adata = STAGATE.mclust_R(adata, used_obsm='STAGATE', num_cluster=7)
[16]:
obs_df = adata.obs.dropna()
ARI = adjusted_rand_score(obs_df['mclust'], obs_df['Ground Truth'])
print('Adjusted rand index = %.2f' %ARI)
Adjusted rand index = 0.60
[17]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color=["mclust", "Ground Truth"], title=['STAGATE (ARI=%.2f)'%ARI, "Ground Truth"])

[18]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.spatial(adata, color=["mclust", "Ground Truth"], title=['STAGATE (ARI=%.2f)'%ARI, "Ground Truth"])

Spatial trajectory inference (PAGA)
[19]:
used_adata = adata[adata.obs['Ground Truth']!='nan',]
used_adata
[19]:
View of AnnData object with n_obs × n_vars = 3431 × 33538
obs: 'in_tissue', 'array_row', 'array_col', 'Ground Truth', 'mclust'
var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
uns: 'spatial', 'hvg', 'log1p', 'Ground Truth_colors', 'Spatial_Net', 'neighbors', 'umap', 'mclust_colors'
obsm: 'spatial', 'STAGATE', 'X_umap'
obsp: 'distances', 'connectivities'
[20]:
sc.tl.paga(used_adata, groups='Ground Truth')
Trying to set attribute `.uns` of view, copying.
[21]:
plt.rcParams["figure.figsize"] = (4,3)
sc.pl.paga_compare(used_adata, legend_fontsize=10, frameon=False, size=20,
title=section_id+'_STGATE', legend_fontoutline=2, show=False)
[21]:
[<matplotlib.axes._axes.Axes at 0x183007d1e48>,
<matplotlib.axes._axes.Axes at 0x18300411f98>]

Tutorial 2: The usage of cell type-aware module for 10x Visium
To better characterize the spatial similarity at the boundary of spatial domains, STAGATE adopts an attention mechanism to adaptively learn the similarity of neighboring spots, and an optional cell type-aware module that integrates the pre-clustering of gene expressions.
This tutorial demonstrates how to identify spatial domains on 10x Visium data using STAGATE with the cell type-aware module.
In this tutorial, we focus on the Adult Mouse Brain Section 1 (Coronal) dataset from the 10x Genomics website (https://www.10xgenomics.com/resources/datasets/adult-mouse-brain-section-1-coronal-stains-dapi-anti-neu-n-1-standard-1-1-0).
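The cell type-aware module described above can be pictured as pruning the spatial network with the pre-clustering labels, after which alpha weights the pruned network against the full spatial network during training. The sketch below is only an illustration of this idea; it is not the library's internal code, and it assumes the edge list uses the 'Cell1'/'Cell2' columns found in adata.uns['Spatial_Net'].
# Illustrative sketch only: keep spatial-network edges whose two endpoints share the
# same pre-clustering label; alpha then weights this pruned (cell type-aware) network
# against the full spatial network inside train_STAGATE.
import pandas as pd

def prune_spatial_net(spatial_net: pd.DataFrame, pre_labels: pd.Series) -> pd.DataFrame:
    # spatial_net: edge list with 'Cell1' and 'Cell2' columns (assumed format of adata.uns['Spatial_Net'])
    # pre_labels: pre-clustering label per spot, indexed by spot barcode
    same_cluster = (pre_labels.loc[spatial_net['Cell1']].values ==
                    pre_labels.loc[spatial_net['Cell2']].values)
    return spatial_net.loc[same_cluster]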
Preparation
[1]:
import warnings
warnings.filterwarnings("ignore")
[2]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
[3]:
import STAGATE
[4]:
adata = sc.read_visium(path='Data', count_file='V1_Adult_Mouse_Brain_Coronal_Section_1_filtered_feature_bc_matrix.h5')
adata.var_names_make_unique()
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[5]:
adata
[5]:
AnnData object with n_obs × n_vars = 2903 × 32285
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
[6]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
[7]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.spatial(adata, img_key="hires")
... storing 'feature_types' as categorical
... storing 'genome' as categorical

Constructing the spatial network
[8]:
STAGATE.Cal_Spatial_Net(adata, rad_cutoff=300)
STAGATE.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 16982 edges, 2903 cells.
5.8498 neighbors per cell on average.

Running STAGATE with cell type-aware module
The parameter alpha is the weight of the cell type-aware spatial network, and pre_resolution is the resolution parameter of the pre-clustering step.
[9]:
adata = STAGATE.train_STAGATE(adata, alpha=0.5, pre_resolution=0.2,
n_epochs=1000, save_attention=True)
Size of Input: (2903, 3000)
WARNING:tensorflow:From D:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\ops\math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
------Pre-clustering using louvain with resolution=0.20
0%| | 0/1000 [00:00<?, ?it/s]
------Pruning the graph...
16982 edges before pruning.
15058 edges after pruning.
100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:54<00:00, 18.26it/s]
[10]:
# pre-clustering result
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.spatial(adata, img_key="hires", color="expression_louvain_label", size=1.5, title='pre-clustering result')

[11]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
sc.tl.louvain(adata, resolution=1)
[12]:
adata.uns['louvain_colors']=['#aec7e8', '#9edae5', '#d62728', '#dbdb8d', '#ff9896',
'#8c564b', '#696969', '#778899', '#17becf', '#ffbb78',
'#e377c2', '#98df8a', '#aa40fc', '#c5b0d5', '#c49c94',
'#f7b6d2', '#279e68', '#b5bd61', '#ad494a', '#8c6d31',
'#1f77b4', '#ff7f0e']
[13]:
plt.rcParams["figure.figsize"] = (4, 4)
sc.pl.spatial(adata, img_key="hires", color="louvain", size=1.5)

[14]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color="louvain", legend_loc='on data', s=10)

Visualization of the attention layer
[15]:
import matplotlib as mpl
import networkx as nx
[16]:
att_df = pd.DataFrame(adata.uns['STAGATE_attention'][0].toarray(), index=adata.obs_names, columns=adata.obs_names)
att_df = att_df.values
for it in range(att_df.shape[0]):
att_df[it, it] = 0
[17]:
G_atten = nx.from_numpy_matrix(att_df)
M = G_atten.number_of_edges()
edge_colors = range(2, M + 2)
[18]:
coor_df = pd.DataFrame(adata.obsm['spatial'].copy(), index=adata.obs_names)
coor_df[1] = -1 * coor_df[1]
image_pos = dict(zip(range(coor_df.shape[0]), [np.array(coor_df.iloc[it,]) for it in range(coor_df.shape[0])]))
[19]:
labels = nx.get_edge_attributes(G_atten,'weight')
[20]:
fig, ax = plt.subplots(figsize=[9,10])
nx.draw_networkx_nodes(G_atten, image_pos, node_size=5, ax=ax)
cmap = plt.cm.plasma
edges = nx.draw_networkx_edges(G_atten, image_pos, edge_color=labels.values(),width=4, ax=ax,
edge_cmap=cmap,edge_vmax=0.25,edge_vmin=0.05)
ax = plt.gca()
sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin = 0.05, vmax=0.25))
sm._A = []
plt.colorbar(sm)
ax.set_axis_off()

Running STAGATE without cell type-aware module (for comparison)
[21]:
adata = STAGATE.train_STAGATE(adata, alpha=0, n_epochs=1000, save_attention=True)
Size of Input: (2903, 3000)
100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:47<00:00, 20.97it/s]
[22]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
sc.tl.louvain(adata, resolution=1)
[23]:
adata.uns['louvain_colors']=['#aec7e8', '#9edae5', '#d62728', '#dbdb8d', '#ff9896',
'#8c564b', '#696969', '#778899', '#17becf', '#ffbb78',
'#e377c2', '#98df8a', '#aa40fc', '#c5b0d5', '#c49c94',
'#f7b6d2', '#279e68', '#b5bd61', '#ad494a', '#8c6d31',
'#1f77b4', '#ff7f0e']
[24]:
plt.rcParams["figure.figsize"] = (4, 4)
sc.pl.spatial(adata, img_key="hires", color="louvain", size=1.5)

[25]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color="louvain", legend_loc='on data', s=10)

Visualization of the attention layer (without cell type-aware module)
[26]:
att_df = pd.DataFrame(adata.uns['STAGATE_attention'][0].toarray(), index=adata.obs_names, columns=adata.obs_names)
att_df = att_df.values
for it in range(att_df.shape[0]):
att_df[it, it] = 0
[27]:
G_atten = nx.from_numpy_matrix(att_df)
M = G_atten.number_of_edges()
edge_colors = range(2, M + 2)
[28]:
coor_df = pd.DataFrame(adata.obsm['spatial'].copy(), index=adata.obs_names)
coor_df[1] = -1 * coor_df[1]
image_pos = dict(zip(range(coor_df.shape[0]), [np.array(coor_df.iloc[it,]) for it in range(coor_df.shape[0])]))
[29]:
labels = nx.get_edge_attributes(G_atten,'weight')
[30]:
fig, ax = plt.subplots(figsize=[9,10])
nx.draw_networkx_nodes(G_atten, image_pos, node_size=5, ax=ax)
cmap = plt.cm.plasma
edges = nx.draw_networkx_edges(G_atten, image_pos, edge_color=labels.values(),width=4, ax=ax,
edge_cmap=cmap,edge_vmax=0.25,edge_vmin=0.05)
ax = plt.gca()
sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin = 0.05, vmax=0.25))
sm._A = []
plt.colorbar(sm)
ax.set_axis_off()

Tutorial 3: Slide-seqV2 mouse olfactory bulb
This tutorial demonstrates how to identify spatial domains on Slide-seqV2 data.
In this tutorial, we focus on the Slide-seqV2 mouse olfactory bulb data (Puck_200127_15 from https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2#study-summary).
We removed spots outside the main tissue area; the retained spots can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing.
Preparation
[1]:
import warnings
warnings.filterwarnings("ignore")
[2]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
[3]:
import STAGATE
[4]:
input_dir = 'Data'
counts_file = os.path.join(input_dir, 'Puck_200127_15.digital_expression.txt')
coor_file = os.path.join(input_dir, 'Puck_200127_15_bead_locations.csv')
[5]:
counts = pd.read_csv(counts_file, sep='\t', index_col=0)
coor_df = pd.read_csv(coor_file, index_col=0)
print(counts.shape, coor_df.shape)
(21220, 21724) (21724, 2)
[6]:
adata = sc.AnnData(counts.T)
adata.var_names_make_unique()
coor_df = coor_df.loc[adata.obs_names, ['xcoord', 'ycoord']]
adata.obsm["spatial"] = coor_df.to_numpy()
[7]:
sc.pp.calculate_qc_metrics(adata, inplace=True)
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 21724 × 21220
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
obsm: 'spatial'
[9]:
plt.rcParams["figure.figsize"] = (6,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=6, show=False)
plt.title('')
plt.axis('off')
[9]:
(-289.81710000000004, 6312.7151, 173.30850000000004, 5709.8615)

[ ]:
# can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing
used_barcode = pd.read_csv('Data/used_barcodes.txt', sep='\t', header=None)
used_barcode = used_barcode[0]
[11]:
adata = adata[used_barcode,]
[12]:
plt.rcParams["figure.figsize"] = (5,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=10, show=False, title='Removing spots outside the main tissue area')
plt.axis('off')
[12]:
(588.545, 5108.555, 847.6700000000001, 5670.73)

[13]:
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
Trying to set attribute `.var` of view, copying.
After filtering:  (20139, 11750)
[14]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
Constructing the spatial network
[15]:
STAGATE.Cal_Spatial_Net(adata, rad_cutoff=50)
STAGATE.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 228288 edges, 20139 cells.
11.3356 neighbors per cell on average.

Running STAGATE
[16]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input: (20139, 3000)
WARNING:tensorflow:From D:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\ops\math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|████████████████████████████████████████████████████████████████████████████████| 500/500 [14:09<00:00, 1.70s/it]
[17]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[18]:
sc.tl.louvain(adata, resolution=0.5)
[19]:
adata.obsm["spatial"] = adata.obsm["spatial"] * (-1)
[20]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="louvain",s=6, show=False, title='STAGATE')
plt.axis('off')
[20]:
(-5108.555, -588.545, -5670.73, -847.6700000000001)

[21]:
sc.pl.umap(adata, color='louvain', title='STAGATE')

SCANPY results (for comparison)
[22]:
sc.pp.pca(adata, n_comps=30)
[23]:
sc.pp.neighbors(adata, use_rep='X_pca')
sc.tl.louvain(adata, resolution=0.5)
sc.tl.umap(adata)
[24]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="louvain",s=6, show=False, title='SCANPY')
plt.axis('off')
[24]:
(-5108.555, -588.545, -5670.73, -847.6700000000001)

[25]:
sc.pl.umap(adata, color='louvain', title='SCANPY')

Tutorial 4: Stereo-seq mouse olfactory bulb
This tutorial demonstrates how to identify spatial domains on Stereo-seq data.
In this tutorial, we focus on the Stereo-seq mouse olfactory bulb data (https://github.com/JinmiaoChenLab/SEDR_analyses/).
We removed spots outside the main tissue area; the retained spots can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing.
Preparation
[1]:
import warnings
warnings.filterwarnings("ignore")
[2]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
[3]:
import STAGATE
[4]:
counts_file = os.path.join('Data/RNA_counts.tsv')
coor_file = os.path.join('Data/position.tsv')
[5]:
counts = pd.read_csv(counts_file, sep='\t', index_col=0)
coor_df = pd.read_csv(coor_file, sep='\t')
print(counts.shape, coor_df.shape)
(27106, 19527) (19527, 3)
[6]:
counts.columns = ['Spot_'+str(x) for x in counts.columns]
coor_df.index = coor_df['label'].map(lambda x: 'Spot_'+str(x))
coor_df = coor_df.loc[:, ['x','y']]
[7]:
coor_df.head()
[7]:
| label | x | y |
|---|---|---|
| Spot_1 | 12555.007833 | 6307.537859 |
| Spot_2 | 12623.666667 | 6297.166667 |
| Spot_3 | 12589.567164 | 6302.552239 |
| Spot_4 | 12642.495050 | 6307.386139 |
| Spot_5 | 13003.333333 | 6307.990991 |
[8]:
adata = sc.AnnData(counts.T)
adata.var_names_make_unique()
[9]:
adata
[9]:
AnnData object with n_obs × n_vars = 19527 × 27106
[10]:
coor_df = coor_df.loc[adata.obs_names, ['y', 'x']]
adata.obsm["spatial"] = coor_df.to_numpy()
sc.pp.calculate_qc_metrics(adata, inplace=True)
[11]:
plt.rcParams["figure.figsize"] = (5,4)
sc.pl.embedding(adata, basis="spatial", color="n_genes_by_counts", show=False)
plt.title("")
plt.axis('off')
[11]:
(6002.432692307693, 12486.580128205129, 9908.545833333334, 15086.093055555555)

[12]:
used_barcode = pd.read_csv(os.path.join('Data/used_barcodes.txt'), sep='\t', header=None)
used_barcode = used_barcode[0]
adata = adata[used_barcode,]
[13]:
adata
[13]:
View of AnnData object with n_obs × n_vars = 19109 × 27106
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
obsm: 'spatial'
[14]:
plt.rcParams["figure.figsize"] = (5,4)
sc.pl.embedding(adata, basis="spatial", color="n_genes_by_counts", show=False)
plt.title("")
plt.axis('off')
[14]:
(6005.190789473685, 12428.6600877193, 9986.774763741741, 15062.302776957436)

[15]:
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
Trying to set attribute `.var` of view, copying.
After filtering:  (19109, 14376)
[16]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
Constructing the spatial network
[17]:
STAGATE.Cal_Spatial_Net(adata, rad_cutoff=50)
STAGATE.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 144318 edges, 19109 cells.
7.5524 neighbors per cell on average.

Running STAGATE
[18]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input: (19109, 3000)
WARNING:tensorflow:From D:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\ops\math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|████████████████████████████████████████████████████████████████████████████████| 500/500 [12:33<00:00, 1.51s/it]
[19]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[20]:
sc.tl.louvain(adata, resolution=0.8)
[21]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="louvain",s=6, show=False, title='STAGATE')
plt.axis('off')
[21]:
(6005.190789473685, 12428.6600877193, 9986.774763741741, 15062.302776957436)

[22]:
sc.pl.umap(adata, color='louvain', title='STAGATE')

SCANPY results (for comparison)
[23]:
sc.pp.pca(adata, n_comps=30)
[24]:
sc.pp.neighbors(adata, use_rep='X_pca')
sc.tl.louvain(adata, resolution=0.8)
sc.tl.umap(adata)
[25]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="louvain",s=6, show=False, title='SCANPY')
plt.axis('off')
[25]:
(6005.190789473685, 12428.6600877193, 9986.774763741741, 15062.302776957436)

[26]:
sc.pl.umap(adata, color='louvain', title='SCANPY')

Tutorial 5: 3D spatial domain identification
We extended STAGATE to 3D spatial domain identification by simultaneously considering the 2D SNN within each section and the neighboring spots between adjacent sections. The key idea of the 3D SNN is that biological differences between consecutive sections should be continuous, so enhancing the similarity between adjacent sections helps eliminate discontinuous, section-specific technical noise.
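Conceptually, the 3D SNN is the union of within-section radius neighbors and radius neighbors between adjacent sections. The sketch below only illustrates this idea and is not STAGATE's internal implementation; Cal_Spatial_Net_3D, used later in this tutorial, constructs the network for you.
# Conceptual sketch only (not STAGATE's internal implementation): collect radius
# neighbors within a section and between two adjacent sections.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def radius_edges(coor_a, coor_b, cutoff):
    # return (i, j) pairs with Euclidean distance <= cutoff between two coordinate arrays
    nbrs = NearestNeighbors(radius=cutoff).fit(coor_b)
    _, indices = nbrs.radius_neighbors(coor_a)
    return [(i, j) for i, js in enumerate(indices) for j in js
            if not (coor_a is coor_b and i == j)]  # drop self-edges within a section

# within-section edges:  radius_edges(coor_s, coor_s, rad_cutoff_2D)       for each section s
# between-section edges: radius_edges(coor_s, coor_next, rad_cutoff_Zaxis) for adjacent sections
# the 3D SNN is the union of all of these edge sets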
In this tutorial, we apply STAGATE to a pseudo-3D ST dataset constructed by aligning the spots of the “cord-like” structure across seven hippocampus sections profiled by Slide-seq.
The original Slide-seq data can be downloaded from https://portals.broadinstitute.org/single_cell/study/slide-seq-study. We provide the processed data and aligned coordinates at https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing.
Preparation
[1]:
import warnings
warnings.filterwarnings("ignore")
[2]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
[3]:
import STAGATE
[4]:
data = pd.read_csv('Data/3D_Hippo_expression.txt', sep='\t', index_col=0)
Aligned_coor = pd.read_csv('Data/ICP_Align_Coor.txt', sep='\t', index_col=0)
[5]:
Aligned_coor.head()
[5]:
| | X | Y | Z | Section |
|---|---|---|---|---|
| AAAAACAACCAAT-13 | 3946.072727 | 3599.436364 | 6000 | Puck_180531_13 |
| AAAAGTCCATTAT-13 | 2401.628788 | 3550.333333 | 6000 | Puck_180531_13 |
| AAACAACGCGCGA-13 | 2921.968153 | 3353.687898 | 6000 | Puck_180531_13 |
| AAACAATTCAATA-13 | 2711.556338 | 3365.542254 | 6000 | Puck_180531_13 |
| AAACACGCTGCCC-13 | 2351.863354 | 4265.447205 | 6000 | Puck_180531_13 |
[6]:
adata = sc.AnnData(data)
adata
[6]:
AnnData object with n_obs × n_vars = 10908 × 9420
[7]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
[8]:
# loading metadata and aligned coordinates
adata.obs['X'] = Aligned_coor.loc[adata.obs_names, 'X']
adata.obs['Y'] = Aligned_coor.loc[adata.obs_names, 'Y']
adata.obs['Z'] = Aligned_coor.loc[adata.obs_names, 'Z']
adata.obs['Section_id'] = Aligned_coor.loc[adata.obs_names, 'Section']
[9]:
# loading the spatial locations
adata.obsm['spatial'] = adata.obs.loc[:, ['X', 'Y']].values
[10]:
section_colors = ['#02899A', '#0E994D', '#86C049', '#FBB21F', '#F48022', '#DA5326', '#BA3326']
[11]:
fig = plt.figure(figsize=(4, 4))
ax1 = plt.axes(projection='3d')
for it, label in enumerate(np.unique(adata.obs['Section_id'])):
temp_Coor = adata.obs.loc[adata.obs['Section_id']==label, :]
temp_xd = temp_Coor['X']
temp_yd = temp_Coor['Y']
temp_zd = temp_Coor['Z']
ax1.scatter3D(temp_xd, temp_yd, temp_zd, c=section_colors[it],s=0.2, marker="o", label=label)
ax1.set_xlabel('')
ax1.set_ylabel('')
ax1.set_zlabel('')
ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_zticklabels([])
plt.legend(bbox_to_anchor=(1,0.8), markerscale=10, frameon=False)
ax1.elev = 45
ax1.azim = -20
plt.show()

Constructing 3D spatial networks
[12]:
section_order = ['Puck_180531_13', 'Puck_180531_16', 'Puck_180531_17',
'Puck_180531_18', 'Puck_180531_19', 'Puck_180531_22',
'Puck_180531_23']
[13]:
STAGATE.Cal_Spatial_Net_3D(adata, rad_cutoff_2D=50, rad_cutoff_Zaxis=50,
key_section='Section_id', section_order = section_order, verbose=True)
Radius used for 2D SNN: 50
Radius used for SNN between sections: 50
------Calculating 2D SNN of section Puck_180531_13
Trying to set attribute `.uns` of view, copying.
This graph contains 22974 edges, 1692 cells.
13.5780 neighbors per cell on average.
------Calculating 2D SNN of section Puck_180531_16
Trying to set attribute `.uns` of view, copying.
This graph contains 20546 edges, 1476 cells.
13.9201 neighbors per cell on average.
------Calculating 2D SNN of section Puck_180531_17
Trying to set attribute `.uns` of view, copying.
This graph contains 11794 edges, 1138 cells.
10.3638 neighbors per cell on average.
------Calculating 2D SNN of section Puck_180531_18
Trying to set attribute `.uns` of view, copying.
This graph contains 18278 edges, 1363 cells.
13.4101 neighbors per cell on average.
------Calculating 2D SNN of section Puck_180531_19
Trying to set attribute `.uns` of view, copying.
This graph contains 22344 edges, 1788 cells.
12.4966 neighbors per cell on average.
------Calculating 2D SNN of section Puck_180531_22
Trying to set attribute `.uns` of view, copying.
This graph contains 24544 edges, 1835 cells.
13.3755 neighbors per cell on average.
------Calculating 2D SNN of section Puck_180531_23
Trying to set attribute `.uns` of view, copying.
This graph contains 18922 edges, 1616 cells.
11.7092 neighbors per cell on average.
------Calculating SNN between adjacent section Puck_180531_13 and Puck_180531_16.
Trying to set attribute `.uns` of view, copying.
This graph contains 34482 edges, 3168 cells.
10.8845 neighbors per cell on average.
------Calculating SNN between adjacent section Puck_180531_16 and Puck_180531_17.
Trying to set attribute `.uns` of view, copying.
This graph contains 23078 edges, 2614 cells.
8.8286 neighbors per cell on average.
------Calculating SNN between adjacent section Puck_180531_17 and Puck_180531_18.
Trying to set attribute `.uns` of view, copying.
This graph contains 25284 edges, 2501 cells.
10.1096 neighbors per cell on average.
------Calculating SNN between adjacent section Puck_180531_18 and Puck_180531_19.
Trying to set attribute `.uns` of view, copying.
This graph contains 24944 edges, 3151 cells.
7.9162 neighbors per cell on average.
------Calculating SNN between adjacent section Puck_180531_19 and Puck_180531_22.
Trying to set attribute `.uns` of view, copying.
This graph contains 26128 edges, 3623 cells.
7.2117 neighbors per cell on average.
------Calculating SNN between adjacent section Puck_180531_22 and Puck_180531_23.
Trying to set attribute `.uns` of view, copying.
This graph contains 21110 edges, 3451 cells.
6.1171 neighbors per cell on average.
3D SNN contains 294428 edges, 10908 cells.
26.9919 neighbors per cell on average.
Running STAGATE with 3D spatial networks
[14]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input: (10908, 3000)
WARNING:tensorflow:From /home/xzhou/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2022-04-29 21:26:31.725355: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-04-29 21:26:31.748905: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499910000 Hz
2022-04-29 21:26:31.750571: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cb33581890 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-04-29 21:26:31.750614: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-04-29 21:26:31.754849: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-04-29 21:26:32.055109: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cb35070bb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-04-29 21:26:32.055162: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2022-04-29 21:26:32.055175: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2022-04-29 21:26:32.057154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:65:00.0
2022-04-29 21:26:32.057729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:b3:00.0
2022-04-29 21:26:32.057850: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:26:32.057894: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:26:32.057927: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:26:32.057959: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:26:32.057992: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:26:32.058026: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:26:32.058057: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2022-04-29 21:26:32.058061: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-04-29 21:26:32.058076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-29 21:26:32.058080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0 1
2022-04-29 21:26:32.058083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N N
2022-04-29 21:26:32.058086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1: N N
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [08:42<00:00, 1.05s/it]
[15]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[16]:
num_cluster = 4
adata = STAGATE.mclust_R(adata, num_cluster, used_obsm='STAGATE')
R[write to console]: Package 'mclust' version 5.4.7
Type 'citation("mclust")' for citing this R package in publications.
[17]:
adata.uns['Section_id_colors'] = ['#02899A', '#0E994D', '#86C049', '#FBB21F', '#F48022', '#DA5326', '#BA3326']
[18]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color=['mclust', 'Section_id'])
... storing 'Section_id' as categorical

[19]:
fig = plt.figure(figsize=(4, 4))
ax1 = plt.axes(projection='3d')
for it, label in enumerate(np.unique(adata.obs['mclust'])):
temp_Coor = adata.obs.loc[adata.obs['mclust']==label, :]
temp_xd = temp_Coor['X']
temp_yd = temp_Coor['Y']
temp_zd = temp_Coor['Z']
ax1.scatter3D(temp_xd, temp_yd, temp_zd, c=adata.uns['mclust_colors'][it],s=0.2, marker="o", label=label)
ax1.set_xlabel('')
ax1.set_ylabel('')
ax1.set_zlabel('')
ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_zticklabels([])
plt.legend(bbox_to_anchor=(1.2,0.8), markerscale=10, frameon=False)
plt.title('STAGATE-3D')
ax1.elev = 45
ax1.azim = -20
plt.show()

Running STAGATE with 2D spatial networks (for comparison)
The 2D spatial networks are saved in adata.uns['Spatial_Net_2D']. For comparison, we replace the 3D spatial network with the 2D spatial networks and run STAGATE.
[20]:
adata_2D = adata.copy()
[21]:
adata_2D.uns['Spatial_Net'] = adata.uns['Spatial_Net_2D'].copy()
[22]:
adata_2D = STAGATE.train_STAGATE(adata_2D, alpha=0)
Size of Input: (10908, 3000)
2022-04-29 21:36:00.700666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:65:00.0
2022-04-29 21:36:00.701352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:b3:00.0
2022-04-29 21:36:00.701516: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:36:00.701577: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:36:00.701612: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:36:00.701646: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:36:00.701681: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:36:00.701713: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2022-04-29 21:36:00.701747: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2022-04-29 21:36:00.701752: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-04-29 21:36:00.701779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-29 21:36:00.701783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0 1
2022-04-29 21:36:00.701787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N N
2022-04-29 21:36:00.701789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1: N N
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [08:15<00:00, 1.01it/s]
[23]:
sc.pp.neighbors(adata_2D, use_rep='STAGATE')
sc.tl.umap(adata_2D)
[24]:
num_cluster = 4
adata_2D = STAGATE.mclust_R(adata_2D, num_cluster, used_obsm='STAGATE')
[25]:
adata_2D.uns['Section_id_colors'] = ['#02899A', '#0E994D', '#86C049', '#FBB21F', '#F48022', '#DA5326', '#BA3326']
[26]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata_2D, color=['mclust', 'Section_id'])

[27]:
fig = plt.figure(figsize=(4, 4))
ax1 = plt.axes(projection='3d')
for it, label in enumerate(np.unique(adata_2D.obs['mclust'])):
temp_Coor = adata_2D.obs.loc[adata_2D.obs['mclust']==label, :]
temp_xd = temp_Coor['X']
temp_yd = temp_Coor['Y']
temp_zd = temp_Coor['Z']
ax1.scatter3D(temp_xd, temp_yd, temp_zd, c=adata_2D.uns['mclust_colors'][it],s=0.2, marker="o", label=label)
ax1.set_xlabel('')
ax1.set_ylabel('')
ax1.set_zlabel('')
ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax1.set_zticklabels([])
plt.legend(bbox_to_anchor=(1.2,0.8), markerscale=10, frameon=False)
plt.title('STAGATE-2D')
ax1.elev = 45
ax1.azim = -20
plt.show()

STAGATE-2D vs STAGATE-3D
[28]:
fig, axs = plt.subplots(2, 2, figsize=(8,8))
sc.pl.umap(adata_2D, color='mclust', title='mclust (STAGATE-2D)', show=False, ax=axs[0,0])
sc.pl.umap(adata_2D, color='Section_id', title='Section_id (STAGATE-2D)', show=False, ax=axs[0,1])
sc.pl.umap(adata, color='mclust', title='mclust (STAGATE-3D)', show=False, ax=axs[1,0])
sc.pl.umap(adata, color='Section_id', title='Section_id (STAGATE-3D)', show=False, ax=axs[1,1])
[28]:
<AxesSubplot:title={'center':'Section_id (STAGATE-3D)'}, xlabel='UMAP1', ylabel='UMAP2'>

Tutorial 6: Denoising
This tutorial demonstrates how to denoise gene expressions. The data used here are the same as in Tutorial 1 (sample 151676 of the DLPFC dataset).
Preparation
[1]:
import warnings
warnings.filterwarnings("ignore")
[2]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
[3]:
import STAGATE
[4]:
section_id = '151676'
[5]:
input_dir = os.path.join('Data', section_id)
adata = sc.read_visium(path=input_dir, count_file=section_id+'_filtered_feature_bc_matrix.h5')
adata.var_names_make_unique()
sc.pp.calculate_qc_metrics(adata, inplace=True)
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[6]:
adata
[6]:
AnnData object with n_obs × n_vars = 3460 × 33538
obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
uns: 'spatial'
obsm: 'spatial'
[7]:
adata = adata[:,adata.var['total_counts']>100]
adata
[7]:
View of AnnData object with n_obs × n_vars = 3460 × 10747
obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
uns: 'spatial'
obsm: 'spatial'
[8]:
#Normalization
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
[9]:
# read the annotation
Ann_df = pd.read_csv(os.path.join('Data', section_id, section_id+'_truth.txt'), sep='\t', header=None, index_col=0)
Ann_df.columns = ['Ground Truth']
[10]:
adata.obs['Ground Truth'] = Ann_df.loc[adata.obs_names, 'Ground Truth']
[11]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.spatial(adata, img_key="hires", color=["Ground Truth"])
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical

Constructing the spatial network
[12]:
STAGATE.Cal_Spatial_Net(adata, rad_cutoff=150)
STAGATE.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 20052 edges, 3460 cells.
5.7954 neighbors per cell on average.

Denoising expression using STAGATE
When performing imputation and denoising, set save_reconstrction=True. The reconstructed expressions are saved in adata.layers['STAGATE_ReX'].
[13]:
adata = STAGATE.train_STAGATE(adata, alpha=0, save_reconstrction=True)
Size of Input: (3460, 10747)
WARNING:tensorflow:From D:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\ops\math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|████████████████████████████████████████████████████████████████████████████████| 500/500 [00:51<00:00, 9.69it/s]
[14]:
adata
[14]:
AnnData object with n_obs × n_vars = 3460 × 10747
obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Ground Truth'
var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
uns: 'spatial', 'log1p', 'Ground Truth_colors', 'Spatial_Net'
obsm: 'spatial', 'STAGATE'
layers: 'STAGATE_ReX'
[15]:
plot_gene = 'ATP2B4'
fig, axs = plt.subplots(1, 2, figsize=(8, 4))
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STAGATE_'+plot_gene, layer='STAGATE_ReX', vmax='p99')
[15]:
[<AxesSubplot:title={'center':'STAGATE_ATP2B4'}, xlabel='spatial1', ylabel='spatial2'>]

[16]:
plot_gene = 'RASGRF2'
fig, axs = plt.subplots(1, 2, figsize=(8, 4))
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STAGATE_'+plot_gene, layer='STAGATE_ReX', vmax='p99')
[16]:
[<AxesSubplot:title={'center':'STAGATE_RASGRF2'}, xlabel='spatial1', ylabel='spatial2'>]

[17]:
plot_gene = 'LAMP5'
fig, axs = plt.subplots(1, 2, figsize=(8, 4))
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STAGATE_'+plot_gene, layer='STAGATE_ReX', vmax='p99')
[17]:
[<AxesSubplot:title={'center':'STAGATE_LAMP5'}, xlabel='spatial1', ylabel='spatial2'>]

[18]:
plot_gene = 'NEFH'
fig, axs = plt.subplots(1, 2, figsize=(8, 4))
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STAGATE_'+plot_gene, layer='STAGATE_ReX', vmax='p99')
[18]:
[<AxesSubplot:title={'center':'STAGATE_NEFH'}, xlabel='spatial1', ylabel='spatial2'>]

[19]:
plot_gene = 'NTNG2'
fig, axs = plt.subplots(1, 2, figsize=(8, 4))
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STAGATE_'+plot_gene, layer='STAGATE_ReX', vmax='p99')
[19]:
[<AxesSubplot:title={'center':'STAGATE_NTNG2'}, xlabel='spatial1', ylabel='spatial2'>]

[20]:
plot_gene = 'B3GALT2'
fig, axs = plt.subplots(1, 2, figsize=(8, 4))
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[0], title='RAW_'+plot_gene, vmax='p99')
sc.pl.spatial(adata, img_key="hires", color=plot_gene, show=False, ax=axs[1], title='STAGATE_'+plot_gene, layer='STAGATE_ReX', vmax='p99')
[20]:
[<AxesSubplot:title={'center':'STAGATE_B3GALT2'}, xlabel='spatial1', ylabel='spatial2'>]

Tutorial 7: Slide-seqV2 mouse olfactory bulb (pyG framework)
This tutorial demonstrates how to identify spatial domains on Slide-seqV2 data using STAGATE based on the pyG (PyTorch Geometric) framework.
Benefiting from the pyG package's optimizations for training graph neural networks, it is more than 10x faster than STAGATE based on the tensorflow1 framework, and it can use a batch training strategy to handle large-scale data (see Tutorial 8).
The download and installation of the STAGATE package based on the pyG framework can be found at https://github.com/zhanglabtools/STAGATE_pyG.
In this tutorial, we focus on the Slide-seqV2 mouse olfactory bulb data (Puck_200127_15 from https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2#study-summary).
We removed spots outside the main tissue area; the retained spots can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing.
Preparation
[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
[2]:
import torch
import STAGATE_pyG
[3]:
input_dir = 'Data'
counts_file = os.path.join(input_dir, 'Puck_200127_15.digital_expression.txt')
coor_file = os.path.join(input_dir, 'Puck_200127_15_bead_locations.csv')
[4]:
counts = pd.read_csv(counts_file, sep='\t', index_col=0)
coor_df = pd.read_csv(coor_file, index_col=0)
print(counts.shape, coor_df.shape)
(21220, 21724) (21724, 2)
[5]:
adata = sc.AnnData(counts.T)
adata.var_names_make_unique()
coor_df = coor_df.loc[adata.obs_names, ['xcoord', 'ycoord']]
adata.obsm["spatial"] = coor_df.to_numpy()
[6]:
sc.pp.calculate_qc_metrics(adata, inplace=True)
[7]:
adata
[7]:
AnnData object with n_obs × n_vars = 21724 × 21220
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
obsm: 'spatial'
[8]:
plt.rcParams["figure.figsize"] = (6,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=6, show=False)
plt.title('')
plt.axis('off')
[8]:
(-289.81710000000004, 6312.7151, 173.30850000000004, 5709.8615)

[9]:
# can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing
used_barcode = pd.read_csv('Data/used_barcodes.txt', sep='\t', header=None)
used_barcode = used_barcode[0]
[10]:
adata = adata[used_barcode,]
[11]:
plt.rcParams["figure.figsize"] = (5,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=10, show=False, title='Removing spots outside the main tissue area')
plt.axis('off')
[11]:
(588.545, 5108.555, 847.6700000000001, 5670.73)

[12]:
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
Trying to set attribute `.var` of view, copying.
After filtering:  (20139, 11750)
[13]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
Constructing the spatial network
[14]:
STAGATE_pyG.Cal_Spatial_Net(adata, rad_cutoff=50)
STAGATE_pyG.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 228300 edges, 20139 cells.
11.3362 neighbors per cell on average.

Running STAGATE
[15]:
adata = STAGATE_pyG.train_STAGATE(adata, device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu'))
Size of Input: (20139, 3000)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:37<00:00, 26.61it/s]
[16]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[17]:
sc.tl.louvain(adata, resolution=0.5)
[18]:
adata.obsm["spatial"] = adata.obsm["spatial"] * (-1)
[19]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="louvain",s=6, show=False, title='STAGATE')
plt.axis('off')
[19]:
(-5108.555, -588.545, -5670.73, -847.6700000000001)

[20]:
sc.pl.umap(adata, color='louvain', title='STAGATE')

Tutorial 8: Batch training strategy (pyG framework)
This tutorial demonstrates how to train STAGATE in batches on Slide-seqV2 data using STAGATE based on the pyG (PyTorch Geometric) framework.
Benefiting from the pyG package's optimizations for training graph neural networks, it is more than 10x faster than STAGATE based on the tensorflow1 framework, and it can use a batch training strategy to handle large-scale data.
The download and installation of the STAGATE package based on the pyG framework can be found at https://github.com/zhanglabtools/STAGATE_pyG.
In this tutorial, we focus on the Slide-seqV2 mouse olfactory bulb data (Puck_200127_15 from https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2#study-summary).
We removed spots outside the main tissue area; the retained spots can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing.
Strategies for dividing subgraphs
Because we build the spatial network from spatial locations, the network can be directly divided into spatially contiguous subgraphs.
For example, num_batch_x=3 and num_batch_y=2 splits the tissue into a 3×2 grid of subgraphs. Specifically, the split points are taken as quantiles of each spatial coordinate; a minimal sketch of this splitting is shown below.
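The following sketch is an illustration of the quantile-based splitting only; it is not the source of STAGATE_pyG.Batch_Data, which is used later in this tutorial and should be preferred.
# Illustrative sketch only: split spots into a num_batch_x-by-num_batch_y grid of
# subgraphs using quantiles of the X and Y coordinates stored in adata.obs.
import numpy as np

def split_by_quantiles(adata, num_batch_x=3, num_batch_y=2, spatial_key=('X', 'Y')):
    x = adata.obs[spatial_key[0]].values
    y = adata.obs[spatial_key[1]].values
    x_edges = np.quantile(x, np.linspace(0, 1, num_batch_x + 1))
    y_edges = np.quantile(y, np.linspace(0, 1, num_batch_y + 1))
    batches = []
    for ix in range(num_batch_x):
        for iy in range(num_batch_y):
            # boundary spots may fall into two tiles in this simple version
            mask = ((x >= x_edges[ix]) & (x <= x_edges[ix + 1]) &
                    (y >= y_edges[iy]) & (y <= y_edges[iy + 1]))
            batches.append(adata[mask].copy())
    return batches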
Preparation
[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import os
import sys
from tqdm import tqdm
[2]:
import torch
import STAGATE_pyG
import torch.nn.functional as F
[3]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
[4]:
input_dir = 'Data'
counts_file = os.path.join(input_dir, 'Puck_200127_15.digital_expression.txt')
coor_file = os.path.join(input_dir, 'Puck_200127_15_bead_locations.csv')
[5]:
counts = pd.read_csv(counts_file, sep='\t', index_col=0)
coor_df = pd.read_csv(coor_file, index_col=0)
print(counts.shape, coor_df.shape)
(21220, 21724) (21724, 2)
[6]:
adata = sc.AnnData(counts.T)
adata.var_names_make_unique()
coor_df = coor_df.loc[adata.obs_names, ['xcoord', 'ycoord']]
adata.obsm["spatial"] = coor_df.to_numpy()
[7]:
sc.pp.calculate_qc_metrics(adata, inplace=True)
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 21724 × 21220
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
obsm: 'spatial'
[9]:
plt.rcParams["figure.figsize"] = (6,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=6, show=False)
plt.title('')
plt.axis('off')
[9]:
(-289.81710000000004, 6312.7151, 173.30850000000004, 5709.8615)

[10]:
# can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing
used_barcode = pd.read_csv('Data/used_barcodes.txt', sep='\t', header=None)
used_barcode = used_barcode[0]
[11]:
adata = adata[used_barcode,]
[12]:
plt.rcParams["figure.figsize"] = (5,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=10, show=False, title='Removing spots outside the main tissue area')
plt.axis('off')
[12]:
(588.545, 5108.555, 847.6700000000001, 5670.73)

[13]:
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
Trying to set attribute `.var` of view, copying.
After filtering:  (20139, 11750)
[14]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
[15]:
# Different from STAGATE (tensorflow version): subset to the highly variable genes here
adata = adata[:,adata.var['highly_variable']]
adata.shape
[15]:
(20139, 3000)
Dividing subgraphs
[16]:
adata.obs['X'] = adata.obsm['spatial'][:,0]
adata.obs['Y'] = adata.obsm['spatial'][:,1]
Trying to set attribute `.obs` of view, copying.
[17]:
# grid setting
num_batch_x = 3
num_batch_y = 2
[18]:
Batch_list = STAGATE_pyG.Batch_Data(adata, num_batch_x=num_batch_x, num_batch_y=num_batch_y,
spatial_key=['X', 'Y'], plot_Stats=True)

Constructing the spatial network
[19]:
# Constructing the spatial network for each batch
for temp_adata in Batch_list:
STAGATE_pyG.Cal_Spatial_Net(temp_adata, rad_cutoff=50)
#STAGATE_pyG.Stats_Spatial_Net(temp_adata)
------Calculating spatial graph...
Trying to set attribute `._uns` of view, copying.
The graph contains 41528 edges, 3782 cells.
10.9804 neighbors per cell on average.
------Calculating spatial graph...
Trying to set attribute `._uns` of view, copying.
The graph contains 32236 edges, 2931 cells.
10.9983 neighbors per cell on average.
------Calculating spatial graph...
Trying to set attribute `._uns` of view, copying.
The graph contains 41060 edges, 3590 cells.
11.4373 neighbors per cell on average.
------Calculating spatial graph...
Trying to set attribute `._uns` of view, copying.
The graph contains 33970 edges, 3123 cells.
10.8774 neighbors per cell on average.
------Calculating spatial graph...
Trying to set attribute `._uns` of view, copying.
The graph contains 29512 edges, 2698 cells.
10.9385 neighbors per cell on average.
------Calculating spatial graph...
Trying to set attribute `._uns` of view, copying.
The graph contains 45606 edges, 4017 cells.
11.3532 neighbors per cell on average.
[20]:
data_list = [STAGATE_pyG.Transfer_pytorch_Data(adata) for adata in Batch_list]
for temp in data_list:
temp.to(device)
[21]:
STAGATE_pyG.Cal_Spatial_Net(adata, rad_cutoff=50)
data = STAGATE_pyG.Transfer_pytorch_Data(adata)
------Calculating spatial graph...
The graph contains 228300 edges, 20139 cells.
11.3362 neighbors per cell on average.
DataLoader for batches
[22]:
from torch_geometric.loader import DataLoader
# batch_size=1 or 2
loader = DataLoader(data_list, batch_size=1, shuffle=True)
Running STAGATE
[23]:
# hyper-parameters
num_epoch = 1000
lr=0.001
weight_decay=1e-4
hidden_dims = [512, 30]
[24]:
model = STAGATE_pyG.STAGATE(hidden_dims = [data_list[0].x.shape[1]]+hidden_dims).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
[25]:
for epoch in tqdm(range(1, num_epoch+1)):
for batch in loader:
model.train()
optimizer.zero_grad()
z, out = model(batch.x, batch.edge_index)
loss = F.mse_loss(batch.x, out) #F.nll_loss(out[data.train_mask], data.y[data.train_mask])
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 5.)
optimizer.step()
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:55<00:00, 18.14it/s]
[26]:
# The total network
data.to(device)
[26]:
Data(x=[20139, 3000], edge_index=[2, 248439])
[27]:
model.eval()
z, out = model(data.x, data.edge_index)
STAGATE_rep = z.to('cpu').detach().numpy()
adata.obsm['STAGATE'] = STAGATE_rep
Clustering and UMAP
[28]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[29]:
sc.tl.louvain(adata, resolution=0.5)
[30]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.embedding(adata, basis="spatial", color="louvain",s=6, show=False)#, legend_loc=False)
plt.title('')
plt.axis('off')
[30]:
(588.545, 5108.555, 847.6700000000001, 5670.73)

[31]:
sc.pl.umap(adata, color='louvain')

Tutorial 9: ISH-based STARmap dataset
Here we present our re-analysis of the ISH-based mouse visual cortex STARmap dataset (See Fig. S16 of the STAGATE paper for details).
The raw data are available at https://www.dropbox.com/sh/f7ebheru1lbz91s/AADm6D54GSEFXB1feRy6OSASa/visual_1020/20180505_BY3_1kgenes?dl=0&subfolder_nav_tracking=1.
The annotation information and the processed SCANPY object are provided at https://drive.google.com/drive/folders/1I1nxheWlc2RXSdiv24dex3YRaEh780my?usp=sharing.
Preparation
[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import warnings
warnings.filterwarnings("ignore")
[2]:
import STAGATE
[3]:
adata = sc.read('STARmap_20180505_BY3_1k.h5ad')
[4]:
adata.obs.head()
[4]:
|         | Total_counts | X            | Y          | label |
|---------|--------------|--------------|------------|-------|
| Cell_9  | 310          | 4980.777664  | -49.771247 | L6    |
| Cell_10 | 533          | 8729.177425  | -53.473114 | L4    |
| Cell_13 | 452          | 11547.566359 | -46.315493 | L2/3  |
| Cell_15 | 288          | 3280.980264  | -37.612811 | L6    |
| Cell_16 | 457          | 7582.793601  | -45.350312 | L4    |
[5]:
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
Constructing the spatial network
[6]:
STAGATE.Cal_Spatial_Net(adata, rad_cutoff=400)
STAGATE.Stats_Spatial_Net(adata)
------Calculating spatial graph...
The graph contains 7838 edges, 1207 cells.
6.4938 neighbors per cell on average.

Running STAGATE
[7]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input: (1207, 1020)
WARNING:tensorflow:From /home/dkn/anaconda3/envs/STAGATE_tf/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|██████████| 500/500 [00:21<00:00, 22.79it/s]
[8]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[9]:
adata = STAGATE.mclust_R(adata, used_obsm='STAGATE', num_cluster=7)
R[write to console]: __ __
____ ___ _____/ /_ _______/ /_
/ __ `__ \/ ___/ / / / / ___/ __/
/ / / / / / /__/ / /_/ (__ ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/ version 5.4.10
Type 'citation("mclust")' for citing this R package in publications.
fitting ...
|======================================================================| 100%
[10]:
sc.pl.umap(adata, color='mclust')

[11]:
plt.rcParams["figure.figsize"] = (4, 2)
sc.pl.embedding(adata, basis="spatial", color='mclust', s=20, show=False)#, legend_loc=False)
[11]:
<AxesSubplot:title={'center':'mclust'}, xlabel='spatial1', ylabel='spatial2'>

[14]:
# layer annotation
plt.rcParams["figure.figsize"] = (4, 2)
sc.pl.embedding(adata, basis="spatial", color='label', s=20, show=False)#, legend_loc=False)
[14]:
<AxesSubplot:title={'center':'label'}, xlabel='spatial1', ylabel='spatial2'>

Calculate ARI
[15]:
from sklearn.metrics.cluster import adjusted_rand_score
[16]:
adjusted_rand_score(adata.obs['label'], adata.obs['mclust'])
[16]:
0.5441835384954468
Additional Tutorial 1: Work for multiple sections (without batch effects)
Here we show STAGATE's ability to identify spatial domains across multiple sections.
The DLPFC samples (151673-151676) are used in this tutorial.
Preparation
[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import warnings
warnings.filterwarnings("ignore")
[2]:
import STAGATE
[3]:
section_list = ['151673', '151674', '151675', '151676']
Load Data
[4]:
adata_list = {}
for section_id in section_list:
    input_dir = os.path.join('../Data', section_id, 'Data')
    temp_adata = sc.read_visium(path=input_dir, count_file=section_id+'_filtered_feature_bc_matrix.h5')
    temp_adata.var_names_make_unique()
    Truth_df = pd.read_csv(os.path.join('../Data', section_id, section_id+'_truth.txt'), sep='\t', header=None, index_col=0)
    temp_adata.obs['Ground Truth'] = Truth_df.loc[temp_adata.obs_names, 1]
    # make the spot names unique
    temp_adata.obs_names = [x+'_'+section_id for x in temp_adata.obs_names]
    adata_list[section_id] = temp_adata.copy()
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[5]:
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    if it == 3:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it],
                      color=["Ground Truth"], title=section_id, show=False)
    else:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it], legend_loc=None,
                      color=["Ground Truth"], title=section_id, show=False)
    it+=1
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical

Constructing the spatial network for each section
[6]:
for section_id in section_list:
    STAGATE.Cal_Spatial_Net(adata_list[section_id], rad_cutoff=150)
    STAGATE.Stats_Spatial_Net(adata_list[section_id])
------Calculating spatial graph...
The graph contains 21124 edges, 3639 cells.
5.8049 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 21258 edges, 3673 cells.
5.7876 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 20762 edges, 3592 cells.
5.7801 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 20052 edges, 3460 cells.
5.7954 neighbors per cell on average.




Note that the spatial networks are saved in adata.uns['Spatial_Net'] and can be concatenated directly across multiple sections.
[7]:
adata_list['151673'].uns['Spatial_Net']
[7]:
|     | Cell1                     | Cell2                     | Distance   |
|-----|---------------------------|---------------------------|------------|
| 0   | AAACAAGTATCTCCCA-1_151673 | AGAGGCTTCGGAAACC-1_151673 | 138.061580 |
| 2   | AAACAAGTATCTCCCA-1_151673 | ACTAGTTGCGATCGTC-1_151673 | 138.003623 |
| 3   | AAACAAGTATCTCCCA-1_151673 | GCATCGGCCGTGTAGG-1_151673 | 138.061580 |
| 4   | AAACAAGTATCTCCCA-1_151673 | CAGCAGTCCAGACTAT-1_151673 | 138.003623 |
| 5   | AAACAAGTATCTCCCA-1_151673 | TTCTTATCCGCTGGGT-1_151673 | 137.927517 |
| ... | ...                       | ...                       | ...        |
| 1   | TTGTTTGTGTAAATTC-1_151673 | GCCTATTTGCTACACA-1_151673 | 138.061580 |
| 2   | TTGTTTGTGTAAATTC-1_151673 | GTAGCGCTGTTGTAGT-1_151673 | 138.798415 |
| 4   | TTGTTTGTGTAAATTC-1_151673 | TTAAACCGGTAGCGAC-1_151673 | 137.927517 |
| 5   | TTGTTTGTGTAAATTC-1_151673 | GGATCAAAGGACGAGG-1_151673 | 138.423264 |
| 6   | TTGTTTGTGTAAATTC-1_151673 | CGTAGCGCCGACGTTG-1_151673 | 137.003650 |

21124 rows × 3 columns
Concatenate the scanpy objects and spatial networks
[8]:
adata = sc.concat([adata_list[x] for x in section_list], keys=None)
[9]:
adata.uns['Spatial_Net'] = pd.concat([adata_list[x].uns['Spatial_Net'] for x in section_list])
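As an optional sanity check (not part of the original tutorial), the '_<section_id>' suffix appended to the spot names above can be used to verify that every edge of the combined network stays within a single section:
# Both endpoints of an edge should carry the same section suffix.
net = adata.uns['Spatial_Net']
same_section = net['Cell1'].str.split('_').str[-1] == net['Cell2'].str.split('_').str[-1]
print('All edges within one section:', same_section.all())
# Number of edges contributed by each section
print(net['Cell1'].str.split('_').str[-1].value_counts())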
[10]:
STAGATE.Stats_Spatial_Net(adata)

Normalization
[11]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
Running STAGATE
[12]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input: (14364, 3000)
WARNING:tensorflow:From /home/dkn/anaconda3/envs/STAGATE_tf/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|██████████| 500/500 [04:57<00:00, 1.68it/s]
[13]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[14]:
adata = STAGATE.mclust_R(adata, used_obsm='STAGATE', num_cluster=7)
R[write to console]: __ __
____ ___ _____/ /_ _______/ /_
/ __ `__ \/ ___/ / / / / ___/ __/
/ / / / / / /__/ / /_/ (__ ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/ version 5.4.10
Type 'citation("mclust")' for citing this R package in publications.
fitting ...
|======================================================================| 100%
[15]:
adata.obs['Sample'] = [x.split('_')[-1] for x in adata.obs_names]
[16]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Sample')
... storing 'Sample' as categorical

[17]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Ground Truth')

[18]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='mclust')

[19]:
temp_adata
[19]:
AnnData object with n_obs × n_vars = 3460 × 33538
obs: 'in_tissue', 'array_row', 'array_col', 'Ground Truth'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial'
obsm: 'spatial'
[20]:
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    adata_list[section_id].obs['STAGATE'] = adata.obs.loc[adata_list[section_id].obs_names, 'mclust']
    if it == 3:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it],
                      color=["STAGATE"], title=section_id, show=False)
    else:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it], legend_loc=None,
                      color=["STAGATE"], title=section_id, show=False)
    it+=1

Calculate ARI
[21]:
from sklearn.metrics.cluster import adjusted_rand_score
[22]:
for section_id in section_list:
    temp_adata = adata[adata.obs['Sample']==section_id]
    temp_obs = temp_adata.obs.dropna()
    temp_ARI = adjusted_rand_score(temp_obs['mclust'], temp_obs['Ground Truth'])
    print('ARI of section ID %s: %.3f' %(section_id, temp_ARI))
ARI of section ID 151673: 0.564
ARI of section ID 151674: 0.650
ARI of section ID 151675: 0.627
ARI of section ID 151676: 0.620
Additional Tutorial 2: Work for multiple sections (STAGATE + Harmony)
Here we combine the STAGATE and Harmony algorithms to remove batch effects.
The mouse olfactory bulb data generated by different platforms (Slide-seqV2 and Stereo-seq) are used in this tutorial. The details of these two datasets can be found in Tutorial 3 and Tutorial 4.
Preparation
[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import warnings
warnings.filterwarnings("ignore")
[2]:
import STAGATE
Load Data
[3]:
adata_list = {}
Slide-seqV2
[4]:
input_dir = 'MOB_Data/Slide-seqV2'
counts_file = os.path.join(input_dir, 'Puck_200127_15.digital_expression.txt')
coor_file = os.path.join(input_dir, 'Puck_200127_15_bead_locations.csv')
[5]:
counts = pd.read_csv(counts_file, sep='\t', index_col=0)
coor_df = pd.read_csv(coor_file, index_col=0)
print(counts.shape, coor_df.shape)
(21220, 21724) (21724, 2)
[6]:
adata = sc.AnnData(counts.T)
adata.var_names_make_unique()
coor_df = coor_df.loc[adata.obs_names, ['xcoord', 'ycoord']]
adata.obsm["spatial"] = coor_df.to_numpy()
[7]:
sc.pp.calculate_qc_metrics(adata, inplace=True)
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 21724 × 21220
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
obsm: 'spatial'
[9]:
plt.rcParams["figure.figsize"] = (6,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=6, show=False)
plt.title('')
plt.axis('off')
[9]:
(-289.81710000000004, 6312.7151, 173.30850000000004, 5709.8615)

[10]:
# can be downloaded from https://drive.google.com/drive/folders/10lhz5VY7YfvHrtV40MwaqLmWz56U9eBP?usp=sharing
used_barcode = pd.read_csv(os.path.join(input_dir, 'used_barcodes.txt'), sep='\t', header=None)
used_barcode = used_barcode[0]
[11]:
adata = adata[used_barcode,]
adata
[11]:
View of AnnData object with n_obs × n_vars = 20139 × 21220
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
obsm: 'spatial'
[12]:
plt.rcParams["figure.figsize"] = (5,5)
sc.pl.embedding(adata, basis="spatial", color="log1p_total_counts",s=10, show=False, title='Removing spots outside the main tissue area')
plt.axis('off')
[12]:
(588.545, 5108.555, 847.6700000000001, 5670.73)

[13]:
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
Trying to set attribute `.var` of view, copying.
After filtering: (20139, 11750)
[14]:
# make spot name unique
adata.obs_names = [x+'_SlideSeqV2' for x in adata.obs_names]
[15]:
adata_list['SlideSeqV2'] = adata.copy()
Stereo-seq
[16]:
input_dir = 'MOB_Data/Stereo-seq'
counts_file = os.path.join(input_dir, 'RNA_counts.tsv')
coor_file = os.path.join(input_dir, 'position.tsv')
[17]:
counts = pd.read_csv(counts_file, sep='\t', index_col=0)
coor_df = pd.read_csv(coor_file, sep='\t')
print(counts.shape, coor_df.shape)
(27106, 19527) (19527, 3)
[18]:
counts.columns = ['Spot_'+str(x) for x in counts.columns]
coor_df.index = coor_df['label'].map(lambda x: 'Spot_'+str(x))
coor_df = coor_df.loc[:, ['x','y']]
[19]:
coor_df.head()
[19]:
| label  | x            | y           |
|--------|--------------|-------------|
| Spot_1 | 12555.007833 | 6307.537859 |
| Spot_2 | 12623.666667 | 6297.166667 |
| Spot_3 | 12589.567164 | 6302.552239 |
| Spot_4 | 12642.495050 | 6307.386139 |
| Spot_5 | 13003.333333 | 6307.990991 |
[20]:
adata = sc.AnnData(counts.T)
adata.var_names_make_unique()
[21]:
adata
[21]:
AnnData object with n_obs × n_vars = 19527 × 27106
[22]:
coor_df = coor_df.loc[adata.obs_names, ['y', 'x']]
adata.obsm["spatial"] = coor_df.to_numpy()
sc.pp.calculate_qc_metrics(adata, inplace=True)
[23]:
plt.rcParams["figure.figsize"] = (5,4)
sc.pl.embedding(adata, basis="spatial", color="n_genes_by_counts", show=False)
plt.title("")
plt.axis('off')
[23]:
(6002.432692307693, 12486.580128205129, 9908.545833333334, 15086.093055555555)

[24]:
used_barcode = pd.read_csv(os.path.join(input_dir, 'used_barcodes.txt'), sep='\t', header=None)
used_barcode = used_barcode[0]
adata = adata[used_barcode,]
[25]:
adata
[25]:
View of AnnData object with n_obs × n_vars = 19109 × 27106
obs: 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
obsm: 'spatial'
[26]:
plt.rcParams["figure.figsize"] = (5,4)
sc.pl.embedding(adata, basis="spatial", color="n_genes_by_counts", show=False)
plt.title("")
plt.axis('off')
[26]:
(6005.190789473685, 12428.6600877193, 9986.774763741741, 15062.302776957436)

[27]:
sc.pp.filter_genes(adata, min_cells=50)
print('After filtering: ', adata.shape)
Trying to set attribute `.var` of view, copying.
After filtering: (19109, 14376)
[28]:
# make spot name unique
adata.obs_names = [x+'_StereoSeq' for x in adata.obs_names]
[29]:
adata_list['StereoSeq'] = adata.copy()
Constructing the spatial network for each section
Slide-seqV2
[30]:
STAGATE.Cal_Spatial_Net(adata_list['SlideSeqV2'], rad_cutoff=50)
STAGATE.Stats_Spatial_Net(adata_list['SlideSeqV2'])
------Calculating spatial graph...
The graph contains 228300 edges, 20139 cells.
11.3362 neighbors per cell on average.

[31]:
STAGATE.Cal_Spatial_Net(adata_list['StereoSeq'], rad_cutoff=50)
STAGATE.Stats_Spatial_Net(adata_list['StereoSeq'])
------Calculating spatial graph...
The graph contains 144318 edges, 19109 cells.
7.5524 neighbors per cell on average.

Note that the spatial networks are saved in adata.uns['Spatial_Net'] and can be concatenated directly across multiple sections.
[32]:
adata_list['SlideSeqV2'].uns['Spatial_Net']
[32]:
|     | Cell1                     | Cell2                     | Distance  |
|-----|---------------------------|---------------------------|-----------|
| 0   | AAAAAAACAAAAGG_SlideSeqV2 | ATAAGTTGCCCCGT_SlideSeqV2 | 41.494698 |
| 1   | AAAAAAACAAAAGG_SlideSeqV2 | CTCCGGGCTCTTCA_SlideSeqV2 | 44.777226 |
| 2   | AAAAAAACAAAAGG_SlideSeqV2 | CCAGCAAAGCTACA_SlideSeqV2 | 29.429237 |
| 3   | AAAAAAACAAAAGG_SlideSeqV2 | GCCTAAAGCTTTTG_SlideSeqV2 | 25.927784 |
| 5   | AAAAAAACAAAAGG_SlideSeqV2 | CCTCCTTAACGTTA_SlideSeqV2 | 33.634060 |
| ... | ...                       | ...                       | ...       |
| 9   | TTTTTTTTTTTTAT_SlideSeqV2 | CCTATAACAGCCTG_SlideSeqV2 | 30.802922 |
| 10  | TTTTTTTTTTTTAT_SlideSeqV2 | CTTGGGCATATAAG_SlideSeqV2 | 37.316216 |
| 11  | TTTTTTTTTTTTAT_SlideSeqV2 | AGTAGTTGCGGCCG_SlideSeqV2 | 13.688316 |
| 12  | TTTTTTTTTTTTAT_SlideSeqV2 | TGTATTCACTTTGC_SlideSeqV2 | 18.753666 |
| 13  | TTTTTTTTTTTTAT_SlideSeqV2 | TAAAACGCGCGAGA_SlideSeqV2 | 25.347584 |

228300 rows × 3 columns
Concatenate the scanpy objects and spatial networks
[33]:
adata = sc.concat([adata_list['SlideSeqV2'], adata_list['StereoSeq']], keys=None)
[34]:
adata.uns['Spatial_Net'] = pd.concat([adata_list['SlideSeqV2'].uns['Spatial_Net'], adata_list['StereoSeq'].uns['Spatial_Net']])
[35]:
STAGATE.Stats_Spatial_Net(adata)

Normalization
[36]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
Running STAGATE
[37]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input: (39248, 3000)
WARNING:tensorflow:From /home/dkn/anaconda3/envs/STAGATE_tf/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|██████████| 500/500 [18:06<00:00, 2.17s/it]
[38]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[39]:
adata.obs['Tech'] = [x.split('_')[-1] for x in adata.obs_names]
[40]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Tech', title='Unintegrated')
... storing 'Tech' as categorical

Run Harmony on the STAGATE representation
[41]:
import harmonypy as hm
[42]:
data_mat = adata.obsm['STAGATE'].copy()
meta_data = adata.obs.copy()
[43]:
# Run Harmony
ho = hm.run_harmony(data_mat, meta_data, ['Tech'])
2022-10-30 18:54:54,390 - harmonypy - INFO - Iteration 1 of 10
2022-10-30 18:55:05,947 - harmonypy - INFO - Iteration 2 of 10
2022-10-30 18:55:20,272 - harmonypy - INFO - Iteration 3 of 10
2022-10-30 18:55:24,721 - harmonypy - INFO - Converged after 3 iterations
[44]:
# Collect the corrected embedding into a DataFrame (dimensions × cells)
res = pd.DataFrame(ho.Z_corr)
res.columns = adata.obs_names
[45]:
adata_Harmony = sc.AnnData(res.T)
[46]:
adata_Harmony.obsm['spatial'] = pd.DataFrame(adata.obsm['spatial'], index=adata.obs_names).loc[adata_Harmony.obs_names,].values
adata_Harmony.obs['Tech'] = adata.obs.loc[adata_Harmony.obs_names, 'Tech']
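As an alternative to building a separate adata_Harmony object, the corrected embedding can also be stored back in the original adata (a minimal sketch; the key name 'STAGATE_Harmony' is only illustrative):
# ho.Z_corr is (n_dims, n_cells) in the same cell order as adata (see cell [44]),
# so transpose it to match the obsm convention of one row per cell.
adata.obsm['STAGATE_Harmony'] = np.asarray(ho.Z_corr).T
# Downstream steps can then use the corrected embedding directly, e.g.:
# sc.pp.neighbors(adata, use_rep='STAGATE_Harmony')
# sc.tl.umap(adata)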
[47]:
sc.pp.neighbors(adata_Harmony)
sc.tl.umap(adata_Harmony)
[48]:
sc.tl.louvain(adata_Harmony, resolution=0.8)
[49]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata_Harmony, color='Tech', title='STAGATE + Harmony')

[50]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata_Harmony, color='louvain')

[60]:
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
it=0
for temp_tech in ['StereoSeq', 'SlideSeqV2']:
    temp_adata = adata_Harmony[adata_Harmony.obs['Tech']==temp_tech, ]
    if it == 1:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain", s=6, ax=axs[it],
                        show=False, title=temp_tech)
    else:
        sc.pl.embedding(temp_adata, basis="spatial", color="louvain", s=6, ax=axs[it], legend_loc=None,
                        show=False, title=temp_tech)
    it+=1


News
2022.03.05 STAGATE based on the pyG (PyTorch Geometric) framework is available as STAGATE_pyG (https://github.com/QIFEIDKN/STAGATE_pyG).
Benefiting from pyG's optimizations for training graph neural networks, it is more than 10x faster than the tensorflow1-based STAGATE and can use a batch training strategy to handle large-scale data (see Tutorials 7 and 8 for details).
The cell type-aware module has not been supported by STAGATE_pyG yet.
Introduction
STAGATE is designed for spatial clustering and denoising expressions of spatially resolved transcriptomics (ST) data.
STAGATE learns low-dimensional latent embeddings that integrate spatial information and gene expression via a graph attention auto-encoder. The method adopts an attention mechanism in the middle layers of the encoder and decoder, which adaptively learns the edge weights of the spatial neighbor network and uses them to update each spot's representation by aggregating information from its neighbors. The latent embeddings and the reconstructed expression profiles can be used for downstream tasks such as spatial domain identification, visualization, spatial trajectory inference, data denoising and 3D expression domain extraction.
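For intuition only, the sketch below shows a minimal graph attention auto-encoder written with PyTorch Geometric. It is an illustrative toy, not the actual STAGATE implementation; the layer sizes simply mirror the hidden_dims=[512, 30] used in the tutorials above.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class ToyGATAutoEncoder(torch.nn.Module):
    """Illustrative graph attention auto-encoder: encode each spot into a
    low-dimensional embedding by attending over its spatial neighbors, then
    reconstruct the expression profile from that embedding."""
    def __init__(self, in_dim, hidden_dim=512, latent_dim=30):
        super().__init__()
        # Encoder: expression -> hidden -> latent
        self.enc1 = GATConv(in_dim, hidden_dim, heads=1, concat=False)
        self.enc2 = GATConv(hidden_dim, latent_dim, heads=1, concat=False)
        # Decoder: latent -> hidden -> reconstructed expression
        self.dec1 = GATConv(latent_dim, hidden_dim, heads=1, concat=False)
        self.dec2 = GATConv(hidden_dim, in_dim, heads=1, concat=False)

    def forward(self, x, edge_index):
        h = F.elu(self.enc1(x, edge_index))
        z = self.enc2(h, edge_index)        # latent embedding (analogous to adata.obsm['STAGATE'])
        h = F.elu(self.dec1(z, edge_index))
        x_rec = self.dec2(h, edge_index)    # reconstructed expression
        return z, x_rec

# Training minimizes the reconstruction error of the expression matrix, e.g.:
# z, x_rec = model(data.x, data.edge_index)
# loss = F.mse_loss(data.x, x_rec)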
Citation
Dong, Kangning, and Shihua Zhang. “Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder.” Nature Communications 13.1 (2022): 1-12.