Additional Tutorial 1: Work for multiple sections (without batch effects)

Here we show STAGATE’s ablility to identify spatial domains for multiple sections.

The DLPFC samples (151673-51676) are used in this tutorial.

Preparation

[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import warnings
warnings.filterwarnings("ignore")
[2]:
import STAGATE
[3]:
section_list = ['151673', '151674', '151675', '151676']

Load Data

[4]:
adata_list = {}
for section_id in section_list:
    input_dir = os.path.join('../Data', section_id, 'Data')
    temp_adata = sc.read_visium(path=input_dir, count_file=section_id+'_filtered_feature_bc_matrix.h5')
    temp_adata.var_names_make_unique()

    Truth_df = pd.read_csv(os.path.join('../Data', section_id, section_id+'_truth.txt'), sep='\t', header=None, index_col=0)
    temp_adata.obs['Ground Truth'] = Truth_df.loc[temp_adata.obs_names, 1]
    # make the spot name unique
    temp_adata.obs_names = [x+'_'+section_id for x in temp_adata.obs_names]

    adata_list[section_id] = temp_adata.copy()
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
[5]:
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    if it == 3:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it],
                      color=["Ground Truth"], title=section_id, show=False)
    else:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it], legend_loc=None,
                      color=["Ground Truth"], title=section_id, show=False)
    it+=1
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
_images/AT1_8_1.png

Constructing the spatial network for each secion

[6]:
for section_id in section_list:
    STAGATE.Cal_Spatial_Net(adata_list[section_id], rad_cutoff=150)
    STAGATE.Stats_Spatial_Net(adata_list[section_id])
------Calculating spatial graph...
The graph contains 21124 edges, 3639 cells.
5.8049 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 21258 edges, 3673 cells.
5.7876 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 20762 edges, 3592 cells.
5.7801 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 20052 edges, 3460 cells.
5.7954 neighbors per cell on average.
_images/AT1_10_1.png
_images/AT1_10_2.png
_images/AT1_10_3.png
_images/AT1_10_4.png

Note that the spatial network are saved in adata.uns[‘Spatial_Net’], which can be conbat directly for multiple sections.

[7]:
adata_list['151673'].uns['Spatial_Net']
[7]:
Cell1 Cell2 Distance
0 AAACAAGTATCTCCCA-1_151673 AGAGGCTTCGGAAACC-1_151673 138.061580
2 AAACAAGTATCTCCCA-1_151673 ACTAGTTGCGATCGTC-1_151673 138.003623
3 AAACAAGTATCTCCCA-1_151673 GCATCGGCCGTGTAGG-1_151673 138.061580
4 AAACAAGTATCTCCCA-1_151673 CAGCAGTCCAGACTAT-1_151673 138.003623
5 AAACAAGTATCTCCCA-1_151673 TTCTTATCCGCTGGGT-1_151673 137.927517
... ... ... ...
1 TTGTTTGTGTAAATTC-1_151673 GCCTATTTGCTACACA-1_151673 138.061580
2 TTGTTTGTGTAAATTC-1_151673 GTAGCGCTGTTGTAGT-1_151673 138.798415
4 TTGTTTGTGTAAATTC-1_151673 TTAAACCGGTAGCGAC-1_151673 137.927517
5 TTGTTTGTGTAAATTC-1_151673 GGATCAAAGGACGAGG-1_151673 138.423264
6 TTGTTTGTGTAAATTC-1_151673 CGTAGCGCCGACGTTG-1_151673 137.003650

21124 rows × 3 columns

Conbat the scanpy objects and spatial networks

[8]:
adata = sc.concat([adata_list[x] for x in section_list], keys=None)
[9]:
adata.uns['Spatial_Net'] = pd.concat([adata_list[x].uns['Spatial_Net'] for x in section_list])
[10]:
STAGATE.Stats_Spatial_Net(adata)
_images/AT1_16_0.png

Normalization

[11]:
#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

Running STAGATE

[12]:
adata = STAGATE.train_STAGATE(adata, alpha=0)
Size of Input:  (14364, 3000)
WARNING:tensorflow:From /home/dkn/anaconda3/envs/STAGATE_tf/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
100%|██████████| 500/500 [04:57<00:00,  1.68it/s]
[13]:
sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)
[14]:
adata = STAGATE.mclust_R(adata, used_obsm='STAGATE', num_cluster=7)
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 5.4.10
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%
[15]:
adata.obs['Sample'] = [x.split('_')[-1] for x in adata.obs_names]
[16]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Sample')
... storing 'Sample' as categorical
_images/AT1_24_1.png
[17]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Ground Truth')
_images/AT1_25_0.png
[18]:
plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='mclust')
_images/AT1_26_0.png
[19]:
temp_adata
[19]:
AnnData object with n_obs × n_vars = 3460 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'Ground Truth'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'
[20]:
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    adata_list[section_id].obs['STAGATE'] = adata.obs.loc[adata_list[section_id].obs_names, 'mclust']
    if it == 3:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it],
                      color=["STAGATE"], title=section_id, show=False)
    else:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it], legend_loc=None,
                      color=["STAGATE"], title=section_id, show=False)
    it+=1
_images/AT1_28_0.png

Calculate ARI

[21]:
from sklearn.metrics.cluster import adjusted_rand_score
[22]:
for section_id in section_list:
    temp_adata = adata[adata.obs['Sample']==section_id]
    temp_obs = temp_adata.obs.dropna()
    temp_ARI = adjusted_rand_score(temp_obs['mclust'], temp_obs['Ground Truth'])
    print('ARI of section ID %s: %.3f' %(section_id, temp_ARI))
ARI of section ID 151673: 0.564
ARI of section ID 151674: 0.650
ARI of section ID 151675: 0.627
ARI of section ID 151676: 0.620