Additional Tutorial 1: Work for multiple sections (without batch effects)

Here we show STAGATE’s ablility to identify spatial domains for multiple sections.

The DLPFC samples (151673-51676) are used in this tutorial.

Preparation

[1]:

import pandas as pd
import numpy as np
import scanpy as sc
import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import gc
import warnings
warnings.filterwarnings("ignore")

[2]:

import STAGATE

[3]:

section_list = ['151673', '151674', '151675', '151676']

Load Data

[4]:

adata_list = {}
for section_id in section_list:
    input_dir = os.path.join('../Data', section_id, 'Data')
    temp_adata = sc.read_visium(path=input_dir, count_file=section_id+'_filtered_feature_bc_matrix.h5')
    temp_adata.var_names_make_unique()

    Truth_df = pd.read_csv(os.path.join('../Data', section_id, section_id+'_truth.txt'), sep='\t', header=None, index_col=0)
    temp_adata.obs['Ground Truth'] = Truth_df.loc[temp_adata.obs_names, 1]
    # make the spot name unique
    temp_adata.obs_names = [x+'_'+section_id for x in temp_adata.obs_names]

    adata_list[section_id] = temp_adata.copy()

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.

[5]:

fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    if it == 3:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it],
                      color=["Ground Truth"], title=section_id, show=False)
    else:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it], legend_loc=None,
                      color=["Ground Truth"], title=section_id, show=False)
    it+=1

... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'Ground Truth' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical

Constructing the spatial network for each secion

[6]:

for section_id in section_list:
    STAGATE.Cal_Spatial_Net(adata_list[section_id], rad_cutoff=150)
    STAGATE.Stats_Spatial_Net(adata_list[section_id])

------Calculating spatial graph...
The graph contains 21124 edges, 3639 cells.
5.8049 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 21258 edges, 3673 cells.
5.7876 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 20762 edges, 3592 cells.
5.7801 neighbors per cell on average.
------Calculating spatial graph...
The graph contains 20052 edges, 3460 cells.
5.7954 neighbors per cell on average.

Note that the spatial network are saved in adata.uns[‘Spatial_Net’], which can be conbat directly for multiple sections.

[7]:

adata_list['151673'].uns['Spatial_Net']

[7]:

	Cell1	Cell2	Distance
0	AAACAAGTATCTCCCA-1_151673	AGAGGCTTCGGAAACC-1_151673	138.061580
2	AAACAAGTATCTCCCA-1_151673	ACTAGTTGCGATCGTC-1_151673	138.003623
3	AAACAAGTATCTCCCA-1_151673	GCATCGGCCGTGTAGG-1_151673	138.061580
4	AAACAAGTATCTCCCA-1_151673	CAGCAGTCCAGACTAT-1_151673	138.003623
5	AAACAAGTATCTCCCA-1_151673	TTCTTATCCGCTGGGT-1_151673	137.927517
...	...	...	...
1	TTGTTTGTGTAAATTC-1_151673	GCCTATTTGCTACACA-1_151673	138.061580
2	TTGTTTGTGTAAATTC-1_151673	GTAGCGCTGTTGTAGT-1_151673	138.798415
4	TTGTTTGTGTAAATTC-1_151673	TTAAACCGGTAGCGAC-1_151673	137.927517
5	TTGTTTGTGTAAATTC-1_151673	GGATCAAAGGACGAGG-1_151673	138.423264
6	TTGTTTGTGTAAATTC-1_151673	CGTAGCGCCGACGTTG-1_151673	137.003650

21124 rows × 3 columns

Conbat the scanpy objects and spatial networks

[8]:

adata = sc.concat([adata_list[x] for x in section_list], keys=None)

[9]:

adata.uns['Spatial_Net'] = pd.concat([adata_list[x].uns['Spatial_Net'] for x in section_list])

[10]:

STAGATE.Stats_Spatial_Net(adata)

Normalization

[11]:

#Normalization
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

Running STAGATE

[12]:

adata = STAGATE.train_STAGATE(adata, alpha=0)

Size of Input:  (14364, 3000)
WARNING:tensorflow:From /home/dkn/anaconda3/envs/STAGATE_tf/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

100%|██████████| 500/500 [04:57<00:00,  1.68it/s]

[13]:

sc.pp.neighbors(adata, use_rep='STAGATE')
sc.tl.umap(adata)

[14]:

adata = STAGATE.mclust_R(adata, used_obsm='STAGATE', num_cluster=7)

R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 5.4.10
Type 'citation("mclust")' for citing this R package in publications.

fitting ...
  |======================================================================| 100%

[15]:

adata.obs['Sample'] = [x.split('_')[-1] for x in adata.obs_names]

[16]:

plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Sample')

... storing 'Sample' as categorical

[17]:

plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='Ground Truth')

[18]:

plt.rcParams["figure.figsize"] = (3, 3)
sc.pl.umap(adata, color='mclust')

[19]:

temp_adata

[19]:

AnnData object with n_obs × n_vars = 3460 × 33538
    obs: 'in_tissue', 'array_row', 'array_col', 'Ground Truth'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

[20]:

fig, axs = plt.subplots(1, 4, figsize=(12, 3))
it=0
for section_id in section_list:
    adata_list[section_id].obs['STAGATE'] = adata.obs.loc[adata_list[section_id].obs_names, 'mclust']
    if it == 3:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it],
                      color=["STAGATE"], title=section_id, show=False)
    else:
        sc.pl.spatial(adata_list[section_id], img_key="hires", ax=axs[it], legend_loc=None,
                      color=["STAGATE"], title=section_id, show=False)
    it+=1

Calculate ARI

[21]:

from sklearn.metrics.cluster import adjusted_rand_score

[22]:

for section_id in section_list:
    temp_adata = adata[adata.obs['Sample']==section_id]
    temp_obs = temp_adata.obs.dropna()
    temp_ARI = adjusted_rand_score(temp_obs['mclust'], temp_obs['Ground Truth'])
    print('ARI of section ID %s: %.3f' %(section_id, temp_ARI))

ARI of section ID 151673: 0.564
ARI of section ID 151674: 0.650
ARI of section ID 151675: 0.627
ARI of section ID 151676: 0.620