Exon-Level Differential Gene Analysis (EDEG)
In this section, we perform exon-level differential gene analysis using the feature matrix. This analysis aims to identify genes that exhibit significant differences in exon-level expression between different conditions or cell types.
Step 1: Identify Exon Markers Using MAST
In this step, DOLPHIN uses the MAST model through the Seurat package to compute p-values for each exon. This helps identify exons that are differentially expressed across cell clusters or experimental conditions.
Note: A separate conda environment is required to run Seurat. You can create it using the following commands:
conda env create -f environment_linux_R.yaml
pip install .
and then install MAST using the code below
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MAST")
[1]:
### Step 1-1: Convert .h5ad file to .rds format using Python
# This step uses the Python kernel to call an R script that converts
# the input AnnData (.h5ad) file into a Seurat-compatible .rds object.
from DOLPHIN.EDEG.call_convert import run_h5ad_rds
run_h5ad_rds(
input_anndata = "./Feature_PDAC.h5ad",
output_rds = "./Feature_PDAC.rds"
)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Warning message:
package ‘dplyr’ was built under R version 4.2.3
Attaching SeuratObject
Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and
validation routines, and ensuring assays work in strict v3/v4
compatibility mode
Warning message:
package ‘Seurat’ was built under R version 4.2.1
Warning message:
package ‘patchwork’ was built under R version 4.2.3
Warning message:
package ‘reticulate’ was built under R version 4.2.3
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning message:
In asMethod(object) :
sparse->dense coercion: allocating vector of size 6.5 GiB
[2]:
### Step 1-2: Run MAST to identify exon-level markers (using R kernel for this step)
library(Seurat)
Warning message:
“package ‘Seurat’ was built under R version 4.2.1”
Attaching SeuratObject
Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and
validation routines, and ensuring assays work in strict v3/v4
compatibility mode
[ ]:
seurat_obj <- readRDS(file = "./Feature_PDAC.rds")
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)
[ ]:
seurat_obj@meta.data$Condition <- ifelse(grepl("N", seurat_obj@meta.data$source), "normal", "cancer")
[5]:
unique(seurat_obj@meta.data$cluster)
- Macrophage cell
- Stellate cell
- Endothelial cell
- Ductal cell type 1
- Fibroblast cell
- B cell
- Acinar cell
- Endocrine cell
- T cell
- Ductal cell type 2
Levels:
- 'Acinar cell'
- 'B cell'
- 'Ductal cell type 1'
- 'Ductal cell type 2'
- 'Endocrine cell'
- 'Endothelial cell'
- 'Fibroblast cell'
- 'Macrophage cell'
- 'Stellate cell'
- 'T cell'
[ ]:
Idents(seurat_obj) <- "Condition"
[9]:
### Performing within-cluster comparisons at the cluster level.
### You can modify the code below based on the design of your project.
### The code below performs comparison between normal and cancer cells within the ductal cell population.
sub_seurat <- subset(seurat_obj, subset = cluster %in% c("Ductal cell type 1", "Ductal cell type 2"))
DE_MAST <- FindMarkers(sub_seurat, ident.1="cancer", ident.2="normal", test.use="MAST", logfc.threshold=0.5)
write.csv(DE_MAST, file = paste0("./PDAC_MAST_ductal.csv"), row.names = TRUE)
Done!
Combining coefficients and standard errors
Calculating log-fold changes
Calculating likelihood ratio tests
Refitting on reduced model...
Done!