Exon-Level Differential Gene Analysis (EDEG)

In this section, we perform exon-level differential gene analysis using the feature matrix. This analysis aims to identify genes that exhibit significant differences in exon-level expression between different conditions or cell types.

Step 1: Identify Exon Markers Using MAST

In this step, DOLPHIN uses the MAST model through the Seurat package to compute p-values for each exon. This helps identify exons that are differentially expressed across cell clusters or experimental conditions.

Note: A separate conda environment is required to run Seurat. You can create it using the following commands:

conda env create -f environment_linux_R.yaml
pip install .

and then install MAST using the code below

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MAST")
[1]:
### Step 1-1: Convert .h5ad file to .rds format using Python
# This step uses the Python kernel to call an R script that converts
# the input AnnData (.h5ad) file into a Seurat-compatible .rds object.
from DOLPHIN.EDEG.call_convert import run_h5ad_rds

run_h5ad_rds(
    input_anndata = "./Feature_PDAC.h5ad",
    output_rds = "./Feature_PDAC.rds"
)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Warning message:
package ‘dplyr’ was built under R version 4.2.3
Attaching SeuratObject
Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and
validation routines, and ensuring assays work in strict v3/v4
compatibility mode
Warning message:
package ‘Seurat’ was built under R version 4.2.1
Warning message:
package ‘patchwork’ was built under R version 4.2.3
Warning message:
package ‘reticulate’ was built under R version 4.2.3
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning message:
In asMethod(object) :
  sparse->dense coercion: allocating vector of size 6.5 GiB
[2]:
###  Step 1-2: Run MAST to identify exon-level markers (using R kernel for this step)
library(Seurat)
Warning message:
“package ‘Seurat’ was built under R version 4.2.1”
Attaching SeuratObject

Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and
validation routines, and ensuring assays work in strict v3/v4
compatibility mode

[ ]:
seurat_obj <- readRDS(file = "./Feature_PDAC.rds")
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)
[ ]:
seurat_obj@meta.data$Condition <- ifelse(grepl("N", seurat_obj@meta.data$source), "normal", "cancer")
[5]:
unique(seurat_obj@meta.data$cluster)
  1. Macrophage cell
  2. Stellate cell
  3. Endothelial cell
  4. Ductal cell type 1
  5. Fibroblast cell
  6. B cell
  7. Acinar cell
  8. Endocrine cell
  9. T cell
  10. Ductal cell type 2
Levels:
  1. 'Acinar cell'
  2. 'B cell'
  3. 'Ductal cell type 1'
  4. 'Ductal cell type 2'
  5. 'Endocrine cell'
  6. 'Endothelial cell'
  7. 'Fibroblast cell'
  8. 'Macrophage cell'
  9. 'Stellate cell'
  10. 'T cell'
[ ]:
Idents(seurat_obj) <- "Condition"
[9]:
### Performing within-cluster comparisons at the cluster level.
### You can modify the code below based on the design of your project.
### The code below performs comparison between normal and cancer cells within the ductal cell population.

sub_seurat <- subset(seurat_obj, subset = cluster %in% c("Ductal cell type 1", "Ductal cell type 2"))

DE_MAST <- FindMarkers(sub_seurat, ident.1="cancer", ident.2="normal", test.use="MAST", logfc.threshold=0.5)
write.csv(DE_MAST, file = paste0("./PDAC_MAST_ductal.csv"), row.names = TRUE)

Done!

Combining coefficients and standard errors

Calculating log-fold changes

Calculating likelihood ratio tests

Refitting on reduced model...


Done!