{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exon-Level Differential Gene Analysis (EDEG)\n", "\n", "In this section, we perform exon-level differential gene analysis using the feature matrix.\n", "This analysis aims to identify genes that exhibit significant differences in exon-level expression\n", "between conditions or cell types." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Identify Exon Markers Using MAST\n", "\n", "In this step, **DOLPHIN** uses the [MAST](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5) model through the [Seurat](https://satijalab.org/seurat/) package to compute p-values for each exon. This helps identify exons that are differentially expressed across cell clusters or experimental conditions.\n", "\n", "> **Note:** A separate conda environment is required to run Seurat. You can create it using the following commands:\n", "\n", "```bash\n", "conda env create -f environment_linux_R.yaml\n", "pip install .\n", "```\n", "\n", "Then install MAST by running the following in an R session within that environment:\n", "\n", "```r\n", "if (!requireNamespace(\"BiocManager\", quietly = TRUE))\n", "    install.packages(\"BiocManager\")\n", "\n", "BiocManager::install(\"MAST\")\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Attaching package: ‘dplyr’\n", "\n", "The following objects are masked from ‘package:stats’:\n", "\n", " filter, lag\n", "\n", "The following objects are masked from ‘package:base’:\n", "\n", " intersect, setdiff, setequal, union\n", "\n", "Warning message:\n", "package ‘dplyr’ was built under R version 4.2.3 \n", "Attaching SeuratObject\n", "Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and\n", "validation routines, and ensuring assays work in strict v3/v4\n", "compatibility mode\n", "Warning message:\n", "package ‘Seurat’ was built under R version 4.2.1 \n", 
"Warning message:\n", "package ‘patchwork’ was built under R version 4.2.3 \n", "Warning message:\n", "package ‘reticulate’ was built under R version 4.2.3 \n", "Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "Warning message:\n", "In asMethod(object) :\n", " sparse->dense coercion: allocating vector of size 6.5 GiB\n" ] } ], "source": [ "### Step 1-1: Convert .h5ad file to .rds format using Python\n", "# This step uses the Python kernel to call an R script that converts\n", "# the input AnnData (.h5ad) file into a Seurat-compatible .rds object.\n", "from DOLPHIN.EDEG.call_convert import run_h5ad_rds\n", "\n", "run_h5ad_rds(\n", " input_anndata = \"./Feature_PDAC.h5ad\",\n", " output_rds = \"./Feature_PDAC.rds\"\n", ")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message:\n", "“package ‘Seurat’ was built under R version 4.2.1”\n", "Attaching SeuratObject\n", "\n", "Seurat v4 was just loaded with SeuratObject v5; disabling v5 assays and\n", "validation routines, and ensuring assays work in strict v3/v4\n", "compatibility mode\n", "\n" ] } ], "source": [ "### Step 1-2: Run MAST to identify exon-level markers (using R kernel for this step)\n", "library(Seurat)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seurat_obj <- readRDS(file = \"./Feature_PDAC.rds\")\n", "seurat_obj <- NormalizeData(seurat_obj, normalization.method = \"LogNormalize\", scale.factor = 10000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seurat_obj@meta.data$Condition <- ifelse(grepl(\"N\", seurat_obj@meta.data$source), \"normal\", \"cancer\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "