{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Identify Gene-Level Markers Using Exon-Level Information\n", "\n", "In the previous step, exon-level markers were identified. To determine gene-level differential expression, we now aggregate exon marker information by gene, enabling a higher-level view of differences.\n", "\n", "The following function converts exon-level marker results from Seurat into gene-level statistical information. It uses the Stouffer method to combine exon p-values, weighted by exon length, and also computes average log2 fold changes, similarly weighted by exon length.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from DOLPHIN.EDEG.generate_EDEG import run_edeg" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/mnt/data/kailu9/DOLPHIN/DOLPHIN/EDEG/generate_EDEG.py:82: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.\n", " adata.var[\"_new\"]= adata.var.groupby('gene_id').cumcount() + 1\n" ] } ], "source": [ "pd_EDEG = run_edeg(seurat_output = \"./PDAC_MAST_ductal.csv\", \n", " adata_input = \"./Feature_PDAC.h5ad\", \n", " gtf_path = \"./dolphin_exon_gtf/dolphin.exon.gtf\", \n", " output = \"./PDAC_MAST_ductal_final.csv\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Exon_names | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | Gene_names | MAST_exon_weight | MAST_weighted_abs_avg_log2FC | MAST_weighted_stouffer_pval | MAST_weighted_stouffer_pval_adj_bonf |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FXYD6-FXYD2-1 | 3.076099e-226 | -4.886860 | 0.513 | 1.000 | 1.090148e-220 | FXYD6-FXYD2 | 0.535714 | 3.975477 | 0.000000e+00 | 0.000000e+00 |
| 1 | FXYD2-3 | 3.274657e-226 | -4.886604 | 0.513 | 1.000 | 1.160515e-220 | FXYD2 | 0.123970 | 2.737925 | 0.000000e+00 | 0.000000e+00 |
| 2 | RPS26-4 | 1.653250e-221 | -1.763729 | 0.949 | 0.989 | 5.859002e-216 | RPS26 | 0.351335 | 1.726111 | 0.000000e+00 | 0.000000e+00 |
| 3 | ENSG00000230202-1 | 2.466199e-216 | -1.603401 | 0.954 | 0.966 | 8.740035e-211 | ENSG00000230202 | 1.000000 | 1.603401 | 2.466199e-216 | 2.772007e-213 |
| 4 | RPL34P18-1 | 4.327019e-207 | -1.299259 | 0.983 | 0.994 | 1.533465e-201 | RPL34P18 | 1.000000 | 1.299259 | 4.327019e-207 | 4.863569e-204 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1367 | MMP7-2 | 9.521563e-03 | 0.532423 | 0.534 | 0.534 | 1.000000e+00 | MMP7 | 1.000000 | 0.532423 | 9.521563e-03 | 1.000000e+00 |
| 1368 | RPL10P9-1 | 2.555855e-02 | 1.158921 | 0.990 | 1.000 | 1.000000e+00 | RPL10P9 | 1.000000 | 1.158921 | 2.555855e-02 | 1.000000e+00 |
| 1369 | REG1A-3 | 1.038789e-01 | 0.667930 | 0.155 | 0.102 | 1.000000e+00 | REG1A | 0.489279 | 0.906575 | 1.972519e-05 | 2.217111e-02 |
| 1370 | TSPAN8-1 | 1.534439e-01 | 0.592725 | 0.833 | 0.852 | 1.000000e+00 | TSPAN8 | 0.782946 | 0.574851 | 2.444462e-03 | 1.000000e+00 |
| 1371 | SST-1 | 1.714755e-01 | 2.050056 | 0.208 | 0.148 | 1.000000e+00 | SST | 1.000000 | 2.050056 | 1.714755e-01 | 1.000000e+00 |
1372 rows × 11 columns
\n", "