{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Identify Gene-Level Markers Using Exon-Level Information\n", "\n", "The previous step identified exon-level markers. To assess differential expression at the gene level, we now aggregate the exon marker results by gene, so that each gene receives a single combined p-value and effect size.\n", "\n", "The following function converts the exon-level marker results from Seurat into gene-level statistics. It combines the exon p-values of each gene with Stouffer's method, weighted by exon length, and computes an exon-length-weighted average of the absolute log2 fold changes.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from DOLPHIN.EDEG.generate_EDEG import run_edeg" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/mnt/data/kailu9/DOLPHIN/DOLPHIN/EDEG/generate_EDEG.py:82: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.\n", " adata.var[\"_new\"]= adata.var.groupby('gene_id').cumcount() + 1\n" ] } ], "source": [ "pd_EDEG = run_edeg(seurat_output = \"./PDAC_MAST_ductal.csv\", \n", " adata_input = \"./Feature_PDAC.h5ad\", \n", " gtf_path = \"./dolphin_exon_gtf/dolphin.exon.gtf\", \n", " output = \"./PDAC_MAST_ductal_final.csv\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Exon_names</th><th>p_val</th><th>avg_log2FC</th><th>pct.1</th><th>pct.2</th><th>p_val_adj</th><th>Gene_names</th><th>MAST_exon_weight</th><th>MAST_weighted_abs_avg_log2FC</th><th>MAST_weighted_stouffer_pval</th><th>MAST_weighted_stouffer_pval_adj_bonf</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>FXYD6-FXYD2-1</td><td>3.076099e-226</td><td>-4.886860</td><td>0.513</td><td>1.000</td><td>1.090148e-220</td><td>FXYD6-FXYD2</td><td>0.535714</td><td>3.975477</td><td>0.000000e+00</td><td>0.000000e+00</td></tr>\n",
"<tr><th>1</th><td>FXYD2-3</td><td>3.274657e-226</td><td>-4.886604</td><td>0.513</td><td>1.000</td><td>1.160515e-220</td><td>FXYD2</td><td>0.123970</td><td>2.737925</td><td>0.000000e+00</td><td>0.000000e+00</td></tr>\n",
"<tr><th>2</th><td>RPS26-4</td><td>1.653250e-221</td><td>-1.763729</td><td>0.949</td><td>0.989</td><td>5.859002e-216</td><td>RPS26</td><td>0.351335</td><td>1.726111</td><td>0.000000e+00</td><td>0.000000e+00</td></tr>\n",
"<tr><th>3</th><td>ENSG00000230202-1</td><td>2.466199e-216</td><td>-1.603401</td><td>0.954</td><td>0.966</td><td>8.740035e-211</td><td>ENSG00000230202</td><td>1.000000</td><td>1.603401</td><td>2.466199e-216</td><td>2.772007e-213</td></tr>\n",
"<tr><th>4</th><td>RPL34P18-1</td><td>4.327019e-207</td><td>-1.299259</td><td>0.983</td><td>0.994</td><td>1.533465e-201</td><td>RPL34P18</td><td>1.000000</td><td>1.299259</td><td>4.327019e-207</td><td>4.863569e-204</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>1367</th><td>MMP7-2</td><td>9.521563e-03</td><td>0.532423</td><td>0.534</td><td>0.534</td><td>1.000000e+00</td><td>MMP7</td><td>1.000000</td><td>0.532423</td><td>9.521563e-03</td><td>1.000000e+00</td></tr>\n",
"<tr><th>1368</th><td>RPL10P9-1</td><td>2.555855e-02</td><td>1.158921</td><td>0.990</td><td>1.000</td><td>1.000000e+00</td><td>RPL10P9</td><td>1.000000</td><td>1.158921</td><td>2.555855e-02</td><td>1.000000e+00</td></tr>\n",
"<tr><th>1369</th><td>REG1A-3</td><td>1.038789e-01</td><td>0.667930</td><td>0.155</td><td>0.102</td><td>1.000000e+00</td><td>REG1A</td><td>0.489279</td><td>0.906575</td><td>1.972519e-05</td><td>2.217111e-02</td></tr>\n",
"<tr><th>1370</th><td>TSPAN8-1</td><td>1.534439e-01</td><td>0.592725</td><td>0.833</td><td>0.852</td><td>1.000000e+00</td><td>TSPAN8</td><td>0.782946</td><td>0.574851</td><td>2.444462e-03</td><td>1.000000e+00</td></tr>\n",
"<tr><th>1371</th><td>SST-1</td><td>1.714755e-01</td><td>2.050056</td><td>0.208</td><td>0.148</td><td>1.000000e+00</td><td>SST</td><td>1.000000</td><td>2.050056</td><td>1.714755e-01</td><td>1.000000e+00</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>1372 rows × 11 columns</p>\n",
"</div>
" ], "text/plain": [ " Exon_names p_val avg_log2FC pct.1 pct.2 \\\n", "0 FXYD6-FXYD2-1 3.076099e-226 -4.886860 0.513 1.000 \n", "1 FXYD2-3 3.274657e-226 -4.886604 0.513 1.000 \n", "2 RPS26-4 1.653250e-221 -1.763729 0.949 0.989 \n", "3 ENSG00000230202-1 2.466199e-216 -1.603401 0.954 0.966 \n", "4 RPL34P18-1 4.327019e-207 -1.299259 0.983 0.994 \n", "... ... ... ... ... ... \n", "1367 MMP7-2 9.521563e-03 0.532423 0.534 0.534 \n", "1368 RPL10P9-1 2.555855e-02 1.158921 0.990 1.000 \n", "1369 REG1A-3 1.038789e-01 0.667930 0.155 0.102 \n", "1370 TSPAN8-1 1.534439e-01 0.592725 0.833 0.852 \n", "1371 SST-1 1.714755e-01 2.050056 0.208 0.148 \n", "\n", " p_val_adj Gene_names MAST_exon_weight \\\n", "0 1.090148e-220 FXYD6-FXYD2 0.535714 \n", "1 1.160515e-220 FXYD2 0.123970 \n", "2 5.859002e-216 RPS26 0.351335 \n", "3 8.740035e-211 ENSG00000230202 1.000000 \n", "4 1.533465e-201 RPL34P18 1.000000 \n", "... ... ... ... \n", "1367 1.000000e+00 MMP7 1.000000 \n", "1368 1.000000e+00 RPL10P9 1.000000 \n", "1369 1.000000e+00 REG1A 0.489279 \n", "1370 1.000000e+00 TSPAN8 0.782946 \n", "1371 1.000000e+00 SST 1.000000 \n", "\n", " MAST_weighted_abs_avg_log2FC MAST_weighted_stouffer_pval \\\n", "0 3.975477 0.000000e+00 \n", "1 2.737925 0.000000e+00 \n", "2 1.726111 0.000000e+00 \n", "3 1.603401 2.466199e-216 \n", "4 1.299259 4.327019e-207 \n", "... ... ... \n", "1367 0.532423 9.521563e-03 \n", "1368 1.158921 2.555855e-02 \n", "1369 0.906575 1.972519e-05 \n", "1370 0.574851 2.444462e-03 \n", "1371 2.050056 1.714755e-01 \n", "\n", " MAST_weighted_stouffer_pval_adj_bonf \n", "0 0.000000e+00 \n", "1 0.000000e+00 \n", "2 0.000000e+00 \n", "3 2.772007e-213 \n", "4 4.863569e-204 \n", "... ... 
\n", "1367 1.000000e+00 \n", "1368 1.000000e+00 \n", "1369 2.217111e-02 \n", "1370 1.000000e+00 \n", "1371 1.000000e+00 \n", "\n", "[1372 rows x 11 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd_EDEG" ] } ], "metadata": { "kernelspec": { "display_name": "DOLPHIN", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 2 }