Step 2: Identify Gene-Level Markers Using Exon-Level Information

In the previous step, exon-level markers were identified. To assess differential expression at the gene level, we now aggregate the exon-level marker statistics for each gene.

The following function, run_edeg, converts exon-level marker results from Seurat into gene-level statistics. It combines the exon p-values of each gene with Stouffer's method, weighting each exon by its length, and computes an average of the absolute log2 fold changes using the same exon-length weights (the MAST_weighted_abs_avg_log2FC column in the output).
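To make the aggregation concrete, here is a minimal sketch of a weighted Stouffer combination and a weighted mean of absolute log2 fold changes. It is not DOLPHIN's implementation: the function names weighted_stouffer and weighted_abs_mean, the one-sided p-value convention, and the toy exon lengths are all assumptions made for illustration. Weights are taken as each exon's fraction of the gene's total exon length, which is consistent with single-exon genes showing a MAST_exon_weight of 1.0 in the output table.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def weighted_stouffer(p_values, weights):
    """Combine one-sided p-values with a weighted Stouffer Z-score (illustrative)."""
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and the same length")
    # z_i = -Phi^{-1}(p_i): a small p-value maps to a large positive z-score.
    # inv_cdf raises StatisticsError unless 0 < p < 1.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]
    # Z = sum(w_i * z_i) / sqrt(sum(w_i^2))
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    # Convert the combined z-score back into a one-sided p-value.
    return _STD_NORMAL.cdf(-combined_z)


def weighted_abs_mean(log2_fcs, weights):
    """Weighted mean of absolute log2 fold changes (illustrative)."""
    return sum(w * abs(fc) for w, fc in zip(weights, log2_fcs)) / sum(weights)


# Hypothetical gene with two exons; lengths, p-values and fold changes are made up.
exon_lengths = [300, 100]
exon_weights = [length / sum(exon_lengths) for length in exon_lengths]
gene_p = weighted_stouffer([0.01, 0.20], exon_weights)
gene_fc = weighted_abs_mean([-2.0, 1.0], exon_weights)
print(gene_p, gene_fc)
```

With a single exon the combined p-value equals the exon's own p-value, which matches the single-exon rows in the table below, where MAST_weighted_stouffer_pval is identical to p_val.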

[1]:
from DOLPHIN.EDEG.generate_EDEG import run_edeg
[2]:
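pd_EDEG = run_edeg(
    seurat_output="./PDAC_MAST_ductal.csv",          # exon-level marker table from Seurat (previous step)
    adata_input="./Feature_PDAC.h5ad",               # AnnData (.h5ad) feature file
    gtf_path="./dolphin_exon_gtf/dolphin.exon.gtf",  # exon-level GTF annotation
    output="./PDAC_MAST_ductal_final.csv",           # path for the gene-level results CSV
)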
/mnt/data/kailu9/DOLPHIN/DOLPHIN/EDEG/generate_EDEG.py:82: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  adata.var["_new"]= adata.var.groupby('gene_id').cumcount() + 1
[3]:
pd_EDEG
[3]:
| | Exon_names | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | Gene_names | MAST_exon_weight | MAST_weighted_abs_avg_log2FC | MAST_weighted_stouffer_pval | MAST_weighted_stouffer_pval_adj_bonf |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FXYD6-FXYD2-1 | 3.076099e-226 | -4.886860 | 0.513 | 1.000 | 1.090148e-220 | FXYD6-FXYD2 | 0.535714 | 3.975477 | 0.000000e+00 | 0.000000e+00 |
| 1 | FXYD2-3 | 3.274657e-226 | -4.886604 | 0.513 | 1.000 | 1.160515e-220 | FXYD2 | 0.123970 | 2.737925 | 0.000000e+00 | 0.000000e+00 |
| 2 | RPS26-4 | 1.653250e-221 | -1.763729 | 0.949 | 0.989 | 5.859002e-216 | RPS26 | 0.351335 | 1.726111 | 0.000000e+00 | 0.000000e+00 |
| 3 | ENSG00000230202-1 | 2.466199e-216 | -1.603401 | 0.954 | 0.966 | 8.740035e-211 | ENSG00000230202 | 1.000000 | 1.603401 | 2.466199e-216 | 2.772007e-213 |
| 4 | RPL34P18-1 | 4.327019e-207 | -1.299259 | 0.983 | 0.994 | 1.533465e-201 | RPL34P18 | 1.000000 | 1.299259 | 4.327019e-207 | 4.863569e-204 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1367 | MMP7-2 | 9.521563e-03 | 0.532423 | 0.534 | 0.534 | 1.000000e+00 | MMP7 | 1.000000 | 0.532423 | 9.521563e-03 | 1.000000e+00 |
| 1368 | RPL10P9-1 | 2.555855e-02 | 1.158921 | 0.990 | 1.000 | 1.000000e+00 | RPL10P9 | 1.000000 | 1.158921 | 2.555855e-02 | 1.000000e+00 |
| 1369 | REG1A-3 | 1.038789e-01 | 0.667930 | 0.155 | 0.102 | 1.000000e+00 | REG1A | 0.489279 | 0.906575 | 1.972519e-05 | 2.217111e-02 |
| 1370 | TSPAN8-1 | 1.534439e-01 | 0.592725 | 0.833 | 0.852 | 1.000000e+00 | TSPAN8 | 0.782946 | 0.574851 | 2.444462e-03 | 1.000000e+00 |
| 1371 | SST-1 | 1.714755e-01 | 2.050056 | 0.208 | 0.148 | 1.000000e+00 | SST | 1.000000 | 2.050056 | 1.714755e-01 | 1.000000e+00 |

1372 rows × 11 columns
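
The MAST_weighted_stouffer_pval_adj_bonf column can be read as a Bonferroni correction of the gene-level p-value: multiply by the number of tests and cap the result at 1. The sketch below assumes that reading. The helper name bonferroni is hypothetical, and the factor of 1124 is not documented anywhere above; it is simply the ratio between the adjusted and unadjusted values in the displayed rows, presumably the number of genes tested.

```python
from math import isclose


def bonferroni(p_value, n_tests):
    """Bonferroni-adjust a p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


# Assumption: 1124 tests, inferred from the ratio of adjusted to raw p-values
# in the displayed rows rather than taken from any documentation.
N_TESTS = 1124

# (gene, gene-level Stouffer p-value, adjusted value shown in the table)
rows = [
    ("ENSG00000230202", 2.466199e-216, 2.772007e-213),
    ("RPL34P18", 4.327019e-207, 4.863569e-204),
    ("REG1A", 1.972519e-05, 2.217111e-02),
    ("MMP7", 9.521563e-03, 1.0),
]

for gene, p_gene, p_adj_shown in rows:
    p_adj = bonferroni(p_gene, N_TESTS)
    # Compare with the table value, allowing for the rounding in the display.
    print(gene, p_adj, isclose(p_adj, p_adj_shown, rel_tol=1e-5))
```

Note that REG1A has no individually significant exon after adjustment (p_val_adj of 1.0 for REG1A-3), yet its combined gene-level p-value remains below 0.05 after Bonferroni correction, which illustrates what aggregating exon evidence by gene can add.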