Step 2: Identify Gene-Level Markers Using Exon-Level Information
The previous step identified exon-level markers. To assess differential expression at the gene level, we now aggregate the exon marker results by gene.
The run_edeg function below converts the exon-level marker results from Seurat into gene-level statistics. It combines the exon p-values of each gene with the Stouffer method, weighting each exon by its length, and computes an average log2 fold change per gene using the same exon-length weights.
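To make the aggregation concrete, here is a minimal sketch of a weighted Stouffer combination and a weighted average of absolute log2 fold changes, using only the Python standard library. It is an illustration under stated assumptions, not DOLPHIN's implementation: the function names `weighted_stouffer` and `weighted_abs_log2fc` are hypothetical, the p-values are treated as one-sided, and the weights stand in for exon-length fractions. The actual `run_edeg` code may normalize weights or handle fold-change signs differently.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_stouffer(p_values, weights):
    """Combine p-values with the weighted Stouffer Z-score method (hypothetical helper)."""
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and of equal length")
    # z_i = Phi^{-1}(1 - p_i), computed as -Phi^{-1}(p_i) to stay accurate for tiny p.
    # Each p must lie strictly between 0 and 1.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]
    # Z = sum(w_i * z_i) / sqrt(sum(w_i^2))
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    # Upper-tail probability: 1 - Phi(Z) = Phi(-Z)
    return _STD_NORMAL.cdf(-combined_z)


def weighted_abs_log2fc(log2fcs, weights):
    """Weighted mean of absolute log2 fold changes (assumed form, hypothetical helper)."""
    return sum(w * abs(fc) for w, fc in zip(weights, log2fcs)) / sum(weights)


# Illustrative values only: two exons of one gene, weights as length fractions.
exon_pvals = [0.05, 0.05]
exon_log2fc = [-2.0, 1.0]
exon_weights = [0.75, 0.25]

gene_pval = weighted_stouffer(exon_pvals, exon_weights)
gene_log2fc = weighted_abs_log2fc(exon_log2fc, exon_weights)
print(gene_pval, gene_log2fc)
```

With a single exon of weight 1.0, this combination returns the exon's own p-value, which is consistent with the single-exon rows in the output table further down (for example ENSG00000230202-1, where `MAST_weighted_stouffer_pval` equals `p_val`).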
[1]:
from DOLPHIN.EDEG.generate_EDEG import run_edeg
[2]:
pd_EDEG = run_edeg(
    seurat_output="./PDAC_MAST_ductal.csv",
    adata_input="./Feature_PDAC.h5ad",
    gtf_path="./dolphin_exon_gtf/dolphin.exon.gtf",
    output="./PDAC_MAST_ductal_final.csv",
)
/mnt/data/kailu9/DOLPHIN/DOLPHIN/EDEG/generate_EDEG.py:82: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
adata.var["_new"]= adata.var.groupby('gene_id').cumcount() + 1
[3]:
pd_EDEG
[3]:
| | Exon_names | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | Gene_names | MAST_exon_weight | MAST_weighted_abs_avg_log2FC | MAST_weighted_stouffer_pval | MAST_weighted_stouffer_pval_adj_bonf |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FXYD6-FXYD2-1 | 3.076099e-226 | -4.886860 | 0.513 | 1.000 | 1.090148e-220 | FXYD6-FXYD2 | 0.535714 | 3.975477 | 0.000000e+00 | 0.000000e+00 |
| 1 | FXYD2-3 | 3.274657e-226 | -4.886604 | 0.513 | 1.000 | 1.160515e-220 | FXYD2 | 0.123970 | 2.737925 | 0.000000e+00 | 0.000000e+00 |
| 2 | RPS26-4 | 1.653250e-221 | -1.763729 | 0.949 | 0.989 | 5.859002e-216 | RPS26 | 0.351335 | 1.726111 | 0.000000e+00 | 0.000000e+00 |
| 3 | ENSG00000230202-1 | 2.466199e-216 | -1.603401 | 0.954 | 0.966 | 8.740035e-211 | ENSG00000230202 | 1.000000 | 1.603401 | 2.466199e-216 | 2.772007e-213 |
| 4 | RPL34P18-1 | 4.327019e-207 | -1.299259 | 0.983 | 0.994 | 1.533465e-201 | RPL34P18 | 1.000000 | 1.299259 | 4.327019e-207 | 4.863569e-204 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1367 | MMP7-2 | 9.521563e-03 | 0.532423 | 0.534 | 0.534 | 1.000000e+00 | MMP7 | 1.000000 | 0.532423 | 9.521563e-03 | 1.000000e+00 |
| 1368 | RPL10P9-1 | 2.555855e-02 | 1.158921 | 0.990 | 1.000 | 1.000000e+00 | RPL10P9 | 1.000000 | 1.158921 | 2.555855e-02 | 1.000000e+00 |
| 1369 | REG1A-3 | 1.038789e-01 | 0.667930 | 0.155 | 0.102 | 1.000000e+00 | REG1A | 0.489279 | 0.906575 | 1.972519e-05 | 2.217111e-02 |
| 1370 | TSPAN8-1 | 1.534439e-01 | 0.592725 | 0.833 | 0.852 | 1.000000e+00 | TSPAN8 | 0.782946 | 0.574851 | 2.444462e-03 | 1.000000e+00 |
| 1371 | SST-1 | 1.714755e-01 | 2.050056 | 0.208 | 0.148 | 1.000000e+00 | SST | 1.000000 | 2.050056 | 1.714755e-01 | 1.000000e+00 |
1372 rows × 11 columns
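The result keeps one row per exon, with the gene-level columns attached to each row. A gene-level table can be sketched as follows, assuming the gene-level values are identical across all exon rows of the same gene. The small DataFrame reuses a few rows and columns from the output above so the snippet runs on its own; the cutoffs `PADJ_CUTOFF` and `LOG2FC_CUTOFF` are hypothetical and should be chosen for your own analysis.

```python
import pandas as pd

# A few rows copied from the output above (subset of columns).
pd_EDEG = pd.DataFrame(
    {
        "Exon_names": ["FXYD2-3", "ENSG00000230202-1", "REG1A-3", "TSPAN8-1", "SST-1"],
        "Gene_names": ["FXYD2", "ENSG00000230202", "REG1A", "TSPAN8", "SST"],
        "MAST_weighted_abs_avg_log2FC": [2.737925, 1.603401, 0.906575, 0.574851, 2.050056],
        "MAST_weighted_stouffer_pval_adj_bonf": [0.0, 2.772007e-213, 2.217111e-02, 1.0, 1.0],
    }
)

gene_cols = [
    "Gene_names",
    "MAST_weighted_abs_avg_log2FC",
    "MAST_weighted_stouffer_pval_adj_bonf",
]
# Assumption: gene-level columns repeat on every exon row, so keep one row per gene.
gene_level = pd_EDEG[gene_cols].drop_duplicates(subset="Gene_names")

# Hypothetical thresholds, not prescribed by the source.
PADJ_CUTOFF = 0.05
LOG2FC_CUTOFF = 1.0

significant = gene_level[
    (gene_level["MAST_weighted_stouffer_pval_adj_bonf"] < PADJ_CUTOFF)
    & (gene_level["MAST_weighted_abs_avg_log2FC"] > LOG2FC_CUTOFF)
]
print(significant["Gene_names"].tolist())
```

Note that REG1A-3 has an exon-level `p_val_adj` of 1.0 in the output above, yet its gene-level adjusted p-value is about 0.022: combining evidence across exons can surface genes that no single exon flags on its own.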