Alternative Splicing Analysis

Step 1: Convert Outrigger PSI output to .h5ad format

This step converts the Outrigger PSI matrix into an .h5ad file for downstream analysis. Missing values (NaN) are kept as they are, so that splicing events that could not be quantified in a cell stay distinguishable from measured PSI values.

[1]:
from DOLPHIN.AS.convert_psi_to_h5ad import run_convert_psi
[ ]:
adata_psi = run_convert_psi(
    metadata_path="./fsla_meta.csv",
    outrigger_path="./outrigger_output",
    out_name="fsla",
    out_directory="./"
)
100%|██████████| 795/795 [05:34<00:00,  2.38it/s]
[3]:
adata_psi
[3]:
AnnData object with n_obs × n_vars = 795 × 9487
    obs: 'celltype1', 'celltype2'
    var: 'gene_name'
[7]:
adata_psi.to_df().head()
[7]:
isoform1=junction:10:100246936-100253420:-|isoform2=junction:10:100250333-100253420:-@exon:10:100250248-100250332:-@junction:10:100246936-100250247:- isoform1=junction:10:100256477-100260965:-|isoform2=junction:10:100260320-100260965:-@exon:10:100260218-100260319:-@junction:10:100256477-100260217:- isoform1=junction:10:100489762-100490705:-|isoform2=junction:10:100490323-100490705:-@exon:10:100490008-100490322:-@junction:10:100489762-100490007:- isoform1=junction:10:100496432-100497666:-|isoform2=junction:10:100497281-100497666:-@exon:10:100497135-100497280:-@junction:10:100496432-100497134:- isoform1=junction:10:100498208-100499159:-|isoform2=junction:10:100498805-100499159:-@exon:10:100498705-100498804:-@junction:10:100498208-100498704:- isoform1=junction:10:100516961-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100516961-100526398:- isoform1=junction:10:100523930-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100523930-100526398:- isoform1=junction:10:100983818-100986748:-|isoform2=junction:10:100984075-100986748:-@exon:10:100983948-100984074:-@junction:10:100983818-100983947:- isoform1=junction:10:101611770-101624744:-|isoform2=junction:10:101612478-101624744:-@exon:10:101612337-101612477:-@junction:10:101611770-101612336:- isoform1=junction:10:101624811-101672914:-|isoform2=junction:10:101667981-101672914:-@exon:10:101667886-101667980:-@junction:10:101624811-101667885:- ... 
isoform1=junction:X:78945496-78960507:+|isoform2=junction:X:78945496-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ isoform1=junction:X:78947864-78960507:+|isoform2=junction:X:78947864-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ isoform1=junction:X:79361480-79362941:-|isoform2=junction:X:79362692-79362941:-@exon:X:79362581-79362691:-@junction:X:79361480-79362580:- isoform1=junction:X:81202246-81276983:+|isoform2=junction:X:81202246-81202436:+@exon:X:81202437-81202576:+@junction:X:81202577-81276983:+ isoform1=junction:Y:12909408-12912726:+|isoform2=junction:Y:12909408-12911838:+@exon:Y:12911839-12911968:+@junction:Y:12911969-12912726:+ isoform1=junction:Y:13359987-13366266:-|isoform2=junction:Y:13360529-13366266:-@exon:Y:13360430-13360528:-@junction:Y:13359987-13360429:- isoform1=junction:Y:19587508-19590082:+|isoform2=junction:Y:19587508-19589520:+@exon:Y:19589521-19589612:+@junction:Y:19589613-19590082:+ isoform1=junction:Y:19735751-19741317:-|isoform2=junction:Y:19739663-19741317:-@exon:Y:19739528-19739662:-@junction:Y:19735751-19739527:- isoform1=junction:Y:20582694-20588023:+|isoform2=junction:Y:20582694-20584473:+@exon:Y:20584474-20584524:+@junction:Y:20584525-20588023:+ isoform1=junction:Y:2854772-2866792:+|isoform2=junction:Y:2854772-2865087:+@exon:Y:2865088-2865245:+@junction:Y:2865246-2866792:+
SRR18388386 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN ... NaN NaN 1.0 0.000000 NaN NaN NaN NaN NaN 1.0
SRR18387779 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN ... NaN NaN NaN 0.054945 NaN NaN NaN NaN NaN 1.0
SRR18387770 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 1.0 0.000000 NaN NaN NaN NaN NaN 1.0
SRR18388394 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0
SRR18387788 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN ... NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0

5 rows × 9487 columns
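
Since most entries in the preview above are NaN, it can help to check how sparse the PSI matrix is before continuing. The following is a minimal sketch, not part of DOLPHIN: the small DataFrame and its cell and event names are made up and only stand in for adata_psi.to_df().

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for adata_psi.to_df():
# rows are cells, columns are splicing events, NaN means "not quantified".
psi = pd.DataFrame(
    {
        "event_a": [np.nan, np.nan, 1.0, np.nan],
        "event_b": [1.0, 0.054945, 0.0, np.nan],
        "event_c": [1.0, 1.0, 1.0, 1.0],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# Number of cells in which each event has a valid (non-NaN) PSI.
valid_cells_per_event = psi.notna().sum(axis=0)

# Fraction of missing entries over the whole matrix.
missing_fraction = psi.isna().to_numpy().mean()

print(valid_cells_per_event)
print(f"missing fraction: {missing_fraction:.2f}")
```

On the real data, the same two lines can be applied to adata_psi.to_df() directly.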

Step 2: Cell Clustering Using PSI Values

This step prepares the _PSI.h5ad file from Step 1 for cell clustering. PCA and similar downstream methods cannot handle missing entries, so every missing PSI value (NaN) is replaced with a random value between 0 and 1, while observed PSI values are left unchanged. The resulting matrix is saved as a new .h5ad file containing the imputed PSI values.

[1]:
from DOLPHIN.AS.convert_random_psi import run_psi_random
[ ]:
adata_psi_random = run_psi_random(
    outrigger_psi_data="./alternative_splicing/fsla_PSI.h5ad",
    out_name="fsla",
    out_directory="./"
)
[4]:
adata_psi_random.to_df().head()
[4]:
isoform1=junction:10:100246936-100253420:-|isoform2=junction:10:100250333-100253420:-@exon:10:100250248-100250332:-@junction:10:100246936-100250247:- isoform1=junction:10:100256477-100260965:-|isoform2=junction:10:100260320-100260965:-@exon:10:100260218-100260319:-@junction:10:100256477-100260217:- isoform1=junction:10:100489762-100490705:-|isoform2=junction:10:100490323-100490705:-@exon:10:100490008-100490322:-@junction:10:100489762-100490007:- isoform1=junction:10:100496432-100497666:-|isoform2=junction:10:100497281-100497666:-@exon:10:100497135-100497280:-@junction:10:100496432-100497134:- isoform1=junction:10:100498208-100499159:-|isoform2=junction:10:100498805-100499159:-@exon:10:100498705-100498804:-@junction:10:100498208-100498704:- isoform1=junction:10:100516961-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100516961-100526398:- isoform1=junction:10:100523930-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100523930-100526398:- isoform1=junction:10:100983818-100986748:-|isoform2=junction:10:100984075-100986748:-@exon:10:100983948-100984074:-@junction:10:100983818-100983947:- isoform1=junction:10:101611770-101624744:-|isoform2=junction:10:101612478-101624744:-@exon:10:101612337-101612477:-@junction:10:101611770-101612336:- isoform1=junction:10:101624811-101672914:-|isoform2=junction:10:101667981-101672914:-@exon:10:101667886-101667980:-@junction:10:101624811-101667885:- ... 
isoform1=junction:X:78945496-78960507:+|isoform2=junction:X:78945496-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ isoform1=junction:X:78947864-78960507:+|isoform2=junction:X:78947864-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ isoform1=junction:X:79361480-79362941:-|isoform2=junction:X:79362692-79362941:-@exon:X:79362581-79362691:-@junction:X:79361480-79362580:- isoform1=junction:X:81202246-81276983:+|isoform2=junction:X:81202246-81202436:+@exon:X:81202437-81202576:+@junction:X:81202577-81276983:+ isoform1=junction:Y:12909408-12912726:+|isoform2=junction:Y:12909408-12911838:+@exon:Y:12911839-12911968:+@junction:Y:12911969-12912726:+ isoform1=junction:Y:13359987-13366266:-|isoform2=junction:Y:13360529-13366266:-@exon:Y:13360430-13360528:-@junction:Y:13359987-13360429:- isoform1=junction:Y:19587508-19590082:+|isoform2=junction:Y:19587508-19589520:+@exon:Y:19589521-19589612:+@junction:Y:19589613-19590082:+ isoform1=junction:Y:19735751-19741317:-|isoform2=junction:Y:19739663-19741317:-@exon:Y:19739528-19739662:-@junction:Y:19735751-19739527:- isoform1=junction:Y:20582694-20588023:+|isoform2=junction:Y:20582694-20584473:+@exon:Y:20584474-20584524:+@junction:Y:20584525-20588023:+ isoform1=junction:Y:2854772-2866792:+|isoform2=junction:Y:2854772-2865087:+@exon:Y:2865088-2865245:+@junction:Y:2865246-2866792:+
SRR18388386 0.548814 0.715189 0.602763 0.544883 0.423655 0.645894 1.000000 0.891773 0.963663 0.383442 ... 0.192200 0.916999 1.000000 0.000000 0.224325 0.646099 0.377303 0.239175 0.843921 1.0
SRR18387779 0.471649 0.285935 0.872293 0.419384 0.465397 0.191993 1.000000 0.549905 0.656898 0.418817 ... 0.690972 0.570053 0.554895 0.054945 0.350364 0.765937 0.074863 0.808629 0.241341 1.0
SRR18387770 0.467157 0.207176 0.913840 0.688435 0.001312 0.802888 0.192368 0.410850 0.828048 0.916628 ... 0.655658 0.150540 1.000000 0.000000 0.511076 0.095635 0.835670 0.217615 0.790823 1.0
SRR18388394 0.921630 0.576743 0.486409 0.646680 0.844161 0.301350 1.000000 0.214558 0.589372 0.956229 ... 0.258662 0.366379 0.270519 0.613285 0.829367 0.948184 0.816119 0.677352 0.157222 1.0
SRR18387788 0.978044 0.774697 0.780661 0.542142 0.946817 0.996528 1.000000 0.841271 0.634649 0.234987 ... 0.359507 0.236903 1.000000 0.950538 0.375510 0.247062 0.320256 0.047280 0.274863 1.0

5 rows × 9487 columns
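
The random imputation described in this step can be illustrated with a few lines of NumPy. This is a sketch of the general idea only and makes no claim about how run_psi_random is implemented: the function name impute_random_psi, the seed, the choice of a uniform distribution, and the toy matrix are all assumptions.

```python
import numpy as np
import pandas as pd


def impute_random_psi(psi: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Fill NaN entries with uniform random draws from [0, 1).

    Hypothetical helper for illustration; observed PSI values are kept as-is.
    """
    rng = np.random.default_rng(seed)
    values = psi.to_numpy(dtype=float, copy=True)
    mask = np.isnan(values)
    # Draw exactly one random value per missing entry.
    values[mask] = rng.random(int(mask.sum()))
    return pd.DataFrame(values, index=psi.index, columns=psi.columns)


# Toy matrix (made-up cell and event names): rows are cells, columns are events.
psi = pd.DataFrame(
    {"event_a": [np.nan, 1.0, np.nan], "event_b": [0.054945, np.nan, 0.0]},
    index=["cell_1", "cell_2", "cell_3"],
)
psi_random = impute_random_psi(psi)
print(psi_random)
```

Because the filled matrix has no NaN left, it can be passed to PCA or a neighbor graph; fixing the seed keeps the imputed values reproducible between runs.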

Step 3: Differential Alternative Splicing Analysis

In this step, we perform differential alternative splicing analysis using the Wilcoxon test. To enable this, missing PSI values (NaN) are imputed per cell cluster: for each cluster, we calculate the mean PSI across all available (non-NaN) events and use this single value to fill every NaN in that cluster’s cells. Events with sparse coverage therefore still receive an imputed value that reflects the overall splicing profile of their cluster. As the log output of the call reports, only events with a valid PSI in at least 10 cells are retained.

[1]:
from DOLPHIN.AS.generate_differential_as import run_differential_as
[ ]:
adata_psi_DAS = run_differential_as(
    outrigger_psi_data="./alternative_splicing/fsla_PSI.h5ad",
    out_name="fsla",
    cluster_name="celltype1",
    out_directory="./"
)
Total number of splicing events before filtering: 9487
Number of splicing events after filtering (>= 10 cells with valid PSI): 4978
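
The filtering, cluster-mean imputation, and Wilcoxon test described in this step can be sketched on a toy matrix as follows. This is an illustration under stated assumptions, not the code behind run_differential_as: the helper names filter_events and impute_cluster_mean are hypothetical, the order "filter first, then impute" is assumed, and scipy.stats.ranksums is used here as one possible Wilcoxon rank-sum implementation.

```python
import numpy as np
import pandas as pd
from scipy.stats import ranksums


def filter_events(psi: pd.DataFrame, min_cells: int = 10) -> pd.DataFrame:
    """Keep events (columns) with a valid PSI in at least `min_cells` cells."""
    return psi.loc[:, psi.notna().sum(axis=0) >= min_cells]


def impute_cluster_mean(psi: pd.DataFrame, clusters: pd.Series) -> pd.DataFrame:
    """Fill NaNs with one value per cluster: the mean of all observed PSI
    values (over all events) in that cluster's cells."""
    filled = psi.copy()
    for label in clusters.unique():
        rows = (clusters == label).to_numpy()
        cluster_mean = np.nanmean(psi.loc[rows].to_numpy(dtype=float))
        filled.loc[rows] = psi.loc[rows].fillna(cluster_mean)
    return filled


# Toy data with made-up names: 6 cells in two clusters, 2 events.
psi = pd.DataFrame(
    {
        "event_a": [1.0, np.nan, 1.0, 0.0, 0.0, np.nan],
        "event_b": [np.nan, np.nan, 0.5, np.nan, np.nan, np.nan],
    },
    index=[f"cell_{i}" for i in range(6)],
)
clusters = pd.Series(["A", "A", "A", "B", "B", "B"], index=psi.index, name="celltype1")

# A threshold of 3 is used only because the toy matrix is tiny.
kept = filter_events(psi, min_cells=3)  # event_b has 1 valid cell and is dropped
imputed = impute_cluster_mean(kept, clusters)

# Rank-sum test for one event: cluster A versus cluster B.
in_a = (clusters == "A").to_numpy()
stat, p_value = ranksums(imputed.loc[in_a, "event_a"], imputed.loc[~in_a, "event_a"])
print(f"statistic={stat:.3f}, p={p_value:.4f}")
```

Note that imputing a constant per cluster creates tied values inside each cluster, so events with very few observed cells should be interpreted with care even after filtering.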