Alternative Splicing Analysis
Step 1: Convert Outrigger PSI output to .h5ad format
This step converts the Outrigger PSI (percent spliced in) matrix into an .h5ad file for downstream analysis. Missing (NaN) values are preserved, so splicing events that could not be quantified in a cell remain distinguishable from measured PSI values.
[1]:
from DOLPHIN.AS.convert_psi_to_h5ad import run_convert_psi
[ ]:
adata_psi = run_convert_psi(
    metadata_path="./fsla_meta.csv",
    outrigger_path="./outrigger_output",
    out_name="fsla",
    out_directory="./",
)
100%|██████████| 795/795 [05:34<00:00, 2.38it/s]
[3]:
adata_psi
[3]:
AnnData object with n_obs × n_vars = 795 × 9487
obs: 'celltype1', 'celltype2'
var: 'gene_name'
[7]:
adata_psi.to_df().head()
[7]:
| | isoform1=junction:10:100246936-100253420:-|isoform2=junction:10:100250333-100253420:-@exon:10:100250248-100250332:-@junction:10:100246936-100250247:- | isoform1=junction:10:100256477-100260965:-|isoform2=junction:10:100260320-100260965:-@exon:10:100260218-100260319:-@junction:10:100256477-100260217:- | isoform1=junction:10:100489762-100490705:-|isoform2=junction:10:100490323-100490705:-@exon:10:100490008-100490322:-@junction:10:100489762-100490007:- | isoform1=junction:10:100496432-100497666:-|isoform2=junction:10:100497281-100497666:-@exon:10:100497135-100497280:-@junction:10:100496432-100497134:- | isoform1=junction:10:100498208-100499159:-|isoform2=junction:10:100498805-100499159:-@exon:10:100498705-100498804:-@junction:10:100498208-100498704:- | isoform1=junction:10:100516961-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100516961-100526398:- | isoform1=junction:10:100523930-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100523930-100526398:- | isoform1=junction:10:100983818-100986748:-|isoform2=junction:10:100984075-100986748:-@exon:10:100983948-100984074:-@junction:10:100983818-100983947:- | isoform1=junction:10:101611770-101624744:-|isoform2=junction:10:101612478-101624744:-@exon:10:101612337-101612477:-@junction:10:101611770-101612336:- | isoform1=junction:10:101624811-101672914:-|isoform2=junction:10:101667981-101672914:-@exon:10:101667886-101667980:-@junction:10:101624811-101667885:- | ... | isoform1=junction:X:78945496-78960507:+|isoform2=junction:X:78945496-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ | isoform1=junction:X:78947864-78960507:+|isoform2=junction:X:78947864-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ | isoform1=junction:X:79361480-79362941:-|isoform2=junction:X:79362692-79362941:-@exon:X:79362581-79362691:-@junction:X:79361480-79362580:- | isoform1=junction:X:81202246-81276983:+|isoform2=junction:X:81202246-81202436:+@exon:X:81202437-81202576:+@junction:X:81202577-81276983:+ | isoform1=junction:Y:12909408-12912726:+|isoform2=junction:Y:12909408-12911838:+@exon:Y:12911839-12911968:+@junction:Y:12911969-12912726:+ | isoform1=junction:Y:13359987-13366266:-|isoform2=junction:Y:13360529-13366266:-@exon:Y:13360430-13360528:-@junction:Y:13359987-13360429:- | isoform1=junction:Y:19587508-19590082:+|isoform2=junction:Y:19587508-19589520:+@exon:Y:19589521-19589612:+@junction:Y:19589613-19590082:+ | isoform1=junction:Y:19735751-19741317:-|isoform2=junction:Y:19739663-19741317:-@exon:Y:19739528-19739662:-@junction:Y:19735751-19739527:- | isoform1=junction:Y:20582694-20588023:+|isoform2=junction:Y:20582694-20584473:+@exon:Y:20584474-20584524:+@junction:Y:20584525-20588023:+ | isoform1=junction:Y:2854772-2866792:+|isoform2=junction:Y:2854772-2865087:+@exon:Y:2865088-2865245:+@junction:Y:2865246-2866792:+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRR18388386 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | ... | NaN | NaN | 1.0 | 0.000000 | NaN | NaN | NaN | NaN | NaN | 1.0 |
| SRR18387779 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.054945 | NaN | NaN | NaN | NaN | NaN | 1.0 |
| SRR18387770 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | 1.0 | 0.000000 | NaN | NaN | NaN | NaN | NaN | 1.0 |
| SRR18388394 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| SRR18387788 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | ... | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
5 rows × 9487 columns
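Most entries in the preview above are NaN, so it can help to quantify coverage before moving on. The following is a minimal sketch using pandas on a small toy matrix; the event and cell names are hypothetical, and the values are illustrative rather than taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy PSI matrix (cells x events); names and values are illustrative only.
psi = pd.DataFrame(
    {
        "event_A": [np.nan, np.nan, 1.0, np.nan],
        "event_B": [1.0, 1.0, np.nan, 1.0],
        "event_C": [0.0, 0.054945, 0.0, np.nan],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# Number of cells with a quantified (non-NaN) PSI for each event.
cells_per_event = psi.notna().sum(axis=0)

# Fraction of events quantified in each cell.
coverage_per_cell = psi.notna().mean(axis=1)

# Overall fraction of missing entries in the matrix.
missing_fraction = psi.isna().to_numpy().mean()

print(cells_per_event)
print(coverage_per_cell)
print(f"missing fraction: {missing_fraction:.2f}")
```

With the real data, the same summaries could be computed on `adata_psi.to_df()` in place of the toy `psi` frame.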
Step 2: Cell Clustering Using PSI Values
This step prepares the _PSI.h5ad file from Step 1 for cell clustering. PCA and the analyses that build on it require a complete matrix, so missing PSI (NaN) values are imputed with random values between 0 and 1, while observed PSI values are left unchanged. The resulting matrix is saved as a new .h5ad file containing the imputed PSI values.
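The imputation idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code behind run_psi_random: the uniform distribution on [0, 1), the fixed seed, and the function name `impute_random` are all choices made for this example.

```python
import numpy as np

def impute_random(psi, seed=0):
    """Fill NaN entries with uniform draws from [0, 1); keep observed values.

    The uniform distribution and the fixed seed are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    random_fill = rng.random(psi.shape)
    # Use the random draw only where the original value is missing.
    return np.where(np.isnan(psi), random_fill, psi)

# Toy PSI matrix (cells x events); values are illustrative only.
psi = np.array(
    [
        [np.nan, 1.0, 0.0],
        [np.nan, np.nan, 0.054945],
    ]
)
imputed = impute_random(psi)
print(imputed)
```

Fixing the seed keeps the imputed matrix reproducible between runs, which matters when the clustering built on top of it needs to be repeatable.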
[1]:
from DOLPHIN.AS.convert_random_psi import run_psi_random
[ ]:
adata_psi_random = run_psi_random(
    outrigger_psi_data="./alternative_splicing/fsla_PSI.h5ad",
    out_name="fsla",
    out_directory="./",
)
[4]:
adata_psi_random.to_df().head()
[4]:
| | isoform1=junction:10:100246936-100253420:-|isoform2=junction:10:100250333-100253420:-@exon:10:100250248-100250332:-@junction:10:100246936-100250247:- | isoform1=junction:10:100256477-100260965:-|isoform2=junction:10:100260320-100260965:-@exon:10:100260218-100260319:-@junction:10:100256477-100260217:- | isoform1=junction:10:100489762-100490705:-|isoform2=junction:10:100490323-100490705:-@exon:10:100490008-100490322:-@junction:10:100489762-100490007:- | isoform1=junction:10:100496432-100497666:-|isoform2=junction:10:100497281-100497666:-@exon:10:100497135-100497280:-@junction:10:100496432-100497134:- | isoform1=junction:10:100498208-100499159:-|isoform2=junction:10:100498805-100499159:-@exon:10:100498705-100498804:-@junction:10:100498208-100498704:- | isoform1=junction:10:100516961-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100516961-100526398:- | isoform1=junction:10:100523930-100526974:-|isoform2=junction:10:100526555-100526974:-@exon:10:100526399-100526554:-@junction:10:100523930-100526398:- | isoform1=junction:10:100983818-100986748:-|isoform2=junction:10:100984075-100986748:-@exon:10:100983948-100984074:-@junction:10:100983818-100983947:- | isoform1=junction:10:101611770-101624744:-|isoform2=junction:10:101612478-101624744:-@exon:10:101612337-101612477:-@junction:10:101611770-101612336:- | isoform1=junction:10:101624811-101672914:-|isoform2=junction:10:101667981-101672914:-@exon:10:101667886-101667980:-@junction:10:101624811-101667885:- | ... | isoform1=junction:X:78945496-78960507:+|isoform2=junction:X:78945496-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ | isoform1=junction:X:78947864-78960507:+|isoform2=junction:X:78947864-78952192:+@exon:X:78952193-78952335:+@junction:X:78952336-78960507:+ | isoform1=junction:X:79361480-79362941:-|isoform2=junction:X:79362692-79362941:-@exon:X:79362581-79362691:-@junction:X:79361480-79362580:- | isoform1=junction:X:81202246-81276983:+|isoform2=junction:X:81202246-81202436:+@exon:X:81202437-81202576:+@junction:X:81202577-81276983:+ | isoform1=junction:Y:12909408-12912726:+|isoform2=junction:Y:12909408-12911838:+@exon:Y:12911839-12911968:+@junction:Y:12911969-12912726:+ | isoform1=junction:Y:13359987-13366266:-|isoform2=junction:Y:13360529-13366266:-@exon:Y:13360430-13360528:-@junction:Y:13359987-13360429:- | isoform1=junction:Y:19587508-19590082:+|isoform2=junction:Y:19587508-19589520:+@exon:Y:19589521-19589612:+@junction:Y:19589613-19590082:+ | isoform1=junction:Y:19735751-19741317:-|isoform2=junction:Y:19739663-19741317:-@exon:Y:19739528-19739662:-@junction:Y:19735751-19739527:- | isoform1=junction:Y:20582694-20588023:+|isoform2=junction:Y:20582694-20584473:+@exon:Y:20584474-20584524:+@junction:Y:20584525-20588023:+ | isoform1=junction:Y:2854772-2866792:+|isoform2=junction:Y:2854772-2865087:+@exon:Y:2865088-2865245:+@junction:Y:2865246-2866792:+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRR18388386 | 0.548814 | 0.715189 | 0.602763 | 0.544883 | 0.423655 | 0.645894 | 1.000000 | 0.891773 | 0.963663 | 0.383442 | ... | 0.192200 | 0.916999 | 1.000000 | 0.000000 | 0.224325 | 0.646099 | 0.377303 | 0.239175 | 0.843921 | 1.0 |
| SRR18387779 | 0.471649 | 0.285935 | 0.872293 | 0.419384 | 0.465397 | 0.191993 | 1.000000 | 0.549905 | 0.656898 | 0.418817 | ... | 0.690972 | 0.570053 | 0.554895 | 0.054945 | 0.350364 | 0.765937 | 0.074863 | 0.808629 | 0.241341 | 1.0 |
| SRR18387770 | 0.467157 | 0.207176 | 0.913840 | 0.688435 | 0.001312 | 0.802888 | 0.192368 | 0.410850 | 0.828048 | 0.916628 | ... | 0.655658 | 0.150540 | 1.000000 | 0.000000 | 0.511076 | 0.095635 | 0.835670 | 0.217615 | 0.790823 | 1.0 |
| SRR18388394 | 0.921630 | 0.576743 | 0.486409 | 0.646680 | 0.844161 | 0.301350 | 1.000000 | 0.214558 | 0.589372 | 0.956229 | ... | 0.258662 | 0.366379 | 0.270519 | 0.613285 | 0.829367 | 0.948184 | 0.816119 | 0.677352 | 0.157222 | 1.0 |
| SRR18387788 | 0.978044 | 0.774697 | 0.780661 | 0.542142 | 0.946817 | 0.996528 | 1.000000 | 0.841271 | 0.634649 | 0.234987 | ... | 0.359507 | 0.236903 | 1.000000 | 0.950538 | 0.375510 | 0.247062 | 0.320256 | 0.047280 | 0.274863 | 1.0 |
5 rows × 9487 columns
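Once the matrix has no missing values, PCA can be applied to it. The sketch below computes principal component scores with a plain NumPy SVD on a random stand-in matrix; it only illustrates why a complete matrix is needed and is not the clustering pipeline used by DOLPHIN. The function name `pca_scores` and the toy dimensions are assumptions of this example.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Project rows (cells) onto the top principal components via SVD."""
    # Center each column (event) so components capture variance, not the mean.
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Scores are the left singular vectors scaled by the singular values.
    return u[:, :n_components] * s[:n_components]

# Stand-in for an imputed PSI matrix: 6 cells x 5 events, values in [0, 1).
rng = np.random.default_rng(0)
imputed = rng.random((6, 5))

scores = pca_scores(imputed, n_components=2)
print(scores.shape)
```

The resulting scores could then be passed to any clustering method that works on a cells-by-components matrix.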
Step 3: Differential Alternative Splicing Analysis
In this step, we perform differential alternative splicing analysis using the Wilcoxon test. Splicing events with a valid PSI in fewer than 10 cells are filtered out. Missing PSI (NaN) values are imputed at the cluster level: for each cell cluster, we calculate the mean PSI across all available events and use this value to fill the NaNs in that cluster’s cells. Events with sparse coverage therefore still receive an imputed value that reflects the overall splicing profile of their cluster.
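The cluster-level imputation described above can be sketched as follows. This example takes the description literally and uses one mean per cluster, computed over every observed PSI value in that cluster; whether run_differential_as does exactly this is an assumption. The function name `impute_cluster_mean`, the cell names, and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def impute_cluster_mean(psi, clusters):
    """Fill NaNs in each cluster's cells with that cluster's overall mean PSI.

    `clusters` is assumed to be aligned with the rows of `psi`.
    """
    filled = psi.copy()
    for label in clusters.unique():
        in_cluster = (clusters == label).to_numpy()
        block = psi.loc[in_cluster]
        # Mean over all observed (non-NaN) values of all events in the cluster.
        cluster_mean = np.nanmean(block.to_numpy())
        filled.loc[in_cluster] = block.fillna(cluster_mean)
    return filled

# Toy PSI matrix (cells x events) and cluster labels; values are illustrative.
psi = pd.DataFrame(
    {
        "event_A": [1.0, np.nan, 0.0, np.nan],
        "event_B": [np.nan, 0.5, np.nan, 0.2],
    },
    index=["c1", "c2", "c3", "c4"],
)
clusters = pd.Series(["T", "T", "B", "B"], index=psi.index, name="celltype1")

filled = impute_cluster_mean(psi, clusters)
print(filled)
```

Unlike the random fill used for clustering, this imputation is deterministic, so repeated runs of the differential test see the same input.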
[1]:
from DOLPHIN.AS.generate_differential_as import run_differential_as
[ ]:
adata_psi_DAS = run_differential_as(
    outrigger_psi_data="./alternative_splicing/fsla_PSI.h5ad",
    out_name="fsla",
    cluster_name="celltype1",
    out_directory="./",
)
Total number of splicing events before filtering: 9487
Number of splicing events after filtering (>= 10 cells with valid PSI): 4978
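The filtering reported in the output above, followed by a two-group comparison, can be sketched like this. The filter mirrors the stated rule (at least 10 cells with a valid PSI). The test is a hand-written two-sided Wilcoxon rank-sum test with a normal approximation and tie correction, without continuity correction; it is an assumption that this matches the variant DOLPHIN runs, so p-values may differ slightly. The function names and toy data are hypothetical.

```python
import math
import numpy as np
import pandas as pd

def filter_events(psi, min_cells=10):
    """Keep events with a valid (non-NaN) PSI in at least `min_cells` cells."""
    keep = psi.notna().sum(axis=0) >= min_cells
    return psi.loc[:, keep]

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = np.concatenate([x, y])
    ranks = pd.Series(pooled).rank(method="average").to_numpy()
    # Mann-Whitney U statistic for the first sample.
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    # Variance of U, reduced for tied values.
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = (counts**3 - counts).sum() / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Toy data: 12 cells, one well-covered event and one sparse event.
psi = pd.DataFrame(
    {
        "event_A": np.linspace(0.0, 1.0, 12),
        "event_B": [0.5] * 3 + [np.nan] * 9,
    }
)
filtered = filter_events(psi, min_cells=10)

# Compare the first six cells against the last six for the retained event.
u_stat, p_value = rank_sum_test(
    filtered["event_A"].iloc[:6], filtered["event_A"].iloc[6:]
)
print(list(filtered.columns), u_stat, p_value)
```

In a real analysis the two groups would be the cells of one cluster versus the remaining cells, and the p-values across events would normally be adjusted for multiple testing.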