Alternative Splicing Analysis
DOLPHIN performs alternative splicing analysis using the Outrigger module from the Expedition toolkit. This method quantifies alternative splicing events from aggregated BAM files and computes percent spliced-in (PSI) values for exon-exon junctions [1].
- DOLPHIN.AS.convert_psi_to_h5ad.run_convert_psi(metadata_path, outrigger_path, out_name, out_directory='./')
Convert Outrigger PSI output into an AnnData object for alternative splicing analysis.
This function processes PSI (Percent Spliced In) values generated by Outrigger, constructs a sample (cell) by event matrix, annotates events with gene names, and returns a fully structured AnnData object for downstream analysis.
- Parameters:
metadata_path (str) – Path to the metadata file. Must be a tab-separated file containing a column named ‘CB’ for cell barcodes.
outrigger_path (str) – Path to the directory containing Outrigger results. This directory must include:
- “psi/outrigger_summary.csv”: PSI values per cell and event.
- “index/se/validated/events.csv” and “index/mxe/validated/events.csv”: Event-to-gene mapping.
out_name (str) – Output filename prefix (without extension). Final output will be named “<out_name>_PSI.h5ad”.
out_directory (str, optional) – Directory where the output .h5ad file will be saved. Default is the current directory (“./”).
- Returns:
adata – A structured AnnData object with the following components:
- X (np.ndarray): The PSI matrix (cells by events), with PSI values for each splicing event.
- obs (pandas.DataFrame): Cell-level metadata, with cell barcodes and additional columns from the input metadata file.
- var (pandas.DataFrame): Event-level metadata with gene names.
The output adata will be saved as: <out_directory>/alternative_splicing/<out_name>_PSI.h5ad.
- Return type:
anndata.AnnData
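The cell-by-event construction described above can be illustrated with a small pandas sketch. This is not DOLPHIN's implementation: the column names `sample_id`, `event_id`, and `psi`, the `cluster` metadata column, and the helper `build_psi_matrix` are assumptions made for illustration, and the toy data stands in for the Outrigger summary file.

```python
import numpy as np
import pandas as pd

# Toy long-format PSI records; the column names are assumed, not taken from DOLPHIN.
long_psi = pd.DataFrame({
    "sample_id": ["cell_A", "cell_A", "cell_B", "cell_C"],
    "event_id": ["event_1", "event_2", "event_1", "event_2"],
    "psi": [0.9, 0.2, 0.5, 1.0],
})

# Hypothetical metadata table with the required 'CB' barcode column.
metadata = pd.DataFrame({
    "CB": ["cell_A", "cell_B", "cell_C"],
    "cluster": ["c1", "c1", "c2"],
})


def build_psi_matrix(long_psi, barcodes):
    """Pivot long-format PSI records into a cells-by-events table."""
    wide = long_psi.pivot(index="sample_id", columns="event_id", values="psi")
    # Align rows to the metadata order; undetected events stay NaN.
    return wide.reindex(list(barcodes))


psi_matrix = build_psi_matrix(long_psi, metadata["CB"])
print(psi_matrix)
```

Events that were not measured in a cell remain NaN in the resulting matrix, which is why the imputation functions documented below exist.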
- DOLPHIN.AS.convert_random_psi.run_psi_random(outrigger_psi_data, out_name, out_directory='./', seed_num=0)
Randomly impute missing PSI values for downstream clustering analysis.
This function reads an AnnData <out_name>_PSI.h5ad file containing PSI values generated by Outrigger. It replaces all missing values (NaNs) in the PSI matrix with random values uniformly sampled between 0 and 1, enabling compatibility with dimensionality reduction and clustering methods (e.g., PCA, Leiden). The output is a new .h5ad file with the same structure but with imputed values.
- Parameters:
outrigger_psi_data (str) – Path to the input <out_name>_PSI.h5ad file containing PSI values with NaNs.
out_directory (str, optional) – Directory where the output .h5ad file will be saved. Default is the current directory (“./”).
out_name (str) – Output filename prefix (without extension). The result will be saved as “<out_name>_PSI_random.h5ad”.
seed_num (int, optional) – Random seed for reproducibility. Default is 0.
- Returns:
adata – An AnnData object with random-imputed PSI values. The file is saved to: <out_directory>/alternative_splicing/<out_name>_PSI_random.h5ad.
- Return type:
anndata.AnnData
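As a minimal sketch of the idea, the following NumPy snippet replaces NaNs with seeded uniform draws from [0, 1). The helper name `impute_random` and the use of `numpy.random.default_rng` are assumptions for illustration; the random number generator DOLPHIN uses internally may differ, so the imputed values will not necessarily match its output.

```python
import numpy as np


def impute_random(psi, seed=0):
    """Return a copy of `psi` with NaNs replaced by uniform draws from [0, 1)."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    filled = np.array(psi, dtype=float, copy=True)
    missing = np.isnan(filled)
    # Draw exactly one value per missing entry; observed values are untouched.
    filled[missing] = rng.uniform(0.0, 1.0, size=int(missing.sum()))
    return filled


# Toy cells-by-events PSI matrix with missing values.
psi = np.array([[0.1, np.nan],
                [np.nan, 0.7]])
imputed = impute_random(psi, seed=0)
print(imputed)
```

Because the filled values are random, this matrix is suited to exploratory steps such as PCA or clustering rather than to statistical comparison of PSI values.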
- DOLPHIN.AS.generate_differential_as.run_differential_as(outrigger_psi_data, out_name, cluster_name, out_directory='./', n_cell=10)
Impute missing PSI values for downstream differential alternative splicing analysis.
Missing PSI values are filled with the cluster-wise mean: the average PSI is calculated across all splicing events within each cell cluster and used to replace NaN values, so that sparsely detected events still receive a representative value. Only splicing events detected in at least n_cell cells are retained for analysis.
- Parameters:
outrigger_psi_data (str) – Path to the input <out_name>_PSI.h5ad file containing PSI values with NaNs.
cluster_name (str) – Column name in adata.obs of <out_name>_PSI.h5ad that contains cell cluster labels.
out_directory (str, optional) – Directory where the output .h5ad file will be saved. Default is the current directory (“./”).
n_cell (int, optional) – Minimum number of cells in which a splicing event must be detected to be retained. Default is 10.
out_name (str) – Prefix for the output file. The result will be saved as “<out_name>_PSI_DAS.h5ad”.
- Returns:
adata – AnnData object with cluster-mean-imputed PSI values, ready for differential splicing analysis. Saved to: <out_directory>/alternative_splicing/<out_name>_PSI_DAS.h5ad.
- Return type:
anndata.AnnData
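The filtering and cluster-mean imputation can be sketched in NumPy as follows. This is one possible reading, not DOLPHIN's code: it assumes the mean is taken per event over the cells of a cluster in which that event was detected, and the helper name `impute_cluster_mean` is hypothetical. Events with no detected value in a cluster are left as NaN in this sketch.

```python
import numpy as np


def impute_cluster_mean(psi, clusters, n_cell=10):
    """Filter events by detection count, then fill NaNs with per-cluster event means."""
    psi = np.asarray(psi, dtype=float)
    clusters = np.asarray(clusters)
    # Keep events detected (non-NaN) in at least n_cell cells.
    keep = (~np.isnan(psi)).sum(axis=0) >= n_cell
    filled = psi[:, keep].copy()
    for label in np.unique(clusters):
        rows = np.where(clusters == label)[0]
        block = filled[rows]
        observed = ~np.isnan(block)
        counts = observed.sum(axis=0)
        sums = np.where(observed, block, 0.0).sum(axis=0)
        # Per-event mean over detected cells; NaN where the cluster has none.
        means = np.divide(sums, counts,
                          out=np.full(counts.shape, np.nan),
                          where=counts > 0)
        filled[rows] = np.where(observed, block, means)
    return filled, keep


# Toy matrix: 4 cells, 3 events, two clusters.
psi = np.array([[0.2, np.nan, np.nan],
                [0.4, 0.6, np.nan],
                [np.nan, 1.0, np.nan],
                [0.8, np.nan, 0.5]])
clusters = np.array(["a", "a", "b", "b"])
filled, keep = impute_cluster_mean(psi, clusters, n_cell=2)
print(filled)
```

In the toy data the third event is detected in only one cell, so it is dropped when n_cell is 2.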
- DOLPHIN.AS.convert_modality_ohe.run_modality_ohe(anchor_output, adata_psi, cluster_name, out_directory)
Convert splicing modality output from Anchor into a one-hot encoded matrix and save as AnnData.
This function processes the modality classification of splicing events generated by the Anchor module of Expedition (https://github.com/yeolab/anchor). It converts each categorical modality (e.g., included, excluded, bimodal) into numerical values and aligns them to cells using the provided cluster labels in the adata_psi.
- Parameters:
anchor_output (str) – Path to the CSV file containing modality annotations for splicing events from Anchor.
adata_psi (str) – Path to the .h5ad file containing the PSI matrix and cell metadata.
cluster_name (str) – Column name in adata_psi.obs indicating the cluster identity of each cell. Must match the column names in the Anchor output.
out_directory (str) – Path to save the resulting one-hot encoded AnnData .h5ad file.
- Returns:
adata – An AnnData object with modality one-hot encoded values.
- Return type:
anndata.AnnData
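The one-hot step can be illustrated with pandas. The layout of the modality table below (events as rows, cluster labels as columns) is an assumption about the Anchor output, and the helper `modality_one_hot` and the `event|modality` column naming are hypothetical; the actual encoding produced by DOLPHIN may be organized differently.

```python
import pandas as pd

# Hypothetical modality table: rows are events, columns are cluster labels.
modalities = pd.DataFrame(
    {"c1": ["included", "bimodal"], "c2": ["excluded", "included"]},
    index=["event_1", "event_2"],
)

# Cluster label of each cell, as it might appear in adata.obs[cluster_name].
cell_clusters = pd.Series(["c1", "c2", "c1"],
                          index=["cell_A", "cell_B", "cell_C"])


def modality_one_hot(modalities, cell_clusters):
    """Assign each cell its cluster's modalities, then one-hot encode them."""
    # Select one column per cell (clusters may repeat) and transpose
    # to a cells-by-events table of modality labels.
    per_cell = modalities[cell_clusters.tolist()].T
    per_cell.index = cell_clusters.index
    # One indicator column per (event, modality) pair.
    return pd.get_dummies(per_cell, prefix_sep="|", dtype=int)


one_hot = modality_one_hot(modalities, cell_clusters)
print(one_hot)
```

Each cell inherits the modality of its cluster for every event, so cells in the same cluster share identical rows.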