DOLPHIN Model module

This module provides functions to run the DOLPHIN model.

DOLPHIN.model.run_model.run_DOLPHIN(data_type, graph_in, fea_in, current_out_path='./', params=None, device='auto', seed_num=0)

Run the DOLPHIN model on single-cell RNA-seq data to obtain latent cell embeddings.

Parameters:
  • data_type (str) –

    Specifies the type of input single-cell RNA-seq data. Options are:

    • "full-length": For full-length RNA-seq data.

    • "10x": For 10x Genomics RNA-seq data.

  • graph_in (object) – The input graph structure (precomputed from exon-level data).

  • fea_in (anndata.AnnData) – The input feature matrix, provided as an AnnData object.

  • current_out_path (str, optional) – Output directory where the resulting cell embeddings will be saved. The embeddings are written to DOLPHIN_Z.h5ad in this directory. Default is './'.

  • params (dict, optional) –

    A dictionary of model hyperparameters. If not provided, default parameters will be used depending on data_type. Customizable parameters include:

    • "gat_channel" : Number of GAT output channels per head.

    • "nhead" : Number of GAT attention heads.

    • "gat_dropout" : Dropout rate in the GAT layer.

    • "list_gra_enc_hid" : Encoder MLP hidden layer sizes.

    • "gra_p_dropout" : Dropout rate in the encoder.

    • "z_dim" : Dimensionality of the latent space.

    • "list_fea_dec_hid" : Feature decoder MLP hidden layer sizes.

    • "list_adj_dec_hid" : Adjacency decoder MLP hidden layer sizes.

    • "lr" : Learning rate.

    • "batch" : Mini-batch size.

    • "epochs" : Number of training epochs.

    • "kl_beta" : KL divergence loss weight.

    • "fea_lambda" : Feature reconstruction loss weight.

    • "adj_lambda" : Adjacency reconstruction loss weight.

  • device (str, optional) –

    Computational device to run the model on. Options are:

    • 'auto' (default): Automatically selects 'cuda' if a GPU is available, otherwise falls back to 'cpu'.

    • 'cuda' or 'cuda:0': Use the first available GPU.

    • 'cpu': Run on CPU only.

    GPU acceleration is recommended for large datasets or faster training.

  • seed_num (int, optional) – Random seed for reproducibility. Default is 0.
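The parameters above can be assembled into a call roughly as follows. This is a minimal sketch, not part of the package: check_params and resolve_device are hypothetical helpers, the hyperparameter values are illustrative rather than the package defaults, and resolve_device only mirrors the documented 'auto' rule. Whether a partial params dictionary is merged with the defaults is not stated here, so verify that before relying on it.

```python
# Hyperparameter keys listed in the documentation above.
DOCUMENTED_PARAM_KEYS = {
    "gat_channel", "nhead", "gat_dropout", "list_gra_enc_hid",
    "gra_p_dropout", "z_dim", "list_fea_dec_hid", "list_adj_dec_hid",
    "lr", "batch", "epochs", "kl_beta", "fea_lambda", "adj_lambda",
}


def check_params(params):
    """Hypothetical helper: reject keys the documentation does not list."""
    unknown = sorted(set(params) - DOCUMENTED_PARAM_KEYS)
    if unknown:
        raise ValueError(f"Unknown hyperparameter(s): {unknown}")
    return params


def resolve_device(device="auto", cuda_available=False):
    """Hypothetical mirror of the documented 'auto' rule (not the package's code)."""
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device


# Illustrative values only; these are NOT the package defaults.
my_params = check_params({"z_dim": 32, "lr": 1e-3, "batch": 64, "epochs": 100})

# Keyword arguments matching the documented signature.
run_kwargs = dict(
    data_type="full-length",
    current_out_path="./",
    params=my_params,
    device="auto",
    seed_num=0,
)

# With the package installed and graph_in / fea_in already prepared:
# from DOLPHIN.model.run_model import run_DOLPHIN
# run_DOLPHIN(graph_in=graph_in, fea_in=fea_in, **run_kwargs)
```

Validating the keys before the call catches misspelled hyperparameter names early, before any training time is spent.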

Returns:

Nothing is returned. The latent cell embedding matrix is saved to DOLPHIN_Z.h5ad under current_out_path.

Return type:

None
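Because the function returns None, the embeddings have to be read back from disk. The following sketch assumes the anndata package is installed; embedding_path and load_embeddings are hypothetical helper names, and where the embedding matrix sits inside the loaded object (for example .X or .obsm) is not stated in this documentation.

```python
from pathlib import Path


def embedding_path(current_out_path="./"):
    """Path where the documentation says the embeddings are written."""
    return Path(current_out_path) / "DOLPHIN_Z.h5ad"


def load_embeddings(current_out_path="./"):
    """Load the saved embeddings, failing clearly if the file is missing."""
    path = embedding_path(current_out_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run run_DOLPHIN first")
    import anndata  # imported lazily so the path helper works without it
    return anndata.read_h5ad(path)
```

After a completed run, load_embeddings("./") would return an AnnData object; inspect it to see how the latent matrix is stored.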