{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup and Train the DOLPHIN Model\n",
    "\n",
    "This tutorial provides a step-by-step guide on configuring the model architecture, setting hyperparameters, and visualizing cell embedding clusters using DOLPHIN.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from DOLPHIN.model import run_DOLPHIN\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Processed Dataset\n",
    "\n",
    "Specify the graph data input and the highly variable gene (HVG)-filtered feature matrix obtained from the preprocessing step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#load datasets\n",
    "graph_data = \"model_<sample_name>.pt\"\n",
    "feature_data = \"FeatureCompHvg_<sample_name>.h5ad\"\n",
    "## save the output adata, default is set to the current folder\n",
    "output_path = './'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set Hyperparameters and Train the Model\n",
    "\n",
    "The function `run_DOLPHIN` is used to configure hyperparameters and train the model. Below is a detailed explanation of its parameters:\n",
    "\n",
    "---\n",
    "\n",
    "#### **Function Definition**\n",
    "```python\n",
    "run_DOLPHIN(data_type, graph_in, fea_in, current_out_path='./', params=None, device='cuda:0', seed_num=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameters\n",
    "\n",
    "##### 1. `data_type` Specifies the type of input single-cell RNA-seq data:\n",
    "- `\"full-length\"`: For full-length RNA-seq data.\n",
    "- `\"10x\"`: For 10x Genomics RNA-seq data.\n",
    "\n",
    "##### 2. `graph_in` The input graph dataset.\n",
    "\n",
    "##### 3. `fea_in` The input feature matrix, provided as an AnnData object (`adata`).\n",
    "\n",
    "##### 4. `current_out_path` Specifies the output directory where the resulting cell embeddings (`X_z`) will be saved.  The output file will be named: `DOLPHIN_Z.h5ad`\n",
    "\n",
    "##### 5. `params` Model hyperparameters.  \n",
    "If `data_type` is set, you can use the **default hyperparameters** or provide your own in a dictionary format.  \n",
    "Below is a list of customizable hyperparameters:\n",
    "\n",
    "| Parameter             | Description                                                              |\n",
    "|-----------------------|--------------------------------------------------------------------------|\n",
    "| `\"gat_channel\"`       | Number of features per node after the GAT layer.                        |\n",
    "| `\"nhead\"`             | Number of attention heads in the graph attention layer.                 |\n",
    "| `\"gat_dropout\"`       | Dropout rate for the GAT layer.                                         |\n",
    "| `\"list_gra_enc_hid\"`  | Neuron sizes for each fully connected layer of the encoder.              |\n",
    "| `\"gra_p_dropout\"`     | Dropout rate for the encoder.                                           |\n",
    "| `\"z_dim\"`             | Dimensionality of the latent Z space.                                   |\n",
    "| `\"list_fea_dec_hid\"`  | Neuron sizes for each fully connected layer of the feature decoder.      |\n",
    "| `\"list_adj_dec_hid\"`  | Neuron sizes for each fully connected layer of the adjacency decoder.    |\n",
    "| `\"lr\"`                | Learning rate for optimization.                                         |\n",
    "| `\"batch\"`             | Mini-batch size.                                                       |\n",
    "| `\"epochs\"`            | Number of training epochs.                                              |\n",
    "| `\"kl_beta\"`           | KL divergence weight.                                                  |\n",
    "| `\"fea_lambda\"`        | Feature matrix reconstruction loss weight.                              |\n",
    "| `\"adj_lambda\"`        | Adjacency matrix reconstruction loss weight.                            |\n",
    "\n",
    "##### 6. `device` Specifies the device for training.  Default: `\"cuda:0\"` for GPU training (highly recommended).\n",
    "\n",
    "##### 7. `seed_num`Sets the random seed for reproducibility.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run_DOLPHIN(\"full-length\", graph_data, feature_data, output_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cell Embedding Cluster Using `X_z`\n",
    "\n",
    "The cell embedding matrix `X_z` represents the low-dimensional latent space learned by the DOLPHIN model.  \n",
    "This matrix can be used to visualize cell clusters and analyze their relationships in the latent space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "import scanpy as sc\n",
    "from sklearn.metrics import adjusted_rand_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata = sc.read_h5ad(\"./DOLPHIN_Z.h5ad\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sc.pp.neighbors(adata, use_rep=\"X_z\")\n",
    "sc.tl.umap(adata)\n",
    "sc.tl.leiden(adata)\n",
    "print(len(set(adata.obs[\"leiden\"])))\n",
    "adjusted_rand_score(adata.obs[\"celltype\"], adata.obs[\"leiden\"])\n",
    "sc.pl.umap(adata, color=['leiden', \"celltype\"], wspace=0.5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "DOLPHIN",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}