{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cell Aggregation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Find Cell Neighbors Using K-Nearest Neighbors (KNN)\n", "\n", "In this step, we use the K-Nearest Neighbors (KNN) algorithm to find the nearest neighboring cells for each cell in the dataset.\n", "The neighbor list is saved as a CSV file; the run below, with `out_name='fsla'`, writes `./N_fsla_10.csv`.\n", "\n", "Each row represents a pair of neighboring cells:\n", "\n", "| main_name | neighbor |\n", "|-----------|----------|\n", "| CellA | CellB |\n", "| CellA | CellC |\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saved neighbor list to ./N_fsla_10.csv\n" ] } ], "source": [ "from DOLPHIN.cell_reads_aggregation.find_cell_neighbor import run_find_neighbor\n", "\n", "run_find_neighbor(\n", "    embedding_data=\"./DOLPHIN_Z.h5ad\",\n", "    out_name='fsla'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Cell Aggregation - Adding Junction Reads from Neighboring Cells\n", "\n", "In this step, we aggregate cells by adding confident junction reads from each cell's neighbors. This strengthens the signal for alternative splicing analysis and helps reduce noise by taking into account the junction read patterns of nearby cells. 
\n", "\n", "Before running this step, make sure you have the single-cell BAM files (with their `.bam.bai` index files) and the junction read files generated by STAR alignment.\n", "\n", "For a tutorial on how to run STAR alignment, please refer to Step 1 [full-length](./step1_1_preprocess_full_length.md) or [10x](./step1_2_preprocess_10X.md).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Processing BAM files: 100%|██████████| 795/795 [04:46<00:00, 2.78it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Saved summary: fsla_read_counts.csv\n", "Saved raw flagstat log: fsla_flagstat_raw.txt\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "### Step 1: Get the Number of Reads per BAM File\n", "\n", "from DOLPHIN.cell_reads_aggregation.get_single_bam_reads import run_reads_count\n", "\n", "run_reads_count(\n", "    out_name=\"fsla\",\n", "    bam_file_path=\"./All_Bam\",\n", "    out_directory=\"./\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Step 2: Cell Aggregation\n", "### Before running the function below, ensure that the first column of the read count summary file (fsla_read_counts.csv in this example) contains the cell IDs, matching those in the metadata file.\n", "\n", "from DOLPHIN.cell_reads_aggregation.process_reads_aggregation import run_reads_aggregation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run_reads_aggregation(\n", "    metadata_path=\"./fsla_meta.csv\",\n", "    bam_file_path=\"./01_std_star/All_Bam\",\n", "    bam_file_extension=\".std.Aligned.sortedByCoord.out.bam\",\n", "    junction_file_path=\"./01_std_star/All_SJ\",\n", "    junction_file_extension=\".std.SJ.out.tab\",\n", "    neighbor_file=\"./N_fsla_10.csv\",\n", "    read_count_path=\"./fsla_read_counts.csv\",\n", "    out_directory=\"./\"\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "DOLPHIN", "language": 
"python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 2 }