Cell Aggregation

Step 1: Find Cell Neighbors Using K-Nearest Neighbors (KNN)

This step uses the K-Nearest Neighbors (KNN) algorithm on the cell embeddings to find the nearest neighboring cells for each cell in the dataset. The output is saved as:

N_<out_name>_<N_neighbor>.csv

Each row represents a pair of neighboring cells:

main_name    neighbor
CellA        CellB
CellA        CellC

[ ]:
from DOLPHIN.cell_reads_aggregation.find_cell_neighbor import run_find_neighbor

run_find_neighbor(
    embedding_data="./DOLPHIN_Z.h5ad",
    out_name='fsla'
)
Saved neighbor list to ./N_fsla_10.csv
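
To make the shape of the neighbor table concrete, here is a minimal pure-Python sketch of a brute-force KNN search that produces (main_name, neighbor) pairs. It is not the implementation of run_find_neighbor: the function names knn_pairs and write_pairs, the Euclidean distance metric, the toy embeddings, and the header row are all assumptions made for illustration.

```python
import csv
import math


def knn_pairs(embeddings, n_neighbors):
    """Return (main_name, neighbor) pairs for each cell's nearest neighbors.

    embeddings: dict mapping cell name -> list of floats (all the same length).
    """
    pairs = []
    for name, vec in embeddings.items():
        # Euclidean distance from this cell to every other cell (self excluded)
        dists = [
            (math.dist(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != name
        ]
        dists.sort()
        pairs.extend((name, other) for _, other in dists[:n_neighbors])
    return pairs


def write_pairs(pairs, path):
    """Write the pairs as a two-column CSV (assumed layout, with a header row)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["main_name", "neighbor"])
        writer.writerows(pairs)


# Toy 2-D embeddings standing in for the latent space stored in the .h5ad file
toy = {
    "CellA": [0.0, 0.0],
    "CellB": [0.1, 0.0],
    "CellC": [0.0, 0.3],
    "CellD": [5.0, 5.0],
}
pairs = knn_pairs(toy, n_neighbors=2)
```

With these toy values, CellA's two nearest neighbors are CellB and CellC, which mirrors the example rows in the table above. A brute-force search like this scales quadratically with the number of cells, so a real dataset would normally use a library KNN implementation instead.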

Step 2: Cell Aggregation - Adding Junction Reads from Neighboring Cells

This step aggregates cells by adding confident junction reads from each cell's neighbors. Aggregation strengthens the signal for alternative splicing analysis and helps reduce noise by drawing on the junction read patterns of nearby cells.

Before running this step, make sure you have single-cell BAM files (with their .bam.bai index files) and the junction read files generated by STAR alignment.

For a tutorial on running STAR alignment, refer to Step 1 (full-length or 10x).
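
The aggregation idea described above can be sketched at the level of junction counts. The example below parses text in STAR's SJ.out.tab layout (column 7 holds the number of uniquely mapping reads crossing the junction) and adds neighbor counts to a main cell. This is only an illustration, not how run_reads_aggregation works internally: the function names and the min_reads threshold used as a stand-in for "confident" are hypothetical.

```python
from collections import Counter


def parse_sj_tab(text):
    """Parse SJ.out.tab text into {(chrom, start, end, strand): unique_reads}."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        key = (fields[0], int(fields[1]), int(fields[2]), fields[3])
        counts[key] += int(fields[6])  # column 7: uniquely mapping reads
    return counts


def aggregate_junctions(main, neighbors, min_reads=2):
    """Add neighbor junction counts to the main cell's counts.

    min_reads is a hypothetical confidence filter: a neighbor's junction is
    added only if that neighbor supports it with at least min_reads reads.
    """
    total = Counter(main)
    for neighbor in neighbors:
        for key, n_reads in neighbor.items():
            if n_reads >= min_reads:
                total[key] += n_reads
    return total


# Toy junction tables for a main cell and one neighbor
cell_a = parse_sj_tab("chr1\t100\t200\t1\t1\t1\t5\t0\t30\n")
cell_b = parse_sj_tab(
    "chr1\t100\t200\t1\t1\t1\t3\t0\t25\n"
    "chr1\t300\t400\t1\t1\t0\t1\t0\t12\n"
)
combined = aggregate_junctions(cell_a, [cell_b])
```

Here the shared junction chr1:100-200 ends up with 5 + 3 reads, while the neighbor's single-read junction at chr1:300-400 falls below the assumed threshold and is left out.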

[ ]:
### Step 1: Get the Number of Reads per BAM File

from DOLPHIN.cell_reads_aggregation.get_single_bam_reads import run_reads_count

run_reads_count(
    out_name="fsla",
    bam_file_path="./All_Bam",
    out_directory="./"
)
Processing BAM files: 100%|██████████| 795/795 [04:46<00:00,  2.78it/s]
Saved summary: fsla_read_counts.csv
Saved raw flagstat log: fsla_flagstat_raw.txt
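
The log above mentions a raw flagstat file, which suggests the counts come from samtools flagstat. As a rough sketch of what a per-file count could look like, the snippet below parses the "in total" line of flagstat output. The helper names are hypothetical, and whether run_reads_count uses the total or another flagstat line is an assumption not confirmed here.

```python
import re
import subprocess


def parse_flagstat_total(flagstat_text):
    """Return the QC-passed count from the 'in total' line of flagstat output."""
    for line in flagstat_text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) in total", line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'in total' line found in flagstat output")


def count_reads(bam_path):
    """Run `samtools flagstat` on one BAM file (requires samtools on PATH)."""
    result = subprocess.run(
        ["samtools", "flagstat", bam_path],
        capture_output=True, text=True, check=True,
    )
    return parse_flagstat_total(result.stdout)


# Abbreviated, made-up flagstat output used to exercise the parser
sample = """\
1500 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
1400 + 0 mapped (93.33% : N/A)
"""
total = parse_flagstat_total(sample)
```

Keeping the parser separate from the subprocess call means it can be checked on a text sample without samtools installed.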

[ ]:
### Step 2: Cell Aggregation
### Before running the function below, ensure that the first column of the <out_name>_read_counts.csv file contains the cell IDs (matching the metadata file).

from DOLPHIN.cell_reads_aggregation.process_reads_aggregation import run_reads_aggregation
[ ]:
run_reads_aggregation(
    metadata_path="./fsla_meta.csv",
    bam_file_path="./01_std_star/All_Bam",
    bam_file_extension=".std.Aligned.sortedByCoord.out.bam",
    junction_file_path="./01_std_star/All_SJ",
    junction_file_extension=".std.SJ.out.tab",
    neighbor_file="./N_fsla_10.csv",
    read_count_path="./fsla_read_counts.csv",
    out_directory="./"
)
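
Because the aggregation step expects the first column of the read counts file to hold cell IDs that match the metadata, it can help to check this before calling run_reads_aggregation. The sketch below is one way to do that with the standard library. It assumes both files are CSVs with a header row and that the metadata also keeps cell IDs in its first column; the column names in the toy data are made up.

```python
import csv
import io


def first_column_ids(handle):
    """Return the set of values in the first column, skipping the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def missing_ids(read_counts_handle, metadata_handle):
    """Return cell IDs present in the metadata but absent from the read counts."""
    counted = first_column_ids(read_counts_handle)
    expected = first_column_ids(metadata_handle)
    return sorted(expected - counted)


# In-memory stand-ins for fsla_read_counts.csv and fsla_meta.csv
read_counts = io.StringIO("cell_id,num_reads\nCellA,1500\nCellB,900\n")
metadata = io.StringIO("cell_id,cluster\nCellA,0\nCellB,1\nCellC,1\n")
missing = missing_ids(read_counts, metadata)
```

For real files, pass handles from open("./fsla_read_counts.csv", newline="") and open("./fsla_meta.csv", newline="") instead of the in-memory examples. An empty result means every metadata cell has a read count.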