Cell Aggregation
Step 1: Find Cell Neighbors Using K-Nearest Neighbors (KNN)
This step uses the K-Nearest Neighbors (KNN) algorithm to find the nearest neighboring cells of each cell in the dataset, based on the cell embeddings. The output file is saved as:
N_<out_name>_<N_neighbor>.csv
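Here <out_name> is the value passed as out_name, and <N_neighbor> is the number of neighbors (the example run below produces N_fsla_10.csv). Each row represents a pair of neighboring cells: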
| main_name | neighbor |
|---|---|
| CellA | CellB |
| CellA | CellC |
[ ]:
from DOLPHIN.cell_reads_aggregation.find_cell_neighbor import run_find_neighbor
run_find_neighbor(
embedding_data="./DOLPHIN_Z.h5ad",
out_name='fsla'
)
Saved neighbor list to ./N_fsla_10.csv
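To illustrate how a neighbor list of this shape can be derived from an embedding, here is a minimal pure-Python sketch. It is not DOLPHIN's implementation: the `knn_pairs` helper, the toy 2-D embeddings, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import csv
import io
import math


def knn_pairs(embeddings, k):
    """Return (main_name, neighbor) pairs for the k nearest cells of each cell.

    Hypothetical helper: brute-force Euclidean KNN, fine for toy data only.
    """
    pairs = []
    for name, vec in embeddings.items():
        # Distance from this cell to every other cell (the cell itself is excluded).
        dists = [
            (math.dist(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != name
        ]
        dists.sort()
        pairs.extend((name, other) for _, other in dists[:k])
    return pairs


# Hypothetical 2-D embeddings; real latent vectors have many more dimensions.
toy_embeddings = {
    "CellA": (0.0, 0.0),
    "CellB": (0.1, 0.0),
    "CellC": (0.0, 0.2),
    "CellD": (5.0, 5.0),
}

pairs = knn_pairs(toy_embeddings, k=2)

# Write the pairs in the two-column layout of the neighbor file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["main_name", "neighbor"])
writer.writerows(pairs)
print(buffer.getvalue())
```

With k=2, each cell contributes two rows, so CellA is paired with CellB and CellC, matching the example table. For real datasets a tree-based or approximate nearest-neighbor search would be preferable to this brute-force loop.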
Step 2: Cell Aggregation - Adding Junction Reads from Neighboring Cells
This step aggregates cells by adding confident junction reads from each cell's neighbors. Aggregation strengthens the signal for alternative splicing analysis and reduces noise by drawing on the junction read patterns of nearby cells.
Before running this step, make sure you have the single-cell BAM files (with their .bam.bai index files) and the junction read files (SJ.out.tab) generated by STAR alignment.
For a tutorial on running STAR alignment, refer to Step 1 (full-length or 10x).
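The idea of borrowing junction reads from neighbors can be sketched as follows. This is a simplified illustration, not DOLPHIN's actual procedure: the `parse_sj_tab` and `aggregate_junctions` helpers are hypothetical, and the `min_reads` threshold is only an assumed stand-in for whatever criterion defines a "confident" junction. The parser relies on the standard STAR SJ.out.tab layout, where column 7 holds the number of uniquely mapped reads crossing the junction.

```python
from collections import Counter


def parse_sj_tab(text):
    """Parse STAR SJ.out.tab text into {(chrom, start, end, strand): unique_reads}."""
    counts = Counter()
    for line in text.strip().splitlines():
        fields = line.split("\t")
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[3]
        unique_reads = int(fields[6])  # column 7: uniquely mapped junction reads
        counts[(chrom, start, end, strand)] += unique_reads
    return counts


def aggregate_junctions(main, neighbors, min_reads=2):
    """Add neighbor junction counts to the main cell's counts.

    Assumption: a neighbor junction is treated as "confident" when it has at
    least `min_reads` unique reads. The real criterion may differ.
    """
    total = Counter(main)
    for neighbor in neighbors:
        for junction, n in neighbor.items():
            if n >= min_reads:
                total[junction] += n
    return total


# Toy SJ.out.tab contents (tab-separated, 9 columns per line).
cell_a = "chr1\t100\t200\t1\t1\t1\t5\t0\t30\nchr1\t300\t400\t1\t1\t1\t1\t0\t12\n"
cell_b = "chr1\t100\t200\t1\t1\t1\t3\t0\t25\nchr1\t500\t600\t2\t1\t0\t1\t0\t10\n"
cell_c = "chr1\t300\t400\t1\t1\t1\t4\t0\t20\n"

aggregated = aggregate_junctions(
    parse_sj_tab(cell_a),
    [parse_sj_tab(cell_b), parse_sj_tab(cell_c)],
)
for junction, n in sorted(aggregated.items()):
    print(junction, n)
```

In this toy run, the junction chr1:100-200 gains 3 reads from the first neighbor and chr1:300-400 gains 4 from the second, while the single-read junction chr1:500-600 is ignored because it falls below the assumed threshold.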
[ ]:
### Step 1: Get the Number of Reads per BAM File
from DOLPHIN.cell_reads_aggregation.get_single_bam_reads import run_reads_count
run_reads_count(
out_name="fsla",
bam_file_path="./All_Bam",
out_directory="./"
)
Processing BAM files: 100%|██████████| 795/795 [04:46<00:00, 2.78it/s]
Saved summary: fsla_read_counts.csv
Saved raw flagstat log: fsla_flagstat_raw.txt
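The aggregation step expects the first column of <out_name>_read_counts.csv to hold cell IDs that match the metadata file. A quick consistency check could look like the sketch below. The column names and file contents are hypothetical, and the sketch assumes the metadata file also keeps its cell IDs in the first column; adapt it to the actual files.

```python
import csv
import io


def first_column(csv_text):
    """Return the first-column values of a CSV, skipping the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [row[0] for row in rows[1:] if row]


# Hypothetical contents standing in for fsla_read_counts.csv and fsla_meta.csv.
# With real files, read them with open(path).read() instead.
read_counts_csv = "cell_id,num_reads\nCellA,120000\nCellB,98000\nCellX,50000\n"
metadata_csv = "CB,celltype\nCellA,T\nCellB,B\nCellC,T\n"

count_ids = set(first_column(read_counts_csv))
meta_ids = set(first_column(metadata_csv))

# Cells present in one file but not in the other.
missing_from_counts = sorted(meta_ids - count_ids)
missing_from_meta = sorted(count_ids - meta_ids)

print("In metadata but not in read counts:", missing_from_counts)
print("In read counts but not in metadata:", missing_from_meta)
```

If either list is non-empty, the IDs in the read-count file (for example, names still carrying a BAM file extension) likely need to be cleaned up before running the aggregation.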
[ ]:
### Step 2: Cell Aggregation
### Before running the function below, ensure that the first column of the <out_name>_read_counts.csv file contains the cell IDs (matching the metadata file).
from DOLPHIN.cell_reads_aggregation.process_reads_aggregation import run_reads_aggregation
[ ]:
run_reads_aggregation(
metadata_path="./fsla_meta.csv",
bam_file_path="./01_std_star/All_Bam",
bam_file_extension=".std.Aligned.sortedByCoord.out.bam",
junction_file_path="./01_std_star/All_SJ",
junction_file_extension=".std.SJ.out.tab",
neighbor_file="./N_fsla_10.csv",
read_count_path="./fsla_read_counts.csv",
out_directory="./"
)