QCatch

This web page provides a QC report summarizing the output of alevin-fry (He et al., Nature Methods 19, 316–322 (2022)).

Summary

Number of retained cells: The number of valid, high-quality cells that passed the 2-step cell calling process and optional doublet removal. The 2-step cell calling identifies cells through: (1) initial filtering, and (2) the EmptyDrops method, which detects additional cells whose expression profiles are significantly distinct from the ambient background. The doublet removal step is optional, enabled by the user, and implemented using the Scrublet package.
Number of all processed cells: The total number of cell barcodes observed in the processed sample. Cells with zero reads have been excluded.
Mean reads per retained cell: The total number of reads assigned to the retained cell barcodes, including the mapped and unmapped reads, divided by the number of retained cells.
Median UMI per retained cell: The median number of deduplicated reads (UMIs) per retained cell.
Median genes per retained cell: The median number of detected genes per retained cell.
Total genes detected for retained cells: The total number of unique genes detected across all retained cells.
Mapping rate: Fraction of reads that mapped to the augmented reference, calculated as mapped reads / total processed reads.
Sequencing saturation: Sequencing saturation measures the proportion of reads coming from already-observed UMIs, calculated as 1 - (deduplicated reads / total reads). High saturation suggests limited gain from additional sequencing, while low saturation indicates that further sequencing could reveal more unique molecules (UMIs).
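
The ratio-based metrics defined above can be reproduced from raw counts. The following is a minimal sketch of those formulas, not QCatch's own implementation; the function names and the example inputs are hypothetical and chosen only for illustration.

```python
def mapping_rate(mapped_reads: int, total_processed_reads: int) -> float:
    """Mapped reads / total processed reads."""
    return mapped_reads / total_processed_reads


def sequencing_saturation(deduplicated_reads: int, total_reads: int) -> float:
    """1 - (deduplicated reads / total reads)."""
    return 1.0 - deduplicated_reads / total_reads


def mean_reads_per_retained_cell(reads_in_retained_cells: int,
                                 n_retained_cells: int) -> float:
    """Reads assigned to retained barcodes (mapped + unmapped) / retained cells."""
    return reads_in_retained_cells / n_retained_cells


# Hypothetical inputs, not taken from this report
rate = mapping_rate(mapped_reads=880, total_processed_reads=1000)
saturation = sequencing_saturation(deduplicated_reads=400, total_reads=1000)
```

A saturation close to 1 means most reads repeat an already-observed UMI, so extra sequencing would add few new molecules.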

Number of retained cells	1,139	Number of all processed cells	5,788
Mean reads per retained cell	9,750	Median UMI per retained cell	3,006
Median genes per retained cell	1,429	Total genes detected for retained cells	22,078
Mapping rate	88.24%	Sequencing saturation	59.56%

🔬 Cell Filtering Pipeline

🧬 Step 1: Cell Calling

Internal 2-step filtering
(OrdMag + EmptyDrops)

1,150 cells

🔍 Step 2: Doublet Removal

- 11 doublets

1,139 singlets

✨ Final Result

Retained cells

1,139
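
In this report, 1,150 called cells minus 11 predicted doublets leaves 1,139 retained cells. The initial "OrdMag" filter is not specified further here; the sketch below shows one common formulation of an order-of-magnitude heuristic (a high quantile of the top barcodes' UMI counts, divided by 10). The function names, `expected_cells`, `quantile` and `ratio` are assumptions for illustration, and QCatch's actual parameters may differ.

```python
import numpy as np


def ordmag_threshold(umi_counts, expected_cells=3000, quantile=0.99, ratio=10.0):
    """UMI threshold: a high quantile of the top barcodes, divided by `ratio`."""
    counts = np.sort(np.asarray(umi_counts))[::-1]  # descending UMI counts
    if counts.size == 0:
        raise ValueError("umi_counts must not be empty")
    top = counts[: min(expected_cells, counts.size)]
    return float(np.quantile(top, quantile)) / ratio


def ordmag_call(umi_counts, **kwargs):
    """Boolean mask of barcodes whose UMI count reaches the threshold."""
    threshold = ordmag_threshold(umi_counts, **kwargs)
    return np.asarray(umi_counts) >= threshold


# Cell-count arithmetic from the pipeline summary in this report
called_cells, doublets = 1150, 11
retained_cells = called_cells - doublets
```

Barcodes that fall below such a threshold are the ones a second step like EmptyDrops would then test against the ambient profile.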


Note: For the plots below, the title will specify whether they represent all processed cells or only the retained cells.

⚠️ Low Data Quality Detected

- ⚠️ Warning❗️: Step2- Empty drop failed: non_ambient_result is None. This may indicate low data quality, an incomplete input matrix, or an incorrect chemistry version.

🦒 Knee Plots

The left plot shows the number of UMIs against cell rank (ordered by UMI count). This knee plot can help identify low-quality cells with too few UMIs.
The right plot shows the number of detected genes against cell rank (ordered by UMI count).

Rank: cells are ranked by number of UMIs.
UMI: deduplicated read.
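
The ranking behind both knee plots can be sketched as follows. This is an illustrative outline rather than QCatch's plotting code; `knee_curve` and its arguments are hypothetical names.

```python
import numpy as np


def knee_curve(umi_counts, detected_genes):
    """Return (rank, UMIs, genes) with cells ordered by decreasing UMI count."""
    umi = np.asarray(umi_counts, dtype=np.int64)
    genes = np.asarray(detected_genes, dtype=np.int64)
    order = np.argsort(-umi, kind="stable")  # highest UMI count first
    ranks = np.arange(1, umi.size + 1)       # rank 1 = cell with most UMIs
    return ranks, umi[order], genes[order]


# Hypothetical counts for three barcodes
ranks, umi_sorted, genes_sorted = knee_curve([5, 100, 20], [3, 50, 10])
```

Plotting `umi_sorted` (left) and `genes_sorted` (right) against `ranks`, typically on log-log axes, gives the two curves described above.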

🔢 UMI Counts and Detected Genes Across Cell Barcodes

The barcode frequency is calculated as the number of reads associated with each cell barcode.
The first two plots show cell barcodes ranked by total read count, plotted against two key metrics: the number of UMIs and the number of detected genes per barcode.
The third plot illustrates how the number of detected genes increases with UMI count per cell.

🧽 UMI Deduplication Plot

The scatter plot compares the number of mapped reads with the number of UMIs for each retained cell. Each point represents a cell, with the x-axis showing the mapped read count and the y-axis showing the deduplicated UMI count. The reference line indicates the mean deduplication rate across all cells.

UMI Deduplication: UMI deduplication is the process of identifying and removing duplicate reads that arise from PCR amplification of the same original molecule.
Dedup Rate: The UMI count divided by the number of mapped reads for each cell.
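
A minimal sketch of the per-cell dedup rate and the slope of the reference line. The function names are hypothetical, and taking the reference as the mean of the per-cell rates (rather than total UMIs over total mapped reads) is an assumption about how the line is computed.

```python
import numpy as np


def dedup_rates(umi_counts, mapped_reads):
    """Per-cell UMI count / mapped reads; NaN where a cell has no mapped reads."""
    umi = np.asarray(umi_counts, dtype=float)
    reads = np.asarray(mapped_reads, dtype=float)
    rates = np.full(umi.shape, np.nan)
    np.divide(umi, reads, out=rates, where=reads > 0)
    return rates


def mean_dedup_rate(umi_counts, mapped_reads):
    """Mean of the per-cell rates, ignoring cells without mapped reads."""
    return float(np.nanmean(dedup_rates(umi_counts, mapped_reads)))


# Hypothetical counts for two cells
rates = dedup_rates([50, 30], [100, 40])
slope = mean_dedup_rate([50, 30], [100, 40])
```

Under this reading, the reference line is y = slope * x, and cells far below it have a larger share of duplicate reads.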

🧬 Distribution of Detected Gene Count and Mitochondrial Percentage Plot

The left plot depicts the distribution of detected gene counts.
The right plot shows the distribution of mitochondrial gene expression percentages across cells. Note: The "All Cells" plot does not display every processed cell. To improve visualization and reduce clutter from very low-quality cells, we excluded cells with fewer than 10 detected genes—these are typically considered nearly empty. In contrast, the "Retained Cells" plot includes all retained cells, without applying this gene count filter.
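
The two quantities plotted here, and the 10-gene display filter for the "All Cells" view, can be sketched on a dense cells-by-genes count matrix. This is an illustration only: the function names are hypothetical, and identifying mitochondrial genes by an `MT-` symbol prefix is an assumption that fits human gene symbols but not every annotation.

```python
import numpy as np


def detected_genes_per_cell(counts):
    """Number of genes with a non-zero count in each cell (rows = cells)."""
    return (np.asarray(counts) > 0).sum(axis=1)


def mito_percent(counts, gene_names, prefix="MT-"):
    """Percentage of each cell's counts that come from mitochondrial genes."""
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([name.upper().startswith(prefix) for name in gene_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    pct = np.zeros_like(total)  # cells with zero counts stay at 0%
    np.divide(100.0 * mito, total, out=pct, where=total > 0)
    return pct


def all_cells_plot_mask(counts, min_genes=10):
    """Cells shown in the 'All Cells' view: at least `min_genes` detected genes."""
    return detected_genes_per_cell(counts) >= min_genes


# Hypothetical 3-cell, 3-gene matrix
toy_counts = [[1, 0, 3], [0, 0, 0], [2, 2, 0]]
toy_genes = ["MT-CO1", "ACTB", "GAPDH"]
toy_pct = mito_percent(toy_counts, toy_genes)
```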

🧩 Bar plot for S/U/A counts and (S+A)/(U+S+A) Ratio Plot

When using “USA mode” in alevin-fry, spliced (S), unspliced (U), and ambiguous (A) read counts are generated separately for each gene in each cell.
In the bar plot, we first sum the spliced, unspliced, and ambiguous counts across all genes and all cells. The plot then displays the total number of reads in each splicing category: Spliced (S), Unspliced (U), and Ambiguous (A).
In the histogram, we calculate the splicing ratio for each cell as (S + A) / (S + U + A), where the counts are summed across all genes. The histogram shows the distribution of these per-cell splicing ratios.
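
The two summaries described above can be sketched with three cells-by-genes matrices, one per splicing category. The function names and toy matrices are hypothetical; how the S, U and A layers are loaded from the alevin-fry output is left out.

```python
import numpy as np


def usa_totals(spliced, unspliced, ambiguous):
    """Bar plot values: counts summed over all genes and all cells."""
    return {
        "S": float(np.sum(spliced)),
        "U": float(np.sum(unspliced)),
        "A": float(np.sum(ambiguous)),
    }


def splicing_ratio_per_cell(spliced, unspliced, ambiguous):
    """Histogram values: (S + A) / (S + U + A) per cell, summed across genes."""
    s = np.asarray(spliced, dtype=float).sum(axis=1)
    u = np.asarray(unspliced, dtype=float).sum(axis=1)
    a = np.asarray(ambiguous, dtype=float).sum(axis=1)
    total = s + u + a
    ratio = np.full(total.shape, np.nan)  # NaN for cells with no counts
    np.divide(s + a, total, out=ratio, where=total > 0)
    return ratio


# Hypothetical 2-cell, 2-gene layers
S = [[2, 1], [0, 0]]
U = [[1, 0], [4, 0]]
A = [[0, 0], [0, 0]]
totals = usa_totals(S, U, A)
ratios = splicing_ratio_per_cell(S, U, A)
```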

🗺️ Clustering: UMAP and t-SNE

These plots are low-dimensional projections of high-dimensional gene expression data. Each point represents a single cell. Cells that appear close together in the plot are inferred to have similar transcriptomic profiles, indicating potential similarity in cell type or state.
Note: By default, only final retained cells (singlets) are shown after doublet removal. Use the toggle buttons above to view doublets if doublet visualization was enabled with the --visualize_doublets flag. Standard preprocessing steps were applied using `Scanpy`, including normalization, log transformation, feature selection, and dimensionality reduction.

The Python source code used to generate these plots is shown below:


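        # Imports required by this excerpt
        import pandas as pd
        import plotly.express as px
        import scanpy as sc

        # Shared embedding for doublet visualization
        # Preprocessing on all cells (singlets + doublets)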
        sc.pp.normalize_total(adata_with_doublets)
        sc.pp.log1p(adata_with_doublets)
        sc.pp.highly_variable_genes(adata_with_doublets, n_top_genes=min(2000, n_valid))
        sc.tl.pca(adata_with_doublets)
        sc.pp.neighbors(adata_with_doublets)
        sc.tl.umap(adata_with_doublets)
        sc.tl.tsne(adata_with_doublets)
        sc.tl.leiden(adata_with_doublets, flavor="igraph", n_iterations=2)

        # View 1: "Retained Cells Only" - Filter to singlets, color by Leiden clusters
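        # Cells without a doublet prediction (NaN) are treated as doublets and excluded;
        # the cast keeps the mask boolean so that `~` is a logical NOT
        singlet_mask = ~adata_with_doublets.obs["predicted_doublet"].fillna(True).astype(bool)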
        adata_singlets = adata_with_doublets[singlet_mask, :]

        umap_df_singlets = pd.DataFrame(adata_singlets.obsm["X_umap"], columns=["UMAP1", "UMAP2"])
        umap_df_singlets["leiden"] = adata_singlets.obs["leiden"].values
        fig_umap_singlets = px.scatter(umap_df_singlets, x="UMAP1", y="UMAP2", color="leiden",
                                        title="UMAP with Leiden Clusters (Retained Cells Only)")

        tsne_df_singlets = pd.DataFrame(adata_singlets.obsm["X_tsne"], columns=["TSNE1", "TSNE2"])
        tsne_df_singlets["leiden"] = adata_singlets.obs["leiden"].values
        fig_tsne_singlets = px.scatter(tsne_df_singlets, x="TSNE1", y="TSNE2", color="leiden",
                                        title="t-SNE with Leiden Clusters (Retained Cells Only)")

        # View 2: "With Doublets" - All cells, color by doublet status
        adata_with_doublets.obs["doublet_label"] = adata_with_doublets.obs["predicted_doublet"].map(
            {True: "Doublet", False: "Singlet"}).astype(str)

        umap_df_all = pd.DataFrame(adata_with_doublets.obsm["X_umap"], columns=["UMAP1", "UMAP2"])
        umap_df_all["doublet_label"] = adata_with_doublets.obs["doublet_label"].values
        umap_df_all["doublet_score"] = adata_with_doublets.obs["doublet_score"].values
        fig_umap_doublets = px.scatter(umap_df_all, x="UMAP1", y="UMAP2", color="doublet_label",
                                        hover_data=["doublet_score"],
                                        color_discrete_map={"Singlet": "#3498db", "Doublet": "#e74c3c"},
                                        title="UMAP with Doublet Classification (With Doublets)")

        tsne_df_all = pd.DataFrame(adata_with_doublets.obsm["X_tsne"], columns=["TSNE1", "TSNE2"])
        tsne_df_all["doublet_label"] = adata_with_doublets.obs["doublet_label"].values
        tsne_df_all["doublet_score"] = adata_with_doublets.obs["doublet_score"].values
        fig_tsne_doublets = px.scatter(tsne_df_all, x="TSNE1", y="TSNE2", color="doublet_label",
                                        hover_data=["doublet_score"],
                                        color_discrete_map={"Singlet": "#3498db", "Doublet": "#e74c3c"},
                                        title="t-SNE with Doublet Classification (With Doublets)")

📜 Quant Log Information

alt_resolved_cell_numbers: A list of global cell indices where an alternative resolution strategy was applied for large connected components. If this list is empty, no cells used the alternative resolution strategy.
cmd: The command line used for this alevin-fry quant run.
dump_eq: Indicates whether equivalence class (EQ class) information was dumped.
empty_resolved_cell_numbers: A list of global cell indices with no gene expression.
num_genes: The total number of genes. When usa_mode is enabled, this count is the sum of genes across three categories: unspliced (U), spliced (S), and ambiguous (A).
num_quantified_cells: The number of cells that were quantified.
resolution_strategy: The resolution strategy used for quantification.
usa_mode: Indicates that data was processed in Unspliced-Spliced-Ambiguous (USA) mode to classify each transcript’s splicing state.
version_str: The tool’s version number.

Category	Content
alt_resolved_cell_numbers	[]
cmd	/fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry quant -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -t 32 -m /fs/nexus-projects/sc_frag_len/nextflow/end2end_forseti_2026/simpleaf_with_new_piscem/af_test_workdir/human-2024-A_splici_piscem_142/index/t2g_3col.tsv -r cr-like
dump_eq	False
empty_resolved_cell_numbers	[1121, 1625, 1893, 2389, 3074]
num_genes	115818
num_quantified_cells	5793
quant_options	{'cmdline': '/fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry quant -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -t 32 -m /fs/nexus-projects/sc_frag_len/nextflow/end2end_forseti_2026/simpleaf_with_new_piscem/af_test_workdir/human-2024-A_splici_piscem_142/index/t2g_3col.tsv -r cr-like', 'dump_eq': False, 'filter_list': None, 'init_uniform': False, 'input_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'large_graph_thresh': 0, 'num_bootstraps': 0, 'num_threads': 32, 'output_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'pug_exact_umi': False, 'resolution': 'CellRangerLike', 'sa_model': 'WinnerTakeAll', 'small_thresh': 10, 'summary_stat': False, 'tg_map': '/fs/nexus-projects/sc_frag_len/nextflow/end2end_forseti_2026/simpleaf_with_new_piscem/af_test_workdir/human-2024-A_splici_piscem_142/index/t2g_3col.tsv', 'use_mtx': True, 'version': '0.11.2'}
resolution_strategy	CellRangerLike
usa_mode	True
version_str	0.11.2
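
Two quick consistency checks can be derived from the table above. This sketch copies the values into a dictionary instead of reading a log file; the function names are hypothetical. It assumes that the U, S and A categories each contain the same number of genes, and that the count of non-empty cells is what the Summary reports as "Number of all processed cells" (the numbers agree in this report, but whether QCatch derives the value this way is an assumption).

```python
# Values copied from the Quant Log table above
quant_log = {
    "num_genes": 115818,
    "num_quantified_cells": 5793,
    "usa_mode": True,
    "empty_resolved_cell_numbers": [1121, 1625, 1893, 2389, 3074],
}


def genes_per_category(log):
    """Genes per splicing category when num_genes counts U + S + A together."""
    if not log["usa_mode"]:
        return log["num_genes"]
    per_category, remainder = divmod(log["num_genes"], 3)
    if remainder:
        raise ValueError("num_genes is not divisible by 3 in USA mode")
    return per_category


def non_empty_cells(log):
    """Quantified cells minus cells reported with no gene expression."""
    return log["num_quantified_cells"] - len(log["empty_resolved_cell_numbers"])


n_genes = genes_per_category(quant_log)  # 38606
n_cells = non_empty_cells(quant_log)     # 5788
```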

📝 Permit List Log Information

cmd: The command-line input provided by users for generating the permit list.
expected_ori: The expected alignment orientation for the sequencing chemistry being processed.
gpl_options: The actual command line executed for the 'generate permit list' process, including pre-filled settings.
max-ambig-record: The maximum number of reference sequences to which a read can be mapped.
permit-list-type: The type of permit list being used.
velo_mode: A placeholder parameter reserved for future integration with alevin-fry-Forseti; currently always set to false.
version_str: The version number of the tool.

Category	Content
cmd	/fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry generate-permit-list -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_map -d fw -t 8 --unfiltered-pl /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/af_home/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant
expected_ori	+
gpl_options	{'cmdline': '/fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry generate-permit-list -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_map -d fw -t 8 --unfiltered-pl /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/af_home/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'expected_ori': 'Forward', 'fmeth': {'UnfilteredExternalList': ['/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/af_home/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c', 10]}, 'input_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_map', 'output_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'threads': 8, 'velo_mode': False, 'version': '0.11.2'}
max-ambig-record	931
permit-list-type	unfiltered
velo_mode	False
version_str	0.11.2