QCatch

QCatch provides functionality for generating QC reports summarizing the output of alevin-fry (He et al., Nature Methods 19, 316–322 (2022)).

Summary

Number of retained cells: The number of valid, high-quality cells that passed the two-step cell calling process and, if enabled, doublet removal. The two-step cell calling identifies cells through (1) initial filtering and (2) the EmptyDrops method, which detects additional cells whose expression profiles are significantly distinct from the ambient background. Doublet removal is optional, must be enabled by the user, and is implemented with the Scrublet package.
Number of all processed cells: The total number of cell barcodes observed in the processed sample. Cells with zero reads have been excluded.
Mean reads per retained cell: The total number of reads assigned to the retained cell barcodes, including the mapped and unmapped reads, divided by the number of retained cells.
Median UMI per retained cell: The median number of deduplicated reads (UMIs) per retained cell.
Median genes per retained cell: The median number of detected genes per retained cell.
Total genes detected for retained cells: The total number of unique genes detected across all retained cells.
Mapping rate: Fraction of reads that mapped to the augmented reference, calculated as mapped reads / total processed reads.
Sequencing saturation: Sequencing saturation measures the proportion of reads coming from already-observed UMIs, calculated as 1 - (deduplicated reads / total reads). High saturation suggests limited gain from additional sequencing, while low saturation indicates that further sequencing could reveal more unique molecules (UMIs).
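The summary metrics above reduce to a few array operations. The following is a minimal sketch, not QCatch's implementation. It assumes a dense NumPy array of UMI counts (rows are cells, columns are genes), a per-barcode read total that already includes unmapped reads, and a boolean mask of retained cells. The function names are hypothetical, and which read totals QCatch actually passes in for the mapping rate and sequencing saturation is left to the caller.

```python
import numpy as np


def per_cell_summary(counts, reads_per_cell, retained):
    """Per-cell summary metrics over retained cells (illustrative sketch).

    counts: dense (cells x genes) array of deduplicated UMI counts.
    reads_per_cell: reads (mapped + unmapped) assigned to each barcode.
    retained: boolean mask marking the retained cells.
    """
    counts = np.asarray(counts)
    retained = np.asarray(retained, dtype=bool)
    kept = counts[retained]                       # rows for retained cells only
    umis_per_cell = kept.sum(axis=1)              # UMIs = summed deduplicated counts
    genes_per_cell = (kept > 0).sum(axis=1)       # genes with at least one count
    return {
        # total reads over retained barcodes / number of retained cells
        "mean_reads_per_retained_cell": float(np.asarray(reads_per_cell)[retained].mean()),
        "median_umi_per_retained_cell": float(np.median(umis_per_cell)),
        "median_genes_per_retained_cell": float(np.median(genes_per_cell)),
        # a gene counts once if any retained cell expresses it
        "total_genes_retained": int((kept.sum(axis=0) > 0).sum()),
    }


def mapping_rate(mapped_reads, total_reads):
    """Mapped reads / total processed reads."""
    return mapped_reads / total_reads


def sequencing_saturation(dedup_reads, total_reads):
    """1 - (deduplicated reads / total reads)."""
    return 1 - dedup_reads / total_reads
```

For example, `sequencing_saturation(40, 100)` gives 0.6: 40 unique molecules among 100 reads means 60% of the reads came from an already-observed UMI.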

Number of retained cells: 1,139
Number of all processed cells: 5,788
Mean reads per retained cell: 9,750
Median UMI per retained cell: 3,006
Median genes per retained cell: 1,429
Total genes detected for retained cells: 22,078
Mapping rate: 88.24%
Sequencing saturation: 59.56%
🦒 Knee Plots

The left plot shows the number of UMIs against cell rank (ordered by UMI count). This knee plot helps identify low-quality cells with too few UMIs.
The right plot shows the number of detected genes against cell rank (ordered by UMI count).

Rank: cells are ranked by number of UMIs.
UMI: deduplicated read.
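As a rough sketch of the data behind the two knee plots (not QCatch's plotting code), the function below ranks cells by total UMI count and returns the UMI and detected-gene counts in that order. It assumes a dense cells × genes NumPy array of UMI counts; `knee_plot_data` is a hypothetical name.

```python
import numpy as np


def knee_plot_data(counts):
    """Return (rank, UMIs, genes) with cells ordered by descending UMI count."""
    counts = np.asarray(counts)
    umis = counts.sum(axis=1)                   # UMIs per cell
    genes = (counts > 0).sum(axis=1)            # detected genes per cell
    order = np.argsort(-umis, kind="stable")    # highest-UMI cell gets rank 1
    rank = np.arange(1, len(umis) + 1)
    return rank, umis[order], genes[order]
```

Both panels share the same x-axis (`rank`); knee plots are commonly drawn with log-scaled axes, e.g. with matplotlib's `plt.loglog(rank, umis)`.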


🔢 UMI Counts and Detected Genes Across Cell Barcodes

The barcode frequency is calculated as the number of reads associated with each cell barcode.
The first two plots show cell barcodes ranked by total read count, plotted against two key metrics: the number of UMIs and the number of detected genes per barcode.
The third plot illustrates how the number of detected genes increases with UMI count per cell.
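The barcode-frequency plots differ from the knee plots mainly in the ranking key: barcodes are ordered by read count rather than UMI count. A minimal sketch, assuming a dense cells × genes UMI count array and a per-barcode read-count array aligned with its rows (`barcode_frequency_data` is a hypothetical name, not part of QCatch):

```python
import numpy as np


def barcode_frequency_data(reads_per_barcode, counts):
    """Order barcodes by read count and return reads, UMIs and genes in that order."""
    reads = np.asarray(reads_per_barcode)
    counts = np.asarray(counts)
    umis = counts.sum(axis=1)                    # UMIs per barcode
    genes = (counts > 0).sum(axis=1)             # detected genes per barcode
    order = np.argsort(-reads, kind="stable")    # most-read barcode first
    return {"reads": reads[order], "umis": umis[order], "genes": genes[order]}
```

Plotting `umis` and `genes` against their rank gives the first two panels; because the arrays stay aligned per barcode, a scatter of `umis` versus `genes` gives the third.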


🧽 UMI Deduplication Plot

The scatter plot compares the number of mapped reads and the number of UMIs for each retained cell. Each point represents a cell, with the x-axis showing the mapped read count and the y-axis showing the deduplicated UMI count. The reference line indicates the mean deduplication rate across all cells.

UMI Deduplication: UMI deduplication is the process of identifying and removing duplicate reads that arise from PCR amplification of the same original molecule.
Dedup Rate: The UMI count divided by the number of mapped reads for each cell.
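To make the dedup rate concrete, the toy sketch below counts reads and distinct molecules per cell from hypothetical `(cell_barcode, umi, gene)` records, one per mapped read. It treats each distinct UMI-gene pair within a cell as one molecule, a simplification that does not reproduce alevin-fry's CellRangerLike (`cr-like`) UMI resolution. It also assumes that the "mean deduplication rate" is the unweighted mean of the per-cell rates.

```python
from collections import defaultdict


def dedup_rates(records):
    """records: iterable of (cell_barcode, umi, gene) tuples, one per mapped read.

    Returns (per-cell dedup rate, unweighted mean of those rates).
    Simplification: each distinct (umi, gene) pair in a cell is one UMI.
    """
    reads = defaultdict(int)          # mapped reads per cell
    molecules = defaultdict(set)      # distinct (umi, gene) pairs per cell
    for cell, umi, gene in records:
        reads[cell] += 1
        molecules[cell].add((umi, gene))
    rates = {cell: len(molecules[cell]) / reads[cell] for cell in reads}
    mean_rate = sum(rates.values()) / len(rates)
    return rates, mean_rate
```

Under that assumption, the reference line in the scatter plot would be y = mean_rate · x.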


🧬 Distribution of Detected Gene Count and Mitochondrial Percentage Plot

The left plot depicts the distribution of detected gene counts.
The right plot shows the distribution of mitochondrial gene expression percentages across cells. Note: The "All Cells" plot does not display every processed cell. To improve visualization and reduce clutter from very low-quality cells, we excluded cells with fewer than 10 detected genes, as these are typically nearly empty. In contrast, the "Retained Cells" plot includes all retained cells without applying this gene count filter.
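A minimal sketch of the two distributions, assuming a dense cells × genes count array and gene symbols in which mitochondrial genes start with `MT-`. That prefix is an assumption that fits human gene symbols; the case-insensitive check below also matches mouse `mt-` symbols, while gene IDs such as Ensembl IDs would need a different rule. The `min_genes=10` default mirrors the "All Cells" filter; passing `min_genes=0` for retained cells gives the unfiltered "Retained Cells" view. The function name is hypothetical.

```python
import numpy as np


def gene_and_mito_distributions(counts, gene_names, mito_prefix="MT-", min_genes=10):
    """Return (detected genes, mitochondrial %) for cells with >= min_genes genes."""
    counts = np.asarray(counts, dtype=float)
    genes_per_cell = (counts > 0).sum(axis=1)
    # Assumed rule: mitochondrial genes are those whose symbol starts with the prefix.
    is_mito = np.array(
        [name.upper().startswith(mito_prefix.upper()) for name in gene_names], dtype=bool
    )
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with no counts get 0% instead of a division-by-zero warning.
    pct_mito = np.divide(100 * mito, total, out=np.zeros_like(total), where=total > 0)
    keep = genes_per_cell >= min_genes          # drop nearly empty cells
    return genes_per_cell[keep], pct_mito[keep]
```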


🧩 S/U/A Count Bar Plot and (S+A)/(U+S+A) Ratio Plot

When using “USA mode” in alevin-fry, spliced (S), unspliced (U), and ambiguous (A) counts are generated separately for each gene in each cell.
In the bar plot, we first sum the spliced, unspliced, and ambiguous counts across all genes and all cells. The plot then displays the total count in each splicing category: Spliced (S), Unspliced (U), and Ambiguous (A).
In the histogram, we calculate the splicing ratio for each cell as (S + A) / (S + U + A), where the counts are summed across all genes. The histogram shows the distribution of these per-cell splicing ratios.
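The bar plot and histogram reduce to two kinds of sums. The sketch below assumes the spliced, unspliced, and ambiguous counts are already available as three separate, aligned cells × genes arrays; how these blocks are laid out in alevin-fry's combined USA-mode matrix is not covered here, so check the column names written alongside the matrix. Cells with no counts get `NaN` rather than a ratio. `splicing_summary` is a hypothetical name.

```python
import numpy as np


def splicing_summary(S, U, A):
    """S, U, A: aligned (cells x genes) count arrays.

    Returns (category totals for the bar plot, per-cell (S+A)/(S+U+A) ratios).
    """
    S, U, A = (np.asarray(m, dtype=float) for m in (S, U, A))
    # Bar plot: sum each category over all genes and all cells.
    totals = {
        "Spliced (S)": float(S.sum()),
        "Unspliced (U)": float(U.sum()),
        "Ambiguous (A)": float(A.sum()),
    }
    # Histogram: per-cell sums over genes, then the splicing ratio.
    s, u, a = S.sum(axis=1), U.sum(axis=1), A.sum(axis=1)
    denom = s + u + a
    ratio = np.divide(s + a, denom, out=np.full_like(denom, np.nan), where=denom > 0)
    return totals, ratio
```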


🗺️ Clustering: UMAP and t-SNE

These plots are low-dimensional projections of high-dimensional gene expression data. Each point represents a single cell. Cells that appear close together in the plot are inferred to have similar transcriptomic profiles, indicating potential similarity in cell type or state.
Note: By default, only final retained cells (singlets) are shown after doublet removal. Use the toggle buttons above to view doublets if doublet visualization was enabled with the --visualize_doublets flag. Standard preprocessing steps were applied using `Scanpy`, including normalization, log transformation, feature selection, and dimensionality reduction.

The plots below were generated with the following Python code:


        # Imports this snippet needs. `adata_with_doublets` (an AnnData object whose
        # .obs holds the "predicted_doublet" and "doublet_score" columns) and `n_valid`
        # are defined elsewhere in the source and are not shown here.
        import pandas as pd
        import plotly.express as px
        import scanpy as sc

        # Shared embedding for doublet visualization
        # Preprocessing on all cells (singlets + doublets)
        sc.pp.normalize_total(adata_with_doublets)        # per-cell library-size normalization
        sc.pp.log1p(adata_with_doublets)                  # log(1 + x) transform
        sc.pp.highly_variable_genes(adata_with_doublets, n_top_genes=min(2000, n_valid))
        sc.tl.pca(adata_with_doublets)
        sc.pp.neighbors(adata_with_doublets)              # kNN graph used by UMAP and Leiden
        sc.tl.umap(adata_with_doublets)
        sc.tl.tsne(adata_with_doublets)
        sc.tl.leiden(adata_with_doublets, flavor="igraph", n_iterations=2)

        # View 1: "Retained Cells Only" - Filter to singlets, color by Leiden clusters
        # Missing doublet calls are treated as doublets (fillna(True)); astype(bool)
        # keeps "~" a logical NOT even if the column ends up with object dtype.
        singlet_mask = ~adata_with_doublets.obs["predicted_doublet"].fillna(True).astype(bool)
        adata_singlets = adata_with_doublets[singlet_mask.to_numpy(), :]

        umap_df_singlets = pd.DataFrame(adata_singlets.obsm["X_umap"], columns=["UMAP1", "UMAP2"])
        umap_df_singlets["leiden"] = adata_singlets.obs["leiden"].values
        fig_umap_singlets = px.scatter(umap_df_singlets, x="UMAP1", y="UMAP2", color="leiden",
                                        title="UMAP with Leiden Clusters (Retained Cells Only)")

        tsne_df_singlets = pd.DataFrame(adata_singlets.obsm["X_tsne"], columns=["TSNE1", "TSNE2"])
        tsne_df_singlets["leiden"] = adata_singlets.obs["leiden"].values
        fig_tsne_singlets = px.scatter(tsne_df_singlets, x="TSNE1", y="TSNE2", color="leiden",
                                        title="t-SNE with Leiden Clusters (Retained Cells Only)")

        # View 2: "With Doublets" - All cells, color by doublet status
        adata_with_doublets.obs["doublet_label"] = adata_with_doublets.obs["predicted_doublet"].map(
            {True: "Doublet", False: "Singlet"}).astype(str)

        umap_df_all = pd.DataFrame(adata_with_doublets.obsm["X_umap"], columns=["UMAP1", "UMAP2"])
        umap_df_all["doublet_label"] = adata_with_doublets.obs["doublet_label"].values
        umap_df_all["doublet_score"] = adata_with_doublets.obs["doublet_score"].values
        fig_umap_doublets = px.scatter(umap_df_all, x="UMAP1", y="UMAP2", color="doublet_label",
                                        hover_data=["doublet_score"],
                                        color_discrete_map={"Singlet": "#3498db", "Doublet": "#e74c3c"},
                                        title="UMAP with Doublet Classification (With Doublets)")

        tsne_df_all = pd.DataFrame(adata_with_doublets.obsm["X_tsne"], columns=["TSNE1", "TSNE2"])
        tsne_df_all["doublet_label"] = adata_with_doublets.obs["doublet_label"].values
        tsne_df_all["doublet_score"] = adata_with_doublets.obs["doublet_score"].values
        fig_tsne_doublets = px.scatter(tsne_df_all, x="TSNE1", y="TSNE2", color="doublet_label",
                                        hover_data=["doublet_score"],
                                        color_discrete_map={"Singlet": "#3498db", "Doublet": "#e74c3c"},
                                        title="t-SNE with Doublet Classification (With Doublets)")

📜 Quant Log Information

alt_resolved_cell_numbers: A list of global cell indices where an alternative resolution strategy was applied for large connected components. If this list is empty, no cells used the alternative resolution strategy.
cmd: The command line used for this af_quant process.
dump_eq: Indicates whether equivalence class (EQ class) information was dumped.
empty_resolved_cell_numbers: A list of global cell indices with no gene expression.
num_genes: The total number of genes. When usa_mode is enabled, this count is the sum of genes across the three categories, unspliced (U), spliced (S), and ambiguous (A), so each gene contributes three entries.
num_quantified_cells: The number of cells that were quantified.
resolution_strategy: The resolution strategy used for quantification.
usa_mode: Indicates whether the data was processed in Unspliced-Spliced-Ambiguous (USA) mode, which classifies each transcript’s splicing state.
version_str: The tool’s version number.

alt_resolved_cell_numbers: []
cmd: /fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry quant -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -t 32 -m /fs/nexus-projects/sc_frag_len/nextflow/end2end_forseti_2026/simpleaf_with_new_piscem/af_test_workdir/human-2024-A_splici_piscem_142/index/t2g_3col.tsv -r cr-like
dump_eq: False
empty_resolved_cell_numbers: [1121, 1625, 1893, 2389, 3074]
num_genes: 115818
num_quantified_cells: 5793
quant_options: {'cmdline': '/fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry quant -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant -t 32 -m /fs/nexus-projects/sc_frag_len/nextflow/end2end_forseti_2026/simpleaf_with_new_piscem/af_test_workdir/human-2024-A_splici_piscem_142/index/t2g_3col.tsv -r cr-like', 'dump_eq': False, 'filter_list': None, 'init_uniform': False, 'input_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'large_graph_thresh': 0, 'num_bootstraps': 0, 'num_threads': 32, 'output_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'pug_exact_umi': False, 'resolution': 'CellRangerLike', 'sa_model': 'WinnerTakeAll', 'small_thresh': 10, 'summary_stat': False, 'tg_map': '/fs/nexus-projects/sc_frag_len/nextflow/end2end_forseti_2026/simpleaf_with_new_piscem/af_test_workdir/human-2024-A_splici_piscem_142/index/t2g_3col.tsv', 'use_mtx': True, 'version': '0.11.2'}
resolution_strategy: CellRangerLike
usa_mode: True
version_str: 0.11.2
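Two of the quant log values can be cross-checked with simple arithmetic. The sketch below copies three values from the log into a plain dictionary (the `quant_log` name is hypothetical, not a QCatch or alevin-fry structure). It assumes, per the `num_genes` description, that USA mode lists each gene once per category. It also assumes that the "Number of all processed cells" in the Summary equals the quantified cell count minus the cells in `empty_resolved_cell_numbers`; the report does not state this relationship explicitly, but the numbers are consistent with it.

```python
# Values copied from the quant log above.
quant_log = {
    "num_genes": 115818,
    "num_quantified_cells": 5793,
    "empty_resolved_cell_numbers": [1121, 1625, 1893, 2389, 3074],
}

# In USA mode, each gene is counted once per category (S, U and A).
assert quant_log["num_genes"] % 3 == 0
genes_per_category = quant_log["num_genes"] // 3

# Quantified cells minus the cells with no gene expression.
non_empty_cells = quant_log["num_quantified_cells"] - len(quant_log["empty_resolved_cell_numbers"])

print(genes_per_category)  # 38606
print(non_empty_cells)     # 5788, matching "Number of all processed cells"
```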
📝 Permit List Log Information

cmd: The command-line input provided by users for generating the permit list.
expected_ori: The expected alignment orientation for the sequencing chemistry being processed.
gpl_options: The actual command line executed for the 'generate permit list' process, including pre-filled settings.
max-ambig-record: The maximum number of reference sequences to which a read can be mapped.
permit-list-type: The type of permit list being used.
velo_mode: A placeholder parameter reserved for future integration with alevin-fry-Forseti; currently always set to false.
version_str: The version number of the tool.

cmd: /fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry generate-permit-list -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_map -d fw -t 8 --unfiltered-pl /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/af_home/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant
expected_ori: +
gpl_options: {'cmdline': '/fs/nexus-projects/sc_frag_len/nextflow/conda_envs/cache/simpleaf_env/bin/alevin-fry generate-permit-list -i /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_map -d fw -t 8 --unfiltered-pl /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/af_home/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'expected_ori': 'Forward', 'fmeth': {'UnfilteredExternalList': ['/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/af_home/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c', 10]}, 'input_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_map', 'output_dir': '/fs/nexus-projects/sc_frag_len/nextflow/QCatch_work_folder/Mimic_low_quality_data/low_depth/af_test_workdir/pbmc1k_sub_quant/af_quant', 'threads': 8, 'velo_mode': False, 'version': '0.11.2'}
max-ambig-record: 931
permit-list-type: unfiltered
velo_mode: False
version_str: 0.11.2