QCatch

QCatch provides functionality for generating web-based QC reports summarizing the output of alevin-fry (He et al., Nature Methods 19, 316–322 (2022)).

Summary

Number of retained cells: The number of valid, high-quality cells that passed the cell calling step. This includes cells identified during the initial filtering and additional cells identified by the EmptyDrops step, whose expression profiles are significantly distinct from the ambient background.
Number of all processed cells: The total number of cell barcodes observed in the processed sample. Cells with zero reads have been excluded.
Mean reads per retained cell: The total number of reads assigned to the retained cell barcodes, including the mapped and unmapped reads, divided by the number of retained cells.
Median UMI per retained cell: The median number of deduplicated reads (UMIs) per retained cell.
Median genes per retained cell: The median number of detected genes per retained cell.
Total genes detected for retained cells: The total number of unique genes detected across all retained cells.
Mapping rate: Fraction of reads that mapped to the augmented reference, calculated as mapped reads / total processed reads.
Sequencing saturation: The proportion of reads that come from already-observed UMIs, calculated as 1 - (deduplicated reads / total reads). High saturation suggests limited gain from additional sequencing, while low saturation indicates that further sequencing could reveal more unique molecules (UMIs).
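
As a rough illustration of the two ratio metrics defined above, the sketch below computes a mapping rate and a sequencing saturation from raw counts. The function names and the input numbers are hypothetical; they are not taken from the QCatch source or from this sample.

```python
def mapping_rate(mapped_reads: int, total_processed_reads: int) -> float:
    """Fraction of processed reads that mapped to the reference."""
    return mapped_reads / total_processed_reads


def sequencing_saturation(deduplicated_reads: int, total_reads: int) -> float:
    """1 - (deduplicated reads / total reads)."""
    return 1.0 - deduplicated_reads / total_reads


# Made-up counts, for illustration only.
example_mapping_rate = mapping_rate(mapped_reads=862, total_processed_reads=1000)
example_saturation = sequencing_saturation(deduplicated_reads=300, total_reads=1000)
print(f"Mapping rate: {example_mapping_rate:.1%}")
print(f"Sequencing saturation: {example_saturation:.1%}")
```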

Number of retained cells	1,223	Number of all processed cells	71,080
Mean reads per retained cell	46,993	Median UMI per retained cell	10,445
Median genes per retained cell	3,263	Total genes detected for retained cells	26,132
Mapping rate	86.2%	Sequencing saturation	69.97%

Plots

Note: For the plots below, the title will specify whether they represent all processed cells or only the retained cells.

🦒 Knee Plots

The left plot shows the number of UMIs against cell rank (ordered by UMI count). This knee plot can help identify low-quality cells with too few UMIs.
The right plot shows the number of detected genes against cell rank (ordered by UMI count).

Rank: cells are ranked by number of UMIs.
UMI: deduplicated read.
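
The ranking behind both knee plots can be sketched as follows. This is a minimal illustration, assuming each cell is given as a hypothetical `(umi_count, n_genes)` pair; it is not the QCatch plotting code.

```python
def knee_curves(cells):
    """Order cells by descending UMI count and return the plotted series.

    `cells` is a list of (umi_count, n_genes) pairs (hypothetical input format).
    Returns (ranks, umis, genes), where rank 1 is the cell with the most UMIs.
    """
    ordered = sorted(cells, key=lambda cell: cell[0], reverse=True)
    ranks = list(range(1, len(ordered) + 1))
    umis = [cell[0] for cell in ordered]
    genes = [cell[1] for cell in ordered]
    return ranks, umis, genes


# Made-up cells: (UMI count, detected genes)
ranks, umis, genes = knee_curves([(5, 4), (1200, 600), (30, 25)])
print(list(zip(ranks, umis, genes)))
```

The left plot would use `ranks` against `umis`, and the right plot `ranks` against `genes`, so both share the same UMI-based ordering.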

🔢 UMI Counts and Detected Genes Across Cell Barcodes

The barcode frequency is calculated as the number of reads associated with each cell barcode.
The first two plots show cell barcodes ranked by total read count, plotted against two key metrics: the number of UMIs and the number of detected genes per barcode.
The third plot illustrates how the number of detected genes increases with UMI count per cell.

🧽 UMI Deduplication Plot

The scatter plot compares the number of mapped reads with the number of UMIs for each retained cell. Each point represents a cell, with the x-axis showing the mapped read count and the y-axis showing the deduplicated UMI count. The reference line indicates the mean deduplication rate across all cells.

UMI Deduplication: UMI deduplication is the process of identifying and removing duplicate reads that arise from PCR amplification of the same original molecule.
Dedup Rate: The UMI count divided by the number of mapped reads for each cell.
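
The dedup rate and the reference line can be sketched as below. This assumes the reference line is the mean of the per-cell rates, which is one reading of "mean deduplication rate"; the function name and the counts are hypothetical.

```python
from statistics import mean


def dedup_rates(mapped_reads, umi_counts):
    """Per-cell dedup rate: UMI count divided by mapped reads.

    Cells with zero mapped reads are skipped to avoid division by zero.
    """
    return [u / m for m, u in zip(mapped_reads, umi_counts) if m > 0]


# Made-up per-cell counts, for illustration only.
mapped = [100, 200, 400]
umis = [50, 50, 100]

rates = dedup_rates(mapped, umis)
mean_rate = mean(rates)  # assumed definition of the reference line
print(rates, mean_rate)
```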

🧬 Distribution of Detected Gene Count and Mitochondrial Percentage Plot

The left plot depicts the distribution of detected gene counts.
The right plot shows the distribution of mitochondrial gene expression percentages across cells. Note: The “All Cells” plot does not display every processed cell. To improve visualization and reduce clutter from very low-quality cells, cells with fewer than 20 detected genes, which are typically considered nearly empty, are excluded. In contrast, the “Retained Cells” plot includes all retained cells, without applying this gene count filter.
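
A minimal sketch of the two per-cell quantities and the 20-gene filter described above. It assumes human gene symbols, where mitochondrial genes carry the `MT-` prefix; how QCatch actually identifies mitochondrial genes is not stated here, and all names below are hypothetical.

```python
MIN_GENES_ALL_CELLS_PLOT = 20  # threshold quoted in the note above


def n_detected_genes(counts):
    """Number of genes with a non-zero count in one cell."""
    return sum(1 for c in counts.values() if c > 0)


def mito_percent(counts):
    """Percentage of a cell's counts that come from mitochondrial genes.

    Assumption: mitochondrial genes are those whose symbol starts with "MT-".
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if gene.upper().startswith("MT-"))
    return 100.0 * mito / total


def keep_for_all_cells_plot(counts):
    """Apply the gene-count filter used only for the "All Cells" plot."""
    return n_detected_genes(counts) >= MIN_GENES_ALL_CELLS_PLOT


# Made-up cell: gene symbol -> count
cell = {"MT-CO1": 10, "ACTB": 30, "GAPDH": 0}
print(n_detected_genes(cell), mito_percent(cell), keep_for_all_cells_plot(cell))
```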

🧩 Bar plot for S/U/A counts and (S+A)/(U+S+A) Ratio Plot

When using “USA mode” in alevin-fry, spliced (S), unspliced (U), and ambiguous (A) read counts are generated separately for each gene in each cell.
In the bar plot, we first sum the spliced, unspliced, and ambiguous counts across all genes and all cells. The plot then displays the total number of reads in each splicing category: Spliced (S), Unspliced (U), and Ambiguous (A).
In the histogram, we calculate the splicing ratio for each cell as (S + A) / (S + U + A), where the counts are summed across all genes. The histogram shows the distribution of these per-cell splicing ratios.
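
The two computations can be sketched as follows, assuming each cell is already reduced to a hypothetical `(S, U, A)` triple of counts summed across genes. The numbers are made up for illustration.

```python
def category_totals(cells):
    """Sum S, U, and A counts over all cells (the bar plot values)."""
    total_s = sum(s for s, _, _ in cells)
    total_u = sum(u for _, u, _ in cells)
    total_a = sum(a for _, _, a in cells)
    return total_s, total_u, total_a


def splicing_ratio(s, u, a):
    """Per-cell ratio (S + A) / (S + U + A); None if the cell has no counts."""
    total = s + u + a
    if total == 0:
        return None
    return (s + a) / total


# Made-up per-cell (S, U, A) counts, already summed across genes.
cells = [(60, 30, 10), (20, 70, 10)]

totals = category_totals(cells)
ratios = [splicing_ratio(s, u, a) for s, u, a in cells]
print(totals, ratios)
```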

🗺️ Clustering: UMAP and t-SNE

These plots are low-dimensional projections of high-dimensional gene expression data. Each point represents a single cell. Cells that appear close together in the plot are inferred to have similar transcriptomic profiles, indicating potential similarity in cell type or state.
Note: Only retained cells are included in these visualizations. All retained cells are shown without further filtering. Standard preprocessing steps were applied using `Scanpy`, including normalization, log transformation, feature selection, and dimensionality reduction.

The Python source code for these plots is shown below:


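        # Pre-processing for UMAP and t-SNE
        # (excerpt; `adata` is the AnnData object holding the retained cells)
        import pandas as pd
        import plotly.express as px
        import scanpy as sc

        sc.settings.set_figure_params(dpi=200, facecolor="white")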
        # Normalizing to median total counts
        sc.pp.normalize_total(adata)
        # Logarithmize the data
        sc.pp.log1p(adata)

        # feature selection
        n_valid = adata.X.shape[1]
        sc.pp.highly_variable_genes(adata, n_top_genes=min(2000, n_valid))
        # dimensionality Reduction
        sc.tl.pca(adata)
        # nearest neighbor graph construction
        sc.pp.neighbors(adata)
        # UMAP
        sc.tl.umap(adata)

        # clustering
        # Using the igraph implementation and a fixed number of iterations can be significantly faster, especially for larger datasets
        sc.tl.leiden(adata, flavor="igraph", n_iterations=2)

        n_cells = adata.n_obs
        perplexity = min(30, max(2, (n_cells - 1) // 3))

        # t-SNE
        sc.tl.tsne(adata, perplexity=perplexity)

        # Create a Plotly-based UMAP scatter plot with Leiden clusters
        umap_df = pd.DataFrame(adata.obsm["X_umap"], columns=["UMAP1", "UMAP2"])
        umap_df["leiden"] = adata.obs["leiden"].values

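        # UMAP in plotly
        # `width` and `height` are not defined in this excerpt; the values below
        # are an assumption chosen to match the t-SNE figure further down
        width, height = 480, 360
        opacity = 0.7
        # modify dot size
        dot_size = 3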
        fig_umap = px.scatter(
            umap_df,
            x="UMAP1",
            y="UMAP2",
            color="leiden",
            title="UMAP with Leiden Clusters (Retained Cells Only)",
            width=width,
            height=height,
            opacity=opacity,
        ).update_traces(marker={"size": dot_size})  # Set the dot size

        # Center title and reduce margin
        fig_umap.update_layout(title_x=0.5, margin={"t": 30, "l": 10, "r": 10, "b": 20})

        # t-SNE plot in plotly
        tsne_df = pd.DataFrame(adata.obsm["X_tsne"], columns=["TSNE1", "TSNE2"])
        tsne_df["leiden"] = adata.obs["leiden"].values

        fig_tsne = px.scatter(
            tsne_df,
            x="TSNE1",
            y="TSNE2",
            color="leiden",
            title="t-SNE with Leiden Clusters (Retained Cells Only)",
            width=480,
            height=360,
            opacity=0.7,
        ).update_traces(marker={"size": 3})
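
The perplexity rule in the code above caps the value at 30 and scales it down for small datasets, presumably so that it stays below the number of cells. The sketch below isolates that expression in a hypothetical helper (`tsne_perplexity` is not a name from the source) and evaluates it for a few cell counts.

```python
def tsne_perplexity(n_cells: int) -> int:
    """Same expression as in the excerpt: at most 30, at least 2."""
    return min(30, max(2, (n_cells - 1) // 3))


# 1,223 is the number of retained cells in this report; the others are made up.
for n in (4, 10, 61, 1223):
    print(n, tsne_perplexity(n))
```

With 91 or more cells the rule returns 30, so it only changes the setting for small samples.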

📜 Quant Log Information

alt_resolved_cell_numbers: A list of global cell indices where an alternative resolution strategy was applied for large connected components. If this list is empty, no cells used the alternative resolution strategy.
cmd: The command line used for this af_quant process.
dump_eq: Indicates whether equivalence class (EQ class) information was dumped.
empty_resolved_cell_numbers: A list of global cell indices with no gene expression.
num_genes: The total number of genes. When usa_mode is enabled, this count is the sum of the genes across three categories: unspliced (U), spliced (S), and ambiguous (A).
num_quantified_cells: The number of cells that were quantified.
resolution_strategy: The resolution strategy used for quantification.
usa_mode: Indicates that data was processed in Unspliced-Spliced-Ambiguous (USA) mode to classify each transcript’s splicing state.
version_str: The tool’s version number.
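
As a worked example of the num_genes definition, the sketch below recovers the number of distinct genes from the num_genes value reported in this log. It assumes every gene contributes exactly one entry to each of the three categories; the helper name is hypothetical.

```python
def distinct_genes(num_genes: int, usa_mode: bool) -> int:
    """Number of distinct genes implied by the reported num_genes.

    Assumption: in USA mode each gene is counted once per category (U, S, A),
    so the reported value is three times the number of distinct genes.
    """
    if not usa_mode:
        return num_genes
    if num_genes % 3 != 0:
        raise ValueError("num_genes is not a multiple of 3 in USA mode")
    return num_genes // 3


# Values reported in this log: num_genes = 115818, usa_mode = True
print(distinct_genes(115818, usa_mode=True))
```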

Category	Content
alt_resolved_cell_numbers	[]
cmd	/fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry quant -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -t 32 -m /scratch0/rob/sc_data/refdata-gex-GRCh38-2024-A_piscem_index/index/t2g_3col.tsv -r cr-like
dump_eq	False
empty_resolved_cell_numbers	17642, 17123, 18106, 6870, 18692, 19335, 28396, 14161, 15604, 11322, 1410, 1427, 26150, 10063, 34257, 2995, 3020, 14551, 5927, 14608, 14709, 10501, 24211, 9099, 12128, 35615, 43848, 43920, 13646, 5093, 9401, 15264, 6596, 36848, 9538, 41270, 61528, 64772, 58728, 68858, 67326, 64181, 68770
num_genes	115818
num_quantified_cells	71123
quant_options	{'cmdline': '/fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry quant -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -t 32 -m /scratch0/rob/sc_data/refdata-gex-GRCh38-2024-A_piscem_index/index/t2g_3col.tsv -r cr-like', 'dump_eq': False, 'filter_list': None, 'init_uniform': False, 'input_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'large_graph_thresh': 0, 'num_bootstraps': 0, 'num_threads': 32, 'output_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'pug_exact_umi': False, 'resolution': 'CellRangerLike', 'sa_model': 'WinnerTakeAll', 'small_thresh': 10, 'summary_stat': False, 'tg_map': '/scratch0/rob/sc_data/refdata-gex-GRCh38-2024-A_piscem_index/index/t2g_3col.tsv', 'use_mtx': True, 'version': '0.11.2'}
resolution_strategy	CellRangerLike
usa_mode	True
version_str	0.11.2

📝 Permit List Log Information

cmd: The command-line input provided by users for generating the permit list.
expected_ori: The expected alignment orientation for the sequencing chemistry being processed.
gpl_options: The actual command line executed for the 'generate permit list' process, including pre-filled settings.
max-ambig-record: The maximum number of reference sequences to which a read can be mapped.
permit-list-type: The type of permit list being used.
velo_mode: A placeholder parameter reserved for future integration with alevin-fry-Forseti; currently always set to false.
version_str: The version number of the tool.

Category	Content
cmd	/fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry generate-permit-list -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_map -d fw -t 32 --unfiltered-pl /nfshomes/nomad/.afhome/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant
expected_ori	+
gpl_options	{'cmdline': '/fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry generate-permit-list -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_map -d fw -t 32 --unfiltered-pl /nfshomes/nomad/.afhome/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'expected_ori': 'Forward', 'fmeth': {'UnfilteredExternalList': ['/nfshomes/nomad/.afhome/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c', 10]}, 'input_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_map', 'output_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'threads': 32, 'velo_mode': False, 'version': '0.11.2'}
max-ambig-record	953
permit-list-type	unfiltered
velo_mode	False
version_str	0.11.2