QCatch
QCatch provides functionality for generating QC reports that summarize the output of alevin-fry (He et al., Nature Methods 19, 316–322 (2022)).
Summary
Number of retained cells: The number of valid, high-quality cells that passed the cell-calling step. This includes cells identified during the initial filtering and additional cells identified by the EmptyDrops step, whose expression profiles are significantly distinct from the ambient background.
Number of all processed cells: The total number of cell barcodes observed in the processed sample. Cells with zero reads have been excluded.
Mean reads per retained cell: The total number of reads assigned to the retained cell barcodes, including the mapped and unmapped reads, divided by the number of retained cells.
Median UMI per retained cell: The median number of deduplicated reads (UMIs) per retained cell.
Median genes per retained cell: The median number of detected genes per retained cell.
Total genes detected for retained cells: The total number of unique genes detected across all retained cells.
Mapping rate: Fraction of reads that mapped to the augmented reference, calculated as mapped reads / total processed reads.
Sequencing saturation: Sequencing saturation measures the proportion of reads coming from already-observed UMIs, calculated as 1 - (deduplicated reads / total reads). High saturation suggests limited gain from additional sequencing, while low saturation indicates that further sequencing could reveal more unique molecules (UMIs).
Number of retained cells | 1,223 | Number of all processed cells | 71,080 |
Mean reads per retained cell | 46,993 | Median UMI per retained cell | 10,445 |
Median genes per retained cell | 3,263 | Total genes detected for retained cells | 26,132 |
Mapping rate | 86.2% | Sequencing saturation | 69.97% |
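The two ratio metrics above can be sketched in a few lines of Python. This is a minimal illustration of the stated formulas, not the code QCatch runs; the function names and the example counts are hypothetical and are not taken from this sample.

```python
def mapping_rate(mapped_reads: int, total_processed_reads: int) -> float:
    """Mapped reads / total processed reads."""
    if total_processed_reads <= 0:
        raise ValueError("total_processed_reads must be positive")
    return mapped_reads / total_processed_reads


def sequencing_saturation(dedup_reads: int, total_reads: int) -> float:
    """1 - (deduplicated reads / total reads)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1.0 - dedup_reads / total_reads


# Hypothetical counts, chosen only to illustrate the arithmetic
print(f"mapping rate: {mapping_rate(862, 1000):.1%}")
print(f"saturation:   {sequencing_saturation(300, 1000):.1%}")
```

A saturation near 1 means most reads repeat UMIs that were already observed, while a value near 0 means most reads still reveal new molecules.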
🦒 Knee Plots
The left plot shows the number of UMIs against cell rank (ordered by UMI count). This knee plot can help identify low-quality cells with too few UMIs.
The right plot shows the number of detected genes against cell rank (ordered by UMI count).
Rank: cells are ranked by number of UMIs.
UMI: deduplicated read.
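The ranking behind both knee plots can be sketched as follows. This is an illustrative sketch under the assumption that each cell has one UMI total and one detected-gene count; the helper name `knee_data` and the example values are hypothetical.

```python
def knee_data(umi_counts, gene_counts):
    """Order cells by descending UMI count and return (ranks, umis, genes).

    The same UMI-based ordering is used for both the UMI curve and the
    detected-gene curve, so rank 1 is the cell with the most UMIs.
    """
    if len(umi_counts) != len(gene_counts):
        raise ValueError("umi_counts and gene_counts must have the same length")
    order = sorted(range(len(umi_counts)), key=lambda i: umi_counts[i], reverse=True)
    ranks = list(range(1, len(order) + 1))
    umis = [umi_counts[i] for i in order]
    genes = [gene_counts[i] for i in order]
    return ranks, umis, genes


# Hypothetical per-cell values
ranks, umis, genes = knee_data([12, 9500, 3, 10400], [8, 3100, 2, 3300])
for r, u, g in zip(ranks, umis, genes):
    print(r, u, g)
```

Plotting `umis` (or `genes`) against `ranks` on log-log axes gives the familiar knee shape, with the sharp drop separating likely cells from near-empty barcodes.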
🔢 UMI Counts and Detected Genes Across Cell Barcodes
The barcode frequency is calculated as the number of reads associated with each cell barcode.
The first two plots show cell barcodes ranked by total read count, plotted against two key metrics: the number of UMIs and the number of detected genes per barcode.
The third plot illustrates how the number of detected genes increases with UMI count per cell.
🧽 UMI Deduplication Plot
The scatter plot compares the number of mapped reads and the number of UMIs for each retained cell. Each point represents a cell, with the x-axis showing the mapped read count and the y-axis showing the deduplicated UMI count. The reference line indicates the mean deduplication rate across all cells.
UMI Deduplication: UMI deduplication is the process of identifying and removing duplicate reads that arise from PCR amplification of the same original molecule.
Dedup Rate: The UMI count divided by the number of mapped reads for each cell.
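The per-cell dedup rate and the slope of the reference line can be sketched as below. This is a minimal sketch; it assumes the reference line uses the mean of the per-cell rates (the report does not say whether it is that or the ratio of totals), and the function names and counts are hypothetical.

```python
def dedup_rates(umi_counts, mapped_reads):
    """Per-cell dedup rate: UMI count / mapped reads."""
    if len(umi_counts) != len(mapped_reads):
        raise ValueError("inputs must have the same length")
    if any(m <= 0 for m in mapped_reads):
        raise ValueError("every cell needs at least one mapped read")
    return [u / m for u, m in zip(umi_counts, mapped_reads)]


def mean_dedup_rate(umi_counts, mapped_reads):
    """Mean of the per-cell rates (assumed slope of the reference line)."""
    rates = dedup_rates(umi_counts, mapped_reads)
    return sum(rates) / len(rates)


# Hypothetical cells: (UMIs, mapped reads)
umis = [50, 30, 90]
reads = [100, 120, 120]
print(dedup_rates(umis, reads))
print(mean_dedup_rate(umis, reads))
```

Cells far below the reference line have many reads per molecule, which usually goes together with high sequencing saturation.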
🧬 Distribution of Detected Gene Count and Mitochondrial Percentage Plot
The left plot depicts the distribution of detected gene counts.
The right plot shows the distribution of mitochondrial gene expression percentages across cells. Note: The “All Cells” plot does not display every processed cell. To improve visualization and reduce clutter from very low-quality cells, cells with fewer than 20 detected genes are excluded, since these are typically considered nearly empty. In contrast, the “Retained Cells” plot includes all retained cells, without applying this gene count filter.
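The gene-count filter and the mitochondrial percentage can be sketched as follows. This is an illustrative sketch only: it assumes each cell is a mapping from gene name to count and that mitochondrial genes are recognized by an `MT-` prefix (a common convention for human gene symbols, not something this report states). All names and values are hypothetical.

```python
def n_detected_genes(counts):
    """Number of genes with a nonzero count in one cell."""
    return sum(1 for c in counts.values() if c > 0)


def mito_percent(counts, prefix="MT-"):
    """Percentage of a cell's counts coming from mitochondrial genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def cells_for_all_cells_plot(cells, min_genes=20):
    """Drop cells with fewer than min_genes detected genes."""
    return {bc: c for bc, c in cells.items() if n_detected_genes(c) >= min_genes}


# Hypothetical cells: one with 25 detected genes, one nearly empty
rich = {f"GENE{i}": 4 for i in range(24)}
rich["MT-CO1"] = 4
cells = {"AAAC": rich, "TTTG": {"GENE1": 1, "MT-ND1": 1}}

kept = cells_for_all_cells_plot(cells)
print(sorted(kept))
print(mito_percent(rich))
```

The retained-cell plot would skip `cells_for_all_cells_plot` entirely and use every retained barcode.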
🧩 Bar Plot for S/U/A Counts and (S+A)/(S+U+A) Ratio Plot
When using “USA mode” in alevin-fry, spliced (S), unspliced (U), and ambiguous (A) read counts are generated separately for each gene in each cell.
In the bar plot, we first sum the spliced, unspliced, and ambiguous counts across all genes and all cells. The plot then displays the total number of reads in each splicing category: Spliced (S), Unspliced (U), and Ambiguous (A).
In the histogram, we calculate the splicing ratio for each cell as (S + A) / (S + U + A), where the counts are summed across all genes. The histogram shows the distribution of these per-cell splicing ratios.
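Both summaries can be sketched from per-cell S, U, and A totals. This is a minimal sketch that assumes the counts have already been summed across genes for each cell; the function names and numbers are hypothetical.

```python
def category_totals(cells):
    """Sum S, U, and A over all cells (the three bars in the bar plot)."""
    totals = {"S": 0, "U": 0, "A": 0}
    for s, u, a in cells:
        totals["S"] += s
        totals["U"] += u
        totals["A"] += a
    return totals


def splicing_ratio(s, u, a):
    """Per-cell ratio (S + A) / (S + U + A); NaN for a cell with no counts."""
    total = s + u + a
    if total == 0:
        return float("nan")
    return (s + a) / total


# Hypothetical per-cell (S, U, A) totals, already summed across genes
cells = [(60, 30, 10), (20, 70, 10), (45, 50, 5)]
print(category_totals(cells))
print([splicing_ratio(s, u, a) for s, u, a in cells])
```

The list of per-cell ratios is what the histogram bins; a cell dominated by unspliced counts lands near 0 and a cell dominated by spliced counts lands near 1.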
🗺️ Clustering: UMAP and t-SNE
These plots are low-dimensional projections of high-dimensional gene expression data. Each point represents a single cell. Cells that appear close together in the plot are inferred to have similar transcriptomic profiles, indicating potential similarity in cell type or state.
Note: Only retained cells are included in these visualizations. All retained cells are shown without further filtering. Standard preprocessing steps were applied using `Scanpy`, including normalization, log transformation, feature selection, and dimensionality reduction.
See the source Python code for the plots below:
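# Imports used by this excerpt (adata is the AnnData object of retained cells)
import pandas as pd
import plotly.express as px
import scanpy as sc
# Pre-processing for UMAP and t-SNE
sc.settings.set_figure_params(dpi=200, facecolor="white")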
# Normalizing to median total counts
sc.pp.normalize_total(adata)
# Logarithmize the data
sc.pp.log1p(adata)
# feature selection
n_valid = adata.X.shape[1]
sc.pp.highly_variable_genes(adata, n_top_genes=min(2000, n_valid))
# dimensionality Reduction
sc.tl.pca(adata)
# nearest neighbor graph construction and visualization
sc.pp.neighbors(adata)
# UMAP
sc.tl.umap(adata)
# clustering
# Using the igraph implementation and a fixed number of iterations can be significantly faster, especially for larger datasets
sc.tl.leiden(adata, flavor="igraph", n_iterations=2)
n_cells = adata.n_obs
perplexity = min(30, max(2, (n_cells - 1) // 3))
# t-SNE
sc.tl.tsne(adata, perplexity=perplexity)
# Create a Plotly-based UMAP scatter plot with Leiden clusters
umap_df = pd.DataFrame(adata.obsm["X_umap"], columns=["UMAP1", "UMAP2"])
umap_df["leiden"] = adata.obs["leiden"].values
# UMAP in plotly
opacity = 0.7
# modify dot size
dot_size = 3
# figure size in pixels (assumed to match the t-SNE plot below)
width, height = 480, 360
fig_umap = px.scatter(
    umap_df,
    x="UMAP1",
    y="UMAP2",
    color="leiden",
    title="UMAP with Leiden Clusters (Retained Cells Only)",
    width=width,
    height=height,
    opacity=opacity,
).update_traces(marker={"size": dot_size})  # Set the dot size
# Center title and reduce margin
fig_umap.update_layout(title_x=0.5, margin={"t": 30, "l": 10, "r": 10, "b": 20})
# t-SNE plot in plotly
tsne_df = pd.DataFrame(adata.obsm["X_tsne"], columns=["TSNE1", "TSNE2"])
tsne_df["leiden"] = adata.obs["leiden"].values
fig_tsne = px.scatter(
    tsne_df,
    x="TSNE1",
    y="TSNE2",
    color="leiden",
    title="t-SNE with Leiden Clusters (Retained Cells Only)",
    width=480,
    height=360,
    opacity=0.7,
).update_traces(marker={"size": 3})
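The perplexity rule in the code above clamps the t-SNE perplexity between 2 and 30 based on the number of retained cells, presumably so that it stays well below the cell count for small samples. The sketch below isolates that one expression; the helper name `tsne_perplexity` is hypothetical, while the formula is copied from the listing.

```python
def tsne_perplexity(n_cells: int) -> int:
    """Same expression as in the listing: min(30, max(2, (n_cells - 1) // 3))."""
    return min(30, max(2, (n_cells - 1) // 3))


# A few cell counts, including the 1,223 retained cells in this report
for n in (4, 10, 90, 91, 1223):
    print(n, tsne_perplexity(n))
```

For this sample the rule has no effect: with 1,223 retained cells the perplexity is 30, which matches the value `sc.tl.tsne` uses by default. The clamp only matters for samples with 90 or fewer retained cells.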
📜 Quant Log Information
alt_resolved_cell_numbers: A list of global cell indices where an alternative resolution strategy was applied for large connected components. If this list is empty, no cells used the alternative resolution strategy.
cmd: The command line used for this af_quant process.
dump_eq: Indicates whether equivalence class (EQ class) information was dumped.
empty_resolved_cell_numbers: A list of global cell indices with no gene expression.
num_genes: The total number of genes. When usa_mode is enabled, this count represents the sum of genes across three categories: unspliced (U), spliced (S), and ambiguous (A).
num_quantified_cells: The number of cells that were quantified.
resolution_strategy: The resolution strategy used for quantification.
usa_mode: Indicates that data was processed in Unspliced-Spliced-Ambiguous (USA) mode to classify each transcript’s splicing state.
version_str: The tool’s version number.
Category | Content |
---|---|
alt_resolved_cell_numbers | [] |
cmd | /fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry quant -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -t 32 -m /scratch0/rob/sc_data/refdata-gex-GRCh38-2024-A_piscem_index/index/t2g_3col.tsv -r cr-like |
dump_eq | False |
empty_resolved_cell_numbers | 17642, 17123, 18106, 6870, 18692, 19335, 28396, 14161, 15604, 11322, 1410, 1427, 26150, 10063, 34257, 2995, 3020, 14551, 5927, 14608, 14709, 10501, 24211, 9099, 12128, 35615, 43848, 43920, 13646, 5093, 9401, 15264, 6596, 36848, 9538, 41270, 61528, 64772, 58728, 68858, 67326, 64181, 68770 |
num_genes | 115818 |
num_quantified_cells | 71123 |
quant_options | {'cmdline': '/fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry quant -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant -t 32 -m /scratch0/rob/sc_data/refdata-gex-GRCh38-2024-A_piscem_index/index/t2g_3col.tsv -r cr-like', 'dump_eq': False, 'filter_list': None, 'init_uniform': False, 'input_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'large_graph_thresh': 0, 'num_bootstraps': 0, 'num_threads': 32, 'output_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'pug_exact_umi': False, 'resolution': 'CellRangerLike', 'sa_model': 'WinnerTakeAll', 'small_thresh': 10, 'summary_stat': False, 'tg_map': '/scratch0/rob/sc_data/refdata-gex-GRCh38-2024-A_piscem_index/index/t2g_3col.tsv', 'use_mtx': True, 'version': '0.11.2'} |
resolution_strategy | CellRangerLike |
usa_mode | True |
version_str | 0.11.2 |
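Because usa_mode is True here, the num_genes value counts every gene three times (once each for U, S, and A). A minimal sketch of recovering the number of distinct genes, assuming each gene contributes exactly one entry per category; the helper name is hypothetical and the input values are copied from the table above.

```python
def genes_per_category(num_genes: int, usa_mode: bool) -> int:
    """Number of distinct genes implied by num_genes.

    In USA mode, num_genes is assumed to be 3x the gene count
    (unspliced, spliced, ambiguous); otherwise it is returned unchanged.
    """
    if not usa_mode:
        return num_genes
    if num_genes % 3 != 0:
        raise ValueError("num_genes is not divisible by 3; the equal-split assumption does not hold")
    return num_genes // 3


# Values from the quant log table above
quant_log = {"num_genes": 115818, "usa_mode": True}
print(genes_per_category(quant_log["num_genes"], quant_log["usa_mode"]))  # 38606
```

As a further consistency check on this report, empty_resolved_cell_numbers lists 43 cells, and 71,123 quantified cells minus 43 gives the 71,080 processed cells shown in the summary. That link is inferred from the numbers rather than stated by the log.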
📝 Permit List Log Information
cmd: The command-line input provided by users for generating the permit list.
expected_ori: The expected alignment orientation for the sequencing chemistry being processed.
gpl_options: The actual command line executed for the 'generate permit list' process, including pre-filled settings.
max-ambig-record: The maximum number of reference sequences to which a read can be mapped.
permit-list-type: The type of permit list being used.
velo_mode: A placeholder parameter reserved for future integration with alevin-fry-Forseti; currently always set to false.
version_str: The version number of the tool.
Category | Content |
---|---|
cmd | /fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry generate-permit-list -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_map -d fw -t 32 --unfiltered-pl /nfshomes/nomad/.afhome/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant |
expected_ori | + |
gpl_options | {'cmdline': '/fs/cbcb-lab/rob/rob/miniforge3/envs/simpleaf/bin/alevin-fry generate-permit-list -i /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_map -d fw -t 32 --unfiltered-pl /nfshomes/nomad/.afhome/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c --min-reads 10 -o /scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'expected_ori': 'Forward', 'fmeth': {'UnfilteredExternalList': ['/nfshomes/nomad/.afhome/plist/2c9dfb98babe5a57ae763778adb9ebb7bfa531e105823bc26163892089333f8c', 10]}, 'input_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_map', 'output_dir': '/scratch0/rob/sc_data/pbmc_1k_v3_simpleaf/af_quant', 'threads': 32, 'velo_mode': False, 'version': '0.11.2'} |
max-ambig-record | 953 |
permit-list-type | unfiltered |
velo_mode | False |
version_str | 0.11.2 |