-
Resolving the splicing origins of UMIs to improve the specificity of single-cell RNA-seq transcriptome mapping with alevin-fry
As the number of tools available for versatile and efficient pre-processing of single-cell and single-nucleus RNA-seq data continues to increase, the trade-offs of different pre-processing approaches are starting to be explored in more detail. For example, one of the high-level results reported in the recent pre-print for the STARsolo tool is that “Pseudoalignment-to-transcriptome predicts expression for thousands of non-expressed genes”. This is unfortunate insofar as such methods are the fastest and require the least working memory to pre-process single-cell experiments. While more investigation is certainly warranted into the exact mechanisms that lead to the prediction of expression for these genes, and into the extent to which they might affect subsequent analysis, one of the primary mechanisms posited is the spurious pseudoalignment of sequenced fragments that truly arise from intronic sequences rather than from spliced, mature mRNA. Given these observations, the question naturally arises: “Can these spuriously mapped fragments be controlled for without forfeiting all of the resource advantages of pseudoalignment-to-transcriptome?” …
-
Processing data quickly and in small memory with alevin-fry
The alevin-fry pipeline provides many different options for processing your single-cell RNA-seq data, with different choices making different sets of simplifying assumptions about what type of processing is appropriate. As the developers of this tool, we are still actively exploring the effects that these choices, and the assumptions they imply, have on data pre-processing and downstream analysis. In this tutorial, however, the focus is on processing data quickly and in a small amount of memory. Thus, we will run the pipeline in the configuration that makes the most simplifying assumptions about how the data should be processed. In this configuration, the pipeline adopts certain computational simplifications that are argued for in Melsted et al. 2019, specifically the use of pseudoalignment to map reads to the target transcriptome, the omission of small edit-distance UMI collapse (only identical UMIs are collapsed). …
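To make the idea of collapsing only identical UMIs concrete, here is a minimal toy sketch in Python. It is not alevin-fry's implementation: the function name `count_exact_umis`, the record layout (cell barcode, gene, UMI) and the example barcodes are all assumptions made purely for illustration. It shows that two UMIs differing by a single base are counted as distinct molecules when no edit-distance collapse is performed.

```python
from collections import defaultdict


def count_exact_umis(records):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `records` is an iterable of (cell_barcode, gene, umi) tuples; this
    layout is hypothetical and chosen only for the example. Only
    byte-identical UMIs are merged; no edit-distance correction is done.
    """
    umis_by_key = defaultdict(set)
    for cell_barcode, gene, umi in records:
        umis_by_key[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_by_key.items()}


# Toy input (made-up barcodes, genes and UMIs).
records = [
    ("AAAC", "geneA", "ACGT"),
    ("AAAC", "geneA", "ACGT"),  # exact duplicate, collapsed
    ("AAAC", "geneA", "ACGA"),  # one mismatch, kept as a separate UMI
    ("AAAC", "geneB", "TTTT"),
    ("CCCG", "geneA", "ACGT"),  # same UMI in a different cell, kept
]

counts = count_exact_umis(records)
for (cell_barcode, gene), n in sorted(counts.items()):
    print(cell_barcode, gene, n)
```

An approach that collapses UMIs within a small edit distance might merge `ACGT` and `ACGA` into a single molecule; exact-match collapse trades that correction for speed and simplicity.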