-
Processing feature barcoding data with alevin-fry
In this tutorial we will look at how to process a CITE-seq experiment (a type of feature barcoding experiment) using an alevin-fry based pipeline. Note: this tutorial is meant to mimic the original tutorial for feature barcode analysis with alevin, written by Avi Srivastava and Yuhan Hao. Thus, most of the descriptive text and commands are taken directly from that tutorial; here, however, we analyze the data using the alevin-fry pipeline instead of alevin. …
-
An introduction to RNA-velocity using alevin-fry
Recently, RNA-velocity estimation has become an increasingly popular tool in single-cell RNA-seq analysis. In this post, we will discuss an additional advantage of the Unspliced-Spliced-Ambiguous (USA) mode available in alevin-fry 0.3.0 and later. The approach that mode takes to control the spurious mapping of intron-derived fragments to spliced transcripts (in the absence of a full decoy) essentially gives us, “for free”, the preprocessing results we need to perform an RNA-velocity analysis. Here we provide an end-to-end tutorial describing how to perform an RNA-velocity analysis for a 10x Chromium dataset, covering the whole pipeline from the raw FASTQ files to the gorgeous velocity plots (generated by scVelo) that you may want to include in your next analysis or paper. …
-
Resolving the splicing origins of UMIs to improve the specificity of single-cell RNA-seq transcriptome mapping with alevin-fry
As the number of tools available for versatile and efficient pre-processing of single-cell and single-nucleus RNA-seq data continues to grow, the trade-offs between different pre-processing approaches are starting to be explored in more detail. For example, one of the high-level results reported in the recent pre-print for the STARsolo tool is that “Pseudoalignment-to-transcriptome predicts expression for thousands of non-expressed genes”. This is unfortunate to the extent that such methods are the fastest and require the least working memory to preprocess single-cell experiments. While more investigation is certainly warranted into the exact mechanisms that lead to the prediction of expression for these genes, and into the extent to which they might affect subsequent analysis, one of the primary mechanisms posited is spurious pseudoalignment of sequenced fragments that truly arise from intronic sequences rather than from spliced, mature mRNA. Given these observations, a question naturally arises: “Can these spuriously mapped fragments be controlled for without forfeiting all of the resource advantages of pseudoalignment-to-transcriptome?” …
-
Processing data quickly and in small memory with alevin-fry
The alevin-fry pipeline provides many different options for processing your single-cell RNA-seq data, with different choices making different sets of simplifying assumptions about what type of processing is appropriate. As the developers of this tool, we are still actively exploring the effects that different choices (which imply different assumptions) have on data pre-processing and downstream analysis. In this tutorial, however, the focus is on processing data quickly and in a small amount of memory. Thus, we will run the pipeline in the configuration that makes the most simplifying assumptions about how the data should be processed. In this configuration, the pipeline adopts certain computational simplifications argued for in Melsted et al. 2019, specifically the use of pseudoalignment to map reads to the target transcriptome and the omission of small edit-distance UMI collapse (only identical UMIs are collapsed). …
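To make the "only identical UMIs are collapsed" simplification concrete, here is a minimal Python sketch. It is a hypothetical illustration, not alevin-fry's implementation: it assumes each read has already been mapped and assigned to a single gene, and it represents reads as hypothetical `(cell_barcode, umi, gene)` tuples. Within each cell and gene, every distinct UMI string counts once, and near-identical UMIs are never merged.

```python
from collections import defaultdict


def count_identical_umis(records):
    """Count distinct UMIs per (cell barcode, gene).

    Hypothetical sketch: `records` is an iterable of (cell_barcode, umi, gene)
    tuples, one per mapped read, with each read already assigned to one gene.
    Only exactly identical UMI strings are merged; no edit-distance correction
    is attempted.
    """
    umis = defaultdict(set)
    for cell, umi, gene in records:
        # A set keeps each UMI string once per (cell, gene) pair.
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "TTGCA", "GeneA"),  # identical UMI: collapsed with the read above
    ("AAAC", "TTGCT", "GeneA"),  # one mismatch: kept as a separate molecule
    ("AAAC", "GGGCC", "GeneB"),
    ("CCGT", "TTGCA", "GeneA"),  # same UMI but a different cell: counted separately
]
counts = count_identical_umis(reads)
print(counts)
# {('AAAC', 'GeneA'): 2, ('AAAC', 'GeneB'): 1, ('CCGT', 'GeneA'): 1}
```

A strategy that also collapses UMIs within a small edit distance would likely merge `TTGCT` into `TTGCA` and report 1 for `('AAAC', 'GeneA')`. Skipping that step avoids pairwise UMI comparisons, at the cost of possibly overcounting molecules whose UMIs carry sequencing errors.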