Introduction

salmon¹ estimates transcript-level abundances from RNA-seq reads. Rather than producing a full base-to-base alignment of every read to a genome, salmon maps reads directly to a transcriptome and uses a probabilistic model — streaming/online inference followed by an offline EM/VBEM, in the spirit of eXpress² — to assign multi-mapping reads, correct for technical biases³⁴, and report abundances in TPM and estimated read counts.
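To make the multi-mapping assignment concrete, the following is a toy EM loop in Python. It is a minimal sketch under stated assumptions, not salmon's implementation: reads are assumed to be already grouped by the set of transcripts they are compatible with (an equivalence class), each class carries only a read count, and the online phase, VBEM priors, bias terms, and fragment-length model are all omitted. The function name `em_abundances` and the input layout are hypothetical, chosen for illustration.

```python
def em_abundances(eq_classes, eff_lengths, n_iter=500, tol=1e-8):
    """Toy EM over equivalence classes (illustrative only).

    eq_classes  : list of (tuple_of_transcript_ids, read_count)
    eff_lengths : dict mapping transcript id -> effective length
    Returns a dict mapping transcript id -> estimated read count.
    """
    ids = list(eff_lengths)
    total = sum(count for _, count in eq_classes)
    # Start from a uniform split of all reads across transcripts.
    counts = {t: total / len(ids) for t in ids}

    for _ in range(n_iter):
        new_counts = dict.fromkeys(ids, 0.0)
        for transcripts, count in eq_classes:
            # E-step: weight each compatible transcript by its current
            # abundance per unit of effective length.
            weights = [counts[t] / eff_lengths[t] for t in transcripts]
            denom = sum(weights)
            if denom == 0.0:
                continue  # no compatible transcript has any mass left
            # M-step contribution: split this class's reads by weight.
            for t, w in zip(transcripts, weights):
                new_counts[t] += count * w / denom
        delta = max(abs(new_counts[t] - counts[t]) for t in ids)
        counts = new_counts
        if delta < tol:
            break
    return counts


# Made-up example: 30 reads map only to A, 10 only to B, 60 to both.
example = em_abundances(
    [(("A",), 30), (("B",), 10), (("A", "B"), 60)],
    {"A": 1000.0, "B": 1000.0},
)
print({t: round(c, 3) for t, c in example.items()})
```

With equal effective lengths, the 60 ambiguous reads end up split in the same 3:1 ratio as the uniquely mapped reads, so the estimates settle near 75 reads for A and 25 for B.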

The pipeline

salmon index builds a reusable index from a transcriptome FASTA (the set of transcript sequences you want to quantify against, optionally with decoy sequences such as the genome).
salmon quant maps your reads against that index, fits the experiment model (fragment-length distribution, optional sequence/GC/positional bias), runs the EM/VBEM optimizer over equivalence classes, and writes the results.
quant.sf is the primary output: one row per transcript with its length, effective length, TPM, and estimated read count. Downstream R/Bioconductor tools (tximport, tximeta, and fishpond with its swish method) read it directly.
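The columns of quant.sf are related by a simple formula: TPM is each transcript's read count divided by its effective length, rescaled so the values sum to one million. The sketch below reads a quant.sf-style table and recomputes TPM from the other columns. It assumes the tab-separated header used by C++ salmon (Name, Length, EffectiveLength, TPM, NumReads); the two data rows are made up for illustration, and the helper names `read_quant_sf` and `tpm_from_counts` are hypothetical.

```python
import csv
import io


def read_quant_sf(handle):
    """Parse a quant.sf-style tab-separated table into a list of dicts."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "Name": rec["Name"],
            "Length": int(rec["Length"]),
            "EffectiveLength": float(rec["EffectiveLength"]),
            "TPM": float(rec["TPM"]),
            "NumReads": float(rec["NumReads"]),
        })
    return rows


def tpm_from_counts(rows):
    """Recompute TPM as (reads / effective length), scaled to sum to 1e6."""
    rates = [r["NumReads"] / r["EffectiveLength"] for r in rows]
    total = sum(rates)
    return [1e6 * rate / total for rate in rates]


# Made-up two-transcript table standing in for a real quant.sf file.
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ("Name", "Length", "EffectiveLength", "TPM", "NumReads"),
        ("tx1", "1200", "1000.0", "750000.0", "750.0"),
        ("tx2", "700", "500.0", "250000.0", "125.0"),
    ]
) + "\n"

rows = read_quant_sf(io.StringIO(SAMPLE))
for row, tpm in zip(rows, tpm_from_counts(rows)):
    print(row["Name"], row["TPM"], tpm)
```

For a real run, passing `open("quant.sf", newline="")` from the quant output directory in place of the `io.StringIO` object should work the same way, provided the header matches the assumption above.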

salmon 2.0 (Rust) vs. C++ salmon

salmon 2.0 is a complete rewrite in Rust. The design goals were speed, portability, and easy installation, without breaking the output files that downstream tools depend on.

Same outputs. quant.sf, cmd_info.json, lib_format_counts.json, and aux_info/meta_info.json are unchanged. Inferential replicates (bootstrap/Gibbs) are written in the same format, so tximport/fishpond/swish keep working. See the output format specification.
Single portable binary. No compiler, Boost, or system libraries to install.
A new alignment-free mode (--sketch) for when you want maximum speed.
One breaking change you must act on: the index format changed, so any existing index has to be rebuilt with 2.0. Pointing 2.0 at a C++ (pufferfish) index, or C++ salmon at a 2.0 index, is detected and rejected with a clear message.

If you are upgrading from C++ salmon, read Migrating from C++ salmon for the full list of changed, removed, and new options.

References

1. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417–419. https://doi.org/10.1038/nmeth.4197
2. Roberts, A., & Pachter, L. (2013). Streaming fragment assignment for real-time analysis of sequencing experiments. Nature Methods, 10(1), 71–73. https://doi.org/10.1038/nmeth.2251
3. Roberts, A., Trapnell, C., Donaghey, J., Rinn, J. L., & Pachter, L. (2011). Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology, 12(3), R22. https://doi.org/10.1186/gb-2011-12-3-r22
4. Love, M. I., Hogenesch, J. B., & Irizarry, R. A. (2016). Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnology, 34(12), 1287–1291. https://doi.org/10.1038/nbt.3682