RAD I/O & deterministic quantification

salmon can separate the two halves of a run — mapping reads to the transcriptome and quantifying abundances from those mappings — by writing the mappings to a RAD file (the Reduced Alignment Data format shared across the COMBINE-lab tools) and quantifying from it later. The same machinery powers a fully deterministic quantification mode.

  • Map once, quantify many times — re-run quantification with different options (bias correction, --numBootstraps, gene aggregation) without re-mapping.
  • Quantify a shared RAD — quantify a RAD produced by piscem map-bulk with salmon’s EM.
  • Split the phases across machines — map on one host, quantify on another.

Add --writeRad <PATH> to any reads-mode run to also emit per-fragment mappings. The sketch or selective-alignment profile is chosen automatically from the mapping mode. Quantification still runs; add --skipQuant to map only:

# map only, write a RAD
salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p 16 \
--writeRad mappings.rad --skipQuant -o out_map

The file is piscem map-bulk-compatible and can be re-quantified with --rad. salmon additionally bakes an order-independent fragment-length distribution, initial abundances, and the resolved library format into the RAD header, so re-quantifying it is a single pass (see Determinism below).

# quantify a salmon- or piscem-produced RAD (no -i needed)
salmon quant --rad mappings.rad -l A -p 16 -o out_quant

--rad reads the RAD in parallel and runs the full EM. Reference names travel in the RAD header, so no index is required. salmon auto-detects whether the RAD came from salmon (header values are present and consumed — one pass) or from piscem (nothing baked — a first pass derives a unique-fragment fragment-length distribution, then quantifies). Bias correction (--seqBias, --gcBias, --posBias), --numBootstraps, --numGibbsSamples, and -g/--geneMap all work in RAD mode.
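The map-once, quantify-many workflow can be sketched as a small wrapper around the --rad invocation above. This is only an illustration, not part of salmon: the function name requant, the output directory names, the bootstrap count, and the txp2gene.tsv path are hypothetical, while the flags are the ones named in this section.

```shell
# requant RAD OUTDIR [EXTRA_OPTIONS...]
# Hypothetical helper: re-quantify an existing RAD with extra options,
# without re-mapping the reads.
requant() {
  rad=$1
  out=$2
  shift 2
  salmon quant --rad "$rad" -l A -p 16 -o "$out" "$@"
}

# Example calls (commented out so that sourcing this file runs nothing):
# requant mappings.rad out_plain
# requant mappings.rad out_bias --seqBias --gcBias
# requant mappings.rad out_boot --numBootstraps 100
# requant mappings.rad out_gene -g txp2gene.tsv
```

Each call reads the same mappings.rad and writes to its own output directory, so the runs can be compared side by side.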

A RAD produced by piscem map-bulk quantifies directly:

piscem map-bulk -i piscem_index -1 r1.fq.gz -2 r2.fq.gz -t 16 -o pi
salmon quant --rad pi.rad -l A -p 16 -o out_quant

Deterministic quantification (--deterministic)

RAD-mode quantification is byte-identical across thread counts and runs. --deterministic brings that guarantee to FASTQ input directly:

salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p 16 \
--deterministic -o out

It maps the reads once to an intermediate RAD, then quantifies from it with a fixed fragment-length distribution — no second mapping pass. The intermediate RAD is written under the output directory and removed on success unless you pass --keepRad (or --writeRad <PATH> to choose its location and keep it).
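One way to check the byte-identical claim on your own data is to run the same command with two thread counts and compare the resulting quant.sf files. The sketch below assumes a paired-end library; the helper name check_deterministic, the thread counts 4 and 16, and the out_p4 and out_p16 directory names are hypothetical choices.

```shell
# check_deterministic INDEX R1 R2
# Hypothetical helper: quantify the same reads with 4 and 16 threads and
# compare the two quant.sf files byte for byte.
check_deterministic() {
  index=$1
  r1=$2
  r2=$3
  for threads in 4 16; do
    salmon quant -i "$index" -l A -1 "$r1" -2 "$r2" -p "$threads" \
      --deterministic -o "out_p$threads" || return 1
  done
  # cmp -s is silent and exits 0 only when the files are identical.
  if cmp -s out_p4/quant.sf out_p16/quant.sf; then
    echo "identical"
  else
    echo "different"
    return 1
  fi
}

# Example call (commented out so that sourcing this file runs nothing):
# check_deterministic salmon_index r1.fq.gz r2.fq.gz
```

Only quant.sf is compared here; logs and run metadata in the output directories may well differ between runs (timestamps, for example), so comparing whole directories is likely to report differences that have nothing to do with the estimates.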

The determinism comes from making the computation itself order-independent rather than sorting records:

  • equivalence-class weights accumulate in fixed-point integers (integer addition is associative, hence independent of thread count and arrival order);
  • the fragment-length distribution is built from integer count histograms in a fixed order, then frozen;
  • the observed bias models accumulate in fixed-point integers too, so the trained models, and with them the bias-corrected effective lengths, are byte-identical across thread counts.
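The point about associativity can be seen in a toy example. The sketch below is not salmon's code: the three weights and the scale factor of 10^6 are assumptions chosen for illustration. It adds the same values in two groupings, first as IEEE 754 doubles (via awk) and then as scaled integers.

```shell
# Floating point: the same three weights summed in two groupings.
# %.17g prints enough digits to expose a difference in the last bit.
float_sum_a=$(awk 'BEGIN { printf "%.17g", (0.1 + 0.2) + 0.3 }')
float_sum_b=$(awk 'BEGIN { printf "%.17g", 0.1 + (0.2 + 0.3) }')

# Fixed point: scale each weight to an integer first (assumed scale: 10^6),
# then add. Integer addition gives the same total in any grouping or order.
w1=100000  # 0.1 * 10^6
w2=200000  # 0.2 * 10^6
w3=300000  # 0.3 * 10^6
fixed_sum_a=$(( (w1 + w2) + w3 ))
fixed_sum_b=$(( w1 + (w2 + w3) ))

echo "float: $float_sum_a vs $float_sum_b"
echo "fixed: $fixed_sum_a vs $fixed_sum_b"
```

With doubles the two groupings round differently, so a sum whose order depends on thread scheduling can change in its last digits from run to run. The integer totals agree regardless of grouping, which is the property the accumulators above rely on.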

--deterministic works with bias correction and does not require -t: the reference sequences needed in the quantification pass are taken from the index.

Deterministic alignment mode (-a --deterministic)

--deterministic also applies to alignment mode — quantifying a name-grouped transcriptome BAM:

salmon quant -a aln.bam -l A -p 16 --deterministic -o out

salmon writes the BAM’s placements to an intermediate RAD (baking the fixed fragment-length distribution and, with bias correction, the seed abundances) and quantifies from it in a single pass — byte-identical across thread counts.

By default each placement is scored by its BAM alignment score (AS), which is what --noErrorModel does in ordinary alignment mode. This is deliberate: across every benchmark with ground truth (uniform and realistic position-dependent Illumina errors, at 50 bp and 76 bp), AS scoring is at least as accurate as an error model, and it needs only a single BAM pass.

To reproduce salmon’s classic error-modeled weighting deterministically, pass --errorModel (which requires -t, the transcriptome the reads were aligned to):

salmon quant -a aln.bam -t txome.fa -l A -p 16 --deterministic --errorModel -o out

This trains an order-independent error model — first-order per-base transition counts accumulated as integers, merged across threads by integer addition (so the trained model is independent of thread count), then normalized once and used to score every placement. It stays fully deterministic, but costs a second BAM pass (train, then score), roughly doubling the alignment-mode runtime.
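The "accumulate integer counts, merge by addition, normalize once" pattern can be illustrated with a toy example. This is a sketch of the general idea only, not salmon's error model: the three-column "FROM TO COUNT" file format and the helper name merge_counts are hypothetical, with each input file standing in for the counts gathered by one thread.

```shell
# merge_counts FILE...
# Each input line is "FROM TO COUNT" (an integer count from one worker).
# Sum the counts per (FROM, TO) pair, then normalize each FROM row once.
merge_counts() {
  cat "$@" | awk '
    { pair[$1 " " $2] += $3; row[$1] += $3 }
    END {
      for (k in pair) {
        split(k, p, " ")
        printf "%s %s %.6f\n", p[1], p[2], pair[k] / row[p[1]]
      }
    }
  ' | sort
}

# Example call (commented out so that sourcing this file runs nothing):
# merge_counts thread1.counts thread2.counts
```

Because the counts are integers and are only divided after every file has been merged, listing the input files in a different order yields the same probabilities. Normalizing per file and then averaging would not have that property.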

When is it worth it? On well-aligned short-read data with informative AS tags (e.g. bowtie2, STAR) it changes results but does not measurably improve accuracy against truth, so the default AS scoring is preferred. Reach for --errorModel when you specifically want parity with salmon’s traditional error-modeled quant, or when the aligner does not emit a usable AS (without which AS scoring falls back to uniform weighting).
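To see whether a BAM carries AS tags at all, a quick look at its first records is usually enough. The sketch below assumes samtools is installed; the helper name count_as_tags and the sample size of 1000 records are arbitrary choices for illustration.

```shell
# count_as_tags BAM
# Print how many of the first 1000 alignment records carry an AS:i: tag.
count_as_tags() {
  tab=$(printf '\t')
  # samtools view prints records without the header by default.
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # function's exit status clean while still printing the count.
  samtools view "$1" | head -n 1000 | grep -c "${tab}AS:i:" || true
}

# Example call (commented out so that sourcing this file runs nothing):
# count_as_tags aln.bam
```

A count of 0 suggests the aligner did not write AS, which is the case where --errorModel is worth considering. A non-zero count shows only that the tag is present, not that its values are informative.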

RAD chunks can be compressed. Compression is on by default for both --writeRad output and the --deterministic intermediate:

| Flag | Effect |
| --- | --- |
| --radCompress=lz4 | Default. Fast LZ4; ≈ 1.25× smaller on a typical human RAD, neutral-to-faster to write. |
| --radCompress=zstd | Better ratio (≈ 1.9× smaller), a little slower. |
| --radCompress=none | Uncompressed. |
| --noCompressRad | Force uncompressed (overrides --radCompress). |

Compression is transparent and lossless: chunks are decompressed by the reader, so every consumer (salmon, alevin-fry, piscem-infer) is unchanged, and a RAD with no codec tag — every file produced before this feature, and every piscem RAD — reads as uncompressed automatically. The quantified quant.sf is identical regardless of codec.
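To compare the codecs on your own data, one option is to write the same mappings with each codec, quantify each RAD, and look at the file sizes and the resulting quant.sf. The sketch below assumes a paired-end library; the helper name compare_codecs and the map_*.rad, out_map_* and out_quant_* names are hypothetical.

```shell
# compare_codecs INDEX R1 R2
# Hypothetical helper: write the same mappings with each codec, quantify each
# RAD, then report file sizes and whether quant.sf matches the lz4 run.
compare_codecs() {
  index=$1
  r1=$2
  r2=$3
  for codec in lz4 zstd none; do
    salmon quant -i "$index" -l A -1 "$r1" -2 "$r2" -p 16 \
      --writeRad "map_$codec.rad" --radCompress="$codec" --skipQuant \
      -o "out_map_$codec" || return 1
    salmon quant --rad "map_$codec.rad" -l A -p 16 \
      -o "out_quant_$codec" || return 1
    size=$(wc -c < "map_$codec.rad")
    if cmp -s out_quant_lz4/quant.sf "out_quant_$codec/quant.sf"; then
      same=yes
    else
      same=no
    fi
    # $((size)) strips any padding that wc adds on some systems.
    echo "$codec bytes=$((size)) quant_matches_lz4=$same"
  done
}

# Example call (commented out so that sourcing this file runs nothing):
# compare_codecs salmon_index r1.fq.gz r2.fq.gz
```

If the statement above holds for your data, every line should report quant_matches_lz4=yes, and only the byte counts should differ between codecs.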