RAD I/O & deterministic quantification
salmon can separate the two halves of a run — mapping reads to the transcriptome and quantifying abundances from those mappings — by writing the mappings to a RAD file (the Reduced Alignment Data format shared across the COMBINE-lab tools) and quantifying from it later. The same machinery powers a fully deterministic quantification mode.
Why decouple map and quant
- Map once, quantify many times: re-run quantification with different options (bias correction, --numBootstraps, gene aggregation) without re-mapping.
- Quantify a shared RAD: quantify a RAD produced by piscem map-bulk with salmon’s EM.
- Split the phases across machines: map on one host, quantify on another.
Writing a RAD (--writeRad)
Add --writeRad <PATH> to any reads-mode run to also emit per-fragment mappings.
The sketch or selective-alignment profile is chosen automatically from the
mapping mode. Quantification still runs; add --skipQuant to map only:
    # map only, write a RAD
    salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p 16 \
      --writeRad mappings.rad --skipQuant -o out_map

The file is piscem map-bulk-compatible and can be re-quantified with
--rad. salmon additionally bakes an order-independent fragment-length
distribution, initial abundances, and the resolved library format into the RAD
header, so re-quantifying it is a single pass (see Determinism below).
Quantifying a RAD (--rad)
    # quantify a salmon- or piscem-produced RAD (no -i needed)
    salmon quant --rad mappings.rad -l A -p 16 -o out_quant

--rad reads the RAD in parallel and runs the full EM. Reference names travel in
the RAD header, so no index is required. salmon auto-detects whether the RAD came
from salmon (header values are present and consumed — one pass) or from piscem
(nothing baked — a first pass derives a unique-fragment fragment-length
distribution, then quantifies). Bias correction (--seqBias, --gcBias,
--posBias), --numBootstraps, --numGibbsSamples, and -g/--geneMap all work
in RAD mode.
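
As a rough illustration of re-quantifying one RAD under several option sets, the Python sketch below builds the command lines without running them. It uses only flags named on this page; the helper names (`map_command`, `quant_command`, `plan`), the output directory names, and the bootstrap count of 100 are assumptions made for the example.

```python
import shlex

def map_command(index, r1, r2, rad_path, out_dir, threads=16):
    """Argument list for a map-only run that writes a RAD."""
    return ["salmon", "quant", "-i", index, "-l", "A",
            "-1", r1, "-2", r2, "-p", str(threads),
            "--writeRad", rad_path, "--skipQuant", "-o", out_dir]

def quant_command(rad_path, out_dir, extra=(), threads=16):
    """Argument list for quantifying an existing RAD (no index needed)."""
    return ["salmon", "quant", "--rad", rad_path, "-l", "A",
            "-p", str(threads), *extra, "-o", out_dir]

# Hypothetical option sets to compare; directory names are made up.
VARIANTS = {
    "out_plain": [],
    "out_gcbias": ["--gcBias"],
    "out_boot": ["--numBootstraps", "100"],
}

def plan(rad_path="mappings.rad"):
    """One mapping command followed by one quantification per variant."""
    cmds = [map_command("salmon_index", "r1.fq.gz", "r2.fq.gz",
                        rad_path, "out_map")]
    cmds += [quant_command(rad_path, out, extra)
             for out, extra in VARIANTS.items()]
    return cmds

if __name__ == "__main__":
    for argv in plan():
        print(shlex.join(argv))  # print only; nothing is executed
```

To run the plan for real, each argument list could be passed to `subprocess.run(argv, check=True)`. Mapping happens once, and every later command reads the same RAD.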
piscem interoperability
A RAD produced by piscem map-bulk quantifies directly:

    piscem map-bulk -i piscem_index -1 r1.fq.gz -2 r2.fq.gz -t 16 -o pi
    salmon quant --rad pi.rad -l A -p 16 -o out_quant

Deterministic quantification (--deterministic)
RAD-mode quantification is byte-identical across thread counts and runs.
--deterministic brings that guarantee to FASTQ input directly:
    salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p 16 \
      --deterministic -o out

It maps the reads once to an intermediate RAD, then quantifies from it with a
fixed fragment-length distribution — no second mapping pass. The intermediate RAD
is written under the output directory and removed on success unless you pass
--keepRad (or --writeRad <PATH> to choose its location and keep it).
The determinism comes from making the computation itself order-independent rather than sorting records:
- equivalence-class weights accumulate in fixed-point integers (integer addition is associative, hence independent of thread count and arrival order);
- the fragment-length distribution is built from integer count histograms in a fixed order, then frozen;
- the observed bias models accumulate in fixed-point integers too, so the trained models and the bias-corrected effective lengths are byte-identical across thread counts.
--deterministic works with bias correction and does not require -t: the
reference sequences for the second pass are taken from the index.
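
The claim that integer accumulation is order-independent while floating-point accumulation is not can be checked with a toy experiment. The sketch below is not salmon's code: the 32-bit fractional scale (`SCALE`) and the helper names are assumptions chosen for illustration. It sums the same weights in different orders, once as floats and once after quantizing each weight to an integer.

```python
import random

SCALE = 1 << 32  # assumed fixed-point scale: 32 fractional bits

def to_fixed(weight):
    """Quantize one weight to an integer, independently of all others."""
    return round(weight * SCALE)

def float_sum(weights):
    """Plain left-to-right float accumulation (order-sensitive)."""
    total = 0.0
    for w in weights:
        total += w
    return total

def fixed_sum(weights):
    """Integer accumulation of quantized weights (order-insensitive)."""
    total = 0
    for w in weights:
        total += to_fixed(w)
    return total

def distinct_totals(weights, trials=200, seed=0):
    """Collect the distinct totals seen over many random arrival orders."""
    rng = random.Random(seed)
    floats, fixeds = set(), set()
    for _ in range(trials):
        order = list(weights)
        rng.shuffle(order)
        floats.add(float_sum(order))
        fixeds.add(fixed_sum(order))
    return floats, fixeds

a, b = [0.1, 0.2, 0.3], [0.3, 0.2, 0.1]
print(float_sum(a) == float_sum(b))  # False: float addition is not associative
print(fixed_sum(a) == fixed_sum(b))  # True: integer addition is
```

Quantizing introduces a small rounding error per weight, but that error is the same regardless of which thread adds the value or when it arrives, so the total is reproducible.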
Deterministic alignment mode (-a --deterministic)
--deterministic also applies to alignment mode, i.e. quantifying a
name-grouped transcriptome BAM:
    salmon quant -a aln.bam -l A -p 16 --deterministic -o out

salmon writes the BAM’s placements to an intermediate RAD (baking the fixed fragment-length distribution and, with bias correction, the seed abundances) and quantifies from it in a single pass, with output that is byte-identical across thread counts.
By default each placement is scored by its BAM alignment score (AS), which is
what --noErrorModel does in ordinary alignment mode. This is deliberate: across
every benchmark with ground truth (uniform and realistic position-dependent
Illumina errors, at 50 bp and 76 bp), AS scoring is at least as accurate as
an error model, and it needs only a single BAM pass.
--errorModel (opt-in)
To reproduce salmon’s classic error-modeled weighting deterministically, pass
--errorModel (which requires -t, the transcriptome the reads were aligned to):
    salmon quant -a aln.bam -t txome.fa -l A -p 16 --deterministic --errorModel -o out

This trains an order-independent error model: first-order per-base transition counts are accumulated as integers, merged across threads by integer addition (so the trained model is independent of thread count), then normalized once and used to score every placement. It stays fully deterministic, but costs a second BAM pass (train, then score), roughly doubling the alignment-mode runtime.
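
The count, merge, then normalize-once pattern can be sketched in a few lines of Python. This is a toy, not salmon's error model: transitions here are between consecutive read bases, an assumption made only to keep the example small, and the function names and the striped sharding scheme are likewise hypothetical.

```python
from collections import Counter

def count_transitions(reads):
    """Integer counts of (previous base, current base) pairs in one shard."""
    counts = Counter()
    for read in reads:
        counts.update(zip(read, read[1:]))
    return counts

def merge(shard_counts):
    """Merge per-thread counts by integer addition."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return total

def normalize(counts):
    """Normalize once, after merging: P(current base | previous base)."""
    row_totals = Counter()
    for (prev, _cur), n in counts.items():
        row_totals[prev] += n
    return {pair: n / row_totals[pair[0]] for pair, n in counts.items()}

def train(reads, n_threads):
    """Split reads into n_threads shards, count each, merge, normalize."""
    shards = [reads[i::n_threads] for i in range(n_threads)]
    return normalize(merge(count_transitions(s) for s in shards))

reads = ["ACGTAC", "ACCGTA", "GGTACA", "TTACGA", "CATGAC"]
models = [train(reads, t) for t in (1, 2, 4)]
print(models[0] == models[1] == models[2])  # prints True
```

Because the merged counts are exact integers, the single normalization step sees identical inputs however the reads were split, so the resulting probabilities match bit for bit.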
When is it worth it? On well-aligned short-read data with informative AS tags
(e.g. bowtie2, STAR) it changes results but does not measurably improve
accuracy against truth, so the default AS scoring is preferred. Reach for
--errorModel when you specifically want parity with salmon’s traditional
error-modeled quant, or when the aligner does not emit a usable AS (without
which AS scoring falls back to uniform weighting).
Compressing RAD output (--radCompress)
RAD chunks can be compressed, on by default for both --writeRad and the
--deterministic intermediate:
| Flag | Effect |
|---|---|
| --radCompress=lz4 | Default. Fast LZ4; ≈ 1.25× smaller on a typical human RAD, neutral-to-faster to write. |
| --radCompress=zstd | Better ratio (≈ 1.9× smaller), a little slower. |
| --radCompress=none | Uncompressed. |
| --noCompressRad | Force uncompressed (overrides --radCompress). |
Compression is transparent and lossless: chunks are decompressed by the
reader, so every consumer (salmon, alevin-fry, piscem-infer) is unchanged, and a
RAD with no codec tag — every file produced before this feature, and every piscem
RAD — reads as uncompressed automatically. The quantified quant.sf is identical
regardless of codec.
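
The idea of reader-side, tag-driven decompression can be illustrated with a toy container in Python. This is not the RAD layout: the dictionary header, the `codec` key, and the function names are hypothetical, and the standard-library `zlib` and `lzma` modules stand in for LZ4 and zstd so the example has no third-party dependencies.

```python
import lzma
import zlib

# Stand-in codecs (assumption): zlib and lzma replace lz4 and zstd here.
ENCODERS = {
    "none": lambda data: data,
    "zlib": zlib.compress,
    "lzma": lzma.compress,
}
DECODERS = {
    None: lambda data: data,    # no codec tag: read as uncompressed
    "none": lambda data: data,
    "zlib": zlib.decompress,
    "lzma": lzma.decompress,
}

def write_chunks(chunks, codec):
    """Return a toy header plus the chunks encoded with the chosen codec."""
    header = {"codec": codec}
    return header, [ENCODERS[codec](chunk) for chunk in chunks]

def read_chunks(header, stored):
    """Decode chunks according to the header; a missing tag means raw."""
    decode = DECODERS[header.get("codec")]
    return [decode(chunk) for chunk in stored]

payload = [b"fragment-records " * 40, b"", b"\x00\x01\x02" * 100]
for codec in ("none", "zlib", "lzma"):
    header, stored = write_chunks(payload, codec)
    assert read_chunks(header, stored) == payload  # lossless round trip

# A file written before codec tags existed has no "codec" entry at all.
assert read_chunks({}, payload) == payload
```

Keeping the codec choice inside the reader means downstream code only ever sees decoded chunks, which is why the quantification result does not depend on the codec.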