RAD I/O & deterministic quantification

salmon can separate the two halves of a run — mapping reads to the transcriptome and quantifying abundances from those mappings — by writing the mappings to a RAD file (the Reduced Alignment Data format shared across the COMBINE-lab tools) and quantifying from it later. The same machinery powers a fully deterministic quantification mode.

Why decouple map and quant

Map once, quantify many times — re-run quantification with different options (bias correction, --numBootstraps, gene aggregation) without re-mapping.
Quantify a shared RAD — quantify a RAD produced by piscem map-bulk with salmon’s EM.
Split the phases across machines — map on one host, quantify on another.

Writing a RAD (`--writeRad`)

Add --writeRad <PATH> to any reads-mode run to also emit per-fragment mappings. The sketch or selective-alignment profile is chosen automatically from the mapping mode. Quantification still runs; add --skipQuant to map only:

# map only, write a RAD
salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p 16 \
  --writeRad mappings.rad --skipQuant -o out_map

The file is compatible with piscem map-bulk output and can be re-quantified with --rad. salmon additionally bakes an order-independent fragment-length distribution, initial abundances, and the resolved library format into the RAD header, so re-quantifying it is a single pass (see Deterministic quantification below).

Quantifying a RAD (`--rad`)

# quantify a salmon- or piscem-produced RAD (no -i needed)
salmon quant --rad mappings.rad -l A -p 16 -o out_quant

--rad reads the RAD in parallel and runs the full EM. Reference names travel in the RAD header, so no index is required. salmon auto-detects where the RAD came from. In a salmon-produced RAD the baked header values are present and consumed, so quantification is a single pass. In a piscem-produced RAD nothing is baked, so a first pass derives the fragment-length distribution from uniquely mapped fragments before quantification runs. Bias correction (--seqBias, --gcBias, --posBias), --numBootstraps, --numGibbsSamples, and -g/--geneMap all work in RAD mode.
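As a minimal sketch of the "map once, quantify many times" workflow, the helper below re-quantifies one RAD under different option sets. The function name requant, the out_<name> directory naming, and the tx2gene.tsv file in the usage comment are hypothetical; the salmon flags are the ones described in this section.

```shell
# Re-quantify an existing RAD with extra options, without re-mapping.
# $1: RAD file, $2: label for the output directory, rest: extra salmon options.
requant() {
    rad=$1
    name=$2
    shift 2
    salmon quant --rad "$rad" -l A -p 16 -o "out_$name" "$@"
}

# Example calls (commented out so the snippet only defines the helper):
# requant mappings.rad plain
# requant mappings.rad boot  --numBootstraps 100
# requant mappings.rad genes -g tx2gene.tsv   # hypothetical gene map file
```

Each call reads the same mappings.rad, so only the quantification phase is repeated.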

Fragment-length distribution (`--fldPolicy`)

salmon’s RAD writer always bakes its fragment-length distribution into the header — including under --skipQuant, which suppresses only the baked abundances. On read-back that distribution takes precedence, which is what makes a requant reproduce the writing run exactly.

The consequence is that --fldMean/--fldSD/--fldMax have no effect on a salmon-produced RAD by default. They are priors, and a baked distribution replaces the prior outright rather than being blended with it. salmon warns when you supply one of these flags and a baked distribution supersedes it; a run that inherits their defaults stays quiet.

--fldPolicy chooses where the distribution comes from:

Policy	Behavior
`baked` (default)	Use the RAD’s baked distribution when present. Exact parity with the run that wrote the file.
`derive`	Ignore the baked distribution; rebuild it from this RAD’s own uniquely-mapped proper pairs, seeded by `--fldMean`/`--fldSD`. This is what a piscem RAD does automatically.
`prior`	Ignore both the baked distribution and the RAD’s fragment lengths; `--fldMean`/`--fldSD` alone determine it.

Use prior for a fragment-length sensitivity analysis — it is the only setting under which varying --fldMean changes the result:

for m in 150 250 350; do
  salmon quant --rad mappings.rad -t transcripts.fa -l A \
    --fldPolicy prior --fldMean $m --fldSD 25 -o out_fld_$m
done

aux_info/meta_info.json records which path a run took in frag_length_source (rad_baked, rad_derived, or prior), so the provenance stays auditable after the fact.
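To audit a batch of runs, the field can be pulled out of each output directory. This is a rough sketch that assumes frag_length_source is stored as a quoted string on a single line of meta_info.json; the helper name is hypothetical, and a real JSON parser would be more robust than sed.

```shell
# Print the frag_length_source value recorded for a salmon output directory.
# $1: the directory passed to -o.
frag_length_source() {
    sed -n 's/.*"frag_length_source"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        "$1/aux_info/meta_info.json"
}

# Example (commented out): report the source for each sensitivity run.
# for d in out_fld_150 out_fld_250 out_fld_350; do
#     printf '%s\t%s\n' "$d" "$(frag_length_source "$d")"
# done
```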

Single-end RADs

A single-end run has no fragment lengths to measure, so the distribution it bakes is simply its own --fldMean/--fldSD prior. Reading such a RAD back therefore inherits the writing run’s prior, and salmon reports frag_length_source as rad_baked_prior to distinguish this from an observed distribution. If you want your own values to apply, pass --fldPolicy prior (derive has nothing to derive from here).

piscem interoperability

A RAD produced by piscem map-bulk quantifies directly:

piscem map-bulk -i piscem_index -1 r1.fq.gz -2 r2.fq.gz -t 16 -o pi
salmon quant --rad pi.rad -l A -p 16 -o out_quant

Deterministic quantification (`--deterministic`)

RAD-mode quantification is byte-identical across thread counts and runs. --deterministic brings that guarantee to FASTQ input directly:

salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p 16 \
  --deterministic -o out

It maps the reads once to an intermediate RAD, then quantifies from it with a fixed fragment-length distribution — no second mapping pass. The intermediate RAD is written under the output directory and removed on success unless you pass --keepRad (or --writeRad <PATH> to choose its location and keep it).
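One way to check the byte-identical claim on your own data is to run the same command at two thread counts and compare the outputs with cmp. The helper names and the out_p<N> directory naming below are hypothetical; the salmon invocation is the one shown above with only -p varied.

```shell
# Run the deterministic quantification at a given thread count ($1).
quant_with_threads() {
    salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p "$1" \
        --deterministic -o "out_p$1"
}

# Succeed only if the two output directories hold byte-identical quant.sf files.
same_quant() {
    cmp -s "$1/quant.sf" "$2/quant.sf"
}

# Example (commented out):
# quant_with_threads 1
# quant_with_threads 16
# same_quant out_p1 out_p16 && echo "identical" || echo "differ"
```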

The determinism comes from making the computation itself order-independent rather than sorting records:

equivalence-class weights accumulate in fixed-point integers (integer addition is associative, hence independent of thread count and arrival order);
the fragment-length distribution is built from integer count histograms in a fixed order, then frozen;
the observed bias models accumulate in fixed-point integers too, so the trained models (and the bias-corrected effective lengths) are byte-identical across thread counts.
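The associativity argument in the list above can be illustrated with a toy example. The sketch below sums the same three weights in two arrival orders, once in double-precision floating point (via awk) and once as fixed-point integers. The scale of 10^6 and the helper names are illustrative assumptions, not salmon's internal representation.

```shell
# Convert a decimal weight to a fixed-point integer (6 fractional digits, rounded).
to_fixed() {
    awk -v w="$1" 'BEGIN { printf "%d\n", w * 1000000 + 0.5 }'
}

# Sum weights as fixed-point integers; integer addition is order-independent.
sum_fixed() {
    total=0
    for w in "$@"; do
        total=$(( total + $(to_fixed "$w") ))
    done
    echo "$total"
}

# Sum weights left to right in floating point; the result can depend on order.
sum_float() {
    awk 'BEGIN { s = 0; for (i = 1; i < ARGC; i++) s += ARGV[i]; printf "%.17g\n", s }' "$@"
}

echo "float, order A: $(sum_float 0.1 0.2 0.3)"
echo "float, order B: $(sum_float 0.3 0.2 0.1)"
echo "fixed, order A: $(sum_fixed 0.1 0.2 0.3)"
echo "fixed, order B: $(sum_fixed 0.3 0.2 0.1)"
```

The two floating-point sums differ in their last digits, while the two fixed-point sums are the same integer, which is why the accumulation order across threads stops mattering.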

--deterministic works with bias correction and does not require -t: the reference sequences for the second pass are taken from the index.

Deterministic alignment mode (`-a --deterministic`)

--deterministic also applies to alignment mode — quantifying a name-grouped transcriptome BAM:

salmon quant -a aln.bam -l A -p 16 --deterministic -o out

salmon writes the BAM’s placements to an intermediate RAD (baking the fixed fragment-length distribution and, with bias correction, the seed abundances) and quantifies from it in a single pass — byte-identical across thread counts.

By default each placement is scored by its BAM alignment score (AS), which is what --noErrorModel does in ordinary alignment mode. This is deliberate: across every benchmark with ground truth (uniform and realistic position-dependent Illumina errors, at 50 bp and 76 bp), AS scoring is at least as accurate as an error model, and it needs only a single BAM pass.

`--errorModel` (opt-in)

To reproduce salmon’s classic error-modeled weighting deterministically, pass --errorModel (which requires -t, the transcriptome the reads were aligned to):

salmon quant -a aln.bam -t txome.fa -l A -p 16 --deterministic --errorModel -o out

This trains an order-independent error model — first-order per-base transition counts accumulated as integers, merged across threads by integer addition (so the trained model is independent of thread count), then normalized once and used to score every placement. It stays fully deterministic, but costs a second BAM pass (train, then score), roughly doubling the alignment-mode runtime.

When is it worth it? On well-aligned short-read data with informative AS tags (e.g. bowtie2, STAR) it changes results but does not measurably improve accuracy against truth, so the default AS scoring is preferred. Reach for --errorModel when you specifically want parity with salmon’s traditional error-modeled quant, or when the aligner does not emit a usable AS (without which AS scoring falls back to uniform weighting).

Compressing RAD output (`--radCompress`)

RAD chunks are compressed by default, for both --writeRad output and the --deterministic intermediate. --radCompress selects the codec:

Flag	Effect
`--radCompress=lz4`	Default. Fast LZ4; ≈ 1.25× smaller on a typical human RAD, neutral-to-faster to write.
`--radCompress=zstd`	Better ratio (≈ 1.9× smaller), a little slower.
`--radCompress=none`	Uncompressed.
`--noCompressRad`	Force uncompressed (overrides `--radCompress`).

Compression is transparent and lossless: chunks are decompressed by the reader, so every consumer (salmon, alevin-fry, piscem-infer) is unchanged, and a RAD with no codec tag — every file produced before this feature, and every piscem RAD — reads as uncompressed automatically. The quantified quant.sf is identical regardless of codec.
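To see the size trade-off on your own data, the same mapping can be written once per codec and the files compared. This is a sketch only: the helper name and the mappings.<codec>.rad and out_map_<codec> naming are hypothetical, and each call re-maps the reads.

```shell
# Map only, writing a RAD compressed with the given codec ($1: lz4, zstd, or none).
write_rad_with_codec() {
    codec=$1
    salmon quant -i salmon_index -l A -1 r1.fq.gz -2 r2.fq.gz -p 16 \
        --writeRad "mappings.$codec.rad" --radCompress="$codec" \
        --skipQuant -o "out_map_$codec"
}

# Example (commented out): write all three variants, then compare file sizes.
# for codec in lz4 zstd none; do write_rad_with_codec "$codec"; done
# ls -l mappings.lz4.rad mappings.zstd.rad mappings.none.rad
```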
