Output format specification
This page specifies every file salmon 2.0 writes to a quant output
directory, including exact byte layouts for the binary files, so that downstream
tooling (tximport, fishpond/swish, custom scripts) can rely on a stable contract.
Files fall into two compatibility classes:
- C++-compatible — byte/loadable-compatible with C++ salmon (≤ 1.12.0) so existing downstream tools keep working unchanged. These formats are stable and changes to them are breaking.
- Documented Rust format — diagnostic dumps not consumed by standard downstream tools. The layout is documented here and is stable within the 2.x series, but is not promised to match C++ salmon byte-for-byte.
All multi-byte binary values are little-endian. All *.gz files are standard
gzip streams; “uncompressed payload” describes the bytes after gunzip.
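Every binary dump below decodes the same way: gunzip, then unpack fixed-width little-endian values. Here is a minimal Python sketch using only the standard library. The function names are illustrative and are not part of salmon.

```python
import gzip
import struct
from pathlib import Path


def decode_gz_array(compressed: bytes, fmt_char: str) -> list:
    """Gunzip a payload and unpack it as a flat little-endian array.

    fmt_char is a struct format character: "i" for i32 payloads, "d" for f64.
    """
    payload = gzip.decompress(compressed)
    item_size = struct.calcsize("<" + fmt_char)  # 4 for "i", 8 for "d"
    if len(payload) % item_size:
        raise ValueError(
            f"payload of {len(payload)} bytes is not a whole number of {item_size}-byte items"
        )
    count = len(payload) // item_size
    # "<" selects little-endian byte order with standard (not native) sizes.
    return list(struct.unpack(f"<{count}{fmt_char}", payload))


def read_gz_array(path, fmt_char: str) -> list:
    """Read a .gz dump from disk, e.g. read_gz_array("aux_info/exp_gc.gz", "d")."""
    return decode_gz_array(Path(path).read_bytes(), fmt_char)
```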
```text
<output_dir>/
├── quant.sf                     (C++-compatible)  TSV
├── cmd_info.json                (C++-compatible)  JSON
├── lib_format_counts.json       (C++-compatible)  JSON
├── logs/
│   └── salmon_quant.log         (Rust format)     text
├── libParams/
│   └── flenDist.txt             (C++-compatible)  TSV (one line)
└── aux_info/
    ├── meta_info.json           (C++-compatible)  JSON
    ├── ambig_info.tsv           (C++-compatible)  TSV
    ├── fld.gz                   (Rust format)     gzip → i32[]
    ├── eq_classes.txt.gz        (C++-compatible)  gzip → text         [--dumpEq]
    ├── observed_bias.gz         (Rust format)     gzip → i32[]        [legacy stub]
    ├── observed_bias_3p.gz      (Rust format)     gzip → i32[]        [legacy stub]
    ├── expected_bias.gz         (Rust format)     gzip → f64[]        [legacy stub]
    ├── obs5_seq.gz obs3_seq.gz  (Rust format)     gzip → f64[]        [--seqBias]
    ├── exp5_seq.gz exp3_seq.gz  (Rust format)     gzip → f64[]        [--seqBias]
    ├── obs_gc.gz exp_gc.gz      (Rust format)     gzip → f64[]        [--gcBias]
    ├── obs5_pos.gz obs3_pos.gz  (Rust format)     gzip → header+f64[] [--posBias]
    ├── exp5_pos.gz exp3_pos.gz  (Rust format)     gzip → header+f64[] [--posBias]
    └── bootstrap/                                 [--numBootstraps / --numGibbsSamples]
        ├── names.tsv.gz         (C++-compatible)  gzip → text
        └── bootstraps.gz        (C++-compatible)  gzip → f64[]
```
C++-compatible files
quant.sf — primary abundance table (TSV)
A header line followed by one row per non-decoy transcript, in index order. Decoy references (those whose index is at or beyond the first decoy) are excluded, matching C++ salmon.
```text
Name<TAB>Length<TAB>EffectiveLength<TAB>TPM<TAB>NumReads
```
| Column | Type | Notes |
|---|---|---|
| Name | string | transcript id, as in the index |
| Length | integer | transcript length in nucleotides |
| EffectiveLength | float | --sigDigits decimals (default 3) |
| TPM | float | fixed 6 decimals |
| NumReads | float | --sigDigits decimals (default 3) |
This is the file tximport reads by default; it is a drop-in replacement for the C++ output.
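For custom scripts that do not go through tximport, here is a minimal Python reader. It assumes the exact five-column header above and tab-separated fields. The QuantRow type is a hypothetical helper, not a salmon API.

```python
from dataclasses import dataclass
from pathlib import Path

QUANT_HEADER = ["Name", "Length", "EffectiveLength", "TPM", "NumReads"]


@dataclass
class QuantRow:
    name: str
    length: int
    effective_length: float
    tpm: float
    num_reads: float


def parse_quant_sf(text: str) -> list[QuantRow]:
    """Parse quant.sf contents into rows, preserving index order."""
    lines = text.splitlines()
    if not lines or lines[0].split("\t") != QUANT_HEADER:
        raise ValueError("missing or unexpected quant.sf header")
    rows = []
    for line in lines[1:]:
        if not line:
            continue  # tolerate a trailing blank line
        name, length, eff_len, tpm, num_reads = line.split("\t")
        rows.append(QuantRow(name, int(length), float(eff_len),
                             float(tpm), float(num_reads)))
    return rows


def read_quant_sf(path) -> list[QuantRow]:
    return parse_quant_sf(Path(path).read_text(encoding="utf-8"))
```

TPM is normalized to one million, so sum(r.tpm for r in rows) should come out close to 1e6, up to the 6-decimal rounding in the file.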
aux_info/meta_info.json — run metadata (JSON)
Pretty-printed JSON. tximport keys off several fields (num_bootstraps,
num_valid_targets, eq_class_properties, index_seq_hash, …). Fields:
| Field | Type | Meaning |
|---|---|---|
| salmon_version | string | e.g. "2.0.0" |
| samp_type | string | "bootstrap", "gibbs", or "none" |
| opt_type | string | "em" or "vb" |
| num_libraries | int | currently 1 |
| library_types | string[] | detected/declared library type(s) |
| frag_dist_length | int | number of FLD length bins |
| frag_length_mean / frag_length_sd | float | observed fragment length stats |
| seq_bias_correct / gc_bias_correct | bool | bias correction enabled |
| mapping_type | string | "mapping" (SA) or "pseudo" (sketch) |
| keep_duplicates | bool | index built with --keepDuplicates |
| index_seq_hash / index_name_hash | string | SHA-256 (hex) of reference seqs / names |
| index_seq_hash512 / index_name_hash512 | string | SHA-512 variants |
| index_decoy_seq_hash / index_decoy_name_hash | string | decoy hashes |
| num_valid_targets | int | non-decoy transcript count |
| num_decoy_targets | int | decoy count |
| num_eq_classes | int | equivalence-class count |
| eq_class_properties | string[] | e.g. ["gzipped"] or ["range_factorized","gzipped"] |
| length_classes | int[] | length-class boundaries (u32) |
| num_processed / num_mapped | int | fragments observed / mapped |
| percent_mapped | float | mapping rate, as a percentage |
| num_decoy_fragments | int | |
| num_bootstraps | int | inferential-replicate count (0 if none) |
| start_time / end_time | string | timestamp in asctime format |
(Plus quant_errors, num_bias_bins, serialized_eq_classes,
num_dovetail_fragments, num_fragments_filtered_vm,
num_alignments_below_threshold_for_mapped_fragments_vm, and call.)
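A common use of the hash fields is confirming that samples were quantified against the same index before they are combined. The sketch below assumes that agreement on the four SHA-256 hash fields is an adequate check for your purposes. Both function names are hypothetical.

```python
import json
from pathlib import Path

# meta_info.json fields that fingerprint the index a run was quantified against.
INDEX_HASH_FIELDS = (
    "index_seq_hash",
    "index_name_hash",
    "index_decoy_seq_hash",
    "index_decoy_name_hash",
)


def load_meta_info(quant_dir) -> dict:
    """Load <quant_dir>/aux_info/meta_info.json."""
    path = Path(quant_dir) / "aux_info" / "meta_info.json"
    return json.loads(path.read_text(encoding="utf-8"))


def index_mismatches(meta_a: dict, meta_b: dict) -> list[str]:
    """Return the index hash fields whose values differ between two runs."""
    return [f for f in INDEX_HASH_FIELDS if meta_a.get(f) != meta_b.get(f)]
```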
aux_info/ambig_info.tsv — per-transcript read partition (TSV)
The header is UniqueCount<TAB>AmbigCount, followed by one row per quantified transcript in
index order (the same num_valid_targets set as quant.sf, so row i here describes row i of
quant.sf). UniqueCount is the number of fragments that map uniquely to that transcript;
AmbigCount is the number of fragments that map to it and to at least one other transcript.
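The file has no name column, so transcript names come from quant.sf by row position. A minimal sketch of that join, assuming the counts are written as integers. The helper names are hypothetical.

```python
def parse_ambig_info(text: str) -> list[tuple[int, int]]:
    """Return (UniqueCount, AmbigCount) pairs in index order."""
    lines = [ln for ln in text.splitlines() if ln]
    if not lines or lines[0].split("\t") != ["UniqueCount", "AmbigCount"]:
        raise ValueError("missing or unexpected ambig_info.tsv header")
    pairs = []
    for ln in lines[1:]:
        unique, ambig = ln.split("\t")
        # Assumption: counts are written as integers.
        pairs.append((int(unique), int(ambig)))
    return pairs


def label_ambig_info(names: list[str],
                     pairs: list[tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Attach transcript names (e.g. quant.sf's Name column) by row position."""
    if len(names) != len(pairs):
        raise ValueError(f"{len(names)} names vs {len(pairs)} ambig_info rows")
    return dict(zip(names, pairs))
```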
libParams/flenDist.txt — fragment-length distribution (TSV, one line)
A single line: the normalized fragment-length PMF as tab-separated values in scientific notation, one value per length bin from 0 to the max fragment length.
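Summary statistics follow directly from the PMF. This sketch assumes that the i-th value on the line is the probability of a fragment of length i nucleotides. The function names are hypothetical.

```python
import math


def parse_flen_dist(line: str) -> list[float]:
    """Split the single tab-separated line of flenDist.txt into floats."""
    return [float(v) for v in line.strip().split("\t")]


def pmf_mean_sd(pmf: list[float]) -> tuple[float, float]:
    """Mean and standard deviation of fragment length, treating bin index as length."""
    total = sum(pmf)  # ~1.0 for a normalized PMF; dividing guards against rounding drift
    mean = sum(length * p for length, p in enumerate(pmf)) / total
    var = sum((length - mean) ** 2 * p for length, p in enumerate(pmf)) / total
    return mean, math.sqrt(var)
```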
aux_info/eq_classes.txt.gz — equivalence classes (gzip text) [--dumpEq]
Only written with --dumpEq or --dumpEqWeights. Uncompressed payload is
salmon’s text format:
```text
num_transcripts
num_eq_classes
<transcript name 0>
<transcript name 1>
...                      (num_transcripts names, index order)
<class line>             (num_eq_classes lines)
...
```
- --dumpEq (collapsed by transcript set): each class line is groupSize<TAB>tid_0<TAB>…<TAB>tid_{g-1}<TAB>count, where g = groupSize.
- --dumpEqWeights: the per-transcript combined weights are inserted between the transcript ids and the count: groupSize<TAB>tid_0<TAB>…<TAB>tid_{g-1}<TAB>w_0<TAB>…<TAB>w_{g-1}<TAB>count.
Transcript ids index into the name list above.
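A reader can handle both variants by counting the fields on each class line: g ids plus a count, or g ids, g weights, and a count. The sketch below assumes the count is an integer. EqClass and the function names are hypothetical.

```python
import gzip
from dataclasses import dataclass
from typing import Optional


@dataclass
class EqClass:
    tids: list[int]                 # indices into the transcript name list
    weights: Optional[list[float]]  # present only with --dumpEqWeights
    count: int


def parse_eq_classes(text: str) -> tuple[list[str], list[EqClass]]:
    lines = text.splitlines()
    n_tx, n_eq = int(lines[0]), int(lines[1])
    names = lines[2:2 + n_tx]
    classes = []
    for line in lines[2 + n_tx:2 + n_tx + n_eq]:
        fields = line.split("\t")
        g = int(fields[0])
        rest = fields[1:]
        if len(rest) == g + 1:          # tids, count
            weights = None
        elif len(rest) == 2 * g + 1:    # tids, weights, count
            weights = [float(w) for w in rest[g:2 * g]]
        else:
            raise ValueError(f"malformed class line: {line!r}")
        classes.append(EqClass([int(t) for t in rest[:g]], weights, int(rest[-1])))
    if len(names) != n_tx or len(classes) != n_eq:
        raise ValueError("file is shorter than its declared counts")
    return names, classes


def read_eq_classes(path) -> tuple[list[str], list[EqClass]]:
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_eq_classes(fh.read())
```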
aux_info/bootstrap/ — inferential replicates (gzip) [--numBootstraps / --numGibbsSamples]
Written when bootstrap or Gibbs sampling is requested; byte-compatible with C++
salmon’s GZipWriter::writeBootstrap<double>.
- names.tsv.gz: uncompressed payload is the transcript names, tab-separated on a single line terminated by a newline, in index order.
- bootstraps.gz: uncompressed payload is raw f64 little-endian values with no header: n_replicates samples written contiguously, each sample being num_valid_targets values in index order. Total length = n_replicates × num_valid_targets × 8 bytes.
The samp_type field in meta_info.json records whether the replicates are bootstrap or gibbs samples.
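Because bootstraps.gz has no header, the replicate count is recovered from the payload size and the name count. A minimal sketch that takes the raw bytes of both files. The function name is hypothetical.

```python
import gzip
import struct


def parse_bootstraps(names_gz: bytes,
                     boots_gz: bytes) -> tuple[list[str], list[list[float]]]:
    """Return (names, replicates); replicates[r][i] is transcript i in sample r."""
    names = gzip.decompress(names_gz).decode("utf-8").rstrip("\n").split("\t")
    payload = gzip.decompress(boots_gz)
    n = len(names)
    sample_bytes = 8 * n  # one f64 per transcript per sample
    if len(payload) % sample_bytes:
        raise ValueError(f"{len(payload)} bytes is not a multiple of {sample_bytes}")
    n_reps = len(payload) // sample_bytes
    flat = struct.unpack(f"<{n_reps * n}d", payload)
    return names, [list(flat[r * n:(r + 1) * n]) for r in range(n_reps)]
```

As a consistency check, len(replicates) should equal num_bootstraps in meta_info.json, and len(names) should equal num_valid_targets.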
cmd_info.json / lib_format_counts.json (JSON)
cmd_info.json records the invocation: salmon_version, index, libType,
output, mates1, mates2, unmatedReads, threads, sketch.
lib_format_counts.json records library-format detection: read_files,
expected_format, compatible_fragment_ratio, num_compatible_fragments,
num_assigned_fragments, num_frags_with_concordant_consistent_mappings,
num_frags_with_inconsistent_or_orphan_mappings, strand_mapping_bias.
Documented Rust-format files (diagnostic)
These are bias-model and FLD diagnostic dumps, not read by standard downstream tools. Layouts are documented for completeness.
aux_info/fld.gz
Uncompressed payload: an array of i32 little-endian counts, one per
fragment-length bin. Where C++ salmon draws 10,000 Monte-Carlo samples from the
log-PMF, the port writes the deterministic expected histogram
round(10000 · pmf[len]) (same type and layout).
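Because the histogram is deterministic, it can be recomputed from a PMF and compared with the decoded file. This sketch assumes that round() means rounding halves away from zero, as Rust's f64::round does. Python's built-in round() rounds halves to even, so the sketch uses floor(x + 0.5) instead, which is equivalent for the non-negative values involved. The function names are hypothetical.

```python
import gzip
import math
import struct


def expected_fld_counts(pmf: list[float], n_samples: int = 10_000) -> list[int]:
    """round(n_samples * pmf[len]) per bin, rounding halves away from zero."""
    # PMF values are non-negative, so floor(x + 0.5) rounds halves up (away from 0).
    return [math.floor(n_samples * p + 0.5) for p in pmf]


def decode_fld(compressed: bytes) -> list[int]:
    """Decode fld.gz: gunzip, then little-endian i32 counts, one per length bin."""
    payload = gzip.decompress(compressed)
    return list(struct.unpack(f"<{len(payload) // 4}i", payload))
```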
Sequence-bias dumps [--seqBias]
obs5_seq.gz, obs3_seq.gz, exp5_seq.gz, exp3_seq.gz: uncompressed payload
is an array of f64 little-endian values — the flattened observed/expected
5′ and 3′ sequence-bias context tables.
GC-bias dumps [--gcBias]
obs_gc.gz, exp_gc.gz: uncompressed payload is an array of f64
little-endian values (observed/expected GC mass bins).
Positional-bias dumps [--posBias]
obs5_pos.gz, obs3_pos.gz, exp5_pos.gz, exp3_pos.gz: uncompressed payload
is a header followed by the model bins:
```text
[u32 num_models][u32 bins_per_model]
num_models × bins_per_model f64 LE, row-major
```
Each “model” is one length class’s positional distribution.
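The header makes these files self-describing, so a reader can split the flat array into one row per model and validate the size. A minimal sketch; the function name is hypothetical.

```python
import gzip
import struct


def parse_pos_bias(compressed: bytes) -> list[list[float]]:
    """Return one list of bins per model (length class), in file order."""
    payload = gzip.decompress(compressed)
    num_models, bins_per_model = struct.unpack_from("<II", payload, 0)
    n_values = num_models * bins_per_model
    if len(payload) != 8 + 8 * n_values:  # 8-byte header + f64 bins
        raise ValueError("payload size does not match its header")
    flat = struct.unpack_from(f"<{n_values}d", payload, 8)
    # Row-major: model m occupies flat[m * bins_per_model:(m + 1) * bins_per_model].
    return [list(flat[m * bins_per_model:(m + 1) * bins_per_model])
            for m in range(num_models)]
```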
Legacy seq-bias stubs
observed_bias.gz (i32 [0]), observed_bias_3p.gz (i32 [0]), and
expected_bias.gz (f64 [1.0]) are single-element stubs preserved for the
legacy simple-count seq-bias model the port does not implement (it uses the
SBModel context model instead). They exist so consumers expecting these
filenames do not error.
logs/salmon_quant.log
A concise human-readable run summary (version, start/end time, library type, mapping type, observed/mapped fragment counts, mapping rate, equivalence-class count, fragment-length mean/sd). Downstream tools key off the JSON metadata, not this log.
Index directory
The 2.0 index is the piscem-rs format and is not compatible with C++ salmon (pufferfish) indices; they must be rebuilt. Pointing 2.0 at a C++ index (or C++ salmon at a 2.0 index) produces a clear, actionable error. See what changed in 2.0.