
Alevin-Tutorial


A support website for Alevin-tool (part of Salmon). How Tos and FAQs


Selective Alignment

Fast is good, but fast and accurate is better!

The accuracy of transcript quantification from RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model adopted. We investigated how mapping and alignment influence the accuracy of transcript quantification in both simulated and experimental data, as well as their effect on downstream differential expression analysis. Based on this, we designed selective alignment, a method that overcomes the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Here we give a short tutorial on how to index your transcriptome together with the genome (used as decoys) to obtain accurate quantification estimates.

Downloading Reference

First, we download the reference transcriptome and genome needed to build the salmon index. As an example, we use the GENCODE mouse reference (release M23):

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.transcripts.fa.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/GRCm38.primary_assembly.genome.fa.gz
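Before indexing, it can be worth confirming that both downloads are complete. The following is a minimal sketch. check_fasta_gz is a hypothetical helper and not part of salmon. It runs gzip -t, which fails on a truncated or corrupt archive, and then prints the number of FASTA records.

```shell
# check_fasta_gz: hypothetical helper (not part of salmon).
# Fails if the gzip archive is truncated or corrupt; otherwise
# prints the number of FASTA records (header lines starting with '>').
check_fasta_gz() {
    gzip -t "$1" || return 1
    gunzip -c "$1" | grep -c '^>'
}

# Usage on the files downloaded above:
# check_fasta_gz gencode.vM23.transcripts.fa.gz
# check_fasta_gz GRCm38.primary_assembly.genome.fa.gz
```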

Installing Salmon

Although there are multiple ways to install salmon (e.g., a binary from GitHub or a Docker image), we install it through conda. Assuming a conda environment is already set up, salmon can be installed with the following command:

conda install --channel bioconda salmon

Make sure you have the latest version of salmon (v1.0 as of November 1st, 2019) by running salmon --version.
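In a pipeline script, the version check can be automated. This sketch makes two assumptions: salmon --version prints a first line of the form "salmon X.Y.Z", and sort supports -V (version sort, available in GNU coreutils). version_ge and check_salmon_version are hypothetical helpers.

```shell
# version_ge A B: hypothetical helper; succeeds if version A >= version B.
# sort -V orders version strings numerically (1.9.0 < 1.10.0).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_salmon_version: hypothetical helper.
# ASSUMPTION: the first line of `salmon --version` looks like "salmon 1.0.0";
# 2>&1 captures it whichever stream it is written to.
check_salmon_version() {
    local installed
    installed=$(salmon --version 2>&1 | awk 'NR == 1 { print $2 }')
    if version_ge "$installed" "1.0.0"; then
        echo "salmon $installed OK"
    else
        echo "salmon $installed is older than 1.0.0; please upgrade" >&2
        return 1
    fi
}

# Usage:
# check_salmon_version || exit 1
```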

Preparing metadata

Decoy-aware salmon indexing requires the names of the genome targets (the decoys). These can be extracted from the FASTA headers with grep, after which sed strips the leading ">":

grep "^>" <(gunzip -c GRCm38.primary_assembly.genome.fa.gz) | cut -d " " -f 1 > decoys.txt
sed -i.bak -e 's/>//g' decoys.txt
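The two commands above can also be written as a single pipeline that does not leave behind the decoys.txt.bak file created by sed -i.bak. This is a minimal sketch: make_decoys is a hypothetical helper, not part of salmon. Like the original commands, it keeps only the first word of each header. Unlike cut -d " ", awk splits on any run of spaces or tabs.

```shell
# make_decoys GENOME_FA_GZ OUT: hypothetical helper (not part of salmon).
# For every header line, drop the leading '>' and print the first
# whitespace-delimited word, i.e. the sequence name.
make_decoys() {
    gunzip -c "$1" | awk '/^>/ { sub(/^>/, "", $1); print $1 }' > "$2"
}

# Usage:
# make_decoys GRCm38.primary_assembly.genome.fa.gz decoys.txt
# wc -l decoys.txt   # number of decoy sequences
```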

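Along with the list of decoys, salmon also needs a single reference file in which the transcriptome and the genome are concatenated. NOTE: the genome targets (decoys) must come after the transcriptome targets in this combined reference, so list the transcriptome file first. Since gzip files can simply be concatenated (a multi-member gzip file decompresses to the concatenation of its parts), there is no need to decompress them first: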

cat gencode.vM23.transcripts.fa.gz GRCm38.primary_assembly.genome.fa.gz > gentrome.fa.gz
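Swapping the two arguments to cat would silently place the decoys first. As a guard against this, the sketch below checks that the last records in gentrome.fa.gz match the names in decoys.txt exactly, in the same order. check_decoys_last is a hypothetical helper and not part of salmon.

```shell
# check_decoys_last GENTROME_FA_GZ DECOYS_TXT: hypothetical helper.
# Succeeds if the last N headers of the combined reference (first word only)
# match the N names listed in the decoys file, in order.
check_decoys_last() {
    local n last
    # grep -c '' counts lines, including a final line without a newline
    n=$(grep -c '' "$2")
    last=$(gunzip -c "$1" | awk '/^>/ { sub(/^>/, "", $1); print $1 }' | tail -n "$n")
    if [ "$last" = "$(cat "$2")" ]; then
        echo "OK: the $n decoys are the last records in $1"
    else
        echo "ERROR: decoys are not at the end of $1" >&2
        return 1
    fi
}

# Usage:
# check_decoys_last gentrome.fa.gz decoys.txt
```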

Salmon Indexing

We now have all the ingredients for the salmon recipe. The indexing step can be run as follows:

salmon index -t gentrome.fa.gz -d decoys.txt -p 12 -i salmon_index --gencode

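NOTE: the --gencode flag removes the extra metadata, separated by |, that GENCODE appends to each target header. You can skip it if you are using another reference. In the command above, -p 12 runs indexing with 12 threads and -i sets the output index directory.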
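To see what --gencode trims, the idea can be mimicked in the shell. This is only an illustration that assumes the kept name is the text before the first |. trim_gencode_header is hypothetical and does not reproduce salmon's own implementation of the flag.

```shell
# trim_gencode_header HEADER: hypothetical helper, illustration only.
# Keeps the text before the first '|', mimicking the idea behind --gencode.
trim_gencode_header() {
    printf '%s\n' "$1" | cut -d '|' -f 1
}

# Usage with a GENCODE-style transcript header:
# trim_gencode_header '>ENSMUST00000193812.1|ENSMUSG00000102693.1|OTTMUSG00000049935.1|OTTMUST00000127109.1|4933401J01Rik-201|4933401J01Rik|1070|TEC|'
```

Headers without a | pass through unchanged, because cut prints lines that lack the delimiter as-is.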

IPython Notebook

Prefer to read this as an IPython notebook? Check out the gist here.