Interpreting the RNA-Seq analysis result

The main results of the RNA-Seq analysis are two expression tracks: one summarizing expression at the gene level (called GE) and one summarizing expression at the transcript level (called TE). Note that the latter is only produced if the "Genome annotated with genes and transcripts" option is selected in figure 27.4.

Both tracks can be shown in a Table view and a Graphical view. By creating a Track list, the graphical view can be shown together with the read mapping track and tracks from other samples:

        File | New | Track List

Select the mapping and expression tracks of the samples you wish to visualize together, select the annotation tracks used as reference for the RNA-Seq analysis, and click Finish.

Once the track list is shown, double-click the label of the expression track to show it in a table view. Clicking a row in the table makes the track list view jump to that location, allowing for quick inspection of interesting parts of the RNA-Seq read mapping (see an example in figure 27.12).

Image mrna_seq_result_contig-web
Figure 27.12: RNA-Seq results shown in a split view with an expression track at the bottom and a track list with read mappings of two samples at the top.

Reads spanning two exons are shown with a dashed line connecting the two ends (see figure 27.12), while a thin solid line represents the connection between the two reads in a pair.

When doing comparative analysis with both an experiment (see Experimental design) and a track list open, clicking a row in the experiment will cause the track list to jump to the corresponding position, allowing for quick inspection of the reads underlying the counts in the experiment. Please note that at least one of the expression tracks used in the experiment has to be included in the track list in order for the link between the two to work.

Expression tracks can also be used to annotate variants using the Annotate with Overlap Information tool. Select the variant track as input and annotate with the expression track. For variants inside genes or transcripts, expression information (counts, expression value, etc.) from the overlapping gene or transcript in the expression track will be added. Read more about the annotation tool in Annotate with overlap information.
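
The idea of annotating variants with overlapping expression information can be illustrated with a minimal Python sketch. This is not how the Annotate with Overlap Information tool is implemented; the function name and the dictionary fields (`chrom`, `pos`, `start`, `end`, `name`, `expression`) are hypothetical and chosen only for illustration, and coordinates are assumed to be 1-based and inclusive.

```python
def annotate_variants(variants, genes):
    """Attach expression info from every gene whose region contains the variant.

    Hypothetical data model: each variant is a dict with 'chrom' and 'pos';
    each gene is a dict with 'chrom', 'start', 'end', 'name' and 'expression'.
    """
    annotated = []
    for variant in variants:
        # A gene overlaps the variant if it is on the same chromosome
        # and the variant position falls inside the gene region.
        hits = [
            {"name": gene["name"], "expression": gene["expression"]}
            for gene in genes
            if gene["chrom"] == variant["chrom"]
            and gene["start"] <= variant["pos"] <= gene["end"]
        ]
        annotated.append({**variant, "overlaps": hits})
    return annotated


genes = [
    {"chrom": "chr1", "start": 1000, "end": 5000, "name": "geneA", "expression": 12.5},
    {"chrom": "chr2", "start": 200, "end": 900, "name": "geneB", "expression": 0.0},
]
variants = [
    {"chrom": "chr1", "pos": 1500},   # inside geneA
    {"chrom": "chr1", "pos": 9000},   # intergenic, no annotation added
]
annotated = annotate_variants(variants, genes)
```

Variants that fall outside all genes simply receive an empty `overlaps` list in this sketch.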

Gene-level expression
The gene-level expression track holds information about counts and expression values for each gene. It can be opened in a Table view allowing sorting and filtering on all the information in the track (see figure 27.13 for an example subset of an expression track).

Image mrna_seq_result
Figure 27.13: A subset of a result of an RNA-Seq analysis on the gene level. Not all columns are shown in this figure.

Each row in the table corresponds to a gene (or reference sequence, if the One reference sequence per transcript option was used). The corresponding counts and other information are shown for each gene.

Transcript-level expression
If the "Genome annotated with genes and transcripts" option is selected in figure 27.4, a transcript-level expression track is also generated.

The track can be opened in a Table view allowing sorting and filtering on all the information in the track. Each row in the table corresponds to an mRNA annotation in the mRNA track used as reference.

Definition of RPKM
RPKM, Reads Per Kilobase of exon model per Million mapped reads, is defined as follows [Mortazavi et al., 2008]:

$\displaystyle \text{RPKM} = \frac{\text{total exon reads}}{\text{mapped reads (millions)} \times \text{exon length (kb)}}. $

Total exon reads
This value can be found in the column with the header Total exon reads in the expression track. This is the number of reads that have been mapped to exons (either within an exon or at an exon junction). When the reference genome is annotated with gene and transcript annotations, the mRNA track defines the exons, and the total exon reads are the reads mapped to all transcripts for that gene. When only genes are used, each gene in the gene track is considered an exon. When an un-annotated sequence list is used, each sequence is considered an exon.
Exon length
This is the number in the column with the header Exon length in the expression track, divided by 1000. It is calculated as the sum of the lengths of all exons (see the definition of exon above). Each exon is included only once in this sum, even if it is present in several annotated transcripts for the gene. Partly overlapping exons each count with their full length, even though they share part of the same region.
Mapped reads
The sum of all mapped reads as listed in the RNA-Seq analysis report. Please note that the option to Map to gene regions only will affect the number of mapped reads, since no intergenic reads will be mapped if this option is selected. This means that RPKM values should only be compared between samples if this parameter was set in the same way for all samples.
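
The three definitions above can be combined into a short worked example. The following Python sketch is an illustration of the formula only, not the software's own implementation; the function names, the representation of exons as (start, end) pairs, and the assumption of 1-based inclusive coordinates are all hypothetical.

```python
def exon_length(transcripts):
    """Sum the lengths (in bases) of the distinct exons of one gene.

    Assumption: each transcript is a list of (start, end) exon coordinates,
    1-based and inclusive. An exon shared by several transcripts is counted
    once; partly overlapping exons each count with their full length.
    """
    unique_exons = {exon for exons in transcripts for exon in exons}
    return sum(end - start + 1 for start, end in unique_exons)


def rpkm(total_exon_reads, mapped_reads, exon_length_bp):
    """RPKM = total exon reads / (mapped reads in millions * exon length in kb)."""
    if mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("mapped_reads and exon_length_bp must be positive")
    return total_exon_reads / ((mapped_reads / 1_000_000) * (exon_length_bp / 1_000))


# Two transcripts of the same gene: the first exon is shared,
# the second exons overlap only partly.
transcripts = [
    [(101, 200), (301, 400)],
    [(101, 200), (351, 500)],
]
length_bp = exon_length(transcripts)  # 100 + 100 + 150 bases
value = rpkm(total_exon_reads=700, mapped_reads=20_000_000, exon_length_bp=length_bp)
print(length_bp, round(value, 2))
```

With 700 exon reads, 20 million mapped reads and an exon length of 0.35 kb, the RPKM works out to 700 / (20 x 0.35), which is approximately 100.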