Special notes on upgrading to Genomics Workbench 6.0

The variant format changed with the release of CLC Genomics Workbench 6.0 and CLC Genomics Server 5.0. This section is intended for those upgrading from earlier versions and will provide information about how this change affects both existing and new data.

The new format differs from the previous format in three main ways:

  1. One variant (i.e. one entry in the table or one annotation) represents only one change. This means that a heterozygous variant is represented by two entries, one for each allele. The main reason for this is that downstream annotation and filtering become much easier. A simple example is filtering the variant table on frequency, which was not possible before because the frequency column contained frequencies for more than one allele. When one of the alleles is the reference allele, it is also included as an entry. If only the non-reference variants need to be reported, a special tool can be used to remove the reference allele entries from the variant track (see Filter reference variants).
  2. Before version 6.0, adjacent SNVs were merged into MNVs to ensure correct calculation of amino acid changes etc. This has been changed so that each SNV is retained. Instead, the relationship between adjacent SNVs is governed by the concept of a linkage group (see Linking adjacent variants in linkage groups). This gives a simple representation of the individual variants that lends itself to easy downstream filtering and analysis, while preserving information about linked variants.
  3. As a consequence of the above, we have redefined the variant types. The InDel variant type has been replaced by insertion and deletion, the MNV type is no longer used, and a new category called replacement has been added. This is described in more detail in Variant types.
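
The first two differences can be illustrated with a small Python sketch. This is not the Workbench's internal data model or API: the class names, field names, and the use of percentages for frequency are all assumptions made for illustration. It shows how a single old-style entry holding two alleles maps to one entry per allele, and how an MNV maps to individual SNVs that share a linkage group identifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OldVariant:
    """Hypothetical pre-6.0 style record: one entry holding several alleles."""
    position: int
    reference: str
    alleles: List[str]        # e.g. ["A", "G"] for a heterozygous site
    frequencies: List[float]  # one value per allele (assumed to be percent)


@dataclass
class Variant:
    """Hypothetical 6.0 style record: one entry represents one change."""
    position: int
    reference: str
    allele: str
    frequency: float
    is_reference: bool
    linkage_group: Optional[int] = None  # shared by linked adjacent SNVs


def split_alleles(old: OldVariant) -> List[Variant]:
    """Produce one entry per allele, including the reference allele."""
    return [
        Variant(old.position, old.reference, allele, freq, allele == old.reference)
        for allele, freq in zip(old.alleles, old.frequencies)
    ]


def split_mnv(position: int, reference: str, allele: str,
              frequency: float, group_id: int) -> List[Variant]:
    """Split an MNV into per-base SNVs that share one linkage group id."""
    if len(reference) != len(allele):
        raise ValueError("an MNV has reference and allele of equal length")
    return [
        Variant(position + i, ref_base, alt_base, frequency, False, group_id)
        for i, (ref_base, alt_base) in enumerate(zip(reference, allele))
        if ref_base != alt_base
    ]


# A heterozygous site where one allele is the reference allele.
old = OldVariant(position=1200, reference="A",
                 alleles=["A", "G"], frequencies=[48.0, 52.0])
entries = split_alleles(old)

# Each entry now carries a single frequency, so filtering is straightforward.
high_frequency = [v for v in entries if v.frequency >= 50.0]

# Keeping only non-reference entries mirrors what a reference filter would do.
non_reference = [v for v in entries if not v.is_reference]

# An old-style MNV "AC" -> "GT" becomes two SNVs in the same linkage group.
snvs = split_mnv(position=300, reference="AC", allele="GT",
                 frequency=35.0, group_id=1)
```

Because every entry carries exactly one allele and one frequency, the filters above are plain list comprehensions, and the shared `linkage_group` value keeps the information that the two SNVs were observed together.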

When existing data created before version 6.0 is used for analysis, it will be converted on the fly. This means that if a variant track is annotated or filtered, the new track will be in the new data format.

The on-the-fly conversion of data means that the analysis takes a little longer than normal. Once the new track has been created, subsequent analyses run at normal speed.

If a track is used in its original form again and again for various analyses, the slower run time could be a problem. To convert the track to the new format, you can download a special plug-in that handles the conversion. The plug-in is available in the plug-in manager (see Installing plug-ins). Alternatively, the data can be downloaded or imported again.

You can tell the old and new variant tracks apart by the icon in the Navigation Area as shown in figure 26.19.

Image variants_upgrade
Figure 26.19: The icon for the old variant track has grey-scale colors.