2.4 - Postprocessing
Postprocessing refines the reads mapped onto the reference genome (realigning, correcting, calibrating, marking, or flagging them) according to state-of-the-art practices. The aim of this step is to optimize the alignment and minimize errors that could lead to false positives or false negatives during variant calling. VariantSeq offers several postprocessing treatments. In this tutorial, we run three postprocessing jobs: AddOrReplaceReadGroups, MarkDuplicates, and BQSR.
AddOrReplaceReadGroups (listed as AddReplaceReadGroups in the VariantSeq menu) is a command of Picard tools (Wysoker, et al. 2011) that assigns all the reads in a BAM or SAM file to a single new read group (RG). This step is needed because many tools require or assume the presence of at least one RG tag, which identifies the read group that each read belongs to (as specified by the RG tag in the SAM record).
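For readers who want to relate the GUI step to a command line, the following is a minimal sketch of how an equivalent Picard call could be assembled. It assumes the classic `KEY=value` Picard argument syntax and a local `picard.jar`. The file names and read-group values are hypothetical placeholders, and the exact command that VariantSeq builds internally may differ.

```python
import shlex


def add_read_groups_cmd(in_bam, out_bam, rgid, rglb, rgpl, rgpu, rgsm,
                        picard_jar="picard.jar"):
    """Build (but do not run) a Picard AddOrReplaceReadGroups command.

    picard_jar is an assumed location of the Picard jar file.
    """
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}",    # input BAM/SAM file
        f"O={out_bam}",   # output file with the new read group
        f"RGID={rgid}",   # read-group identifier
        f"RGLB={rglb}",   # library name
        f"RGPL={rgpl}",   # sequencing platform
        f"RGPU={rgpu}",   # platform unit
        f"RGSM={rgsm}",   # sample name
    ]


# Hypothetical file names and read-group values, for illustration only.
cmd = add_read_groups_cmd("sample.bam", "sample.rg.bam",
                          rgid="1", rglb="lib1", rgpl="ILLUMINA",
                          rgpu="unit1", rgsm="sample")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` if Java and Picard are installed; the sketch only prints the command so that it can be inspected first.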
To run AddOrReplaceReadGroups, go to the Step-by-Step menu path SNP/Indels → Postprocessing → Picard Tools → Picard – AddReplaceReadGroups and proceed as indicated in Video 5.
Video 5. Using AddOrReplaceReadGroups of Picard with VariantSeq.
Expected results from AddOrReplaceReadGroups:
When AddOrReplaceReadGroups is complete, you will receive a new BAM file with the read-group tags (labels) added by the tool.
The expected results of this step are available at the following link: PicardAddReplaceReadGroups
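One way to confirm that the read group was added is to look for an `@RG` line in the header of the output file. The sketch below parses header text in SAM format and returns the read groups it declares. The header string is a made-up example; in practice the text could be obtained with a tool such as `samtools view -H`, which is an assumption about your environment rather than part of VariantSeq.

```python
def read_groups_from_header(header_text):
    """Return one dict per @RG line found in a SAM header."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after "@RG" are tab-separated TAG:value pairs.
        fields = line.split("\t")[1:]
        groups.append(dict(field.split(":", 1) for field in fields))
    return groups


# Hypothetical header, for illustration only.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@RG\tID:1\tLB:lib1\tPL:ILLUMINA\tPU:unit1\tSM:sample\n"
)
groups = read_groups_from_header(header)
print(groups)
```

A file that has been through this step should yield at least one entry, and each read in the file should carry an `RG` tag that matches one of the listed `ID` values.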
To learn more about Picard tools and AddOrReplaceReadGroups, see https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-AddOrReplaceReadGroups-Picard- and https://broadinstitute.github.io/picard/
MarkDuplicates is another command of Picard tools (Wysoker, et al. 2011) that locates, marks, and optionally eliminates duplicated reads in a BAM or SAM file. It reduces systematic bias by marking or eliminating the duplicated reads that arise for different reasons during a sequencing experiment (sample preparation, duplication artifacts, etc.).
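As a rough command-line counterpart to this step, the sketch below assembles a Picard MarkDuplicates call. It assumes the classic `KEY=value` Picard syntax and a local `picard.jar`. File names are hypothetical, and the options that VariantSeq actually passes may differ.

```python
import shlex


def mark_duplicates_cmd(in_bam, out_bam, metrics_file, remove=False,
                        picard_jar="picard.jar"):
    """Build (but do not run) a Picard MarkDuplicates command."""
    cmd = [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={in_bam}",         # input BAM/SAM file
        f"O={out_bam}",        # output file with duplicates marked
        f"M={metrics_file}",   # duplication metrics report
    ]
    if remove:
        # Drop duplicates from the output instead of only flagging them.
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd


# Hypothetical file names, for illustration only.
mark_only = mark_duplicates_cmd("sample.rg.bam", "sample.md.bam",
                                "sample.md_metrics.txt")
mark_and_remove = mark_duplicates_cmd("sample.rg.bam", "sample.dedup.bam",
                                      "sample.md_metrics.txt", remove=True)
print(shlex.join(mark_only))
print(shlex.join(mark_and_remove))
```

Marking without removing keeps every read in the file, so downstream tools can decide for themselves whether to ignore flagged duplicates.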
To run MarkDuplicates, please go to the Step-by-Step menu path, SNP/Indels → Postprocessing → Picard Tools → Picard – MarkDuplicates, and proceed as indicated in Video 6.
Video 6. Using MarkDuplicates of Picard with VariantSeq.
Expected results from MarkDuplicates:
When the MarkDuplicates command is complete, you will receive a new BAM file in which duplicated reads are marked or removed.
The expected results of this step are available at the following link: MarkDuplicates
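In the SAM format, a read marked as a duplicate has the bit 0x400 (decimal 1024) set in its FLAG field. The sketch below checks that bit on SAM text records. The two records are invented for illustration and are not taken from the tutorial data.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def is_duplicate(sam_record):
    """Return True if the SAM record has the duplicate bit set."""
    flag = int(sam_record.split("\t")[1])  # FLAG is the second column
    return bool(flag & DUPLICATE_FLAG)


# Hypothetical records: 1123 = 99 + 1024, so the first one is a duplicate.
records = [
    "read1\t1123\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "read2\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
]
duplicate_status = [is_duplicate(r) for r in records]
print(duplicate_status)
```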
To learn more about Picard tools and MarkDuplicates, see https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard- and https://broadinstitute.github.io/picard/
BQSR (Base Quality Score Recalibration) is a data pre-processing step performed by GATK (McKenna, et al. 2010; DePristo, et al. 2011; Cibulskis, et al. 2013) that detects and corrects systematic errors made by the sequencer when it assigns base quality scores. In this step, we use the training data sets described in section “1.2 - Tutorial material and case study”.
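On the command line, BQSR is typically a two-stage process: a recalibration model is first built from the reads and a set of known variant sites, and the model is then applied to produce the recalibrated file. The sketch below assumes GATK 4 syntax (`BaseRecalibrator` followed by `ApplyBQSR`) and assumes that the training data sets are supplied as known-sites VCF files. All file names are hypothetical, and the GATK version and options used by VariantSeq may differ.

```python
import shlex


def bqsr_cmds(in_bam, reference, known_sites, out_bam,
              table="recal_data.table"):
    """Build (but do not run) the two GATK 4 commands for BQSR."""
    # Stage 1: model the systematic errors using known variant sites.
    recalibrate = ["gatk", "BaseRecalibrator", "-I", in_bam, "-R", reference]
    for vcf in known_sites:
        recalibrate += ["--known-sites", vcf]
    recalibrate += ["-O", table]

    # Stage 2: apply the model and write the recalibrated BAM file.
    apply_model = ["gatk", "ApplyBQSR", "-R", reference, "-I", in_bam,
                   "--bqsr-recal-file", table, "-O", out_bam]
    return recalibrate, apply_model


# Hypothetical file names, for illustration only.
recalibrate, apply_model = bqsr_cmds(
    "sample.md.bam", "reference.fasta",
    ["known_snps.vcf", "known_indels.vcf"], "sample.bqsr.bam")
print(shlex.join(recalibrate))
print(shlex.join(apply_model))
```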
To run BQSR, please go to the Step-by-Step menu path, SNP/Indels → Postprocessing → GATK Tools → BQSR, and proceed as indicated in Video 7.
Video 7. Applying BQSR with the GATK implementation of VariantSeq.
Expected results from BQSR:
When the BQSR command of GATK is complete, you will receive a new BAM file with recalibrated base quality scores.
The expected results of this step are available at the following link: BQSR
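To interpret the scores that BQSR adjusts, it helps to recall that a Phred quality score Q corresponds to an error probability P = 10^(-Q/10). The sketch below converts scores to probabilities and decodes a quality string, assuming the common Phred+33 ASCII encoding. The quality string is an invented example.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


# Hypothetical quality string, for illustration only.
scores = decode_qualities("!5I")
print(scores)
print([phred_to_error_prob(q) for q in scores])
```

Comparing the decoded scores of the same read before and after recalibration gives a direct view of what the step changed.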
To learn more about BQSR, see https://gatk.broadinstitute.org/hc/en-us/articles/360035890531-Base-Quality-Score-Recalibration-BQSR-