2.6 - Variant filtering
Variant filtering identifies the confident variants among those called by Mutect2 and flags those that fail the filtering thresholds, so that potential false positives can be excluded. VariantSeq provides access to several GATK tools for variant filtering (for more details, see the VariantSeq manual at https://gpro.biotechvana.com/tool/variantseq/manual). As we used Mutect2 to call the variants, we use two tools for filtering: "Cross-Sample Contamination", which generates the contamination tables, and "FilterMutectCalls", which filters the calls using the contamination tables generated by CalculateContamination.
Cross-Sample Contamination calls a small pipeline that generates the contamination tables using two GATK tools: GetPileupSummaries and CalculateContamination (McKenna, et al. 2010; DePristo, et al. 2011; Cibulskis, et al. 2013). To perform this analysis, you will need the BAM files produced by the last analysis of the postprocessing step as well as the VCF files used in the variant calling step. As you do not have matched normal samples for your tumor samples, you need to select Tumor only mode in the input field so that only the BAM files from tumor samples are uploaded. To start, go to the Step-by-Step menu path SNP/Indels → Variants Filtering → Cross-Sample Contamination and proceed as shown in Video 9.
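For readers who want to see what such a pipeline looks like on the command line, the following is a minimal sketch in Python that only builds (and prints) the two GATK command lines for a tumor-only sample. It is not the VariantSeq implementation, whose internals are not shown here. The function name, the file names and the output naming scheme are hypothetical, and the sketch assumes that the `gatk` launcher is on the PATH and that the VCF of common variants is passed both as the variant resource (`-V`) and as the interval list (`-L`).

```python
import shlex


def contamination_commands(tumor_bam, common_variants_vcf, out_prefix):
    """Build the GATK command lines for tumor-only contamination estimation.

    Returns a list with two commands (each a list of arguments):
    GetPileupSummaries followed by CalculateContamination.
    Nothing is executed here; the commands are only assembled.
    """
    # Hypothetical output names derived from the prefix.
    pileups_table = f"{out_prefix}.pileups.table"
    contamination_table = f"{out_prefix}.contamination.table"

    # Step 1: summarize read support at known common variant sites.
    get_pileup_summaries = [
        "gatk", "GetPileupSummaries",
        "-I", tumor_bam,
        "-V", common_variants_vcf,
        "-L", common_variants_vcf,
        "-O", pileups_table,
    ]

    # Step 2: estimate contamination from the pileup summaries.
    # In tumor-only mode no matched-normal pileup table is supplied.
    calculate_contamination = [
        "gatk", "CalculateContamination",
        "-I", pileups_table,
        "-O", contamination_table,
    ]
    return [get_pileup_summaries, calculate_contamination]


if __name__ == "__main__":
    # Example with placeholder file names; prints the commands for inspection.
    for command in contamination_commands(
        "tumor1.bam", "common_variants.vcf.gz", "tumor1"
    ):
        print(shlex.join(command))
```

To actually run the commands, each argument list could be passed to `subprocess.run(command, check=True)`, one sample at a time.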
Once you have the contamination tables, you can perform the next analysis, which uses the VCF files and the contamination tables to perform the filtering with FilterMutectCalls. To start, go to the Step-by-Step menu path SNP/Indels → Variants Filtering → FilterMutectCalls and proceed as shown in Video 9.
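As an illustration of what this step does outside the graphical interface, the sketch below assembles a FilterMutectCalls command line in Python. The function name and all file names are hypothetical, the `gatk` launcher is assumed to be on the PATH, and the command is only printed, not executed. Recent GATK versions also expect the statistics file written by Mutect2 to be available next to the unfiltered VCF; that detail is not handled here.

```python
import shlex


def filter_mutect_calls_command(reference_fasta, unfiltered_vcf,
                                contamination_table, filtered_vcf):
    """Build a FilterMutectCalls command line as a list of arguments."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference_fasta,            # reference genome used for calling
        "-V", unfiltered_vcf,             # VCF produced by Mutect2
        "--contamination-table", contamination_table,
        "-O", filtered_vcf,               # VCF with the FILTER column filled in
    ]


if __name__ == "__main__":
    # Placeholder file names for a single tumor sample.
    command = filter_mutect_calls_command(
        "reference.fasta",
        "tumor1.unfiltered.vcf.gz",
        "tumor1.contamination.table",
        "tumor1.filtered.vcf.gz",
    )
    print(shlex.join(command))
```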
Video 9. Filtering variants called with Mutect2 in two steps: the first job uses Cross-Sample Contamination, which applies a pipeline based on GetPileupSummaries and CalculateContamination to generate the contamination tables, and the second job uses an interface implementation of FilterMutectCalls, which performs the filtering.
Expected results from variant filtering:
This analysis is performed in two steps:
- When Cross-Sample Contamination is done, you will receive a contamination table for each VCF file.
- When FilterMutectCalls is done, you will receive a new VCF file for each sample with the filters applied.
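One quick way to inspect a filtered VCF is to tally the values of its FILTER column, where records that passed all filters carry the value PASS. The following is a minimal sketch under that assumption; the function name and the file name are hypothetical, and the parser only splits the tab-separated columns of a standard VCF (plain or gzip-compressed) rather than using a dedicated VCF library.

```python
import gzip
from collections import Counter


def count_filter_values(vcf_path):
    """Count how often each FILTER value occurs in a VCF file."""
    # Open gzip-compressed files transparently, based on the extension.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # skip malformed or empty lines
            counts[fields[6]] += 1  # FILTER is the 7th VCF column
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_filters.py tumor1.filtered.vcf.gz
    if len(sys.argv) > 1:
        for value, n in count_filter_values(sys.argv[1]).most_common():
            print(f"{value}\t{n}")
```

A record that failed several filters has them joined with semicolons in the FILTER column, so each distinct combination is counted separately.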
The expected results of this step are available at the following link: Cross-sample contamination/FilterMutectCalls
For more details about GetPileupSummaries, see the following URL on the GATK website: https://gatk.broadinstitute.org/hc/en-us/articles/360037593451-GetPileupSummaries
For more details about CalculateContamination, see https://gatk.broadinstitute.org/hc/en-us/articles/360036888972-CalculateContamination
For more details about FilterMutectCalls, see the following URL on the GATK website: https://gatk.broadinstitute.org/hc/en-us/articles/360036856831-FilterMutectCalls