=== Notes on calling variants in RNA-seq data with GATK === * RNA-seq includes reads mapped across splice junctions and is associated with high variability of coverage, so typical variant calling pipelines (for DNA) can lead to lots of false positives and negatives. * GATK is currently the gold standard for calling variants in RNA-seq data. See a detailed description of their workflow here: * [https://gatkforums.broadinstitute.org/gatk/discussion/3892/the-gatk-best-practices-for-variant-calling-on-rnaseq-in-full-detail GATK Best Practices for variant calling on RNAseq] * [https://software.broadinstitute.org/gatk/documentation/article.php?id=3891 Calling variants in RNAseq] (with sample commands) * A main difference between calling variants in RNA vs DNA sequencing reads with GATK, is for RNA-seq data the [https://github.com/alexdobin/STAR STAR] aligner is used to perform a 2-pass read mapping step, which was shown ([https://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2722.html Engström, et al.]) to have superior SNP sensitivity in a comparison of the most common mapping tools. == Using GATK to call variants from RNA-seq reads == This example pipeline starts with a single-end short-read fastq file (Reads_1.fq). \\ Note that GATK (versions 3.7 and 3.8) requires Java 1.8 (so you may need to adjust your path to point to that version, if an older version is the default).\\ For example, this can be added to ~/.bashrc: \\ export PATH=/path/to/java:$PATH \\ '''1 - Run the STAR 2-pass procedure to map reads to the reference genome'''. For the STAR 2-pass mapping procedure, please see our mapping SOP, under [http://barcwiki.wi.mit.edu/wiki/SOPs/mapping STAR]. '''2 - Replace ReadGroups, mark duplicate reads , clip intron overhangs and reassign mapping qualities with[http://broadinstitute.github.io/picard/ Picard Tools] and [https://software.broadinstitute.org/gatk/download/ GATK]''' '''Replace read groups and order, by coordinates, the reads'''. Note, if you are combining multiple experiments in this step the RGSM IDs must be the same while the library IDs must be unique. {{{java -jar /usr/local/share/picard-tools/picard.jar AddOrReplaceReadGroups I=output.sam O=rg_added_sorted.bam SO=coordinate RGID=ID_NAME RGLB=library RGPL=illumina RGPU=identifier RGSM=sample_name}}} '''Mark duplicate reads.''' {{{java -jar /usr/local/share/picard-tools/picard.jar MarkDuplicates I=rg_added_sorted.bam O=dedupped.bam CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M=output.metrics}}} '''Identify and split Cigar N Reads and reassign quality scores.''' {{{java -jar /usr/local/gatk3/GenomeAnalysisTK.jar -T SplitNCigarReads -R /path/to/genome/fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS -fixMisencodedQuals}}} '''Perform BaseRecalibration.''' \\ Calibration files can be found here for hg38 [https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0 Recalibration Files]\\ NOTE: Calibration Files are only available for a few genomes (Human, Mouse, etc). \\ {{{java -jar /usr/local/gatk3/GenomeAnalysisTK.jar -T BaseRecalibrator -R /path/to/genome/fasta -I dedupped.bam -knownSites /path/to/calibration/files -o recalibration.table}}} {{{java -jar /usr/local/gatk3/GenomeAnalysisTK.jar -T PrintReads -R /path/to/genome/fasta -I dedupped.bam -BQSR recalibration.table -o recalibrated.bam}}} '''3 - Call and Filter Variants.''' {{{java -jar /usr/local/gatk3/GenomeAnalysisTK.jar -T HaplotypeCaller -R /path/to/genome/fasta -I recalibrated.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -o Variants_called.vcf}}} {{{java -jar /usr/local/gatk3/GenomeAnalysisTK.jar -T VariantFiltration -R /path/to/genome/fasta -V Variants_called.vcf -window 35 -cluster 3 -filterName Filter -filter "QD < 2.0" -filterName Filter -filter "FS > 30.0" -o Filtered_variants_called.vcf}}} '''4 - Predict the effects of called variants.''' \\ Several tools are available to analyze variants found in your .vcf result files for potential functional consequence. See details [http://barcwiki.wi.mit.edu/wiki/SOPs/vcf here].