Changes between Version 6 and Version 7 of SOPs/variant_calling_GATK


Timestamp:
01/16/14 16:02:43 (11 years ago)
Author:
gbell
  • SOPs/variant_calling_GATK

    v6 v7  
    1010export PATH=/usr/lib/jvm/java-7-openjdk-amd64/bin:$PATH
    1111\\ \\
    12 '''Index the reference genome.''' [Need to do just once, with [[http://samtools.sourceforge.net/samtools.shtml|samtools]].]
     12 1 - '''Index the reference genome.''' [Need to do just once, with [[http://samtools.sourceforge.net/samtools.shtml|samtools]].]
    1313  * samtools faidx /path/to/genome/genome.fa
    1414\\
    15 '''Create a genome dictionary.''' [Need to do just once, with Picard's [[http://picard.sourceforge.net/command-line-overview.shtml#CreateSequenceDictionary|CreateSequenceDictionary]].]
     15 2 - '''Create a genome dictionary.''' [Need to do just once, with Picard's [[http://picard.sourceforge.net/command-line-overview.shtml#CreateSequenceDictionary|CreateSequenceDictionary]].]
    1616  * java -jar /usr/local/share/picard-tools/CreateSequenceDictionary.jar R=/path/to/genome/genome.fa O=/path/to/genome/genome.dict
    1717\\
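Both steps above only need to run once per genome, so a wrapper can skip them when their outputs already exist. A minimal sketch, assuming the placeholder genome path used above; `needs_build` is a hypothetical helper name, and the script only prints the commands it would run rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) when the given output file is missing or empty.
needs_build() {
    [ ! -s "$1" ]
}

GENOME=/path/to/genome/genome.fa   # placeholder path from the steps above

# samtools faidx writes its index next to the FASTA file as <fasta>.fai
if needs_build "$GENOME.fai"; then
    echo "samtools faidx $GENOME"
fi

# The dictionary path matches the O= argument given to CreateSequenceDictionary
if needs_build "${GENOME%.fa}.dict"; then
    echo "java -jar /usr/local/share/picard-tools/CreateSequenceDictionary.jar R=$GENOME O=${GENOME%.fa}.dict"
fi
```

Printing instead of executing keeps the check safe to try; remove the `echo` wrappers to run the commands for real.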
    18 '''Validate VCF file or known variants''' (with GATK's [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_ValidateVariants.html|ValidateVariants]])
     18 3 - '''Validate VCF file or known variants''' (with GATK's [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_ValidateVariants.html|ValidateVariants]])
    1919  * java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T ValidateVariants -R /path/to/genome/genome.fa --variant:VCF SNPs_from_NCBI.sorted.vcf \\
    2020Respond to errors (by correcting or removing problematic variants), run command again, etc., until validation is successful. \\
     
    2222
    2323\\
    24 '''Align reads to genome with [[http://bio-bwa.sourceforge.net/bwa.shtml|bwa]]'''
     24 4 - '''Align reads to genome with [[http://bio-bwa.sourceforge.net/bwa.shtml|bwa]]'''
    2525  * bsub "bwa aln /path/to/genome/bwa/genome Reads_1.fq > Reads_1.sai"
    2626  * bsub "bwa samse /path/to/genome/bwa/genome Reads_1.sai  Reads_1.fq > Reads_1.bwa.sam"
    2727\\
    28 '''Convert SAM to BAM, sort, and index''' with BaRC's streamlined [[http://samtools.sourceforge.net/samtools.shtml|samtools]] commands
     28 5 - '''Convert SAM to BAM, sort, and index''' with BaRC's streamlined [[http://samtools.sourceforge.net/samtools.shtml|samtools]] commands
    2929  * bsub /nfs/BaRC_Public/BaRC_code/Perl/SAM_to_BAM_sort_index/SAM_to_BAM_sort_index.pl Reads_1.bwa.sam
    3030\\
    31 '''Mark duplicates''' (multiple identical reads mapped to the same location) \\
     31 6 - '''Mark duplicates''' (multiple identical reads mapped to the same location) \\
    3232Run Picard Tools' [[http://picard.sourceforge.net/command-line-overview.shtml#MarkDuplicates|MarkDuplicates]] on each sample \\
    3333May need "VALIDATION_STRINGENCY=LENIENT" if you get  \\
     
    3535  * bsub java -jar /usr/local/share/picard-tools/MarkDuplicates.jar I=Reads_1.bwa.sorted.bam O=Reads_1.bwa.dedup.bam M=Reads_1.bwa.dedup.txt VALIDATION_STRINGENCY=LENIENT
    3636\\
    37 '''Add Read Group header information to each BAM file''' (or GATK won't let you continue) \\
     37 7 - '''Add Read Group header information to each BAM file''' (or GATK won't let you continue) \\
    3838Run Picard Tools' [[http://picard.sourceforge.net/command-line-overview.shtml#AddOrReplaceReadGroups|AddOrReplaceReadGroups]] on each sample. \\
    3939Specify RGSM (Read Group sample), RGLB (Read Group Library), RGPL (Read Group platform), and RGPU (Read Group platform unit [e.g. run barcode])
    4040  * bsub java -jar /usr/local/share/picard-tools/AddOrReplaceReadGroups.jar I=Reads_1.bwa.dedup.bam O=Reads_1.bwa.dedup.good.bam RGSM=My_sample RGLB=My_project RGPL=illumina RGPU=none VALIDATION_STRINGENCY=LENIENT
    4141\\
    42 '''Index BAM file(s)''' with [[http://samtools.sourceforge.net/samtools.shtml|samtools]] (optional; for IGV viewing)
     42 8 - '''Index BAM file(s)''' with [[http://samtools.sourceforge.net/samtools.shtml|samtools]] (optional; for IGV viewing)
    4343  * bsub samtools index Reads_1.bwa.dedup.good.bam
    4444\\
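Since the duplicate-marking, read-group, and indexing steps run "on each sample", the file names can be derived from a sample prefix. A minimal sketch that only prints the commands shown above for each prefix; the function name `per_sample_commands`, the second sample `Reads_2`, and the use of the prefix as RGSM are assumptions for illustration.

```shell
#!/bin/sh
# Print the per-sample commands for duplicate marking, read groups, and BAM indexing.
# Nothing is submitted; the commands are echoed so they can be reviewed first.
PICARD=/usr/local/share/picard-tools

per_sample_commands() {
    s=$1   # sample prefix, e.g. Reads_1
    echo "bsub java -jar $PICARD/MarkDuplicates.jar I=$s.bwa.sorted.bam O=$s.bwa.dedup.bam M=$s.bwa.dedup.txt VALIDATION_STRINGENCY=LENIENT"
    # Assumption: the sample prefix doubles as the read group sample name (RGSM)
    echo "bsub java -jar $PICARD/AddOrReplaceReadGroups.jar I=$s.bwa.dedup.bam O=$s.bwa.dedup.good.bam RGSM=$s RGLB=My_project RGPL=illumina RGPU=none VALIDATION_STRINGENCY=LENIENT"
    echo "bsub samtools index $s.bwa.dedup.good.bam"
}

# Reads_2 is a hypothetical second sample
for sample in Reads_1 Reads_2; do
    per_sample_commands "$sample"
done
```

Each command reads the previous command's output file, so the three jobs for one sample need to run in order rather than being submitted all at once.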
    45 '''Run Indel Realignment''' (with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_indels_RealignerTargetCreator.html|RealignerTargetCreator]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_indels_IndelRealigner|IndelRealigner]]) \\
     45 9 - '''Run Indel Realignment''' (with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_indels_RealignerTargetCreator.html|RealignerTargetCreator]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_indels_IndelRealigner|IndelRealigner]]) \\
    4646  * Example 1: java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T RealignerTargetCreator -R human.fasta -I original.bam -known indels.vcf -o realigner.intervals \\
    4747  * Example 2: java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T IndelRealigner -R human.fasta -I original.bam -known indels.vcf -targetIntervals realigner.intervals -o realigned.bam \\
     
    4949  * java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T IndelRealigner -R /path/to/genome/genome.fa -I Reads_1.bwa.dedup.good.bam -targetIntervals Reads_1.realigner.intervals -o Reads_1.bwa.dedup.realigned.bam --fix_misencoded_quality_scores
    5050\\
    51 '''Run Base Recalibration''' ([[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_bqsr_BaseRecalibrator.html|BaseRecalibrator]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_PrintReads.html|PrintReads]]) \\
     51 10 - '''Run Base Recalibration''' ([[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_bqsr_BaseRecalibrator.html|BaseRecalibrator]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_PrintReads.html|PrintReads]]) \\
    5252  * Example 1: java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R human.fasta -I realigned.bam -knownSites dbsnp137.vcf -knownSites gold.standard.indels.vcf -o recal.table
    5353  * Example 2: java -jar GenomeAnalysisTK.jar -T PrintReads -R human.fasta -I realigned.bam -BQSR recal.table -o recal.bam \\
     
    6363
    6464\\
    65 '''Compress BAM with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_compression_reducereads_ReduceReads.html|ReduceReads]]''' [Optional] \\
     65 11 - '''Compress BAM with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_compression_reducereads_ReduceReads.html|ReduceReads]]''' [Optional] \\
    6666  * Example 1: java -jar GenomeAnalysisTK.jar -T ReduceReads -R human.fasta -I recal.bam -o reduced.bam
    6767  * java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T ReduceReads -R /path/to/genome/genome.fa -I Reads_1.bwa.dedup.realigned.recal.bam -o Reads_1.bwa.dedup.realigned.recal.reduced.bam
    6868\\
    69 '''Finally -- Call variants''' \\
     69 12 - '''Finally -- Call variants''' \\
    7070Run [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_haplotypecaller_HaplotypeCaller.html|HaplotypeCaller]] ("The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper.")
    7171  * Example: java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human.fasta -I input.bam -o output.vcf -stand_call_conf 30 -stand_emit_conf 10 -minPruning 3
     
    7676  * java -jar /usr/local/gatk/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /nfs/genomes/a.thaliana_TAIR_10/fasta_whole_genome/TAIR10.fa -I Reads_1.bwa.dedup.realigned.recal.reduced.bam --dbsnp SNPs_from_NCBI.sorted.vcf -o Reads_1.bwa.raw.snps.indels.UnifiedGenotyper.vcf -stand_call_conf 30 -stand_emit_conf 10
    7777\\
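Once the caller finishes, a quick sanity check is to count the variant records in the output VCF (every line that is not a `#` header). A minimal sketch; `count_variants` is a hypothetical helper name, and the file name is the UnifiedGenotyper output from the command above.

```shell
#!/bin/sh
# Count variant records in a VCF: all lines that do not start with '#'.
count_variants() {
    grep -vc '^#' "$1"
}

VCF=Reads_1.bwa.raw.snps.indels.UnifiedGenotyper.vcf
# Only report when the file is present in the current directory
if [ -f "$VCF" ]; then
    echo "$VCF: $(count_variants "$VCF") variant records"
fi
```

A count of zero, or one far from what is expected for the sample, is a hint to recheck the earlier steps before moving on.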
    78 '''Run Variant Quality Score Recalibration''' ("VQSR", with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_VariantRecalibrator.html|VariantRecalibrator]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_ApplyRecalibration.html|ApplyRecalibration]]) \\ \\
    79 '''Run Genotype Phasing and Refinement''' \\ \\
    80 '''Run Functional Annotation''' ([[http://snpeff.sourceforge.net/SnpEff.html|snpEff]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_VariantAnnotator.html|VariantAnnotator]] [which "parses output from snpEff into a simpler format that is more useful for analysis"])
     78 13 - '''Run Variant Quality Score Recalibration''' ("VQSR", with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_VariantRecalibrator.html|VariantRecalibrator]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_ApplyRecalibration.html|ApplyRecalibration]]) \\ \\
     79 14 - '''Run Genotype Phasing and Refinement''' \\ \\
     80 15 - '''Run Functional Annotation''' ([[http://snpeff.sourceforge.net/SnpEff.html|snpEff]] and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_VariantAnnotator.html|VariantAnnotator]] [which "parses output from snpEff into a simpler format that is more useful for analysis"])
    8181  * Example 1: java -jar snpEff.jar eff -v -onlyCoding true -i vcf -o gatk GRCh37.64 input.vcf > output.vcf
    8282  * Example 2: java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R human.fasta -A SnpEff --variant original.vcf --snpEffFile snpEff_output.vcf -o annotated.vcf
    8383
    84 '''Analyze variant calls''' (with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_CombineVariants|CombineVariants]], [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_SelectVariants.html|SelectVariants]], and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_varianteval_VariantEval.html|VariantEval]]) \\ \\
     84 16 - '''Analyze variant calls''' (with [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_CombineVariants|CombineVariants]], [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_SelectVariants.html|SelectVariants]], and [[http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_varianteval_VariantEval.html|VariantEval]]) \\ \\
    8585