Changes between Version 1 and Version 2 of SOPs/RRBS


Timestamp:
05/19/22 15:00:28 (3 years ago)
Author:
twhitfie
Comment:

--

  • SOPs/RRBS

    v1 v2  
    88
    99  * **QC**
    10     * Use [https://www.bioinformatics.babraham.ac.uk/projects/trim_galore Trim Galore] or another read trimmer to apply quality filters and remove adapters.
    11     * See our [http://barcwiki.wi.mit.edu/wiki/SOPs/mapping mapping SOP] to search for details on running Trim Galore.
    12     * When assessing transcription of TEs, it is ''essential'' to include multi-mapping reads.  When using STAR, in particular, the winAnchorMultimapNmax and outFilterMultimapNmax flags are used to control multimapping, with the former setting a lower bound on how many loci must have a matching seed and the latter defining the upper bound on how many loci a read maps to in order to report it.  A command for STAR mapping paired end reads using gzipped fastq input can look like:
     10    * Use [https://www.bioinformatics.babraham.ac.uk/projects/trim_galore Trim Galore] or [http://barcwiki.wi.mit.edu/wiki/SOPs/qc_shortReads another] read trimmer to apply quality filters and remove adapters.
     11    * See our [http://barcwiki.wi.mit.edu/wiki/SOPs/qc_shortReads QC and preprocessing guidelines] for details on running Trim Galore.
     12    * A command for running Trim Galore on paired-end reads with gzipped FASTQ input can look like:
    1313
    1414{{{
    15 bsub STAR --genomeDir /path/to/STAR/index/for/organism --readFilesIn /path/to/reads_1.fastq.gz /path/to/reads_2.fastq.gz --outFileNamePrefix somePrefix --sjdbScore 2 --runThreadN 8 --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --winAnchorMultimapNmax 100 --outFilterMultimapNmax 100
     15bsub trim_galore --paired --rrbs --fastqc -o trimmedReads /path/to/raw/data/reads_1.fq.gz /path/to/raw/data/reads_2.fq.gz
    1616}}}
    1717
    18   * **Quantification of raw counts**
    19     * TEtranscripts uses the BAM file(s) of aligned reads (from STAR in this example) as input.
    20     * TEtranscripts relies on separate gene annotation files (GTFs) for genes and TEs.  A curated collection of TE GTFs can be found [https://www.dropbox.com/sh/1ppg2e0fbc64bqw/AACUXf-TA1rnBIjvykMH2Lcia?dl=0 here] and in the genome resources on the cluster at the Whitehead Institute (for human, mouse and fly).
    21     * Before assigning reads, it is important to know whether they are stranded (see the '''Quantification of raw counts''' section of our [http://barcwiki.wi.mit.edu/wiki/SOPs/rna-seq-diff-expressions best practices] page for details on how to determine this).
    22     * The best way to use the resources on the cluster to assign reads to genes and TEs is by running TEcount separately on each experiment (reverse stranded reads are shown in the example below, for forward stranded reads use --stranded forward and for unstranded reads use --stranded no (the default)):
     18  * **Quantification of methylation calls**
     19    * Bismark produces BAM file(s) of aligned reads and methylation calls.
     20 
     21{{{
     22bsub /path/to/bismark/bismark --genome /nfs/genomes/mouse_mm10_dec_11_no_random/bowtie/ -1 trimmedReads_1.fq.gz -2 trimmedReads_2.fq.gz
     23}}}
     24
     25    * Comment.
     26    * Comment.
     27
     28  * **Extract methylation calls**
     29    * After running Bismark...
     30    * For paired-end data...:
    2331
    2432{{{
    25 # Reverse stranded reads
    26 bsub TEcount --sortByPos --format BAM --stranded reverse -b /path/to/alignment.bam --GTF /path/to/gene.gtf --TE /path/to/TE.gtf --mode multi --project projectName -i 100
     33bsub /path/to/bismark/bismark_methylation_extractor -p --gzip --bedGraph trimmedReads_bismark_bt2_pe.bam
    2734}}}
    28 
    29     * The --sortByPos flag is necessary here because this was the sorting used in the STAR mapping, above.
    30     * The -i 100 (default) flag sets the maximum number of expectation-maximization iterations used to compute maximum likelihood estimates of counts for repetitive elements.
    31 
    32 * **Assessing differential expression for genes and TEs**
    33     * After running TEcount on each sample in your experiment, the reported counts (i.e. a list of raw counts per gene/TE for each sample) can be combined into a counts matrix and analyzed following the steps outlined in the '''Statistics for differential expression''', '''Identifying differentially expressed genes''' and '''Accounting for a batch effect in a differential expression model''' sections of our [http://barcwiki.wi.mit.edu/wiki/SOPs/rna-seq-diff-expressions best practices] page.
    34     * If the number of samples is not too large, the counting and analysis of differential expression can be carried out using a single execution of TEtranscripts ''instead'' of using TEcount (reverse stranded reads are shown in the example below, for forward stranded reads use --stranded forward and for unstranded reads use --stranded no (the default)):
    35 
    36 {{{
    37 # Reverse stranded reads
    38 bsub TEtranscripts --format BAM --stranded reverse -t /path/to/treat1.bam /path/to/treat2.bam -c /path/to/control1.bam /path/to/control2.bam --GTF /path/to/gene.gtf --TE /path/to/TE.gtf --mode multi --project treat_vs_control --minread 1 -i 100 --padj 0.05 --norm DESeq_default --sortByPos
    39 }}}
    40     * The design for tests of differential expression above is a comparison between two biological contexts (e.g. treatment versus control, samples listed after the -t flag versus samples listed after the -c flag).  If your experimental design is more complex, you should use TEcount with a subsequent custom analysis of differential expression.
    41     * The output from TEtranscripts includes tests of differential expression carried out using [http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html DESeq2], in addition to a (sample x transcript) matrix of counts.
    42  * **Alternative software**
    43    * [https://github.com/nerettilab/RepEnrich2 RepEnrich2]
    44       * Steven W Criscione, Yue Zhang, William Thompson, John M Sedivy & Nicola Neretti [[https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-583|Transcriptional landscape of repetitive elements in normal and cancer human cells]], ''BMC Genomics'' '''15''', 583 (2014).
    45    * [http://research-pub.gene.com/REdiscoverTEpaper/software/ REdiscoverTE]:
    46       * Yu Kong, Christopher M. Rose, Ashley A. Cass, Alexander G. Williams, Martine Darwish, Steve Lianoglou, Peter M. Haverty, Ann-Jay Tong, Craig Blanchette, Matthew L. Albert, Ira Mellman, Richard Bourgon, John Greally, Suchit Jhunjhunwala & Haiyin Chen-Harris [[https://www.nature.com/articles/s41467-019-13035-2|Transposable element expression in tumors is associated with immune infiltration and increased antigenicity]], ''Nature Communications'' '''10''', 5228 (2019).
     35    * Comment 1
     36    * Comment 2
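
The three v2 steps (trimming, Bismark alignment, methylation extraction) pass files from one to the next, so it can help to see how the names might line up. The following is a minimal dry-run sketch, not part of the SOP itself: it only prints the commands rather than submitting them. The sample name `reads`, the variable names and the `build_commands` helper are hypothetical. The intermediate file names assume Trim Galore's usual paired-end output naming (`_val_1` / `_val_2`) and Bismark's usual paired-end Bowtie 2 naming based on the first read file; check both against what your installed versions actually write.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the three v2 commands for one paired-end RRBS sample.
set -euo pipefail

# Hypothetical placeholders; adjust to your own data and installation.
raw_dir="/path/to/raw/data"
sample="reads"
trim_dir="trimmedReads"
genome_dir="/nfs/genomes/mouse_mm10_dec_11_no_random/bowtie/"
bismark_dir="/path/to/bismark"

# Assumption: Trim Galore --paired writes <input>_val_1.fq.gz and <input>_val_2.fq.gz.
trimmed_1="${trim_dir}/${sample}_1_val_1.fq.gz"
trimmed_2="${trim_dir}/${sample}_2_val_2.fq.gz"

# Assumption: Bismark names the paired-end BAM after the first read file.
bam="${sample}_1_val_1_bismark_bt2_pe.bam"

build_commands() {
  # Step 1: adapter and quality trimming in RRBS mode, with FastQC reports.
  echo "bsub trim_galore --paired --rrbs --fastqc -o ${trim_dir} ${raw_dir}/${sample}_1.fq.gz ${raw_dir}/${sample}_2.fq.gz"
  # Step 2: bisulfite alignment of the trimmed read pairs.
  echo "bsub ${bismark_dir}/bismark --genome ${genome_dir} -1 ${trimmed_1} -2 ${trimmed_2}"
  # Step 3: extract methylation calls from the paired-end BAM.
  echo "bsub ${bismark_dir}/bismark_methylation_extractor -p --gzip --bedGraph ${bam}"
}

build_commands
```

Because `bsub` returns as soon as a job is queued, each printed command should only be submitted once the previous job has finished and its output files exist.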