Changes between Version 65 and Version 66 of SOPs/rna-seq-diff-expressions


Timestamp:
09/02/25 09:37:53 (6 weeks ago)
Author:
gbell

  • SOPs/rna-seq-diff-expressions

    v65 v66  
    7575
    7676#For PE reads the bam files need to be sorted by name (default for htseq-count), e.g.
    77 #bsub "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"
     77#sbatch --partition=20 --job-name=sort --mem=32G --wrap "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"
    7878#If the bam file is sorted by coordinate you may try the htseq-count -r option, e.g. -r pos; however,
    7979#this may not always work (htseq-count throws numerous errors).
    80 
    81 #To request a certain amount of memory and a specific node use bsub -R "rusage[mem=50000]" -m NodeName
    8280 
    8381}}}
     
    9593        * To quantify transcripts and genes in a GTF file, use a command like
    9694{{{
    97 bsub cufflinks -G gene_models.gtf accepted_hits.bam
     95sbatch --partition=20 --job-name=cuff --mem=32G --wrap "cufflinks -G gene_models.gtf accepted_hits.bam"
    9896}}}
    9997        * To quantify transcripts, potentially novel, annotated by cufflinks, use a command like
    10098{{{
    101 bsub cufflinks accepted_hits.bam
     99sbatch --partition=20 --job-name=cuff --mem=32G --wrap "cufflinks accepted_hits.bam"
    102100}}}
    103101        * Gene-level FPKM values are calculated by taking the sum of all transcript FPKMs for a gene.  As a result, no "gene length" needs to be calculated.
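The summation described above can be sketched in shell. This is only an illustration: it assumes a tab-delimited transcript table with a header row containing `gene_id` and `FPKM` columns (as in a cufflinks `isoforms.fpkm_tracking` file), and the function name `sum_gene_fpkm` is hypothetical.

```shell
# Sum transcript-level FPKMs per gene (illustrative sketch, not part of cufflinks).
# Assumption: the input is tab-delimited with a header naming "gene_id" and "FPKM".
sum_gene_fpkm() {
    awk -F'\t' '
        NR == 1 {
            # Locate the columns by header name so column order is not assumed.
            for (i = 1; i <= NF; i++) {
                if ($i == "gene_id") g = i
                if ($i == "FPKM") f = i
            }
            next
        }
        { total[$g] += $f }   # add this transcript to its gene total
        END { for (id in total) printf "%s\t%.2f\n", id, total[id] }
    ' "$1" | sort
}

# Example call (file name is an assumption):
#   sum_gene_fpkm isoforms.fpkm_tracking > gene_fpkm_sums.txt
```

The per-gene totals should agree with the FPKM values cufflinks itself reports at the gene level, which makes this a quick sanity check rather than a replacement for that output.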
    104         * NOTE: Some genes, although present in a GTF annotation file, may not get quantified by cufflinks.  This occurs for genes found in very long regions of overlapping genes (which exceed the default value for --max-bundle-length).  When this occurs, the standard err output of cufflinks (contained in the long LSF email when cufflinks is run via 'bsub') will contain the message "Warning: Skipping large bundle."  To correct this (or prevent it in the first place), add an argument like '--max-bundle-length 10000000' to the cufflinks command. You may want to compare the list of genes in the GTF file to that of the cufflinks output to verify that they match.
     102        * NOTE: Some genes, although present in a GTF annotation file, may not get quantified by cufflinks.  This occurs for genes found in very long regions of overlapping genes (which exceed the default value for --max-bundle-length).  When this occurs, the standard error output of cufflinks (contained in the Slurm job output file when cufflinks is run via 'sbatch') will contain the message "Warning: Skipping large bundle."  To correct this (or prevent it in the first place), add an argument like '--max-bundle-length 10000000' to the cufflinks command. You may want to compare the list of genes in the GTF file to that of the cufflinks output to verify that they match.
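One way to do the gene-list comparison suggested in the note is sketched below. It assumes the GTF carries `gene_id "..."` attributes and that the cufflinks gene table is tab-delimited with a header containing a `gene_id` column (as in `genes.fpkm_tracking`); the function name `missing_genes` is hypothetical.

```shell
# Print gene IDs found in the GTF but absent from the cufflinks gene table
# (illustrative sketch; file layouts are assumptions stated in the text above).
missing_genes() {
    gtf=$1        # annotation given to cufflinks with -G
    tracking=$2   # gene-level table written by cufflinks
    ids_gtf=$(mktemp)
    ids_cuff=$(mktemp)
    # Extract every gene_id "..." attribute value from the GTF.
    grep -o 'gene_id "[^"]*"' "$gtf" | cut -d'"' -f2 | sort -u > "$ids_gtf"
    # Extract the gene_id column from the table, located by header name.
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "gene_id") c = i; next }
                { print $c }' "$tracking" | sort -u > "$ids_cuff"
    # comm -23 keeps lines that appear only in the first sorted list.
    comm -23 "$ids_gtf" "$ids_cuff"
    rm -f "$ids_gtf" "$ids_cuff"
}

# Example call (file names are assumptions):
#   missing_genes gene_models.gtf genes.fpkm_tracking
```

An empty result means every annotated gene appears in the output; any IDs printed are candidates for the skipped-bundle problem.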
    105103        * If you only want to quantify genes in your GTF file, use the -G option (instead of -g, which also reports transcripts found by Cufflinks and takes counts away from transcripts in your GTF file).
    106104{{{