Changes between Version 65 and Version 66 of SOPs/rna-seq-diff-expressions
Timestamp: 09/02/25 09:37:53 (6 weeks ago)
--- SOPs/rna-seq-diff-expressions (v65)
+++ SOPs/rna-seq-diff-expressions (v66)
@@ -75,9 +75,7 @@
 
 #For PE reads the bam files needs to be sorted by name (default for htseq-count), eg.
-# bsub"samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"
+#sbatch --partition=20 --job-name=sort --mem=32G --wrap "samtools sort -n -o accepted_hits.sortedByName.bam -m 5G -O bam -T temp accepted_hits.bam"
 #If the bam file is sorted by coordinate you may try htseq-count -r option, eg. -r pos , however,
 #this may not always work (htseq-count throws numerous errors).
-
-#To request a certain amount of memory and a specific node use bsub -R "rusage[mem=50000]" -m NodeName
 
 }}}
@@ -95,12 +93,12 @@
 * To quantify transcripts and genes in a GTF file, use a command like
 {{{
-bsub cufflinks -G gene_models.gtf accepted_hits.bam
+sbatch --partition=20 --job-name=cuff --mem=32G --wrap "cufflinks -G gene_models.gtf accepted_hits.bam"
 }}}
 * To quantify transcripts, potentially novel, annotated by cufflinks, use a command like
 {{{
-bsub cufflinks accepted_hits.bam
+sbatch --partition=20 --job-name=cuff --mem=32G --wrap "cufflinks accepted_hits.bam"
 }}}
 * Gene-level FPKM values are calculated by taking the sum of all transcript FPKMs for a gene. As a result, no "gene length" needs to be calculated.
-* NOTE: Some genes, although present in a GTF annotation file, may not get quantified by cufflinks. This occurs for genes found in very long regions of overlapping genes (which exceed the default value for --max-bundle-length). When this occurs, the standard err output of cufflinks (contained in the long LSF email when cufflinks is run via 'bsub') will contain the message "Warning: Skipping large bundle." To correct this (or prevent it in the first place), add an argument like '--max-bundle-length 10000000' to the cufflinks command. You may want to compare the list of genes in the GTF file to that of the cufflinks output to verify that they match.
+* NOTE: Some genes, although present in a GTF annotation file, may not get quantified by cufflinks. This occurs for genes found in very long regions of overlapping genes (which exceed the default value for --max-bundle-length). When this occurs, the standard err output of cufflinks (contained in the long slurm output when cufflinks is run via 'sbatch') will contain the message "Warning: Skipping large bundle." To correct this (or prevent it in the first place), add an argument like '--max-bundle-length 10000000' to the cufflinks command. You may want to compare the list of genes in the GTF file to that of the cufflinks output to verify that they match.
 * If you only want to quantify genes in your GTF file use the -G option (instead of -g which will give also transcripts found by Cufflinks and will take away counts from transcripts in your gtf file).
 {{{
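The NOTE in this change suggests comparing the genes in the GTF file with those in the cufflinks output. One way that check might be sketched in shell is shown here. It is an illustration, not part of the SOP: the function names are hypothetical, and it assumes the cufflinks gene table is a tab-delimited file (commonly named genes.fpkm_tracking) whose header row names a gene_id column, and that each GTF record carries a gene_id "..." attribute in column 9.

```shell
#!/bin/sh
# Sketch: list gene IDs that appear in a GTF file but not in the cufflinks
# gene table. Function names are hypothetical; the "gene_id" header column
# in the tracking file is an assumption about the cufflinks output layout.

# Print the unique gene_id values found in column 9 of a GTF file.
gtf_gene_ids() {
    awk -F'\t' '
        /^#/ { next }                          # skip comment lines
        match($9, /gene_id "[^"]+"/) {
            # 9 characters precede the ID (gene_id, a space, a quote);
            # one closing quote follows it.
            print substr($9, RSTART + 9, RLENGTH - 10)
        }' "$1" | LC_ALL=C sort -u
}

# Print the unique gene_id values from a tab-delimited tracking file,
# locating the column by its header name rather than by position.
tracking_gene_ids() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "gene_id") col = i; next }
        col > 0 { print $col }' "$1" | LC_ALL=C sort -u
}

# Print gene IDs present in the GTF ($1) but absent from the tracking file ($2).
missing_genes() {
    tmp_gtf=$(mktemp) || return 1
    tmp_trk=$(mktemp) || { rm -f "$tmp_gtf"; return 1; }
    gtf_gene_ids "$1" > "$tmp_gtf"
    tracking_gene_ids "$2" > "$tmp_trk"
    LC_ALL=C comm -23 "$tmp_gtf" "$tmp_trk"    # lines unique to the GTF list
    rm -f "$tmp_gtf" "$tmp_trk"
}

# Run only when two file arguments are supplied, e.g.
#   sh missing_genes.sh gene_models.gtf genes.fpkm_tracking
if [ "$#" -eq 2 ]; then
    missing_genes "$1" "$2"
fi
```

Empty output would suggest every annotated gene was quantified. Any IDs printed are candidates for the "Skipping large bundle" case described in the NOTE, which can be confirmed by searching the slurm output for that warning.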