Changes between Initial Version and Version 1 of SOPs/coordinates


Timestamp: 01/23/13 16:49:43
Author: trac
 * Detailed format descriptions can be found at http://genome.ucsc.edu/goldenPath/help/customTrack.html
 * For help choosing a data format, see http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
 * samtools reference: http://samtools.sourceforge.net/
 * samtools and bedtools are installed on tak
 * Other scripts can be found in the BaRC_Public/BaRC_code folder (/nfs/BaRC_Public/BaRC_code)

=== Convert text file to wig ===

{{{
  Sample command:
   txt2wig.pl foo.txt trackName(one word) > foo.wig
}}}

=== Convert bed to wig ===

{{{
   Sample command:
   bed2wig.pl inputBed sampleName(one word) probeWidth > outputWig
   Note: This assumes that the probe width is the same in all records.
         If the probe width is not constant, use bedGraph format instead.
         To convert bed to bedGraph format, change the track type to bedGraph (type=bedGraph in the track line) and subtract 1 from the chromosome end position of each bed record.
}}}
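
The internals of bed2wig.pl are not documented here. The following is a minimal Python sketch of one way to do the same conversion, not the script itself. It assumes a tab-delimited bed file with the value in column 5 (the bed score). The function name `bed_to_variable_step` is hypothetical. It writes variableStep wig with `span` set to the probe width. Wig positions are 1-based, so each 0-based bed start is shifted by 1.

{{{#!python
from typing import Iterable, List


def bed_to_variable_step(bed_lines: Iterable[str], track_name: str, probe_width: int) -> List[str]:
    """Convert bed records with a constant probe width to variableStep wig lines.

    Assumes tab-delimited bed with the value in column 5 (the bed score).
    """
    out = [f"track type=wiggle_0 name={track_name}"]
    current_chrom = None
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank, header and comment lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, value = fields[0], int(fields[1]), int(fields[2]), fields[4]
        if end - start != probe_width:
            raise ValueError(f"probe width is not constant: {line!r}")
        if chrom != current_chrom:
            # each run of records on one chromosome gets its own declaration line
            out.append(f"variableStep chrom={chrom} span={probe_width}")
            current_chrom = chrom
        # bed starts are 0-based; wig positions are 1-based
        out.append(f"{start + 1}\t{value}")
    return out
}}}

For example, `bed_to_variable_step(open("foo.bed"), "sample1", 25)` returns the wig lines, which can then be written to foo.wig. A record of a different width raises an error instead of being silently mis-scaled.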

=== Convert wig to bed ===

{{{
  Sample command for variableStep wig format:
   wig2bed.pl inputWig sampleName(one word) > outputBed

  Sample command for fixedStep wig format:
   wig2bed_fixedStep.pl inputWig > outputBed
}}}
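
The exact output of wig2bed.pl and wig2bed_fixedStep.pl is not described here. As a rough illustration, this Python sketch reads both declaration styles and turns each data value into a bed interval. The function name `wig_to_bed` and the (chrom, start, end, value) tuple output are assumptions. In a variableStep line the position is given explicitly. In fixedStep it is derived from `start` and `step`. Both are 1-based, so 1 is subtracted to get the 0-based bed start.

{{{#!python
from typing import Iterable, List, Tuple

BedRecord = Tuple[str, int, int, float]


def _parse_declaration(line: str) -> dict:
    """Turn 'fixedStep chrom=chr1 start=11 step=10 span=5' into a dict of key=value pairs."""
    return dict(token.split("=", 1) for token in line.split()[1:])


def wig_to_bed(wig_lines: Iterable[str]) -> List[BedRecord]:
    """Convert variableStep or fixedStep wig lines to (chrom, start, end, value) records."""
    records: List[BedRecord] = []
    mode = chrom = None
    span = step = 1
    next_pos = 0  # 1-based position of the next fixedStep value
    for raw in wig_lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            params = _parse_declaration(line)
            mode = line.split()[0]
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                step = int(params["step"])
                next_pos = int(params["start"])
            continue
        if mode == "variableStep":
            pos_str, value_str = line.split()
            pos = int(pos_str)
        elif mode == "fixedStep":
            pos, value_str = next_pos, line
            next_pos += step
        else:
            raise ValueError(f"data line before any declaration: {line!r}")
        # wig positions are 1-based; bed starts are 0-based and bed ends are exclusive
        records.append((chrom, pos - 1, pos - 1 + span, float(value_str)))
    return records
}}}

Writing each record as a tab-joined line gives a plain bed file. Keeping the records as tuples makes it easy to check the coordinate shift before writing anything.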

=== Convert wig to bigwig ===

{{{
  Sample commands:
  Get chromosome lengths:
   fetchChromSizes hg18 > chrSize.txt
  Convert wig to bigWig:
   wigToBigWig foo.wig chrSize.txt foo.bw
}}}
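
=== Convert bed to bigbed ===

{{{
  Sample commands:
  Get chromosome lengths:
   fetchChromSizes hg18 > chrSize.txt
  Sort the bed file by chromosome, then by start position (bedToBigBed requires sorted input):
   sort -k1,1 -k2,2n foo.bed > foo.sorted.bed
  Convert bed to bigBed:
   bedToBigBed foo.sorted.bed chrSize.txt foo.bb
}}}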
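
bedToBigBed expects its input sorted as `sort -k1,1 -k2,2n` would sort it: by chromosome name, then numerically by start. The Python sketch below does that ordering and also reads a chrSize.txt-style file to catch records on unknown chromosomes or past a chromosome's end before the conversion runs. The function names are hypothetical, and skipping track/browser lines is a choice of this sketch. Python's string comparison matches the C-locale byte order, so "chr10" sorts between "chr1" and "chr2".

{{{#!python
from typing import Dict, Iterable, List


def read_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse a chrSize.txt-style file: <chromName><TAB><chromSize> per line."""
    sizes = {}
    for line in lines:
        if line.strip():
            name, size = line.split()[:2]
            sizes[name] = int(size)
    return sizes


def sort_bed_for_bigbed(bed_lines: Iterable[str], chrom_sizes: Dict[str, int]) -> List[str]:
    """Sort bed records by chromosome name, then numerically by start.

    Records on unknown chromosomes or extending past the chromosome end raise ValueError.
    """
    records = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # header and comment lines are dropped in this sketch
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in chrom_sizes:
            raise ValueError(f"{chrom} is not in the chromosome sizes file")
        if end > chrom_sizes[chrom]:
            raise ValueError(f"record ends past the end of {chrom}: {line!r}")
        records.append((chrom, start, line.rstrip("\n")))
    records.sort(key=lambda rec: (rec[0], rec[1]))  # numeric start, not lexical
    return [rec[2] for rec in records]
}}}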

=== Convert BAM to bedGraph for UCSC genome browser ===
{{{
  To view BAM files on the UCSC browser, both foo.sorted.bam and foo.sorted.bam.bai must be on an http or ftp server. One way around this is to convert BAM files into bedGraph files, which are usually small enough to be uploaded directly.
   genomeCoverageBed -split -bg -ibam foo.sorted.bam -g hg19.genome > foo.bedGraph
   where the hg19.genome file is tab delimited and structured as follows:
       <chromName><TAB><chromSize>
       chr1    249250621
   One can use the UCSC Genome Browser's MySQL database to extract chromosome sizes. For example, for H. sapiens (-N omits the column-name header line):
       mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select chrom, size from hg19.chromInfo" > hg19.genome
}}}
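
With -bg, genomeCoverageBed reports the genome as runs of constant, non-zero read depth. With -split, a spliced read contributes only its aligned blocks, not the intron it spans. The Python sketch below illustrates that idea with a sweep over block start/end events. It is not bedtools' implementation. It assumes the reads have already been cut into aligned blocks (for example from their CIGAR strings), and it orders chromosomes by name. The function name is hypothetical.

{{{#!python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Block = Tuple[str, int, int]  # (chrom, 0-based start, exclusive end)


def coverage_bedgraph(blocks: Iterable[Block]) -> List[Tuple[str, int, int, int]]:
    """Report runs of constant, non-zero depth as (chrom, start, end, depth) records."""
    events: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in blocks:
        events[chrom][start] += 1  # depth rises where a block begins
        events[chrom][end] -= 1    # and falls where it ends (end is exclusive)
    records = []
    for chrom in sorted(events):  # name order is an assumption of this sketch
        depth = 0
        run_start = 0
        for pos in sorted(events[chrom]):
            delta = events[chrom][pos]
            if delta == 0:
                continue  # one block ends where another begins: depth is unchanged
            if depth > 0:
                records.append((chrom, run_start, pos, depth))
            depth += delta
            run_start = pos
    return records
}}}

Zero-depth stretches produce no records, which is one reason bedGraph output is much smaller than per-base coverage.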

=== Convert BAM to bigwig ===
{{{
  Step 1: Convert BAM to bedGraph format:
   genomeCoverageBed -split -bg -ibam accepted_hits.bam -g /nfs/genomes/mouse_gp_jul_07/anno/mm9.size > accepted_hits.bedGraph

  Step 2: Convert bedGraph to bigWig format:
   bedGraphToBigWig accepted_hits.bedGraph /nfs/genomes/mouse_gp_jul_07/anno/mm9.size accepted_hits.bw
   where the mm9.size file is tab delimited and structured as follows:
       <chromName><TAB><chromSize>
}}}
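
bedGraphToBigWig's usage message asks for input sorted with `sort -k1,1 -k2,2n`. Records should also not overlap and should stay within the chromosome sizes in mm9.size. The Python sketch below checks a bedGraph for these problems before conversion and returns a list of messages. The function name and the message wording are hypothetical, and only the first four columns are read.

{{{#!python
from typing import Dict, Iterable, List


def check_bedgraph(lines: Iterable[str], chrom_sizes: Dict[str, int]) -> List[str]:
    """Return problems that would make a bedGraph unsuitable for bedGraphToBigWig.

    Checks: known chromosome, end within chromosome size, sorted order
    (chromosome name, then numeric start), and no overlapping records.
    """
    problems = []
    prev_chrom, prev_start, prev_end = None, -1, -1
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start_s, end_s, _value = line.split()[:4]
        start, end = int(start_s), int(end_s)
        if chrom not in chrom_sizes:
            problems.append(f"line {lineno}: {chrom} not in sizes file")
            continue
        if end > chrom_sizes[chrom]:
            problems.append(f"line {lineno}: end {end} past {chrom} size {chrom_sizes[chrom]}")
        if prev_chrom is not None and (chrom, start) < (prev_chrom, prev_start):
            problems.append(f"line {lineno}: not sorted")
        elif chrom == prev_chrom and start < prev_end:
            problems.append(f"line {lineno}: overlaps previous record")
        prev_chrom, prev_start, prev_end = chrom, start, end
    return problems
}}}

An empty list means none of these problems were found. If "not sorted" is reported, running the sort command above on the bedGraph is usually enough.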

=== Updating/fixing UCSC GTF file ===

 * GTF files from the UCSC Table Browser use RefSeq IDs (NM_* IDs) for both gene_id and transcript_id, which may not be compatible with some programs (e.g., counting by gene with HTSeq)
 * RefSeq GTF files for some assemblies (such as hg19, hg18, mm9, and dm3) are in /nfs/genomes/, under gtf/ in each species folder. If you would like to create additional files, follow these steps:

{{{
  Step 1: Use the UCSC Table Browser to download RefSeq IDs and gene symbols.
    Use "Genes and Gene Prediction Tracks" for group, "RefSeq Genes" for track, and "refGene" for table. Choose "selected fields from primary and related tables" for output format and click "get output". On the next page, select "name" and "name2" for the fields.
    The output (saved here as refseq2symbol) should look like: NM_017940       NBPF1
  Step 2: Download a GTF file from the UCSC Table Browser.
    This file uses the RefSeq ID as both gene_id and transcript_id, so the gene_id needs to be replaced with the gene symbol.
    Sample command (write to a new file; redirecting to the input file would empty it before it is read):
      /nfs/BaRC_Public/BaRC_code/Perl/fix_gtf_refSeq_ensembl.pl hg19.refgene.gtf refseq2symbol > hg19.refgene.symbol.gtf
  Step 3: About 50-70 genes in the UCSC GTF file are incorrect; they include exons whose start coordinate is larger than the end coordinate.
    Software such as Cufflinks cannot handle this situation and ignores these exons.
    Since this only affects the last 1-3 bases of a transcript, a temporary solution is to remove these records.
      Sample command: awk -F"\t" '{ if ($4 <= $5) print $0 }' hg19.refgene.symbol.gtf > hg19.refgene_new.gtf
}}}
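
What fix_gtf_refSeq_ensembl.pl does internally is not documented here. The following Python sketch shows one way to combine Steps 2 and 3. It rewrites the gene_id value using the refseq2symbol table, keeps the RefSeq ID as transcript_id, and drops records whose start is greater than their end. The function names are hypothetical. Skipping a "#" header line in the Table Browser output is an assumption. IDs missing from the table are left unchanged.

{{{#!python
import re
from typing import Dict, Iterable, List

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def read_refseq2symbol(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Table Browser output with name and name2 columns (RefSeq ID, gene symbol)."""
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip a '#' header line, if present
        refseq_id, symbol = line.split("\t")[:2]
        mapping[refseq_id] = symbol.strip()
    return mapping


def fix_gtf(gtf_lines: Iterable[str], refseq2symbol: Dict[str, str]) -> List[str]:
    """Replace RefSeq gene_id values with gene symbols and drop records with start > end."""
    fixed = []
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a GTF feature line
        if int(fields[3]) > int(fields[4]):
            continue  # the malformed exons described in Step 3
        # only gene_id is rewritten; transcript_id keeps the RefSeq ID
        fields[8] = GENE_ID.sub(
            lambda m: f'gene_id "{refseq2symbol.get(m.group(1), m.group(1))}"',
            fields[8],
            count=1,
        )
        fixed.append("\t".join(fields))
    return fixed
}}}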

 * Ensembl GTF files can be downloaded from ftp://ftp.ensembl.org/pub/current_gtf/


=== Convert bed to gff ===

 * Note that bed and gff use different coordinate conventions: bed is 0-based and half-open (the end position is excluded), while gff is 1-based and fully closed.
 * Use /nfs/BaRC_Public/BaRC_code/Perl/bed2gff/bed2gff.pl

{{{
   USAGE: bed2gff.pl bedFile > gffFile
   Ex: bed2gff.pl foo.bed WIBR exon > foo.gff
}}}
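
The coordinate difference comes down to one rule: the GFF start is the bed start plus 1, and the end stays the same. The Python sketch below applies that rule. It takes a source and feature type like the ones in the example command (WIBR, exon). It is not bed2gff.pl itself. The function name is hypothetical, and placing the bed name in column 9 with "." for missing score and strand are assumptions.

{{{#!python
from typing import Iterable, List


def bed_to_gff(bed_lines: Iterable[str], source: str, feature: str) -> List[str]:
    """Convert bed records to GFF lines.

    bed: 0-based start, end excluded. GFF: 1-based start, end included.
    So the start shifts by +1 and the end stays the same.
    """
    gff = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        score = fields[4] if len(fields) > 4 else "."
        strand = fields[5] if len(fields) > 5 else "."
        # columns: seqname, source, feature, start, end, score, strand, frame, group
        gff.append("\t".join([chrom, source, feature, str(start + 1), str(end), score, strand, ".", name]))
    return gff
}}}

A one-base bed record such as `chr2 9 10` becomes a GFF feature with start and end both equal to 10.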

=== Convert gtf to bed ===

 1. Convert gtf to genePred:
{{{
   gtfToGenePred my.gtf my.genePred
}}}
 2. Convert genePred to bed:
{{{
   awk -f genePredToBed my.genePred > my.bed
}}}

genePredToBed is an awk script by Katrina Learned, downloaded from the UCSC Genome Browser discussion list:
{{{
#!/usr/bin/awk -f

#
# Convert a genePred file (e.g. from gtfToGenePred) to a BED12 file (on stdout).
# genePred and BED both use 0-based, half-open coordinates, so no shift is needed.
#
BEGIN {
    FS="\t";
    OFS="\t";
}
{
    name=$1
    chrom=$2
    strand=$3
    start=$4       # txStart
    end=$5         # txEnd
    cdsStart=$6    # becomes BED thickStart
    cdsEnd=$7      # becomes BED thickEnd
    blkCnt=$8      # exonCount

    # $9 and $10 are comma-separated exonStarts and exonEnds
    delete starts
    split($9, starts, ",");
    delete ends
    split($10, ends, ",");
    blkStarts=""
    blkSizes=""
    for (i = 1; i <= blkCnt; i++) {
        # BED12 block sizes are exon lengths; block starts are relative to txStart
        blkSizes = blkSizes (ends[i]-starts[i]) ",";
        blkStarts = blkStarts (starts[i]-start) ",";
    }

    # score is fixed at 1000 and itemRgb at 0
    print chrom, start, end, name, 1000, strand, cdsStart, cdsEnd, 0, blkCnt, blkSizes, blkStarts
}
}}}

=== Convert blat to gff ===

 * Use /nfs/BaRC_Public/BaRC_code/Perl/blat2gff/blat2gff.pl

{{{
   Convert BLAT output file (PSL format) into GFF format (v1.1 14 Dec 2010)
   blat2gff.pl blatFile dataSource(ex:WIBR) > gffFile
}}}
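
The layout of blat2gff.pl's output is not described here. The following Python sketch shows the coordinate handling such a conversion needs. Each PSL record lists its aligned blocks as `blockSizes` (column 19) and 0-based target starts `tStarts` (column 21). Each block becomes one 1-based, end-inclusive GFF line. The sketch assumes nucleotide (untranslated) alignments, where tStarts lie on the + strand of the target. The function name, the "exon" feature type, using the matches column as the score, and putting the query name in column 9 are all assumptions.

{{{#!python
from typing import Iterable, List


def psl_to_gff(psl_lines: Iterable[str], source: str) -> List[str]:
    """Write one GFF line per aligned block of each PSL record.

    Assumes nucleotide (untranslated) BLAT alignments, where tStarts are
    0-based positions on the + strand of the target.
    """
    gff = []
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip the psLayout header and column-title lines
        strand, q_name, t_name = fields[8], fields[9], fields[13]
        block_sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
        t_starts = [int(x) for x in fields[20].rstrip(",").split(",")]
        score = fields[0]  # PSL 'matches' column, used here as the GFF score
        for size, t_start in zip(block_sizes, t_starts):
            # PSL is 0-based with exclusive ends; GFF is 1-based and inclusive
            gff.append("\t".join([t_name, source, "exon", str(t_start + 1),
                                  str(t_start + size), score, strand[0], ".", q_name]))
    return gff
}}}

Running BLAT with -noHead avoids the header lines altogether. The check on the first column lets this sketch accept either form.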