Changes between Version 46 and Version 47 of SOPs/qc_shortReads


Timestamp:
10/29/18 10:25:21 (6 years ago)
Author:
gbell
Comment:

--

  • SOPs/qc_shortReads

    v46 v47  
    6464  from [[https://en.wikipedia.org/wiki/FASTQ_format | Wikipedia's FASTQ page]]
    6565
     66== Preprocessing read files from NCBI SRA ==
     67
     68**SRA** (for Sequence Read Archive) is an NCBI binary format for short reads.
     69
     70It's thoroughly described in the [[http://www.ncbi.nlm.nih.gov/books/NBK47528/|SRA Handbook]].
     71
     72Processing SRA files requires the [[https://ncbi.github.io/sra-tools/|NCBI SRA Toolkit]], which is installed on our systems.
     73
     74The main command is **fastq-dump <SRA archive file>**, like
     75
     76''**fastq-dump SRR060751.sra**''
     77
     78If your reads are paired, by default the #1 and #2 reads will end up concatenated together in the same file. 
     79To check if the SRA sample has paired reads or not, go to the [https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=run_browser SRA Run browser], enter the sample ID, and look in the table under "Layout".
     80
     81To get paired reads into separate files, use a command like
     82
     83''**fastq-dump --split-files SRR060751.sra**''
     84
     85You can ask for gzipped output instead of plain (uncompressed) fastq:
     86
     87''**fastq-dump --gzip SRR060751.sra**''
     88
     89See [[https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump|Converting SRA format data into FASTQ]] for all program options.
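
The two options above can be combined in one command. The following is a minimal sketch, assuming the layout (SINGLE or PAIRED) has already been looked up in the SRA Run browser. `build_dump_cmd` is a hypothetical helper, not part of the SRA Toolkit; it only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper: print a fastq-dump command for a given layout.
# Usage: build_dump_cmd <SINGLE|PAIRED> <SRA archive file>
build_dump_cmd() {
    layout="$1"
    sra="$2"
    opts="--gzip"
    # Paired reads need --split-files; otherwise #1 and #2 reads are concatenated
    if [ "$layout" = "PAIRED" ]; then
        opts="$opts --split-files"
    fi
    echo "fastq-dump $opts $sra"
}

build_dump_cmd PAIRED SRR060751.sra
build_dump_cmd SINGLE SRR060751.sra
```

Once the printed command looks right, it can be pasted into the shell or run with `eval "$(build_dump_cmd PAIRED SRR060751.sra)"`.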
     90
     91Note that a fastq file is about 4-5x larger than its corresponding SRA file.
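
Because of this size difference, it can help to estimate the space needed before converting. This is a rough sketch that uses the upper end (5x) of the ratio above; `estimate_fastq_bytes` is a hypothetical helper and the factor is only the rule of thumb from this page, not a guarantee.

```shell
# Hypothetical helper: estimate the uncompressed fastq size in bytes
# as 5x the size of the SRA file (upper end of the 4-5x rule of thumb).
# Usage: estimate_fastq_bytes <SRA archive file>
estimate_fastq_bytes() {
    sra_bytes=$(wc -c < "$1")
    echo $((sra_bytes * 5))
}

# Only report if the example file is actually present
if [ -f SRR060751.sra ]; then
    echo "Estimated fastq size (bytes): $(estimate_fastq_bytes SRR060751.sra)"
    # Compare against free space (in KB) in the current directory
    df -Pk . | tail -1
fi
```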
     92
     93fastq-dump can download (fetch) the SRA file itself, or you can download the SRA file directly (e.g. using wget) and then run fastq-dump on it to get the fastq file.  Downloading the SRA file directly avoids having to change the default cache location in your home directory for large files (see below).
     94
     95'''Note:''' As of fastq-dump version 2.8.1, running fastq-dump requires vdb-config to be set up correctly.  By default, the downloaded/cached file is copied to the user's home directory, which is likely to run out of space.  To choose a different location, run:
     96
     97{{{
     98vdb-config --restore-defaults
     99vdb-config -i  # use the interactive interface to enter a different location
     100}}}
     101
     102Manually editing the file $HOME/.ncbi/user-settings.mkfg doesn't seem to work.  See [[https://ncbi.github.io/sra-tools/install_config.html | NCBI SRA Installation/Config]].  Other alternatives: i) symlink the NCBI directory in your home directory to a location with more storage, or ii) download the SRA file directly (e.g. using wget) before using fastq-dump.
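
Alternative i) could look something like the sketch below. `relocate_dir` is a hypothetical helper, and both paths in the commented example are assumptions (the name of the NCBI directory in your home directory and the target on larger storage depend on your setup).

```shell
# Hypothetical helper: move a directory to larger storage and leave a symlink behind.
# Usage: relocate_dir <current dir> <new location>
relocate_dir() {
    src="$1"
    dest="$2"
    # Refuse to act if src is already a symlink or dest already exists
    if [ -L "$src" ] || [ -e "$dest" ]; then
        echo "nothing done: $src is already a link or $dest exists" >&2
        return 1
    fi
    # Create src if it does not exist yet, and the parent of dest
    mkdir -p "$src" "$(dirname "$dest")"
    mv "$src" "$dest"
    ln -s "$dest" "$src"
}

# Example (both paths are assumptions; adjust to your setup):
# relocate_dir "$HOME/ncbi" /path/to/large/storage/ncbi
```

Files already in the directory move with it, and anything written through the old path afterwards lands on the larger storage.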
     103
     104{{{
     105# Option 1: download SRR4090409.sra from SRA yourself (e.g. with wget), then convert it to fastq
     106fastq-dump SRR4090409.sra
     107
     108# Option 2: let fastq-dump download the SRA file and convert it to fastq
     108# (important: the home directory or the vdb-config settings must be set up correctly)
     109fastq-dump SRR4090409
     110}}}
     111
     112
    66113== FastQC ==
    67114