== Preprocessing read files from NCBI SRA ==

**SRA** (Sequence Read Archive) is an NCBI binary format for short reads. It is described in detail in the [[http://www.ncbi.nlm.nih.gov/books/NBK47528/|SRA Handbook]].

Processing SRA files requires the [[https://ncbi.github.io/sra-tools/|NCBI SRA Toolkit]], which is installed on our systems.

The main command is **fastq-dump <SRA archive file>**, for example:

''**fastq-dump SRR060751.sra**''

If your reads are paired, by default the #1 and #2 reads are concatenated in the same file. To write them to separate files, use a command like:

''**fastq-dump --split-files SRR060751.sra**''

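After splitting paired reads, a quick sanity check is to confirm that both mate files contain the same number of reads. The following is a minimal sketch, not part of the SRA Toolkit: ''count_fastq_reads'' and ''mates_match'' are hypothetical helper names, and the check assumes uncompressed fastq with the usual four lines per record.

```shell
# Hypothetical helper: count the reads in an uncompressed fastq file,
# assuming the usual four lines per record.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo "$((lines / 4))"
}

# Hypothetical helper: succeed if both mate files hold the same number of reads.
mates_match() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}
```

Assuming the split output is named with ''_1'' and ''_2'' suffixes, it could be used as ''mates_match SRR060751_1.fastq SRR060751_2.fastq && echo "mate files match"''.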
You can ask for gzipped output instead of plain fastq:

''**fastq-dump --gzip SRR060751.sra**''

See [[https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump|Converting SRA format data into FASTQ]] for all program options.

Note that a fastq file is about 4-5x larger than its corresponding SRA file.

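The 4-5x rule of thumb can be turned into a rough disk space check before converting. This is only a sketch under that assumption: ''estimate_fastq_bytes'' and ''has_room_for_fastq'' are hypothetical helpers, and the real ratio depends on the data and on whether you use gzipped output.

```shell
# Hypothetical helper: print the low (4x) and high (5x) fastq size estimates,
# in bytes, for a local SRA file.
estimate_fastq_bytes() {
    local sra_bytes
    sra_bytes=$(wc -c < "$1")
    echo "$((sra_bytes * 4)) $((sra_bytes * 5))"
}

# Hypothetical helper: succeed if the output directory has room for the
# high (5x) estimate. df -Pk reports available space in 1024-byte blocks.
has_room_for_fastq() {
    local sra_file=$1 out_dir=$2
    local high_bytes avail_kb
    high_bytes=$(estimate_fastq_bytes "$sra_file" | cut -d' ' -f2)
    avail_kb=$(df -Pk "$out_dir" | awk 'NR==2 {print $4}')
    [ "$((avail_kb * 1024))" -ge "$high_bytes" ]
}
```

For example, ''has_room_for_fastq SRR060751.sra . && fastq-dump SRR060751.sra'' would run the conversion only if the current directory appears to have enough free space.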
fastq-dump can download (fetch) the SRA file itself, or you can download the SRA file directly (e.g. using wget) and then run fastq-dump on it to get the fastq file. Downloading the SRA file directly avoids having to move the download location out of your home directory for large files (see below).

'''Note:''' As of fastq-dump version 2.8.1, running fastq-dump requires vdb-config to be set up correctly. By default, the downloaded/cached file is copied to the user's home directory, which is likely to run out of space. To change the location, run:

{{{
vdb-config --restore-defaults
vdb-config -i   # use the interactive mode to enter a different location
}}}

Manually editing the file $HOME/.ncbi/user-settings.mkfg doesn't seem to work. See [[https://ncbi.github.io/sra-tools/install_config.html|NCBI SRA Installation/Config]]. Other alternatives: i) symlink the ncbi directory in your home directory to a location with more storage, or ii) download the SRA file directly (e.g. using wget) before running fastq-dump.

{{{
# Download SRR4090409.sra from SRA first (e.g. with wget), then convert it to fastq
fastq-dump SRR4090409.sra

# Let fastq-dump download the SRA file itself and convert it to fastq
# (important: the home directory or vdb-config must be set up correctly)
fastq-dump SRR4090409
}}}
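The symlink alternative (i) mentioned above could be sketched as follows. This is a minimal illustration rather than an official procedure: ''relocate_cache_dir'' is a hypothetical helper, and both the cache location (assumed here to be ''$HOME/ncbi'') and the target path are assumptions to adjust for your system.

```shell
# Hypothetical helper: move a cache directory to a larger volume and leave a
# symlink behind. Both paths are supplied by the caller.
relocate_cache_dir() {
    local cache_dir=$1 target_dir=$2
    # Nothing to do if the cache directory is already a symlink.
    if [ -L "$cache_dir" ]; then
        return 0
    fi
    # Avoid clobbering or nesting into an existing target.
    if [ -e "$target_dir" ]; then
        echo "refusing to overwrite existing $target_dir" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target_dir")"
    if [ -d "$cache_dir" ]; then
        mv "$cache_dir" "$target_dir"   # keep any files already cached
    else
        mkdir -p "$target_dir"
    fi
    ln -s "$target_dir" "$cache_dir"
}
```

A call such as ''relocate_cache_dir "$HOME/ncbi" /path/to/large/storage/ncbi'' (the target path is a placeholder) would then let fastq-dump keep writing to the same location while the data lives on the larger volume.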