| | 66 | == Preprocessing read files from NCBI SRA == |
| | 67 | |
| | 68 | **SRA** (Sequence Read Archive) is an NCBI binary format for short reads. |
| | 69 | |
| | 70 | It is described in detail in the [[http://www.ncbi.nlm.nih.gov/books/NBK47528/|SRA Handbook]]. |
| | 71 | |
| | 72 | Processing SRA files requires the [[https://ncbi.github.io/sra-tools/|NCBI SRA Toolkit]], which is installed on our systems. |
| | 73 | |
| | 74 | The main command is **fastq-dump <SRA archive file>**, for example: |
| | 75 | |
| | 76 | ''**fastq-dump SRR060751.sra**'' |
| | 77 | |
| | 78 | If your reads are paired, by default the #1 and #2 reads of each pair are concatenated into a single record in one output file. |
| | 79 | To check whether an SRA sample has paired reads, go to the [https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=run_browser SRA Run browser], enter the sample ID, and look under "Layout" in the table. |
| | 80 | |
| | 81 | To get paired reads into separate files, use a command like |
| | 82 | |
| | 83 | ''**fastq-dump --split-files SRR060751.sra**'' |
| | 84 | |
| | 85 | You can ask for gzip-compressed output instead of plain FASTQ: |
| | 86 | |
| | 87 | ''**fastq-dump --gzip SRR060751.sra**'' |
| | 88 | |
| | 89 | See [[https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump|Converting SRA format data into FASTQ]] for all program options. |
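The options above can be combined. As a minimal sketch, assuming you pass in the "Layout" value from the Run browser as PAIRED or SINGLE, the hypothetical helper below (`build_fastq_dump_cmd` is not part of the SRA Toolkit) prints the fastq-dump command line to use rather than running it:

```shell
# Hypothetical helper: print a fastq-dump command line for a given layout.
# $1 = layout as shown in the SRA Run browser (assumed to be PAIRED or SINGLE)
# $2 = SRA archive file
build_fastq_dump_cmd() {
    layout=$1
    sra_file=$2
    case "$layout" in
        PAIRED)
            # paired reads: write #1 and #2 reads to separate gzipped files
            printf '%s\n' "fastq-dump --split-files --gzip $sra_file"
            ;;
        SINGLE)
            # single-end reads: one gzipped output file
            printf '%s\n' "fastq-dump --gzip $sra_file"
            ;;
        *)
            printf '%s\n' "unknown layout: $layout" >&2
            return 1
            ;;
    esac
}

build_fastq_dump_cmd PAIRED SRR060751.sra
```

Check the printed command, then paste it into the shell to run it.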
| | 90 | |
| | 91 | Note that a fastq file is about 4-5x larger than its corresponding SRA file. |
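To check whether the converted file is likely to fit on disk, the size ratio above can be turned into a quick estimate. This is only a rough sketch: the function name `estimate_fastq_bytes` is hypothetical, the default factor of 5 is simply the upper end of the 4-5x rule of thumb, and the estimate applies to uncompressed FASTQ output.

```shell
# Hypothetical helper: estimate the uncompressed FASTQ size in bytes.
# $1 = SRA file already on disk
# $2 = size factor (optional, defaults to 5, the upper end of the 4-5x range)
estimate_fastq_bytes() {
    sra_file=$1
    factor=${2:-5}
    # wc -c reports the file size in bytes
    sra_bytes=$(wc -c < "$sra_file") || return 1
    echo $(( sra_bytes * factor ))
}
```

For example, `estimate_fastq_bytes SRR060751.sra` prints the estimated number of bytes, which can be compared against the free space reported by `df` for the output directory.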
| | 92 | |
| | 93 | fastq-dump can download (fetch) the SRA file itself, or you can download the SRA file directly (e.g. using wget) and then run fastq-dump on it to get the FASTQ file. Downloading the SRA file directly avoids having to move the download cache out of your home directory for large files (see below). |
| | 94 | |
| | 95 | '''Note:''' As of version 2.8.1, fastq-dump requires vdb-config to be set up correctly. By default, downloaded and cached files are written to the user's home directory, which is likely to run out of space. To choose a different location, run: |
| | 96 | |
| | 97 | {{{ |
| | 98 | vdb-config --restore-defaults |
| | 99 | vdb-config -i   # interactive mode: use the menus to enter a different location |
| | 100 | }}} |
| | 101 | |
| | 102 | Manually editing the file $HOME/.ncbi/user-settings.mkfg doesn't seem to work. See [[https://ncbi.github.io/sra-tools/install_config.html | NCBI SRA Installation/Config]]. There are two alternatives: i) symlink the NCBI directory in your home directory to a location with more storage, or ii) download the SRA file directly (e.g. using wget) before running fastq-dump on it. |
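Alternative i) can be sketched as follows. This is a hedged example, not a tested site procedure: the function name `relocate_dir` is hypothetical, `$HOME/ncbi` is assumed to be the toolkit's directory in your home directory, and `/path/to/large/storage/ncbi` is a placeholder for a location with more space.

```shell
# Hypothetical helper: move a directory to larger storage and leave a
# symlink behind at the old location.
# $1 = original directory (assumed here to be $HOME/ncbi)
# $2 = new location on larger storage (placeholder path)
relocate_dir() {
    src=$1
    dest=$2
    if [ -L "$src" ]; then
        # already a symlink: nothing to do
        echo "$src is already a symlink" >&2
        return 0
    fi
    mkdir -p "$dest" || return 1
    if [ -d "$src" ]; then
        # copy existing contents (including hidden files), then remove the original
        cp -R "$src"/. "$dest"/ || return 1
        rm -rf "$src"
    fi
    # the old path now points at the new location
    ln -s "$dest" "$src"
}
```

A call would look like `relocate_dir "$HOME/ncbi" /path/to/large/storage/ncbi`; afterwards anything written under the old path ends up on the larger storage.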
| | 103 | |
| | 104 | {{{ |
| | 105 | # Option 1: download SRR4090409.sra from SRA yourself (e.g. with wget), then convert it to FASTQ |
| | 106 | fastq-dump SRR4090409.sra |
| | 107 | |
| | 108 | # Option 2: let fastq-dump download the run by accession and convert it to FASTQ |
| | 109 | # (important: the home directory or the vdb-config settings must be set up correctly) |
| | 109 | fastq-dump SRR4090409 |
| | 110 | }}} |
| | 111 | |
| | 112 | |