15 | | If your reads are paired, by default the #1 and #2 reads will end up concatenated together in the same file. |
16 | | To check if the SRA sample has paired reads or not, go to the [https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=run_browser SRA Run browser], enter the sample ID, and look in the table under "Layout". |
17 | | |
18 | | To get matched paired reads into separate files, use a command like |
19 | | |
20 | | ''**fastq-dump --split-3 SRR060751.sra**'' |
21 | | |
22 | | This works the same as "--split-files", except that "--split-3" puts unpaired reads (if any) into a third file. |
23 | | |
24 | | You can also ask for gzipped output instead of plain fastq: |
25 | | |
26 | | ''**fastq-dump --split-3 --gzip SRR060751.sra**'' |
27 | | |
28 | | See [[https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump|Converting SRA format data into FASTQ]] for all program options. |
29 | | |
30 | | Note that a fastq file is about 4-5x larger than its corresponding SRA file. |
31 | | |
32 | | fastq-dump can fetch the SRA file itself, or you can download the SRA file directly (e.g., using wget) and then run fastq-dump on it to get the fastq file. Downloading the SRA file directly avoids having to change the cache location in your home directory for large files (see below). |
33 | | |
34 | | '''Note:''' As of fastq-dump version 2.8.1, running fastq-dump requires vdb-config to be set up correctly. By default, the downloaded/cached file is copied to the user's home directory, which is likely to run out of space. Run, |
| 17 | To download one SRR ID at a time in fastq.gz format, use the fastq-dump command, like
41 | | Manually editing the file, $HOME/.ncbi/user-settings.mkfg, doesn't seem to work. See [[https://ncbi.github.io/sra-tools/install_config.html | NCBI SRA Installation/Config]]. Other alternatives: i) simply symlink the NCBI directory in your home directory to somewhere else with larger storage, or ii) download the SRA file directly (eg. using wget) before using fastq-dump. |
| 23 | With the option "--split-3", |
| 24 | * single-end reads will end up in a single file, named SRR123456.fastq.gz |
| 25 | * paired-end reads will produce two files (named SRR123456_1.fastq.gz and SRR123456_2.fastq.gz)
| 26 | * unpaired reads (if any) will be placed into a third file. |
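As a rough illustration of the naming scheme above, the following shell sketch reports which layout a finished fastq-dump run appears to have produced, judging only by which output files exist. The function name `report_layout` is hypothetical, and the sketch assumes that the files sit in the current directory, were written with "--split-3 --gzip", and that the third (unpaired) file is named like the single-end file.

```shell
# Hypothetical helper: guess the read layout from the files that
# "fastq-dump --split-3 --gzip" left in the current directory.
report_layout() {
    acc=$1
    if [ -f "${acc}_1.fastq.gz" ] && [ -f "${acc}_2.fastq.gz" ]; then
        echo "paired"
        # Assumption: a third file, if present, holds reads without a mate.
        if [ -f "${acc}.fastq.gz" ]; then
            echo "unpaired reads present"
        fi
    elif [ -f "${acc}.fastq.gz" ]; then
        echo "single"
    else
        echo "no output found"
    fi
}
```

For example, `report_layout SRR123456` could be run after the download finishes to decide whether a downstream tool should be given one input file or two.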
| 27 | |
| 28 | See the [[https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump|fastq-dump documentation]] for all program options. |
| 29 | |
| 30 | We recommend always gzipping fastq files because |
| 31 | * fastq.gz files are much smaller than fastq files |
| 32 | * our typically-used analysis programs all permit fastq.gz input |
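If you already have an uncompressed fastq file, it can be gzipped after the fact. The sketch below is one possible way to do that with standard gzip; the function name `compress_fastq` is hypothetical, and removing the original only after an integrity check is a cautious choice made for this example, not a requirement.

```shell
# Hypothetical helper: gzip a plain fastq file and verify the result.
compress_fastq() {
    fq=$1
    # Write the compressed copy next to the original.
    gzip -c "$fq" > "$fq.gz" || return 1
    # gzip -t checks the integrity of the compressed file.
    gzip -t "$fq.gz" || return 1
    # Remove the uncompressed original only after the check passes.
    rm "$fq"
    echo "$fq.gz"
}
```

Calling `compress_fastq SRR123456.fastq` would then leave only SRR123456.fastq.gz behind and print its name.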
| 33 | |
| 34 | |
| 35 | '''Note:''' Running fastq-dump places downloaded or cache files into the user's home directory, which is likely to run out of space. To prevent this, you have at least 3 options: |
| 36 | |
| 37 | Option 1: symlink the NCBI directory in your home directory to somewhere else with larger storage, such as with a command like |
53 | | As mentioned on the [[https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/| SRA website ]], you can download a list of Run accessions from the search results page ([[https://www.ncbi.nlm.nih.gov/sra/?term=cancer |- Example offsite image]]): select the Runs of interest by clicking the checkboxes, click "Send To", choose "File", and select "Accession List" in the drop-down menu. |
| 48 | Option 3: Modify your environment with vdb-config |
| 49 | {{{ |
| 50 | vdb-config --restore-defaults # To restore your settings |
| 51 | vdb-config -i # To use the GUI to enter a different location with "Set Default Import Path". |
| 52 | }}} |
| 53 | |
| 54 | === Downloading and processing multiple NCBI SRA samples === |
| 55 | |
| 56 | To '''download a list of SRR files''' (such as for all of the samples of a data series) from NCBI, use prefetch. |
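Once an accession list file is in hand, the per-sample steps might be chained roughly as in the sketch below. This is only an illustration: the function name `process_accessions` and the `DRY_RUN` variable are hypothetical, the list is assumed to contain one SRR ID per line, and by default the commands are only printed so the loop can be checked before anything is downloaded.

```shell
# Hypothetical helper: fetch and convert every accession in a list file.
# With DRY_RUN=1 (the default here) the commands are only printed.
process_accessions() {
    list_file=$1
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue   # skip blank lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "prefetch $acc"
            echo "fastq-dump --split-3 --gzip $acc"
        else
            # Convert only if the download succeeded.
            prefetch "$acc" && fastq-dump --split-3 --gzip "$acc"
        fi
    done < "$list_file"
}
```

Running `DRY_RUN=0 process_accessions SRR_Acc_List.txt` (the file name is a placeholder) would execute the commands instead of printing them.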