
Submitting a sequencing dataset to GEO

Almost every journal requires that a study involving high-throughput sequencing deposit its sequence data in a public repository such as NCBI's GEO.

The best source of information on how to submit this data is NCBI's page "Submitting high-throughput sequence data to GEO". BaRC can prepare the files and perform the GEO submission for you, or help you do it yourself.

The best way to start is to download the metadata spreadsheet (template and examples) and fill out the description of your experiment and the samples that comprise it. Explain everything clearly enough that people can get an overview of your experiment (and each sample) from the GEO page, without having to read your whole publication. Spelling out words, avoiding acronyms, and using clear nomenclature will be a big help to others who might want to access your data. If BaRC is helping you with the file submission, they can fill out the bottom part of the spreadsheet.

You or BaRC will need to assemble all of your sequencing (fastq.gz) files and summary files (which will depend on the type of experiment). If some of your samples weren't analyzed for the publication, you'll want to decide whether you want to make them public or not.
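
Once the files are assembled, it can help to record a checksum for each one, both to confirm later that nothing was corrupted in transfer and because some versions of GEO's metadata template ask for MD5 checksums (check the template you downloaded). The following is a minimal sketch, assuming a Linux server with GNU `md5sum` and files named `*.fastq.gz`; the function name `make_md5_manifest` and the output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GEO or BaRC tooling):
# write an MD5 manifest for every *.fastq.gz file in a directory.
# Usage: make_md5_manifest /path/to/my/files manifest.md5
make_md5_manifest() {
    local dir="$1" out="$2"
    # Run in a subshell so the caller's working directory is unchanged.
    # The manifest lists bare file names, so it can be checked from inside "$dir".
    (
        cd "$dir" || exit 1
        md5sum -- *.fastq.gz
    ) > "$out"
}
```

Running `md5sum -c manifest.md5` from inside the data directory re-checks every file against the manifest.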

If you ask BaRC for help with submitting sequencing or other files to NCBI GEO, first complete these steps so that BaRC can upload on your behalf:

  1. Go to NCBI and log in: https://www.ncbi.nlm.nih.gov/account/ NCBI allows you to log in with a variety of types of accounts.

  2. Go to the GEO submission page: https://www.ncbi.nlm.nih.gov/geo/submitter/

  3. Fill in information about you on the left. On the right side of the page, uncheck "Same as Investigator" and fill in the name and email of the BaRC person who is helping you.

  4. Click the Save button.

  5. Click the "New submission" button.

  6. On the next page, click the "Submit high-throughput sequencing" link.

  7. Click the "Transfer files" button.

  8. Click the "Create personalized upload space" button.

  9. Wait until you see the "Your personalized upload space is: " message.

  10. Send this information to the BaRC person who is helping you.

After the BaRC person has finished uploading your data to GEO, they will let you know.

  1. Go back to GEO, log in, and go to https://submit.ncbi.nlm.nih.gov/geo/submission/

  2. Let NCBI know that your FTP file transfer to GEO is complete.

  3. NCBI will let you and BaRC know when your private-for-now GEO web pages are available.

  4. Check that the sample descriptions are clear and complete.

  5. Right before or after your publication is made public, change the GEO access from private to public.

To upload your files to GEO, you can use a file transfer program with a graphical interface (as recommended by GEO), or you can use a command-line method on a Whitehead server. Two options are ftp and ncftp. In the simplest case, all of your files for submission will be in the same directory. Start by going to that directory on the WI server and then confirm that the files are actually there.

cd /path/to/my/files
ls
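
Beyond confirming that the files are present, you may also want to confirm that each compressed file is intact before a long upload. This is a sketch only, assuming the files are named `*.fastq.gz` and that `gzip` is available; the function name `check_gzip_files` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: test every *.fastq.gz file in a directory with 'gzip -t'
# and report which ones pass. Returns 1 if any file fails the test.
# Usage: check_gzip_files /path/to/my/files
check_gzip_files() {
    local dir="$1" f status=0
    for f in "$dir"/*.fastq.gz; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$f" ] || continue
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            status=1
        fi
    done
    return "$status"
}
```

A file reported as FAILED is truncated or otherwise not valid gzip data and should be regenerated or copied again before uploading.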

Make a connection to the NCBI server using the credentials that you got from NCBI and upload your files.

# Using FTP
ftp ftp-private.ncbi.nlm.nih.gov
# You'll be prompted for your username and password
name: geoftp
password: MYPASSWD  # each submission is different
# Now you should be connected to the GEO server.  Change to the directory you were given
cd uploads/jsmith_e123ABC
# Turn off interactive mode (so you don't get asked to confirm every file)
prompt
# Upload all of the files in the directory from where you connected to GEO 
mput *
# Wait; this can take a long time ...
# After everything is uploaded
quit

# Using NCFTP
ncftp -u geoftp -p MYPASSWD ftp-private.ncbi.nlm.nih.gov
# Now you should be connected to the GEO server.  Change to the directory you were given
cd uploads/jsmith_e123ABC
# Upload all of the directories/files in the directory from where you connected to GEO
# Note that this (a recursive upload) is something that 'ftp' can't do
mput -r *
# Wait; this can take a long time ...
# After everything is uploaded
quit

While your command-line upload is in progress, you can always open an FTP GUI (like FileZilla) to monitor and/or confirm progress. You can also (on the command line) send your ftp/ncftp command "to the background" (by typing control-z and then 'bg'). The upload should continue, even if you turn off your laptop.
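
Whether a backgrounded job survives a dropped connection can depend on the shell and on how the session ends, so one common alternative is to start a non-interactive upload under `nohup`. The sketch below assumes that `ncftpput` (the batch uploader installed with NcFTP) is available on the server; the function name `start_background_upload`, the `UPLOAD_DRY_RUN` variable, and the `upload.log` file name are hypothetical, and the user name and host are the ones used in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start a recursive ncftpput upload that keeps running
# after you log out. Set UPLOAD_DRY_RUN=1 to print the command without running it.
# Usage: start_background_upload MYPASSWD uploads/jsmith_e123ABC *
start_background_upload() {
    local password="$1" remote_dir="$2"
    shift 2
    # Remaining arguments are the local files/directories to upload (-R = recursive)
    local cmd=(ncftpput -R -u geoftp -p "$password"
               ftp-private.ncbi.nlm.nih.gov "$remote_dir" "$@")
    if [ "${UPLOAD_DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
        return 0
    fi
    # nohup makes the job ignore the hangup signal sent when the terminal closes;
    # all output goes to a log file in the current directory
    nohup "${cmd[@]}" > upload.log 2>&1 &
    echo "Upload started with PID $!; follow progress with: tail -f upload.log"
}
```

As with the `ncftp -p` example above, the password is visible on the command line, so this is best suited to the per-submission credentials that GEO issues.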

When the file upload is complete, follow through by submitting your metadata file to GEO (according to the instructions on their website).
