Changes between Initial Version and Version 1 of FAQ


Timestamp:
03/11/20 12:47:37 (5 years ago)
Author:
dionisio
Comment:

--

  • FAQ

    v1 v1  
     1Browse the page, use the Find command in your browser or use the Search box at the top right of the page to search the questions and answers.
     2
     3Frequently Asked Questions
     4
     51. Where can I find sample '''[#blastplus blast+ commands?]'''[[br]] [[br]]
     61. How can I '''[#align2 align two sequences?]'''[[br]] [[br]]
     71. How can I '''[#blat run BLAT locally?]'''[[br]] [[br]]
     81. How can I '''[#promoter get the promoter sequence]''' of a gene?[[br]] [[br]]
     91. How can I '''[#nonred make a list of items non-redundant]'''?[[br]] [[br]]
     101. How can I access the '''[#entrez Whitehead version of the Entrez Gene database]'''?[[br]] [[br]]
     111. How can I find slides and materials from '''[#hottopics past Hot Topics talks]'''?[[br]] [[br]]
     121. How can I '''[#relational create my own relational database]'''?[[br]] [[br]]
     131. How can I '''[#tracks download data/tracks from UCSC]'''?[[br]] [[br]]
     141. How can I '''[#barctools access BaRC Tools]''' or know what tools are available?[[br]] [[br]]
     151. How can I '''[#LSF submit a job to the LSF cluster]'''?[[br]] [[br]]
     161. How can I find out what '''[#perlR Perl modules or R packages]''' are installed? Which version is currently installed in the server?[[br]] [[br]]
     171. How can I '''[#xwindow connect to tak]'''?[[br]] [[br]]
     181. How can I '''[#servers get to my or my lab shared storage]'''?[[br]] [[br]]
     191. Where can I '''[#blast find local BLAST databases]'''?[[br]] [[br]]
     201. Where can I '''[#genomeSeqs find genome sequences]'''?    [[br]] [[br]]
     211. Where can I '''[#btFormats find genomes formatted for bowtie, tophat, or blat]'''?[[br]] [[br]]
     221. Where can I '''[#tfs find known or predicted transcription factors that regulate a gene]'''?[[br]] [[br]]
     231. Where can I '''[#unix find simple (one-liner) Unix/Perl commands]'''?[[br]] [[br]]
     241. Where can I '''[#Rcode find samples of R code]'''?[[br]] [[br]]
     251. Where can I '''[#UCSCmirror find the local mirror of the UCSC genome browser]'''?[[br]] [[br]]
     261. Where can I '''[#galaxy find the local mirror of Galaxy]'''?[[br]] [[br]]
     271. Where can I '''[#rstudio find R Studio on tak]'''?[[br]] [[br]]
     281. Where can I '''[#IGV find IGV download]'''?[[br]] [[br]]
     291. Which software should I use to '''[#heatmaps cluster, create and display heatmaps]'''?  [[br]] [[br]]
     301. Which software should I use to do '''[#GOtools GO enrichment analysis]'''?[[br]] [[br]]
     311. Which software should I use to '''[#GeneNetwork display a gene network]'''?[[br]] [[br]]
     321. How can I get '''[#software desktop software]''' provided by Whitehead?[[br]] [[br]]
     331. Which software should I use to '''[#stats do statistics]'''?[[br]] [[br]]
     341. How can I '''[#pfam search for Pfam (protein) profiles]''' in my protein set using HMMs?[[br]] [[br]]
     351. How do I '''[#R_pkg_install install an R package locally]'''?[[br]] [[br]]
     361. I need to '''[#transfer send/receive very large data files]''' to/from a colleague outside of Whitehead. What is the best way to do this?[[br]] [[br]]
     371. Why do I get '''[#wi_ncbi_blast different BLAST results]'''  from [[http://tak.wi.mit.edu/blast/ | WI]] and NCBI Blast? [[br]] [[br]]
     381. How do I run '''[#tophat_bowtie tophat/bowtie on the LSF with a gzip'd tar (*.tar.gz)]''' file?  [[br]] [[br]]
     39
     40----
     41Answers to Frequently Asked Questions
     42
     431. [=#blastplus Where can I find sample blast+ commands?] [[br]] [[br]]
     44    * See [http://gir.wi.mit.edu/trac/wiki/barc/blastTips BLAST+ tips][[br]] [[br]]
     451.  [=#align2  How can I '''align two sequences'''?] [[br]] [[br]]
     46  * Use an EMBOSS program ([http://bioinfo.wi.mit.edu/bio/tools/emboss/]) for an optimal alignment
     47       * **water** for a Smith-Waterman optimal local alignment
     48       * **needle** for a Needleman-Wunsch optimal global alignment
     49       * **stretcher** for a Needleman-Wunsch optimal global alignment (optimized for longer sequences)
     50  * Use **blast2seq** [http://tak.wi.mit.edu/blast/wblast2.html] for a quick local alignment  [[br]] [[br]]
     511. [=#blat How can I '''run BLAT locally'''?]  [[br]] [[br]]
     52  * See our [http://bioinfo.wi.mit.edu/bio/bioinfo/docs/blat_tak.html Using BLAT on tak] page.[[br]] [[br]]
     531. [=#promoter How can I '''get the promoter sequence''' of a gene?] [[br]] [[br]]
     54  - Go to the [http://genome.ucsc.edu/cgi-bin/hgGateway UCSC Genome Bioinformatics] genome browser.
     55  - Choose your desired genome and enter your desired gene (in the "position or search term" box).
     56  - If the gene has multiple transcripts, choose the one you want.
     57  - Paying attention to the direction of the gene (indicated by the intron hash marks), note the coordinate of the transcription start site (TSS).
     58  - Enter a range of coordinates before and/or after the TSS and click on "jump".
     59  - When you have the desired range in the browser, click on "DNA" on the top blue bar.
     60  - Check the "Reverse complement" box if your gene is on the negative strand.
     61  - Click on the "get DNA" button.
     62  - If you want to check your sequence relative to the TSS, map it with [http://genome.ucsc.edu/cgi-bin/hgBlat?command=star BLAT].[[br]] [[br]]
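  * The coordinate range to enter in the browser can be worked out with a small helper. This is only a sketch: the function name `promoter_range` and the example numbers are hypothetical, and it assumes 1-based coordinates with "upstream" measured against the direction of transcription.

```shell
# promoter_range CHROM TSS STRAND UPSTREAM DOWNSTREAM
# Prints a browser-style position (chrom:start-end) around the TSS.
# On the "-" strand, upstream bases have larger coordinates than the TSS.
promoter_range() {
  chrom=$1; tss=$2; strand=$3; up=$4; down=$5
  if [ "$strand" = "+" ]; then
    start=$((tss - up))
    end=$((tss + down))
  else
    start=$((tss - down))
    end=$((tss + up))
  fi
  # Coordinates cannot go below the first base of the chromosome.
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  echo "${chrom}:${start}-${end}"
}

# Hypothetical gene on the minus strand with its TSS at position 1000:
promoter_range chr1 1000 - 500 100
```

  * For a gene on the negative strand, remember to also check the "Reverse complement" box when retrieving the DNA.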
     631. [=#nonred How can I '''make a list of items non-redundant'''?] [[br]] [[br]]
     64  * See our [http://barc.wi.mit.edu/tools/redundant/  Redundant List Analysis ] page, which also counts how many times each item appears in your list.[[br]] [[br]]
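  * On the command line, the same kind of result can be approximated with standard Unix tools. The function names below are made up for illustration, and the sketch assumes one item per line.

```shell
# unique_items: print each distinct line once, sorted.
unique_items() {
  sort -u "$@"
}

# count_items: print "count<TAB>item", most frequent items first.
count_items() {
  awk '{ count[$0]++ } END { for (item in count) printf "%d\t%s\n", count[item], item }' "$@" \
    | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Example with a small made-up gene list:
printf 'geneA\ngeneB\ngeneA\n' | count_items
```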
     651. [=#entrez How can I '''access the Whitehead version of the Entrez Gene database'''?] [[br]] [[br]]
     66  * Whitehead BaRC designed a local copy of the Entrez Gene database using MySQL
     67  * You need a MySQL client to access the database, either a desktop tool like [http://wb.mysql.com/ MySQL Workbench] or a tak account.
     68  * The information you need:
     69       * Hostname = canna.wi.mit.edu
     70       * database = entrez_gene
     71       * username = entrezgene
     72       * password = wibr
     73  * On tak, use the command
     74       * mysql -u entrezgene -h canna.wi.mit.edu -D entrez_gene -p[[br]] [[br]]
     751. [=#hottopics How can I find slides and materials from '''past Hot Topics talks'''?] [[br]] [[br]]
     76  * See our [http://jura.wi.mit.edu/bio/education/hot_topics/ Hot Topics ] page, with links to presentations and other materials.[[br]] [[br]]
     771. [=#relational How can I '''create my own relational database'''?] [[br]] [[br]]
     78  * You can either install MySQL on your own computer or use Whitehead's MySQL server, canna, which is generally the more robust option if you need that.
     79  * If you'd like your own installation, download [http://dev.mysql.com/downloads/mysql/ MySQL] and install it.
     80  * If you'd like to use canna, email unix-help@wi.mit.edu and request a database on canna.  Once the IT group creates the database, you will be free to add tables and data.
     81  * Regardless of the system, see the [http://dev.mysql.com/doc/refman/5.5/en/index.html MySQL Reference Manual] and our past BaRC presentations about MySQL:
     82      * [http://barc.wi.mit.edu/education/bioinfo2006/db4bio/ Relational Databases For Biologists]
     83      * [http://barc.wi.mit.edu/education/bioinfo2006/db4bio/ Querying Biological Databases with SQL][[br]] [[br]]
     841. [=#tracks How can I '''download data/tracks from UCSC'''?] [[br]] [[br]]
     85  * Go to [http://genome.ucsc.edu/ UCSC Genome Bioinformatics]
     86  * Click on "Downloads" on the bar on the left side
     87  * Choose the desired species and assembly, noting that coordinates only apply to the assembly they were generated with.
     88  * Data from most tracks are available by following the "Annotation database" link.
     89  * Every file is either the actual data in a tab-delimited text file (*.txt.gz) or a small file that provides a name for each column.
     90  * The data files can be opened in Excel, processed as text files, or used to create a table in your MySQL database.
     91  * Note from the dates of the annotation files that some are updated much more often than others.[[br]] [[br]]
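  * The files can also be fetched from the command line. The sketch below assumes the usual layout of the UCSC download server (a `goldenPath/ASSEMBLY/database` directory holding a `.txt.gz` data file and a `.sql` column-description file per table); the function name, and the choice of `hg19` and `refGene`, are only examples.

```shell
# ucsc_table_urls ASSEMBLY TABLE
# Prints the URL of the data file and of the file describing its columns.
ucsc_table_urls() {
  assembly=$1
  table=$2
  base="http://hgdownload.soe.ucsc.edu/goldenPath/${assembly}/database"
  echo "${base}/${table}.txt.gz"
  echo "${base}/${table}.sql"
}

# List the two files for one table:
ucsc_table_urls hg19 refGene

# To download them (requires network access), something like:
#   ucsc_table_urls hg19 refGene | xargs -n 1 curl -O
```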
     921. [=#barctools How can I '''access BaRC Tools''' or know what tools are available?] [[br]] [[br]]
     93  * BaRC Tools can be found on [http://bioinfo.wi.mit.edu/bio/tools/ BaRC Tools].  Available tools are summarized in [http://bioinfo.wi.mit.edu/bio/education/hot_topics/barc_tools/barcTools-summary.pdf Summary] [[br]] [[br]]
     941. [=#LSF How can I '''submit a job to the LSF cluster'''?] [[br]] [[br]]
     95   * Usually you just need to preface your usual command with 'bsub'
     96   * See [http://bioinfo.wi.mit.edu/bio/bioinfo/docs/LSF_help.php Whitehead Linux cluster - LSF help] for sample commands and links to more documentation
     97   * In addition, see the tutorial created by IT called [http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials Getting Started with the LSF Cluster][[br]] [[br]]
     981. [=#perlR How can I find out what '''Perl modules or R packages''' are installed?] Which version is currently installed in the server? [[br]] [[br]]
     99  * There are links on [http://tak/trac/wiki the home page] of Trac, our Tak tracking system to [http://tak/trac/wiki/Packages installed packaged software], [http://tak/trac/wiki/Perl installed Perl modules], [http://tak/trac/wiki/Python installed Python modules], and [http://tak/trac/wiki/R installed R modules]. [[br]] [[br]]
     1001.  [=#xwindow  How can I '''connect to tak'''?][[br]] [[br]]
     101  * To connect to tak, you need a [http://bioinfo.wi.mit.edu/bio/software/unix/bioinfoaccount.php tak account] and some kind of secure shell (ssh) with X Windows (to get the graphics):
     102   * On a Macintosh, use x11 or Terminal
     103   * On a Windows computer, we recommend [http://cygwin.com Cygwin/X]. You can also use the Whitehead IT [http://bioinfo.wi.mit.edu/bio/tutorials/takpack/TakPack-Installer.exe TakPack installer].
     104  * With either system, double click on the icon to get the "command prompt", the window in which you can type commands.
     105  * Windows only: After opening Cygwin, start X Windows by typing "startx".  A new terminal window will open, and you should use that one.
     106  * From the command prompt, connect to tak (or another Unix/Linux computer) with a command like   
     107    {{{
     108     ssh username@tak.wi.mit.edu -Y
     109     or
     110     ssh username@tak.wi.mit.edu -X
     111     where username is the name of your tak account. 
     112     You'll be prompted for your password.
     113     }}}
     114  * In addition, see the tutorials created by IT called [http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials Install Cygwin or TakPack]. IT recommends using TakPack if possible, as it provides both X11 and an ssh client and is a "lighter install".[[br]] [[br]]
     1151. [=#servers How can I '''get to my or my lab shared storage'''?] [[br]] [[br]]
     116   * On a Mac or Windows computer, most shared storage areas can be accessed via '''wi-files1''' or '''wi-files2''', although high-throughput sequencing data is accessed via '''wi-htdata'''.
     117   * On a Mac computer, get to a server like wi-files1/BaRC_Public (/nfs/BaRC_Public) by connecting to
     118{{{
     119cifs://wi-files1/BaRC_Public
     120}}}
     121   * On a Windows computer, get to a server like wi-files1/BaRC_Public (/nfs/BaRC_Public) by connecting to
     122{{{
     123\\wi-files1\BaRC_Public
     124}}}
     125   * See [http://wi-inside.wi.mit.edu/departments/it/services/filestorage/labshares lab share paths] to get to your lab storage area. [[br]] [[br]]
     1261. [=#blast Where can I '''find local BLAST databases'''?] [[br]] [[br]]
     127  * BLAST-formatted databases can be found in /nfs/seq/Data on tak.[[BR]][[br]]
     1281. [=#genomeSeqs Where can I '''find genome sequences'''? ]    [[br]] [[br]]
     129  * Genome sequences can be found in /nfs/genomes on tak.[[BR]][[br]]
     1301. [=#btFormats Where can I '''find genomes formatted for bowtie, tophat, or blat'''?] [[br]] [[br]]
     131  *  Within many directories on /nfs/genomes you can find these additional files.[[BR]][[br]]
     1321. [=#tfs Where can I '''find known or predicted transcription factors that regulate a gene'''?] [[br]] [[br]]
     133   * We do not have access to [https://portal.biobase-international.com/cgi-bin/portal/login.cgi BIOBASE Knowledge Library]; however, an older command-line version is available.  See the BaRC SOPs for more information.[[br]]
     134   * [http://portal.genego.com/ GeneGO (Login Required)] can be used as well to find known TFs  [[br]] [[br]]
     1351. [=#unix Where can I '''find simple (one-liner) Unix/Perl commands'''?] [[br]] [[br]]
     136   * There is a helpful list of Unix and Perl commands at [http://bioinfo.wi.mit.edu/bio/bioinfo/scripts/].[[br]] [[br]]
     1371. [=#Rcode Where can I '''find samples of R code'''?] [[br]] [[br]]
     138   * Sample R code is available in /nfs/BaRC_Public/BaRC_code.[[br]][[br]]
     1391. [=#UCSCmirror Where can I '''find the local mirror of the UCSC genome browser'''?][[br]] [[br]]
     140  * [http://ucsc.wi.mit.edu/]
     141  * To add tracks via files, copy the files to /nfs/solexa_ucsc and then refer to them with the URL http://tak.wi.mit.edu/solexa_ucsc when submitting tracks to the browser.[[br]] [[br]]
     1421. [=#galaxy Where can I '''find the local mirror of Galaxy'''?][[br]] [[br]]
     143  * [https://galaxy.wi.mit.edu/]  [[br]] [[br]]
     1441. [=#rstudio Where can I '''find R Studio on tak'''?][[br]] [[br]]
     145  * [https://tak.wi.mit.edu/rstudio/]  [[br]] [[br]]
     1461. [=#IGV Where can I '''find IGV download'''?] [[br]] [[br]]
     147  * [http://www.broadinstitute.org/software/igv/log-in][[br]] [[br]]
     1481. [=#heatmaps Which software should I use to '''cluster, create and display heatmaps'''?] [[br]] [[br]]
     149  * Cluster 3.0 http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm
     150  * Java Treeview http://jtreeview.sourceforge.net/ [[br]] [[br]]
     1511. [=#GOtools Which software should I use to do '''GO enrichment analysis'''?] [[br]] [[br]]
     152  * DAVID http://david.abcc.ncifcrf.gov/ - our favorite tool
     153  * GSEA http://www.broadinstitute.org/gsea/index.jsp - a Java application that can take a ranked list of all your genes as input
     154  * BIOBASE https://portal.biobase-international.com/cgi-bin/portal/login.cgi
     155  * BiNGO (within Cytoscape http://www.cytoscape.org/)
     156  * GoMiner http://discover.nci.nih.gov/gominer
     157  * GOstat http://gostat.wehi.edu.au
     158  * [http://www.geneontology.org/GO.tools.shtml#term_enrichment GeneOntology.org] has a more complete list.
     159  * Also see the [http://barc.wi.mit.edu/education/hot_topics/ Hot Topics talk] "Gene list enrichment analysis" for more information. [[br]] [[br]]
     1601. [=#GeneNetwork Which software should I use to '''display a gene network'''?] [[br]] [[br]]
     161  * Cytoscape http://www.cytoscape.org/ is our favorite tool for this. [[br]] [[br]]
     1621. [=#software How can I get '''desktop software''' provided by Whitehead?][[br]] [[br]]
     163   * Software is available through the [http://icarus.wi.mit.edu/wibrsw/index.jsp  Whitehead Software database].  To see a list of desktop software, see [http://it.wi.mit.edu/software/get-software].[[br]][[br]]
     1641. [=#stats Which software should I use to '''do statistics'''?] [[br]] [[br]]
     165  * GraphPad Prism (in the [http://icarus.wi.mit.edu/wibrsw/index.jsp Whitehead software database]) -- has an easy-to-use GUI and excellent practical documentation
     166  * MATLAB (in the [http://icarus.wi.mit.edu/wibrsw/index.jsp Whitehead software database])
     167  * R (free download from http://www.r-project.org/)
     168  * Also see the BaRC Hot Topics on [http://barc.wi.mit.edu/education/hot_topics/prism/Prism.pdf  An Introduction to GraphPad Prism - statistics and graphing software]  [[br]][[br]]
     1691. [=#pfam How can I '''search for Pfam (protein) profiles'''] in my protein set using HMMs?
     170  * A local copy of all Pfam HMMs can be found at /nfs/seq/pfam_db/Pfam-A.hmm
     171  * Pfam profiles can be easily searched with the HMMER suite of tools
     172    * If you want to annotate a set of proteins with only specific profiles, you can extract one profile at a time with hmmfetch
     173      * ex: hmmfetch /nfs/seq/pfam_db/Pfam-A.hmm PF01731.15 > PF01731.15.hmm
     174    * Use hmmsearch to search your protein set (as a multiple-sequence fasta file) with one to all profiles
     175      * ex1 (all profiles): hmmsearch /nfs/seq/pfam_db/Pfam-A.hmm My_proteins.fa > My_proteins.Pfam_search_out.txt
     176      * ex2 (selected profile): hmmsearch PF01731.15.hmm My_proteins.fa > My_proteins.PF01731.15_search_out.txt
     177    * For more details about HMMER, consult the HMMER User's Guide ([ftp://selab.janelia.org/pub/software/hmmer3/3.0/Userguide.pdf]).[[br]] [[br]]
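  * The default hmmsearch output is meant for reading. For downstream filtering, HMMER3 can also write a whitespace-delimited table with `--tblout`, in which column 1 is the target (protein) name, column 3 the query (profile) name and column 5 the full-sequence E-value. The helper below is a sketch under those assumptions; its name, the output file name and the cutoff are hypothetical.

```shell
# filter_tblout FILE MAX_EVALUE
# Prints "protein<TAB>profile<TAB>E-value" for hits at or below the cutoff,
# skipping the comment lines that start with "#".
filter_tblout() {
  awk -v max="$2" '!/^#/ && ($5 + 0) <= (max + 0) { print $1 "\t" $3 "\t" $5 }' "$1"
}

# Possible usage on tak (not run here):
#   hmmsearch --tblout My_proteins.tbl /nfs/seq/pfam_db/Pfam-A.hmm My_proteins.fa > /dev/null
#   filter_tblout My_proteins.tbl 1e-5
```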
     1781. [=#R_pkg_install How do I '''install an R package''' locally?]
     179   {{{
     180   #Method 1:
     181   #download the package you are interested in installing, *tar.gz
     182   #In R command-line
     183   #Location of source AND where to install package
     184   R_libraries_path = "/home/userName/R_libs"
     185   # Go to where the .tar.gz library source is and install
     186   setwd(R_libraries_path)
     187   install.packages("hthgu133ahsentrezgcdf_12.0.0.tar.gz", lib=R_libraries_path, repos=NULL)
     188
     189   #Method 2:
     190   #In bash shell, use directory called R, or whatever you'd like
     191   export R_LIBS="$HOME/R"
     192   #In R command line
     193   source("http://www.bioconductor.org/biocLite.R")
     194   biocLite("pd.mogene.1.0.st.v1")
     195   }}}
     1961. [=#transfer  I need to '''send/receive very large data files''' to/from a colleague outside of Whitehead. What is the best way to do this?]
     197  * Our IT department has built two tools to help you share large files with your colleagues. [Sendit for files up to 2GB; Vort for files over 2GB  http://wi-inside.wi.mit.edu/departments/it/services/filetransfer] [[br]] [[br]]
     1981. [=#wi_ncbi_blast Why do I get '''different BLAST results'''  from http://tak.wi.mit.edu/blast/ and NCBI Blast?]
     199   * WI and NCBI pages have very different defaults. As a result, a hit may have an e-value of 1e-12 at WI, while at NCBI the alignment is completely different and leads to an e-value of 2e-98.
     200   * Word sizes are different, as are match/mismatch scores (1/-3 at WI, 2/-3 at NCBI).
     201   * Blast at WI filters (hard masks) for low complexity by default, whereas at NCBI it doesn't, instead using the "L;m;" filter string, which filters for low-complexity for the lookup table but not for extension.  This   
     202   can have a huge effect.
     203   * Even at NCBI, choosing (a) Human RefSeq sequences or (b) all RefSeq sequences and then filtering for human only produces somewhat different results.  Also, the size of the database has a big effect on the e-values.  For one example query that produces the same alignment in three different databases, the e-values are very different:
     204     * human RefSeq sequences     => 0.008
     205     * all RefSeq sequences       => 0.47
     206     * nt                         => 2.9
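   * The database-size effect follows from the standard Karlin-Altschul expectation used by BLAST (a general property of the statistics, not something specific to the WI or NCBI pages). For a fixed alignment score, the e-value grows roughly in proportion to the size of the search space:

```latex
E = K \, m \, n \, e^{-\lambda S}
```

   * Here S is the raw alignment score, m the effective query length, n the effective database length, and K and \lambda are parameters set by the scoring system. With the same alignment (same S and m), a larger n alone raises E, which is consistent with the ordering of the three e-values listed above.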
     2071. [=#tophat_bowtie How do I run '''tophat/bowtie on the LSF with a gzip'd tar (*.tar.gz)''' file?]
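     208  * Run tophat through bash with process substitution, so that each archive member is extracted to standard output and read by tophat as if it were a file: bsub bash -c "tophat ... <(tar xvzfO ...) <(tar xvzfO ...)"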
     209    * eg. bsub bash -c "tophat -p 10 -g 1 -o mapped_data_SRR905147_unique -N 2 -I 10000 --segment-length 25 --segment-mismatches 2 hg19 <(tar xvzfO s_2_1_sequence.txt.tar.gz ACTTGA-s_2_1_sequence.txt) <(tar xvzfO CAGATC-s_2_1_sequence.txt.tar.gz CAGATC-s_2_1_sequence.txt)"
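  * To see what process substitution does without tophat or LSF, the following sketch builds a tiny archive and reads one member straight from it. All file names are made up; `tar -xzOf` is the dashed spelling of the `xvzfO` options used above, minus the verbose flag. The inner command is wrapped in `bash -c` for the same reason as in the bsub call: `<( ... )` is a bash feature.

```shell
# Build a small gzip'd tar archive in a temporary directory.
demo_dir=$(mktemp -d)
printf 'read1\nread2\n' > "$demo_dir/sample_reads.txt"
tar -czf "$demo_dir/sample_reads.txt.tar.gz" -C "$demo_dir" sample_reads.txt

# <( ... ) hands the extracted stream to the command as if it were a file path.
# Here "cat" stands in for tophat.
streamed=$(bash -c "cat <(tar -xzOf '$demo_dir/sample_reads.txt.tar.gz' sample_reads.txt)")
echo "$streamed"
```

  * No uncompressed copy of the reads is written to disk, which is the point of using this form on large sequencing files.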