Changes between Initial Version and Version 1 of FAQ

Timestamp:: 03/11/20 12:47:37 (6 years ago)
Author:: dionisio
Comment:: --

FAQ

               v1
+Browse the page, use the Find command in your browser or use the Search box at the top right of the page to search the questions and answers.
+Frequently Asked Questions
+. Where can I find sample '''[#blastplus blast+ commands?]'''[[br]] [[br]]
+. How can I '''[#align2 align two sequences?]'''[[br]] [[br]]
+. How can I '''[#blat run BLAT locally?]'''[[br]] [[br]]
+. How can I '''[#promoter get the promoter sequence]''' of a gene?[[br]] [[br]]
+. How can I '''[#nonred make a list of items non-redundant]'''?[[br]] [[br]]
+. How can I access the '''[#entrez Whitehead version of the Entrez Gene database]'''?[[br]] [[br]]
+. How can I find slides and materials from '''[#hottopics past Hot Topics talks]'''?[[br]] [[br]]
+. How can I '''[#relational create my own relational database]'''?[[br]] [[br]]
+. How can I '''[#tracks download data/tracks from UCSC]'''?[[br]] [[br]]
+. How can I '''[#barctools access BaRC Tools]''' or know what tools are available?[[br]] [[br]]
+. How can I '''[#LSF submit a job to the LSF cluster]'''?[[br]] [[br]]
+. How can I find out what '''[#perlR Perl modules or R packages]''' are installed? Which version is currently installed on the server?[[br]] [[br]]
+. How can I '''[#xwindow connect to tak]'''?[[br]] [[br]]
+. How can I '''[#servers get to my or my lab shared storage]'''?[[br]] [[br]]
+. Where can I '''[#blast find local BLAST databases]'''?[[br]] [[br]]
+. Where can I '''[#genomeSeqs find genome sequences]'''?    [[br]] [[br]]
+. Where can I '''[#btFormats find genomes formatted for bowtie, tophat, or blat]'''?[[br]] [[br]]
+. Where can I '''[#tfs find known or predicted transcription factors that regulate a gene]'''?[[br]] [[br]]
+. Where can I '''[#unix find simple (one-liner) Unix/Perl commands]'''?[[br]] [[br]]
+. Where can I '''[#Rcode find samples of R code]'''?[[br]] [[br]]
+. Where can I '''[#UCSCmirror find the local mirror of the UCSC genome browser]'''?[[br]] [[br]]
+. Where can I '''[#galaxy find the local mirror of Galaxy]'''?[[br]] [[br]]
+. Where can I '''[#rstudio find R Studio on tak]'''?[[br]] [[br]]
+. Where can I '''[#IGV find IGV download]'''?[[br]] [[br]]
+. Which software should I use to '''[#heatmaps cluster, create and display heatmaps]'''?  [[br]] [[br]]
+. Which software should I use to do '''[#GOtools GO enrichment analysis]'''?[[br]] [[br]]
+. Which software should I use to '''[#GeneNetwork display a gene network]'''?[[br]] [[br]]
+. How can I get '''[#software desktop software]''' provided by Whitehead?[[br]] [[br]]
+. Which software should I use to '''[#stats do statistics]'''?[[br]] [[br]]
+. How can I '''[#pfam search for Pfam (protein) profiles]''' in my protein set using HMMs?[[br]] [[br]]
+. How do I '''[#R_pkg_install install an R package locally]'''?[[br]] [[br]]
+. I need to '''[#transfer send/receive very large data files]''' to/from a colleague outside of Whitehead. What is the best way to do this?[[br]] [[br]]
+. Why do I get '''[#wi_ncbi_blast different BLAST results]'''  from [[http://tak.wi.mit.edu/blast/ | WI]] and NCBI Blast? [[br]] [[br]]
+. How do I run '''[#tophat_bowtie tophat/bowtie on the LSF with a gzip'd tar (*.tar.gz)]''' file?  [[br]] [[br]]
+----
+Answers to Frequently Asked Questions
+. [=#blastplus Where can I find sample blast+ commands?] [[br]] [[br]]
+    * See [http://gir.wi.mit.edu/trac/wiki/barc/blastTips BLAST+ tips][[br]] [[br]]
+.  [=#align2  How can I '''align two sequences'''?] [[br]] [[br]]
+  * Use an EMBOSS program ([http://bioinfo.wi.mit.edu/bio/tools/emboss/]) for an optimal alignment
+       * **water** for a Smith-Waterman optimal local alignment
+       * **needle** for a Needleman-Wunsch optimal global alignment
+       * **stretcher** for a Needleman-Wunsch optimal global alignment (optimized for longer sequences)
+  * Use **blast2seq** [http://tak.wi.mit.edu/blast/wblast2.html] for a quick local alignment  [[br]] [[br]]
+. [=#blat How can I '''run BLAT locally'''?]  [[br]] [[br]]
+  * See our [http://bioinfo.wi.mit.edu/bio/bioinfo/docs/blat_tak.html Using BLAT on tak] page.[[br]] [[br]]
+. [=#promoter How can I '''get the promoter sequence''' of a gene?] [[br]] [[br]]
+  - Go to the [http://genome.ucsc.edu/cgi-bin/hgGateway UCSC Genome Bioinformatics] genome browser.
+  - Choose your desired genome and enter your desired gene (in the "position or search term" box).
+  - If the gene has multiple transcripts, choose the one you want.
+  - Paying attention to the direction of the gene (indicated by the intron hash marks), note the coordinate of the transcription start site (TSS).
+  - Enter a range of coordinates before and/or after the TSS and click on "jump".
+  - When you have the desired range in the browser, click on "DNA" on the top blue bar.
+  - Check the "Reverse complement" box if your gene is on the negative strand.
+  - Click on the "get DNA" button.
+  - If you want to check your sequence relative to the TSS, map it with [http://genome.ucsc.edu/cgi-bin/hgBlat?command=star BLAT].[[br]] [[br]]
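The coordinate range to enter depends on the strand, which is easy to get backwards. A minimal sketch of the arithmetic, assuming a hypothetical helper named `promoter_range` and made-up coordinates (neither comes from the UCSC browser itself):

```shell
# promoter_range CHROM TSS STRAND UPSTREAM DOWNSTREAM
# Hypothetical helper: prints a browser-style position (chrom:start-end)
# covering UPSTREAM bases before the TSS and DOWNSTREAM bases after it.
promoter_range() {
    chrom=$1; tss=$2; strand=$3; up=$4; down=$5
    if [ "$strand" = "-" ]; then
        # Negative strand: "upstream" means larger coordinates.
        start=$((tss - down))
        end=$((tss + up))
    else
        # Positive strand: "upstream" means smaller coordinates.
        start=$((tss - up))
        end=$((tss + down))
    fi
    # Coordinates cannot go below 1.
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    echo "${chrom}:${start}-${end}"
}

# Made-up example: TSS at chr1:10000, 1000 bases upstream and 100 downstream.
promoter_range chr1 10000 + 1000 100   # prints chr1:9000-10100
promoter_range chr1 10000 - 1000 100   # prints chr1:9900-11000
```

For a gene on the negative strand, the sequence returned for this range still needs the "Reverse complement" box checked to read 5' to 3' relative to the gene.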
+. [=#nonred How can I '''make a list of items non-redundant'''?] [[br]] [[br]]
+  * See our [http://barc.wi.mit.edu/tools/redundant/  Redundant List Analysis ] page, which also counts how many times each item appears in your list.[[br]] [[br]]
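On tak or any Unix command line, the same kind of result can be sketched with standard tools. This is a minimal example, assuming a hypothetical file `my_list.txt` with one item per line (the file names and items are made up):

```shell
# Hypothetical input: one item per line.
printf 'TP53\nBRCA1\nTP53\nMYC\nBRCA1\nTP53\n' > my_list.txt

# Non-redundant list: sort, then keep one copy of each item.
sort my_list.txt | uniq > my_list.nonredundant.txt

# Count how many times each item appears, most frequent first.
sort my_list.txt | uniq -c | sort -k1,1nr > my_list.counts.txt
```

`uniq` only collapses adjacent duplicates, which is why the list is sorted first.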
+. [=#entrez How can I '''access the Whitehead version of the Entrez Gene database'''?] [[br]] [[br]]
+  * Whitehead BaRC designed a local copy of the Entrez Gene database using MySQL
+  * You need a MySQL client to access the database, either a desktop tool like [http://wb.mysql.com/ MySQL Workbench] or a tak account.
+  * The information you need:
+       * Hostname = canna.wi.mit.edu
+       * database = entrez_gene
+       * username = entrezgene
+       * password = wibr
+  * On tak, use the command
+       * mysql -u entrezgene -h canna.wi.mit.edu -D entrez_gene -p[[br]] [[br]]
+. [=#hottopics How can I find slides and materials from '''past Hot Topics talks'''?] [[br]] [[br]]
+  * See our [http://jura.wi.mit.edu/bio/education/hot_topics/ Hot Topics ] page, with links to presentations and other materials.[[br]] [[br]]
+. [=#relational How can I '''create my own relational database'''?] [[br]] [[br]]
+  * You have the choice of creating a MySQL database on your own computer or using Whitehead's MySQL server (canna, which would generally be more robust, if that's needed).
+  * If you'd like your own installation, download [http://dev.mysql.com/downloads/mysql/ MySQL] and install it.
+  * If you'd like to use canna, email unix-help@wi.mit.edu and request a database on canna.  Once the IT group creates the database, you will be free to add tables and data.
+  * Regardless of the system, see the [http://dev.mysql.com/doc/refman/5.5/en/index.html MySQL Reference Manual] and our past BaRC presentations about MySQL:
+      * [http://barc.wi.mit.edu/education/bioinfo2006/db4bio/ Relational Databases For Biologists]
+      * [http://barc.wi.mit.edu/education/bioinfo2006/db4bio/ Querying Biological Databases with SQL][[br]] [[br]]
+. [=#tracks How can I '''download data/tracks from UCSC'''?] [[br]] [[br]]
+  * Go to [http://genome.ucsc.edu/ UCSC Genome Bioinformatics]
+  * Click on "Downloads" on the bar on the left side
+  * Choose the desired species and assembly, noting that coordinates only apply to the assembly they were generated with.
+  * Data from most tracks are available by following the "Annotation database" link.
+  * Every file is either the actual data in a tab-delimited text file (*.txt.gz) or a small file that provides a name for each column.
+  * The data files can be opened in Excel, processed as text files, or used to create a table in your MySQL database.
+  * Note from the dates of the annotation files that some are updated much more often than others.[[br]] [[br]]
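A minimal sketch of processing one of these compressed tab-delimited files as text. The file name `exampleTrack.txt.gz` and its column layout (name, chrom, strand, start, end) are made up for illustration; check the matching column-name file for the real layout of any table you download.

```shell
# Build a tiny stand-in for a downloaded table
# (hypothetical columns: name, chrom, strand, start, end).
printf 'geneA\tchr1\t+\t1000\t5000\ngeneB\tchr2\t-\t200\t900\ngeneC\tchr1\t-\t7000\t9500\n' \
    | gzip -c > exampleTrack.txt.gz

# Look at the first rows without uncompressing the file to disk.
gzip -dc exampleTrack.txt.gz | head -n 2

# Keep only rows on chr1 and write selected columns as tab-delimited text.
gzip -dc exampleTrack.txt.gz \
    | awk -F '\t' -v OFS='\t' '$2 == "chr1" { print $1, $4, $5 }' \
    > exampleTrack.chr1.txt
```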
+. [=#barctools How can I '''access BaRC Tools''' or know what tools are available?] [[br]] [[br]]
+  * BaRC Tools can be found on [http://bioinfo.wi.mit.edu/bio/tools/ BaRC Tools].  Available tools are summarized in [http://bioinfo.wi.mit.edu/bio/education/hot_topics/barc_tools/barcTools-summary.pdf Summary] [[br]] [[br]]
+. [=#LSF How can I '''submit a job to the LSF cluster'''?] [[br]] [[br]]
+   * Usually you just need to preface your usual command with 'bsub'
+   * See [http://bioinfo.wi.mit.edu/bio/bioinfo/docs/LSF_help.php Whitehead Linux cluster - LSF help] for sample commands and links to more documentation
+   * In addition, see the tutorial created by IT called [http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials Getting Started with the LSF Cluster][[br]] [[br]]
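As a hedged illustration of prefacing a command with `bsub`: the job name and file names below are made up, and the `submit` wrapper is a hypothetical convenience (not part of LSF) that only prints the command when `bsub` is not installed, so the example can be tried anywhere. `-J` (job name), `-o`/`-e` (output and error files), and `%J` (job ID) are standard `bsub` features.

```shell
# submit: hand the arguments to bsub when it is available;
# otherwise just show what would have been submitted (dry run).
submit() {
    if command -v bsub >/dev/null 2>&1; then
        bsub "$@"
    else
        echo "bsub $*"
    fi
}

# Usual command:   sort big_file.txt > big_file.sorted.txt
# Same command submitted as a named job, with output and errors
# written to files tagged with the job ID (%J).
submit -J sort_job -o sort_job.%J.out -e sort_job.%J.err \
    "sort big_file.txt > big_file.sorted.txt"
```

Quoting the whole command keeps the `>` redirection as part of the job instead of being applied to `bsub` itself.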
+. [=#perlR How can I find out what '''Perl modules or R packages''' are installed?] Which version is currently installed on the server? [[br]] [[br]]
+  * There are links on [http://tak/trac/wiki the home page] of Trac, our Tak tracking system, to [http://tak/trac/wiki/Packages installed packaged software], [http://tak/trac/wiki/Perl installed Perl modules], [http://tak/trac/wiki/Python installed Python modules], and [http://tak/trac/wiki/R installed R packages]. [[br]] [[br]]
+.  [=#xwindow  How can I '''connect to tak'''?][[br]] [[br]]
+  * To connect to tak, you need a [http://bioinfo.wi.mit.edu/bio/software/unix/bioinfoaccount.php tak account] and some kind of secure shell (ssh) with X Windows (to get the graphics):
+   * On a Macintosh, use x11 or Terminal
+   * On a Windows computer, we recommend [http://cygwin.com Cygwin/X]. You can also use the Whitehead IT [http://bioinfo.wi.mit.edu/bio/tutorials/takpack/TakPack-Installer.exe TakPack installer].
+  * With either system, double click on the icon to get the "command prompt", the window in which you can type commands.
+  * Windows only: After opening Cygwin, start X Windows by typing "startx".  A new terminal window will open, and you should use that one.
+  * From the command prompt, connect to tak (or another Unix/Linux computer) with a command like
+    {{{
+     ssh username@tak.wi.mit.edu -Y
+     or
+     ssh username@tak.wi.mit.edu -X
+     where username is the name of your tak account.
+     You'll be prompted for your password.
+     }}}
+  * In addition, see the tutorials created by IT called [http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials Install Cygwin or TakPack]. IT recommends using TakPack if possible as it provides both X11 and an ssh client and is a "lighter install".[[br]] [[br]]
+. [=#servers How can I '''get to my or my lab shared storage'''?] [[br]] [[br]]
+   * On a Mac or Windows computer, most shared storage areas can be accessed via '''wi-files1''' or '''wi-files2''', although high-throughput sequencing data is accessed via '''wi-htdata'''.
+   * On a Mac computer, get to a server like wi-files1/BaRC_Public (/nfs/BaRC_Public) by connecting to
+{{{
+cifs://wi-files1/BaRC_Public
+}}}
+   * On a Windows computer, get to a server like wi-files1/BaRC_Public (/nfs/BaRC_Public) by connecting to
+{{{
+\\wi-files1\BaRC_Public
+}}}
+   * See [http://wi-inside.wi.mit.edu/departments/it/services/filestorage/labshares lab share paths] to get to your lab storage area. [[br]] [[br]]
+. [=#blast Where can I '''find local BLAST databases'''?] [[br]] [[br]]
+  * BLAST-formatted databases can be found in /nfs/seq/Data on tak.[[BR]][[br]]
+. [=#genomeSeqs Where can I '''find genome sequences'''? ]    [[br]] [[br]]
+  * Genome sequences can be found in /nfs/genomes on tak.[[BR]][[br]]
+. [=#btFormats Where can I '''find genomes formatted for bowtie, tophat, or blat'''?] [[br]] [[br]]
+  *  Within many directories on /nfs/genomes you can find these additional files.[[BR]][[br]]
+. [=#tfs Where can I '''find known or predicted transcription factors that regulate a gene'''?] [[br]] [[br]]
+   * We do not have access to [https://portal.biobase-international.com/cgi-bin/portal/login.cgi BIOBASE Knowledge Library]; however, an (older) command-line version is available.  See BaRC SOPs for more info.[[br]]
+   * [http://portal.genego.com/ GeneGO (Login Required)] can be used as well to find known TFs  [[br]] [[br]]
+. [=#unix Where can I '''find simple (one-liner) Unix/Perl commands'''?] [[br]] [[br]]
+   * There is a helpful list of Unix and Perl commands at [http://bioinfo.wi.mit.edu/bio/bioinfo/scripts/].[[br]] [[br]]
+. [=#Rcode Where can I '''find samples of R code'''?] [[br]] [[br]]
+   * Sample R code is available in /nfs/BaRC_Public/BaRC_code.[[br]][[br]]
+. [=#UCSCmirror Where can I '''find the local mirror of the UCSC genome browser'''?][[br]] [[br]]
+  * [http://ucsc.wi.mit.edu/]
+  * To add tracks via files, copy the files to /nfs/solexa_ucsc and then access them using the URL http://tak.wi.mit.edu/solexa_ucsc to submit tracks for the browser[[br]] [[br]]
+. [=#galaxy Where can I '''find the local mirror of Galaxy'''?][[br]] [[br]]
+  * [https://galaxy.wi.mit.edu/]  [[br]] [[br]]
+. [=#rstudio Where can I '''find R Studio on tak'''?][[br]] [[br]]
+  * [https://tak.wi.mit.edu/rstudio/]  [[br]] [[br]]
+. [=#IGV Where can I '''find IGV download'''?] [[br]] [[br]]
+  * [http://www.broadinstitute.org/software/igv/log-in][[br]] [[br]]
+. [=#heatmaps Which software should I use to '''cluster, create and display heatmaps'''?] [[br]] [[br]]
+  * Cluster 3.0 http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm
+  * Java Treeview http://jtreeview.sourceforge.net/ [[br]] [[br]]
+. [=#GOtools Which software should I use to do '''GO enrichment analysis'''?] [[br]] [[br]]
+  * DAVID http://david.abcc.ncifcrf.gov/ - our favorite tool
+  * GSEA http://www.broadinstitute.org/gsea/index.jsp - a Java application that can take a ranked list of all your genes as input
+  * BIOBASE https://portal.biobase-international.com/cgi-bin/portal/login.cgi
+  * BiNGO (within Cytoscape http://www.cytoscape.org/)
+  * GoMiner http://discover.nci.nih.gov/gominer
+  * GOstat http://gostat.wehi.edu.au
+  * [http://www.geneontology.org/GO.tools.shtml#term_enrichment GeneOntology.org] has a more complete list.
+  * Also see the [http://barc.wi.mit.edu/education/hot_topics/ Hot Topics talk] "Gene list enrichment analysis" for more information. [[br]] [[br]]
+. [=#GeneNetwork Which software should I use to '''display a gene network'''?] [[br]] [[br]]
+  * Cytoscape http://www.cytoscape.org/ is our favorite tool for this. [[br]] [[br]]
+. [=#software How can I get '''desktop software''' provided by Whitehead?][[br]] [[br]]
+   * Software is available through the [http://icarus.wi.mit.edu/wibrsw/index.jsp  Whitehead Software database].  To see a list of desktop software, see [http://it.wi.mit.edu/software/get-software].[[br]][[br]]
+. [=#stats Which software should I use to '''do statistics'''?] [[br]] [[br]]
+  * GraphPad Prism (in the [http://icarus.wi.mit.edu/wibrsw/index.jsp Whitehead software database]) -- has an easy-to-use GUI and excellent practical documentation
+  * MatLab (in the [http://icarus.wi.mit.edu/wibrsw/index.jsp Whitehead software database])
+  * R (free download from http://www.r-project.org/)
+  * Also see the BaRC Hot Topics on [http://barc.wi.mit.edu/education/hot_topics/prism/Prism.pdf  An Introduction to GraphPad Prism - statistics and graphing software]  [[br]][[br]]
+. [=#pfam How can I '''search for Pfam (protein) profiles'''] in my protein set using HMMs?
+  * A local copy of all Pfam HMMs can be found at /nfs/seq/pfam_db/Pfam-A.hmm
+  * Pfam profiles can be easily searched with the HMMER suite of tools
+    * If you want to annotate a set of proteins with only specific profiles, you can extract one profile at a time with hmmfetch
+      * ex: hmmfetch /nfs/seq/pfam_db/Pfam-A.hmm PF01731.15 > PF01731.15.hmm
+    * Use hmmsearch to search your protein set (as a multiple-sequence fasta file) with one to all profiles
+      * ex1 (all profiles): hmmsearch /nfs/seq/pfam_db/Pfam-A.hmm My_proteins.fa > My_proteins.Pfam_search_out.txt
+      * ex2 (selected profile): hmmsearch PF01731.15.hmm My_proteins.fa > My_proteins.PF01731.15_search_out.txt
+    * For more details about HMMER, consult the HMMER User's Guide ([ftp://selab.janelia.org/pub/software/hmmer3/3.0/Userguide.pdf]).[[br]] [[br]]
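The default hmmsearch output is meant for reading. For further processing, hmmsearch can also write a whitespace-delimited per-sequence table with its `--tblout` option, for example `hmmsearch --tblout My_proteins.tbl PF01731.15.hmm My_proteins.fa`. The sketch below filters such a table by E-value. The helper name `filter_hits`, the file `example.tbl`, and the rows in it are made up for illustration; the column positions (1 = target name, 3 = query name, 5 = full-sequence E-value) follow the `--tblout` layout.

```shell
# Tiny stand-in for a --tblout file (rows are invented, columns abbreviated).
cat > example.tbl <<'EOF'
# target name  accession  query name  accession   E-value  score  bias
protA          -          Domain_X    PF01731.15  1.2e-30  105.3  0.1
protB          -          Domain_X    PF01731.15  0.0031   14.2   0.0
protC          -          Domain_X    PF01731.15  4.5      3.1    0.0
EOF

# filter_hits FILE MAX_EVALUE
# Hypothetical helper: skip comment lines and print target, query and
# E-value (tab-delimited) for hits at or below the E-value cutoff.
filter_hits() {
    awk -v max_e="$2" '!/^#/ && ($5 + 0) <= (max_e + 0) { print $1 "\t" $3 "\t" $5 }' "$1"
}

filter_hits example.tbl 1e-5
```

With the invented rows above, only protA passes the 1e-5 cutoff.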
+. [=#R_pkg_install How do I '''install an R package''' locally?]
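+   {{{
+   # Method 1: install from a downloaded source package (*.tar.gz)
+   # In the R command line:
+   # Directory that holds the downloaded .tar.gz AND where the package will be installed
+   R_libraries_path = "/home/userName/R_libs"
+   # Go to where the .tar.gz source is and install it
+   setwd(R_libraries_path)
+   install.packages("hthgu133ahsentrezgcdf_12.0.0.tar.gz", lib=R_libraries_path, repos=NULL)
+   # Load the package from that library directory
+   library(hthgu133ahsentrezgcdf, lib.loc=R_libraries_path)
+   # Method 2: install from Bioconductor into a personal library
+   # In the bash shell, use a directory called R, or whatever you'd like (it must exist)
+   mkdir -p "$HOME/R"
+   export R_LIBS="$HOME/R"
+   # In the R command line:
+   source("http://www.bioconductor.org/biocLite.R")
+   biocLite("pd.mogene.1.0.st.v1")
+   # On newer R versions (3.5 and later), biocLite has been replaced by BiocManager:
+   #   install.packages("BiocManager"); BiocManager::install("pd.mogene.1.0.st.v1")
+   }}}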
+. [=#transfer  I need to '''send/receive very large data files''' to/from a colleague outside of Whitehead. What is the best way to do this?]
+  * Our IT department has built two tools to help you share large files with your colleagues. [http://wi-inside.wi.mit.edu/departments/it/services/filetransfer Sendit for files up to 2GB; Vort for files over 2GB] [[br]] [[br]]
+. [=#wi_ncbi_blast Why do I get '''different BLAST results'''  from http://tak.wi.mit.edu/blast/ and NCBI Blast?]
+   * WI and NCBI pages have very different defaults. As a result, a query can produce a hit at WI with an e-value of 1e-12, while at NCBI the alignment is completely different and leads to an e-value of 2e-98.
+   * Word sizes are different, as are match/mismatch scores (1/-3 at WI, 2/-3 at NCBI).
+   * Blast at WI filters (hard masks) for low complexity by default, whereas NCBI does not; NCBI instead uses the "L;m;" filter string, which filters low-complexity regions for the lookup table but not for extension.  This can have a huge effect.
+   * Even at NCBI, choosing (a) human RefSeq sequences or (b) all RefSeq sequences and then filtering for human only also produces somewhat different results.  Also, the size of the database has a big effect on the e-values.  For one example query that produces the same alignment in three different databases, the e-values are very different:
+     * human RefSeq sequences     => 0.008
+     * all RefSeq sequences       => 0.47
+     * nt                         => 2.9
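The database-size effect follows from how BLAST defines the e-value. As a rough sketch (BLAST works with effective rather than raw lengths, so the ratios are only approximate), let $m$ be the query length, $n$ the total length of the database, and $S'$ the bit score of the alignment. For the same query and the same alignment, $m$ and $S'$ do not change, so the e-value scales with $n$:

```latex
E = m \, n \, 2^{-S'}
\qquad\Longrightarrow\qquad
\frac{E_{\text{all RefSeq}}}{E_{\text{human RefSeq}}}
  = \frac{0.47}{0.008} \approx 59
  \approx \frac{n_{\text{all RefSeq}}}{n_{\text{human RefSeq}}},
\qquad
\frac{E_{\text{nt}}}{E_{\text{human RefSeq}}}
  = \frac{2.9}{0.008} \approx 360
  \approx \frac{n_{\text{nt}}}{n_{\text{human RefSeq}}}
```

Read this way, the three e-values above describe the same alignment searched against databases whose effective sizes differ by roughly 59-fold and 360-fold.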
+. [=#tophat_bowtie How do I run '''tophat/bowtie on the LSF with a gzip'd tar (*.tar.gz)''' file?]
+  * bsub bash -c "tophat ... <(tar xvzfO ...) <(tar xvzfO ...)" (this uses bash process substitution, so each file is streamed from the archive to tophat rather than unpacked first)
+    * e.g. bsub bash -c "tophat -p 10 -g 1 -o mapped_data_SRR905147_unique -N 2 -I 10000 --segment-length 25 --segment-mismatches 2 hg19 <(tar xvzfO s_2_1_sequence.txt.tar.gz ACTTGA-s_2_1_sequence.txt) <(tar xvzfO CAGATC-s_2_1_sequence.txt.tar.gz CAGATC-s_2_1_sequence.txt)"
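To see what the process substitution is doing without running tophat, here is a minimal sketch that can be tried on any machine with bash and tar. The file names are made up, and `cat` stands in for the program that reads the file.

```shell
# Make a small gzip'd tar archive to stand in for a sequence file.
printf 'read1\nread2\n' > example_reads.txt
tar -czf example_reads.txt.tar.gz example_reads.txt

# tar -xzOf extracts the named member to standard output instead of to disk.
# <( ... ) is bash process substitution: the program sees a file-like path
# whose contents are the output of the command inside the parentheses.
# The command is wrapped in bash -c because plain sh does not support <( ... ).
bash -c 'cat <(tar -xzOf example_reads.txt.tar.gz example_reads.txt)' > streamed_reads.txt
```

The streamed copy should be identical to the original file, which can be checked with `cmp example_reads.txt streamed_reads.txt`.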