Changes between Version 23 and Version 24 of SOP/PooledCRISPR

Timestamp:: 01/16/23 17:24:33 (3 years ago)
Author:: gbell
Comment:: --

SOP/PooledCRISPR

-              v23
+              v24
 = Pooled CRISPR screen analysis =
-== Method 1 (based on Wang et al., 2015) ==
+== Method 1: MAGeCK ==
+Analyze a genome-wide or targeted CRISPR screen.  As input, MAGeCK requires raw counts.  If there are no replicates, MAGeCK will estimate the mean/variance from all the samples, i.e. from both conditions.
+  * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]]
+  * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]]
+  * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]]
+Note that MAGeCK requires Python 2, so on Whitehead systems, it is only accessible on Ubuntu 18 computers (such as tak and the LSF cluster).
+==== Test: compare two conditions ====
+  * Common usage to test, or compare, two conditions
+{{{
+# the options -t and -c specify the treatment and control samples, respectively.
+mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt
+# For paired samples:
+mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --paired
+# note: --paired option is available in version 0.5.9+
+# use the option --normcounts-to-file to write normalized counts (by guide) to a file
+# If control guides are used, these can be specified using the options (below).
+# MAGeCK will use the control guides for normalization and generating the null distribution in running RRA, instead of all the guides.
+--control-sgrna controlGuides.txt --norm-method control
+# One option is to run with and without control guides and compare the results, e.g. a volcano plot of -log(RRA) vs logFC to see how the control guides compare.
+# A summary of the output (in pdf) is available from running the R script (see below).
+# In some versions of 0.5.9+, the 'paired' option may cause this error:
+# Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) :
+# 'at' and 'labels' lengths differ, 2 != 4
+# A workaround is to edit the R script and set collabel to only two conditions,
+# e.g.  collabel=c("low","high")
+# for all occurrences in the R script.
+}}}
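The volcano plot suggested in the comments above needs a small table of logFC and -log10(RRA score) per gene. The following is only a sketch: `volcano_table` is a hypothetical helper, and the column names `neg|score` and `neg|lfc` are assumptions about the header of the gene summary file, so check them against your own output before use. The toy input values are illustrative, not real screen results.

```shell
# volcano_table FILE SCORE_COLUMN LFC_COLUMN
# Hypothetical helper: prints id, logFC and -log10(score) for each gene.
# Assumes a tab-delimited file with a header line and the gene id in column 1.
volcano_table() {
    awk -F '\t' -v score_name="$2" -v lfc_name="$3" '
        NR == 1 {
            # Locate the requested columns by name instead of by position
            for (i = 1; i <= NF; i++) {
                if ($i == score_name) score_col = i
                if ($i == lfc_name) lfc_col = i
            }
            if (!score_col || !lfc_col) { print "column not found" > "/dev/stderr"; exit 1 }
            print "id\tlfc\tneg_log10_score"
            next
        }
        # Skip scores of 0, whose logarithm is undefined
        $score_col > 0 { printf "%s\t%s\t%.3f\n", $1, $lfc_col, -log($score_col) / log(10) }
    ' "$1"
}

# Toy gene summary with illustrative values only (not real screen results)
demo_summary="$(mktemp)"
printf 'id\tnum\tneg|score\tneg|lfc\n' > "$demo_summary"
printf 'GENE_A\t4\t0.01\t-1.5\n' >> "$demo_summary"
printf 'GENE_B\t4\t0.5\t0.2\n' >> "$demo_summary"

volcano_table "$demo_summary" 'neg|score' 'neg|lfc'
```

Running the helper once on the output with control guides and once on the output without them gives two tables that can be plotted side by side.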
+The column names of the input file, count_matrix.txt, must match the arguments to -c and -t.  The file can also include other samples, which will be ignored.  The tab-delimited format should look like
+||sgRNA||gene||bot1||bot2||top1||top2||
+||sgACTL7A_2||ACTL7A||32||14||10||26||
+||sgACTL7A_3||ACTL7A||44||40||82||118||
+||sgACTL7A_4||ACTL7A||64||61||418||313||
+||sgACTL7A_5||ACTL7A||9||0||17||74||
+||sgACTL7A_6||ACTL7A||42||5||47||166||
+||sgACTL7A_7||ACTL7A||14||32||23||60||
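Because the sample names given to -t and -c must match the header of the count matrix, it can help to check them before running MAGeCK. The sketch below assumes a tab-delimited matrix with the sample names in the first line; `check_samples` is a hypothetical helper and not part of MAGeCK.

```shell
# check_samples MATRIX SAMPLE1,SAMPLE2,...
# Hypothetical helper: reports every requested sample name that is not a
# column in the header of the count matrix. Returns 1 if any are missing.
check_samples() {
    matrix="$1"
    samples="$2"
    missing=0
    header="$(head -n 1 "$matrix")"
    for name in $(printf '%s' "$samples" | tr ',' ' '); do
        # Compare against whole tab-delimited fields so "top1" does not match "top10"
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF -- "$name"; then
            echo "missing column: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Toy matrix built from the first two rows of the example table above
demo_matrix="$(mktemp)"
printf 'sgRNA\tgene\tbot1\tbot2\ttop1\ttop2\n' > "$demo_matrix"
printf 'sgACTL7A_2\tACTL7A\t32\t14\t10\t26\n' >> "$demo_matrix"
printf 'sgACTL7A_3\tACTL7A\t44\t40\t82\t118\n' >> "$demo_matrix"

check_samples "$demo_matrix" "top1,top2,bot1,bot2" && echo "all samples found"
```

The same comma-separated strings that will be passed to -t and -c can be passed to the helper unchanged.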
+The output files include
+  - .gene_summary.txt results summarized by gene (for all genes)
+  - .sgrna_summary.txt results by guide (for all guides); this file can be made into a matrix using a few UNIX commands (see the cut/awk example below)
+Run the .Rmd file to create an .html file which summarizes (only) the top hits and also includes a waterfall plot. To create this summary report, you can either open the .Rmd in RStudio and click on "Knit", or run the command below:
+{{{
+Rscript -e "rmarkdown::render('foo.report.Rmd')"
+}}}
+{{{
+# Get only the columns of interest: Gene, sgrna, control_mean, treat_mean
+cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt
+# Convert the treat_mean column into a (wide) matrix: each row is a gene and each column is a guide
+# tail -n +2 skips the header line; the matrix goes to a separate file (CRISPR_score_matrix.txt is only a placeholder name) so the input is not overwritten while it is being read
+tail -n +2 CRISPR_score_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse | sed 's/,/\t/g' > CRISPR_score_matrix.txt
+}}}
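If groupBy is not installed, the collapse step of the pipeline above can be approximated with awk alone. This is a sketch under the assumption that the input has exactly two tab-delimited columns (gene, value), as produced by the `cut -f 1,4` step; `collapse_by_gene` is a hypothetical helper, and the toy values are illustrative only.

```shell
# collapse_by_gene [FILE]
# Hypothetical helper: turns a two-column (gene, value) table into one row
# per gene, with that gene's values as tab-separated columns.
collapse_by_gene() {
    awk -F '\t' '
        # Remember the order in which genes first appear
        !($1 in row) { order[++n] = $1; row[$1] = $1 }
        # Append this value to the row for its gene
        { row[$1] = row[$1] "\t" $2 }
        END { for (i = 1; i <= n; i++) print row[order[i]] }
    ' "$@"
}

# Toy input: illustrative values only, not real screen data
printf 'GENE_A\t10\nGENE_A\t82\nGENE_B\t5\n' | collapse_by_gene
```

Unlike groupBy, this version does not require the rows for a gene to be adjacent, and it keeps genes in the order of their first appearance.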
+== Method 2 (based on Wang et al., 2015) ==
 Recommendations for this method come from Whitehead Functional Genomics platform.
 …
 }}}
-== Method 2: MAGeCK ==
-Analyze CRISPR genome-wide, or targeted, screen.  As input, MAGeCK requires (raw) counts.  If there are no replicates, MAGeCK will estimate the mean/variance from all the samples, i.e. both conditions.
-  * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]]
-  * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]]
-  * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]]
-==== Test: compare two conditions ====
-  * Common usage to test, or compare, two conditions
-{{{
-# the options -t and -c specificity the treatment and control samples, respectively.
-mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt
-# For paired samples:
-mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --paired
-# note: --paired option is available in version 0.5.9+
-# use the option --normcounts-to-file to write normalized counts (by guide) to a file
-# If control guides are used, these can be specified using the options (below).
-# MAGeCK will use the control guides for normalization and generating the null distribution in running RRA, instead of all the guides.
---control-sgrna controlGuides.txt --norm-method control
-#One option is to run with and without control guides and compare the results, e.g. volcano plot of -log(RRA) vs logFC to see how the control guides compare.
-# A summary of the output (in pdf) is available from running the R script (see below).
-# In some versions of 0.5.9+ an error may occur for the 'paired' option,
-# Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) :
-#'at' and 'labels' lengths differ, 2 != 4
-# a workaround this is to edit the R script is to set the collabel to only two conditions,
-# e.g.  collabel=c("low","high")
-# in the R script for all occurrences
-}}}
-The input file, count_matrix.txt, column names must match arguments to -c and -t, e.g.
-||sgRNA||gene||bot1||bot2||top1||top2||
-||sgACTL7A_2||ACTL7A||32||14||10||26||
-||sgACTL7A_3||ACTL7A||44||40||82||118||
-||sgACTL7A_4||ACTL7A||64||61||418||313||
-||sgACTL7A_5||ACTL7A||9||0||17||74||
-||sgACTL7A_6||ACTL7A||42||5||47||166||
-||sgACTL7A_7||ACTL7A||14||32||23||60||
-The output files include,
-  - .gene_summary.txt results summarized by gene (for all genes)
-  - .sgrna_summary.txt resuls by guide (for all guides); this file can be made into a matrix using a few UNIX commands, e.g.
-Run .Rmd to create a file (.html) which summarizes (only) the top hits, and also includes a waterfall plot. To create this summary report file, you can either open the .Rmd in Rstudio and click on "knit", or run the command below:
-{{{
-Rscript -e "rmarkdown::render('foo.report.Rmd')"
-}}}
-{{{
-#get only the columns of interest: Gene, sgrna, control_mean, treat_mean
-cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt
-#convert the single column into a (wide) matrix, each column is a guide and each row is a gene
-grep -v crispr_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep  -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse |sed 's/,/\t/g' > CRISPR_score_sgRNA.txt
-}}}