Changes between Version 23 and Version 24 of SOP/PooledCRISPR


Timestamp:
01/16/23 17:24:33 (2 years ago)
Author:
gbell
Comment:

--

  • SOP/PooledCRISPR

    v23 v24  
    22= Pooled CRISPR screen analysis =
    33
    4 == Method 1 (based on Wang et al., 2015) ==
     4== Method 1: MAGeCK ==
     5
     6Analyze a genome-wide or targeted CRISPR screen.  As input, MAGeCK requires raw counts.  If there are no replicates, MAGeCK will estimate the mean/variance from all the samples, i.e. both conditions.
     7
     8  * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]]
     9  * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]]
     10  * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]]
     11
     12Note that MAGeCK requires Python 2, so on Whitehead systems, it is only accessible on Ubuntu 18 computers (such as tak and the LSF cluster).
     13
     14
     15==== Test: compare two conditions ====
     16
     17  * Common usage to test, or compare, two conditions
     18
     19
     20{{{
     21
     22# the options -t and -c specify the treatment and control samples, respectively.
     23mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt
     24
     25# For paired samples:
     26mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --paired
     27# note: --paired option is available in version 0.5.9+
     28# use the option --normcounts-to-file to write normalized counts (by guide) to a file
     29
     30# If control guides are used, these can be specified using the options (below).
     31# MAGeCK will use the control guides for normalization and generating the null distribution in running RRA, instead of all the guides.
     32--control-sgrna controlGuides.txt --norm-method control
     33
     34#One option is to run with and without control guides and compare the results, e.g. volcano plot of -log(RRA) vs logFC to see how the control guides compare.
     35
     36# A summary of the output (in pdf) is available from running the R script (see below).
     37# In some versions of 0.5.9+ an error may occur for the 'paired' option,
     38# Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) :
     39#'at' and 'labels' lengths differ, 2 != 4
     40# a workaround is to edit the R script and set collabel to only two conditions,
     41# e.g.  collabel=c("low","high")
     42# in the R script for all occurrences
     43
     44
     45}}}
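The comparison of runs with and without control guides (a volcano plot of -log(RRA) vs logFC) needs a small table per run. The following is a minimal sketch, not part of MAGeCK: the function name volcano_table is hypothetical, and it assumes the gene summary header contains columns named "id", "neg|score" and "neg|lfc". Check the header of your own output before relying on it.

```shell
# Sketch: build a gene / logFC / -log10(RRA score) table from a MAGeCK
# gene summary file, e.g. to compare runs with and without control guides.
# Assumption: the header has columns named "id", "neg|score" and "neg|lfc".
volcano_table() {
    awk -F '\t' '
        NR == 1 {
            # Locate the needed columns by name rather than by position
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("id" in col) || !("neg|score" in col) || !("neg|lfc" in col)) {
                print "expected columns not found" > "/dev/stderr"
                exit 1
            }
            print "gene\tlogFC\tminus_log10_RRA"
            next
        }
        $(col["neg|score"]) > 0 {
            # Rows with a score of 0 are skipped because log(0) is undefined
            printf "%s\t%s\t%.3f\n", $(col["id"]), $(col["neg|lfc"]), -log($(col["neg|score"])) / log(10)
        }' "$1"
}

# Example call (file names are placeholders):
# volcano_table mageck_out.txt.gene_summary.txt > volcano_input.txt
```

Looking columns up by name keeps the sketch working if the column order differs between versions. The resulting table can be plotted in R or any other tool.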
     46
     47
     48The column names in the input file, count_matrix.txt, must match the arguments to -t and -c; the file can also include other samples, which will be ignored.  The tab-delimited format should look like
     49
     50
     51||sgRNA||gene||bot1||bot2||top1||top2||
     52||sgACTL7A_2||ACTL7A||32||14||10||26||
     53||sgACTL7A_3||ACTL7A||44||40||82||118||
     54||sgACTL7A_4||ACTL7A||64||61||418||313||
     55||sgACTL7A_5||ACTL7A||9||0||17||74||
     56||sgACTL7A_6||ACTL7A||42||5||47||166||
     57||sgACTL7A_7||ACTL7A||14||32||23||60||
     58
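Because the labels given to -t and -c must match the column names, it can help to check the header before running mageck test. The following is a minimal sketch. The helper check_samples is hypothetical (not a MAGeCK command) and assumes the first line of the tab-delimited file is the header.

```shell
# Hypothetical helper (not part of MAGeCK): confirm that every sample label
# passed to -t/-c appears in the header of the tab-delimited count matrix.
check_samples() {
    matrix="$1"; shift
    header=$(head -n 1 "$matrix")
    status=0
    # Remaining arguments are comma-separated label lists, e.g. "top1,top2"
    for label in $(printf '%s\n' "$@" | tr ',' '\n'); do
        # Compare against each header field as a whole, fixed string
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF -- "$label"; then
            echo "missing column: $label" >&2
            status=1
        fi
    done
    return $status
}

# Example call, using the labels from the table above:
# check_samples count_matrix.txt top1,top2 bot1,bot2 && echo "all samples found"
```

The function returns a non-zero status if any label is missing, so it can gate the mageck call in a script.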
     59
     60The output files include
     61
     62  - .gene_summary.txt results summarized by gene (for all genes)
     63  - .sgrna_summary.txt results by guide (for all guides); this file can be made into a matrix using a few UNIX commands (see the cut/awk commands further below)
     64
     65Run .Rmd to create a file (.html) which summarizes (only) the top hits, and also includes a waterfall plot. To create this summary report file, you can either open the .Rmd in Rstudio and click on "knit", or run the command below:
     66{{{
     67Rscript -e "rmarkdown::render('foo.report.Rmd')"
     68}}}
     69
     70
     71{{{
     72# Get only the columns of interest: Gene, sgrna, control_mean, treat_mean
     73cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt
     74
     75# Convert the long table (one row per guide) into a wide matrix: each row is a gene and each column is a guide (treat_mean values). The header line is skipped, and the result is written to a new file so the input is not overwritten.
     76grep -v "^Gene" CRISPR_score_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse | sed 's/,/\t/g' > CRISPR_score_matrix.txt
     77}}}
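The long-to-wide step above relies on groupBy, which may not be installed everywhere. As an alternative, here is a minimal awk-only sketch. The function name long_to_wide is hypothetical, and it assumes a tab-delimited input with a header line and the columns Gene, sgrna, control_mean, treat_mean. Unlike the pipeline above, guides are sorted lexically by name (so _10 sorts before _2).

```shell
# Sketch: one row per gene, one tab-separated treat_mean value per guide.
# Assumes input columns: Gene, sgrna, control_mean, treat_mean (header on line 1).
long_to_wide() {
    tab=$(printf '\t')
    tail -n +2 "$1" |
        grep -v -e INTERGENIC -e CTRL0 |
        sort -t "$tab" -k 1,1 -k 2,2 |
        awk -F '\t' '
            # A new gene starts: print the finished row and begin the next one
            $1 != gene { if (NR > 1) print row; gene = $1; row = $1 }
            # Append the treat_mean (column 4) of this guide to the current row
            { row = row "\t" $4 }
            END { if (NR > 0) print row }'
}

# Example call (output file name is a placeholder):
# long_to_wide CRISPR_score_sgRNA.txt > CRISPR_score_matrix.txt
```

Genes with different numbers of guides produce rows of different lengths, as they do with the groupBy version.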
     78
     79
     80== Method 2 (based on Wang et al., 2015) ==
    581
    682Recommendations for this method come from Whitehead Functional Genomics platform.
     
    71147}}}
    72148 
    73 
    74 == Method 2: MAGeCK ==
    75 
    76 Analyze a genome-wide or targeted CRISPR screen.  As input, MAGeCK requires raw counts.  If there are no replicates, MAGeCK will estimate the mean/variance from all the samples, i.e. both conditions.
    77 
    78   * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]]
    79   * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]]
    80   * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]]
    81 
    82 
    83 ==== Test: compare two conditions ====
    84 
    85   * Common usage to test, or compare, two conditions
    86 
    87 
    88 {{{
    89 
    90 # the options -t and -c specify the treatment and control samples, respectively.
    91 mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt
    92 
    93 # For paired samples:
    94 mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --paired
    95 # note: --paired option is available in version 0.5.9+
    96 # use the option --normcounts-to-file to write normalized counts (by guide) to a file
    97 
    98 # If control guides are used, these can be specified using the options (below).
    99 # MAGeCK will use the control guides for normalization and generating the null distribution in running RRA, instead of all the guides.
    100 --control-sgrna controlGuides.txt --norm-method control
    101 
    102 #One option is to run with and without control guides and compare the results, e.g. volcano plot of -log(RRA) vs logFC to see how the control guides compare.
    103 
    104 # A summary of the output (in pdf) is available from running the R script (see below).
    105 # In some versions of 0.5.9+ an error may occur for the 'paired' option,
    106 # Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) :
    107 #'at' and 'labels' lengths differ, 2 != 4
    108 # a workaround is to edit the R script and set collabel to only two conditions,
    109 # e.g.  collabel=c("low","high")
    110 # in the R script for all occurrences
    111 
    112 
    113 }}}
    114 
    115 
    116 The column names in the input file, count_matrix.txt, must match the arguments to -t and -c, e.g.
    117 
    118 
    119 ||sgRNA||gene||bot1||bot2||top1||top2||
    120 ||sgACTL7A_2||ACTL7A||32||14||10||26||
    121 ||sgACTL7A_3||ACTL7A||44||40||82||118||
    122 ||sgACTL7A_4||ACTL7A||64||61||418||313||
    123 ||sgACTL7A_5||ACTL7A||9||0||17||74||
    124 ||sgACTL7A_6||ACTL7A||42||5||47||166||
    125 ||sgACTL7A_7||ACTL7A||14||32||23||60||
    126 
    127 
    128 The output files include,
    129 
    130   - .gene_summary.txt results summarized by gene (for all genes)
    131   - .sgrna_summary.txt results by guide (for all guides); this file can be made into a matrix using a few UNIX commands (see the cut/awk commands further below)
    132 
    133 Run .Rmd to create a file (.html) which summarizes (only) the top hits, and also includes a waterfall plot. To create this summary report file, you can either open the .Rmd in Rstudio and click on "knit", or run the command below:
    134 {{{
    135 Rscript -e "rmarkdown::render('foo.report.Rmd')"
    136 }}}
    137 
    138 
    139 {{{
    140 #get only the columns of interest: Gene, sgrna, control_mean, treat_mean
    141 cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt
    142 
    143 #convert the long table (one row per guide) into a wide matrix: each row is a gene and each column is a guide (treat_mean values). The header line is skipped, and the result is written to a new file so the input is not overwritten.
    144 grep -v "^Gene" CRISPR_score_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse | sed 's/,/\t/g' > CRISPR_score_matrix.txt
    145 }}}
    146 
    147