== Method 1: MAGeCK ==

MAGeCK analyzes genome-wide or targeted CRISPR screens. As input, MAGeCK requires raw counts. If there are no replicates, MAGeCK will estimate the mean/variance from all the samples, i.e. both conditions.

 * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]]
 * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]]
 * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]]

Note that MAGeCK requires Python 2, so on Whitehead systems it is only accessible on Ubuntu 18 computers (such as tak and the LSF cluster).
          
==== Test: compare two conditions ====

 * Common usage: test (compare) two conditions
          
{{{
# The options -t and -c specify the treatment and control samples, respectively.
mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt

# For paired samples (the --paired option is available in version 0.5.9+):
mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --paired

# Use the option --normcounts-to-file to write normalized counts (by guide) to a file.

# If control guides are used, specify them with the two options at the end of the command below.
# MAGeCK will then use the control guides, instead of all the guides, for normalization
# and for generating the null distribution when running RRA.
mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --control-sgrna controlGuides.txt --norm-method control

# One option is to run with and without control guides and compare the results,
# e.g. a volcano plot of -log(RRA) vs logFC to see how the control guides compare.

# A summary of the output (in pdf) is available from running the R script (see below).
# In some versions of 0.5.9+ an error may occur with the --paired option:
#   Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) :
#   'at' and 'labels' lengths differ, 2 != 4
# A workaround is to edit the R script and set collabel to only two conditions,
#   e.g. collabel=c("low","high")
# for all occurrences in the R script.
}}}
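To compare runs with and without control guides, it can help to pull the gene-level log fold change and -log10(RRA score) out of each gene summary before plotting. The following is a minimal sketch, not part of MAGeCK: `volcano_table` is a hypothetical helper name, and the header names `id`, `neg|score` and `neg|lfc` are assumptions about the gene summary file, so check them against the header of your own output.

```shell
# Hypothetical helper: print gene, logFC and -log10(RRA score) from a
# tab-delimited gene summary file. Columns are looked up by header name
# (assumed to be "id", "neg|score" and "neg|lfc"), not by position.
volcano_table() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("id" in col) || !("neg|score" in col) || !("neg|lfc" in col)) {
                print "expected header columns not found" > "/dev/stderr"
                exit 1
            }
            print "gene\tlfc\tminus_log10_score"
            next
        }
        {
            score = $(col["neg|score"])
            # skip rows with a score of 0, where the logarithm is undefined
            if (score > 0)
                printf "%s\t%s\t%.4f\n", $(col["id"]), $(col["neg|lfc"]), -log(score) / log(10)
        }
    ' "$1"
}

# Example (file names are placeholders):
# volcano_table with_controls.gene_summary.txt > with_controls.volcano.txt
# volcano_table no_controls.gene_summary.txt   > no_controls.volcano.txt
```

The two resulting tables can then be plotted side by side in R or any other plotting tool.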
          
The column names of the input file, count_matrix.txt, must match the arguments to -c and -t. The file can also include other samples, which will be ignored. The tab-delimited file should look like:

||sgRNA||gene||bot1||bot2||top1||top2||
||sgACTL7A_2||ACTL7A||32||14||10||26||
||sgACTL7A_3||ACTL7A||44||40||82||118||
||sgACTL7A_4||ACTL7A||64||61||418||313||
||sgACTL7A_5||ACTL7A||9||0||17||74||
||sgACTL7A_6||ACTL7A||42||5||47||166||
||sgACTL7A_7||ACTL7A||14||32||23||60||
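Because the sample labels given to -t and -c must match the header of the count matrix, a quick check before running MAGeCK can catch typos. The following is a minimal sketch, not part of MAGeCK: `check_samples` is a hypothetical helper name, and it assumes a tab-delimited file whose first two columns are the sgRNA and gene names.

```shell
# Hypothetical helper: confirm that every sample label in a comma-separated
# list is a column name in the header of a tab-delimited count matrix.
# Prints each missing label and returns a non-zero status if any is missing.
check_samples() {
    head -n 1 "$1" | awk -F '\t' -v wanted="$2" '
        BEGIN { bad = 0 }
        {
            # columns 1 and 2 are assumed to hold the sgRNA and gene names
            for (i = 3; i <= NF; i++) seen[$i] = 1
            n = split(wanted, w, ",")
            for (j = 1; j <= n; j++)
                if (!(w[j] in seen)) { print "missing sample: " w[j]; bad = 1 }
            exit bad
        }'
}

# Example, using the labels from the mageck test command:
# check_samples count_matrix.txt top1,top2,bot1,bot2 && echo "all samples found"
```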
          
The output files include

 - .gene_summary.txt: results summarized by gene (for all genes)
 - .sgrna_summary.txt: results by guide (for all guides); this file can be converted into a matrix with a few UNIX commands (see the last code block in this section)

Render the .Rmd file to create an .html report that summarizes only the top hits and also includes a waterfall plot. To create this report, either open the .Rmd in RStudio and click "Knit", or run the command below:
{{{
Rscript -e "rmarkdown::render('foo.report.Rmd')"
}}}
          
{{{
# Get only the columns of interest and reorder them: Gene, sgrna, control_mean, treat_mean
cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt

# Convert the long table into a (wide) matrix: each row is a gene and each column is a guide (treat_mean values).
# tail skips the header line; the result is written to a new file so that the input is not overwritten.
# Requires GNU sed (for \t) and groupBy (from bedtools).
tail -n +2 CRISPR_score_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse | sed 's/,/\t/g' > CRISPR_score_sgRNA_matrix.txt
}}}
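If groupBy is not available, the collapse step (one row per gene, one column per guide) can be approximated with awk alone. The following is a minimal sketch under the assumption that the input is a two-column, tab-delimited table of gene and value that is already sorted by gene; `collapse_by_gene` is a hypothetical helper name.

```shell
# Hypothetical helper: read tab-delimited (gene, value) rows sorted by gene
# on standard input and print one row per gene, with that gene's values
# as tab-separated columns.
collapse_by_gene() {
    awk -F '\t' '
        $1 != prev {                    # first row of a new gene
            if (NR > 1) printf "\n"     # finish the previous gene
            printf "%s", $1
            prev = $1
        }
        { printf "\t%s", $2 }           # append this guide value
        END { if (NR > 0) printf "\n" }'
}

# Example (input file name is a placeholder):
# cut -f 1,4 sorted_guides.txt | collapse_by_gene > guide_matrix.txt
```

Genes with different numbers of guides produce rows of different lengths, as with the groupBy version.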
          
== Method 2 (based on Wang et al., 2015) ==
        
        
          