| 4 | | == Method 1 (based on Wang et al., 2015) == |
| | 4 | == Method 1: MAGeCK == |
| | 5 | |
| | 6 | MAGeCK analyzes genome-wide or targeted CRISPR screens. As input, it requires raw counts. If there are no replicates, MAGeCK estimates the mean and variance from all the samples, i.e. both conditions. |
| | 7 | |
| | 8 | * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]] |
| | 9 | * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]] |
| | 10 | * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]] |
| | 11 | |
| | 12 | Note that MAGeCK requires Python 2, so on Whitehead systems, it is only accessible on Ubuntu 18 computers (such as tak and the LSF cluster). |
| | 13 | |
| | 14 | |
| | 15 | ==== Test: compare two conditions ==== |
| | 16 | |
| | 17 | * Common usage to test, or compare, two conditions |
| | 18 | |
| | 19 | |
| | 20 | {{{ |
| | 21 | |
| | 22 | # The options -t and -c specify the treatment and control samples, respectively. |
| | 23 | mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt |
| | 24 | |
| | 25 | # For paired samples: |
| | 26 | mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --paired |
| | 27 | # note: --paired option is available in version 0.5.9+ |
| | 28 | # use the option --normcounts-to-file to write normalized counts (by guide) to a file |
| | 29 | |
| | 30 | # If control guides are used, these can be specified using the options (below). |
| | 31 | # MAGeCK will use the control guides for normalization and generating the null distribution in running RRA, instead of all the guides. |
| | 32 | --control-sgrna controlGuides.txt --norm-method control |
| | 33 | |
| | 34 | # One option is to run with and without control guides and compare the results, e.g. with a volcano plot of -log(RRA) vs logFC to see how the control guides behave. |
| | 35 | |
| | 36 | # A summary of the output (in pdf) is available from running the R script (see below). |
| | 37 | # In some versions of 0.5.9+, the R script may fail when the --paired option is used: |
| | 38 | # Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) : |
| | 39 | # 'at' and 'labels' lengths differ, 2 != 4 |
| | 40 | # A workaround is to edit the R script and set collabel to only two conditions, |
| | 41 | # e.g. collabel=c("low","high"), |
| | 42 | # for every occurrence in the script. |
| | 43 | |
| | 44 | |
| | 45 | }}} |
| | 46 | |
| | 47 | |
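The control-guide options above need a file listing the control sgRNAs. As a minimal sketch, assuming that file holds one sgRNA ID per line (check `mageck test --help` for your version) and that control guides carry the hypothetical gene label `CTRL` in the count matrix, the list could be derived from the matrix itself. The file `counts_demo.txt` and its contents are made up for illustration.

```shell
# Hypothetical count matrix (tab-delimited); the gene label "CTRL" is an assumption.
printf 'sgRNA\tgene\tbot1\ttop1\n' > counts_demo.txt
printf 'sgACTL7A_2\tACTL7A\t32\t10\n' >> counts_demo.txt
printf 'sgCTRL_1\tCTRL\t20\t22\n' >> counts_demo.txt
printf 'sgCTRL_2\tCTRL\t15\t14\n' >> counts_demo.txt

# Skip the header row and print the sgRNA ID of every row whose gene column is CTRL.
awk -F '\t' 'NR > 1 && $2 == "CTRL" { print $1 }' counts_demo.txt > controlGuides.txt

cat controlGuides.txt
```

Replace `CTRL` with whatever label your library uses for non-targeting or intergenic guides.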
| | 48 | The column names in the input file, count_matrix.txt, must match the arguments to -c and -t; the file can also include other samples, which will be ignored. The tab-delimited file should look like: |
| | 49 | |
| | 50 | |
| | 51 | ||sgRNA||gene||bot1||bot2||top1||top2|| |
| | 52 | ||sgACTL7A_2||ACTL7A||32||14||10||26|| |
| | 53 | ||sgACTL7A_3||ACTL7A||44||40||82||118|| |
| | 54 | ||sgACTL7A_4||ACTL7A||64||61||418||313|| |
| | 55 | ||sgACTL7A_5||ACTL7A||9||0||17||74|| |
| | 56 | ||sgACTL7A_6||ACTL7A||42||5||47||166|| |
| | 57 | ||sgACTL7A_7||ACTL7A||14||32||23||60|| |
| | 58 | |
| | 59 | |
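Because the sample labels passed to -t and -c must match the header of the count matrix, it can help to check them before starting a run. The following is a minimal sketch, not part of MAGeCK; the helper name `check_samples` is hypothetical, and the demo file reuses the layout shown above.

```shell
# Demo file with the same header layout as the table above (tab-delimited).
printf 'sgRNA\tgene\tbot1\tbot2\ttop1\ttop2\n' > count_matrix.txt
printf 'sgACTL7A_2\tACTL7A\t32\t14\t10\t26\n' >> count_matrix.txt

# check_samples FILE LIST
# Prints "ok" if every label in the comma-separated LIST is a column name in
# FILE's header; otherwise prints the missing labels and returns 1.
check_samples() {
    file=$1
    labels=$2
    missing=""
    header=$(head -n 1 "$file")
    for label in $(printf '%s' "$labels" | tr ',' ' '); do
        # Put one header field per line and require an exact, whole-field match.
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF -- "$label"; then
            missing="$missing $label"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

check_samples count_matrix.txt top1,top2
check_samples count_matrix.txt bot1,bot3 || true
```

The first call prints `ok`; the second reports `bot3`, which is not a column in the demo file.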
| | 60 | The output files include |
| | 61 | |
| | 62 | - .gene_summary.txt results summarized by gene (for all genes) |
| | 63 | - .sgrna_summary.txt results by guide (for all guides); this file can be made into a matrix using a few UNIX commands, e.g. the cut/awk commands at the end of this section |
| | 64 | |
| | 65 | Render the .Rmd file to create an .html report that summarizes only the top hits and includes a waterfall plot. To create this report, either open the .Rmd in RStudio and click "Knit", or run the command below: |
| | 66 | {{{ |
| | 67 | Rscript -e "rmarkdown::render('foo.report.Rmd')" |
| | 68 | }}} |
| | 69 | |
| | 70 | |
| | 71 | {{{ |
| | 72 | # Get only the columns of interest: Gene, sgrna, control_mean, treat_mean |
| | 73 | cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt |
| | 74 | |
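| | 75 | # Convert the long table into a wide matrix of treat_mean values: one row per gene, one column per guide. |
| | 76 | # tail skips the header row; groupBy is the bedtools utility. The output name CRISPR_score_matrix.txt is illustrative and is kept different from the input so the input is not overwritten. |
| | 76 | tail -n +2 CRISPR_score_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse | sed 's/,/\t/g' > CRISPR_score_matrix.txt |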
| | 77 | }}} |
| | 78 | |
| | 79 | |
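The long-to-wide conversion above relies on `groupBy` from bedtools. On a system without bedtools, the same reshaping can be sketched with `sort` and `awk` alone. This is an illustrative alternative, not the pipeline above: the file names `long.txt` and `wide.txt` and the values are hypothetical, and the input is assumed to have the columns Gene, sgrna, control_mean, treat_mean.

```shell
# Hypothetical long-format table (tab-delimited, with a header row).
printf 'Gene\tsgrna\tcontrol_mean\ttreat_mean\n' > long.txt
printf 'ACTL7A\tsgACTL7A_3\t42.0\t100.0\n' >> long.txt
printf 'ACTL7A\tsgACTL7A_2\t23.0\t18.0\n' >> long.txt
printf 'GENEB\tsgGENEB_1\t10.0\t5.5\n' >> long.txt

# Skip the header, sort by gene and then by guide name, and collect the
# treat_mean values (column 4) of each gene onto a single tab-delimited row.
tail -n +2 long.txt | sort -t "$(printf '\t')" -k 1,1 -k 2,2 |
    awk -F '\t' '
        $1 != gene { if (NR > 1) print row; gene = $1; row = $1 }
        { row = row "\t" $4 }
        END { if (NR > 0) print row }
    ' > wide.txt

cat wide.txt
```

Guides are ordered here by a plain text sort of the guide name, so `_10` sorts before `_2`; the pipeline above instead splits off the guide index and sorts it numerically. Genes with different numbers of guides give rows of different lengths.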
| | 80 | == Method 2 (based on Wang et al., 2015) == |
| 73 | | |
| 74 | | == Method 2: MAGeCK == |
| 75 | | |
| 76 | | Analyze CRISPR genome-wide, or targeted, screen. As input, MAGeCK requires (raw) counts. If there are no replicates, MAGeCK will estimate the mean/variance from all the samples, i.e. both conditions. |
| 77 | | |
| 78 | | * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]] |
| 79 | | * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]] |
| 80 | | * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]] |
| 81 | | |
| 82 | | |
| 83 | | ==== Test: compare two conditions ==== |
| 84 | | |
| 85 | | * Common usage to test, or compare, two conditions |
| 86 | | |
| 87 | | |
| 88 | | {{{ |
| 89 | | |
| 90 | | # the options -t and -c specificity the treatment and control samples, respectively. |
| 91 | | mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt |
| 92 | | |
| 93 | | # For paired samples: |
| 94 | | mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_out.txt --paired |
| 95 | | # note: --paired option is available in version 0.5.9+ |
| 96 | | # use the option --normcounts-to-file to write normalized counts (by guide) to a file |
| 97 | | |
| 98 | | # If control guides are used, these can be specified using the options (below). |
| 99 | | # MAGeCK will use the control guides for normalization and generating the null distribution in running RRA, instead of all the guides. |
| 100 | | --control-sgrna controlGuides.txt --norm-method control |
| 101 | | |
| 102 | | #One option is to run with and without control guides and compare the results, e.g. volcano plot of -log(RRA) vs logFC to see how the control guides compare. |
| 103 | | |
| 104 | | # A summary of the output (in pdf) is available from running the R script (see below). |
| 105 | | # In some versions of 0.5.9+ an error may occur for the 'paired' option, |
| 106 | | # Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) : |
| 107 | | #'at' and 'labels' lengths differ, 2 != 4 |
| 108 | | # a workaround this is to edit the R script is to set the collabel to only two conditions, |
| 109 | | # e.g. collabel=c("low","high") |
| 110 | | # in the R script for all occurrences |
| 111 | | |
| 112 | | |
| 113 | | }}} |
| 114 | | |
| 115 | | |
| 116 | | The input file, count_matrix.txt, column names must match arguments to -c and -t, e.g. |
| 117 | | |
| 118 | | |
| 119 | | ||sgRNA||gene||bot1||bot2||top1||top2|| |
| 120 | | ||sgACTL7A_2||ACTL7A||32||14||10||26|| |
| 121 | | ||sgACTL7A_3||ACTL7A||44||40||82||118|| |
| 122 | | ||sgACTL7A_4||ACTL7A||64||61||418||313|| |
| 123 | | ||sgACTL7A_5||ACTL7A||9||0||17||74|| |
| 124 | | ||sgACTL7A_6||ACTL7A||42||5||47||166|| |
| 125 | | ||sgACTL7A_7||ACTL7A||14||32||23||60|| |
| 126 | | |
| 127 | | |
| 128 | | The output files include, |
| 129 | | |
| 130 | | - .gene_summary.txt results summarized by gene (for all genes) |
| 131 | | - .sgrna_summary.txt resuls by guide (for all guides); this file can be made into a matrix using a few UNIX commands, e.g. |
| 132 | | |
| 133 | | Run .Rmd to create a file (.html) which summarizes (only) the top hits, and also includes a waterfall plot. To create this summary report file, you can either open the .Rmd in Rstudio and click on "knit", or run the command below: |
| 134 | | {{{ |
| 135 | | Rscript -e "rmarkdown::render('foo.report.Rmd')" |
| 136 | | }}} |
| 137 | | |
| 138 | | |
| 139 | | {{{ |
| 140 | | #get only the columns of interest: Gene, sgrna, control_mean, treat_mean |
| 141 | | cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt |
| 142 | | |
| 143 | | #convert the single column into a (wide) matrix, each column is a guide and each row is a gene |
| 144 | | grep -v crispr_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse |sed 's/,/\t/g' > CRISPR_score_sgRNA.txt |
| 145 | | }}} |
| 146 | | |
| 147 | | |