| 4 | == Method 1: MAGeCK == |
| 5 | |
| 6 | Analyze a genome-wide or targeted CRISPR screen. As input, MAGeCK requires raw counts. If there are no replicates, MAGeCK estimates the mean/variance from all the samples, i.e. both conditions. |
| 7 | |
| 8 | * [[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | Publication]] |
| 9 | * [[https://sourceforge.net/projects/mageck/ | MAGeCK Home/Download Page]] |
| 10 | * [[https://sourceforge.net/p/mageck/wiki/demo/ | Tutorial]] |
| 11 | |
| 12 | Note that MAGeCK requires Python 2, so on Whitehead systems, it is only accessible on Ubuntu 18 computers (such as tak and the LSF cluster). |
| 13 | |
| 14 | |
| 15 | ==== Test: compare two conditions ==== |
| 16 | |
| 17 | * Common usage to test, or compare, two conditions |
| 18 | |
| 19 | |
| 20 | {{{ |
| 21 | |
| 22 | # The options -t and -c specify the treatment and control samples, respectively; -n sets the prefix of the output files. |
| 23 | mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck |
| 24 | |
| 25 | # For paired samples: |
| 26 | mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck --paired |
| 27 | # Note: the --paired option is available in version 0.5.9+. |
| 28 | # Use the option --normcounts-to-file to write normalized counts (by guide) to a file. |
| 29 | |
| 30 | # If control guides are used, they can be specified with --control-sgrna and --norm-method control. |
| 31 | # MAGeCK will then use the control guides, instead of all the guides, for normalization and for generating the null distribution when running RRA. |
| 32 | mageck test -k count_matrix.txt -t top1,top2 -c bot1,bot2 -n mageck_ctrl --control-sgrna controlGuides.txt --norm-method control |
| 33 | |
| 34 | # One option is to run with and without control guides and compare the results, e.g. with a volcano plot of -log(RRA) vs logFC, to see how the control guides compare. |
| 35 | |
| 36 | # A summary of the output (in pdf) is available from running the R script (see below). |
| 37 | # In some 0.5.9+ versions, the R script may fail with the following error when the --paired option was used: |
| 38 | # Error in axis(1, at = 1:length(vali), labels = (collabel), las = 2) : |
| 39 | # 'at' and 'labels' lengths differ, 2 != 4 |
| 40 | # A workaround is to edit the R script and set collabel to only two conditions, |
| 41 | # e.g. collabel=c("low","high"), |
| 42 | # at every occurrence in the script. |
| 43 | |
| 44 | |
| 45 | }}} |
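
The volcano-plot comparison mentioned in the comments above needs a per-gene table of log fold change and -log10(RRA score). A minimal sketch of how such a table could be pulled from the gene summary is shown here. It assumes the gene summary has a header with the columns id, pos|score and pos|lfc (or the neg| equivalents); the function name volcano_table is hypothetical and not part of MAGeCK.

```shell
# volcano_table FILE [pos|neg]
# Prints "id <TAB> lfc <TAB> -log10(score)" for every gene in a MAGeCK
# gene summary. Columns are located by header name, not by position.
# Assumption: the header contains id, <dir>|score and <dir>|lfc.
volcano_table() {
    awk -F '\t' -v dir="${2:-pos}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            lfc = dir "|lfc"
            score = dir "|score"
            if (!("id" in col) || !(lfc in col) || !(score in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 1
            }
            print "id\tlfc\tneg_log10_score"
            next
        }
        {
            s = $col[score]
            # rows with a score of 0 are skipped, since log(0) is undefined
            if (s > 0) printf "%s\t%s\t%.3f\n", $col["id"], $col[lfc], -log(s) / log(10)
        }
    ' "$1"
}
```

For example, `volcano_table mageck.gene_summary.txt pos > with_all_guides.txt` and the same call on the control-guide run give two tables that can be plotted side by side.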
| 46 | |
| 47 | |
| 48 | The column names of the input file, count_matrix.txt, must match the arguments to -c and -t. The file can also include other samples, which will be ignored. The tab-delimited format should look like: |
| 49 | |
| 50 | |
| 51 | ||sgRNA||gene||bot1||bot2||top1||top2|| |
| 52 | ||sgACTL7A_2||ACTL7A||32||14||10||26|| |
| 53 | ||sgACTL7A_3||ACTL7A||44||40||82||118|| |
| 54 | ||sgACTL7A_4||ACTL7A||64||61||418||313|| |
| 55 | ||sgACTL7A_5||ACTL7A||9||0||17||74|| |
| 56 | ||sgACTL7A_6||ACTL7A||42||5||47||166|| |
| 57 | ||sgACTL7A_7||ACTL7A||14||32||23||60|| |
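
Because the sample names given to -t and -c must appear in the header of the count matrix, it can help to check them before starting a run. The following is a minimal sketch of such a check; check_samples is a hypothetical helper, not a MAGeCK command, and it assumes a tab-delimited file whose first line is the header.

```shell
# check_samples FILE LABELS
# LABELS is a comma-separated list, as passed to -t or -c.
# Reports every label that is not a column name in the header of FILE
# and returns a non-zero status if any label is missing.
check_samples() (
    file=$1
    status=0
    for label in $(printf '%s' "$2" | tr ',' ' '); do
        if ! awk -F '\t' -v want="$label" '
                NR == 1 { for (i = 1; i <= NF; i++) if ($i == want) found = 1; exit }
                END     { exit !found }' "$file"; then
            echo "missing column: $label" >&2
            status=1
        fi
    done
    exit "$status"
)
```

Usage: `check_samples count_matrix.txt top1,top2 && check_samples count_matrix.txt bot1,bot2`.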
| 58 | |
| 59 | |
| 60 | The output files include: |
| 61 | |
| 62 | - .gene_summary.txt: results summarized by gene (for all genes) |
| 63 | - .sgrna_summary.txt: results by guide (for all guides); this file can be made into a matrix using a few UNIX commands (see the commands further below) |
| 64 | |
| 65 | Run the .Rmd file to create an .html report that summarizes (only) the top hits and also includes a waterfall plot. To create this report, either open the .Rmd in RStudio and click "Knit", or run the command below, where foo stands for the output prefix: |
| 66 | {{{ |
| 67 | Rscript -e "rmarkdown::render('foo.report.Rmd')" |
| 68 | }}} |
| 69 | |
| 70 | |
| 71 | {{{ |
| 72 | # Get only the columns of interest and reorder them: Gene, sgrna, control_mean, treat_mean |
| 73 | cut -f 1,2,5,6 mageck.sgrna_summary.txt | awk '{print $2"\t"$1"\t"$3"\t"$4}' > CRISPR_score_sgRNA.txt |
| 74 | |
| 75 | # Convert the long table into a (wide) matrix of treat_mean values: each row is a gene and each column is a guide. The header line is skipped, and the result goes to a new file because a pipeline cannot safely overwrite the file it reads. |
| 76 | tail -n +2 CRISPR_score_sgRNA.txt | sed 's/_/\t/' | sort -k 1,1 -k 2,2 -k 3,3n | awk -F '\t' '{print $1"\t"$2"_"$3"\t"$4"\t"$5}' | grep -v INTERGENIC | grep -v CTRL0 | cut -f 1,4 | groupBy -g 1 -c 2 -o collapse | sed 's/,/\t/g' > CRISPR_score_sgRNA_matrix.txt |
| 77 | }}} |
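
The pipeline above relies on groupBy, which is not part of the standard UNIX tools. Where it is not installed, the long-to-wide step can be sketched with awk alone. The function name long_to_wide is hypothetical; the sketch assumes a tab-delimited input with a header line, the gene in column 1 and the value in column 4, and it does not sort the guides or filter out INTERGENIC/CTRL0 rows.

```shell
# long_to_wide FILE
# Prints one line per gene: the gene name followed by its column-4 values,
# tab-separated, in the order the guides appear in FILE.
# Genes are printed in order of first appearance.
long_to_wide() {
    tail -n +2 "$1" | awk -F '\t' '
        !($1 in row) { order[++n] = $1; row[$1] = $1 }   # first time this gene is seen
        { row[$1] = row[$1] "\t" $4 }                    # append the value for this guide
        END { for (i = 1; i <= n; i++) print row[order[i]] }
    '
}
```

Usage: `long_to_wide CRISPR_score_sgRNA.txt > wide_matrix.txt`. Rows may have different lengths when genes have different numbers of guides.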
| 78 | |
| 79 | |
| 80 | == Method 2 (based on Wang et al., 2015) == |