Changes between Version 1 and Version 2 of SOP/MassSpec

Timestamp: 05/09/19 10:47:50 (6 years ago)
Author: gbell
Comment: --

SOP/MassSpec

-              v1
+              v2
+Label-free mass spectrometry data analysis with data from Scaffold.
+ 1. Quantitative Methods.
+ 2. Normalization.
+ 3. Differentially expressed proteins.
+==== Differential protein expression (with mass spec) ====
+This method is for label-free samples from our Proteomics Core Facility, which has some [[http://massspec.wi.mit.edu/documents/Scaffoldhowto060513.pdf|Scaffold quick instructions]].
+  * Using the (free) Scaffold Viewer (available from [[http://www.proteomesoftware.com/products/free-viewer|Proteome Software]], with a large [[http://www.proteomesoftware.com/pdf/scaffold_users_guide.pdf|User's Manual]]), open the sf3 file.
+  * By default, three filters prevent all mapped proteins from being displayed:
+    * Protein Threshold (default = 99%)
+    * Min # Peptides (default = 2)
+    * Peptide Threshold (default = 95%)
+  * The default settings for these filters are typically fine.
+  * Scaffold has multiple display and quantification options.
+    * From the Scaffold User's Manual:
+      * Spectrum Counting methods are the most reliable in answering the question, "Is anything changing between experimental conditions?".
+      * Precursor Ion Intensity quantification methods are very reliable in answering the question, "How much is the amount of change I am dealing with?"
+      * The Total Ion Count (TIC) methods can answer both questions but not very well.
+    * For a quick QC check:
+      * Look at the first few most highly expressed proteins.
+      * Are they within 2-fold or so of each other?
+      * If some samples have much lower or higher counts, their sensitivity may differ so much that samples may not be comparable.
+    * Under Display Options, select "Total Spectrum Count"
+    * For Spectrum Counting, click on the "Quantitative Analysis" icon (showing a bar graph).
+       * Keep "Use Normalization" checked.
+       * For Quantitative Method, select "Total Spectra".
+       * Click the Apply button.
+       * Next to Display Option: select Quantitative Value (Normalized Total Spectra)
+       * Export (top menu) => Current View.
+       * Use this file to identify differentially expressed proteins (to be explained below).
+    * For Precursor Ion Intensity quantification, click on the "Quantitative Analysis" icon (showing a bar graph).
+       * Keep "Use Normalization" checked.
+       * For Quantitative Method, select "Top 3 Precursor Intensity".
+       * Click the Apply button.
+       * Next to Display Option: select Quantitative Value (Normalized Top 3 Precursor Intensity)
+       * Export (top menu) => Current View.
+       * Use this file for visualization (to be explained below).
+    * To identify differentially expressed proteins, use the normalized Total Spectrum Count
+      * Recommended statistic: t-test on log2 transformed values
+      * Correct p-values with FDR (or an alternate method)
+    * For pathway analysis: [[https://david.ncifcrf.gov/|DAVID]] usually works fine.
+    * For visualization:
+      * Draw a heatmap (Cluster3.0 -> Java TreeView) using the normalized Top 3 Precursor Intensities.
+      * Draw a scatterplot using the normalized Top 3 Precursor Intensities, highlighting the differentially expressed proteins (identified from the normalized Total Spectrum Count).
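Two of the steps above, the quick QC check and the FDR correction, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original procedure: the function names, the example numbers, and the 0.05 cutoff are all assumptions, and the QC function reflects one reading of "within 2-fold" (the ratio of the highest to the lowest count of one protein across samples). The raw p-values would come from the t-test on log2-transformed values, which is not shown here.

```python
def fold_spread(counts):
    """Ratio of the highest to the lowest count of one protein across samples.

    Returns infinity if any sample has a count of 0.
    """
    lowest = min(counts)
    return float("inf") if lowest == 0 else max(counts) / lowest


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up spectrum counts of one abundant protein in three samples.
print(fold_spread([120, 95, 210]) > 2)  # flags a spread larger than 2-fold

# Made-up raw p-values for four proteins.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
significant = [i for i, q in enumerate(adj_p) if q <= 0.05]  # assumed cutoff
print(adj_p, significant)
```

In practice the same adjustment is available in standard statistics packages; the hand-written version is only meant to show what the correction does.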
+==== Recommendations from Northeastern (May Institute, Vitek Lab)  ====
+  * The best input is peptide-level "peak intensities", which can be any continuous metric, such as Scaffold's
+    * Average Precursor Intensity
+    * Total Precursor Intensity
+    * Top Three Precursor Intensities
+  * Ideal analysis pipeline is to input these values into MSstats for pre-processing, statistics, and data visualization
+  * Preprocessing steps recommended by (performed by) MSstats:
+    * Log2 transform
+    * Median-normalize across samples and runs (ignoring any 0s)
+    * Convert all 0s to NA
+    * Censor low measurements
+      * Get median
+      * Get the 99.9th percentile (or another high percentile) to identify the right tail of the distribution ("r")
+      * Get threshold of left side ("l") of the distribution (2*median - r)
+      * Censor all values less than "l"
+    * Impute all missing values using an MNAR (missing not at random) method, such as an accelerated failure time model
+    * Summarize all features of a protein using Tukey's median polish (TMP), but ignore proteins with only 1 peptide (or risk increased false positive rate)
+  * Model each protein with a linear mixed-effects model
+    * Limma does a good job too, but it doesn't handle all the experimental designs handled by MSstats
+  * Use the model to calculate fold changes and raw p-values
+  * Correct all p-values with FDR (BH method)
+  * Draw summary plots (volcano plots, MA plots)
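The first preprocessing steps listed above (log2 transform, zeros to NA, median normalization, and censoring below l = 2*median - r) can be sketched as follows. This is an illustration of the arithmetic only, not MSstats code: the function names, the use of `None` for NA, the choice to shift each sample to the median of the sample medians, and the linear-interpolation percentile are all assumptions. Imputation, Tukey's median polish, and the mixed-effects model are left to MSstats.

```python
import math
import statistics


def log2_or_none(matrix):
    """Log2-transform intensities; zeros become None (standing in for NA)."""
    return [[math.log2(v) if v > 0 else None for v in row] for row in matrix]


def median_normalize(log_matrix):
    """Shift each sample (column) so that all sample medians are equal."""
    n_cols = len(log_matrix[0])
    col_medians = [
        statistics.median(row[j] for row in log_matrix if row[j] is not None)
        for j in range(n_cols)
    ]
    target = statistics.median(col_medians)  # assumed common reference
    return [
        [None if row[j] is None else row[j] - col_medians[j] + target
         for j in range(n_cols)]
        for row in log_matrix
    ]


def percentile(sorted_values, q):
    """Percentile q (0-100) of a sorted list, with linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def censoring_threshold(log_matrix, upper_percentile=99.9):
    """Left threshold l = 2*median - r, where r is the upper percentile."""
    observed = sorted(v for row in log_matrix for v in row if v is not None)
    r = percentile(observed, upper_percentile)
    return 2 * statistics.median(observed) - r


def censor_below(log_matrix, threshold):
    """Replace every value below the threshold with None."""
    return [[None if v is None or v < threshold else v for v in row]
            for row in log_matrix]


# Made-up intensities: rows are peptides, columns are two samples.
raw = [[1024, 2048], [4096, 8192], [0, 32768], [16384, 65536]]
normalized = median_normalize(log2_or_none(raw))
threshold = censoring_threshold(normalized)
print(threshold)
print(censor_below(normalized, threshold))
```

The threshold mirrors the right tail around the median, so values further below the median than the 99.9th percentile is above it are treated as censored.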
+==== Preparing and processing an experiment with MSstats  ====
+  * Export peptide-level intensity values from your favorite MS quantification software.
+  * Create a peptide intensity file
+      * Organize the dataset so the first three columns are Gene.symbol, Protein.Accession, and Peptide.sequence
+        * If needed, convert each protein accession to a gene symbol
+        * Replace any intensities shown as "-" with 0
+        * If Excel is used, check that gene symbols aren't being converted to dates
+      * After the first 3 columns, the remaining columns hold intensities, one column per sample
+  * To avoid losing information about peptides that could have originated from multiple proteins and/or genes, merge peptide rows representing more than 1 protein and/or gene
+     * Sample command (groupBy is part of bedtools): sort -k3,3 Peptide_intensities.matrix.txt | groupBy -g 3 -c 1,2,3,4,5,6,7,8,9,10 -o distinct >| Peptide_intensities.matrix.mergedByPeptide.txt
+     * Make sure that all rows of the output file are unique.
+  * Create a sample description file
+     * Columns are  Run, Condition, BioReplicate
+     * See MSstats documentation on how to use these fields to represent technical replicates, biological replicates, and paired designs.
+     * Replication is not required for subsequent protein quantification but is required for statistical analysis.
+  * Run MSstats using peptide intensities and sample description as input files.
+    * For sample code, see **/nfs/BaRC_code/R/analyze_MS_with_MSstats/analyze_MS_with_MSstats.R**
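Before running MSstats, the two input files can be sanity-checked with a short script. The sketch below is not the BaRC script referenced above. It assumes tab-delimited files with a header row, interprets "all rows unique" as each peptide sequence appearing on only one row, and assumes that the Run values in the sample description are expected to match the intensity column names. The sample names, conditions, and peptide sequences are made up.

```python
import csv
import io
from collections import Counter


def read_tsv(text):
    """Parse tab-delimited text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def duplicated_peptides(matrix_rows):
    """Return peptide sequences (third column) found on more than one row."""
    counts = Counter(row[2] for row in matrix_rows[1:])
    return sorted(pep for pep, n in counts.items() if n > 1)


def unmatched_runs(matrix_rows, annotation_rows):
    """Return names present in only one of: intensity columns, Run column."""
    sample_columns = set(matrix_rows[0][3:])
    run_index = annotation_rows[0].index("Run")
    runs = {row[run_index] for row in annotation_rows[1:]}
    return sorted(sample_columns ^ runs)


# Hypothetical peptide intensity file: three ID columns, then one per sample.
matrix_text = "\n".join([
    "Gene.symbol\tProtein.Accession\tPeptide.sequence\tctrl_1\tctrl_2\ttreat_1",
    "ACTB\tP60709\tAGFAGDDAPR\t1500\t1700\t900",
    "GAPDH\tP04406\tGALQNIIPASTGAAK\t800\t0\t1200",
])

# Hypothetical sample description file.
annotation_text = "\n".join([
    "Run\tCondition\tBioReplicate",
    "ctrl_1\tcontrol\t1",
    "ctrl_2\tcontrol\t2",
    "treat_1\ttreated\t3",
])

matrix_rows = read_tsv(matrix_text)
annotation_rows = read_tsv(annotation_text)
print(duplicated_peptides(matrix_rows))              # peptides listed twice
print(unmatched_runs(matrix_rows, annotation_rows))  # names lacking a match
```

Both checks print an empty list when the files are consistent under these assumptions.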
+==== References  ====
+  * Choi et al., 2017 [[https://pubs.acs.org/doi/abs/10.1021/acs.jproteome.6b00881|ABRF Proteome Informatics Research Group (iPRG) 2015 Study: Detection of Differentially Abundant Proteins in Label-Free Quantitative LC−MS/MS Experiments]]
+  * MSstats at [[https://bioconductor.org/packages/release/bioc/html/MSstats.html|Bioconductor]] and [[https://github.com/MeenaChoi/MSstats|GitHub]]
+  * Other references on these topics
+    * Lazar et al., 2016 [[https://pubs.acs.org/doi/full/10.1021/acs.jproteome.5b00981|Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies]]
+    * Wei et al., 2018  [[https://www.nature.com/articles/s41598-017-19120-0|Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data]]