Changes between Version 1 and Version 2 of SOP/MassSpec


Timestamp:
05/09/19 10:47:50 (5 years ago)
Author:
gbell
Comment:

--

  • SOP/MassSpec

    v1 v2  
    1 Label-free mass spectrometry data analysis with data from Scaffold.
    2 1. Quantitative Methods.
    3 2. Normalization.
    4 3. Differentially expressed proteins.
     1==== Differential protein expression (with mass spec) ====
     2
     3This method is for label-free samples from our Proteomics Core Facility, which has some [[http://massspec.wi.mit.edu/documents/Scaffoldhowto060513.pdf|Scaffold quick instructions]].
     4
     5  * Using the (free) Scaffold Viewer (available from [[http://www.proteomesoftware.com/products/free-viewer|Proteome Software]], with a large [[http://www.proteomesoftware.com/pdf/scaffold_users_guide.pdf|User's Manual]]), open the sf3 file.
     6
     7  * By default, three filters prevent all mapped proteins from being displayed:
     8    * Protein Threshold (default = 99%)
     9    * Min # Peptides (default = 2)
     10    * Peptide Threshold (default = 95%)
     11  * These filters typically work well at their default settings.
     12
     13  * Scaffold has multiple display and quantification options.
     14    * From the Scaffold User's Manual:
     15      * Spectrum Counting methods are the most reliable in answering the question, "Is anything changing between experimental conditions?".
     16      * Precursor Ion Intensity quantification methods are very reliable in answering the question, "How much is the amount of change I am dealing with?"
     17      * The Total Ion Count (TIC) methods can answer both questions but not very well.
     18    * For a quick QC check:
     19      * Look at the first few most highly expressed proteins. 
     20      * Are they within 2-fold or so of each other? 
     21      * If some samples have much lower or higher counts, their sensitivity may differ so much that samples may not be comparable.
     22
     23    * Under Display Options, select "Total Spectrum Count"
     24
     25    * For Spectrum Counting, click on the "Quantitative Analysis" icon (showing a bar graph).
     26       * Keep "Use Normalization" checked.
     27       * For Quantitative Method, select "Total Spectra".
     28       * Click the Apply button.
     29       * Next to Display Option: select Quantitative Value (Normalized Total Spectra)
     30       * Export (top menu) => Current View.
     31       * Use this file to identify differentially expressed proteins (to be explained below).
     32
     33    * For Precursor Ion Intensity quantification, click on the "Quantitative Analysis" icon (showing a bar graph).
     34       * Keep "Use Normalization" checked.
     35       * For Quantitative Method, select "Top 3 Precursor Intensity".
     36       * Click the Apply button.
     37       * Next to Display Option: select Quantitative Value (Normalized Top 3 Precursor Intensity)
     38       * Export (top menu) => Current View.
     39       * Use this file for visualization (to be explained below).
     40 
     41    * To identify differentially expressed proteins, use the normalized Total Spectrum Count
     42      * Recommended statistic: t-test on log2 transformed values
     43      * Correct p-values with FDR (or an alternate method)
     44
     45    * For pathway analysis: [[https://david.ncifcrf.gov/|DAVID]] usually works fine.
     46
     47    * For visualization:
     48      * Draw a heatmap (Cluster3.0 -> Java TreeView) using the normalized Top 3 Precursor Intensities.
     49      * Draw a scatterplot using the normalized Top 3 Precursor Intensities, highlighting the differentially expressed proteins (from the Total Spectrum Count analysis).
     50
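The FDR correction step above can be sketched in a few lines. This is a minimal illustration of the Benjamini-Hochberg adjustment, assuming the raw p-values (for example, from t-tests on log2-transformed normalized spectrum counts) have already been computed elsewhere. The function name `bh_adjust` is hypothetical and is not part of Scaffold or any package mentioned here.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
    for p, q in zip(raw, bh_adjust(raw)):
        print(f"raw={p:.3f} adjusted={q:.3f}")
```

Proteins whose adjusted p-value falls below the chosen FDR cutoff would then be the ones highlighted in the scatterplot.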
     51
     52==== Recommendations from Northeastern (May Institute, Vitek Lab)  ====
     53
     54  * Best input is peptide-level "peak intensities", which are any continuous metric, such as Scaffold's
     55    * Average Precursor Intensity
     56    * Total Precursor Intensity
     57    * Top Three Precursor Intensities
     58  * Ideal analysis pipeline is to input these values into MSstats for pre-processing, statistics, and data visualization
     59  * Preprocessing steps recommended by (performed by) MSstats:
     60    * Log2 transform
     61    * Median-normalize across samples and runs (ignoring any 0s)
     62    * Convert all 0s to NA
     63    * Censor low measurements
     64      * Get median
     65      * Get 99.9th (or other percentile) to identify right tail of distribution ("r")
     66      * Get threshold of left side ("l") of the distribution (2*median - r)
     67      * Censor all values less than "l"
     68    * Impute all missing values using an MNAR (missing not at random) method, such as the accelerated failure time model
     69    * Summarize all features of a protein using Tukey's median polish (TMP), but ignore proteins with only 1 peptide (or risk increased false positive rate)
     70  * Model each protein with a linear mixed-effects model
     71     * Limma does a good job too, but it doesn't handle all the experimental designs handled by MSstats
     72   * Use model to calculate fold changes and raw p-values
     73   * Correct all p-values with FDR (BH method)
     74   * Draw summary plots (volcano plots, MA plots)
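
The first preprocessing steps listed above (log2 transform, median normalization, and censoring of low measurements) can be sketched as follows. This is not MSstats code. It is a rough stand-alone illustration with hypothetical function names, and it makes two assumptions: zeros are converted to missing (`None`) before the log2 transform, and normalization shifts each sample so that its median matches the median of all sample medians.

```python
import math
from statistics import median


def log2_or_none(x):
    """Log2-transform an intensity; zeros and missing values become None."""
    return math.log2(x) if x and x > 0 else None


def median_normalize(columns):
    """Shift each sample (dict: name -> log2 values) to a common median."""
    medians = {s: median(v for v in vals if v is not None)
               for s, vals in columns.items()}
    target = median(medians.values())
    return {s: [None if v is None else v - medians[s] + target for v in vals]
            for s, vals in columns.items()}


def percentile(values, q):
    """Percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def censor_threshold(values, q=99.9):
    """Left threshold l = 2*median - r, where r is the q-th percentile."""
    observed = [v for v in values if v is not None]
    return 2 * median(observed) - percentile(observed, q)


def censor(values, threshold):
    """Replace values below the threshold with None."""
    return [None if v is None or v < threshold else v for v in values]
```

The threshold mirrors the right tail of the distribution around the median, so values further below the median than `r` is above it are treated as censored.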
     75
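The protein summarization step named above, Tukey's median polish, can be illustrated with a small stand-alone sketch. It assumes a complete table (no missing values) of log2 intensities with one row per peptide feature and one column per run, and it takes the per-run protein summary to be the overall effect plus the column effect. The function names are hypothetical, and MSstats's own implementation may differ in its details.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-6):
    """Fit overall + row + column effects by alternating median sweeps."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: move each row's median out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        delta = median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Column sweep: move each column's median out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        delta = median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        # Stop once a full pass no longer changes anything noticeably.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


def summarize_protein(table):
    """Per-run protein abundance: overall effect plus each run's effect."""
    overall, _, col_eff, _ = median_polish(table)
    return [overall + c for c in col_eff]
```

Because medians are used rather than means, a single outlying peptide has little influence on the per-run summary.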
     76==== Preparing and processing an experiment with MSstats  ====
     77
     78  * Export peptide-level intensity values from your favorite MS quantification software.
     79
     80  * Create a peptide intensity file
     81      * Organize the dataset so the first three columns are Gene.symbol, Protein.Accession, and Peptide.sequence
     82        * If needed, convert each protein accession to a gene symbol
     83        * Replace any intensities shown as "-" with 0
     84        * If Excel is used, check that gene symbols aren't being converted to dates
     85      * After the first 3 columns, the remaining columns hold intensities, one column per sample
     86
     87  * To avoid losing information about peptides that could have originated from multiple proteins and/or genes, merge peptide rows representing more than 1 protein and/or gene
     88     * Sample command: sort -k3,3 Peptide_intensities.matrix.txt | groupBy -g 3 -c 1,2,3,4,5,6,7,8,9,10 -o distinct >| Peptide_intensities.matrix.mergedByPeptide.txt
     89     * Make sure that all rows of the output file are unique.
     90
     91  * Create a sample description file
     92     * Columns are  Run, Condition, BioReplicate
     93     * See MSstats documentation on how to use these fields to represent technical replicates, biological replicates, and paired designs.
     94     * Replication is not required for subsequent protein quantification but is required for statistical analysis.
     95 
     96  * Run MSstats using peptide intensities and sample description as input files.
     97    * For sample code, see **/nfs/BaRC_code/R/analyze_MS_with_MSstats/analyze_MS_with_MSstats.R**
     98
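
For readers without the `groupBy` tool, the peptide-merging step above can be approximated in Python. This is a minimal sketch, assuming tab-delimited rows already split into lists of strings, with the peptide sequence in the third column. The function name `merge_by_peptide` is hypothetical, and the ordering of merged values may differ from what `groupBy -o distinct` produces.

```python
def merge_by_peptide(rows, key_index=2):
    """Collapse rows sharing a peptide, joining distinct values per column."""
    merged = {}
    for row in rows:
        key = row[key_index]
        if key not in merged:
            # One list of distinct values per column, in first-seen order.
            merged[key] = [[] for _ in row]
        for seen, value in zip(merged[key], row):
            if value not in seen:
                seen.append(value)
    return [[",".join(col) for col in cols] for cols in merged.values()]


if __name__ == "__main__":
    # Made-up rows: Gene.symbol, Protein.Accession, Peptide.sequence, intensities
    rows = [
        ["GENE1", "P1", "PEPA", "10", "20"],
        ["GENE2", "P2", "PEPA", "10", "20"],
        ["GENE3", "P3", "PEPB", "5", "0"],
    ]
    for merged_row in merge_by_peptide(rows):
        print("\t".join(merged_row))
```

If an intensity column ends up with more than one comma-separated value after merging, the same peptide was reported with different intensities, and that row should be inspected before running the analysis.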
     99==== References  ====
     100
     101  * Choi et al., 2017 [[https://pubs.acs.org/doi/abs/10.1021/acs.jproteome.6b00881|ABRF Proteome Informatics Research Group (iPRG) 2015 Study: Detection of Differentially Abundant Proteins in Label-Free Quantitative LC−MS/MS Experiments]]
     102  * MSstats at [[https://bioconductor.org/packages/release/bioc/html/MSstats.html|Bioconductor]] and [[https://github.com/MeenaChoi/MSstats|GitHub]]
     103  * Other references on these topics
     104    * Lazar et al., 2016 [[https://pubs.acs.org/doi/full/10.1021/acs.jproteome.5b00981|Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies]]
     105    * Wei et al., 2018  [[https://www.nature.com/articles/s41598-017-19120-0|Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data]]