Mass Spectrometry for Protein Identification
Tandem mass spectrometry has emerged as a key technique for
identifying proteins in complex biological samples.
In the "shotgun proteomics" technique, proteins are digested into peptides, the peptides are identified from their fragmentation spectra, and the peptide identifications are then integrated back into protein identifications. However, because high-throughput proteomics laboratories can produce millions of mass spectra a week, automated data analysis is critical.
Working with top proteomics laboratories, PARC researchers have developed
new algorithms and software for efficiently identifying peptides and proteins – with greater sensitivity and accuracy than standard tools such as SEQUEST, Mascot, and ProteinProphet. Our collaborators are using PARC's software to address difficult proteomics problems
such as biomarker discovery and oxidative footprinting.
PARC’s peptide identification program – ByOnic – takes a hybrid approach that uses both de novo sequencing and database search
techniques.
- ByOnic first employs de novo sequencing to identify a small number of "lookup peaks", which are likely b- and y-ion masses. The database is then searched for peptides that match a given number of lookup peaks: for example, 1 match for fully tryptic peptides, 2 for semi-tryptic peptides, and 3 for non-tryptic peptides. Qualifying peptides are then scored in detail, taking into account predicted and observed peak intensities and mass measurement errors. Lookup peaks function similarly to 3-letter "sequence tags", but are more efficient because 2 lookup peaks filter the database about 5 times more effectively than a 3-letter tag.
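The lookup-peak filter described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ByOnic's implementation: the function names, the 0.5 Da tolerance, the restriction to singly charged ions, and the way the cleavage class is passed in as a label are all assumptions made for the example. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

# Match counts quoted in the text; the dictionary itself is hypothetical.
REQUIRED_MATCHES = {"tryptic": 1, "semi-tryptic": 2, "non-tryptic": 3}


def fragment_masses(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        ions.append(prefix + PROTON)                  # b ion
        ions.append(total - prefix + WATER + PROTON)  # complementary y ion
    return ions


def count_lookup_matches(peptide, lookup_peaks, tol=0.5):
    """Count lookup peaks that lie within tol of some b or y ion."""
    ions = fragment_masses(peptide)
    return sum(
        any(abs(ion - peak) <= tol for ion in ions) for peak in lookup_peaks
    )


def qualifies(peptide, cleavage_class, lookup_peaks, tol=0.5):
    """Apply the class-dependent match threshold (assumed filter rule)."""
    needed = REQUIRED_MATCHES[cleavage_class]
    return count_lookup_matches(peptide, lookup_peaks, tol) >= needed


# Hypothetical lookup peaks: two agree with PEPTIDE (b2, y1), one does not.
peaks = [227.10, 148.06, 500.0]
print(count_lookup_matches("PEPTIDE", peaks))
print(qualifies("PEPTIDE", "semi-tryptic", peaks))
```

Only candidates that pass this cheap filter would go on to the detailed scoring step, which is where the efficiency gain over scoring every database peptide comes from.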
PARC’s protein identification program – ComByne – integrates ByOnic's peptide identifications into a list of protein identifications,
ranked by confidence.
- To compile its list, ComByne uses the number of peptide identifications, along with their lengths and scores, and then corrects for the lengths and redundancies of proteins.
On a complex, high-dynamic-range sample like blood plasma, ByOnic typically makes 50% to 100% more spectrum identifications than Mascot or SEQUEST at the same false discovery rate (measured empirically using reversed protein sequences). This improvement at the spectrum level typically translates into 30% to 70% more identifications at the protein level.
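Comparing tools "at the same false discovery rate" relies on an empirical estimate from reversed (decoy) sequences. The sketch below shows one common way to make that estimate, namely decoy hits divided by target hits above a score threshold. The text does not state which estimator was actually used, so the formula, the function names, and the toy scores are assumptions for illustration.

```python
def reverse_decoy(sequence):
    """Build a decoy by reversing a protein sequence."""
    return sequence[::-1]


def estimated_fdr(identifications, threshold):
    """Estimate FDR as decoys / targets among hits scoring >= threshold.

    identifications: list of (score, is_decoy) pairs.
    """
    targets = sum(1 for s, d in identifications if s >= threshold and not d)
    decoys = sum(1 for s, d in identifications if s >= threshold and d)
    return decoys / targets if targets else 0.0


def count_at_fdr(identifications, max_fdr):
    """Largest number of target hits whose estimated FDR is <= max_fdr."""
    best = 0
    targets = decoys = 0
    # Walk down the list from best score to worst, tracking the running FDR.
    for _, is_decoy in sorted(identifications, key=lambda x: -x[0]):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = targets
    return best


# Made-up scores, for illustration only.
hits = [(10, False), (9, False), (8, True), (7, False), (6, False)]
print(reverse_decoy("MKWVTF"))
print(estimated_fdr(hits, threshold=7))
print(count_at_fdr(hits, max_fdr=0.25))
```

Running `count_at_fdr` on the output of two search tools with the same `max_fdr` gives the kind of like-for-like identification counts that the percentages above compare.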
A comparison of ByOnic/ComByne vs. Mascot/ProteinProphet vs. X!Tandem (using the product of E-values for protein ranking) on a sample of mouse blood plasma. All three tools were run on the same 50,000-protein database, which included reversed proteins (deliberate decoys) for an empirical estimate of the false discovery rate.
- All three tools agreed on the first 69 proteins in the mouse plasma sample.
- For Mascot, reversed proteins started showing up at rank 70, and reached about
50% of all identifications by rank 90.
- For X!Tandem, reversed proteins started showing up at rank 105, and reached
about 50% of all identifications by rank 120.
- For ByOnic, reversed proteins started showing up at rank 148, and reached
about 50% of all identifications by rank 160.
- The sample included 13 soluble human proteins spiked into the mouse plasma at a concentration of 10 micrograms per milliliter.
- Mascot found only 2 of the spikes, but ByOnic found 10.
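The two rank statistics quoted in the list above (the rank of the first reversed protein, and the rank by which reversed proteins make up about half of the identifications) can be computed from a ranked list of decoy flags, as sketched below. The "50%" figure is read here as a local fraction over a trailing window of ranks, since a cumulative fraction could not reach 50% only 20 ranks after the first decoy at rank 70. That reading, the window size of 10, the function names, and the synthetic ranking are all assumptions.

```python
def first_decoy_rank(is_decoy):
    """Return the 1-based rank of the first decoy, or None if there is none."""
    for rank, flag in enumerate(is_decoy, start=1):
        if flag:
            return rank
    return None


def rank_reaching_local_fraction(is_decoy, fraction=0.5, window=10):
    """Return the first rank whose trailing window has >= fraction decoys."""
    for end in range(window, len(is_decoy) + 1):
        recent = is_decoy[end - window:end]
        if sum(recent) / window >= fraction:
            return end
    return None


# Synthetic ranking: 69 target proteins, then decoys alternating with targets.
ranking = [False] * 69 + [True, False] * 15
print(first_decoy_rank(ranking))
print(rank_reaching_local_fraction(ranking))
```

A later first-decoy rank means more proteins were identified before the ranking started to admit false positives.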

BUSINESS CONTACT
Richard Bruce
Manager, Biomedical Systems
650-812-4447

KEYWORDS
de novo sequencing ∙ peptide identification ∙ proteomics ∙ tandem mass spectrometry