Peptide Prediction with Abundance (PPA)

An artificial neural network model that assesses the detectability of in silico digested peptides (their likelihood of detection by LC-MS/MS) based on physicochemical properties and protein abundance.

Download the executable version written in Perl





[2]: Peptide mass range (from, to)
[3]: Minimum peptide length
[4]: Desired PPA score
[5]: Type of sequence submitted (check whether the sequences are protein sequences for in silico digestion or peptide sequences): Protein Sequence / Peptide Sequence
[6]: Type of abundance data (check whether protein sequence coverage or protein amount in fmol is provided): Protein Sequence Coverage / Protein amount in fmol

Download the result in CSV format



Instructions for using the web interface:

[1a]: When uploading or pasting protein sequences, use no more than about 5000 proteins. For larger numbers of proteins, please download and use the executable code.
[1b]: You can upload the default sequence coverage report from platforms such as Proteome Discoverer, Mascot and Protein Pilot. Please convert xls/xlsx files to csv or txt format first.
[1c]: Sequence coverage can be entered as, for example, 50%, 50, or 0.5.
[1d]: If you type in sequence coverages, the values are assigned to the provided proteins in the same order (the first value to the first protein, and so on).
[1e]: If sequence coverage is provided, the PPA score reflects detectability at the current protein abundance. Otherwise, the PPA score is calculated from 15 physicochemical features.
[2] : The recommended peptide molecular weight is between 600 and 6000 Da.
[3] : The recommended minimum peptide length is 5 amino acids.
[4] : The target PPA score used to compute fold enrichment at the provided protein abundance.
[5] : All sequences are subjected to in silico trypsin digestion unless 'Peptide Sequence' is checked in step [5].
[6] : To enter protein abundance as an amount in fmol, check the second option in step [6].
[7] : If you would like to generate a neural network model from your own dataset, please download and install PPA's neural network training package for R, ppa_1.0.zip.
       Two sample files are provided for convenience: file1 and file2.
       Usage:
            neural.network.classification.model(file1, file2)
       Arguments:
            file1      The path to the search results: (1) a peptide summary from Protein Pilot, (2) a csv export from Mascot, or (3) a tab-delimited file containing only protein and peptide columns, with the two headers 'Accession' and 'Peptides'.
            file2      The path to the reference protein database in FASTA format.
       Details:
            The program generates a text file containing the neural network coefficients. These coefficients can be used with our peptide prediction program PPA (software.steeenlab.org/rc4/PPA.php).
            Specifically, use the generated file as the customized model on the website (step [7] of the website instructions).
            Protein names must be identical in file1 and file2. Use at least 100 different proteins for model training.
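Items [1c] and [1d] leave two details implicit: how the three coverage notations map to one value, and how typed values line up with proteins. The Python below is a minimal sketch of one reading, not PPA's own code. It assumes that a bare number above 1 is a percentage and anything at or below 1 is a fraction, and that exactly one value is typed per protein; the function names are hypothetical.

```python
def parse_coverage(value):
    """Convert a coverage entry such as '50%', '50' or '0.5' to a fraction."""
    text = str(value).strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    number = float(text)
    # Assumption: a bare number above 1 is a percentage, otherwise a fraction.
    return number / 100.0 if number > 1 else number


def assign_coverage(protein_ids, coverage_values):
    """Pair the i-th typed coverage with the i-th protein, in input order."""
    if len(protein_ids) != len(coverage_values):
        # Assumption: this sketch requires exactly one value per protein.
        raise ValueError("expected one coverage value per protein")
    return [(pid, parse_coverage(v)) for pid, v in zip(protein_ids, coverage_values)]
```

Under this reading, "50%", "50" and "0.5" all become 0.5. Note that a typed "1" is read as 100% coverage, which is the main ambiguity of accepting both notations.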
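Steps [2], [3] and [5] together define which peptides are scored: protein sequences are digested with trypsin, then filtered by length and mass. The sketch below approximates that preprocessing in Python for readers working outside the web interface. It is not PPA's implementation. It assumes the simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, unmodified peptides, neutral monoisotopic masses, and the recommended defaults of 600 to 6000 Da and a minimum length of 5. The function names are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard amino acids only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added once per peptide for the free termini


def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    # Zero-width split point: preceded by K/R and not followed by P.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_peptides(protein, min_mass=600.0, max_mass=6000.0, min_length=5):
    """Return (peptide, mass) pairs that pass the length and mass filters."""
    kept = []
    for pep in trypsin_digest(protein.upper()):
        # Skip short peptides and any containing non-standard residues (X, U, B, ...).
        if len(pep) < min_length or any(aa not in RESIDUE_MASS for aa in pep):
            continue
        mass = peptide_mass(pep)
        if min_mass <= mass <= max_mass:
            kept.append((pep, mass))
    return kept
```

For the sequence MKWVTFISLLFLFSSAYSRGVFRRDAHK, only WVTFISLLFLFSSAYSR (about 2036.08 Da) passes the default filters; the other tryptic pieces are shorter than 5 residues.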
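Option (3) for file1 in step [7] is a plain tab-delimited table, and the training step requires that protein names match the FASTA file and that at least 100 proteins are used. The Python sketch below writes such a table and checks both conditions before training. It is a hypothetical helper, not part of the ppa_1.0 package. It assumes one row per peptide and that a protein's name is the first whitespace-delimited word of its FASTA header line; adjust the matching if your accessions are formatted differently.

```python
import csv
import io
import warnings


def fasta_ids(fasta_text):
    """Identifiers of all FASTA records: the first word after '>'."""
    return {line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].strip()}


def write_training_table(pairs, fasta_text, out, min_proteins=100):
    """Write (accession, peptide) pairs as a tab-delimited table.

    Raises ValueError if an accession is missing from the FASTA text and
    warns if fewer than `min_proteins` distinct proteins are present.
    Returns the number of distinct proteins written.
    """
    accessions = {acc for acc, _ in pairs}
    missing = sorted(accessions - fasta_ids(fasta_text))
    if missing:
        raise ValueError(f"accessions not found in FASTA: {missing}")
    if len(accessions) < min_proteins:
        warnings.warn(f"only {len(accessions)} proteins; "
                      f"at least {min_proteins} are recommended")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Accession", "Peptides"])
    writer.writerows(pairs)
    return len(accessions)


if __name__ == "__main__":
    # Tiny in-memory demo; a real training set needs far more proteins.
    demo_fasta = ">P1 example protein\nMKWVTFISLLFLFSSAYSR\n"
    buffer = io.StringIO()
    write_training_table([("P1", "WVTFISLLFLFSSAYSR")], demo_fasta, buffer, min_proteins=1)
    print(buffer.getvalue(), end="")
```

Failing fast on unmatched accessions is cheaper than discovering the mismatch after a training run.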

Instructions for running the executable script:

       Usage:
            PPA.perl [options] sequence-file
       Arguments:
            sequence-file   Path to the protein sequence database in FASTA format.
       Optional arguments:
            -c   Type of sequence file: 1 for protein sequences, 0 for peptide sequences.
            -s   Path to the protein sequence coverage report. Reports from Proteome Discoverer, Mascot and Protein Pilot searches are supported.
            -o   Path to the output file.
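When PPA is one step in a larger pipeline, it can help to assemble the command line programmatically. The Python sketch below builds the argument list from the options documented above and runs it. Assumptions, none confirmed by the usage text: a `perl` interpreter is on the PATH, PPA.perl is in the working directory, each option takes its value as a separate argument, and the helper names and file names are hypothetical.

```python
import subprocess


def build_ppa_command(sequence_file, peptide_input=False, coverage_file=None,
                      output_file=None, script="PPA.perl"):
    """Assemble the argument list for the PPA command-line script."""
    # -c: 1 = protein sequences (digested in silico), 0 = peptide sequences.
    cmd = ["perl", script, "-c", "0" if peptide_input else "1"]
    if coverage_file is not None:
        cmd += ["-s", coverage_file]  # coverage report from the search engine
    if output_file is not None:
        cmd += ["-o", output_file]
    cmd.append(sequence_file)  # the positional FASTA file goes last
    return cmd


def run_ppa(sequence_file, **options):
    """Run the script; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_ppa_command(sequence_file, **options), check=True)
```

For example, `build_ppa_command("proteins.fasta", coverage_file="coverage.csv", output_file="ppa_out.csv")` yields the equivalent of `perl PPA.perl -c 1 -s coverage.csv -o ppa_out.csv proteins.fasta`. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.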

This project is funded by Boston Children's Hospital. For any questions, please email: Shaojun.Tang@Childrens.Harvard.edu OR Hanno.Steen@childrens.harvard.edu OR Michael_Springer@hms.harvard.edu