How do you solve a problem in palaeoproteomics?

The field of palaeoproteomics holds great potential for archaeologists and evolutionary biologists. In a paper recently published in BMC Evolutionary Biology, Frido Welker of the Max Planck Institute for Evolutionary Anthropology in Leipzig investigated the efficacy of error-tolerant searches for palaeoproteomic studies, highlighting some potential problems and solutions for studying whole proteomes from ancient hominin species.

Palaeoproteomics – Exciting new frontiers

Palaeoproteomics is the new kid on the block in archaeological science and evolutionary biology. Standards of practice have only recently been set out for this burgeoning field, which has been used to study ancient humans, animals, and even dinosaurs (though this is somewhat disputed).

Ancient protein sequences are incredibly robust compared to ancient DNA, with proteins recently retrieved from samples as old as 3.8 million years.

Bioinformatics techniques

Single amino acid polymorphisms (SAPs), positions in a protein sequence where a single amino acid (AA) differs between individuals or species, are useful for determining evolutionary differences between clades, including late Pleistocene hominins like Neanderthals and Denisovans.
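To make the idea concrete, here is a minimal Python sketch that lists the SAPs between two already-aligned sequences of equal length. The function name `find_saps` and both sequences are made up for illustration; they are not taken from Dr Welker's paper or from any real protein.

```python
def find_saps(reference: str, target: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference AA, target AA) for each difference.

    Assumes the two sequences are already aligned and contain no gaps.
    """
    if len(reference) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_aa, tgt_aa)
        for position, (ref_aa, tgt_aa) in enumerate(zip(reference, target), start=1)
        if ref_aa != tgt_aa
    ]


# Toy sequences (hypothetical), differing at a single position.
reference = "GPAGPQGPR"
target = "GPAGPKGPR"
print(find_saps(reference, target))  # [(6, 'Q', 'K')]
```

Real comparisons would start from a proper sequence alignment, which this sketch deliberately skips.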

Error-tolerant searches (in which the search engine can match a spectrum to a database peptide even when the observed peptide differs from it by an unspecified amino acid substitution or modification) correctly inferred SAPs in Collagen Type 1 of modern and Pleistocene samples, but as Dr Welker notes, this “needs to be demonstrated [in a whole proteome] before moving on to the analysis of older and possibly more divergent hominin fossils”. This is the basis of the experiments reported in his recently published paper in BMC Evolutionary Biology.

The cross-species proteomics effect problem

Until now, it had not been demonstrated that error-tolerant searches can overcome the cross-species proteomics effect (CSPE) problem, a demonstration that is essential for confidence in palaeoproteomic investigations.

The CSPE problem arises when the reference proteome comes from a different population or species than the target sample. As you go further back in time on the evolutionary tree, larger proteomic sequence differences occur between your target species and their more-distant relatives.

The larger the number of sequence differences, the more likely that peptides and proteins will not be identified in standard (non-error-tolerant) searches. The small proteome databases available for hominin species compound the effect.
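The effect can be illustrated with a deliberately simplified Python sketch. Real search engines compare mass spectra against theoretical fragment masses, not strings, so the string matching below is only an analogy; the function names, the `max_substitutions` parameter, and the sequences are all hypothetical.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_peptide(peptide: str, protein: str, max_substitutions: int = 0):
    """Return (offset, mismatches) for the best ungapped match, or None.

    max_substitutions=0 mimics a standard search (exact match only);
    a higher value mimics an error-tolerant search.
    """
    best = None
    for offset in range(len(protein) - len(peptide) + 1):
        window = protein[offset:offset + len(peptide)]
        mismatches = count_mismatches(peptide, window)
        if mismatches <= max_substitutions and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return best


# Toy reference protein and an observed peptide carrying one substitution (K -> E).
reference_protein = "MKTAYIAKQRQISFVK"
observed_peptide = "AYIAEQR"

print(match_peptide(observed_peptide, reference_protein, max_substitutions=0))  # None
print(match_peptide(observed_peptide, reference_protein, max_substitutions=1))  # (3, 1)
```

With no substitutions allowed, the peptide goes unidentified against the divergent reference; allowing one substitution recovers it and reveals where the difference lies.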

Error-tolerant vs Standard search experiment

Error-tolerant searches should allow for identification of SAPs, and overcome the CSPE issue. Dr Welker tested this using standard and error-tolerant searches of the proteomes of seven modern human samples against UniProt-derived human (Homo sapiens; ‘Homo s.’), chimpanzee (Pan troglodytes; ‘Pan’), and Sumatran orang-utan (Pongo abelii; ‘Pongo’) reference databases using PEAKS (v 7.5).

For a protein to be confirmed, at least two unique peptide spectrum matches (PSMs) at a false discovery rate (FDR) of 1% had to be identified when searching the modern samples against the Homo s. database in the general search. In error-tolerant searches, PSMs were accepted only when a peptide of 10 amino acids (AAs) or longer was matched in the Homo s. database.
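As a rough sketch of how such acceptance rules might look in code, the Python below applies both thresholds to a list of PSMs. It assumes the PSMs have already been filtered to a 1% FDR by the search software, and it reads "unique" as "distinct peptide sequences", which is an interpretation. The data layout, function names, and example records are hypothetical and are not PEAKS output.

```python
from collections import defaultdict

MIN_UNIQUE_PEPTIDES = 2      # at least two unique PSMs per protein
MIN_ET_PEPTIDE_LENGTH = 10   # minimum peptide length in error-tolerant searches


def confirmed_proteins(psms):
    """Return the set of protein IDs supported by enough distinct peptides.

    psms: iterable of (protein_id, peptide_sequence) pairs, assumed to be
    already filtered to a 1% FDR upstream.
    """
    peptides_by_protein = defaultdict(set)
    for protein_id, peptide in psms:
        peptides_by_protein[protein_id].add(peptide)
    return {
        protein_id
        for protein_id, peptides in peptides_by_protein.items()
        if len(peptides) >= MIN_UNIQUE_PEPTIDES
    }


def accept_error_tolerant(peptide: str) -> bool:
    """Accept an error-tolerant PSM only if its peptide is long enough."""
    return len(peptide) >= MIN_ET_PEPTIDE_LENGTH


# Made-up records: P1 has two distinct peptides, P2 has one peptide seen twice.
example_psms = [
    ("P1", "GPAGPQGPR"),
    ("P1", "DGEAGAQGPPGPAGPAGER"),
    ("P2", "AYIAKQR"),
    ("P2", "AYIAKQR"),
]
print(confirmed_proteins(example_psms))            # {'P1'}
print(accept_error_tolerant("GPAGPQGPR"))          # False (9 AAs)
print(accept_error_tolerant("DGEAGAQGPPGPAGPAGER"))  # True (19 AAs)
```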

Four outcomes were established for PSMs individually searched against orthologous databases in error-tolerant searches, and two outcomes in standard searches, viewable in Figure 1 in Dr Welker’s paper.

Evolutionary distance, and long peptides

No PSMs with three or more SAPs were identified, and mutable PSMs with AA lengths over 25 were almost never identified (Figure 4 in Dr Welker’s article). Mutable PSMs with 10-15 AAs were identified up to 75% of the time, regardless of evolutionary distance.

This could be problematic when comparing human proteomes with much older hominin ancestors – larger evolutionary distances may make it difficult to reliably reconstruct phylogenetic trees.

Welker recommends creating shorter peptides in palaeoproteomic studies, and searching against databases of species with less evolutionary distance, though the latter may be more difficult with more ancient hominins.

Error-tolerant searches minimised the CSPE problem between moderately divergent proteomes (e.g. the human-chimpanzee split), and this should be kept in mind for future phylogenetic analysis of ancient hominin proteomes.

Filtering criteria to mitigate falsely-suggested AA substitutions

Error-tolerant searches can mismatch PSMs, incorrectly suggesting substitutions in peptides that are not actually mutated, which confounds phylogenetic analysis. Welker determined filtering criteria to mitigate these false positives in error-tolerant searches:

  1. Two or more PSMs covering the mutated AA position must be confirmed.
  2. These PSMs must comprise the majority of the total number of PSMs matching that AA position.
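The two criteria above can be sketched as a small Python check. This assumes "majority" means strictly more than half of the PSMs covering the position; the function name and arguments are hypothetical and simply count PSMs rather than inspect spectra.

```python
def keep_substitution(supporting_psms: int, total_psms: int) -> bool:
    """Decide whether a suggested AA substitution passes both filters.

    supporting_psms: PSMs covering the position that carry the substitution.
    total_psms: all PSMs covering that position, mutated or not.
    """
    if supporting_psms > total_psms:
        raise ValueError("supporting PSMs cannot exceed total PSMs")
    has_enough_support = supporting_psms >= 2           # criterion 1
    is_majority = supporting_psms > total_psms / 2      # criterion 2 (assumed strict)
    return has_enough_support and is_majority


print(keep_substitution(3, 4))  # True: three of four PSMs agree
print(keep_substitution(2, 5))  # False: two PSMs, but a minority
print(keep_substitution(1, 1))  # False: only a single supporting PSM
```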

These filtering criteria remove falsely suggested AA substitutions, but can also filter out correct substitutions that are supported by only a single PSM, which could remove phylogenetically informative positions. To counter this, Welker suggests running the same protein extract several times on a tandem mass spectrometer.

Informative proteins a potential target for phylogenetic studies

Fast-evolving proteins provide more phylogenetic information compared to slow-evolving proteins, but their high substitution rates mean they may go unidentified in error-tolerant searches. Welker revealed two fast-evolving proteins, with phylogenetically-informative positions sufficiently spaced to allow error-tolerant search identification, even at large evolutionary distances.

Alpha-2-HS-glycoprotein (AHSG) and fibrinogen alpha chain (FGA), frequently observed in ancient bone proteome datasets, present a viable target for proteomic phylogenetic studies.
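Why spacing matters can be shown with a short Python sketch that counts how many SAPs fall within each peptide. The cut-off of two SAPs per peptide is an assumption drawn from the result reported above, that no PSMs with three or more SAPs were identified; the peptide spans and SAP positions are invented and do not describe AHSG or FGA.

```python
MAX_SAPS_PER_PEPTIDE = 2  # assumption: three or more SAPs were never identified


def saps_per_peptide(peptide_spans, sap_positions):
    """Count SAPs inside each (start, end) span; positions are 1-based, inclusive."""
    return {
        (start, end): sum(start <= position <= end for position in sap_positions)
        for start, end in peptide_spans
    }


def searchable_spans(peptide_spans, sap_positions, max_saps=MAX_SAPS_PER_PEPTIDE):
    """Return spans whose SAP count stays within the assumed limit."""
    counts = saps_per_peptide(peptide_spans, sap_positions)
    return [span for span, count in counts.items() if count <= max_saps]


# Invented example: three peptides, with SAPs clustered in the middle one.
spans = [(1, 12), (13, 30), (31, 45)]
saps = [5, 14, 16, 20, 40]

print(saps_per_peptide(spans, saps))  # {(1, 12): 1, (13, 30): 3, (31, 45): 1}
print(searchable_spans(spans, saps))  # [(1, 12), (31, 45)]
```

In this toy case the clustered SAPs push the middle peptide past the limit, while the well-spaced ones leave their peptides within reach of an error-tolerant search.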

Filling in the map

While there are limitations to error-tolerant searches in palaeoproteomic studies, Frido Welker has established filtering criteria and recommended experimental techniques to help maximise the potential for whole proteome investigations. Moreover, he has identified two potentially invaluable proteins for evolutionary studies.

Palaeoproteomics is a new field of study, with much of its potential and limitations still to be explored. Dr Welker’s paper has filled in a portion of the map of this largely-uncharted territory.
