AAI-profiler [1] is a fast homology search tool that takes a query proteome (protein sequences in FASTA format) as input and plots the AAI (average amino acid identity) between the query and the species in the UniProt database. AAI-profiler is powered by SANS [2], and processing a bacterial proteome takes a few minutes.
Whole-genome sequencing has propelled the re-evaluation of the taxonomic classification of many bacteria. AAI-profiler can be used to select relevant genera for phylogenomic (AAI, ANI) or phylogenetic (16S rRNA, MLSA) trees. Figure 1 shows examples of congruent (A) and incongruent (B) genus assignments in the sequence databases.
A. Dickeya dieffenbachiae | B. Proteus sp. HMSC10D02
Figure 1. Each data point represents a species in the UniProt database. AAI is plotted on the horizontal axis. The vertical axis is the fraction of query proteins that have a match in that species. The database contains both completely and partially sequenced species. (A) AAI values above 95-97% indicate the same species as the query, and other species from the same (monophyletic) genus as the query should have higher AAI than species from other genera. (B) The sequence databases have tentatively assigned strain HMSC10D02 to Proteus, though its closest sequence neighbors belong to the genus Klebsiella.
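The two axes of Figure 1 can be illustrated with a minimal sketch. It is not AAI-profiler's actual implementation: it assumes that hits are available as hypothetical (query protein, species, identity) tuples, that only the best hit per query protein and species is kept, and that AAI is the plain mean of those best identities. The function name `aai_profile` and the example data are made up for illustration.

```python
from collections import defaultdict


def aai_profile(hits, n_query_proteins):
    """Return {species: (aai, matched_fraction)}.

    hits: iterable of (query_id, species, identity) tuples, identity in 0..1.
    n_query_proteins: total number of proteins in the query proteome.
    Assumption: only the best hit per query protein and species counts.
    """
    best = defaultdict(dict)  # species -> {query_id: best identity seen}
    for query_id, species, identity in hits:
        previous = best[species].get(query_id)
        if previous is None or identity > previous:
            best[species][query_id] = identity

    profile = {}
    for species, per_query in best.items():
        # Horizontal axis: mean identity over the matched query proteins.
        aai = sum(per_query.values()) / len(per_query)
        # Vertical axis: fraction of query proteins with a match in this species.
        matched_fraction = len(per_query) / n_query_proteins
        profile[species] = (aai, matched_fraction)
    return profile


# Made-up example: a query proteome of 4 proteins and two database species.
hits = [
    ("q1", "Species A", 0.98),
    ("q1", "Species A", 0.90),  # weaker duplicate hit, ignored
    ("q2", "Species A", 0.96),
    ("q1", "Species B", 0.70),
]
profile = aai_profile(hits, n_query_proteins=4)
for species, (aai, fraction) in sorted(profile.items()):
    print(f"{species}: AAI={aai:.2f}, matched fraction={fraction:.2f}")
```

In this toy data, "Species A" would sit to the right of and above "Species B" in the plot, because it has both a higher mean identity and a larger share of matched query proteins.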
References
[1] Medlar AJ, Toronen P, Holm L (2018) AAI-profiler: fast proteome-wide exploratory analysis reveals taxonomic identity, misclassification and contamination. Nucleic Acids Research 46, W479-W485
[2] Somervuo P, Holm L (2015) SANSparallel: interactive homology search against Uniprot. Nucleic Acids Research 43, W24-W29