AAI-profiler: fast proteome-wide search reveals taxonomic outliers

AAI-profiler [1] is a fast homology search tool that takes a query proteome (protein sequences in FASTA format) as input and plots the average amino acid identity (AAI) between the query and each species in the UniProt database. AAI-profiler is powered by SANS [2], and processing a bacterial proteome takes a few minutes.

Whole-genome sequencing has propelled the re-evaluation of the taxonomic classification of many bacteria. AAI-profiler can be used to select relevant genera for phylogenomic (AAI, ANI) or phylogenetic (16S rRNA, MLSA) trees. Figure 1 shows examples of congruent (A) and incongruent (B) genus assignments in the sequence databases.

A. Dickeya dieffenbachiae (live example)
B. Proteus sp. HMSC10D02 (live example)

Figure 1. Each data point represents a species in the UniProt database. AAI is plotted on the horizontal axis. The vertical axis is the fraction of query proteins that have a match in the database. The database contains both completely and partially sequenced species. (A) AAI values above 95-97% correspond to the same species, and other species from the same (monophyletic) genus as the query should have higher AAI than species from other genera. (B) The sequence databases have tentatively assigned strain HMSC10D02 to Proteus, though its closest sequence neighbors belong to the genus Klebsiella.
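The two axes of Figure 1 can be illustrated with a minimal Python sketch. It assumes a simplified definition: AAI is the mean identity of each query protein's best hit in a species, and the vertical axis is the number of matched query proteins divided by the total number of query proteins. The function name `aai_profile`, the layout of the hit tuples, and the toy data are hypothetical and are not taken from AAI-profiler or SANS, whose actual computation may differ.

```python
from collections import defaultdict


def aai_profile(hits, n_query_proteins):
    """Summarise homology hits per species.

    hits: iterable of (query_id, species, percent_identity) tuples
          (a hypothetical layout chosen for this sketch).
    Returns {species: (aai, coverage)}.
    """
    # For every species, keep only the best identity seen per query protein.
    best = defaultdict(dict)
    for query_id, species, identity in hits:
        per_query = best[species]
        if query_id not in per_query or identity > per_query[query_id]:
            per_query[query_id] = identity

    profile = {}
    for species, per_query in best.items():
        aai = sum(per_query.values()) / len(per_query)   # horizontal axis
        coverage = len(per_query) / n_query_proteins     # vertical axis
        profile[species] = (aai, coverage)
    return profile


# Toy data: a query proteome of 4 proteins, two database species.
hits = [
    ("q1", "Species A", 98.0),
    ("q1", "Species A", 90.0),  # weaker second hit, ignored
    ("q2", "Species A", 96.0),
    ("q1", "Species B", 70.0),
]
profile = aai_profile(hits, n_query_proteins=4)
for species, (aai, coverage) in sorted(profile.items()):
    print(f"{species}: AAI={aai:.1f}%, coverage={coverage:.2f}")
```

In this toy example Species A would fall in the same-species range of the plot (AAI above 95-97%), while Species B would appear far to the left with low coverage.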

References

  1. Medlar AJ, Toronen P, Holm L (2018) AAI-profiler: fast proteome-wide exploratory analysis reveals taxonomic identity, misclassification and contamination. Nucleic Acids Res. 46, W479-W485
  2. Somervuo P, Holm L (2015) SANSparallel: interactive homology search against Uniprot. Nucleic Acids Res. 43, W24-W29

Example input

Example output

STEP 1 - Enter your query proteome:

Paste proteome in FASTA format (example, text area limited to 10M characters):


or upload a proteome FASTA file:

or load proteome FASTA file from URL: (restricted to NCBI genome or EBI database ftp servers)

or paste the checksum of a recent job:
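
Before pasting a proteome into the text area, it may help to check it locally. The Python sketch below assumes that the 10M limit means 10,000,000 characters and performs only basic FASTA sanity checks (every record has a ">" header followed by at least one sequence line). `check_fasta_text` is a hypothetical helper written for illustration; it is not part of AAI-profiler, and the server may apply different or stricter rules.

```python
def check_fasta_text(text, max_chars=10_000_000):
    """Return the number of FASTA records in text.

    Raises ValueError if the text exceeds max_chars (assumed limit)
    or is not plausibly FASTA-formatted.
    """
    if len(text) > max_chars:
        raise ValueError(f"input has {len(text)} characters, limit is {max_chars}")

    n_records = 0
    has_sequence = False  # does the current record have sequence lines yet?
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if n_records > 0 and not has_sequence:
                raise ValueError("header without sequence")
            n_records += 1
            has_sequence = False
        else:
            if n_records == 0:
                raise ValueError("sequence data before the first header")
            has_sequence = True

    if n_records == 0:
        raise ValueError("no FASTA records found")
    if not has_sequence:
        raise ValueError("last header has no sequence")
    return n_records


example = ">p1 hypothetical protein\nMKVLA\nGGTS\n>p2\nMSTQ\n"
print(check_fasta_text(example))
```

A proteome that is too large for the text area can instead be submitted through the file upload or URL options listed above.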

STEP 2 - Optional inputs:

Project title:

E-mail address for notification:

STEP 3 - Submit your job:

The results will appear in a new window.

Tutorial as PDF.

YouTube video (by Jana Roels)

Genome sequencing projects that used AAI-profiler:

Archaea

-Halobellus captivus sp. nov. (publication)
-Haloferax litoreum sp. nov., Haloferax marinisediminis sp. nov., and Haloferax marinum sp. nov. (publication)
-Halomicroarcula amylolytica sp. nov. (publication)
-Halorhabdus amylolytica sp. nov., Halorhabdus salina sp. nov. (publication)
-Halorubrum amylolyticum sp. nov. (publication)
-Haloterrigena salifodinae sp. nov. (publication)

Bacteria

-Acidiferrimicrobium australe gen. nov., sp. nov. (publication)
-Actobacterium tashihtau gen. nov., sp. nov. (preprint)
-Agrolactibacillus fermenti sp. nov. (publication)
-Aestuariispira ectoiniformans sp. nov. (publication)
-Alcaligenaceae sp. Strain 429 (publication)
-Aliikangiella coralliicola sp. nov. (publication)
-Aoguangibacterium sediminis gen. nov., sp. nov. (preprint)
-Candidatus Izimaplasma strain zrk1 (publication)
-Caproicibacter fermentans gen. nov., sp. nov. (publication)
-Carbonactinosporaceae fam. nov. (publication)
-Croceivirga litoralis sp. nov. (publication)
-Deferribacter autotrophicus (publication)
-Fructobacillus tropaeoli CRL 2034 (publication)
-Litoribacterium kuwaitense gen. nov., sp. nov. (publication)
-Mangrovivirga cuniculi gen. nov., sp. nov. (publication)
-Marasmitruncus massiliensis gen. nov., sp. nov. (publication)
-Maribellus comscasis sp. nov. (preprint)
-Massilia horti sp. nov. (publication)
-Metabacillus elymi sp. nov. (publication)
-Microlunatus elymi sp. nov. (publication)
-Myceligenerans indicum sp. nov. (publication)
-Mycoplasmas (publication)
-Nocardia cyriacigeorgica soil strains (publication)
-Novaherbaspirillum arenae sp. nov. (publication)
-Oceanivirga miroungae sp. nov. (publication)
-Pantoea beijingensis LMG27579T (reclassified as Erwinia) (publication)
-Pseudoalteromonas distincta (publication)
-Pseudogemmobacter faecipullorum sp. nov. (publication)
-Pusillimonas faecipullorum sp. nov. (publication)
-Rhodospirellula aestuarii sp. nov. (publication)
-Roseobacter ponti DSM 106830 (publication)
-Rubinisphaera margarita sp. nov. (publication)
-Salifodinibacter halophilus gen. nov., sp. nov. (publication)
-Schlegelella brevitalea sp. nov. (publication)
-Shewanella glacialimarina TZS-4T nov. (publication)
-Sinomicrobium weinanense sp. nov. (publication)
-Streptomyces brasiliscabiei (publication)
-Synechococcus moorigangaii CMS01 (publication)
-Urmitella timonensis gen. nov., sp. nov. (publication)
-Verrucosispora sp. Strain CWR15 (publication)
-Verrucosispora domesticated strains (publication)
-Vibrio chemaguriensis sp. nov. (publication)

Eukaryota

-Balamuthia mandrillaris transcriptome (manuscript)
-Picrorhiza kurrooa (publication)
-Pseudocercospora macadamiae (publication)

Metagenome assembled genomes

-Chlorobium species in Antarctica (publication)
-Dinosaur bone (publication)
-Woeseiales in marine sediment (publication)