This page contains supplementary material related to the article:
Optimizing InterProScan feature processing generates a surprisingly good Protein Function Prediction method
Automated protein function prediction (AFP) is an intensively studied topic. Most of this research focuses on methods that combine multiple data sources, while fewer articles look for the most efficient ways to use a single data source. Therefore, we wanted to test how different preprocessing methods and classifiers perform in the AFP task when processing the output of InterProScan (IPS). In particular, we present novel preprocessing methods, less commonly used classifiers, and the inclusion of species taxonomy. We also test classifier stacking for combining the results of the tested classifiers. The methods are tested with in-house data and CAFA3 competition evaluation data. We show that adding IPS localisation and taxonomy to the data improves results. Stacking also improves performance. Surprisingly, our best-performing methods outperformed all international CAFA3 competition participants in most tests. The results show how preprocessing and classifier combination are beneficial in the AFP task.
Available also at the end of the manuscript
Our function prediction web server, PANNZER
Comparing and improving stratified cross validation for multi-class datasets
Visualization of sequence-level taxonomic similarities at genome level with AAI server