This package contains the data and scripts used for the evaluation in

    Holm L (2019) Benchmarking fold detection by DaliLite v.5

The benchmark is based on a PDB snapshot frozen in May 2018 and on SCOPe 2.07 (stable version).

The downloads come in three parts. The evaluation script and evaluation data are in the main part.

    # main part (778 MB)
    wget http://ekhidna2.biocenter.helsinki.fi/dali/benchmark.tar.gz
    tar -zxvf benchmark.tar.gz
    # original outputs from compared programs (1 GB)
    cd benchmark
    wget http://ekhidna2.biocenter.helsinki.fi/dali/raw_results.tar.gz
    tar -zxvf raw_results.tar.gz
    # PDB coordinates of database (10 GB)
    wget http://ekhidna2.biocenter.helsinki.fi/dali/pdb_and_scope.tar
    tar -xvf pdb_and_scope.tar

The test set consists of 140 query domains from SCOPe 2.07, listed in scope_140_targets.list. The query domains are compared against PDB structures, and performance is evaluated against the SCOPe 2.07 classification using Fmax, the maximum F-measure (harmonic mean of precision and recall) over all cut-offs of the ranked hit list.

There are two target sets: pdb70_and_scope.list has 15211 chains and pdb_and_scope.list has 176022 chains. The ground truth for the evaluation is defined in combinetable.pdb70 and combinetable.pdb, respectively.

Truth tables:

combinetable.pdb and combinetable.pdb70 contain one row per chain in the target sets pdb_and_scope.list and pdb70_and_scope.list, respectively. The first column is the structure identifier. The second column is a string with one symbol for each of the 140 query structures; the match status of the i-th query is given by the i-th symbol, encoded as follows:

    symbol  meaning
    -       incorrect
    .       ignore
    1       fold level match
    2       superfamily level match
    3       family level match

The truth table is passed as an argument to the evaluation script bin/evaluate_ordered_lists.pl.

Subfolders:

    bin/                 evaluation script (evaluate_ordered_lists.pl) and utility scripts (Perl)
    structure_data/      PDB format coordinates of query domains, downloaded from
                         http://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-2.07/
                         PDB format coordinates of chains in pdb_and_scope.list (one chain per file)
                         query domains in Dali's internal data format (./DAT)
    pdb_subsets/         representative subsets of PDB used in the DaliLite v.5 runs
    raw_results/         outputs of the compared programs
    ordered_querywise/   raw outputs of the compared programs, converted to ordered tuples for the evaluation script
    ordered_pooled/      as ordered_querywise/, but with all queries pooled into one list
    evaluation_results/  querywise evaluation results in <method>_<target set> (e.g. walk_pdb70);
                         pooled evaluation results in pooled_pdb and pooled_pdb70;
                         querywise_summary has averages of Fmax, T, TP, precision, recall and score
                         (at the Fmax point) for each of the fold, superfamily and family levels

Structure comparison methods included in the evaluation:

DaliLite v.5 (http://ekhidna2.biocenter.helsinki.fi/dali/README.v5.html) was used with default parameter settings in four modes:

Dali control
    Pairwise alignment of true pairs only, using query-specific target lists, e.g.
    --cd1 q001A --db a.2.list
Dali systematic
    Pairwise alignment of query domains to PDB:
    --query scope_140_targets.list --db pdb.list
Dali hierarchical
    Pairwise alignment of query domains to PDB25 and BLAST expansion to PDB:
    --hierarchical --query scope_140_targets.list --db pdb.list --repset pdb25.list --oneway --HMAX 200 --KMAX 2000
Dali knowledge-based (alias walk)
    --walk --query scope_140_targets.list --db pdb.list --repset pdb25.list --oneway --HMAX 200 --KMAX 2000 --H 100 --MAX_HITS 10000 --MAX_DALICON 10000
mTMalign
    Query domains were submitted to the web server http://yanglab.nankai.edu.cn/mTM-align/ and the results were downloaded from the server.
    mTMalign performs a knowledge-based search.
DeepAlign
    The software was downloaded from https://github.com/realbigws/DeepAlign/ and DeepAlign_Search.sh was run with the -k 1 option. DeepAlign also outputs a TM-score; ordering the targets by this TM-score gives the variant evaluated as DeepAlign_TMscore.
TMalign
    Downloaded from http://zhanglab.ccmb.med.umich.edu/TM-align/TMtools20170708.tar.gz. Systematic pairwise comparisons were done against PDB coordinate files containing one chain each.

Evaluation results were generated using the commands:

    for p in pdb70 pdb ; do
      for m in DeepAlign DeepAlign_TMscore hierarchical mTMalign systematic TMalign walk control ; do
        bin/evaluate_ordered_lists.pl ./ordered_querywise/$m combinetable.$p scope_140_targets.list querywise > evaluation_results/${m}_${p}
        echo "*** $m $p ***"
        # for each selected column of the querywise results, keep the "Aver" line printed by bin/histcol.pl
        for i in 4 7 8 9 11 14 15 16 18 21 22 23 ; do
          cut -f $i evaluation_results/${m}_${p} | grep '[0-9]' | bin/histcol.pl 0 | grep Aver
        done
      done
    done > evaluation_results/querywise_summary

    bin/evaluate_ordered_lists.pl ordered_pooled/ combinetable.pdb scope_140_targets.list pooled > evaluation_results/pooled_pdb
    bin/evaluate_ordered_lists.pl ordered_pooled/ combinetable.pdb70 scope_140_targets.list pooled > evaluation_results/pooled_pdb70
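As a rough illustration of what Fmax measures for a single query, the Python sketch below parses rows in the truth-table format described above and computes Fmax for a best-first ordered hit list at one classification level. It is a hypothetical sketch, not a reimplementation of bin/evaluate_ordered_lists.pl, and its numbers need not match the published results. The function names, the whitespace-separated column layout, and the handling of symbols are assumptions: a match at a finer level is taken to count at coarser levels, a match coarser than the evaluated level is taken as a negative, '.' pairs and targets missing from the table are skipped, and recall is computed against all positives for that query in the table.

```python
"""Hypothetical sketch: Fmax of one query's ranked hits against a
combinetable-style truth table. Not bin/evaluate_ordered_lists.pl;
symbol semantics per level are assumptions."""

from typing import Dict, Iterable, List, Optional

FOLD, SUPERFAMILY, FAMILY = 1, 2, 3  # levels as encoded in the truth table


def parse_truth_table(lines: Iterable[str]) -> Dict[str, str]:
    """Map structure identifier -> status string (one symbol per query)."""
    table: Dict[str, str] = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) >= 2:  # skip blank or malformed rows
            table[fields[0]] = fields[1]
    return table


def label(symbol: str, level: int) -> Optional[bool]:
    """Return True (positive), False (negative) or None (ignored).

    Assumption: a finer-level match also counts at coarser levels
    (e.g. '3' is positive at the fold level); a match coarser than
    `level` is counted as a negative.
    """
    if symbol == ".":
        return None
    if symbol.isdigit():
        return int(symbol) >= level
    return False  # '-' = incorrect


def fmax(hits: List[str], table: Dict[str, str], query_index: int, level: int) -> float:
    """Maximum F-measure over all cut-offs of a best-first hit list."""
    # Total positives for this query anywhere in the truth table.
    total = sum(1 for s in table.values() if label(s[query_index], level))
    if total == 0:
        return 0.0
    tp = fp = 0
    best = 0.0
    for target in hits:
        status = table.get(target)
        if status is None:
            continue  # target not in the truth table
        lab = label(status[query_index], level)
        if lab is None:
            continue  # ignored pair
        if lab:
            tp += 1
        else:
            fp += 1
        if tp:
            precision = tp / (tp + fp)
            recall = tp / total
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    # Toy truth table: four targets, three queries (invented data).
    rows = ["d1 3-.", "d2 1--", "d3 -2-", "d4 2.."]
    table = parse_truth_table(rows)
    print(round(fmax(["d1", "d3", "d2", "d4"], table, 0, FOLD), 3))  # 0.857
```

In the toy example the fold-level Fmax is reached only at the bottom of the list (precision 3/4, recall 1), which shows why Fmax is taken as a maximum over all cut-offs rather than at a fixed rank.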