This package contains data and scripts used for evaluation in

        Holm L (2019) Benchmarking fold detection by DaliLite v.5

The benchmark is based on PDB frozen in May 2018 and SCOPe 2.07 (stable version).

Downloads are in three parts. The evaluation script and evaluation data are in the main part.

	# main part (778 MB)
	wget http://ekhidna2.biocenter.helsinki.fi/dali/benchmark.tar.gz
	tar -zxvf benchmark.tar.gz
	cd benchmark
	# original outputs from compared programs (1 GB)
	wget http://ekhidna2.biocenter.helsinki.fi/dali/raw_results.tar.gz
	tar -zxvf raw_results.tar.gz
	# PDB coordinates of database (10 GB)
	wget http://ekhidna2.biocenter.helsinki.fi/dali/pdb_and_scope.tar
	tar -xvf pdb_and_scope.tar

The test set consists of 140 query domains from SCOPe 2.07, listed in scope_140_targets.list.
The query domains are compared against PDB structures, and performance is evaluated
against the SCOPe 2.07 classification using Fmax. There are two target sets. 
pdb70_and_scope.list has 15211 chains and pdb_and_scope.list has 176022 chains.
The ground truth for evaluation is defined in combinetable.pdb70 and combinetable.pdb.
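
A quick sanity check after unpacking is to compare the list sizes with the counts quoted above. The helper below is only a sketch: the name check_list_size is hypothetical (it is not part of this package), and it assumes that each list file holds one entry per line and sits in the directory where the command is run.

```shell
# Count non-empty lines in a list file and compare with the expected number.
# Usage: check_list_size LIST_FILE EXPECTED_COUNT
# (hypothetical helper, not shipped in bin/)
check_list_size() {
    n=$(awk 'NF { n++ } END { print n + 0 }' "$1")
    if [ "$n" -eq "$2" ]; then
        echo "OK $1: $n entries"
    else
        echo "MISMATCH $1: $n entries, expected $2" >&2
        return 1
    fi
}

# Counts quoted in this README (uncomment to run; file locations are assumed):
# check_list_size scope_140_targets.list 140
# check_list_size pdb70_and_scope.list 15211
# check_list_size pdb_and_scope.list 176022
```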

Truth tables:
combinetable.pdb and combinetable.pdb70 contain one row per chain in the target sets 
pdb_and_scope.list and pdb70_and_scope.list, respectively. The first column is the 
structure identifier. The second column is a string which has one symbol for each of 
the 140 query structures. The match status of the i'th query is given by the i'th symbol 
with the following encoding:

	symbol	meaning
	-	incorrect
	.	ignore
	1	fold level match
	2	superfamily level match
	3	family level match

The truth table is passed as the second argument to the evaluation script bin/evaluate_ordered_lists.pl (see the commands at the end of this file).
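
The encoding above can be inspected directly with awk. The following is a minimal sketch, not part of the package: the function name query_status is hypothetical, and it assumes that the two columns are separated by whitespace and that query index i follows the order of scope_140_targets.list.

```shell
# Print the match status of one query for every target chain.
# Usage: query_status TRUTH_TABLE QUERY_INDEX   (QUERY_INDEX is 1-based)
# (hypothetical helper, not shipped in bin/)
query_status() {
    awk -v i="$2" '
        BEGIN {
            label["-"] = "incorrect"
            label["."] = "ignore"
            label["1"] = "fold"
            label["2"] = "superfamily"
            label["3"] = "family"
        }
        {
            s = substr($2, i, 1)             # i-th symbol of the status string
            print $1 "\t" s "\t" label[s]    # identifier, symbol, meaning
        }
    ' "$1"
}
```

For example, under these assumptions, query_status combinetable.pdb70 1 | awk '$2 ~ /[123]/' would list the chains marked as true matches for the first query.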

Subfolders:
bin/
	evaluation script (evaluate_ordered_lists.pl) and utility scripts (Perl) 
structure_data/
	PDB format coordinates of query domains, downloaded from http://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-2.07/
	PDB format coordinates of chains in pdb_and_scope.list (one chain per file)
	query domains in Dali's internal data format (./DAT)
pdb_subsets/
	Representative subsets of PDB used in DaliLite v.5 runs
raw_results/
	outputs of the compared programs
ordered_querywise/
	raw outputs by compared programs converted to ordered tuples for the evaluation script 
ordered_pooled/
	raw outputs by compared programs converted to ordered tuples for the evaluation script, all queries pooled into one list
evaluation_results/
	querywise evaluation results in <method>_<pdb|pdb70>
	pooled evaluation in pooled_<pdb|pdb70>
	querywise_summary has averages of Fmax,T,TP,precision,recall,score (at Fmax point) for each of fold, superfamily and family level 
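
To illustrate how Fmax relates to TP, precision and recall, here is a small sketch. It is not the logic of bin/evaluate_ordered_lists.pl, which may differ in detail. The function name fmax and its input format (one truth-table symbol per line, best-ranked hit first) are hypothetical. It assumes that T is the total number of true targets for the query at the chosen level, that a deeper match also counts at shallower levels (a family match is also a superfamily and fold match), and that targets marked "ignore" are skipped. At each rank n, precision = TP/n, recall = TP/T and F = 2*precision*recall/(precision+recall); Fmax is the largest F along the list.

```shell
# Usage: fmax LEVEL T < ranked_symbols
#   LEVEL: 1 = fold, 2 = superfamily, 3 = family
#   T:     number of true targets at that level (assumed meaning of T)
# Prints: Fmax, rank at which Fmax is reached, TP at that rank (tab-separated).
# (hypothetical helper, not shipped in bin/)
fmax() {
    awk -v level="$1" -v total="$2" '
        $1 == "." { next }                    # assumption: ignored targets are skipped
        {
            n++                               # rank among counted targets
            if ($1 != "-" && $1 + 0 >= level) tp++
            if (tp > 0) {
                p = tp / n                    # precision at rank n
                r = tp / total                # recall at rank n
                f = 2 * p * r / (p + r)
                if (f > best) { best = f; best_n = n; best_tp = tp }
            }
        }
        END { printf "%.3f\t%d\t%d\n", best, best_n, best_tp }
    '
}
```

With the ranked symbols 3, -, 1, ., 2, - and T = 4 at fold level, this sketch reaches its maximum at rank 4 with TP = 3, where precision and recall are both 0.75.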

Structure comparison methods included in evaluation:
	DaliLite v.5 (http://ekhidna2.biocenter.helsinki.fi/dali/README.v5.html) was used with default parameter settings
		Dali control = pairwise alignment of true pairs only: query-specific target lists, e.g. --cd1 q001A --db a.2.list
		Dali systematic = pairwise alignment of query domains to PDB: --query scope_140_targets.list --db pdb.list 
		Dali hierarchical = pairwise alignment of query domains to PDB25 and Blast expansion to PDB: --hierarchical --query scope_140_targets.list --db pdb.list --repset pdb25.list --oneway --HMAX 200 --KMAX 2000
		Dali knowledge-based (alias walk): --walk --query scope_140_targets.list --db pdb.list --repset pdb25.list --oneway --HMAX 200 --KMAX 2000 --H 100 --MAX_HITS 10000 --MAX_DALICON 10000
	mTMalign: query domains were submitted to the web server http://yanglab.nankai.edu.cn/mTM-align/ and results were downloaded from the server. mTMalign performs a knowledge-based search.
	DeepAlign was downloaded from https://github.com/realbigws/DeepAlign/ and DeepAlign_Search.sh was run with the -k 1 option. DeepAlign also outputs the TMscore, which was used to rank the targets in the DeepAlign_TMscore results.
	TMalign was downloaded from http://zhanglab.ccmb.med.umich.edu/TM-align/TMtools20170708.tar.gz and systematic pairwise comparisons were done against PDB coordinate files containing one chain.

Evaluation results were generated using the commands:

	for p in {pdb70,pdb} ; do ( for m in {DeepAlign,DeepAlign_TMscore,hierarchical,mTMalign,systematic,TMalign,walk,control} ; do ( bin/evaluate_ordered_lists.pl ./ordered_querywise/$m combinetable.$p scope_140_targets.list querywise > evaluation_results/${m}_$p; echo "*** $m $p ***" ; for i in {4,7,8,9,11,14,15,16,18,21,22,23} ; do (cut -f $i evaluation_results/${m}_$p  | grep '[0-9]' | bin/histcol.pl 0 | grep Aver ) ; done ) ; done ) ; done > evaluation_results/querywise_summary
	bin/evaluate_ordered_lists.pl ordered_pooled/ combinetable.pdb scope_140_targets.list pooled > evaluation_results/pooled_pdb
	bin/evaluate_ordered_lists.pl ordered_pooled/ combinetable.pdb70 scope_140_targets.list pooled > evaluation_results/pooled_pdb70