Preprocessed AlphaFold Database for use with DaliLite running locally

The AlphaFold Database is a set of about 360,000 predicted structural models. We have preprocessed the AlphaFold Database so that you can run structure comparisons against it with a local installation of DaliLite. The Dali web server performs equivalent searches but has long queueing times. Sequence searches against the AlphaFold Database can be performed, for example, with the SANSparallel server.

Download the data here: http://ekhidna2.biocenter.helsinki.fi/dali/AF-Digest.tar.gz

The tar file contains two subdirectories, DAT/ and Digest/. Digest/AF.list lists all structures of the AlphaFold Database. Subsets at 70% sequence identity (AF_70.list etc.) were generated using CD-HIT. The mapping between DaliLite's internal structure identifiers and the original AlphaFold Database file names can be retrieved from the lists in Digest/*.list.
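
A plain text search is enough to look up this mapping without assuming anything about the column layout of the list files. The sketch below defines a hypothetical helper, af_lookup (not part of DaliLite), that prints every line in the list files mentioning a given identifier or file name:

```shell
# af_lookup: hypothetical helper, not part of DaliLite.
# Prints each distinct line of DIGEST_DIR/*.list that contains TERM as a fixed string.
# Usage: af_lookup TERM [DIGEST_DIR]
af_lookup() {
    af_term=$1
    af_dir=${2:-Digest}
    # -F: match TERM literally, -h: do not prefix matches with the file name
    grep -hF -- "$af_term" "$af_dir"/*.list | sort -u
}
```

For example, af_lookup AF-P02751 run in the alphafold/ directory would show the list entries for that model, in whatever format the list files use.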

Why you need to use digested data

A few models exceed the dimensions that DaliLite can handle. The critical parameter is the number of secondary structure elements, which must not exceed 200; otherwise the program crashes. The limit cannot be changed for historical reasons (including the use of Fortran). The models listed in Table 2 were therefore split into two chains, labelled A and B. Cut points were selected visually in low-confidence segments between globular domains.
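
To illustrate what such a split amounts to at the file level, the sketch below relabels a single-chain PDB file so that residues before a cut point become chain A and residues from the cut point onward become chain B. This is not the procedure used to build the digest. The function name split_chain is hypothetical, and the code relies only on the fixed-column PDB format (chain identifier in column 22, residue number in columns 23-26).

```shell
# split_chain: hypothetical helper, not part of DaliLite.
# Usage: split_chain CUT_RESIDUE input.pdb > output.pdb
# Residues numbered below CUT_RESIDUE get chain A, all others chain B.
# Residue numbers are left unchanged and no TER record is inserted at the cut.
split_chain() {
    awk -v cut="$1" '
        /^(ATOM|HETATM)/ {
            resnum = substr($0, 23, 4) + 0            # residue number, columns 23-26
            chain  = (resnum >= cut) ? "B" : "A"
            print substr($0, 1, 21) chain substr($0, 23)   # rewrite column 22 only
            next
        }
        { print }                                     # pass other records through
    ' "$2"
}
```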

Installing

Create a directory for installation:
mkdir alphafold
cd alphafold
Download the tarball:
wget http://ekhidna2.biocenter.helsinki.fi/dali/AF-Digest.tar.gz
tar -zxvf AF-Digest.tar.gz 
You should find populated subdirectories DAT/ and Digest/ under your current working directory.
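
A quick way to confirm that the unpacking worked is to test that both subdirectories exist and are not empty. A minimal sketch (the function name check_digest is illustrative):

```shell
# check_digest: hypothetical helper, not part of DaliLite.
# Usage: check_digest [INSTALL_DIR]   (defaults to the current directory)
# Returns non-zero if DAT/ or Digest/ is missing or empty.
check_digest() {
    for cd_dir in "${1:-.}/DAT" "${1:-.}/Digest"; do
        if [ -d "$cd_dir" ] && [ -n "$(ls -A "$cd_dir")" ]; then
            echo "ok: $cd_dir"
        else
            echo "missing or empty: $cd_dir"
            return 1
        fi
    done
}
```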

Create a BLAST database for hierarchical search:

makeblastdb -in Digest/AF.fasta -dbtype prot 

Searching

We recommend running DaliLite in hierarchical search mode. This example compares your structure to human proteins in the AlphaFold Database (assuming you have installed DaliLite.v5 in your home directory and you are in the alphafold/ directory where you installed the data as above):

~/DaliLite.v5/bin/dali.pl --hierarchical --oneway --BLAST_DB Digest/AF.fasta \
--pdbfile mystructure.pdb --db Digest/HUMAN.list --repset Digest/HUMAN_70.list \
--dat1 ./ --dat2 ./DAT/ --title "my search" --np 40
The parameters that point into Digest/ and DAT/ (--BLAST_DB, --db, --repset and --dat2) refer to the digest of the AlphaFold Database. The hierarchical search is rather slow. The 70% identity subsets are substantially smaller than the full sets mainly for the plant proteomes (Table 1).
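
To search a different proteome, only the two list files change. The sketch below wraps the command shown above, assuming the list files follow the Digest/SHORT.list and Digest/SHORT_70.list naming suggested by HUMAN.list, HUMAN_70.list and the Short column of Table 1. The function name dali_af_search and the DALI_HOME and DRY_RUN variables are illustrative and not part of DaliLite; only flags from the example above are used.

```shell
# dali_af_search: hypothetical wrapper around the dali.pl call shown above.
# Usage: dali_af_search SHORT query.pdb [NP]
# SHORT is a code such as HUMAN or MOUSE; NP defaults to 1 here (an arbitrary choice).
# Set DRY_RUN=1 to print the command instead of running it.
dali_af_search() {
    ds_short=$1
    ds_query=$2
    ds_np=${3:-1}
    ds_dali=${DALI_HOME:-$HOME/DaliLite.v5}/bin/dali.pl
    # Build the argument list once so it can be either printed or executed.
    set -- "$ds_dali" --hierarchical --oneway --BLAST_DB Digest/AF.fasta \
        --pdbfile "$ds_query" --db "Digest/$ds_short.list" \
        --repset "Digest/${ds_short}_70.list" \
        --dat1 ./ --dat2 ./DAT/ --title "$ds_short search" --np "$ds_np"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Run it from the alphafold/ directory, for instance as dali_af_search MOUSE mystructure.pdb 40. With DRY_RUN=1 set, the assembled command is printed for inspection before anything is run.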

DaliLite imports structures and gives each a four-letter identifier. Chains shorter than 30 amino acids are excluded. DaliLite results list both the four-letter identifier and the original file name, which is based on the UniProt accession number. Note that DaliLite detects structural similarities between compact, globular domains. Searches with non-compact and non-globular AlphaFold models yield no hits with significant structural similarity.
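
Because short chains are dropped on import, it can be worth checking the length of a query structure before searching. The sketch below counts C-alpha atoms per chain in a PDB file as a proxy for chain length. The helper name chain_lengths is hypothetical, and DaliLite's own residue counting may differ in detail.

```shell
# chain_lengths: hypothetical helper, not part of DaliLite.
# Usage: chain_lengths structure.pdb
# Prints "<chain> <number of CA atoms> <ok|too short>" for each chain,
# using the 30-residue minimum stated above. Output order of chains is not guaranteed.
chain_lengths() {
    awk '
        /^ATOM/ && substr($0, 13, 4) == " CA " {
            alt = substr($0, 17, 1)                 # alternate location indicator
            if (alt != " " && alt != "A") next      # count each residue only once
            n[substr($0, 22, 1)]++                  # chain identifier, column 22
        }
        END {
            for (c in n)
                printf "%s %d %s\n", c, n[c], (n[c] < 30 ? "too short" : "ok")
        }
    ' "$1"
}
```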

Table 1: Subset lists

Short  Scientific name                Common name      FullSet  Subset70
AF     AlphaFold Database             All models        364717    241174
ARATH  Arabidopsis thaliana           Arabidopsis        27400     22895
CAEEL  Caenorhabditis elegans         Nematode worm      19645     18233
CANAL  Candida albicans               C. albicans         5974      5829
DANRE  Danio rerio                    Zebrafish          24640     20023
DICDI  Dictyostelium discoideum       Dictyostelium      12620     11484
DROME  Drosophila melanogaster        Fruit fly          13432     13074
ECOLI  Escherichia coli               E. coli             4301      4174
HUMAN  Homo sapiens                   Human              23332     18899
LEIIN  Leishmania infantum            L. infantum         7924      7708
MAIZE  Zea mays                       Maize              39220     27990
METJA  Methanocaldococcus jannaschii  M. jannaschii       1773      1740
MOUSE  Mus musculus                   Mouse              21558     18146
MYCTU  Mycobacterium tuberculosis     M. tuberculosis     3979      3896
ORYSJ  Oryza sativa                   Asian rice         43581     38243
PLAF7  Plasmodium falciparum          P. falciparum       5186      5016
RAT    Rattus norvegicus              Rat                21254     18017
SCHPO  Schizosaccharomyces pombe      Fission yeast       5124      4961
SOYBN  Glycine max                    Soybean            55693     31054
STAA8  Staphylococcus aureus          S. aureus           2882      2812
TRYCC  Trypanosoma cruzi              T. cruzi           19053      9255
YEAST  Saccharomyces cerevisiae       Budding yeast       6019      5615
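
The difference between a full set and its 70% subset can be quantified directly from the counts in Table 1. As a small worked example, the sketch below computes the subset size as a percentage of the full set for a few rows copied from the table (the function name subset_ratio is illustrative):

```shell
# subset_ratio: reads "SHORT FULLSET SUBSET70" rows on stdin and prints,
# for each row, the 70% subset size as a percentage of the full set.
subset_ratio() {
    awk '{ printf "%s %.1f%%\n", $1, 100 * $3 / $2 }'
}

# Rows copied from Table 1.
subset_ratio <<'EOF'
HUMAN 23332 18899
MAIZE 39220 27990
SOYBN 55693 31054
ECOLI 4301 4174
EOF
```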

Table 2: AlphaFold models split into two chains

id    short  original file              chain B start
cwy4  DANRE  AF-A0A0R4II06-F1-model_v1  1303
c8mq  RAT    AF-F1M5Q4-F1-model_v1       742
e020  HUMAN  AF-P02751-F1-model_v1       999
fcn8  HUMAN  AF-O75369-F1-model_v1      1035
fh10  TRYCC  AF-Q4CU46-F1-model_v1      1226
fiaz  TRYCC  AF-Q4CTN6-F1-model_v1      1195
finb  TRYCC  AF-Q4DVS3-F1-model_v1      1200
fjig  TRYCC  AF-Q4DFV2-F1-model_v1      1200
fjyd  TRYCC  AF-Q4CRW2-F1-model_v1      1218
flus  TRYCC  AF-Q4CTC1-F1-model_v1      1120
flw0  TRYCC  AF-Q4DH14-F1-model_v1      1228
fned  TRYCC  AF-Q4CSQ4-F1-model_v1      1151
fnld  TRYCC  AF-Q4CY82-F1-model_v1      1243
fop7  TRYCC  AF-Q4CST2-F1-model_v1      1208
fpak  TRYCC  AF-Q4D802-F1-model_v1      1240
fpjj  TRYCC  AF-Q4CZ74-F1-model_v1      1250
fpnd  TRYCC  AF-Q4CTR3-F1-model_v1      1209
fqdk  TRYCC  AF-Q4CX92-F1-model_v1      1248
frwd  TRYCC  AF-Q4CUD3-F1-model_v1      1186
frx2  TRYCC  AF-Q4CX06-F1-model_v1      1250
fsm1  TRYCC  AF-Q4CXH5-F1-model_v1      1250
ftc2  TRYCC  AF-Q4CSF7-F1-model_v1      1132
fuhh  TRYCC  AF-Q4CSJ3-F1-model_v1      1231
fusl  TRYCC  AF-Q4CT27-F1-model_v1      1236
fuuc  TRYCC  AF-Q4CTS2-F1-model_v1      1230
fu0u  TRYCC  AF-Q4CVN2-F1-model_v1      1210
fvnn  TRYCC  AF-Q4D1P3-F1-model_v1      1252
f91q  MOUSE  AF-Q8BTM8-F1-model_v1      1064
ggud  MOUSE  AF-Q8R4Y4-F1-model_v1      1126