Create a directory for the data:
mkdir alphafold
cd alphafold
Download and unpack the tarball:
wget http://ekhidna2.biocenter.helsinki.fi/dali/AF-Digest.tar.gz
tar -zxvf AF-Digest.tar.gz
You should find populated subdirectories DAT/ and Digest/ under your current working directory.
The internal identifiers of the AF-DB digest fill the name space from a000 to xzzz. PDB identifiers start with a digit and therefore do not clash with the internal AF-DB identifiers, so you can import structures from the Protein Data Bank into your local DaliLite database. If you have many locally generated structures, store them in separate data directories such as DAT_special_1/, DAT_special_2/, etc.
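The name-space rule above can be checked mechanically before you import a structure. The following is a minimal sketch, assuming identifiers are four lowercase alphanumeric characters; `classify_id` is a hypothetical helper and not part of DaliLite.

```shell
#!/bin/sh
# Use the C locale so that the character ranges below match ASCII only.
LC_ALL=C
export LC_ALL

# classify_id is a hypothetical helper, not part of DaliLite.
# Assumption: identifiers are four lowercase alphanumeric characters.
classify_id() {
    case "$1" in
        # a000 .. xzzz: name space occupied by the AF-DB digest
        [a-x][a-z0-9][a-z0-9][a-z0-9]) echo "afdb" ;;
        # leading digit: PDB-style identifier, no clash with the digest
        [0-9][a-z0-9][a-z0-9][a-z0-9]) echo "pdb" ;;
        *) echo "other" ;;
    esac
}

classify_id a000
classify_id 1abc
```

An identifier reported as "afdb" would collide with the digest, so a structure carrying it belongs in a separate data directory.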
Create a Blast database for hierarchical search:
makeblastdb -in Digest/AFDB1.fasta -dbtype prot
makeblastdb -in Digest/AFDB2.fasta -dbtype prot
Then run the hierarchical search, here against the human subset:
~/DaliLite.v5/bin/dali.pl --hierarchical --oneway --BLAST_DB Digest/AFDB1.fasta \
--pdbfile mystructure.pdb --db Digest/HUMAN.list --repset Digest/HUMAN_70.list \
--dat1 ./ --dat2 ./DAT/ --title "my search" --np 40
The --BLAST_DB, --db, --repset and --dat2 arguments point to the digest of the AlphaFold Database. The hierarchical search is rather slow.
The 70% identity subsets are substantially smaller than the full sets mainly in plants, most notably soybean and maize (Table 1).
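To see how much a 70% identity subset shrinks relative to its full set, you can compute the retained percentage from the Table 1 counts. This is a small sketch; `retained_percent` is a hypothetical helper, and the input rows are copied from Table 1.

```shell
#!/bin/sh
# retained_percent is a hypothetical helper.
# Input columns: short name, FullSet, Subset70.
# Output: short name and Subset70 as a percentage of FullSet.
retained_percent() {
    LC_ALL=C awk '{ printf "%s %.1f\n", $1, 100 * $3 / $2 }'
}

# Counts copied from Table 1.
retained_percent <<'EOF'
SOYBN 55693 31054
MAIZE 39220 27990
HUMAN 23332 18899
ECOLI 4301 4174
EOF
```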
DaliLite imports structures and assigns each a four-letter identifier. Chains shorter than 30 amino acids are excluded. DaliLite results list both the four-letter identifier and the original file name, which is based on the UniProt accession number. Note that DaliLite detects structural similarities between compact, globular domains; searches with non-compact, non-globular AlphaFold models yield no hits with significant structural similarity.
AlphaFold Database v.2 contains one million model structures of model species and Swiss-Prot (Table 1). You can map amino acid sequences of interest to their nearest matches in AlphaFold Database v.2 by running BLAST against Digest/AFDB2.fasta. The resulting identifiers refer to structures in the ./DAT/ directory, which you have already populated from the Digest.
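The mapping step could look roughly like the sketch below. It assumes BLAST+ is installed, that Digest/AFDB2.fasta has been formatted with makeblastdb, and that the subject column of the BLAST output carries the digest identifier; `my_sequences.fasta`, `nearest_matches.tsv` and `best_hits` are hypothetical names, and the E-value cutoff is only an example.

```shell
#!/bin/sh
# best_hits is a hypothetical helper: from tabular BLAST output (-outfmt 6),
# keep the first hit reported for each query, which is the best-scoring one.
best_hits() {
    awk -F '\t' '!seen[$1]++ { print $1 "\t" $2 }'
}

# Run the search only if BLAST+ and the input files are available.
# my_sequences.fasta is a hypothetical query file.
if command -v blastp >/dev/null 2>&1 \
        && [ -f Digest/AFDB2.fasta ] && [ -f my_sequences.fasta ]; then
    blastp -query my_sequences.fasta -db Digest/AFDB2.fasta \
        -outfmt 6 -evalue 1e-5 -num_threads 4 \
        | best_hits > nearest_matches.tsv
fi
```

The second column of nearest_matches.tsv then lists one candidate identifier per query sequence.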
AlphaFold Database v.4 contains 200 million model structures, covering almost all proteins in the UniProt database. Here you will find instructions for (1) obtaining the UniProt accession numbers of a given species from UniProt, (2) downloading the model structures from EBI, and (3) importing the model structures into a locally installed DaliLite. Create your subsets in a separate project directory to avoid clashes with the Digest's name space.
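Step (2) might be scripted as in the sketch below. It rests on assumptions: `accessions.txt` is a hypothetical file with one UniProt accession per line, `myProject/pdb` is a hypothetical project directory, and the URL pattern mirrors the AF-<accession>-F1-model_v<N> file names that DaliLite reports, so it may change between AlphaFold Database releases.

```shell
#!/bin/sh
# af_model_url is a hypothetical helper that builds a download URL.
# $1 = UniProt accession, $2 = model version (defaults to 4).
# Assumption: EBI serves models under this file-name pattern.
af_model_url() {
    printf 'https://alphafold.ebi.ac.uk/files/AF-%s-F1-model_v%s.pdb\n' \
        "$1" "${2:-4}"
}

# Download every accession listed in accessions.txt (hypothetical file)
# into a separate project directory, away from the Digest's DAT/.
if [ -f accessions.txt ]; then
    mkdir -p myProject/pdb
    while read -r acc; do
        [ -n "$acc" ] || continue    # skip empty lines
        wget -q -P myProject/pdb "$(af_model_url "$acc")" \
            || echo "failed: $acc" >&2
    done < accessions.txt
fi
```

Accessions without a model are reported on standard error and skipped, so one missing entry does not stop the whole download.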
Having created your target database list (mySubset.list), you can run structural comparisons between structures of interest and the target database (take care to point to the correct --dat1 and --dat2 directories):
~/DaliLite.v5/bin/dali.pl --cd1 a000A --db mySubset.list \
--dat1 ./DAT/ --dat2 ./DAT/ --title "a000A against mySubset" --np 40
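A target list such as mySubset.list can be generated from the contents of a data directory. The sketch below assumes that each imported chain is stored as <id><chain>.dat and that the list file holds one such identifier per line (for example a000A); `make_subset_list` is a hypothetical helper, not a DaliLite command.

```shell
#!/bin/sh
# make_subset_list is a hypothetical helper, not part of DaliLite.
# $1 = directory with .dat files, $2 = output list file.
# Assumption: files are named <id><chain>.dat, e.g. a000A.dat.
make_subset_list() {
    for f in "$1"/*.dat; do
        [ -e "$f" ] || continue    # directory holds no .dat files
        basename "$f" .dat         # a000A.dat -> a000A
    done > "$2"
}

# Example call with hypothetical names:
# make_subset_list ./DAT_special_1 mySubset.list
```

If the list is built from a directory other than ./DAT/, point --dat2 at that same directory when running dali.pl.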
Short | Scientific name | Common Name | FullSet | Subset70 |
---|---|---|---|---|
AF | AlphaFold Database version 1 | All models | 364717 | 241174 |
ARATH | Arabidopsis thaliana | Arabidopsis | 27400 | 22895 |
CAEEL | Caenorhabditis elegans | Nematode worm | 19645 | 18233 |
CANAL | Candida albicans | C. albicans | 5974 | 5829 |
DANRE | Danio rerio | Zebrafish | 24640 | 20023 |
DICDI | Dictyostelium discoideum | Dictyostelium | 12620 | 11484 |
DROME | Drosophila melanogaster | Fruit fly | 13432 | 13074 |
ECOLI | Escherichia coli | E. coli | 4301 | 4174 |
HUMAN | Homo sapiens | Human | 23332 | 18899 |
LEIIN | Leishmania infantum | L. infantum | 7924 | 7708 |
MAIZE | Zea mays | Maize | 39220 | 27990 |
METJA | Methanocaldococcus jannaschii | M. jannaschii | 1773 | 1740 |
MOUSE | Mus musculus | Mouse | 21558 | 18146 |
MYCTU | Mycobacterium tuberculosis | M. tuberculosis | 3979 | 3896 |
ORYSJ | Oryza sativa | Asian rice | 43581 | 38243 |
PLAF7 | Plasmodium falciparum | P. falciparum | 5186 | 5016 |
RAT | Rattus norvegicus | Rat | 21254 | 18017 |
SCHPO | Schizosaccharomyces pombe | Fission yeast | 5124 | 4961 |
SOYBN | Glycine max | Soybean | 55693 | 31054 |
STAA8 | Staphylococcus aureus | S. aureus | 2882 | 2812 |
TRYCC | Trypanosoma cruzi | T. cruzi | 19053 | 9255 |
YEAST | Saccharomyces cerevisiae | Budding yeast | 6019 | 5615 |
AFDB2 | AlphaFold Database version 2 | | 992000 | 701022 |
swissprot | swissprot | | 571708 | 201968 |
AJECG | Ajellomyces capsulatus | | 9172 | 9142 |
BRUMA | Brugia malayi | | 8719 | 7007 |
CAMJE | Campylobacter jejuni | | 1580 | 1572 |
9EURO1 | Cladophialophora carrionii | | 11113 | 11103 |
DRAME | Dracunculus medinensis | | 10895 | 10504 |
ENTFC | Enterococcus faecium | | 2798 | 2697 |
9EURO2 | Fonsecaea pedrosoi | | 12473 | 12401 |
HAEIN | Haemophilus influenzae | | 1665 | 1605 |
HELPY | Helicobacter pylori | | 1569 | 1487 |
KLEPH | Klebsiella pneumoniae | | 5754 | 5573 |
9PEZI1 | Madurella mycetomatis | | 9504 | 9216 |
MYCLE | Mycobacterium leprae | | 1572 | 1556 |
MYCUL | Mycobacterium ulcerans | | 8955 | 7732 |
NEIG1 | Neisseria gonorrhoeae | | 2060 | 1991 |
9NOCA1 | Nocardia brasiliensis | | 8292 | 8189 |
ONCVO | Onchocerca volvulus | | 12015 | 11560 |
PARBA | Paracoccidioides lutzii | | 8767 | 8699 |
PSEAE | Pseudomonas aeruginosa | | 5445 | 5186 |
SALTY | Salmonella typhimurium | | 4698 | 4384 |
SCHMA | Schistosoma mansoni | | 13821 | 9302 |
SHIDS | Shigella dysenteriae | | 3934 | 3445 |
SPOS1 | Sporothrix schenckii | | 8629 | 8606 |
STRR6 | Streptococcus pneumoniae | | 1990 | 1913 |
STRER | Strongyloides stercoralis | | 12781 | 11934 |
TRITR | Trichuris trichiura | | 9677 | 9040 |
TRYB2 | Trypanosoma brucei | | 8464 | 7983 |
WUCBA | Wuchereria bancrofti | | 12694 | 12394 |
id | short | original file | chain B start | |
---|---|---|---|---|
cwy4 | DANRE | AF-A0A0R4II06-F1-model_v1 | 1303 | |
c8mq | RAT | AF-F1M5Q4-F1-model_v1 | 742 | |
e020 | HUMAN | AF-P02751-F1-model_v1 | 999 | |
fcn8 | HUMAN | AF-O75369-F1-model_v1 | 1035 | |
fh10 | TRYCC | AF-Q4CU46-F1-model_v1 | 1226 | |
fiaz | TRYCC | AF-Q4CTN6-F1-model_v1 | 1195 | |
finb | TRYCC | AF-Q4DVS3-F1-model_v1 | 1200 | |
fjig | TRYCC | AF-Q4DFV2-F1-model_v1 | 1200 | |
fjyd | TRYCC | AF-Q4CRW2-F1-model_v1 | 1218 | |
flus | TRYCC | AF-Q4CTC1-F1-model_v1 | 1120 | |
flw0 | TRYCC | AF-Q4DH14-F1-model_v1 | 1228 | |
fned | TRYCC | AF-Q4CSQ4-F1-model_v1 | 1151 | |
fnld | TRYCC | AF-Q4CY82-F1-model_v1 | 1243 | |
fop7 | TRYCC | AF-Q4CST2-F1-model_v1 | 1208 | |
fpak | TRYCC | AF-Q4D802-F1-model_v1 | 1240 | |
fpjj | TRYCC | AF-Q4CZ74-F1-model_v1 | 1250 | |
fpnd | TRYCC | AF-Q4CTR3-F1-model_v1 | 1209 | |
fqdk | TRYCC | AF-Q4CX92-F1-model_v1 | 1248 | |
frwd | TRYCC | AF-Q4CUD3-F1-model_v1 | 1186 | |
frx2 | TRYCC | AF-Q4CX06-F1-model_v1 | 1250 | |
fsm1 | TRYCC | AF-Q4CXH5-F1-model_v1 | 1250 | |
ftc2 | TRYCC | AF-Q4CSF7-F1-model_v1 | 1132 | |
fuhh | TRYCC | AF-Q4CSJ3-F1-model_v1 | 1231 | |
fusl | TRYCC | AF-Q4CT27-F1-model_v1 | 1236 | |
fuuc | TRYCC | AF-Q4CTS2-F1-model_v1 | 1230 | |
fu0u | TRYCC | AF-Q4CVN2-F1-model_v1 | 1210 | |
fvnn | TRYCC | AF-Q4D1P3-F1-model_v1 | 1252 | |
f91q | MOUSE | AF-Q8BTM8-F1-model_v1 | 1064 | |
ggud | MOUSE | AF-Q8R4Y4-F1-model_v1 | 1126 | |
ij5j | PSEAE | AF-Q9I2M3-F1-model_v2 | 1406 | |
iqli | BRUMA | AF-A0A5S6P8V9-F1-model_v2 | 964 | |
is2z | BRUMA | AF-A0A5S6P8X1-F1-model_v2 | 1150 | |
j0n4 | DRAME | AF-A0A158Q5A1-F1-model_v2 | 917 | |
o43k | swissprot | AF-P15921-F1-model_v2 | 1210 | |
ofty | swissprot | AF-Q8X8V7-F1-model_v2 | 1137 | |
v862 | TRITR | AF-A0A077Z2J4-F1-model_v2 | 1008 |