Data for "TopEC: Improved classification of enzyme function by a localized 3D protein descriptor and 3D Graph Neural Networks"

dc.contributor.author: van der Weg, Karel
dc.contributor.author: Merdivan, Erinc
dc.contributor.author: Piraud, Marie
dc.contributor.author: Gohlke, Holger
dc.date.accessioned: 2024-09-02T18:38:39Z
dc.date.available: 2024-09-02T18:38:39Z
dc.date.issued: 2024-08-25
dc.description.abstract: Accurately annotating the molecular function of enzymes remains challenging. Computational methods can aid in this and allow for high-throughput annotation. Tools available for inferring enzyme function from general sequence, fold, or evolutionary information are generally successful. However, they can lead to misclassification if, for certain sequences, a deviation in local structural features influences the function. Here, we present TopEC, a 3D graph neural network based on a localized 3D descriptor to learn chemical reactions of enzymes from (predicted) enzyme structures and predict Enzyme Commission (EC) classes. Using the message passing frameworks from SchNet and DimeNet++, we include distance and angle information to improve the predictive performance compared to regular 2D graph neural networks. We obtained significantly improved EC classification prediction (F-score: 0.72) compared to 2D GNNs, without fold bias, at residue and atomic resolutions, and trained networks that can classify both experimental and computationally generated enzyme structures for a vast functional space (> 800 ECs). Our model is robust to uncertainties in binding site locations and to similar functions in distinct binding sites. By investigating the importance of each graph node to the predictive performance, we see that TopEC networks learn from an interplay between biochemical features and local shape-dependent features. TopEC is available as a repository, including accompanying data, on GitHub: https://github.com/IBG4-CBCLab/TopEC. The data in this repository is available under the CC-BY-NC-SA 4.0 license.
dc.identifier.uri: https://researchdata.hhu.de/handle/entry/176
dc.identifier.uri: https://doi.org/10.25838/d5p-66
dc.language.iso: en
dc.publisher: N/A
dc.rights.license: CC BY-NC-SA 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: NATURAL SCIENCES::Chemistry::Biochemistry::Structural biology
dc.subject.ddc: 500 Naturwissenschaften und Mathematik::570 Biowissenschaften; Biologie::572 Biochemie
dc.title: Data for "TopEC: Improved classification of enzyme function by a localized 3D protein descriptor and 3D Graph Neural Networks"
dc.type: Dataset
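
The abstract mentions a localized 3D descriptor and the use of distance and angle information. As a rough illustration only, the sketch below selects atoms within a radius of a chosen center, connects nearby atoms with edges that carry a distance, and computes the angle at a shared node. It is not the TopEC implementation: the function names, the radius, and the cutoff are hypothetical, and the toy coordinates are invented for the example.

```python
import math

def local_nodes(coords, center, radius):
    """Indices of atoms within `radius` of `center` (hypothetical local descriptor)."""
    return [i for i, p in enumerate(coords) if math.dist(p, center) <= radius]

def build_edges(coords, nodes, cutoff):
    """Directed edges (i, j, distance) between selected atoms closer than `cutoff`."""
    edges = []
    for i in nodes:
        for j in nodes:
            if i != j:
                d = math.dist(coords[i], coords[j])
                if d <= cutoff:
                    edges.append((i, j, d))
    return edges

def angle(coords, i, j, k):
    """Angle in radians at atom j between the directions j->i and j->k."""
    v1 = [a - b for a, b in zip(coords[i], coords[j])]
    v2 = [a - b for a, b in zip(coords[k], coords[j])]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))

# Toy coordinates (invented): three atoms near the origin, one far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
nodes = local_nodes(coords, center=(0.0, 0.0, 0.0), radius=5.0)
edges = build_edges(coords, nodes, cutoff=1.2)
theta = angle(coords, 1, 0, 2)
```

Distance-only features correspond to what a SchNet-style network consumes, while a DimeNet++-style network additionally uses angles between pairs of edges; how TopEC chooses the center, radius, and cutoff is documented in the repository, not here.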

Files

Original bundle

Now showing 1 - 5 of 6
Name: supplemental_data.zip
Size: 1.14 MB
Format: Archives using the ZIP Format
Description: Supplemental data. The data is available under the CC-BY-NC-SA 4.0 license.

Name: source_data.zip
Size: 63.37 MB
Format: Archives using the ZIP Format
Description: Source data. The data is available under the CC-BY-NC-SA 4.0 license.

Name: TopEC-main.zip
Size: 432.14 MB
Format: Archives using the ZIP Format
Description: TopEC code & repository. The data is available under the CC-BY-NC-SA 4.0 license.

Name: trained_networks.zip
Size: 397.32 MB
Format: Archives using the ZIP Format
Description: Trained networks. The data is available under the CC-BY-NC-SA 4.0 license.

Name: all_structures.tar.gz
Size: 27.75 GB
Format: Unknown data format
Description: Raw structures. The data is available under the CC-BY-NC-SA 4.0 license.
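
Because the bundle mixes ZIP archives with a large tar.gz file, it can help to peek at an archive's contents before extracting it. The following is a minimal sketch using only the Python standard library; `list_archive` is a hypothetical helper, and it assumes the files have already been downloaded to a local path. The internal layout of the archives is not described in this record, so the sketch only lists member names.

```python
import itertools
import tarfile
import zipfile

def list_archive(path, limit=10):
    """Return up to `limit` member names from a ZIP or tar archive without extracting."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()[:limit]
    if tarfile.is_tarfile(path):
        # "r:*" lets tarfile detect the compression (gzip for .tar.gz) itself.
        with tarfile.open(path, "r:*") as tf:
            # Iterate lazily so a very large archive is not scanned to the end.
            return [member.name for member in itertools.islice(tf, limit)]
    raise ValueError(f"not a ZIP or tar archive: {path}")
```

For example, `list_archive("all_structures.tar.gz", limit=5)` would return the first five member names of the raw-structure archive, assuming that file is present in the working directory.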

License bundle

Name: license.txt
Size: 3.4 KB
Format: Item-specific license agreed upon to submission