Data for "TopEC: Improved classification of enzyme function by a localized 3D protein descriptor and 3D Graph Neural Networks"
Date
2024-08-25
Publisher
N/A
Abstract
Accurately annotating the molecular function of enzymes remains challenging. Computational methods can help and allow for high-throughput annotation. Available tools that infer enzyme function from general sequence, fold, or evolutionary information are generally successful. However, they can misclassify sequences in which a deviation in local structural features influences the function. Here, we present TopEC, a 3D graph neural network based on a localized 3D descriptor that learns chemical reactions of enzymes from (predicted) enzyme structures and predicts Enzyme Commission (EC) classes. Using the message-passing frameworks from SchNet and DimeNet++, we include distance and angle information to improve predictive performance compared to regular 2D graph neural networks (GNNs). We obtained significantly improved EC classification (F-score: 0.72) compared to 2D GNNs, without fold bias, at residue and atomic resolutions, and trained networks that can classify both experimental and computationally generated enzyme structures across a vast functional space (> 800 ECs). Our model is robust to uncertainties in binding site locations and to similar functions occurring in distinct binding sites. By investigating the importance of each graph node to the predictive performance, we see that TopEC networks learn from an interplay between biochemical features and local shape-dependent features. TopEC is available as a repository, including accompanying data, on GitHub: https://github.com/IBG4-CBCLab/TopEC.
The data in this repository is available under the CC-BY-NC-SA 4.0 license.
Keywords
NATURAL SCIENCES::Chemistry::Biochemistry::Structural biology