Data for "TopEC: Improved classification of enzyme function by a localized 3D protein descriptor and 3D Graph Neural Networks"
dc.contributor.author | van der Weg, Karel | |
dc.contributor.author | Merdivan, Erinc | |
dc.contributor.author | Piraud, Marie | |
dc.contributor.author | Gohlke, Holger | |
dc.date.accessioned | 2024-09-02T18:38:39Z | |
dc.date.available | 2024-09-02T18:38:39Z | |
dc.date.issued | 2024-08-25 | |
dc.description.abstract | Accurately annotating the molecular function of enzymes remains challenging. Computational methods can aid in this and allow for high-throughput annotation. Tools available for inferring enzyme function from general sequence, fold, or evolutionary information are generally successful. However, they can misclassify sequences in which a deviation in local structural features influences the function. Here, we present TopEC, a 3D graph neural network based on a localized 3D descriptor to learn chemical reactions of enzymes from (predicted) enzyme structures and predict Enzyme Commission (EC) classes. Using the message passing frameworks from SchNet and DimeNet++, we include distance and angle information to improve the predictive performance compared to regular 2D graph neural networks. Compared to 2D GNNs, we obtained significantly improved EC classification (F-score: 0.72) without fold bias at residue and atomic resolutions, and we trained networks that can classify both experimental and computationally generated enzyme structures across a vast functional space (> 800 ECs). Our model is robust to uncertainties in binding site locations and to similar functions in distinct binding sites. By investigating the importance of each graph node to the predictive performance, we see that TopEC networks learn from an interplay between biochemical features and local shape-dependent features. TopEC is available as a repository, including accompanying data, on GitHub: https://github.com/IBG4-CBCLab/TopEC. The data in this repository is available under the CC-BY-NC-SA 4.0 license. | |
dc.identifier.uri | https://researchdata.hhu.de/handle/entry/176 | |
dc.identifier.uri | https://doi.org/10.25838/d5p-66 | |
dc.language.iso | en | |
dc.publisher | N/A | |
dc.rights.license | CC BY-NC-SA 4.0 | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ | |
dc.subject | NATURAL SCIENCES::Chemistry::Biochemistry::Structural biology | |
dc.subject.ddc | 500 Naturwissenschaften und Mathematik::570 Biowissenschaften; Biologie::572 Biochemie | |
dc.title | Data for "TopEC: Improved classification of enzyme function by a localized 3D protein descriptor and 3D Graph Neural Networks" | |
dc.type | Dataset |
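The abstract describes a localized 3D descriptor combined with distance and angle information. The following is a minimal illustrative sketch of that general idea, not the TopEC implementation: it keeps only the points within a sphere around a chosen center (for example, a putative binding site), connects nearby points by a distance cutoff, and computes an angle between two edges. The function names `localized_graph` and `angle`, and the `radius` and `edge_cutoff` values, are assumptions made for this example; the actual code is in the TopEC repository.

```python
import numpy as np


def localized_graph(coords, center, radius=10.0, edge_cutoff=5.0):
    """Build a local distance graph around `center` (hypothetical helper).

    Returns the indices of the kept points, a (2, n_edges) array of directed
    edges indexed into the kept points, and the length of each edge.
    """
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)

    # Localize: keep only points inside the sphere around the center.
    keep = np.flatnonzero(np.linalg.norm(coords - center, axis=1) <= radius)
    local = coords[keep]

    # Pairwise distances between the kept points.
    diff = local[:, None, :] - local[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Connect pairs within the cutoff, excluding self-loops.
    src, dst = np.nonzero((dist <= edge_cutoff) & (dist > 0.0))
    return keep, np.stack([src, dst]), dist[src, dst]


def angle(a, b, c):
    """Angle in radians at point b formed by the segments b-a and b-c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


if __name__ == "__main__":
    # Toy coordinates; the last point lies outside the 10 A sphere.
    pts = np.array([[0.0, 0, 0], [3, 0, 0], [0, 4, 0], [20, 0, 0]])
    keep, edges, lengths = localized_graph(pts, [0, 0, 0], edge_cutoff=4.5)
    print(keep, edges, lengths, angle(pts[1], pts[0], pts[2]))
```

Edge lengths are the kind of input a distance-based message passing scheme such as SchNet consumes, while angles between edge pairs are additionally used by DimeNet++-style schemes. Whether nodes are residues or atoms depends on the chosen resolution.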
Files
Original bundle
1 - 5 of 6
- Name:
- supplemental_data.zip
- Size:
- 1.14 MB
- Format:
- Archives using the ZIP Format
- Description:
- Supplemental data. The data is available under the CC-BY-NC-SA 4.0 license.
- Name:
- source_data.zip
- Size:
- 63.37 MB
- Format:
- Archives using the ZIP Format
- Description:
- Source data. The data is available under the CC-BY-NC-SA 4.0 license.
- Name:
- TopEC-main.zip
- Size:
- 432.14 MB
- Format:
- Archives using the ZIP Format
- Description:
- TopEC code & repository. The data is available under the CC-BY-NC-SA 4.0 license.
- Name:
- trained_networks.zip
- Size:
- 397.32 MB
- Format:
- Archives using the ZIP Format
- Description:
- Trained networks. The data is available under the CC-BY-NC-SA 4.0 license.
- Name:
- all_structures.tar.gz
- Size:
- 27.75 GB
- Format:
- Unknown data format
- Description:
- Raw structures. The data is available under the CC-BY-NC-SA 4.0 license.
License bundle
- Name:
- license.txt
- Size:
- 3.4 KB
- Format:
- Item-specific license agreed upon to submission