TopDomain dataset v2.0
dc.contributor.author | Mulnaes, Daniel | |
dc.contributor.author | Golchin, Pegah | |
dc.contributor.author | Koenig, Filip | |
dc.contributor.author | Gohlke, Holger | |
dc.date.accessioned | 2021-05-01T09:18:27Z | |
dc.date.available | 2021-05-01T09:18:27Z | |
dc.date.issued | 2021-05-01 | |
dc.description.abstract | This is the TopDomain dataset v2.0 as described in: "TopDomain: Exhaustive Protein Domain Boundary Meta-Prediction Combining Multi-Source Information and Deep Learning" by Daniel Mulnaes, Pegah Golchin, Filip Koenig, and Holger Gohlke. This dataset contains three folders: dataset, containing the full dataset and the TopDomain and TopDomainSeq predictions for it; training_set, containing the FASTA files of the TopDomain training set; and test_set, containing the FASTA files of the TopDomain test set. Each FASTA file has a header with three fields in the following format: >system_name|domain_type|boundary_list, where system_name contains the PDB ID and chain ID of the target protein; domain_type contains the target type, either single-domain or multi-domain; and boundary_list contains the residues annotated as domain boundaries, separated by spaces (this field is empty for single-domain proteins, as they have no domain boundaries). The header is followed by the FASTA sequence of the protein, with at most 100 residues per line. No protein in the test set shares more than 20% sequence identity with any protein in the training set. | en |
dc.identifier.uri | https://researchdata.hhu.de/handle/entry/88 | |
dc.identifier.uri | http://dx.doi.org/10.25838/d5p-19 | |
dc.language.iso | en | en |
dc.publisher | N/A | en |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Protein structure prediction | en |
dc.subject | Boundary prediction | en |
dc.title | TopDomain dataset v2.0 | en |
dc.type | Dataset | en |
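The FASTA header layout described in the abstract (>system_name|domain_type|boundary_list, followed by sequence lines of at most 100 residues) can be read with a small parser. The following is a minimal Python sketch, not part of the dataset itself. It assumes that boundary residues are plain integer residue numbers, that no header field contains an extra "|" character, and that blank lines can be skipped. The names TopDomainEntry, parse_header and parse_fasta are hypothetical, and the example records are invented for illustration rather than taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class TopDomainEntry:
    system_name: str   # PDB ID + chain ID, exactly as written in the header
    domain_type: str   # "single-domain" or "multi-domain" (spelling assumed)
    boundaries: List[int] = field(default_factory=list)
    sequence: str = ""


def parse_header(header: str) -> TopDomainEntry:
    """Split a '>system_name|domain_type|boundary_list' header into fields."""
    fields = header.strip().lstrip(">").split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(fields)}: {header!r}")
    system_name, domain_type, boundary_field = fields
    # Assumption: boundaries are integers separated by whitespace.
    # An empty field (single-domain proteins) yields an empty list.
    boundaries = [int(tok) for tok in boundary_field.split()]
    return TopDomainEntry(system_name.strip(), domain_type.strip(), boundaries)


def parse_fasta(lines: Iterable[str]) -> Iterator[TopDomainEntry]:
    """Yield one entry per record, joining the wrapped sequence lines."""
    entry: Optional[TopDomainEntry] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            if entry is not None:
                entry.sequence = "".join(chunks)
                yield entry
            entry = parse_header(line)
            chunks = []
        else:
            if entry is None:
                raise ValueError("sequence line found before the first header")
            chunks.append(line)
    if entry is not None:
        entry.sequence = "".join(chunks)
        yield entry


# Hypothetical records for illustration only (not from the dataset).
example = [
    ">1abcA|multi-domain|45 102",
    "MKTAYIAKQRQISFVKSHFSRQ",
    "LEERLGLIEVQ",
    ">2xyzB|single-domain|",
    "GSHMLEDP",
]

for e in parse_fasta(example):
    print(e.system_name, e.domain_type, e.boundaries, len(e.sequence))
```

To read a real file, pass an open file handle instead of the list, for example `parse_fasta(open(path))`, where `path` points to one of the FASTA files in training_set or test_set. Joining the wrapped lines keeps the parser independent of the 100-residue line width.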