TopDomain dataset

Mulnaes, Daniel; Golchin, Pegah; Koenig, Filip; Gohlke, Holger

dc.contributor.author	Mulnaes, Daniel
dc.contributor.author	Golchin, Pegah
dc.contributor.author	Koenig, Filip
dc.contributor.author	Gohlke, Holger
dc.date.accessioned	2021-02-06T13:01:38Z
dc.date.available	2021-02-06T13:01:38Z
dc.date.issued	2021
dc.description.abstract	This is the TopDomain dataset as described in "TopDomain: Exhaustive Protein Domain Boundary Meta-Prediction Combining Multi-Source Information and Deep Learning" by Daniel Mulnaes, Pegah Golchin, Filip Koenig, and Holger Gohlke. The dataset contains two folders: training_set, which contains the FASTA files of the TopDomain training set, and test_set, which contains the FASTA files of the TopDomain test set. Each FASTA file has a header with three fields in the format ">system_name|domain_type|boundary_list", where system_name contains the PDB ID and chain ID of the target protein; domain_type contains the target type, either single-domain or multi-domain; and boundary_list contains the residues annotated as domain boundaries, separated by spaces. The boundary_list field is empty for single-domain proteins because they have no domain boundaries. The header is followed by the FASTA sequence of the protein, with at most 100 residues per line. No protein in the test set shares more than 20% sequence identity with any protein in the training set.	en
dc.identifier.uri	https://researchdata.hhu.de/handle/entry/85
dc.identifier.uri	http://dx.doi.org/10.25838/d5p-16
dc.language.iso	en	en
dc.publisher	N/A	en
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Protein structure prediction	en
dc.subject	Boundary prediction	en
dc.title	TopDomain dataset	en
dc.type	Dataset	en
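
The header layout described in the abstract can be read with a few lines of Python. This is a minimal sketch, not code distributed with the dataset: the names TopDomainRecord and parse_topdomain_fasta are hypothetical, and it assumes one record per file and integer residue numbers in the boundary list. It also tolerates a missing trailing "|" when the boundary list is empty, which is an assumption about how empty fields are written.

```python
from dataclasses import dataclass, field


@dataclass
class TopDomainRecord:
    system_name: str                      # PDB ID + chain ID, kept verbatim
    domain_type: str                      # single-domain or multi-domain
    boundaries: list[int] = field(default_factory=list)
    sequence: str = ""


def parse_topdomain_fasta(text: str, max_line_len: int = 100) -> TopDomainRecord:
    """Parse one TopDomain FASTA file (assumption: one record per file)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("missing FASTA header line")

    # Header layout: >system_name|domain_type|boundary_list
    parts = lines[0][1:].split("|")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 3 header fields, got {len(parts)}")
    system_name, domain_type = parts[0].strip(), parts[1].strip()
    # Assumption: tolerate a missing trailing "|" when the list is empty.
    boundary_field = parts[2] if len(parts) == 3 else ""

    # Space-separated boundaries; empty for single-domain proteins.
    # Assumption: each entry is an integer residue number.
    boundaries = [int(tok) for tok in boundary_field.split()]

    # Sequence lines: check the documented 100-residue line limit, then join.
    seq_lines = lines[1:]
    for ln in seq_lines:
        if ln.startswith(">"):
            raise ValueError("more than one record in file")
        if len(ln) > max_line_len:
            raise ValueError(f"sequence line exceeds {max_line_len} residues")

    return TopDomainRecord(system_name, domain_type, boundaries, "".join(seq_lines))


if __name__ == "__main__":
    # Hypothetical example record, not taken from the dataset.
    demo = ">1abcA|multi-domain|120 245\nMKTAYIAKQR\nQISFVKSHFS\n"
    print(parse_topdomain_fasta(demo))
```

Keeping system_name as an unsplit string avoids guessing how the PDB ID and chain ID are delimited, which this record does not specify.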

Files

Original bundle

Name: topdomain_dataset_1.0.tar.gz
Size: 1.1 MB
Format: Unknown data format
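
Both splits can be loaded straight from the downloaded archive with Python's standard tarfile module. This is a sketch under assumptions: the function name read_topdomain_archive is hypothetical, and it assumes the FASTA files sit under directories named training_set and test_set somewhere inside the archive. The archive's exact internal layout and file extensions are not documented in this record, so any other files are simply skipped.

```python
import tarfile
from collections import defaultdict

SPLITS = ("training_set", "test_set")


def read_topdomain_archive(path: str) -> dict[str, dict[str, str]]:
    """Return {split: {member_name: file_text}} for the two dataset splits.

    Assumption: each FASTA file lies below a directory named
    "training_set" or "test_set" inside the .tar.gz archive.
    """
    splits: dict[str, dict[str, str]] = defaultdict(dict)
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            # Find the split folder among the member's parent directories.
            parents = member.name.split("/")[:-1]
            split = next((p for p in parents if p in SPLITS), None)
            if split is None:
                continue  # e.g. a README outside the split folders
            fh = tar.extractfile(member)
            if fh is None:
                continue
            splits[split][member.name] = fh.read().decode("utf-8")
    return dict(splits)
```

Reading members through extractfile keeps the contents in memory and never writes archive paths to disk, so no extraction-path handling is needed.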

License bundle

Name: license.txt
Size: 3.32 KB
Format: Item-specific license agreed to upon submission

Collections

Computational Pharmaceutical Chemistry and Molecular Informatics Group