Computational Pharmaceutical Chemistry and Molecular Informatics Group
Our research focusses on understanding, predicting, and modulating biomolecular interactions from an atomistic perspective.
Browsing Computational Pharmaceutical Chemistry and Molecular Informatics Group by Subject "Boundary prediction"
Now showing 1 - 2 of 2
Item: TopDomain dataset (N/A, 2021)
Mulnaes, Daniel; Golchin, Pegah; Koenig, Filip; Gohlke, Holger
This is the TopDomain dataset as described in: "TopDomain: Exhaustive Protein Domain Boundary Meta-Prediction Combining Multi-Source Information and Deep Learning" by Daniel Mulnaes, Pegah Golchin, Filip Koenig, and Holger Gohlke.
This dataset contains two folders:
  training_set: contains the FASTA files of the TopDomain training set
  test_set: contains the FASTA files of the TopDomain test set
Each FASTA file has a header with three fields, in the following format: ">system_name|domain_type|boundary_list", where:
  system_name contains the PDB ID and chain ID of the target protein
  domain_type contains the target type, either single-domain or multi-domain
  boundary_list contains a space-separated list of residues annotated as domain boundaries; this field is empty for single-domain proteins, as they have no domain boundaries
The header is followed by the FASTA sequence of the protein, with at most 100 residues per line.
No protein in the test set shares more than 20% sequence identity with any protein in the training set.

Item: TopDomain dataset v2.0 (N/A, 2021-05-01)
Mulnaes, Daniel; Golchin, Pegah; Koenig, Filip; Gohlke, Holger
This is the TopDomain dataset v2.0 as described in: "TopDomain: Exhaustive Protein Domain Boundary Meta-Prediction Combining Multi-Source Information and Deep Learning" by Daniel Mulnaes, Pegah Golchin, Filip Koenig, and Holger Gohlke.
This dataset contains three folders:
  dataset: contains the full dataset and the TopDomain and TopDomainSeq predictions for the dataset
  training_set: contains the FASTA files of the TopDomain training set
  test_set: contains the FASTA files of the TopDomain test set
Each FASTA file has a header with three fields, in the following format: ">system_name|domain_type|boundary_list", where:
  system_name contains the PDB ID and chain ID of the target protein
  domain_type contains the target type, either single-domain or multi-domain
  boundary_list contains a space-separated list of residues annotated as domain boundaries; this field is empty for single-domain proteins, as they have no domain boundaries
The header is followed by the FASTA sequence of the protein, with at most 100 residues per line.
No protein in the test set shares more than 20% sequence identity with any protein in the training set.
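
The three-field header described above can be read with a few lines of Python. The following is only a minimal sketch, not code shipped with the dataset: the names `TopDomainRecord` and `parse_topdomain_fasta` are hypothetical, and it assumes that the entries in boundary_list are plain integer residue numbers and that the header contains exactly two "|" separators.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class TopDomainRecord:
    """One FASTA entry (hypothetical container, not part of the dataset)."""
    system_name: str       # PDB ID and chain ID
    domain_type: str       # single-domain or multi-domain
    boundaries: List[int]  # empty for single-domain proteins
    sequence: str          # sequence lines joined into one string


def _build_record(header: str, seq_lines: List[str]) -> TopDomainRecord:
    # Header layout: system_name|domain_type|boundary_list
    parts = header.split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 header fields, got {len(parts)}: {header!r}")
    system_name, domain_type, boundary_field = (p.strip() for p in parts)
    # Assumption: boundaries are integers separated by spaces.
    boundaries = [int(token) for token in boundary_field.split()]
    return TopDomainRecord(system_name, domain_type, boundaries, "".join(seq_lines))


def parse_topdomain_fasta(text: str) -> Iterator[TopDomainRecord]:
    """Yield one record per '>' header found in the FASTA text."""
    header = None
    seq_lines: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield _build_record(header, seq_lines)
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield _build_record(header, seq_lines)


if __name__ == "__main__":
    # Made-up example entries, not taken from the dataset.
    example = ">1abcA|multi-domain|3 4\nMKTAY\nIAKQR\n>2xyzB|single-domain|\nGSHM\n"
    for record in parse_topdomain_fasta(example):
        print(record.system_name, record.domain_type, record.boundaries, len(record.sequence))
```

To read a file from training_set or test_set, pass its contents to the function, for example `parse_topdomain_fasta(open(path).read())`. If the boundary entries turn out not to be plain integers, the `int(token)` conversion is the line to adapt.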