  • Heinrich-Heine-Universität
  • Research data management (RDM)

Browsing by Author "Golchin, Pegah"

Now showing 1 - 2 of 2
    TopDomain dataset
    (N/A, 2021) Mulnaes, Daniel; Golchin, Pegah; Koenig, Filip; Gohlke, Holger
    This is the TopDomain dataset as described in "TopDomain: Exhaustive Protein Domain Boundary Meta-Prediction Combining Multi-Source Information and Deep Learning" by Daniel Mulnaes, Pegah Golchin, Filip Koenig, and Holger Gohlke.
    The dataset contains two folders:
      • training_set: contains the FASTA files of the TopDomain training set
      • test_set: contains the FASTA files of the TopDomain test set
    Each FASTA file has a header with three fields in the following format: ">system_name|domain_type|boundary_list", where:
      • system_name contains the PDB ID and chain ID of the target protein
      • domain_type contains the target type, either single-domain or multi-domain
      • boundary_list contains a space-separated list of residues annotated as domain boundaries; this field is empty for single-domain proteins, as they have no domain boundaries
    The sequence is the FASTA sequence of the protein; each line contains at most 100 residues.
    No protein in the test set shares more than 20% sequence identity with any protein in the training set.
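    The header format described above could be parsed along the following lines. This is a minimal sketch, not code shipped with the dataset: the name parse_header and the TopDomainHeader type are hypothetical, and it assumes that boundary residues are written as integers.

```python
from typing import NamedTuple


class TopDomainHeader(NamedTuple):
    system_name: str       # PDB ID and chain ID of the target protein
    domain_type: str       # target type (single-domain or multi-domain)
    boundaries: list[int]  # residues annotated as domain boundaries


def parse_header(line: str) -> TopDomainHeader:
    """Split a '>system_name|domain_type|boundary_list' header into fields."""
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    fields = line[1:].rstrip("\n").split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(fields)}")
    system_name, domain_type, boundary_list = (f.strip() for f in fields)
    # Assumption: boundaries are integer residue numbers separated by spaces.
    # An empty field (single-domain protein) yields an empty list.
    boundaries = [int(token) for token in boundary_list.split()]
    return TopDomainHeader(system_name, domain_type, boundaries)
```

    For a made-up header such as ">1abcA|multi-domain|120 121", the function would return the system name "1abcA", the domain type "multi-domain", and the boundary list [120, 121]; a header ending in an empty third field gives an empty boundary list.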
    TopDomain dataset v2.0
    (N/A, 2021-05-01) Mulnaes, Daniel; Golchin, Pegah; Koenig, Filip; Gohlke, Holger
    This is the TopDomain dataset v2.0 as described in "TopDomain: Exhaustive Protein Domain Boundary Meta-Prediction Combining Multi-Source Information and Deep Learning" by Daniel Mulnaes, Pegah Golchin, Filip Koenig, and Holger Gohlke.
    The dataset contains three folders:
      • dataset: contains the full dataset and the TopDomain and TopDomainSeq predictions for the dataset
      • training_set: contains the FASTA files of the TopDomain training set
      • test_set: contains the FASTA files of the TopDomain test set
    Each FASTA file has a header with three fields in the following format: >system_name|domain_type|boundary_list, where:
      • system_name contains the PDB ID and chain ID of the target protein
      • domain_type contains the target type, either single-domain or multi-domain
      • boundary_list contains a space-separated list of residues annotated as domain boundaries; this field is empty for single-domain proteins, as they have no domain boundaries
    The sequence is the FASTA sequence of the protein; each line contains at most 100 residues.
    No protein in the test set shares more than 20% sequence identity with any protein in the training set.
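    Because sequences are wrapped at 100 residues per line, a reader has to join the sequence lines that follow each header. The sketch below shows one way to do this with the standard library only; the function name read_records is hypothetical and the record in the usage note is made up for illustration.

```python
import io
from typing import Iterator, TextIO


def read_records(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Illustrative input only; not taken from the dataset.
example = io.StringIO(">1abcA|multi-domain|3 4\nACDEF\nGHIK\n")
records = list(read_records(example))
```

    Each yielded header is the text after ">", so it can be split on "|" into the three fields; the sequence is returned as a single string regardless of how it was wrapped in the file.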
© 2025    Heinrich-Heine-Universität Düsseldorf