S1: DNA-binding protein predictions
Integrating sequence, gene expression and multi-scope annotations improves DNA-binding protein predictions
Jawaharlal Nehru University, New Delhi
DNA-binding proteins (DBPs) form an important class of proteins, many of whose members remain unidentified. The amino acid sequence carries useful information for identifying novel DBPs, but on its own it is inadequate: sequence-based predictions yield many false positives, and the size of the binding site varies widely across DBPs. Current DBP annotations, meanwhile, are available at varying scopes and confidence levels, which poses a challenge for building suitable machine learning models. We have shown that integrating multi-level annotations, global gene expression profiles and sequence information into predictive models can account for many DBPs that could not be predicted by any of the individual approaches.
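As an illustration only, the integration idea can be sketched as combining a sequence-derived feature vector with an expression profile, while turning the annotation level into a per-example weight. The abstract does not specify the actual features, weights or model, so everything here is an assumption: the amino acid composition encoding, the names `composition` and `integrate`, the annotation categories, and the weight values are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical confidence weights per annotation level (illustrative values,
# not taken from the abstract).
ANNOTATION_WEIGHTS = {"experimental": 1.0, "curated": 0.7, "electronic": 0.3}


def composition(sequence):
    """Return the 20-dimensional amino acid composition of a protein sequence."""
    counts = Counter(sequence.upper())
    # Count only the 20 standard residues; avoid division by zero on empty input.
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def integrate(sequence, expression, annotation_level):
    """Concatenate sequence and expression features.

    Returns (features, sample_weight), where sample_weight reflects how much
    the annotation behind this example's label is trusted.
    """
    features = composition(sequence) + list(expression)
    weight = ANNOTATION_WEIGHTS[annotation_level]
    return features, weight


# Toy example: a 6-residue sequence and a 3-condition expression profile.
features, weight = integrate("MKRAKK", [0.5, 1.2, -0.3], "curated")
```

The resulting feature vectors and weights could then be passed to any learner that accepts per-sample weights, so that labels from lower-confidence annotations contribute less to training.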