Table S5 from Hidden in plain sight: what remains to be discovered in the eukaryotic proteome?
datasetposted on 07.02.2019 by Valerie Wood, Antonia Lock, Midori A. Harris, Kim Rutherford, Jürg Bähler, Stephen G. Oliver
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes (BP). To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on gene ontology BP terms to define characterized and uncharacterized proteins for human, budding yeast and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalogue of proteins’ biological roles.