
Report

Teaser, summary, work performed and final results

Periodic Reporting for period 2 - IdrSeq (Discovery and characterization of functional disordered regions and the genes involved in their regulation through next generation sequencing)


Summary

If DNA is the blueprint of life, proteins are the building blocks. Thus, understanding the molecular basis of life requires a deep knowledge of the proteins that make up all organisms. A theory of proteins allows one to interpret a genome, and thus holds great promise for human health. The biochemical studies pioneered by Anfinsen in the 1960s and the development of powerful methods (e.g. crystallography) established the sequence-structure-function paradigm. With the availability of completely sequenced genomes, it has become clear that a large fraction of any eukaryotic genome (>40%) encodes protein segments that do not autonomously fold into a defined tertiary structure (although they may contain secondary structural elements) and thus do not directly follow Anfinsen’s postulate. These regions are commonly referred to as intrinsically disordered regions (IDRs). IDRs are enriched in critical functions such as transcription and signaling, and have been linked with numerous diseases including neurodegeneration and cancer. Despite their importance, and in contrast to structured regions, the molecular principles behind the sequence-function relationship of IDRs remain poorly understood. It is therefore critical to understand what makes certain disordered regions functional and why mutations in certain IDRs lead to disease.

Intrinsically disordered regions are to researchers today what the first few protein structures were to biologists half a century ago. Over the last 50 years, we have witnessed the impact of the structure-function paradigm on science and human health. If this is only half the story, it raises the question of what enormous possibilities remain to be tapped by understanding the disorder-function paradigm, for bettering human health and revolutionizing medicine. For these reasons, we believe that targeted research on intrinsically disordered proteins will have a significant, sustained and long-term impact on science and human health.

The overall objective of this proposal is to identify and characterize functional IDRs in cells, and to discover genes involved in their regulation, using yeast as a cellular model. We proposed to develop and apply a targeted, high-throughput, multiplexed approach that we call IdrSeq (for Intrinsically disordered region Sequencing). This has now been achieved and published in an open access journal (Ravarani et al., MSB 2018). Specifically, using IdrSeq, we aim to discover and characterize IDRs that can
(Aim 1) function in transcriptional activation, and discover genes that modulate transcriptional activity,
(Aim 2) influence protein stability, and discover genes involved in regulating half-life, and
(Aim 3) form higher-order assemblies, and discover genes that regulate assembly formation.

The unique feature of this proposal is its integrative vision of synthetic & systems biology, (un)structural biology, cell biology, genetics, experiments and computation to establish a discovery platform to study IDRs in a cellular context. Since IdrSeq is modular and scalable, it can be readily extended to investigate a broad range of IDR functions, and adapted to other organisms. Elucidating the principles of sequence-function-gene relationship of IDRs holds enormous potential for synthetic biology. The discovery of genes that regulate IDR function has direct implications for human health by revealing novel therapeutic targets.

Work performed

Aim 1: Identify and characterize IDRs that function as transcriptional activators, and discover genes that modulate IDR mediated transcriptional activity.

Problem: Mutations and altered regulation of specific transcription factors (TFs) lie at the heart of many diseases. TFs have a modular architecture: they have regions that (a) recognize DNA sequences, (b) oligomerise and (c) interact with and recruit components of the transcriptional machinery; the last of these are called trans-activation domains (TADs). While we understand DNA binding domains and can engineer them to very high precision, how TADs modulate activation remains poorly understood. Pioneering studies in the 1980s and more recent studies have identified short, unstructured, low-complexity protein segments (~20 amino acids) and peptide motifs within some TFs as being required for transcription initiation. However, no study has systematically assayed sequences at such a fine resolution and characterized them in vivo.

Since the beginning of the project, we have developed and presented IDR-Screen, a framework to discover functional IDRs in a high-throughput manner by simultaneously assaying large numbers of DNA sequences that code for short disordered sequences. Functionality-conferring patterns in their protein sequences have been inferred through statistical learning. Using an assay based on the yeast HSF1 transcription factor, we have discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library consisting of variants of 13 diverse TADs. Using machine learning, we have discovered that segments devoid of positively charged residues but with redundant short sequence patterns of negatively charged and aromatic residues are a generic feature of TAD functionality. We could use this rule to design new sequences with increased strength of transactivation. We also used this approach to assess the impact on human transcription activation domains of polymorphisms seen in the natural population as well as in cancer genomes. We anticipate that investigating defined sequence libraries using IDR-Screen for specific functions (specifically for Aims 2 and 3) can facilitate the discovery of novel and functional regions of the disordered proteome, and help us understand the impact of natural and disease variants in disordered segments.
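As a purely illustrative sketch of the compositional rule described above (no positively charged residues, together with repeated short patterns of negatively charged and aromatic residues), the following Python snippet computes simple features for a peptide segment. It is not the statistical model used in the published work. The residue classes (K/R as positive, D/E as negative, F/W/Y as aromatic, with histidine left out), the use of adjacent acidic-aromatic pairs as a stand-in for "short sequence patterns", the threshold `min_pairs`, and all function names are assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed residue classes for illustration; histidine is deliberately not
# counted as positively charged here.
POSITIVE = set("KR")
NEGATIVE = set("DE")
AROMATIC = set("FWY")


@dataclass(frozen=True)
class SegmentFeatures:
    length: int
    n_positive: int
    n_negative: int
    n_aromatic: int
    n_acidic_aromatic_pairs: int  # adjacent acidic/aromatic residue pairs


def segment_features(seq: str) -> SegmentFeatures:
    """Count charged and aromatic residues and adjacent acidic-aromatic pairs."""
    seq = seq.upper()
    pairs = sum(
        1
        for a, b in zip(seq, seq[1:])
        if (a in NEGATIVE and b in AROMATIC) or (a in AROMATIC and b in NEGATIVE)
    )
    return SegmentFeatures(
        length=len(seq),
        n_positive=sum(c in POSITIVE for c in seq),
        n_negative=sum(c in NEGATIVE for c in seq),
        n_aromatic=sum(c in AROMATIC for c in seq),
        n_acidic_aromatic_pairs=pairs,
    )


def matches_rule(seq: str, min_pairs: int = 2) -> bool:
    """Hypothetical filter: no positive residues and at least `min_pairs`
    adjacent acidic-aromatic pairs. The threshold is an arbitrary assumption."""
    f = segment_features(seq)
    return f.n_positive == 0 and f.n_acidic_aromatic_pairs >= min_pairs
```

A filter of this kind could be used, for instance, to pre-select candidate segments from a sequence library before applying a trained model; the actual inferred patterns and their weights are those reported in Ravarani et al., MSB 2018.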

The work and the dataset have been published as an Article in the open access journal Molecular Systems Biology (Ravarani et al., MSB 2018). The work was featured on the cover of the journal, which also carried a News and Views article highlighting the importance of our work. The paper was also rated Exceptional by F1000.
Please see:
F1000: https://f1000.com/prime/733243496
News and Views: http://msb.embopress.org/content/14/5/e8377
Cover: http://msb.embopress.org/content/14/5.cover-expansion

Aim 2: Identify and characterize IDRs that influence protein stability, and discover genes that regulate IDR mediated protein stability.

Problem: Protein degradation is the end point of gene expression. Correct protein turnover is necessary for cellular function, and altered half-life results in a number of human diseases. We and others have recently shown that the length and composition of certain IDRs influence protein half-life, either by making proteins better substrates for the proteasome, or by containing ubiquitination sites or docking motifs that are recognized by ubiquitin ligases. However, no study has comprehensively assayed IDRs (and the associated genes) at such a fine resolution to identify those that can regulate half-life in vivo.

In the last couple of years, our team has assayed a viral proteome using the IDR-Screen approach and has discovered regions that act as strong degrons. More importantly, we are now performing follow-up screens to discover the ubiquitin ligases that regulate the degron activity. The work describing the project will be written up for publication next year.

Aim 3: Identify IDRs that can form higher-order assemblies, and discover genes that regulate assembly formation.

Final results

While we have a deep understanding of how structured domains carry out their function, the sequence-function relationship of IDRs remains poorly understood. IDRs are emerging as important for diverse cellular functions and are involved in a number of human diseases. Yet, until now, we did not have a reliable high-throughput approach for investigating IDRs in a cellular context.

Thanks to the generous funding through this ERC consolidator grant, we have now developed a targeted, high-throughput, multiplexed technology called IDR-Screen. The unique aspect of this approach is its integration of synthetic & systems biology, (un)structural biology, cell biology, genetics, experiments and computation to establish a discovery platform to identify and characterise functional disordered regions directly in a cellular context. Given the emerging importance of IDRs and a newfound understanding of their biomedical relevance, the approach we have developed in this project can be, and is being, readily extended to investigate a broad range of functions of IDRs in a cellular context. In the first reporting period, we have already generated high-resolution data on the sequence-function relationship of IDRs that can function as transactivation domains, which is fuelling the development of methods for investigating protein function and for interpreting the genome sequence of disordered regions.

For the next reporting period, we hope to be able to discover the interactions mediated by the discovered degrons in a viral proteome (Aim 2) as well as proteins that modulate higher order assembly formation (Aim 3).

Website & more info

More info: https://www.mrc-lmb.cam.ac.uk/genomes/madanm/IDR-Screen/