Periodic Reporting for period 3 - lincSAFARI (Sequence and Function Relationships in Long Intervening Noncoding RNAs)

Summary

The central dogma in molecular biology is that information stored in DNA is transcribed into RNA mainly to serve as a template for production of proteins, which carry out most cellular functions. It is now clear that many regions in the human genome also give rise to a range of processed and regulated transcripts that do not appear to code for functional proteins. A subset of these are long (>200 nucleotides), processed RNAs, collectively called long noncoding RNAs or lncRNAs. Recent estimates are that the human genome encodes >10,000 distinct lncRNAs, many of which show tissue-specific activity and are frequently dysregulated in human disease, including neurodegeneration.

Given the growing number of lncRNAs implicated in human disease or required for proper development, fundamental questions that need to be addressed are: Which lincRNAs are functional? How is functional information encoded in the lncRNA sequence? Is this information interpreted in the context of the mature or the nascent RNA? What are the identities and functional roles of specific sequence domains within lncRNA genes?

Our main hypothesis is that many lncRNA loci play key roles in gene regulation during cell differentiation, both via functionally important transcription events and post-transcriptionally, through the combined action of multiple short sequence domains. We will test this hypothesis using three complementary approaches – comparative genomics, detailed perturbations in mammalian cells followed by quantitative molecular phenotyping, and high-throughput screens for sequences able to carry out specific functions.

We use an interdisciplinary approach combining computational, molecular and stem cell biology. Our methodology is scalable, allowing us to tackle completely uncharacterized long RNAs and eventually zoom in and study their individual bases. Understanding which functions lncRNAs carry out in key processes and, even more importantly, how those functions are carried out, is crucial for the eventual use of these molecules as potential therapeutic targets or as drugs.

Work performed

We have studied the functions and modes of action of long noncoding RNAs (lncRNAs) using both computational and experimental approaches and have achieved substantial progress on all three aims specified in our proposal. Specifically:

Aim 1: Compare lincRNAs across vertebrates and predict lincRNA families and their functional domains.
Early in the project, we completed a thorough comparison of lncRNAs expressed in 17 species, including 16 vertebrates and the sea urchin, and developed computational methodology for comparing lncRNAs across species (Hezroni et al., Cell Reports 2015). We have since been improving and expanding different components of this computational infrastructure. The improved pipeline for lncRNA identification from RNA-seq data (PLAR) was used in several collaborations with other labs seeking lncRNAs in their systems, including the inner ear (Ushakov et al., Scientific Reports 2017). Other components of the computational infrastructure we developed were used for broader applications, such as analysis of promoter usage (Tamarkin et al., eLife 2017).
In the next phase of our studies of lncRNA evolution, we focused on the evolutionary origins of lncRNAs, and found that ~5% of the lncRNAs shared between mammals can be traced back to protein-coding genes that lost their coding potential before the rise of mammals. These lncRNAs have distinctive features, such as broader and higher expression, that set them apart from other lncRNAs. This study was published last year (Hezroni et al., Genome Biology 2017). I also published a review on lncRNA evolution (Ulitsky, Nature Reviews Genetics 2016).
We also pursued, as proposed, the search for repeated domains as another strategy for identifying functional domains within lncRNAs. With this approach we identified and dissected the NORAD lncRNA, which contains 12 repeated domains of ~300 nt each, which we studied extensively both computationally (comparing domains within and between species) and experimentally. One manuscript on this topic was published (Tichon et al., Nature Communications 2016) and another is accepted for publication (Tichon, Perry et al., Genes & Development, in press).
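To illustrate the general idea of searching a single transcript for repeated domains, the following minimal Python sketch indexes every k-mer in a sequence and reports the most common distance between consecutive occurrences of the same k-mer, which approximates the repeat-unit length. It is an illustrative assumption, not the method used in the cited studies; the function names `repeated_kmers` and `dominant_spacing`, the value of `k`, and the toy sequence are all hypothetical.

```python
from collections import Counter, defaultdict


def repeated_kmers(seq, k=8):
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def dominant_spacing(seq, k=8):
    """Most common distance between consecutive occurrences of a k-mer.

    Returns None when no k-mer is repeated.
    """
    spacings = Counter()
    for pos in repeated_kmers(seq, k).values():
        for a, b in zip(pos, pos[1:]):
            spacings[b - a] += 1
    return spacings.most_common(1)[0][0] if spacings else None


# Toy example (hypothetical sequence): a 20-nt unit repeated three times.
unit = "ACGTTGCAAGGCTTAACCGG"
toy_transcript = unit * 3
print(dominant_spacing(toy_transcript, k=8))
```

Real repeated domains diverge in sequence, so an exact k-mer match is only a first-pass filter; a sensitive search would typically rely on alignment of candidate units.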

Aim 2: Characterize lincRNAs acting during human neurogenesis.
As proposed in the grant, we established a system for differentiating mouse embryonic stem cells (mESCs) into neuronal progenitor cells (NPCs) and mature neurons. We tested many alternative perturbation methods in this system, and eventually settled on shRNAs introduced through lentiviral infection at the mESC stage, prior to differentiation, and on CRISPR/Cas9 for genome editing in the cells. With these approaches, we attempted to knock down eight different lncRNAs and tested the knockdown efficiency at different stages of differentiation, as well as the phenotype of the cells. For four of these lncRNAs, we observed robust and distinct changes in differentiation, which we have now profiled using RNA-seq at different stages. We see that different forms of transcriptional dysregulation underlie these changes. We are now characterizing the mechanism of action of two of the lncRNAs, both of which appear to act in cis by either up-regulating or repressing their adjacent protein-coding genes. For these lncRNAs, we also established KO lines, as well as shRNA/KO lines for the adjacent protein-coding genes. We also established lentiviral expression vectors for the perturb-and-rescue experiments, as proposed, and we are currently testing the effect of expressing these rescue vectors during differentiation.

Aim 3: Identify lincRNA sequences capable of specific activities and determine their sequence-function landscape at single-base resolution.
We established a high-throughput system for testing which fragments of lncRNA sequence are capable of carrying out specific functions. For the first interrogated function, we focused on the ab

Final results

We plan to continue studying the functions and modes of action of lncRNAs using our three-pronged approach:

Aim 1: We are developing methods for detailed comparison of sequences of orthologous lncRNAs across species. Given the sequences of a lncRNA from multiple species, these methods will home in on specific elements that are conserved. We are also going to explore whether secondary structures appear to be preserved during the rapid evolution of lncRNA loci, and continue our efforts to characterize repeated structures in lncRNAs, with a particular focus on the NORAD lncRNA.
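As a rough illustration of homing in on conserved elements without a full alignment, the sketch below collects the k-mers present in every orthologous sequence. This is an assumed, simplified stand-in and not the methods under development in the project; the function name `shared_kmers`, the species labels, and the toy sequences are hypothetical.

```python
def shared_kmers(sequences, k=6):
    """Return the set of k-mers found in every sequence.

    `sequences` maps a species label to that species' lncRNA sequence.
    """
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in sequences.values()
    ]
    return set.intersection(*kmer_sets) if kmer_sets else set()


# Toy orthologs (hypothetical sequences) sharing one short element.
orthologs = {
    "human": "CCATGGCACTTA",
    "mouse": "GGTGGCACAAGC",
    "zebrafish": "TTTTGGCACG",
}
print(shared_kmers(orthologs, k=6))
```

Exact matching misses elements that tolerate substitutions, so a practical method would need to allow mismatches or score against a background model.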

Aim 2: We will continue to characterize the ways that lncRNAs act during neurogenesis, with a particular focus on rescue experiments, dissection of the sequence elements that drive lncRNA function, and identification of protein binding partners of lncRNAs using pulldown approaches.

Aim 3: We will extend the use of massively parallel reporter assays for identifying tiles of RNA sequence capable of carrying out specific aspects of lncRNA mechanisms, such as stability, and activation/repression of promoters.
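The notion of "tiles" can be made concrete with a short Python sketch that cuts a sequence into overlapping fixed-length windows, as one might do when designing a reporter library. The tile length, step size, and function name `tile_sequence` are assumptions chosen for illustration; the report does not state the parameters actually used.

```python
def tile_sequence(seq, tile_len=100, step=50):
    """Return (start, tile) pairs covering `seq` with overlapping windows."""
    if tile_len <= 0 or step <= 0:
        raise ValueError("tile_len and step must be positive")
    if len(seq) <= tile_len:
        return [(0, seq)]
    last_start = len(seq) - tile_len
    starts = list(range(0, last_start + 1, step))
    # Add a final tile so the 3' end of the sequence is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, seq[s:s + tile_len]) for s in starts]


# Toy example: a 240-nt hypothetical sequence.
toy_seq = "ACGT" * 60
tiles = tile_sequence(toy_seq, tile_len=100, step=50)
for start, tile in tiles:
    print(start, len(tile))
```

A step smaller than the tile length means each position is covered by more than one tile, which helps narrow an observed activity down to the region the active tiles share.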