Report

Periodic Reporting for period 2 - Gen-Epix (Genetic Determinants of the Epigenome)

Summary

The funded project seeks to test the hypothesis that a gross feature of DNA sequence, namely base composition, can be interpreted as a signal by proteins that recognise a simple sequence motif. A screen has revealed numerous candidate proteins, several of which are implicated in pluripotency, development and cancer. We chose to follow up one of these proteins, SALL4, which is expressed in stem cells. Postdoctoral researcher Dr Timo Quante carried out the screen using mouse stem cells, generated Sall4 mutant cell lines and carried out transcriptomics. His unpublished evidence shows that SALL4 preferentially regulates expression of genes within AT-rich domains of the genome, as predicted by our hypothesis.

SALL4 is of wider interest for several reasons:
• It is mutated in a human skeletal disorder (Okihiro syndrome)
• It is an essential inhibitor of differentiation that safeguards pluripotency of stem cells
• It is over-expressed in many cancers and is a potential target for anti-cancer therapeutics

Work performed

During the past year we have advanced our understanding of the role of SALL4 as a reader of DNA base composition. We previously identified SALL4 in a screen for proteins in embryonic stem cells (ESCs) that bind to AT-rich DNA. We identified the target sequence as ATATT and showed that binding is mediated by ZFC4, a cluster of zinc fingers near the C-terminus of the protein. Mutations in ZFC4 that abolish A/T motif binding were introduced into ESCs, and transcriptome analysis revealed striking up-regulation of genes that are embedded in AT-rich domains. In particular, transcripts associated with the neuronal lineage are increased. While loss of DNA binding by ZFC4 does not affect pluripotency, we observe premature differentiation towards the neuronal lineage, in agreement with the gene expression data. The preliminary conclusion at this stage was that SALL4 restrains expression of differentiation genes, thereby preventing inappropriate differentiation of pluripotent cells.

Final results

Base composition varies across the mammalian genome in a mosaic pattern, whereby long domains of distinct but relatively homogeneous A/T and G/C frequency are interspersed. The mosaic structure of genomes has been appreciated for decades but remains unexplained. Our work has already produced strong evidence that DNA base composition plays an important role in determining cell fate.
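As a rough illustration of what "domains of distinct A/T frequency" means in practice, the sketch below computes the A/T fraction in consecutive windows along a sequence. It is not the analysis used in the project: the function name, the window size and the toy sequence are all assumptions made for illustration.

```python
def at_fraction_windows(sequence, window, step):
    """Return the A/T fraction of each window along a DNA sequence."""
    seq = sequence.upper()
    fractions = []
    # Slide a fixed-size window along the sequence in increments of `step`.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at_count = chunk.count("A") + chunk.count("T")
        fractions.append(at_count / window)
    return fractions


# Toy sequence (made-up data): an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATTAATAT" + "GCGGCCGCGC"
print(at_fraction_windows(toy, window=10, step=10))
```

On real genomes the windows would be far longer than in this toy case, since the domains described above are long.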

We hypothesised that proteins could in theory read base composition by binding to strings of homogeneous A/T sequence, thereby amplifying the relatively subtle differences between domains. We screened for such proteins and identified SALL4, a protein that is expressed in pluripotent stem cells and implicated in a variety of cancers. Loss of SALL4 has long been known to cause precocious ESC differentiation, but the underlying mechanism was unknown. We have found that discrete inactivation of the AT-binding zinc finger cluster causes up-regulation of AT-rich genes involved in neuronal differentiation and mimics the severe phenotypes of cells and mice that completely lack SALL4. These results strongly suggest that recognition of AT motifs is at the heart of SALL4 function.
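The amplification argument can be made concrete with a simple model. The sketch below assumes that bases occur independently, with A and T equally likely and G and C equally likely. This is a deliberate simplification and not a claim about real genomes. The two A/T fractions (0.55 and 0.65) are hypothetical values chosen for illustration and do not come from the project data. Under these assumptions, the expected frequency of a five-base A/T-only motif such as ATATT scales with the fifth power of the A/T fraction, so a modest difference in composition gives roughly a 2.3-fold difference in expected motif frequency. A small helper that counts motif occurrences on one strand is included for comparison with observed sequence.

```python
def expected_motif_probability(motif, at_fraction):
    """Probability of `motif` at one position under an independent-base model.

    Assumes P(A) = P(T) = at_fraction / 2 and P(G) = P(C) = (1 - at_fraction) / 2.
    """
    p_at = at_fraction / 2
    p_gc = (1 - at_fraction) / 2
    prob = 1.0
    for base in motif.upper():
        prob *= p_at if base in "AT" else p_gc
    return prob


def count_motif(sequence, motif):
    """Count overlapping occurrences of `motif` on the given strand only."""
    seq, motif = sequence.upper(), motif.upper()
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


low = expected_motif_probability("ATATT", 0.55)   # hypothetical AT-poor domain
high = expected_motif_probability("ATATT", 0.65)  # hypothetical AT-rich domain
print(f"fold difference in expected motif frequency: {high / low:.2f}")

# Made-up sequence containing three copies of the motif.
print(count_motif("GGATATTCCATATTATATT", "ATATT"))
```

A fuller treatment would also scan the reverse strand and use observed rather than modelled base frequencies.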

Our studies so far offer a compelling explanation for SALL4 function. They indicate that the base compositional environment of a gene is sampled by SALL4 via the frequency of its A/T recognition motif, resulting in enhanced gene repression where the motif is most abundant. This “blanket” silencing mechanism stabilises the pluripotent state by preventing inappropriate expression of differentiation genes.

More broadly, by showing that the DNA sequence environment of a gene regulates its expression, we provide the first mechanistic evidence that base compositional domains are not merely a biologically irrelevant by-product of genome evolution, but confer a selective advantage on the organism.

We expect that our future work will uncover parallel mechanisms with wide implications for evolution, development and disease.