Periodic Reporting for period 2 - ImmRisk (Defining how environmental factors influence downstream effects of immune-mediated disease risk-SNPs)

Summary

Problem being addressed and the importance for society:

Some people are genetically at risk of developing a specific disease. However, not everybody who is genetically at risk will develop the disease. One reason is that, in addition to genetic factors, environmental factors can influence whether a disease develops. The aim of this project is to determine which genetic and environmental factors are involved in the development of complex diseases, and to understand their downstream molecular mechanisms. Understanding how and why these diseases develop will allow the identification of high-risk individuals for whom preventive measures can be taken to reduce their risk of developing specific diseases (e.g. preventing exposure to the identified environmental factors). Moreover, this knowledge may aid in the identification of new drug targets for these diseases.

The overall objectives of this project are:
1) Identify the downstream consequences of genetic variants that are associated with immune-mediated diseases.
2) Identify which environmental stimuli alter the downstream molecular effects of genetic variants that are associated with immune-mediated diseases, by generating genotype and single-cell RNA-seq data on blood cells from ~120 individuals, with the cells stimulated with ~3 different pathogens.
3) Identify other environmental risk factors that influence downstream molecular effects of these genetic variants that are associated with immune-mediated disease, by re-analyzing genotype and RNA-seq data from >20,000 samples, generated in the presence and absence of many different (disease) stimuli, to identify those conditions (e.g. bacterial infections, detectable using the RNA-seq data itself).

Work performed

Objective 1:

We have set up a consortium, the eQTLGen Consortium (http://www.ludesign.nl/eqtlgen/), to perform the largest eQTL meta-analysis to date, encompassing 31,684 whole blood samples from 37 individual RNA expression datasets. This allowed us to identify cis-eQTL effects for 88% and trans-eQTL effects for ~29% of blood-expressed genes. In addition, we calculated polygenic risk scores for 1,267 complex traits and correlated those with gene expression levels (ePRS analysis). We observed a number of significant associations, e.g. the polygenic risk score for HDL cholesterol levels was associated with the expression of genes known to play a role in lipid metabolism (e.g. ABCA1, ABCG1) and familial hypercholesterolemia (e.g. LDLR).
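
The idea behind the ePRS analysis can be illustrated with a minimal sketch. It assumes that a polygenic risk score is a weighted sum of risk-allele dosages and that the association with expression is measured with a Pearson correlation; the genotypes, weights and expression values below are invented toy data, and the function names are hypothetical rather than taken from the consortium's pipeline.

```python
from math import sqrt


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Toy data, invented for illustration: 5 individuals x 3 SNPs.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 0, 0],
    [2, 2, 2],
]
snp_weights = [0.5, 0.2, 0.3]            # hypothetical per-SNP effect sizes for one trait
expression = [1.1, 0.9, 1.6, 0.2, 2.4]   # hypothetical expression of one gene

# One score per individual, then one correlation per (trait, gene) pair.
scores = [polygenic_risk_score(g, snp_weights) for g in genotypes]
r = pearson(scores, expression)
print(scores, r)
```

In a real analysis this correlation would be computed for every trait and gene pair and corrected for multiple testing, which the sketch leaves out.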


Objective 2:

Pilot data has been generated and analysed to determine the feasibility of the approach:
- Genotype and scRNA-seq data on unstimulated blood cells from 45 individuals have been generated. The main results were recently published in Nature Genetics [van der Wijst et al., 2018], and the data are shared with the scientific community (https://molgenis58.target.rug.nl/scrna-seq/).
- Genotype and scRNA-seq data on blood cells from 6 individuals, stimulated with the fungus Candida albicans, have been generated.
- Optimal conditions have been determined to perform the pathogen stimulations.

In progress:
- Invitation/collection of blood cells from ~120 additional individuals.

Objective 3:

We have developed a pipeline to automatically download public RNA-seq fastq files, align them to a reference genome, and call genotypes. The pipeline has been tested and validated on 4,002 samples from BBMRI - BIOS, a Dutch biobank. So far, we have completed the preprocessing that is required before the actual analysis can start: ~10,000 public RNA-seq BAM files have been processed up to and including genotype calling.

Final results

Progress beyond the state of the art:
- ePRS analysis: polygenic risk scores were calculated for 1,267 complex traits and these were correlated with gene expression levels (eQTLGen Consortium).
- Single-cell eQTL analysis: one of the first, and the largest, to date (Nature Genetics, 2018: doi: 10.1038/s41588-018-0089-9).
- Co-expression QTLs: a novel methodology for single-cell data that identifies genetic variants affecting the co-expression of two genes (Nature Genetics, 2018: doi: 10.1038/s41588-018-0089-9).
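
As an illustration of the co-expression QTL concept, the sketch below computes, for each individual, the correlation between two genes across that individual's cells, and then regresses these per-individual correlations on genotype dosage. This is a simplified reading of the idea, not the published method: the use of Pearson correlation and an unweighted least-squares slope are assumptions, and all data and names are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


# Toy data, invented for illustration: per individual, the dosage of one SNP
# and the expression of two genes measured in four single cells.
individuals = {
    "ind1": {"dosage": 0, "gene_a": [1, 2, 3, 4], "gene_b": [4, 3, 2, 1]},
    "ind2": {"dosage": 1, "gene_a": [1, 2, 3, 4], "gene_b": [2, 1, 2, 3]},
    "ind3": {"dosage": 2, "gene_a": [1, 2, 3, 4], "gene_b": [1, 2, 3, 4]},
}

# Step 1: co-expression of the two genes within each individual (across cells).
dosages = [d["dosage"] for d in individuals.values()]
correlations = [pearson(d["gene_a"], d["gene_b"]) for d in individuals.values()]

# Step 2: does co-expression change with genotype? A non-zero slope suggests it does.
effect = slope(dosages, correlations)
print(correlations, effect)
```

Single-cell data make step 1 possible, because many cells per individual provide the repeated measurements needed to estimate a correlation within one person.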

Expected results until the end of the project:
Objective 1:
- Context-dependent eQTL analysis in ~32,000 whole blood samples: identification of eQTLs that are modulated by a specific context, which could be, for example, a specific cell type or the expression of another gene.
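
A context-dependent eQTL is commonly modelled with an interaction term. The sketch below assumes the linear model expression = b0 + b1*genotype + b2*context + b3*genotype*context, fitted by ordinary least squares, where a non-zero b3 indicates that the eQTL effect depends on the context. The model choice, the helper names and the noise-free toy data are assumptions made for illustration, not the project's actual analysis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_interaction(genotype, context, expression):
    """OLS fit of expression ~ 1 + genotype + context + genotype*context."""
    rows = [[1.0, g, c, g * c] for g, c in zip(genotype, context)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(k)]
    return solve(xtx, xty)


# Toy data, invented for illustration: genotype dosage and a context value
# (e.g. a cell-type proportion or the expression of another gene).
genotype = [0, 1, 2, 0, 1, 2, 0, 1, 2]
context = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# Noise-free expression simulated with a known interaction effect of 0.8.
expression = [1.0 + 0.5 * g + 0.2 * c + 0.8 * g * c for g, c in zip(genotype, context)]

coef = fit_interaction(genotype, context, expression)
print(coef)  # [intercept, genotype effect, context effect, interaction effect]
```

With real data the interaction coefficient would also need a standard error and a significance test, which are omitted here.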

Objective 2:
- Identification of cell type-specific and environment-dependent eQTLs and co-expression QTLs.
- Generation of personalized, context-dependent gene regulatory networks.
- Greater understanding of how genetics and environment interact in the context of health and disease.

Objective 3:
- Identification of environmental factors that modulate specific disease risk.

Website & more info

More info: https://molgenis58.target.rug.nl/scrna-seq/.