Original paper licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.
Genetic studies have long identified thousands of small variations in our DNA that are associated with an increased risk of developing diseases. Most of these variations do not sit inside the genes that provide instructions for building proteins. Instead, they sit in the regulatory regions of the genome. These regions act like switches that control when, where, and how much a gene is turned on or off. Because these switches can be specific to certain types of cells, a genetic variation might increase disease risk in a lung cell but have no effect in a skin cell.
The researchers in this paper developed a computational framework called Single-Cell ATAC-seq Disease Score (SCADS) to work out which cells a given genetic risk signal acts in. It connects these risk signals to specific cell types by looking at which parts of the DNA are accessible for regulation in individual cells.
To do this, the researchers used a technique called single-cell ATAC-sequencing. This method identifies which parts of the DNA are "open" or accessible in a single cell, indicating that those regions are active regulatory switches. The SCADS framework works in three steps. First, it uses a mathematical model to group these open DNA regions into "topics." These topics represent sets of regulatory elements that work together to control specific biological programs. Second, the framework calculates how much the known genetic risk signals for a specific disease are concentrated within these topics. Third, it combines the activity of these topics within individual cells to produce a disease score for every single cell.
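The three steps can be illustrated with a toy sketch. This is not the authors' implementation: the summary only says "a mathematical model" is used for the topics, so non-negative matrix factorization stands in for it here, the enrichment is a plain ratio against the average risk signal, and every name (`fit_topics`, `topic_enrichment`, `cell_scores`, the toy matrix and the `risk` vector) is hypothetical.

```python
import numpy as np

def fit_topics(X, n_topics, n_iter=300, seed=0):
    """Step 1 (stand-in): factorize a cells x regions matrix as X ~= W @ H.

    Uses multiplicative-update NMF as an assumed substitute for the
    paper's topic model. Rows of H are rescaled to sum to 1.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_regions = X.shape
    W = rng.random((n_cells, n_topics)) + 0.1
    H = rng.random((n_topics, n_regions)) + 0.1
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Move each topic's scale from H into W so that W @ H is unchanged.
    scale = H.sum(axis=1) + eps
    return W * scale[None, :], H / scale[:, None]

def topic_enrichment(H, risk):
    """Step 2 (assumed form): risk signal under each topic's region
    weights, relative to the average risk signal across all regions."""
    return (H @ risk) / risk.mean()

def cell_scores(W, enrichment):
    """Step 3 (assumed form): weight topic enrichments by each cell's
    normalized topic usage to get one score per cell."""
    usage = W / (W.sum(axis=1, keepdims=True) + 1e-12)
    return usage @ enrichment

# Toy data: cells 0-2 open regions 0-3, cells 3-5 open regions 4-7.
X = np.zeros((6, 8))
X[:3, :4] = 1.0
X[3:, 4:] = 1.0
# Hypothetical per-region risk signal: only regions 0-3 overlap risk variants.
risk = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

W, H = fit_topics(X, n_topics=2)
enrichment = topic_enrichment(H, risk)
scores = cell_scores(W, enrichment)
print(np.round(scores, 2))
```

In this toy setup the cells whose open regions overlap the risk signal should receive the higher scores. A real analysis would need far larger matrices, a properly estimated heritability enrichment, and a null model, none of which this sketch attempts.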
The authors tested SCADS using simulations and found that it identifies disease-relevant cells with higher accuracy and fewer false positives than existing methods. They also demonstrated that the scores are calibrated, meaning they can be compared across different datasets and different diseases.
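As a rough illustration of what such a simulation check involves: when the truly disease-relevant cells are known in advance, one can count how many are flagged and how many irrelevant cells are flagged by mistake. The function name, the threshold, and the numbers below are made up for illustration and are not results or metrics from the paper.

```python
def evaluate(scores, is_relevant, threshold):
    """Return (true positive rate, false positive rate) when cells with
    score >= threshold are flagged as disease-relevant."""
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, is_relevant):
        flagged = score >= threshold
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical simulated scores and ground-truth labels for six cells.
sim_scores = [2.1, 1.8, 0.2, 0.4, 1.6, 0.1]
sim_truth = [True, True, False, False, False, False]

tpr, fpr = evaluate(sim_scores, sim_truth, threshold=1.5)
print(tpr, fpr)
```

A method with higher accuracy and fewer false positives would push the first number up and the second number down on the same simulated data.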
When the researchers applied SCADS to autoimmune diseases, they found that disease relevance is not uniform across all cells of a single type. For example, in inflammatory bowel disease, different subsets of CD8+ T cells and different types of colon epithelial cells carried different levels of disease risk. Using SCADS, the researchers were able to pinpoint the specific gene programs and genetic variants that drive these differences. The paper describes SCADS as a scalable and modular framework for connecting noncoding genetic variation to the identity and function of individual cells.