Abstract
AIRE is an unconventional transcription factor that enhances the expression of thousands of genes in medullary thymic epithelial cells and promotes clonal deletion or phenotypic diversion of self-reactive T cells (refs. 1–4). The biological logic of AIRE’s target specificity remains largely unclear as, in contrast to many transcription factors, it does not bind to a particular DNA sequence motif. Here we implemented two orthogonal approaches to investigate AIRE’s cis-regulatory mechanisms: construction of a convolutional neural network and leveraging natural genetic variation through analysis of F1 hybrid mice (ref. 5). Both approaches nominated Z-DNA and NFE2–MAF as putative positive influences on AIRE’s target choices. Genome-wide mapping studies revealed that Z-DNA-forming and NFE2L2-binding motifs were positively associated with the inherent ability of a gene’s promoter to generate DNA double-stranded breaks, and promoters showing strong double-stranded break generation were more likely to enter a poised state with accessible chromatin and already-assembled transcriptional machinery. Consequently, AIRE preferentially targets genes with poised promoters. We propose a model in which Z-DNA anchors the AIRE-mediated transcriptional program by enhancing double-stranded break generation and promoter poising. Beyond resolving a long-standing mechanistic conundrum, these findings suggest routes for manipulating T cell tolerance.
Data availability
All sequencing data reported in this Article have been deposited as a SuperSeries at the GEO under accession code GSE224557. Specifically, the ATAC–seq, BLISS, ChIP–seq, ChIPmentation, CUT&Tag, bulk RNA-seq and scRNA-seq data are available under accession codes GSE224551, GSE224552, GSE224553, GSE224554, GSE224555, GSE224556 and GSE253215, respectively. Public datasets used in this article are as follows: GSE92594 (ATAC–seq for WT and Aire-KO mTECs), GSE92597 (RNA Pol II, AIRE and IgG ChIP–seq for WT mTECs), GSE180937 (MED1 and IgG ChIP–seq for WT mTECs) and GSE102526 (ATAC–seq for mTECs from Brg1-WT and Brg1-cKO mice). Source data are provided with this paper.
Code availability
Code and scripts used in this study are available at Zenodo (https://doi.org/10.5281/zenodo.10472904).
References
Anderson, M. S. et al. Projection of an immunological self shadow within the thymus by the aire protein. Science 298, 1395–1401 (2002).
Sansom, S. N. et al. Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia. Genome Res. 24, 1918–1931 (2014).
Meredith, M., Zemmour, D., Mathis, D. & Benoist, C. Aire controls gene expression in the thymic epithelium with ordered stochasticity. Nat. Immunol. 16, 942–949 (2015).
Brennecke, P. et al. Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells. Nat. Immunol. 16, 933–941 (2015).
van der Veeken, J. et al. Natural genetic variation reveals key features of epigenetic and transcriptional memory in virus-specific CD8 T cells. Immunity. 50, 1202–1217 (2019).
Novakovsky, G. et al. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. 24, 125–137 (2023).
Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).
Link, V. M. et al. Analysis of genetically diverse macrophages reveals local and domain-wide mechanisms that control transcription factor binding and function. Cell 173, 1796–1809 (2018).
Bansal, K., Yoshida, H., Benoist, C. & Mathis, D. The transcriptional regulator Aire binds to and activates super-enhancers. Nat. Immunol. 18, 263–273 (2017).
Rodriguez-Martinez, J. A. et al. Combinatorial bZIP dimers display complex DNA-binding specificity landscapes. eLife. 6, e19272 (2017).
Rich, A., Nordheim, A. & Wang, A. H. The chemistry and biology of left-handed Z-DNA. Annu. Rev. Biochem. 53, 791–846 (1984).
Georgakopoulos-Soares, I. et al. High-throughput characterization of the role of non-B DNA motifs on promoter function. Cell Genom. 2, 100111 (2022).
Umerenkov, D. et al. Z-flipon variants reveal the many roles of Z-DNA and Z-RNA in health and disease. Life Sci. Alliance 6, e202301962 (2023).
Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
Kouzine, F. et al. Permanganate/S1 nuclease footprinting reveals non-B DNA structures with regulatory potential across a mammalian genome. Cell Syst. 4, 344–356 (2017).
Liu, R. et al. Regulation of CSF1 promoter by the SWI/SNF-like BAF complex. Cell 106, 309–318 (2001).
Zhang, J. et al. BRG1 interacts with Nrf2 to selectively mediate HO-1 induction in response to oxidative stress. Mol. Cell. Biol. 26, 7942–7952 (2006).
Liu, H., Mulholland, N., Fu, H. & Zhao, K. Cooperative activity of BRG1 and Z-DNA formation in chromatin remodeling. Mol. Cell. Biol. 26, 2550–2559 (2006).
Shin, S. I. et al. Z-DNA-forming sites identified by ChIP-seq are associated with actively transcribed regions in the human genome. DNA Res. 23, 477–486 (2016).
Marshall, P. R. et al. Dynamic regulation of Z-DNA in the mouse prefrontal cortex by the RNA-editing enzyme Adar1 is required for fear extinction. Nat. Neurosci. 23, 718–729 (2020).
Fotsing, S. F. et al. The impact of short tandem repeat variation on gene expression. Nat. Genet. 51, 1652–1659 (2019).
Zhang, T. et al. ADAR1 masks the cancer immunotherapeutic promise of ZBP1-driven necroptosis. Nature 606, 594–602 (2022).
Thomas, T. J., Gunnia, U. B. & Thomas, T. Polyamine-induced B-DNA to Z-DNA conformational transition of a plasmid DNA with (dG-dC)n insert. J. Biol. Chem. 266, 6137–6141 (1991).
Brooks, W. H. Increased polyamines alter chromatin and stabilize autoantigens in autoimmune diseases. Front. Immunol. 4, 91 (2013).
Wang, G. & Vasquez, K. M. Z-DNA, an active element in the genome. Front. Biosci. 12, 4424–4438 (2007).
Meng, Y. et al. Z-DNA is remodelled by ZBTB43 in prospermatogonia to safeguard the germline genome and epigenome. Nat. Cell Biol. 24, 1141–1153 (2022).
Pommier, Y., Sun, Y., Huang, S. N. & Nitiss, J. L. Roles of eukaryotic topoisomerases in transcription, replication and genomic stability. Nat. Rev. Mol. Cell Biol. 17, 703–721 (2016).
Puc, J. et al. Ligand-dependent enhancer activation regulated by topoisomerase-I activity. Cell 160, 367–380 (2015).
Madabhushi, R. et al. Activity-induced DNA breaks govern the expression of neuronal early-response genes. Cell 161, 1592–1605 (2015).
Pessina, F. et al. Functional transcription promoters at DNA double-strand breaks mediate RNA-driven phase separation of damage-response factors. Nat. Cell Biol. 21, 1286–1299 (2019).
Sperling, A. S., Jeong, K. S., Kitada, T. & Grunstein, M. Topoisomerase II binds nucleosome-free DNA and acts redundantly with topoisomerase I to enhance recruitment of RNA Pol II in budding yeast. Proc. Natl Acad. Sci. USA 108, 12693–12698 (2011).
Shykind, B. M. et al. Topoisomerase I enhances TFIID-TFIIA complex assembly during activation of transcription. Genes Dev. 11, 397–407 (1997).
Abramson, J., Giraud, M., Benoist, C. & Mathis, D. Aire’s partners in the molecular control of immunological tolerance. Cell 140, 123–135 (2010).
Guha, M. et al. DNA breaks and chromatin structural changes enhance the transcription of autoimmune regulator target genes. J. Biol. Chem. 292, 6542–6554 (2017).
Canela, A. et al. Genome organization drives chromosome fragility. Cell 170, 507–521 (2017).
Giraud, M. et al. Aire unleashes stalled RNA polymerase to induce ectopic gene expression in thymic epithelial cells. Proc. Natl Acad. Sci. USA 109, 535–540 (2012).
Oven, I. et al. AIRE recruits P-TEFb for transcriptional elongation of target genes in medullary thymic epithelial cells. Mol. Cell. Biol. 27, 8815–8823 (2007).
Durand-Dubief, M. et al. Topoisomerase I regulates open chromatin and controls gene expression in vivo. EMBO J. 29, 2126–2134 (2010).
Creemers, G. J., Lund, B. & Verweij, J. Topoisomerase I inhibitors: topotecan and irenotecan. Cancer Treat. Rev. 20, 73–96 (1994).
Maruyama, A., Mimura, J., Harada, N. & Itoh, K. Nrf2 activation is associated with Z-DNA formation in the human HO-1 promoter. Nucleic Acids Res. 41, 5223–5234 (2013).
Koh, A. S. et al. Rapid chromatin repression by Aire provides precise control of immune tolerance. Nat. Immunol. 19, 162–172 (2018).
Michelson, D. A. et al. Thymic epithelial cells co-opt lineage-defining transcription factors to eliminate autoreactive T cells. Cell 185, 2542–2558 (2022).
Michelson, D. A. & Mathis, D. Thymic mimetic cells: tolerogenic masqueraders. Trends Immunol. 43, 782–791 (2022).
Givony, T. et al. Thymic mimetic cells function beyond self-tolerance. Nature 622, 164–172 (2023).
Giraud, M. et al. An RNAi screen for Aire cofactors reveals a role for Hnrnpl in polymerase release and Aire-activated ectopic transcription. Proc. Natl Acad. Sci. USA 111, 1491–1496 (2014).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. NAACL-HLT (eds Burstein, J., Doran, C. & Solorio, T.) 4171–4186 (Association for Computational Linguistics, 2019).
Avsec, Z. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Kelley, D. R. Cross-species regulatory sequence activity prediction. PLoS Comput. Biol. 16, e1008050 (2020).
Moore, J. E. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Forrest, A. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at arxiv.org/abs/1606.08415 (2020).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, Las Vegas, 2016).
Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. Preprint at arxiv.org/abs/1312.6034 (2014).
Grant, C. E. & Bailey, T. L. XSTREME: comprehensive motif analysis of biological sequence datasets. Preprint at bioRxiv https://doi.org/10.1101/2021.09.02.458722 (2021).
Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME suite. Nucleic Acids Res. 43, W39–W49 (2015).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Cer, R. Z. et al. Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools. Nucleic Acids Res. 41, D94–D100 (2013).
Derbinski, J. et al. Promiscuous gene expression patterns in single medullary thymic epithelial cells argue for a stochastic mechanism. Proc. Natl Acad. Sci. USA 105, 657–662 (2008).
Peterson, P., Org, T. & Rebane, A. Transcriptional regulation by AIRE: molecular mechanisms of central tolerance. Nat. Rev. Immunol. 8, 948–957 (2008).
Gardner, J. M. et al. Deletional tolerance mediated by extrathymic Aire-expressing cells. Science 321, 843–847 (2008).
Huang, S. et al. A novel multi-alignment pipeline for high-throughput sequencing data. Database 2014, bau057 (2014).
van der Veeken, J. et al. The transcription factor Foxp3 shapes regulatory T cell identity by tuning the activity of trans-acting intermediaries. Immunity. 53, 971–984 (2020).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 27, 2987–2993 (2011).
de Santiago, I. et al. BaalChIP: Bayesian analysis of allele-specific transcription factor binding in cancer genomes. Genome Biol. 18, 39 (2017).
Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics. 27, 1017–1018 (2011).
Bailey, T. L. STREME: accurate and versatile sequence motif discovery. Bioinformatics. 37, 2834–2840 (2021).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15–21 (2013).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009).
Satija, R. et al. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
Yoshida, H. et al. The cis-regulatory atlas of the mouse immune system. Cell 176, 897–912 (2019).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. https://doi.org/10.14806/ej.17.1.200 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779 (2011).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 841–842 (2010).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Ramirez, F. et al. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014).
Shen, L., Shao, N., Liu, X. & Nestler, E. ngs.plot: quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genom. 15, 284 (2014).
Gothe, H. J. et al. Spatial chromosome folding and active transcription drive DNA fragility and formation of oncogenic MLL translocations. Mol. Cell 75, 267–283 (2019).
Yan, W. X. et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat. Commun. 8, 15058 (2017).
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics. 28, 1919–1920 (2012).
Möller, A. et al. Monoclonal antibodies recognize different parts of Z-DNA. J. Biol. Chem. 257, 12081–12085 (1982).
Schmidl, C., Rendeiro, A. F., Sheffield, N. C. & Bock, C. ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors. Nat. Methods 12, 963–965 (2015).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).
Bansal, K. et al. Aire regulates chromatin looping by evicting CTCF from domain boundaries and favoring accumulation of cohesin on superenhancers. Proc. Natl Acad. Sci. USA 118, e2110991118 (2021).
Acknowledgements
We thank A. Baysov, J. Lee, I. Magill and the members of the Broad Genomics Platform for RNA-seq and scRNA-seq; the staff at the HMS Biopolymers Facility for all other sequencing; the members of the HMS Immunology Flow Core; L. Du and the staff at the HMS Transgenic Mouse Core; K. Hattori and A. Ortiz-Lopez for experimental assistance; L. Yang and B. Vijaykumar for computational help; C. Laplace for graphics; K. Chowdhary and D. Michelson for discussions; M. Anderson for providing the NOD mice with Aire-driven expression of IGRP–GFP; and A. Herbert for drawing our attention to the Z22 monoclonal antibody. This work was supported by NIH grant R01AI088204 (to D.M.). Y.F. is in part supported by the Harvard Molecules, Cells and Organisms Training Program. K.B. is supported by the Department of Biotechnology/Wellcome Trust India Alliance Intermediate Fellowship (IA/I/19/1/504276).
Author information
Contributions
Y.F. and D.M. conceived the study. Y.F. designed and performed all experiments except for the Pol II ChIP–seq. Y.F. performed all data analysis with supervision from D.M., C.B. and S.M. K.B. performed the Pol II ChIP–seq of mTECs from Aire-KO mice. Y.F. and D.M. wrote the manuscript, which was edited by all of the authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Alan Herbert and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Performance of the pre-trained CNN model.
a, Schematics of the CNN model for pre-training and fine-tuning. The first section of the main body comprises convolutional layers that extract relevant DNA sequence motifs. The following section has repeated blocks containing dilated convolutional layers with residual skip connections, to spread information and model long-range interactions in the input DNA sequences. AIRE-induced and expression-matched AIRE-neutral gene lists have been described (refs. 3, 9). Briefly, AIRE-induced genes were defined by an Aire+/+/Aire−/− expression ratio > 2 and AIRE-neutral genes by a ratio > 0.9 and < 1.1, based on bulk RNA-seq data from ref. 3. b, Exemplar true versus predicted profiles over a randomly selected sequence from the test set. Profiles for 15 sequencing datasets are shown. c, Boxplot showing the Pearson correlations between the predicted and true sequencing profiles of test set sequences for the sequencing datasets used for the pre-training, including DNase-seq, ATAC-seq and ChIP-seq. d, Model evaluation on four datasets: a test set and validation set from the B6 genome, and two test sets from the NOD genome: (1) containing SNPs/Indels compared with counterparts in the B6 genome, and (2) derived from NOD-specific genes (in order to prevent data leakage during prediction). e, Bar plot comparing the performances of a randomly initialized model and the pre-trained model on the test set from the B6 genome. SNPs: single-nucleotide polymorphisms; Indels: insertions and deletions; AIG: AIRE-induced gene; ANG: AIRE-neutral gene.
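The gene-set thresholds in panel a can be illustrated with a minimal sketch. It assumes that per-gene mean expression values for Aire+/+ and Aire−/− mTECs are already available. The function name, the optional pseudocount and the example values are hypothetical and are not taken from the study's code, and the expression-matching step for the neutral set is not shown.

```python
def classify_gene(wt_expr, ko_expr, pseudocount=0.0):
    """Label a gene by its Aire+/+ / Aire-/- expression ratio.

    Thresholds follow the legend: ratio > 2 is AIRE-induced,
    0.9 < ratio < 1.1 is AIRE-neutral. Anything else is unassigned.
    The pseudocount is an illustrative assumption, off by default.
    """
    denom = ko_expr + pseudocount
    if denom <= 0:
        # Ratio is undefined without expression in the knockout.
        return "unassigned"
    ratio = (wt_expr + pseudocount) / denom
    if ratio > 2:
        return "AIRE-induced"
    if 0.9 < ratio < 1.1:
        return "AIRE-neutral"
    return "unassigned"


# Hypothetical expression values (arbitrary units), for illustration only.
examples = {"geneA": (12.0, 2.0), "geneB": (5.0, 5.1), "geneC": (3.0, 2.0)}
labels = {gene: classify_gene(wt, ko) for gene, (wt, ko) in examples.items()}
```

Genes with ratios between the two windows (such as the hypothetical geneC, ratio 1.5) fall into neither set under these thresholds.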
Extended Data Fig. 2 Additional analysis using the fine-tuned CNN model and Z-DNABERT, related to Fig. 1.
a, Contribution score profiles for AIRE-induced genes whose largest-positive-gradient regions contained (CA)n repeats (left) or NFE2–MAF-binding motifs (right). b, Motifs enriched in the regions (50 bp in length) with the largest ISM scores. c, Motifs that are relatively more enriched in extended-promoter sequences of AIRE-induced genes than AIRE-neutral genes (E-value < 0.05). The MEME suite was used to identify the enriched motifs for panels b and c. d, Example ISM score heatmaps for (CA)n repeats in AIRE-induced gene promoters. Each of the three rows shows results for one possible substitution, in the order A > C > G > T from top to bottom. Red (positive ISM score) indicates that substitution of the original nucleotide leads to a decreased average Z-DNA score across the stretch of the (CA)n repeat; blue (negative ISM score) indicates the opposite. e, Boxplot showing the distribution of ISM scores at various positions near the boundaries of (CA)n repeats in AIRE-induced gene promoters. For example, position 2 indicates the second nucleotides from both ends of a (CA)n repeat. p-values for panel e were calculated using the one-sample Wilcoxon signed-rank test (one-tailed). AIG: AIRE-induced gene.
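The sign convention for the ISM (in silico mutagenesis) scores in panel d can be sketched as follows. This is a toy illustration only: the scoring function below is a stand-in that rewards alternating purine/pyrimidine steps and is not the Z-DNABERT score used in the study, and all function names are hypothetical.

```python
PURINES = {"A", "G"}


def toy_score(seq):
    """Stand-in score: fraction of adjacent bases that alternate
    purine/pyrimidine. NOT the Z-DNABERT score from the paper."""
    if len(seq) < 2:
        return 0.0
    alternating = sum((a in PURINES) != (b in PURINES) for a, b in zip(seq, seq[1:]))
    return alternating / (len(seq) - 1)


def ism_scores(seq, score_fn=toy_score):
    """Return {position: {alt_base: reference_score - mutant_score}}.

    Positive values mean the substitution lowers the score, matching the
    sign convention in the legend. The three alternatives at each position
    are listed in A, C, G, T order, skipping the original base.
    """
    ref = score_fn(seq)
    out = {}
    for i, base in enumerate(seq):
        out[i] = {}
        for alt in "ACGT":
            if alt == base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            out[i][alt] = ref - score_fn(mutant)
    return out


# A short (CA)n repeat as hypothetical input.
scores = ism_scores("CACACACA")
```

Under this toy scorer, replacing a base with one of the same class (for example C with T) leaves the score unchanged, whereas a class-switching substitution inside the repeat breaks two alternating steps and yields a positive ISM score.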
Extended Data Fig. 3 Strain-specific gene expression in mTECs was predominantly driven by cis-regulation.
a,b, Cytofluorometric gating scheme for isolation of mTECs from B6 (a), NOD and F1 (b) mice. c, Rationale of the F1-hybrid analysis. d, Allelically imbalanced gene transcripts and OCRs in B6×NOD F1-hybrid mTECs. Red dots depict significantly imbalanced events with a false discovery rate (FDR) < 0.05. e, Correlation between the fold-changes in accessibility for mTEC OCRs in B6 versus NOD mice (x axis) and fold-changes in accessibility for OCRs (n = 23256) on the B6 versus NOD allele in mTECs of F1 hybrids (y axis). Red dots depict significantly imbalanced OCRs (n = 3750). f, Correlation between the fold-changes in transcript levels for mTECs from B6 versus NOD mice (x axis) and transcript fold-changes for the B6 versus NOD allele in mTECs of F1 hybrids (y axis). Only genes significantly differentially expressed in B6 and NOD mTECs are shown (adjusted p-value < 0.05, n = 248). g, Correlation between allelic biases in the expression of the nearest AIRE-induced gene (x axis) and in the accessibility of the OCR (y axis). An imbalanced OCR was assigned to an imbalanced AIRE-induced gene (n = 248) if (1) the OCR was located within 50,000 bp of the gene’s TSS and (2) the AIRE-induced gene was the OCR’s nearest gene. There were 156 imbalanced OCRs assigned to imbalanced AIGs. p-values for panels e and g were from Spearman’s correlation. FC: fold-change; OCR: open chromatin region; TF: transcription factor.
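The two-part OCR-to-gene assignment rule in panel g can be sketched as below. The data layout and function name are hypothetical, and measuring distance from the OCR midpoint to the TSS is an assumption, since the legend does not state which OCR coordinate was used.

```python
MAX_DIST = 50_000  # bp, from the legend


def assign_ocr(ocr, genes, induced):
    """Return the AIRE-induced gene an OCR is assigned to, or None.

    ocr: (chrom, start, end)
    genes: {name: (chrom, tss)} for all annotated genes
    induced: set of AIRE-induced gene names
    Distance is taken from the OCR midpoint to the TSS (an assumption).
    """
    chrom, start, end = ocr
    mid = (start + end) // 2
    dist = {g: abs(tss - mid) for g, (c, tss) in genes.items() if c == chrom}
    if not dist:
        return None
    nearest = min(dist, key=dist.get)
    # Both conditions must hold: the nearest gene is AIRE-induced,
    # and its TSS lies within 50,000 bp of the OCR.
    if nearest in induced and dist[nearest] <= MAX_DIST:
        return nearest
    return None


# Hypothetical coordinates, for illustration only.
genes = {"g1": ("chr1", 100_000), "g2": ("chr1", 300_000)}
induced = {"g1"}
hit = assign_ocr(("chr1", 120_000, 121_000), genes, induced)
```

An OCR whose nearest gene is not AIRE-induced is left unassigned even if an AIRE-induced gene lies within 50,000 bp, which is how the sketch reads condition (2).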
Extended Data Fig. 4 Exemplar genetic variants associated with allelic imbalances in chromatin accessibility and gene expression.
a,b, Examples of genetic variants of NFE2L2-binding motifs associated with imbalanced expression of AIRE-induced genes. c,d, Examples of genetic variants of Z-DNA motifs associated with allelic imbalances in the expression of an AIRE-induced gene. OCR: open chromatin region; WT: wild-type; Aire-KO: Aire knockout.
Extended Data Fig. 5 Effect of spermidine on Z-DNA formation and thymic cell populations.
a, Density plot showing the effect of spermidine vs control PBS injection on Z-DNA intensity in mTECs measured by flow cytometry using an anti-Z-DNA antibody. b, Representative cytofluorimetric plots and quantitative summary for mTECs of WT and Aire-KO mice treated with spermidine or control PBS. c, Cytofluorometric gating scheme for analyses of thymocyte compartments. d, Analogous plots to panel b for thymocyte compartments. Error bars, mean ± s.e.m. from n = 3 biological replicates. KO: Aire-KO.
Extended Data Fig. 6 Additional analysis of scRNA-seq of PBS-treated versus spermidine-treated mTECs, related to Fig. 3.
a, Log2 ratio (M-values) versus log2 average (A-values) plots showing the effect of spermidine treatment on mTECs from wild-type mice. Red dots depict spermidine-specific AIRE-induced genes (FC > 2, p-value < 0.05). Dark grey dots depict AIRE-induced transcripts shared between mice injected with spermidine and PBS. b, Per-replicate UMAPs of scRNA-seq of mTEChi and post-AIRE mTEClo for PBS-treated and spermidine-treated Aire-WT mice. Each dot on the UMAPs is a single cell (n = 3184). Each number on the UMAPs indicates a cluster identified using Seurat. c, Merged UMAPs of scRNA-seq of mTEChi and post-AIRE mTEClo from PBS-treated (n = 2 biological replicates) and spermidine-treated (n = 2 biological replicates) Aire-WT mice. mTEC subtypes were labeled. d, UMAPs showing the expression of Aire (left) and one MHC Class II gene (right).
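For readers unfamiliar with the M and A values in panel a, a minimal sketch of the conventional definitions follows. It assumes the standard MA-plot convention (M as the difference of log2 values, A as their mean) and a pseudocount of 1 to avoid taking the log of zero. Both choices are assumptions, and the function name is hypothetical.

```python
import math


def ma_values(treated, control, pseudocount=1.0):
    """Return (M, A) for one gene.

    M = log2(treated) - log2(control)          (log2 ratio)
    A = 0.5 * (log2(treated) + log2(control))  (mean log2 level)
    The pseudocount is an illustrative assumption.
    """
    t = math.log2(treated + pseudocount)
    c = math.log2(control + pseudocount)
    return t - c, 0.5 * (t + c)


# Hypothetical expression values: spermidine-treated versus PBS control.
m, a = ma_values(7.0, 1.0)
```

With these definitions, a fold-change cutoff of 2 corresponds to |M| > 1 before any pseudocount adjustment.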
Extended Data Fig. 7 Additional analysis of BLISS in mTECs, related to Fig. 4.
a, Boxplot comparing BLISS signals at Z-DNA ChIP-seq peaks that had low, medium or high Z-DNA ChIP-seq signals in mTECs from Aire-WT mice. Low: <25th percentile (n = 1508); Medium: 25th - 75th percentile (n = 3008); High: > 75th percentile (n = 1506). b, Boxplot comparing Aire-KO BLISS signals at promoters of AIRE-inducible (n = 1563) and expression-matched AIRE-neutral genes in Aire-KO mTECs (n = 1907). AIRE-inducible and AIRE-neutral genes were weakly expressed genes (Aire-KO TPM < 0.35) whose promoter DSB generation was detected by BLISS in mTECs from Aire-KO mice. c, Boxplot comparing the enrichment of Z-DNA motifs (left) at DSB hotspots upregulated by spermidine (n = 97) versus those unaffected by spermidine (n = 78). Analogous plot for CTCF-binding motifs is shown on the right. The number of motifs was normalized according to the length of the DSB hotspots. d, Correlation between genetic variation in CTCF-binding motifs and allelic imbalance in DSB generation. Individual lines indicate DSB hotspots with a stronger CTCF-binding motif match on the B6 allele (red, n = 60) or on the CAST allele (blue, n = 59). e, Analogous plot for (CA)n repeats (n = 38 for B6 and n = 51 for CAST). p-values for panels a-c were calculated using the Wilcoxon rank sum test (two-tailed), and for panels d and e using the Kolmogorov-Smirnov (KS) test (two-tailed).
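The low/medium/high grouping in panel a can be sketched with the Python standard library. The signal values below are hypothetical, and the quartile interpolation method of `statistics.quantiles` may differ from whatever the study's scripts used.

```python
import statistics


def bin_by_quartile(signals):
    """Label each value as low (<25th percentile), high (>75th percentile)
    or medium (in between), following the cutoffs in the legend."""
    q1, _, q3 = statistics.quantiles(signals, n=4)
    labels = []
    for value in signals:
        if value < q1:
            labels.append("low")
        elif value > q3:
            labels.append("high")
        else:
            labels.append("medium")
    return labels


# Hypothetical Z-DNA ChIP-seq peak signals, for illustration only.
peak_signals = [1, 2, 3, 4, 5, 6, 7, 8]
peak_bins = bin_by_quartile(peak_signals)
```

BLISS signals could then be compared across the three groups, for example with a rank-sum test as described in the legend.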
Extended Data Fig. 8 Promoters of AIRE-induced genes were poised for expression prior to the engagement of AIRE.
a, Boxplot comparing the ATAC-seq and Pol II ChIP-seq signals at promoters of AIRE-induced genes (n = 1563) versus those at expression-matched ANGs (n = 1907) in mTECs from Aire-KO mice. b, Exemplar DNA and chromatin profiles of AIRE-induced genes poised for expression in mTECs from Aire-KO mice. For comparison, exemplar profiles for an ANG are shown on the right. c, Same as Fig. 4d but using WT ATAC-seq and ChIP-seq signals. p-values in panels a and c were calculated using the Wilcoxon rank sum test (two-tailed). C&T: CUT&Tag; L: Low, n = 747; M: Medium, n = 2130; H: High, n = 322; *: p-value < 1e-10; **: p-value < 1e-20. Data for WT and Aire-KO ATAC-seq, WT Pol II and AIRE ChIP-seq came from ref. 9. Data for WT MED1 ChIP-seq came from ref. 94.
Extended Data Fig. 9 NFE2L2 may cooperate with Z-DNA to poise AIRE-induced genes for expression.
a, Boxplots comparing the Aire-KO BLISS signals at AIRE-induced gene promoter DSB hotspots containing varying numbers of NFE2L2-binding motifs (left) and CTCF-binding motifs (right). For NFE2L2-binding motifs: Low: <=1 (n = 598); Medium: 2–5 (n = 258); High: >5 (n = 87). For CTCF-binding motifs: Low: <=1 (n = 468); Medium: 2–4 (n = 404); High: >4 (n = 71). b, Boxplots comparing the enrichment of NFE2L2-binding motifs at DSB hotspots up-regulated by spermidine (n = 159) versus those unaffected by spermidine (n = 104). The number of motifs was normalized according to the length of the DSB hotspots. c, Density plots and heatmaps showing distributions of Z-DNA motifs (top) and NFE2L2-binding motifs (bottom) at DSB hotspots in promoters (n = 6884). Grey areas depict 95% confidence intervals. d, Boxplot comparing the lengths of Z-DNA motifs at OCRs unchanged (n = 282) or up-regulated (n = 356) by BRG1 (See Methods) in mTECs. e, De novo motif analysis for OCRs unchanged versus up-regulated by BRG1. f, MA plot (log2-scale) showing the expressions of NFE2-related factors in mTECs from Nfe2l2-KO and Ctrl mice. g, Representative cytofluorimetric plots and quantitative summary for mTECs from Nfe2l2-KO and Ctrl mice. Error bars, mean ± s.e.m. from n = 3 biological replicates. h, Volcano plots showing the expression of AIRE-induced genes and ANGs that contain high-confidence NFE2L2-binding motifs at promoters in mTECs from Nfe2l2-KO and Ctrl mice. i, Differentially expressed AIRE-induced genes and ANGs (p-value < 0.05) between mTECs from Nfe2l2-KO and Ctrl mice. p-values for panels a-b and d were calculated using the Wilcoxon rank sum test (two-tailed), and for panel h using the Fisher’s exact test (two-tailed). L: Low; M: Medium; H: High; CI: confidence interval. Nfe2l2-KO: Foxn1Cre-Nfe2l2flox/flox; Ctrl: control, Foxn1Cre-Nfe2l2+/+.
Extended Data Fig. 10 Manipulation of Z-DNA formation, DSB generation or Nfe2l2 expression affected the expression of signature genes of mTEC mimetic cells.
a, KEGG pathway analysis (adjusted p-value < 0.05) for differentially expressed genes (p-value < 0.05, fold-change > 2, n = 745) between mTECs from Nfe2l2-KO and Ctrl mice. b, Network plot showing the significantly enriched downregulated KEGG pathways and the associated genes. c, Expression of lineage-defining TFs (ref. 42) in WT versus Aire-KO AIRE-stage mTECs (log2 scale). d, MA plots (log2 scale) highlighting expression changes of signature genes of several mimetic mTEC subtypes in mTECs from Nfe2l2-KO versus Ctrl mice. Red dots depict signature genes of the corresponding subtypes. e,f, Analogous plots showing the impact of spermidine and topotecan, respectively. The signature gene lists used are available at https://github.com/dmichelson/mimetic_cells/tree/main/scrna-seq/adult-neonate/mimetic-cell-signatures. p-values for panels d-f were calculated using Fisher’s exact test (two-tailed).
Extended Data Fig. 11 A model of Z-DNA’s influence on AIRE target choice.
Independent of AIRE, Z-DNA formation is more likely to occur at the promoters of genes having Z-DNA motifs but not under robust TF-mediated transcriptional control. NFE2L2 and other unknown factors would engage BRG1 or other chromatin remodelers to stabilize the energetically unfavorable Z-DNA formation. Z-DNA would enhance DSB generation at the promoters of genes subject to AIRE induction, which would facilitate their poising, thereby promoting the recruitment of and induction of gene expression by AIRE.
Supplementary information
Supplementary Information
Supplementary Discussion, Supplementary Notes, legends for the Supplementary Tables and Supplementary References.
Supplementary Table 1
Motifs enriched in the regions with the largest positive gradients of AIRE-induced genes. Motifs listed in the table were identified by the XSTREME program (combined results of MEME and STREME) of MEME suite with E < 0.05.
Supplementary Table 2
Z-DNA motifs at promoters of AIRE-induced and AIRE-neutral genes. Z-DNA motifs were identified on the basis of the criteria of the NIH non-B search program.
Supplementary Table 3
Intrareplicate correlations of various sequencing experiments. List of the mouse genotypes, cell types, treatments, replicate numbers, replicate correlation and numbers of uniquely mapped reads for bulk sequencing experiments generated in this study.
Supplementary Table 4
Quality-control data for scRNA-seq. For each sample, the number of cells retained, the median percentage of mitochondrial counts per cell, the median number of unique RNA features per cell and the median number of RNA molecules per cell are shown.
Supplementary Table 5
BLISS adapter sample barcodes. BLISS adapter sample barcode sequences were provided for individual sequencing experiments.
Supplementary Table 6
Z-DNA motifs at BRG1-upregulated OCRs and unchanged OCRs. Z-DNA motifs were identified on the basis of the criteria of the NIH non-B search program.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fang, Y., Bansal, K., Mostafavi, S. et al. AIRE relies on Z-DNA to flag gene targets for thymic T cell tolerization. Nature 628, 400–407 (2024). https://doi.org/10.1038/s41586-024-07169-7
DOI: https://doi.org/10.1038/s41586-024-07169-7