Introduction

Angiogenesis refers to the physiological process of forming new blood vessels from existing vascular networks, which is essential for vascular morphogenesis in almost all tissues in the body1. Defects in angiogenesis can lead to vascular abnormalities, including arteriovenous malformation2, congenital heart disease3 and infantile hemangioma4. Such diseases are common in children and adults and can damage health to varying degrees. Abnormal angiogenesis is also a hallmark of cancer and of inflammatory and ischemic diseases5. It not only contributes to disease progression but also serves as a promising target for drug treatment. Owing to its crucial role in human health, angiogenesis has therefore attracted substantial interest among researchers over the past decade.

With continued research, tremendous advances have been made in dissecting the complex molecular and genetic mechanisms underlying angiogenesis in humans6,7. In particular, endothelial cell proliferation, adhesion, migration and tube formation are thought to be key events in the angiogenic process7. These endothelial cell behaviors can be regulated by external signaling components (e.g., BMP68 and DLL49), transcription factors (TFs) (e.g., ETS110 and FOXC111) and epigenetic enzymes (e.g., EP30012 and EZH213). Among these regulators, vascular endothelial growth factor A (VEGFA) is one of the most powerful. As an extracellular signaling factor, VEGFA activates a cascade of endothelial cell gene transcription and controls almost every stage of angiogenesis in human physiology and disease14.

In recent years, cis-regulation has emerged as an important mechanism for controlling gene expression in embryonic development, and it requires the combinatorial interplay of TFs with defined cis-regulatory elements in the genome15,16. There is no unified definition of cis-regulatory elements yet, but in most cases they comprise promoters, enhancers, silencers and insulators17. Notably, comprehensive chromatin and epigenetic landscapes are efficient tools for genome-wide characterization of cis-regulatory elements18,19. Importantly, cis-regulatory elements vary among cell types and tissues20, which implies that such elements need to be catalogued systematically during human embryo morphogenesis. To date, cis-regulatory elements have been carefully mapped in several human organs across early developmental time points, including the brain21, heart22 and adipose23. However, few studies have focused on angiogenesis, so the cis-regulatory elements associated with this process (hereafter referred to as ‘angiogenic cis-regulatory elements’) remain obscure.

In this study, to profile angiogenic cis-regulatory elements, we conducted an integrated analysis of the transcriptome and chromatin accessibility in a VEGFA-stimulated in vitro angiogenic model (Fig. 1). We generated a bank of 47,125 angiogenic cis-regulatory elements with promoter and/or enhancer activities. The angiogenic cis-regulatory elements were all located outside ‘gene desert’ regions and were enriched for motifs of angiogenesis-relevant TFs. Using this bank, we performed a post-exome-wide association study (EWAS) analysis of total anomalous pulmonary venous connection (TAPVC) and identified rs199530718 as a novel cis-regulatory single nucleotide polymorphism (SNP). These results provide a general landscape of cis-regulation in angiogenesis and demonstrate the utility of angiogenic cis-regulatory elements in elucidating the genetics of vascular abnormality.

Fig. 1

Overall workflow of this study.

Results

Examining temporal transcriptome changes in VEGFA-stimulated HUVECs

The VEGFA-induced stimulation of human umbilical vein endothelial cells (HUVECs) (hereafter referred to as the ‘VEGFA-HUVEC angiogenic model’) is an excellent in vitro system for studying the cis-regulation of angiogenesis10,24,25,26. To profile angiogenic cis-regulatory elements, we retrieved public genetic and epigenetic sequencing datasets of HUVECs before (H-0) and after VEGFA stimulation for 1 (H-1), 4 (H-4) and 12 (H-12) h (Fig. 2a)10,24. Because open chromatin regions share a similar genome-wide distribution with cis-regulatory elements27, we performed correlation analysis of the DNase-seq tag densities among these four stages to compare their chromatin accessibility. Intriguingly, the stimulation process could be grouped into two periods, namely the early (H-0 and H-1) and late (H-4 and H-12) periods (Fig. 2b). This result, which had not previously been reported in the VEGFA-HUVEC angiogenic model, indicated that the chromatin accessibility of HUVECs underwent temporal changes upon VEGFA stimulation. Given that the landscapes of chromatin accessibility and gene expression were reported to influence each other during in vitro cardiogenesis22, we next examined the temporal transcriptome features of the VEGFA-HUVEC angiogenic model. PCA of the retrieved RNA-seq datasets revealed marked heterogeneity between the early and late periods of stimulation (Fig. 2c). Collectively, we speculated that in the late stimulation period, VEGFA might have reprogrammed, or remodeled, HUVECs into a hitherto undescribed cell type.

Fig. 2

Temporal transcriptome dynamics of VEGFA-stimulated HUVECs. (a) Schematic illustration of the overall experimental design. Public genetic and epigenetic sequencing datasets of HUVECs before and after VEGFA stimulation for 1, 4 and 12 h were retrieved for analysis. (b) Correlation heat map of the DNase-seq tag densities in H-0, H-1, H-4 and H-12. Samples with similar chromatin accessibility were highlighted by solid or dashed borders. (c) PCA plot showing the first two components of H-0, H-1, H-4 and H-12. (d) Distribution of DEGs in pairwise comparison. (e) Time series patterns of DEGs during VEGFA stimulation. (f) Dot plots showing enriched GO BP terms of DEGs in Clusters 1 and 2 (left panel) as well as Clusters 3 and 4 (right panel), respectively. (g) Heat map of representative genes of Clusters 1 to 4.

We then screened differentially expressed genes (DEGs) in the VEGFA-HUVEC angiogenic model to unveil the nature of H-4 and H-12. A total of 839 DEGs were identified across the entire stimulation process (Fig. 2d), in agreement with Wang’s report24. These DEGs were further classified into four patterns (215 genes in Cluster 1, 197 in Cluster 2, 262 in Cluster 3, and 175 in Cluster 4) according to time-series clustering analysis (Fig. 2e, Data 1). Genes in Clusters 1 and 2 showed high expression in the early stimulation period but were monotonically downregulated in the late period. By contrast, genes in Clusters 3 and 4 were continuously upregulated in the early stimulation period and showed high expression in the late period. We thus categorized the genes in Cluster 1 as specifically enriched in H-0, Cluster 2 in H-1, Cluster 3 in H-4, and Cluster 4 in H-12.

Gene Ontology (GO) analysis of the Cluster 1 and 2 gene sets yielded GO Biological Process (BP) terms closely aligned with endothelial function (e.g., wound healing28, BMP signaling pathway29 and inflammatory response30) (Fig. 2f, left panel). Further investigation of these two gene sets identified a DEG subset essential for endothelial identity, including ZEB231, EDN132 and PROX133 (Fig. 2g, Supplementary Fig. 1a). For the Cluster 3 and 4 gene sets, the enriched GO BP terms were related to progenitor cell function (e.g., angiogenesis34, stem cell division and cell differentiation) (Fig. 2f, right panel). Notably, these two gene sets were characterized by progenitor cell markers such as CD3435, NR5A236 and MEF2C37 (Fig. 2g, Supplementary Fig. 1a). Taken together, our data suggested that VEGFA reprogrammed HUVECs into a progenitor-like fate and that H-4 and H-12 exhibited angiogenic transcriptome features.

Temporal transitions in VEGFA-stimulated HUVECs reflected by chromatin accessibility

Considering the transcriptome as a readout of the cis-regulatory network, we investigated both the DNase-seq and RNA-seq signals of DEGs at different stages in the VEGFA-HUVEC angiogenic model. In the Cluster 1 and 2 gene sets, we examined the ZEB2 and SOX7 gene loci because of their crucial roles in maintaining endothelial cell fate31,38. Compared with H-4 and H-12, H-0 and H-1 had stronger DNase-seq signals at putative promoters and enhancers of both the ZEB2 and SOX7 gene loci, consistent with the stages at which these two genes were highly expressed (Fig. 3a,b,e,f). We then examined the MEF2C and NR5A2 gene loci in the Cluster 3 and 4 gene sets, since these genes participate in pluripotency maintenance36,37. Their putative promoters and enhancers had stronger DNase-seq signals in H-4 and H-12 than in H-0 or H-1, consistent with their respective mRNA expression dynamics (Fig. 3c,d,g,h). These results showed that chromatin accessibility changed over time in response to VEGFA stimulation and correlated with gene transcription. The temporal transitions in VEGFA-stimulated HUVECs were thus reflected by the epigenetic dynamics. Specifically, the chromatin accessibility landscapes of H-4 and H-12 revealed the cis-regulatory network of angiogenesis.

Fig. 3

Temporal chromatin accessibility dynamics of VEGFA-stimulated HUVECs. (a–d) Normalized expression levels of ZEB2 (a), SOX7 (b), MEF2C (c) and NR5A2 (d) during VEGFA stimulation, respectively. Data represented means ± SEM (n = 2 per group). (e–h) Normalized epigenetic and expression profiles at the ZEB2 (e), SOX7 (f), MEF2C (g) and NR5A2 (h) loci during VEGFA stimulation, respectively.

Identifying angiogenic cis-regulatory elements

To identify angiogenic cis-regulatory elements, we searched for open chromatin regions in H-4 and H-12 based on the retrieved DNase-seq datasets. There were 72,113 significant DNase-seq peaks in H-4 and 75,280 in H-12, which were taken as their respective open chromatin regions (Fig. 4a). Merging these genomic regions from H-4 and H-12 yielded a total of 90,572 angiogenic open chromatin regions (Fig. 4b). Of all the angiogenic open chromatin regions, 29,929 (33.1%) were in promoters, 4,952 (5.5%) were in exons, 30,228 (33.4%) were in introns, 4,180 (4.6%) were in UTR5/UTR3, and 21,283 (23.4%) were in intergenic regions (Fig. 4c, Data 2). The genomic distribution of angiogenic open chromatin regions was similar to that of other human tissues39, suggesting that our strategy was accurate for profiling such genomic regions.

Fig. 4

Epigenetic profiling predicted angiogenic cis-regulatory elements. (a) Read density heat maps showing normalized DNase-seq enrichments in H-4 (left panel) and H-12 (right panel) at their respective peak regions (center ± 2 kb). Peak regions in each heat map were represented as horizontal rows, and ranked by decreasing signal strength. (b) Number of open chromatin regions in H-4, H-12 and the whole late stimulation period. (c) Genomic distribution of angiogenic open chromatin regions. (d,e) Average profiles and read density heat maps showing normalized H3K27ac (d) and H3K4me3 (e) enrichments in H-4 and H-12 at angiogenic open chromatin regions (center ± 2 kb). Black solid and dashed borders were used to highlight genomic regions with and without ChIP-seq signals, respectively. (f) Typical examples of angiogenic open chromatin regions with histone modification of H3K27ac (i), H3K4me3 (ii) and both (iii). (g) Number of angiogenic open chromatin regions with H3K27ac and/or H3K4me3 modifications.

Since most cis-regulatory elements in vertebrate genomes are enhancers and promoters40, we confined angiogenic cis-regulatory elements to angiogenic open chromatin regions with enhancer or promoter activity. Public ChIP-seq datasets for two histone modifications, H3K27ac and H3K4me3, in H-4 and H-12 were retrieved for the subsequent analysis. These two marks are widely used to label enhancers and promoters, respectively41,42. In H-4, there were 40,186 (44.4%) angiogenic open chromatin regions with H3K27ac enrichment and 22,654 (25.0%) with H3K4me3 enrichment (Fig. 4d,e). In H-12, there were 39,824 (44.0%) angiogenic open chromatin regions with H3K27ac enrichment and 22,010 (24.3%) with H3K4me3 enrichment (Fig. 4d,e). After merging these genomic regions in H-4 and H-12, we found 42,378 angiogenic open chromatin regions with enhancer activity (H3K27ac modification) and 23,745 with promoter activity (H3K4me3 modification) (Fig. 4g).

Notably, most of the above open chromatin regions were monofunctional, with either enhancer or promoter activity, as exemplified by the HDAC5 and NOTCH2 gene loci (Fig. 4fi,ii). The rest were bifunctional, with both enhancer and promoter activities, as exemplified by the NR5A2 gene locus (Fig. 4fiii). This observation was in accordance with a previous conclusion that some genomic regions might switch between enhancer and promoter signatures43,44. We therefore merged all open chromatin regions with H3K27ac and/or H3K4me3 modifications in H-4 and H-12 and obtained a total of 47,125 angiogenic cis-regulatory elements (Fig. 4g).
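The split between monofunctional and bifunctional regions can be recovered from the three reported counts by inclusion-exclusion. The short sketch below assumes that the enhancer count, the promoter count and the merged total all refer to the same set of angiogenic open chromatin regions; the variable names are illustrative only.

```python
# Counts reported in the text.
enhancer_regions = 42_378   # regions with H3K27ac modification
promoter_regions = 23_745   # regions with H3K4me3 modification
union_regions = 47_125      # regions with H3K27ac and/or H3K4me3

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
# Assumption: all three counts describe the same merged region set.
bifunctional = enhancer_regions + promoter_regions - union_regions
monofunctional = union_regions - bifunctional
monofunctional_share = monofunctional / union_regions

print(bifunctional, monofunctional, round(monofunctional_share, 3))
```

Under this assumption, 18,998 elements carry both marks and 28,127 carry only one, which is consistent with the statement that most elements were monofunctional.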

Depicting epigenetic signatures of angiogenic cis-regulatory elements

We next compared angiogenic cis-regulatory elements with known features of the human genome. Of all the angiogenic cis-regulatory elements, 20,887 (44.3%) were in promoters, 2,111 (4.5%) were in exons, 14,458 (30.7%) were in introns, 2,361 (5.0%) were in UTR5/UTR3, and 7,308 (15.5%) were in intergenic regions (Fig. 5a, Data 3). Clearly, the vast majority of angiogenic cis-regulatory elements resided in noncoding regions of the genome, which conformed to the basic characteristic of regulatory DNA sequences17. We also analyzed the genomic locations of angiogenic cis-regulatory elements relative to gene annotation. A total of 30,785 (65.3%) angiogenic cis-regulatory elements were located within 5 kb upstream or downstream of their nearest TSSs, whereas the rest were distal to their neighboring genes (5 kb to 100 kb) (Fig. 5b). None of the angiogenic cis-regulatory elements were located in ‘gene desert’ regions (>500 kb that were devoid of protein-coding genes). Collectively, our identified angiogenic cis-regulatory elements were almost all noncoding sequences and might regulate gene transcription in angiogenesis in part via long-range interactions.

Fig. 5

General features of angiogenic cis-regulatory elements. (a) Genomic distribution of angiogenic cis-regulatory elements. (b) Distance of each angiogenic cis-regulatory element from its nearest TSS. (c) Significant MSigDB pathways and phenotypes enriched in angiogenic cis-regulatory elements. (d) TF binding motifs enriched in angiogenic cis-regulatory elements.

Further functional annotation of angiogenic cis-regulatory elements was conducted via GREAT. As expected, the enriched MSigDB pathways were closely related to angiogenesis, including the Notch signaling pathway45, elongation arrest and recovery46, the NFAT signaling pathway47 and transcription regulated by the SMAD2/3:SMAD4 heterotrimer (Fig. 5c). These angiogenic cis-regulatory elements were also associated with abnormal vascular endothelial cell development (Fig. 5c), suggesting an important role in angiogenesis regulation. We then used HOMER to predict TFs that could potentially bind angiogenic cis-regulatory elements. A TF was recognized as a candidate if its known motif was enriched in angiogenic cis-regulatory elements and it was highly expressed in both H-4 and H-12. Between 14% and 21% of angiogenic cis-regulatory elements contained the motif of each of several TFs crucial for angiogenesis, namely FLI148 (17.1%), ETV449 (17.6%), ERG50 (20.6%), ETS110 (14.2%) and ETV151 (18.7%) (Fig. 5d). These five TFs also had persistently detectable mRNA levels in HUVECs during VEGFA stimulation (Supplementary Fig. 1b–f). From these results, we concluded that our identified angiogenic cis-regulatory elements contained comprehensive information on angiogenesis cis-regulation.

Using angiogenic cis-regulatory elements as instrument for identifying cis-regulatory SNPs associated with TAPVC risk

A relevant use of angiogenic cis-regulatory elements was to guide post-EWAS studies by identifying cis-regulatory SNPs associated with vascular abnormality. We employed this instrument to screen for cis-regulatory SNPs associated with the risk of TAPVC, a congenital heart disease mainly caused by aberrant angiogenesis3. The analysis pipeline had been put forward in our previous study22 and is shown in Supplementary Fig. 2a. Whole-exome sequencing (WES) data of 78 TAPVC cases and 100 controls passed quality control, and a subset of 121,107 high-quality common SNPs was selected for exome-wide association analysis. Of note, there was no population stratification between cases and controls (Supplementary Fig. 2b,c). We therefore examined exome-wide association in an additive logistic regression model without adjustment for any covariates. A total of 25 SNPs showed statistical evidence of exome-wide association with TAPVC and are listed in Data 4 (Fig. 6a, Supplementary Fig. 2d). To limit the impact of linkage disequilibrium (LD) on the findings, we further set a threshold of r² < 0.6 and obtained 7 independent lead SNPs among the exome-wide associated SNPs (Fig. 6a). LD expansion with an r² cutoff of 0.2 revealed another 34 SNPs that were in LD with at least one of the independent lead SNPs. Together, our EWAS discovered a total of 41 SNPs associated with TAPVC risk (Fig. 6a, Data 5). Of all the TAPVC-associated SNPs, 26 (63.4%) were in exons, 9 (22.0%) were in introns, and 6 (14.6%) were in intergenic regions (Fig. 6b).

Fig. 6

Using angiogenic cis-regulatory elements to find new cis-regulatory SNPs for TAPVC. (a) Number of exome-wide associated, independent lead, LD and TAPVC-associated SNPs. (b) Genomic distribution of TAPVC-associated SNPs. (c) Schematic illustration of the analysis pipeline. TAPVC-associated SNPs located in angiogenic cis-regulatory elements were recognized as cis-regulatory SNP candidates for TAPVC. (d) Predicted TFs bound to the rs199530718 locus via JASPAR database.

Next, we asked whether any of the 41 TAPVC-associated SNPs were cis-regulatory SNPs for TAPVC. Two TAPVC-associated SNPs, rs199530718 and rs201538928, were located within angiogenic cis-regulatory elements and were therefore recognized as cis-regulatory SNP candidates (Fig. 6c). rs199530718 was predicted to interact with PRDM1 to form a ‘SNP-TF’ circuit (Fig. 6d), whereas rs201538928 was not located within any TF motif (data not shown). PRDM1 is a well-studied regulator of embryonic stem cell pluripotency and can affect the process of endothelial cell differentiation52,53. If its DNA-binding motif were disrupted by rs199530718, PRDM1 would fail to bind the angiogenic cis-regulatory element carrying the rs199530718-A allele. This could disable the PRDM1-mediated pluripotency transcription network in endothelial cells, making it difficult for them to adopt the progenitor-like fate in angiogenesis. In summary, our analyses indicated that rs199530718 was a cis-regulatory SNP linked with TAPVC, supporting the angiogenic cis-regulatory elements as a useful tool for investigating the genetics of vascular abnormality.

Discussion

This study carried out a comprehensive assessment of the transcriptome and chromatin signatures in the VEGFA-HUVEC angiogenic model and generated a bank of 47,125 angiogenic cis-regulatory elements. We used this bank to analyze the TAPVC-associated SNPs and discovered a novel cis-regulatory SNP for TAPVC, namely rs199530718. The risk allele rs199530718-A was predicted to disrupt the PRDM1-binding site on an angiogenic cis-regulatory element, which may in turn cause aberrant angiogenesis. Overall, this study provides a valuable tool for the epigenetic dissection of angiogenesis and the genetics of vascular abnormality.

Although the VEGFA-HUVEC angiogenic model is not new, the novelty of this study lies in its purely computational approach, which integrates DNase-seq, RNA-seq and ChIP-seq datasets deposited in the Gene Expression Omnibus (GEO). Previous analyses of these datasets, combined with cellular assays, revealed VEGFA-induced transcriptional responses and VEGFA-responsive enhancers in endothelial cells. These results not only elucidated the basic features of the VEGFA-HUVEC angiogenic model but also suggested that it reliably mimics angiogenesis. Here, we focused on the angiogenic process rather than the endothelial cell itself. Specifically, based on the VEGFA-HUVEC angiogenic model, our computational approach reused the above GEO datasets to study the cis-regulation of angiogenesis.

In vitro differentiation models that recapitulate organ and tissue development have been widely applied to explore the cis-regulation of embryonic morphogenesis. Typical examples include in vitro differentiation of cardiomyocytes22,54, retina55 and hypothalamic neurons23. In this study, the VEGFA-HUVEC angiogenic model was used to investigate the cis-regulation of angiogenesis. Although this model has been in use for over a decade, we reached a previously unreported conclusion that HUVECs adopt a progenitor-like fate in the late stimulation period. Specifically, H-1 showed endothelial transcriptome characteristics, whereas H-4 and H-12 had enriched BP terms related to angiogenesis and multi-lineage differentiation potential. Under this condition, VEGFA might act as a chemical signal that reprograms HUVECs into progenitor-like cells to trigger angiogenesis. This speculation is reasonable because a reprogramming effect of VEGFA on endothelial cells has already been observed in hepatocellular carcinoma56. In addition, our conclusion conforms to the classic sprouting angiogenesis theory, which highlights the indispensable role of endothelial progenitor cells during the angiogenic process34. Therefore, in the subsequent analysis, diverse epigenetic datasets of H-4 and H-12 were jointly examined to profile angiogenic cis-regulatory elements.

It is worth noting that none of the techniques available today can provide direct information on genomic cis-regulatory elements. In this study, we therefore adopted an indirect profiling method by first mapping angiogenic open chromatin regions with enhancer and/or promoter activities. Briefly, the angiogenic open chromatin regions were detected from DNase-seq datasets, and enhancer or promoter activity was detected from histone ChIP-seq datasets. This method is well established for profiling genomic cis-regulatory elements, as evidenced by recent achievements in the fields of somite and heart development22,57. In addition, our identified angiogenic cis-regulatory elements were enriched for DNA-binding motifs of ETS family TFs (e.g., FLI1, ETV4, ERG and ETS1). TFs belonging to the ETS family are master regulators of endothelial cell gene transcription and participate actively in angiogenic signal transduction46. Depleting these TFs can impair angiogenesis and lead to vascular abnormality during embryogenesis58. Moreover, Zhang et al.10 have demonstrated that almost all angiogenic enhancers contain ETS TF motif sequences. Taken together, this evidence supports the view that the angiogenic cis-regulatory element bank we generated is comprehensive and reflects the nature of angiogenesis.

The angiogenic cis-regulatory element bank serves as a valuable resource for investigating angiogenesis and the genetics of vascular abnormality. In this study, the bank was used in a post-EWAS analysis to annotate potential cis-regulatory functions of the TAPVC-associated SNPs. Typical examples of such applications can be seen in etiological studies of diseases such as ventricular septal defect22, acute lymphoblastic leukemia59 and Parkinson’s disease60. Cis-regulatory element banks of this kind also contribute to research on gene transcription control61, targeted gene finding62, multigenome DNA sequence conservation63 and gene therapy64. For instance, Lee et al. screened proximal cis-regulatory elements in the IL-10 gene loci of Th1 and Th2 cells and reported a new enhancer that can regulate IL-10 expression in distinct T helper cells. Here, while we advocate the use of the angiogenic cis-regulatory element bank to recognize cis-regulatory SNPs for vascular abnormality, we also emphasize that a ‘SNP-TF’ circuit is a vital clue for prioritizing the corresponding TF in follow-up studies.

In summary, our integrated genetic and epigenetic analysis has generated a genome-wide bank of angiogenic cis-regulatory elements. Browsing the bank enables the recognition and interpretation of novel cis-regulatory SNPs linked with TAPVC. This study is limited by the lack of evidence from molecular and cellular experiments, which hinders further exploration of angiogenic cis-regulatory elements. Nevertheless, the angiogenic cis-regulatory element bank and the study itself provide a tool for investigating the cis-regulation of angiogenesis and contribute to understanding the genetics of vascular abnormality.

Methods

High-throughput datasets

For this study, high-throughput data from HUVECs before and/or after VEGFA stimulation were reanalyzed. Raw FASTQ files for RNA-seq and DNase-seq were downloaded from GEO series GSE4116610,65, and those for H3K27ac and H3K4me3 ChIP-seq were downloaded from GEO series GSE10962624,66. Before alignment, raw sequencing reads were trimmed to generate clean reads via Trim Galore (version 0.6.7) with parameters ‘-q 20 --length 20 --stringency 4 --e 0.1’.

Reanalysis of RNA-seq datasets

Clean reads were aligned to the hg19 reference genome via HISAT2 (version 2.2.1) with default parameters67, and then SAMtools (version 1.9) was used to remove duplicate reads68. Reads that overlapped the exons of each gene were counted via HTSeq (version 0.13.5) with parameters ‘-s n -t exon’69. Raw gene expression values were computed as counts per million mapped reads (CPM).
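As an illustration of the CPM scaling, the following minimal sketch divides each gene count by the library total and multiplies by one million. It assumes that the denominator is the sum of the counted reads in the sample; the function name and the toy counts are hypothetical and are not taken from the datasets.

```python
def counts_per_million(counts):
    """Scale raw per-gene read counts to counts per million (CPM).

    Assumption: the library size is the sum of all counted reads.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counted reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical toy library with 1,000 counted reads in total.
toy_counts = {"ZEB2": 500, "SOX7": 300, "MEF2C": 200}
toy_cpm = counts_per_million(toy_counts)
print(toy_cpm)
```

Because every sample is rescaled to the same nominal library size, CPM values can be compared across stages that were sequenced to different depths.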

For visualization, post-filtered BAM files were normalized and converted to BIGWIG format via deepTools2 bamCoverage with parameters ‘--normalizeUsing RPGC --effectiveGenomeSize 2864785220 --binSize 10’70. Gene tracks were visualized via Integrative Genomics Viewer (IGV).

For principal component analysis (PCA), principal components of gene expression data from all samples were calculated via R function ‘prcomp’. The first two components were then visualized via R package ‘ggplot2’.

For differential analysis, differential expression was assessed by performing all pairwise comparisons among samples. R package ‘DESeq2’ was used to identify DEGs following the criteria of |log2 (fold change)| ≥ 0.58 and adjusted p ≤ 0.01. Time-series clustering of DEGs was analyzed via R package ‘Mfuzz’ with parameter ‘c = 4’.
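The two DEG criteria can be expressed as a simple filter, sketched below. The cutoffs are those stated above; the function name and the example values are hypothetical. On the linear scale, the log2 cutoff of 0.58 corresponds to roughly a 1.5-fold change in either direction.

```python
LOG2FC_CUTOFF = 0.58  # from the text
PADJ_CUTOFF = 0.01    # from the text


def is_deg(log2_fold_change, adjusted_p):
    """Return True if a gene meets both DEG criteria for one comparison."""
    return abs(log2_fold_change) >= LOG2FC_CUTOFF and adjusted_p <= PADJ_CUTOFF


# Linear-scale equivalent of the log2 cutoff (about 1.49-fold).
fold_equivalent = 2 ** LOG2FC_CUTOFF

# Hypothetical (log2 fold change, adjusted p) pairs.
examples = [(0.60, 0.005), (0.50, 0.001), (-1.20, 0.05), (-0.58, 0.01)]
flags = [is_deg(lfc, padj) for lfc, padj in examples]
print(flags, round(fold_equivalent, 2))
```

Only the first and last example pass, because the second fails the fold-change cutoff and the third fails the adjusted p cutoff.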

For functional annotation, GO enrichment analysis for each time-series cluster of DEGs was carried out via DAVID database (https://david.ncifcrf.gov/)71. The GO terms with p < 0.05 were considered as significant and visualized via R package ‘ggplot2’.

Reanalysis of DNase-seq datasets

Clean reads were aligned to the hg19 genome via Bowtie2 (version 2.4.4) with default parameters72. Aligned BAM files were then processed to remove low-quality alignments and duplicate reads. Peak calling was performed via MACS2 (version 2.1.1.20160309) with parameters ‘--nomodel --shift 100 --extsize 200 -q 0.05’.

For visualization, the pipelines of generating BIGWIG files and visualizing gene tracks were the same as those for RNA-seq datasets. Particularly, significant DNase-seq peaks were visualized via deepTools2 plotHeatmap.

For correlation analysis, genome-wide correlation matrix was calculated via deepTools2 multiBamSummary and plotCorrelation with parameters ‘--corMethod pearson --binSize 10000’. Post-filtered BAM files of all samples were imported as inputs. The correlation heat map was generated via R package ‘pheatmap’.
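The correlation step can be illustrated with a small stand-alone sketch that computes pairwise Pearson coefficients from per-bin read counts. This is not the deepTools2 implementation; the bin counts below are hypothetical and only mimic an early and a late group of samples.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least 2 bins")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant vector has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts in five genomic bins for each stage.
bin_counts = {
    "H-0": [10, 20, 30, 5, 1],
    "H-1": [11, 19, 31, 6, 2],
    "H-4": [3, 25, 10, 20, 15],
    "H-12": [2, 26, 11, 21, 14],
}
samples = list(bin_counts)
corr = {a: {b: pearson(bin_counts[a], bin_counts[b]) for b in samples}
        for a in samples}

for a in samples:
    print(a, [round(corr[a][b], 3) for b in samples])
```

In this toy matrix the within-period coefficients (H-0 with H-1, H-4 with H-12) are higher than the between-period ones, which is the kind of block structure that a correlation heat map makes visible.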

Reanalysis of ChIP-seq datasets

The analysis pipeline for ChIP-seq reads was the same as that for DNase-seq datasets. Particularly, broad peaks were called via MACS2 with parameters ‘--broad --broad-cutoff 0.1’ and then visualized via IGV.

For read density analysis, the read density matrix was counted via deepTools2 computeMatrix with parameters ‘--referencePoint center -a 2000 -b 2000’, and then was visualized via deepTools2 plotHeatmap.

Identification and annotation of open chromatin regions

The DNase-seq peaks in each sample constituted the initial set of its respective open chromatin regions. Genomic location annotation of open chromatin regions was performed via R package ‘ChIPseeker’.

Identification and annotation of cis-regulatory elements

The DNase-seq peaks that overlapped H3K27ac and/or H3K4me3 peaks in each sample constituted the initial set of its respective cis-regulatory elements. Genomic location annotation of cis-regulatory elements was performed via R package ‘ChIPseeker’. Pathways and other enriched functions were predicted via GREAT (version 3.0.0; http://great.stanford.edu/public/html/)73. Enriched terms with p < 0.05 were considered significant. TF motif enrichment analysis was performed via HOMER with the script ‘findMotifsGenome.pl’74. Enriched motifs with p < 1 × 10⁻²⁰ were considered significant.
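The overlap rule can be sketched as follows. The sketch assumes that an overlap of at least 1 bp between a DNase-seq peak and a histone peak is sufficient, since no minimum overlap is specified above; the function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_open_regions(dnase_peaks, h3k27ac_peaks, h3k4me3_peaks):
    """Label each DNase-seq peak by the histone marks it overlaps."""
    labels = {}
    for peak in dnase_peaks:
        has_k27ac = any(overlaps(peak, mark) for mark in h3k27ac_peaks)
        has_k4me3 = any(overlaps(peak, mark) for mark in h3k4me3_peaks)
        if has_k27ac and has_k4me3:
            labels[peak] = "enhancer+promoter"
        elif has_k27ac:
            labels[peak] = "enhancer"
        elif has_k4me3:
            labels[peak] = "promoter"
        else:
            labels[peak] = "unmarked"
    return labels


# Hypothetical coordinates, for illustration only.
dnase = [("chr1", 100, 300), ("chr1", 1000, 1200),
         ("chr2", 50, 150), ("chr2", 900, 950)]
k27ac = [("chr1", 250, 600), ("chr2", 0, 60)]
k4me3 = [("chr1", 1100, 1500), ("chr2", 100, 400)]

labels = classify_open_regions(dnase, k27ac, k4me3)
# Open regions with at least one mark form the candidate element set.
cis_regulatory_elements = [p for p, lab in labels.items() if lab != "unmarked"]
print(labels)
```

The nested scan is quadratic and only suitable for toy inputs; for genome-scale peak sets, a sorted sweep or a dedicated interval tool would be the more practical choice.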

Exome-wide association analysis

WES data of 78 TAPVC cases and 100 healthy controls were derived from our previous study3. All participants were unrelated and were recruited from Xinhua Hospital affiliated to Shanghai Jiao Tong University. Before enrollment, written informed consent was obtained from participants or their guardians.

For individual quality control, no individuals were filtered out owing to sex discrepancies or low genotyping rate (<95%). For SNP quality control, SNPs were excluded if they were located on sex chromosomes, if their call rate was <95%, if the minor allele frequency (MAF) was <0.05 among controls, or if the p value in the Hardy-Weinberg equilibrium test was < 1 × 10⁻⁵ among controls. A total of 121,107 high-quality SNPs passed quality control and were included in the exome-wide association analysis.
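The SNP-level filters can be sketched as below. Note one swapped-in simplification: the Hardy-Weinberg p value is approximated here with a 1-df chi-square goodness-of-fit test, which is not necessarily the test used in the analysis. The function names and genotype counts are hypothetical.

```python
import math

SEX_CHROMOSOMES = {"X", "Y", "chrX", "chrY"}


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Approximate HWE p value via a 1-df chi-square test (assumption)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(chrom, call_rate, control_counts):
    """Apply the four SNP exclusion criteria described in the text."""
    if chrom in SEX_CHROMOSOMES:
        return False
    if call_rate < 0.95:
        return False
    if minor_allele_frequency(*control_counts) < 0.05:
        return False
    if hwe_chi2_p(*control_counts) < 1e-5:
        return False
    return True


# Hypothetical genotype counts in 100 controls.
print(passes_snp_qc("chr1", 0.99, (64, 32, 4)))   # in HWE, MAF 0.2
print(passes_snp_qc("chr1", 0.99, (50, 0, 50)))   # no heterozygotes
print(passes_snp_qc("chr1", 0.99, (98, 2, 0)))    # MAF 0.01
```

The MAF and Hardy-Weinberg checks use control genotypes only, mirroring the criteria above, so that a true disease association in cases does not cause a SNP to be discarded.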

For population stratification analysis, PCA of 78 TAPVC cases and 100 controls was performed via PLINK (version 1.90) using all high-quality SNPs75. The first two eigenvectors were visualized via R package ‘ggplot2’.

For association analysis, exome-wide associations were assessed in an additive logistic regression model via PLINK without adjustment for any covariates. SNPs with p < 1 × 10⁻⁵ were considered exome-wide associated. The quantile-quantile (Q-Q) plot and the Manhattan plot were both generated via R package ‘qqman’.
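For readers unfamiliar with the additive model, the sketch below fits logit P(case) = b0 + b1 × dosage for a single SNP by Newton-Raphson and reports a Wald p value. It is a didactic stand-in rather than PLINK's implementation, and the genotype counts are hypothetical (they merely total 78 cases and 100 controls).

```python
import math


def expand(counts_by_dosage):
    """Turn {dosage: (n_controls, n_cases)} into per-individual lists."""
    genotypes, phenotypes = [], []
    for dosage, (n_controls, n_cases) in counts_by_dosage.items():
        genotypes += [dosage] * (n_controls + n_cases)
        phenotypes += [0] * n_controls + [1] * n_cases
    return genotypes, phenotypes


def fit_additive_logistic(genotypes, phenotypes, n_iter=25):
    """Return (beta, standard error, Wald p) for the allele dosage term."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, phenotypes):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = mu * (1.0 - mu)
            g0 += y - mu              # score for the intercept
            g1 += (y - mu) * x        # score for the dosage term
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("dosage does not vary; model is not identifiable")
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    # Variance of b1 from the inverse information matrix of the last iteration.
    se = math.sqrt(h00 / det)
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2))
    return b1, se, p_value


# Hypothetical counts: dosage -> (controls, cases).
toy = {0: (50, 20), 1: (40, 40), 2: (10, 18)}
beta, se, p_value = fit_additive_logistic(*expand(toy))
print(round(beta, 3), round(se, 3), p_value)
```

Here exp(beta) is the estimated odds ratio per additional copy of the counted allele, and a SNP would be called exome-wide associated if its p value fell below the 1 × 10⁻⁵ threshold stated above.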

For LD analysis, independent lead SNPs were extracted from exome-wide associated SNPs and were independent from each other at r² < 0.6. LD SNPs were extracted from high-quality SNPs that were in LD (r² > 0.2) with at least one independent lead SNP. TAPVC-associated SNPs were the union of independent lead SNPs and LD SNPs. Functional annotation of the TAPVC-associated SNPs was carried out via ANNOVAR76.
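The two LD steps can be sketched as follows, under two stated assumptions: r² is taken as the squared Pearson correlation of allele dosages, and lead SNPs are chosen greedily in order of increasing p value. Neither detail is specified above, and the SNP names, p values and dosages are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("monomorphic SNP has no defined r2")
    return cov * cov / (var_x * var_y)


def pick_independent_leads(pvalues, dosages, r2_max=0.6):
    """Greedy selection: keep a SNP if r2 < r2_max with every kept lead."""
    leads = []
    for snp in sorted(pvalues, key=pvalues.get):
        if all(r_squared(dosages[snp], dosages[lead]) < r2_max
               for lead in leads):
            leads.append(snp)
    return leads


def expand_ld(leads, dosages, r2_min=0.2):
    """Non-lead SNPs with r2 > r2_min to at least one lead SNP."""
    return [snp for snp in dosages
            if snp not in leads
            and any(r_squared(dosages[snp], dosages[lead]) > r2_min
                    for lead in leads)]


# Hypothetical dosages for eight individuals.
dosages = {
    "snp_A": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_B": [0, 1, 2, 0, 1, 2, 0, 2],  # strongly correlated with snp_A
    "snp_C": [1, 0, 1, 2, 1, 0, 1, 1],  # moderately correlated with snp_A
    "snp_D": [1, 0, 1, 1, 2, 1, 1, 1],  # uncorrelated with snp_A
}
associated_p = {"snp_A": 1e-7, "snp_B": 5e-6}  # exome-wide associated SNPs

leads = pick_independent_leads(associated_p, dosages)
ld_snps = expand_ld(leads, dosages)
associated_set = leads + ld_snps
print(leads, ld_snps)
```

In this toy example snp_B is dropped as a lead because it is too tightly linked to snp_A, but it re-enters the final set through the LD expansion, together with snp_C.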