Spatiotemporal gene expression patterns are governed to a large extent by enhancer elements, typically located distally from their target genes. Identification of enhancer-promoter (EP) links that are active in specific cell types is a key challenge in understanding gene regulation that underlies cell fate determination. We developed CT-FOCS, a new statistical inference method that utilizes multiple replicates per cell type to infer cell type-specific EP links (ct-links).
Computationally predicted EP links are usually benchmarked against experimentally determined chromatin interactions measured by ChIA-PET. We expanded this validation scheme by introducing the concept of a connected loop set (CLS), which combines ChIA-PET loops that overlap in their anchor sites. In the GM12878 cell line, 72% of the predicted GM12878-specific EP links were supported by CLSs, compared to 30% supported by single loops.
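To illustrate the difference between single-loop and CLS support, the following is a minimal Python sketch, not the CT-FOCS implementation. It assumes that loops are given as pairs of (chromosome, start, end) anchors, that two loops belong to the same CLS when any of their anchors overlap by at least one base, and that an EP link is supported when the enhancer and the promoter each overlap an anchor of the same loop or the same CLS. All function names and the overlap criterion are hypothetical choices made for the example.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_cls(loops):
    """Assign each loop a CLS id by merging loops with overlapping anchors."""
    parent = list(range(len(loops)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; adequate for a sketch only.
    for i, j in combinations(range(len(loops)), 2):
        if any(overlaps(x, y) for x in loops[i] for y in loops[j]):
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(loops))]


def supported_by_single_loop(enh, prom, loops):
    """The enhancer hits one anchor of a loop and the promoter hits the other."""
    return any(
        (overlaps(enh, a) and overlaps(prom, b))
        or (overlaps(enh, b) and overlaps(prom, a))
        for a, b in loops
    )


def supported_by_cls(enh, prom, loops):
    """The enhancer and promoter both hit anchors within the same CLS.

    Simplification (assumption): the two elements are not required to hit
    different anchors.
    """
    cls_ids = build_cls(loops)
    enh_sets = {c for c, loop in zip(cls_ids, loops)
                if any(overlaps(enh, a) for a in loop)}
    prom_sets = {c for c, loop in zip(cls_ids, loops)
                 if any(overlaps(prom, a) for a in loop)}
    return bool(enh_sets & prom_sets)


# Toy data: two loops chained through overlapping middle anchors.
toy_loops = [
    (("chr1", 100, 200), ("chr1", 1000, 1100)),
    (("chr1", 1050, 1150), ("chr1", 5000, 5100)),
]
toy_enhancer = ("chr1", 120, 180)
toy_promoter = ("chr1", 5020, 5080)
```

In this toy case no single loop joins the enhancer to the promoter, but the two loops share overlapping anchors and therefore form one CLS that connects them. This is the kind of indirect support that a single-loop benchmark would miss.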
Analyzing 1,366 samples covering 651 cell types from ENCODE, Roadmap Epigenomics, and FANTOM5, CT-FOCS inferred highly cell type-specific EP links more accurately than a state-of-the-art method. Motif finding analysis applied to cell type-specific enhancers and promoters from the inferred links detected significant overrepresentation of transcription factor (TF) binding signatures, collectively demonstrating highly cell type-specific gene regulation programs. For example, in the GM12878 cell line (immortalized B lymphocytes), 13 TF motifs were highly overrepresented compared to other cell types, including EBF1, PAX5, IRF8, and IRF4, which are known to participate in chromatin remodeling and are essential to B-cell commitment. The complete set of predicted ct-links for each cell type is available at http://acgt.cs.tau.ac.il/ct-focs.