Most Transcription Factors Recognize DNA Shape Independent of Sequence
We hypothesized that transcription factors (TFs) can recognize DNA shape without recognizing nucleotide sequence. Three observations motivate an independent role for shape: many TF binding sites lack a sequence-motif, DNA shape adds specificity to sequence-motifs, and different sequences can encode similar shapes. We therefore asked whether the binding sites of any TFs are enriched for a specific pattern of DNA shape features, such as minor groove width, propeller twist, helical twist, or roll. To discover these shape-motifs de novo, we developed ShapeMF, which performs Gibbs sampling directly on shape features rather than nucleotide sequences. Using ChIP-Seq data for 110 human ENCODE TFs, we find that most TFs have shape-motifs and strongly bind regulatory regions with shape-motifs in the absence of sequence-motifs. When shape- and sequence-recognition co-occur in a region, the two types of motifs can be overlapping, flanking, or separated by consistent spacing, with shape-motifs explaining low-information-content positions in and near sequence-motifs. Shape-motifs are prevalent in regions co-bound by multiple TFs. They also explain binding of the co-factors MYC-MAX and TBX5-NKX2-5, which sequence-motifs cannot account for. Finally, shape-motifs are strikingly different across TFs with nearly identical sequence-motifs, providing an explanation for their distinct binding locations. These results establish shape-motifs as drivers of TF-DNA recognition complementary to sequence-motifs.
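
To illustrate what Gibbs sampling on shape features rather than nucleotides can look like, the following is a minimal sketch and not the ShapeMF implementation. It assumes a single real-valued shape feature per position (for example, minor groove width), a Gaussian motif model with a fixed, shared standard deviation `sigma`, and a single Gaussian background. The function names `make_tracks` and `gibbs_shape_motif`, the parameter defaults, and the synthetic data are all hypothetical choices made for this example.

```python
import math
import random


def make_tracks(n, length, pattern, seed=0):
    """Build synthetic shape tracks with one planted pattern each (made-up data)."""
    rng = random.Random(seed)
    tracks = []
    for _ in range(n):
        # Background values: arbitrary Gaussian noise around 5.0 (an assumption).
        track = [rng.gauss(5.0, 0.5) for _ in range(length)]
        start = rng.randrange(length - len(pattern) + 1)
        track[start:start + len(pattern)] = [p + rng.gauss(0, 0.1) for p in pattern]
        tracks.append(track)
    return tracks


def gibbs_shape_motif(tracks, width, iters=100, sigma=0.5, seed=0):
    """Sample one window start per track so that the windows share a shape profile.

    Needs at least two tracks, each at least `width` positions long.
    """
    rng = random.Random(seed)
    values = [v for track in tracks for v in track]
    bg_mean = sum(values) / len(values)  # background mean over all positions
    starts = [rng.randrange(len(t) - width + 1) for t in tracks]

    def profile(exclude):
        # Mean shape value at each motif position, leaving out track `exclude`.
        rows = [t[s:s + width]
                for i, (t, s) in enumerate(zip(tracks, starts)) if i != exclude]
        return [sum(col) / len(col) for col in zip(*rows)]

    for _ in range(iters):
        for i, track in enumerate(tracks):
            mu = profile(i)
            scores = []
            for s in range(len(track) - width + 1):
                window = track[s:s + width]
                # Log-likelihood ratio of the Gaussian motif model versus the
                # Gaussian background, both with the same fixed sigma.
                scores.append(sum(((x - bg_mean) ** 2 - (x - m) ** 2) / (2 * sigma ** 2)
                                  for x, m in zip(window, mu)))
            top = max(scores)  # subtract the max before exponentiating, for stability
            weights = [math.exp(sc - top) for sc in scores]
            # Gibbs step: resample this track's start in proportion to its weight.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, profile(None)


tracks = make_tracks(n=20, length=40, pattern=[4.0, 3.0, 2.5, 3.0, 4.0], seed=1)
starts, motif = gibbs_shape_motif(tracks, width=5)
```

Here `starts` holds one sampled window position per region and `motif` is the mean shape profile across the sampled windows. A realistic shape-motif finder would also have to handle several shape features jointly, estimate the variance at each motif position, and assess the enrichment of the resulting motif against control regions; none of that is attempted in this sketch.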