Utilizing de Bruijn Graphs in Universal Sequence Design for Discovery of Regulatory Elements
Calvin Lab
Recent technological advancements allow the measurement of protein binding to thousands of DNA or RNA probes on a single microarray. Since the space on the array is limited, the challenge is how to efficiently generate a minimum-size set of sequences that together cover all k-mers. In this talk, we will first introduce de Bruijn graphs and their applications in efficient coverage of DNA k-mers. Then, we will describe a generalization of the problem, in which the sequences are required to obtain certain properties (e.g., unstructured RNAs). We will prove that in this formalization the problem is NP-hard and give a (infeasible) approximation algorithm. We will present a heuristic based on random walks in de Bruijn graphs which works well in practice. If time allows, we shall discuss questions arising in analysis of novel high throughput in vitro methods for motif discovery.