Previous studies have suggested that both the order of cis-regulatory elements and the distances between them are important in determining gene expression. We propose a distance-enhanced data-mining technique that identifies significant cis-element combinations whose constituent cis-regulatory elements lie at relatively conserved distances from one another.
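To illustrate what "relatively conserved distances" can mean in practice, the sketch below groups the observed distances between one ordered pair of cis-elements across genes and keeps only the distance ranges shared by enough genes. It is a minimal sketch, not the method's actual implementation. It assumes that distances are given in base pairs and uses a simple 1-D single-linkage rule, splitting wherever neighbouring distances differ by more than a gap threshold. The function name `cluster_distances`, the parameters `max_gap` and `min_genes`, and the gene labels are all hypothetical.

```python
from typing import List, Tuple


def cluster_distances(
    occurrences: List[Tuple[str, int]],
    max_gap: int = 10,
    min_genes: int = 3,
) -> List[Tuple[int, int, List[str]]]:
    """Group (gene, distance) pairs for one ordered element pair.

    Hypothetical helper: distances are sorted and split wherever two
    neighbours differ by more than max_gap (1-D single-linkage). Clusters
    supported by fewer than min_genes distinct genes are dropped.
    Returns (min_distance, max_distance, sorted_gene_list) per cluster.
    """
    ordered = sorted(occurrences, key=lambda occ: occ[1])
    clusters: List[List[Tuple[str, int]]] = []
    for occ in ordered:
        # Extend the current cluster if this distance is close to its last one.
        if clusters and occ[1] - clusters[-1][-1][1] <= max_gap:
            clusters[-1].append(occ)
        else:
            clusters.append([occ])

    result = []
    for cluster in clusters:
        genes = sorted({gene for gene, _ in cluster})
        if len(genes) >= min_genes:
            result.append((cluster[0][1], cluster[-1][1], genes))
    return result


# Hypothetical distances (bp) between element A and a downstream element B.
occs = [("g1", 40), ("g2", 45), ("g3", 48), ("g4", 120), ("g5", 300)]
print(cluster_distances(occs))
```

Here genes g1 to g3 share a conserved A-to-B spacing of 40 to 48 bp, while g4 and g5 are isolated and are discarded. A real analysis would tune the gap threshold and might use a different clustering algorithm.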
In our method, cis-regulatory element combinations whose elements lie at conserved distances from one another are called cis-regulatory element arrangements. To extract all frequently occurring cis-element arrangements, we combine closed frequent itemset mining with clustering algorithms. We then identify significant arrangements based on the expression similarity of the genes in which they occur.
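As an illustration of closed frequent itemset mining, the sketch below treats each gene's promoter as a transaction, that is, the set of cis-elements (or distance-annotated element pairs) found in it. It reports every itemset that is frequent and closed, meaning that no strict superset has the same support. The sketch relies on the fact that closed itemsets are exactly the intersections of non-empty groups of transactions. It builds them incrementally, which is exponential in the worst case and stands in for dedicated closed-itemset miners such as CHARM; it is not necessarily the miner used in this work. The function name `closed_frequent_itemsets` and the element labels A to D are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def closed_frequent_itemsets(
    transactions: List[FrozenSet[str]],
    min_support: int,
) -> Dict[FrozenSet[str], int]:
    """Return {closed itemset: support} for itemsets with support >= min_support.

    Hypothetical helper. Every closed itemset equals the intersection of the
    transactions that contain it, so we collect all non-empty intersections
    of transaction groups, then keep those that are frequent.
    """
    closed: Set[FrozenSet[str]] = set()
    for t in transactions:
        # New intersections: t alone, plus t intersected with each known one.
        new = {t} | {c & t for c in closed}
        closed |= {c for c in new if c}  # drop the empty itemset

    result: Dict[FrozenSet[str], int] = {}
    for itemset in closed:
        support = sum(1 for t in transactions if itemset <= t)
        if support >= min_support:
            result[itemset] = support
    return result


# Hypothetical promoters: the set of cis-element labels found in each gene.
genes = [
    frozenset({"A", "B", "C"}),  # g1
    frozenset({"A", "B"}),       # g2
    frozenset({"A", "B", "D"}),  # g3
    frozenset({"C", "D"}),       # g4
]
for itemset, support in closed_frequent_itemsets(genes, min_support=2).items():
    print(sorted(itemset), support)
```

With a minimum support of 2, this yields {A, B} (support 3), {C} (support 2) and {D} (support 2). The itemset {A} is frequent but not closed, because {A, B} occurs in exactly the same genes. Reporting only closed itemsets avoids listing such redundant sub-combinations.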
When we applied our method to Saccharomyces cerevisiae, about 3890 arrangements were extracted. Of these, 50 turned out to be statistically significant, and 29 of the 50 significant arrangements were newly identified cis-regulatory element combinations.