Encoder-decoder networks based on convolutional neural network (CNN) architectures have been used extensively in the deep learning literature thanks to their excellent performance on various inverse problems. However, it remains difficult to obtain a coherent geometric view of why such architectures deliver this performance. Inspired by recent theoretical advances on the generalizability, expressivity, and optimization landscape of neural networks, as well as the theory of deep convolutional framelets, here we provide a unified theoretical framework that leads to a better understanding of the geometry of encoder-decoder CNNs. Our unified framework shows that the encoder-decoder CNN architecture is closely related to a nonlinear frame representation using combinatorial convolution frames, whose expressivity increases exponentially with the network depth. We also demonstrate the importance of skip connections in terms of both expressivity and the optimization landscape.
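As a rough illustration of the frame-representation view sketched above, the following minimal sketch (assumptions: a one-layer 1D encoder-decoder with circular convolutions; the function names `conv1d` and `encoder_decoder` are hypothetical, not from the paper) shows how the ReLU acts as an input-dependent 0/1 diagonal mask, so that for each fixed activation pattern the network reduces to a linear frame representation, and a skip connection simply adds the input back to the decoder output:

```python
import numpy as np

def conv1d(x, h):
    # Circular ('same') convolution as a simple stand-in for a CNN layer.
    n = len(x)
    return np.array([sum(h[k] * x[(i - k) % n] for k in range(len(h)))
                     for i in range(n)])

def encoder_decoder(x, h_enc, h_dec, skip=True):
    z = conv1d(x, h_enc)            # encoder convolution
    mask = (z > 0).astype(float)    # ReLU as an input-dependent 0/1 diagonal mask
    y = conv1d(mask * z, h_dec)     # decoder convolution on the masked features
    # Once `mask` is fixed, x -> y is linear: a frame representation whose
    # basis is selected combinatorially by the ReLU pattern.
    return y + x if skip else y     # skip connection adds the input back
```

With nonnegative input and identity filters the mask is all ones, so the skip-connected output is exactly the input plus the decoder output, making the role of the skip connection explicit.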