Accurate prognosis of cancer is one of the most important problems in therapeutic research and clinical practice. Reliable prognostication is needed to improve patient survival and welfare by guiding the choice of therapy and medication. As genome-wide data sets become available across the omics fields, conventional clinical features such as pathological stage (TNM stage or stage grouping), pathological grade, and histological subtype can be analyzed together with the related molecular-level data. In this thesis, we propose computational methods that correlate gene expression profiles with clinical features and provide improved approaches to cancer prognosis.
First, we propose a new method for extracting cancer metastasis-related genes from gene expression data and pathological information (pathological M stage and histological subtype). We analyzed genes that are differentially expressed between primary colon tumors and their liver metastases. In this process, we sought to reduce metastasis-independent noise features, which may arise from differences between the two organs and from organ-specific viability programs that are activated differently in each. By applying appropriately defined set operations to a large-scale data set, we show that our result is biologically related to metastasis processes and free from noise effects, especially those caused by tissue specificity.
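The set-operation idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the tissue-specific noise is represented by genes differentially expressed between normal liver and normal colon, and that these are subtracted from the genes differentially expressed between liver metastases and primary colon tumors. The function name, the choice of a plain set difference, and the placeholder gene identifiers are all hypothetical; the set operations actually used in the thesis may differ.

```python
def metastasis_candidates(met_vs_primary_degs, tissue_degs):
    """Return genes that differ between metastasis and primary tumor
    but are not in the (assumed) tissue-specific gene set."""
    # Set difference: keep only genes absent from the tissue-specific list.
    return set(met_vs_primary_degs) - set(tissue_degs)


# Placeholder identifiers only; these are not real genes or results.
met_vs_primary = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
normal_liver_vs_normal_colon = {"GENE_B", "GENE_D", "GENE_E"}

candidates = metastasis_candidates(met_vs_primary, normal_liver_vs_normal_colon)
print(sorted(candidates))
```

In this toy input, the genes shared with the normal liver versus normal colon comparison are treated as organ-related and dropped, so only the remaining genes are kept as metastasis candidates.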
Second, we propose monotonically expressed gene analysis (MEGA), which extracts gene sets related to lymph node invasion and tumor size in breast cancer by utilizing expression patterns over a two-dimensional $N \times T$ space, together with meta-analysis test results from various cancer analyses. The tests were conducted on completely independent data sets. We show that gene sets selected by the proposed functions were strongly correlated with cancer prognoses, including metastasis, relapse and survival, and showed significantly better results than conventional appro...