human languages and derive algorithms by which language understanding can be automated by machines. The search for such computational methods is a process of identifying constructive accounts of linguistic phenomena, for languages contain more than qualitative linguistic descriptions can capture.
This thesis investigates a quantitative methodology for computing natural language phenomena. The particular line of work this thesis represents is commonly known as the probabilistic or empirical approach. Its basic tenet is that natural language expressions are ambiguous because of missing information: language processing amounts roughly to recovering the information that the speaker and listener of an expression presume. When the presumed information runs out, an exact interpretation of the received expression becomes impossible, and the expression becomes ambiguous. For most language understanding systems the problem of information deficiency is even worse, because no realistic algorithms are provided for representing and constructing the required knowledge. Probabilistic approaches are motivated by the need to overcome the uncertainty caused by this missing information through weighted decisions.
In particular, this thesis addresses two types of ambiguity in natural language sentences. One is syntactic ambiguity, in which the syntactic relations of a sentence are uncertain. The other is the lexical ambiguity of polysemous words, in which the meaning of a word is uncertain. For the syntactic ambiguity problem, the probabilistic Recursive Transition Network (PRTN) is developed as a probabilistic grammar representation, together with an algorithm to estimate its parameters and an algorithm to identify the best-scored syntactic analyses. The PRTN is the second effort, following Kupiec's work (1992), to extend the probabilistic CFG, which previously worked only on grammars in Chomsky Normal Form.
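To make the idea of a probabilistic Recursive Transition Network concrete, the following is a minimal sketch of best-scored traversal over a toy network. The networks, arc probabilities, and lexicon here are invented for illustration only; the thesis's actual parameter estimation and search algorithms are not reproduced.

```python
import math

# Toy PRTN (illustrative only): each named network maps a state to arcs
# (label, probability, next_state); a label naming another network is a
# non-terminal and triggers a recursive traversal of that sub-network.
NETWORKS = {
    "S":  {0: [("NP", 1.0, 1)], 1: [("VP", 1.0, "final")]},
    "NP": {0: [("det", 0.6, 1), ("noun", 0.4, "final")],
           1: [("noun", 1.0, "final")]},
    "VP": {0: [("verb", 0.7, "final"), ("verb", 0.3, 1)],
           1: [("NP", 1.0, "final")]},
}
# Invented mini-lexicon mapping words to terminal arc labels.
LEXICON = {"the": "det", "dog": "noun", "bone": "noun",
           "barks": "verb", "chews": "verb"}

def parses(net, pos, tokens):
    """All (log-probability, end-position) pairs for traversals of `net`
    starting at token index `pos`."""
    def walk(state, pos, logp):
        if state == "final":
            return [(logp, pos)]
        results = []
        for label, p, nxt in NETWORKS[net].get(state, []):
            if label in NETWORKS:              # non-terminal: recurse
                for sub_lp, sub_end in parses(label, pos, tokens):
                    results += walk(nxt, sub_end, logp + math.log(p) + sub_lp)
            elif pos < len(tokens) and LEXICON.get(tokens[pos]) == label:
                results += walk(nxt, pos + 1, logp + math.log(p))
        return results
    return walk(0, pos, 0.0)

def best_parse(tokens):
    """Log-probability of the best S traversal consuming the whole sentence,
    or None if the sentence is not accepted."""
    full = [lp for lp, end in parses("S", 0, tokens) if end == len(tokens)]
    return max(full) if full else None

print(best_parse("the dog chews the bone".split()))  # best full-coverage score
```

The traversal enumerates every accepting path and keeps the highest-probability one that spans the whole input; a practical system would instead use dynamic programming (a Viterbi-style search), but exhaustive enumeration keeps the sketch short.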