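This paper introduces the theory of the mathematical (formal) properties of natural languages and, on that basis, offers a preliminary investigation of the formal properties of Chinese. Using the closure properties of formal languages, the paper first proves that clausal embedding in Chinese places the language beyond the regular languages. It then uses the phenomenon of Chinese A-not-A questions to prove that the formal properties of Chinese exceed those of context-free languages. This result is important for the formal theory of natural languages: it not only provides further evidence that natural languages are supra-context-free, but is also the first proof to rely directly on copying rather than on a homomorphic mapping.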
The study of the mathematical properties of natural languages has both theoretical and applied implications. In theoretical linguistics, a precise characterization of natural languages in terms of formal models, such as the Chomsky hierarchy, helps to capture the definition of possible natural languages and possible grammars for natural languages. Thus, the advantages and disadvantages of current grammatical theories can be compared and contrasted to motivate meaningful improvements. In natural language processing, on the other hand, knowledge of the formal properties of natural languages makes it possible to choose and implement attested efficient parsing algorithms developed for the corresponding formal languages.
As a first study of the formal properties of Chinese, this article determines the position of this language in the Chomsky hierarchy. Formal proofs are given to show that Chinese is neither a regular set nor a (type 2) context-free language. Thus it is concluded that Chinese is supra-context-free. These formal proofs are based on the closure properties of a type N language under substitution and under intersection with regular sets.
First, using center-embedded relative clauses, it is shown that the intersection of Chinese with a well-defined regular set is not a regular set. Since regular sets are closed under intersection, Chinese cannot be a regular set, and it requires a grammar more complex than a type 3 grammar.
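The closure argument above can be laid out schematically as follows. This is only a sketch: the symbols $C$ (the set of Chinese sentences) and $R$ (the chosen regular set), and the nested form $x^n y^n$ assumed for the intersection, are illustrative assumptions and are not the specific sets used in the proof.

```latex
\begin{align*}
&\text{Assume, for contradiction, that } C \text{ is regular, and let } R \text{ be a regular set.}\\
&\text{Regular sets are closed under intersection, so } C \cap R \text{ would be regular.}\\
&\text{Suppose center embedding gives } C \cap R = \{\, x^n y^n : n \ge 1 \,\}
  \quad \text{(assumed form).}\\
&\text{By the pumping lemma, } \{\, x^n y^n : n \ge 1 \,\} \text{ is not regular.}\\
&\text{Hence } C \text{ is not regular.}
\end{align*}
```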
Second, using three sets of data involving identical copying, it is shown that the intersections of Chinese with well-defined regular sets are not context-free languages. The data discussed are A-not-A questions, interrogative sentential objects of 〔不管〕 'to disregard,' and sentence-initial NP-not-NP 'regardless of NP.' Since context-free languages are closed under intersection with regular sets, Chinese cannot be a context-free language.
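The shape of the copying argument can be illustrated with a toy example. The sketch below is not the data or the regular sets used in the proof: the two-letter alphabet, the literal marker `not`, and the function names `in_regular_set` and `is_toy_a_not_a` are all hypothetical stand-ins. It shows only that a regular pattern "w not w'" places no constraint on the two halves, whereas requiring the halves to be identical yields strings of the form "w not w", a copy language of the kind known not to be context-free.

```python
import re

# Hypothetical regular set R: any non-empty string over {a, b},
# the marker " not ", then any non-empty string over {a, b}.
# R itself imposes no identity requirement on the two halves.
R = re.compile(r"([ab]+) not ([ab]+)")


def in_regular_set(s: str) -> bool:
    """Return True if s belongs to the (assumed) regular set R."""
    return R.fullmatch(s) is not None


def is_toy_a_not_a(s: str) -> bool:
    """Toy stand-in for grammaticality: s is in R and both halves are identical."""
    m = R.fullmatch(s)
    return m is not None and m.group(1) == m.group(2)


if __name__ == "__main__":
    for s in ["abba not abba", "ab not ba", "ab not "]:
        print(repr(s), in_regular_set(s), is_toy_a_not_a(s))
```

Strings accepted by `is_toy_a_not_a` have the form "w not w" with w of unbounded length, and it is this unbounded identity check that a context-free grammar cannot enforce.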
The proof that Chinese is supra-context-free is significant as the first attested case of a copying language. Previously, the only generally accepted case of a supra-context-free natural language was Swiss German, given in Shieber (1985), and that formal proof relies on homomorphism. The formal proof using identical copying of constituents of indefinite length in Chinese puts the fact that natural languages are supra-context-free beyond doubt. However, it is also true that the three sets of data requiring mechanisms stronger than context-free grammar all involve identical copying. Thus, Joshi's (1985) idea of 'mildly context-sensitive grammar' with limited supra-context-sensitive mechanisms and Gazdar and Pullum's (1985) suggestion of context-free based parsers for natural languages are still plausible.