The study of the mathematical properties of natural languages has both theoretical and applied implications. In theoretical linguistics, a precise characterization of natural languages in terms of formal models, such as the Chomsky hierarchy, helps to define the possible natural languages and the possible grammars for them. Thus, the advantages and disadvantages of current grammatical theories can be compared and contrasted to motivate meaningful improvements. In natural language processing, on the other hand, knowledge of the formal properties of natural languages makes it possible to choose and implement attested efficient parsing algorithms developed for the corresponding formal languages.
As a first study of the formal properties of Chinese, this article determines the position of this language in the Chomsky hierarchy. Formal proofs are given to show that Chinese is neither a regular set nor a (type 2) context-free language. Thus it is concluded that Chinese is supra-context-free. These formal proofs are based on the closure properties of a type N language under substitution and under intersection with regular sets.
First, using center-embedded relative clauses, it is shown that the intersection of Chinese with a well-defined regular set is not a regular set. Since regular sets are closed under intersection, Chinese cannot be a regular set, and it requires a grammar more complex than a type 3 grammar.
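The shape of this argument can be sketched as follows. The symbols L (Chinese viewed as a set of strings) and R (the regular set), and the pattern a^n b^n standing in for the center-embedded strings, are illustrative assumptions; the abstract does not spell out the actual regular set or the resulting intersection.

```latex
\begin{align*}
&\text{Suppose } L \text{ is regular, and let } R \text{ be a regular set.}\\
&\text{Then } L \cap R \text{ is regular, by closure under intersection.}\\
&\text{If in fact } L \cap R = \{\, a^n b^n \mid n \ge 1 \,\}, \text{ which is not regular, we have a contradiction.}\\
&\text{Hence } L \text{ is not regular.}
\end{align*}
```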
Second, using three sets of data involving identical copying, it is shown that the intersections of Chinese with well-defined regular sets are not context-free languages. The data discussed are A-not-A questions, interrogative sentential objects of 〔不管〕 'to disregard,' and sentence-initial NP-not-NP 'regardless of NP.' Since context-free languages are closed under intersection with regular sets, Chinese cannot be a context-free language.
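To make the notion of identical copying concrete, here is a minimal sketch of a membership test for strings of the form w + marker + w. The function name `is_copy_pattern`, the token lists, and the marker "not" are hypothetical and only mirror the shape of the A-not-A pattern; the article's actual data and regular sets are not reproduced here. Over an alphabet of two or more symbols, the set of all such strings is the standard example of a language that is not context-free.

```python
def is_copy_pattern(tokens, marker="not"):
    """Return True if tokens have the form w + [marker] + w, with w non-empty.

    Illustrative only: the tokens are abstract symbols, not real Chinese data.
    """
    n = len(tokens)
    # A string w + marker + w has odd length 2*len(w) + 1, and at least 3 tokens.
    if n < 3 or n % 2 == 0:
        return False
    mid = n // 2
    # The middle token must be the marker, and both halves must be identical.
    return tokens[mid] == marker and tokens[:mid] == tokens[mid + 1:]


if __name__ == "__main__":
    print(is_copy_pattern(["a", "b", "not", "a", "b"]))  # identical copy
    print(is_copy_pattern(["a", "b", "not", "b", "a"]))  # mirror image, not a copy
```

The check itself is trivial to program; the point of the formal result is that no context-free grammar generates exactly this set once w may be of indefinite length.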
The proof that Chinese is supra-context-free is significant as the first attested case of a copying language. Previously, the only generally accepted case of a supra-context-free natural language was Swiss German, given in Shieber (1985), whose formal proof relies on homomorphism. The formal proof using identical copying of constituents of indefinite length in Chinese puts the fact that natural languages are supra-context-free beyond doubt. However, it is also true that the three sets of data requiring mechanisms stronger than context-free grammar all involve identical copying. Thus, Joshi's (1985) idea of 'mildly context-sensitive grammar' with limited supra-context-sensitive mechanisms and Gazdar and Pullum's (1985) suggestion of context-free based parsers for natural languages are still plausible.