The Chomsky Hierarchy is a classification of formal languages and formal grammars , according to which they are divided into 4 types according to their conditional complexity. Proposed by the professor at the Massachusetts Institute of Technology , the linguist Noam Chomsky .
Content
- 1 Grammar Classification
- 1.1 Type 0 - Unlimited
- 1.2 Type 1 - context sensitive
- 1.3 Type 2 - context-free
- 1.4 Type 3 - Regular
- 2 Classification of languages
- 3 notes
- 4 Literature
Grammar Classification
According to Chomsky, formal grammars can be divided into four types. To assign a grammar to one or another type, it is necessary that all its rules (productions) correspond to certain schemes.
Type 0 - Unlimited
A grammar with a phrase structure G is an algebraic structure , an ordered four (V T , V N , P, S), where [1] :
- - alphabet (set) of terminal symbols - terminals ,
- - the alphabet (set) of non-terminal characters - non - terminals ,
- - dictionary , and
- - a finite set of productions (rules) of grammar,
- - initial character ( source ).
Here - the set of all lines above the alphabet , but - many non-empty lines above the alphabet .
According to Chomsky’s classification, type 0 includes unlimited grammars — phrases with phrasal structure , that is, all formal grammars without exception. Rules can be written as:
- ,
Where - any nonempty chain containing at least one nonterminal [2] symbol, and - any string of characters from the alphabet.
Due to their complexity, such grammars do not have practical application.
Type 1 - Context-sensitive
This type includes context-sensitive (KZ) grammars and non-shortening grammars. For grammar all rules have the form [3] :
- where . Such grammars are referred to as context-sensitive.
- where . Such grammars are classified as non-shortening.
These grammar classes are equivalent. They can be used in the analysis of texts in natural languages, however, when building compilers, they are practically not used because of their complexity. For context-sensitive grammars, the statement is proved: according to a certain algorithm, in a finite number of steps, you can establish whether a chain of terminal symbols belongs to a given language or not.
Type 2 - context-free
This type includes context-free (KS) grammar . For grammar all rules are of the form:
- where (for non-shortening COP grammars) or (for shortening) . That is, the grammar allows only a nonterminal symbol to appear on the left side of the rule.
KS grammars are widely used to describe the syntax of computer languages (see parsing ).
Type 3 - Regular
The third type includes regular grammars (automatic) - the simplest of formal grammars. They are context-free, but with disabilities.
All regular grammars can be divided into two equivalent classes, which for a grammar of type III will have rules of the following form:
- or where (for left-linear grammars).
- ; or where (for right-line grammars).
Regular grammars are used to describe the simplest constructions: identifiers , strings , constants , as well as assembly languages , command processors , etc.
Language Classification
Formal languages are classified according to the types of grammars with which they are defined. However, the same language can be defined by different grammars belonging to different types. In this case, it is believed that the language belongs to the simplest of them. So, a language described by a grammar with a phrase structure, context-sensitive and context-free grammars will be context-free.
As with grammars, the complexity of a language is determined by its type. The most complex are languages with a phrasal structure (natural languages can be attributed here), hereinafter referred to as KZ languages, CS languages and the simplest are regular languages.
Notes
- ↑ Cook, Base, 1990 , p. 258.264.
- ↑ Serebryakov V. A., Galochkin M. P., Gonchar D. R., Furugyan M. G. Theory and implementation of programming languages. - M.: MZ-Press, 2006. - S. 21. - ISBN 5-94073-094-9 .
- ↑ Cook, Base, 1990 , p. 268.
Literature
- Molchanov A. Yu. System software. St. Petersburg: Peter, 2006.
- John Hopcroft , Rajeev Motwani , Jeffrey Ullman . CHAPTER 5. Context-free grammars and languages // Introduction to the theory of automata, languages and computations = Introduction to Automata Theory, Languages, and Computation. - M .: "Williams" , 2002. - S. 528. - ISBN 0-201-44124-1 .
- Serebryakov V. A. , Galochkin M. P., Gonchar D. R., Furugyan M. G. Theory and implementation of programming languages - M .: MZ-Press, 2006, 2nd ed. - ISBN 5-94073-094-9
- Robin Hunter Basic Compiler Concepts = The Essence of Compilers. - M .: "Williams" , 2002. - S. 256. - ISBN 5-8459-0360-2 .
- Cook D., Baze G. Chapter 8. Languages and Grammars // Computer Mathematics = Computer Mathematics. - M .: Science. Fizmatlit, 1990 .-- 384 p. - ISBN 5-02-014216-6 .