Concatenative programming language

A concatenative programming language is a programming language based on the fact that the concatenation of two code fragments expresses their composition . In such a language, the implicit indication of function arguments is widely used (see codeless programming ), new functions are defined as a composition of functions, and concatenation is used instead of application ^[1] . Applicative programming is opposed to this approach.

Many concatenative languages use postfix notation and a stack to store arguments and return values of operations, so concatenated languages are usually referred to as stack ones. However, concatenative languages can be built on other principles, therefore the terms stack language and concatenative language are not synonymous.

Concatenative languages are simple, efficient, and easy to implement, which is why the most popular languages of this type are used in programmable calculators and for embedding in small microprocessor systems. For example, the RPL concatenation language is used in the Hewlett-Packard HP-28 and HP-48 programmable microcalculators . The Forth programming language was implemented on a variety of processors with very limited computational capabilities ^[2] , for example, it was used on a Jupiter ACE computer with a basic RAM of only 1 KB. However, because of its unusualness and difficulty in reading the source code of programs, concatenative programming languages remained niche.

The most common concatenative language is PostScript Page Description Language, a limited subset of which is used in PDF . Its interpreter is built into many high-performance printers.

Content

Definition

A programming language is called concatenative if it satisfies the following requirements:

An elementary correctly constructed language expression is a unary function with one argument and one return value.
If X and Y are correctly constructed expressions, then the concatenation of X and Y is also correctly constructed.
If Z is the concatenation of X and Y, then the value of Z is the composition of the functions X and Y.

In the concatenative language, each expression is a function. There is no special application operation; to apply a function to the arguments, it is enough to put the name of the function next to the arguments, that is, to produce a text "gluing" (concatenation). New functions are also determined by concatenation, that is, simply by a sequence of names of other functions.

Let the functions foo from two arguments and bar from one argument be given. In order to apply foo to arguments, in a prefix notation it is enough to compose a similar expression:

foo 4 5

Now apply the bar function to the result of the foo function:

bar foo 4 5

Finally, we define the baz function as the concatenation of three functions:

define baz
bar foo 4
end-define

The expression baz 8 equivalent to the expression bar foo 4 8 . That is, the name of any function can be replaced with the text of its definition and get the correct expression. This simple principle determines the specifics of concatenative languages, their advantages and disadvantages.

Features

In order for the concatenation of code fragments to always express their composition, the language should have functions of only one argument. ^[3] In this case, you can refuse to explicitly indicate the argument, therefore, using a uniform prefix or postfix notation, you can create a programming language in which the concatenation of code fragments expresses their composition, that is, a concatenative language.

One of the easiest and most effective ways to implement this approach is to use a stack . Functions take arguments from the stack and push the result onto the stack. Therefore, we can say that in concatenative stack programming languages, functions take one argument — the state of the stack, and return the new state of the stack. ^[4] Postfix notation is usually used in these languages because the stack works according to the LIFO principle.

There are other ways. For example, the function accepts the program text and returns it with some changes that reflect its work. A very simple and flexible homoiconic language can be built on this principle. ^[5] It is possible to create a language around the principle of the UNIX pipeline : each function takes a line and returns a new line after processing. ^[6] Unlike the previous principle, the text passed to the function contains only arguments, and not the entire program. These methods can work with both prefix and postfix notation.

Instead of a stack, you can use other data structures, such as a queue or deck (two-way queue) ^[7] .

The idea of a concatenative language is as follows: all expressions are functions that take some kind of the same data structure and return its new state. This data structure (stack, decks, queue, text string, etc.) plays the role of glue for “gluing” functions into a program; the state of the program is stored in it. This approach identifies the advantages and disadvantages of concatenative languages.

Advantages:

simple and effective implementation
very simple syntax
laconic record of expressions
convenience in creating domain-specific languages
functional programming capabilities: pure functions , conversion of functions from several arguments to a function from one argument, etc.
easy to use bottom-up programming method

Disadvantages:

Unusual due to the widespread (with rare exceptions) use of prefix or postfix notations
Difficulties with composing complex mathematical expressions
Disadvantages when working with functions from many arguments

Implementations

The first high-level concatenative language was Forth , developed by Charles Moore in the late 1960s and early 1970s. He used a typeless stack and was notable for its ease of implementation and high efficiency, which made it possible to implement translators even with extremely limited computing resources. Forth significantly influenced subsequent concatenative languages.

Teacher and programmer Manfred von Thun of La Trobe University , influenced by John Backus ’s famous lecture “Can programming be released from von Neumann's style?” Developed the stack programming language Joy and laid the theoretical foundations of concatenated programming. It is the language Joy was first named concatenative.

Under the influence of Forth and Joy, Glory Pestov created the stack factor programming language Factor in 2003. It is positioned as a “practical stack programming language”. Later Cat and Kitten stack concatenative languages were developed, characterized by static typing . Another modern concatenative language, min , is distinguished by its minimalist syntax and very compact implementation (about 1 megabyte); it is used in the HastySite website generator .

Of the specialized stack languages, PostScript is the most famous, which is used to describe pages and print them, as well as RPL , the programming language calculators HP-28 and HP-48 .

Working with the stack

Most concatenative programming languages use the stack to pass arguments. This is due to the ease of implementation and the properties of the stack, which is convenient to use with postfix notation. Consider working with the stack on the example of the Forth language.

In Forte, the program consists of words separated by spaces. If the word is a number, then it is put on the top of the stack. If the word is the name of the function, then this function is called (in Fort terminology, functions are called words). It takes arguments from the stack and pushes the result onto the stack. Consider the simplest program, which consists of four words:

3 4 + .

The first two words are numbers, so they are pushed onto the stack. Then the function + is called, which takes two numbers from the stack, adds them and puts the result on the stack. Then the function is called . which displays a number from the stack. Thus, arguments precede functions, so this notation is called postfix.

Difficulties and their overcoming

Concatenative general purpose languages are not widely used. This is due to their specific advantages and disadvantages, which are a consequence of the basic principle: all functions take one argument and return one value. When this is exactly what is required, then there are no problems, and concatenative languages allow you to write very simple, concise and clear programs. Suppose in a concatenative language with postfix notation there are the following functions that accept and return text strings:

  input - returns the text entered by the user
 print - displays text
 upcase - changes the lowercase letters to uppercase
 first_word - returns the first word in a line (cuts the line to the first space after the first word)

We use them to create a program that displays the user name in upper case:

input first_word upcase print

Difficulties arise when it is necessary to use functions with different numbers of arguments. In the stack language, you need to place the arguments in a certain order, and you often have to change their places. In addition, if an argument is used in a function several times, it must be duplicated. This leads to difficult to understand expressions. For example, the function

fxyz = y² + x² − |y|

in the stack language is written as follows:

f = drop dup dup × swap abs rot3 dup × swap − +

Using Variables

To overcome such difficulties in modern concatenative languages, such as Kitten and min, variables are clearly used. In Kitten, variables are declared as follows:

  -> x;  // the variable x will get the value from the stack
 5 -> y;  // y = 5
 1 2 3 -> xyz;  // x = 1;  y = 2;  z = 3

Consider the function of squaring a number. Traditionally for Kitten stack languages, it is written as follows: ^[8]

define square (Int32 -> Int32):
dup (*)

And so it can be rewritten using a variable:

define square (Int32 -> Int32):
-> x;
x * x

In this simple example, there is no particular point in this. However, if an argument or arguments are used many times in a function, the use of variables greatly simplifies writing a program and reading the source code. Fragment of the program code for displaying a song of 99 bottles of beer :

  define bottles_of_beer (Int32 -> + IO):
   -> x;
   x verse
   if (x> 1):
     (x - 1) bottles_of_beer

In the min programming language, symbols are similarly used:

x define ; символ x получит значение из стека
:x ; сокращённая запись
8 :x ; x = 8

Consider, for example, a program in the min language, which returns true if the file has a size larger than 1 megabyte and it has recently been changed:

dup dup
"\.zip$" match
swap fsize 1000000 > and
swap mtime now 3600 - >

Using the symbol, you can refuse duplication and permutation of stack elements and significantly improve the readability of the code:

:filepath
filepath "\.zip$" match
filepath fsize 1000000 >
filepath mtime now 3600 - >
and and

Using variables brings concatenative languages closer to applicative, but there are still fundamental differences between them. In concatenative languages, the programmer has a choice: use the stack (or a similar mechanism) or declare variables. In addition, the mechanism of working with variables is quite transparent and controllable. This gives flexibility and the possibility of an effective and relatively simple implementation.

Notes

↑ Jon Purdy. Why Concatenative Programming Matters .
↑ Oleg Paramonov. Aliens: strange architecture of alien computers (Rus.) .
↑ Xah Lee. What's Point-free Programing? (point-free function syntax) (English) .
↑ potomushto. Stack programming languages (rus.) .
↑ Sparist. Om: Main Page .
↑ Xah Lee. Unix Pipe as Functional Language .
↑ Deque .
↑ Jon Purdy. Why Concatenative Programming Matters .

Links

Why Concatenative Programming Matters (English)
Concatenative Languages Wiki (English)
Stack Languages (rus.) On Tru Programmer's Blog
Om Programming Language
Kitten programming language
Min programming language

[1] Jon Purdy. Why Concatenative Programming Matters .

[2] Oleg Paramonov. Aliens: strange architecture of alien computers (Rus.) .

[3] Xah Lee. What's Point-free Programing? (point-free function syntax) (English) .

[4] potomushto. Stack programming languages (rus.) .

[5] Sparist. Om: Main Page .

[6] Xah Lee. Unix Pipe as Functional Language .

[7] Deque .

[8] Jon Purdy. Why Concatenative Programming Matters .