A memory barrier (also membar, memory fence, or fence instruction) is a type of barrier instruction that instructs the compiler (when generating code) and the central processor (when executing instructions) to enforce an ordering constraint on memory accesses issued before and after the barrier. This means that all memory accesses issued before the barrier are guaranteed to be performed before the first memory access issued after the barrier.
Memory barriers are needed because most modern processors use performance optimizations that can result in out-of-order execution of instructions. Memory accesses can also be reordered by the compiler while it optimizes the use of the target processor's registers. Such reorderings usually do not affect the correctness of a program with a single thread of execution, but can cause unpredictable behavior in multi-threaded programs. The rules governing reordering depend on the architecture, and some architectures provide several types of barriers with different guarantees. For example, amd64 provides the instructions SFENCE (store fence), LFENCE (load fence), and MFENCE (memory fence) [1]. Intel Itanium provides separate acquire and release memory barriers, which account for the visibility of writes to subsequent reads from the point of view of the reader and the writer, respectively.
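For illustration, on x86-64 these instructions can be emitted from C++ through compiler intrinsics. The following is a minimal sketch assuming a compiler that provides the &lt;immintrin.h&gt; header (GCC, Clang, MSVC, ICC):

#include <immintrin.h>

void x86_fences() {
    _mm_sfence();  // SFENCE: preceding stores complete before subsequent stores
    _mm_lfence();  // LFENCE: preceding loads complete before subsequent loads
    _mm_mfence();  // MFENCE: full barrier ordering both loads and stores
}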
Memory barriers are typically used when implementing synchronization primitives, non-blocking data structures, and drivers that interact with hardware.
Example
The following example runs on two processors.
Initially, memory locations x and f both contain the value 0. The program on processor #1 loops while the value of f is zero, then prints the value of x. The program on processor #2 writes the value 42 to x and then stores 1 to f. Pseudocode for the two program fragments:
CPU #1:
while (f == 0) { }
// A barrier is needed here
print x;
CPU #2:
x = 42;
// A barrier is needed here
f = 1;
Although print is expected to always output "42", if processor #2 executes its instructions out of order and updates f before x, print can output "0". Similarly, processor #1 can read x before f, and print will again output an unexpected value. For most programs, neither situation is acceptable. A memory barrier can be inserted for processor #2 before the write to f, and another for processor #1 before the read of x [2].
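One way to express this example in C++11 is with std::atomic_thread_fence placing the two barriers. This is a sketch; the function names cpu1 and cpu2 are illustrative and stand for code running on the two processors:

#include <atomic>
#include <cstdio>

int x = 0;
std::atomic<int> f{0};

// Runs on processor #2: write the data, then raise the flag.
void cpu2() {
    x = 42;
    std::atomic_thread_fence(std::memory_order_release);  // barrier before the write to f
    f.store(1, std::memory_order_relaxed);
}

// Runs on processor #1: wait for the flag, then read the data.
void cpu1() {
    while (f.load(std::memory_order_relaxed) == 0) { }
    std::atomic_thread_fence(std::memory_order_acquire);  // barrier before the read of x
    std::printf("%d\n", x);                                // guaranteed to print 42
}

The release fence on processor #2 keeps the write to x from moving past the store to f, and the acquire fence on processor #1 keeps the read of x from moving before the load of f, which together rule out both problematic orderings described above.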
Reordering by the compiler
Memory barrier instructions act only at the hardware level. Compilers can also reorder memory accesses as part of program optimization, so separate measures against compiler reordering may be required. Such measures are only needed for data that is not protected by synchronization primitives, since the primitives themselves already include the necessary barriers.
In C and C++, the volatile keyword tells the compiler not to eliminate or reorder accesses to the marked variable; it is most often used for memory-mapped I/O. However, unlike volatile in Java, it guarantees neither atomicity nor protection against out-of-order execution by the processor. [3]
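A minimal sketch of this distinction, assuming GCC or Clang (the empty asm statement is a compiler-only barrier and emits no machine instruction):

// Compiler-only barrier (GCC/Clang inline-asm syntax): prevents the compiler
// from moving memory accesses across it, but emits no CPU fence instruction.
#define compiler_barrier() asm volatile("" ::: "memory")

volatile int ready = 0;  // volatile: the compiler must perform every access,
                         // but the processor may still reorder it
int data = 0;

void publish() {
    data = 42;
    compiler_barrier();  // keeps the compiler from reordering the two stores;
                         // on weakly ordered CPUs a hardware barrier is still required
    ready = 1;
}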
Notes
- ↑ peeterjoot. Intel memory ordering, fence instructions, and atomic operations (4.9.2009).
- ↑ Other examples are given in the double-checked locking article.
- ↑ Volatile Considered Harmful - Linux Kernel Documentation