Berkson's paradox, or Berkson's fallacy, is a result in mathematical statistics formulated by Joseph Berkson. Statement: two independent events can become conditionally dependent given that a certain third event occurs. This conclusion is counter-intuitive to many people and can therefore be described as a paradox. Berkson's paradox is most often discussed in medical statistics and biostatistics, where it appears as a complicating factor in statistical tests of association.
The same effect is known in the theory of artificial neural networks as explaining away [1] [2] .
Formal definition
- if 0 < P(A) < 1 and 0 < P(B) < 1, where A and B are some events,
- and P(A | B) = P(A) (that is, the events are independent),
- then P(A | B, C) < P(A | C), where C = A ∪ B (that is, A or B).
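The inequality in the definition can be checked exactly. A minimal sketch, using assumed illustrative probabilities P(A) = 0.1 and P(B) = 0.3 (these numbers are not from the text):

```python
# Illustrative probabilities (assumed): A and B independent.
p_a, p_b = 0.1, 0.3

# C = A ∪ B, so by independence P(C) = P(A) + P(B) - P(A)P(B).
p_c = p_a + p_b - p_a * p_b

# Since A ⊆ C, conditioning on C inflates P(A): P(A | C) = P(A) / P(C).
p_a_given_c = p_a / p_c

# Since B ⊆ C, conditioning on both B and C is the same as conditioning
# on B alone, so P(A | B, C) = P(A | B) = P(A) by independence.
p_a_given_bc = p_a

print(f"P(A | C)    = {p_a_given_c:.3f}")   # ≈ 0.270
print(f"P(A | B, C) = {p_a_given_bc:.3f}")  # = 0.100
assert p_a_given_bc < p_a_given_c           # Berkson's inequality holds
```

Given the shake-out of the algebra, the inequality reduces to P(A) < P(A)/P(A ∪ B), which holds whenever P(A ∪ B) < 1.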
Illustration from mathematical statistics
We investigate the statistics of randomly selecting postage stamps from a collection, considering two independent properties of a stamp: "rarity" and "beauty".
Suppose there are 1000 stamps, of which 300 are beautiful, 100 are rare, and 30 are both beautiful and rare. Clearly, 10% of the whole collection is rare; but among the beautiful stamps, likewise, 10% are rare. In other words, the beauty of a stamp says nothing about its rarity.
However, if from the whole collection (1000) we select all the beautiful stamps together with all the rare ones (370 stamps in total), then in this sample the share of rare stamps is already 27% (100 out of 370), while among the beautiful stamps of the sample it is still only 10% (30 out of 300). An observer analyzing such a sample (rather than the entire collection) will therefore see an apparent inverse relationship between beauty and rarity (if a stamp is beautiful, the probability that it is rare appears lower). In fact, no such relationship exists.
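The counting argument above can be reproduced directly, using the numbers given in the text:

```python
# Stamp counts from the example.
total, beautiful, rare, both = 1000, 300, 100, 30

# In the full collection, beauty tells us nothing about rarity:
# P(rare) = P(rare | beautiful) = 10%.
assert rare / total == both / beautiful == 0.10

# The sample keeps every beautiful stamp and every rare stamp;
# stamps that are both are counted once (inclusion-exclusion).
sample = beautiful + rare - both              # 370 stamps

share_rare_in_sample = rare / sample          # 100 / 370 ≈ 0.27
share_rare_if_beautiful = both / beautiful    # 30 / 300 = 0.10

print(f"P(rare | in sample)            ≈ {share_rare_in_sample:.2f}")
print(f"P(rare | beautiful, in sample) = {share_rare_if_beautiful:.2f}")
# Within the sample, beauty appears to lower the chance of rarity,
# even though the two properties are independent in the collection.
```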
The described result is mathematically correct; its "paradoxical" character stems from the peculiarities of human perception: people are intuitively inclined to believe that if two parameters are independent, they remain independent in any sample. In reality, selection bias can create conditional dependence between independent parameters, and extrapolating that dependence to the entire population leads to gross errors of analysis.
Illustration from the theory of neural networks
Consider the simplest Bayesian artificial neural network with a sigmoid activation function, containing two independent cause events for a third event, "the house shakes": an earthquake and a truck hitting the house. A bias of −10 on the earthquake neuron means that, in the absence of observations and prior knowledge, this event is about e^10 times more likely not to happen than to happen. If the earthquake event occurred but the truck event did not, then the house-shake neuron receives a total input of 0, which corresponds to an event probability (neuron activation) of 0.5. Thus, if we observe the event "the house shakes", the best explanation of this fact is the occurrence of one of the cause events. It is, however, implausible that both causes occurred at once, since the probability of their simultaneous occurrence is about e^−20. Therefore, if we observe the house shaking and also learn that one of the causes, say an earthquake, has occurred, this explains away the alternative explanation that the truck was to blame for the shaking [3] .
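The explaining-away effect can be verified by enumerating the four cause configurations and applying Bayes' rule. A minimal sketch, assuming the weights and biases implied by the text (each cause neuron has bias −10; the shake neuron receives input +10 from each active cause and has bias −10):

```python
import math
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Prior of each cause: a neuron with bias -10 and no input, ≈ e^-10.
p_cause = sigmoid(-10.0)

def p_shake(earthquake, truck):
    # Shake neuron: +10 per active cause, bias -10 (assumed from the text).
    return sigmoid(10.0 * earthquake + 10.0 * truck - 10.0)

# Unnormalized posterior over (earthquake, truck) given a shake.
joint = {}
for e, t in product((0, 1), repeat=2):
    prior = (p_cause if e else 1 - p_cause) * (p_cause if t else 1 - p_cause)
    joint[(e, t)] = prior * p_shake(e, t)
z = sum(joint.values())

p_truck_given_shake = (joint[(0, 1)] + joint[(1, 1)]) / z
p_truck_given_shake_and_quake = joint[(1, 1)] / (joint[(1, 0)] + joint[(1, 1)])

print(f"P(truck | shake)             ≈ {p_truck_given_shake:.4f}")
print(f"P(truck | shake, earthquake) ≈ {p_truck_given_shake_and_quake:.6f}")
# Learning that an earthquake occurred "explains away" the truck:
assert p_truck_given_shake_and_quake < p_truck_given_shake
```

Observing the shake alone makes the truck a plausible cause, but additionally observing the earthquake drives the truck's posterior probability down by several orders of magnitude, exactly as described above.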
Notes
- ↑ Neuroinformatics-2003. Part 1: Lectures on Neuroinformatics. — Moscow, 2003. — 187 pp., ill. — In Russian. — ISBN 5-7262-0471-9.
- ↑ D. P. Vetrov, D. A. Kropotov, A. A. Osokin. Lecture 1, "Bayesian and Markov networks". — MSU, Faculty of Computational Mathematics and Cybernetics, Department of Mathematical Methods of Forecasting; Computing Centre of the Russian Academy of Sciences. Course "Graphical Models".
- ↑ Hinton, G. E.; Osindero, S.; Teh, Y. A fast learning algorithm for deep belief nets // Neural Computation. — 2006. — Vol. 18, No. 7. — P. 1527–1554. — DOI: 10.1162/neco.2006.18.7.1527. — PMID 16764513.
Links
- Berkson, J. (1946). "Limitations of the application of fourfold table analysis to hospital data". Biometrics Bulletin, 2 (3), 47–53.
See also
- Probability
- Independence
- Bayes' theorem
- Conditional probability