Multiple correlation coefficient

The multiple correlation coefficient characterizes the strength of the linear relationship between one random variable and a set of other random variables. More precisely, if (ξ₁, ξ₂, ..., ξ_k) is a random vector in R^k, then the multiple correlation coefficient $\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}$ between ξ₁ and ξ₂, ..., ξ_k is numerically equal to the pairwise linear correlation coefficient between ξ₁ and its best linear approximation $M(\xi _{1}|\xi _{2},\ldots ,\xi _{k})$ in terms of the variables ξ₂, ..., ξ_k, that is, the linear regression of ξ₁ on ξ₂, ..., ξ_k.

Properties

The multiple correlation coefficient has the following property. Suppose that

$M\xi _{1}=M\xi _{2}=\ldots =M\xi _{k}=0$ (here M denotes the mathematical expectation) and that $\xi _{1}^{*}=\beta _{2}\xi _{2}+\beta _{3}\xi _{3}+\cdots +\beta _{k}\xi _{k}$ is the regression of ξ₁ on ξ₂, ..., ξ_k.

Then, among all linear combinations of the variables ξ₂, ..., ξ_k, the variable ξ₁ has its maximum correlation coefficient with $\xi _{1}^{*}$, and this maximum coincides with $\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}$. In this sense, the multiple correlation coefficient is a special case of the canonical correlation coefficient. For k = 2, the multiple correlation coefficient coincides with the absolute value of the pairwise linear correlation coefficient ρ₁₂ between ξ₁ and ξ₂.
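
The maximal-correlation property can be checked numerically. The sketch below is a minimal illustration for k = 3 with standardized variables (zero mean, unit variance). The correlation values and all function names are hypothetical and chosen only for this example. It compares the correlation of ξ₁ with the regression combination against the correlation with randomly drawn linear combinations of ξ₂ and ξ₃.

```python
import math
import random


def corr_with_combination(c2, c3, r12, r13, r23):
    """Correlation between xi_1 and c2*xi_2 + c3*xi_3 (standardized variables)."""
    cov = c2 * r12 + c3 * r13                      # cov(xi_1, combination)
    var = c2 * c2 + c3 * c3 + 2.0 * c2 * c3 * r23  # variance of the combination
    return cov / math.sqrt(var)


def regression_coefficients(r12, r13, r23):
    """Solve the 2x2 normal equations for the regression of xi_1 on xi_2, xi_3."""
    det = 1.0 - r23 * r23
    beta2 = (r12 - r23 * r13) / det
    beta3 = (r13 - r23 * r12) / det
    return beta2, beta3


# Hypothetical pairwise correlations (they form a valid correlation matrix).
r12, r13, r23 = 0.6, 0.5, 0.3

beta2, beta3 = regression_coefficients(r12, r13, r23)
rho_multiple = corr_with_combination(beta2, beta3, r12, r13, r23)

# Try many other linear combinations; none should beat the regression.
rng = random.Random(0)
best_other = max(
    corr_with_combination(rng.uniform(-1, 1), rng.uniform(-1, 1), r12, r13, r23)
    for _ in range(10_000)
)

print("correlation with the regression:", rho_multiple)
print("best correlation among random combinations:", best_other)
```

Random search only illustrates the property; it does not prove it. The underlying argument is the Cauchy-Schwarz inequality applied to the covariance between ξ₁ and an arbitrary linear combination.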

Calculation

The multiple correlation coefficient is calculated from the correlation matrix $\mathbf {R} =\left\{\rho _{i,j}\right\},\ i,j=1,\ldots ,k$, according to the formula

$\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}^{2}=1-{\frac {\left\vert R\right\vert }{R_{11}}}$,

where $\left\vert R\right\vert$ is the determinant of the correlation matrix and $R_{11}$ is the cofactor of the element ρ₁₁ = 1; here $0\leqslant \rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}\leqslant 1$. If $\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}=1$, then with probability 1 the value of ξ₁ coincides with a linear combination of ξ₂, ..., ξ_k, so the joint distribution of ξ₁, ξ₂, ..., ξ_k is concentrated on a hyperplane in the space R^k. On the other hand, if $\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}=0$, then all pairwise correlation coefficients vanish, ρ₁₂ = ρ₁₃ = ... = ρ_{1k} = 0, so ξ₁ is uncorrelated with each of ξ₂, ..., ξ_k. The converse is also true. The multiple correlation coefficient can also be calculated by the formula

$\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}^{2}=1-{\frac {\sigma _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}^{2}}{\sigma _{1}^{2}}}$,

where $\sigma _{1}^{2}$ is the variance of ξ₁, and $\sigma _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}^{2}=M(\xi _{1}-(\beta _{2}\xi _{2}+\beta _{3}\xi _{3}+\cdots +\beta _{k}\xi _{k}))^{2}$ is the variance of ξ₁ about the regression.
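
The two formulas of this section can be compared on a small example. The following is a minimal sketch, assuming standardized variables (so that $\sigma _{1}^{2}=1$) and a hypothetical 3×3 correlation matrix. The helper names are illustrative, and the determinant is computed by Laplace expansion, which is practical only for small k.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def rho_sq_from_determinant(R):
    """rho^2 = 1 - |R| / R_11, with R_11 the cofactor of the element rho_11."""
    R11 = det([row[1:] for row in R[1:]])
    return 1.0 - det(R) / R11


def rho_sq_from_residual_variance(R):
    """rho^2 = 1 - sigma^2_{1.2...k} / sigma_1^2 for standardized variables."""
    r = R[0][1:]                       # correlations of xi_1 with xi_2..xi_k
    R22 = [row[1:] for row in R[1:]]   # correlation matrix of xi_2..xi_k
    d = det(R22)
    m = len(r)
    # Regression coefficients from R22 * beta = r, solved by Cramer's rule.
    beta = []
    for j in range(m):
        replaced = [row[:j] + [r[i]] + row[j + 1:] for i, row in enumerate(R22)]
        beta.append(det(replaced) / d)
    # Residual variance M(xi_1 - sum_j beta_j xi_j)^2, expanded term by term.
    cross = sum(b * x for b, x in zip(beta, r))
    quad = sum(beta[i] * R22[i][j] * beta[j] for i in range(m) for j in range(m))
    residual_variance = 1.0 - 2.0 * cross + quad
    return 1.0 - residual_variance / 1.0  # sigma_1^2 = 1 by assumption


# Hypothetical correlation matrix for k = 3.
R = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.3],
    [0.5, 0.3, 1.0],
]

via_determinant = rho_sq_from_determinant(R)
via_residual = rho_sq_from_residual_variance(R)
print(via_determinant, via_residual)
```

For larger k, a numerical linear algebra routine would be a better choice than Laplace expansion and Cramer's rule, which are used here only to keep the sketch dependency-free.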

Sample multiple correlation coefficient

The sample analogue of the multiple correlation coefficient is the quantity $r_{1\bullet 2,\ldots ,k}={\sqrt {1-{\frac {s_{1\bullet 2,\ldots ,k}^{2}}{s_{1}^{2}}}}}$, where $s_{1\bullet 2,\ldots ,k}^{2}$ and $s_{1}^{2}$ are estimates of $\sigma _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}^{2}$ and $\sigma _{1}^{2}$ obtained from a sample of size n. To test the null hypothesis of no correlation, the distribution of the statistic $r_{1\bullet 2,\ldots ,k}$ is used. If the sample is drawn from a multivariate normal distribution and $\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}=0$, then the quantity $r_{1\bullet 2,\ldots ,k}^{2}$ has a beta distribution with parameters ${\frac {k-1}{2}}$ and ${\frac {n-k}{2}}$. For the case $\rho _{\xi _{1}\bullet \xi _{2},\ldots ,\xi _{k}}\neq 0$, the distribution of $r_{1\bullet 2,\ldots ,k}^{2}$ is known but is almost never used because of its unwieldy form.
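
The null distribution can be illustrated by simulation. The sketch below is a rough Monte Carlo check for k = 3, not a formal test. It draws independent standard normal samples, so the true multiple correlation coefficient is zero, and compares the average of the squared sample coefficient with the mean a/(a+b) of a beta distribution with a = (k−1)/2 and b = (n−k)/2. It assumes that both variance estimates use the same divisor, in which case the squared sample coefficient equals the determinant formula applied to the sample correlation matrix. The sample size, number of replications, and function names are hypothetical choices for the example.

```python
import math
import random


def sample_corr(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_multiple_corr_sq(x1, x2, x3):
    """Squared sample multiple correlation of x1 on x2, x3 (k = 3).

    Closed form of 1 - |R| / R_11 for a 3x3 sample correlation matrix.
    """
    r12 = sample_corr(x1, x2)
    r13 = sample_corr(x1, x3)
    r23 = sample_corr(x2, x3)
    return (r12 ** 2 + r13 ** 2 - 2.0 * r12 * r13 * r23) / (1.0 - r23 ** 2)


def simulate_null(n, reps, seed=0):
    """Simulate r^2 for independent standard normal columns (true rho = 0)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
        values.append(sample_multiple_corr_sq(*cols))
    return values


n, k = 20, 3
values = simulate_null(n, reps=2000)
mean_r2 = sum(values) / len(values)

a, b = (k - 1) / 2, (n - k) / 2
beta_mean = a / (a + b)  # mean of Beta(a, b), equal to (k - 1) / (n - 1)

print("simulated mean of r^2:", mean_r2)
print("mean of the beta distribution:", beta_mean)
```

Agreement of the means is only a sanity check. An actual test of the null hypothesis would compare the observed value of the squared coefficient with a quantile of the beta distribution, which requires a beta distribution function from a statistics library.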
