Search for the best projection

Projection pursuit (the search for the best projection) is a statistical method that looks for a projection of multidimensional data at which a chosen projection quality function reaches its maximum.

Scope

Although people perceive visual information well, they can analyze only low-dimensional images. For multidimensional data, direct visual perception is therefore impractical. The problem is addressed by examining two- or three-dimensional projections of the data. The projections are visualized with the usual tools: scatter plots, histograms, box plots and so on.

Method Idea

One must first decide which projection counts as the most “interesting”. One approach to automating this choice rests on the following considerations (for clarity, we consider projection onto a straight line). A projection of the data is considered “uninteresting” if its histogram resembles a normal density, as in Figure 1.

Projections with a bimodal distribution, as in Figure 2, are considered “interesting”.

The bimodal distribution is more interesting because it points to the possible presence of two clusters in the data.

Projection Index

To automate the search for the most “interesting” projection, one uses a specially chosen quality function, often called an index. The projection that maximizes the quality function is declared the best. A search for a projection of multidimensional data based on maximizing some projection quality function is called projection pursuit. The choice of index determines how useful the result will be. Several options for the projection quality function are described below.

Search for one-dimensional projection

We introduce the notation. Let $X$ be a $p$-dimensional random vector; we further assume that it is centered, i.e. $E(X)=0$.

Let $a$ denote a $p$-dimensional numerical vector. Finding this vector is the task of projection pursuit; the projection then takes the form $a^{T}X$.

In this case the data matrix has dimension $p\times 1$: the vector $X$ is itself the data matrix.

The index $I(a)$ can then be defined as the variance of the linear combination $a^{T}X$ under the additional normalization condition $a^{T}a=1$.
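The variance index can be illustrated with a short sketch. It assumes, hypothetically, that the data are available as a sample matrix `X` whose rows are observations; the function names are illustrative only. Under the constraint $a^{T}a=1$ the sample variance of the projection is maximized by the leading eigenvector of the sample covariance matrix, so with this particular index the search reduces to finding the first principal component.

```python
import numpy as np

def variance_index(a, X):
    """Sample variance of the projection X @ a (rows of X are observations)."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)              # enforce a^T a = 1
    return float(np.var(X @ a, ddof=1))

def best_variance_direction(X):
    """Unit vector maximizing variance_index: leading eigenvector of the covariance."""
    Xc = X - X.mean(axis=0)                # center the data so that E(X) = 0
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov) # eigenvalues come in ascending order
    return eigvecs[:, -1]

rng = np.random.default_rng(0)
# Hypothetical sample: 500 points in 3 dimensions, stretched along the second axis.
X = rng.normal(size=(500, 3)) * np.array([1.0, 3.0, 0.5])

a_best = best_variance_direction(X)
print(a_best, variance_index(a_best, X))
```

In this example the direction found should lie close to the second coordinate axis, along which the simulated data were stretched.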

Common Projection Indices

Friedman and Tukey Approach

Jerome Friedman and John Tukey (1974) measured how “interesting” the distribution of the multidimensional variable $X$ is by considering the index

$I_{FT,\;h}(a)=n^{-1}\sum _{i=1}^{n}{\hat {f}}_{h,\;a}^{2}(a^{T}X_{i})$,

where ${\hat {f}}_{h,\;a}$ denotes the kernel density estimate computed from the projected data,

${\hat {f}}_{h,\;a}(z)=n^{-1}\sum _{j=1}^{n}K_{h}(z-a^{T}X_{j})$.

If the multidimensional random variable $X$ has a normal distribution, then every projection $z=a^{T}X$ has a standard normal distribution, provided that $\|a\|=1$ and $X$ is centered and sphered (has identity covariance). Changes in $I_{FT,\;h}(a)$ with respect to $a$ therefore indicate deviations from normality.
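The following is a minimal sketch of how the index formula above could be evaluated. Several choices are assumptions not fixed by the text: a Gaussian kernel with $K_{h}(u)=K(u/h)/h$, a bandwidth of `h=0.3`, standardization of each projection to zero mean and unit variance, and the simulated two-cluster data. The function names are hypothetical.

```python
import numpy as np

def kernel_density(z, sample, h):
    """Kernel estimate f_hat(z) = n^-1 * sum_j K_h(z - sample_j)."""
    u = (np.asarray(z)[:, None] - sample[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel (assumption)
    return K.mean(axis=1) / h                        # K_h(u) = K(u / h) / h

def friedman_tukey_index(a, X, h=0.3):
    """Mean of the squared density estimate at the projected points."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)            # enforce ||a|| = 1
    z = X @ a
    z = (z - z.mean()) / z.std()         # standardize the projection (assumption)
    f_hat = kernel_density(z, z, h)
    return float(np.mean(f_hat**2))

rng = np.random.default_rng(1)
n = 400
# Hypothetical data: two clusters along the first axis, Gaussian noise along the second.
clusters = rng.choice([-2.0, 2.0], size=n) + 0.5 * rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([clusters, noise])

ft_bimodal = friedman_tukey_index([1, 0], X)   # direction that separates the clusters
ft_normal = friedman_tukey_index([0, 1], X)    # direction with a Gaussian projection
print(ft_bimodal, ft_normal)
```

On data of this kind the bimodal direction should receive the larger index value, in line with the idea that a normal-looking projection is the uninteresting one.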

The Hodges and Lehmann Approach

The projection index is defined as $\int f^{2}$, where $f$ is the distribution density of the random variable $X$. The density usually cannot be computed explicitly, so it is more convenient to use an estimate of it instead.

Hodges and Lehmann (1956) showed that if $E(X)=0$ and $D(X)=1$, then the minimum of $\int f^{2}$ is attained at the Epanechnikov density, which has the form $f(z)=\max {\{0,\;c(b^{2}-z^{2})\}}$, where $c={\frac {3}{20{\sqrt {5}}}}$ and $b={\sqrt {5}}$. This is a parabolic density that equals zero outside the interval $(-{\sqrt {5}},\;{\sqrt {5}})$. With such an index the Epanechnikov density is therefore the least interesting, and a large index value indicates a large deviation from the parabolic shape.
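As a quick numerical check of the constants $c$ and $b$, the sketch below integrates the parabolic density on a fine grid. The grid and the plain Riemann sum are illustrative choices, not part of the method.

```python
import numpy as np

b = np.sqrt(5.0)
c = 3.0 / (20.0 * np.sqrt(5.0))

def parabolic_density(z):
    """f(z) = max(0, c * (b^2 - z^2)) with the constants given in the text."""
    return np.maximum(0.0, c * (b**2 - np.asarray(z, dtype=float)**2))

# Riemann sum on a grid that covers the support (-sqrt(5), sqrt(5)).
z = np.linspace(-3.0, 3.0, 600001)
dz = z[1] - z[0]
f = parabolic_density(z)

total_mass = np.sum(f) * dz
mean = np.sum(z * f) * dz
variance = np.sum(z**2 * f) * dz
print(total_mass, mean, variance)
```

The three values should be close to 1, 0 and 1, confirming that the density is properly normalized, centered and has unit variance, as the conditions $E(X)=0$ and $D(X)=1$ require.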

An alternative to the Hodges-Lehmann index is based on entropy, i.e. on $\int (-f\log f)$.

If $E(X)=0$ and $D(X)=1$, then the minimum of the index $\int (f\log f)$ is attained at the standard normal density. This property is an advantage of the index over the previous one.

Indeed, intuition suggests that the normal distribution is “less interesting” than the Epanechnikov distribution. Thus, by using the index $\int (f\log f)$ we measure the deviation of the distribution from normality.
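The minimizing property of the normal density can be checked numerically. The sketch below approximates $\int f\log f$ for the standard normal density and for the unit-variance parabolic density defined earlier; the grid and the Riemann sum are illustrative assumptions.

```python
import numpy as np

def neg_entropy(f, z):
    """Riemann-sum approximation of the integral of f * log f, taking 0 * log 0 as 0."""
    dz = z[1] - z[0]
    safe = np.where(f > 0, f, 1.0)       # log(1) = 0 wherever the density vanishes
    return float(np.sum(f * np.log(safe)) * dz)

z = np.linspace(-10.0, 10.0, 400001)

normal = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

c, b = 3.0 / (20.0 * np.sqrt(5.0)), np.sqrt(5.0)
parabolic = np.maximum(0.0, c * (b**2 - z**2))

index_normal = neg_entropy(normal, z)
index_parabolic = neg_entropy(parabolic, z)
print(index_normal, index_parabolic)
```

The value for the normal density should be close to $-\tfrac{1}{2}\log(2\pi e)\approx -1.4189$ and smaller than the value for the parabolic density.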

Fisher's Approach

As another index one can consider the Fisher information, $\int (f')^{2}/f$.

Computing the entropy index involves considerable computational difficulties and takes a long time, which makes it inconvenient in practice.

Jones and Sibson's Approach

Jones and Sibson (1987) suggested writing deviations from the normal density as $f(x)=\varphi (x)\{1+\varepsilon (x)\}$, where the function $\varepsilon$ satisfies the conditions

$\int \varphi (u)\varepsilon (u)u^{r}du=0$ for $r=0,\;1,\;2$, which ensure that $f$ integrates to one and has zero mean and unit variance.

To simplify the calculation of the Jones-Sibson index, it is convenient to pass to the cumulants $\kappa _{3}=\mu _{3}=E(X^{3})$ and $\kappa _{4}=\mu _{4}-3=E(X^{4})-3$.

Since the standard normal density satisfies $\kappa _{3}=\kappa _{4}=0$, the index should at least include information up to the level of symmetric deviations from normality ($\kappa _{3}$ or $\kappa _{4}$ nonzero). The simplest such index is a positive definite quadratic form in $\kappa _{3}$ and $\kappa _{4}$. The index must also be invariant under a change of sign of the data: $a^{T}X$ and $-a^{T}X$ must show the same deviation from normality. Note that $\kappa _{3}$ is odd, i.e. $\kappa _{3}(a^{T}X)=-\kappa _{3}(-a^{T}X)$, whereas $\kappa _{4}$ is even, i.e. $\kappa _{4}(a^{T}X)=\kappa _{4}(-a^{T}X)$. A quadratic form in $\kappa _{3}$ and $\kappa _{4}$ that measures deviation from normality therefore cannot include the mixed term $\kappa _{3}\kappa _{4}$.

Therefore, the index proposed by Jones and Sibson is

$I_{JS}(a)=\{\kappa _{3}^{2}(a^{T}X)+\kappa _{4}^{2}(a^{T}X)/4\}/12$.

This index in effect measures the difference $\int f\log f-\int \varphi \log \varphi$.
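Because the Jones-Sibson index depends only on sample moments, it is cheap to evaluate. The sketch below computes it for a standardized projection and scans directions in the plane for the maximum. The simulated two-cluster data, the function name and the simple angle scan are assumptions made for illustration; practical implementations typically use a numerical optimizer instead.

```python
import numpy as np

def jones_sibson_index(a, X):
    """Moment index {kappa_3^2 + kappa_4^2 / 4} / 12 of the standardized projection."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)             # enforce ||a|| = 1
    z = X @ a
    z = (z - z.mean()) / z.std()          # zero mean, unit variance
    kappa3 = np.mean(z**3)                # third cumulant
    kappa4 = np.mean(z**4) - 3.0          # fourth cumulant
    return float((kappa3**2 + kappa4**2 / 4.0) / 12.0)

rng = np.random.default_rng(2)
n = 400
# Hypothetical data: two clusters along the first axis, Gaussian noise along the second.
clusters = rng.choice([-2.0, 2.0], size=n) + 0.5 * rng.normal(size=n)
X = np.column_stack([clusters, rng.normal(size=n)])

# Scan unit vectors a = (cos t, sin t); t and t + pi give the same index value.
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
scores = [jones_sibson_index([np.cos(t), np.sin(t)], X) for t in angles]
best_angle = angles[int(np.argmax(scores))]
print(best_angle, max(scores))
```

The best angle should be close to 0 (or, equivalently, to $\pi$), i.e. the direction that separates the two clusters, and the index takes the same value for $a$ and $-a$, as the sign invariance requires.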

Implementation Issues

Projection pursuit can give interesting results, but its implementation has several drawbacks. First, the results are difficult to interpret correctly. Second, the method can take a long time to run and require a large amount of memory. In addition, the projection that a human would visually judge best still differs from the solution found by the algorithm. These problems have not yet been resolved: there is no “canonical” version of the method, and research is ongoing.

Literature

Peter J. Huber, Projection Pursuit (invited paper), Harvard University, The Annals of Statistics, 13, No. 2 (1985), 435-475.
Jerome H. Friedman, Exploratory projection pursuit, J. Amer. Statist. Assoc. 82 (1987), 249-266.