
Bayesian linear regression

Bayesian linear regression is an approach to linear regression in which the statistical analysis is carried out within the framework of Bayesian inference. When the regression model has errors with a normal distribution, and a particular form of prior distribution is assumed, explicit results are available for the posterior probability distributions of the model parameters.

Content

  • 1 Model Configuration
  • 2 Regression with conjugate priors
    • 2.1 Conjugate prior distribution
    • 2.2 Posterior distribution
    • 2.3 Model validity
  • 3 Other cases
  • 4 See also
  • 5 Notes
  • 6 Literature
  • 7 Software

Model Configuration

Consider the standard linear regression problem, in which for $i = 1, \ldots, n$ we specify the mean of the conditional distribution of $y_i$ given a $k \times 1$ predictor vector $\mathbf{x}_i$:

$$y_i = \mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta} + \epsilon_i,$$

where $\boldsymbol{\beta}$ is a $k \times 1$ vector, and the $\epsilon_i$ are independent and identically distributed normal random variables:

$$\epsilon_i \sim N(0, \sigma^2).$$

This corresponds to the following likelihood function :

$$\rho(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}, \sigma^2) \propto (\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathrm{T}}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})}.$$

The ordinary least squares solution is to estimate the coefficient vector as

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathrm{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{y},$$

where $\mathbf{X}$ is an $n \times k$ design matrix, each row of which is a predictor vector $\mathbf{x}_i^{\mathrm{T}}$, and $\mathbf{y}$ is the column vector $[y_1 \; \cdots \; y_n]^{\mathrm{T}}$.

This is the frequentist approach, and it assumes that there are enough measurements to say something meaningful about $\boldsymbol{\beta}$. In the Bayesian approach, the data are supplemented with additional information in the form of a prior probability distribution. Prior beliefs about the parameters are combined with the likelihood function of the data according to Bayes' theorem to obtain posterior beliefs about the parameters $\boldsymbol{\beta}$ and $\sigma$. The prior can take various forms depending on the field of application and the information available a priori.
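As an illustration, the least squares estimate above can be computed directly. A minimal sketch, assuming NumPy; the data sizes and coefficient values below are made up for the example:

```python
import numpy as np

# Synthetic data: n = 50 observations, k = 3 predictors (hypothetical values).
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Ordinary least squares estimate: beta_hat = (X^T X)^{-1} X^T y.
# np.linalg.solve is preferred over forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true, up to sampling error
```

Solving the normal equations with `solve` rather than `inv` is the usual numerically safer choice.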

Regression with conjugate priors

Conjugate prior distribution

For an arbitrary prior distribution, there may be no analytical solution for the posterior distribution. In this section we consider a so-called conjugate prior, for which the posterior distribution can be derived analytically.

A prior distribution $\rho(\boldsymbol{\beta}, \sigma^2)$ is conjugate to this likelihood function if it has the same functional form with respect to $\boldsymbol{\beta}$ and $\sigma$. Since the log-likelihood is quadratic in $\boldsymbol{\beta}$, we rewrite it so that the likelihood becomes normal in $(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})$. We write

$$(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathrm{T}}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^{\mathrm{T}}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) + (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X})(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}).$$

The likelihood is now rewritten as

$$\rho(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}, \sigma^2) \propto (\sigma^2)^{-v/2} e^{-\frac{v s^2}{2\sigma^2}} (\sigma^2)^{-(n-v)/2} e^{-\frac{1}{2\sigma^2}(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X})(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})},$$

where

$$v s^2 = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^{\mathrm{T}}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \quad \text{and} \quad v = n - k,$$

where $k$ is the number of regression coefficients.

This suggests the form of the prior:

$$\rho(\boldsymbol{\beta}, \sigma^2) = \rho(\sigma^2)\, \rho(\boldsymbol{\beta} \mid \sigma^2),$$

where $\rho(\sigma^2)$ is an inverse-gamma distribution:

$$\rho(\sigma^2) \propto (\sigma^2)^{-\frac{v_0}{2} - 1} e^{-\frac{v_0 s_0^2}{2\sigma^2}}.$$

In the notation of the inverse-gamma distribution, this is the density of $\text{Inv-Gamma}(a_0, b_0)$ with $a_0 = \tfrac{v_0}{2}$ and $b_0 = \tfrac{1}{2} v_0 s_0^2$, where $v_0$ and $s_0^2$ are the prior values of $v$ and $s^2$, respectively. Equivalently, this density can be described as a scaled inverse chi-squared distribution, $\text{Scale-inv-}\chi^2(v_0, s_0^2)$.

Further, the conditional prior density $\rho(\boldsymbol{\beta} \mid \sigma^2)$ is a normal distribution,

$$\rho(\boldsymbol{\beta} \mid \sigma^2) \propto (\sigma^2)^{-\frac{k}{2}} e^{-\frac{1}{2\sigma^2}(\boldsymbol{\beta} - \boldsymbol{\mu}_0)^{\mathrm{T}} \boldsymbol{\Lambda}_0 (\boldsymbol{\beta} - \boldsymbol{\mu}_0)}.$$

In the notation of the normal distribution, the conditional prior is $\mathcal{N}\left(\boldsymbol{\mu}_0, \sigma^2 \boldsymbol{\Lambda}_0^{-1}\right)$.

Posterior distribution

Given this prior, the posterior distribution can be expressed as

$$\begin{aligned}
\rho(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) &\propto \rho(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}, \sigma^2)\, \rho(\boldsymbol{\beta} \mid \sigma^2)\, \rho(\sigma^2) \\
&\propto (\sigma^2)^{-n/2} e^{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathrm{T}}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})} \\
&\times (\sigma^2)^{-k/2} e^{-\frac{1}{2\sigma^2}(\boldsymbol{\beta} - \boldsymbol{\mu}_0)^{\mathrm{T}} \boldsymbol{\Lambda}_0 (\boldsymbol{\beta} - \boldsymbol{\mu}_0)} \\
&\times (\sigma^2)^{-(a_0 + 1)} e^{-\frac{b_0}{\sigma^2}}.
\end{aligned}$$

After some transformations [1], the posterior can be rewritten so that the posterior mean $\boldsymbol{\mu}_n$ of the parameter vector $\boldsymbol{\beta}$ is expressed in terms of the least squares estimate $\hat{\boldsymbol{\beta}}$ and the prior mean $\boldsymbol{\mu}_0$, with the strength of the prior expressed by the prior precision matrix $\boldsymbol{\Lambda}_0$:

$$\boldsymbol{\mu}_n = (\mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0)^{-1} (\mathbf{X}^{\mathrm{T}}\mathbf{X} \hat{\boldsymbol{\beta}} + \boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0).$$

To confirm that $\boldsymbol{\mu}_n$ is indeed the posterior mean, the quadratic terms in the exponent can be rearranged into a quadratic form in $\boldsymbol{\beta} - \boldsymbol{\mu}_n$ [2]:

$$(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathrm{T}}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + (\boldsymbol{\beta} - \boldsymbol{\mu}_0)^{\mathrm{T}} \boldsymbol{\Lambda}_0 (\boldsymbol{\beta} - \boldsymbol{\mu}_0) = (\boldsymbol{\beta} - \boldsymbol{\mu}_n)^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0)(\boldsymbol{\beta} - \boldsymbol{\mu}_n) + \mathbf{y}^{\mathrm{T}}\mathbf{y} - \boldsymbol{\mu}_n^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0)\boldsymbol{\mu}_n + \boldsymbol{\mu}_0^{\mathrm{T}} \boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0.$$

Now the posterior distribution can be expressed as a normal distribution times an inverse-gamma distribution:

$$\begin{aligned}
\rho(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) &\propto (\sigma^2)^{-\frac{k}{2}} e^{-\frac{1}{2\sigma^2}(\boldsymbol{\beta} - \boldsymbol{\mu}_n)^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0)(\boldsymbol{\beta} - \boldsymbol{\mu}_n)} \\
&\times (\sigma^2)^{-\frac{n + 2a_0}{2} - 1} e^{-\frac{2b_0 + \mathbf{y}^{\mathrm{T}}\mathbf{y} - \boldsymbol{\mu}_n^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0)\boldsymbol{\mu}_n + \boldsymbol{\mu}_0^{\mathrm{T}}\boldsymbol{\Lambda}_0\boldsymbol{\mu}_0}{2\sigma^2}}.
\end{aligned}$$

Therefore, the posterior distribution can be parameterized as follows.

$$\rho(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) \propto \rho(\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, \mathbf{X})\, \rho(\sigma^2 \mid \mathbf{y}, \mathbf{X}),$$

where the two factors correspond to the densities of the $\mathcal{N}\left(\boldsymbol{\mu}_n, \sigma^2 \boldsymbol{\Lambda}_n^{-1}\right)$ and $\text{Inv-Gamma}(a_n, b_n)$ distributions, with parameters given by

$$\boldsymbol{\Lambda}_n = \mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0, \qquad \boldsymbol{\mu}_n = \boldsymbol{\Lambda}_n^{-1}(\mathbf{X}^{\mathrm{T}}\mathbf{X}\hat{\boldsymbol{\beta}} + \boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0),$$
$$a_n = a_0 + \frac{n}{2}, \qquad b_n = b_0 + \frac{1}{2}\left(\mathbf{y}^{\mathrm{T}}\mathbf{y} + \boldsymbol{\mu}_0^{\mathrm{T}}\boldsymbol{\Lambda}_0\boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^{\mathrm{T}}\boldsymbol{\Lambda}_n\boldsymbol{\mu}_n\right).$$

This can be interpreted as Bayesian learning, in which the parameters are updated according to the following equalities:

$$\boldsymbol{\mu}_n = (\mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0)^{-1}(\boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0 + \mathbf{X}^{\mathrm{T}}\mathbf{X}\hat{\boldsymbol{\beta}}) = (\mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0)^{-1}(\boldsymbol{\Lambda}_0 \boldsymbol{\mu}_0 + \mathbf{X}^{\mathrm{T}}\mathbf{y}),$$
$$\boldsymbol{\Lambda}_n = \mathbf{X}^{\mathrm{T}}\mathbf{X} + \boldsymbol{\Lambda}_0,$$
$$a_n = a_0 + \frac{n}{2},$$
$$b_n = b_0 + \frac{1}{2}\left(\mathbf{y}^{\mathrm{T}}\mathbf{y} + \boldsymbol{\mu}_0^{\mathrm{T}}\boldsymbol{\Lambda}_0\boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^{\mathrm{T}}\boldsymbol{\Lambda}_n\boldsymbol{\mu}_n\right).$$
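The update equalities above translate directly into code. A minimal sketch, assuming NumPy; the prior values and synthetic data below are hypothetical:

```python
import numpy as np

def bayes_linreg_update(X, y, mu0, Lambda0, a0, b0):
    """Conjugate normal/inverse-gamma update for Bayesian linear regression.

    Returns the posterior parameters (mu_n, Lambda_n, a_n, b_n) of
    N(mu_n, sigma^2 * Lambda_n^{-1}) x Inv-Gamma(a_n, b_n).
    """
    n = len(y)
    Lambda_n = X.T @ X + Lambda0
    mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    return mu_n, Lambda_n, a_n, b_n

# Example with a weak prior (all values made up for illustration).
rng = np.random.default_rng(1)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)
mu0, Lambda0 = np.zeros(k), 0.01 * np.eye(k)
mu_n, Lambda_n, a_n, b_n = bayes_linreg_update(X, y, mu0, Lambda0, a0=1.0, b0=1.0)
print(mu_n)             # posterior mean, close to [2, -1]
print(b_n / (a_n - 1))  # posterior mean of sigma^2, close to 0.25
```

With a weak prior the posterior mean is pulled only slightly away from the least squares estimate; increasing $\boldsymbol{\Lambda}_0$ pulls it toward $\boldsymbol{\mu}_0$.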

Model validity

The model validity $p(\mathbf{y} \mid m)$ is the probability of the data given the model $m$. It is also known as the marginal likelihood, the model evidence, and the prior predictive density. Here the model is defined by the likelihood function $p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}, \sigma)$ and the prior distribution of the parameters, $p(\boldsymbol{\beta}, \sigma)$. The model validity captures in a single number how well the model explains the observations. The validity of the Bayesian linear regression model presented in this section can be used to compare competing linear models by Bayesian model comparison. These models may differ in the number and choice of predictor variables, as well as in their priors on the model parameters. Model complexity is automatically taken into account, since the validity marginalizes out the parameters by integrating $p(\mathbf{y}, \boldsymbol{\beta}, \sigma \mid \mathbf{X})$ over all possible values of $\boldsymbol{\beta}$ and $\sigma$:

$$p(\mathbf{y} \mid m) = \int p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}, \sigma)\, p(\boldsymbol{\beta}, \sigma)\, d\boldsymbol{\beta}\, d\sigma.$$

This integral can be computed analytically, and the solution is given by the following equality [3]:

$$p(\mathbf{y} \mid m) = \frac{1}{(2\pi)^{n/2}} \sqrt{\frac{\det(\boldsymbol{\Lambda}_0)}{\det(\boldsymbol{\Lambda}_n)}} \cdot \frac{b_0^{a_0}}{b_n^{a_n}} \cdot \frac{\Gamma(a_n)}{\Gamma(a_0)}.$$

Here $\Gamma$ denotes the gamma function. Since we have chosen a conjugate prior, the marginal likelihood can also be computed easily by evaluating the following equality at arbitrary values of $\boldsymbol{\beta}$ and $\sigma$:

$$p(\mathbf{y} \mid m) = \frac{p(\boldsymbol{\beta}, \sigma \mid m)\, p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}, \sigma, m)}{p(\boldsymbol{\beta}, \sigma \mid \mathbf{y}, \mathbf{X}, m)}.$$

Note that this equality is nothing more than a rearrangement of Bayes' theorem. Substituting the formulas for the prior, the likelihood, and the posterior, and simplifying the resulting expression, leads to the analytical expression given above.
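In practice the analytical expression is best evaluated in log space, using log-determinants and the log-gamma function, to avoid overflow for large $n$. A sketch under the same conjugate model, with hypothetical priors and synthetic data in which only the first predictor matters:

```python
import numpy as np
from scipy.special import gammaln

def log_evidence(X, y, mu0, Lambda0, a0, b0):
    """Log marginal likelihood log p(y | m) for the conjugate
    normal/inverse-gamma Bayesian linear regression model."""
    n = len(y)
    Lambda_n = X.T @ X + Lambda0
    mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    _, logdet0 = np.linalg.slogdet(Lambda0)
    _, logdetn = np.linalg.slogdet(Lambda_n)
    return (-0.5 * n * np.log(2.0 * np.pi)
            + 0.5 * (logdet0 - logdetn)
            + a0 * np.log(b0) - a_n * np.log(b_n)
            + gammaln(a_n) - gammaln(a0))

# Compare two nested models: first predictor only vs. both predictors.
rng = np.random.default_rng(2)
n = 80
X_full = rng.normal(size=(n, 2))
y = X_full @ np.array([1.0, 0.0]) + rng.normal(scale=0.5, size=n)
evidences = [log_evidence(X_full[:, :k], y, np.zeros(k), np.eye(k), 1.0, 1.0)
             for k in (1, 2)]
print(evidences)
```

Comparing nested models on the same $\mathbf{y}$, the model with the higher log evidence is preferred; the Occam penalty for a superfluous predictor enters through the $\det(\boldsymbol{\Lambda}_0)/\det(\boldsymbol{\Lambda}_n)$ and $b_n^{a_n}$ factors.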

Other cases

In the general case, it may be impossible or impractical to derive the posterior distribution analytically. However, the posterior can be approximated by an approximate Bayesian inference method such as Monte Carlo sampling [4] or variational Bayes.
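For the conjugate model of the previous section, exact Monte Carlo samples of the posterior are available by composition: draw $\sigma^2$ from the inverse-gamma marginal, then $\boldsymbol{\beta}$ from the conditional normal. This provides a useful baseline for the approximate methods mentioned above; a sketch with hypothetical posterior parameters:

```python
import numpy as np

def sample_posterior(mu_n, Lambda_n, a_n, b_n, size, rng):
    """Joint samples (beta, sigma^2) from the conjugate posterior:
    sigma^2 ~ Inv-Gamma(a_n, b_n), then
    beta | sigma^2 ~ N(mu_n, sigma^2 * Lambda_n^{-1})."""
    # If G ~ Gamma(shape=a, scale=1/b), then 1/G ~ Inv-Gamma(a, b).
    sigma2 = 1.0 / rng.gamma(shape=a_n, scale=1.0 / b_n, size=size)
    # Cholesky factor of the covariance scale Lambda_n^{-1}.
    L = np.linalg.cholesky(np.linalg.inv(Lambda_n))
    z = rng.normal(size=(size, len(mu_n)))
    beta = mu_n + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, sigma2

# Hypothetical posterior parameters for a 2-coefficient model.
mu_n = np.array([1.0, -1.0])
Lambda_n = 100.0 * np.eye(2)
betas, sig2 = sample_posterior(mu_n, Lambda_n, a_n=50.0, b_n=12.5,
                               size=10_000, rng=np.random.default_rng(3))
print(betas.mean(axis=0))  # close to mu_n
print(sig2.mean())         # close to b_n / (a_n - 1) = 12.5 / 49
```

Such draws can be pushed through any function of the parameters (predictions, intervals) to approximate posterior expectations by sample averages.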

The special case $\boldsymbol{\mu}_0 = 0,\ \boldsymbol{\Lambda}_0 = c\mathbf{E}$ (where $\mathbf{E}$ is the identity matrix) is called ridge regression.
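A quick numerical check of this equivalence: with $\boldsymbol{\mu}_0 = 0$ and $\boldsymbol{\Lambda}_0 = c\mathbf{E}$, the posterior mean is $(\mathbf{X}^{\mathrm{T}}\mathbf{X} + c\mathbf{E})^{-1}\mathbf{X}^{\mathrm{T}}\mathbf{y}$, which satisfies the first-order condition of the ridge objective. The data sizes and penalty value below are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, 0.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=30)
c = 2.0

# Posterior mean under mu_0 = 0, Lambda_0 = c*E:
# mu_n = (X^T X + c E)^{-1} X^T y, i.e. exactly the ridge estimator.
mu_n = np.linalg.solve(X.T @ X + c * np.eye(4), X.T @ y)

# Ridge regression minimizes ||y - X b||^2 + c ||b||^2, whose gradient
# 2 X^T (X b - y) + 2 c b must vanish at the minimizer.
grad = X.T @ (X @ mu_n - y) + c * mu_n
print(np.max(np.abs(grad)))  # zero up to floating-point error
```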

A similar analysis can be performed for the general case of multivariate regression, part of which provides for Bayesian estimation of covariance matrices: see Bayesian multivariate linear regression.

See also

  • Tikhonov regularization method

Notes

  1. ↑ Intermediate calculations can be found in O'Hagan (1994) at the beginning of the chapter on linear models.
  2. ↑ Intermediate calculations can be found in Fahrmeir et al. (2009) on p. 188.
  3. ↑ Intermediate calculations can be found in O'Hagan (1994) on p. 257.
  4. ↑ Carlin and Louis (2008) and Gelman et al. (2003) explain how to use sampling methods for Bayesian linear regression.

Literature

  • Box G. E. P., Tiao G. C. Bayesian Inference in Statistical Analysis. Wiley, 1973. ISBN 0-471-57428-7.
  • Carlin B. P., Louis T. A. Bayesian Methods for Data Analysis, 3rd ed. Boca Raton, FL: Chapman and Hall/CRC, 2008. ISBN 1-58488-697-8.
  • Fahrmeir L., Kneib T., Lang S. Regression. Modelle, Methoden und Anwendungen. 2nd ed. Heidelberg: Springer, 2009. ISBN 978-3-642-01836-7. DOI: 10.1007/978-3-642-01837-4.
  • Fornalski K. W., Parzych G., Pylak M., Satuła D., Dobrzyński L. Application of Bayesian reasoning and the Maximum Entropy Method to some reconstruction problems // Acta Physica Polonica A. 2010. Vol. 117, no. 6. P. 892-899. DOI: 10.12693/APhysPolA.117.892.
  • Fornalski K. W. Applications of the robust Bayesian regression analysis // International Journal of Society Systems Science. 2015. Vol. 7, no. 4. P. 314-333. DOI: 10.1504/IJSSS.2015.07.07233.
  • Gelman A., Carlin J. B., Stern H. S., Rubin D. B. Bayesian Data Analysis, 2nd ed. Boca Raton, FL: Chapman and Hall/CRC, 2003. ISBN 1-58488-388-X.
  • Goldstein M., Wooff D. Bayes Linear Statistics, Theory & Methods. Wiley, 2007. ISBN 978-0-470-01562-9.
  • Minka T. P. (2001) Bayesian Linear Regression, Microsoft research web page.
  • Rossi P. E., Allenby G. M., McCulloch R. Bayesian Statistics and Marketing. John Wiley & Sons, 2006. ISBN 0470863676.
  • O'Hagan A. Bayesian Inference. 1st ed. Halsted, 1994. Vol. 2B of Kendall's Advanced Theory of Statistics. ISBN 0-340-52922-9.
  • Sivia D. S., Skilling J. Data Analysis - A Bayesian Tutorial. 2nd ed. Oxford University Press, 2006.
  • Walter G., Augustin T. Bayesian Linear Regression — Different Conjugate Models and Their (In)Sensitivity to Prior-Data Conflict // Technical Report Number 069, Department of Statistics, University of Munich. 2009.

Software

  • Python
    • Bayesian Type-II Linear Regression code , tutorial
    • ARD Linear Regression code
    • ARD Linear Regression with kernelized features code , tutorial
Source - https://ru.wikipedia.org/w/index.php?title=Bayesian_linear_regression&oldid=97433012

