ANOVA Calculations in Multiple Linear Regression


New format available! This reference is now available in a new format that offers faster page load, improved display for calculations and images, more targeted search and the latest content available as a PDF. As of September 2023, this Reliawiki page will not continue to be updated. Please update all links and bookmarks to the latest reference at help.reliasoft.com/reference/experiment_design_and_analysis

Reference Appendix A: ANOVA Calculations in Multiple Linear Regression

The sums of squares for the analysis of variance in multiple linear regression are obtained using the same relations as those in simple linear regression, except that matrix notation is preferred in the case of multiple linear regression. For both the simple and multiple linear regression models, once the observed and fitted values are available, the sums of squares are calculated in an identical manner. The difference between the two models lies in the way the fitted values are obtained. In a simple linear regression model, the fitted values are obtained from a model having only one predictor variable. In multiple linear regression analysis, the model used to obtain the fitted values contains more than one predictor variable.


Total Sum of Squares

Recall from Chapter 4 on simple linear regression that the total sum of squares, [math]\displaystyle{ SS_r }[/math], is obtained using the following equation:

[math]\displaystyle{ SS_r = \sum_{i=1}^n (y_i-\bar{y})^2 }[/math]
[math]\displaystyle{ = \sum_{i=1}^n y_i^2-\frac{\left(\sum_{i=1}^n y_i\right)^2}{n} }[/math]

The first term, [math]\displaystyle{ \sum_{i=1}^n y_i^2 }[/math], can be expressed in matrix notation using the vector of observed values, y, as:

[math]\displaystyle{ \sum_{i=1}^n y_i^2 = y'y }[/math]

If [math]\displaystyle{ J }[/math] represents an n x n square matrix of ones, then the second term, [math]\displaystyle{ \left(\sum_{i=1}^n y_i\right)^2/n }[/math], can be expressed in matrix notation as:

[math]\displaystyle{ \frac{\left(\sum_{i=1}^n y_i\right)^2}{n}=(\frac{1}{n}) y'Jy }[/math]

Therefore, the total sum of squares in matrix notation is:

[math]\displaystyle{ SS_r = y'y - (\frac{1}{n}) y'Jy }[/math]
[math]\displaystyle{ = y'[I-(\frac{1}{n})J]y }[/math]

where [math]\displaystyle{ I }[/math] is the identity matrix of order [math]\displaystyle{ n }[/math].
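As a quick numerical check of the matrix form of the total sum of squares, the following is a minimal sketch in Python with NumPy. The response vector is made-up illustrative data, and the variable names are assumptions chosen for this example rather than anything defined in the reference.

```python
import numpy as np

# Hypothetical response vector (illustrative data, not taken from the reference)
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = y.size

I = np.eye(n)         # identity matrix of order n
J = np.ones((n, n))   # n x n matrix of ones

# Matrix form: y'[I - (1/n)J]y
ss_total_matrix = y @ (I - J / n) @ y

# Scalar form: sum of squared deviations from the sample mean
ss_total_scalar = np.sum((y - y.mean()) ** 2)

print(ss_total_matrix, ss_total_scalar)
```

Both expressions should agree up to floating-point rounding, since they are two ways of writing the same quantity.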

Model Sum of Squares

Similarly, the model sum of squares or the regression sum of squares, [math]\displaystyle{ SS_R }[/math], can be obtained in matrix notation as:

[math]\displaystyle{ SS_R=\sum_{i=1}^n \hat{y}_i^2-\frac{\left(\sum_{i=1}^n y_i\right)^2}{n} }[/math]
[math]\displaystyle{ =\hat{y}'\hat{y}-(\frac{1}{n}) y'Jy }[/math]
[math]\displaystyle{ =y'[H-(\frac{1}{n})J]y }[/math]

where [math]\displaystyle{ H }[/math] is the hat matrix and is calculated using [math]\displaystyle{ H=X(X'X)^{-1}X' }[/math].
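The hat-matrix form of the regression sum of squares can be sketched in Python with NumPy as follows. The data set (six observations, two predictors) is hypothetical, and the design matrix is assumed to carry a leading column of ones for the intercept. The explicit inverse is used only to mirror the formula for [math]\displaystyle{ H }[/math]; in practice a least-squares solver is usually preferred numerically.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors (illustrative only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
n = y.size

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones(n), x1, x2])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))

# Matrix form: y'[H - (1/n)J]y
ss_regression_matrix = y @ (H - J / n) @ y

# Scalar form: squared deviations of the fitted values from the mean of y
y_hat = H @ y
ss_regression_scalar = np.sum((y_hat - y.mean()) ** 2)

print(ss_regression_matrix, ss_regression_scalar)
```

The step from [math]\displaystyle{ \hat{y}'\hat{y} }[/math] to [math]\displaystyle{ y'Hy }[/math] relies on [math]\displaystyle{ H }[/math] being symmetric and idempotent, which the sketch's data can also be used to confirm numerically.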

Error Sum of Squares

The error sum of squares or the residual sum of squares, [math]\displaystyle{ SS_E }[/math], is obtained in the matrix notation from the vector of residuals, [math]\displaystyle{ e }[/math], as:

[math]\displaystyle{ SS_E=e'e }[/math]
[math]\displaystyle{ =(y-\hat{y})'(y-\hat{y}) }[/math]
[math]\displaystyle{ =y'(I-H)y }[/math]
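The residual form and the matrix form of the error sum of squares can be compared numerically with a short Python/NumPy sketch. The data are hypothetical and the model is assumed to include an intercept column. The sketch also checks the standard decomposition of the total sum of squares into its regression and error parts, which follows from [math]\displaystyle{ I-(\frac{1}{n})J = [H-(\frac{1}{n})J] + (I-H) }[/math].

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors (illustrative only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
n = y.size

X = np.column_stack([np.ones(n), x1, x2])   # intercept column assumed
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
I = np.eye(n)
J = np.ones((n, n))

# Residual form: e'e with e = y - y_hat
e = y - H @ y
ss_error_residuals = e @ e

# Matrix form: y'(I - H)y
ss_error_matrix = y @ (I - H) @ y

# Total and regression sums of squares, for the decomposition check
ss_total = y @ (I - J / n) @ y
ss_regression = y @ (H - J / n) @ y

print(ss_error_residuals, ss_error_matrix)
print(ss_total, ss_regression + ss_error_matrix)
```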

Mean Squares

Mean squares are obtained by dividing the sums of squares by their associated degrees of freedom. The number of degrees of freedom associated with the total sum of squares, [math]\displaystyle{ SS_r }[/math], is ([math]\displaystyle{ n-1 }[/math]) since there are [math]\displaystyle{ n }[/math] observations in all, but one degree of freedom is lost in the calculation of the sample mean, [math]\displaystyle{ \bar{y} }[/math]. The total mean square is:

[math]\displaystyle{ MS_r=\frac{SS_r}{n-1} }[/math]

The number of degrees of freedom associated with the regression sum of squares, [math]\displaystyle{ SS_R }[/math], is [math]\displaystyle{ k }[/math]. There are ([math]\displaystyle{ k+1 }[/math]) degrees of freedom associated with a regression model with ([math]\displaystyle{ k+1 }[/math]) coefficients, [math]\displaystyle{ \beta_0 }[/math], [math]\displaystyle{ \beta_1 }[/math], [math]\displaystyle{ \beta_2 }[/math], ..., [math]\displaystyle{ \beta_k }[/math]. However, one degree of freedom is lost because the deviations, ([math]\displaystyle{ \hat{y}_i-\bar{y} }[/math]), are subject to the constraint that they must sum to zero ([math]\displaystyle{ \sum_{i=1}^n (\hat{y}_i-\bar{y})=0 }[/math]). The regression mean square is:

[math]\displaystyle{ MS_R=\frac{SS_R}{k} }[/math]

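The number of degrees of freedom associated with the error sum of squares is ([math]\displaystyle{ n-(k+1) }[/math]), as there are [math]\displaystyle{ n }[/math] observations in all, but ([math]\displaystyle{ k+1 }[/math]) degrees of freedom are lost in obtaining the estimates of [math]\displaystyle{ \beta_0 }[/math], [math]\displaystyle{ \beta_1 }[/math], [math]\displaystyle{ \beta_2 }[/math], ..., [math]\displaystyle{ \beta_k }[/math] to calculate the predicted values, [math]\displaystyle{ \hat{y}_i }[/math]. The error mean square is:

[math]\displaystyle{ MS_E=\frac{SS_E}{n-(k+1)} }[/math]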

The error mean square, [math]\displaystyle{ MS_E }[/math], is an estimate of the variance, [math]\displaystyle{ \sigma^2 }[/math], of the random error terms, [math]\displaystyle{ \epsilon_i }[/math].

[math]\displaystyle{ \hat{\sigma}^2=MS_E }[/math]

Calculation of the Statistic [math]\displaystyle{ F_0 }[/math]

Once the mean squares [math]\displaystyle{ MS_R }[/math] and [math]\displaystyle{ MS_E }[/math] are known, the statistic to test the significance of regression can be calculated as follows:


[math]\displaystyle{ F_0=\frac{MS_R}{MS_E} }[/math]
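Putting the pieces together, the mean squares and [math]\displaystyle{ F_0 }[/math] can be computed from a design matrix and a response vector. The following Python/NumPy sketch assumes the first column of the design matrix is a column of ones, so that the number of predictors is the number of columns minus one. The function name `anova_f_statistic` and the data are hypothetical and used only for illustration.

```python
import numpy as np

def anova_f_statistic(X, y):
    """Return (MS_R, MS_E, F_0) for a design matrix X whose first column is ones."""
    n, p = X.shape                 # p = k + 1 coefficients, including the intercept
    k = p - 1                      # number of predictor variables
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    I = np.eye(n)
    J = np.ones((n, n))

    ss_regression = y @ (H - J / n) @ y    # regression sum of squares
    ss_error = y @ (I - H) @ y             # error sum of squares

    ms_regression = ss_regression / k      # k degrees of freedom
    ms_error = ss_error / (n - (k + 1))    # n - (k + 1) degrees of freedom
    return ms_regression, ms_error, ms_regression / ms_error

# Hypothetical data: n = 6 observations, k = 2 predictors (illustrative only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
X = np.column_stack([np.ones(y.size), x1, x2])

ms_r, ms_e, f0 = anova_f_statistic(X, y)
print(ms_r, ms_e, f0)
```

Here `ms_e` is also the estimate of the error variance, and `f0` would be compared against an F distribution with 2 and 3 degrees of freedom for this particular data set.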