One Factor Designs


New format available! This reference is now available in a new format that offers faster page load, improved display for calculations and images, more targeted search and the latest content available as a PDF. As of September 2023, this Reliawiki page will not continue to be updated. Please update all links and bookmarks to the latest reference at help.reliasoft.com/reference/experiment_design_and_analysis

Chapter 5: One Factor Designs



Available Software:
Weibull++



As explained in Simple Linear Regression Analysis and Multiple Linear Regression Analysis, the analysis of observational studies involves the use of regression models. The analysis of experimental studies involves the use of analysis of variance (ANOVA) models. For a comparison of the two models see Fitting ANOVA Models. In single factor experiments, ANOVA models are used to compare the mean response values at different levels of the factor. Each level of the factor is investigated to see if the response is significantly different from the response at other levels of the factor. The analysis of single factor experiments is often referred to as one-way ANOVA.

To illustrate the use of ANOVA models in the analysis of experiments, consider a single factor experiment where the analyst wants to see if the surface finish of certain parts is affected by the speed of a lathe machine. Data is collected for three speeds (or three treatments). Each treatment is replicated four times. Therefore, this experiment design is balanced. Surface finish values recorded using randomization are shown in the following table.


Surface finish values for three speeds of a lathe machine.


The ANOVA model for this experiment can be stated as follows:


$$Y_{ij} = \mu_i + \epsilon_{ij}$$


The ANOVA model assumes that the response at each factor level, $i$, is the sum of the mean response at the $i$th level, $\mu_i$, and a random error term, $\epsilon_{ij}$. The subscript $i$ denotes the factor level while the subscript $j$ denotes the replicate. If there are $n_a$ levels of the factor and $m$ replicates at each level, then $i = 1, 2, ..., n_a$ and $j = 1, 2, ..., m$. The random error terms, $\epsilon_{ij}$, are assumed to be normally and independently distributed with a mean of zero and a variance of $\sigma^2$. Therefore, the response at each level can be thought of as a normally distributed population with a mean of $\mu_i$ and a constant variance of $\sigma^2$. The equation given above is referred to as the means model.

The means model can also be written using $\mu_i = \mu + \tau_i$, where $\mu$ represents the overall mean and $\tau_i$ represents the effect due to the $i$th treatment.


$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}$$


Such an ANOVA model is called the effects model. In the effects model the treatment effects, $\tau_i$, represent the deviations from the overall mean, $\mu$. Therefore, the following constraint exists on the $\tau_i$:


$$\sum_{i=1}^{n_a} \tau_i = 0$$
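
As a quick numerical check of this constraint, the sketch below estimates the treatment effects for the lathe example as deviations of the treatment averages from the overall average. Only some of the readings are quoted in this chapter; the remaining entries (replicates 3 and 4 at 600, replicate 3 at 700) are assumptions, chosen so that the data reproduce the treatment average and sums of squares reported later in the text.

```python
from statistics import mean

# Surface finish readings by lathe speed.
# ASSUMPTION: 600 -> last two values, 700 -> third value are not quoted
# in the text; they were chosen to match the reported summary statistics.
data = {
    500: [6, 13, 7, 8],
    600: [13, 16, 10, 14],
    700: [23, 20, 16, 18],
}

level_means = {speed: mean(values) for speed, values in data.items()}

# For a balanced design the average of the treatment means equals the
# grand average of all observations.
overall_mean = mean(list(level_means.values()))

# Estimated effects: deviation of each treatment mean from the overall mean.
effects = {speed: m - overall_mean for speed, m in level_means.items()}

for speed in data:
    print(speed, round(level_means[speed], 4), round(effects[speed], 4))
print("sum of effects:", round(sum(effects.values()), 10))
```

The estimated effects sum to zero up to floating point error, which is why only two of the three effects need to be estimated in the sections that follow.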


Fitting ANOVA Models

To fit ANOVA models and carry out hypothesis testing in single factor experiments, it is convenient to express the effects model in the form $y = X\beta + \epsilon$ (the form used for multiple linear regression models in Multiple Linear Regression Analysis). This can be done as shown next. Using the effects model, the ANOVA model for the single factor experiment in the first table can be expressed as:


$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}$$


where $\mu$ represents the overall mean and $\tau_i$ represents the $i$th treatment effect. There are three treatments in the first table (500, 600 and 700). Therefore, there are three treatment effects, $\tau_1$, $\tau_2$ and $\tau_3$. The following constraint exists for these effects:


$$\sum_{i=1}^{3} \tau_i = 0 \quad \text{or} \quad \tau_1 + \tau_2 + \tau_3 = 0$$


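For the first treatment, the ANOVA model for the single factor experiment in the above table can be written as:


$$Y_{1j} = \mu + \tau_1 + 0 \cdot \tau_2 + 0 \cdot \tau_3 + \epsilon_{1j}$$


Using $\tau_3 = -(\tau_1 + \tau_2)$, the model for the first treatment is:


$$Y_{1j} = \mu + \tau_1 + 0 \cdot \tau_2 - 0 \cdot (\tau_1 + \tau_2) + \epsilon_{1j} \quad \text{or} \quad Y_{1j} = \mu + \tau_1 + 0 \cdot \tau_2 + \epsilon_{1j}$$


Models for the second and third treatments can be obtained in a similar way. The models for the three treatments are:


$$\begin{aligned}
\text{First Treatment:} \quad & Y_{1j} = 1 \cdot \mu + 1 \cdot \tau_1 + 0 \cdot \tau_2 + \epsilon_{1j} \\
\text{Second Treatment:} \quad & Y_{2j} = 1 \cdot \mu + 0 \cdot \tau_1 + 1 \cdot \tau_2 + \epsilon_{2j} \\
\text{Third Treatment:} \quad & Y_{3j} = 1 \cdot \mu - 1 \cdot \tau_1 - 1 \cdot \tau_2 + \epsilon_{3j}
\end{aligned}$$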


The coefficients of the treatment effects $\tau_1$ and $\tau_2$ can be expressed using two indicator variables, $x_1$ and $x_2$, as follows:


$$\begin{aligned}
\text{Treatment Effect } \tau_1: \quad & x_1 = 1, \ x_2 = 0 \\
\text{Treatment Effect } \tau_2: \quad & x_1 = 0, \ x_2 = 1 \\
\text{Treatment Effect } \tau_3: \quad & x_1 = -1, \ x_2 = -1
\end{aligned}$$


Using the indicator variables $x_1$ and $x_2$, the ANOVA model for the data in the first table now becomes:

$$Y = \mu + x_1 \tau_1 + x_2 \tau_2 + \epsilon$$


The equation can be rewritten by including subscripts $i$ (for the level of the factor) and $j$ (for the replicate number) as:

$$Y_{ij} = \mu + x_{i1} \tau_1 + x_{i2} \tau_2 + \epsilon_{ij}$$


The equation given above represents the "regression version" of the ANOVA model.


Treat Numerical Factors as Qualitative or Quantitative?

It can be seen from the equation given above that in an ANOVA model each factor is treated as a qualitative factor. In the present example the factor, lathe speed, is a quantitative factor with three levels. But the ANOVA model treats this factor as a qualitative factor with three levels. Therefore, two indicator variables, x1 and x2, are required to represent this factor.

Note that in a regression model a variable can be treated either as a quantitative or as a qualitative variable. The factor, lathe speed, would be used as a quantitative factor and represented with a single predictor variable in a regression model. For example, if a first order model were to be fitted to the data in the first table, then the regression model would take the form $Y_{ij} = \beta_0 + \beta_1 x_{i1} + \epsilon_{ij}$. If a second order regression model were to be fitted, the regression model would be $Y_{ij} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i1}^2 + \epsilon_{ij}$. Notice that unlike these regression models, the regression version of the ANOVA model does not make any assumption about the nature of the relationship between the response and the factor being investigated.

The choice of treating a particular factor as a quantitative or qualitative variable depends on the objective of the experimenter. In the case of the data of the first table, the objective of the experimenter is to compare the levels of the factor to see if a change in the levels leads to a significant change in the response. The objective is not to make predictions on the response for a given level of the factor. Therefore, the factor is treated as a qualitative factor in this case. If the objective of the experimenter were prediction or optimization, the experimenter would focus on aspects such as the nature of the relationship between the factor, lathe speed, and the response, surface finish. In that case the factor would be modeled as a quantitative factor in order to make accurate predictions.
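
The difference between the two treatments of the factor can be illustrated with a short sketch that fits both a first order regression (speed as one quantitative predictor) and the ANOVA model (one mean per level) to the lathe data. NumPy is used here only as a convenient tool, and the readings not quoted in this chapter (replicates 3 and 4 at 600, replicate 3 at 700) are assumed values consistent with the reported summary statistics.

```python
import numpy as np

# Speeds and surface finish readings, grouped by level.
# ASSUMPTION: the values 10, 14 (speed 600) and 16 (speed 700) are not
# quoted in the text and are chosen to match the reported statistics.
speeds = np.repeat([500.0, 600.0, 700.0], 4)
finish = np.array([6, 13, 7, 8, 13, 16, 10, 14, 23, 20, 16, 18], dtype=float)

# Quantitative treatment: first order model Y = b0 + b1 * speed.
slope, intercept = np.polyfit(speeds, finish, 1)
sse_linear = np.sum((finish - (intercept + slope * speeds)) ** 2)

# Qualitative treatment: the fitted value is the average of each level,
# so no shape is imposed on the speed/finish relationship.
level_means = {s: finish[speeds == s].mean() for s in np.unique(speeds)}
fitted_anova = np.array([level_means[s] for s in speeds])
sse_anova = np.sum((finish - fitted_anova) ** 2)

print("slope, intercept:", slope, intercept)
print("SSE (first order regression):", sse_linear)
print("SSE (ANOVA, one mean per level):", sse_anova)
```

The ANOVA fit can never have a larger error sum of squares than the first order fit on the same levels, because it uses two parameters for the factor instead of one; the regression fit, in exchange, can be evaluated at speeds that were not run.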

Expression of the ANOVA Model as y = Xβ + ϵ

The regression version of the ANOVA model can be expanded for the three treatments and four replicates of the data in the first table as follows:


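$$\begin{aligned}
Y_{11} &= 6 = \mu + 1 \cdot \tau_1 + 0 \cdot \tau_2 + \epsilon_{11} && \text{Level 1, Replicate 1} \\
Y_{21} &= 13 = \mu + 0 \cdot \tau_1 + 1 \cdot \tau_2 + \epsilon_{21} && \text{Level 2, Replicate 1} \\
Y_{31} &= 23 = \mu - 1 \cdot \tau_1 - 1 \cdot \tau_2 + \epsilon_{31} && \text{Level 3, Replicate 1} \\
Y_{12} &= 13 = \mu + 1 \cdot \tau_1 + 0 \cdot \tau_2 + \epsilon_{12} && \text{Level 1, Replicate 2} \\
Y_{22} &= 16 = \mu + 0 \cdot \tau_1 + 1 \cdot \tau_2 + \epsilon_{22} && \text{Level 2, Replicate 2} \\
Y_{32} &= 20 = \mu - 1 \cdot \tau_1 - 1 \cdot \tau_2 + \epsilon_{32} && \text{Level 3, Replicate 2} \\
&\ \ \vdots \\
Y_{34} &= 18 = \mu - 1 \cdot \tau_1 - 1 \cdot \tau_2 + \epsilon_{34} && \text{Level 3, Replicate 4}
\end{aligned}$$


The corresponding matrix notation is:


$$y = X\beta + \epsilon$$


where


$$y = \begin{bmatrix} Y_{11} \\ Y_{21} \\ Y_{31} \\ Y_{12} \\ Y_{22} \\ \vdots \\ Y_{34} \end{bmatrix} = X\beta + \epsilon = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{21} \\ \epsilon_{31} \\ \epsilon_{12} \\ \epsilon_{22} \\ \vdots \\ \epsilon_{34} \end{bmatrix}$$


Thus:


$$\begin{bmatrix} 6 \\ 13 \\ 23 \\ 13 \\ 16 \\ \vdots \\ 18 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{21} \\ \epsilon_{31} \\ \epsilon_{12} \\ \epsilon_{22} \\ \vdots \\ \epsilon_{34} \end{bmatrix}$$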


The matrices y, X and β are used in the calculation of the sum of squares in the next section. The data in the first table can be entered into the DOE folio as shown in the figure below.


Single factor experiment design for the data in the first table.
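
The matrix form above can be sketched in a few lines of NumPy. The block builds the response vector and the effect-coded design matrix in the same observation order (replicate by replicate), then estimates $\mu$, $\tau_1$ and $\tau_2$ by ordinary least squares. The variable names are illustrative, and the readings not quoted in this chapter are assumptions consistent with the reported results.

```python
import numpy as np

# One tuple per replicate: (level 1, level 2, level 3).
# ASSUMPTION: 10 and 14 (level 2) and 16 (level 3) are not quoted in the
# text; they are chosen to reproduce the reported summary statistics.
replicates = [
    (6, 13, 23),
    (13, 16, 20),
    (7, 10, 16),
    (8, 14, 18),
]
y = np.array([value for rep in replicates for value in rep], dtype=float)

# Effect coding: columns are (intercept, x1, x2) for levels 1, 2 and 3.
coding = np.array([[1, 1, 0],
                   [1, 0, 1],
                   [1, -1, -1]], dtype=float)
X = np.tile(coding, (len(replicates), 1))  # 12 x 3 design matrix

# Least squares estimate of beta = (mu, tau1, tau2).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
mu, tau1, tau2 = beta
tau3 = -(tau1 + tau2)  # recovered from the zero-sum constraint

print("mu, tau1, tau2, tau3:", mu, tau1, tau2, tau3)
```

With this coding the estimate of $\mu$ is the overall average and each estimated effect is the corresponding treatment average minus the overall average.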

Hypothesis Test in Single Factor Experiments

The hypothesis test in single factor experiments examines the ANOVA model to see if the response at any level of the investigated factor is significantly different from that at the other levels. If this is not the case and the response at all levels is not significantly different, then it can be concluded that the investigated factor does not affect the response. The test on the ANOVA model is carried out by checking whether any of the treatment effects, $\tau_i$, are non-zero. The test is similar to the test of significance of regression mentioned in Simple Linear Regression Analysis and Multiple Linear Regression Analysis in the context of regression models. The hypothesis statements for this test are:


$$\begin{aligned}
H_0 &: \tau_1 = \tau_2 = \cdots = \tau_{n_a} = 0 \\
H_1 &: \tau_i \neq 0 \text{ for at least one } i
\end{aligned}$$


The test for $H_0$ is carried out using the following statistic:


$$F_0 = \frac{MS_{TR}}{MS_E}$$


where $MS_{TR}$ represents the mean square for the ANOVA model and $MS_E$ is the error mean square. Note that in the case of ANOVA models we use the notation $MS_{TR}$ (treatment mean square) for the model mean square and $SS_{TR}$ (treatment sum of squares) for the model sum of squares (instead of $MS_R$, regression mean square, and $SS_R$, regression sum of squares, used in Simple Linear Regression Analysis and Multiple Linear Regression Analysis). This is done to indicate that the model under consideration is the ANOVA model and not the regression model. The calculations to obtain $MS_{TR}$ and $SS_{TR}$ are identical to the calculations to obtain $MS_R$ and $SS_R$ explained in Multiple Linear Regression Analysis.


Calculation of the Statistic F0

The sums of squares needed to obtain the statistic $F_0$ can be calculated as explained in Multiple Linear Regression Analysis. Using the data in the first table, the model sum of squares, $SS_{TR}$, can be calculated as:


$$SS_{TR} = y' \left[ H - \left( \frac{1}{n_a m} \right) J \right] y = \begin{bmatrix} 6 & 13 & \cdots & 18 \end{bmatrix} \begin{bmatrix} 0.1667 & -0.0833 & \cdots & -0.0833 \\ -0.0833 & 0.1667 & \cdots & -0.0833 \\ \vdots & \vdots & \ddots & \vdots \\ -0.0833 & -0.0833 & \cdots & 0.1667 \end{bmatrix} \begin{bmatrix} 6 \\ 13 \\ \vdots \\ 18 \end{bmatrix} = 232.1667$$


In the previous equation, $n_a$ represents the number of levels of the factor, $m$ represents the replicates at each level, $y$ represents the vector of the response values, $H$ represents the hat matrix and $J$ represents the matrix of ones. (For details on each of these terms, refer to Multiple Linear Regression Analysis.) Since two effect terms, $\tau_1$ and $\tau_2$, are used in the regression version of the ANOVA model, the degrees of freedom associated with the model sum of squares, $SS_{TR}$, is two.


$$dof(SS_{TR}) = 2$$


The total sum of squares, $SS_T$, can be obtained as follows:


$$SS_T = y' \left[ I - \left( \frac{1}{n_a m} \right) J \right] y = \begin{bmatrix} 6 & 13 & \cdots & 18 \end{bmatrix} \begin{bmatrix} 0.9167 & -0.0833 & \cdots & -0.0833 \\ -0.0833 & 0.9167 & \cdots & -0.0833 \\ \vdots & \vdots & \ddots & \vdots \\ -0.0833 & -0.0833 & \cdots & 0.9167 \end{bmatrix} \begin{bmatrix} 6 \\ 13 \\ \vdots \\ 18 \end{bmatrix} = 306.6667$$


In the previous equation, $I$ is the identity matrix. Since there are 12 data points in all, the number of degrees of freedom associated with $SS_T$ is 11.


$$dof(SS_T) = 11$$


Knowing $SS_T$ and $SS_{TR}$, the error sum of squares is:


$$SS_E = SS_T - SS_{TR} = 306.6667 - 232.1667 = 74.5$$


The number of degrees of freedom associated with $SS_E$ is:


$$dof(SS_E) = dof(SS_T) - dof(SS_{TR}) = 11 - 2 = 9$$
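
The quadratic forms above can be evaluated directly. The following is a minimal NumPy sketch, not the software's implementation; it rebuilds the effect-coded design matrix, forms the hat matrix, and computes the three sums of squares with their degrees of freedom. The readings not quoted in this chapter (10 and 14 at level 2, 16 at level 3) are assumed values.

```python
import numpy as np

# Responses in the order Y11, Y21, Y31, Y12, ... (replicate by replicate).
# ASSUMPTION: 10, 14 (level 2) and 16 (level 3) are not quoted in the text.
y = np.array([6, 13, 23, 13, 16, 20, 7, 10, 16, 8, 14, 18], dtype=float)

# Effect-coded design matrix, repeated for the four replicates.
coding = np.array([[1, 1, 0], [1, 0, 1], [1, -1, -1]], dtype=float)
X = np.tile(coding, (4, 1))

n = len(y)                              # n_a * m = 12 observations
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
J = np.ones((n, n))                     # matrix of ones
I = np.eye(n)                           # identity matrix

ss_tr = y @ (H - J / n) @ y             # treatment (model) sum of squares
ss_t = y @ (I - J / n) @ y              # total sum of squares
ss_e = ss_t - ss_tr                     # error sum of squares

dof_tr = X.shape[1] - 1                 # number of effect terms
dof_t = n - 1
dof_e = dof_t - dof_tr

print(ss_tr, ss_t, ss_e)
print(dof_tr, dof_t, dof_e)
```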


The test statistic can now be calculated using the equation given in Hypothesis Test in Single Factor Experiments as:


$$f_0 = \frac{MS_{TR}}{MS_E} = \frac{SS_{TR}/dof(SS_{TR})}{SS_E/dof(SS_E)} = \frac{232.1667/2}{74.5/9} = 14.0235$$


The p value for the statistic based on the F distribution with 2 degrees of freedom in the numerator and 9 degrees of freedom in the denominator is:


$$p \text{ value} = 1 - P(F \leq f_0) = 1 - 0.9983 = 0.0017$$


Assuming that the desired significance level is 0.1, since p value < 0.1, H0 is rejected and it is concluded that change in the lathe speed has a significant effect on the surface finish. The Weibull++ DOE folio displays these results in the ANOVA table, as shown in the figure below. The values of S and R-sq are the standard error and the coefficient of determination for the model, respectively. These values are explained in Multiple Linear Regression Analysis and indicate how well the model fits the data. The values in the figure below indicate that the fit of the ANOVA model is fair.


ANOVA table for the data in the first table.
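
The test statistic and its p value can be reproduced from the sums of squares quoted above. This sketch assumes SciPy is available and uses `scipy.stats.f.sf`, the upper tail probability of the F distribution; the variable names are illustrative.

```python
from scipy import stats

# Sums of squares and degrees of freedom as quoted in the text.
ss_tr, dof_tr = 232.1667, 2
ss_e, dof_e = 74.5, 9

ms_tr = ss_tr / dof_tr        # treatment mean square
ms_e = ss_e / dof_e           # error mean square
f0 = ms_tr / ms_e             # test statistic

# Upper tail probability: 1 - P(F <= f0) with (2, 9) degrees of freedom.
p_value = stats.f.sf(f0, dof_tr, dof_e)

alpha = 0.1                   # significance level used in the example
reject_h0 = p_value < alpha

print("f0 =", round(f0, 4))
print("p value =", round(p_value, 4))
print("reject H0:", reject_h0)
```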

Confidence Interval on the ith Treatment Mean

The response at each treatment of a single factor experiment can be assumed to be a normal population with a mean of $\mu_i$ and a variance of $\sigma^2$, provided that the error terms can be assumed to be normally distributed. A point estimator of $\mu_i$ is the average response at each treatment, $\bar{y}_{i\cdot}$. Since this is a sample average, the associated variance is $\sigma^2/m_i$, where $m_i$ is the number of replicates at the $i$th treatment. Therefore, the confidence interval on $\mu_i$ is based on the $t$ distribution. Recall from Statistical Background on DOE (inference on population mean when variance is unknown) that:


$$T_0 = \frac{\bar{y}_{i\cdot} - \mu_i}{\sqrt{\hat{\sigma}^2/m_i}} = \frac{\bar{y}_{i\cdot} - \mu_i}{\sqrt{MS_E/m_i}}$$


has a $t$ distribution with degrees of freedom $= dof(SS_E)$. Therefore, a $100(1 - \alpha)$ percent confidence interval on the $i$th treatment mean, $\mu_i$, is:


$$\bar{y}_{i\cdot} \pm t_{\alpha/2, dof(SS_E)} \sqrt{\frac{MS_E}{m_i}}$$


For example, for the first treatment of the lathe speed we have:


$$\hat{\mu}_1 = \bar{y}_{1\cdot} = \frac{6 + 13 + 7 + 8}{4} = 8.5$$


In the DOE folio, this value is displayed as the Estimated Mean for the first level, as shown in the Data Summary table in the figure below. The value displayed as the standard deviation for this level is simply the sample standard deviation calculated using the observations corresponding to this level. The 90% confidence interval for this treatment is:


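$$\begin{aligned}
&= \bar{y}_{1\cdot} \pm t_{\alpha/2, dof(SS_E)} \sqrt{\frac{MS_E}{m_i}} \\
&= \bar{y}_{1\cdot} \pm t_{0.05, 9} \sqrt{\frac{MS_E}{4}} \\
&= 8.5 \pm 1.833 \sqrt{\frac{74.5/9}{4}} \\
&= 8.5 \pm 1.833 (1.44) \\
&= 8.5 \pm 2.64
\end{aligned}$$


The 90% limits on $\mu_1$ are therefore approximately 5.9 and 11.1.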


Data Summary table for the single factor experiment in the first table.
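
The interval for the first treatment can be checked with the short sketch below. It assumes SciPy is available and uses `scipy.stats.t.ppf` to obtain the critical value $t_{0.05,9}$ instead of reading it from a table; all inputs are the values quoted in the text.

```python
from math import sqrt
from scipy import stats

y_bar_1 = (6 + 13 + 7 + 8) / 4   # average response at the first level
ms_e = 74.5 / 9                  # error mean square
dof_e = 9                        # degrees of freedom of SSE
m = 4                            # replicates at this level
alpha = 0.10                     # 90% two-sided interval

# Upper alpha/2 percentage point of the t distribution, t_{0.05, 9}.
t_crit = stats.t.ppf(1 - alpha / 2, dof_e)

half_width = t_crit * sqrt(ms_e / m)
lower, upper = y_bar_1 - half_width, y_bar_1 + half_width

print("t critical =", round(t_crit, 3))
print("interval =", round(lower, 2), "to", round(upper, 2))
```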

Confidence Interval on the Difference in Two Treatment Means

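The confidence interval on the difference in two treatment means, $\mu_i - \mu_j$, is used to compare two levels of the factor at a given significance. If the confidence interval does not include the value of zero, it is concluded that the two levels of the factor are significantly different. The point estimator of $\mu_i - \mu_j$ is $\bar{y}_{i\cdot} - \bar{y}_{j\cdot}$. Because the two treatment averages are independent, the variance for $\bar{y}_{i\cdot} - \bar{y}_{j\cdot}$ is:


$$var(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) = var(\bar{y}_{i\cdot}) + var(\bar{y}_{j\cdot}) = \sigma^2/m_i + \sigma^2/m_j$$


For balanced designs all $m_i = m$. Therefore:


$$var(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) = 2\sigma^2/m$$


The standard deviation for $\bar{y}_{i\cdot} - \bar{y}_{j\cdot}$ can be obtained by taking the square root of $var(\bar{y}_{i\cdot} - \bar{y}_{j\cdot})$ and is referred to as the pooled standard error:


$$\text{Pooled Std. Error} = \sqrt{var(\bar{y}_{i\cdot} - \bar{y}_{j\cdot})} = \sqrt{2\hat{\sigma}^2/m}$$


The $t$ statistic for the difference is:


$$T_0 = \frac{\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - (\mu_i - \mu_j)}{\sqrt{2\hat{\sigma}^2/m}} = \frac{\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - (\mu_i - \mu_j)}{\sqrt{2 MS_E/m}}$$


Then a $100(1 - \alpha)$ percent confidence interval on the difference in two treatment means, $\mu_i - \mu_j$, is:


$$\bar{y}_{i\cdot} - \bar{y}_{j\cdot} \pm t_{\alpha/2, dof(SS_E)} \sqrt{\frac{2 MS_E}{m}}$$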


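For example, an estimate of the difference in the first and second treatment means of the lathe speed, $\mu_1 - \mu_2$, is:

$$\hat{\mu}_1 - \hat{\mu}_2 = \bar{y}_{1\cdot} - \bar{y}_{2\cdot} = 8.5 - 13.25 = -4.75$$


The pooled standard error for this difference is:


$$\text{Pooled Std. Error} = \sqrt{var(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})} = \sqrt{2\hat{\sigma}^2/m} = \sqrt{2 MS_E/m} = \sqrt{\frac{2(74.5/9)}{4}} = 2.0344$$


To test $H_0: \mu_1 - \mu_2 = 0$, the $t$ statistic is:


$$t_0 = \frac{\bar{y}_{1\cdot} - \bar{y}_{2\cdot} - (\mu_1 - \mu_2)}{\sqrt{2 MS_E/m}} = \frac{-4.75 - (0)}{\sqrt{\frac{2(74.5/9)}{4}}} = \frac{-4.75}{2.0344} = -2.3348$$


In the DOE folio, the value of the statistic is displayed in the Mean Comparisons table under the column T Value as shown in the figure below. The 90% confidence interval on the difference $\mu_1 - \mu_2$ is:


$$\begin{aligned}
&= \bar{y}_{1\cdot} - \bar{y}_{2\cdot} \pm t_{\alpha/2, dof(SS_E)} \sqrt{\frac{2 MS_E}{m}} \\
&= \bar{y}_{1\cdot} - \bar{y}_{2\cdot} \pm t_{0.05, 9} \sqrt{\frac{2 MS_E}{m}} \\
&= -4.75 \pm 1.833 \sqrt{\frac{2(74.5/9)}{4}} \\
&= -4.75 \pm 1.833 (2.0344) \\
&= -4.75 \pm 3.729
\end{aligned}$$


Hence the 90% limits on $\mu_1 - \mu_2$ are $-8.479$ and $-1.021$, respectively. These values are displayed under the Low CI and High CI columns in the following figure. Since the confidence interval for this pair of means does not include zero, it can be concluded that these means are significantly different at 90% confidence. This conclusion can also be arrived at using the p value, noting that the hypothesis is two-sided. The p value corresponding to the statistic $t_0 = -2.3348$, based on the $t$ distribution with 9 degrees of freedom, is:


$$p \text{ value} = 2 \times (1 - P(T \leq |t_0|)) = 2 \times (1 - P(T \leq 2.3348)) = 2 \times (1 - 0.9778) = 0.0444$$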


Since p value < 0.1, the means are significantly different at 90% confidence. Bounds on the difference between other treatment pairs can be obtained in a similar manner and it is concluded that all treatments are significantly different.


Mean Comparisons table for the data in the first table.
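
The comparison of the first two treatment means can be reproduced as follows. This is a sketch that assumes SciPy is available; it uses the treatment averages and error mean square quoted in the text, with `scipy.stats.t.sf` for the tail probability and `scipy.stats.t.ppf` for the critical value.

```python
from math import sqrt
from scipy import stats

y_bar_1, y_bar_2 = 8.5, 13.25    # treatment averages quoted in the text
ms_e, dof_e, m = 74.5 / 9, 9, 4  # error mean square, its dof, replicates

diff = y_bar_1 - y_bar_2
pooled_se = sqrt(2 * ms_e / m)   # pooled standard error (balanced design)

# t statistic for H0: mu_1 - mu_2 = 0 and its two-sided p value.
t0 = diff / pooled_se
p_value = 2 * stats.t.sf(abs(t0), dof_e)

# 90% confidence interval on the difference.
t_crit = stats.t.ppf(0.95, dof_e)
low_ci = diff - t_crit * pooled_se
high_ci = diff + t_crit * pooled_se

print("difference =", diff, "pooled SE =", round(pooled_se, 4))
print("t0 =", round(t0, 4), "p value =", round(p_value, 4))
print("90% CI =", round(low_ci, 3), "to", round(high_ci, 3))
```

The interval and the p value lead to the same decision because both are derived from the same $t$ statistic and the same 9 degrees of freedom.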

Residual Analysis

Plots of residuals, $e_{ij}$, similar to the ones discussed in the previous chapters on regression, are used to ensure that the assumptions associated with the ANOVA model are not violated. The ANOVA model assumes that the random error terms, $\epsilon_{ij}$, are normally and independently distributed with the same variance for each treatment. The normality assumption can be checked by obtaining a normal probability plot of the residuals.


Equality of variance is checked by plotting residuals against the treatments and the treatment averages, $\bar{y}_{i\cdot}$ (also referred to as fitted values), and inspecting the spread in the residuals. If a pattern is seen in these plots, then this indicates the need to use a suitable transformation on the response that will ensure variance equality. Box-Cox transformations are discussed in the next section. To check for independence of the random error terms, residuals are plotted against time or run order to ensure that a pattern does not exist in these plots. Residual plots for the given example are shown in the following two figures. The plots show that the assumptions associated with the ANOVA model are not violated.


Normal probability plot of residuals for the single factor experiment in the first table.


Plot of residuals against fitted values for the single factor experiment in the first table.
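
The quantities behind these plots are simple to compute. The sketch below obtains the residuals $e_{ij} = y_{ij} - \bar{y}_{i\cdot}$ and the sample standard deviation at each level, which gives a rough numerical view of the spread that the residual plot shows graphically. It is not a substitute for the plots, and the readings not quoted in this chapter (10 and 14 at 600, 16 at 700) are assumed values.

```python
from statistics import mean, stdev

# Surface finish readings by lathe speed.
# ASSUMPTION: 10, 14 (speed 600) and 16 (speed 700) are not quoted in
# the text; they are chosen to match the reported summary statistics.
data = {
    500: [6, 13, 7, 8],
    600: [13, 16, 10, 14],
    700: [23, 20, 16, 18],
}

# Residual = observation minus the fitted value (the treatment average).
residuals = {
    speed: [y - mean(values) for y in values]
    for speed, values in data.items()
}

# Per-level sample standard deviation, a rough check on equal variance.
spread = {speed: stdev(values) for speed, values in data.items()}

for speed in data:
    print(speed, "residuals:", residuals[speed],
          "std dev:", round(spread[speed], 3))

# The squared residuals add up to the error sum of squares.
sse = sum(e ** 2 for errs in residuals.values() for e in errs)
print("sum of squared residuals:", sse)
```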

Box-Cox Method

Transformations on the response may be used when residual plots for an experiment show a pattern. This indicates that the equality of variance does not hold for the residuals of the given model. The Box-Cox method can be used to automatically identify a suitable power transformation for the data based on the relationship:


$$Y^{*} = Y^{\lambda}$$


$\lambda$ is determined using the given data such that $SS_E$ is minimized. The values of $Y^{\lambda}$ are not used as is because of issues related to calculation or comparison of $SS_E$ values for different values of $\lambda$. For example, for $\lambda = 0$ all response values will become 1. Therefore, the following relationship is used to obtain $Y^{\lambda}$:


$$Y^{\lambda} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda \dot{y}^{\lambda - 1}} & \lambda \neq 0 \\[2ex] \dot{y} \ln y & \lambda = 0 \end{cases}$$


where $\dot{y} = \ln^{-1}\left[(1/n) \sum \ln y\right]$, that is, the geometric mean of the responses. Once all $Y^{\lambda}$ values are obtained for a value of $\lambda$, the corresponding $SS_E$ for these values is obtained using $y^{\lambda \prime}[I - H]y^{\lambda}$. The process is repeated for a number of $\lambda$ values to obtain a plot of $SS_E$ against $\lambda$. Then the value of $\lambda$ corresponding to the minimum $SS_E$ is selected as the required transformation for the given data. The DOE folio plots $\ln SS_E$ values against $\lambda$ values because the range of $SS_E$ values is large and, if this is not done, all values cannot be displayed on the same plot. The range of search for the best $\lambda$ value in the software is from $-5$ to $5$, because larger values of $\lambda$ are usually not meaningful. The DOE folio also displays a recommended transformation based on the best $\lambda$ value obtained as per the second table.


Recommended Box-Cox power transformations.


Confidence intervals on the selected $\lambda$ values are also available. Let $SS_E(\lambda)$ be the value of $SS_E$ corresponding to the selected value of $\lambda$. Then, to calculate the $100(1 - \alpha)$ percent confidence intervals on $\lambda$, we need to calculate $SS$ as shown next:


$$SS = SS_E(\lambda) \left( 1 + \frac{t^2_{\alpha/2, dof(SS_E)}}{dof(SS_E)} \right)$$


The required limits for $\lambda$ are the two values of $\lambda$ corresponding to the value $SS$ (on the plot of $SS_E$ against $\lambda$). If the limits for $\lambda$ do not include the value of one, then the transformation is applicable for the given data. Note that the power transformations are not defined for response values that are negative or zero. The DOE folio deals with negative and zero response values using the following equations (that involve addition of a suitable quantity to all of the response values if a zero or negative response value is encountered).


$$\begin{aligned}
y(i) &= y(i) + |y_{min}| \times 1.1 && \text{Negative Response} \\
y(i) &= y(i) + 1 && \text{Zero Response}
\end{aligned}$$


Here ymin represents the minimum response value and |ymin| represents the absolute value of the minimum response.
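
A minimal sketch of this adjustment is shown below. The function name `shift_for_box_cox` is hypothetical, and the handling of data that contain both negative and zero values (the negative rule is applied, based on the minimum) is an assumption; the text does not say how the software treats that case.

```python
def shift_for_box_cox(responses):
    """Return the responses shifted so that every value is positive.

    ASSUMPTION: when the minimum is negative, the "negative response"
    rule is applied to all values, even if zeros are also present.
    """
    y_min = min(responses)
    if y_min < 0:
        offset = abs(y_min) * 1.1   # negative response rule
    elif y_min == 0:
        offset = 1.0                # zero response rule
    else:
        offset = 0.0                # already positive: no change
    return [y + offset for y in responses]


print(shift_for_box_cox([-2.0, 0.0, 3.0]))
print(shift_for_box_cox([0, 1, 2]))
print(shift_for_box_cox([6, 13, 7, 8]))
```

After the shift with the negative rule, the smallest response becomes $0.1 \times |y_{min}|$, so the logarithm and power transformations are defined for every observation.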

Example

To illustrate the Box-Cox method, consider the experiment given in the first table. Transformed response values for various values of $\lambda$ can be calculated using the equation for $Y^{\lambda}$ given in Box-Cox Method. Knowing the hat matrix, $H$, $SS_E$ values corresponding to each of these $\lambda$ values can easily be obtained using $y^{\lambda \prime}[I - H]y^{\lambda}$. $SS_E$ values calculated for $\lambda$ values between $-5$ and $5$ for the given data are shown below:


  λ       SSE      ln SSE
 -5    5947.8      8.6908
 -4    1946.4      7.5737
 -3     696.5      6.5461
 -2     282.2      5.6425
 -1     135.8      4.9114
  0      83.9      4.4299
  1      74.5      4.3108
  2     101.0      4.6154
  3     190.4      5.2491
  4     429.5      6.0627
  5    1057.6      6.9638


A plot of $\ln SS_E$ for various $\lambda$ values, as obtained from the DOE folio, is shown in the following figure. The value of $\lambda$ that gives the minimum $SS_E$ is identified as 0.7841. The $SS_E$ value corresponding to this value of $\lambda$ is 73.74. A 90% confidence interval on this $\lambda$ value is calculated as follows. $SS$ can be obtained as shown next:


$$SS = SS_E(\lambda) \left( 1 + \frac{t^2_{\alpha/2, dof(SS_E)}}{dof(SS_E)} \right) = 73.74 \left( 1 + \frac{t^2_{0.05, 9}}{9} \right) = 73.74 \left( 1 + \frac{1.833^2}{9} \right) = 101.27$$


Therefore, $\ln SS = 4.6178$. The $\lambda$ values corresponding to this value from the following figure are $-0.4689$ and $2.0054$. Therefore, the 90% confidence limits on $\lambda$ are $-0.4689$ and $2.0054$. Since the confidence limits include the value of 1, this indicates that a transformation is not required for the data in the first table.


Box-Cox power transformation plot for the data in the first table.
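
The search described in this example can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the DOE folio's implementation: the grid step of 0.001 is an arbitrary choice, the critical value 1.833 is taken from the text, and the readings not quoted in this chapter (10 and 14 at 600, 16 at 700) are assumed values. For a one factor model, $y^{\lambda \prime}[I - H]y^{\lambda}$ reduces to the within-treatment sum of squares of the transformed responses, which is what the `sse` function computes.

```python
import numpy as np

# Responses grouped by treatment.
# ASSUMPTION: 10, 14 (second level) and 16 (third level) are not quoted
# in the text; they are chosen to match the reported summary statistics.
groups = [
    np.array([6, 13, 7, 8], dtype=float),
    np.array([13, 16, 10, 14], dtype=float),
    np.array([23, 20, 16, 18], dtype=float),
]
all_y = np.concatenate(groups)
y_dot = np.exp(np.mean(np.log(all_y)))   # geometric mean of the responses


def box_cox(y, lam):
    """Scaled Box-Cox transformation of y for a given lambda."""
    if abs(lam) < 1e-12:
        return y_dot * np.log(y)
    return (y ** lam - 1.0) / (lam * y_dot ** (lam - 1.0))


def sse(lam):
    """Error sum of squares of the one factor model after transformation."""
    total = 0.0
    for g in groups:
        z = box_cox(g, lam)
        total += np.sum((z - z.mean()) ** 2)
    return total


# SSE and ln(SSE) at the integer lambda values from -5 to 5.
for lam in range(-5, 6):
    print(lam, round(sse(lam), 1), round(np.log(sse(lam)), 4))

# Fine grid search for the lambda that minimizes SSE.
grid = np.round(np.arange(-5.0, 5.0005, 0.001), 3)
sse_grid = np.array([sse(lam) for lam in grid])
best = grid[np.argmin(sse_grid)]

# 90% confidence limits: lambda values where SSE stays below the SS bound.
t_crit = 1.833                           # t_{0.05, 9} as quoted in the text
ss_limit = sse_grid.min() * (1 + t_crit ** 2 / 9)
inside = grid[sse_grid <= ss_limit]
lower, upper = inside.min(), inside.max()

print("best lambda:", best, "min SSE:", round(sse_grid.min(), 2))
print("90% limits on lambda:", lower, upper)
```

At $\lambda = 1$ the transformation only shifts the responses by a constant, so the value returned by `sse(1)` equals the untransformed error sum of squares of 74.5.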