# Analysis of variance (ANOVA) table

## Analysis of variance (ANOVA) table

The main output from an analysis of variance study, arranged in a table. It lists the sources of variation, their degrees of freedom, their sums of squares, and their mean squares. The analysis of variance table also includes the F-statistics and p-values, which you use to determine whether the predictors or factors are significantly related to the response.

ANOVA tables are also used in regression and DOE (design of experiments) analyses.

Here are the components of an ANOVA table:

- Source: indicates the source of variation, either from the factor, the interaction, or the error. The total is the sum of all the sources.
- DF: degrees of freedom from each source. If a factor has three levels, its degrees of freedom is 2 (number of levels - 1). If you have a total of 30 observations, the total degrees of freedom is 29 (n - 1).
- SS: the sum of squares between groups (factor) and the sum of squares within groups (error).
- MS: mean squares, found by dividing each sum of squares by its degrees of freedom.
- F: calculated by dividing the factor MS by the error MS. You can compare this ratio against a critical F value from a table, or you can use the p-value to determine whether a factor is significant.
- P: used to determine whether a factor is significant. It is typically compared against an alpha value of 0.05; if the p-value is lower than 0.05, the factor is significant.
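The arithmetic behind the DF, SS, MS, and F columns can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: the function name `one_way_anova`, its tuple layout, and the sample data are hypothetical, and the p-value column is left out because it requires the upper-tail probability of the F distribution.

```python
from typing import Dict, Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Dict[str, Tuple[float, ...]]:
    """Build the DF, SS, MS, and F entries of a one-way ANOVA table.

    Each inner sequence holds the responses observed at one factor level.
    """
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)        # observations, factor levels
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]

    # Between groups (factor): how far each level mean sits from the grand mean
    ss_factor = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups (error): how far each observation sits from its own level mean
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

    df_factor, df_error, df_total = k - 1, n - k, n - 1
    ms_factor = ss_factor / df_factor       # MS = SS / DF
    ms_error = ss_error / df_error
    f_stat = ms_factor / ms_error           # F = factor MS / error MS

    return {
        "Factor": (df_factor, ss_factor, ms_factor, f_stat),
        "Error": (df_error, ss_error, ms_error),
        "Total": (df_total, ss_total),
    }


table = one_way_anova([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
print(table["Factor"])  # (2, 54.0, 27.0, 27.0)
```

In a one-way ANOVA the factor and error rows add up to the Total row (here SS 54 + 6 = 60 and DF 2 + 6 = 8), which makes the Total row a handy consistency check.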

### One-way ANOVA table

Suppose you run an ANOVA to determine whether three different colored flyers produce different sales. You set up the ANOVA so that your factor is "flyer color", which has three levels: "black and white", "red", and "yellow". Your response variable is weekly sales over a 10-week test period, giving 30 observations (3 colors × 10 weeks). Because you are examining one factor, you use a one-way ANOVA.

| Source | DF | SS | MS | F | P |
|--------|---:|---:|---:|---:|---:|
| Factor | 2 | 20877338 | 10438669 | 136.82 | 0.000 |
| Error | 27 | 2060002 | 76296 | | |
| Total | 29 | 22937340 | | | |

The p-value of 0.000 indicates that flyer color is a significant factor: mean weekly sales are not the same for all three colors.
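As a consistency check, the short sketch below recomputes the MS and F columns of the flyer table from its SS and DF entries. For the p-value it relies on a closed form that holds only because the factor has 2 degrees of freedom: in that special case the upper tail of the F distribution is `(1 + 2*f/df_error) ** (-df_error/2)`. For factors with other numbers of levels, a general F-distribution routine from a statistics library would be needed instead.

```python
# SS and DF copied from the one-way ANOVA table for the flyer example
ss_factor, df_factor = 20877338, 2
ss_error, df_error = 2060002, 27

ms_factor = ss_factor / df_factor  # MS = SS / DF
ms_error = ss_error / df_error
f_stat = ms_factor / ms_error      # F = factor MS / error MS

# Closed form valid ONLY when the factor has 2 DF (three levels):
# P(F > f) = (1 + 2 * f / df_error) ** (-df_error / 2)
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(ss_factor + ss_error, df_factor + df_error)  # 22937340 29
print(round(ms_factor), round(ms_error))           # 10438669 76296
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")      # F = 136.82, p = 0.000
```

The exact p-value is about 7e-15, which rounds to the 0.000 reported in the table.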

For a two-way ANOVA, you will have two factors and an interaction term. For DOE and regression applications, you can have several factors, or sources of variation.

## Lack-of-fit tests

Used in regression and DOE, lack-of-fit tests assess the fit of your model. If the p-value is less than your selected α-level, there is evidence that your model does not accurately fit the data. You may need to add terms or transform your data to model the data more accurately. Minitab calculates two types of lack-of-fit tests:

Pure error lack-of-fit test: Use if your data contain replicates (multiple observations with identical x-values) and you are reducing your model. Replicates represent "pure error" because only random variation can cause differences between the observed response values. If you are reducing your model and the resulting p-value for lack of fit is less than your selected α-level, then you should retain the term you removed from the model.
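One common way to carry out a pure-error lack-of-fit test is to split the model's error sum of squares into pure error (scatter among replicates) and lack of fit (the remainder), then form an F ratio. The sketch below does this for a straight-line model; the straight-line model, the function name `pure_error_lack_of_fit`, and the sample data are assumptions for illustration, and Minitab's internal computation may differ in detail. The returned F would be compared with an F distribution on (c - p, n - c) degrees of freedom, where c is the number of distinct x-values, p the number of model parameters, and n the number of observations.

```python
from collections import defaultdict


def pure_error_lack_of_fit(x, y):
    """Pure-error lack-of-fit F statistic for a straight-line fit y = b0 + b1 * x.

    Returns (F, df_lack_of_fit, df_pure_error). Raises ZeroDivisionError if
    all replicates are identical (zero pure error).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar

    # Error SS of the fitted line
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

    # Pure-error SS: scatter of replicates around the mean at their own x-value
    by_x = defaultdict(list)
    for a, b in zip(x, y):
        by_x[a].append(b)
    ss_pe = sum((b - sum(ys) / len(ys)) ** 2 for ys in by_x.values() for b in ys)

    c, p = len(by_x), 2          # distinct x-values; parameters in a line (b0, b1)
    df_lof, df_pe = c - p, n - c
    if df_lof < 1 or df_pe < 1:
        raise ValueError("need more than 2 distinct x-values and at least one replicate")

    ss_lof = sse - ss_pe         # error the line leaves beyond pure error
    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe


x = [1, 1, 2, 2, 3, 3]           # hypothetical data: two replicates per x-value
y = [0, 2, 3, 5, 8, 10]
print(pure_error_lack_of_fit(x, y))
```

For this hypothetical data the sketch gives an F of about 0.67 on (1, 3) degrees of freedom; a large F (small p-value) would instead suggest that the straight line misses systematic structure, such as curvature.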

Data subsetting lack-of-fit test: Use if your data do not contain replicates and you want to determine whether you are accurately modeling the curvature. This method identifies curvature in the data and interactions among predictors that may affect the model fit. Whenever the data subsetting p-value is less than the α-level, Minitab displays a message such as "Possible curvature in variable X (P-Value = 0.006 )." This is evidence that the curvature is not adequately modeled. After examining the raw data in a scatterplot, you might try including a higher-order term to model the curvature.

## Residual plot choices

Minitab generates residual plots that you can use to examine the goodness of model fit. You can choose the following residual plots:

Histogram of residuals. An exploratory tool to show general characteristics of the data, including:

- Typical values, spread or variation, and shape
- Unusual values in the data

Long tails in the plot may indicate skewness in the data. If one or two bars are far from the others, those points may be outliers. Because the appearance of the histogram changes depending on the number of intervals used to group the data, use the normal probability plot and goodness-of-fit tests to assess the normality of the residuals.
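To see why the histogram alone is a shaky normality check, the sketch below groups the same residuals into different numbers of equal-width intervals. The residual values and the helper name `bin_counts` are hypothetical, and Minitab chooses its own intervals.

```python
def bin_counts(residuals, n_bins):
    """Count residuals in n_bins equal-width intervals spanning their range.

    Raises ZeroDivisionError if all residuals are equal (zero-width range).
    """
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for r in residuals:
        # The maximum lands exactly on the right edge; keep it in the last bin
        i = min(int((r - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts


residuals = [-2.1, -1.4, -0.9, -0.6, -0.2, 0.0, 0.3, 0.4, 0.8, 1.1, 1.7, 3.9]
for n_bins in (3, 6, 12):
    print(n_bins, bin_counts(residuals, n_bins))
```

With 6 or 12 intervals the 3.9 residual is separated from the rest by empty intervals and looks like a possible outlier; with 3 intervals no gap separates it, so the same data suggest a different picture.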

Normal probability plot of residuals. The points in this plot should generally form a straight line if the residuals are normally distributed. If the points on the plot depart from a straight line, the normality assumption may be invalid. If your data have fewer than 50 observations, the plot may display curvature in the tails even if the residuals are normally distributed. As the number of observations decreases, the probability plot may show substantial variation and nonlinearity even if the residuals are normally distributed. Use the probability plot and goodness-of-fit tests, such as the Anderson-Darling statistic, to assess whether the residuals are normally distributed.

You can display the Anderson-Darling statistic (AD) on the plot, which can indicate whether the data are normal. If the p-value is lower than the chosen α-level, you can conclude that the data do not follow a normal distribution. To display the Anderson-Darling statistic, choose Tools > Options > Individual Graphs > Residual Plots. For additional tests of normality, see Stat > Basic Statistics > Normality Test.
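Both ideas in this item can be sketched numerically. `normal_scores` pairs the sorted residuals with the standard-normal quantiles they would be plotted against, and `straightness` summarizes how close those points are to a line (1.0 means perfectly straight). `anderson_darling_a2` computes the A² statistic with the mean and standard deviation estimated from the data. The plotting-position formula (i - 0.375)/(n + 0.25), the function names, and the sample values are assumptions for illustration, and the sketch stops at A² rather than reproducing Minitab's p-value calculation.

```python
import math
from statistics import NormalDist, fmean, stdev


def normal_scores(residuals):
    """Sorted residuals paired with expected standard-normal quantiles.

    Uses the plotting position (i - 0.375) / (n + 0.25); other software may
    use a different formula.
    """
    n = len(residuals)
    z = NormalDist()  # standard normal: mean 0, SD 1
    scores = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return sorted(residuals), scores


def straightness(residuals):
    """Correlation of the normal probability plot points (1.0 = a straight line)."""
    ordered, scores = normal_scores(residuals)
    ms, mr = fmean(scores), fmean(ordered)
    num = sum((s - ms) * (r - mr) for s, r in zip(scores, ordered))
    den = math.sqrt(sum((s - ms) ** 2 for s in scores) * sum((r - mr) ** 2 for r in ordered))
    return num / den


def anderson_darling_a2(data):
    """Anderson-Darling A^2 for normality, mean and SD estimated from the data."""
    n = len(data)
    fitted = NormalDist(fmean(data), stdev(data))
    u = [fitted.cdf(x) for x in sorted(data)]
    # Pair the i-th smallest value with the i-th largest (1-based i)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


skewed = [1] * 9 + [50]  # hypothetical residuals with one large value
print(round(straightness(skewed), 2), round(anderson_darling_a2(skewed), 2))
```

Larger A² values indicate a stronger departure from normality; for the skewed sample the plot correlation falls well below 1 and A² is large.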

Residuals versus fits. This plot should show a random pattern of residuals on both sides of 0. If a point lies far from the majority of points, it may be an outlier. Also, there should not be any recognizable patterns in the residual plot. The following may indicate error that is not random:

- a series of increasing or decreasing points
- a predominance of positive residuals, or a predominance of negative residuals
- patterns, such as increasing residuals with increasing fits
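Reading the plot is the primary check, but two of the warning signs above can also be summarized numerically as a rough companion. The helper `residual_fit_checks` and its sample data are hypothetical, not a Minitab feature: it reports the share of positive residuals (far from 0.5 suggests a predominance of one sign) and the correlation between fitted values and absolute residuals (a clearly positive value suggests residuals that grow with the fits).

```python
import math
from statistics import fmean


def residual_fit_checks(fits, residuals):
    """Return (share of positive residuals, corr(fits, |residuals|)).

    The correlation is undefined (division by zero) if all fits or all
    absolute residuals are identical.
    """
    share_positive = sum(r > 0 for r in residuals) / len(residuals)
    abs_res = [abs(r) for r in residuals]
    mf, ma = fmean(fits), fmean(abs_res)
    num = sum((f - mf) * (a - ma) for f, a in zip(fits, abs_res))
    den = math.sqrt(sum((f - mf) ** 2 for f in fits) * sum((a - ma) ** 2 for a in abs_res))
    return share_positive, num / den


# Hypothetical residuals whose spread widens as the fitted values increase
fits = [10, 20, 30, 40, 50, 60]
residuals = [0.5, -1.0, 1.5, -2.0, 2.5, -3.0]
print(residual_fit_checks(fits, residuals))
```

Here half the residuals are positive, so neither sign predominates, but the correlation of about 1.0 flags the widening, funnel-shaped spread.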

Residuals versus order. This is a plot of all residuals in the order in which the data were collected. It can be used to find non-random error, especially time-related effects. Positive correlation between successive residuals is indicated by clustering of residuals with the same sign; negative correlation is indicated by rapid changes in the signs of consecutive residuals.
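The two patterns described here correspond to positive and negative lag-1 autocorrelation of the residuals. The sketch below computes that autocorrelation and counts sign changes for residuals listed in collection order; the function names and sample sequences are hypothetical, and the Durbin-Watson statistic is a closely related alternative.

```python
from statistics import fmean


def lag1_autocorrelation(residuals):
    """Correlation between each residual and the next one in collection order."""
    m = fmean(residuals)
    num = sum((a - m) * (b - m) for a, b in zip(residuals, residuals[1:]))
    den = sum((r - m) ** 2 for r in residuals)
    return num / den


def sign_changes(residuals):
    """Count consecutive pairs whose signs differ (zero is treated as non-positive)."""
    return sum((a > 0) != (b > 0) for a, b in zip(residuals, residuals[1:]))


clustered = [1, 1, 1, 1, -1, -1, -1, -1]     # same-sign runs
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # rapid sign changes
for name, r in (("clustered", clustered), ("alternating", alternating)):
    print(name, lag1_autocorrelation(r), sign_changes(r))
# clustered 0.625 1
# alternating -0.875 7
```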

Four in one. Select this option to produce a normal plot of residuals, a histogram of residuals, a plot of residuals versus fits, and a plot of residuals versus order in one graph window.

Residuals versus other variables. This is a plot of all residuals versus another variable. Plot the residuals against:

- Each predictor, to look for curvature or differences in the magnitude of the residuals
- Important variables left out of the model, to see whether they have critical additional effects on the response

If certain residual values are of concern, you can brush your graph to identify them. See graph brushing.

## Residual

The difference between an observed value (y) and its corresponding fitted value (ŷ). For example, the scatterplot below plots men's weight against their height; the regression line plots the fitted values of weight for each observed value of height. Suppose a man is 6 feet tall and the fitted value of his weight is 190 lbs. If his actual weight is 200 lbs, the residual is 10. If his actual weight is 175 lbs, the residual is -15.

[Scatterplot: weight versus height, with the fitted regression line]

Residual values are especially useful in regression and ANOVA procedures because they indicate the extent to which a model accounts for the variation in the observed data.
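A residual is a simple subtraction, as the sketch below shows using the 6-foot man from the example (fitted weight 190 lbs). The second part fits a least-squares line to hypothetical height and weight values, invented for illustration, and shows a general property of such fits: when the model includes an intercept, the residuals sum to zero up to rounding. The helper names `residuals` and `fit_line` are illustrative.

```python
def residuals(observed, fitted):
    """Residual = observed value minus fitted value (y - y_hat)."""
    return [y - y_hat for y, y_hat in zip(observed, fitted)]


print(residuals([200, 175], [190, 190]))  # [10, -15]


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1


heights = [66, 68, 70, 72, 74]        # inches (hypothetical)
weights = [150, 165, 172, 190, 198]   # lbs (hypothetical)
b0, b1 = fit_line(heights, weights)
res = residuals(weights, [b0 + b1 * h for h in heights])
print(abs(sum(res)) < 1e-9)  # True
```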
