Some time ago a PhD student of mine needed to conduct simple main effects for several different ANOVA designs. As the explanation of how to do this is a bit fiddly (and it is the kind of thing I get asked about quite a lot) I decided to write the explanation down. This later turned into a web page and now I'm updating it for this blog. The main focus is on explaining how to do the calculations based on output from generic ANOVA software. The calculation methods I describe seem obvious to me, but they are not intended to be the best or most efficient methods. Rather, their aim is to make clear what the calculation is doing and to be sufficiently robust that one can adapt them to most (if not all) standard ANOVA software. For any given software there will probably be more efficient solutions (e.g., via syntax in SPSS).
This post aims to set out how to calculate simple main effects for common ANOVA designs. Some statistics packages will calculate simple main effects for you, but many will not do so or will only do so for certain designs. There is also some confusion as to how to calculate simple main effects for some designs. For this reason I aim to set out generic methods to calculate simple main effects that should work for any software package that can cope with common ANOVA designs (even if they don't support direct calculation of simple main effects).
What is a simple main effect?
In analysis of variance (ANOVA) we are often interested only in the effects of a single factor. These effects are termed main effects. A one-way (one factor) ANOVA therefore has only a single main effect. The F test for a main effect tests the hypothesis that the means differ.1 In a design with more than one factor (e.g., a two-way design with two factors) we can also look at interaction effects. An interaction between two factors (a two-way interaction) is present if the effects of the factors are not independent. In general researchers will ignore a non-significant two-way interaction and interpret the two main effects (which are considered to be independent of each other).2
If a two-way interaction is significant it may not make sense to interpret the main effects on their own (because they represent “average” effects of a factor which are known to vary between levels of the other factor).3 Instead, it is sometimes a good idea to look at the effect of one factor separately for each level of the other factor. These separate analyses are called simple main effects.
For example, consider a 2 x 2 ANOVA design with gender and age as the factors and anxiety prior to a maths test as the dependent variable. Imagine the means look like this:
This pattern shows a fairly clear interaction and also suggests there might be main effects of both age and gender (assuming equal cell sizes).
Let us assume that all three effects are significant. In this example it makes little sense to pay much attention to the main effect of age (because while age does increase anxiety on average, it would be slightly misleading to interpret this as a general effect). On the other hand it doesn't seem unreasonable to interpret the main effect of gender as a general effect (as anxiety seems quite a bit higher for females regardless of age). Nevertheless, to be sure, it seems sensible to look at the effects of age separately for males and females (and possibly also the effects of gender for young and old participants).
The test of the effect of age on anxiety for male participants is sometimes called the simple main effect of age at "male". The test of the effect of age on anxiety for female participants is called the simple main effect of age at "female".4
Note that there are 4 simple main effects for a 2x2 design, 5 for a 3x2 design and so forth.5
The F ratio
As with any ANOVA test we'll want to come up with an F ratio to test the effect.6 The F ratio is the ratio of the variance estimate for the treatment effect to the variance estimate of the error. In ANOVA these variance estimates are called mean squares and are calculated by dividing the sums of squares for a source of variation by its degrees of freedom.
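The arithmetic above can be sketched in a few lines of Python (using scipy, which is not mentioned in the original text but plays the same look-up role as Excel or SPSS). The sums of squares and degrees of freedom here are hypothetical numbers chosen purely for illustration:

```python
from scipy import stats

# Hypothetical values read off a generic ANOVA printout (illustrative only)
ss_treatment, df_treatment = 40.0, 1
ss_error, df_error = 500.0, 42

# Mean squares are sums of squares divided by their degrees of freedom
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error

# The F ratio is the treatment mean square over the error mean square
f_ratio = ms_treatment / ms_error

# Upper-tail p value from the central F distribution
p_value = stats.f.sf(f_ratio, df_treatment, df_error)
print(f_ratio, p_value)
```

The same numbers could equally be typed into Excel; the point is only that every F test in this post reduces to this ratio of mean squares.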
The mean square error will depend on whether the factor which is being looked at is independent measures (between subjects) or repeated measures (within subjects). For a mixed design the mean square error depends on which factor is being looked at. For an independent measures factor a pooled error term is used, while for a repeated measures design a non-pooled error term is used. It turns out that obtaining a pooled error term from a mixed ANOVA printout is not trivial so I will return to this later on (see the discussion of pooled errors for mixed ANOVA designs below).
Obtaining the treatment mean square:
One easy way to calculate the treatment mean square is to run separate one-way ANOVAs for each level of the second factor. We can simply read off the mean square value from the print out.
For example, for the simple main effect of gender at young we would merely run a one-way ANOVA with gender as the factor (taking care to exclude all the old participants!).
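As a sketch of this subsetting approach, the snippet below runs a one-way ANOVA on the young participants only, using scipy's `f_oneway`. The anxiety scores are invented for illustration (the post does not give the raw data):

```python
from scipy.stats import f_oneway

# Hypothetical anxiety scores for each cell of the 2 x 2 design (made up)
anxiety = {
    ("male", "young"): [4, 5, 6, 5],
    ("female", "young"): [7, 8, 9, 8],
    ("male", "old"): [6, 7, 8, 7],
    ("female", "old"): [9, 10, 11, 10],
}

# Simple main effect of gender at "young": exclude all old participants,
# then run a one-way ANOVA with gender as the only factor
f_stat, p = f_oneway(anxiety[("male", "young")], anxiety[("female", "young")])
print(f_stat, p)
```

For a pure repeated measures factor this subset analysis already gives the correct test; for an independent measures factor you would read off only the treatment mean square and pair it with the pooled error term from the full ANOVA, as described below.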
Obtaining the error mean square (independent measures):
I think the easiest way to obtain the pooled error term used for a pure independent measures design is to read it off the output from the original ANOVA (i.e., the ANOVA with all factors in the analysis). In an independent measures design the pooled error term averages over all the participants and therefore minimizes the effect of individual differences and thus produces the most accurate test.
The appropriate F ratio can be calculated by dividing the treatment mean square by the error mean square in the normal way. This can be checked for significance using tables or by exact methods using software such as Excel or SPSS (both of which have look-up functions for the significance of the central F distribution).
Obtaining the error mean square (repeated measures):
The easiest way to obtain the non-pooled error term required for repeated measures factors is to run separate one-way ANOVAs for each level of the second factor. A big advantage of this method is that, as the treatment mean square is also obtained this way, the output will give you the correct F ratio and observed significance (p value). (In some packages, notably SPSS, the simplest ways to do this are i) to copy and paste the data into new columns, or ii) to save the data file under k different names - where k is the number of levels - and delete unwanted levels before running each one-way repeated measures ANOVA. The latter method avoids having to re-type variable names and other information.)
The choice of the non-pooled error term in within-subjects simple effects mirrors the choice in standard repeated measures designs and is now generally accepted as the appropriate procedure.7
Obtaining the error mean square (mixed designs):
In a mixed ANOVA design there is at least one repeated measures factor and at least one independent measures factor. Simple main effects for the repeated measures (within subjects) factors should use a non-pooled error term (just as with the standard repeated measures design outlined above).
For the independent measures factors we want to use the pooled error estimate from our main analysis. This is problematic because there isn't a single pooled error term in the mixed ANOVA output. Instead, we must take the error terms from the mixed ANOVA output and calculate the pooled error term 'by hand'. Howell (2002, pp. 490-493) describes a (relatively) simple procedure for this.
Calculating the pooled (within cells) SS from the mixed ANOVA output is fairly easy. The first step is to identify the two error terms from the original mixed ANOVA output. These will usually be clearly labelled as error terms.8
Designate one as error v and one as error u and combine the sums of squares and mean square as follows:
Pooled SS = SSu + SSv
Pooled MS = Pooled SS / (dfu + dfv)
(Where SS stands for sums of squares, MS for mean square and df for degrees of freedom for u or v as appropriate).
So far so good ...
Pooling the sums of squares in this way is fine except that the pooled Mean Square is derived from two different variance estimates. This makes it tricky to work out the correct F distribution for the pooled mean square error. This in turn means that calculating a significance (p) value is tricky. It turns out that we can use the following formula to derive an error d.f. (called f') that gives us the correct p value:
f' = (u + v)² / (u²/dfu + v²/dfv)
[Note: u and v are the SS for the two error terms]
Once you have obtained the pooled MS error, simply calculate your simple main effect ratio as normal (F = MStreatment/MSerror) and evaluate it against the usual treatment d.f., using f' as the error d.f.
While this isn't too difficult it can be a bit fiddly. My preferred solution is to set up a spreadsheet in Excel or a similar program to calculate all these for you (see Resources below).
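An alternative to a spreadsheet is to wrap the whole procedure in a short function. The sketch below implements Howell's pooled-error calculation exactly as given above; the six input numbers in the usage example are hypothetical (they are not the defaults from Howell):

```python
from scipy import stats

def simple_effect_mixed(ss_treat, df_treat, ss_u, df_u, ss_v, df_v):
    """Simple main effect test for an independent measures factor in a
    mixed design, using a pooled error term and the f' error d.f."""
    pooled_ss = ss_u + ss_v                        # Pooled SS = SSu + SSv
    pooled_ms = pooled_ss / (df_u + df_v)          # Pooled MS = Pooled SS / (dfu + dfv)
    # f' = (u + v)^2 / (u^2/dfu + v^2/dfv), where u and v are the error SS
    f_prime = (ss_u + ss_v) ** 2 / (ss_u ** 2 / df_u + ss_v ** 2 / df_v)
    ms_treat = ss_treat / df_treat
    f_ratio = ms_treat / pooled_ms                 # F = MStreatment / pooled MSerror
    p = stats.f.sf(f_ratio, df_treat, f_prime)     # scipy accepts non-integer d.f.
    return f_ratio, f_prime, p

# Usage with illustrative (made-up) numbers from a mixed ANOVA printout
f_ratio, f_prime, p = simple_effect_mixed(
    ss_treat=50, df_treat=1, ss_u=100, df_u=10, ss_v=50, df_v=20
)
print(f_ratio, f_prime, p)
```

Note that f' is generally not a whole number, which is another reason software (rather than printed tables) is convenient at this step.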
Looking up significance of F:
Provided you know the F ratio, the treatment df and error df, looking up significance is easy. The traditional option is to use tables of significance. These allow you to look up the critical value of F, or Fcrit, that is required to reach significance for a given alpha level. If the observed value of F equals or exceeds Fcrit then the test is significant at that alpha level. The main drawback of this procedure is that exact significance (p values) cannot be obtained (though they can be estimated in some cases).
An alternative strategy is to use a computer program to look up the exact significance (p) value. For example, in Excel one can use the FDIST function.
e.g., FDIST(3.29,1,42) returns the value 0.076851736
Here 3.29 is the observed F ratio, 1 is the treatment d.f. and 42 the error d.f. The p value for this test is therefore marginally non-significant. We could thus report the test as F1,42 = 3.29, p = .0769.
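For readers without Excel, the same look-up can be done in Python via scipy (an assumption of convenience; any package with an F distribution function will do). `stats.f.sf` is the upper-tail probability, equivalent to Excel's FDIST:

```python
from scipy import stats

# Equivalent of FDIST(3.29, 1, 42): upper-tail p value of the F distribution
p = stats.f.sf(3.29, 1, 42)
print(round(p, 4))  # approximately .0769, matching the Excel result above
```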
Here is a very basic (and certainly not very pretty) Excel spreadsheet that calculates the pooled MS error used for independent factors in a mixed ANOVA design. The spreadsheet takes as input six numbers (labelled in bold in the spreadsheet). These are SSu, SSv, dfu, dfv (the SS and d.f. for the two error terms in the mixed design), the SStreatment and the dftreatment. The latter two are obtained as normal for simple main effects (e.g., they can be readily taken from the output of separate one-way ANOVAs for each level of a second factor).
For convenience the spreadsheet also uses the FDIST function to calculate the p value for the simple main effect (using f' as the error d.f. as described above). This function doesn't work in Google Docs but works just fine in Excel (just select Export from the File menu and save it as an .xls file).
The default values in the spreadsheet are taken from Howell (2002, pp. 492-3).
Howell, D. C. (2002). Statistical Methods for Psychology (5th ed.). Pacific Grove, CA: Duxbury.
[David Howell has just released a seventh edition which looks excellent. At first glance the main changes are in layout and use of software examples.]
1 Strictly, the usual interpretation is that if the F ratio is significant then there is evidence against the null hypothesis that the means are all equal.
2 I have glossed over the finer points of interaction analysis for these pages. There are occasions when non-significant interactions should probably be looked at more closely and occasions when significant interactions should probably be ignored. This relates to the power of the study and the size of the interaction effect. A powerful study may detect interactions that are negligible in size and a study that lacks power may fail to show significance for an important or large interaction effect.
3 Interpreting main effects in the light of an interaction effect is more of an art than a science. In a mathematical sense interaction effects are independent of main effects and can be interpreted separately. However, verbal interpretations of analyses aren't always sensitive to the mathematics of the situation. The context of the study is also important in the interpretation of the results. If in doubt, always interpret the interaction effect first. Also, if the simple main effects show a consistent pattern (e.g., significant differences or close to significant at all levels of a factor) then the main effects are probably straight-forward to report. In the example given here it probably makes sense to report and interpret the main effect of gender (females are generally more anxious than males in this context), but not age (as it seems misleading to interpret a significant main effect of age as indicating that older people are generally more anxious in this context).
This decision might differ in another situation (with identical numbers). For example, if the study looked at reading preferences and the factors were gender and font type (serif or sans serif) we might conclude that the main effect of font type was worth interpreting as, in general, people do prefer the serif font. The mathematics hasn't changed, but the context has – in this case the serif font seems like a sensible compromise that pleases most people most of the time (though even in this case the simple main effects might be useful for deciding what font to use for a motoring magazine aimed at men or a lifestyle magazine aimed at women). If there is a moral to be learned from this digression (other than that the author tends to ramble on a bit) it is that applied statistics is much more than mathematics and needs to take account of the context and the goals of the research.
4 In this case the terminology is a bit awkward and most authors would say "the simple main effect of age for the male participants" or similar. In other cases the terminology is quite handy.
5 This should be fairly obvious to work out if you think about it. With a 3x2 design there are 2 simple main effects for the first factor (one for each level of the other factor) plus 3 for the second factor (one for each level of the first factor).
6 Some readers will note that we could also calculate a t ratio if the simple main effect has 1 d.f. for the treatment (as is the case for my example). This t ratio would merely be the square root of the F ratio. In some cases the software one has available makes t easier or quicker to calculate than F, so this relationship will prove useful.
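The equivalence in this footnote is easy to check numerically; the sketch below confirms that the two-tailed p from t matches the upper-tail p from F with 1 treatment d.f. (the F = 3.29, d.f. = 42 numbers are reused from the worked example above):

```python
import math
from scipy import stats

f_ratio, df_error = 3.29, 42
t_ratio = math.sqrt(f_ratio)  # t is the square root of F when df_treatment = 1

# Two-tailed p from the t distribution equals the upper-tail p from F(1, df_error)
p_from_t = 2 * stats.t.sf(t_ratio, df_error)
p_from_f = stats.f.sf(f_ratio, 1, df_error)
print(p_from_t, p_from_f)
```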
7 Pooling the error term would also increase the potential for problems with the repeated measures sphericity assumption. If the simple main effect has 1 treatment d.f. then using a non-pooled error term will mean any problems with sphericity are avoided. If the simple main effect has more than 1 treatment d.f. and sphericity is considered to be untenable, the researcher should probably consider whether the substantive hypothesis being investigated can be tested with a specific contrast of means. See also my guidance on checking and correcting sphericity violations.
8 In any case the terms to look for are the "Subjects within ..." term and the "x [or by] Subjects within ..." interaction term if they are not labelled as error terms. For example, a recent version of SPSS labels these simply "Error" in the "Tests of Between-Subjects Effects" table and "Error(Repeated factor A x Repeated factor B x ...)" in "Tests of Within-Subjects Effects".