*12*

To understand the MANOVA, it first helps to understand the ANOVA.

An ANOVA (analysis of variance) is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups.

For example, suppose we want to know whether or not studying technique has an impact on exam scores for a class of students. We randomly split the class into three groups. Each group uses a different studying technique for one month to prepare for an exam. At the end of the month, all of the students take the same exam.

To find out if studying technique impacts exam scores, we can conduct a one-way ANOVA, which will tell us if if there is a statistically significant difference between the mean scores of the three groups.

In an ANOVA, we have one response variable. However, in a **MANOVA** (multivariate analysis of variance) we have multiple response variables.

For example, suppose we want to know how level of education (i.e. high school, associates degree, bachelors degrees, masters degree, etc.) impacts both annual income and amount of student loan debt. In this case, we have one factor (level of education) and two response variables (annual income and student loan debt), so we could conduct a one-way MANOVA.

**Related:** Understanding the Differences Between ANOVA, ANCOVA, MANOVA, and MANCOVA

**How to Conduct a MANOVA in R**

In the following example, we’ll illustrate how to conduct a one-way MANOVA in R using the built-in dataset **iris**, which contains information about the length and width of different measurements of flowers for three different species (“setosa”, “virginica”, “versicolor”):

#view first six rows ofirisdataset head(iris) # Sepal.Length Sepal.Width Petal.Length Petal.Width Species #1 5.1 3.5 1.4 0.2 setosa #2 4.9 3.0 1.4 0.2 setosa #3 4.7 3.2 1.3 0.2 setosa #4 4.6 3.1 1.5 0.2 setosa #5 5.0 3.6 1.4 0.2 setosa #6 5.4 3.9 1.7 0.4 setosa

Suppose we want to know if species has any effect on sepal length and sepal width. Using *species* as the independent variable, and *sepal length* and *sepal width* as the response variables, we can conduct a one-way MANOVA using the **manova() **function in R.

The **manova()** function uses the following syntax:

**manova(cbind(rv1, rv2, …) ~ iv, data)**

where:

**rv1, rv2**: response variable 1, response variable 2, etc.**iv**: independent variable**data**: name of the data frame

In our example with the iris dataset, we can fit a MANOVA and view the results using the following syntax:

#fit the MANOVA model model #view the results summary(model) # Df Pillai approx F num Df den Df Pr(>F) #Species 2 0.94531 65.878 4 294

From the output we can see that the F-statistic is 65.878 and the corresponding p-value is extremely small. This indicates that there is a statistically significant difference in sepal measurements based on species.

Technical Note:By default, manova() uses thePillaitest statistic. Since the distribution of this test statistic is complex, an approximate F value is also provided for easier interpretation.

In addition, it’s possible to specify “Roy”, “Hotelling-Lawley”, or “Wilks” as the test statistic to be used by using the following syntax: summary(model, test = ‘Wilks’)

To find out exactly how both *sepal length *and *sepal width *are affected by *species*, we can perform univariate ANOVAs using** summary.aov() **as shown in the following code:

summary.aov(model) # Response Sepal.Length : # Df Sum Sq Mean Sq F value Pr(>F) #Species 2 63.212 31.606 119.26 F) #Species 2 11.345 5.6725 49.16

We can see from the output that the p-values for both univariate ANOVAs are extremely small (species has a statistically significant effect on both *sepal width *and *sepal length*.

**Visualizing Group Means**

It can also be helpful to visualize the group means for each level of our independent variable *species* to gain a better understanding of our results.

For example, we can use the **gplots **library and the **plotmeans() **function to visualize the mean *sepal length* by *species*:

#loadgplotslibrary library(gplots) #visualize meansepal lengthbyspeciesplotmeans(iris$Sepal.Length ~ iris$Species)

From the plot we can see that the mean sepal length varies quite a bit by species. This matches the results from our MANOVA, which told us that there was a statistically significant difference in sepal measurements based on species.

We can also visualize the mean *sepal width *by *species*:

plotmeans(iris$Sepal.Width ~ iris$Species)

View the full RDocumentation for the **manova()** function here.