Quadratic discriminant analysis is a method you can use when you have a set of predictor variables and you'd like to classify a response variable into two or more classes. It is considered to be the non-linear equivalent to linear discriminant analysis.
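As background that the tutorial does not spell out, the "quadratic" in the name comes from the standard discriminant score used to pick a class. The notation below is an assumption introduced here for illustration: $\pi_k$ is the prior probability of class $k$, $\mu_k$ its mean vector, and $\Sigma_k$ its covariance matrix.

```latex
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
              - \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k)
              + \log\pi_k
```

An observation $x$ is assigned to the class with the largest $\delta_k(x)$. Because each class keeps its own $\Sigma_k$, the score is quadratic in $x$; linear discriminant analysis instead assumes one shared covariance matrix, which makes the quadratic term cancel between classes.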

This tutorial provides a step-by-step example of how to perform quadratic discriminant analysis in R.

**Step 1: Load Necessary Libraries**

First, we'll load the necessary libraries for this example:

    library(MASS)
    library(ggplot2)

**Step 2: Load the Data**

For this example, we'll use the built-in **iris** dataset in R. The following code shows how to load and view this dataset:

    #attach iris dataset to make it easy to work with
    attach(iris)

    #view structure of dataset
    str(iris)

    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 ...

We can see that the dataset contains 5 variables and 150 total observations.

For this example we'll build a quadratic discriminant analysis model to classify which species a given flower belongs to.

We'll use the following predictor variables in the model:

- Sepal.Length
- Sepal.Width
- Petal.Length
- Petal.Width

And we'll use them to predict the response variable *Species*, which takes on the following three potential classes:

- setosa
- versicolor
- virginica
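The dimensions and class labels described above can be checked directly. This is a minimal sketch that uses only base R and the built-in `iris` data frame; the variable names `dims` and `class_counts` are introduced here for illustration.

```r
# Number of rows (observations) and columns (variables)
dims <- dim(iris)
dims
# → 150 5

# The three classes of the response variable
levels(iris$Species)

# Observations per class; the full dataset has 50 flowers of each species
class_counts <- table(iris$Species)
class_counts
```

Because the classes are perfectly balanced in the full dataset, any imbalance seen later in the training set comes from the random split rather than from the data itself.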

**Step 3: Create Training and Test Samples**

Next, we'll split the dataset into a training set to train the model on and a testing set to test the model on:

    #make this example reproducible
    set.seed(1)

    #Use 70% of dataset as training set and remaining 30% as testing set
    sample <- sample(c(TRUE, FALSE), nrow(iris), replace=TRUE, prob=c(0.7,0.3))
    train <- iris[sample, ]
    test <- iris[!sample, ]
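A quick sanity check on the split can be useful before fitting anything. The sketch below assumes the split is made by drawing a logical vector with 70/30 probabilities and indexing `iris` with it, as the comments in the tutorial describe; it recreates the split so that it runs on its own.

```r
# Recreate the random 70/30 split
set.seed(1)
sample <- sample(c(TRUE, FALSE), nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[sample, ]
test <- iris[!sample, ]

# Every row lands in exactly one of the two sets
nrow(train) + nrow(test) == nrow(iris)

# Share of rows assigned to training: close to 0.7, but not exactly 0.7,
# because each row is assigned independently at random
nrow(train) / nrow(iris)

# Class counts in each set; a purely random split is not stratified,
# so the species are no longer perfectly balanced
table(train$Species)
table(test$Species)
```

Note that this approach fixes the probability of each row being in the training set, not the size of the training set, so the realized proportion varies with the seed.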

**Step 4: Fit the QDA Model**

Next, we'll use the qda() function from the **MASS** package to fit the QDA model to our data:

    #fit QDA model
    model <- qda(Species ~ ., data = train)

    #view model output
    model

    Call:
    qda(Species ~ ., data = train)

    Prior probabilities of groups:
        setosa versicolor  virginica
     0.3207547  0.3207547  0.3584906

    Group means:
               Sepal.Length Sepal.Width Petal.Length Petal.Width
    setosa         4.982353    3.411765     1.482353   0.2411765
    versicolor     5.994118    2.794118     4.358824   1.3676471
    virginica      6.636842    2.973684     5.592105   2.0552632

Here is how to interpret the output of the model:

**Prior probabilities of groups:** These represent the proportions of each Species in the training set. For example, 35.8% of all observations in the training set were of species *virginica*.

**Group means:** These display the mean values for each predictor variable for each species.
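Both parts of the output can be reproduced by hand, which makes the interpretation concrete. The sketch below assumes the same seeded 70/30 split described earlier and relies on the `prior` and `means` components of the fitted `qda` object; `manual_prior` and `manual_means` are names introduced here for illustration.

```r
library(MASS)

# Recreate the training set and fit the model
set.seed(1)
sample <- sample(c(TRUE, FALSE), nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[sample, ]
model <- qda(Species ~ ., data = train)

# Prior probabilities are the class proportions in the training set
manual_prior <- prop.table(table(train$Species))
prior_ok <- all.equal(unname(model$prior), as.numeric(manual_prior))
prior_ok

# Group means are the per-species averages of each predictor
manual_means <- sapply(train[, 1:4], function(col) tapply(col, train$Species, mean))
means_ok <- all.equal(model$means, manual_means, check.attributes = FALSE)
means_ok
```

The priors default to the observed class proportions; `qda()` also accepts a `prior` argument if different values are more appropriate for the population of interest.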

**Step 5: Use the Model to Make Predictions**

Once we've fit the model using our training data, we can use it to make predictions on our test data:

    #use QDA model to make predictions on test data
    predicted <- predict(model, test)

    names(predicted)

    [1] "class"     "posterior"

This returns a list with two variables:

- **class:** The predicted class
- **posterior:** The posterior probability that an observation belongs to each class

We can quickly view each of these results for the first six observations in our test dataset:

    #view predicted class for first six observations in test set
    head(predicted$class)

    [1] setosa setosa setosa setosa setosa setosa
    Levels: setosa versicolor virginica

    #view posterior probabilities for first six observations in test set
    head(predicted$posterior)

       setosa   versicolor    virginica
    4       1 7.224770e-20 1.642236e-29
    6       1 6.209196e-26 8.550911e-38
    7       1 1.248337e-21 8.132700e-32
    15      1 2.319705e-35 5.094803e-50
    17      1 1.396840e-29 9.586504e-43
    18      1 7.581165e-25 8.611321e-37
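The two components are linked: the predicted class is the one with the largest posterior probability. The following sketch checks that relationship, assuming the same seeded split and `qda()` fit used in the earlier steps; `row_sums` and `top_class` are illustrative names, not part of the MASS API.

```r
library(MASS)

# Recreate the split, fit the model, and predict on the test set
set.seed(1)
sample <- sample(c(TRUE, FALSE), nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[sample, ]
test <- iris[!sample, ]
model <- qda(Species ~ ., data = train)
predicted <- predict(model, test)

# Each row of posterior probabilities sums to 1
row_sums <- rowSums(predicted$posterior)
all.equal(unname(row_sums), rep(1, nrow(test)))

# The predicted class is the column holding the largest posterior
top_class <- colnames(predicted$posterior)[apply(predicted$posterior, 1, which.max)]
all(top_class == as.character(predicted$class))
```

The second check could in principle fail if two posteriors were almost exactly tied, but that is unlikely for well-separated classes like these.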

**Step 6: Evaluate the Model**

We can use the following code to see the percentage of test observations for which the QDA model correctly predicted the Species:

    #find accuracy of model
    mean(predicted$class==test$Species)

    [1] 1

It turns out that the model correctly predicted the Species for **100%** of the observations in our test dataset.
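A single accuracy figure hides which classes get confused with each other, so a confusion matrix is a natural companion to it. This is a minimal sketch using base R's `table()`, assuming the same seeded split and model as in the earlier steps; the name `cm` is introduced here for illustration.

```r
library(MASS)

# Recreate the split, fit the model, and predict on the test set
set.seed(1)
sample <- sample(c(TRUE, FALSE), nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[sample, ]
test <- iris[!sample, ]
model <- qda(Species ~ ., data = train)
predicted <- predict(model, test)

# Rows are predicted classes, columns are actual classes
cm <- table(Predicted = predicted$class, Actual = test$Species)
cm

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

Any off-diagonal count in `cm` is a misclassification, and its position shows which species was mistaken for which.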

In the real world, a QDA model will rarely predict every class outcome correctly, but the iris dataset is simply built in a way that machine learning algorithms tend to perform very well on it.

You can find the complete R code used in this tutorial here.