**Dixon’s Q Test**, often referred to simply as the **Q Test**, is a statistical test that is used for detecting outliers in a dataset.

The test statistic for the Q test is as follows:

**Q** = |x_{a} – x_{b}| / R

where **x_{a}** is the suspected outlier, **x_{b}** is the data point closest to x_{a}, and **R** is the range of the dataset. In most cases, x_{a} is the maximum value in the dataset, but it can also be the minimum value.

Note that the Q test is typically performed on small datasets and assumes that the data is normally distributed. It should also be conducted only once for a given dataset.

**How to Conduct Dixon’s Q Test By Hand**

Suppose we have the following dataset:

**1, 3, 5, 7, 8, 9, 13, 25**

We can follow the standard five-step procedure for hypothesis testing to conduct Dixon’s Q Test by hand to determine if the maximum value in this dataset is an outlier:

**Step 1. State the hypotheses.**

The null hypothesis (H0): The max is not an outlier.

The alternative hypothesis (Ha): The max *is* an outlier.

**Step 2. Determine a significance level to use.**

Common choices are 0.1, 0.05, and 0.01. We will use a 0.05 level of significance for this example.

**Step 3. Find the test statistic.**

**Q** = |x_{a} – x_{b}| / R

In this case, our max value is x_{a} = 25, the value closest to it is x_{b} = 13, and our range is R = 25 – 1 = 24.

Thus, **Q** = |25 – 13| / 24 = **0.5**.
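The calculation in this step can be sketched in base R. The function name `dixon_q` and its `suspect` argument are made up for this illustration and are not part of any package. The sketch assumes a numeric vector with at least three values that are not all identical.

```r
# Hypothetical helper (not from any package): Dixon's Q statistic
# for the largest or smallest value of a numeric vector.
dixon_q <- function(x, suspect = c("max", "min")) {
  suspect <- match.arg(suspect)
  x <- sort(x)
  n <- length(x)
  R <- x[n] - x[1]                 # range of the dataset
  stopifnot(n >= 3, R > 0)

  # gap between the suspected outlier and its nearest neighbor
  gap <- if (suspect == "max") x[n] - x[n - 1] else x[2] - x[1]
  gap / R
}

data <- c(1, 3, 5, 7, 8, 9, 13, 25)
q_max <- dixon_q(data)             # |25 - 13| / 24
q_min <- dixon_q(data, "min")      # |1 - 3| / 24
print(q_max)
print(q_min)
```

Sorting first means the suspected value and its nearest neighbor are always the last two (or first two) elements, so no search for the closest point is needed.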

Next, we can compare this test statistic to the Q test critical values, which are shown below for various sample sizes (n) and confidence levels:

| n | 90% | 95% | 99% |
|---|---|---|---|
| 3 | 0.941 | 0.970 | 0.994 |
| 4 | 0.765 | 0.829 | 0.926 |
| 5 | 0.642 | 0.710 | 0.821 |
| 6 | 0.560 | 0.625 | 0.740 |
| 7 | 0.507 | 0.568 | 0.680 |
| 8 | 0.468 | 0.526 | 0.634 |
| 9 | 0.437 | 0.493 | 0.598 |
| 10 | 0.412 | 0.466 | 0.568 |
| 11 | 0.392 | 0.444 | 0.542 |
| 12 | 0.376 | 0.426 | 0.522 |
| 13 | 0.361 | 0.410 | 0.503 |
| 14 | 0.349 | 0.396 | 0.488 |
| 15 | 0.338 | 0.384 | 0.475 |
| 16 | 0.329 | 0.374 | 0.463 |
| 17 | 0.320 | 0.365 | 0.452 |
| 18 | 0.313 | 0.356 | 0.442 |
| 19 | 0.306 | 0.349 | 0.433 |
| 20 | 0.300 | 0.342 | 0.425 |
| 21 | 0.295 | 0.337 | 0.418 |
| 22 | 0.290 | 0.331 | 0.411 |
| 23 | 0.285 | 0.326 | 0.404 |
| 24 | 0.281 | 0.321 | 0.399 |
| 25 | 0.277 | 0.317 | 0.393 |
| 26 | 0.273 | 0.312 | 0.388 |
| 27 | 0.269 | 0.308 | 0.384 |
| 28 | 0.266 | 0.305 | 0.380 |
| 29 | 0.263 | 0.301 | 0.376 |
| 30 | 0.260 | 0.290 | 0.372 |

The critical value for a sample size of 8 and a confidence level of 95% is **0.526**.

**Step 4. Reject or fail to reject the null hypothesis.**

Since our test statistic Q (0.5) is less than the critical value (0.526), we fail to reject the null hypothesis.
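The comparison in Steps 3 and 4 can also be expressed as a short base R sketch. The vector `q_crit_95` is a name chosen for this illustration and holds only the 95% critical values for n = 3 through 10, copied from the table above, so the lookup works only for sample sizes in that range.

```r
# 95% critical values for n = 3..10, copied from the table in this article
q_crit_95 <- c(`3` = 0.970, `4` = 0.829, `5` = 0.710, `6` = 0.625,
               `7` = 0.568, `8` = 0.526, `9` = 0.493, `10` = 0.466)

data <- c(1, 3, 5, 7, 8, 9, 13, 25)
x <- sort(data)
n <- length(x)

# test statistic for the maximum value
q_stat <- (x[n] - x[n - 1]) / (x[n] - x[1])

# look up the critical value by sample size
q_crit <- q_crit_95[[as.character(n)]]

# reject H0 only if the statistic exceeds the critical value
reject <- q_stat > q_crit
print(c(Q = q_stat, critical = q_crit))
print(reject)
```

Here `reject` is `FALSE`, which corresponds to failing to reject the null hypothesis at the 0.05 level.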

**Step 5. Interpret the results.**

Since we failed to reject the null hypothesis, we conclude that the max value *25* is not an outlier in this dataset.

**How to Conduct Dixon’s Q Test in R**

To conduct Dixon’s Q Test on the same dataset in R, we can use the **dixon.test()** function from the **outliers** package, which uses the following syntax:

    dixon.test(data, type = 10, opposite = FALSE)

**data:** a numeric vector of data values

**type:** the formula used to compute the test statistic Q. Set to 10 to use the formula outlined earlier.

**opposite:** If FALSE (the default), the test checks the extreme value farthest from the sample mean, which is the maximum in this example. If TRUE, the test checks the extreme value at the opposite end instead, which is the minimum in this example.

*Note*: *The complete documentation for dixon.test() is available in the reference manual for the outliers package.*

The following code illustrates how to conduct Dixon’s Q Test to determine if the maximum value in the dataset is an outlier.

    #load the outliers library
    library(outliers)

    #create data
    data <- c(1, 3, 5, 7, 8, 9, 13, 25)

    #conduct Dixon's Q Test
    dixon.test(data, type = 10)

    #	Dixon test for outliers
    #
    #data:  data
    #Q = 0.5, p-value = 0.06913
    #alternative hypothesis: highest value 25 is an outlier

From the output we can see that the test statistic is Q = **0.5** and the corresponding p-value is **0.06913**. Because this p-value is greater than 0.05, we fail to reject the null hypothesis at a 0.05 significance level and conclude that *25* is not an outlier. This matches the result we got by hand.
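If the decision needs to be made in code rather than read from the printed output, the object returned by `dixon.test()` can be inspected directly. The sketch below assumes the outliers package is installed and relies on the result being a standard `htest` object with `statistic`, `p.value`, and `alternative` components. The variable names `result`, `low`, and `alpha` are chosen for this illustration.

```r
library(outliers)

data <- c(1, 3, 5, 7, 8, 9, 13, 25)
alpha <- 0.05                        # significance level used in this article

# test the maximum value, as in the example above
result <- dixon.test(data, type = 10)
q_value <- unname(result$statistic)  # the Q statistic
is_outlier <- result$p.value < alpha # TRUE would mean "reject H0"
print(q_value)
print(is_outlier)

# test the value at the other end of the data (the minimum here)
low <- dixon.test(data, type = 10, opposite = TRUE)
print(low$alternative)
```

Working with the components rather than the printed text makes it easier to apply the same check to several datasets in a loop.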