The Jarque-Bera test is a goodness-of-fit test that determines whether or not sample data have skewness and kurtosis that match a normal distribution.
The test statistic of the Jarque-Bera test is always non-negative. A value far from zero indicates that the sample data do not have a normal distribution.
The test statistic JB is defined as:
$$JB = \frac{n-k+1}{6}\left[S^2 + \frac{(C-3)^2}{4}\right]$$
where n is the number of observations in the sample, k is the number of regressors (k=1 if not used in the context of regression), S is the sample skewness, and C is the sample kurtosis.
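Under the null hypothesis of normality, JB approximately follows (for large samples) a chi-squared distribution with 2 degrees of freedom: $JB \sim \chi^2(2)$.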
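To make the formula concrete, here is a minimal sketch that computes JB by hand in base R. The helper name `jb_manual` and its default `k = 1` are assumptions made for illustration and are not part of any package. S and C are computed from central moments with divisor n, which is the convention assumed here.

```r
# Hypothetical helper (not from any package): computes the JB statistic
# directly from the formula above, using moment-based skewness and kurtosis.
jb_manual <- function(x, k = 1) {
  n  <- length(x)
  m  <- mean(x)
  m2 <- mean((x - m)^2)            # second central moment (divisor n)
  S  <- mean((x - m)^3) / m2^1.5   # sample skewness
  C  <- mean((x - m)^4) / m2^2     # sample kurtosis (3 for a normal distribution)
  jb <- ((n - k + 1) / 6) * (S^2 + 0.25 * (C - 3)^2)
  # Upper-tail probability of a chi-squared distribution with 2 degrees of freedom
  p  <- pchisq(jb, df = 2, lower.tail = FALSE)
  list(statistic = jb, p.value = p, skewness = S, kurtosis = C)
}

res <- jb_manual(c(1, 2, 3, 4, 5))
res$statistic
res$p.value
```

With k = 1 the factor (n-k+1)/6 reduces to n/6, so the statistic should agree with the one reported by jarque.bera.test() from the tseries package, up to floating-point error.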
This tutorial explains how to conduct a Jarque-Bera test in R.
Jarque-Bera test in R
To conduct a Jarque-Bera test for a sample dataset, we can use the tseries package:
#install (if not already installed) and load tseries package
if(!require(tseries)){install.packages('tseries'); library(tseries)}

#generate a vector of 100 normally distributed random values
dataset <- rnorm(100)

#conduct Jarque-Bera test
jarque.bera.test(dataset)
Because the data are generated randomly, the exact numbers will differ from run to run. In the run reported here, the test statistic is 0.67446 and the p-value of the test is 0.7137. Since the p-value is well above 0.05, we fail to reject the null hypothesis that the data are normally distributed.
This result shouldn’t be surprising, since the dataset we generated consists of 100 values drawn from a normal distribution.
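If you want to use the result in code rather than read it off the console, the object returned by jarque.bera.test() can be stored and inspected. The sketch below assumes the tseries package can be installed and loaded. The seed value and the variable names `result`, `jb_stat`, `p_value`, and `reject_normality` are arbitrary choices for illustration, and the 0.05 significance level is an assumed convention.

```r
# Load tseries, installing it first if it is missing
if (!requireNamespace("tseries", quietly = TRUE)) install.packages("tseries")
library(tseries)

# Fix the seed (arbitrary value) so the same sample is drawn on every run
set.seed(1)
dataset <- rnorm(100)

# Store the test result instead of only printing it
result <- jarque.bera.test(dataset)

jb_stat <- unname(result$statistic)   # the JB test statistic
p_value <- result$p.value             # p-value from the chi-squared(2) reference
reject_normality <- p_value < 0.05    # TRUE if normality is rejected at the 5% level

print(jb_stat)
print(p_value)
print(reject_normality)
```

Setting a seed is what makes the numbers repeatable. Without it, each run draws a new sample and reports a different statistic and p-value.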
Now suppose we instead generate a dataset of 100 uniformly distributed random values:
#install (if not already installed) and load tseries package
if(!require(tseries)){install.packages('tseries'); library(tseries)}

#generate a vector of 100 uniformly distributed random values
dataset <- runif(100)

#conduct Jarque-Bera test
jarque.bera.test(dataset)
Again, the exact numbers will differ from run to run. In the run reported here, the test statistic is 8.0807 and the p-value of the test is 0.01759. Since the p-value is below 0.05, we reject the null hypothesis that the data are normally distributed. We have sufficient evidence to say that the data in this example are not normally distributed.
This result shouldn’t be surprising, since the dataset we generated consists of 100 values drawn from a uniform distribution, not a normal one.
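One way to see what the test is picking up on: a uniform distribution is symmetric, so its skewness is 0, but its kurtosis is 1.8, well below the value of 3 for a normal distribution. The sketch below estimates both quantities in base R from a large simulated sample. The seed and the sample size are arbitrary assumptions, and the variable names are illustrative.

```r
# Arbitrary seed and a large sample so the estimates are stable
set.seed(42)
x <- runif(100000)

m  <- mean(x)
m2 <- mean((x - m)^2)            # second central moment
S  <- mean((x - m)^3) / m2^1.5   # sample skewness, expected to be near 0
C  <- mean((x - m)^4) / m2^2     # sample kurtosis, expected to be near 1.8

print(S)
print(C)
```

Since the skewness term is close to zero, nearly all of the JB statistic for uniform data comes from the (C-3)^2 term.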