How to Calculate Z-Scores in R

by Tutor Aspire

In statistics, a z-score tells us how many standard deviations away a value is from the mean. We use the following formula to calculate a z-score:

z = (X - μ) / σ

where:

  • X is a single raw data value
  • μ is the population mean
  • σ is the population standard deviation

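This tutorial explains how to calculate z-scores for raw data values in R. Because the population parameters are rarely known in practice, the examples estimate μ and σ with the sample mean and sample standard deviation returned by mean() and sd().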
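As a quick illustration of the formula itself, here is a minimal sketch that computes one z-score when the population mean and standard deviation are assumed to be known. The values of mu, sigma, and x are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# hypothetical population parameters (assumed known for this sketch)
mu <- 100     # population mean
sigma <- 15   # population standard deviation

# a single raw data value (also hypothetical)
x <- 130

# apply z = (X - mu) / sigma
z <- (x - mu) / sigma
z
# [1] 2
```

A result of 2 means the value lies two standard deviations above the mean.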

Example 1: Find Z-Scores for a Single Vector

The following code shows how to find the z-score for every raw data value in a vector:

#create vector of data
data <- c(6, 7, 7, 12, 13, 13, 15, 16, 19, 22)

#find z-score for each data value
z_scores <- (data - mean(data)) / sd(data)

#display z-scores
z_scores

[1] -1.3228757 -1.1338934 -1.1338934 -0.1889822  0.0000000  0.0000000
[7]  0.3779645  0.5669467  1.1338934  1.7008401

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

  • The first raw data value of “6” is 1.323 standard deviations below the mean.
  • The fifth raw data value of “13” is 0 standard deviations away from the mean, i.e. it is equal to the mean.
  • The last raw data value of “22” is 1.701 standard deviations above the mean.
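One way to double-check z-scores like these is base R's scale() function, which centers and scales a numeric vector in a single call. The sketch below assumes a ten-value vector whose first, fifth, and last entries are 6, 13, and 22; the variable names z_manual and z_scale are illustrative.

```r
# illustrative data vector (assumed values)
data <- c(6, 7, 7, 12, 13, 13, 15, 16, 19, 22)

# z-scores computed directly from the formula
z_manual <- (data - mean(data)) / sd(data)

# scale() returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.vector(scale(data))

# confirm that both approaches agree
all.equal(z_manual, z_scale)
# [1] TRUE
```

Both approaches divide by the sample standard deviation (n - 1 in the denominator), which is why they match.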

Example 2: Find Z-Scores for a Single Column in a DataFrame

The following code shows how to find the z-score for every raw data value in a single column of a dataframe:

#create dataframe
df <- data.frame(points = c(24, 29, 13, 15, 19, 22))

#find z-score for each data value in the 'points' column
z_scores <- (df$points - mean(df$points)) / sd(df$points)

#display z-scores
z_scores

[1]  0.6191904  1.4635409 -1.2383807 -0.9006405 -0.2251601  0.2814502

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

  • The first raw data value of “24” is 0.619 standard deviations above the mean.
  • The second raw data value of “29” is 1.464 standard deviations above the mean.
  • The third raw data value of “13” is 1.238 standard deviations below the mean.

And so on.
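In practice it is often convenient to keep the z-scores next to the raw values. The following sketch assumes a dataframe with a single 'points' column; the column name points_z and the cutoff of 1 standard deviation are hypothetical choices made for illustration.

```r
# illustrative dataframe (assumed values)
df <- data.frame(points = c(24, 29, 13, 15, 19, 22))

# store the z-scores as a new column alongside the raw values
df$points_z <- (df$points - mean(df$points)) / sd(df$points)

# view the raw values and their z-scores side by side
df

# keep only the rows that lie more than 1 standard deviation from the mean
far_from_mean <- df[abs(df$points_z) > 1, ]
far_from_mean
```

Here the filter keeps the rows with 29 and 13 points, the only two values more than one standard deviation from the column mean.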

Example 3: Find Z-Scores for Every Column in a DataFrame

The following code shows how to find the z-score for every raw data value in every column of a dataframe using the sapply() function.

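#create dataframe
df <- data.frame(assists = c(4, 4, 6, 7, 9, 13),
                 points = c(24, 29, 13, 15, 19, 22),
                 rebounds = c(9, 9, 11, 12, 18, 19))
#find z-scores of each column
sapply(df, function(x) (x - mean(x)) / sd(x))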

         assists     points   rebounds
[1,] -0.92315712  0.6191904 -0.9035079
[2,] -0.92315712  1.4635409 -0.9035079
[3,] -0.34011052 -1.2383807 -0.4517540
[4,] -0.04858722 -0.9006405 -0.2258770
[5,]  0.53445939 -0.2251601  1.1293849
[6,]  1.70055260  0.2814502  1.3552619

The z-scores for each individual value are shown relative to the column they’re in. For example:

  • The first value of “4” in the first column is 0.923 standard deviations below the mean value of its column.
  • The first value of “24” in the second column is 0.619 standard deviations above the mean value of its column.
  • The first value of “9” in the third column is 0.904 standard deviations below the mean value of its column.

And so on.
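The same column-wise standardization can also be sketched with scale(), which works on an all-numeric dataframe directly. The data values below are assumptions chosen to be consistent with the z-scores shown above, and z_df and z_sapply are illustrative names.

```r
# illustrative dataframe (assumed values)
df <- data.frame(assists = c(4, 4, 6, 7, 9, 13),
                 points = c(24, 29, 13, 15, 19, 22),
                 rebounds = c(9, 9, 11, 12, 18, 19))

# scale() standardizes every column; convert the resulting matrix back to a dataframe
z_df <- as.data.frame(scale(df))

# the same z-scores computed column by column with sapply()
z_sapply <- sapply(df, function(x) (x - mean(x)) / sd(x))

# confirm that both approaches agree, ignoring the extra attributes scale() attaches
all.equal(as.matrix(z_df), z_sapply, check.attributes = FALSE)

# after standardization each column has mean 0 and standard deviation 1 (up to rounding)
round(colMeans(z_df), 10)
sapply(z_df, sd)
```

Note that scale() only accepts numeric columns, so a dataframe with character or factor columns would need to be subset first.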

