How to Calculate Z-Scores in R

by Tutor Aspire January 17, 2023

In statistics, a z-score tells us how many standard deviations away a value is from the mean. We use the following formula to calculate a z-score:

z = (X – μ) / σ

where:

X is a single raw data value
μ is the population mean
σ is the population standard deviation

This tutorial explains how to calculate z-scores for raw data values in R.
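One detail worth knowing before the examples: the formula above is written in terms of the population standard deviation, but R's built-in sd() function computes the sample standard deviation (it divides by n - 1 rather than n). The following sketch uses a small made-up vector (the values and the variable names are hypothetical, chosen only for illustration) to show how the two conventions differ:

```r
# Hypothetical values, used only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# sd() uses the sample formula (divides by n - 1)
sample_sd <- sd(x)

# population standard deviation (divides by n)
pop_sd <- sqrt(sum((x - mean(x))^2) / n)

# z-scores under each convention
z_sample <- (x - mean(x)) / sample_sd
z_pop <- (x - mean(x)) / pop_sd

# compare the two standard deviations
sample_sd
pop_sd
```

Because the sample standard deviation is slightly larger, z-scores based on sd() are slightly closer to zero than those based on the population formula. The examples in this tutorial use sd().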

Example 1: Find Z-Scores for a Single Vector

The following code shows how to find the z-score for every raw data value in a vector:

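#create vector of data
data <- c(6, 7, 7, 12, 13, 13, 15, 16, 19, 22)
#find z-score for each data value (sd() is the sample standard deviation)
z_scores <- (data - mean(data)) / sd(data)
#display z-scores
z_scores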

[1] -1.3228757 -1.1338934 -1.1338934 -0.1889822  0.0000000  0.0000000
[7]  0.3779645  0.5669467  1.1338934  1.7008401

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

The first raw data value of “6” is 1.323 standard deviations below the mean.
The fifth raw data value of “13” is 0 standard deviations away from the mean, i.e. it is equal to the mean.
The last raw data value of “22” is 1.701 standard deviations above the mean.
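As a cross-check, base R's scale() function centers and scales its input in one step. The sketch below assumes the vector holds the values 6, 7, 7, 12, 13, 13, 15, 16, 19, 22, which are consistent with the z-scores printed above; the names z_manual and z_scale are only illustrative.

```r
# Values assumed to match the vector used in this example
data <- c(6, 7, 7, 12, 13, 13, 15, 16, 19, 22)

# z-scores computed by hand
z_manual <- (data - mean(data)) / sd(data)

# scale() returns a one-column matrix with attributes,
# so as.vector() is used to get back a plain numeric vector
z_scale <- as.vector(scale(data))

# check that both approaches agree
all.equal(z_manual, z_scale)
```

Note that scale() also divides by the sample standard deviation by default, so it matches the manual calculation with sd().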

Example 2: Find Z-Scores for a Single Column in a DataFrame

The following code shows how to find the z-score for every raw data value in a single column of a dataframe:

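#create dataframe (only the 'points' column is needed here)
df <- data.frame(points = c(24, 29, 13, 15, 19, 22))
#find z-score for each data value in the 'points' column
z_scores <- (df$points - mean(df$points)) / sd(df$points)
#display z-scores
z_scores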

[1]  0.6191904  1.4635409 -1.2383807 -0.9006405 -0.2251601  0.2814502

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

The first raw data value of “24” is 0.619 standard deviations above the mean.
The second raw data value of “29” is 1.464 standard deviations above the mean.
The third raw data value of “13” is 1.238 standard deviations below the mean.

And so on.
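In practice it is often convenient to store the z-scores next to the raw values. The following sketch assumes the 'points' column holds 24, 29, 13, 15, 19, 22 (values consistent with the z-scores shown above); the column name points_z and the cutoff of 1 standard deviation are hypothetical choices made for illustration.

```r
# Values assumed to match the 'points' column in this example
df <- data.frame(points = c(24, 29, 13, 15, 19, 22))

# store the z-scores as a new column (points_z is an illustrative name)
df$points_z <- (df$points - mean(df$points)) / sd(df$points)

# view the raw values and z-scores side by side
df

# keep only rows more than 1 standard deviation from the mean
# (the cutoff of 1 is arbitrary)
df[abs(df$points_z) > 1, ]
```

With these values, the filter keeps the second and third rows (29 and 13), the only two values that lie more than one standard deviation from the mean.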

Example 3: Find Z-Scores for Every Column in a DataFrame

The following code shows how to find the z-score for every raw data value in every column of a data frame. The sapply() function applies the same z-score calculation to each column in turn:

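#create dataframe (assists and rebounds values are assumed; they reproduce the z-scores below)
df <- data.frame(assists = c(4, 4, 6, 7, 9, 13),
                 points = c(24, 29, 13, 15, 19, 22),
                 rebounds = c(9, 9, 11, 12, 18, 19))
#find z-scores of each column
sapply(df, function(x) (x - mean(x)) / sd(x))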

         assists     points   rebounds
[1,] -0.92315712  0.6191904 -0.9035079
[2,] -0.92315712  1.4635409 -0.9035079
[3,] -0.34011052 -1.2383807 -0.4517540
[4,] -0.04858722 -0.9006405 -0.2258770
[5,]  0.53445939 -0.2251601  1.1293849
[6,]  1.70055260  0.2814502  1.3552619

The z-scores for each individual value are shown relative to the column they’re in. For example:

The first value of “4” in the first column is 0.923 standard deviations below the mean value of its column.
The first value of “24” in the second column is 0.619 standard deviations above the mean value of its column.
The first value of “9” in the third column is 0.904 standard deviations below the mean value of its column.

And so on.
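The sapply() approach assumes every column is numeric. The sketch below shows one way to standardize only the numeric columns when a data frame also contains text. The 'team' column is hypothetical, and the numeric values are assumed (they are consistent with the z-scores shown above, but are not necessarily the original data).

```r
# 'team' is a hypothetical non-numeric column; numeric values are assumed
df <- data.frame(team = c("A", "B", "C", "D", "E", "F"),
                 assists = c(4, 4, 6, 7, 9, 13),
                 points = c(24, 29, 13, 15, 19, 22),
                 rebounds = c(9, 9, 11, 12, 18, 19))

# identify which columns are numeric
num_cols <- sapply(df, is.numeric)

# replace only the numeric columns with their z-scores
z_df <- df
z_df[num_cols] <- lapply(df[num_cols], function(x) (x - mean(x)) / sd(x))

# display the result
z_df

# cross-check against base R's scale(), ignoring its extra attributes
all.equal(as.matrix(z_df[num_cols]), scale(df[num_cols]),
          check.attributes = FALSE)
```

Using lapply() with column assignment keeps the result as a data frame and leaves the 'team' column untouched, whereas sapply() returns a matrix.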

