
In statistics, a **z-score** tells us how many standard deviations away a value is from the mean. We use the following formula to calculate a z-score:

**z** = (X – μ) / σ

where:

- X is a single raw data value
- μ is the population mean
- σ is the population standard deviation

This tutorial explains how to calculate z-scores for raw data values in R.
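The formula translates directly into R. The sketch below wraps it in a hypothetical helper, `z_score()`, which is an illustration and not a base R function. One detail is worth knowing before the examples: R's `sd()` returns the *sample* standard deviation (it divides by n – 1), while the formula above is written with the population σ (which divides by n). The assumed `population` argument lets you choose either. For a vector of n values, the population z-scores are larger in magnitude by a factor of √(n/(n – 1)).

```r
# Hypothetical helper (not part of base R) that applies z = (X - mu) / sigma
# to every value in a numeric vector.
# population = FALSE uses sd(), the sample standard deviation (divides by n - 1);
# population = TRUE divides by n, matching the population sigma in the formula.
z_score <- function(x, population = FALSE) {
  mu <- mean(x)
  sigma <- if (population) {
    sqrt(sum((x - mu)^2) / length(x))
  } else {
    sd(x)
  }
  (x - mu) / sigma
}

x <- c(6, 7, 7, 12, 13, 13, 15, 16, 19, 22)

z_sample <- z_score(x)                     # sample SD, as sd() does
z_pop    <- z_score(x, population = TRUE)  # population SD, as in the formula
```

The examples in this tutorial all use `sd()`, so their results correspond to `population = FALSE`.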

**Example 1: Find Z-Scores for a Single Vector**

The following code shows how to find the z-score for every raw data value in a vector:

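```r
#create vector of data
data <- c(6, 7, 7, 12, 13, 13, 15, 16, 19, 22)

#find z-score for each data value (sd() is the sample standard deviation)
z_scores <- (data - mean(data)) / sd(data)

#display z-scores
z_scores
```

```
 [1] -1.3228757 -1.1338934 -1.1338934 -0.1889822  0.0000000  0.0000000
 [7]  0.3779645  0.5669467  1.1338934  1.7008401
```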

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

- The first raw data value of “6” is **1.323** standard deviations *below* the mean.
- The fifth raw data value of “13” is **0** standard deviations away from the mean, i.e. it is equal to the mean.
- The last raw data value of “22” is **1.701** standard deviations *above* the mean.
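To see where these numbers come from, the sketch below works through the formula by hand for the first, fifth, and last values of the vector from Example 1. For this vector the mean is 13 and `sd()` returns √28, roughly 5.2915.

```r
data <- c(6, 7, 7, 12, 13, 13, 15, 16, 19, 22)

m <- mean(data)  # 13
s <- sd(data)    # sqrt(28), roughly 5.2915 (sample standard deviation)

z_first <- (data[1] - m) / s   # (6 - 13) / 5.2915, roughly -1.323
z_fifth <- (data[5] - m) / s   # 13 equals the mean, so the z-score is 0
z_last  <- (data[10] - m) / s  # (22 - 13) / 5.2915, roughly 1.701
```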

**Example 2: Find Z-Scores for a Single Column in a DataFrame**

The following code shows how to find the z-score for every raw data value in a single column of a dataframe:

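```r
#create dataframe
df <- data.frame(assists = c(4, 4, 6, 7, 9, 13),
                 points = c(24, 29, 13, 15, 19, 22),
                 rebounds = c(9, 9, 11, 12, 18, 19))

#find z-score for each data value in the 'points' column
z_scores <- (df$points - mean(df$points)) / sd(df$points)

#display z-scores
z_scores
```

```
[1]  0.6191904  1.4635409 -1.2383807 -0.9006405 -0.2251601  0.2814502
```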

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

- The first raw data value of “24” is **0.619** standard deviations *above* the mean.
- The second raw data value of “29” is **1.464** standard deviations *above* the mean.
- The third raw data value of “13” is **1.238** standard deviations *below* the mean.

And so on.
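In practice you often want the z-scores stored next to the raw values rather than in a separate vector. A minimal sketch, assuming it is acceptable to add a column to the data frame; the column name `points_z` is purely illustrative, and the data frame here holds only the `points` values from Example 2.

```r
df <- data.frame(points = c(24, 29, 13, 15, 19, 22))

# Add the z-scores as a new column (the name points_z is illustrative)
df$points_z <- (df$points - mean(df$points)) / sd(df$points)

# View each raw value alongside its z-score
df
```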

**Example 3: Find Z-Scores for Every Column in a DataFrame**

The following code shows how to find the z-score for every raw data value in every column of a dataframe using the `sapply()` function, which applies the z-score formula to each column and returns the results as a matrix:

```r
#create dataframe
df <- data.frame(assists = c(4, 4, 6, 7, 9, 13),
                 points = c(24, 29, 13, 15, 19, 22),
                 rebounds = c(9, 9, 11, 12, 18, 19))

#find z-scores of each column
sapply(df, function(x) (x - mean(x)) / sd(x))
```

```
         assists     points   rebounds
[1,] -0.92315712  0.6191904 -0.9035079
[2,] -0.92315712  1.4635409 -0.9035079
[3,] -0.34011052 -1.2383807 -0.4517540
[4,] -0.04858722 -0.9006405 -0.2258770
[5,]  0.53445939 -0.2251601  1.1293849
[6,]  1.70055260  0.2814502  1.3552619
```

The z-scores for each individual value are shown relative to the column they’re in. For example:

- The first value of “4” in the first column is **0.923** standard deviations *below* the mean value of its column.
- The first value of “24” in the second column is **0.619** standard deviations *above* the mean value of its column.
- The first value of “9” in the third column is **0.904** standard deviations *below* the mean value of its column.

And so on.
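Base R also has a built-in function for this, `scale()`, which by default centers each column on its mean and divides it by its sample standard deviation. The sketch below checks that it agrees with the `sapply()` approach. It assumes the data frame from Example 3; only the first values of `assists` and `rebounds` are stated in the text, so the remaining values in those columns are a reconstruction chosen to reproduce the z-scores shown above.

```r
df <- data.frame(assists = c(4, 4, 6, 7, 9, 13),
                 points = c(24, 29, 13, 15, 19, 22),
                 rebounds = c(9, 9, 11, 12, 18, 19))

# Column-wise z-scores with sapply(), as in Example 3
z_sapply <- sapply(df, function(x) (x - mean(x)) / sd(x))

# Same computation with scale(); it returns a matrix, not a data frame
z_scale <- scale(df)

# scale() records the column means and standard deviations it used
attr(z_scale, "scaled:center")
attr(z_scale, "scaled:scale")

# Convert back to a data frame if needed
z_df <- as.data.frame(z_scale)
```

One practical advantage of `scale()` is that the stored means and standard deviations let you see exactly which values were used for each column.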
