How to Calculate Standard Deviation by Group in R (With Examples)

by Tutor Aspire

You can use one of the following methods to calculate the standard deviation by group in R:

Method 1: Use base R

aggregate(df$col_to_aggregate, list(df$col_to_group_by), FUN=sd) 

Method 2: Use dplyr

library(dplyr)

df %>%
  group_by(col_to_group_by) %>%
  summarise_at(vars(col_to_aggregate), list(name=sd))

Method 3: Use data.table

library(data.table)

setDT(df)

df[ , list(sd=sd(col_to_aggregate)), by=col_to_group_by]

The following examples show how to use each of these methods in practice with the following data frame in R:

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=6),
                 points=c(8, 10, 12, 12, 14, 15, 10, 11, 12,
                          18, 22, 24, 3, 5, 5, 6, 7, 9))

#view data frame
df

   team points
1     A      8
2     A     10
3     A     12
4     A     12
5     A     14
6     A     15
7     B     10
8     B     11
9     B     12
10    B     18
11    B     22
12    B     24
13    C      3
14    C      5
15    C      5
16    C      6
17    C      7
18    C      9

Method 1: Calculate Standard Deviation by Group Using Base R

The following code shows how to use the aggregate() function from base R to calculate the standard deviation of points scored by team:

#calculate standard deviation of points by team
aggregate(df$points, list(df$team), FUN=sd)

  Group.1        x
1       A 2.562551
2       B 6.013873
3       C 2.041241
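aggregate() also accepts a formula, which names the output columns after your variables instead of the generic Group.1 and x. A minimal sketch using the same data (the object name sd_by_team is just an illustrative choice):

```r
# Recreate the example data frame
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 6),
                 points = c(8, 10, 12, 12, 14, 15, 10, 11, 12,
                            18, 22, 24, 3, 5, 5, 6, 7, 9))

# Formula interface: "points grouped by team"; the result's columns are
# named "team" and "points" rather than "Group.1" and "x"
sd_by_team <- aggregate(points ~ team, data = df, FUN = sd)
sd_by_team
```

The values are the same as above; only the column labels change.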

Method 2: Calculate Standard Deviation by Group Using dplyr

The following code shows how to use the group_by() and summarise_at() functions from the dplyr package to calculate the standard deviation of points scored by team:

library(dplyr) 

#calculate standard deviation of points scored by team 
df %>%
  group_by(team) %>%
  summarise_at(vars(points), list(name=sd))

# A tibble: 3 x 2
  team   name
  <chr> <dbl>
1 A      2.56
2 B      6.01
3 C      2.04
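Since dplyr 1.0.0, the scoped verbs such as summarise_at() have been superseded by across(). For a single column, a plain summarise() call is enough and lets you choose the output column name directly. A minimal sketch, where the column name sd_points is an illustrative choice:

```r
library(dplyr)

# Recreate the example data frame
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 6),
                 points = c(8, 10, 12, 12, 14, 15, 10, 11, 12,
                            18, 22, 24, 3, 5, 5, 6, 7, 9))

# summarise() computes one row per group; no vars()/list() wrapper is needed
sd_by_team <- df %>%
  group_by(team) %>%
  summarise(sd_points = sd(points))

sd_by_team
```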

Method 3: Calculate Standard Deviation by Group Using data.table

The following code shows how to calculate the standard deviation of points scored by team using functions from the data.table package:

library(data.table) 

#convert data frame to data table 
setDT(df)

#calculate standard deviation of points scored by team 
df[ ,list(sd=sd(points)), by=team]

   team       sd
1:    A 2.562551
2:    B 6.013873
3:    C 2.041241
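The j expression in data.table can compute several summaries in one pass. The sketch below adds the group mean and the group size (.N, the number of rows in each group) alongside the standard deviation; the column names mean, sd and n are illustrative choices:

```r
library(data.table)

# Build the example data directly as a data.table
dt <- data.table(team = rep(c('A', 'B', 'C'), each = 6),
                 points = c(8, 10, 12, 12, 14, 15, 10, 11, 12,
                            18, 22, 24, 3, 5, 5, 6, 7, 9))

# .() is data.table shorthand for list(); groups appear in order of first appearance
stats_by_team <- dt[, .(mean = mean(points), sd = sd(points), n = .N), by = team]
stats_by_team
```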

Notice that all three methods return the same standard deviations; the dplyr tibble simply prints them rounded to fewer digits.
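All three methods call R's sd(), which computes the sample standard deviation: the square root of the sum of squared deviations from the mean, divided by n - 1 rather than n. A short base-R sketch that recomputes each team's value from that definition and compares it with sd(); the helper name sample_sd is hypothetical:

```r
# Recreate the example data frame
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 6),
                 points = c(8, 10, 12, 12, 14, 15, 10, 11, 12,
                            18, 22, 24, 3, 5, 5, 6, 7, 9))

# Sample standard deviation from its definition (n - 1 in the denominator)
sample_sd <- function(x) sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# Apply the hand-written version and the built-in sd() to each team's points
manual  <- tapply(df$points, df$team, sample_sd)
builtin <- tapply(df$points, df$team, sd)

manual
all.equal(manual, builtin)
```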

Note: If you’re working with a very large data frame, the dplyr or data.table approach is generally preferred, since these packages tend to be considerably faster than base R’s aggregate(). data.table in particular is built for fast grouped operations on large data.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Calculate the Mean by Group in R
How to Calculate the Sum by Group in R
How to Calculate Quantiles by Group in R
