You can use the following syntax to calculate lagged values by group in R using the dplyr package:
df %>%
group_by(var1) %>%
mutate(lag1_value = lag(var2, n=1, order_by=var1))
Note: The mutate() function adds a new variable to the data frame that contains the lagged values.
The following example shows how to use this syntax in practice.
Example: Calculate Lagged Values by Group Using dplyr
Suppose we have the following data frame in R that shows the sales made by two different stores during various days:
#create data frame df frame(store=c('A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'), sales=c(7, 12, 10, 9, 9, 11, 18, 23)) #view data frame df store sales 1 A 7 2 B 12 3 A 10 4 B 9 5 A 9 6 B 11 7 A 18 8 B 23
We can use the following code to create a new column that shows the lagged values of sales for each store:
library(dplyr) #calculate lagged sales by group df %>% group_by(store) %>% mutate(lag1_sales = lag(sales, n=1, order_by=store)) # A tibble: 8 x 3 # Groups: store [2] store sales lag1_sales 1 A 7 NA 2 B 12 NA 3 A 10 7 4 B 9 12 5 A 9 10 6 B 11 9 7 A 18 9 8 B 23 11
Here’s how to interpret the output:
- The first value of lag1_sales is NA because there is no previous value for sales for store A.
- The second value of lag1_sales is NA because there is no previous value for sales for store B.
- The third value of lag1_sales is 7 because this is the previous value for sales for store A.
- The fourth value of lag1_sales is 12 because this is the previous value for sales for store B.
And so on.
Note that you can also change the number of lags used by modifying the value for n in the lag() function.
Additional Resources
The following tutorials explain how to perform other common calculations in R:
How to Calculate a Cumulative Sum Using dplyr
How to Calculate the Sum by Group in R
How to Calculate the Mean by Group in R