You can use the following syntax to create a categorical variable in R:
#create categorical variable from scratch
cat_variable <- factor(c('A', 'B', 'C', 'D'))

#create categorical variable (with two possible values) from existing variable
cat_variable <- factor(ifelse(existing_variable < 4, 1, 0))

#create categorical variable (with multiple possible values) from existing variable
cat_variable <- factor(ifelse(existing_variable < 3, 'A',
                       ifelse(existing_variable < 4, 'B',
                       ifelse(existing_variable < 5, 'C',
                       ifelse(existing_variable < 6, 'D', 0)))))
The following examples show how to use this syntax in practice.
Example 1: Create a Categorical Variable from Scratch
The following code shows how to create a categorical variable from scratch:
#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 10, 12),
                 var4=c(14, 16, 22, 19, 18))

#view data frame
df

  var1 var2 var3 var4
1    1    7    3   14
2    3    7    3   16
3    3    8    6   22
4    4    3   10   19
5    5    2   12   18

#add categorical variable named 'type' to data frame
df$type <- factor(c('A', 'B', 'B', 'C', 'D'))

#view updated data frame
df

  var1 var2 var3 var4 type
1    1    7    3   14    A
2    3    7    3   16    B
3    3    8    6   22    B
4    4    3   10   19    C
5    5    2   12   18    D
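To confirm that the new column is stored as a factor rather than as plain text, the base R functions is.factor(), levels(), and table() can help. The following is a minimal sketch that rebuilds the data frame from Example 1; the names is_fac, type_levels, and type_counts are chosen here only for illustration:

```r
#recreate the data frame from Example 1
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 10, 12),
                 var4=c(14, 16, 22, 19, 18))

#add categorical variable named 'type' to data frame
df$type <- factor(c('A', 'B', 'B', 'C', 'D'))

#check that 'type' is stored as a factor
is_fac <- is.factor(df$type)

#list the distinct categories (levels) of 'type'
type_levels <- levels(df$type)

#count how many rows fall into each category
type_counts <- table(df$type)

print(is_fac)
print(type_levels)
print(type_counts)
```

In this data frame, ‘type’ has the four levels A, B, C, and D, and B is the only category that appears twice.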
Example 2: Create a Categorical Variable (with Two Values) from Existing Variable
The following code shows how to create a categorical variable from an existing variable in a data frame:
#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 10, 12),
                 var4=c(14, 16, 22, 19, 18))

#view data frame
df

  var1 var2 var3 var4
1    1    7    3   14
2    3    7    3   16
3    3    8    6   22
4    4    3   10   19
5    5    2   12   18

#add categorical variable named 'type' using values from 'var1' column
df$type <- factor(ifelse(df$var1 < 4, 1, 0))

#view updated data frame
df

  var1 var2 var3 var4 type
1    1    7    3   14    1
2    3    7    3   16    1
3    3    8    6   22    1
4    4    3   10   19    0
5    5    2   12   18    0
Using the ifelse() function, we created a new categorical variable called “type” that takes the following values:
- 1 if the value in the ‘var1’ column is less than 4.
- 0 if the value in the ‘var1’ column is not less than 4.
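The values 1 and 0 can be hard to interpret later on. One option is to attach descriptive labels through the levels and labels arguments of factor(). The sketch below assumes the same ‘var1’ values as above; the label names ‘low’ and ‘high’ and the helper variable flag are hypothetical and chosen only for illustration:

```r
#recreate the 'var1' column used in Example 2
df <- data.frame(var1=c(1, 3, 3, 4, 5))

#build the 1/0 indicator first
flag <- ifelse(df$var1 < 4, 1, 0)

#convert to a factor, mapping 0 to 'high' and 1 to 'low' (hypothetical labels)
df$type <- factor(flag, levels=c(0, 1), labels=c('high', 'low'))

print(df)
```

Listing the levels explicitly also fixes their order, which affects how the categories are arranged in tables and plots.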
Example 3: Create a Categorical Variable (with Multiple Values) from Existing Variable
The following code shows how to create a categorical variable (with multiple values) from an existing variable in a data frame:
#create data frame
df <- data.frame(var1=c(1, 3, 3, 4, 5),
                 var2=c(7, 7, 8, 3, 2),
                 var3=c(3, 3, 6, 10, 12),
                 var4=c(14, 16, 22, 19, 18))

#view data frame
df

  var1 var2 var3 var4
1    1    7    3   14
2    3    7    3   16
3    3    8    6   22
4    4    3   10   19
5    5    2   12   18

#add categorical variable named 'type' using values from 'var1' column
df$type <- factor(ifelse(df$var1 < 3, 'A',
                  ifelse(df$var1 < 4, 'B',
                  ifelse(df$var1 < 5, 'C',
                  ifelse(df$var1 < 6, 'D', 'E')))))

#view updated data frame
df

  var1 var2 var3 var4 type
1    1    7    3   14    A
2    3    7    3   16    B
3    3    8    6   22    B
4    4    3   10   19    C
5    5    2   12   18    D
Using nested ifelse() calls, we created a new categorical variable called “type” that takes the following values:
- ‘A’ if the value in the ‘var1’ column is less than 3.
- Else, ‘B’ if the value in the ‘var1’ column is less than 4.
- Else, ‘C’ if the value in the ‘var1’ column is less than 5.
- Else, ‘D’ if the value in the ‘var1’ column is less than 6.
- Else, ‘E’.
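Deeply nested ifelse() calls become hard to read as the number of categories grows. As an alternative (not the approach used in the example above), base R's cut() function can bin a numeric variable in a single call. The following is a minimal sketch, assuming the same cutoffs of 3, 4, 5, and 6; the column names type_nested and type_cut are chosen only for illustration:

```r
#recreate the 'var1' column used in Example 3
df <- data.frame(var1=c(1, 3, 3, 4, 5))

#nested ifelse() version, as in Example 3
df$type_nested <- factor(ifelse(df$var1 < 3, 'A',
                         ifelse(df$var1 < 4, 'B',
                         ifelse(df$var1 < 5, 'C',
                         ifelse(df$var1 < 6, 'D', 'E')))))

#cut() version: right=FALSE makes each bin include its lower bound
#and exclude its upper bound, which matches the "less than" rules
df$type_cut <- cut(df$var1,
                   breaks=c(-Inf, 3, 4, 5, 6, Inf),
                   labels=c('A', 'B', 'C', 'D', 'E'),
                   right=FALSE)

#both approaches assign the same category to every row
same_labels <- all(as.character(df$type_nested) == as.character(df$type_cut))

print(df)
print(same_labels)
```

One difference to keep in mind: cut() keeps ‘E’ as a level even though no row falls into it, whereas the factor built from ifelse() only has levels for the categories that actually occur.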