One error you may encounter in R is:
Error in randomForest.default(m, y, ...) : NA/NaN/Inf in foreign function call (arg 1)
This error typically occurs for one of two reasons:
- The dataset contains NA, NaN, or Inf values
- One of the variables in the dataset is a character variable
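To find out which of the two causes applies to a particular data frame, a quick check along the following lines may help. This is only a sketch in base R: the helper name find_rf_problems and the data frame df_check are hypothetical and are not part of the randomForest package or of this tutorial's data.

```r
#hypothetical helper: list the columns that could trigger the error
find_rf_problems <- function(data) {
  #columns stored as character
  is_chr <- vapply(data, is.character, logical(1))

  #columns containing NA or NaN, or numeric columns containing Inf or -Inf
  has_bad_value <- vapply(data, function(col) {
    anyNA(col) || (is.numeric(col) && any(is.infinite(col)))
  }, logical(1))

  list(character_cols = names(data)[is_chr],
       bad_value_cols = names(data)[has_bad_value])
}

#illustrative data (assumed for this sketch only)
df_check <- data.frame(y = c(1, 2, NA, 4),
                       g = c('a', 'b', 'a', 'b'),
                       z = c(0.5, Inf, 1.5, 2),
                       stringsAsFactors = FALSE)

problems <- find_rf_problems(df_check)

#names of character columns
problems$character_cols

#names of columns with NA, NaN, or Inf values
problems$bad_value_cols
```

Here g is reported as a character column, while y and z are reported for containing NA and Inf respectively.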
The easiest way to fix this error is to remove rows with missing data and convert character variables to factor variables:
#remove rows with missing values
df <- na.omit(df)

#convert all character variables to factor variables
library(dplyr)
df <- df %>% mutate_if(is.character, as.factor)
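Note that na.omit() drops rows containing NA or NaN but leaves Inf and -Inf untouched. One possible way to handle infinite values, sketched here in base R with a hypothetical data frame df_inf, is to convert them to NA first and then remove the affected rows.

```r
#illustrative data frame (assumed for this sketch, not the tutorial's data)
df_inf <- data.frame(y = c(10, 12, 15, 11),
                     x = c(1, Inf, 3, -Inf))

#na.omit() alone keeps the rows that contain Inf and -Inf
nrow(na.omit(df_inf))

#replace infinite values in numeric columns with NA
num_cols <- vapply(df_inf, is.numeric, logical(1))
df_inf[num_cols] <- lapply(df_inf[num_cols], function(col) {
  col[is.infinite(col)] <- NA
  col
})

#now na.omit() removes the affected rows
df_clean <- na.omit(df_inf)
df_clean
```

Whether dropping these rows is appropriate depends on the data; in some cases infinite values point to an upstream calculation problem (such as division by zero) that is better fixed at the source.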
This tutorial shares an example of how to fix this error in practice.
Related: How to Build Random Forests in R (Step-by-Step)
How to Reproduce the Error
Suppose we attempt to fit a random forest to the following data frame in R:
library(randomForest)
#create data frame
#x1 holds ten character values; only the first four appear in the source
df <- data.frame(y = c(30, 29, 30, 45, 23, 19, 9, 8, 11, 14),
                 x1 = c('A', 'A', 'B', 'B', ...),
                 x2 = c(4, 4, 5, 7, 8, 7, 9, 6, 13, 15))

#attempt to fit random forest model
model <- randomForest(formula = y ~ ., data = df)
We receive an error because x1 is a character variable in the data frame.
We can confirm this by using the str() function to view the structure of the data frame:
str(df)
'data.frame': 10 obs. of 3 variables:
 $ y : num 30 29 30 45 23 19 9 8 11 14
 $ x1: chr "A" "A" "B" "B" ...
 $ x2: num 4 4 5 7 8 7 9 6 13 15
How to Fix the Error
To fix this error, we can use the mutate_if() function from dplyr to convert each character column to a factor column:
library(dplyr)
#convert each character column to factor
df <- df %>% mutate_if(is.character, as.factor)
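If dplyr is not available, the same conversion can be sketched in base R. The data frame df_demo below is hypothetical and used only for illustration. In dplyr 1.0.0 and later, mutate_if() is superseded by across(), so mutate(across(where(is.character), as.factor)) is an equivalent spelling.

```r
#illustrative data frame (assumed for this sketch only)
df_demo <- data.frame(y = c(30, 29, 45, 23),
                      grp = c('A', 'A', 'B', 'B'),
                      x = c(4, 4, 7, 8),
                      stringsAsFactors = FALSE)

#identify character columns
chr_cols <- vapply(df_demo, is.character, logical(1))

#convert each character column to factor, leaving other columns unchanged
df_demo[chr_cols] <- lapply(df_demo[chr_cols], as.factor)

#check the class of each column
sapply(df_demo, class)
```

After the conversion, grp is a factor with levels A and B, while y and x remain numeric.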
We can then fit the random forest model to the data frame:
#fit random forest model
model <- randomForest(formula = y ~ ., data = df)

#view summary of model
model
Call:
randomForest(formula = y ~ ., data = df)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 1
Mean of squared residuals: 65.0047
% Var explained: 48.64
We don’t receive any error this time because there are no longer any character variables in the data frame.
Additional Resources
The following tutorials explain how to address other common errors in R:
How to Fix: the condition has length > 1 and only the first element will be used
How to Fix in R: dim(X) must have a positive length
How to Fix in R: missing value where true/false needed
How to Fix: NAs Introduced by Coercion