Have you ever encountered this error message while working on a dataset in R?
Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables
If you have, then you’re not alone. In this post, we’ll try to understand the reason behind this error and how to fix it. We’ll be using a dataset from a Coursera course as an example, so if you’re a student taking the same course or working with a similar dataset, you’ll find this blog post particularly helpful. Let’s dive in!
The Error Message and Traceback
Here’s the error message and traceback we’ll be working with:
Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables
5 stop("only defined on a data frame with all numeric variables")
4 FUN(X[[i]], ...)
3 lapply(args, function(x) {
      x <- as.matrix(x)
      if (!is.numeric(x) && !is.complex(x))
          stop("only defined on a data frame with all numeric variables") ...
2 Summary.data.frame(structure(list(Date = structure(c(279L, 285L, 291L, 297L, 303L, 315L, 321L, 327L, 333L, 339L, 345L, 357L, 363L, 369L, 375L, 387L, 393L, 399L, 405L, 417L, 423L, 429L, 435L, 441L, 447L, 453L, 477L, 501L, 555L, 561L, 567L, 573L, 579L, 585L, 591L, ...
1 corr("specdata")
At first glance, the error seems to occur when computing the correlation between the sulfate and nitrate columns. The traceback points elsewhere, though: frame 2 is Summary.data.frame, the method R dispatches to when a summary function such as sum(), min(), or max() is called on a whole data frame, and the message says such a call is only defined when every column is numeric. The data frame shown in that frame includes a Date column, which is not numeric. Since the dataset is from a Coursera course, it’s reasonable to assume that others have hit the same issue, but no mentions turned up in the discussion boards or online. That makes the function code itself the next place to look:
corr <- function(directory, threshold = 0) {
  vect1 <- numeric()
  files_list <- list.files(directory, full.names = TRUE)
  for (i in 1:332) {
    data <- read.csv(files_list[i])
    good <- complete.cases(data)
    complete_data <- data[good,]
    sulfate <- complete_data[,2]
    nitrate <- complete_data[,3]
    if (sum(complete_data) >= threshold) {
      b <- cor(sulfate,nitrate)
      vect1 <- rbind(b)
    } else vect1 <- (numeric())
  }
  return(vect1)
}
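One quick way to check whether non-numeric data is involved is to ask R which columns of a data frame are numeric. The following is a minimal sketch that uses a small made-up data frame in place of a real monitor file. The values are invented, and the stringsAsFactors setting is an assumption chosen to mimic the read.csv() default before R 4.0.0.

```r
# Hypothetical stand-in for one monitor file (values are made up).
sample_data <- data.frame(
  Date    = c("2003-01-01", "2003-01-07", "2003-01-13"),
  sulfate = c(3.2, NA, 4.1),
  nitrate = c(0.9, 1.1, NA),
  stringsAsFactors = TRUE  # assumption: mimic read.csv() before R 4.0.0
)

# TRUE for each column that is.numeric() accepts, FALSE otherwise
numeric_cols <- sapply(sample_data, is.numeric)
print(numeric_cols)

# Names of the columns that would make sum() on the whole data frame fail
non_numeric <- names(sample_data)[!numeric_cols]
print(non_numeric)
```

Running the same sapply() call on the result of read.csv() for one of the real files would show whether the Date column is the only non-numeric one.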
Identifying the Problem
One of the possible reasons for the error could be related to the following line of code:
if (sum(complete_data) >= threshold) {
This line attempts to take the sum of the data frame ‘complete_data’, which contains the non-numeric Date column shown in the traceback. sum() on a data frame is only defined when every column is numeric. One alternative is to count the rows of ‘complete_data’ instead, since each row is one complete observation. The modified line would look like this:
if (nrow(complete_data) >= threshold) {
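The difference between the two conditions can be reproduced on a tiny example. This sketch uses an invented three-row data frame (not the course data) and wraps the failing call in tryCatch() so the script keeps running. The exact wording of the error message can vary between R versions, so it is printed rather than assumed.

```r
# Made-up data: one non-numeric column and one incomplete row
df <- data.frame(
  Date    = c("2003-01-01", "2003-01-07", "2003-01-13"),
  sulfate = c(3.2, NA, 4.1),
  nitrate = c(0.9, 1.1, 2.0)
)
good <- complete.cases(df)
complete_data <- df[good, ]

# sum() on the whole data frame fails because Date is not numeric
msg <- tryCatch(
  {
    sum(complete_data)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Counting rows works regardless of the column types
n_complete <- nrow(complete_data)
print(n_complete)
```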
Another potential issue could be that the ‘sulfate’ or ‘nitrate’ columns are being read as factors. In that case, coercing the dataset into numeric values might be a solution. However, attempting this:
complete_data <- as.numeric(data[good,])
results in a different error, because as.numeric() works on vectors and a data frame is a list of columns:
Error: (list) object cannot be coerced to type ‘double’
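If a column really had been read as a factor, the conversion would need to happen column by column. The sketch below uses invented factor columns to show why the whole-frame call fails and one common way around it. Note that as.numeric() applied directly to a factor returns the internal level codes, so the values are converted through as.character() first.

```r
# Made-up data frame whose numeric values were read in as factors
df <- data.frame(
  sulfate = factor(c("3.2", "4.1", "3.2")),
  nitrate = factor(c("0.9", "1.1", "2.0"))
)

# Coercing the whole data frame at once raises an error
whole_frame_ok <- tryCatch(
  {
    as.numeric(df)
    TRUE
  },
  error = function(e) FALSE
)

# On a single factor, as.numeric() gives the level codes, not the values
codes <- as.numeric(df$sulfate)

# Going through as.character() recovers the actual numbers
values <- as.numeric(as.character(df$sulfate))

# Convert every column while keeping the data frame structure
df_numeric <- df
df_numeric[] <- lapply(df, function(col) as.numeric(as.character(col)))
str(df_numeric)
```

In the dataset from this post the conversion turns out not to be needed, but the pattern is useful whenever str() reports a factor where numbers were expected.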
The Solution
The actual problem, as it turns out, is that the data frame ‘complete_data’ was passed to sum() instead of the logical vector ‘good’. Summing a logical vector counts its TRUE values, so sum(good) gives the number of complete rows, the same count that nrow(complete_data) returns. To fix the error, apply sum() to ‘good’:
if (sum(good) >= threshold) {
With this one-line change, the error no longer occurs.
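For reference, here is a sketch of how the whole function might look with the fix in place. It goes beyond the one-line change in three ways, all of which are assumptions rather than part of the original solution: the function is renamed corr_sketch (a hypothetical name), it loops over whatever files list.files() finds instead of the hard-coded 1:332, and it appends each correlation with c(), because vect1 <- rbind(b) in the original replaces the previous result on every iteration. It also skips files with fewer than two complete rows, since a correlation cannot be computed from them.

```r
# Hypothetical reworked version of corr(); not the original author's code.
corr_sketch <- function(directory, threshold = 0) {
  files_list <- list.files(directory, full.names = TRUE)
  correlations <- numeric()

  for (file in files_list) {
    data <- read.csv(file)
    good <- complete.cases(data)
    n_complete <- sum(good)  # the fix: count TRUE values in the logical vector

    # Assumption: also require at least two complete rows for cor()
    if (n_complete >= threshold && n_complete >= 2) {
      complete_data <- data[good, ]
      sulfate <- complete_data[, 2]
      nitrate <- complete_data[, 3]
      # Append instead of overwrite so every qualifying file contributes
      correlations <- c(correlations, cor(sulfate, nitrate))
    }
  }

  correlations
}
```

Calling corr_sketch("specdata", threshold = 150), for example, would return one correlation per file with at least 150 complete observations, or an empty numeric vector if no file qualifies.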
Conclusion
Understanding and fixing R error messages can be tricky, especially when a dataset mixes numeric and non-numeric variables. In this case, the error was caused by calling sum() on the data frame ‘complete_data’ instead of the logical vector ‘good’. Once sum() was applied to ‘good’, the error went away.
Remember to always double-check your code and thoroughly investigate error messages to ensure your analysis runs smoothly. And don’t hesitate to ask for help or consult online resources when you encounter issues, as learning from others’ experiences can be a valuable way to improve your skills and prevent similar mistakes in the future.