Fixing multiple "Unknown column" warnings
I keep getting multiple "Unknown column" warnings after all kinds of commands (from str(x) to installing package updates), and I'm not sure how to debug or fix this.
The warning is clearly related to a variable in a tbl_df that I renamed, but it shows up after commands that seem unrelated to that tbl_df (e.g., installing package updates, or str(x) where x is just a character vector).
This is an issue with RStudio's diagnostics tool (the feature that flags warnings and possible mistakes in your code).
As a workaround, you can add the following line at the beginning of the open file(s):
# !diagnostics off
Then save the files and the warnings should stop appearing.
You can also just disable the diagnostics in Preferences/Code/Diagnostics.
I believe the warnings appear because RStudio's diagnostics tool parses your source code to detect errors, and while running its checks it accesses columns of your tibble that have not been initialized, which triggers the warning. So the warnings do not appear because you run unrelated commands; they appear whenever RStudio runs its diagnostics (when a file is saved or modified, when you run something, ...).
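A minimal sketch of the behaviour this explanation relies on (assuming the tibble package is installed; the data are made up). Reading a column that does not exist with $ returns NULL in both cases, but only the tibble warns, so any code that probes a missing tibble column this way, which is what the diagnostics pass is suspected of doing, would produce the warning:

```r
library(tibble)

base_df   <- data.frame(id = 1:3, name = c("mary", "jill", "steve"))
tibble_df <- tibble(id = 1:3, name = c("mary", "jill", "steve"))

# A plain data.frame returns NULL for a missing column, silently
base_df$age

# A tibble also returns NULL, but emits a warning about the unknown column
# (the exact wording of the message depends on the tibble version)
tibble_df$age
```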
I have been encountering the same problem, and although I don't know why it occurs, I have been able to pin down when it occurs, and thus prevent it from happening.
The issue seems to arise when you add a new column through indexed assignment, which behaves differently in a base R data frame than in a tibble. Take this example, where a new column (age) is added to a base R data frame:
base_df <- data.frame(id = 1:3, name = c("mary", "jill", "steve"))
base_df$age[base_df$name == "mary"] <- 47
That works without a warning. But doing the same with a tibble throws a warning (and, I think, this is what causes the weird, seemingly unprovoked, repeated warnings):
library(tibble)
tibble_df <- tibble(id = 1:3, name = c("mary", "jill", "steve"))
tibble_df$age[tibble_df$name == "mary"] <- 47
# Warning message:
# Unknown column 'age'
There are surely better ways of avoiding this, but I have found that first creating a vector of NAs does the job:
tibble_df$age <- NA
tibble_df$age[tibble_df$name == "mary"] <- 47
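Here is a sketch of one possibly tidier way, in base R (assuming tibble is installed). As far as I understand R's replacement semantics, tibble_df$age[cond] <- 47 reads tibble_df$age, modifies it, and assigns it back, so the missing column is read before it exists, which is what the tibble warns about. Building the complete column in a single assignment avoids that read:

```r
library(tibble)

tibble_df <- tibble(id = 1:3, name = c("mary", "jill", "steve"))

# `tibble_df$age[cond] <- 47` behaves roughly like
#   tibble_df$age <- `[<-`(tibble_df$age, cond, 47)
# i.e. it first reads the not-yet-existing `age` column.
# Creating the whole column at once never reads `age` before it exists:
tibble_df$age <- ifelse(tibble_df$name == "mary", 47, NA)

tibble_df$age
#> [1] 47 NA NA
```

ifelse() returns a numeric vector here because 47 is numeric, so the NA entries end up as numeric NAs.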
I ran into this issue when using the dplyr package. For those who hit it after calling group_by() from dplyr:
I have found that ungrouping the data with ungroup() solves the "Unknown column" warning. Sometimes I have had to ungroup several times before the problem went away.
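A minimal sketch of checking for and removing leftover grouping (assuming dplyr is installed; the year column is made up for illustration). With two grouping variables, summarize() drops only the last one, so the result is still grouped by id; group_vars() shows what is left, and ungroup() removes all remaining groups at once:

```r
library(dplyr)

df <- data.frame(
  id   = c(1, 1, 2, 3),
  year = c(2019, 2020, 2019, 2020),   # hypothetical second grouping variable
  name = c("mary", "jo", "jill", "steve")
)

# summarize() peels off only the last grouping level ("year")
res <- df %>%
  group_by(id, year) %>%
  summarize(n = n())

group_vars(res)   # still grouped by "id"

# ungroup() drops every remaining grouping variable
res <- ungroup(res)
group_vars(res)   # character(0)
```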
Converting the object to a plain data.frame solved the problem for me:
library(dplyr)
df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))
dfTbl <- df %>% group_by(id) %>% summarize(n = n())
class(dfTbl)  #  "tbl_df" "tbl" "data.frame"
dfTbl <- as.data.frame(dfTbl)
class(dfTbl)  #  "data.frame"
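Continuing that example, a sketch (assuming dplyr is installed) showing that once the result is a plain data.frame, an indexed assignment into a new column goes through without the warning, because base R returns NULL silently for the missing column:

```r
library(dplyr)

df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))

dfTbl <- df %>%
  group_by(id) %>%
  summarize(n = n()) %>%
  as.data.frame()

# dfTbl$newvar does not exist yet; on a plain data.frame reading it returns
# NULL without a warning, so the indexed assignment simply creates the column
dfTbl$newvar[dfTbl$id == 1] <- 0

dfTbl
```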
Part of the script is borrowed from @adts.
I ran into this problem too, except with a tibble created by a dplyr pipeline. Here's a slight modification of sabre's code showing how I got the same warning:
library(dplyr)
df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))
t <- df %>% group_by(id) %>% summarize(n = n())
t
str(t)
t$newvar[t$id == 1] <- 0
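If you would rather keep the tibble, here is a sketch of a different, dplyr-style technique (assuming dplyr is installed): create the whole column with mutate() instead of assigning into part of a column that does not exist yet, so nothing ever reads the missing column with $:

```r
library(dplyr)

df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))

t <- df %>%
  group_by(id) %>%
  summarize(n = n()) %>%
  # Build newvar in one step instead of `t$newvar[t$id == 1] <- 0`;
  # if_else() requires both branches to have the same type, hence NA_real_
  mutate(newvar = if_else(id == 1, 0, NA_real_))

t
```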
Let's say I wanted to select the following column(s):
best.columns = 'id'
For me the following gave the warning:
By contrast, the following worked as expected, even though, as far as I understand dplyr, the two should be equivalent:
df %>% select_(.dots = best.columns)
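For reference, select_() and the other underscore verbs have been deprecated since dplyr 0.7. A sketch of the same selection with current dplyr (assuming version 1.0 or later, where all_of() is available; df is made up here so the block runs on its own):

```r
library(dplyr)

df <- data.frame(id = c(1, 1:3), name = c("mary", "jo", "jill", "steve"))
best.columns <- "id"

# all_of() selects the columns named in a character vector and errors
# if any of them is missing, instead of silently ignoring it
selected <- df %>% select(all_of(best.columns))

selected
```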
I had this problem when using tibbles and lapply() together: the tibble seemed to store the results as a list inside the data frame.
I solved it by calling unlist() on the output of lapply() before adding it to the tibble.
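A minimal sketch of that list-column behaviour (assuming the tibble package is installed; the column names are made up). lapply() always returns a list, so assigning its result directly creates a list column; unlist() or a typed vapply() gives an ordinary atomic column instead:

```r
library(tibble)

tb <- tibble(id = 1:3)

# lapply() always returns a list, so this becomes a list column
squares <- lapply(tb$id, function(i) i^2)
tb$sq_list <- squares
class(tb$sq_list)   # "list"

# unlist() flattens the list into an atomic vector first
tb$sq <- unlist(squares)
class(tb$sq)        # "numeric"

# vapply() returns an atomic vector directly and checks each result's type
tb$sq2 <- vapply(tb$id, function(i) i^2, numeric(1))
```

vapply() is a bit stricter than unlist(): it errors if any element is not a single number, instead of silently producing a vector of the wrong length.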