In SQL there is a function called coalesce, which accepts any number of columns and returns the first non-null value in each row, reading across the columns in the order they’re passed in.  This is super useful when you’re trying to relationalize a not-so-relational database, where a desired value might be found in one column or in another.

Today I needed to do the same thing in CRAN R, and a few quick Google searches for “R equivalent of SQL coalesce” didn’t turn up anything.  So I wrote this quick function which accepts any number of R vectors of the same length and returns the first non-NA value in each position.

# accepts a list of vectors of identical length and returns one vector with the first non-NA value
coalesce = function(...) {
    # convert input arguments into a list of vectors
    input_list = list(...)
    # check that all input vectors are of same length
    vectorlength = length(input_list[[1]])
    for (j in 1:length(input_list)) {
        if(length(input_list[[j]]) != vectorlength) {
            stop(paste("Not all vectors are of same length. First vector length: ",vectorlength,". Vector #",j,"'s length: ",length(input_list[[j]]),sep=""))
        }
    }
    # create a result vector to fill with first non-NA values
    result = rep(NA,vectorlength)
    # fill with first non-NA value
    for (i in 1:length(result)) {
        for (j in 1:length(input_list)) {
            if(!is.na(input_list[[j]][i])) {
                result[i] = input_list[[j]][i]
                break
            }
        }
    }
    return(result)
}

# examples to show how it works
most_preferred_measurement = seq(1,10,1)
most_preferred_measurement = NA
backup_measurement = seq(11,20,1)
backup_measurement = NA
least_preferred_measurement = seq(21,30,1)
least_preferred_measurement = NA
other_vector_of_different_length = seq(31,35,1)
coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement) # this works
coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement,other_vector_of_different_length) # this will throw an error

If you run this code, you’ll see the expected result:

> coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement) # this works
 [1] 11 2 3 24 NA 16 17 18 9 10
> coalesce(most_preferred_measurement,backup_measurement,least_preferred_measurement,other_vector_of_different_length) # this will throw an error
Error in coalesce(most_preferred_measurement, backup_measurement, least_preferred_measurement, : 
 Not all vectors are of same length. First vector length: 10. Vector #4's length: 5
>

Enjoy!