dplyr with rowwise()

Han Oostdijk

2020/06/17

Date last run: 17Jun2020

Introduction

A question was asked in the RStudio Community:
Finding closest date relative to another date in a column and subtracting.
When I saw that I thought it was a good opportunity to use the rowwise() function of the dplyr package. This is a stylized form of that problem.

Load packages and create example data

options(knitr.table.format='html')
HOQCutil::silent_library(c('dplyr','tidyr'))

df1 <- tibble::tribble(
  ~id,          ~x, ~a,  ~b, ~c ,
  1L, 3.25, 2.1, NA , 3.4 , 
  2L, 3.50, 3.45, 3.7, 3.4  
)
df1: example data
id x a b c
1 3.25 2.10 NA 3.4
2 3.50 3.45 3.7 3.4

Determine differences with column x

For all columns except the id and the x column we want to find the column that has the smallest difference with the x column. So the first thing we do is determine which are these columns (in dif_columns) and calculate the differences. The difference columns get a suffix ‘_d’ :

dif_columns = setdiff(names(df1),c('id','x'))

df2 = df1 %>%
  mutate(
    across(
      .cols = any_of(dif_columns) ,
      .fns = function(x,y) ifelse(is.na(x),99999,abs(y-x)) ,
      .names = "{col}_d" ,
      df1$x)
  )
df2: with differences from column x
id x a b c a_d b_d c_d
1 3.25 2.10 NA 3.4 1.15 99999.0 0.15
2 3.50 3.45 3.7 3.4 0.05 0.2 0.10

Use rowwise() to find the smallest difference for each row

We use rowwise() to calculate for each row the column (with ‘_d’ suffix) with the smallest difference. We do that in four steps:

By debugging the min_* functions it became clear that the arguments of these function are the values of the columns that are passed. Because we don’t know the number of arguments beforehand, we have used the a = unlist(list(...)) construct to get these values in a vector.

min_i = function(...) {
  # gives index of smallest number
  a = unlist(list(...))
  order(a)[1]
}
min_v = function(...) {
  # gives value of smallest number
  a = unlist(list(...))
  a[order(a)[1]]
}
min_o = function(...) {
  # picks value at index (data is index followed by values)
  a = unlist(list(...))
  i = a[1]
  o = a[-1]
  o[i]
}
dif2_columns = paste(dif_columns,"_d",sep="")

df3 = df2 %>%
  rowwise() %>%
  mutate(
    min_ix = min_i (c_across(cols = any_of(dif2_columns))),
    min_var = dif_columns[min_ix],
    min_dif  = min_v (c_across(cols = any_of(dif2_columns))),
    min_org = min_o(min_ix,c_across(cols = any_of(dif_columns))) 
    ) 
df3: with indications for minimum
id x a b c a_d b_d c_d min_ix min_var min_dif min_org
1 3.25 2.10 NA 3.4 1.15 99999.0 0.15 3 c 0.15 3.40
2 3.50 3.45 3.7 3.4 0.05 0.2 0.10 1 a 0.05 3.45

Remove the intermediary columns

df4 = df3 %>%
  select(any_of(c('id','x',dif_columns,'min_var','min_org')))
df4: original data with name and value of minimum difference
id x a b c min_var min_org
1 3.25 2.10 NA 3.4 c 3.40
2 3.50 3.45 3.7 3.4 a 3.45

Session Info

This document was produced on 17Jun2020 with the following R environment:

  #> R version 4.0.0 (2020-04-24)
  #> Platform: x86_64-w64-mingw32/x64 (64-bit)
  #> Running under: Windows 10 x64 (build 18363)
  #> 
  #> Matrix products: default
  #> 
  #> locale:
  #> [1] LC_COLLATE=English_United States.1252 
  #> [2] LC_CTYPE=English_United States.1252   
  #> [3] LC_MONETARY=English_United States.1252
  #> [4] LC_NUMERIC=C                          
  #> [5] LC_TIME=English_United States.1252    
  #> 
  #> attached base packages:
  #> [1] stats     graphics  grDevices utils     datasets  methods   base     
  #> 
  #> other attached packages:
  #> [1] tidyr_1.1.0 dplyr_1.0.0
  #> 
  #> loaded via a namespace (and not attached):
  #>  [1] Rcpp_1.0.4.6     knitr_1.28       magrittr_1.5     tidyselect_1.1.0
  #>  [5] HOQCutil_0.1.22  R6_2.4.1         rlang_0.4.6      stringr_1.4.0   
  #>  [9] highr_0.8        tools_4.0.0      xfun_0.13        htmltools_0.4.0 
  #> [13] ellipsis_0.3.0   digest_0.6.25    tibble_3.0.1     lifecycle_0.2.0 
  #> [17] crayon_1.3.4     purrr_0.3.4      vctrs_0.3.0      glue_1.4.1      
  #> [21] evaluate_0.14    rmarkdown_2.1    stringi_1.4.6    compiler_4.0.0  
  #> [25] pillar_1.4.3     generics_0.0.2   pkgconfig_2.0.3