New beta release for CBS OData4 dataportal

2020/05/14

Introduction

In the LinkedIn group 'Centraal Bureau voor de Statistiek; Open Data' I saw the article 'New beta release for CBS OData4 dataportal'. The article points to the page CBS Dataportal on the CBS website for more information and mentions the new root pointer.

In the past I included two functions for OData4 in the package HOQCutil: get_table_cbs_odata4 and get_table_cbs_odata4_GET. In this blog entry I check whether these two functions still work.

OData3

In the past I made the package odataR for OData3. I rebuilt this package for R 4.0.0 and did not find any errors. The remainder of this document only concerns OData4.

OData4

Some of the functionality was already tested for the previous beta. The results of that test can be found in the pdf file opendata_beta_versie4_dec2018_20181225.pdf. In this document I describe the tests done for the new version.

OData4 CBS documentation

On the CBS Dataportal the following documentation can be found:

• FAQ with among others

• a reference to infoservice for questions about (open) data
• OData4 information under the header Welke OData 4 commando's zijn beschikbaar? (Which OData 4 commands are available?) and examples of use for the commands that are implemented. SpecRunner is supposed to deliver an overview of all implemented functions, but in both Firefox and Chrome it gives the following error:

Error: error
at Object.<anonymous> (https://beta-odata4.cbs.nl/spec/Validation%20OData4/DateAndTimeFunctions.js:219:22)
at u (https://beta-odata4.cbs.nl/external/jquery-3.3.1.min.js:2:33479)
at Object.fireWith [as rejectWith] (https://beta-odata4.cbs.nl/external/jquery-3.3.1.min.js:2:34432)
at k (https://beta-odata4.cbs.nl/external/jquery-3.3.1.min.js:2:93855)
at XMLHttpRequest.<anonymous> (https://beta-odata4.cbs.nl/external/jquery-3.3.1.min.js:2:96455)

• a pdf handleiding (manual) in Dutch with the differences between OData3 and OData4 and information about how to convert from version 3 to version 4.

• in the tab Informatie voor (Information for) subsection ontwikkelaars (developers) we find

• Snelstartgids OData v4 (quick guide) gives information about retrieving CBS data for the construction of a map and for creating time series in R or Python. The layout suggests that there is also information about filters and Metadata but this is not visible.
• a reference to the GitHub repository CBS Open Data v4 with the same code examples. This repository is said to contain an R package for OData4. I could not find a package in this repository.
• in the tab Informatie voor (Information for) subsection data analists (data analysts) we find

Changes made in the HOQCutil package

While checking whether the two package functions get_table_cbs_odata4 and get_table_cbs_odata4_GET still worked, I realized that I should have written unit tests for the OData functionality and for my own functions. So I decided to do this now. The test functions can be found in the package subfolder testthat.
I also took the opportunity to add the possibility of JSON output. I renamed the response parameter (it is now called restype). The three possible values for restype and their meaning are:

• '' : the output will be a data.frame wherever possible. This is the default. A call with subtable='Properties' will always return a list.
• 'resp' : the output will be the response object returned by the OData server
• 'json' : the (original) JSON output of the OData server will not be converted to data.frame or list.

Test results

The root for the tables

As announced in the CBS blog, the root for the CBS tables has been changed. Therefore the default for the parameter odata_root in the function get_table_cbs_odata4 has been changed to https://beta-odata4.cbs.nl
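To make the role of the root concrete, here is a minimal sketch of how a request URL for a path-style query such as $count can be composed from the root. The helper build_odata4_url and its argument names are hypothetical and not part of HOQCutil; the URL layout (root, then CBS, then table id, then subtable) is taken from the verbose output shown in the $count test in this post.

```r
# Hypothetical helper (an assumption, not part of HOQCutil): compose the URL
# for a path-style request such as $count from root, catalog, table and subtable.
build_odata4_url <- function(table_id,
                             subtable = "Observations",
                             path_query = NULL,
                             odata_root = "https://beta-odata4.cbs.nl",
                             catalog = "CBS") {
  # c() silently drops a NULL path_query, so the URL then ends at the subtable
  parts <- c(odata_root, catalog, table_id, subtable, path_query)
  paste(parts, collapse = "/")
}

url <- build_odata4_url("81589NED", path_query = "$count")
print(url)
#> [1] "https://beta-odata4.cbs.nl/CBS/81589NED/Observations/$count"
```

Keeping the root as an argument with a default means that a future change of the root only requires a different odata_root value in the call.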

The list ‘Welke OData 4 commando’s zijn beschikbaar?’

The list in the FAQ is not complete. The list lacks the following functions that worked in the previous beta and still work now:

$count

This works, but differently in OData3 than in OData4. In OData3 the result is an integer and in OData4 the result is character and preceded by a Unicode character. NB. because of the different buildup of the data it is not surprising that the reported numbers are different.

# Odata3
count=odataR::odataR_get_table(
table_id='81589NED',
query="$count")
str(count)
#>  int 80244

# Odata4
count=HOQCutil::get_table_cbs_odata4(
table_id='81589NED',
subtable='Observations',
query="$count", verbose=T)
#> generated url : https://beta-odata4.cbs.nl/CBS/81589NED/Observations/$count
#> unencoded query: $count
str(count)
#> chr "<U+FEFF>1034682"

resp = httr::GET('https://beta-odata4.cbs.nl/CBS/81589NED/Observations/$count')
count=httr::content(resp, as = "text",encoding='UTF-8')
str(count)
#>  chr "<U+FEFF>1034682"


In the last case I also used the ‘raw’ httr function calls (with the same result) to show that the unicode string result is not caused by the get_table_cbs_odata4 function.
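The character U+FEFF in front of the digits is a byte order mark. As long as the server returns it, the count can still be turned into an integer on the client side. The following is only a sketch of such a workaround: the helper strip_bom is hypothetical (it is not part of HOQCutil) and the input string is typed in by hand to mimic the server output shown above.

```r
# String as shown in the OData4 output above: a byte order mark (U+FEFF)
# followed by the digits of the count. Typed in here, so no server call is made.
raw_count <- "\ufeff1034682"

# Hypothetical helper (an assumption, not part of HOQCutil):
# remove a byte order mark at the start of a string.
strip_bom <- function(x) sub("^\ufeff", "", x)

count <- as.integer(strip_bom(raw_count))
str(count)
#>  int 1034682
```

With this conversion the OData4 result has the same type as the OData3 result.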
I think that the current behaviour of $count is an error.

Other functions working but not documented

The following OData3 functions are not documented in the list ‘Welke OData 4 commando’s zijn beschikbaar?’ but are working in version 4:

• $select
• length
• indexof
• tolower and toupper
• month, day and minute
• mod

Functions working but poorly documented

I think it is advisable to tell the reader that the second parameter of the function substring works with zero origin: the first character of a string is character 0. The same goes for the function indexof, but that function is not mentioned at all.
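Because R itself counts characters from 1, the zero origin is easy to get wrong when writing a query from R. The sketch below mimics the zero-origin behaviour locally with two hypothetical helpers, odata_substring and odata_indexof. They are not part of HOQCutil and do not contact the server; they only illustrate the index shift, assuming that indexof returns -1 when the text is not found, as in the OData specification.

```r
# Hypothetical local emulation (an assumption, for illustration only) of the
# zero-origin OData functions substring and indexof.
odata_substring <- function(s, start, length = nchar(s) - start) {
  # OData position 0 corresponds to R position 1
  substr(s, start + 1, start + length)
}

odata_indexof <- function(s, pattern) {
  pos <- as.integer(regexpr(pattern, s, fixed = TRUE))  # 1-based, -1 if absent
  if (pos < 0) -1L else pos - 1L
}

odata_substring("81589NED", 0, 5)
#> [1] "81589"
odata_substring("81589NED", 5)
#> [1] "NED"
odata_indexof("81589NED", "NED")
#> [1] 5
substr("81589NED", 1, 5)  # the same first five characters with R's origin 1
#> [1] "81589"
```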

Because it is recognized as a function, I think that the behaviour of $orderby is an error.

Conclusion

• The functions $count and $orderby have an error
• The functionality SpecRunner gives an error.
• The documentation of the available functions is not yet complete
• The documentation in tab Informatie voor subsection ontwikkelaars is not fully accurate.

Session Info

This document was produced on 14May2020 with the following R environment:

#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] HOQCutil_0.1.22
#>
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4.6    digest_0.6.25   R6_2.4.1        jsonlite_1.6.1
#>  [5] magrittr_1.5    evaluate_0.14   httr_1.4.1      odataR_0.1.4
#>  [9] rlang_0.4.6     stringi_1.4.6   curl_4.3        rmarkdown_2.1
#> [13] tools_4.0.0     stringr_1.4.0   glue_1.4.0      purrr_0.3.4
#> [17] xfun_0.13       compiler_4.0.0  htmltools_0.4.0 knitr_1.28