Program flow in the rtweet package

Han Oostdijk

2019/09/24

Date last run: 27Sep2019

Introduction

While browsing the internet I found 21 Recipes for Mining Twitter Data with rtweet by Bob Rudis and Paul Campbell. The corresponding GitHub repository points to a blog entry with some background material. While trying to reproduce some of the recipes I wondered which URLs were generated and how the authorisation structure was used to request the data. I wrote about the generation of URLs in url generation in the rtweet package, which also briefly discusses the setup of the authorisation structure.
In this entry I describe how the authorisation structure is used in combination with a generated URL.

Set up the library and authorisation structure

I described how to set up the authorisation structure in the other document. If I don’t specify a token in a function call, the information created during that setup is used.

Program flow in rtweet package

In the second recipe of 21 Recipes for Mining Twitter Data with rtweet the authors mention the rtweet::get_trends function. I studied this function to see how the data is retrieved with API calls.

Following the description in this recipe and looking at the code, I see that the internal function rtweet:::get_trends_ is called, which makes two requests to the Twitter API.

The relevant parts of this function (concentrating on the second API call) are:

rtweet:::get_trends_ <-
function (woeid = 1, lat = NULL, lng = NULL, exclude = FALSE,
    token = NULL, parse = TRUE)
{
    ...
    query <- "trends/place"
    token <- check_token(token)
    ...
    params <- list(id = woeid, exclude = exclude)
    url <- make_url(query = query, param = params)
    gt <- TWIT(get = TRUE, url, token)
    ...
}

Because the token argument defaults to NULL, rtweet:::check_token retrieves the token that was created during setup.
The rtweet:::make_url function uses the query variable (here “trends/place”) to build the URL object that is passed to the GET function:

rtweet:::make_url <-
function (restapi = TRUE, query, param = NULL)
{
    if (restapi) {
        hostname <- "api.twitter.com"
    }
    else {
        hostname <- "stream.twitter.com"
    }
    structure(list(scheme = "https", hostname = hostname, port = NULL,
        path = paste0("1.1/", query, ".json"), query = param,
        params = NULL, fragment = NULL, username = NULL, password = NULL),
        class = "url")
}
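
To make the result of make_url more concrete, here is a minimal sketch that rebuilds the same list by hand and serialises it with httr::build_url, the exported httr helper for objects of class "url". The parameter values are the ones of the Amsterdam example further down. The names url_obj and url_string are only illustrative, and it is my assumption that this mirrors what happens when such an object is handed to GET. No token or network access is needed for this step.

```r
library(httr)

# Hand-built copy of the list that make_url returns for the
# "trends/place" query (values taken from the Amsterdam example)
url_obj <- structure(
  list(scheme = "https", hostname = "api.twitter.com", port = NULL,
       path = paste0("1.1/", "trends/place", ".json"),
       query = list(id = "727232", exclude = NULL),
       params = NULL, fragment = NULL, username = NULL, password = NULL),
  class = "url")

# build_url drops the NULL query element and returns a single string
url_string <- httr::build_url(url_obj)
print(url_string)
#> [1] "https://api.twitter.com/1.1/trends/place.json?id=727232"
```

Because the exclude element is NULL, it does not appear in the query string at all.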

In this case the actual API call is made via TWIT:

rtweet:::TWIT <-
function (get = TRUE, url, ...)
{
    if (get) {
        GET(url, ...)
    }
    else {
        POST(url, ...)
    }
}

With this information we can now write our own function to obtain the Amsterdam trends:

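get_Adam_trends <- function(parse = TRUE, resonly = FALSE) {
	query <- "trends/place"
	# NULL makes check_token fall back to the token created during setup
	token <- rtweet:::check_token(NULL)
	# 727232 is the WOEID of Amsterdam
	param <- list(id = '727232', exclude = NULL)
	url   <- rtweet:::make_url(query = query, param = param)
	trd   <- rtweet:::TWIT(get = TRUE, url, token)
	# resonly = TRUE returns the raw response, including the url that was used
	if (resonly)
		return(trd)
	# convert the JSON body of the response to an R object
	trd   <- rtweet:::from_js(trd)
	if (parse)
		trd <- rtweet:::parse_trends(trd)
	trd
}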

Adam <- get_Adam_trends()
str(head(Adam,1))
#> Classes 'tbl_df', 'tbl' and 'data.frame':	1 obs. of  9 variables:
#>  $ trend           : chr "#klimaatstaking"
#>  $ url             : chr "http://twitter.com/search?q=%23klimaatstaking"
#>  $ promoted_content: logi NA
#>  $ query           : chr "%23klimaatstaking"
#>  $ tweet_volume    : int NA
#>  $ place           : chr "Amsterdam"
#>  $ woeid           : int 727232
#>  $ as_of           : POSIXct, format: "2019-09-27 12:12:56"
#>  $ created_at      : POSIXct, format: "2019-09-27 12:05:56"

This function also shows which URL string is created and used to access the Twitter data. We can pass this string to the httr::GET function ourselves, but the request only succeeds in combination with the token:


(url <- get_Adam_trends(resonly=TRUE)$url)
#> [1] "https://api.twitter.com/1.1/trends/place.json?id=727232"
res = httr::GET(url)
print(httr::content(res, as = "text"))
#> [1] "{\"errors\":[{\"code\":215,\"message\":\"Bad Authentication data.\"}]}"

token <- rtweet:::check_token(NULL)
(res = httr::GET(url,token))
#> Response [https://api.twitter.com/1.1/trends/place.json?id=727232]
#>   Date: 2019-09-27 12:12
#>   Status: 200
#>   Content-Type: application/json;charset=utf-8
#>   Size: 7.13 kB
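
The body of the failed request is itself JSON, so the error code and message can be extracted instead of read from the raw text. The sketch below is only an illustration: it parses the error string shown above with jsonlite::fromJSON (jsonlite is already loaded as a dependency in this session), and the variable names body and errors are my own.

```r
library(jsonlite)

# Error body copied from the request without a token
body <- "{\"errors\":[{\"code\":215,\"message\":\"Bad Authentication data.\"}]}"

# fromJSON simplifies the array of error objects to a data frame
# with one row per error
errors <- jsonlite::fromJSON(body)$errors
print(errors$code)
#> [1] 215
print(errors$message)
#> [1] "Bad Authentication data."
```

With a live response the same text is returned by httr::content(res, as = "text"), and httr::http_error(res) can be used first to decide whether there is an error to parse.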

The API reference pages give details about the various API endpoints. For these particular endpoints see https://developer.twitter.com/en/docs/trends/locations-with-trending-topics/api-reference/get-trends-available and https://developer.twitter.com/en/docs/trends/trends-for-location/api-reference/get-trends-place .
The API reference also contains a list of all endpoints.

SessionInfo

#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.2      knitr_1.25      magrittr_1.5    HOQCutil_0.1.11
#>  [5] R6_2.4.0        rlang_0.4.0     stringr_1.4.0   httr_1.4.1     
#>  [9] tools_3.6.0     rtweet_0.6.9    xfun_0.8        htmltools_0.3.6
#> [13] askpass_1.1     openssl_1.4.1   digest_0.6.20   tibble_2.1.3   
#> [17] crayon_1.3.4    purrr_0.3.2     curl_4.0        glue_1.3.1     
#> [21] evaluate_0.14   rmarkdown_1.15  stringi_1.4.3   compiler_3.6.0 
#> [25] pillar_1.4.2    jsonlite_1.6    pkgconfig_2.0.2