# URL generation in the rtweet package

## Introduction

While browsing the internet I found 21 Recipes for Mining Twitter Data with rtweet by Bob Rudis and Paul Campbell. The corresponding GitHub repository points to a blog entry with some background material. While trying to reproduce some of the recipes, I wondered which URLs rtweet generates. This document shows how R's trace facility can be used to find out.

### Setup

After installing the package rtweet I use the silent_library function (see Auxiliary functions below) to load the package. Following the vignette I created a Twitter application that I called HOQC_31415. The corresponding Twitter developer page lists the ‘token’ and ‘secret’ values that are needed to access the Twitter API through this application.

silent_library('rtweet')


The function rtweet::create_token makes this information available to the rtweet package:

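# consumer_key, consumer_secret, access_token_key and access_token_secret
# hold the values copied from the Twitter developer page
token <- rtweet::create_token(
  app = "HOQC_31415",
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token_key,
  access_secret = access_token_secret
)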


This function creates the file .rtweet_token.rds and also modifies the user's global .Renviron file, setting the environment variable TWITTER_PAT to point to this file. As a result, authorisation information no longer needs to be passed to the rtweet functions: when no value is given for the argument token, the token stored in this file is used for authorisation.

cat(readLines('~/.Renviron'),sep='\n')
#> BINPREF="D:/tools/Rtools/mingw_\$(WIN)/bin/"
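To check which token file rtweet would fall back on, one can also inspect the environment variable directly. The helper below, token_file_status, is a hypothetical name introduced only for illustration; it assumes nothing more than that TWITTER_PAT holds the path of the token file, as described above, and uses only base R.

```r
# Hypothetical helper (not part of rtweet): report the path stored in an
# environment variable and whether that file exists.
# Assumption: TWITTER_PAT holds the path of the saved token file.
token_file_status <- function(var = "TWITTER_PAT") {
  # NA when the variable is not set at all
  path <- Sys.getenv(var, unset = NA_character_)
  list(
    variable = var,
    path = path,
    exists = !is.na(path) && file.exists(path)
  )
}

token_file_status()
```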


### The custom trace function ‘debug_httr_get’

After examining several functions in the rtweet package I noticed that Twitter data are retrieved (at least in the cases I tried) with the function httr::GET. That is why I wrote the function debug_httr_get, which uses R's trace facility to insert code into httr::GET. The inserted code appends the argument url, the internal variable hu and the return value of httr::GET to a global variable; together these describe the httr request. The main argument of debug_httr_get is fn, which holds the rtweet call. The function does not depend on the rtweet package: it can be used wherever httr::GET is called. It is now included in the package HOQCutil.

NB: because fn is evaluated within debug_httr_get, its results (e.g. the trends in Example recipe 2) remain internal to the function and are neither shown nor returned.
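The trace mechanism behind debug_httr_get can be sketched in a few lines of base R. This is not the HOQCutil implementation: the function name trace_urls is hypothetical, only the url argument is recorded (not hu or the return value), and the URLs are collected in a local environment instead of a global variable. With its defaults it traces GET in the httr namespace, so httr must be installed when the defaults are used.

```r
# Minimal sketch in the spirit of debug_httr_get (not the HOQCutil code):
# while 'expr' is evaluated, record the 'url' argument of every call to
# the function 'what' found in 'where'.
trace_urls <- function(expr, what = "GET", where = asNamespace("httr")) {
  rec <- new.env()
  rec$urls <- character(0)

  # Called from inside the traced function, where 'url' is an argument;
  # 'rec' is an environment, so the update is visible outside 'record'
  record <- function(u) rec$urls <- c(rec$urls, as.character(u))

  # Insert a call to record(url) at the start of the traced function;
  # bquote() splices the closure itself into the tracer expression
  trace(what, tracer = bquote(.(record)(url)), where = where, print = FALSE)
  # Restore the original function, also when 'expr' fails
  on.exit(untrace(what, where = where), add = TRUE)

  force(expr)  # evaluate the (lazy) expression while tracing is active
  rec$urls
}
```

With httr installed and a token configured, a call such as trace_urls(rtweet::get_trends("Amsterdam")) should return the url arguments of the underlying httr::GET calls as a character vector. Keeping the log in a local environment and untracing via on.exit avoids leaving a global variable behind or a traced httr::GET in place after an error.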

### Example recipe 2

In the second recipe of 21 Recipes for Mining Twitter Data with rtweet the authors mention the function rtweet::get_trends and state that it calls the API twice. Let us see which URLs are used to retrieve the trends for Amsterdam.

# devtools::install_github("HanOostdijk/HOQCutil")
HOQCutil::debug_httr_get(
  get_trends("Amsterdam")
)


So indeed two calls are made: the first retrieves the WOEIDs (Where On Earth IDentifiers) of the available regions, and the second uses the WOEID of Amsterdam (apparently 727232) to retrieve its trends.
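The two-step lookup can be illustrated with a small sketch. It is not rtweet's code: the endpoint paths trends/available.json and trends/place.json are taken from the Twitter API v1.1 documentation, places is a toy stand-in for the list of regions returned by the first request (only the Amsterdam WOEID comes from the output above; 1 is the WOEID Twitter uses for worldwide trends), and trends_url_for is a hypothetical helper.

```r
# Sketch of the two-step lookup (not rtweet's implementation).
# Assumption: endpoint paths as documented for the Twitter API v1.1.
base_url <- "https://api.twitter.com/1.1"

# Step 1: the request that lists all regions with trend information
available_url <- paste0(base_url, "/trends/available.json")

# Toy stand-in for the parsed answer to step 1
places <- data.frame(
  name  = c("Worldwide", "Amsterdam"),
  woeid = c(1L, 727232L),
  stringsAsFactors = FALSE
)

# Step 2: look up the WOEID of a place and build the trends URL
trends_url_for <- function(place, places) {
  woeid <- places$woeid[tolower(places$name) == tolower(place)]
  if (length(woeid) != 1L) stop("no unique WOEID for ", place)
  paste0(base_url, "/trends/place.json?id=", woeid)
}

trends_url_for("Amsterdam", places)
```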

### Example recipe 3 and 4

Recipe 3 shows that the function rtweet::search_tweets returns data describing up to 100 tweets with the hashtag #rstats. One of the returned fields, hashtags, contains the hashtags used in each tweet. Recipe 4 shows how to filter the tweets; its example restricts them to tweets that contain a GitHub URL and do not have the #datascience hashtag. The argument parse of rtweet::search_tweets specifies whether the returned data are converted to a tibble (parse = TRUE) or returned as a list of lists (parse = FALSE). In both cases we capture the URL of the request:

numtweets <- 100

recipe34_T <- HOQCutil::debug_httr_get(
  rstats <- rtweet::search_tweets("#rstats url:github -#datascience", n = numtweets, parse = TRUE)
)
recipe34_F <- HOQCutil::debug_httr_get(
  rstats <- rtweet::search_tweets("#rstats url:github -#datascience", n = numtweets, parse = FALSE)
)
# compare the URLs captured for the two parse settings
identical(recipe34_T, recipe34_F)
#> [1] TRUE

print(recipe34_T)

#> [1] "https://api.twitter.com/1.1/search/tweets.json?q=%23rstats%20url%3Agithub%20-%23datascience&result_type=recent&count=100&tweet_mode=extended"


So the same request is sent to the Twitter API in both cases: the two settings of parse differ only in how rtweet handles the data after receiving them.
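The query part of the printed URL is simply the percent-encoded search string. The sketch below rebuilds the URL with base R's utils::URLencode; the helper search_url is hypothetical and rtweet may assemble the URL differently, but with reserved = TRUE the characters #, space and : are encoded as %23, %20 and %3A, which matches the output above.

```r
# Sketch: rebuild the search URL printed above with base R only.
# search_url is a hypothetical helper, not an rtweet function.
query <- "#rstats url:github -#datascience"

search_url <- function(query, count = 100) {
  paste0(
    "https://api.twitter.com/1.1/search/tweets.json",
    # reserved = TRUE also encodes '#' and ':'
    "?q=", utils::URLencode(query, reserved = TRUE),
    "&result_type=recent",
    "&count=", count,
    "&tweet_mode=extended"
  )
}

search_url(query)
```

The hyphen in -#datascience is left as is because it is an unreserved character in URLs.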

### Auxiliary functions

#### silent_library

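# Load a package while suppressing startup messages and warnings;
# mywarnings is passed on to library()'s warn.conflicts, quietly and
# verbose arguments
silent_library <- function(package_name, mywarnings = FALSE) {
  suppressWarnings({
    suppressPackageStartupMessages({
      library(
        package_name,
        character.only = TRUE,  # package_name is a character string
        warn.conflicts = mywarnings,
        quietly = !mywarnings,
        verbose = mywarnings
      )
    })
  })
}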


### SessionInfo

#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] rtweet_0.6.9 httr_1.4.1
#>
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.2        knitr_1.25        magrittr_1.5
#>  [4] hms_0.4.2         progress_1.2.0    HOQCutil_0.1.12
#>  [7] R6_2.4.0          rlang_0.4.0       stringr_1.4.0
#> [10] tools_3.6.0       xfun_0.8          htmltools_0.3.6