Querying an API

We use the {httr} library for interacting with HTTP requests from R.

Generate the API endpoint query URL.

library(httr)

# Query API w/out function ------------------------------------------------

query_url <- modify_url(
  url = "https://colorado.rstudio.com/rsc/crop-yield-api/crop-yield",
  # these are the parameters that are sent to the API end point 
  # this will become function arguments 
  query = list(year = 1999, product = "maize", entity = "united states")
)

query_url
## [1] "https://colorado.rstudio.com/rsc/crop-yield-api/crop-yield?year=1999&product=maize&entity=united%20states"

Send the query to the API using the correct method. In this case, httr::POST().

# we want to send the query
res <- POST(query_url)

res
## Response [https://colorado.rstudio.com/rsc/crop-yield-api/crop-yield?year=1999&product=maize&entity=united%20states]
##   Date: 2020-11-04 17:50
##   Status: 200
##   Content-Type: application/json
##   Size: 78 B

We see we’ve got a status code of 200 which is ideal! Now we need to extract the contents of the API query result. Use httr::content() to get the results from the API. We specify the argument as = "text" so we can get back the raw JSON that the API sent us. Otherwise, httr will guess and parse the result for us making a list object. I’d rather we parse it ourselves into a data.frame! We are also specifying the text encoding to prevent httr from making an educated guess. Being more specific is better.

# parse the query response as text so we get the raw json
res_json <- content(res, as = "text", encoding = "UTF-8")

cat(res_json)
## [{"entity":"united states","year":1999,"product":"maize","crop_yield":8.3977}]

Notice that we have JSON now. We can use the fabulous jsonlite package to parse a character string which contains JSON. This will read the results into a dataframe.

# read the json into a data frame 
jsonlite::fromJSON(res_json)
##          entity year product crop_yield
## 1 united states 1999   maize     8.3977

Making a wrapper

The above was good, but it’s not easily repeatable for multiple queries. To make life easier, we will bundle all of the above code up into a single function.

# Create plumber API wrapper  ---------------------------------------------


get_crop_yields <- function(.year, .product, .entity) {
  
  query_url <- modify_url(
    url = "https://colorado.rstudio.com/rsc/crop-yield-api/crop-yield",
  
      # Function arguments are passed into the `query` argument to fill out the parameters
    # key-value pairs. 
    
    query = list(year = .year, product = .product, entity = .entity)
  )
  
  res <- POST(query_url)
  
  res_json <- content(res, as = "text", encoding = "UTF-8")
  
  jsonlite::fromJSON(res_json)
  
}

Use our newly defined function.

get_crop_yields(2012, "beans", "mexico")
##   entity year product crop_yield
## 1 mexico 2012   beans     0.6933

This function is extremely useful because now we can iterate over multiple values!

to_query <- expand.grid(.year = 2010:2012,
            .product = c("beans", "maize"),
            .entity = "united states")

results <- purrr::pmap_dfr(to_query, get_crop_yields)

results
##          entity year product crop_yield
## 1 united states 2010   beans     1.9343
## 2 united states 2011   beans     1.9089
## 3 united states 2012   beans     2.1168
## 4 united states 2010   maize     9.5757
## 5 united states 2011   maize     9.2146
## 6 united states 2012   maize     7.7270
library(ggplot2)

ggplot(results, aes(year, crop_yield)) +
  geom_col(position = "dodge", size = 0.8) +
  facet_wrap("product", scales = "free")