library(httr2)
compose_url <- function(address) {
  url <- url_parse("https://nominatim.openstreetmap.org/search")
  url$query <- list(q = address, format = "jsonv2", limit = 1)
  url_build(url)
}
Geocoding Addresses Using Nominatim and httr2
I am interested in finding the latitude and longitude of a particular address. There are at least a few APIs that will do this for you, but I decided to use Nominatim as it is the one used on OpenStreetMap servers (https://wiki.openstreetmap.org/wiki/Geocoding), and I like OpenStreetMap. Nominatim has good documentation on the usage of their API. In order to make calls against this API, I will use the httr2 package. We need to compose our request, which in this case means crafting the search URL and specifying the output, and we also need to be able to parse the response object.
Building the Request URL
The Nominatim /search query takes the form https://nominatim.openstreetmap.org/search?<params>. The documentation tells us that the query can either be free form or structured. For the free form query, you can essentially give it an address in natural language, ideally separated by commas. A structured query instead provides the address split into its components, such as street, city, state, and postal code, while a free form query passes the entire address as the q param. We need to specify the format, which could be xml, json, jsonv2, geojson, or geocodejson. I am most interested in getting JSON results, so I’ll use their jsonv2.
I need to properly encode the URL. httr2 provides a url_build() function that will let me do this easily. I specify the format I want and limit it to only 1 result.
Trying that with Chapel Hill Town Hall:
compose_url("405 Martin Luther King Jr. Blvd, Chapel Hill, NC 27514-5705")
[1] "https://nominatim.openstreetmap.org/search?q=405%20Martin%20Luther%20King%20Jr.%20Blvd%2C%20Chapel%20Hill%2C%20NC%2027514-5705&format=jsonv2&limit=1"
Performing the Request
Next, we use the httr2 package to send the API request. You may have to run install.packages("httr2") before attempting this.
address <- compose_url("405 Martin Luther King Jr. Blvd, Chapel Hill, NC 27514-5705")
req <- request(address)
req
<httr2_request>
GET
https://nominatim.openstreetmap.org/search?q=405%20Martin%20Luther%20King%20Jr.%20Blvd%2C%20Chapel%20Hill%2C%20NC%2027514-5705&format=jsonv2&limit=1
Body: empty
We can use req_dry_run() to see what the request will look like before we actually execute it:
req_dry_run(req)
GET /search?q=405%20Martin%20Luther%20King%20Jr.%20Blvd%2C%20Chapel%20Hill%2C%20NC%2027514-5705&format=jsonv2&limit=1 HTTP/1.1
Host: nominatim.openstreetmap.org
User-Agent: httr2/0.2.3 r-curl/4.3.2 libcurl/7.81.0
Accept: */*
Accept-Encoding: deflate, gzip, br, zstd
That looks fine. We could make it more specific by adding an Accept = "application/json" header, but if you trust the API and its documentation (which is not always a given), it should be fine.
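If you would rather not rely on the default Accept: */* header, httr2's req_headers() can set it explicitly. The block below is a minimal sketch of that idea. The name req_json is made up for illustration, the URL is rebuilt inline so the example stands alone, and nothing is sent over the network.

```r
library(httr2)

# Rebuild a search URL inline so this example is self-contained.
url <- url_parse("https://nominatim.openstreetmap.org/search")
url$query <- list(
  q = "100 Raleigh St, Chapel Hill, NC 27514",
  format = "jsonv2",
  limit = 1
)

# req_headers() returns a modified copy of the request with the extra header.
# `req_json` is a hypothetical name used only in this sketch.
req_json <- request(url_build(url)) |>
  req_headers(Accept = "application/json")
```

Passing req_json to req_dry_run() should then show the Accept header as application/json instead of */*.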
Let’s give it a try.
resp <- req_perform(req)
resp
<httr2_response>
GET
https://nominatim.openstreetmap.org/search?q=405%20Martin%20Luther%20King%20Jr.%20Blvd%2C%20Chapel%20Hill%2C%20NC%2027514-5705&format=jsonv2&limit=1
Status: 200 OK
Content-Type: application/json
Body: In memory (2 bytes)
Let’s fetch the raw response:
resp_raw(resp)
HTTP/1.1 200 OK
server: nginx
date: Mon, 11 Sep 2023 17:59:05 GMT
content-type: application/json; charset=utf-8
content-length: 2
[]
Well that’s not great. Let me try it with a different address:
compose_url("1600 Pennsylvania Ave, Washington, DC 20500") |>
request() |>
req_perform() |>
resp_raw()
HTTP/1.1 200 OK
server: nginx
date: Mon, 11 Sep 2023 17:59:05 GMT
content-type: application/json; charset=utf-8
content-length: 477
[{"place_id":4252913,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright","osm_type":"way","osm_id":899927559,"lat":"38.8959025","lon":"-77.0309076","category":"highway","type":"path","place_rank":27,"importance":0.07500999999999991,"addresstype":"road","name":"Pennsylvania Avenue","display_name":"Pennsylvania Avenue, Washington, District of Columbia, 20045, United States","boundingbox":["38.8958906","38.8959158","-77.0309560","-77.0308642"]}]
That one worked OK. What about my old dorm, Spencer Hall?
compose_url("100 Raleigh St, Chapel Hill, NC 27514") |>
request() |>
req_perform() |>
resp_raw()
HTTP/1.1 200 OK
server: nginx
date: Mon, 11 Sep 2023 17:59:06 GMT
content-type: application/json; charset=utf-8
content-length: 587
[{"place_id":926549,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright","osm_type":"way","osm_id":44341299,"lat":"35.914930999999996","lon":"-79.0494403470627","category":"building","type":"dormitory","place_rank":30,"importance":9.99999999995449e-06,"addresstype":"building","name":"Spencer Residence Hall","display_name":"Spencer Residence Hall, 100, Raleigh Street, Franklin-Rosemary Historic District, Baby Hollow, Chapel Hill, Orange County, North Carolina, 27514, United States","boundingbox":["35.9146468","35.9152113","-79.0495907","-79.0491365"]}]
That one worked too.
compose_url("405 Martin Luther King Jr Blvd, Chapel Hill, NC 27514") |>
request() |>
req_perform() |>
resp_raw()
HTTP/1.1 200 OK
server: nginx
date: Mon, 11 Sep 2023 17:59:07 GMT
content-type: application/json; charset=utf-8
content-length: 595
[{"place_id":929037,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright","osm_type":"way","osm_id":44340822,"lat":"35.916585999999995","lon":"-79.05738559583756","category":"building","type":"yes","place_rank":30,"importance":9.99999999995449e-06,"addresstype":"building","name":"Chapel Hill Town Hall","display_name":"Chapel Hill Town Hall, 405, Martin Luther King Junior Boulevard, Franklin-Rosemary Historic District, Bolin, Chapel Hill, Orange County, North Carolina, 27516, United States","boundingbox":["35.9163479","35.9168469","-79.0575751","-79.0570030"]}]
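I’m a bit puzzled here. It didn’t work when I split out the steps, but it seems to work when I connect it all in a pipeline. Note, though, that the address string also changed between the two attempts: the failing query used "Jr." and the ZIP+4 code 27514-5705, while this one used "Jr" and 27514, so the query text may be what made the difference rather than the pipeline. Either way, that gives me some concern over what might happen if I try to implement this on a large number of addresses; I worry that I may have to do a ton of spot-checking. I’m wondering if I can rewrite the compose_url() function to use the structured querying format, and see if that might help.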
compose_url <- function(street, city, state, postcode = NA) {
  url <- url_parse("https://nominatim.openstreetmap.org/search")
  params <- list(
    street = street,
    city = city,
    state = state,
    format = "jsonv2",
    limit = 1
  )
  if (!is.na(postcode)) params$postcode <- postcode
  url$query <- params
  url_build(url)
}
address <- compose_url("405 Martin Luther King Jr Blvd", "Chapel Hill", "NC", 27514)
req <- request(address)
resp <- req_perform(req)
resp_raw(resp)
HTTP/1.1 200 OK
server: nginx
date: Mon, 11 Sep 2023 17:59:07 GMT
content-type: application/json; charset=utf-8
content-length: 595
[{"place_id":929037,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright","osm_type":"way","osm_id":44340822,"lat":"35.916585999999995","lon":"-79.05738559583756","category":"building","type":"yes","place_rank":30,"importance":9.99999999995449e-06,"addresstype":"building","name":"Chapel Hill Town Hall","display_name":"Chapel Hill Town Hall, 405, Martin Luther King Junior Boulevard, Franklin-Rosemary Historic District, Bolin, Chapel Hill, Orange County, North Carolina, 27516, United States","boundingbox":["35.9163479","35.9168469","-79.0575751","-79.0570030"]}]
That looks like it works, even with using intermediate objects. I really don’t know why that didn’t work before. Still not clear how it will fare with a large number of requests, but it would be easy enough to write code to check for empty result sets.
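A check for empty result sets could look something like the sketch below. has_match() is a hypothetical helper, not part of httr2. It assumes its input is the parsed list that httr2's resp_body_json() returns for a response, and the two sample lists stand in for real responses so the example runs without calling the API.

```r
# Hypothetical helper: TRUE when the parsed Nominatim response has at least one hit.
# `results` is assumed to be the list returned by resp_body_json(resp).
has_match <- function(results) {
  length(results) > 0
}

# Stand-ins for parsed responses: an empty JSON array and a one-item array.
no_hit <- list()
one_hit <- list(list(lat = "35.916585999999995", lon = "-79.05738559583756"))

has_match(no_hit)   # FALSE
has_match(one_hit)  # TRUE
```

When geocoding many addresses, the ones where has_match() is FALSE could be collected for manual review.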
Examining the Output
Now let’s look at what httr2 can do to help us parse the data.
resp_body_json(resp)
[[1]]
[[1]]$place_id
[1] 929037
[[1]]$licence
[1] "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright"
[[1]]$osm_type
[1] "way"
[[1]]$osm_id
[1] 44340822
[[1]]$lat
[1] "35.916585999999995"
[[1]]$lon
[1] "-79.05738559583756"
[[1]]$category
[1] "building"
[[1]]$type
[1] "yes"
[[1]]$place_rank
[1] 30
[[1]]$importance
[1] 1e-05
[[1]]$addresstype
[1] "building"
[[1]]$name
[1] "Chapel Hill Town Hall"
[[1]]$display_name
[1] "Chapel Hill Town Hall, 405, Martin Luther King Junior Boulevard, Franklin-Rosemary Historic District, Bolin, Chapel Hill, Orange County, North Carolina, 27516, United States"
[[1]]$boundingbox
[[1]]$boundingbox[[1]]
[1] "35.9163479"
[[1]]$boundingbox[[2]]
[1] "35.9168469"
[[1]]$boundingbox[[3]]
[1] "-79.0575751"
[[1]]$boundingbox[[4]]
[1] "-79.0570030"
That’s pretty verbose output, but the gist is that it creates a named list with the output data. What I’m most interested in is the latitude and longitude.
town_hall <- resp_body_json(resp)
c(town_hall[[1]]$lat, town_hall[[1]]$lon)
[1] "35.916585999999995" "-79.05738559583756"
Note that I have to subset the results with [[1]] because resp_body_json() returns a list of lists, one for each item in the JSON array. I specified limit=1 but it still gave me back an array, just with only one item, so httr2 still reads that as a list of results, even if it’s just one element long.
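Since lat and lon come back as strings, a small wrapper can do the subsetting and the numeric conversion in one place. This is only a sketch: extract_lat_lon() is a made-up name, and it assumes the input has the same shape as the resp_body_json() output shown above.

```r
# Hypothetical helper: pull numeric coordinates out of the first result.
# Returns NA values when the result list is empty.
extract_lat_lon <- function(results) {
  if (length(results) == 0) {
    return(c(lat = NA_real_, lon = NA_real_))
  }
  first <- results[[1]]
  c(lat = as.numeric(first$lat), lon = as.numeric(first$lon))
}

# Mimics the parsed Town Hall response (only the fields used here).
town_hall_example <- list(
  list(lat = "35.916585999999995", lon = "-79.05738559583756")
)
extract_lat_lon(town_hall_example)
```

Returning a named numeric vector keeps the as.numeric() calls out of any later plotting code.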
Let’s pull this up really quickly with leaflet:
library(leaflet)
map <- leaflet() |>
  addTiles() |>
  addMarkers(
    lng = as.numeric(town_hall[[1]]$lon),
    lat = as.numeric(town_hall[[1]]$lat)
  )
map
So there we go, using httr2 and Nominatim to run some basic geocoding, and just a tiny example of leaflet. I hope to use this in the future to work with open government data, a lot of which is geospatial.