One problem with building applications that talk to external dependencies like APIs is that the applications are talking to external dependencies. This opens up a whole can of worms when it comes to testing the application.
You may well have heard developers saying: don’t let tests hit external dependencies!
Some folks take that to various different extremes, and don’t even let their tests talk to a database they control. That might make sense in some situations and not in others, but something pretty much everyone agrees on is that a test suite hitting an actual API over the wire is not ideal.
If the test suite is hitting a production API, you could end up sending "funny" (offensive) test emails to a bunch of customers.
If a special testing API exists, then multiple developers hitting that test server could cause state to bleed from one test to another, causing race conditions, false positives, false negatives, or all sorts of nonsense.
Trying to reset an external API back to a specific state for each test is a fool’s errand. If you somehow manage it, your test suite now requires the internet, meaning anyone on your team is gonna be screwed next time they try working from a coffee shop, busy conference, plane, etc.
Here are a bunch of solutions that not only help you cut the cord, but help you get the application into specific states, improving the quality of your tests.
Hopefully your application is not littered with HTTP calls to this API or their SDK directly, because that would be some tight coupling and make it reeeeal hard to switch the API for another one if the company yank it for some reason.
You probably have some thin layer wrapping their logic, giving you the chance to swap things out without changing too much of your own code. Maybe it looks a bit like this:
class Geocoder
  def address(str)
    google_sdk.geocode(str)
  end
end
The application code has a VenueService which is talking to Geocoder and using the address method, which pops off to the Google Maps API to do the thing.
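To make that a bit more concrete, here is a hypothetical sketch of what VenueService might look like. The attribute handling is invented for illustration, and it assumes Geocoder exposes address at the class level (matching how the test below stubs it); the important bit is that the only geocoding knowledge lives in Geocoder.

class VenueService
  attr_reader :lat, :lon

  # Hypothetical sketch: resolve the address through the Geocoder wrapper and
  # keep the coordinates on the service. Details are assumptions, not code
  # from a real application.
  def update(address:)
    coordinates = Geocoder.address(address)
    @lat = coordinates[:lat]
    @lon = coordinates[:lon]
  end
end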
To avoid the test suite hitting the external API, the most likely move is to mock the Geocoder in the VenueService tests.
RSpec.describe VenueService do
  describe '#update' do
    it 'will geocode address to lat lon' do
      allow(Geocoder).to receive(:address).with('123 Main Street') do
        {
          lat: 23.534,
          lon: 45.432
        }
      end

      subject.update(address: '123 Main Street')

      expect(subject.lat).to eql(23.534)
      expect(subject.lon).to eql(45.432)
    end
  end
end
Basically what we have here is a test (using RSpec but whatever, it’s all the same) which describes how the VenueService should work. The update method is being tested, and the Geocoder is being set up (monkey patched 🙈) to respond in a certain way.
For the VenueService unit tests this is fine, because the intent is to make sure VenueService works with what we think Geocoder is going to return. Unit tests for VenueService only focus on that class, so what can we do to make sure Geocoder is working properly?
Well, unit testing that class is one option, but it’s not really doing much other than talking to the Google Maps SDK, and we really don’t want to mock that. Why? Because we don’t own it, and mocking things you don’t own is making guesses that might not be correct now, and might not be correct later. The Google Maps SDK might change, and if all we have are tests saying the SDK works one way, but really it works another way, then you are in false-positive world: a broken application with a lovely green test suite.
This will often be less of a problem for typed languages like Go, TypeScript, PHP 7, etc., but changes can happen which those type systems do not notice. For example, a foo property can still be a string, but suddenly have different contents within that string.
Integration tests are very important to make sure things work together.
Integration tests will be a bit more realistic as they hit more real code, so the behaviour is closer to what is actually likely to happen in production. This does mean integration tests can be slower than unit tests.
Some developers avoid integration tests for this reason, but that is reckless and daft premature optimization. Would you rather work on speeding up a slow but reliable test suite, or have a broken production application with an untrustworthy test suite?
As integration tests hit more code, some folks think hitting the external APIs is just going to happen, but that is not the case! There is a tactic called "record and replay", and it is available in pretty much every programming language in one form or another.
One approach for more realistic HTTP interactions in integration tests is to use something like WebMock for Ruby, Nock for JavaScript, or the baked-in httptest package in Go.
These tools are another type of mock, unlike the two other types of mocking discussed so far. Instead of mocking a class in your programming language, they mock an HTTP server. They are also very different from API specification based mocking tools like Prism, which will be discussed a bit later.
Web mocking tools can be configured to respond in certain ways depending on what URL, HTTP method, or body params are sent to them, and things can get as complex as you need. Most of the time this is used for simple stuff.
import axios from 'axios';
import nock from 'nock';

test('Test API request', async () => {
  // Set up Nock to mock the API response
  nock('https://api.example.com')
    .get('/data')
    .reply(200, { data: 'Mocked data' });

  // Make the API request using Axios or any other HTTP library
  const response = await axios.get('https://api.example.com/data');

  // Assert the response
  expect(response.status).toBe(200);
  expect(response.data).toEqual({ data: 'Mocked data' });
});
This test is setting up a mock server for the hostname https://api.example.com, with a GET path /data that returns a 200 status code and some specific JSON content. Then other tests can confirm what the client code will do if the response contains a 404 status code, an unsupported content type, or any number of other edge cases.
Web mocking is great for when you want to control the response, but once again you should only mock things you own. Using this approach for the Google Maps API example would only be confirming that the Geocoder works with an assumption of what the Google Maps API is going to do. When things change in the API there is no programmatic way to know about it.
Even if the change is noticed, updating these mock setups can be time consuming. What we really want is something like Jest Snapshots, but for HTTP requests…
Record & Replay has been around for years, and I did not discover it until I started using Ruby, using a tool called VCR ("Video Cassette Recorder").
For younger developers, a VCR is like Blu-ray but with terrible quality, and the data is printed on a chunk of plastic you shove in a box under your TV. It was mostly used for recording telly you weren’t able to watch at the time, which is no longer a thing.
VCR explains the goals nicely, so I will use their words:
Record your test suite’s HTTP interactions and replay them during future test runs for fast, deterministic, accurate tests.
The basic approach is to put your test suite in "record mode", which will actually make real requests to the external services, but then it records the response. All the headers, body content, status code, the whole thing.
Then when the test suite is run not in record mode, it will reuse the recorded responses instead of going over the wire, meaning it is quick, always going to give the same result, and the entire response is being used, so you know it is accurate.
require 'rubygems'
require 'test/unit'
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = "fixtures/vcr_cassettes"
  config.hook_into :webmock
end

class VCRTest < Test::Unit::TestCase
  def test_example_dot_com
    VCR.use_cassette("synopsis") do
      response = Net::HTTP.get_response(URI('http://www.iana.org/domains/reserved'))
      assert_match /Example domains/, response.body
    end
  end
end
This is a rather verbose Ruby example for clarity. It includes the config which would normally be tucked away in a helper, and it is manually using a cassette block, but the idea is this: You can define multiple cassettes, and switch them out to see the code working differently.
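As a rough sketch of what that could look like for the Geocoder from earlier, each test can load its own cassette: one recorded against an address Google can resolve, and one against nonsense. This assumes Geocoder exposes address at the class level as stubbed earlier, and that it raises some NotFound error for unresolvable addresses; the cassette names and that error are invented for illustration.

RSpec.describe Geocoder do
  it 'returns coordinates for a known address' do
    # Cassette recorded against a real, resolvable address.
    VCR.use_cassette('geocoder/known_address') do
      result = Geocoder.address('123 Main Street')
      expect(result[:lat]).to be_a(Float)
      expect(result[:lon]).to be_a(Float)
    end
  end

  it 'fails when the address cannot be resolved' do
    # Cassette recorded against an address Google could not geocode.
    VCR.use_cassette('geocoder/unknown_address') do
      expect { Geocoder.address('nowhere at all') }.to raise_error(Geocoder::NotFound)
    end
  end
end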
How exactly it works under the hood might be a bit too much of how the sausage is made, but it is very clever so I am going to nerd out a little. In Ruby once again there is some monkey patching going on.[1] It knows to look out for common HTTP clients, and actually messes with their definitions a little (only in the test suite). This sounds a bit scary, but it means VCR can hijack the HTTP requests and use the recorded versions instead.
Most of these record & replay tools can be configured to use the more static web mocking tools mentioned previously. Ruby’s VCR, for example, can use WebMock; just think of VCR as a helper for creating these accurate web mocks.
Another convenient thing about record & replay is the ability to have expiring cassettes. You can configure these recordings to automatically expire (vanish) after a certain amount of time, and then the test suite goes back into record mode. Or you can have them throw warnings, and hope some developers actually pay attention. This can be very annoying, but you would not believe how often I have seen client application developers use year old stubs with fields that did not exist anymore.
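With VCR this is just a cassette option. A minimal sketch, assuming a weekly re-record is acceptable for the suite:

VCR.configure do |config|
  config.cassette_library_dir = "fixtures/vcr_cassettes"
  config.hook_into :webmock

  # Assumption for illustration: once a cassette is a week old, the next test
  # run re-records it against the real API. The interval is in seconds.
  config.default_cassette_options = { re_record_interval: 7 * 24 * 60 * 60 }
end

The same option can also be passed to an individual VCR.use_cassette call if only some recordings should expire.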
When recorded responses expire, clients need to go over the wire and record new responses. This can be tricky, as the API might have different data by now. Some amount of effort can go into getting good data onto the API for recording, which might be a case of building a sort of seed script. This annoyance is worth it in the long run, but certainly takes some getting used to.
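There is no standard tool for that; a seed script is usually just a small one-off script that pushes known records into a sandbox API before the suite runs in record mode. A rough sketch, with the sandbox URL and payloads entirely made up:

require 'net/http'
require 'json'
require 'uri'

# Hypothetical sandbox endpoint and payloads, invented for illustration.
SANDBOX_VENUES = URI('https://sandbox.api.example.com/venues')

[
  { name: 'Coffee Shop', address: '123 Main Street' },
  { name: 'Conference Centre', address: '1 Expo Way' }
].each do |venue|
  response = Net::HTTP.post(SANDBOX_VENUES, venue.to_json, 'Content-Type' => 'application/json')
  abort "Seeding failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
end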
Expiring recordings go hand in hand with Change Management, especially Sunset and Deprecation headers. If your applications are using reasonably up-to-date recordings, then your test suite can start throwing deprecation warnings, and loudly report about the code hitting URLs marked for removal with Sunset.
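VCR does not understand Sunset headers itself, but its recording hooks give you somewhere to check for them. A rough sketch, assuming the provider sends a Sunset header as described in RFC 8594:

VCR.configure do |config|
  config.before_record do |interaction|
    # Sketch only: when a freshly recorded response carries a Sunset header,
    # shout about it so the removal date gets noticed before the endpoint goes.
    sunset = interaction.response.headers['Sunset']
    if sunset
      warn "Sunset header on #{interaction.request.uri}: removal planned for #{sunset.first}"
    end
  end
end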
The Ruby VCR was initially inspired by [Chris Young’s NetRecorder](https://github.com/chrisyoung/netrecorder), and has since inspired a lot of other record and replay tools; the VCR maintainers keep an impressive list of ports to other languages:
- Betamax (Python)
- VCR.py (Python)
- Betamax (Go)
- DVR (Go)
- Go VCR (Go)
- Betamax (Clojure)
- vcr-clj (Clojure)
- scotch (C#/.NET)
- Betamax.NET (C#/.NET)
- ExVCR (Elixir)
- HAVCR (Haskell)
- Mimic (PHP/Kohana)
- PHP-VCR (PHP)
- Polly.js (JavaScript/Node)
- Nock-VCR (JavaScript/Node)
- Sepia (JavaScript/Node)
- VCR.js (JavaScript)
- yakbak (JavaScript/Node)
- NSURLConnectionVCR (Objective-C)
- VCRURLConnection (Objective-C)
- DVR (Swift)
- VHS (Erlang)
- Betamax (Java)
- http_replayer (Rust)
- OkReplay (Java/Android)
- vcr (R)
If you are a JavaScript user then check out [Polly.js](https://netflix.github.io/pollyjs/), comically written by Netflix. It has some great config options.
polly.configure({
  recordIfMissing: true,
  recordIfExpired: false,
  recordFailedRequests: false,
  expiresIn: null,
  timing: Timing.fixed(0),
  matchRequestsBy: {
    method: true,
    headers: true,
    body: true,
    order: true,
  }
})
The recordIfMissing option is a good one, meaning when folks add new tests it will try to record the request the first time the test is run. This can catch developers out if they are not expecting it, and can lead to a rubbish response being recorded so they have to delete it and try again, but again it is worth getting used to.
Another one I like is recordFailedRequests: true. This is yet another reminder that if the API is ignoring HTTP conventions like status codes, this will not work. Ask the API developers to stop ignoring conventions and build their APIs properly. Maybe send them a copy of Build APIs You Won’t Hate if they need convincing.
Any API client that is talking to another API is just hoping the provider doesn’t make breaking changes to the parts of the API that it uses. API developers should be using a sensible API Versioning strategy which does not allow for breaking changes, or using API Evolution, where breaking change is kept to an absolute minimum and entire endpoints are only deprecated with the Sunset header when it’s unavoidable.
If the API providers are adding Sunset headers but the consumers didn’t notice, then applications will break.
If the API providers are not doing their own contract testing and accidentally push out a breaking change, then applications will break.
Either way, consumer contract testing can help keep an eye on whether various dependency APIs are doing what the consumer needs them to be doing.
Tooling for this is very similar to the sort of tests you see in an API provider’s acceptance test suite, with one key difference: the API provider is (hopefully) testing all actions that should be possible, and asserting the responses have the correct contract, but the API consumer test suite is only testing what it needs. The provider could have removed some fields and deleted an endpoint, but if the client doesn’t care about that then it’s not going to trigger a failure in the test suite.
Here’s an example of a test using Pact, which is available in a bunch of languages; this example uses the JavaScript library.
describe('Pact with Order API', () => {
  describe('given there are orders', () => {
    describe('when a call to the API is made', () => {
      before(() => {
        return provider.addInteraction({
          state: 'there are orders',
          uponReceiving: 'a request for orders',
          withRequest: {
            path: '/orders',
            method: 'GET',
          },
          willRespondWith: {
            body: eachLike({
              id: 1,
              items: eachLike({
                name: 'burger',
                quantity: 2,
                value: 100,
              }),
            }),
            status: 200,
            headers: {
              'Content-Type': 'application/json; charset=utf-8',
            },
          },
        });
      });

      it('will receive the list of current orders', () => {
        return expect(fetchOrders()).to.eventually.have.deep.members([
          new Order(orderProperties.id, [itemProperties]),
        ]);
      });
    });
  });
});
The test suite here is basically describing requests that will be made, and then outlines the “contract” for what should come back. The eachLike helper defines examples of data that should come back, so if the data types mismatch it’ll trigger errors. Then if the content type is wrong you’ll see more errors, and so on.
Creating a test suite of expectations for your codebase is one way of doing it, but I worry that the tests here and the actual code could have subtly different expectations. A developer unfamiliar with Pact could change the request in the code but not update the defined interactions in the test suite, meaning the test suite is giving a false sense of security.
If you are very lucky, the provider will offer SDKs, version them with SemVer, and you can enable something like Dependabot to get updates for those SDKs, at which point your test suite will let you know if a used method or property has vanished from the SDK. If this is the case, you might not need consumer-driven contract testing.
If that is not the case, but you’re still lucky enough that the provider has provided OpenAPI descriptions (thanks Stripe 🙌) then you can point Prism at those and use the validation proxy.
prism proxy --errors https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.yaml https://api.stripe.com
Running this will create a Prism Validation Proxy which is going to see what HTTP traffic comes through it, validate the request, and if it spots any mismatches it’ll blow up thanks to --errors.
If the request is good it’ll remake that request to https://api.stripe.com, then validate the response. If the response is bad, you’ll see output like this in the logs:
✖ error Request terminated with error: https://stoplight.io/prism/errors#UNPROCESSABLE_ENTITY: Invalid request body payload
The curl command below came from their documentation, minus the currency parameter which I removed. I expected that to cause the error, but looking at the JSON that Prism returned, the error is actually that the Stripe OpenAPI is wrong. 🤣
$ curl -i http://localhost:4010/v1/charges \
  -u sk_test_f5ssPbJNt4fzBElsVbbR3OLk0024dqCRk1: \
  -d amount=2000 \
  -d source=tok_visa \
  -d description="My First Test Charge (created for API docs)"

HTTP/1.1 422 Unprocessable Entity
content-type: application/problem+json
Content-Length: 647
Date: Wed, 17 Jun 2020 18:02:57 GMT
Connection: keep-alive
{
  "type": "https:\/\/stoplight.io\/prism\/errors#UNPROCESSABLE_ENTITY",
  "title": "Invalid request body payload",
  "status": 422,
  "detail": "Your request is not valid and no HTTP validation response was found in the spec, so Prism is generating this error for you.",
  "validation": [
    {
      "location": [
        "body",
        "shipping",
        "address"
      ],
      "severity": "Error",
      "code": "required",
      "message": "should have required property 'line1'"
    },
    {
      "location": [
        "body",
        "shipping"
      ],
      "severity": "Error",
      "code": "required",
      "message": "should have required property 'name'"
    },
    {
      "location": [
        "body",
        "transfer_data"
      ],
      "severity": "Error",
      "code": "required",
      "message": "should have required property 'destination'"
    }
  ]
}
Here Prism is blowing up because the shipping and transfer_data properties should be entirely optional: if shipping is passed then address.line1 and name should be required, and destination only when transfer_data is passed. There’s a valid way to express that in OpenAPI, but it’s not this, so… success for Prism.
Sniffing for mismatches is a good way to spot problems. Whether that’s a problem with the API documentation or a problem with how you’re trying to use the API, either way a mismatch of expectations has occurred and can be discussed.