One problem with building applications that talk to external dependencies like APIs is that the applications are talking to external dependencies. This opens up a whole can of worms when it comes to testing the application.
You may well have heard developers saying: don’t let tests hit external dependencies!
Some folks take that to various different extremes, and don’t even let their tests talk to a database they control. That might make sense in some situations and not in others, but something pretty much everyone agrees on is that a test suite hitting an actual API over the wire is not ideal.
If the test suite is hitting a production API, you could end up sending "funny" (offensive) test emails to a bunch of customers.
If a special testing API exists, then multiple developers hitting that test server could cause state to bleed from one test to another, causing race conditions, false positives, false negatives, or all sorts of nonsense.
Trying to reset an external API back to a specific state for each test is a fool’s errand. If you somehow manage it, your test suite now requires the internet, meaning anyone on your team is gonna be screwed next time they try working from a coffee shop, busy conference, plane, etc.
Here are a bunch of solutions that not only help you cut the cord, but help you get the application into specific states, improving the quality of your tests.
Hopefully your application is not littered with HTTP calls to this API or their SDK directly, because that would be some tight coupling and make it reeeeal hard to switch the API for another one if the company yank it for some reason.
You probably have some thin layer wrapping their logic, giving you the chance to swap things out without changing too much of your own code. Maybe it looks a bit like this:
class Geocoder
  def address(str)
    google_sdk.geocode(str)
  end
end
The application code has a VenueService which is talking to Geocoder and using the address method, which pops off to the Google Maps API to do the thing.
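To make that a bit more concrete, here is a hypothetical sketch of what VenueService might look like. The attribute handling is invented for illustration, and it assumes Geocoder exposes address at the class level (matching how the test below stubs it); the important bit is that the only geocoding knowledge lives in Geocoder.

class VenueService
  attr_reader :lat, :lon

  # Hypothetical sketch: resolve the address through the Geocoder wrapper and
  # keep the coordinates on the service. Details are assumptions, not code
  # from a real application.
  def update(address:)
    coordinates = Geocoder.address(address)
    @lat = coordinates[:lat]
    @lon = coordinates[:lon]
  end
end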
To avoid the test suite hitting the external API, the most likely move is to mock the Geocoder in the VenueService tests.
RSpec.describe VenueService do
  describe '#update' do
    it 'will geocode address to lat lon' do
      allow(Geocoder).to receive(:address).with('123 Main Street') do
        {
          lat: 23.534,
          lon: 45.432
        }
      end

      subject.update(address: '123 Main Street')

      expect(subject.lat).to eql(23.534)
      expect(subject.lon).to eql(45.432)
    end
  end
end
Basically what we have here is a test (using RSpec but whatever, it’s all the same) which describes how the VenueService should work. The update method is being tested, and the Geocoder is being set up (monkey patched 🙈) to respond in a certain way.
For the VenueService unit tests this is fine, because the intent is to make sure VenueService works with what we think Geocoder is going to return. Unit tests for VenueService only focus on that class, so what can we do to make sure Geocoder is working properly?
Well, unit testing that class is one option, but it’s not really doing much other than talking to the Google Maps SDK, and we really don’t want to mock that. Why? Because we don’t own it, and mocking things you don’t own is making guesses that might not be correct now, and might not be correct later. The Google Maps SDK might change, and if all we have are tests saying the SDK works one way, but really it works another way, then you are in false-positive world: a broken application with a lovely green test suite.
This will often be less of a problem for typed languages like Go, TypeScript, PHP 7, etc., but changes can happen which those type systems do not notice. For example, a foo property can still be a string, but suddenly have different contents within that string.
Integration tests are very important to make sure things work together.
Integration tests will be a bit more realistic as they hit more real code, so the behaviour is closer to what is actually likely to happen in production. This does mean integration tests can be slower than unit tests.
Some developers avoid integration tests for this reason, but that is reckless and daft premature optimization. Would you rather work on speeding up a slow but reliable test suite, or have a broken production application with an untrustworthy test suite?
As integration tests hit more code, some folks think hitting the external APIs is just going to happen, but that is not the case! There is a tactic called "record and replay", and it is available in pretty much every programming language in one form or another.
One approach for more realistic HTTP interactions in integration tests is to use something like WebMock for Ruby, Nock for JavaScript, or the baked-in httptest package in Go.
These tools are another type of mock, unlike the two other types of mocking discussed so far. Instead of mocking a class in your programming language, they mock an HTTP server. They are also very different from API specification based mocking tools like Prism, which will be discussed a bit later.
Web mocking tools can be configured to respond in certain ways depending on what URL, HTTP method, or body params are sent to them, and things can get as complex as you need. Most of the time this is used for simple stuff.
import axios from 'axios';
import nock from 'nock';

test('Test API request', async () => {
  // Set up Nock to mock the API response
  nock('https://api.example.com')
    .get('/data')
    .reply(200, { data: 'Mocked data' });

  // Make the API request using Axios or any other HTTP library
  const response = await axios.get('https://api.example.com/data');

  // Assert the response
  expect(response.status).toBe(200);
  expect(response.data).toEqual({ data: 'Mocked data' });
});
This test is setting up a mock server for the hostname https://api.example.com, with a GET path /data that returns a 200 status code and some specific JSON content. Then other tests can confirm what the client code will do if the response contains a 404 status code, an unsupported content type, or any number of other edge cases.
Web mocking is great for when you want to control the response, but once again you should only mock things you own. Using this approach for the Google Maps API example would only be confirming that the Geocoder works with an assumption of what the Google Maps API is going to do. When things change in the API there is no programmatic way to know about it.
Even if the change is noticed, updating these mock setups can be time consuming. What we really want is something like Jest Snapshots, but for HTTP requests…
Record & Replay has been around for years, and I did not discover it until I started using Ruby, using a tool called VCR ("Video Cassette Recorder").
For younger developers, a VCR is like Blu-ray but with terrible quality, and the data is printed on a chunk of plastic you shove in a box under your TV. It was mostly used for recording telly you weren’t able to watch at the time, which is no longer a thing.
VCR explains the goals nicely, so I will use their words:
Record your test suite’s HTTP interactions and replay them during future test runs for fast, deterministic, accurate tests.
The basic approach is to put your test suite in "record mode", which will actually make real requests to the external services, but then it records the response. All the headers, body content, status code, the whole thing.
Then when the test suite is run not in record mode, it will reuse the recorded responses instead of going over the wire, meaning it is quick, always going to give the same result, and the entire response is being used, so you know it is accurate.
require 'rubygems'
require 'test/unit'
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = "fixtures/vcr_cassettes"
  config.hook_into :webmock
end

class VCRTest < Test::Unit::TestCase
  def test_example_dot_com
    VCR.use_cassette("synopsis") do
      response = Net::HTTP.get_response(URI('http://www.iana.org/domains/reserved'))
      assert_match /Example domains/, response.body
    end
  end
end
This is a rather verbose Ruby example for clarity. It includes the config which would normally be tucked away in a helper, and it is manually using a cassette block, but the idea is this: You can define multiple cassettes, and switch them out to see the code working differently.
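As a rough sketch of what that could look like for the Geocoder from earlier, each test can load its own cassette: one recorded against an address Google can resolve, and one against nonsense. This assumes Geocoder exposes address at the class level as stubbed earlier, and that it raises some NotFound error for unresolvable addresses; the cassette names and that error are invented for illustration.

RSpec.describe Geocoder do
  it 'returns coordinates for a known address' do
    # Cassette recorded against a real, resolvable address.
    VCR.use_cassette('geocoder/known_address') do
      result = Geocoder.address('123 Main Street')
      expect(result[:lat]).to be_a(Float)
      expect(result[:lon]).to be_a(Float)
    end
  end

  it 'fails when the address cannot be resolved' do
    # Cassette recorded against an address Google could not geocode.
    VCR.use_cassette('geocoder/unknown_address') do
      expect { Geocoder.address('nowhere at all') }.to raise_error(Geocoder::NotFound)
    end
  end
end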
How exactly it works under the hood might be a bit too much of how the sausage is made, but it is very clever so I am going to nerd out a little. In Ruby once again there is some monkey patching going on.[1] It knows to look out for common HTTP clients, and actually messes with their definitions a little (only in the test suite). This sounds a bit scary, but it means VCR can hijack the HTTP requests and use the recorded versions instead.
Most of these record & replay tools can be configured to use the more static web mocking tools mentioned previously. Ruby’s VCR, for example, can use WebMock; just think of VCR as a helper for creating these accurate web mocks.
Another convenient thing about record & replay is the ability to have expiring cassettes. You can configure these recordings to automatically expire (vanish) after a certain amount of time, and then the test suite goes back into record mode. Or you can have them throw warnings, and hope some developers actually pay attention. This can be very annoying, but you would not believe how often I have seen client application developers use year old stubs with fields that did not exist anymore.
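With VCR this is just a cassette option. A minimal sketch, assuming a weekly re-record is acceptable for the suite:

VCR.configure do |config|
  config.cassette_library_dir = "fixtures/vcr_cassettes"
  config.hook_into :webmock

  # Assumption for illustration: once a cassette is a week old, the next test
  # run re-records it against the real API. The interval is in seconds.
  config.default_cassette_options = { re_record_interval: 7 * 24 * 60 * 60 }
end

The same option can also be passed to an individual VCR.use_cassette call if only some recordings should expire.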
When recorded responses expire, clients need to go over the wire and record new responses. This can be tricky, as the API might have different data by now. Some amount of effort can go into getting good data onto the API for recording, which might be a case of building a sort of seed script. This annoyance is worth it in the long run, but certainly takes some getting used to.
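There is no standard tool for that; a seed script is usually just a small one-off script that pushes known records into a sandbox API before the suite runs in record mode. A rough sketch, with the sandbox URL and payloads entirely made up:

require 'net/http'
require 'json'
require 'uri'

# Hypothetical sandbox endpoint and payloads, invented for illustration.
SANDBOX_VENUES = URI('https://sandbox.api.example.com/venues')

[
  { name: 'Coffee Shop', address: '123 Main Street' },
  { name: 'Conference Centre', address: '1 Expo Way' }
].each do |venue|
  response = Net::HTTP.post(SANDBOX_VENUES, venue.to_json, 'Content-Type' => 'application/json')
  abort "Seeding failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
end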
Expiring recordings go hand in hand with Change Management, especially Sunset and Deprecation headers. If your applications are using reasonably up-to-date recordings, then your test suite can start throwing deprecation warnings, and loudly report about the code hitting URLs marked for removal with Sunset.
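VCR does not understand Sunset headers itself, but its recording hooks give you somewhere to check for them. A rough sketch, assuming the provider sends a Sunset header as described in RFC 8594:

VCR.configure do |config|
  config.before_record do |interaction|
    # Sketch only: when a freshly recorded response carries a Sunset header,
    # shout about it so the removal date gets noticed before the endpoint goes.
    sunset = interaction.response.headers['Sunset']
    if sunset
      warn "Sunset header on #{interaction.request.uri}: removal planned for #{sunset.first}"
    end
  end
end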
The Ruby VCR was initially inspired by [Chris Young’s NetRecorder](https://github.com/chrisyoung/netrecorder), and has since inspired a lot of other record and replay tools; the VCR maintainers keep an impressive list of ports to other languages:
- Betamax (Python)
- VCR.py (Python)
- Betamax (Go)
- DVR (Go)
- Go VCR (Go)
- Betamax (Clojure)
- vcr-clj (Clojure)
- scotch (C#/.NET)
- Betamax.NET (C#/.NET)
- ExVCR (Elixir)
- HAVCR (Haskell)
- Mimic (PHP/Kohana)
- PHP-VCR (PHP)
- Polly.js (JavaScript/Node)
- Nock-VCR (JavaScript/Node)
- Sepia (JavaScript/Node)
- VCR.js (JavaScript)
- yakbak (JavaScript/Node)
- NSURLConnectionVCR (Objective-C)
- VCRURLConnection (Objective-C)
- DVR (Swift)
- VHS (Erlang)
- Betamax (Java)
- http_replayer (Rust)
- OkReplay (Java/Android)
- vcr (R)
If you are a JavaScript user then check out [Polly.js](https://netflix.github.io/pollyjs/), comically written by Netflix. It has some great config options.
polly.configure({
  recordIfMissing: true,
  recordIfExpired: false,
  recordFailedRequests: false,
  expiresIn: null,
  timing: Timing.fixed(0),
  matchRequestsBy: {
    method: true,
    headers: true,
    body: true,
    order: true,
  }
})
The recordIfMissing option is a good one, meaning when folks add new tests it will try to record the request the first time the test is run. This can catch developers out if they are not expecting it, and can lead to a rubbish response being recorded so they have to delete it and try again, but again it is worth getting used to.
Another one I like is recordFailedRequests: true. This is yet another reminder that if the API is ignoring HTTP conventions like status codes, this will not work. Ask the API developers to stop ignoring conventions and build their APIs properly. Maybe send them a copy of Build APIs You Won’t Hate if they need convincing.
Any API client that is talking to another API is just hoping the provider doesn’t make breaking changes to the parts of the API that it uses. API developers should be using a sensible API Versioning strategy which does not allow for breaking changes, or using API Evolution, where breaking change is kept to an absolute minimum and entire endpoints are only deprecated with the Sunset header when it’s unavoidable.
If the API providers are adding Sunset headers but the consumers didn’t notice, then applications will break.
If the API providers are not doing their own contract testing and accidentally push out a breaking change, then applications will break.
Either way, consumer contract testing can help keep an eye on whether various dependency APIs are doing what the consumer needs them to be doing.
Tooling for this is very similar to the sort of tests you see in an API provider’s acceptance test suite, with one key difference: the API provider is (hopefully) testing all actions that should be possible, and asserting the responses have the correct contract, but the API consumer test suite is only testing what it needs. The provider could have removed some fields and deleted an endpoint, but if the client doesn’t care about that then it’s not going to trigger a failure in the test suite.
Here’s an example of a test using Pact, which is available in a bunch of languages; this example uses the JavaScript library.
describe('Pact with Order API', () => {
  describe('given there are orders', () => {
    describe('when a call to the API is made', () => {
      before(() => {
        return provider.addInteraction({
          state: 'there are orders',
          uponReceiving: 'a request for orders',
          withRequest: {
            path: '/orders',
            method: 'GET',
          },
          willRespondWith: {
            body: eachLike({
              id: 1,
              items: eachLike({
                name: 'burger',
                quantity: 2,
                value: 100,
              }),
            }),
            status: 200,
            headers: {
              'Content-Type': 'application/json; charset=utf-8',
            },
          },
        });
      });

      it('will receive the list of current orders', () => {
        return expect(fetchOrders()).to.eventually.have.deep.members([
          new Order(orderProperties.id, [itemProperties]),
        ]);
      });
    });
  });
});
The test suite here is basically describing requests that will be made, and then outlines the “contract” for what should come back. The eachLike helper defines examples of data that should come back, so if the data types mismatch it’ll trigger errors. Then if the content type is wrong you’ll see more errors, and so on.
Creating a test suite of expectations for your codebase is one way of doing it, but I worry that the tests here and the actual code could have subtly different expectations. A developer unfamiliar with Pact could change the request in the code but not update the defined interactions in the test suite, meaning the test suite is giving a false sense of security.
If you are very lucky, the provider will offer SDKs, version them with SemVer, and you can enable something like Dependabot to get updates for those SDKs, at which point your test suite will let you know if a used method or property has vanished from the SDK. If this is the case, you might not need consumer-driven contract testing.
If that is not the case, but you’re still lucky enough that the provider has provided OpenAPI descriptions (thanks Stripe 🙌) then you can point Prism at those and use the validation proxy.
prism proxy --errors https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.yaml https://api.stripe.com
Running this will create a Prism Validation Proxy which is going to see what HTTP traffic comes through it, validate the request, and if it spots any mismatches it’ll blow up thanks to --errors.
If the request is good it’ll remake that request to https://api.stripe.com, then validate the response. If the response is bad, you’ll see output like this in the logs:
✖ error Request terminated with error: https://stoplight.io/prism/errors#UNPROCESSABLE_ENTITY: Invalid request body payload
The curl command below came from their documentation, minus the currency parameter which I removed. I expected that to cause the error, but looking at the JSON that Prism returned, the error is actually that the Stripe OpenAPI is wrong. 🤣
$ curl -i http://localhost:4010/v1/charges \
  -u sk_test_f5ssPbJNt4fzBElsVbbR3OLk0024dqCRk1: \
  -d amount=2000 \
  -d source=tok_visa \
  -d description="My First Test Charge (created for API docs)"

HTTP/1.1 422 Unprocessable Entity
content-type: application/problem+json
Content-Length: 647
Date: Wed, 17 Jun 2020 18:02:57 GMT
Connection: keep-alive
{
  "type": "https:\/\/stoplight.io\/prism\/errors#UNPROCESSABLE_ENTITY",
  "title": "Invalid request body payload",
  "status": 422,
  "detail": "Your request is not valid and no HTTP validation response was found in the spec, so Prism is generating this error for you.",
  "validation": [
    {
      "location": [
        "body",
        "shipping",
        "address"
      ],
      "severity": "Error",
      "code": "required",
      "message": "should have required property 'line1'"
    },
    {
      "location": [
        "body",
        "shipping"
      ],
      "severity": "Error",
      "code": "required",
      "message": "should have required property 'name'"
    },
    {
      "location": [
        "body",
        "transfer_data"
      ],
      "severity": "Error",
      "code": "required",
      "message": "should have required property 'destination'"
    }
  ]
}
Here Prism is blowing up because the shipping and transfer_data properties should be entirely optional: if shipping is passed then address.line1 and name should be required, and destination only when transfer_data is passed. There’s a valid way to express that in OpenAPI, but it’s not this, so… success for Prism.
Sniffing for mismatches is a good way to spot problems. Whether that’s a problem with the API documentation or a problem with how you’re trying to use the API, either way a mismatch of expectations has occurred and can be discussed.