Testing Generated HTML with goquery
This is the twelfth in a series of articles about writing a small reading list app in Go for personal use.
When I first introduced tests for this app, I showed a strategy of checking for “fragments” in the body of the page – these are just strings, including HTML markup, that the test will verify are present in the generated page.
This approach works, but it’s fragile: trivial changes to a generated page like spaces or newlines can trigger test failures that don’t indicate real bugs in the app. The tests will only be failing because they’re too tightly coupled to the output format.
In this post I’ll show a better way to validate the contents of generated pages using the goquery package.
Overview of this Strategy
The goal of our test is to verify that the app generates a list of the books in the database. The initial test achieved that by looking for content like <li>Book1 -- Author1</li> in the page. But that test broke when we added a class to the element: <li class="book">.
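To make the fragility concrete, here is a minimal sketch of the fragment approach using only the standard library. The containsFragment helper and the two sample pages are hypothetical stand-ins for illustration, not the app’s real helper or output.

```go
package main

import (
	"fmt"
	"strings"
)

// containsFragment reports whether the raw HTML body contains the exact
// fragment string, which is how a fragment-style check works.
func containsFragment(body, fragment string) bool {
	return strings.Contains(body, fragment)
}

func main() {
	fragment := "<li>Book1 -- Author1</li>"

	// Hypothetical page output before and after adding a class attribute.
	before := "<ul><li>Book1 -- Author1</li></ul>"
	after := `<ul><li class="book">Book1 -- Author1</li></ul>`

	fmt.Println(containsFragment(before, fragment)) // true
	fmt.Println(containsFragment(after, fragment))  // false: same content, different markup
}
```

The page still lists the same book in both cases; only the markup around it changed, and that alone is enough to fail the check.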
Goquery lets us inspect the page content using selector queries, which we’ll write in a way that makes the tests less fragile than they are right now. For example, we can convert the verification above to a query for the CSS selector li.book, and then verify that its text content contains Book1. We can also query for li.author and check that the text contains Author1.
Implementing Tests Using the Query Strategy
First, we go get the goquery package:
% go get github.com/PuerkitoBio/goquery
go: added github.com/PuerkitoBio/goquery v1.8.0
go: added github.com/andybalholm/cascadia v1.3.1
In main_test.go, add github.com/PuerkitoBio/goquery to the imports list, and then rewrite TestBookIndexTable to use goquery instead of the previous fragment-matching approach. The whole function is shown here first – I’ll break it down in chunks below.
func TestBookIndexTable(t *testing.T) {
    t.Parallel()
    tcs := []struct {
        name  string
        count int
    }{
        {"empty", 0},
        {"single", 1},
        {"multiple", 10},
    }
    for _, tc := range tcs {
        tc := tc // capture the range variable for the parallel subtest (needed before Go 1.22)
        t.Run(tc.name, func(t *testing.T) {
            t.Parallel()
            db := freshDb(t)
            books := createBooks(t, db, tc.count)
            w := getHasStatus(t, db, "/books/", http.StatusOK)
            doc, err := goquery.NewDocumentFromReader(w.Body)
            if err != nil {
                t.Fatalf("NewDocumentFromReader error: %s", err)
            }
            // Check the page header.
            h1 := doc.Find("h1").Text()
            if h1 != "My Books" {
                t.Errorf("expected h1 'My Books', got '%s'", h1)
            }
            // 1. Get all of the <span class="title"> elements.
            // 2. Verify we get the correct number.
            // 3. Iterate over the selections, checking that the content of
            //    each one matches the corresponding book title.
            titleSpans := doc.Find("span.title")
            if tc.count != titleSpans.Length() {
                t.Fatalf("expected %d span.title elements, got %d",
                    tc.count, titleSpans.Length())
            }
            titleSpans.Each(func(i int, s *goquery.Selection) {
                title := books[i].Title
                if title != s.Text() {
                    t.Errorf("span.title[%d] expected '%s', got '%s'",
                        i, title, s.Text())
                }
            })
            // Do the same thing for authors.
            authorSpans := doc.Find("span.author")
            if tc.count != authorSpans.Length() {
                t.Fatalf("expected %d span.author elements, got %d",
                    tc.count, authorSpans.Length())
            }
            authorSpans.Each(func(i int, s *goquery.Selection) {
                author := books[i].Author
                if author != s.Text() {
                    t.Errorf("span.author[%d] expected '%s', got '%s'",
                        i, author, s.Text())
                }
            })
        })
    }
}
Creating a goquery Document
The setup and test case definitions of this test function stay the same, but after getting the response from getHasStatus, we create a goquery.Document:
w := getHasStatus(t, db, "/books/", http.StatusOK)
doc, err := goquery.NewDocumentFromReader(w.Body)
if err != nil {
    t.Fatalf("NewDocumentFromReader error: %s", err)
}
The NewDocumentFromReader function creates a document when given an io.Reader. We pass it the Body reader from the response.
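One thing to keep in mind is that a reader is consumed as it is read. The sketch below assumes that getHasStatus returns an *httptest.ResponseRecorder (it isn’t shown in this post, so that’s an assumption), whose Body is a *bytes.Buffer. The readTwice helper is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http/httptest"
)

// readTwice writes body to a response recorder and then reads the
// recorder's Body two times, returning what each read produced.
func readTwice(body string) (first, second string) {
	w := httptest.NewRecorder()
	w.WriteString(body)

	b1, err := io.ReadAll(w.Body) // consumes the buffer
	if err != nil {
		panic(err)
	}
	b2, err := io.ReadAll(w.Body) // nothing is left to read
	if err != nil {
		panic(err)
	}
	return string(b1), string(b2)
}

func main() {
	first, second := readTwice("<h1>My Books</h1>")
	fmt.Printf("first=%q second=%q\n", first, second)
	// first="<h1>My Books</h1>" second=""
}
```

If that assumption holds, a test should build its document from a given response once and reuse it, rather than parsing w.Body a second time.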
Checking the Page Header
The old version of this test was looking for <h1>My Books</h1> in the page. In the new version, we query the document for h1 and inspect the text contents of the element that we find:
// Check the page header.
h1 := doc.Find("h1").Text()
if h1 != "My Books" {
    t.Errorf("expected h1 'My Books', got '%s'", h1)
}
We call the Find method on the document, with h1 as the selector. This returns a *goquery.Selection. We then call the Text method on that selection to get the content, and compare it to our expected title.
Note that if the h1 were missing from the page, we would still get back a Selection, but it would be empty, so calling Text would give us an empty string, and the test would fail.
Checking the Titles
The old version of this test looped over the books slice and verified that the page contained text like <span class="title">Book1</span>.
The new version takes a different approach based on querying multiple elements using goquery:
// 1. Get all of the <span class="title"> elements.
// 2. Verify we get the correct number.
// 3. Iterate over the selections, checking that the content of
//    each one matches the corresponding book title.
titleSpans := doc.Find("span.title")
if tc.count != titleSpans.Length() {
    t.Fatalf("expected %d span.title elements, got %d",
        tc.count, titleSpans.Length())
}
titleSpans.Each(func(i int, s *goquery.Selection) {
    title := books[i].Title
    if title != s.Text() {
        t.Errorf("span.title[%d] expected '%s', got '%s'",
            i, title, s.Text())
    }
})
First we call Find using the selector span.title. This selector will match multiple elements when there are multiple books listed in the page. We use the Selection.Length method to see how many matches we got, and compare this to the number of books that this test case inserted.
Then, assuming the counts matched, we use the Selection.Each method to run an inline function for each of the elements in the selection. This inline function is passed an integer that is the index of the match in the selection, starting with zero, and a *Selection that contains just the current element.
We use the integer to index into the books slice. This is safe because the Length check above guarantees that we have exactly the same number of elements in the selection as there are entries in the slice. Then we compare the title from the current Book struct to the text of the element. Note that comparing by index also means the test expects the page to list the books in the same order as the books slice.
Checking the Authors
The author check is nearly identical to the title check above:
// Do the same thing for authors.
authorSpans := doc.Find("span.author")
if tc.count != authorSpans.Length() {
    t.Fatalf("expected %d span.author elements, got %d",
        tc.count, authorSpans.Length())
}
authorSpans.Each(func(i int, s *goquery.Selection) {
    author := books[i].Author
    if author != s.Text() {
        t.Errorf("span.author[%d] expected '%s', got '%s'",
            i, author, s.Text())
    }
})
Refactoring bodyHasFragments to the Query Strategy
Our tests have an existing test helper function bodyHasFragments, which checks that all the strings it is given are present in the response body.
I like the new approach of testing using selectors: it’s more precise and less fragile. Let’s refactor bodyHasFragments to a new function that will allow us to verify that a set of selectors contain some specified contents. We’ll call the new function docHasFragments. Since it’s a test helper function we’ll want to make sure to pass a *testing.T as the first argument. We also want it to operate on a *goquery.Document so we’ll have that as the second argument.
We want it to verify that a given selector contains certain contents, like we did above when we checked that h1 contained My Books. We could have it take two string arguments: selector and contents. However, that wouldn’t let us verify multiple fragments like we can now with bodyHasFragments.
It would be nice if it could take a slice of selectors and contents, and verify each of them. To make that possible, we can define a struct, Fragment, that has Selector and Contents string fields. Then docHasFragments could take a slice of those and verify the doc has each selector with the given contents.
Here’s what that code looks like:
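// Fragment pairs a CSS selector with text that the selected elements
// should contain. Empty Contents means the selector only has to match.
type Fragment struct {
    Selector string
    Contents string
}

// docHasFragments reports a test error for every fragment whose selector
// matches nothing or whose selected text lacks the expected contents.
func docHasFragments(t *testing.T, doc *goquery.Document, fragments []Fragment) {
    t.Helper()
    for _, fragment := range fragments {
        sel := doc.Find(fragment.Selector)
        if sel.Length() == 0 {
            t.Errorf("fragment '%s' not found", fragment.Selector)
            continue // keep checking the remaining fragments
        }
        text := sel.Text()
        if !strings.Contains(text, fragment.Contents) {
            t.Errorf("fragment '%s' should contain '%s', got '%s'",
                fragment.Selector, fragment.Contents, text)
        }
    }
}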
There are a couple of subtle things about this code and how we’ll use it.
First, it’s important to note that sometimes we want to verify that an element occurs in the body, but the element doesn’t have any contents to match. We can achieve this by passing a Fragment that has an appropriate selector and an empty string for contents. The empty string will always pass the Contains check, so we also add a length check to make sure that the selector matched something.
Second, it’s important to be aware that a non-unique selector will match multiple elements, and calling sel.Text() on that selection will give the combined text of all of those elements. When we want to be precise about the order in which text shows up in the response, we will either have to write a CSS selector using the order (e.g. :nth-child()), or we will have to use a different approach, like the sel.Each() that we used above to verify titles and authors in the book list.
Updating TestBookNewGet
We change the inner loop of TestBookNewGet to this:
w := getHasStatus(t, db, "/books/new", http.StatusOK)
doc, err := goquery.NewDocumentFromReader(w.Body)
if err != nil {
    t.Fatalf("NewDocumentFromReader error: %s", err)
}
fragments := []Fragment{
    {"h1", "Add a Book"},
    {`form[action="/books/new"]`, ""},
    {`input[id="title"]`, ""},
    {`input[id="author"]`, ""},
    {`button[type="submit"]`, "Save"},
}
docHasFragments(t, doc, fragments)
This creates a document using the response body.
It then defines a []Fragment slice. In that slice, we change the string fragment <h1>Add a Book</h1> to a Fragment with selector h1 and contents Add a Book. The next three Fragment instances have selectors to match elements in the form; all three of these have empty contents because we only need to check that those elements are present. The last Fragment matches the button and verifies that it contains the text Save.
Finally we call docHasFragments with the doc and the slice we created to verify that each Fragment matches the document.
Updating TestBookNewPost
I’m not going to show the whole rewrite of TestBookNewPost here because it’s fairly long and the changes are sprinkled throughout. Instead let’s look at the key changes. First, we change the test case struct to have a []Fragment slice instead of a slice of strings:
tcs := []struct {
    name      string
    data      gin.H
    setup     func(*testing.T, *gorm.DB)
    status    int
    fragments []Fragment
}{
Attempting to compile at this point will yield a bunch of errors because we need to update all of the test case definitions. This is mostly a mechanical exercise. The most interesting change is the "empty" test case:
{
    // This makes the manual field validation fail because both
    // title and author are empty.
    name: "empty",
    data: gin.H{},
    status: http.StatusBadRequest,
    fragments: []Fragment{
        {"div.error-message", "Author is required, but was empty"},
        {"div.error-message", "Title is required, but was empty"},
    },
},
Note that the selector is the same for each Fragment. Recall what I mentioned above about non-unique selectors. Each of these fragments queries div.error-message, which should match two elements, so the text checked for each fragment is the combined text of both error messages. In my opinion, it’s OK that we’re matching this way: it makes the test less fragile.
We could change the author fragment to use the selector div.error-message:nth-child(2), and this would make it so that only the "Author is required" error message is in the text. Similarly for the title fragment, using div.error-message:nth-child(1). (Because :nth-child counts an element’s position among all of its parent’s child elements, this assumes the two messages are the first two children of their parent.) This would be more precise, and if order were a critical aspect of the error messages, it would make sense to test this way. However, in this case, we don’t care about the order in which the messages appear, so we use this less precise matching for these messages. This test behavior matches what we had before the refactoring: we only cared that the error messages showed up somewhere in the text, without regard to the specific ordering of the error messages.
In the body of the loop, we change how we perform the check to:
if tc.fragments != nil {
    doc, err := goquery.NewDocumentFromReader(w.Body)
    if err != nil {
        t.Fatalf("NewDocumentFromReader error: %s", err)
    }
    docHasFragments(t, doc, tc.fragments)
}
This is a simple replacement of the old bodyHasFragments pattern with the new docHasFragments pattern.
Finally, at the bottom of the loop we change the check for the flash message:
if tc.fragments != nil {
    doc, err := goquery.NewDocumentFromReader(w.Body)
    if err != nil {
        t.Fatalf("NewDocumentFromReader error: %s", err)
    }
    docHasFragments(t, doc, tc.fragments)
}
By now the application of the pattern should be familiar. The only interesting part of this change is that where we were previously searching for the HTML-escaped pattern &#39;Book1&#39;, we’re now searching for 'Book1', because goquery transforms the HTML-escaped sequence back to its unescaped form when we get the text from the Selection.
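The round trip can be sketched with the standard library’s html package. This is only a stand-in: the app’s templates and goquery’s parser do their own escaping and unescaping, and the two helpers here are hypothetical.

```go
package main

import (
	"fmt"
	"html"
)

// escaped returns s as it would appear in HTML source after escaping.
func escaped(s string) string {
	return html.EscapeString(s)
}

// unescaped decodes HTML entities, which is roughly what an HTML parser
// does to text before handing it back to us.
func unescaped(s string) string {
	return html.UnescapeString(s)
}

func main() {
	raw := "'Book1'"
	src := escaped(raw)
	fmt.Println(src)            // &#39;Book1&#39;
	fmt.Println(unescaped(src)) // 'Book1'
}
```

A string match against the raw body sees the first form, while a match against the selection’s text sees the second.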
One Last Refactor
This little code sequence feels tedious:
doc, err := goquery.NewDocumentFromReader(w.Body)
if err != nil {
    t.Fatalf("NewDocumentFromReader error: %s", err)
}
We replace this with a little helper function that checks for the error and always returns just a *goquery.Document, so that all of our test code can skip the extra error checks:
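// mustDocumentFromReader parses r into a document, failing the test
// immediately if parsing returns an error.
func mustDocumentFromReader(t *testing.T, r io.Reader) *goquery.Document {
    t.Helper()
    doc, err := goquery.NewDocumentFromReader(r)
    if err != nil {
        t.Fatalf("NewDocumentFromReader error: %s", err)
    }
    return doc
}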
Replacing all the spots where we have this pattern in the code is straightforward; I’m not going to show that here.
Next Month
Work on my book has been taking up most of my writing time, so it’s been a few months since the last update here, but I’m caught up enough that I should be able to get back to regular posts here – my target is to publish something new each month.
Starting next month I’ll cover more Gorm usage, including associations as we add a new model for the user to maintain lists like “To Read” and “Read”, and migrations as we add a “rating” field to the Book model.