Testing Generated HTML with goquery
This is the twelfth in a series of articles about writing a small reading list app in Go for personal use.
When I first introduced tests for this app, I showed a strategy of checking for “fragments” in the body of the page – these are just strings, including HTML markup, that the test will verify are present in the generated page.
This approach works, but it’s fragile: trivial changes to a generated page like spaces or newlines can trigger test failures that don’t indicate real bugs in the app. The tests will only be failing because they’re too tightly coupled to the output format.
In this post I’ll show a better way to validate the contents of generated pages using the goquery package.

Time Lapse Waterfall. Photo by Vojta Kovařík
Overview of this Strategy
The goal of our test is to verify that the app generates a list of the books in the database. The initial test achieved that by looking for content like <li>Book1 -- Author1</li> in the page. But that test broke when we added a class to the element: <li class="book">.
Goquery lets us inspect the page content using selector queries, which we’ll write in a way that makes the tests less fragile than they are right now. For example, we can convert the verification above to a query for the CSS selector li.book, and then verify that its text contains Book1. We can also query for li.author and check that its text contains Author1.
Implementing Tests Using the Query Strategy
First, we go get the goquery package:
% go get github.com/PuerkitoBio/goquery
go: added github.com/PuerkitoBio/goquery v1.8.0
go: added github.com/andybalholm/cascadia v1.3.1
In main_test.go, add github.com/PuerkitoBio/goquery to the imports list, and then rewrite TestBookIndexTable to use goquery instead of the previous fragment-matching approach. The whole function is shown here first; I’ll break it down in chunks below.
func TestBookIndexTable(t *testing.T) {
	t.Parallel()
	tcs := []struct {
		name  string
		count int
	}{
		{"empty", 0},
		{"single", 1},
		{"multiple", 10},
	}
	for _, tc := range tcs {
		tc := tc // capture the range variable for the parallel subtest (needed before Go 1.22)
		t.Run(tc.name, func(t *testing.T) {
			t.Parallel()
			db := freshDb(t)
			books := createBooks(t, db, tc.count)
			w := getHasStatus(t, db, "/books/", http.StatusOK)
			doc, err := goquery.NewDocumentFromReader(w.Body)
			if err != nil {
				t.Fatalf("NewDocumentFromReader error: %s", err)
			}
			// Check the page header.
			h1 := doc.Find("h1").Text()
			if h1 != "My Books" {
				t.Errorf("expected h1 'My Books', got '%s'", h1)
			}
			// 1. Get all of the <span class="title"> elements.
			// 2. Verify we get the correct number.
			// 3. Iterate over the selections, checking that the content of
			//    each one matches the corresponding book title.
			titleSpans := doc.Find("span.title")
			if tc.count != titleSpans.Length() {
				t.Fatalf("expected %d span.title elements, got %d",
					tc.count, titleSpans.Length())
			}
			titleSpans.Each(func(i int, s *goquery.Selection) {
				title := books[i].Title
				if title != s.Text() {
					t.Errorf("span.title[%d] expected '%s', got '%s'",
						i, title, s.Text())
				}
			})
			// Do the same thing for authors.
			authorSpans := doc.Find("span.author")
			if tc.count != authorSpans.Length() {
				t.Fatalf("expected %d span.author elements, got %d",
					tc.count, authorSpans.Length())
			}
			authorSpans.Each(func(i int, s *goquery.Selection) {
				author := books[i].Author
				if author != s.Text() {
					t.Errorf("span.author[%d] expected '%s', got '%s'",
						i, author, s.Text())
				}
			})
		})
	}
}
Creating a goquery Document
The setup and test case definitions of this test function stay the same, but after getting the response from getHasStatus, we create a goquery.Document:
w := getHasStatus(t, db, "/books/", http.StatusOK)
doc, err := goquery.NewDocumentFromReader(w.Body)
if err != nil {
	t.Fatalf("NewDocumentFromReader error: %s", err)
}
The NewDocumentFromReader function parses HTML from an io.Reader and returns a document. We pass it the Body reader from the response.
Checking the Page Header
The old version of this test was looking for <h1>My Books</h1> in the page. In the new version, we query the document for h1 and inspect the text contents of the element that we find:
// Check the page header.
h1 := doc.Find("h1").Text()
if h1 != "My Books" {
	t.Errorf("expected h1 'My Books', got '%s'", h1)
}
We call the Find method on the document with h1 as the selector. This returns a *goquery.Selection. We then call the Text method on that selection to get its text content, and compare it to our expected title.
Note that if the h1 were missing from the page, we would still get back a Selection, but it would be empty, so calling Text would give us an empty string and the test would fail.
Checking the Titles
The old version of this test looped over the books slice and verified that the page contained text like <span class="title">Book1</span> for each book.
The new version takes a different approach, querying for multiple elements at once with goquery:
// 1. Get all of the <span class="title"> elements.
// 2. Verify we get the correct number.
// 3. Iterate over the selections, checking that the content of
//    each one matches the corresponding book title.
titleSpans := doc.Find("span.title")
if tc.count != titleSpans.Length() {
	t.Fatalf("expected %d span.title elements, got %d",
		tc.count, titleSpans.Length())
}
titleSpans.Each(func(i int, s *goquery.Selection) {
	title := books[i].Title
	if title != s.Text() {
		t.Errorf("span.title[%d] expected '%s', got '%s'",
			i, title, s.Text())
	}
})
First we call Find using the selector span.title. This selector will match multiple elements when there are multiple books listed in the page. We use the Selection.Length method to see how many matches we got, and compare this to the number of books that this test case inserted.
Then, assuming the counts match, we use the Selection.Each method to run an inline function for each of the elements in the selection. This inline function is passed an integer that is the index of the match in the selection, starting with zero, and a *Selection that contains just the current element.
We use the integer to index into the books slice. This is safe because the Length check above calls Fatalf on a mismatch, which guarantees that the selection has exactly as many elements as the slice has entries. Then we compare the title from the current Book struct to the text of the element. This also assumes that the page lists the books in the same order as the books slice.
Checking the Authors
The author check is nearly identical to the title check above:
// Do the same thing for authors.
authorSpans := doc.Find("span.author")
if tc.count != authorSpans.Length() {
	t.Fatalf("expected %d span.author elements, got %d",
		tc.count, authorSpans.Length())
}
authorSpans.Each(func(i int, s *goquery.Selection) {
	author := books[i].Author
	if author != s.Text() {
		t.Errorf("span.author[%d] expected '%s', got '%s'",
			i, author, s.Text())
	}
})
Refactoring bodyHasFragments to the Query Strategy
Our tests have an existing helper function, bodyHasFragments, which checks that all the strings it is given are present in the response body.
I like the new approach of testing with selectors: it’s more precise and less fragile. Let’s refactor bodyHasFragments into a new function that lets us verify that a set of selectors contain some specified contents.
We’ll call the new function docHasFragments. Since it’s a test helper function, we’ll pass a *testing.T as the first argument. We also want it to operate on a *goquery.Document, so we’ll have that as the second argument.
We want it to verify that a given selector contains certain contents, like we did above when we checked that h1 contained My Books. We could have it take two string arguments: selector and contents. However, that wouldn’t let us verify multiple fragments in one call like we can now with bodyHasFragments.
It would be nice if it could take a slice of selectors and contents, and verify each of them. To make that possible, we can define a struct, Fragment, that has Selector and Contents string fields. Then docHasFragments can take a slice of those and verify that the doc has each selector with the given contents.
Here’s what that code looks like:
type Fragment struct {
	Selector string // CSS selector that must match at least one element
	Contents string // text the matched elements must contain ("" checks presence only)
}

func docHasFragments(t *testing.T, doc *goquery.Document, fragments []Fragment) {
	t.Helper()
	for _, fragment := range fragments {
		sel := doc.Find(fragment.Selector)
		if sel.Length() == 0 {
			t.Errorf("fragment '%s' not found", fragment.Selector)
			continue // keep checking the remaining fragments
		}
		text := sel.Text()
		if !strings.Contains(text, fragment.Contents) {
			t.Errorf("fragment '%s' should contain '%s', got '%s'",
				fragment.Selector, fragment.Contents, text)
		}
	}
}
There are a couple of subtle things about this code and how we’ll use it.
First, sometimes we want to verify that an element occurs in the body, but the element doesn’t have any contents to match. We can achieve this by passing a Fragment that has an appropriate selector and an empty string for contents. An empty string always passes the Contains check, so we also add a length check to make sure that the selector matched something.
Second, be aware that a non-unique selector will match multiple elements, and calling sel.Text() on that selection returns the combined text of all of those elements. When we want to be precise about the order in which text shows up in the response, we will either have to write a CSS selector that encodes the position (e.g. :nth-child()), or use a different approach, like the sel.Each() loop we used above to verify titles and authors in the book list.
Updating TestBookNewGet
We change the inner loop of TestBookNewGet to this:
w := getHasStatus(t, db, "/books/new", http.StatusOK)
doc, err := goquery.NewDocumentFromReader(w.Body)
if err != nil {
	t.Fatalf("NewDocumentFromReader error: %s", err)
}
fragments := []Fragment{
	{"h1", "Add a Book"},
	{`form[action="/books/new"]`, ""},
	{`input[id="title"]`, ""},
	{`input[id="author"]`, ""},
	{`button[type="submit"]`, "Save"},
}
docHasFragments(t, doc, fragments)
This creates a document from the response body and defines a []Fragment slice. In that slice, we change the string fragment <h1>Add a Book</h1> to a Fragment with selector h1 and contents Add a Book. The next three Fragment instances have selectors that match the form and its inputs; all three have empty contents because we only need to confirm that those elements exist. The last Fragment matches the button and verifies that it contains the text Save.
Finally, we call docHasFragments with the doc and the slice we created to verify that each Fragment matches the document.
Updating TestBookNewPost
I’m not going to show the whole rewrite of TestBookNewPost here because it’s fairly long and the changes are sprinkled throughout. Instead, let’s look at the key changes. First, we change the test case struct to hold a []Fragment slice instead of a slice of strings:
tcs := []struct {
	name      string
	data      gin.H
	setup     func(*testing.T, *gorm.DB)
	status    int
	fragments []Fragment
}{
Attempting to compile at this point will yield a bunch of errors because we need to update all of the test case definitions. This is mostly a mechanical exercise. The most interesting change is the "empty" test case:
{
	// This makes the manual field validation fail because both
	// title and author are empty.
	name:   "empty",
	data:   gin.H{},
	status: http.StatusBadRequest,
	fragments: []Fragment{
		{"div.error-message", "Author is required, but was empty"},
		{"div.error-message", "Title is required, but was empty"},
	},
},
Note that the selector is the same for each Fragment. Recall what I mentioned above about non-unique selectors: each of these fragments queries div.error-message, which should match two elements, so the text checked for each fragment includes both error messages. In my opinion, it’s OK to match this way: it makes the test less fragile.
We could change the author fragment to use the selector div.error-message:nth-child(2), so that only the "Author is required" error message is in the text, and similarly use div.error-message:nth-child(1) for the title fragment. (Keep in mind that :nth-child() counts an element’s position among all of its sibling elements, not just those matching div.error-message, so these selectors assume the error messages are the first children of their parent.) This would be more precise, and if order were a critical aspect of the error messages, it would make sense to test this way. However, in this case we don’t care about the order in which the messages appear, so we use this less precise matching. This test behavior matches what we had before the refactoring: we only cared that the error messages showed up somewhere in the text, without regard to their specific ordering.
In the body of the loop, we change how we perform the check to:
if tc.fragments != nil {
	doc, err := goquery.NewDocumentFromReader(w.Body)
	if err != nil {
		t.Fatalf("NewDocumentFromReader error: %s", err)
	}
	docHasFragments(t, doc, tc.fragments)
}
This is a simple replacement of the old bodyHasFragments pattern with the new docHasFragments pattern.
Finally, at the bottom of the loop we change the check for the flash message:
if tc.fragments != nil {
	doc, err := goquery.NewDocumentFromReader(w.Body)
	if err != nil {
		t.Fatalf("NewDocumentFromReader error: %s", err)
	}
	docHasFragments(t, doc, tc.fragments)
}
By now the application of the pattern should be familiar. The only interesting part of this change is that where we were previously searching for the HTML-escaped pattern &#39;Book1&#39;, we’re now searching for 'Book1'. The HTML parser decodes escape sequences like &#39; back into plain characters, so the text we get from the Selection is unescaped.
One Last Refactor
This little code sequence feels tedious:
doc, err := goquery.NewDocumentFromReader(w.Body)
if err != nil {
	t.Fatalf("NewDocumentFromReader error: %s", err)
}
We replace it with a small helper function that checks the error and returns just a *goquery.Document, so that the rest of our test code can skip the extra error checks:
// mustDocumentFromReader parses r as HTML, failing the test immediately on error.
func mustDocumentFromReader(t *testing.T, r io.Reader) *goquery.Document {
	t.Helper()
	doc, err := goquery.NewDocumentFromReader(r)
	if err != nil {
		t.Fatalf("NewDocumentFromReader error: %s", err)
	}
	return doc
}
Replacing all the spots where we have this pattern in the code is straightforward; I’m not going to show that here.
Next Month
Work on my book has been taking up most of my writing time, so it’s been a few months since the last update, but I’m caught up enough that I should be able to get back to regular posts. My target is to publish something new each month.
Starting next month I’ll cover more Gorm usage, including associations as we add a new model for the user to maintain lists like “To Read” and “Read”, and migrations as we add a “rating” field to the Book model.