Testing Generated HTML with goquery

January 30, 2023 - 12 minutes read - 2352 words

This is the twelfth in a series of articles about writing a small reading list app in Go for personal use.

When I first introduced tests for this app, I showed a strategy of checking for “fragments” in the body of the page – these are just strings, including HTML markup, that the test will verify are present in the generated page.

This approach works, but it’s fragile: trivial changes to a generated page like spaces or newlines can trigger test failures that don’t indicate real bugs in the app. The tests will only be failing because they’re too tightly coupled to the output format.

In this post I’ll show a better way to validate the contents of generated pages using the goquery package.

*Time Lapse Waterfall. Photo by Vojta Kovařík*

Overview of this Strategy

The goal of our test is to verify that the app generates a list of the books in the database. The initial test achieved that by looking for content like <li>Book1 -- Author1</li> in the page. But that test broke when we added a class to the element: <li class="book">.
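A minimal, stdlib-only sketch makes that failure concrete. The two page strings are illustrative stand-ins for rendered HTML, and hasFragment is a hypothetical stand-in for the old substring check, not the app's actual helper.

```go
package main

import (
	"fmt"
	"strings"
)

// hasFragment mimics the old strategy: a plain substring match against
// the raw HTML. (Hypothetical helper, for illustration only.)
func hasFragment(body, fragment string) bool {
	return strings.Contains(body, fragment)
}

func main() {
	fragment := `<li>Book1 -- Author1</li>`

	before := `<ul><li>Book1 -- Author1</li></ul>`
	// Same visible content, but the template now adds a class attribute.
	after := `<ul><li class="book">Book1 -- Author1</li></ul>`

	fmt.Println(hasFragment(before, fragment)) // true
	fmt.Println(hasFragment(after, fragment))  // false
}
```

Nothing a user would see has changed, yet the check flips from passing to failing.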

Goquery lets us inspect the page content using CSS selector queries, which makes the tests far less sensitive to incidental formatting changes. For example, instead of searching for the literal markup above, we can query for the CSS selector li.book and verify that its text content contains Book1, and then check that the same text also contains Author1.

Implementing Tests Using the Query Strategy

First, we go get the goquery package:

% go get github.com/PuerkitoBio/goquery
go: added github.com/PuerkitoBio/goquery v1.8.0
go: added github.com/andybalholm/cascadia v1.3.1

In main_test.go, add github.com/PuerkitoBio/goquery to the imports list, and then rewrite TestBookIndexTable to use goquery instead of the previous fragment-matching approach. The whole function is shown here first – I’ll break it down in chunks below.

func TestBookIndexTable(t *testing.T) {
   t.Parallel()
   tcs := []struct {
           name  string
           count int
   }{
           {"empty", 0},
           {"single", 1},
           {"multiple", 10},
   }

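   for _, tc := range tcs {
           tc := tc // copy the range variable: parallel subtests run after the loop ends (required before Go 1.22)
           t.Run(tc.name, func(t *testing.T) {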
                   t.Parallel()
                   db := freshDb(t)
                   books := createBooks(t, db, tc.count)

                   w := getHasStatus(t, db, "/books/", http.StatusOK)
                   doc, err := goquery.NewDocumentFromReader(w.Body)
                   if err != nil {
                           t.Fatalf("NewDocumentFromReader error: %s", err)
                   }

                   // Check the page header.
                   h1 := doc.Find("h1").Text()
                   if h1 != "My Books" {
                           t.Errorf("expected h1 'My Books', got '%s'", h1)
                   }

                   // 1. Get all of the <span class="title"> elements.
                   // 2. Verify we get the correct number.
                   // 3. Iterate over the selections, checking that the content of
                   //    each one matches the corresponding book title.
                   titleSpans := doc.Find("span.title")
                   if tc.count != titleSpans.Length() {
                           t.Fatalf("expected %d span.title elements, got %d",
                                   tc.count, titleSpans.Length())
                   }
                   titleSpans.Each(func(i int, s *goquery.Selection) {
                           title := books[i].Title
                           if title != s.Text() {
                                   t.Errorf("span.title[%d] expected '%s', got '%s'",
                                           i, title, s.Text())
                           }
                   })

                   // Do the same thing for authors.
                   authorSpans := doc.Find("span.author")
                   if tc.count != authorSpans.Length() {
                           t.Fatalf("expected %d span.author elements, got %d",
                                   tc.count, authorSpans.Length())
                   }
                   authorSpans.Each(func(i int, s *goquery.Selection) {
                           author := books[i].Author
                           if author != s.Text() {
                                   t.Errorf("span.author[%d] expected '%s', got '%s'",
                                           i, author, s.Text())
                           }
                   })
           })
   }
}

Creating a goquery Document

The setup and test case definitions of this test function stay the same, but after getting the response from getHasStatus, we create a goquery.Document:

                   w := getHasStatus(t, db, "/books/", http.StatusOK)
                   doc, err := goquery.NewDocumentFromReader(w.Body)
                   if err != nil {
                           t.Fatalf("NewDocumentFromReader error: %s", err)
                   }

The NewDocumentFromReader function parses HTML from an io.Reader and returns a *goquery.Document (or an error). We pass it the Body reader from the response.

Checking the Page Header

The old version of this test was looking for <h1>My Books</h1> in the page. In the new version, we query the document for h1 and inspect the text content of the element that we find:

                   // Check the page header.
                   h1 := doc.Find("h1").Text()
                   if h1 != "My Books" {
                           t.Errorf("expected h1 'My Books', got '%s'", h1)
                   }

We call the Find method on the document with h1 as the selector. This returns a *goquery.Selection. We then call its Text method to get the combined text of the matched elements, and compare it to our expected title.

Note that if the h1 were missing from the page, Find would still return a non-nil *Selection, but it would contain no elements, so Text would return an empty string and the test would fail.

Checking the Titles

The old version of this test looped over the books slice and verified that it contained text like <span class="title">Book1</span>.

The new version takes a different approach based on querying multiple elements using goquery:

                   // 1. Get all of the <span class="title"> elements.
                   // 2. Verify we get the correct number.
                   // 3. Iterate over the selections, checking that the content of
                   //    each one matches the corresponding book title.
                   titleSpans := doc.Find("span.title")
                   if tc.count != titleSpans.Length() {
                           t.Fatalf("expected %d span.title elements, got %d",
                                   tc.count, titleSpans.Length())
                   }
                   titleSpans.Each(func(i int, s *goquery.Selection) {
                           title := books[i].Title
                           if title != s.Text() {
                                   t.Errorf("span.title[%d] expected '%s', got '%s'",
                                           i, title, s.Text())
                           }
                   })

First we call Find using the selector span.title. This selector will match multiple elements when there are multiple books listed in the page. We use the Selection.Length method to see how many matches we got, and compare this to the number of books that this test case inserted.

Then, once the count matches (if it doesn't, t.Fatalf has already stopped the test), we use the Selection.Each method to run an inline function for each element in the selection. This function is passed the zero-based index of the match within the selection and a *Selection that contains just the current element.

We use the integer to index into the books slice. This is safe because the Length check above guarantees that we have exactly the same number of elements in the selection as there are entries in the slice. Then we compare the title from the current Book struct to the text of the element.

Checking the Authors

The author check is nearly identical to the title check above:

                   // Do the same thing for authors.
                   authorSpans := doc.Find("span.author")
                   if tc.count != authorSpans.Length() {
                           t.Fatalf("expected %d span.author elements, got %d",
                                   tc.count, authorSpans.Length())
                   }
                   authorSpans.Each(func(i int, s *goquery.Selection) {
                           author := books[i].Author
                           if author != s.Text() {
                                   t.Errorf("span.author[%d] expected '%s', got '%s'",
                                           i, author, s.Text())
                           }
                   })

Refactoring `bodyHasFragments` to the Query Strategy

Our tests have an existing test helper function bodyHasFragments, which checks that all the strings it is given are present in the response body.

I like the new approach of testing using selectors: it’s more precise and less fragile. Let’s refactor bodyHasFragments into a new function that verifies that a set of selectors match elements with specified contents. We’ll call the new function docHasFragments. Since it’s a test helper function, we’ll pass a *testing.T as the first argument. We also want it to operate on a *goquery.Document, so that will be the second argument.

We want it to verify that a given selector contains certain contents, like we did above when we checked that h1 contained My Books. We could have it take two string arguments: selector and contents. However, that wouldn’t let us verify multiple fragments like we can now with bodyHasFragments.

It would be nice if it could take a slice of selectors and contents and verify each of them. To make that possible, we can define a struct, Fragment, that has Selector and Contents string fields. Then docHasFragments can take a slice of those and verify that the doc has each selector with the given contents.

Here’s what that code looks like:

type Fragment struct {
   Selector string
   Contents string
}

func docHasFragments(t *testing.T, doc *goquery.Document, fragments []Fragment) {
   t.Helper()
   for _, fragment := range fragments {
           sel := doc.Find(fragment.Selector)
           if sel.Length() == 0 {
                   // Report the missing element, then keep checking the rest.
                   t.Errorf("fragment '%s' not found", fragment.Selector)
                   continue
           }
           text := sel.Text()
           if !strings.Contains(text, fragment.Contents) {
                   t.Errorf("fragment '%s' should contain '%s', got '%s'",
                           fragment.Selector, fragment.Contents, text)
           }
   }
}

There are a couple of subtle things about this code and how we’ll use it.

First, it’s important to note that sometimes we want to verify that an element occurs in the body, but the element doesn’t have any contents to match. We can achieve this by passing a Fragment that has an appropriate selector and an empty string for contents. The empty string will always pass the Contains check, so we also add a length check to make sure that the selector matched something.

Second, it’s important to be aware that a non-unique selector will match multiple elements, and calling sel.Text() on that selection will give the text for all of those elements. When we want to be precise about the order in which text shows up in the response, we will either have to write a CSS selector using the order (e.g. :nth-child()), or we will have to use a different approach like sel.Each() that we used above to verify titles and authors in the book list.

Updating `TestBookNewGet`

We change the inner loop of TestBookNewGet to this:

                   w := getHasStatus(t, db, "/books/new", http.StatusOK)
                   doc, err := goquery.NewDocumentFromReader(w.Body)
                   if err != nil {
                           t.Fatalf("NewDocumentFromReader error: %s", err)
                   }
                   fragments := []Fragment{
                           {"h1", "Add a Book"},
                           {`form[action="/books/new"]`, ""},
                           {`input[id="title"]`, ""},
                           {`input[id="author"]`, ""},
                           {`button[type="submit"]`, "Save"},
                   }
                   docHasFragments(t, doc, fragments)

This creates a document using the response body.

It defines a []Fragment slice. In that slice, we change the string fragment <h1>Add a Book</h1> to a Fragment with selector h1 and contents Add a Book. The next three Fragment instances have selectors that match the form and its two inputs; all three have empty contents because we only need to verify that these elements exist (the inputs have no text content, and we don't care what text the form contains). The last Fragment matches the button and verifies that it contains the text Save.

Finally we call docHasFragments with the doc and the slice we created to verify that each Fragment matches the document.

Updating `TestBookNewPost`

I’m not going to show the whole rewrite of TestBookNewPost here because it’s fairly long and the changes are sprinkled throughout. Instead, let’s look at the key changes. First, we change the test case struct to have a []Fragment field instead of a slice of strings:

   tcs := []struct {
           name      string
           data      gin.H
           setup     func(*testing.T, *gorm.DB)
           status    int
           fragments []Fragment
   }{

Attempting to compile at this point will yield a bunch of errors because we need to update all of the test case definitions. This is mostly a mechanical exercise. The most interesting change is the "empty" test case:

           {
                   // This makes the manual field validation fail because both
                   // title and author are empty.
                   name:   "empty",
                   data:   gin.H{},
                   status: http.StatusBadRequest,
                   fragments: []Fragment{
                           {"div.error-message", "Author is required, but was empty"},
                           {"div.error-message", "Title is required, but was empty"},
                   },
           },

Note that the selector is the same for each Fragment. Recall what I mentioned above about non-unique selectors: each of these fragments queries div.error-message, which should match two elements, so sel.Text() returns the combined text of both error messages and each Contains check passes. In my opinion, it’s ok to match this way: it makes the test less fragile.

We could change the author fragment to use the selector div.error-message:nth-child(2), so that only the "Author is required" error message is in the text, and similarly use div.error-message:nth-child(1) for the title fragment. (This assumes the two messages are sibling elements, title first, with no other elements before them, because :nth-child counts position among all element siblings, not just those with the error-message class.) This would be more precise, and if order were a critical aspect of the error messages, it would make sense to test this way. In this case, however, we don’t care about the order in which the messages appear, so we use the less precise matching. This matches the test’s behavior before the refactoring: we only cared that the error messages showed up somewhere in the text, regardless of their order.

In the body of the loop, we change how we perform the check to:

                   if tc.fragments != nil {
                           doc, err := goquery.NewDocumentFromReader(w.Body)
                           if err != nil {
                                   t.Fatalf("NewDocumentFromReader error: %s", err)
                           }
                           docHasFragments(t, doc, tc.fragments)
                   }

This is a simple replacement of the old bodyHasFragments pattern with the new docHasFragments pattern.

Finally, at the bottom of the loop we change the check for the flash message:

                   if tc.fragments != nil {
                           doc, err := goquery.NewDocumentFromReader(w.Body)
                           if err != nil {
                                   t.Fatalf("NewDocumentFromReader error: %s", err)
                           }
                           docHasFragments(t, doc, tc.fragments)
                   }

By now the application of the pattern should be familiar. The only interesting part of this change is that we were previously searching for the HTML-escaped pattern &#39;Book1&#39;, and now we search for 'Book1', because the HTML parser behind goquery decodes entities like &#39; and Text returns the unescaped string.

One Last Refactor

This little code sequence feels tedious:

   doc, err := goquery.NewDocumentFromReader(w.Body)
   if err != nil {
           t.Fatalf("NewDocumentFromReader error: %s", err)
   }

We replace it with a small helper function that checks the error (failing the test with t.Fatalf) and returns just the *goquery.Document, so the rest of our test code can skip the extra error checks:

// Requires "io" in main_test.go's imports.
func mustDocumentFromReader(t *testing.T, r io.Reader) *goquery.Document {
   t.Helper()
   doc, err := goquery.NewDocumentFromReader(r)
   if err != nil {
           t.Fatalf("NewDocumentFromReader error: %s", err)
   }
   return doc
}

Replacing all the spots where we have this pattern in the code is straightforward; I’m not going to show that here.

Next Month

Work on my book has been taking up most of my writing time, so it’s been a few months since the last update, but I’m caught up enough that I should be able to get back to regular posts here – my target is to publish something new each month.

Starting next month I’ll cover more Gorm usage, including associations as we add a new model for the user to maintain lists like “To Read” and “Read”, and migrations as we add a “rating” field to the Book model.
