11 Semgrep Rules for Go Web Projects

March 29, 2022 - 16 minutes read - 3305 words

I’ve mentioned semgrep a few times in recent articles, and I thought it would be good to introduce this new(ish) tool and demonstrate a few rules that you can use to find problems in your Go web apps.

At the end of this article you will:

understand what semgrep is and what it can do
have some idea of the limits of semgrep’s power
have some rules that you can immediately apply to your own projects

Fair warning: I am not a semgrep expert by any stretch of the imagination. If you are, and you think these rules can be improved, please drop a note to brian at universalglue.dev.

*Semgrep isn’t named for semaphore flags… but it does offer a pretty good signal.*

What is semgrep anyway?

Semgrep is a static analysis tool that allows users to write custom rules that match against patterns found in source code. The tool comes with rules to detect errors – especially security errors – in multiple languages. It has support for a bunch of programming languages including Go.

There is an interactive tutorial that I highly recommend. It only takes a few minutes to go through it.

For this article, I’m just going to show rules and sample code that exercises the rules. Installation is simple. Semgrep has a Python CLI, and can be installed into a Python virtual environment in your local directory (this “just worked” for me on Debian and Ubuntu systems):

$ python3 -m venv .venv
$ . .venv/bin/activate
(.venv) $ pip install semgrep
(.venv) $ semgrep --version

We’ll put all of our rules and sample code in a rules/ directory:

(.venv) $ mkdir rules

Rule 1: AbortWithStatus Should Immediately be Followed by return

Last week I mentioned that calling c.AbortWithStatus without calling return is an easy error to make when writing a gin handler. Let’s see how we can catch that error with a simple semgrep rule.
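
To make the bug concrete, here is a minimal sketch that runs without gin. The fakeContext type and both handlers are hypothetical stand-ins I made up for illustration; the only thing taken from gin is the behavior that aborting does not stop the function that called it.

```go
package main

import "fmt"

// fakeContext is a hypothetical stand-in for *gin.Context with just enough
// state to show the control flow. It is not gin's implementation.
type fakeContext struct {
	aborted bool
	status  int
	log     []string
}

// AbortWithStatus records the status and marks the request as aborted.
// As with gin, the calling function keeps running afterwards.
func (c *fakeContext) AbortWithStatus(code int) {
	c.aborted = true
	c.status = code
}

// buggyHandler forgets the return, so the "save" step still runs.
func buggyHandler(c *fakeContext, valid bool) {
	if !valid {
		c.AbortWithStatus(400)
	}
	c.log = append(c.log, "saved record")
}

// fixedHandler returns immediately after aborting.
func fixedHandler(c *fakeContext, valid bool) {
	if !valid {
		c.AbortWithStatus(400)
		return
	}
	c.log = append(c.log, "saved record")
}

func main() {
	buggy, fixed := &fakeContext{}, &fakeContext{}
	buggyHandler(buggy, false)
	fixedHandler(fixed, false)
	fmt.Println("buggy handler side effects:", len(buggy.log))
	fmt.Println("fixed handler side effects:", len(fixed.log))
}
```

The buggy handler reports an error status and still performs its side effect, which is exactly the situation the rule is meant to flag.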

At the top level of this rule is patterns, which means that all the child elements must match.

The first child is pattern-either, which will match if any of its children match. Its children are simple patterns that match any of the AbortWithXXX function calls we want to make sure are followed by a return.

The next three children are pattern-not-inside. If any of these patterns is present in the code, that child fails to match, and so the top-level patterns does not match either.

The overall effect of this rule is to say “if any of these match BUT not if any of these other things match”, then log a warning.

Here is rules/abortwithstatus-followed-by-return.yaml:

---
rules:
- id: abortwithstatus-followed-by-return
  languages: [go]
  message: c.AbortWithError, AbortWithStatus, and AbortWithStatusJSON should always be followed by return
  severity: WARNING
  patterns:
  - pattern-either:
    - pattern: $C.AbortWithError(...)
    - pattern: $C.AbortWithStatus(...)
    - pattern: $C.AbortWithStatusJSON(...)
  - pattern-not-inside: |
      $C.AbortWithError(...)
      return      
  - pattern-not-inside: |
      $C.AbortWithStatus(...)
      return      
  - pattern-not-inside: |
      $C.AbortWithStatusJSON(...)
      return

We can test rules by putting some code in a file with the same name as the rule file but with the target language’s extension. In comments in the file we identify each line that should trigger with ruleid: and the name of the rule. On lines that should not trigger, use ok: and the name of the rule. The test runner will ensure that lines preceded by the former comments trigger reports, and that none of the lines preceded by the latter comments do.

Run tests like this:

(.venv) $ semgrep -q --test rules
✓ All tests passed!

(Yes, it’s almost too meta that there are tests for what are almost like tests. But it’s a handy way to make sure the rule works, especially while learning how to use the tool.)

Here is rules/abortwithstatus-followed-by-return.go:

package main

import (
   "log"
   "net/http"

   "github.com/gin-gonic/gin"
)

func test1(c *gin.Context) {
   // ok: abortwithstatus-followed-by-return
   c.AbortWithError(http.StatusInternalServerError)
   return
}

func test2(c *gin.Context) {
   // ruleid: abortwithstatus-followed-by-return
   c.AbortWithError(http.StatusInternalServerError)
   log.Printf("asdf")
}

func test3(c *gin.Context) {
   if true {
           // ok: abortwithstatus-followed-by-return
           c.AbortWithStatus(http.StatusBadRequest)
           return
   }
}

func test4(c *gin.Context) {
   if false {
           // ruleid: abortwithstatus-followed-by-return
           c.AbortWithStatus(http.StatusInternalServerError)
           log.Printf("asdf")
   }
}

func test5(c *gin.Context) {
   // ok: abortwithstatus-followed-by-return
   c.AbortWithStatusJSON(http.StatusInternalServerError)
   return
}

func test6(c *gin.Context) {
   // ruleid: abortwithstatus-followed-by-return
   c.AbortWithStatusJSON(http.StatusInternalServerError)
   log.Printf("asdf")
}

func test7(c *gin.Context) {
   // ruleid: abortwithstatus-followed-by-return
   c.AbortWithStatusJSON(http.StatusInternalServerError)
   log.Printf("other stuff in between is not allowed")
   return
}

Rule 2: Handler Naming Scheme Enforcement

If I don’t have a strong naming scheme, my function names tend to end up a jumbled mix. This rule shows how to enforce a naming scheme.

The first pattern matches a Gin handler function. Note the use of a metavariable $FUNC to match the function name.

The second pattern applies to that metavariable. It uses pattern-not-regex to match (triggering a warning) whenever the function name does not match the given regex.

Even if you hate my naming scheme, hopefully this is clear enough that you can implement a rule for whatever you prefer.
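
If you want to experiment with the regex before wiring it into a rule, a few lines of Go are enough. This is only a sketch: the conforms function is a name I made up, and it assumes Go’s regexp package treats this particular expression the same way semgrep’s regex engine does (it uses no lookaheads or other features that differ between the two).

```go
package main

import (
	"fmt"
	"regexp"
)

// handlerName is the same expression used in the rule's pattern-not-regex.
var handlerName = regexp.MustCompile(`^[a-z]+((Index|Show|New|Edit)(Get|Post|Patch)|Delete)$`)

// conforms reports whether a handler name follows the naming scheme.
// (Hypothetical helper, not part of semgrep or gin.)
func conforms(name string) bool {
	return handlerName.MatchString(name)
}

func main() {
	names := []string{
		"thingIndexGet", "thingDelete", // follow the scheme
		"fooIndex", "bookGet", "thingDeleteDelete", "ThingIndexGet", // do not
	}
	for _, name := range names {
		fmt.Printf("%-18s conforms=%v\n", name, conforms(name))
	}
}
```

The rule warns on exactly the names for which conforms returns false.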

This goes in rules/handler-naming.yaml, with tests in rules/handler-naming.go:

---
rules:
- id: handler-naming
  languages: [go]
  message: Naming of handlers should be <thing><CrudAction><Method>
  severity: WARNING
  patterns:
  - pattern: |
      func $FUNC($C *gin.Context) {
        ...
      }      
  - metavariable-pattern:
      metavariable: $FUNC
      patterns:
      # Regex alternatives avoid having a name like thingDeleteDelete...
      - pattern-not-regex: "^[a-z]+((Index|Show|New|Edit)(Get|Post|Patch)|Delete)$"

package main

import (
   "github.com/gin-gonic/gin"
)

// missing http method
// ruleid: handler-naming
func fooIndex(c *gin.Context) {}

// missing crudAction
// ruleid: handler-naming
func bookGet(c *gin.Context) {}

// useless repetition
// ruleid: handler-naming
func thingDeleteDelete(c *gin.Context) {}

// XXX consider allowing exported handlers
// ruleid: handler-naming
func ThingIndexGet(c *gin.Context) {}

// ok: handler-naming
func thingIndexGet(c *gin.Context) {}

// ok: handler-naming
func thingShowGet(c *gin.Context) {}

// ok: handler-naming
func thingNewPost(c *gin.Context) {}

// ok: handler-naming
func thingEditPatch(c *gin.Context) {}

// ok: handler-naming
func thingDelete(c *gin.Context) {}

// ok: handler-naming
func AnythingGoes() {}

Rule 3: r.GET/r.POST use correct handler

Email subscribers know that using snippets can greatly reduce copy-paste errors, but if you’re someone who hasn’t jumped on the snippet bandwagon yet, you might still occasionally make a copy-paste error.

One area I’ve made this mistake is when adding a new route. It’s easy, just copy-paste an existing route, change a couple of things, and you’re done… unless you forget to change the handler name, and you connect a new POST route to an existing GET handler.

This is very likely to be caught in tests, but it might be convenient to have a rule to warn you before you have to catch it in a test.

The first child says that this should only match inside a function that takes the router (a *gin.Engine) as its argument, such as a route setup function. Note that it uses the metavariable $R to match the router variable. This is used below.

The second child says that either of its children can match. Each of those children is in turn a list of subpatterns that all must match. The first subpattern matches on a call to the router’s ($R) GET method, capturing the $HANDLER function name in a metavariable. Then there’s a metavariable subpattern that will match (triggering a warning) if that handler does not match a regex, in this case one requiring the name to end with “Get”, as the naming scheme from the previous rule requires of all GET handlers.

A similar pair of patterns handles POST handlers. Adding patterns and tests for the other methods you have in your app is left as a fun exercise for the reader.
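
Stripped of the semgrep syntax, the check is just “does the handler name end with the suffix for this method?”. Here is a rough Go sketch of that logic. The suffixFor table and the handlerMatchesMethod function are names I invented for this illustration; semgrep does not work this way internally.

```go
package main

import (
	"fmt"
	"strings"
)

// suffixFor maps a router method to the suffix its handler name should carry.
// Only the two methods covered by the rule are listed.
var suffixFor = map[string]string{"GET": "Get", "POST": "Post"}

// handlerMatchesMethod reports whether a handler name is acceptable for the
// given router method. Methods the rule does not cover are never flagged.
func handlerMatchesMethod(method, handler string) bool {
	suffix, covered := suffixFor[method]
	if !covered {
		return true
	}
	return strings.HasSuffix(handler, suffix)
}

func main() {
	routes := [][2]string{
		{"POST", "blahIndexGet"},
		{"GET", "blahIndexPost"},
		{"GET", "blahIndexHandler"},
		{"POST", "blahIndexPost"},
		{"GET", "blahIndexGet"},
	}
	for _, r := range routes {
		fmt.Printf("%-4s %-18s ok=%v\n", r[0], r[1], handlerMatchesMethod(r[0], r[1]))
	}
}
```

Extending the rule to PATCH or DELETE amounts to adding one more entry to that table, which in the YAML means one more patterns block under pattern-either.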

This goes in rules/route-handlers.yaml and rules/route-handlers.go:

---
rules:
- id: route-handlers
  languages: [go]
  message: Make sure route handler functions match the method
  severity: WARNING
  patterns:
  - pattern-inside: |
      func $FUNC($R *gin.Engine) {
        ...
      }      
  - pattern-either:
    - patterns:
      - pattern: $R.GET(..., $HANDLER)
      - metavariable-pattern:
          metavariable: $HANDLER
          patterns:
          - pattern-not-regex: "Get$"
    - patterns:
      - pattern: $R.POST(..., $HANDLER)
      - metavariable-pattern:
          metavariable: $HANDLER
          patterns:
          - pattern-not-regex: "Post$"

package main

import "github.com/gin-gonic/gin"

func setupRoutes(r *gin.Engine) {
   // ruleid: route-handlers
   r.POST("/blah", blahIndexGet)

   // ruleid: route-handlers
   r.GET("/blah", blahIndexPost)

   // ruleid: route-handlers
   r.GET("/blah", blahIndexHandler)

   // ok: route-handlers
   r.POST("/blah", blahIndexPost)

   // ok: route-handlers
   r.GET("/blah", blahIndexGet)
}

Rule 4: Templates Match Naming Scheme

Just like with handler functions, it’s easy to end up with templates that have apparently random naming conventions.

It’s also pretty easy to enforce a convention. Note that this rule uses the generic language – which is still an experimental part of semgrep.

This rule enforces a two-level template naming scheme.

The first pattern-inside only allows matches inside a template. Note that this only matches on a two-level template. If you have one- or three-level templates then it won’t match. It also won’t match if you have a “.html” suffix in the template name. Adjust as needed for your codebase.

The metavariable-regex uses a negative lookahead assertion – it won’t match if this template is in the base directory. This is where I keep templates that build the foundation of other templates. If you have other directories like this, or “special” directories, you could add them to this regex so they don’t trigger the rule.

The last pattern uses another negative lookahead to avoid matching when the page conforms to the allowed list of template types.

This rule is pretty rigid – a real-world app would need to be more flexible. But this gives a demonstration of how such a rule can work.
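
Negative lookaheads can be hard to read, so here is my understanding of the two regexes spelled out as plain Go (Go’s regexp package doesn’t support lookaheads, so string functions stand in for them). Treat it as a sketch under two assumptions: that semgrep applies each regex at the start of the metavariable’s value, and that the hypothetical flagged function below is only handed names semgrep already matched as "$DIR/$PAGE".

```go
package main

import (
	"fmt"
	"strings"
)

// allowedPages lists the template types from the rule's second regex.
var allowedPages = map[string]bool{
	"delete": true, "edit": true, "new": true, "list": true, "show": true,
}

// flagged reports whether a template name would trigger the warning.
// (Hypothetical helper written for this article, not part of semgrep.)
func flagged(name string) bool {
	parts := strings.Split(name, "/")
	if len(parts) != 2 {
		return false // the rule only looks at two-level names
	}
	dir, page := parts[0], parts[1]
	if strings.HasPrefix(dir, "base") {
		return false // (?!base): directories starting with "base" are exempt
	}
	return !allowedPages[page] // (?!^(delete|...)$): only exact names are allowed
}

func main() {
	for _, name := range []string{"base/blah", "thing/blah", "stuff/new"} {
		fmt.Printf("%-12s flagged=%v\n", name, flagged(name))
	}
}
```

One consequence worth noticing: under that first assumption, the (?!base) lookahead exempts any directory whose name merely starts with “base”, not only base itself.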

Put this in rules/template-naming.yaml and rules/template-naming.html:

---
rules:
- id: template-naming
  languages: [generic]
  paths:
    include:
    - "*.html"
  message: html template does not conform to naming scheme
  severity: WARNING
  patterns:
  - pattern-inside: |
      {{ define "$DIR/$PAGE" }}
      ...
      {{ end }}      
  - metavariable-regex:
      metavariable: $DIR
      regex: (?!base)
  - metavariable-regex:
      metavariable: $PAGE
      regex: '(?!^(delete|edit|new|list|show)$)'

// ok: template-naming
{{ define "base/blah" }} anything {{ end }}

// ruleid: template-naming
{{ define "thing/blah" }} anything {{ end }}

// ok: template-naming
{{ define "stuff/new" }} anything {{ end }}

Rule 5: Templates Contain Header/Footer

Except for templates in base/, we want all templates to include the header and footer. If we have a reliable semgrep rule for this, we don’t have to add tests that look for a fragment from the header and footer in all the pages.

As with the previous rule, this also uses the generic language.

The first pattern matches inside a (two level) template and the second pattern avoids matching in the base directory.

The third pattern, pattern-not-inside, avoids matching (and thus avoids triggering a warning) if the template includes both a header at the top and a footer at the bottom. If we wanted to be slightly more flexible with the placement (e.g. allowing other content above the header or below the footer) we could add ellipses above the header or below the footer.

It’s also worth noting that semgrep’s generic language will only match ten lines with an ellipsis. I’ve got five in this rule, so this will work with templates up to 50 lines long. This works for me but you may need to adjust if you have very long templates.

This goes in rules/template-header-footer.yaml and rules/template-header-footer.html:

---
rules:
- id: template-header-footer
  languages: [generic]
  paths:
    include:
    - "*.html"
  message: html templates include header+footer
  severity: WARNING
  patterns:
  - pattern-inside: |
      {{ define "$DIR/$PAGE" }}
      ...
      {{ end }}      
  - metavariable-regex:
      metavariable: $DIR
      regex: (?!base)
  - pattern-not-inside: |
      {{ define "$DIR/$PAGE" }}
      {{ template "base/header" . }}
      ... ... ... ... ...
      {{ template "base/footer" . }}
      {{ end }}

// ok: template-header-footer
{{ define "base/blah" }}
  anything
{{ end }}

// ruleid: template-header-footer
{{ define "thing/blah" }}
  anything
{{ end }}

// ok: template-header-footer
{{ define "stuff/new" }}
{{ template "base/header" . }}
  anything
{{ template "base/footer" . }}
{{ end }}

// ruleid: template-header-footer
{{ define "stuff/new" }}
  anything
{{ template "base/footer" . }}
{{ end }}

// ruleid: template-header-footer
{{ define "stuff/new" }}
{{ template "base/header" . }}
  anything
{{ end }}

Rule 6: Templates Post to Self

In the article on handling forms in Gin, I showed a form that is loaded from /books/new and is posted to /books/new. That pattern is something we’ll see show up in multiple places.

This is another place where copy-paste errors can show up: if you copy a template from, say, /books/new into /authors/new and forget to change the action= attribute in the form, then your app will have a bug. You can catch this with a test, but you have to explicitly remember to add a fragment looking for the correct action= in every form.

Here’s a rule that uses patterns similar to what I’ve shown in previous rules to enforce a convention that all forms must post to the route that matches the template in which they are contained.
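
For comparison, this is roughly what the same convention looks like as a hand-written Go check. It is a sketch only: badActions is a made-up helper, and the regex assumes action= is the first attribute of the form tag, just as the semgrep pattern does.

```go
package main

import (
	"fmt"
	"regexp"
)

// formAction captures the value of action= when it is the first attribute.
var formAction = regexp.MustCompile(`<form action="([^"]*)"`)

// badActions returns every form action in body that does not post back to
// the route matching templateName (for example "books/new" -> "/books/new").
func badActions(templateName, body string) []string {
	want := "/" + templateName
	var bad []string
	for _, m := range formAction.FindAllStringSubmatch(body, -1) {
		if m[1] != want {
			bad = append(bad, m[1])
		}
	}
	return bad
}

func main() {
	body := `
<form action="/abc/xyz"></form>
<form action="/mno/xyz"></form>
<form action="abc/xyz"></form>
<form action="xyz"></form>`
	fmt.Println(badActions("abc/xyz", body))
}
```

The semgrep rule gets the template name from the enclosing define block via $DIR and $TEMPLATE, so nothing has to be passed in by hand.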

Put this in rules/template-posts-to-self.yaml and rules/template-posts-to-self.html:

---
rules:
- id: template-posts-to-self
  languages: [generic]
  paths:
    include:
    - "*.html"
  message: html template form should post to itself
  severity: WARNING
  patterns:
  - pattern-inside: |
      {{ define "$DIR/$TEMPLATE" }}
      ...
      {{ end }}      
  - pattern: <form action="...">...</form>
  - pattern-not: <form action="/$DIR/$TEMPLATE">...</form>

{{ define "abc/xyz" }}
<!-- ok: template-posts-to-self -->
<form action="/abc/xyz"></form>
<!-- ruleid: template-posts-to-self -->
<form action="/mno/xyz"></form>
<!-- ruleid: template-posts-to-self -->
<form action="abc/xyz"></form>
<!-- ruleid: template-posts-to-self -->
<form action="xyz"></form>
{{ end }}

Rule 7: Label Must Have Matching Input

Another convention is that every <label> must have an <input> with a matching name and id. Enforcing this in a test requires fragments for every label and name, and requires the html to conform to rigid matching, which makes them more fragile. (Or for the tests to use regex matching against the fragments, which makes them more complex.)

Note in the tests that the label and input are allowed to have other attributes before the for=, name=, and id=. This makes the rule less prone to false-positives. The extra pattern-not with the attributes in reversed order allows them to appear in the template in either order.

These go in rules/template-label-has-input.yaml and rules/template-label-has-input.html:

---
rules:
- id: template-label-has-input
  languages: [generic]
  paths:
    include:
    - "*.html"
  message: html template label must have corresponding input
  severity: WARNING
  patterns:
  - pattern-inside: |
      <label ... for="$NAME" ...>...</label>
      ...      
  - pattern: <input ...>
  - pattern-not: <input ... name="$NAME" ... id="$NAME" ...>
  - pattern-not: <input ... id="$NAME" ... name="$NAME" ...>

{{ define "abc" }}
<form action="/mno/xyz">

    <label for="aaa">Aaa</label>
    <!-- ok: template-label-has-input -->
    <input type="text" name="aaa" id="aaa">

    <label class="my-label" for="aaa">Aaa</label>
    <!-- ok: template-label-has-input -->
    <input name="aaa" type="text" class="xyz" id="aaa">

    <label class="my-label" for="aaa">Aaa</label>
    <!-- ok: template-label-has-input -->
    <input id="aaa" name="aaa" type="text" class="xyz">

    <label for="aaa">Aaa</label>
    <!-- ruleid: template-label-has-input -->
    <input type="text" name="bbb" id="aaa">

    <label for="aaa">Aaa</label>
    <!-- ruleid: template-label-has-input -->
    <input type="text" name="aaa" id="bbb">

    <label for="aaa">Aaa</label>
    <!-- ruleid: template-label-has-input -->
    <input type="text" name="aaa">

    <label for="aaa">Aaa</label>
    <!-- ruleid: template-label-has-input -->
    <input type="text" id="aaa">

</form>
{{ end }}

Rule 8: Tests Use t.Parallel

The final three rules enforce some conventions in test code.

The first convention is that all tests call t.Parallel. The rule does this by matching on any test function, where “test function” is defined as a function taking a single *testing.T argument.

Two pattern-not clauses are used to exclude code that properly calls t.Parallel as the first statement in the function, and to exclude helper functions that call t.Helper.

Put these in rules/tests-are-parallel.yaml and rules/tests-are-parallel.go:

---
rules:
- id: tests-are-parallel
  languages: [go]
  message: test cases must call t.Parallel
  severity: WARNING
  patterns:
  - pattern: |
      func $F($T *testing.T) {
        ...
      }      
  - pattern-not: |
      func $F($T *testing.T) {
        $T.Parallel()
        ...
      }      
  - pattern-not: |
      func $F($T *testing.T) {
        $T.Helper()
        ...
      }

package main

import (
   "log"
   "testing"
)

// ruleid: tests-are-parallel
func test1(t *testing.T) {
   // Note: parallel must be called.
}

// ruleid: tests-are-parallel
func test2(t *testing.T) {
   // Note: parallel has to be called first.
   log.Print("abc")
   t.Parallel()
}

// ok: tests-are-parallel
func test3(t *testing.T) {
   // Note: parallel is called first. This is ok.
   t.Parallel()
   log.Print("abc")
}

// ruleid: tests-are-parallel
func testHelper1(t *testing.T) {
   // Note: helper has to be called first.
   log.Print("abc")
   t.Helper()
}

// ok: tests-are-parallel
func testHelper2(t *testing.T) {
   // Note: helper is called first. This is ok.
   t.Helper()
   log.Print("abc")
}

Rule 9: Helper Functions Never Return Error

Another convention for tests is that helper functions should not return errors.

This can be enforced with a rule that matches any function that has *testing.T in the argument list, calls t.Helper, and returns error.

The ellipses (..., $T *testing.T, ...) in the argument list will match if the *testing.T is anywhere in the argument list. In theory this should also work in the return list, but when I was building this rule I found what appears to be a semgrep bug – in the second pattern shown it doesn’t match error anywhere in the return list, it only matches in that specific position. If the linked bug is fixed, this rule would match more flexibly and it should only need the second pattern.

Here are rules/test-helpers-dont-return-error.yaml and rules/test-helpers-dont-return-error.go:

---
rules:
- id: test-helpers-dont-return-error
  languages: [go]
  message: test helpers must not return error
  severity: WARNING
  pattern-either:
  - pattern: |
      func $F(..., $T *testing.T, ...) error {
        $T.Helper()
        ...
      }      
  # XXX The pattern below doesn't work the way it should. See
  # https://github.com/returntocorp/semgrep/issues/4896
  - pattern: |
      func $F(..., $T *testing.T, ...) (..., error, ...) {
        $T.Helper()
        ...
      }

package main

import (
   "testing"
)

// ruleid: test-helpers-dont-return-error
func testHelper1(t *testing.T) error {
   t.Helper()
   return nil
}

// ok: test-helpers-dont-return-error
func testHelper2(t *testing.T) int {
   t.Helper()
   return 0
}

// ok: test-helpers-dont-return-error
func testHelper3(t *testing.T) {
   t.Helper()
}

Rule 10: t.Error + t.FailNow should be t.Fatal

This rule enforces a maintenance nit: instead of calling t.Error followed by t.FailNow, just call t.Fatal. It uses pattern-either to enforce this against four different flavors of the same code construct.

Put these in rules/error-failnow-fatal.yaml and rules/error-failnow-fatal.go:

---
rules:
- id: error-failnow-fatal
  languages: [go]
  message: t.Error or t.Log followed by t.FailNow should just call t.Fatal
  severity: WARNING
  pattern-either:
  - pattern: |
      $T.Error(...)
      $T.FailNow()      
  - pattern: |
      $T.Errorf(...)
      $T.FailNow()      
  - pattern: |
      $T.Log(...)
      $T.FailNow()      
  - pattern: |
      $T.Logf(...)
      $T.FailNow()

package main

import "testing"

func test1(t *testing.T) {
   // ruleid: error-failnow-fatal
   t.Error("abc")
   t.FailNow()
}

func test2(t *testing.T) {
   // ruleid: error-failnow-fatal
   t.Errorf("%s", "abc")
   t.FailNow()
}

func test3(t *testing.T) {
   // ruleid: error-failnow-fatal
   t.Log("abc")
   t.FailNow()
}

func test4(t *testing.T) {
   // ruleid: error-failnow-fatal
   t.Logf("%s", "abc")
   t.FailNow()
}

Rule 11: Naming Conventions for Handler Tests

This is just another naming convention. Let’s assume that any test that calls postHasStatus or getHasStatus is a handler-testing function. We can enforce that handler test names use a parallel construction to the handler functions.

The rule works by matching on test functions, where a $HELPER is called, and that helper matches a regex. If we add other similar helper functions we can add them to this regex.

The final regex match against $FUNC is similar to the regex in rule 2 above. Note that it is not anchored at the end, so that if multiple functions are needed to test a given handler they can be given unique suffixes.
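
Since the lookahead makes this regex a little awkward to read, here is the same condition written in Go with the logic turned around: a positive regex for conforming names, and a warning for everything else that starts with “test”. This is a sketch with made-up names (conforming, flagged), and it assumes the function has already been identified as a handler test by the $HELPER check.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// conforming is the positive form of the rule's negative lookahead:
// test<Thing><CrudAction><Method>, with any suffix allowed afterwards.
var conforming = regexp.MustCompile(`^test[A-Z][a-zA-Z]*(Index|Show|New|Edit)(Delete|Get|Patch|Post)`)

// flagged reports whether a handler test's name would trigger the warning.
func flagged(name string) bool {
	return strings.HasPrefix(name, "test") && !conforming.MatchString(name)
}

func main() {
	names := []string{
		"testThingIndexGet", "testThingIndexGetStuff", // conform
		"testFoo", "testIndexGet", "testThingIndex", "testThingGetIndex", // flagged
	}
	for _, name := range names {
		fmt.Printf("%-24s flagged=%v\n", name, flagged(name))
	}
}
```

Because neither regex is anchored at the end, testThingIndexGetStuff passes, which is the behavior described above.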

Here are rules/test-handler-naming.yaml and rules/test-handler-naming.go:

---
rules:
- id: test-handler-naming
  languages: [go]
  message: Naming of tests for handlers should be test<Thing><CrudAction><Method><Any>
  severity: WARNING
  patterns:
  - pattern-inside: |
      func $FUNC($T *testing.T) {
        ...
        $HELPER($T, ...)
        ...
      }      
  - metavariable-regex:
      metavariable: $HELPER
      regex: "(post|get)HasStatus"
  - metavariable-regex:
      metavariable: $FUNC
      regex: "^test(?!([A-Z][a-zA-Z]*)(Index|Show|New|Edit)(Delete|Get|Patch|Post))"

package main

import "testing"

// Doesn't match because it doesn't use getHasStatus/postHasStatus.
// ok: test-handler-naming
func testFoo(t *testing.T) {}

// Has all the parts.
// ok: test-handler-naming
func testThingIndexGet(t *testing.T) {
   getHasStatus(t)
}

// Has all the parts, plus extra.
// ok: test-handler-naming
func testThingIndexGetStuff(t *testing.T) {
   getHasStatus(t)
}

// Has Thing, but missing crud+method.
// ruleid: test-handler-naming
func testThing(t *testing.T) {
   getHasStatus(t)
}

// Missing "Thing"
// ruleid: test-handler-naming
func testIndexGet(t *testing.T) {
   getHasStatus(t)
}

// Missing method
// ruleid: test-handler-naming
func testThingIndex(t *testing.T) {
   getHasStatus(t)
}

// Wrong order for crud+method
// ruleid: test-handler-naming
func testThingGetIndex(t *testing.T) {
   getHasStatus(t)
}

Coming Up

On Friday we will integrate the tool and these rules into our book tracking project. (Including fixing up some code that doesn’t conform to the conventions.)

Next week we will look at Gin validation and error reporting.
