Go maps vs switches

Posted on 04 Mar 2021
Tagged with: [ go ]  [ benchmarking

Sometimes, things aren’t faster because you think it is,.. but because you benchmarked them. One of Go’s nice things is that it makes it easy to benchmark things to see if your hunches are correct quickly. And sometimes, they turn out not.

Switch vs. maps

Once in a while, I need to map a string, int, or another type to something else. This usually turns out to become a switch statement where I map each element to something else. It works fine for a small number of elements, but a large list of case statements isn’t very readable as soon as you have a lot. It also increases your gocyclo complexity.

    var message
    switch (status) {
        case 200: 
            message = "ok"
        case 404:
            message = "not found"
        // insert a lot of extra elements
    }

So I tend to use maps instead. That way, I have a small amount of code plus a readable map that doesn’t suffer for readability:

    var httpCodes map[int]string{
        200: "ok",
        404: "not found",
        // much cleaner
    }
    
    message, ok := httpCodes[status]
    if !ok {
       // maybe set something default or return error
    } 

Since both are lookup tables, you would expect both to be pretty similar in performance. Or at least, so I thought. Let’s see what happens when we do a benchmark on both:

package main

import (
	"testing"
)

var httpStatus = map[int]string{
	200: "ok",
	404: "not found",
}

var result = make([]string, 3)

func doMap(code int) string {
	status, ok := httpStatus[code]
	if !ok {
		status = ""
	}

	return status
}

func doSwitch(code int) string {
	var message = ""

	switch (code) {
	case 200:
		message = "ok"
	case 404:
		message = "not found"
	}

	return message
}

func BenchmarkSwitch(b *testing.B) {
	r := make([]string, 3)

	for n:=0; n<b.N; n++ {
		r[0] = doSwitch(200)
		r[1] = doSwitch(404)
		r[2] = doSwitch(999)
	}

	result = r
}

func BenchmarkMap(b *testing.B) {
	r := make([]string, 3)

	for n:=0; n<b.N; n++ {
		r[0] = doMap(200)
		r[1] = doMap(404)
		r[2] = doMap(999)
	}

	result = r
}

Running it with go test --bench=. gives us the following result:

    $ go test --bench=.
    goos: linux
    goarch: amd64
    BenchmarkSwitch-12      808913143                1.47 ns/op
    BenchmarkMap-12         80380466                14.5 ns/op
    PASS

There was an issue that it seems that Go had optimized the whole testing code out, resulting in a way too low ns/op. I’ve added some additional code that should not optimize the functionality we want to test out.

It seems that the map lookup is almost 10 times slower (still fast, though). I’ve compiled the code to assembly to figure out what’s going on. It seems that the switch is fast because it is not directly a jump-table (as I expected), but a linked list (is the code 200, no goto next check etc).

       0x0000 00000 (map_test.go:26)  MOVQ   "".code+8(SP), AX
       0x0005 00005 (map_test.go:26)  CMPQ   AX, $200
       0x000b 00011 (map_test.go:26)  JNE    36
           0x000d 00013 (map_test.go:26)  PCDATA $0, $1
           0x000d 00013 (map_test.go:26)  LEAQ   go.string."ok"(SB), AX
           0x0014 00020 (map_test.go:26)  MOVL   $2, CX
           0x0019 00025 (map_test.go:32)  PCDATA $0, $0
           0x0019 00025 (map_test.go:32)  PCDATA $1, $1
           0x0019 00025 (map_test.go:32)  MOVQ   AX, "".~r1+16(SP)
           0x001e 00030 (map_test.go:32)  MOVQ   CX, "".~r1+24(SP)
           0x0023 00035 (map_test.go:32)  RET
        0x0024 00036 (map_test.go:28)  PCDATA $1, $0
       0x0024 00036 (map_test.go:28)  CMPQ   AX, $404
       0x002a 00042 (map_test.go:28)  JNE    58
           0x002c 00044 (map_test.go:28)  PCDATA $0, $1
           0x002c 00044 (map_test.go:28)  LEAQ   go.string."not found"(SB), AX
           0x0033 00051 (map_test.go:28)  MOVL   $9, CX
        0x0038 00056 (map_test.go:25)  JMP    25
       0x003a 00058 (map_test.go:25)  XORL   AX, AX
       0x003c 00060 (map_test.go:25)  XORL   CX, CX
       0x003e 00062 (map_test.go:25)  JMP    25

However, the map takes a lot more time. This is probably because it calls runtime.mapaccess2_fast64(SB), which might still be a fast access call, but it is still a call and compared to a simple compare in the switch code quite slow.

Mapping from string to int

So let’s turn things around: instead of mapping ints to strings. Let’s map the string to an int instead.

package main

import (
	"testing"
)

var httpStatus = map[string]int{
	"ok": 200,
	"not found": 404,
}

func doMap(message string) int {
	code, ok := httpStatus[message]
	if !ok {
		code = 0
	}

	return code
}

var result = make([]int, 3)

func doSwitch(message string) int {
	var code int

	switch (message) {
	case "ok":
		code = 200
	case "not found":
		code = 404
	}

	return code
}

func BenchmarkSwitch(b *testing.B) {
	r := make([]int, 3)

	for n:=0; n<b.N; n++ {
		r[0] = doSwitch("ok")
		r[1] = doSwitch("not found")
		r[2] = doSwitch("not implemented")
	}

	result = r
}

func BenchmarkMap(b *testing.B) {
	r := make([]int, 3)

	for n:=0; n<b.N; n++ {
		r[0] = doMap("ok")
		r[1] = doMap("not found")
		r[2] = doMap("not implemented")
	}

	result = r
}

And the result:

    goos: linux
    goarch: amd64
    BenchmarkSwitch-12      697366684                1.67 ns/op
    BenchmarkMap-12         100000000               11.4 ns/op
    PASS
    ok      command-line-arguments  2.498s

It seems that both functions perform pretty much the same.

When compiled to assembler, we see that the map functionality compiles into pretty much identical code, except now we have a call to runtime.mapaccess2_faststr. The switch statement has changed too. Instead of comparing simple ints, it first checks the length of a given string against the length of the string we match (so in the first case, it checks if the length of message is 2). If this isn’t equal, there is no point in checking the rest of the string, but if it does match, it will continuously load 8 bytes of string and compare it against the message (8 bytes, since that what can fit in the 64-bit register). Still, it’s a fast process because of the fast-failures in the system.

So which one should we be using?

So it looks like the switch wins, but it seems to have aO(N) complexity because of the iteration overall cases, while the map has an O(1) complexity. It doesn’t necessarily mean that the map is better than the switch, since we need to consider the number of elements (it takes a while before the O(1) is faster), and of course, readability is an issue. It depends on the situation what you would use, and in some cases, I will go for the maps, and other cases for the switch. But if speed is your concern, benchmark your stuff!