DEV Community

Bruce Du
Bruce Du

Posted on

Bugs in Golang caused by Zero Value and features of the gob library

0 Background

Just in the last year, the department platform project I was responsible for released a new version. This version also launched a new function, which is similar to a scheduled task. Everything was ok on the first day, but on the second day there were very few tasks that were not executed as expected (paused tasks continued to be executed, but stared tasks were not executed).

The first reaction of myself and another colleagues was that there were some issues with the logic of scheduled task execution. But after we spent a lot of time debugging and testing, we found that the root of the problem was not the functional logic, but a piece of underlying public code that had been online for a year and had not been touched. The core of this code is the protagonist of this article gob, and the root cause of the problem is a feature of the go language: zero value.

I will use a more simplified example to describe this BUG later.

1 gob and the Zero Value

Let’s briefly introduce gob and zero value first.

1.1 Zero Value

The zero value is a feature of the Go language. Simply put, the Go language will provide a default value for some variables which have not been assigned a value yet.

For example, the following code:

package main

import (
    "fmt"
)

type task struct {
    name  string
    state int
}

func main() {
    t := task{}
    var list []byte
    var f float32
    var s string
    var m map[string]int

    fmt.Println(list, f, s, m)
    fmt.Printf("%+v", t)
}

/* Output:
[] 0  map[]
{name: state:0}
*/
Enter fullscreen mode Exit fullscreen mode

As you see, all variables have been assigned a default value, those default values are the zero value.

The zero value does bring convenience to developers in many cases, but there are also many people who don't like it who think that the existence of the zero value makes the code grammatically lax and brings some uncertainty. For example, the problem I will describe in detail later.

1.2 gob

gob is a standard library in encoding/gob. Gob is actually the abbreviation of go binary, so from its name we can also guess that gob should be related to binary.

In fact, gob is a unique format of Golang for serializing and deserializing program data in binary form, similar to pickle in Python. Its most common usage is to serialize an object (structure) and store it in a disk file, then read the file and deserialize it when needed to achieve object persistence.

I won’t give examples, and this article is not a special topic on the use of gob. This is its Official Document. Readers who are not familiar with gob usage can take a look at the Example part of the document, or just look at the codes I use to describe the problem later.

2 Problem

2.1 Requirement

At the beginning of this article, I briefly described the origin of the problem. Here I use a simpler model to expand the description.

First we define a structure named task:

type task struct {
    // Just as json library, the fields must be public could be serialized
    Name  string
    State int // Task state, it's only 2 states for simplistic, 1: running, 0: stopped
}
Enter fullscreen mode Exit fullscreen mode

Around this structure, we will enter several tasks data, each task is a task object. But for some reasons, we must use gob to persist these task information to the local disk instead of using a database like MySQL.

Next, we have such a requirement:

Iterate and deserialize the locally stored gob files, then count how many tasks are running, and how many of them are stopped.

2.2 Code

Based on the above requirements and background, the code is as follows:

  • defines.go
package main

// Directory where .gob files saved
const DIR = "./tasks"

type task struct {
    // Just as json library, the fields must be public could be serialized
    ID    int
    Name  string
    State int // Task state, it's only 2 states for simplistic, 1: running, 0: stopped
}

// Test tasks which will be serialized and saved into .gob files.
// We define 2 running tasks and 3 stopped tasks here.
var tasks = []task{
    {0, "StoppedTask1", 0},
    {1, "RunningTask1", 1},
    {2, "RunningTask2", 1},
    {3, "StoppedTask2", 0},
    {4, "StoppedTask3", 0},
}
Enter fullscreen mode Exit fullscreen mode
  • serializer.go
package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "os"
    "path/filepath"
)

// serialize serializes person objects and save to files,
// the files are ./persons/${t.Name}.gob
func serialize(t task) {
    filename := filepath.Join(DIR, fmt.Sprintf("%s.gob", t.Name))
    buffer := new(bytes.Buffer)
    encoder := gob.NewEncoder(buffer)
    _ = encoder.Encode(t)
    _ = os.WriteFile(filename, buffer.Bytes(), 0o644)
}

// deserialize deserializes .gob files to task objects
func deserialize(path string, t *task) {
    raw, _ := os.ReadFile(path)
    buffer := bytes.NewBuffer(raw)
    decoder := gob.NewDecoder(buffer)
    _ = decoder.Decode(t)
}
Enter fullscreen mode Exit fullscreen mode
  • main.go
package main

import (
    "fmt"
    "path/filepath"
)

func main() {
    storeTasks()
    countStates()
}

func storeTasks() {
    for _, p := range tasks {
        serialize(p)
    }
}

func countStates() {
    counter := make(map[int]int)
    // Use a temporary pointer as a carrier of objects in the file
    // to save the cost of creating new objects.
    tmpT := &task{}
    for _, t := range tasks {
        filename := filepath.Join(DIR, fmt.Sprintf("%s.gob", t.Name))
        // Deserialize task into tmpT
        deserialize(filename, tmpT)
        // Count the state
        counter[tmpT.State]++
    }
    fmt.Printf("Stopped: %+v, Running: %+v\n", counter[0], counter[1])
}
Enter fullscreen mode Exit fullscreen mode

After running code, I got the result below:

Stopped: 1, Running: 4
Enter fullscreen mode Exit fullscreen mode

What? 1 stopped task and 4 running tasks? Bug occurred. The result is different with our test data: 3 stopped tasks and 2 running tasks. So, what went wrong?

2.3 Debug

Let’s add a print statement line in for loop of main.go/countStates() function to see what we could get:

func countStates() {
    // ...
    for _, t := range tasks {
        // ...
        deserialize(filename, tmpT)
        // Add this line
        fmt.Printf("%+v\n", *tmpT)
        // ...
    }
    // ...
}

Output:
{ID:0 Name:StoppedTask1 State:0}
{ID:1 Name:RunningTask1 State:1}
{ID:2 Name:RunningTask2 State:1}
{ID:3 Name:StoppedTask2 State:1}
{ID:4 Name:StoppedTask3 State:1}
Enter fullscreen mode Exit fullscreen mode

Oh man, StoppedTask2 and StoppedTask3 are both running! But the amazing thing is that except for State, all other data are valid! Seeing this result, if you often deal with configuration files such as JSON and Yml just like me, you may take it for granted that the above gob files are read correctly, and there should be a storage problem.

But gob files are binary files, and it is difficult for us to verify them with eyes like JSON files. Even if you use tools like xxd or something, you can only get such ambiguous output:

> xxd .\tasks\RunningTask1.gob
00000000: 2b7f 0301 0104 7461 736b 01ff 8000 0103  +.....task......
00000010: 0102 4944 0104 0001 044e 616d 6501 0c00  ..ID.....Name...
00000020: 0105 5374 6174 6501 0400 0000 15ff 8001  ..State.........
00000030: 0201 0c52 756e 6e69 6e67 5461 736b 3101  ...RunningTask1.
00000040: 0200                                     ..

> xxd .\tasks\StoppedTask2.gob
00000000: 2b7f 0301 0104 7461 736b 01ff 8000 0103  +.....task......
00000010: 0102 4944 0104 0001 044e 616d 6501 0c00  ..ID.....Name...
00000020: 0105 5374 6174 6501 0400 0000 13ff 8001  ..State.........
00000030: 0601 0c53 746f 7070 6564 5461 736b 3200  ...StoppedTask2.
Enter fullscreen mode Exit fullscreen mode

Maybe we can try to forcely parse these binary files to compare the differences between them; or deserialize two identical objects except State into gob files, and then compare them. If you are interested, you can try it. At that time, due to time constraints and other reasons, we did not try this approach. We continue to modify the data to test.

2.4 Find patterns

Observing the results of the previous error, we found that the error data were all StoppedTask. Programmer's intuition tells me that this may not be a coincidence. So I tried to change the order of the data, completely separate RunningTasks and StoppedTasks, and then tested:

// First group, stopped first.
var tasks = []task{
    {0, "StoppedTask1", 0},
    {3, "StoppedTask2", 0},
    {4, "StoppedTask3", 0},
    {1, "RunningTask1", 1},
    {2, "RunningTask2", 1},
}

// Output
{ID:0 Name:StoppedTask1 State:0}
{ID:3 Name:StoppedTask2 State:0}
{ID:4 Name:StoppedTask3 State:0}
{ID:1 Name:RunningTask1 State:1}
{ID:2 Name:RunningTask2 State:1}
Stopped: 3, Running: 2
Enter fullscreen mode Exit fullscreen mode
// Second group, running first.
var tasks = []task{
    {1, "RunningTask1", 1},
    {2, "RunningTask2", 1},
    {0, "StoppedTask1", 0},
    {3, "StoppedTask2", 0},
    {4, "StoppedTask3", 0},
}

// Output
{ID:1 Name:RunningTask1 State:1}
{ID:2 Name:RunningTask2 State:1}
{ID:2 Name:StoppedTask1 State:1}
{ID:3 Name:StoppedTask2 State:1}
{ID:4 Name:StoppedTask3 State:1}
Stopped: 0, Running: 5
Enter fullscreen mode Exit fullscreen mode

The weird phenomenon appears, when StoppedTask first, the result is all right; when RunningTask first, RunningTasks are correct, StoppedTasks are all wrong, and even the ID of StoppedTask1, which is originally 0, has become 2 here!

After repeated testing and observation of the result set, we had come to the conclusion that all the RunningTasks data is fine, and all the problems are with the StoppedTasks data!

To further formalize this conclusion: If the preceding data is a non-zero number, and the following data is a zero, then the following zero will be overwritten by the non-zero in front of it.

3 Explanation

Reviewing the program code again, I noticed this sentence:

// Use a temporary pointer as a carrier of objects in the file
// to save the cost of creating new objects.
tmpT := &task{}
Enter fullscreen mode Exit fullscreen mode

In order to save the extra overhead of creating a new object, I used the same variable to recursively load the data in the file.

Combined with the bug pattern we found earlier, the answer seems to be close at hand: the so-called 0's are being overwritten by non-zero's in front of them, most likely because the same object is being used to load the file, which results in residual data in front of it.

The validation is also simple, just put that public object into the following for loop so that each loop recreates a new object for loading the file data to cut off the effects of the previous data.

Let's modify the code:

for _, t := range tasks {
    tmpT := &task{}
    // ...skip
}

// Output
{ID:1 Name:RunningTask1 State:1}
{ID:2 Name:RunningTask2 State:1}
{ID:0 Name:StoppedTask1 State:0}
{ID:3 Name:StoppedTask2 State:0}
{ID:4 Name:StoppedTask3 State:0}
Stopped: 3, Running: 2
Enter fullscreen mode Exit fullscreen mode

Yes! It’s right!

The result is indeed as we thought, it is the reason of data residuals. But here's another question: Why is it that the old method reads everything correctly when there are 0s and then non-0s (Stopped first then Running tasks)? And why are all other numbers (e.g. non-zero IDs) not affected, except for the 0's?

All questions now seem to point to the special number 0!

It was only at this point that the zero-value property finally caught up with us. So I rushed to read the official documentation of the gob library and found this sentence:

If a field has the zero value for its type (except for arrays; see above), it is omitted from the transmission.

The words before and after this sentence are talking about struct, so field here also refers to a field in a structure, in line with the example in our text.

Based on the conclusions we got earlier, and the official documentation, we can now finally come to a complete conclusion:

The gob library ignores zero values outside of arrays when manipulating data. Whereas our code starts out using a public object to load the file data, since zero values are not transmitted, the fields in the original data that are zero-valued are not read, and we actually see the data from the last non-zero-valued object.

For the example in this article, the solution is also very simple, is what I did above, do not use a public object to load data.

4 Recall

In the project bug I described at the beginning of this article, I also used 0 and 1 to indicate the state of a timed task (stopped, running). As a result, different tasks were interfering with each other due to the zero value issue, which caused the task to execute abnormally, while the other fields that did not involve the zero value were fine. Although this is an online production environment, fortunately, the problem was found early and handled in time, and did not cause any production accidents. However, the whole process and the final answer were deeply imprinted in my mind.

Later I briefly discussed with my colleague why gob chose to ignore the zero value. From my perspective, it was probably to save space. The code we wrote at the beginning also created a public object to save space, and as a result, the two space-saving logics eventually collided to create a hidden bug.

Top comments (4)

Collapse
 
ccoveille profile image
Christophe Colombier • Edited

I will have to read it again, because I don't get what you are trying to explain or solve.

And how it could help other people.

I mean from first reading I get the beware of default value in Go, which is a legitimate piece of advice.

But then you lost me, because I don't get how gob could help. I mean, I think you mention something like default value should be handled with cares when using gob. I don't get if your article is about advising to use gob to solve issues, or to be aware default values could lead to problems when using gob.

Said otherwise, I'm unsure if you are trying to solve issues when using gob, or by using gob

Collapse
 
brucedu521 profile image
Bruce Du

Thank you for reading and commenting on the article!

It's my fault for not making the point I was trying to make clear.
And yes, you're right about part of the point I was trying to make. In the article, I encountered a bug and discovered that it was caused by a combination of gob and zero value. My article wants to tell readers to be careful about the impact of zero value ​​on data when using gob.

For the case in this article, repeatedly deserializing gob data with a single variable may cause the data to be contaminated in the process. Therefore, every time you deserialize gob data, you should use a new variable to load it, so as to solve the problem of dirty data caused by zero value.

I should have written these conclusions in the article to inform the reader, but I did not do so. This was my mistake.

Thanks for pointing out the issues in my article!

Collapse
 
ccoveille profile image
Christophe Colombier

Thanks for the explanation. It's clearer now.

My surprise was more that I came to the conclusion, you had a issue with zero values when using gob. So something that might affect only a few person, even if the introduction about zero values is general for Go.

It's a common issue when dealing with PATCH (partial update) with JSON, fields provided without values are reset to zero values. The classic solution is to use pointers for the fields. Or simply using a PUT (total update)

Would it solve your gob issues?

Thread Thread
 
brucedu521 profile image
Bruce Du

Yes, it solved my problems, thank you! And the approach I made is simply using total update just as you say. It worked well.

Using pointers is indeed a good approach, but I didn't do it. This is mainly out of my personal habits. For the fields of structures, if they are basic types such as string or int, I generally don't like to use pointers😂.