Beware of locking in Go concurrent programming
Go has provided very nice primitives and built-in features which made multicore and multi-processor programming easy. One of the nice things about the Go features is to encourage an approach in which shared values are passed around on channels to different goroutines, so that we can avoid the subtle difficulty of implementing correct access to shared variables through mutex, critical section, etc, the normal mechanism used in concurrent programming. If the variables and objects are managed correctly, this design avoids data races and the program will run faster too.
However, this is not magical. As programmer, you still need to be careful about shared variables. On system with multicore or multi-processor, we want to take advantage of all available CPUs, and keep them busy. To keep the CPU busy and do the job, you want to make sure that they spend as much time as possible doing the computation, and don’t waste time coordinating with each other. Therefore, keeping an eye on your variables, objects, instances, etc, are very important.
In the following example, we are generating a large number of random numbers, then add them up to get the total sum. This is an associative computation, and very easy to parallelize. The following code splits the task into smaller portions and run them on different CPUs.
type Vector []int64
func (v *Vector) doSum(i, n int, c chan int64) {
var sum int64 = 0
for ; i < n; i++ {
sum += (*v)[i]
}
c <- sum
}
func (v *Vector) sum(numCpu int, c chan int64) int64 {
for i := 0; i < numCpu; i++ {
go v.doSum(i*len(*v)/numCpu, (i+1)*len(*v)/numCpu, c)
}
var total int64 = 0
for i := 0; i < numCpu; i++ {
total += <- c
}
return total
}
func (v *Vector) genRand(i, n int, c chan int64) {
for ; i < n; i++ {
(*v)[i] = int64(rand.Intn(10) + 1)
}
c <- 0
}
func (v *Vector) gen(numCpu int, c chan int64) {
for i := 0; i < numCpu; i++ {
go v.genRand(i*len(*v)/numCpu, (i+1)*len(*v)/numCpu, c)
}
for i := 0; i < numCpu; i++ {
<- c
}
}
func test1(num int) {
fmt.Printf("test1:\n")
numCpu := runtime.NumCPU()
//fmt.Printf("Available cpu: %d\n", numCpu)
runtime.GOMAXPROCS(numCpu)
t1 := time.Now()
var c = make(chan int64, numCpu)
var v = make(Vector, num)
v.gen(numCpu, c)
total := v.sum(numCpu, c)
d := time.Since(t1)
fmt.Printf("Total time: %s\n", d.String())
fmt.Printf("Total: %d\n", total)
}
The code above is quite easy. It gets the number of CPUs available on the system, then split the task of generating random numbers to be run on all CPUs. Summing all the numbers follows the same pattern. The task is also split into sub-tasks to all available CPUs, the result of the sub-tasks are finally summed together to get the final total.
The following code will do everything on one single CPU:
func (v *Vector) sum2() int64 {
n := len((*v))
var total int64 = 0
for i := 0; i < n; i++ {
total += (*v)[i]
}
return total
}
func test2(num int) {
fmt.Printf("test2:\n")
fmt.Printf("With 1 cpu:\n")
t1 := time.Now()
var v = make(Vector, num)
for i := 0; i < num; i++ {
v[i] = int64(rand.Intn(10) + 1)
}
total := v.sum2()
d := time.Since(t1)
fmt.Printf("Total time: %s\n", d.String())
fmt.Printf("Total: %d\n", total)
}
This is just traditional sequential programming. Everything looks innocent enough. On a duo-core system, we would expect that the first test, running on duo-core, would be faster. Here are the results:
xp@shanghai:~/workspace-xp/go2$ bin/sum 10000000 Number of integers: 10000000 test1: Total time: 3.549867s Total: 55011331 test2: With 1 cpu: Total time: 1.497247s Total: 54987668 xp@shanghai:~/workspace-xp/go2$ bin/sum 100000000 Number of integers: 100000000 test1: Total time: 31.740154s Total: 550032946 test2: With 1 cpu: Total time: 15.519359s Total: 549971881
Surprise, the test on duo-core takes much longer. Running on duo-core takes twice as much time to finish the task than running on one core. What is going on?
Everything seems innocent. The only place where we used an external component is the random number generator, which is the math/rand package. Looking into the code of the random number generator, we found that it is re-entrant, and used an internal mutex to lock the shared source. As we generate random numbers on more than one core, the two cores are trying to access the same source for generating random number, and suffer a costly overhead.
To achieve the result of real concurrent programming, we have to make sure that the goroutines do not compete for resources, and thus, eliminate the contention overhead. We just need to make a minor modification to the code for generating numbers:
func (v *Vector) genRand(i, n int, c chan int64) {
r := rand.New(rand.NewSource(int64(42)))
for ; i < n; i++ {
(*v)[i] = int64(r.Intn(10) + 1)
}
c <- 0
What we are doing here is very simple. We make sure that each goroutine has its own instance of random number generator, so that they don’t compete for the same source. And here are the results:
xp@shanghai:~/workspace-xp/go2$ bin/sum 100000000 Number of integers: 100000000 test1: Total time: 6.855957s Total: 549996332 test2: With 1 cpu: Total time: 17.449164s Total: 550032946
As you can see, the results are totally different. Now we achieve much better result on duo-core than on one single core. The goroutines work on their task totally separately, and there’s no overhead, and we get the results we expected.
Despite the nice feature provided by Go, concurrent programming still needs careful planning, and a good pair of eyes for details.
Here is the code to the program.