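Golang Memory Model and Memory Barriers
A few days ago, a friend who was reading the "Memory Order Guarantee" article on go101 asked me: are the two sentences in the red box contradictory?
Let's look at the corresponding statements in the official Go Memory Model: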
- A send on a channel happens before the corresponding receive from that channel completes.
- The closing of a channel happens before a receive that returns a zero value because the channel is closed.
- A receive from an unbuffered channel happens before the send on that channel completes.
- The kth receive on a channel with capacity C happens before the k+Cth send from that channel completes.
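The first two rules can be seen in a small program. The sketch below is illustrative only, and the function names are hypothetical: each goroutine makes an ordinary (non-atomic) write and then sends or closes, and the reader is guaranteed to see that write once its receive completes, with no mutex involved.

```go
package main

import "fmt"

// sendBeforeReceive relies on "a send on a channel happens before the
// corresponding receive from that channel completes": the write to msg
// precedes the send, so it is visible once <-ch returns.
func sendBeforeReceive() string {
	var msg string
	ch := make(chan struct{}, 1) // buffered, so the sender never blocks
	go func() {
		msg = "hello, world"
		ch <- struct{}{}
	}()
	<-ch
	return msg
}

// closeBeforeZeroReceive relies on "the closing of a channel happens before
// a receive that returns a zero value because the channel is closed".
func closeBeforeZeroReceive() (string, bool) {
	var msg string
	done := make(chan struct{})
	go func() {
		msg = "closed"
		close(done)
	}()
	_, ok := <-done // ok is false: zero value returned because done is closed
	return msg, ok
}

func main() {
	fmt.Println(sendBeforeReceive())      // prints: hello, world
	fmt.Println(closeBeforeZeroReceive()) // prints: closed false
}
```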
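Setting aside the channel-close case, and combining these rules with the official explanations, we can split the three cases go101 describes into the following two sequence diagrams:
- unbuffered
- buffered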
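Since both diagrams are about the ordering between a send and a receive, a minimal sketch of each case may help. `unbufferedHandoff` follows the unbuffered-channel example in the Go Memory Model, and `limitConcurrency` uses the capacity-C rule to build a counting semaphore; both function names, and the choice of 10 workers with a limit of 3, are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// unbufferedHandoff: the receive in the goroutine happens before the send
// in the caller completes, so the write to a is visible after the send.
func unbufferedHandoff() string {
	var a string
	c := make(chan int) // unbuffered
	go func() {
		a = "hello, world"
		<-c
	}()
	c <- 0 // completes only after the receive above
	return a
}

// limitConcurrency uses a buffered channel of capacity limit as a counting
// semaphore and returns the highest number of workers seen running at once.
func limitConcurrency(workers, limit int) int32 {
	sem := make(chan struct{}, limit)
	var running, peak int32
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // acquire: blocks while limit workers hold a slot
			n := atomic.AddInt32(&running, 1)
			for { // record the peak with a CAS loop
				old := atomic.LoadInt32(&peak)
				if n <= old || atomic.CompareAndSwapInt32(&peak, old, n) {
					break
				}
			}
			time.Sleep(time.Millisecond) // simulate work
			atomic.AddInt32(&running, -1)
			<-sem // release the slot
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println(unbufferedHandoff())
	fmt.Println("peak concurrency:", limitConcurrency(10, 3))
}
```

The semaphore works because the (k+C)th send into `sem` cannot complete until the kth receive has happened, so no more than `limit` goroutines can hold a slot at the same time.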
Why are these guarantees needed?
Many compilers (at compile time) and CPU processors (at run time) often make some optimizations by adjusting the instruction orders, so that the instruction execution orders may differ from the orders presented in code. Instruction ordering is also often called memory ordering.
Let's start with the "unprofessional" code mentioned in go101:
```go
package main

import "log"
import "runtime"

var a string
var done bool // written in setup and read in main with no synchronization

func setup() {
	a = "hello, world"
	done = true
	if done {
		log.Println(len(a)) // always 12 once printed
	}
}

func main() {
	go setup()
	for !done {
		runtime.Gosched()
	}
	log.Println(a) // expected to print: hello, world
}
```
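There are two possible causes of reordering in the code above:
- The compiler may compile, or the CPU may execute, `setup()` as if it were written like this:

```go
func setup() {
	done = true // done assigned first
	a = "hello, world"
	if done {
		log.Println(len(a)) // always 12
	}
}
```
- Even if the instructions are not reordered, execution can still appear out of order to other cores on CPUs that keep their caches coherent with the MESI protocol, because stores and invalidations may be buffered before they take effect (see here for details).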
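One way to remove both sources of reordering without touching barriers directly is to drop the plain `done` flag and signal with a channel instead. This is a minimal sketch under the assumption that `main` only needs to observe `a`: by the send/receive rule above, the write to `a` is guaranteed to be visible no matter how the compiler or CPU orders the work inside `setup`. The `run` helper is hypothetical and exists only so the result can be checked.

```go
package main

import "log"

var a string

// setup publishes a, then signals on done. The send happens before the
// corresponding receive in run completes, so the write to a is visible there.
func setup(done chan<- struct{}) {
	a = "hello, world"
	done <- struct{}{}
}

// run replaces the busy loop on an unsynchronized flag with a receive.
func run() string {
	done := make(chan struct{})
	go setup(done)
	<-done
	return a
}

func main() {
	log.Println(run()) // logs: hello, world
}
```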
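Memory barriers
Hardware designers provide a single mechanism that solves both of the reordering problems above: the memory barrier (fence/barrier).
On x86, the Linux kernel defines its memory barriers as follows. Note that 32-bit and 64-bit builds use different instructions: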
```c
/*
 * Force strict CPU ordering.
 * And yes, this is required on UP too when we're talking
 * to devices.
 */
#ifdef CONFIG_X86_32
/*
 * Some non-Intel clones support out of order store. wmb() ceases to be a
 * nop for these.
 */
#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define mb() asm volatile("mfence":::"memory")
#define rmb() asm volatile("lfence":::"memory")
#define wmb() asm volatile("sfence" ::: "memory")
#endif
```
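How memory barriers solve these problems
1. Instruction reordering
Simply put, a barrier does what its name says: it prevents instructions from being reordered across it. The table below lists what each kind of barrier does; for details, see Linux Kernel Development, 3rd Edition, Chapter 10, "Ordering and Barriers".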
Memory and Compiler Barrier Methods

| Barrier | Description |
|---|---|
| rmb() | Prevents loads from being reordered across the barrier |
| read_barrier_depends() | Prevents data-dependent loads from being reordered across the barrier |
| wmb() | Prevents stores from being reordered across the barrier |
| mb() | Prevents loads and stores from being reordered across the barrier |
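Go has no `wmb()` or `rmb()`, but the classic message-passing pattern those two barriers protect can be sketched with `sync/atomic`. In this sketch (the names `messagePass`, `data` and `ready` are illustrative), the atomic store of the flag cannot be observed before the write to `data`, and the read of `data` cannot happen before the atomic load that observes the flag. Because the Go Memory Model defines atomic operations as sequentially consistent, they give at least the ordering that the `wmb()`/`rmb()` pair would.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// messagePass runs one producer goroutine and returns what the consumer read.
func messagePass() int {
	var data int    // ordinary, non-atomic payload
	var ready int32 // flag, accessed only through sync/atomic
	go func() {
		data = 42                    // 1. write the payload
		atomic.StoreInt32(&ready, 1) // 2. publish it (the "wmb() side")
	}()
	for atomic.LoadInt32(&ready) == 0 { // 3. wait for the flag (the "rmb() side")
		runtime.Gosched()
	}
	return data // 4. guaranteed to observe 42
}

func main() {
	fmt.Println(messagePass()) // prints 42
}
```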
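2. MESI-related reordering
This article explains exactly how barriers keep MESI from reordering memory operations. To make it easier to follow, here is a simple sequence diagram of a write barrier:
The most important conclusion of this post: memory barriers solve two problems in concurrent programs, namely (1) preventing instruction reordering and (2) guaranteeing ordering under MESI.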
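The reordering a store buffer causes is usually demonstrated with the "store buffering" litmus test: each thread stores to one variable and then loads the other. With plain variables on x86, both loads may return 0, because each store can still be sitting in its core's store buffer; this is the StoreLoad case that only a full barrier such as `mb()` prevents. The sketch below runs the test with `sync/atomic`, which the Go Memory Model defines as sequentially consistent, so the both-zero outcome should never be counted. The function names and the iteration count are arbitrary choices for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// storeBuffering runs the litmus test once and returns what each load saw.
func storeBuffering() (r1, r2 int32) {
	var x, y int32
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		atomic.StoreInt32(&x, 1)  // store to x ...
		r1 = atomic.LoadInt32(&y) // ... then load y
	}()
	go func() {
		defer wg.Done()
		atomic.StoreInt32(&y, 1)  // store to y ...
		r2 = atomic.LoadInt32(&x) // ... then load x
	}()
	wg.Wait() // makes r1 and r2 safe to read here
	return r1, r2
}

// bothZero counts runs in which both loads saw 0, the outcome that
// sequential consistency forbids.
func bothZero(runs int) int {
	count := 0
	for i := 0; i < runs; i++ {
		if r1, r2 := storeBuffering(); r1 == 0 && r2 == 0 {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println("runs where both loads saw 0:", bothZero(10000))
}
```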
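What does this have to do with Go?
As the code below shows, the `lock()` function used by Go channels ultimately relies on atomic instructions written in Plan 9 assembly with the `LOCK` prefix, which is assembled into the x86 `LOCK` prefix as well. A `LOCK`-prefixed instruction implicitly acts as a barrier: loads and stores issued before it are completed before it executes, and later loads and stores cannot be reordered ahead of it.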
```go
// runtime/chan.go
func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
	if c == nil {
		if !block {
			return false
		}
		gopark(nil, nil, waitReasonChanSendNilChan, traceEvGoStop, 2)
		throw("unreachable")
	}
	...........
	lock(&c.lock) // <---------------------------- here
	if c.closed != 0 {
		unlock(&c.lock)
		panic(plainError("send on closed channel"))
	}
	if sg := c.recvq.dequeue(); sg != nil {
		// Found a waiting receiver. We pass the value we want to send
		// directly to the receiver, bypassing the channel buffer (if any).
		send(c, sg, ep, func() { unlock(&c.lock) }, 3)
		return true
	}
	.................
}
```
```asm
// runtime/internal/atomic
TEXT runtime∕internal∕atomic·Cas64(SB), NOSPLIT, $0-25
	MOVQ	ptr+0(FP), BX
	MOVQ	old+8(FP), AX
	MOVQ	new+16(FP), CX
	LOCK
	CMPXCHGQ	CX, 0(BX)
	SETEQ	ret+24(FP)
	RET

TEXT runtime∕internal∕atomic·Casuintptr(SB), NOSPLIT, $0-25
	JMP	runtime∕internal∕atomic·Cas64(SB)
```
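To see why a `LOCK`-prefixed compare-and-swap is enough to make a lock safe, here is a toy spin lock built on `atomic.CompareAndSwapInt32`, which on amd64 compiles to `LOCK CMPXCHG`, the same instruction sequence as `Cas64` above. It is a hypothetical sketch, not the runtime's `lock()`, which also puts waiting threads to sleep; the point is only that ordinary writes made while holding the lock are visible to the next holder.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinLock is a toy lock: state is 0 when free and 1 when held.
type spinLock struct{ state int32 }

// Lock spins until the CAS from 0 to 1 succeeds.
func (l *spinLock) Lock() {
	for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
		runtime.Gosched()
	}
}

// Unlock releases the lock; the next successful CAS observes this store,
// so everything written before Unlock is visible after that Lock returns.
func (l *spinLock) Unlock() {
	atomic.StoreInt32(&l.state, 0)
}

// count has each of workers goroutines increment a plain int n times.
func count(workers, n int) int {
	var (
		l     spinLock
		total int
		wg    sync.WaitGroup
	)
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				l.Lock()
				total++ // plain, non-atomic access protected by the lock
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(count(8, 1000)) // prints 8000
}
```

In real code `sync.Mutex` is the better choice; a pure spin lock like this wastes CPU under contention.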
Looking back at Go's design
The design philosophy of Go is to use as fewer features as possible to support as more use cases as possible, at the same time to ensure a good enough overall code execution efficiency. So Go built-in and standard packages don't provide direct ways to use the CPU fence instructions. In fact, CPU fence instructions are used in implementing all kinds of synchronization techniques supported in Go. So, we should use these synchronization techniques to ensure expected code execution orders.
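Unlike C++, which offers explicit fences and memory orders, Go does not expose very low-level synchronization primitives such as LoadLoad or StoreLoad barriers, or a way to emit the corresponding assembly instructions directly. Instead, it wraps them in higher-level synchronization mechanisms such as channels and `sync/atomic`, which lightens the programmer's burden. Even so, understanding the concurrency model from first principles helps us write robust code. This brings to mind a line from the official Go Memory Model document: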
If you must read the rest of this document to understand the behavior of your program, you are being too clever.
Don't be clever.