Golang mem model and memory barriers

A few days ago, a friend reading go101's "Memory Order Guarantees in Go" asked me a question: are the two sentences in the red box contradictory?

Let's look at the corresponding description in the official Go Memory Model:

  • A send on a channel happens before the corresponding receive from that channel completes.
  • The closing of a channel happens before a receive that returns a zero value because the channel is closed.
  • A receive from an unbuffered channel happens before the send on that channel completes.
  • The kth receive on a channel with capacity C happens before the k+Cth send from that channel completes.

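Setting aside the case of closing a channel, and following the official explanation, the three cases go101 describes above fall into two sequence diagrams:

  1. unbuffered: the receive happens before the corresponding send completes.
  2. buffered (capacity C): the kth receive happens before the (k+C)th send completes.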
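A minimal sketch of both rules, written only for illustration. The function names unbufferedHandoff and peakWithSemaphore are hypothetical, and atomic.Int32 assumes Go 1.19 or later. The unbuffered case follows the pattern in the Go Memory Model. The buffered case uses a channel of capacity C as a counting semaphore, which is the usual practical consequence of the "kth receive happens before the (k+C)th send completes" rule.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

var msg string

// unbufferedHandoff: the goroutine's receive happens before main's send
// completes, so the write to msg made before that receive is visible
// once the send returns.
func unbufferedHandoff() string {
	ch := make(chan struct{})
	go func() {
		msg = "written before the receive"
		<-ch
	}()
	ch <- struct{}{}
	return msg
}

// peakWithSemaphore: with capacity c, the kth receive happens before the
// (k+c)th send completes, so at most c workers can hold a slot at once.
// It returns the highest concurrency actually observed.
func peakWithSemaphore(c, workers int) int32 {
	sem := make(chan struct{}, c)
	var wg sync.WaitGroup
	var active, peak atomic.Int32
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // acquire a slot; blocks while all c slots are taken
			n := active.Add(1)
			for { // record the maximum value of active seen so far
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			time.Sleep(time.Millisecond)
			active.Add(-1)
			<-sem // release the slot, letting a later send complete
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println(unbufferedHandoff())
	fmt.Println("peak <= 2:", peakWithSemaphore(2, 10) <= 2)
}
```

The worker decrements active before releasing its slot. As a result, the observed peak can never exceed the channel capacity.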

Why are these guarantees needed?

Many compilers (at compile time) and CPU processors (at run time) often make some optimizations by adjusting the instruction orders, so that the instruction execution orders may differ from the orders presented in code. Instruction ordering is also often called memory ordering.

First, let's look at the "unprofessional" code mentioned in go101:

package main

import "log"
import "runtime"

var a string
var done bool

func setup() {
    a = "hello, world"
    done = true
    if done {
        log.Println(len(a)) // always 12 once printed
    }
}

func main() {
    go setup()

    for !done {
        runtime.Gosched()
    }
    log.Println(a) // expected to print: hello, world
}

There are two possible causes of reordering in the code above:

  1. The compiler may compile setup(), or the CPU may execute it, as if it had been written as:

    func setup() {
        done = true // done is assigned first
        a = "hello, world"
        if done {
            log.Println(len(a)) // always 12
        }
    }
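  2. Even if the instructions are not reordered, execution can still appear out of order. Even with the MESI cache-coherence protocol, each CPU's store buffer and invalidate queue can make writes become visible to other cores in a different order than they were issued.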
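Both sources of reordering stop mattering once the two goroutines synchronize through an operation the Go Memory Model covers. Below is a minimal sketch of go101's program rewritten to wait on a channel instead of spinning on a plain bool. It relies on the rule quoted earlier: closing a channel happens before a receive that returns because the channel is closed. The helper run and the done channel parameter are illustrative names, not from the original.

```go
package main

import "log"

var a string

// setup writes a and then closes done. The close happens before any
// receive that returns because done is closed.
func setup(done chan<- struct{}) {
	a = "hello, world"
	close(done)
}

// run starts setup and blocks on done instead of polling a plain bool.
func run() string {
	done := make(chan struct{})
	go setup(done)
	<-done // once this returns, the write to a is guaranteed to be visible
	return a
}

func main() {
	log.Println(run())
}
```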

Memory barriers

Hardware designers provide a mechanism that addresses both reordering problems above: the memory barrier (fence/barrier).

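On x86, the Linux kernel defines its memory barriers as shown below. Note that 32-bit and 64-bit builds use different instructions. On 32-bit, alternative() patches in mfence/lfence (SSE2) or sfence (SSE) only when the CPU supports them, and otherwise falls back to a locked add on the stack.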

/*
 * Force strict CPU ordering.
 * And yes, this is required on UP too when we're talking
 * to devices.
 */
#ifdef CONFIG_X86_32
/*
 * Some non-Intel clones support out of order store. wmb() ceases to be a
 * nop for these.
 */
#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define mb()     asm volatile("mfence":::"memory")
#define rmb()    asm volatile("lfence":::"memory")
#define wmb()    asm volatile("sfence" ::: "memory")
#endif

How memory barriers solve the problems above

1. Instruction reordering

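Simply put, a barrier does what its name suggests: it prevents instructions from being reordered across it. The table below lists what each barrier does. For details, see Linux Kernel Development, 3rd Edition, Chapter 10, "Ordering and Barriers".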

Memory and Compiler Barrier Methods

Barrier                   Description
rmb()                     Prevents loads from being reordered across the barrier
read_barrier_depends()    Prevents data-dependent loads from being reordered across the barrier
wmb()                     Prevents stores from being reordered across the barrier
mb()                      Prevents loads and stores from being reordered across the barrier

2. Reordering under MESI

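This article explains how barriers prevent reordering under MESI. To make this easier to follow, I drew a simple sequence diagram of a write barrier. In short, a write barrier makes later stores wait until the stores already sitting in the CPU's store buffer have become visible to other cores.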

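Here is the most important takeaway of this post. Memory barriers solve two problems in concurrent programs: 1. they prevent instruction reordering, and 2. they guarantee ordering under MESI.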
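From Go code, the same two guarantees are reachable through sync/atomic. The Go Memory Model states that atomic operations behave as if executed in a sequentially consistent order. If a Load observes a Store, the Store is synchronized before that Load. The toolchain therefore has to emit whatever barriers the target CPU needs. As far as I know, an atomic store on amd64 compiles to XCHG, which carries an implicit LOCK. Below is a minimal sketch of go101's program with done switched to atomic.Bool, assuming Go 1.19 or later. The helper waitAndRead is an illustrative name.

```go
package main

import (
	"log"
	"runtime"
	"sync/atomic"
)

var a string
var done atomic.Bool // atomic.Bool requires Go 1.19+

func setup() {
	a = "hello, world"
	done.Store(true) // a Load that observes this Store also sees the write to a
}

// waitAndRead spins on an atomic load instead of a plain bool read.
func waitAndRead() string {
	for !done.Load() {
		runtime.Gosched()
	}
	return a
}

func main() {
	go setup()
	log.Println(waitAndRead())
}
```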

What does this have to do with Go?

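Looking at the channel implementation, chansend first takes c.lock with the runtime's internal lock(). That lock is built on the runtime's atomic primitives (such as Cas and Xchg), which on amd64 are written in Plan 9 assembly. Cas64 below, for example, uses the Plan 9 LOCK prefix, which assembles to the x86 LOCK prefix. A LOCK-prefixed instruction implicitly acts as a full barrier: all loads and stores issued before it complete before it executes.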
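User code gets the same effect from sync.Mutex, which is a different type from the runtime's internal lock(). Per the Go Memory Model, the nth Unlock of a mutex is synchronized before the mth Lock returns, for n < m. Below is a minimal sketch of go101's program with every access to a and done guarded by a mutex. The variable mu and the helper waitAndRead are illustrative names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

var (
	mu   sync.Mutex
	a    string
	done bool
)

func setup() {
	mu.Lock()
	a = "hello, world"
	done = true
	mu.Unlock() // this Unlock happens before any later Lock of mu returns
}

// waitAndRead polls done under the same mutex, so it sees a's final value.
func waitAndRead() string {
	for {
		mu.Lock()
		if done {
			s := a
			mu.Unlock()
			return s
		}
		mu.Unlock()
		runtime.Gosched()
	}
}

func main() {
	go setup()
	fmt.Println(waitAndRead())
}
```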

// runtime/chan.go

func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
    if c == nil {
        if !block {
            return false
        }
        gopark(nil, nil, waitReasonChanSendNilChan, traceEvGoStop, 2)
        throw("unreachable")
    }

...........

    lock(&c.lock) // <---------------------------- here

    if c.closed != 0 {
        unlock(&c.lock)
        panic(plainError("send on closed channel"))
    }

    if sg := c.recvq.dequeue(); sg != nil {
        // Found a waiting receiver. We pass the value we want to send
        // directly to the receiver, bypassing the channel buffer (if any).
        send(c, sg, ep, func() { unlock(&c.lock) }, 3)
        return true
    }

.................    

}

// runtime/internal/atomic (amd64 assembly)

TEXT runtime∕internal∕atomic·Cas64(SB), NOSPLIT, $0-25
    MOVQ    ptr+0(FP), BX
    MOVQ    old+8(FP), AX
    MOVQ    new+16(FP), CX
    LOCK
    CMPXCHGQ    CX, 0(BX)
    SETEQ    ret+24(FP)
    RET

TEXT runtime∕internal∕atomic·Casuintptr(SB), NOSPLIT, $0-25
    JMP    runtime∕internal∕atomic·Cas64(SB)
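On amd64, sync/atomic.CompareAndSwapInt64 exposes the same LOCK CMPXCHGQ operation to user code. The sketch below shows that a single locked CAS is enough to build mutual exclusion. It is a toy spin lock: spinLock and count are hypothetical names for illustration only, and real code should use sync.Mutex.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinLock is a hypothetical toy lock built on CompareAndSwap.
type spinLock struct{ state int64 }

func (l *spinLock) Lock() {
	// Retry until we atomically flip state from 0 (free) to 1 (held).
	for !atomic.CompareAndSwapInt64(&l.state, 0, 1) {
		runtime.Gosched()
	}
}

func (l *spinLock) Unlock() {
	// The next successful CAS observes this store, so it is ordered after
	// every write made inside the critical section.
	atomic.StoreInt64(&l.state, 0)
}

// count increments a plain int from several goroutines under spinLock.
func count(workers, perWorker int) int {
	var l spinLock
	var wg sync.WaitGroup
	total := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				l.Lock()
				total++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(count(8, 1000))
}
```

Without the lock, total++ from eight goroutines would be a data race and would usually lose updates.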

Looking back at Go's design

The design philosophy of Go is to use as fewer features as possible to support as more use cases as possible, at the same time to ensure a good enough overall code execution efficiency. So Go built-in and standard packages don't provide direct ways to use the CPU fence instructions. In fact, CPU fence instructions are used in implementing all kinds of synchronization techniques supported in Go. So, we should use these synchronization techniques to ensure expected code execution orders.

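C++ exposes explicit fences (std::atomic_thread_fence) and per-operation memory orderings. Go does not: it offers no low-level synchronization primitives such as LoadLoad or StoreLoad barriers, and no way to issue fence instructions directly. Instead, it wraps them in higher-level synchronization mechanisms such as channels and sync/atomic, which lightens the programmer's burden. Still, understanding the concurrency model from first principles helps us write robust code. Writing this reminds me of a line from the official Go Memory Model document: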

If you must read the rest of this document to understand the behavior of your program, you are being too clever.

Don't be clever.

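Recommended reading

Memory Order Guarantees in Go

A Brief Look at Memory Reordering (浅谈Memory Reordering)

Memory Barriers in the Linux Kernel, qb_2008's column, CSDN blog (linux内核中的内存屏障)

Memory Barriers in the Linux Kernel (1) (Linux内核中的内存屏障(1))

Xie Baoyou: Understanding Linux RCU in Depth, Starting from Hardware: Memory Barriers (谢宝友:深入理解 Linux RCU 从硬件说起之内存屏障)

10 Diagrams That Open the Door to CPU Cache Coherence (10 张图打开 CPU 缓存一致性的大门)

Go and CPU Caches: Principles and Applications (Go 和 CPU 高速缓存:原理和应用)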
