created: 2025-02-22 22:16
modified: 2025-03-12 22:19
type: note
status: active
tags:
- documentation
aliases: []
related: []
impact: medium
stage: raw
publish: true
I have been hitting this problem as well: almost every time the Go runtime decides to shrink or grow a goroutine stack while a uretprobe is attached, the process crashes because the stack is left in an inconsistent state.
For Golang, the solution I'm currently experimenting with is to "simulate" a uretprobe with a series of uprobes. In particular:
- Given a Golang binary, parse the ELF symbol table and obtain the address of the symbol we want to trace. If needed, attach a uprobe at that address, for example to record the timestamp at which the function began executing for latency purposes. If this information is saved into a map, be careful to key it by a unique id such as the goroutine id, not the tid, since a goroutine can be rescheduled onto a different OS thread between entry and return.
- Instead of attaching a uretprobe to the symbol address, read the ELF text section starting at that address and decode instructions sequentially until the end of the symbol is reached. While scanning, place a uprobe at every instruction that returns from the procedure (for x86-64, the RETN instructions, opcodes 0xC2 and 0xC3). For the symbols I'm interested in, there is always a very manageable number of RETN instructions, in the 1-5 range, so this is reasonable.
- When one of the uprobes installed in the previous step triggers, it's effectively as if we were executing a uretprobe, except that we haven't modified the stack, so the solution is robust enough not to crash when the Go runtime moves the stack (or so it seems). Also, since the uprobe fires right before the RET instruction executes, the function's frame has already been popped and the stack pointer points at the return address, so the input arguments and return values sit at constant offsets from it, because Go's stack-based calling convention stores them all on the stack. (Since Go 1.17, amd64 uses a register-based calling convention instead, so arguments and results are passed in registers.)
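The first two steps can be sketched in Go with only the standard library's `debug/elf` package. This is a minimal sketch under several assumptions: the names `LookupFunc`, `ReturnOffsets`, `FuncCode` and `DecodeFunc` are hypothetical; the x86-64 instruction decoder is left as a pluggable hook (one option is `golang.org/x/arch/x86/x86asm`, shown only in a comment) because writing a correct decoder is out of scope; and attaching the uprobes is left to whatever BPF loader is in use, at `FileOff` plus each returned offset.

```go
package main

import (
	"debug/elf"
	"errors"
	"fmt"
)

// DecodeFunc is a hypothetical hook: it decodes the instruction at the start
// of b and reports its length in bytes and whether it is a near return
// (RET 0xC3 or RET imm16 0xC2). With golang.org/x/arch/x86/x86asm it could be:
//
//	inst, err := x86asm.Decode(b, 64)
//	return inst.Len, inst.Op == x86asm.RET, err
type DecodeFunc func(b []byte) (n int, isRet bool, err error)

// FuncCode holds the machine code of one function symbol and where it lives.
type FuncCode struct {
	Code    []byte
	Addr    uint64 // virtual address (st_value)
	FileOff uint64 // offset of the first byte within the ELF file
}

// LookupFunc finds a function symbol in the ELF symbol table and reads its
// bytes from the section that contains it.
func LookupFunc(path, name string) (*FuncCode, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	syms, err := f.Symbols()
	if err != nil {
		return nil, err // e.g. the binary was stripped
	}
	for _, s := range syms {
		if s.Name != name || elf.ST_TYPE(s.Info) != elf.STT_FUNC {
			continue
		}
		if s.Section == elf.SHN_UNDEF || int(s.Section) >= len(f.Sections) {
			return nil, fmt.Errorf("%s: not defined in this file", name)
		}
		sec := f.Sections[s.Section]
		rel := s.Value - sec.Addr // offset of the symbol inside its section
		code := make([]byte, s.Size)
		if _, err := sec.ReadAt(code, int64(rel)); err != nil {
			return nil, err
		}
		return &FuncCode{Code: code, Addr: s.Value, FileOff: sec.Offset + rel}, nil
	}
	return nil, fmt.Errorf("%s: symbol not found", name)
}

// ReturnOffsets walks code one decoded instruction at a time and returns the
// offset (relative to the function start) of every return instruction.
func ReturnOffsets(code []byte, decode DecodeFunc) ([]uint64, error) {
	var rets []uint64
	for pc := 0; pc < len(code); {
		n, isRet, err := decode(code[pc:])
		if err != nil {
			return nil, fmt.Errorf("decode at +%#x: %w", pc, err)
		}
		if n <= 0 {
			return nil, errors.New("decoder returned a non-positive length")
		}
		if isRet {
			rets = append(rets, uint64(pc))
		}
		pc += n
	}
	return rets, nil
}
```

Decoding instruction by instruction, rather than scanning for 0xC2/0xC3 bytes, matters because those bytes also occur inside immediates and displacements: in the byte sequence `90 C2 C3 00 90 C3`, a raw byte scan would report three return sites, while decoding finds only the two real ones at offsets 1 and 5.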
This approach also has a mild performance benefit, since it avoids the uretprobe overhead. Tracing a tight loop that calls a simple function from libc:
- Non-instrumented function call: 2 ns/call
- Instrumented function call with a uretprobe: 4 µs/call
- Instrumented function call with 2 uprobes (at entry and at the RET instruction): 3 µs/call
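As a rough worked comparison that uses only the measurements above (and treats them as exact), the two-uprobe variant cuts the per-call instrumentation overhead by about a quarter:

$$
\frac{(4\,\mu\text{s} - 2\,\text{ns}) - (3\,\mu\text{s} - 2\,\text{ns})}{4\,\mu\text{s} - 2\,\text{ns}} = \frac{1\,\mu\text{s}}{3.998\,\mu\text{s}} \approx 25\%
$$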
The drawback is that we now have to decode instructions in userspace, which is significantly more annoying than the standard alternative, and it's not currently possible with bcc (I'm using uprobes in a separate project, so I have my own BPF loader).
I would appreciate some feedback on this approach. I am not an expert on this matter and would like to know if I'm missing something important.
Thanks
A uretprobe patches the function entry in the same way a uprobe does, but it also overwrites the return address on the stack with the address of a trampoline. Once the trampoline is hit, the eBPF program is executed and the instruction pointer is set back to the original return address. If the Go runtime moves or rewrites the stack in the meantime, this will likely cause corruption and crashes.
uretprobes should not be used with Go programs.
xref: Trampoline Functions