This post expands on the FOSDEM 2026 Go Devroom talk I co-presented with Hannah S. Kim. The talk, demo code, and all benchmark scenarios are available in the fosdem-2026 repository.
The Problem
Go is one of the best languages for building production backend services. It compiles to native binaries, has excellent concurrency primitives, and produces predictable performance characteristics. But when it comes to auto-instrumentation — adding observability without modifying source code — Go is uniquely difficult.
In the JVM world, bytecode manipulation gives you powerful hooks. Java agents can intercept method calls, inject tracing, and propagate context without the application developer knowing. Python and Node.js have similar dynamic capabilities. Go has none of this.
The reasons are structural:
- Static compilation. Go compiles to a single native binary. There is no intermediate bytecode to rewrite at load time, no classloader to intercept, no dynamic linking by default.
- No LD_PRELOAD. Go’s default static linking means the LD_PRELOAD trick that works for C/C++ applications (and that the OTel Injector uses for Java, .NET, and Node.js) doesn’t apply.
- Unique calling convention. Go’s ABI passes arguments in registers with a convention different from the platform C ABI. This makes dynamic hooking with tools like Frida or ptrace significantly harder: you can’t just read standard frame pointers.
- Goroutine stack management. Goroutines use contiguous, growable stacks that the runtime can copy to a new location at any time, so addresses that point into a stack do not stay valid. Traditional stack-walking assumptions break.
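The stack relocation point can be observed from inside a program. The following is a minimal sketch, not part of the demo repository: it records the address of a local variable as a plain integer, forces the goroutine's stack to grow through deep recursion, and compares the address afterwards. The helper names (`grow`, `stackMoved`) and the recursion depth are illustrative assumptions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// grow recurses with a sizeable frame so the goroutine needs far more
// stack than it started with, which makes the runtime allocate a larger
// stack and copy the existing frames into it.
//
//go:noinline
func grow(n int) int {
	var pad [1024]byte
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0])
	}
	return grow(n-1) + int(pad[0])
}

// stackMoved reports whether a local variable's address changed after
// the stack grew. The addresses are kept as uintptr on purpose: the
// runtime fixes up real pointers when it copies a stack, but it does
// not touch integers, so the old value keeps the stale address.
func stackMoved() bool {
	result := make(chan bool)
	go func() { // a fresh goroutine starts with a small stack
		var x int
		before := uintptr(unsafe.Pointer(&x))
		grow(4096) // needs roughly 4 MiB of stack
		after := uintptr(unsafe.Pointer(&x))
		result <- before != after
	}()
	return <-result
}

func main() {
	fmt.Println("local variable moved:", stackMoved())
}
```

An external tool that cached the first address would now be reading memory that no longer belongs to that frame, which is the situation a hooking tool faces from outside the process.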
The gap between “Go is great for production” and “Go is hard to auto-instrument” is real. This is the gap we set out to map.
The Comparison Framework
We built a demo repository with the same Go HTTP server implemented across seven scenarios, each using a different instrumentation approach. The application is deliberately simple — an HTTP server with configurable CPU load, memory allocation, and off-CPU time — so that instrumentation overhead is isolated and measurable.
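To make the shape of such a workload concrete, here is a minimal sketch of a handler with the three knobs described above. It is not the code from the demo repository: the query parameter names (`cpu_ms`, `alloc_kb`, `sleep_ms`) and the helpers `intParam` and `handle` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// intParam reads a non-negative integer query parameter, defaulting to 0.
func intParam(r *http.Request, name string) int {
	n, err := strconv.Atoi(r.URL.Query().Get(name))
	if err != nil || n < 0 {
		return 0
	}
	return n
}

// LoadHandler simulates CPU load, memory allocation, and off-CPU time.
// The parameter names are hypothetical.
func LoadHandler(w http.ResponseWriter, r *http.Request) {
	// On-CPU time: spin until the deadline passes.
	deadline := time.Now().Add(time.Duration(intParam(r, "cpu_ms")) * time.Millisecond)
	spins := 0
	for time.Now().Before(deadline) {
		spins++
	}
	// Memory allocation: allocate a buffer and touch one byte per 4 KiB.
	buf := make([]byte, intParam(r, "alloc_kb")*1024)
	for i := 0; i < len(buf); i += 4096 {
		buf[i] = 1
	}
	// Off-CPU time: block without consuming CPU.
	time.Sleep(time.Duration(intParam(r, "sleep_ms")) * time.Millisecond)

	fmt.Fprintf(w, "spins=%d allocated=%d\n", spins, len(buf))
}

// handle runs LoadHandler in-process and returns the status and body.
func handle(target string) (int, string) {
	rec := httptest.NewRecorder()
	LoadHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := handle("/load?cpu_ms=2&alloc_kb=64&sleep_ms=1")
	fmt.Printf("%d %s", code, body)
}
```

Keeping the three kinds of work independent is what lets a benchmark attribute any extra latency to the instrumentation rather than to the application.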
The Seven Scenarios
| # | Scenario | Approach | What It Does |
|---|---|---|---|
| 1 | default | None | Baseline. No instrumentation of any kind. |
| 2 | manual | OTel SDK | Manual OpenTelemetry SDK integration — explicit tracer initialization, span creation via otelhttp, and context propagation. The “standard” way. |
| 3 | obi | eBPF (OBI) | OpenTelemetry eBPF Instrumentation. Network-level eBPF hooks. Runs as a sidecar, attaches to the running process. No code changes. |
| 4 | ebpf | eBPF (Auto) | OpenTelemetry Go Auto-Instrumentation. Uprobe-based eBPF hooks targeting Go runtime functions. No code changes. |
| 5 | orchestrion | Compile-time | Datadog Orchestrion with OTel SDK. AST transformation via -toolexec at compile time. Requires a rebuild but no source changes. |
| 6 | libstabst | USDT (salp) | USDT probes via salp/libstapsdt, consumed by a bpftrace sidecar that exports to OTLP. Proof of concept. |
| 7 | usdt | USDT (native) | Native USDT probes via a custom Go fork that adds probe points to net/http, database/sql, crypto/tls, and net. Proof of concept. |
Each scenario runs in Docker with an identical observability stack (OTel Collector, Jaeger, Prometheus) and is load-tested with identical parameters.
Evaluation Axes
We compared the approaches across three dimensions:
- Performance overhead — latency, CPU, memory (RSS), throughput
- Robustness — stability across Go versions, container environments, failure modes
- Operational friction — deployment complexity, privilege requirements, debugging
Manual OTel SDK (Baseline for Comparison)
The manual scenario is not auto-instrumentation — it is the standard way to instrument a Go service. You import the OTel SDK, initialize a tracer provider, wrap your HTTP handler with otelhttp.NewHandler, and create spans explicitly.
This gives you full control — custom span attributes, context propagation, error recording. But it requires code changes in every service, and those changes accumulate. Multiply by a hundred microservices and you understand why auto-instrumentation matters.
Compile-Time: Orchestrion and OTel Compile-Time Instrumentation
Orchestrion uses Go’s -toolexec flag to intercept the compilation pipeline. During the AST transformation phase, it injects instrumentation code — adding OTel spans, wrapping handlers, propagating context — without the developer modifying source files.
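The -toolexec hook itself is small: the go command runs the given program in place of each toolchain tool, passing the real tool's path as the first argument followed by that tool's arguments. The sketch below is a pass-through wrapper under that assumption, not Orchestrion's code. It only logs compile invocations where a real instrumenter would rewrite source files, and the helper name `toolName` is illustrative.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// toolName returns the bare tool name ("compile", "link", "asm", ...)
// from the tool path the go command passes as the first argument.
func toolName(toolPath string) string {
	return strings.TrimSuffix(filepath.Base(toolPath), ".exe")
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go build -toolexec=/path/to/this-wrapper ./...")
		return
	}
	tool, args := os.Args[1], os.Args[2:]

	if toolName(tool) == "compile" {
		// A real instrumenter would rewrite the .go files listed in args
		// here and substitute the rewritten paths. This sketch only logs.
		fmt.Fprintf(os.Stderr, "toolexec: compile invoked with %d args\n", len(args))
	}

	// Run the real tool unchanged and propagate its exit status.
	cmd := exec.Command(tool, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode())
		}
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Built as a binary and passed via `go build -toolexec`, a wrapper like this sees every compiler invocation for the application, its dependencies, and the standard library, which is why this approach can reach so deep.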
The mechanism is aspect-oriented: you declare join points (e.g., “any function in package main named LoadHandler”) and advice (e.g., “prepend a span creation statement”). The transformation happens at the AST level before the compiler emits machine code.
Orchestrion supports OpenTelemetry natively — it is not Datadog-specific. In January 2025, Datadog and Alibaba began merging their compile-time instrumentation efforts into a unified solution under the OpenTelemetry Compile-Time Instrumentation SIG.
Trade-offs:
- Requires a rebuild. You cannot instrument already-deployed binaries.
- Deepest instrumentation of all approaches — it can instrument stdlib and dependencies.
- Zero runtime overhead from the instrumentation mechanism itself (the injected OTel code has the same cost as manual instrumentation).
- Stable across Go versions (the toolexec interface is stable).
- No kernel privileges required.
For a deeper dive into the -toolexec mechanism, see my earlier Unleashing the Go Toolchain talk from GopherCon UK 2025.
eBPF Approaches
OBI (OpenTelemetry eBPF Instrumentation)
OBI takes a network-level approach. It uses eBPF programs to hook into kernel-level network operations, intercepting HTTP/S and gRPC traffic. It is multi-language — Go, Java, .NET, Python, Node.js, Ruby, Rust — because it operates at the protocol layer rather than the language runtime layer.
OBI runs as a sidecar container. It attaches to the target process’s PID namespace and loads eBPF programs that intercept network system calls. No source code modification, no recompilation, no restart.
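OBI's probes are eBPF programs running in the kernel, which cannot be shown in Go. What can be sketched is the consequence of working at the protocol layer: everything a span contains has to be recoverable from bytes on the wire. The example below parses a raw HTTP/1.1 request the way a user-space consumer might after capture. The `wireSpan` type and `fromWire` function are hypothetical and are not OBI APIs.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
)

// wireSpan holds what can be recovered from the bytes on the wire alone.
type wireSpan struct {
	Method, Path, Host, Traceparent string
}

// fromWire parses a raw HTTP/1.1 request as it would appear on a socket.
func fromWire(raw []byte) (wireSpan, error) {
	req, err := http.ReadRequest(bufio.NewReader(bytes.NewReader(raw)))
	if err != nil {
		return wireSpan{}, err
	}
	return wireSpan{
		Method:      req.Method,
		Path:        req.URL.Path,
		Host:        req.Host,
		Traceparent: req.Header.Get("traceparent"),
	}, nil
}

func main() {
	raw := []byte("GET /load?cpu_ms=5 HTTP/1.1\r\nHost: demo:8080\r\n\r\n")
	s, err := fromWire(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s host=%s\n", s.Method, s.Path, s.Host)
}
```

Nothing in those bytes says which functions ran inside the handler, which is why application-internal spans are out of reach for a purely network-level approach, and also why the same probe works for any language.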
Trade-offs:
- Requires CAP_SYS_ADMIN or privileged containers. Security teams push back on this.
- Limited to what eBPF can observe at the network level. Application-internal spans are not visible.
- Protocol coverage is growing: HTTP/S, gRPC, TLS visibility.
- Excellent for topology mapping and network observability beyond just tracing.
OTel Go Auto-Instrumentation
The OpenTelemetry Go Auto-Instrumentation project uses uprobe-based eBPF hooks that target specific Go runtime functions. Unlike OBI’s network-level approach, this hooks directly into Go function prologues.
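A uprobe needs a concrete code address to attach to. As a rough illustration, the sketch below asks the Go runtime for the entry address of a function from inside the process. This is not how the auto-instrumentation project resolves targets (external tools work from the binary's symbol and debug information, and the file offset differs from the runtime address in position-independent builds). The helper `entryOf` is a hypothetical name.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// LoadHandler is kept out of line so that it has its own entry point.
// An inlined function has no prologue of its own for a probe to hit.
//
//go:noinline
func LoadHandler() {}

// entryOf returns the symbol name and entry address of a function value.
func entryOf(fn any) (name string, entry uintptr) {
	v := reflect.ValueOf(fn)
	if v.Kind() != reflect.Func {
		return "", 0
	}
	f := runtime.FuncForPC(v.Pointer())
	if f == nil {
		return "", 0
	}
	return f.Name(), f.Entry()
}

func main() {
	name, entry := entryOf(LoadHandler)
	fmt.Printf("%s starts at %#x\n", name, entry)
}
```

The dependence on specific symbols and their argument layout is what ties this style of instrumentation to particular Go and library versions.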
This project is effectively in maintenance mode. Several of its contributors have moved to OBI. At OTel Unplugged EU 2026, the frank assessment was: the people moved to where the momentum is.
Runtime Injection: Frida and ptrace
The injector scenario explores dynamic instrumentation via Frida, a ptrace-based toolkit for runtime function hooking. The idea is conceptually simple: attach to a running process, find the function you want to hook, and replace its prologue with a trampoline that calls your instrumentation code.
In practice, this is extremely hard for Go binaries. Quarkslab’s excellent three-part series documents the challenges in detail: Go’s register-based calling convention, goroutine stack relocation, and compiler optimizations (inlining, dead code elimination) all conspire against reliable dynamic hooking.
The demo’s injector scenario includes a helper tool that uses unsafe.Offsetof to find http.Request struct field offsets — information you need just to read the HTTP method and path from a hooked function’s arguments.
Trade-offs:
- Works with existing binaries. No rebuild required.
- Requires -gcflags="all=-N -l" to disable optimizations, which defeats the purpose for production.
- Fragile across Go versions: struct layouts and calling conventions change.
- Limited applicability for Go’s statically linked binaries.
For Go, this approach is more useful as a debugging tool than a production instrumentation strategy.
USDT Probes: The Novel Part
USDT (User Statically-Defined Tracing) probes are a mechanism from the DTrace/SystemTap ecosystem. They are marker points compiled into a binary that external tooling (bpftrace, perf, DTrace) can attach to at runtime. The key property: when no consumer is attached, the probe site is a NOP instruction with zero overhead.
We built two proof-of-concept implementations.
libstabst: USDT via salp and bpftrace
The libstabst scenario uses salp, a Go binding to libstapsdt, to create USDT probes at runtime. The application defines probe points for request start and end.
A bpftrace sidecar attaches to these probes and exports events as OTLP traces via a custom exporter bridge.
Known limitations: The salp library has compatibility issues with Go 1.25+, pinning this scenario to Go 1.23.x. It also needs /proc/self/fd/ access for mmap, which fails in many container environments. On bare metal Linux or in a Lima VM, it works.

Native USDT: Custom Go Fork
The more ambitious PoC is a custom Go fork that adds USDT probe points directly to the Go standard library — net/http, database/sql, crypto/tls, and net.
The fork includes a go tool usdt command for listing probes in a binary and generating bpftrace scripts.
This PoC proves that native USDT support in Go is technically feasible. The standard library instrumentation is automatically available in any binary built with the fork — no application code changes, no SDK imports.
Known limitations: ARM64 argument parsing in bpftrace has issues with the probe argument notation emitted by the fork. The fork is strictly a proof of concept and not suitable for production.
Go Runtime PoCs: Flight Recording
Beyond USDT, we explored a flight recorder PoC based on golang/go#63185. The concept: always-on distributed tracing built into the Go runtime, with a bounded ring buffer and GODEBUG-based activation.
The flight recorder PoC watches for trace files produced by the runtime, converts them to OTLP spans, and exports to a collector. If Go’s runtime trace facilities eventually gain W3C Trace Context propagation, this could become the lowest-friction instrumentation path for Go — no SDK, no eBPF, no compile-time tools. Just the runtime doing what runtimes should do.
Benchmark Results
We ran each scenario under identical load conditions using a Docker-based observability stack with 5-minute sustained load tests.
| Approach | CPU | Memory (RSS) | Max Latency | Max Throughput |
|---|---|---|---|---|
| Baseline (no instrumentation) | 10.2% | 202 MiB | 4.50 ms | 3.1k req/sec |
| Manual OTel SDK | 10.3% (+0.1%) | 210 MiB (+8 MiB) | 3.02 ms | 13.97k req/sec |
| eBPF Auto-Instrumentation | 10.0% (-0.3%) | 204 MiB (+2 MiB) | 3.07 ms | 4.57k req/sec |
| Compile-time (Orchestrion) | 9.8% (-0.4%) | 210 MiB (+8 MiB) | 2.59 ms | 27.8k req/sec |
A few things stand out. CPU and memory overhead is negligible across all approaches for this workload. The throughput differences are more interesting: Orchestrion’s compile-time approach achieved the highest throughput, likely because the OTel code injected at compile time benefits from the same optimizations as the rest of the application. The eBPF approach showed lower throughput than either SDK-based scenario, consistent with the overhead of crossing the kernel boundary for each intercepted call.
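To put "negligible" into relative terms, the memory column can be converted into percentages of the baseline. The snippet below does only that arithmetic on the RSS figures from the table. The helper `overheadPct` is an illustrative name and no new measurements are introduced.

```go
package main

import "fmt"

// overheadPct returns how much larger value is than baseline, in percent.
func overheadPct(baseline, value float64) float64 {
	return (value - baseline) / baseline * 100
}

func main() {
	const baselineRSS = 202.0 // MiB, baseline row of the table
	rows := []struct {
		name string
		rss  float64 // MiB, as reported in the table
	}{
		{"Manual OTel SDK", 210},
		{"eBPF Auto-Instrumentation", 204},
		{"Compile-time (Orchestrion)", 210},
	}
	for _, r := range rows {
		fmt.Printf("%-28s +%.0f MiB (%.1f%%)\n",
			r.name, r.rss-baselineRSS, overheadPct(baselineRSS, r.rss))
	}
}
```

On these figures the largest memory increase is about 4% of the baseline, from an 8 MiB difference on 202 MiB.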
The USDT scenarios (libstabst and usdt) are not included in the table because they are proof-of-concept implementations with different exporter architectures. The core property of USDT — zero overhead when probes are not attached — was confirmed, but end-to-end benchmarking against the other approaches requires further work.
Full benchmark data and reproduction instructions are in the demo repository.
The Ecosystem Picture
The most grounded take from the OTel Unplugged EU 2026 OBI/eBPF session:
“Document the trade-offs between OBI, compile-time, injector, and SDK. Let people choose. Make them aware of each other and let them work together.”
These approaches are not competing. They serve different deployment scenarios and can coexist.
| Approach | Mechanism | Best For | Limitation |
|---|---|---|---|
| Compile-time (Orchestrion) | AST transformation via -toolexec | Deepest instrumentation, security-sensitive environments | Requires rebuild |
| eBPF/OBI | Kernel-level network hooks | Runtime flexibility, multi-language, no restart | Needs kernel privileges |
| eBPF Auto | Uprobe hooks on Go functions | Go-specific deep tracing without code changes | Maintenance mode, fragile across Go versions |
| Injector/SSI | K8s operator + LD_PRELOAD | Lowest friction onboarding | Does not work for Go’s static binaries |
| USDT | Compiled probe points + bpftrace | Zero overhead when not tracing, future potential | Proof of concept, ecosystem immaturity |
The vision articulated at OTel Unplugged — apt install opentelemetry and everything works — requires all these layers coordinating. OBI detecting the Injector and backing off. Compile-time instrumentation detecting existing SDK usage. USDT probes coexisting with eBPF hooks. We are not there yet, but the direction is clear.
Future Directions
Several threads from the talk and surrounding conversations point forward:
OTel Compile-Time SIG. The merger between Datadog’s Orchestrion and Alibaba’s compile-time instrumentation under the OpenTelemetry umbrella is the most significant near-term development. A vendor-neutral, community-maintained compile-time instrumentation tool for Go would change the adoption curve.
W3C context propagation in runtimes. If language runtimes and compilers understand trace context natively, the instrumentation story simplifies fundamentally. This was a recurring theme at OTel Unplugged.
eBPF Tokens. BPF Tokens could significantly reduce the privilege requirements for eBPF-based instrumentation. Replacing CAP_SYS_ADMIN with a token-based trust model would lower the bar for security teams.
Native USDT in Go. The PoC fork demonstrates feasibility. Whether the Go team would accept USDT probes into the standard library is an open question, but the pattern exists in other ecosystems: Postgres, MySQL, and the JVM all have static tracepoints behind flags.
Flight recording. The golang/go#63185 proposal for always-on flight recording in the Go runtime could eventually provide the foundation for zero-touch distributed tracing without any external tooling.
Closing
The instrumentation tax is real and unavoidable. The question is not whether to pay it, but how to manage it. For Go, the answer is increasingly “you have options” — and those options are getting better.
The slides are available as Markdown in the repository. The demo code, Docker setup, and benchmark scripts are all in the fosdem-2026 repository. The recording is on YouTube.
If you want to get involved: the OTel Compile-Time Instrumentation SIG, OBI, and OTel Go repositories all accept contributions. The #otel-go and #otel-ebpf-sig channels on CNCF Slack are where the discussions happen.
See also: OTel Unplugged EU 2026 field notes for the broader ecosystem context.
