
fgprof

Unified wall‑clock profiler for Go applications with mixed I/O and CPU

fgprof samples all goroutine stacks to show both On‑CPU and Off‑CPU time, letting Go developers pinpoint bottlenecks in programs that combine computation and I/O.

Overview

fgprof is a sampling profiler for Go that captures the full picture of program execution. By periodically invoking runtime.GoroutineProfile, it records the call stacks of every goroutine, regardless of whether the goroutine is currently on‑CPU or blocked on I/O, channels, locks, etc. The collected data can be exported in the standard go tool pprof format or as folded stacks for Brendan Gregg’s FlameGraph utilities, giving developers a single source of truth for wall‑clock performance.

Who it’s for & How to use it

The tool is aimed at Go engineers who need to understand mixed workloads: services that spend significant time waiting on network or disk while also performing CPU‑intensive work. Integration is straightforward: import the package, register fgprof.Handler() on an HTTP mux, and query the /debug/fgprof endpoint with a seconds parameter. The resulting profile can be visualized with go tool pprof or converted to a flame graph for deeper analysis. Overhead is modest for programs with fewer than 1,000 active goroutines, but may become noticeable above 10,000.
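In code, wiring fgprof into a service looks roughly like this (a sketch based on the project's documented API; the port and the `select {}` placeholder are illustrative):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // optional: serve the built-in profiler endpoints too

	"github.com/felixge/fgprof"
)

func main() {
	// Expose the wall-clock profiler at /debug/fgprof.
	http.Handle("/debug/fgprof", fgprof.Handler())
	go func() {
		log.Println(http.ListenAndServe(":6060", nil))
	}()

	// ... the rest of the application ...
	select {}
}
```

A 30‑second profile can then be opened directly with `go tool pprof http://localhost:6060/debug/fgprof?seconds=30`.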

Deployment considerations

fgprof works best on Go 1.19 or newer; on older versions, each call to runtime.GoroutineProfile stops the world, and those pauses grow with the number of goroutines in highly concurrent programs. The profiler samples from a background goroutine that wakes ~99 times per second, so it coexists with Go's built‑in CPU profiler and net/http/pprof without needing to choose between them.

Highlights

Samples all goroutine stacks, capturing both CPU and I/O wait time
Exports to standard pprof format and folded stacks for FlameGraph
Runs alongside Go's built‑in profilers via a simple HTTP handler
Minimal code footprint (< 100 lines) and easy to integrate

Pros

  • Provides a unified view of on‑CPU and off‑CPU activity
  • Works with existing `go tool pprof` visualizer
  • Low overhead for typical Go services (< 1,000 goroutines)
  • No additional build steps; just import and register the handler

Considerations

  • Overhead grows with the number of active goroutines; noticeable above ~10,000
  • Works best on Go 1.19 or newer; older Go versions incur stop‑the‑world pauses while sampling
  • Only sampling, not deterministic tracing, so very short spikes may be missed
  • Limited to wall‑clock profiling; does not replace detailed tracing tools

Managed products teams compare with

When teams consider fgprof, these hosted platforms usually appear on the same shortlist.


Blackfire Continuous Profiler

Low-overhead continuous profiling for app performance optimization.


Datadog Continuous Profiler

Always-on code profiling to cut latency and cloud costs.


Elastic Universal Profiling

Whole-system, always-on profiling with no instrumentation.


Fit guide

Great for

  • Microservices that combine network I/O with CPU‑bound processing
  • Debugging performance regressions where the built‑in CPU profiler is misleading
  • Teams that already use `go tool pprof` and want richer data without new tooling
  • Applications with moderate goroutine counts (< 5,000) where overhead is acceptable

Not ideal when

  • Programs with tens of thousands of goroutines where profiling overhead is critical
  • Environments locked to Go versions older than 1.19
  • Scenarios requiring fine‑grained event tracing rather than sampling
  • Non‑Go workloads where a language‑specific profiler is needed

How teams use it

Identify hidden I/O latency in a web service

Shows that a slow network request dominates wall‑clock time, guiding developers to add caching or retry logic

Validate CPU‑bound optimization

Confirms that a refactored algorithm reduces on‑CPU time without increasing blocking time

Compare mixed workload across releases

Provides comparable flame graphs to track how changes affect both computation and waiting periods

Integrate profiling into CI pipelines

Automates collection of short‑duration profiles to catch regressions before deployment

Tech snapshot

Go 100%

Tags

performance-analysis, go, performance, profiling-library, profiling, golang

Frequently asked questions

Do I need to replace the built‑in CPU profiler with fgprof?

No. fgprof can run alongside the built‑in profiler, giving you both views simultaneously.

What Go version is required?

fgprof also runs on older Go versions, but Go 1.19 or newer is strongly recommended: before 1.19, each sample triggered a stop‑the‑world pause that grows with the number of goroutines.

How is the profiling data accessed?

By sending an HTTP request to `/debug/fgprof?seconds=N` and then using `go tool pprof` or converting to folded stacks.
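For example, assuming a service exposing fgprof on localhost:6060, both output formats can be pulled from the shell (ports and file names are illustrative):

```shell
# Open an interactive pprof UI on a 30-second wall-clock profile.
go tool pprof --http=:6061 'http://localhost:6060/debug/fgprof?seconds=30'

# Or fetch folded stacks and render them with Brendan Gregg's FlameGraph:
curl -s 'http://localhost:6060/debug/fgprof?seconds=30&format=folded' > fgprof.folded
flamegraph.pl fgprof.folded > fgprof.svg
```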

Can fgprof profile short‑lived functions?

As a sampling profiler, it may miss very brief spikes; for deterministic tracing use Go's trace tools.

Is there any runtime impact on production services?

Impact is modest for typical services (< 1,000 goroutines) but grows with goroutine count; evaluate in staging if you have many goroutines.

Project at a glance

Stable
Stars: 3,087
Watchers: 3,087
Forks: 98
License: MIT
Repo age: 5 years
Last commit: 9 months ago
Primary language: Go
