Profiling Skia with pprof
Skia binaries (like nanobench and dm) can be instrumented to produce CPU and heap profiles compatible with the pprof visualizer.
Prerequisites
# On Debian/Ubuntu:
$ sudo apt-get install libgoogle-perftools-dev
This provides libprofiler.so (for CPU profiling) and libtcmalloc.so (for heap profiling).
Googlers already have the pprof analysis tool. External users can install google-pprof as follows, and may want to alias it to pprof so that the commands below work as written.
# On Debian/Ubuntu:
$ sudo apt-get install google-perftools
$ alias pprof=google-pprof
Terminology
When analyzing profiles, you will see two primary metrics:
- Flat: Time spent (or memory allocated) strictly within that specific function. High flat time indicates a bottleneck in the function’s own logic (e.g., a heavy loop).
- Cumulative (cum): Total time spent (or memory allocated) in that function plus all functions it calls. High cumulative time with low flat time indicates a bottleneck in one of the function’s children.
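To make the distinction concrete, here is a minimal sketch (not how pprof itself is implemented) that computes both metrics from a toy list of samples. It assumes a made-up input format of one sample per line, written as a semicolon-separated call stack with the outermost caller first; the helper name flat_cum is hypothetical. The innermost frame of each sample gets flat credit, and every distinct function on the stack gets cumulative credit once, so recursion is not double-counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads samples on stdin, one per line, as
# "outermost;...;innermost" call stacks, and prints flat and cum counts.
flat_cum() {
  awk -F';' '
    {
      flat[$NF]++                 # only the innermost frame was running
      split("", seen)             # portable way to clear the array
      for (i = 1; i <= NF; i++)
        if (!seen[$i]++)          # count each function once per sample
          cum[$i]++
    }
    END {
      for (f in cum)
        printf "%s flat=%d cum=%d\n", f, flat[f] + 0, cum[f]
    }
  ' | sort
}

# Four toy samples: main calls draw and parse; draw calls blit.
printf '%s\n' 'main;draw;blit' 'main;draw;blit' 'main;draw' 'main;parse' | flat_cum
# Prints:
# blit flat=2 cum=2
# draw flat=1 cum=3
# main flat=0 cum=4
# parse flat=1 cum=1
```

Here main has zero flat time but the largest cumulative time, the pattern described above for a function whose cost lies in its callees, while all of blit's cost is flat.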
Building with Profiling Support
To enable the profiling instrumentation, set skia_use_pprof=true in your args.gn. Compiling with -Og may help produce more accurate line-level attribution while keeping most of the benefits of optimization, although the result will not perform exactly like a fully optimized build.
# Example args.gn in out/Profile
is_debug = false
skia_use_pprof = true
extra_cflags = ["-Og"]
Then generate the build files and build your target:
$ bin/gn gen out/Profile
$ ninja -C out/Profile nanobench
This links in two profilers. The CPU profiler periodically interrupts the program, records where it was running, and aggregates these samples into the profile. The heap profiler tracks every allocation and free.
Creating Profiles in Nanobench
When built with skia_use_pprof, nanobench provides flags that enable each profiler and set where its output is written.
CPU Profiling
Use the --cpuprofile flag to specify the output filename. Because the CPU profiler works by sampling, a longer run collects more samples and gives more reliable results; here, --ms 1000 runs each benchmark for about 1000 ms.
$ ./out/Profile/nanobench --match <bench_name> --cpuprofile <output.prof> --ms 1000
Heap Profiling
Use the --memprofile flag to specify an output prefix. The heap profiler will produce snapshots as the program runs and at the end.
$ ./out/Profile/nanobench --match <bench_name> --memprofile <output.heap>
...
Dumping heap profile to output.heap.0001.heap
...
Dumping heap profile to output.heap.0002.heap
Analysis
Use the pprof tool to visualize the results.
Web Interface
Graph (using Graphviz)
The CPU graph shows how much time was spent with each function on the call stack. This can help identify potential bottlenecks.
$ pprof -web ./out/Profile/nanobench <output.prof>
The alloc_space heap graph shows how much memory each function allocated on the heap over the entire run, including memory that was later freed. This can identify where excess memory was allocated.
$ pprof -alloc_space -web ./out/Profile/nanobench output.heap.0005.heap
Without -alloc_space, pprof shows only live bytes (memory that had not yet been freed when the snapshot was taken). Any of the heap files will work, but the latest one is usually the most useful.
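Because the heap profiler numbers its snapshots, picking the latest one can be scripted. The sketch below assumes the naming shown in the nanobench output (PREFIX.NNNN.heap with a four-digit, zero-padded sequence number); latest_heap is a hypothetical helper name. The shell sorts glob matches, and zero-padded numbers of equal width sort correctly as text, so the last match is the newest snapshot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest-numbered heap snapshot for PREFIX.
# Assumes snapshots are named PREFIX.NNNN.heap (four zero-padded digits).
latest_heap() {
  local prefix="$1" latest="" f
  for f in "$prefix".[0-9][0-9][0-9][0-9].heap; do
    [ -e "$f" ] || continue   # the pattern matched nothing
    latest="$f"               # matches are sorted, so the last one wins
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

For example: $ pprof -alloc_space -top ./out/Profile/nanobench "$(latest_heap output.heap)"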
Annotated Source
pprof can show how much time was spent on individual lines of code, down to individual assembly instructions. Because the compiler reorders instructions, this attribution isn’t perfect (see Tips below). Large heap allocations can also distort which lines are blamed for the time.
$ pprof -weblist <function> ./out/Profile/nanobench <output.prof>
# Pick a function you want to zoom in on. You can run the command without
# providing a function (or function regex), but the output is quite noisy.
-weblist works the same way for heap profiles. With -alloc_space, you’ll see how much memory was allocated in total on each line.
$ pprof -alloc_space -weblist <function> ./out/Profile/nanobench output.heap.0005.heap
Flame Graphs
As an alternative to the web graph, pprof can show a flame graph. For Googlers, the flame graph is created and uploaded to an internal tool, which makes it easy to share with coworkers or attach to bugs.
$ pprof -flame ./out/Profile/nanobench <output.prof>
$ pprof -alloc_space -flame ./out/Profile/nanobench output.heap.0005.heap
Command Line
If you don’t want to use the web UI, you can perform quick analysis directly in your terminal.
Top Functions
See where the most “flat” time is spent.
$ pprof -top ./out/Profile/nanobench <output.prof>
See which functions allocate the most bytes in total (including memory that was later freed).
$ pprof -alloc_space -top ./out/Profile/nanobench output.heap.0005.heap
See which functions allocate the most objects (rather than bytes).
$ pprof -alloc_objects -top ./out/Profile/nanobench output.heap.0005.heap
Annotated Source
Print annotated source code for a specific function.
$ pprof -list <function_name> ./out/Profile/nanobench <output.prof>
$ pprof -alloc_space -list <function_name> ./out/Profile/nanobench output.heap.0005.heap
$ pprof -alloc_objects -list <function_name> ./out/Profile/nanobench output.heap.0005.heap
Comparing Profiles (Diffing)
Comparing two profiles is a reliable way to verify an optimization or find a memory leak. See the official pprof documentation for details.
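As a starting point, the helpers below sketch the two common comparisons. They assume the Go-based pprof (github.com/google/pprof), where -diff_base reports a newer profile relative to a baseline and -base subtracts an earlier snapshot from a later one; if your pprof is the perl-based google-pprof, its equivalent is --base. The helper names and profile file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the Go-based pprof.

# cpu_diff BINARY BEFORE AFTER
# Compares a CPU profile of a change (AFTER) against a baseline (BEFORE);
# negative values mean time saved relative to the baseline.
cpu_diff() {
  pprof -top -diff_base="$2" "$1" "$3"
}

# heap_growth BINARY EARLY LATE
# Subtracts an earlier heap snapshot from a later one taken in the same run,
# so functions whose live memory keeps growing (possible leaks) stand out.
heap_growth() {
  pprof -inuse_space -top -base="$2" "$1" "$3"
}
```

For example: $ cpu_diff ./out/Profile/nanobench before.prof after.prof, or $ heap_growth ./out/Profile/nanobench output.heap.0001.heap output.heap.0005.heap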
Tips
- Instruction Drifting: If samples appear on the wrong line (e.g. an if statement that should take zero time), it may be due to the compiler reordering instructions. Use -Og to minimize this.