Linux user- and kernel-space probes - uprobes and kprobes - are good examples of technologies that seem like magic to those who don’t know how they work. Introduced to Linux quite a while ago (in kernels 3.5 and 2.6.9, respectively), both remain not widely known, partly due to their specialized nature and partly because neither is particularly easy to use. Let’s try to figure out what they are and how they work together, based on a number of papers and some kernel documentation. We will start with a short example of how these technologies can be used via modern tooling, and later dive deep into the implementation details hidden behind the command-line facade.

KProbes and UProbes

Both kprobes and uprobes attach a probe - a breakpoint plus a handler function - to code which is, or will be, executing in kernel space (that’s what the “k” in kprobes stands for) or in user space. During execution, the probe gets access to the CPU registers, to the kernel and user stacks (if the kernel is running in a process context) and to memory, which allows the handler to read and report (and sometimes modify) the available data.

Probes make it possible to dynamically and non-intrusively instrument kernel and user code, and are often used by:

  • Kernel developers for debugging and fault injection on a running kernel.
  • System engineers and software developers to collect performance metrics and traces on a running system.

This kind of instrumentation is called Dynamic Tracing, where “dynamic” means that there is no fixed tracepoint in the target system. It does not rely on tracepoints placed by developers at strategic spots in the code; instead, it uses knowledge of the internals of the binary image of the program or kernel to intercept execution, read (or write) the data, and then let the system continue running.

kprobes and uprobes are fast. Compared to other technologies like ptrace, they are orders of magnitude faster [6], making it possible to run instrumentation with low overhead in production environments.

There are relatively few sources of information dedicated to *probes, the most reliable being the kernel source code itself. A lot of insight, though, can be gained from [5], [6] and [7].

Examples

Capturing Bash Interactive Shells

First, let’s start with a popular example. Running just the following one-liner gives us access to all bash commands executed in interactive shells on the system, without modifying any binaries or otherwise altering the system. It does, of course, require elevated privileges:

$ trace 'r:bash:readline "%s" retval'
PID    TID    COMM         FUNC             -
6468   6468   bash         readline         Hello World
6468   6468   bash         readline         git diff
6468   6468   bash         readline         exit

This command attaches a probe (more on how this works later) to the readline function in all bash processes, triggered on function return, and prints the value returned by that function as a string. Conveniently, readline returns exactly the line that was just entered.

Monitoring Python’s Dynamic Module Loading

Another, slightly more involved example lets us look under the hood of Python loading dynamic modules, while we execute a simple command in the Python interactive shell:

$ python
>>> import numpy

Here we attach two separate probes to the same function, _PyImport_LoadDynamicModule from the libpython library, on function entry and on function return, printing different information for each:

  • On function entry, the first two arguments are printed as strings.
  • On function return, the return value is printed as a hexadecimal number.

This shows us all dynamic Python modules being loaded in real time, along with their addresses in the target processes’ address spaces.

$ trace 'python:_PyImport_LoadDynamicModule "name: %s path: %s" arg1, arg2' \
        'r:python:_PyImport_LoadDynamicModule "at 0x%x" retval'
PID    TID    COMM         FUNC             -
16183  16183  python       _PyImport_LoadDynamicModule name: readline path: /usr/lib/python2.7/lib-dynload/readline.x86_64-linux-gnu.so
16183  16183  python       _PyImport_LoadDynamicModule at 0xcc11328
16183  16183  python       _PyImport_LoadDynamicModule name: numpy.core.multiarray path: /usr/local/lib/python2.7/dist-packages/numpy/core/multiarray.so
16183  16183  python       _PyImport_LoadDynamicModule at 0xcbb47c0
...

BCC Toolkit

Both commands use the bcc toolkit [1], which employs a stack of powerful technologies, including eBPF [2] [3] and uprobes, to give us efficient, on-demand access to the information we need. Each of these technologies deserves (and sometimes has [4]) a dedicated blog post, but I want to focus on *probes for now.

Implementation

Every probe is attached to an instruction residing in a .text segment of the program. A probe has to be registered, after which it starts running, and later unregistered, to stop the instrumentation and release the acquired resources. Finally, the data that probe handlers store or output often has to be post-processed, since probe execution is usually optimized to keep the handler’s work and I/O activity to a minimum.
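
As a concrete illustration of this lifecycle, here is a minimal sketch of a kernel module that registers a kprobe on load and unregisters it on unload, using the C API from linux/kprobes.h (the probed symbol, do_sys_open, and the x86 register name are just example picks):

#include <linux/module.h>
#include <linux/kprobes.h>

/* Fires every time the probed instruction is hit. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    pr_info("kprobe hit: %s, ip = %lx\n", p->symbol_name, regs->ip);
    return 0;
}

static struct kprobe kp = {
    .symbol_name = "do_sys_open",
    .pre_handler = handler_pre,
};

static int __init probe_init(void)
{
    return register_kprobe(&kp);    /* probe starts firing on success */
}

static void __exit probe_exit(void)
{
    unregister_kprobe(&kp);         /* stop instrumentation, free resources */
}

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");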

Probe Execution

Probe (de)registration, probe handlers and data processing are normally provided by instrumentation routines, which can be functions in another process or separate processes/utilities. A probepoint can be set virtually anywhere (with a few minor exceptions), including in the instrumentation routines themselves.

There are several types of probes available, which can be categorized into the following two buckets:

  • Instruction Probes - probes that can be attached to virtually any instruction in a kernel or user binary.
  • Function Return Probes - probes that are executed when a given function returns, allowing access to the return value.

When talking about kprobes, another type of probe is often mentioned - function entry probes (look for jprobes). Even though they might seem like a third category, they are a particular case of instruction probes: attached specifically to a kernel function’s entry point, they provide convenient access to its arguments, but require the probe handler to match the probed function’s signature.
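
To make the signature-matching requirement concrete, here is a minimal jprobe sketch following the old Documentation/kprobes.txt example (jprobes have since been deprecated and removed from mainline kernels, so this is mostly of historical interest):

#include <linux/module.h>
#include <linux/kprobes.h>

/* The handler mirrors the prototype of the probed function (here the
 * classic do_fork), so the arguments arrive exactly as the probed
 * function would receive them. */
static long my_do_fork(unsigned long clone_flags, unsigned long stack_start,
                       unsigned long stack_size, int __user *parent_tidptr,
                       int __user *child_tidptr)
{
    pr_info("do_fork: clone_flags = 0x%lx\n", clone_flags);
    jprobe_return();    /* mandatory: hands control back to jprobes */
    return 0;           /* never reached */
}

static struct jprobe my_jprobe = {
    .entry = my_do_fork,
    .kp = { .symbol_name = "do_fork" },
};

/* register_jprobe(&my_jprobe) / unregister_jprobe(&my_jprobe) go into
 * the module’s init/exit, just like with a plain kprobe. */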

Instruction Probes

Let’s try to make sense of how probe execution happens. To keep the focus on user space, we will look at uprobes, but most of the concepts are identical to those in kprobes. First, a high-level overview:

Instruction Probe Execution

Establishing a Probepoint

Given an executable image, an instruction address and a probe handler address, uprobes first copies the original instruction into a so-called SSOL area (SSOL stands for Single-Stepping Out-of-Line) and allocates its internal data structures.

Then the opcode of the original instruction is overwritten with a breakpoint instruction (int3 on x86). Note that this corrupts the original instruction, which can be several bytes long (this is why a copy of it is saved first). Once this is done, the probe is armed: the handler will run every time the target process hits the breakpoint.
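
Conceptually, arming a probepoint boils down to saving a copy of the instruction and patching in the breakpoint opcode. The sketch below is illustrative only - the real code (see uprobe_write_opcode in kernel/events/uprobes.c) writes through the target process’ page tables rather than with a plain store:

#include <stdint.h>
#include <string.h>

#define BREAKPOINT_INSN 0xCC    /* int3 opcode on x86 */
#define MAX_INSN_BYTES  16      /* longest possible x86 instruction */

struct probepoint {
    uint8_t *vaddr;             /* address of the probed instruction */
    uint8_t  saved_opcode;      /* original first byte, kept for removal */
};

static void arm_probepoint(struct probepoint *pp, uint8_t *ssol_slot)
{
    /* 1. Save a full copy of the original instruction into its SSOL
     *    slot, so it can be single-stepped out of line later. */
    memcpy(ssol_slot, pp->vaddr, MAX_INSN_BYTES);

    /* 2. Replace the first opcode byte with int3; from now on the CPU
     *    traps into the kernel whenever execution reaches vaddr. */
    pp->saved_opcode = pp->vaddr[0];
    pp->vaddr[0] = BREAKPOINT_INSN;
}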

Probe Execution

Once the breakpoint trap is hit, uprobes executes the user-supplied handler functions. They run in kernel mode, in the running process’ context - a mode in which the kernel executes on behalf of a process. This means the handlers have access to both kernel and user-space data structures, and in addition can sleep, which is necessary if memory allocations or I/O are required.
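
Inside the kernel, such a handler is attached through the uprobe_consumer interface declared in linux/uprobes.h. Here is a minimal sketch (the inode/offset pair identifying the probed instruction is assumed to be resolved elsewhere, and the register names are x86):

#include <linux/uprobes.h>
#include <linux/sched.h>

/* Runs in the context of the trapped task, so `current`, its
 * registers and its user-space memory are all accessible. */
static int my_handler(struct uprobe_consumer *uc, struct pt_regs *regs)
{
    pr_info("uprobe hit in %s, ip = %lx\n", current->comm, regs->ip);
    return 0;
}

static struct uprobe_consumer my_consumer = {
    .handler = my_handler,
};

/* Attach to the instruction at `offset` within the file backing
 * `inode` (e.g. /bin/bash), for every process that maps it:
 *
 *     uprobe_register(inode, offset, &my_consumer);
 */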

After all handlers have executed, the copy of the original instruction is single-stepped out of line. This is necessary because the original instruction has not yet been executed, and its first byte is still overwritten by the breakpoint. Temporarily swapping the breakpoint back for the original opcode is not an option: in a multi-threaded context, another thread could race past the probepoint without hitting it.

This leads to the final stage, where the stack and register state are fixed up if the effect of the original instruction depends on its address. This stage is also handled by uprobes and is implemented separately for each supported architecture.
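
The most common fix-up is adjusting the instruction pointer: after single-stepping, it points just past the copy in the SSOL area rather than just past the original instruction. Here is a conceptual x86 sketch of that correction (the real logic lives in arch/x86/kernel/uprobes.c):

#include <linux/ptrace.h>

/* After single-stepping: the copied instruction executed at xol_vaddr
 * (in the SSOL area), but the task must continue as if it had executed
 * at the original vaddr. */
static void post_xol_fixup(struct pt_regs *regs,
                           unsigned long vaddr,      /* original insn address */
                           unsigned long xol_vaddr)  /* its copy in the SSOL area */
{
    /* %rip points just past the copy; shift it so it points just past
     * the original instruction instead. */
    regs->ip += (long)(vaddr - xol_vaddr);

    /* %rip-relative operands and the return address pushed by `call`
     * need analogous, instruction-specific corrections. */
}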

Here is the entire process once again, with a bit more detail, applied to the bash readline function probed above:

Instruction Probe Execution

Return Probes

Return probes provide a great advantage over plain instruction probes - they give access to the function’s return value. Unfortunately, the implementation gets a bit more involved, as the breakpoint has to fire on the target function’s return instruction, of which there can be more than one.

Understanding return probe behavior is much easier with a working knowledge of the platform’s calling conventions. This page neatly outlines what happens during a __cdecl function call on Intel x86 platforms. Most of it is outside our scope, but let’s dwell a little on the most relevant piece of the puzzle.

x86 Stack and uProbes

After the caller has placed the function parameters on the stack or in registers, the call instruction pushes the %eip instruction pointer onto the stack (pointing at the next byte after the call instruction). It is this saved pointer that uprobes later replaces with its return trampoline:

Return Probe Execution

In short, step by step:

  • First, an instruction probe at function entry is established.
  • When the probe is hit, uprobes sets up the return trampoline: it saves the function’s return address and replaces it with the address of the uprobes trampoline. The trampoline resides in the SSOL area, since it must live in the target process’ address space, and another probe is set on it.
  • Once the trampoline is hit, uprobes executes the user handlers, after which it restores the original return address and lets the function return normally.
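
The kernel-side sibling, kretprobes, uses the same trampoline scheme. Here is a minimal sketch of a return probe grabbing a return value via the kprobes C API (the probed symbol is again an arbitrary example):

#include <linux/module.h>
#include <linux/kprobes.h>

/* Called when the probed function returns, i.e. when the task comes
 * back through the kretprobe trampoline. */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    pr_info("do_fork returned %ld\n", (long)regs_return_value(regs));
    return 0;
}

static struct kretprobe my_kretprobe = {
    .handler = ret_handler,
    .kp = { .symbol_name = "do_fork" },
    .maxactive = 20,    /* how many concurrent invocations to track */
};

/* register_kretprobe(&my_kretprobe) / unregister_kretprobe(&my_kretprobe)
 * go into the module’s init/exit, as before. */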

Again, even though the kprobes and uprobes implementations are different, they are conceptually very similar, so this description largely applies to both.

Probes vs. Debugger

So how exactly is this different from attaching a debugger to a running process? Both examples above could be reproduced by running gdb -x with a series of commands. There are several answers:

  • First, probes are significantly more performant. As mentioned in [6], using gdb implies using ptrace, which in turn guarantees multiple round trips between user and kernel space for almost any action you would like to take (that is why gdb is often perceived as a very slow debugger). The difference grows even larger if you want to attach to events that occur very often and aggregate information, or filter and report infrequent events: in that case all aggregation/filtering can be done in kernel space, with only a small amount of data transferred to user space (see the sketch after this list). This makes probes usable in production environments with negligible performance penalty for the system.
  • Second, in modern kernels uprobes and kprobes come with eBPF [3] support on top, which enables some really cool stuff [1]. In short, one can create programmable tracepoints that provide near-native performance with little risk of harming the running system. Check it out, very cool :sparkles:
  • Third, probes can be attached to all processes/threads on a running system, as well as to kernel functions. This makes them exceptionally powerful compared to gdb: in complex systems, multiple services often run on the same machine and interact, and low-overhead tracing makes it possible to identify issues that would never occur while a single process is being debugged.
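
To make the in-kernel aggregation point concrete, here is a sketch in the same kprobes C API used earlier: the handler only bumps a counter, and the aggregate crosses into the log exactly once, at unload. eBPF-based tools such as bcc implement the same idea with maps; the probed symbol here is an arbitrary pick for a frequent event:

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/atomic.h>

static atomic64_t hits = ATOMIC64_INIT(0);

/* Fires on every probed event, but does the bare minimum:
 * no formatting, no I/O, just one atomic increment in kernel space. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    atomic64_inc(&hits);
    return 0;
}

static struct kprobe kp = {
    .symbol_name = "vfs_read",
    .pre_handler = handler_pre,
};

static int __init agg_init(void) { return register_kprobe(&kp); }

static void __exit agg_exit(void)
{
    unregister_kprobe(&kp);
    pr_info("vfs_read was called %lld times\n",
            (long long)atomic64_read(&hits));
}

module_init(agg_init);
module_exit(agg_exit);
MODULE_LICENSE("GPL");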

Usage and Tooling

As mentioned in the beginning, uprobes and kprobes are not easy to use directly. If you wish to do so, you will have to include either linux/uprobes.h or linux/kprobes.h and use the C API from kernel code. Quoting [5]:

A probe handler can modify the environment of the probed function – e.g., by modifying kernel data structures, or by modifying the contents of the pt_regs struct (which are restored to the registers upon return from the breakpoint). So Kprobes can be used, for example, to install a bug fix or to inject faults for testing. Kprobes, of course, has no way to distinguish the deliberately injected faults from the accidental ones. Don’t drink and probe.

So, using tooling built around these technologies is preferable in most scenarios. There are plenty of candidates around: perf_events, SystemTap, LTTng, bcc and ply. In my opinion, the last two, bcc and ply, deserve a closer look, as they provide truly amazing flexibility with very low overhead thanks to eBPF [3].

Conclusion

I have always been fascinated by complex technologies that seem like magic when they work. uprobes and kprobes are great examples, and here I have only scratched the surface of how powerful they can be, especially in combination with other technologies in the Linux kernel stack. In later posts I will try to look in more detail at other aspects of Dynamic Tracing with uprobes, including more practical examples and, of course, pretty pictures. This is a pretty new world for me, so we can learn about it together. Stay tuned! And don’t forget to go through the references!

References
