Debugging Elixir NIFs with LLDB

Paulo Valente

Software Engineer



Elixir can call compiled code from languages like C or Rust through Native Implemented Functions (NIFs). This article is a brief introduction to how to connect to your Erlang VM instance with LLDB, as well as a brief cheat sheet of useful LLDB commands.

What is LLDB

LLDB is a debugger that lets a developer step through compiled code as it executes, as well as explore stack traces whenever a system exception is reached. Its name is a play on words combining Low-Level Virtual Machine (LLVM) and DB (which in this context means “debugger”). For those interested, the same process applies to the GNU Debugger (GDB), albeit with different commands, and there is a command mapping on LLDB’s website.

Pre-Requisites for Following This Guide

You’ll need to have Elixir and Erlang installed (both are most easily installed via asdf). This guide used Elixir 1.15.4 and Erlang 26.0.2, but the concepts should apply regardless of version.

For building the NIFs, make and g++ (or an equivalent C++ compiler) should be available. The installation for each of these differs quite a bit depending on the platform and environment setup, so it may be helpful to look up instructions.

Creating and Understanding the Code

Before we dive into LLDB, let’s prepare our example code. The instructions below will help us set up a Mix project that provides a natively implemented upcase/1 function. Along the way, we’ll also understand more about how NIFs are compiled and mapped onto Elixir functions.

First, we’ll generate a standard Mix project with mix new nif_example. Then, add the following files inside the project we just generated. Each file will be followed by its purpose in the setup.

nif_example/Makefile:

# Environment variables passed via elixir_make
# ERTS_INCLUDE_DIR
# MIX_APP_PATH

CFLAGS= -fPIC -I$(ERTS_INCLUDE_DIR) -Wall -w

ifdef DEBUG
    CFLAGS += -g
endif

LDFLAGS = -shared -flat_namespace -undefined suppress

priv/libnifexample.so: cache/objs/my_nif.o
                @mkdir -p priv
                $(CXX) $^ -o $@ $(LDFLAGS)

cache/objs/%.o: c_src/%.cc
                @mkdir -p cache/objs
                $(CXX) $(CFLAGS) -c $< -o $@

The Makefile is how we use make to build a project.

There is quite a bit to understanding the Makefile, most of which isn’t relevant here. For brevity, we can think of it as the build script that calls the C++ compiler and produces priv/libnifexample.so, the shared-object (.so) library file from which Elixir loads the NIF implementations.

nif_example/c_src/my_nif.h:

#pragma once

#include "erl_nif.h"

ERL_NIF_TERM upcase(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]);

my_nif.h is the C++ header file, which declares the upcase function. In this case, because there’s only one .cc file, we could omit it, but for completeness, it has been included.

nif_example/c_src/my_nif.cc:

#include "my_nif.h"

ERL_NIF_TERM upcase(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
  if (argc != 1) {
    return enif_make_tuple2(env, enif_make_atom(env, "error"), enif_make_atom(env, "invalid_arg_count"));
  }

  ErlNifBinary bin;
  if (!enif_inspect_binary(env, argv[0], &bin)) {
    return enif_make_tuple2(env, enif_make_atom(env, "error"), enif_make_atom(env, "invalid_argument"));
  }

  ErlNifBinary out_bin;
  // The call below is commented out to introduce an artificial
  // segmentation fault error source.
  // enif_alloc_binary(bin.size, &out_bin);

  for (int i = 0; i < bin.size; i++) {
    char c = bin.data[i];
    if (c >= 'a' && c <= 'z') {
      out_bin.data[i] = c - 'a' + 'A';
    } else {
      out_bin.data[i] = c;
    }
  }

  return enif_make_tuple2(env, enif_make_atom(env, "ok"), enif_make_binary(env, &out_bin));
}

static int load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info) {
  return 0;
}

static ErlNifFunc nif_funcs[] = {
    {"upcase", 1, upcase}};

ERL_NIF_INIT(Elixir.NifExample, nif_funcs, &load, NULL, NULL, NULL);

my_nif.cc is the main native source file. It includes the upcase function, and the declaration of the NIF functions the library provides. This declaration is done through ERL_NIF_INIT, which ultimately maps an Elixir name and arity to the corresponding C++ function.
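To make the idea of mapping a name and arity to a native function more concrete, here is a minimal, self-contained sketch of such a lookup table in plain C++. It is only a conceptual model: FuncEntry, Handler, find_func, and the two sample functions are hypothetical names made up for this illustration, and the sketch makes no claim about how the Erlang runtime resolves NIFs internally.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a native function signature.
using Handler = int (*)(int);

// Hypothetical table entry: a name, an arity, and a function pointer,
// loosely mirroring the {"upcase", 1, upcase} entry in nif_funcs.
struct FuncEntry {
  const char* name;
  unsigned arity;
  Handler fptr;
};

static int double_it(int x) { return 2 * x; }
static int negate(int x) { return -x; }

static const FuncEntry funcs[] = {
    {"double", 1, double_it},
    {"negate", 1, negate}};

// Returns the function registered under name/arity, or nullptr if none matches.
Handler find_func(const char* name, unsigned arity) {
  assert(name != nullptr);
  for (const FuncEntry& entry : funcs) {
    if (std::strcmp(entry.name, name) == 0 && entry.arity == arity) {
      return entry.fptr;
    }
  }
  return nullptr;
}
```

Looking up "double" with arity 1 yields a callable pointer, while the same name with arity 2 yields nullptr. This is the same reason the Elixir stub and the native entry must agree on both name and arity.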

As a brief introduction to the native side of NIFs, let’s understand the three arguments that all NIFs receive: env, argc and argv. A more thorough explanation is found in the official docs.

ErlNifEnv* env points to a special data structure: an environment that acts as the communication pipeline between the native side and the Erlang/Elixir side of the execution workflow.

const ERL_NIF_TERM argv[] is an array of ERL_NIF_TERM values, which is how the native side receives any value passed as an argument. Generally, these are decoded by special functions (such as enif_inspect_binary above), and all NIFs also return an ERL_NIF_TERM value. int argc is the length of this array; it is passed separately because C arrays do not carry their own length. This pair is analogous to the main(int argc, char **argv) usually present in C/C++.
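The argc/argv pairing can be seen in isolation with a small sketch. It assumes a hypothetical Term alias (plain integers standing in for ERL_NIF_TERM, which is opaque in real NIF code) and a made-up sum_terms function; neither is part of erl_nif.h.

```cpp
#include <cassert>

// Hypothetical stand-in for ERL_NIF_TERM; real terms are opaque handles
// that must be decoded with enif_* functions rather than used directly.
using Term = unsigned long;

// An array parameter decays to a pointer, so the callee cannot recover
// its length. The caller therefore passes the element count as argc.
Term sum_terms(int argc, const Term argv[]) {
  assert(argc >= 0);
  Term total = 0;
  for (int i = 0; i < argc; i++) {
    total += argv[i];
  }
  return total;
}
```

Calling sum_terms(3, args) with args holding 1, 2, and 3 returns 6. Passing a larger argc than the array actually holds would read past its end, which is why a NIF like upcase checks argc before touching argv[0].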

nif_example/lib/nif_example.ex:

defmodule NifExample do
  @on_load :__on_load__

  def __on_load__ do
    # We refer to `:code.priv_dir` indirectly because at runtime, the `priv` dir
    # is not necessarily in the same path as `./priv` -- as an exercise,
    # try running it via `iex -S mix` and check the returned path!
    path = :filename.join(:code.priv_dir(:nif_example), ~c"libnifexample")
    :erlang.load_nif(path, 0)
  end

  def upcase(_binary), do: :erlang.nif_error(:undef)
end

The NifExample Elixir module is where we declare the NIF functions (or function, in this case), using the @on_load module callback to call :erlang.load_nif and load the declarations. Stub implementations must be provided so that the Elixir compiler knows which functions to connect to the NIF declarations.

Next, let’s add the following to the project’s mix.exs:

...

def project do
  [
    ...
    compilers: [:elixir_make | Mix.compilers()]
    ...
  ]
end

...

def deps do
  [
    {:elixir_make, "~> 0.8"}
  ]
end

...

Finally, we must compile with DEBUG=1 mix compile, or export DEBUG=1 before compiling, so that the Makefile includes the -g flag. The -g flag adds debug information to the produced .so file, which allows LLDB to reference function names and file locations for each symbol, ultimately enabling a more complete debugging experience.

Now that the project is ready, we can run NifExample.upcase("my string") in IEx to get our upcased string. However, the error we introduced in the code causes a segmentation fault, which crashes the VM and leaves no room for us to debug in Elixir-land.

Enter LLDB

LLDB, when attached to a process, can capture code backtraces, which are similar to Elixir’s stack traces. This means that we can figure out at least where our code is going wrong in NIF land.

To attach LLDB to a BEAM instance, we first start our IEx shell and use System.pid to obtain the host OS PID for the BEAM instance. Let’s say it outputs 133742. In a separate shell, we can then use sudo lldb --attach-pid 133742 to run LLDB and attach it to the BEAM process.

Notice that as soon as LLDB is done setting up, the BEAM process is frozen. This is because the debugger puts the process on hold until we use the continue command, which resumes execution.

Now, if we call NifExample.upcase("my string") again, we’ll see that LLDB reacts to the crash with a dump like this one:

(lldb) continue
Process 72797 resuming
Process 72797 stopped
* thread #13, name = 'erts_sched_9', stop reason = EXC_BAD_ACCESS (code=1, address=0xb5d5b)
    frame #0: 0x00000001005a2168 beam.smp`erts_build_proc_bin + 8
beam.smp`erts_build_proc_bin:
->  0x1005a2168 <+8>:  ldr    x8, [x2, #0x10]
    0x1005a216c <+12>: str    x8, [x1, #0x8]
    0x1005a2170 <+16>: ldr    x8, [x0]
    0x1005a2174 <+20>: str    x8, [x1, #0x10]
Target 0: (beam.smp) stopped.

We can use bt to get a more complete backtrace, which, as seen below, points us to my_nif.cc:25:59.

(lldb) bt
* thread #13, name = 'erts_sched_9', stop reason = EXC_BAD_ACCESS (code=1, address=0xb5d5b)
  * frame #0: 0x00000001005a2168 beam.smp`erts_build_proc_bin + 8
    frame #1: 0x0000000100615efc beam.smp`enif_make_binary + 660
    frame #2: 0x0000000103703e30 libnifexample.so`upcase(env=0x000000017049ad38, argc=1, argv=0x000000017049ae40) at my_nif.cc:25:59
    frame #3: 0x000000010048fb3c beam.smp`beam_jit_call_nif(process*, void const*, unsigned long*, unsigned long (*)(enif_environment_t*, int, unsigned long*), erl_module_nif*) + 100
    frame #4: 0x00000001015b0afc

This means the error happens on line 25, column 59: the call to enif_make_binary. In this case, after some investigation and documentation diving, we can conclude that the issue is that we’re operating on an unallocated binary. Of course, the conclusion is easier to reach here because we introduced the error ourselves by commenting out the enif_alloc_binary call, as noted earlier.

Breakpoint Debugging with LLDB

For the sake of an example, let’s say we couldn’t find the error. First, let’s restart everything. We can use exit or CTRL-D to exit the LLDB shell at any time. We can then repeat the lldb --attach-pid command with the new PID. Another option is to use detach on the LLDB shell, restart the BEAM, get the new PID, and attach <pid>, again in the LLDB shell.

Now, before running the Elixir code again, let’s set a breakpoint in LLDB with b my_nif.cc:25. Once the breakpoint is hit, we can use frame variable to list the available variables and their values, and then either continue to run until the next breakpoint, or n/s to step through the code: n steps over function calls, while s steps into them. Running help inside the LLDB shell is a great way of finding out what each command does and discovering new commands. help <command> explains each one in more detail.

With frame variable, we get the output below:

(ErlNifEnv *) env = 0x000000016d3d2d38
(int) argc = 1
(const ERL_NIF_TERM *) argv = 0x000000016d3d2e40
(ErlNifBinary) bin = {
  size = 4
  data = 0x0000000140b34020 "ASDF"
  ref_bin = 0x0000000000000000
  __spare__ = ([0] = 0x00000001030e4a08, [1] = 0x0000000140ca0378)
}
(ErlNifBinary) out_bin = {
  size = 6127693120
  data = 0x0000000140d60b30 "ASDF"
  ref_bin = 0x00000000000b5d4b
  __spare__ = ([0] = 0x000000010750c462, [1] = 0x0000000000000002)
}

This output shows us that out_bin, although it appears to hold the proper data, has an incorrect size. This is because out_bin was never initialized: enif_alloc_binary is what properly sets up the struct, and without that call its fields hold whatever happened to be in memory. We can detach from the process, fix the code, and then attach to a new BEAM instance. LLDB will still have the breakpoint, so we can just run as normal, and get the new frame variable output with the corrected out_bin variable:

(lldb) frame variable
(ErlNifEnv *) env = 0x000000016dbe6d38
(int) argc = 1
(const ERL_NIF_TERM *) argv = 0x000000016dbe6e40
(ErlNifBinary) bin = {
  size = 4
  data = 0x0000000106c655d0 "ASDF\U00000001"
  ref_bin = 0x0000000000000000
  __spare__ = ([0] = 0x0000000102af8a08, [1] = 0x0000000130cc0668)
}
(ErlNifBinary) out_bin = {
  size = 4
  data = 0x0000000106c65610 "ASDF"
  ref_bin = 0x0000000106c655f8
  __spare__ = ([0] = 0x0000000105c9b922, [1] = 0x0000000000000002)
}

Another useful command is p <expr>, which evaluates simple C expressions in the current context. It is also helpful for inspecting nested values, such as p out_bin.size or p bin.data in the code at hand:

(lldb) p bin.data
(unsigned char *) 0x0000000106c655d0 "ASDF\U00000001"
(lldb) p out_bin.size
(size_t) 4

The fixed code now behaves as expected:

iex(1)> NifExample.upcase("asdf")
{:ok, "ASDF"}
iex(2)> NifExample.upcase("ASDF")
{:ok, "ASDF"}
iex(3)> NifExample.upcase(1)
{:error, :invalid_argument}
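For reference, the shape of the fix (allocate the output before writing to it, and bail out if allocation fails) can be sketched without erl_nif.h. This is a minimal sketch under stated assumptions: Binary, alloc_binary, upcase_into, and upcase_string are hypothetical stand-ins written for this illustration, with Binary mimicking only the size and data fields of ErlNifBinary and alloc_binary mimicking the success/failure return of enif_alloc_binary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for ErlNifBinary: only the fields the loop touches.
struct Binary {
  std::size_t size;
  unsigned char* data;
};

// Hypothetical stand-in for enif_alloc_binary: true on success, false on failure.
bool alloc_binary(std::size_t size, Binary* bin) {
  assert(bin != nullptr);
  // Request at least one byte so an empty input still gets a valid pointer.
  void* mem = std::malloc(size > 0 ? size : 1);
  if (mem == nullptr) return false;
  bin->data = static_cast<unsigned char*>(mem);
  bin->size = size;
  return true;
}

// Upcases ASCII letters from `in` into a freshly allocated `out`.
// Allocation happens before any write, which is the step the buggy NIF skipped.
bool upcase_into(const unsigned char* in, std::size_t in_size, Binary* out) {
  if (!alloc_binary(in_size, out)) return false;
  for (std::size_t i = 0; i < in_size; i++) {
    unsigned char c = in[i];
    if (c >= 'a' && c <= 'z') {
      out->data[i] = static_cast<unsigned char>(c - 'a' + 'A');
    } else {
      out->data[i] = c;
    }
  }
  return true;
}

// Convenience wrapper for trying the sketch with std::string.
std::string upcase_string(const std::string& s) {
  Binary out{0, nullptr};
  const unsigned char* in = reinterpret_cast<const unsigned char*>(s.data());
  if (!upcase_into(in, s.size(), &out)) return std::string();
  std::string result(reinterpret_cast<const char*>(out.data), out.size);
  std::free(out.data);  // the sketch owns the buffer, so it frees it here
  return result;
}
```

With this sketch, upcase_string("asdf") returns "ASDF". In the article’s NIF, the equivalent change is to restore the enif_alloc_binary(bin.size, &out_bin) call before the loop and return an error tuple if it reports failure.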
