Taming :ets for High-Performance Software

By: Mike Binns

One of the more powerful features of the Elixir language when building web applications and other high-performance software is the ability to use the Erlang Term Storage library, aka :ets. :ets can often single-handedly replace entire caching products and strategies in your tech stack. The Elixir core language does not currently ship a wrapper around the :ets library, so users interact directly with the Erlang interface. As with Erlang itself, the developer experience is a bit rough around the edges and can confuse developers who are trying to learn the library.

I worked through these issues recently, and want to share what I learned and the library I put together to help myself and other developers work with :ets. There are many great blog posts on what :ets is and how to use it. This post instead focuses on some of the common issues you might encounter when using :ets, and introduces Ets, a library designed to improve the Elixir developer experience when working with :ets.

Let’s first dive into some of the common pitfalls of :ets.

Creating :ets tables

One often confusing aspect of working with :ets is the experience around creating tables. While Elixir developers are used to keyword lists for options, :ets takes a regular list containing a mix of single atoms (e.g. :private) and key/value tuples (e.g. {:write_concurrency, false}). The format of the options isn’t consistent. Some boolean options, such as :named_table or :compressed, are taken as a single atom flag. Other boolean options, such as {:write_concurrency, true} and {:read_concurrency, true}, are taken as key/value tuples. Non-boolean options are similarly inconsistent. Some are taken as single atoms: for example, the protection level is specified as :private, :protected, or :public. Others, such as {:keypos, 1}, appear as key/value tuples. Finally, :ets.new() takes a table name as its first parameter even when you are creating an unnamed table, in which case the name cannot be used to refer to the table. This mixture of option styles means that even developers who have used :ets for a while often end up back in the documentation when creating a new table.
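As a minimal sketch of the mixed option styles described above, the following creates one unnamed and one named table and reads the settings back with :ets.info/2. The table names (:ignored_name, :example_settings) are placeholders chosen for illustration.

```elixir
# Unnamed table: the first argument is still required, but the table can only
# be reached through the identifier that :ets.new/2 returns.
unnamed =
  :ets.new(:ignored_name, [
    :set,                       # table type: bare atom
    :protected,                 # protection level: bare atom
    :compressed,                # boolean flag: bare atom
    {:read_concurrency, true},  # boolean option: key/value tuple
    {:keypos, 1}                # non-boolean option: key/value tuple
  ])

# Named table: :named_table is a bare flag, and the name itself is returned.
named = :ets.new(:example_settings, [:set, :named_table, {:write_concurrency, true}])

IO.inspect(:ets.info(unnamed, :named_table)) # false
IO.inspect(:ets.info(unnamed, :protection))  # :protected
IO.inspect(:ets.info(unnamed, :keypos))      # 1
IO.inspect(named)                            # :example_settings
```

Because an Elixir keyword list is just a list of two-element tuples, trailing options can also be written as `read_concurrency: true`, but the bare-atom flags still have to be listed separately.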

Bags and sets, ordered and duplicate

There are four different types of :ets tables: :set, :bag, :ordered_set, and :duplicate_bag. What exactly each of these does is a source of confusion even among longtime Elixir developers. In practice, the two bags act pretty much the same as each other most of the time, as do the two sets, and ordered/duplicate probably should have been configuration flags on :set and :bag respectively.

:set and :ordered_set both allow only one record for any single key. Inserting a second record with the same key will overwrite the first record with the second record. The only difference between the two is that :ordered_set keeps the records in term order of the keys. This is useful if you want to use first/last/next/prev, but adds overhead to insert compared to a simple :set, so it should only be used if you definitely need order. Side note: calling those four functions on any other table type walks the table in an unspecified internal order, so the results will look inconsistent, and it can raise an ArgumentError (for example, when next is given a key that has since been deleted). Additionally, even though lookup on a set can only ever return a single record, the function returns that single record in a list (or an empty list if none is found), which has to be taken into account every time you call lookup.
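A short sketch of the behavior described above, using placeholder table names: a second insert with the same key overwrites the first, lookup always returns a list, and an :ordered_set can be walked in key order.

```elixir
set = :ets.new(:example_set, [:set])
:ets.insert(set, {:a, 1})
:ets.insert(set, {:a, 2})                 # same key: replaces {:a, 1}

IO.inspect(:ets.lookup(set, :a))          # [{:a, 2}]
IO.inspect(:ets.lookup(set, :missing))    # []

ordered = :ets.new(:example_ordered, [:ordered_set])
:ets.insert(ordered, [{3, :c}, {1, :a}, {2, :b}])

IO.inspect(:ets.first(ordered))           # 1
IO.inspect(:ets.next(ordered, 1))         # 2
IO.inspect(:ets.last(ordered))            # 3
IO.inspect(:ets.tab2list(ordered))        # [{1, :a}, {2, :b}, {3, :c}]
```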

:bag and :duplicate_bag both allow for multiple records with the same key. The only difference between the two is that :bag does not allow two records where all values in the record are the same (e.g. inserting {:a, :b, :c} twice would result in a single entry in a :bag, but two entries in :duplicate_bag). :bag comes with the implementation overhead of checking for duplicates on insert; so unless you explicitly need to prevent duplicate full records, you should use :duplicate_bag over :bag.
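To make the difference concrete, here is a small sketch (table names are placeholders) that inserts the same three records into each kind of bag and compares the resulting sizes.

```elixir
records = [{:a, :b, :c}, {:a, :b, :c}, {:a, :x, :y}]

bag = :ets.new(:example_bag, [:bag])
dup = :ets.new(:example_duplicate_bag, [:duplicate_bag])

# Insert one record at a time into both tables.
Enum.each(records, fn record ->
  :ets.insert(bag, record)
  :ets.insert(dup, record)
end)

# :bag keeps one copy of the fully identical record; :duplicate_bag keeps both.
IO.inspect(:ets.info(bag, :size))        # 2
IO.inspect(:ets.info(dup, :size))        # 3
IO.inspect(length(:ets.lookup(dup, :a))) # 3
```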

What is a record?

Another thing that confused me about :ets is that I initially heard it described as a key-value store like Redis. This, plus the available examples, had me thinking that one element in the tuple was the key and the other elements were all values associated with that key. I thought that inserting {:a, :b} and then {:a, :c} would result in {:a, :b, :c}, with :a being the key and :b/:c being the two values associated with it. Instead, it’s more like a relational database where each row is a single record, and one of the values in the row (specified by the :keypos option) serves as the key. By default, it’s the first value in the inserted tuple.

For example, if our record is {email, name, phone}, then by default email is the key:

table = :ets.new(:users, [:set])
:ets.insert(table, {"me@example.com", "Mike Binns", "555-867-5309"})
:ets.lookup(table, "me@example.com") # => [{"me@example.com", "Mike Binns", "555-867-5309"}]
:ets.lookup(table, "Mike Binns") # => []

but if we set :keypos to 2, then the name column is the key:

table = :ets.new(:users, [:set, {:keypos, 2}])
:ets.insert(table, {"me@example.com", "Mike Binns", "555-867-5309"})
:ets.lookup(table, "me@example.com") # => []
:ets.lookup(table, "Mike Binns") # => [{"me@example.com", "Mike Binns", "555-867-5309"}]

As an added (somewhat confusing) feature, :ets records don’t all have to be the same size, so you can add {:a, :b, :c} and {:c, :d} to the same table. The only caveat to that is that the record cannot be smaller than the keypos (e.g. {:a} cannot be inserted into a table with {:keypos, 2}), or you will end up with…


ArgumentError

You don’t get far using :ets before you run into your first ArgumentError. There are many ways you can mess up when using :ets, from passing incorrect args, to attempting an operation on a table that doesn’t exist or attempting to create a named table that already exists, to inserting invalid values such as non-tuples or records smaller than the keypos, to attempting lookup_element on a key that doesn’t exist. The issue with :ets is that regardless of what you do wrong, your result is a raised ArgumentError with no additional information. You have to know where to look and what may have caused the error. This is difficult when you are learning :ets and don’t know the common pitfalls. The raise also doesn’t allow the standard Elixir {:ok, value} | {:error, reason} matching, so you have to wrap everything in try/rescue if you want to be safe.
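As a rough sketch of what that wrapping looks like, the hypothetical SafeEts module below (my own illustration, not part of :ets or any library) turns the raised ArgumentError into tagged tuples. Note that every failure collapses to the same reason, because the exception itself carries nothing to distinguish them.

```elixir
defmodule SafeEts do
  # Hypothetical helper for illustration only.

  # Returns :ok, or {:error, :argument_error} if :ets raised.
  def insert(table, record) do
    true = :ets.insert(table, record)
    :ok
  rescue
    ArgumentError -> {:error, :argument_error}
  end

  # Returns {:ok, element}, or {:error, :argument_error} if :ets raised.
  def lookup_element(table, key, position) do
    {:ok, :ets.lookup_element(table, key, position)}
  rescue
    ArgumentError -> {:error, :argument_error}
  end
end

table = :ets.new(:example_errors, [:set, {:keypos, 2}])

IO.inspect(SafeEts.insert(table, {:a, :b}))              # :ok
IO.inspect(SafeEts.insert(table, {:a}))                  # {:error, :argument_error}
IO.inspect(SafeEts.insert(:no_such_table, {:a, :b}))     # {:error, :argument_error}
IO.inspect(SafeEts.lookup_element(table, :b, 1))         # {:ok, :a}
IO.inspect(SafeEts.lookup_element(table, :missing, 1))   # {:error, :argument_error}
```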

:"$end_of_table"

Another interesting quirk of :ets is the :"$end_of_table" atom. The dollar sign in the atom necessitates quotes, hence the awkward :"$ at the beginning of the atom. This atom shows up as a return value in a number of calls, including first, last, next, and prev. It is also returned by the paginated forms of match (those that take a limit or a continuation), either solo (if there are no more rows that match) or as part of a tuple (if the current page of results is the last page and isn’t a full page). When working with any of the functions that may return it, you have to specifically check for the atom in your pattern matches, or risk passing it on to the rest of your code as if it were a real key or result.
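Here is a minimal sketch of checking for the atom explicitly while walking a table with first and next. The Traverse module and table names are hypothetical, written only for this illustration.

```elixir
defmodule Traverse do
  # Hypothetical helper: collects every key by walking first/next.
  def keys(table), do: collect(table, :ets.first(table), [])

  # The sentinel atom ends the walk; without this clause it would be
  # treated as just another key.
  defp collect(_table, :"$end_of_table", acc), do: Enum.reverse(acc)
  defp collect(table, key, acc), do: collect(table, :ets.next(table, key), [key | acc])
end

letters = :ets.new(:example_letters, [:ordered_set])
:ets.insert(letters, [{:b, 2}, {:a, 1}, {:c, 3}])
empty = :ets.new(:example_empty, [:ordered_set])

IO.inspect(Traverse.keys(letters))  # [:a, :b, :c]
IO.inspect(Traverse.keys(empty))    # []
IO.inspect(:ets.first(empty))       # :"$end_of_table"
IO.inspect(:ets.next(letters, :c))  # :"$end_of_table"
```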

Introducing the Ets library

When I began working with :ets, I didn’t get far before looking for a nice Elixir wrapper. Unfortunately, the handful of existing wrappers were limited in scope and didn’t address the issues I was dealing with, so the idea of writing a more comprehensive wrapper came up. One of the many benefits of working at DockYard is that client work is done Monday through Thursday, and Fridays are “DockYard Days,” during which DockYarders can work on things like professional development, mentoring, or Open Source contributions. The Elixir Library Ets is the result of a number of my DockYard Days over the past months. The design goals for the Ets library, outlined in the README, are listed below. As you can see, Ets is designed to eliminate or avoid the pitfalls I have described in this post.

From Ets README.md

The purpose of this package is to improve the developer experience when both learning and interacting with Erlang Term Storage.

This will be accomplished by:

  • Conforming to Elixir standards:

    • Two versions of all functions:

      • Main function (e.g. get) returns {:ok, return}/{:error, reason} tuples.
      • Bang function (e.g. get!) returns unwrapped value or raises on :error.
    • All options specified via keyword list.
  • Wrapping unhelpful ArgumentError’s with appropriate error returns.

    • Avoid adding performance overhead by using try/rescue instead of pre-validation
    • On rescue, try to determine what went wrong (e.g. missing table) and return appropriate error
    • Fall back to {:error, :unknown_error} (logging details) if unable to determine reason.
  • Appropriate error returns/raises when encountering $end_of_table.
  • Providing Elixir friendly documentation.
  • Providing Ets.Set and Ets.Bag modules with appropriate function signatures and error handling.

    • Ets.Set.get returns a single item (or nil/provided default) instead of list as sets never have multiple records for a key.
  • Providing abstractions on top of the two base modules for specific usages

    • Ets.Set.KeyValueSet abstracts away the concept of tuple records, replacing it with standard key/value interactions.
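To illustrate a few of these goals together, here is a rough sketch written against raw :ets. It is not the Ets library's implementation or API; the SetSketch module, its get function, and the :table_not_found reason are names I made up for this example. It shows the rescue-then-diagnose approach and a set lookup that returns a single record or a default instead of a list.

```elixir
defmodule SetSketch do
  # Hypothetical illustration only; not code from the Ets library.

  # Returns {:ok, record}, {:ok, default} when the key is absent,
  # or {:error, reason} when :ets raised.
  def get(table, key, default \\ nil) do
    case :ets.lookup(table, key) do
      [] -> {:ok, default}
      [record] -> {:ok, record}
    end
  rescue
    ArgumentError ->
      # No pre-validation: only after a failure do we ask what went wrong.
      case :ets.info(table) do
        :undefined -> {:error, :table_not_found}
        _ -> {:error, :unknown_error}
      end
  end
end

users = :ets.new(:example_users, [:set])
:ets.insert(users, {"me@example.com", "Mike Binns"})

IO.inspect(SetSketch.get(users, "me@example.com"))        # {:ok, {"me@example.com", "Mike Binns"}}
IO.inspect(SetSketch.get(users, "nobody@example.com"))    # {:ok, nil}
IO.inspect(SetSketch.get(:no_such_table, "me@example.com")) # {:error, :table_not_found}
```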

Try it out

You can add Ets to your Elixir project by adding {:ets, "~> 0.6.0"} to the dependencies in your mix.exs. Check Hex for the latest published version. The documentation is available on HexDocs. Please take a look, give it a shot, and let me know how it goes.
