How to solve a real machine learning problem with Nx

NOTE: Before reading, I highly recommend checking out this post, which serves as an introduction to Nx and the Nx ecosystem.

Introduction

Numerical Elixir (Nx) is a numerical computing library for Elixir. Functionally, that means you don’t need to leave the Elixir ecosystem or install different languages and frameworks such as Python or TensorFlow to solve some machine learning problems like supervised learning with regression or classification.

As a relatively new library, Nx also has some limitations. For example, if you’re coming from the Python universe, keep in mind that Nx is not something like scikit-learn, with its large catalog of ready-to-use machine learning algorithms. Instead, it’s closer to NumPy: a general-purpose library with functions you can use to implement algorithms from scratch.

Today we’ll tackle a project to let you dip your toes into what Nx can do. I’ll also share some strategies for splitting data between training and testing, as well as reading data from a CSV file. These operations are quite common as part of a machine learning workflow.

The Project

When you start studying machine learning, linear regression is probably the best place to begin. You can build a powerful algorithm from a few basic statistical concepts, and along the way you pick up many foundational ideas that carry over to other machine learning algorithms.

Linear regression analysis is used to predict the value of a variable based on the value of another variable by describing their relationship with the help of a straight line. For our example here, we’ll use Nx to create a linear regression model to predict the likely fuel efficiency of an engine based on its horsepower.
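
In symbols, using the same w and b names as the code later in this post, the straight line and the error it is fitted against can be written as below. Here x is the horsepower, y is the fuel economy, and the index i and the row count n are notation I am introducing for illustration.

```latex
\hat{y}_i = w x_i + b
\qquad
\mathrm{MSE}(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Fitting the model means finding the w and b that make the MSE as small as possible.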

The implementation we’re using takes a similar approach to what you would use if you were using Python. We’re taking this route to make the workflow easier to understand. Pretty much all articles and courses available online are based on Python, so whether you already have previous experience with machine learning in Python and want to translate that to Elixir, or you’re brand new to the topic, our example today is a good place to start (and you’ll be able to use existing Python materials for comparison).

You can find the dataset we are going to use here. There are two main variables in this dataset: Horsepower and Fuel Economy.

Below you can find an example of the dataset:

Horse Power	Fuel Economy
118.770799	29.344195
176.326567	24.695934
219.262465	23.952010
187.310009	23.384546
218.594340	23.426739
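
To get a feel for these numbers before bringing in Nx, here is a minimal sketch in plain Elixir that fits a line to just the five sample rows. Note that it uses the closed-form least squares formulas rather than the gradient descent approach used in the rest of this post, and that SampleFit is a hypothetical module name made up for this illustration. Five rows are far too few for a useful model, so treat the result as a sanity check only.

```elixir
defmodule SampleFit do
  # Closed-form least squares for a single feature:
  #   w = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
  #   b = mean_y - w * mean_x
  def fit(points) do
    {xs, ys} = Enum.unzip(points)
    n = length(points)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # Sum of products of the deviations from the means.
    sxy =
      points
      |> Enum.map(fn {x, y} -> (x - mean_x) * (y - mean_y) end)
      |> Enum.sum()

    # Sum of squared deviations of x from its mean.
    sxx =
      xs
      |> Enum.map(fn x -> (x - mean_x) * (x - mean_x) end)
      |> Enum.sum()

    w = sxy / sxx
    {w, mean_y - w * mean_x}
  end
end

# The five sample rows shown above, as {horsepower, fuel_economy} tuples.
sample = [
  {118.770799, 29.344195},
  {176.326567, 24.695934},
  {219.262465, 23.952010},
  {187.310009, 23.384546},
  {218.594340, 23.426739}
]

{w, b} = SampleFit.fit(sample)
IO.inspect({w, b}, label: "slope and intercept")
```

The slope comes out negative, which matches the intuition that more horsepower tends to go with lower fuel economy in this sample.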

Creating the project

Let’s create a new Elixir project to solve our problem:

Make sure you have Elixir 1.13 or above installed on your machine and run mix new nx_linear_regression. We are also going to install some dependencies to read CSV files and create the model, so make sure to add them to your mix.exs file as well.

mix.exs

defp deps do
  [
    {:nx, "~> 0.2.1"},
    {:nimble_csv, "~> 1.1"},
    {:scholar, "~> 0.1.0", github: "elixir-nx/scholar"}
  ]
end

Run mix deps.get.

Let’s now create the Linear Regression module:

lib/nx_linear_regression.ex

defmodule NxLinearRegression do
  @moduledoc """
  `NxLinearRegression`.
  """

  # Set of useful machine learning functions.
  # https://github.com/elixir-nx/scholar
  alias Scholar.Preprocessing

  # This allows us to use numerical definitions.
  import Nx.Defn

  # This is the linear regression function.
  # A linear regression line has an equation of the form Y = wX + b.
  defn predict({w, b}, x) do
    w * x + b
  end

  # This calculates the mean squared error (MSE).
  # MSE is the average of the squared differences between the actual and predicted fuel economy.
  defn loss(params, x, y) do
    y_hat = predict(params, x)

    (y - y_hat)
    |> Nx.power(2)
    |> Nx.mean()
  end

  # This computes the gradient of the loss and updates w and b accordingly.
  # The gradient points in the direction in which the loss grows fastest,
  # so stepping against it moves the predictions closer to the true outcomes.
  # w (weight) and b (bias) are the parameters being learned.
  # lr stands for learning rate, which is a parameter that determines the step size
  # at each iteration while moving toward a minimum of the loss function.
  defn update({w, b} = params, x, y, lr) do
    {grad_w, grad_b} = grad(params, &loss(&1, x, y))

    {
      w - grad_w * lr,
      b - grad_b * lr
    }
  end

  # This is just to generate some initial values for weights and bias.
  defn init_random_params do
    w = Nx.random_normal({}, 0.0, 0.1)
    b = Nx.random_normal({}, 0.0, 0.1)
    {w, b}
  end

  # This trains the model for the given number of epochs.
  # data is an enumerable of {x, y} tuples.
  @spec train(data :: Enumerable.t(), lr :: float(), epochs :: integer()) ::
          {Nx.Tensor.t(), Nx.Tensor.t()}
  def train(data, lr, epochs) do
    init_params = init_random_params()

    {x, y} = Enum.unzip(data)

    x = Preprocessing.standard_scaler(Nx.tensor(x))
    y = Nx.tensor(y)

    for _ <- 1..epochs, reduce: init_params do
      acc -> update(acc, x, y, lr)
    end
  end

  # The train-test split is a technique for evaluating the performance of a machine learning algorithm.
  # It's important to simulate how a model would perform on new/unseen data.
  # data can be any enumerable (for example a list or a stream); both halves are returned as lists.
  @spec train_test_split(data :: Enumerable.t(), train_size :: float()) :: {list(), list()}
  def train_test_split(data, train_size) do
    num_examples = Enum.count(data)
    num_train = floor(train_size * num_examples)
    Enum.split(data, num_train)
  end
end
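
To make the update step less of a black box, here is a minimal sketch of the same training loop in plain Elixir, with the gradients of the MSE written out by hand instead of obtained from grad. ManualGradientDescent is a hypothetical module name used only for this illustration, the parameters start at zero rather than at random values, and the toy data is made up so that the expected answer is known in advance.

```elixir
defmodule ManualGradientDescent do
  # Mean squared error for the line y = w * x + b.
  def loss({w, b}, xs, ys) do
    n = length(xs)

    xs
    |> Enum.zip(ys)
    |> Enum.map(fn {x, y} -> :math.pow(y - (w * x + b), 2) end)
    |> Enum.sum()
    |> Kernel./(n)
  end

  # Analytic gradients of the MSE with respect to w and b:
  #   dL/dw = -2/n * sum(x * (y - y_hat))
  #   dL/db = -2/n * sum(y - y_hat)
  def gradients({w, b}, xs, ys) do
    n = length(xs)

    {sum_xr, sum_r} =
      xs
      |> Enum.zip(ys)
      |> Enum.reduce({0.0, 0.0}, fn {x, y}, {acc_xr, acc_r} ->
        residual = y - (w * x + b)
        {acc_xr + x * residual, acc_r + residual}
      end)

    {-2.0 * sum_xr / n, -2.0 * sum_r / n}
  end

  # One gradient descent step: move each parameter against its gradient.
  def update({w, b} = params, xs, ys, lr) do
    {grad_w, grad_b} = gradients(params, xs, ys)
    {w - grad_w * lr, b - grad_b * lr}
  end

  # Repeats the update step for the given number of epochs, starting from {0.0, 0.0}.
  def train(xs, ys, lr, epochs) do
    Enum.reduce(1..epochs, {0.0, 0.0}, fn _epoch, params ->
      update(params, xs, ys, lr)
    end)
  end
end

# Toy data generated from y = 2x + 1, so the fit should approach w = 2 and b = 1.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = Enum.map(xs, fn x -> 2.0 * x + 1.0 end)

{w, b} = ManualGradientDescent.train(xs, ys, 0.1, 500)
IO.inspect({w, b}, label: "learned w and b")
IO.inspect(ManualGradientDescent.loss({w, b}, xs, ys), label: "final loss")
```

The Nx version above does the same thing, except that grad derives the gradients automatically from the loss function and the arithmetic runs on tensors instead of lists.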

And finally the Fuel Economy module:

lib/fuel_economy.ex

defmodule NxLinearRegression.FuelEconomy do
  @moduledoc """
  Try to predict the likely fuel consumption efficiency
  https://www.kaggle.com/vinicius150987/regression-fuel-consumption
  """

  # Functions to handle CSV files
  # https://github.com/dashbitco/nimble_csv
  alias NimbleCSV.RFC4180, as: CSV

  # Set of useful machine learning functions
  # https://github.com/elixir-nx/scholar
  alias Scholar.Preprocessing

  # Let's define a default number of epochs and a default learning rate.
  @epochs 2000
  @learning_rate 0.1

  # This will call our internal training function with the epochs and learning rate we defined above.
  @spec train(data :: Enumerable.t()) :: {Nx.Tensor.t(), Nx.Tensor.t()}
  def train(data) do
    NxLinearRegression.train(data, @learning_rate, @epochs)
  end

  # This is going to predict based on the params previously learned.
  @spec predict(params :: tuple(), data :: list()) :: Nx.Tensor.t()
  def predict(params, data) do
    x =
      data
      |> Nx.tensor()
      |> Preprocessing.standard_scaler()

    NxLinearRegression.predict(params, x)
  end

  # This is going to calculate the MSE based on the params previously learned.
  @spec mse(params :: tuple(), data :: Enumerable.t()) :: Nx.Tensor.t()
  def mse(params, data) do
    {x, y} = Enum.unzip(data)

    x = Preprocessing.standard_scaler(Nx.tensor(x))
    y = Nx.tensor(y)

    NxLinearRegression.loss(params, x, y)
  end

  # This is going to load the data lazily as a stream of {horse_power, fuel_economy} tuples.
  @spec load_data :: Enumerable.t()
  def load_data do
    "FuelEconomy.csv"
    |> File.stream!()
    |> CSV.parse_stream()
    |> Stream.map(fn [horse_power, fuel_economy] ->
      {
        Float.parse(horse_power) |> elem(0),
        Float.parse(fuel_economy) |> elem(0)
      }
    end)
  end
end
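
Both modules pass the horsepower values through Preprocessing.standard_scaler before using them. As I understand it, this is the usual z-score standardization, which subtracts the mean and divides by the standard deviation. Here is a minimal sketch of that idea in plain Elixir, assuming the population standard deviation is used. ZScore is a hypothetical module name, and this is not Scholar's implementation.

```elixir
defmodule ZScore do
  # Returns {mean, std} using the population standard deviation (divide by n).
  def stats(values) do
    n = length(values)
    mean = Enum.sum(values) / n

    variance =
      values
      |> Enum.map(fn v -> (v - mean) * (v - mean) end)
      |> Enum.sum()
      |> Kernel./(n)

    {mean, :math.sqrt(variance)}
  end

  # Rescales every value to (value - mean) / std using the given statistics.
  def scale(values, {mean, std}) do
    Enum.map(values, fn v -> (v - mean) / std end)
  end
end

# The horsepower column of the five sample rows from the dataset preview.
horsepower = [118.770799, 176.326567, 219.262465, 187.310009, 218.594340]

scaled = ZScore.scale(horsepower, ZScore.stats(horsepower))
IO.inspect(scaled, label: "standardized horsepower")
```

Note that predict and mse above call standard_scaler on whatever data they receive, so the inputs are rescaled with the statistics of that batch rather than those of the training set. The sketch takes the statistics as an explicit argument, which is one way the training mean and standard deviation could be reused at prediction time.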

Running the project

# Run Elixir
iex -S mix

# Load Data
data = NxLinearRegression.FuelEconomy.load_data()

# Split into 80% for training and 20% for testing
{train, test} = NxLinearRegression.train_test_split(data, 0.8)

# Train the model and obtain the learned params
params = NxLinearRegression.FuelEconomy.train(train)

# Calculate MSE
# https://en.wikipedia.org/wiki/Mean_squared_error
mse = NxLinearRegression.FuelEconomy.mse(params, test)

# Get the test data
{x_test, _y_test} = Enum.unzip(test)

# Predict some values
# https://findanyanswer.com/what-is-the-difference-between-y-and-y-hat
y_hat = NxLinearRegression.FuelEconomy.predict(params, x_test)
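
One thing to keep in mind is that train_test_split/2 takes the first 80% of the rows in file order. If the CSV happened to be sorted in some way, the test set might not be representative, so shuffling before splitting can help. Here is a minimal sketch of that variation in plain Elixir. ShuffledSplit is a hypothetical module name and the rows are made-up placeholder data.

```elixir
defmodule ShuffledSplit do
  # Shuffles the rows, then splits them the same way train_test_split/2 does.
  def split(data, train_size) do
    shuffled = Enum.shuffle(data)
    num_train = floor(train_size * length(shuffled))
    Enum.split(shuffled, num_train)
  end
end

# Seeding the process-local random generator makes the shuffle repeatable.
:rand.seed(:exsss, {1, 2, 3})

# Ten placeholder rows in the same {x, y} shape as the dataset.
rows = for i <- 1..10, do: {i * 1.0, i * 2.0}

{train, test} = ShuffledSplit.split(rows, 0.8)
IO.inspect({length(train), length(test)}, label: "train/test sizes")
```

The seed is only there so that repeated runs produce the same split, which makes results easier to compare while experimenting.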

Conclusion

You’ve just solved your first real-world machine learning problem by using Nx!

Despite being young, Nx is mature enough to help you achieve good results without depending on external tools. And if you run into something missing, you can always contribute to the community by sending a PR that adds the feature.

As a next step, I would suggest reading our previous posts on Nx and Axon, then using Nx to solve a different problem: maybe a regression with more variables, or a classification problem where you predict a class instead of a number.

You can find the source code of this example here. Enjoy.
