AI-Based Grammar Error Correction (GEC) With Elixir and Livebook

Andrés Alejos

Mix.install(
  [
    {:readability, "~> 0.12.1"},
    {:bumblebee, "~> 0.6"},
    {:exla, ">= 0.0.0"},
    {:kino, "~> 0.14"}
  ],
  config: [
    nx: [default_backend: EXLA.Backend]
  ]
)

Introduction

In this article, we’ll explore the use of several Elixir libraries to make a small Livebook tool that checks a web page’s grammar, suggests edits, and gives a score based on the grammatical correctness of the page.

This will use the following libraries and tools:

  • readability - An Elixir implementation of the Readability content-extraction library, the same approach that powers Reader mode in Firefox. It uses a rules- and heuristics-based approach to extract only the readable article text from a web page (see the quick example after this list).
  • bumblebee - To load and run pre-trained transformer models, in this case, the grammarly/coedit-large model from Grammarly
  • kino - To add inputs and visualizations to the Livebook
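
As a quick sanity check, here is a minimal sketch of what readability gives us back (the URL is just an example, and the call needs network access from the Livebook runtime):

summary = Readability.summarize("https://dockyard.com/blog")
String.slice(summary.article_text, 0, 200)

The returned summary struct carries other fields as well (such as the extracted article HTML), but article_text is all we need for this tool.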

Grammar Error Correction Module

First, let’s set up a GrammarCorrection module where we will house the logic for rendering and formatting our outputs. This includes some simple helpers to turn a number in [0, 1] into a letter grade and to colorize a string diff output.

Here, we use EEx to define and render these templates, and we will use Kino.HTML downstream to display them as HTML within the Livebook.

We make one template for the overall grade and another for the diff output, then use EEx.function_from_string/5 to create helper functions that render each template given some inputs.

defmodule GrammarCorrection do
  @grade_template """
  <!DOCTYPE html>
  <html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Grammar Grade</title>
    <style>
        body {
            background-color: #f4f4f9;
            font-family: "Arial", sans-serif;
            margin: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
            padding: 0;
        }
        .grade-card {
            background-color: #ffffff;
            box-shadow: 0 8px 16px rgba(0,0,0,0.1);
            border-radius: 10px;
            padding: 40px;
            font-size: 24px;
            color: #333333;
            border: 1px solid #dddddd;
            text-align: center;
            width: 300px;
        }
        .header {
            font-size: 28px;
            color: #4A90E2;
            font-weight: bold;
            margin-bottom: 20px;
            border-bottom: 2px solid #E1E1E1;
            padding-bottom: 10px;
        }
        .grade {
            font-weight: bold;
            font-size: 48px;
            margin-top: 20px;
        }
        .high { color: #4CAF50; } /* Green for high grades */
        .medium { color: #FFC107; } /* Amber for medium grades */
        .low { color: #F44336; } /* Red for low grades */
    </style>
  </head>
  <body>
    <div class="grade-card">
        <p class="header">Grammar Grade</p>
        <p class="grade <%= grade_class %>"><%= grade %></p>
    </div>
  </body>
  </html>
  """

  @diff_template """
  <!DOCTYPE html>
  <html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Diff Output</title>
    <style>
        body {
            background-color: #1e1e1e;
            color: #c5c8c6;
            font-family: "Courier New", Courier, monospace;
            margin: 0;
            padding: 20px;
        }
        .suggestion-container {
            background-color: #1d1f21;
            border-left: 4px solid #8abeb7;
            padding: 12px 20px;
            font-size: 16px;
        }
        .insert {
            color: #b5bd68; /* Greenish color for added text */
        }
        .delete {
            color: #cc6666; /* Reddish color for deletions */
        }
        .normal {
            color: #c5c8c6; /* Standard text color */
        }
    </style>
  </head>
  <body>
    <div class="suggestion-container">
        <%= content %>
    </div>
  </body>
  </html>
  """
  require EEx
  EEx.function_from_string(:defp, :render_diff_html, @diff_template, [:content])
  EEx.function_from_string(:defp, :render_grade_html, @grade_template, [:grade, :grade_class])

  def render_grade(score) when is_float(score) do
    {grade, grade_class} =
      cond do
        score >= 0.97 -> {"A+", "high"}
        score >= 0.93 -> {"A", "high"}
        score >= 0.90 -> {"A-", "high"}
        score >= 0.87 -> {"B+", "medium"}
        score >= 0.83 -> {"B", "medium"}
        score >= 0.80 -> {"B-", "medium"}
        score >= 0.77 -> {"C+", "medium"}
        score >= 0.73 -> {"C", "low"}
        score >= 0.70 -> {"C-", "low"}
        score >= 0.67 -> {"D+", "low"}
        score >= 0.63 -> {"D", "low"}
        score >= 0.60 -> {"D-", "low"}
        true -> {"F", "low"}
      end

    render_grade_html(grade, grade_class)
  end

  def colorize_diff(script, output_format \\ :terminal) do
    formatted = Enum.map(script, fn
        {:eq, text} -> colorize(text, :normal, output_format)
        {:ins, text} -> colorize(make_whitespace_visible(text, :insert), :insert, output_format)
        {:del, text} -> colorize(make_whitespace_visible(text, :delete), :delete, output_format)
      end)

    case output_format do
      :terminal ->
        formatted
        |> IO.ANSI.format()
        |> IO.puts()

      :html ->
        formatted
        |> Enum.join()
        |> render_diff_html()
    end
  end

  defp make_whitespace_visible(text, type) do
    space =
      case type do
        :insert -> "<span class=\"insert\">•</span>"
        :delete -> "<span class=\"delete\">•</span>"
        _ -> " "
      end

    tab =
      case type do
        :insert -> "<span class=\"insert\">⇢</span>"
        :delete -> "<span class=\"delete\">⇢</span>"
        _ -> "\t"
      end

    newline =
      case type do
        :insert -> "<span class=\"insert\">↵</span><br>"
        :delete -> "<span class=\"delete\">↵</span><br>"
        _ -> "\n"
      end

    text
    |> String.replace(" ", space)
    |> String.replace("\t", tab)
    |> String.replace("\n", newline)
  end

  defp colorize(text, :normal, _output_format), do: text
  defp colorize(text, :insert, :terminal), do: [IO.ANSI.green(), text, IO.ANSI.reset()]
  defp colorize(text, :delete, :terminal), do: [IO.ANSI.red(), text, IO.ANSI.reset()]
  defp colorize(text, :insert, :html), do: "<span class=\"insert\">#{text}</span>"
  defp colorize(text, :delete, :html), do: "<span class=\"delete\">#{text}</span>"
end

Set Up Bumblebee and Nx.Serving

Next, we set up the Bumblebee model configuration and serving. Since the grammarly/coedit-large model is based on the flan-t5 family of transformer models, we can use it with Bumblebee out of the box.

We also set EXLA.Backend as the default backend to be used by Nx.

Depending on your setup, running this notebook against a web page with a lot of text might take a very long time.

Nx.global_default_backend(EXLA.Backend)
{:ok, model_info} = Bumblebee.load_model({:hf, "grammarly/coedit-large"}, backend: EXLA.Backend)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "grammarly/coedit-large"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "grammarly/coedit-large"})
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 200)

serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    compile: [batch_size: 1, sequence_length: 256],
    defn_options: [compiler: EXLA]
  )

We need to start the serving, so we will do so under the Kino supervisor.

Kino.start_child({Nx.Serving, name: Grammar, serving: serving})

Set Up Form and Kino Frames

This is the only input we’re including. It’s a simple form with a single Input Kino, which will emit an event when the submit button (in this case, labeled “Check”) is clicked.

Note that Kino includes a standalone Kino.Control.button Kino, but using the Kino.Control.form Kino instead takes care of passing the Kino.Input.text content when the form button is pressed, rather than requiring something like Kino.Control.tagged_stream/1 to group multiple streams together.

form = Kino.Control.form([page: Kino.Input.text("Page to check")], submit: "Check")

We add two separate Kino.Frames for the two items we want to dynamically render. Frames are essentially containers for UI (Kinos) that allow us to dynamically re-render, append to, and clear content from within our event handlers.

grade_frame = Kino.Frame.new(placeholder: false)
diff_frame = Kino.Frame.new(placeholder: false)

Set Up Kino Listener

Lastly, we set up the event-handling logic. This is the core logic we want to run each time a submit event is emitted. There are five main steps that run within this Kino.listen/3 callback:

  1. Scrape the page for readable text, split it on newlines, and keep only lines between 3 and 500 characters long.
  2. Use the grammarly/coedit-large model with Bumblebee’s text_generation task to generate grammar-corrected versions of each line of text.
  3. Calculate the Jaro and Myers distance between each grammar-corrected text and its original counterpart. Jaro distance outputs a value between zero and one indicating similarity, with one meaning the texts are identical; this is used for the Grammar Score. Myers distance outputs a list of edits that, when applied, transform the original text into the grammar-corrected text; this is used to visualize the differences in a diff-styled output (see the short example after this list).
  4. Compute the average Jaro distance of the entire article.
  5. Render the diffs and overall grade.
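
To make these two metrics concrete, here is a small illustrative example (the sentences are made up, and the exact Myers output may split the edits slightly differently):

# a value close to 1.0, since the strings are nearly identical
String.jaro_distance("She go to school.", "She goes to school.")

# something like [eq: "She go", ins: "es", eq: " to school."]
String.myers_difference("She go to school.", "She goes to school.")
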
Kino.listen(form, fn %{data: %{page: page}} ->
  original =
    page
    |> Readability.summarize()
    |> Map.get(:article_text)
    |> String.split("\n")
    |> Enum.reduce([], fn line, acc ->
      line = String.trim(line)
      len = String.length(line)

      if len < 3 or len > 500 do
        acc
      else
        [line | acc]
      end
    end)
    # the reduce prepends lines, so reverse to restore the page order
    |> Enum.reverse()

  corrected =
    Nx.Serving.batched_run(Grammar, original |> Enum.map(fn line -> "Fix grammar: " <> line end))
    |> Enum.to_list()

  generated =
    # each entry has the shape %{results: [%{text: ...}]}
    for %{results: [%{text: output} | _]} <- corrected do
      output
    end

  {scores, corrections} =
    Enum.zip(original, generated)
    |> Enum.reduce({[], []}, fn {orig, gen}, {scores_acc, gen_acc} ->
      {[String.jaro_distance(orig, gen) | scores_acc],
       [String.myers_difference(orig, gen) | gen_acc]}
    end)

  avg_score =
    scores
    |> Nx.tensor()
    |> Nx.mean()
    |> Nx.to_number()

  grade_kino = avg_score |> GrammarCorrection.render_grade() |> Kino.HTML.new()

  diff_kino =
    corrections
    |> Enum.map(&GrammarCorrection.colorize_diff(&1, :html))
    |> Enum.join("\n")
    |> Kino.HTML.new()

  Kino.Frame.render(grade_frame, grade_kino)
  Kino.Frame.render(diff_frame, diff_kino)
end)

Possible Improvements

This is a fairly naive implementation. There are several improvements you can make right away, and others that would require either new models in the Bumblebee ecosystem or more extensive modifications.

One improvement would be to add a feedback mechanism for the end user while Bumblebee is generating the corrected sentences. This could be as simple as a spinner, or a progress percentage that updates as each step completes.
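
As a minimal sketch (reusing the existing frames, with hypothetical placeholder text), you could render an interim message at the top of the Kino.listen/3 handler before kicking off the heavy work:

# at the start of the Kino.listen handler, before calling the serving
Kino.Frame.render(grade_frame, Kino.Markdown.new("Checking grammar, this can take a while..."))
Kino.Frame.render(diff_frame, Kino.Markdown.new("Corrections will appear here once generation finishes."))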

You could provide even more feedback by enabling the stream: true option in the Bumblebee.Text.generation keyword options. With streaming, you could update the diff Kino.Frame as each corrected sentence arrives, though you would still have to wait for all corrections before computing the overall grade.
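
For instance, here is a hypothetical per-line loop, assuming the serving was built with stream: true so each call returns a stream of text chunks that we join into the full correction:

# hypothetical sketch: handle one line at a time and append each finished
# diff to the frame as soon as it is ready
for line <- original do
  corrected =
    Nx.Serving.batched_run(Grammar, "Fix grammar: " <> line)
    |> Enum.join()

  diff = String.myers_difference(line, corrected)
  Kino.Frame.append(diff_frame, Kino.HTML.new(GrammarCorrection.colorize_diff(diff, :html)))
end

Note that this trades the batching of Nx.Serving.batched_run for responsiveness, since lines are now processed one at a time.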

You could also change how you pre-process the web page. Right now we use Readability and some basic filtering by length, but you could change this step to your liking. For example, it would probably make sense to filter out <code> blocks since code does not tend to adhere to normal grammatical rules.
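
A minimal sketch of that idea, assuming you work from the extracted article HTML and use Floki (which readability already depends on) to drop code blocks before pulling out the text:

summary = Readability.summarize(page)

text_without_code =
  summary.article_html
  |> Floki.parse_document!()
  # remove code-heavy elements before extracting the plain text
  |> Floki.filter_out("pre")
  |> Floki.filter_out("code")
  |> Floki.text(sep: "\n")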

In addition to the stream: true option, you could also experiment with different configuration options when setting up the Nx.Serving to better suit your use case.
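
For example (the values below are hypothetical and would need tuning for your hardware), a larger batch size and several compiled sequence lengths can reduce padding waste and improve throughput:

serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    # compile a few sequence lengths so short lines are not padded to the maximum
    compile: [batch_size: 8, sequence_length: [128, 256, 512]],
    defn_options: [compiler: EXLA]
  )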

Another interesting direction in the field of grammar error correction is reducing the number of tokens the model has to generate, which means less post-processing. The approach proposed in the paper Seq2Edits: Sequence Transduction Using Span-level Edit Operations takes the same inputs we use here, but instead of outputting the whole corrected text (from which we then compute String.myers_difference), the transformer itself outputs the sequence of edits needed to correct the grammar.

You could attempt to use something like Instructor to do this more generically, but you would likely have to enforce strict validation rules, which could result in many retries per request.

Conclusion

In this brief article, we showcased several libraries working together to perform grammar error correction on a web page, output an overall grammar grade, and visualize the edits needed to improve the text. We also showed how models derived from architectures already supported by Bumblebee can be used as drop-in, task-specific replacements for more generic transformer models, which is worth keeping in mind when accuracy and performance on a narrow task matter.

It also demonstrated how to use Kino and Livebook to build a small AI-powered application with dynamic frame updates. You can apply the principles used throughout this demo to larger and more complicated notebooks, such as the Interactive AutoFFMPEG CinEx notebook.
