TIL: Parsing Strings in Rust with nom is Awesome

Posted on Dec 16, 2022

I’m currently doing Advent of Code (AoC) in Rust, and as a Rust beginner I’ve found it a nice way to learn the language better. Usually, I try to find a solution myself for a few minutes and then look for a video or a GitHub repo. After all, I want to learn how to use the language properly and not just hack my way to a solution.

For day 04, I watched this nice tutorial on YouTube, which introduced me to nom for parsing strings. Before I found the video, I was parsing the day 04 input like this (pseudo Rust):

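fn run_p1() {
    let input = "2-4,6-8
    2-3,4-5
    5-7,7-9
    2-8,3-7
    6-6,4-6
    2-6,4-8";
    for l in input.lines() {
        // trim() drops the indentation that the multi-line literal carries along.
        // The number of pieces is only known at runtime, hence the Vec.
        let ranges = l.trim().split(',').collect::<Vec<&str>>();
        let (range_1, range_2) = (ranges[0], ranges[1]);
        println!("{range_1} {range_2}");
    }
}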

In Python I wouldn’t have found that suspicious, but in Rust it felt weird. Of course, I cannot collect directly into a tuple: the number of commas in a line is unknown at compile time, so split can yield any number of pieces. Instead, you have to allocate a vector whose length is only known at runtime and then index into it. This intermediate step is awkward.
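To make the runtime-length point concrete, here is a minimal std-only sketch. The name split_pair is a made-up helper for illustration. It turns the collected vector into a tuple only after checking at runtime that there are exactly two pieces:

```rust
/// Hypothetical helper: splits a line on commas and returns a tuple only if
/// there are exactly two pieces. The length check happens at runtime.
fn split_pair(line: &str) -> Option<(&str, &str)> {
    let pieces: Vec<&str> = line.trim().split(',').collect();
    match pieces.as_slice() {
        // The slice pattern matches only when the Vec holds exactly two items.
        [left, right] => Some((*left, *right)),
        _ => None,
    }
}
```

With this sketch, split_pair("2-4,6-8") yields Some(("2-4", "6-8")), while a line with no comma or with two commas yields None. The tuple is safe, but the vector allocation and the extra check are still there.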

Enter nom. It lets you write small parsers and compose them in an ergonomic way. For example, parsing the ranges left and right of the comma for one line looks like this (pseudo Rust):

// Written against the nom 7 API, where parsers are called like functions.
use nom::{bytes::complete::tag, character, sequence::separated_pair, IResult};
use std::ops::RangeInclusive;

fn parse_section(input: &str) -> IResult<&str, RangeInclusive<u32>> {
    let (input, start) = character::complete::u32(input)?;
    let (input, _) = tag("-")(input)?;
    let (input, end) = character::complete::u32(input)?;
    Ok((input, start..=end))
}

fn parse_section_pair(input: &str) -> IResult<&str, (RangeInclusive<u32>, RangeInclusive<u32>)> {
    let (input, (section_1, section_2)) =
        separated_pair(parse_section, tag(","), parse_section)(input)?;
    Ok((input, (section_1, section_2)))
}

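fn run_p1() {
    let input = "2-4,6-8
    2-3,4-5
    5-7,7-9
    2-8,3-7
    6-6,4-6
    2-6,4-8";
    for l in input.lines() {
        // trim() drops the indentation that the multi-line literal carries along;
        // without it the u32 parser would fail on the leading spaces.
        let (_, (range_1, range_2)) = parse_section_pair(l.trim()).unwrap();
        println!("{range_1:?} {range_2:?}");
    }
}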

Each parser step can be combined with other parser steps. The unparsed “rest” of the string comes back as the first element of the returned tuple, which I assign to (and thereby shadow) the input variable, so the next step continues where the previous one stopped. This way, the type signature guarantees at compile time that parse_section_pair returns a tuple of two ranges. If a line contained more than one comma, for example 2-4,6-8,9-10, the parser would consume only the first pair and hand back ,9-10 as the remaining input. For AoC, where the input is well-defined, this is not an important feature, but it still makes the code much nicer and more ergonomic.
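The “rest first, value second” convention can be mimicked without nom. The following is a rough std-only sketch and not nom’s actual implementation: ParseResult, number, literal, and section are names made up for illustration, and Option stands in for nom’s richer error type.

```rust
use std::ops::RangeInclusive;

// Hypothetical stand-in for nom's result shape: on success a parser returns
// (remaining input, parsed value); None signals a parse failure.
type ParseResult<'a, T> = Option<(&'a str, T)>;

/// Parses leading ASCII digits as a u32 and returns the unparsed rest first.
fn number(input: &str) -> ParseResult<'_, u32> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    // An empty digit run (or an overflowing number) fails to parse.
    let value = input[..end].parse::<u32>().ok()?;
    Some((&input[end..], value))
}

/// Consumes one expected character and returns the rest of the input.
fn literal(input: &str, expected: char) -> ParseResult<'_, ()> {
    input.strip_prefix(expected).map(|rest| (rest, ()))
}

/// Composes the helpers into a "start-end" parser by shadowing `input`.
fn section(input: &str) -> ParseResult<'_, RangeInclusive<u32>> {
    let (input, start) = number(input)?;
    let (input, _) = literal(input, '-')?;
    let (input, end) = number(input)?;
    Some((input, start..=end))
}
```

Calling section("2-4,6-8") in this sketch returns the range 2..=4 together with the untouched remainder ,6-8. A following parser can pick up from that remainder, which is the same hand-off that the shadowed input variable performs in the nom version.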

You can find my AoC code for the full solution here