TIL: Parsing Strings in Rust with nom is Awesome
I’m currently doing Advent of Code (AoC) in Rust to learn the language better. As a Rust beginner, I find AoC a nice way to practice. Usually, I try to find a solution myself for a few minutes and then look for a video or a GitHub repo. After all, I want to learn how to use the language properly and not just hack my way to a solution.
For day 04, I watched this nice tutorial on YouTube, which introduced me to nom for parsing strings. Before I found the video, I was parsing the day 04 input like this (simplified Rust):
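fn run_p1() {
    let input = "2-4,6-8
2-3,4-5
5-7,7-9
2-8,3-7
6-6,4-6
2-6,4-8";
    for l in input.lines() {
        // split() yields an unknown number of parts, so collect them into a Vec first.
        let ranges = l.split(",").collect::<Vec<&str>>();
        // Indexing panics at runtime if a line has fewer than two parts.
        let (range_1, range_2) = (ranges[0], ranges[1]);
        println!("{range_1} and {range_2}");
    }
}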
In Python I wouldn’t have found that suspicious, but in Rust it felt weird. Of course, I cannot collect directly into a tuple, because the number of , characters in a line is unknown at compile time. So you have to allocate a vector whose length is only known at runtime and then index into it. That intermediate step is awkward.
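To make the awkwardness concrete, here is a small std-only sketch of the vector detour with the length check spelled out. The helper name `split_pair` is made up for illustration and is not part of the original solution.

```rust
/// Hypothetical helper: splits a line on ',' and returns both halves,
/// or None when the line does not consist of exactly two parts.
fn split_pair(line: &str) -> Option<(&str, &str)> {
    // The number of parts is only known at runtime, so collect into a Vec first.
    let parts: Vec<&str> = line.split(',').collect();
    // A slice pattern makes the runtime length check explicit.
    match parts.as_slice() {
        [left, right] => Some((*left, *right)),
        _ => None,
    }
}
```

The tuple only appears after a runtime check on the vector's length, which is exactly the intermediate step that feels out of place.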
Enter nom. Nom lets you write parsers and compose them in an ergonomic way. For example, parsing the ranges left and right of the comma for one line looks like this (simplified Rust):
use nom::{
    bytes::complete::tag, character, sequence::separated_pair, IResult,
};
use std::ops::RangeInclusive;

// Note: the combinators are called as plain functions here, as in nom 7.
// Newer nom versions may require `.parse(input)` instead.

// Parses one section such as "2-4" into the inclusive range 2..=4.
fn parse_section(input: &str) -> IResult<&str, RangeInclusive<u32>> {
    let (input, start) = character::complete::u32(input)?;
    let (input, _) = tag("-")(input)?;
    let (input, end) = character::complete::u32(input)?;
    Ok((input, start..=end))
}

// Parses two sections separated by a comma, such as "2-4,6-8".
fn parse_section_pair(input: &str) -> IResult<&str, (RangeInclusive<u32>, RangeInclusive<u32>)> {
    separated_pair(parse_section, tag(","), parse_section)(input)
}
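fn run_p1() {
    let input = "2-4,6-8
2-3,4-5
5-7,7-9
2-8,3-7
6-6,4-6
2-6,4-8";
    for l in input.lines() {
        // The first tuple element is the unparsed rest of the line, ignored here.
        let (_, (range_1, range_2)) = parse_section_pair(l).expect("line should be a section pair");
        println!("{range_1:?} and {range_2:?}");
    }
}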
Each parser step can be combined with other parser steps. The unparsed “rest” of the string is returned as the first element of the result, which I assign (shadow) to the input variable before handing it to the next step. This way, the type system guarantees at compile time that parse_section_pair returns a tuple on success. If a line contained more than one comma, the parser would consume only the first pair and return everything from the second comma onwards as the rest.
For AoC, where the input is well-defined, this is not an important feature, but it still makes the code much nicer and more ergonomic.
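To show how the “rest” is threaded from one step to the next, here is a minimal std-only sketch that imitates the shape of the parsers above. It is not how nom is implemented; `ParseResult`, `number`, `literal`, `section` and `section_pair` are hypothetical stand-ins written for illustration, and the error type is a plain `String` for simplicity.

```rust
use std::ops::RangeInclusive;

/// Result shape that imitates nom: on success, the unparsed rest comes first.
type ParseResult<'a, T> = Result<(&'a str, T), String>;

/// Hypothetical stand-in for a number parser: consumes leading ASCII digits.
fn number(input: &str) -> ParseResult<'_, u32> {
    // Find where the leading run of digits ends.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    let value = digits
        .parse::<u32>()
        .map_err(|e| format!("expected a number at {input:?}: {e}"))?;
    Ok((rest, value))
}

/// Hypothetical stand-in for a tag parser: consumes an exact prefix.
fn literal<'a>(expected: &str, input: &'a str) -> ParseResult<'a, ()> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, ())),
        None => Err(format!("expected {expected:?} at {input:?}")),
    }
}

/// Parses "2-4" into 2..=4 by shadowing `input` with the rest after each step.
fn section(input: &str) -> ParseResult<'_, RangeInclusive<u32>> {
    let (input, start) = number(input)?;
    let (input, _) = literal("-", input)?;
    let (input, end) = number(input)?;
    Ok((input, start..=end))
}

/// Parses "2-4,6-8" into a pair of ranges; anything after the pair is returned as the rest.
fn section_pair(input: &str) -> ParseResult<'_, (RangeInclusive<u32>, RangeInclusive<u32>)> {
    let (input, first) = section(input)?;
    let (input, _) = literal(",", input)?;
    let (input, second) = section(input)?;
    Ok((input, (first, second)))
}
```

With this sketch, `section_pair("2-4,6-8,9-10")` succeeds with the two ranges and leaves `",9-10"` as the rest, while a line without a comma yields an error instead of a panic.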
You can find my AoC code for the full solution here