Advent of Code: 2025 Day 2 in R

advent of code

puzzle

Published

December 15, 2025

See the puzzle instructions here.

Part 1

Solution

There are two ways to solve this puzzle, and I present both here. The first is a brute-force check for matches, and the second uses regular expressions.

Method 1: Brute Force

This is relatively easy because the puzzle specifies that an invalid ID consists entirely of one digit sequence repeated twice. If we had to find any repeated digit sequence anywhere in the ID, it would be significantly tougher. As it is, we can divide each string of digits in half and see if the first half is the same as the second.

First we need to read in the data and separate it by commas. We’ll do the test input first.

read_input <- function(fname) {
  raw_string <- readLines(fname)
  strsplit(raw_string, ",")[[1]]
}
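To see what the reading step produces, here is a minimal sketch that writes a small hand-made string to a temporary file and splits it the same way. The two ranges are illustrative only, and the sketch assumes the input sits on a single line, since `[[1]]` keeps only the first line that `readLines()` returns.

```r
# Write a small, hand-made input to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".txt")
writeLines("11-22,95-115", tmp)

# Same two steps as read_input(): read the line, then split on commas
raw_string <- readLines(tmp)
id_ranges <- strsplit(raw_string, ",")[[1]]
id_ranges                      # "11-22" "95-115"

# Clean up the temporary file
unlink(tmp)
```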

We can then make a list of vectors where each element contains the full sequence of IDs in a range. We first define a function that takes a character vector holding the two endpoints and creates the sequence between them. Note that seq() coerces character endpoints to numeric by itself, so we don’t even need to convert the character vector before calling it. We then apply that function to the list of ID ranges.

get_sequence_from_endpoints <- function(endpoints) {
  seq(endpoints[1], endpoints[2])
}

generate_id_sequences <- function(id_ranges) {
  # Get the endpoints of each ID range
  char_list <- strsplit(id_ranges, "-")
  
  # Apply the sequence function over the list
  lapply(char_list, get_sequence_from_endpoints)
}
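As a quick check of the coercion described above, the following sketch runs the two steps on a single hand-picked range string. The range is only an example, not part of the puzzle input handling.

```r
# Illustrative range string, chosen by hand
endpoints <- strsplit("11-22", "-")[[1]]
endpoints                      # "11" "22"

# seq() accepts the character endpoints directly
id_sequence <- seq(endpoints[1], endpoints[2])
length(id_sequence)            # 12
range(id_sequence)             # 11 22
```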

We can now iterate through each ID and check whether the first half matches the second half. To do that, we can use strsplit() to break up the numbers into their component characters and check for identical sequences. We can then iterate through the list of ID ranges, check for repeats, and then return either the ID itself or 0. That way we can simply sum the results of applying the function over the list.

We also define a function to easily convert from split character IDs, like c("1", "1"), to numeric values, like 11.

convert_split_id_to_numeric <- function(split_id) {
  as.numeric(paste(split_id, collapse = ""))
}

check_for_repeat <- function(split_id) {
  n <- length(split_id)
  
  # If the length is not divisible by 2, it necessarily cannot contain only a
  # repeated digit sequence, so return 0.
  if (n %% 2 != 0) {
    return(0)
  }
  
  # Get the subvectors for each half and check if they are all the same
  first_half <- split_id[1 : (n / 2)]
  second_half <- split_id[((n / 2) + 1) : n]
  if (all(first_half == second_half)) {
    # Convert back into numeric
    return(convert_split_id_to_numeric(split_id))
  } else {
    return(0)
  }
}
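The half-and-half comparison can be tried on its own with a few hand-picked IDs. The helper below, `halves_match()`, is a hypothetical name used only for this sketch; it returns TRUE or FALSE instead of the ID or 0.

```r
# Hypothetical helper for illustration: TRUE if the first half of the
# digits equals the second half
halves_match <- function(id) {
  chars <- strsplit(as.character(id), "")[[1]]
  n <- length(chars)

  # An odd number of digits cannot be split into two equal halves
  if (n %% 2 != 0) {
    return(FALSE)
  }

  all(chars[1:(n / 2)] == chars[(n / 2 + 1):n])
}

halves_match(1212)             # TRUE:  "12" and "12"
halves_match(1221)             # FALSE: "12" and "21"
halves_match(111)              # FALSE: odd number of digits
```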

We can then create a function that takes a numeric vector representing the sequence of IDs (e.g. 11, 12, …, 22) and returns the sum of the repeated IDs. First we cast the vector to character, apply strsplit(), then use check_for_repeat() to determine whether each ID is a repeated sequence. We then return the sum of the results.

apply_check_for_repeat <- function(id_sequence) {
  # Cast each ID as character then split into individual characters
  char_seq <- sapply(id_sequence, as.character)
  split_ids <- sapply(char_seq, strsplit, "")
  
  # Apply `check_for_repeat` on each element, then return the sum
  invalid_ids <- sapply(split_ids, check_for_repeat)
  sum(invalid_ids)
}
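As an aside, as.character() and strsplit() are both vectorized, so the casting and splitting steps could also be written without sapply(). The sketch below shows that alternative on a few hand-picked IDs, along with one caveat that applies only if the IDs are stored as doubles: as.character() may switch to scientific notation for round values. This is an illustration, not a change to the solution above.

```r
# as.character() and strsplit() both work on whole vectors
ids <- c(11L, 1212L, 123L)
split_ids <- strsplit(as.character(ids), "")
split_ids[[2]]                 # "1" "2" "1" "2"

# Caveat for doubles: as.character() may use scientific notation
as.character(1e5)              # "1e+05"

# sprintf() with "%.0f" keeps every digit
sprintf("%.0f", 1e5)           # "100000"
```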

Finally, we can put it all together into a single function to solve the puzzle.

solve_part_1_brute <- function(fname) {
  input <- read_input(fname)
  id_sequences <- generate_id_sequences(input)
  all_repeats <- sapply(id_sequences, apply_check_for_repeat)
  sum(all_repeats)
}

Run it first on the test data:

solve_part_1_brute("test_input.txt")

[1] 1227775554

That’s the correct answer! Now try it on the real input:

solve_part_1_brute("input.txt")

[1] 29940924880

Correct again!

I want to mention that I initially ran into trouble with the real input because of a line in check_for_repeat(): at the end of the function I converted the collapsed character string to integer. Some of the IDs were so large that they exceeded the range of R’s 32-bit integer type (maximum 2147483647), which produced the warning "NAs introduced by coercion to integer range" and returned an NA. The issue was fixed by changing as.integer() to as.numeric(). Doubles represent integers exactly up to 2^53 (about 9 × 10^15), and their maximum value is about 1.797693 × 10^308. There is also the bit64 package, which provides a 64-bit integer type via as.integer64(). But in this case the default numeric class worked fine.
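The overflow is easy to reproduce on a single value. In this sketch the ID string is made up for illustration, and suppressWarnings() is used only to keep the output quiet.

```r
# A made-up ID that is larger than the 32-bit integer maximum
big_id <- "7777777777"
.Machine$integer.max           # 2147483647

# as.integer() cannot hold the value and returns NA (with a warning)
int_value <- suppressWarnings(as.integer(big_id))
is.na(int_value)               # TRUE

# as.numeric() stores it as a double without loss
num_value <- as.numeric(big_id)
num_value == 7777777777        # TRUE
```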

Method 2: Regular Expressions

We can use regular expressions to solve this. We need to convert the numbers into strings and then search for any group of digits that is repeated. We do this by capturing one group with (\d+) and then using a back-reference, \1, to see if it repeats. The ^ and $ anchors ensure that the group and its repeat make up the entire ID rather than just part of it. Note that in R you must use double backslashes for escaped characters, so it’s \\d and \\1. grepl() takes a regex pattern and a character vector and returns a logical vector indicating whether each element was matched or not. We then use that logical vector to index into the original sequence of IDs to extract the matched IDs.
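Before wrapping the pattern in a function, it can be tried on a few hand-picked strings. The test strings below are chosen for illustration only.

```r
# Hand-picked ID strings to exercise the pattern
ids <- c("11", "1212", "123", "1221", "111")

# TRUE where the whole string is one digit group followed by itself
matches <- grepl("^(\\d+)\\1$", ids)
matches                        # TRUE TRUE FALSE FALSE FALSE

# Logical indexing keeps only the matched IDs
ids[matches]                   # "11" "1212"
```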

check_for_repeat_regex_1 <- function(id_sequence) {
  matches <- grepl("^(\\d+)\\1$", as.character(id_sequence))  
  id_sequence[matches]
}

solve_part_1_regex <- function(fname) {
  input <- read_input(fname)
  id_sequences <- generate_id_sequences(input)
  all_matches <- lapply(id_sequences, check_for_repeat_regex_1)
  sum(unlist(all_matches))
}

solve_part_1_regex("test_input.txt")

[1] 1227775554

Now on the real input:

solve_part_1_regex("input.txt")

[1] 29940924880

That also returns the correct answer.
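For experimenting without a file, the same steps can be run on a string held in memory. This is a condensed sketch of the regex approach on two illustrative ranges; it converts the endpoints explicitly with as.numeric() for clarity.

```r
# Two illustrative ranges in a single string (not read from a file)
ranges <- strsplit("11-22,95-115", ",")[[1]]
endpoints <- strsplit(ranges, "-")

# Expand every range into its IDs and flatten into one vector
ids <- unlist(lapply(endpoints, function(e) {
  seq(as.numeric(e[1]), as.numeric(e[2]))
}))

# Keep the IDs made of one digit group repeated twice, then sum them
invalid <- ids[grepl("^(\\d+)\\1$", as.character(ids))]
invalid                        # 11 22 99
sum(invalid)                   # 132
```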

Part 2

Solution

Method 1: Brute Force

This is quite a bit trickier than before. The method we will implement splits each digit string into equal-length groups, for every group length that divides the string evenly, and checks whether all of the groups are equal. I will also have to account for the special cases of an ID made of a single repeated digit (e.g. 111) and of a single-digit ID.

We make a new function to check for repeats.

check_for_repeat_part_2 <- function(split_id) {
  # If there's only one element, return 0 because it can't be a repeated sequence
  if (length(split_id) == 1) {
    return(0)
  }
  
  # If all elements are the same, it's necessarily a repeated sequence, 
  # so return that ID
  if (length(unique(split_id)) == 1) {
    return(convert_split_id_to_numeric(split_id))
  }
  
  n <- length(split_id)
  half_n <- floor(n / 2)
  
  # The possible group sizes are the integers from 1 to floor(n / 2)
  possible_groups <- 1:half_n
  for (i in possible_groups) {
    # Skip i=1, because we have already checked for the case of all digits being
    # equal
    if (i == 1) {
      next
    }
    
    # If you can't divide n evenly by i, skip to the next i
    if (n %% i != 0) {
      next
    }
    
    # Get the starting points of the different groups and initialize a character
    # vector to contain the collapsed groups
    starting_indices <- seq(1, n, by = i)
    subvectors <- character(length(starting_indices))
    
    # Iterate through the starting_indices
    for (j in seq_along(starting_indices)) {
      # Just for shorthand
      start <- starting_indices[j]
      
      # Get the subvector from the range, collapse it into a single string
      subvectors[j] <- paste(split_id[start : (start + i - 1)], collapse = "")
    }
    
    # If all elements are the same, return the ID
    if (length(unique(subvectors)) == 1) {
      return(convert_split_id_to_numeric(split_id))
    }
  }
  
  # If you haven't returned the ID throughout the loop, return 0
  return(0)
}
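The grouping step at the heart of the loop can be seen in isolation with a vectorized sketch. The helper `split_into_chunks()` is a hypothetical name for this illustration; it uses substring() instead of the inner for-loop, and assumes the group size divides the string length evenly.

```r
# Hypothetical helper: cut a digit string into consecutive groups of `size`
split_into_chunks <- function(id_string, size) {
  starts <- seq(1, nchar(id_string), by = size)
  substring(id_string, starts, starts + size - 1)
}

# Groups of 2 are all equal, so "121212" is a repeated sequence
chunks_of_2 <- split_into_chunks("121212", 2)
chunks_of_2                          # "12" "12" "12"
length(unique(chunks_of_2)) == 1     # TRUE

# Groups of 3 differ, so this group size gives no match
chunks_of_3 <- split_into_chunks("121212", 3)
chunks_of_3                          # "121" "212"
length(unique(chunks_of_3)) == 1     # FALSE
```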

The wrapper that applies the check to each ID is the same as before, just updated to call the new part 2 function. We can then put it together and solve the puzzle.

apply_check_for_repeat_part_2 <- function(id_sequence) {
  # Cast each ID as character then split into individual characters
  char_seq <- sapply(id_sequence, as.character)
  split_ids <- sapply(char_seq, strsplit, "")
  
  # Apply `check_for_repeat_part_2` on each element, then return the sum
  invalid_ids <- sapply(split_ids, check_for_repeat_part_2)
  sum(invalid_ids)
}

solve_part_2_brute <- function(fname) {
  input <- read_input(fname)
  id_sequences <- generate_id_sequences(input)
  all_repeats <- sapply(id_sequences, apply_check_for_repeat_part_2)
  sum(all_repeats)
}

solve_part_2_brute("test_input.txt")

[1] 4174379265

solve_part_2_brute("input.txt")

[1] 48631958998

That’s correct! Part 2 was tricky but it was manageable in the end. Note that solving for the real puzzle input is quite slow because we call a function containing nested for-loops on every single ID, and loops like that are fairly slow in R. But it’s not so slow that we can’t run it.

Method 2: Regular Expressions

The code for part two using regex is almost identical to the code for part one. The only difference is that I change the regex from ^(\d+)\1$ to ^(\d+)\1+$: the extra + after \1 means the captured group may be repeated one or more additional times, rather than exactly once.
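A short comparison on hand-picked strings shows what the extra + changes. The test strings are illustrative only.

```r
# Hand-picked ID strings
ids <- c("111", "121212", "123123123", "1212", "1234", "7")

# Part 1 pattern: the group must appear exactly twice
part_1 <- grepl("^(\\d+)\\1$", ids)
part_1                         # FALSE FALSE FALSE TRUE FALSE FALSE

# Part 2 pattern: the group must appear two or more times
part_2 <- grepl("^(\\d+)\\1+$", ids)
part_2                         # TRUE TRUE TRUE TRUE FALSE FALSE
```

Note that the special cases handled by hand in the brute-force version (a single digit, or one digit repeated) fall out of the pattern directly.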

check_for_repeat_regex_2 <- function(id_sequence) {
  matches <- grepl("^(\\d+)\\1+$", as.character(id_sequence))  
  id_sequence[matches]
}

solve_part_2_regex <- function(fname) {
  input <- read_input(fname)
  id_sequences <- generate_id_sequences(input)
  all_matches <- lapply(id_sequences, check_for_repeat_regex_2)
  sum(unlist(all_matches))
}

solve_part_2_regex("test_input.txt")

[1] 4174379265

Also correct. And then the real puzzle input:

solve_part_2_regex("input.txt")

[1] 48631958998

Also correct, and it ran dramatically faster than the brute-force version.