Who is the Mightiest of the Mighty Nein?

r
d&d
tidyverse
data cleaning
Published

September 7, 2023

This is another post using data from Crit Role Stats. In this post, I will be looking at the damage dealt by each member of the Mighty Nein. Spoilers for campaign 2, but that ended a couple of years ago so it seems reasonable to spoil. Now Crit Role Stats already has a spreadsheet detailing damage dealt, and I want to recreate some of that and more. I will mostly be using the All Rolls spreadsheet. I have downloaded the CSV for just the sheet that I need: All Rolls. I will be looking back at the full spreadsheets online to verify my findings, but this is the only one that I need to pull in to work with.

By the way, my soundtrack for this post is Critical Role’s own Mighty Vibes YouTube playlist. It just feels right.

I am interested in damage dealt by character and by episode to learn who had the highest average damage per hit and the highest average damage per episode. I expect these to differ because of the number of attacks that different characters get. Spellcasters can do a lot of damage, and many of their spells still do half damage even if the opponent successfully saves, but they rarely get more than one damage-dealing action per turn. Beauregard, as a monk, deals less damage per hit but can hit an astonishing number of times in a round, so I expect her per-hit average to be much lower than, say, Yasha’s, but her average per episode may be quite a bit higher.

Before we get to that, there’s a fair bit of cleaning to be done with the All Rolls table.

library(tidyverse)

Data Cleaning

First, read it in.

all_rolls <- read_csv("all_rolls_wildemount_all_episodes.csv")
glimpse(all_rolls)
Rows: 15,364
Columns: 10
$ Episode         <chr> "C2E001", "C2E001", "C2E001", "C2E001", "C2E001", "C2E…
$ Time            <chr> "0:35:40", "0:39:22", "0:39:30", "0:43:04", "0:50:37",…
$ Character       <chr> "Fjord", "Jester", "Beau", "Beau", "Molly", "Molly", "…
$ `Type of Roll`  <chr> "Perception", "Perception", "Perception", "Insight", "…
$ `Total Value`   <chr> "Nat20", "22", "19", "13", "16", "16", "14", "19", "14…
$ `Natural Value` <chr> "20", "19", "16", "8", "11", "11", "14", "14", "9", "1…
$ `Crit?`         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Damage Dealt`  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `# Kills`       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Notes           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "C…

A few things to note:

First, we will need to clean up the Episode column to use as numeric. Not a big deal.

Second, there is a Damage Dealt column, which turns out to be very useful. Looking in the source data, we can see that it tells you the total damage dealt for a given roll. In the example of episode 43, wherein the Mighty Nein fight Avantika and her crew, Caleb uses Wall of Fire and rolls 17 damage. However, the Damage Dealt column tells us that he dealt a total of 101 damage when you consider all of the enemies he was able to hit. Moreover, we can also see that even in cases where there is only one target, the Damage Dealt column tells us whether the damage was taken in full or halved, whereas the Total Value column would just tell us the value of the roll. This does mean that we have a lot of parsing to do on that column because it contains a lot of free text as well, but it does seem to usually start with the actual number so that may be relatively easy. Let’s hope.

First, I will pare down the data set to only work with what we need. This isn’t an enormous data frame, but saving that space will always help speed things up.

print(object.size(all_rolls), units = "MB")
1.9 Mb
damage <- all_rolls |> 
  janitor::clean_names() |> 
  filter(type_of_roll == "Damage") |> 
  select(episode, character, damage_dealt)

What sort of space did that save?

print(object.size(damage), units = "MB")
0.2 Mb

It was still < 2 MB to begin with, but getting down to 0.2 MB will just make things run faster. Not a huge deal, but a nice benefit.

Let’s take a look at the damage_dealt column to see what we need to do to parse it.

select(damage, damage_dealt)
# A tibble: 2,119 × 1
   damage_dealt                      
   <chr>                             
 1 18 to Creature 1                  
 2 11 to Creature 1                  
 3 9 Cold to Creature 1              
 4 11 to Creature 1                  
 5 9 (8 + 1 Cold) to Creature 1      
 6 3 Necrotic to Creature 1          
 7 12 to Creature 2                  
 8 13 to Creature 1                  
 9 17 (12 + 5 Necrotic) to Creature 2
10 14 (10 + 4 SA) to Creature 2      
# ℹ 2,109 more rows

Let’s first look to see if every row begins with a number. I’ll use stringr::str_match() to pull out the leading numbers. If any entries begin with non-numeric values, the result will be NA. I can then see if there are any NAs in the resulting vector to see if there will be a problem with just pulling out the initial digits.

matches <- str_match(damage$damage_dealt, "^\\d+")
anyNA(matches)
[1] TRUE

Ok, so there are NAs. How many?

sum(is.na(matches))
[1] 50

50 is a lot but not insurmountable. Where are they?

damage[is.na(matches), ]
# A tibble: 50 × 3
   episode character damage_dealt                
   <chr>   <chr>     <chr>                       
 1 C2E010  Nott      Unknown to Rat 5            
 2 C2E013  Molly     Unknown to Molly            
 3 C2E013  Molly     Unknown to Shadow Assassin 3
 4 C2E018  Molly     <NA>                        
 5 C2E019  Molly     <NA>                        
 6 C2E019  Molly     Unknown to Molly            
 7 2-20    Fjord     <NA>                        
 8 2-20    Fjord     <NA>                        
 9 2-22    Fjord     <NA>                        
10 2-23    Molly     <NA>                        
# ℹ 40 more rows

There are a lot of Unknowns and a lot of NAs. Looking at a few of these in the original table, I can see that some of them show “Disregarded” in the notes. Others are Molly’s own Blood Maledicts, which damage him but he doesn’t always announce the result. What this tells me is that I need to re-read the damage data frame, this time including Notes from the original all_rolls data. I can look for “Disregarded” and “Blood Curse” in the notes and omit them, because they shouldn’t count toward the total number of hits. After I do that, I can re-run the above code to find non-matches and see where things stand.

damage <- all_rolls |> 
  janitor::clean_names() |> 
  filter(type_of_roll == "Damage") |> 
  select(episode, character, damage_dealt, notes)

damage |> 
  filter(notes != "Disregarded")
# A tibble: 624 × 4
   episode character damage_dealt              notes                            
   <chr>   <chr>     <chr>                     <chr>                            
 1 C2E001  Jester    3 Necrotic to Creature 1  Toll the Dead                    
 2 C2E001  Beau      13 to Creature 2          HDYWTDT                          
 3 C2E003  Molly     3 to Molly                Blood Curse of the Eyeless (ampl…
 4 C2E003  Nott      2 to Zombie 2             Sneak Attack                     
 5 C2E003  Molly     1 to Molly                Rite of the Frozen               
 6 C2E003  Molly     4 Psychic to Imp 1        Vicious Mockery                  
 7 C2E003  Fjord     14 to Kylre               HDYWTDT                          
 8 C2E003  Jester    6 to Imp 1                Word of Radiance                 
 9 C2E005  Molly     3 Psychic to Witherling 1 Vicious Mockery                  
10 C2E005  Jester    9 Cold to Witherling 4    Hellish Rebuke                   
# ℹ 614 more rows

Wait, that just dropped a ton of rows. We’ve gone from 2119 to 624. Turns out that what’s happening is that all of the NA values are getting dropped in addition to those notes that are “Disregarded” so we’re losing a ton of data. I need to create a way to eliminate the “Disregarded” rows but still keep the NA rows, because having no note is pretty much the norm, so we don’t want to drop those. I do this by passing a logical vector to filter() even though the vector isn’t exactly a conditional in the format of <column> <operator> <value>. Instead, I’m using dplyr::if_else() to check equality against “Disregarded” but to also accommodate NA values. To do that, I use the argument missing = TRUE to tell if_else() that I want NA values to be included in the results.

damage <- damage |>
  filter(
    if_else(
      !str_detect(tolower(notes), "disregard"),
      true = TRUE,
      false = FALSE,
      missing = TRUE
    )
  )

Next, I need to remove the rows that start with “blood curse.” I will again use if_else() to account for the many NA values in notes.

damage <- damage |> 
  filter(
    if_else(
      !str_starts(tolower(notes), "blood"),
      true = TRUE,
      false = FALSE,
      missing = TRUE
    )
  )

So let’s repeat the search for those NAs and unknowns:

matches <- str_match(damage$damage_dealt, "^\\d+")
sum(is.na(matches))
[1] 33
damage[is.na(matches), ]
# A tibble: 33 × 4
   episode character damage_dealt                 notes                         
   <chr>   <chr>     <chr>                        <chr>                         
 1 C2E010  Nott      Unknown to Rat 5             HDYWTDT                       
 2 C2E013  Molly     Unknown to Shadow Assassin 3 Dissipates                    
 3 2-28    Nott      <NA>                         <NA>                          
 4 C2E035  Beau      <NA>                         <NA>                          
 5 C2E035  Caduceus  <NA>                         <NA>                          
 6 C2E035  Jester    <NA>                         <NA>                          
 7 C2E039  Beau      <NA>                         <NA>                          
 8 C2E044  Caduceus  <NA>                         Prayer of Healing, Caleb, Bea…
 9 C2E044  Fjord     <NA>                         <NA>                          
10 C2E047  Beau      <NA>                         <NA>                          
# ℹ 23 more rows

The rest of these are judgment calls. Some of them are cases where there was clearly some damage, such as in episode 10 when Nott gets a “How do you want to do this?” against a rat, but it doesn’t say the damage done. I would imagine that the rat had something like 1HP left and any attack would kill it, so Sam didn’t even have to roll. One is a healing roll which should have been omitted to begin with. Some of them are by pseudo-NPCs that a few players are controlling, and those should be omitted. In other cases, I’ve looked back to the original all_rolls table and found that there are values in the total value column. Because of that, I’m going to re-read the data (yet again), and include the natural roll and total value columns for use in my manual cleaning. Because there’s no easy way to code through every judgment, I’m going to write out the current data to a .csv, hand-clean these remaining rows, and re-read it back in. In instances where it was clear there was some damage, I’m just going to assign a value of 1. In cases that should be disregarded, I am going to delete the row. In cases where damage_dealt is NA but there is a total_value or a natural_value, I will populate damage_dealt with the higher of those two values. I hate to edit data files by hand but this situation calls for it, and it’s low stakes so I’m not that worried about it.

all_rolls |> 
  janitor::clean_names() |> 
  filter(type_of_roll == "Damage") |> 
  select(episode, character, total_value, natural_value, damage_dealt, notes) |>
  filter(
    if_else(
      !str_detect(tolower(notes), "disregard"),
      true = TRUE,
      false = FALSE,
      missing = TRUE
    )
  ) |> 
  filter(
    if_else(
      !str_starts(tolower(notes), "blood"),
      true = TRUE,
      false = FALSE,
      missing = TRUE
    )
  ) |> 
  write_csv(file = "damage.csv")

Edit, edit, edit. Clean, clean, clean.

Reimport:

damage <- read_csv("damage_cleaned.csv")

Let’s check if all of the damage_dealt values start with a number now:

matches <- str_match(damage$damage_dealt, "^\\d+")
sum(is.na(matches))
[1] 0
damage[is.na(matches), ]
# A tibble: 0 × 7
# ℹ 7 variables: episode <chr>, character <chr>, total_value <chr>,
#   natural_value <chr>, damage_dealt <chr>, notes <chr>, ...7 <chr>

We’re good!

Now I will once again pull out just the episode, character, and damage_dealt columns.

damage <- damage |> 
  select(episode, character, damage_dealt)

Now I can extract that numeric value from the damage_dealt column and convert the whole thing to numeric.

damage <- damage |>
  mutate(damage_dealt = parse_number(str_extract(damage_dealt, "^\\d+")))

We need to pull out the episode numbers to clean that up:

damage <- damage |> 
  mutate(episode = parse_number(str_extract(episode, "\\d+$")))

Now we can start doing some analysis!

Most Total Damage

Who has done the most total damage?

damage |> 
  group_by(character) |> 
  summarize(total_damage = sum(damage_dealt)) |> 
  arrange(desc(total_damage))
# A tibble: 22 × 2
   character total_damage
   <chr>            <dbl>
 1 Beau              5922
 2 Fjord             5069
 3 Yasha             4741
 4 Caleb             3638
 5 Nott              3636
 6 Jester            3465
 7 Veth              1854
 8 Caduceus          1732
 9 Molly              641
10 Keg                203
# ℹ 12 more rows

Oops! I still have some non-core characters.

damage <- damage |> 
  filter(character %in% c(
    "Beau",
    "Caduceus",
    "Caleb",
    "Fjord",
    "Jester",
    "Molly",
    "Veth",
    "Yasha"
  ))

damage |> 
  group_by(character) |> 
  summarize(total_damage = sum(damage_dealt)) |> 
  arrange(desc(total_damage))
# A tibble: 8 × 2
  character total_damage
  <chr>            <dbl>
1 Beau              5922
2 Fjord             5069
3 Yasha             4741
4 Caleb             3638
5 Jester            3465
6 Veth              1854
7 Caduceus          1732
8 Molly              641

Many of these numbers are pretty close to what Crit Role Stats themselves calculated – I calculate Veth’s total damage at 5490 and they have it at 5583, so pretty close. But some are quite different – I calculate Caleb’s damage at just 3638 but they have him at 5281, a big difference. I really can’t explain what’s going on here, and would love some external validation for why some findings are so different. Most are pretty close so that’s what I’m going with. In another post I could work on reading in data from their Damage Dealt spreadsheet and parsing that within R, but that’s not today. So, sorry Caleb, you’re getting the shaft here.

Episode High

What episode had the most damage dealt?

damage |> 
  group_by(episode) |> 
  summarize(total_damage = sum(damage_dealt)) |> 
  slice_max(n = 1, order_by = total_damage)
# A tibble: 1 × 2
  episode total_damage
    <dbl>        <dbl>
1      98          849

Episode 98 features a big fight on the deck of the Balleater and ends with a dead Fjord and Orly. Big episode, good fight, makes sense that it would have the most damage. This doesn’t totally agree with what Crit Role Stats has, but it’s in the same ballpark.

Damage Over Time

How does damage dealt increase over time?

damage |> 
  group_by(episode) |> 
  summarize(total_damage = sum(damage_dealt)) |> 
  ggplot(aes(x = episode, y = total_damage)) +
  geom_bar(stat = "identity")

It’s kind of all over the place, to be honest. There are ups and downs, which is to be expected with the various arcs, and the highs tend to be higher later in the campaign, but it’s not a terribly clear trend.

damage |> 
  group_by(episode) |> 
  summarize(total_damage = sum(damage_dealt)) |> 
  ggplot(aes(x = episode, y = total_damage)) +
  geom_point() +
  geom_smooth(method = "lm")

It’s clearly a positive slope, but not terribly strong. That’s interesting to me.

Average Damage by Character

We’ve already seen that Beau dealt the most total damage. But who dealt the most damage per hit? Per episode?

damage |> 
  group_by(character) |> 
  summarize(avg_damage = mean(damage_dealt)) |> 
  arrange(desc(avg_damage))
# A tibble: 8 × 2
  character avg_damage
  <chr>          <dbl>
1 Veth           29.0 
2 Caleb          20.6 
3 Yasha          20.2 
4 Caduceus       15.9 
5 Jester         15.8 
6 Fjord          15.5 
7 Beau           10.5 
8 Molly           8.43

It’s interesting, but not entirely unsurprising, that Caleb has the highest average damage. His spells can hit multiple targets, sometimes with very high damage for each one, and that all gets counted together. Yasha being close to the top is unsurprising, since she’s a barbarian and her whole role is to hit things hard. It’s also unsurprising that Beau, the overall champ, is nearly last. Monks don’t hit very hard, they just hit a lot. And poor Mollymauk didn’t get a chance to level up enough to do really hard hitting.

Who has the highest per-episode average?

damage |> 
  summarize(episode_total = sum(damage_dealt), .by = c(character, episode)) |> 
  summarize(episode_average = mean(episode_total), .by = character) |> 
  arrange(desc(episode_average))
# A tibble: 8 × 2
  character episode_average
  <chr>               <dbl>
1 Veth                 88.3
2 Yasha                76.5
3 Beau                 69.7
4 Fjord                64.2
5 Jester               55  
6 Caleb                52.7
7 Molly                37.7
8 Caduceus             35.3

No real surprises here, but it’s interesting that Yasha edges out Beau when we look at episode total damage.

Characters’ Hardest Hits

Where did each of the characters most shine? First, I’ll look at which episode had the most damage for each player.

damage |> 
  group_by(character, episode) |> 
  summarize(total_damage = sum(damage_dealt)) |> 
  slice_max(n = 1, order_by = total_damage)
# A tibble: 8 × 3
# Groups:   character [8]
  character episode total_damage
  <chr>       <dbl>        <dbl>
1 Beau          140          210
2 Caduceus       98          122
3 Caleb          43          297
4 Fjord          82          243
5 Jester         82          209
6 Molly          26           98
7 Veth          139          239
8 Yasha          39          300

Beau and Veth both have their best episodes in the final arc, fighting through Cognouza, and Caduceus has his on the bloodiest episode of the series, 98. Caleb’s comes in the Bad Guys arc, facing Avantika in Darktow. Fjord and Jester both hit their high water marks in episode 82, in the depths of the Happy Fun Ball. Molly does the most damage, sadly, in the episode in which he dies. Lastly, Yasha hits a whopping 300 damage (just beating out Caleb for the highest single episode damage) in the Temple of the False Serpent, walloping a hydra.

Next, I will look at the single hardest hits for each character.

damage |> 
  group_by(character) |> 
  slice_max(order_by = damage_dealt)
# A tibble: 8 × 3
# Groups:   character [8]
  episode character damage_dealt
    <dbl> <chr>            <dbl>
1     133 Beau                55
2      73 Caduceus            42
3      43 Caleb              172
4     123 Fjord              106
5     119 Jester             106
6      23 Molly               21
7     139 Veth               100
8     138 Yasha               78
  • Beau hits for 55 points using an Intuit charge against the Tomb Takers. Makes sense that this was with a device, I don’t know how she could possibly do that much with a single hit.
  • Caduceus does 42 necrotic damage on a Remorhaz using the blight spell
  • Caleb gets his 172 damage from a big fireball in episode 43, which I’ve mentioned before. Worth noting that this hit 10 separate targets.
  • Fjord and Jester tie with 106 damage. Fjord gets his in episode 123, fighting Gelidon, a massive white dragon. The spreadsheet doesn’t give me any details, unfortunately.
  • Jester gets her hit in on episode 119, hitting two Aeorian Abominations for 53 damage each. Again, sadly no details.
  • Poor Molly yet again bringing up the rear at 21 points, with a big hit on a troll
  • Veth has one of the final arc episodes as her highest, hitting for 100 points against Lucien. I don’t have details other than 100 (50x2) so I’m not sure exactly what happened there.
  • Yasha has her all-time high on the episode prior, and surprisingly low damage for her hardest hit.

Conclusion

This has been an interesting exercise. The data cleaning portion had some challenges but it gave me the chance to work with the stringr package and exercise some very basic regex skills. I learned that even with careful coding, I couldn’t get things to line up 100% with Crit Role Stats, which is a valuable lesson. And I learned that while Beau dealt the most damage over time, it’s Caleb who can pack the single hardest punch. And it’s no surprise that he does so with fire.