My Brother, My Brother & Me (Unofficial) Word Statistics
Made by James Howe. Check out my other stuff! I'm looking for a job ;)
Fan-made · No affiliation with MBMBaM or Maximum Fun

Compare Words

Enter words or phrases, comma-separated (might have issues on mobile devices)
Presets
Mix word types freely: single words, bigrams, trigrams, and [stage directions] all work. Brackets are added automatically for stage directions.
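A minimal sketch of how a comma-separated query like this could be split and classified. The function name `parse_query`, the lowercasing, and the rule for when brackets get added (the user typed an opening bracket, or the term matches a known stage direction) are assumptions for illustration, not the site's actual code.

```python
def parse_query(raw: str, known_stage_directions: set[str]) -> list[tuple[str, str]]:
    """Return (term, kind) pairs for each comma-separated entry in `raw`.

    Hypothetical sketch: the real site's parsing rules may differ.
    """
    terms = []
    for part in raw.split(","):
        term = " ".join(part.strip().lower().split())  # trim and collapse spaces
        if not term:
            continue  # skip empty entries, e.g. from a trailing comma
        bare = term.strip("[]")
        # Assumption: a term is a stage direction if the user started it with
        # "[" or it matches a known stage direction; brackets are then added.
        if term.startswith("[") or bare in known_stage_directions:
            terms.append((f"[{bare}]", "stage direction"))
            continue
        n = len(term.split())
        kind = {1: "word", 2: "bigram", 3: "trigram"}.get(n, "unsupported")
        terms.append((term, kind))
    return terms
```

For example, `parse_query("Nice, my  brother, [laughs, applause", {"laughs", "applause"})` yields a word, a bigram, and two bracketed stage directions.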

Browse

Category
Speaker
Sort

About

What is this?

A searchable, browsable breakdown of every transcribed word across 800+ episodes of My Brother, My Brother and Me. You can search for how often a word or phrase has been said, who says it most, and how its usage has shifted over time.
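Once words are counted per speaker per episode, "who says it most" and "how usage has shifted" both reduce to sums. This sketch assumes a hypothetical in-memory layout of `{episode_number: {speaker: {term: count}}}` and uses episode number as a stand-in for time; the site's real JSON layout and charting may differ.

```python
from collections import Counter


def speaker_totals(index: dict, term: str) -> Counter:
    """Total uses of `term` per speaker, summed over all episodes.

    Assumes the hypothetical layout {episode: {speaker: {term: count}}}.
    """
    totals = Counter()
    for speakers in index.values():
        for speaker, counts in speakers.items():
            totals[speaker] += counts.get(term, 0)
    return totals


def usage_by_episode(index: dict, term: str) -> list[tuple[int, int]]:
    """(episode number, total uses of `term`) sorted by episode, for a usage-over-time view."""
    return sorted(
        (episode, sum(counts.get(term, 0) for counts in speakers.values()))
        for episode, speakers in index.items()
    )
```

`speaker_totals(index, term).most_common(1)` then gives the speaker who says a term most.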

How it works

Episode transcripts are published alongside episodes (give the transcriber a raise; they are amazing!). Every week, an automated process runs and does the following:

1. Scrape: A script checks the transcript archive for any episodes that haven't been processed yet and downloads them as PDFs.
2. Parse: Each PDF is read line by line. Speaker attribution, dialogue, and stage directions (the stuff in brackets like [laughs]) are extracted and structured.
3. Index: Every word, two-word phrase, and three-word phrase gets counted per speaker per episode, roughly 60,000 unique terms across the full run. The result is a set of JSON files that power the search and browse features on this site.
4. Deploy: If anything changed, the updated data is committed automatically to GitHub and the site updates within seconds. No server, no database: just static files.
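A minimal sketch of the Parse and Index steps, assuming the PDF text has already been extracted into plain lines of the form `Speaker: dialogue`. The regexes, the tokenizer, and the function names (`parse_lines`, `index_episode`, `build_index`) are hypothetical; real transcripts need extra handling for page headers, wrapped lines, and unlabeled text.

```python
import re
from collections import Counter, defaultdict

# Assumption: extracted lines look like "Griffin: dialogue [laughs] more dialogue".
SPEAKER_LINE = re.compile(r"^([A-Z][A-Za-z]+):\s*(.*)$")
STAGE_DIRECTION = re.compile(r"\[([^\]]+)\]")
WORD = re.compile(r"[a-z0-9']+")


def parse_lines(lines):
    """Yield (speaker, dialogue, stage_directions) for each line of transcript text."""
    speaker = None
    for raw in lines:
        line = raw.strip()
        match = SPEAKER_LINE.match(line)
        if match:
            speaker, text = match.groups()
        elif speaker is not None:
            text = line  # unlabeled line: treat as a continuation of the last speaker
        else:
            continue  # skip text that appears before any speaker label
        directions = [d.strip().lower() for d in STAGE_DIRECTION.findall(text)]
        dialogue = STAGE_DIRECTION.sub(" ", text)  # remove [...] from spoken text
        yield speaker, dialogue, directions


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def index_episode(lines):
    """Count words, two- and three-word phrases, and stage directions per speaker."""
    counts = defaultdict(Counter)
    for speaker, dialogue, directions in parse_lines(lines):
        tokens = WORD.findall(dialogue.lower())
        for n in (1, 2, 3):
            counts[speaker].update(ngrams(tokens, n))
        # Stage directions are stored as bracketed keys, separate from spoken words.
        counts[speaker].update(f"[{d}]" for d in directions)
    return counts


def build_index(episodes):
    """Map episode id -> speaker -> {term: count}, ready for json.dump."""
    return {
        episode: {speaker: dict(c) for speaker, c in index_episode(lines).items()}
        for episode, lines in episodes.items()
    }
```

In this sketch phrases are counted within a single line of text, so a phrase split across a wrapped line is missed. The tokenizer also treats "20", "twenty", and "2026" as unrelated tokens.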

The whole thing runs on GitHub Actions on a weekly schedule, so the site stays current without any manual work.
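The "commit only if anything changed" part of the Deploy step can be sketched as a write that compares against the file already on disk. This assumes the index is serialized with Python's `json` module; `write_if_changed` is a hypothetical helper, and a scheduled job would run `git commit` only when at least one call returns `True`.

```python
import json
from pathlib import Path


def write_if_changed(path: Path, data) -> bool:
    """Write `data` as JSON only if it differs from the file on disk; return True if written.

    Hypothetical helper: sort_keys and fixed separators make the output
    deterministic, so an unchanged index produces byte-identical text.
    """
    new_text = json.dumps(data, sort_keys=True, separators=(",", ":"))
    if path.exists() and path.read_text(encoding="utf-8") == new_text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_text, encoding="utf-8")
    return True
```

Deterministic serialization matters here: without `sort_keys`, a rebuilt index with identical counts could still produce a diff and trigger a needless commit.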

Data source & limitations

All transcript data comes from the McElroy family website. A few things worth knowing:

  • Transcripts aren't available for every episode; some early ones are missing entirely.
  • PDF formatting isn't perfectly consistent across 800 episodes, so occasionally a line gets misattributed or missed during parsing.
  • Speaker counts reflect transcript text only: if a transcript mislabels something, the data inherits that error.
    • For example, in "795: The Naming of 2026", "20" and "twenty" are counted as separate words, and wherever the year is written as "2026", that "20" isn't counted at all.
  • Stage directions like [laughs] are searchable but counted separately from spoken words.
    • These can be hard to assign to a specific speaker because they are often interjected over someone else's line.
    • For example, [Griffin stifles a giggle] is assigned to Justin because it happens while Justin is talking.
  • Two-word and three-word phrases have a cutoff so file sizes don't balloon: phrases said fewer than 10 times aren't included (on mobile, the cutoff is about 20).
  • Because of the large file sizes, the site can crash on mobile phones while loading data.
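The phrase cutoff can be sketched as a filter over run-wide totals. The thresholds mirror the numbers above (10, and about 20 for mobile), but the function name `prune_phrases`, the rule that single words and bracketed stage directions are never pruned, and applying the cutoff to totals across all episodes are assumptions.

```python
from collections import Counter

# Thresholds taken from the limits described above; the names are hypothetical.
DESKTOP_MIN_COUNT = 10
MOBILE_MIN_COUNT = 20


def prune_phrases(totals: Counter, min_count: int) -> dict:
    """Drop two- and three-word phrases said fewer than min_count times.

    Assumption: single words and bracketed stage directions are always kept.
    """
    kept = {}
    for term, count in totals.items():
        is_phrase = not term.startswith("[") and " " in term
        if is_phrase and count < min_count:
            continue
        kept[term] = count
    return kept
```

Building the mobile files with the higher threshold simply drops more phrases, which is one way to keep those downloads smaller.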

If you notice anything off, feel free to email madjam1231@gmail.com.

Check out my other stuff here!