My Brother, My Brother & Me (Unofficial) Word Statistics
Made by James Howe. Check out my other stuff! I'm looking for a job ;)

Compare Words

Enter words or phrases, comma-separated (might have issues on mobile devices)
Presets
Mix word types freely — single words, bigrams, trigrams, and [stage directions] all work. Brackets auto-added for stage directions.
Results will appear below.

Browse

Category
Speaker
Sort

Pinboard

Pin charts from Word Search and Compare Words to compare them at a glance. Pins are saved in your browser.

📌
No pins yet — search for a word and hit Pin to Board

About

What is this?

A searchable, browsable breakdown of every word spoken across 800+ episodes of My Brother, My Brother and Me. You can search for how often a word or phrase has been said, who says it most, and how usage has shifted over time.

Inspired by Jimmy O's cursing stats.

How it works

Episode transcripts are published alongside the episodes. (Give the transcriber a raise, they are amazing!) Every week, an automated process runs and does the following:

1. Scrape: A script checks the transcript archive for any episodes that haven't been processed yet and downloads them as PDFs.
2. Parse: Each PDF is read line by line. Speaker attribution, dialogue, and stage directions (the stuff in brackets like [laughs]) are extracted and structured.
3. Index: Every word, two-word phrase, and three-word phrase gets counted per speaker per episode (roughly 60,000 unique terms across the full run). The result is a set of JSON files that power the search and browse features on this site.
4. Deploy: If anything changed, the updated data is committed automatically to GitHub and the site updates within seconds. No server, no database, just static files.
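
As a rough illustration of the Parse and Index steps, here is a minimal Python sketch. It is not the site's actual code: the "Speaker: dialogue" line shape, the regular expressions, and the names parse_line, ngrams, and index_episode are all assumptions made for the example, and it works on plain text lines rather than PDFs.

```python
import re
from collections import Counter, defaultdict

# Assumed transcript line shape: "Speaker: dialogue [stage direction]".
SPEAKER_LINE = re.compile(r"^([A-Z][A-Za-z]+):\s*(.*)$")
STAGE_DIRECTION = re.compile(r"\[[^\]]*\]")
WORD = re.compile(r"[a-z0-9']+")


def parse_line(line, current_speaker=None):
    """Return (speaker, spoken_text, stage_directions) for one line."""
    match = SPEAKER_LINE.match(line.strip())
    if match:
        speaker, text = match.group(1), match.group(2)
    else:
        # Lines without a speaker label stay with whoever spoke last.
        speaker, text = current_speaker, line.strip()
    directions = [d.lower() for d in STAGE_DIRECTION.findall(text)]
    spoken = STAGE_DIRECTION.sub(" ", text)
    return speaker, spoken, directions


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def index_episode(lines):
    """Count words, bigrams, trigrams, and stage directions per speaker."""
    counts = defaultdict(Counter)
    speaker = None
    for line in lines:
        speaker, spoken, directions = parse_line(line, speaker)
        if speaker is None:
            continue  # text before the first speaker label is skipped
        tokens = WORD.findall(spoken.lower())
        for n in (1, 2, 3):
            counts[speaker].update(ngrams(tokens, n))
        # Stage directions keep their brackets, so they never mix with words.
        counts[speaker].update(directions)
    return counts


episode = [
    "Justin: Welcome to the show [laughs]",
    "[Griffin stifles a giggle]",
    "Griffin: Welcome to the show, the show!",
]
counts = index_episode(episode)
print(counts["Griffin"]["the show"])                    # 2
print(counts["Justin"]["[griffin stifles a giggle]"])   # 1
```

In this sketch a bracketed line with no speaker label is credited to whoever was talking, which is why the giggle lands on Justin. Phrases are also counted within a single line only, a simplification that the real parser may not share.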

The whole thing runs on GitHub Actions on a weekly schedule, so the site stays current without any manual work.

Data source & limitations

All transcript data comes from the McElroy family website. A few things worth knowing:

  • Transcripts aren't available for every episode - some early ones are missing entirely.
  • PDF formatting isn't perfectly consistent across 800 episodes, so occasionally a line gets misattributed or missed during parsing.
  • Speaker counts reflect transcript text only - if a transcript labels something wrong (or just in a way that I can't account for), it will be reflected in the data here.
    • For example, in "795: The Naming of 2026", "20" and "twenty" are counted separately, and wherever the transcript writes "2026", the "20" inside it isn't counted at all.
  • Stage directions like [laughs] are searchable but counted separately from spoken words.
    • Sometimes these are hard to assign to a specific speaker because they are interjections.
    • For example, [Griffin stifles a giggle] is assigned to Justin because it happened while Justin was talking.
  • Two-word and three-word phrases have a cutoff so file sizes don't balloon: phrases said fewer than 10 times aren't included. (On mobile the cutoff is ~20.)
  • Due to the large file sizes, the site can crash on mobile phones when you try to load a file.
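
Two of the limitations above are easier to see in code. The following Python sketch is an illustration only: the tokenizer regex and the names tokenize and prune_phrases are assumptions, not the site's real implementation. It shows why "20", "twenty", and "2026" end up as three unrelated terms, and how a phrase cutoff might be applied while leaving single words and stage directions alone.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of lowercase letters, digits, and apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split text into lowercase tokens; digits stay exactly as written."""
    return WORD.findall(text.lower())


def prune_phrases(counts, min_count=10):
    """Drop two- and three-word phrases said fewer than min_count times.

    Single words and bracketed stage directions are always kept.
    """
    return Counter({
        term: n
        for term, n in counts.items()
        if term.startswith("[") or " " not in term or n >= min_count
    })


print(tokenize("20, twenty, 2026"))  # ['20', 'twenty', '2026']

counts = Counter({"show": 1, "the show": 9, "good good": 10, "[laughs]": 2})
print(sorted(prune_phrases(counts)))
# ['[laughs]', 'good good', 'show']
```

Because "2026" is a single token, a search for "20" never matches it. Passing min_count=20 would mimic the stricter mobile cutoff under the same assumptions.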

If you notice anything off, feel free to email madjam1231@gmail.com.

Check out my other stuff here!