Word Search
Search mode (might have issues on mobile devices)
Select a mode, type a query, press Search or Enter.
Compare Words
Enter words or phrases, comma-separated (might have issues on mobile devices)
Presets
Mix word types freely — single words, bigrams, trigrams, and [stage directions] all work.
Brackets auto-added for stage directions.
Results will appear below.
Browse
About
What is this?
A searchable, browsable breakdown of every word spoken across 800+ episodes of My Brother, My Brother and Me. You can search for how often a word or phrase has been said, who says it most, and how usage has shifted over time.
How it works
Episode transcripts are published alongside episodes (give the transcriber a raise, they are amazing!). Every week, an automated process runs and does the following:
1 — Scrape
A script checks the transcript archive for any episodes that haven't been processed yet and downloads them as PDFs.
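As a rough illustration of the "haven't been processed yet" check, the sketch below compares the episode IDs found in the archive against the ones already handled. The function name and the ID format are made up for the example; the site's actual script may work differently.

```python
def find_unprocessed(archive_ids, processed_ids):
    """Return archive episode IDs that have not been processed, in archive order."""
    seen = set(processed_ids)
    return [ep for ep in archive_ids if ep not in seen]


# Hypothetical IDs; the real archive's naming scheme is an assumption here.
archive = ["mbmbam-793", "mbmbam-794", "mbmbam-795"]
done = ["mbmbam-793"]
print(find_unprocessed(archive, done))
```

Only the IDs returned here would then need to be downloaded, which keeps a weekly run cheap when nothing new has been published.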
2 — Parse
Each PDF is read line by line. Speaker attribution, dialogue, and stage directions (the stuff in brackets like [laughs]) are extracted and structured.
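To make the parsing step concrete, here is a minimal sketch that assumes each speaker turn starts with a label like "Justin:" and that unlabeled lines continue the previous turn. That line format, the regexes, and the record shape are all assumptions for illustration, not the site's actual parser.

```python
import re

# Assumed format: a capitalized name (up to ~30 chars) followed by a colon.
SPEAKER_RE = re.compile(r"^([A-Z][A-Za-z .'-]{0,30}):\s*(.*)$")
# Stage directions are anything inside square brackets, e.g. [laughs].
DIRECTION_RE = re.compile(r"\[[^\]]+\]")


def parse_lines(lines):
    """Turn raw transcript lines into {speaker, dialogue, directions} records."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_RE.match(line)
        if match:
            # A new speaker label starts a new record.
            speaker, text = match.group(1), match.group(2)
            records.append({"speaker": speaker, "dialogue": "", "directions": []})
        elif records:
            # No label: treat the line as a continuation of the current turn.
            text = line
        else:
            continue  # text before any speaker label is skipped
        current = records[-1]
        current["directions"].extend(DIRECTION_RE.findall(text))
        spoken = DIRECTION_RE.sub(" ", text)
        current["dialogue"] = " ".join((current["dialogue"] + " " + spoken).split())
    return records
```

Under these assumptions, a direction that appears in the middle of someone's turn is attached to whoever is speaking at that point, regardless of who the direction names.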
3 — Index
Every word, two-word phrase, and three-word phrase gets counted per speaker per episode — roughly 60,000 unique terms across the full run. The result is a set of JSON files that power the search and browse features on this site.
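The counting itself can be sketched in a few lines. This version assumes records with "speaker" and "dialogue" fields (a hypothetical shape), lowercases everything, and splits on anything that isn't a letter, digit, or apostrophe. The real tokenizer may differ.

```python
import re
from collections import Counter, defaultdict

# Assumed tokenizer: runs of lowercase letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def ngrams(tokens, n):
    """Yield each run of n consecutive tokens joined by spaces."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def index_episode(records):
    """Count 1-, 2-, and 3-word terms per speaker for one episode."""
    counts = defaultdict(Counter)
    for rec in records:
        tokens = TOKEN_RE.findall(rec["dialogue"].lower())
        for n in (1, 2, 3):
            counts[rec["speaker"]].update(ngrams(tokens, n))
    return counts


# Made-up line of dialogue, purely for illustration.
episode = index_episode([{"speaker": "Travis", "dialogue": "It's 2026, twenty twenty six"}])
print(episode["Travis"]["twenty"], episode["Travis"]["2026"], episode["Travis"]["20"])
```

With a tokenizer like this one, "2026" is a single term, so it adds nothing to the count for "20", and "twenty" is tallied on its own. In this sketch, phrases also never span two separate records.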
4 — Deploy
If anything changed, the updated data is committed automatically to GitHub and the site updates within seconds. No server, no database — just static files.
The whole thing runs on GitHub Actions on a weekly schedule, so the site stays current without any manual work.
Data source & limitations
All transcript data comes from the McElroy family website. A few things worth knowing:
- Transcripts aren't available for every episode - some early ones are missing entirely.
- PDF formatting isn't perfectly consistent across 800 episodes, so occasionally a line gets misattributed or missed during parsing.
- Speaker counts reflect transcript text only - if a transcript labels something wrong, the data inherits that error.
- Numerals and spelled-out numbers are counted separately. For example, in "795: The Naming of 2026", "20" and "twenty" are distinct terms, and a "20" written as part of "2026" isn't counted at all.
- Stage directions like [laughs] are searchable but counted separately from spoken words.
- Sometimes these are hard to assign to a specific speaker because they are interjections.
- For example, [Griffin stifles a giggle] is assigned to Justin because it happened while Justin was talking.
- Two-word and three-word phrases have a cutoff so file sizes don't balloon: if they were said fewer than 10 times, they aren't included. (On mobile the cutoff is ~20.)
- Due to the large file sizes, the site can crash on mobile phones when you try to load a file.
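The phrase cutoff from the list above could look something like the sketch below. It assumes the input holds spoken-word terms only, with multi-word phrases joined by spaces; the function name and the exact thresholds passed in are illustrative.

```python
from collections import Counter


def apply_cutoff(totals, min_count=10):
    """Keep single words always; keep multi-word phrases only at or above min_count."""
    return Counter({
        term: n for term, n in totals.items()
        if " " not in term or n >= min_count
    })


# Made-up counts for illustration.
totals = Counter({"good": 3, "good morning": 9, "my brother": 10, "my brother my": 25})
desktop = apply_cutoff(totals)      # cutoff of 10
mobile = apply_cutoff(totals, 20)   # stricter cutoff, as on mobile
print(sorted(desktop), sorted(mobile))
```

Raising the threshold for mobile trades completeness for smaller files, which is why a rare phrase can show up on desktop but not on a phone.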
If you notice anything off, feel free to email madjam1231@gmail.com.
Check out my other stuff here!