An open-source CLI utility for analyzing word frequency in Telegram chat exports.
- JSON Parsing: Automatically handles standard Telegram JSON export structures.
- Stop-Word Filtering: Easily exclude filler words (like "the", "and", "is", "a") using a custom stopwords.txt file.
- Command Line Interface: Fully customizable via flags for inputs, outputs, and result limits.
- No Dependencies: Uses only Python standard libraries (json, re, collections, argparse).
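
As a rough illustration of how the features above can fit together using only the standard library, here is a minimal sketch. It is not the script's actual source: the names `message_text` and `count_words` are hypothetical, and it assumes the export has a top-level `messages` list in which each message's `text` is either a plain string or a list mixing strings and objects that carry their own `text` field.

```python
import json
import re
from collections import Counter

# Assumption: a "word" is any run of Unicode letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")


def message_text(message):
    """Return one message's text as a single plain string."""
    text = message.get("text", "")
    if isinstance(text, str):
        return text
    # Assumed shape for formatted messages: a list of plain strings
    # and dicts such as {"type": "bold", "text": "..."}.
    parts = []
    for part in text:
        if isinstance(part, str):
            parts.append(part)
        elif isinstance(part, dict):
            parts.append(part.get("text", ""))
    return "".join(parts)


def count_words(export):
    """Count lowercase word occurrences across all messages in an export."""
    counts = Counter()
    for message in export.get("messages", []):
        counts.update(WORD_RE.findall(message_text(message).lower()))
    return counts


# Tiny in-memory stand-in for a result.json file.
sample = json.loads("""
{"messages": [
  {"id": 1, "type": "message", "text": "Hello world"},
  {"id": 2, "type": "message",
   "text": ["hello ", {"type": "bold", "text": "again"}]}
]}
""")

counts = count_words(sample)
for word, n in counts.most_common(3):
    print(word, n)
```

For a real export, the same functions would be fed the result of `json.load` on the opened file instead of the inline sample.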
- Open Telegram Desktop.
- Go to the settings of the channel/chat you want to analyze.
- Click the three dots (menu) -> Export chat history.
- Choose JSON as the format.
- Clone this repository and run the script from your terminal:
python3 tgwordcounter.py -i path/to/your/result.json -l 100 -s stopwords.txt

| Flag | Description | Default |
|---|---|---|
| -i, --input | Path to your result.json file. | result.json |
| -o, --output | The filename for the outputted text file. | results.txt |
| -l, --limit | Number of most frequent words to display. | 100 |
| -s, --stopwords | Path to a text file containing words to ignore. | None |
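
The flags and defaults listed above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the script's real parser: `build_parser` is a hypothetical helper, and the help strings are copied from the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Count word frequency in a Telegram chat export."
    )
    parser.add_argument("-i", "--input", default="result.json",
                        help="Path to your result.json file.")
    parser.add_argument("-o", "--output", default="results.txt",
                        help="The filename for the output text file.")
    parser.add_argument("-l", "--limit", type=int, default=100,
                        help="Number of most frequent words to display.")
    parser.add_argument("-s", "--stopwords", default=None,
                        help="Path to a text file containing words to ignore.")
    return parser


# Parse an explicit argument list so the example runs without a terminal.
args = build_parser().parse_args(["-i", "chat/result.json", "-l", "50"])
print(args.input, args.output, args.limit, args.stopwords)
```

Flags that are omitted fall back to the defaults in the table, so here the output file stays `results.txt` and no stop-word file is used.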
To filter out non-essential words, create a stopwords.txt file in the same folder and pass it with the -s flag. Place one word per line:
a
is
in
are
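
A one-word-per-line file like the one above can be turned into a set and applied to the counts in a few lines. The sketch below is illustrative only: `load_stopwords` and `drop_stopwords` are hypothetical names, and matching is assumed to be case-insensitive, which the script itself may or may not do.

```python
from collections import Counter


def load_stopwords(lines):
    """Build a set of stop words from an iterable of lines, skipping blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def drop_stopwords(counts, stopwords):
    """Return a new Counter without the words listed in stopwords."""
    return Counter({word: n for word, n in counts.items()
                    if word not in stopwords})


# The lines mimic the contents of the stopwords.txt example above.
stopwords = load_stopwords(["a\n", "is\n", "in\n", "are\n", "\n"])
counts = Counter("this is a test and a test is in progress".split())
filtered = drop_stopwords(counts, stopwords)
print(filtered.most_common(1))
```

Because `load_stopwords` accepts any iterable of lines, an open file handle for stopwords.txt can be passed to it directly.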