perf(strings): optimize anagram signature using frequency counts by sowndappan5 · Pull Request #12927 · TheAlgorithms/Python

sowndappan5 · 2025-08-24T04:52:19Z

Replaced sorting-based signature implementation with a frequency-based approach using collections.Counter.

This does not change the correctness of results, but improves performance and scalability:

Original: O(n log n) time, O(n) space
Modified: O(n + k) time, O(k) space (k ≤ alphabet size)

Example:
word = "a"*100000 + "b"*100000
Original requires sorting 200k characters.
Modified only counts frequencies, making it much faster.

Describe your change:

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
Documentation change?

Checklist:

Replaced the sorting-based signature implementation with a frequency-based approach using `collections.Counter`. This ensures that the signature represents both characters and their counts, preventing collisions and better grouping of true anagrams. Examples: - "test" → "e1s1t2" - "finaltest" → "a1e1f1i1l1n1s1t2" - "this is a test" → " 3a1e1h1i2s3t3" Also updated the anagram lookup to use the new frequency-based signatures, making results more accurate and avoiding false positives.

sowndappan5 · 2025-08-24T05:02:47Z

Hi, I’ve updated the anagram implementation to use a frequency-based signature. All checks have passed. Please let me know if you’d like me to make further improvements.

MaximSmolskiy

making results more accurate and avoiding false positives.

@sowndappan5 I don't catch idea why it fixes something and makes results more accurate and avoiding false positives.
Please provide some examples with change in program behavior - current results should be different from previous results and in some sense better

sowndappan5 · 2025-08-24T11:52:56Z

My change improves time and space complexity while keeping results correct.

Original (sorted-based signature):
Requires sorting → O(n log n) time, O(n) space.
Modified (frequency-based signature):
Counts characters directly → O(n + k) time, O(k) space (k ≤ alphabet size, e.g. 26).

Example (long input):

word = "a" * 100000 + "b" * 100000

Original: must sort 200k characters (O(n log n)).
Modified: only counts frequencies (O(n)), far faster and lighter in memory.

So while outputs are the same, the new code is more efficient and scalable.

MaximSmolskiy · 2025-08-24T12:20:41Z

My change improves time and space complexity while keeping results correct.

Original (sorted-based signature):
Requires sorting → O(n log n) time, O(n) space.

Modified (frequency-based signature):
Counts characters directly → O(n + k) time, O(k) space (k ≤ alphabet size, e.g. 26).

Example (long input):
word = "a" * 100000 + "b" * 100000
Original: must sort 200k characters (O(n log n)).

Modified: only counts frequencies (O(n)), far faster and lighter in memory.

So while outputs are the same, the new code is more efficient and scalable.

@sowndappan5 This answer is very good and detailed. But description is quite confusing - there is nothing said about time and memory optimization, but it says that the results will now be different - making results more accurate and avoiding false positives. Please try to call things by their proper names right away

sowndappan5 · 2025-08-24T12:28:06Z

@MaximSmolskiy Thank you for the feedback. I’ve updated the PR description to include time/space complexity and an example illustrating the improvement.

for more information, see https://pre-commit.ci

…Algorithms#12927) * fix(strings): use frequency-based signature for anagrams Replaced the sorting-based signature implementation with a frequency-based approach using `collections.Counter`. This ensures that the signature represents both characters and their counts, preventing collisions and better grouping of true anagrams. Examples: - "test" → "e1s1t2" - "finaltest" → "a1e1f1i1l1n1s1t2" - "this is a test" → " 3a1e1h1i2s3t3" Also updated the anagram lookup to use the new frequency-based signatures, making results more accurate and avoiding false positives. * Refactor anagram function return type to list[str] * Update anagrams.py * Update anagrams.py * Update anagrams.py * Update anagrams.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Maxim Smolskiy <mithridatus@mail.ru> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

algorithms-keeper Bot added enhancement This PR modified some existing files awaiting reviews This PR is ready to be reviewed tests are failing Do not merge until tests pass labels Aug 24, 2025

Refactor anagram function return type to list[str]

eb6ec62

algorithms-keeper Bot removed the tests are failing Do not merge until tests pass label Aug 24, 2025

MaximSmolskiy reviewed Aug 24, 2025

View reviewed changes

Merge branch 'master' into patch-1

4b726be

sowndappan5 changed the title ~~fix(strings): use frequency-based signature for anagrams~~ perf(strings): optimize anagram signature using frequency counts Aug 24, 2025

MaximSmolskiy added 3 commits August 24, 2025 15:25

Update anagrams.py

c5db31c

Update anagrams.py

494fcaf

Update anagrams.py

4cf4204

MaximSmolskiy approved these changes Aug 24, 2025

View reviewed changes

algorithms-keeper Bot removed the awaiting reviews This PR is ready to be reviewed label Aug 24, 2025

Update anagrams.py

950a5c3

algorithms-keeper Bot added the awaiting reviews This PR is ready to be reviewed label Aug 24, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

84ac4c2

for more information, see https://pre-commit.ci

MaximSmolskiy merged commit 37b34c2 into TheAlgorithms:master Aug 24, 2025
3 checks passed

algorithms-keeper Bot removed the awaiting reviews This PR is ready to be reviewed label Aug 24, 2025

sowndappan5 deleted the patch-1 branch August 24, 2025 12:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(strings): optimize anagram signature using frequency counts#12927

perf(strings): optimize anagram signature using frequency counts#12927
MaximSmolskiy merged 8 commits intoTheAlgorithms:masterfrom
sowndappan5:patch-1

sowndappan5 commented Aug 24, 2025 •

edited

Loading

Uh oh!

sowndappan5 commented Aug 24, 2025

Uh oh!

MaximSmolskiy left a comment

Uh oh!

sowndappan5 commented Aug 24, 2025

Uh oh!

MaximSmolskiy commented Aug 24, 2025

Uh oh!

sowndappan5 commented Aug 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sowndappan5 commented Aug 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your change:

Checklist:

Uh oh!

sowndappan5 commented Aug 24, 2025

Uh oh!

MaximSmolskiy left a comment

Choose a reason for hiding this comment

Uh oh!

sowndappan5 commented Aug 24, 2025

Uh oh!

MaximSmolskiy commented Aug 24, 2025

Uh oh!

sowndappan5 commented Aug 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sowndappan5 commented Aug 24, 2025 •

edited

Loading