perf(strings): optimize anagram signature using frequency counts#12927
MaximSmolskiy merged 8 commits into TheAlgorithms:master
Replaced the sorting-based signature implementation with a frequency-based approach using `collections.Counter`. This ensures that the signature represents both characters and their counts, preventing collisions and better grouping of true anagrams.

Examples:
- "test" → "e1s1t2"
- "finaltest" → "a1e1f1i1l1n1s1t2"
- "this is a test" → " 3a1e1h1i2s3t3"

Also updated the anagram lookup to use the new frequency-based signatures, making results more accurate and avoiding false positives.
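A minimal sketch of the frequency-based signature described above, reproducing the three listed examples. The function name `signature` and its exact body are assumptions for illustration, not necessarily the code merged in this PR.

```python
from collections import Counter


def signature(word: str) -> str:
    """Build a signature from each distinct character and its count.

    Hypothetical helper: characters appear in sorted order, each
    followed by the number of times it occurs in ``word``.
    """
    counts = Counter(word)
    return "".join(f"{char}{counts[char]}" for char in sorted(counts))


examples = {word: signature(word) for word in ("test", "finaltest", "this is a test")}
```

Two words are anagrams exactly when their character counts match, so comparing these signatures gives the same grouping as comparing sorted strings.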
Hi, I’ve updated the anagram implementation to use a frequency-based signature. All checks have passed. Please let me know if you’d like me to make further improvements.
MaximSmolskiy left a comment
> making results more accurate and avoiding false positives.
@sowndappan5 I don't follow why this fixes anything, or how it makes results more accurate and avoids false positives.
Please provide some examples where the program's behavior changes: the current results should differ from the previous ones and be better in some sense.
My change improves time and space complexity while keeping results correct.
Example (long input): `word = "a" * 100000 + "b" * 100000`
So while outputs are the same, the new code is more efficient and scalable.
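The point that outputs stay the same while the signature gets cheaper can be sketched on the long input from this comment. Both helper names below are hypothetical stand-ins for the old and new implementations, and the reversed string is just one convenient anagram of the input.

```python
from collections import Counter


def sorted_signature(word: str) -> str:
    # Stand-in for the original approach: sort every character.
    return "".join(sorted(word))


def counter_signature(word: str) -> str:
    # Stand-in for the new approach: one entry per distinct character.
    counts = Counter(word)
    return "".join(f"{char}{counts[char]}" for char in sorted(counts))


word = "a" * 100000 + "b" * 100000
anagram = word[::-1]  # same letters in a different order

# Both approaches agree that the two strings are anagrams.
same_by_sorting = sorted_signature(word) == sorted_signature(anagram)
same_by_counting = counter_signature(word) == counter_signature(anagram)

# The sorted signature is as long as the input; the counted one is not.
sorted_length = len(sorted_signature(word))
counted_length = len(counter_signature(word))
```

Here the sorted signature keeps all 200,000 characters, while the frequency-based one is the short string `a100000b100000`.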
@sowndappan5 This answer is very good and detailed. But the description is quite confusing: it says nothing about time and memory optimization, yet it claims that the results will now be different.
@MaximSmolskiy Thank you for the feedback. I’ve updated the PR description to include time/space complexity and an example illustrating the improvement.
…Algorithms#12927)

* fix(strings): use frequency-based signature for anagrams

  Replaced the sorting-based signature implementation with a frequency-based approach using `collections.Counter`. This ensures that the signature represents both characters and their counts, preventing collisions and better grouping of true anagrams.

  Examples:
  - "test" → "e1s1t2"
  - "finaltest" → "a1e1f1i1l1n1s1t2"
  - "this is a test" → " 3a1e1h1i2s3t3"

  Also updated the anagram lookup to use the new frequency-based signatures, making results more accurate and avoiding false positives.

* Refactor anagram function return type to list[str]
* Update anagrams.py
* Update anagrams.py
* Update anagrams.py
* Update anagrams.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: Maxim Smolskiy <mithridatus@mail.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Replaced the sorting-based signature implementation with a frequency-based approach using `collections.Counter`.
This does not change the correctness of results, but improves performance and scalability.
Example:
`word = "a"*100000 + "b"*100000`
The original sorts all 200,000 characters, which is O(n log n) in the general case.
The modified version counts character frequencies in a single O(n) pass and then sorts only the distinct characters, so its signature is also far shorter.
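One way to check the performance claim locally is a small `timeit` comparison like the sketch below. The helper names are hypothetical, and the input is shuffled with a fixed seed because Python's sort is close to linear on input that is already ordered. Absolute timings depend on the machine, so no expected numbers are given.

```python
import random
import timeit
from collections import Counter


def sorted_signature(word: str) -> str:
    # Stand-in for the original approach: sort every character.
    return "".join(sorted(word))


def counter_signature(word: str) -> str:
    # Stand-in for the new approach: count, then sort distinct characters.
    counts = Counter(word)
    return "".join(f"{char}{counts[char]}" for char in sorted(counts))


def time_signatures(word: str, repeats: int = 3) -> dict[str, float]:
    """Return the best wall-clock time in seconds for each approach."""
    return {
        "sorted": min(
            timeit.repeat(lambda: sorted_signature(word), number=1, repeat=repeats)
        ),
        "counter": min(
            timeit.repeat(lambda: counter_signature(word), number=1, repeat=repeats)
        ),
    }


chars = list("a" * 100000 + "b" * 100000)
random.Random(0).shuffle(chars)  # fixed seed keeps the input reproducible
timings = time_signatures("".join(chars))
```

Printing `timings` shows the two measurements side by side; taking the minimum over a few repeats reduces noise from other processes.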