Add auto-language extra for code block language detection (#361) #706
Open
Darkness1521 wants to merge 1 commit into
Crozzers
reviewed
May 20, 2026
# -- Python --
s = 0
if re.search(r'^\s*(def\s+\w+\s*\(|class\s+\w+.*:)',
Contributor
Perhaps a lot of these consecutive if statements could be compressed?
Maybe something like:
python = [
    [re.compile(r'^\s*(def\s+\w+\s*\(|class\s+\w+.*:)'), 5],
    [re.compile(r'^\s*(import\s+\w+|from\s+\w+\s+import\b)', re.M), 4],
    ...
]
# `python` is a list of [regex, score] pairs, so iterate it directly (lists have no .items())
scores['python'] = sum(score for regex, score in python if regex.search(code))

return '```' in text
def detect_language(code: str):
Contributor
This function is specific to the extra, so it should probably be part of the AutoLanguage class.
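The two review suggestions above (table-driven scoring, and making detection a method of the extra's class) could be combined roughly as follows. This is only a sketch: `SIGNATURES`, `MIN_SCORE`, the JSON patterns, and all weights other than the two Python entries quoted in the review are assumptions for illustration, and the `AutoLanguage` class here is a plain stand-in that does not subclass the project's real extra base class.

```python
import re

# Hypothetical (pattern, weight) tables. The two Python entries come from the
# review comment; the JSON entries and their weights are invented for illustration.
SIGNATURES = {
    "python": [
        (re.compile(r"^\s*(def\s+\w+\s*\(|class\s+\w+.*:)", re.M), 5),
        (re.compile(r"^\s*(import\s+\w+|from\s+\w+\s+import\b)", re.M), 4),
    ],
    "json": [
        (re.compile(r"^\s*[{\[]"), 2),
        (re.compile(r'"\w+"\s*:\s*'), 3),
    ],
}

# Assumed threshold below which no language is reported.
MIN_SCORE = 4


class AutoLanguage:
    """Stand-in for the extra's class, used only to show where the method could live."""

    @staticmethod
    def detect_language(code: str):
        # Sum the weights of every pattern that matches, per language.
        scores = {
            lang: sum(weight for regex, weight in table if regex.search(code))
            for lang, table in SIGNATURES.items()
        }
        best = max(scores, key=scores.get)
        # Report nothing rather than guess when the evidence is weak.
        return best if scores[best] >= MIN_SCORE else None
```

Adding a language then means adding one table entry instead of another chain of `if` statements, and the threshold keeps plain prose from being tagged.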
@@ -0,0 +1,276 @@
"""
Compare our heuristic detect_language() against Pygments guess_lexer()
Contributor
How many languages does guess_lexer support? If the Pygments implementation is broader and better tested, perhaps we could offer it as a fallback when the dependency is present?
Contributor
Oh, I just saw the test results you added to the PR comparing performance against Pygments. This is not needed then, I guess.
Closes #361
Summary
Adds an auto-language extra that automatically detects the programming language of fenced code blocks without explicit language tags
Go, Rust, Ruby, PHP, JSON, YAML, C/C++
How it works
Runs before fenced-code-blocks in the processing pipeline. When a code block has no language tag, it analyzes the content and inserts the detected language name. The existing fenced-code-blocks and Pygments highlighting then process it normally.
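The pre-pass described above can be sketched roughly like this. It is not the PR's implementation: `tag_untagged_fences`, `toy_detect`, and `FENCE_OPEN` are hypothetical names, the detector is a deliberately tiny placeholder, and only backtick fences are handled.

```python
import re

# An opening fence: three or more backticks, optionally followed by an info string.
FENCE_OPEN = re.compile(r"^(`{3,})\s*(.*)$")


def toy_detect(code):
    """Placeholder detector (assumption): recognises only obvious Python."""
    return "python" if re.search(r"^\s*(def|import)\s+\w+", code, re.M) else None


def tag_untagged_fences(text, detect=toy_detect):
    """Insert a detected language name after fences that have no language tag."""
    lines = text.split("\n")
    out, i = [], 0
    while i < len(lines):
        m = FENCE_OPEN.match(lines[i])
        if m is None:
            out.append(lines[i])
            i += 1
            continue
        fence, info = m.group(1), m.group(2).strip()
        # Find the matching closing fence, or run to the end of the input.
        end = i + 1
        while end < len(lines) and lines[end].strip() != fence:
            end += 1
        body = lines[i + 1:end]
        # Only blocks without a tag are analysed; tagged blocks pass through untouched.
        lang = detect("\n".join(body)) if not info else None
        out.append(fence + lang if lang else lines[i])
        out.extend(body)
        if end < len(lines):
            out.append(lines[end])
        i = end + 1
    return "\n".join(out)


FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example
sample = "\n".join(["Intro", FENCE, "def f():", "    return 1", FENCE])
tagged = tag_untagged_fences(sample)
```

Because the output is ordinary tagged Markdown, whatever step handles fenced code blocks afterwards needs no changes, which matches the ordering described in the PR.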
Usage
Tests
On 24 short code snippets (the typical case for markdown code blocks):
Compared: heuristic detect_language vs. Pygments guess_lexer()
The test suite passes with no regressions. All changes comply with the project's contribution guidelines (PEP8, test coverage, docs updated).
Result
with auto-language:

without auto-language:
