Replies: 2 comments
Yes, markitdown has a full extension mechanism via custom converters.

Option 1: Register a converter directly (lightweight)

```python
from markitdown import MarkItDown, PRIORITY_SPECIFIC_FILE_FORMAT
from markitdown._base_converter import DocumentConverter, DocumentConverterResult
from markitdown._stream_info import StreamInfo
from typing import BinaryIO, Any


class MyProprietaryConverter(DocumentConverter):
    def accepts(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> bool:
        # Claim only files that carry your format's extension
        ext = (stream_info.extension or "").lower()
        return ext == ".myext"  # your format's extension

    def convert(self, file_stream: BinaryIO, stream_info: StreamInfo, **kwargs: Any) -> DocumentConverterResult:
        content = file_stream.read().decode("utf-8")  # parse your format here
        return DocumentConverterResult(markdown=f"Converted content:\n{content}")


md = MarkItDown()
md.register_converter(MyProprietaryConverter(), priority=PRIORITY_SPECIFIC_FILE_FORMAT)
result = md.convert("myfile.myext")
```

Option 2: Installable plugin package (shareable)

Create a Python package with a `register_converters` function and a `markitdown.plugin` entry point:

```python
# your_package/__init__.py
# Assumes MyProprietaryConverter (from Option 1) is defined or imported in this module
__plugin_interface_version__ = 1


def register_converters(markitdown, **kwargs):
    markitdown.register_converter(MyProprietaryConverter())
```

```toml
# pyproject.toml
[project.entry-points."markitdown.plugin"]
your_package = "your_package"
```

Then enable plugins at runtime:

```python
md = MarkItDown(enable_plugins=True)
```

or via CLI:

```
markitdown --use-plugins myfile.myext
```
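If a plugin from Option 2 does not seem to load, it can help to check whether its entry point is visible to Python at all. The sketch below uses only the standard library's `importlib.metadata` to list the entry points registered under the `markitdown.plugin` group. The helper name `list_markitdown_plugins` is hypothetical and not part of markitdown, and this only shows what is installed, not how markitdown itself loads plugins.

```python
from importlib.metadata import entry_points
from typing import List


def list_markitdown_plugins(group: str = "markitdown.plugin") -> List[str]:
    """Return the sorted names of installed entry points in the given group.

    Hypothetical helper for debugging; not part of markitdown.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points can be filtered by group
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# Prints an empty list when no plugin package is installed
print(list_markitdown_plugins())
```

After `pip install` of a package that declares the entry point shown above, the list should include `your_package`. If it does not, the likely cause is the `pyproject.toml` entry-point section rather than the converter code.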
That language is harder; well, it took me longer to learn. Usually you would assign a value to a variable and go from there.
In some cases the text we want to extract is in a proprietary format. Is there an extension mechanism that we can use?