Please provide the example file. Just to be clear: the search function should find "climate" even when it occurs as "cli-\nmate" across a line break. But it will return two hit rectangles, one containing "cli" and a separate one wrapping "mate". In other words, the two parts are not (and cannot be) joined into one rectangle. It would be interesting to see what goes wrong in your case.
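To check how many rectangles a search actually returns, a small helper along these lines may help. It is only a sketch: `hit_rectangles` is a hypothetical name, and `page` is assumed to be a PyMuPDF page, of which only the `search_for` method is used.

```python
def hit_rectangles(page, needle):
    """Return every hit rectangle for `needle` as a plain tuple.

    Assumption: `page` is a PyMuPDF Page. Only page.search_for() is called,
    so any object offering that method works here.
    """
    hits = page.search_for(needle)
    # Each hit is a sequence of four coordinates, so it converts to a tuple.
    return [tuple(rect) for rect in hits]
```

For a word split across two lines, the list would be expected to hold two entries for that one occurrence, one per line. The page itself would come from an opened document, for example the first page of the PDF in question.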
From the documentation, I understand that by default search_for will detect a hyphenated word at the end of a line and join it with the first word on the next line. However, I'm not sure why it's not working for me.
This is a segment of the output from page.get_text():
If replicated on a large scale,
this would, in theory, enable the
ocean to soak up more of the
planet-warming gas driving cli-
mate change.
When I call page.search_for("climate"), it finds the occurrences of "climate" that appear as a whole word on the page, but not the hyphenated one.
The text is extracted from a PDF file, and it is justified, if that matters.
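As a cross-check that does not depend on search_for, the extracted text can be de-hyphenated by hand and searched as a plain string. The sketch below uses only the standard library; `dehyphenate` is a hypothetical helper, and its regular expression is a simple heuristic, not what the library does internally.

```python
import re


def dehyphenate(text):
    """Join a word that was split by a hyphen at the end of a line."""
    # Match a word character, a hyphen, and a line break, but only when the
    # next line starts with a lowercase letter. Heuristic: a real compound
    # that happens to break at its hyphen would be joined as well.
    return re.sub(r"(\w)-\n(?=[a-z])", r"\1", text)


# Two lines copied from the page.get_text() output quoted above.
sample = (
    "planet-warming gas driving cli-\n"
    "mate change.\n"
)

print("climate" in sample)               # False: the raw text keeps the break
print("climate" in dehyphenate(sample))  # True: the two halves are joined
```

If the second check succeeds on the real page text while search_for still misses the word, the split is probably not a plain hyphen followed by a line break, for example because the PDF uses a different hyphen character.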