Fix: Add extend_edges function to fix table extraction when one strategy is "text" and the other is non-text #4878
We already support parameters
Thank you for your reply!
I proposed this PR because, IMHO, users want to use the library as easily and as correctly as possible.
I don't understand most of your response.
I think I know what you mean. Please give me an example PDF page and show how you extract the edge information; I'm quite confident that I will be able to demonstrate how to do that using virtual vectors.
text-lines-tables.pdf
Here's an example from a real PDF I encountered.

Hi, I referred to PyMuPDF when writing a library to extract PDF tables, and found a table-extraction bug that appears when one strategy is "text" and the other is not. Please see monchin/tablers#8 for more details.
I have fixed it in my library, and since it also occurs in PyMuPDF, I'd like to fix it here as well.
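
To illustrate the kind of mismatch that mixing a "text" strategy with a non-text one might produce, here is a minimal, hypothetical sketch in plain Python. It is not the code from this PR or from PyMuPDF: the `Edge` class, `extend_edges_sketch`, and all coordinates are invented for illustration, and the assumption that text-derived edges stop short of the drawn ruling lines is only one plausible reading of the bug.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Edge:
    """An axis-aligned edge (hypothetical type, not a PyMuPDF class)."""
    x0: float
    y0: float
    x1: float
    y1: float


def crosses(v: Edge, h: Edge) -> bool:
    """True if vertical edge v and horizontal edge h share a point."""
    return h.x0 <= v.x0 <= h.x1 and v.y0 <= h.y0 <= v.y1


def count_intersections(v_edges, h_edges) -> int:
    """Count the vertical/horizontal edge pairs that intersect."""
    return sum(crosses(v, h) for v in v_edges for h in h_edges)


def extend_edges_sketch(v_edges, h_edges):
    """Assumed fix: stretch every edge to the bounding box of all edges."""
    edges = list(v_edges) + list(h_edges)
    left = min(e.x0 for e in edges)
    right = max(e.x1 for e in edges)
    top = min(e.y0 for e in edges)
    bottom = max(e.y1 for e in edges)
    new_v = [replace(e, y0=top, y1=bottom) for e in v_edges]
    new_h = [replace(e, x0=left, x1=right) for e in h_edges]
    return new_v, new_h


# Horizontal edges as they might come from drawn ruling lines.
h_edges = [Edge(50, y, 250, y) for y in (100, 120, 140)]
# Vertical edges as they might be inferred from text; they cover only
# the text height and so stop short of the outer ruling lines.
v_edges = [Edge(x, 103, x, 137) for x in (60, 130, 200)]

before = count_intersections(v_edges, h_edges)
ext_v, ext_h = extend_edges_sketch(v_edges, h_edges)
after = count_intersections(ext_v, ext_h)
print(before, after)
```

In this toy setup the text-derived vertical edges meet only the middle ruling line, so a cell finder that relies on edge intersections would miss the outer rows. After the extension every vertical edge reaches every horizontal one.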