I've worked on document extraction a lot and while the tweet is too flippant for my taste, it's not wrong. Mistral is comparing itself to non-VLM computer vision services. While not necessarily what everyone needs, those are very different beasts compared to VLM-based extraction: they give you precise bounding boxes, usually at the cost of broader "document understanding".
Their failure modes are also vastly different. VLM-based extraction can misread entire sentences or miss entire paragraphs. Sonnet 3 had that issue. Computer vision models instead make in-word typos.
Why not use both? I just built a pipeline for document data extraction that uses PaddleOCR, then Gemini 3 to check + fix errors. It gets close to 99.9% accuracy on extraction from financial statements, finally on par with humans.
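In case it's useful, a minimal sketch of that two-stage idea. It assumes the classic PaddleOCR 2.x API and the google-generativeai client; the model name and prompt are illustrative, not the exact pipeline:

    from paddleocr import PaddleOCR
    import google.generativeai as genai

    ocr = PaddleOCR(lang="en")
    genai.configure(api_key="YOUR_KEY")
    model = genai.GenerativeModel("gemini-2.0-flash")  # any recent Gemini works

    def extract(image_path: str) -> str:
        # PaddleOCR 2.x returns [bbox, (text, confidence)] per detected line.
        lines = ocr.ocr(image_path)[0]
        raw_text = "\n".join(text for _box, (text, _conf) in lines)
        # Give the VLM the page image *and* the OCR text, so it only has to
        # spot character-level disagreements, not re-read the page from scratch.
        page = genai.upload_file(image_path)
        prompt = ("Here is OCR output for the attached page. Fix any misread "
                  "characters or words; change nothing else:\n\n" + raw_text)
        return model.generate_content([page, prompt]).text

The point of passing both inputs is that in-word typos are cheap for the VLM to catch against the pixels, while the OCR text anchors it against hallucinating whole sentences.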
I did the opposite: Tesseract to get bboxes, words, and chars, and then Mistral on the clips with some reasonable reflow to preserve geometry. Paddle wasn't working on my local machine (until I found RapidOCR). Surya was also very good, but because you can't really tweak any knobs, when it failed it just kinda failed. Overall: Surya > Rapid w/ Paddle > docTR > Tesseract, though the latter gave me the most granularity when I needed it.
Edit: Gemini 2.0 was good enough for VLM cleanup, and now 2.5 or above with structured output makes reconstruction even easier.
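The Tesseract half of that looks roughly like the sketch below. It uses pytesseract's image_to_data, which returns word-level boxes and confidences; the threshold and the idea of only sending low-confidence crops to the VLM are assumptions about a sensible setup, not the exact pipeline:

    import pytesseract
    from PIL import Image

    def low_conf_clips(image_path: str, threshold: float = 60.0):
        """Yield (bbox, crop) for words Tesseract is unsure about."""
        img = Image.open(image_path)
        data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
        for i, conf in enumerate(data["conf"]):
            if data["text"][i].strip() and float(conf) < threshold:
                left, top = data["left"][i], data["top"][i]
                w, h = data["width"][i], data["height"][i]
                # Keep the bbox with the crop so corrected text can be
                # reflowed back into its original position on the page.
                yield (left, top, w, h), img.crop((left, top, left + w, top + h))

The crops then go to the VLM and the corrected strings get written back at their original coordinates, which is what preserves the geometry.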